This repo is the official implementation of the paper: Balanced Classification: A Unified Framework for Long-Tailed Object Detection (accepted by IEEE Transactions on Multimedia).
2023-08-19: We have uploaded visualizations of different methods to this repo!
2023-08-15: We have updated the download URLs of the LVIS dataset annotations (see issue #1), as the previous ones had expired.
2023-08-14: Our paper is featured by 极市平台!
2023-08-09: Our paper is covered and explained by CVHub!
2023-08-03: Our paper has been accepted by IEEE Transactions on Multimedia (TMM) and will be published!
- Integrate other SOTA methods into this repo
- Release a pretrained Faster R-CNN detector with a Swin Transformer backbone
Conventional detectors suffer from performance degradation when dealing with long-tailed data due to a classification bias towards the majority head categories. In this paper, we contend that the learning bias originates from two factors: 1) the unequal competition arising from the imbalanced distribution of foreground categories, and 2) the lack of sample diversity in tail categories. To tackle these issues, we introduce a unified framework called BAlanced CLassification (BACL), which enables adaptive rectification of category distribution disparities and dynamic intensification of sample diversities in a synchronized manner. Specifically, a novel foreground classification balance loss (FCBL) is developed to ameliorate the domination of head categories and shift attention to difficult-to-differentiate categories by introducing pairwise class-aware margins and auto-adjusted weight terms, respectively. This loss prevents the over-suppression of tail categories by dominant head categories in the context of unequal competition. Moreover, we propose a dynamic feature hallucination module (FHM), which expands the representation of tail categories in the feature space by synthesizing hallucinated samples to introduce additional data variances. In this divide-and-conquer approach, BACL sets the new state-of-the-art on the challenging LVIS benchmark with a decoupled training pipeline, surpassing vanilla Faster R-CNN with ResNet-50-FPN by 5.8% AP and 16.1% AP for overall and tail categories. Extensive experiments demonstrate that BACL consistently achieves performance improvements across various datasets with different backbones and architectures.
We tested with the following settings:
- python 3.8
- cuda 11.0
- pytorch 1.7.0
- torchvision 0.8.1
- mmcv 1.2.7
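If you prefer a local setup over Docker, the following is a minimal sketch (the bacl environment name is our placeholder; the find-links URLs follow the official PyTorch and mmcv install conventions):

# create and activate a clean environment (versions follow the list above)
conda create -n bacl python=3.8 -y
conda activate bacl
# PyTorch 1.7.0 / torchvision wheels built for CUDA 11.0
pip install torch==1.7.0+cu110 torchvision==0.8.1+cu110 -f https://download.pytorch.org/whl/torch_stable.html
# mmcv-full 1.2.7 matching this torch/CUDA combination
pip install mmcv-full==1.2.7 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html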
We provide a Dockerfile to build an image. Make sure your docker version is >= 19.03.
# build an image with PyTorch 1.7.0, CUDA 11.0
# If you want to use another version, just modify the Dockerfile
docker build -t mmdetection docker/
Run it with:
docker run --gpus all --shm-size=8g -it -v {DATA_DIR}:/mmdetection/data mmdetection
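For example, mounting a local ./data folder in place of {DATA_DIR} (the path is illustrative):

docker run --gpus all --shm-size=8g -it -v $(pwd)/data:/mmdetection/data mmdetection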
# Make sure you are in the BACL root directory
mkdir data
cd data
mkdir lvis_v0.5
mkdir lvis_v1
- If you already have the COCO2017 dataset, great: just link the `train2017` and `val2017` folders under `lvis_v0.5` and `lvis_v1` (see the sketch after this list).
- If you do not have the COCO2017 dataset, please download the COCO train set and COCO val set, unzip them, and move them under `lvis_v0.5` and `lvis_v1`.
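A minimal sketch of the linking step, assuming your COCO2017 images live under /path/to/coco2017 (adjust to your setup):

# reuse one copy of the COCO2017 images for both LVIS versions
for v in lvis_v0.5 lvis_v1; do
  ln -s /path/to/coco2017/train2017 data/$v/train2017
  ln -s /path/to/coco2017/val2017 data/$v/val2017
done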
- Download lvis_v0.5 annotations: lvis_v0.5_train_ann and lvis_v0.5_val_ann, then unzip all the files and put them under `lvis_v0.5/annotations`;
- Download lvis_v1 annotations: lvis_v1_train_ann and lvis_v1_val_ann, then unzip all the files and put them under `lvis_v1/annotations`.
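A sketch of the unpacking step; the zip file names below are illustrative, use the files you actually downloaded from the links above:

mkdir -p data/lvis_v0.5/annotations data/lvis_v1/annotations
# unzip each downloaded annotation archive into the matching folder
unzip lvis_v0.5_train_ann.zip -d data/lvis_v0.5/annotations
unzip lvis_v0.5_val_ann.zip -d data/lvis_v0.5/annotations
unzip lvis_v1_train_ann.zip -d data/lvis_v1/annotations
unzip lvis_v1_val_ann.zip -d data/lvis_v1/annotations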
After all these operations, the `data` folder should look like this:
data
├── lvis_v0.5
│ ├── annotations
│ │ ├── lvis_v0.5_train.json
│ │ ├── lvis_v0.5_val.json
│ ├── train2017
│ │ ├── 000000100582.jpg
│ │ ├── 000000102411.jpg
│ │ ├── ......
│ └── val2017
│ ├── 000000062808.jpg
│ ├── 000000119038.jpg
│ ├── ......
├── lvis_v1
│ ├── annotations
│ │ ├── lvis_v1_train.json
│ │ ├── lvis_v1_val.json
│ ├── train2017
│ │ ├── 000000100582.jpg
│ │ ├── 000000102411.jpg
│ │ ├── ......
│ └── val2017
│ ├── 000000062808.jpg
│ ├── 000000119038.jpg
│ ├── ......
Use the following commands to train a model for lvis_v0.5.
# use decoupled training pipeline:
# 1. representation learning stage of BACL
./tools/dist_train.sh configs/bacl/bacl_representation_faster_rcnn_r50_fpn_1x_lvis_v0.5.py 8
# 2. classifier learning stage of BACL
./tools/dist_train.sh configs/bacl/bacl_classifier_faster_rcnn_r50_fpn_mstrain_1x_lvis_v0.5.py 8
Use the following commands to train a model for lvis_v1.
# use decoupled training pipeline:
# 1. representation learning stage of BACL
./tools/dist_train.sh configs/bacl/bacl_representation_faster_rcnn_r50_fpn_1x_lvis_v1.py 8
# 2. classifier learning stage of BACL
./tools/dist_train.sh configs/bacl/bacl_classifier_faster_rcnn_r50_fpn_mstrain_1x_lvis_v1.py 8
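If you train without the distributed launcher (e.g., on a single GPU), mmdetection's standard entry point should work as well; a sketch (remember to scale the learning rate per the note below):

# single-GPU training, representation learning stage
python tools/train.py configs/bacl/bacl_representation_faster_rcnn_r50_fpn_1x_lvis_v1.py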
Important: The default learning rate in the config files is for 8 GPUs and 2 img/gpu (batch size = 8*2 = 16). According to the Linear Scaling Rule, you need to set the learning rate proportional to the batch size if you use a different number of GPUs or images per GPU, e.g., lr=0.01 for 4 GPUs * 2 img/gpu and lr=0.08 for 16 GPUs * 4 img/gpu. (Quoted from mmdetection.)
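For example, a sketch of stage-1 training on 4 GPUs with the learning rate halved accordingly (--cfg-options is the flag name in recent mmdetection versions; older ones use --options):

# 4 GPUs * 2 img/gpu = batch size 8, half the default, so halve the lr
./tools/dist_train.sh configs/bacl/bacl_representation_faster_rcnn_r50_fpn_1x_lvis_v0.5.py 4 \
    --cfg-options optimizer.lr=0.01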
Use the following commands to test a trained model.
./tools/dist_test.sh \
${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]
- `$RESULT_FILE`: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.
- `$EVAL_METRICS`: Items to be evaluated on the results. `bbox` for bounding box evaluation only; `bbox segm` for bounding box and mask evaluation.
For example (assuming you have finished training the BACL models):
- To evaluate the trained BACL model with Faster R-CNN R50-FPN for object detection:
./tools/dist_test.sh configs/bacl/bacl_classifier_faster_rcnn_r50_fpn_mstrain_1x_lvis_v0.5.py \
./work_dirs/bacl_classifier_faster_rcnn_r50_fpn_mstrain_1x_lvis_v0.5/epoch_12.pth 8 \
--eval bbox
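To also dump the raw results to a pickle file for later analysis, combine --out with --eval as documented above:

./tools/dist_test.sh configs/bacl/bacl_classifier_faster_rcnn_r50_fpn_mstrain_1x_lvis_v0.5.py \
    ./work_dirs/bacl_classifier_faster_rcnn_r50_fpn_mstrain_1x_lvis_v0.5/epoch_12.pth 8 \
    --out results.pkl --eval bbox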
For your convenience, we provide the following trained models. All models are trained with 16 images in a mini-batch.
Method | Backbone | Dataset | box AP | Model |
---|---|---|---|---|
baseline | R50_FPN | LVIS v0.5 | 22.0 | config / model |
BACL | R50_FPN | LVIS v0.5 | 27.8 | config / model |
baseline | R50_FPN | LVIS v1 | 19.3 | config / model |
BACL | R50_FPN | LVIS v1 | 26.1 | config / model |
baseline | R101_FPN | LVIS v0.5 | 23.3 | config / model |
BACL | R101_FPN | LVIS v0.5 | 29.4 | config / model |
baseline | R101_FPN | LVIS v1 | 20.9 | config / model |
BACL | R101_FPN | LVIS v1 | 27.8 | config / model |
[0] All results are obtained with a single model and without any test-time data augmentation such as multi-scale testing or flipping.
[1] For more details, refer to the config files in `configs/bacl/`.
If you find this work useful in your research, please consider citing our paper as follows:
@misc{qi2023balanced,
title={Balanced Classification: A Unified Framework for Long-Tailed Object Detection},
author={Tianhao Qi and Hongtao Xie and Pandeng Li and Jiannan Ge and Yongdong Zhang},
year={2023},
eprint={2308.02213},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Thanks to the MMDetection team for the wonderful open-source project!