This repo makes a few modifications to the original LXMERT code to support both the VQA-CP and VQA datasets. Please refer to the original LXMERT code for more details.
We mainly use this repo to implement our paper, Loss Re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View.
The pre-trained model (870 MB) is available at http://nlp.cs.unc.edu/data/model_LXRT.pth and can be downloaded with:

    mkdir -p snap/pretrained
    wget https://nlp.cs.unc.edu/data/model_LXRT.pth -P snap/pretrained

- Please make sure the LXMERT model has been either downloaded or pre-trained before fine-tuning.
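
Before launching fine-tuning, it can help to confirm that the checkpoint is where the command expects it. The sketch below assumes that the value passed to `--loadLXMERTQA` is a path prefix to which `_LXRT.pth` is appended (inferred from the file name `model_LXRT.pth` and the flag value `snap/pretrained/model`). `resolve_checkpoint` and `checkpoint_ready` are hypothetical helpers, not part of this repo.

```python
from pathlib import Path

# Assumption: the loader appends this suffix to the --loadLXMERTQA prefix.
CHECKPOINT_SUFFIX = "_LXRT.pth"


def resolve_checkpoint(prefix: str) -> Path:
    """Return the checkpoint file implied by a path prefix (hypothetical helper)."""
    return Path(prefix + CHECKPOINT_SUFFIX)


def checkpoint_ready(prefix: str) -> bool:
    """Return True if the implied checkpoint file exists and is not empty."""
    path = resolve_checkpoint(prefix)
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    prefix = "snap/pretrained/model"
    status = "found" if checkpoint_ready(prefix) else "missing"
    print(f"{resolve_checkpoint(prefix)}: {status}")
```

Running the script from the repo root prints whether the file was found, so a missing download shows up before any training time is spent.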
- Note that we DO NOT use the re-distributed json files provided by the LXMERT authors; we use the official splits in this repo. Make sure these data are placed where `src/config.py` expects them!
- Download the Faster R-CNN features for the MS COCO train2014 (17 GB) and val2014 (8 GB) images (VQA 2.0 is collected on the MS COCO dataset):

      mkdir -p data/mscoco_imgfeat
      wget https://nlp.cs.unc.edu/data/lxmert_data/mscoco_imgfeat/train2014_obj36.zip -P data/mscoco_imgfeat
      unzip data/mscoco_imgfeat/train2014_obj36.zip -d data/mscoco_imgfeat && rm data/mscoco_imgfeat/train2014_obj36.zip
      wget https://nlp.cs.unc.edu/data/lxmert_data/mscoco_imgfeat/val2014_obj36.zip -P data/mscoco_imgfeat
      unzip data/mscoco_imgfeat/val2014_obj36.zip -d data && rm data/mscoco_imgfeat/val2014_obj36.zip

- We first convert the image features from `tsv` to `h5`:

      python src/tools/detection_feature_converter.py

  We fold the train and val image features together to support both VQA-CP and VQA.
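
The downloaded feature files store each image as one TSV row whose array fields are base64-encoded, so a converter has to decode them before writing anything to `h5`. Below is a minimal sketch of that decoding step using only the standard library. The field names (`img_id`, `num_boxes`, `features`) and the raw float32 encoding follow the LXMERT TSV layout as we understand it and are assumptions here. `decode_float32_field` and `decode_row` are hypothetical names, and the actual converter script may work differently.

```python
import base64
from array import array


def decode_float32_field(encoded: str) -> list:
    """Decode a base64 string of raw float32 values into a list of floats.

    Assumption: values are stored in the machine's native byte order
    (little-endian on typical x86 hardware).
    """
    values = array("f")
    values.frombytes(base64.b64decode(encoded))
    return values.tolist()


def decode_row(row: dict, feat_dim: int) -> dict:
    """Turn one TSV row (as a dict of strings) into per-box feature vectors."""
    num_boxes = int(row["num_boxes"])
    flat = decode_float32_field(row["features"])
    if len(flat) != num_boxes * feat_dim:
        raise ValueError("feature length does not match num_boxes * feat_dim")
    features = [flat[i * feat_dim:(i + 1) * feat_dim] for i in range(num_boxes)]
    return {"img_id": row["img_id"], "num_boxes": num_boxes, "features": features}


if __name__ == "__main__":
    # Synthetic row: 2 boxes with 3-dimensional features (real features are larger).
    raw = array("f", [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]).tobytes()
    row = {
        "img_id": "demo",
        "num_boxes": "2",
        "features": base64.b64encode(raw).decode("ascii"),
    }
    print(decode_row(row, feat_dim=3)["features"])
```

Writing the decoded arrays into an HDF5 file would be the next step and is left out here to keep the example dependency-free.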
- Process answers and question types:

      python src/tools/compute_softscore.py

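
Soft scores turn the human answers collected for each question into per-answer training targets. The sketch below uses the step mapping common in bottom-up-attention style VQA preprocessing (0.3, 0.6, 0.9, then 1.0). Whether `compute_softscore.py` in this repo uses exactly this mapping is an assumption, and `soft_score` and `answer_targets` are hypothetical names.

```python
from collections import Counter


def soft_score(occurrences: int) -> float:
    """Map how many annotators gave an answer to a soft target in [0, 1]."""
    if occurrences <= 0:
        return 0.0
    # Assumed step mapping; four or more matching annotators give full credit.
    return {1: 0.3, 2: 0.6, 3: 0.9}.get(occurrences, 1.0)


def answer_targets(human_answers: list) -> dict:
    """Return a soft score for every distinct answer given to one question."""
    counts = Counter(human_answers)
    return {answer: soft_score(n) for answer, n in counts.items()}


if __name__ == "__main__":
    answers = ["yes"] * 7 + ["no"] * 2 + ["maybe"]
    print(answer_targets(answers))
```

With this mapping, a question can have several answers with non-zero targets, which is why the BCE setting treats VQA as multi-label classification.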
- Fine-tune on VQA-CP or VQA (select the dataset in `src/config.py`):

      PYTHONPATH=$PYTHONPATH:./src \
      python -u src/tasks/vqa.py \
          --train train --valid val \
          --llayers 9 --xlayers 5 --rlayers 5 \
          --loadLXMERTQA snap/pretrained/model \
          --batchSize 32 --optim bert --lr 5e-5 --epochs 4 \
          --tqdm --name vqa-cp-test

- Evaluate on the validation set (following the official implementation):

      PYTHONPATH=$PYTHONPATH:./src \
      python -u src/tasks/vqa.py \
          --train train --test val \
          --llayers 9 --xlayers 5 --rlayers 5 \
          --loadLXMERTQA snap/pretrained/model \
          --batchSize 32 --load output/vqa-cp-test.pth \
          --tqdm
      python acc_per_type.py output/val_predict.json

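
Per-type results such as Y/N, Num. and Others are averages of per-question scores grouped by answer type. Below is a minimal sketch of that aggregation, assuming each prediction has already been scored in [0, 1] and tagged with its answer type. The record layout and the name `accuracy_per_type` are illustrative and not taken from `acc_per_type.py`.

```python
from collections import defaultdict


def accuracy_per_type(records):
    """Average scores per answer type and overall, in percent.

    records: iterable of (answer_type, score) pairs, score in [0, 1].
    Returns a dict keyed by answer type plus an "all" entry.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for answer_type, score in records:
        # Every question contributes to its own type and to the overall figure.
        for key in (answer_type, "all"):
            totals[key] += score
            counts[key] += 1
    return {key: 100.0 * totals[key] / counts[key] for key in totals}


if __name__ == "__main__":
    demo = [("yes/no", 1.0), ("yes/no", 0.0), ("number", 1.0), ("other", 0.5)]
    for key, value in accuracy_per_type(demo).items():
        print(f"{key}: {value:.2f}")
```

Note that the overall figure is weighted by the number of questions of each type, so it is not the plain mean of the three per-type numbers.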
Loss Function | Model | Y/N | Num. | Others | All |
---|---|---|---|---|---|
BCE | LXMERT | 46.70 | 27.14 | 61.20 | 51.78 |
BCE | LXMERT+Ours | 79.77 | 59.06 | 61.41 | 66.40 |
CE | LXMERT | - | - | - | 58.07 |
CE | LXMERT+Ours | - | - | - | 69.37 |
If you found this repo useful, please cite the following paper:
@article{rescale-vqa,
title={Loss Re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View},
author={Guo, Yangyang and Nie, Liqiang and Cheng, Zhiyong and Tian, Qi and Zhang, Min},
journal={IEEE TIP},
year={2021}
}