Xin Xu*, Tianyi Xiong*, Zheng Ding and Zhuowen Tu (*Equal Contribution)
This is the repository for the ICCV 2023 paper "MasQCLIP for Open-Vocabulary Universal Image Segmentation".
[Project Page] [Paper]
Please refer to the dataset preparation instructions.
Please refer to the installation instructions for environment setup.
In the base-novel setting, the model is trained on the base classes and tested on the novel classes. To train a model under the base-novel setting (on COCO-instance), run the commands below.
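The commands write checkpoints and logs under ${work_dir}, which the configs do not set for you; define this shell variable first (the path below is only an example):

work_dir="output/base_novel"  # example output root; use any writable directory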
# Progressive Distillation
python train_net.py --num-gpus 8 --config-file configs/base-novel/coco-instance/teacher_R50_100k_base48.yaml OUTPUT_DIR "${work_dir}/teacher"
python train_net.py --num-gpus 4 --config-file configs/base-novel/coco-instance/student_R50_30k_base48.yaml OUTPUT_DIR "${work_dir}/student" MODEL.WEIGHTS "${work_dir}/teacher/model_final.pth"
# MasQ-Tuning
python train_net.py --num-gpus 4 --config-file configs/base-novel/coco-instance/masqclip_R50_bs4_10k_base48.yaml OUTPUT_DIR "${work_dir}/masq" MODEL.WEIGHTS "${work_dir}/student/model_final.pth"
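If a run is interrupted, it should be resumable from the last checkpoint in OUTPUT_DIR. This assumes train_net.py follows detectron2's default argument parser (which the flags above suggest), so treat --resume as an assumption rather than a documented flag of this repository:

# Resume the teacher stage from its latest checkpoint (assumes detectron2-style --resume)
python train_net.py --resume --num-gpus 8 --config-file configs/base-novel/coco-instance/teacher_R50_100k_base48.yaml OUTPUT_DIR "${work_dir}/teacher"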
To evaluate the model in the generalized setting (the instance65 config covers all 65 classes, i.e., the 48 base and 17 novel classes together), use
python train_net.py --eval-only --num-gpus 4 --config-file configs/base-novel/coco-instance/masqclip_R50_bs4_10k_instance65.yaml OUTPUT_DIR "${work_dir}/generalized" MODEL.WEIGHTS "${work_dir}/masq/model_final.pth"
In the cross-dataset setting, the model is trained on one dataset (e.g., COCO) and tested on another (e.g., ADE20K). To train a model under the cross-dataset setting (on COCO-panoptic), run
# Progressive Distillation
python train_net.py --num-gpus 8 --config-file configs/cross-dataset/coco-train/panoptic-segmentation/teacher_R50_200k.yaml OUTPUT_DIR "${work_dir}/train_coco/teacher"
python train_net.py --num-gpus 4 --config-file configs/cross-dataset/coco-train/panoptic-segmentation/student_R50_30k.yaml OUTPUT_DIR "${work_dir}/train_coco/student" MODEL.WEIGHTS "${work_dir}/train_coco/teacher/model_final.pth"
# MasQ-Tuning
python train_net.py --num-gpus 4 --config-file configs/cross-dataset/coco-train/panoptic-segmentation/masqclip_R50_bs4_10k.yaml OUTPUT_DIR "${work_dir}/train_coco/masq" MODEL.WEIGHTS "${work_dir}/train_coco/student/model_final.pth"
To evaluate a model's performance, use
model_path="${work_dir}/train_coco/masq/model_final.pth"
# For example, to evaluate on ADE20K-150, use
python train_net.py --eval-only --num-gpus 4 --config-file configs/cross-dataset/test/ade20k-150/panoptic-segmentation/masqclip_R50_bs4_10k.yaml OUTPUT_DIR "${work_dir}/test_ade20k_150" MODEL.WEIGHTS $model_path
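Other target datasets should follow the same config layout; the path below is a hypothetical template, so substitute a directory that actually exists under configs/cross-dataset/test/:

# <dataset> is a placeholder for a test-set directory under configs/cross-dataset/test/
python train_net.py --eval-only --num-gpus 4 --config-file configs/cross-dataset/test/<dataset>/panoptic-segmentation/masqclip_R50_bs4_10k.yaml OUTPUT_DIR "${work_dir}/test_<dataset>" MODEL.WEIGHTS $model_path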
Pre-trained models can be found in this Google Drive link.
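A downloaded checkpoint can be evaluated directly by pointing MODEL.WEIGHTS at the file; the checkpoint path below is a placeholder:

# Evaluate a downloaded checkpoint (replace the placeholder path with the real file)
python train_net.py --eval-only --num-gpus 4 --config-file configs/cross-dataset/test/ade20k-150/panoptic-segmentation/masqclip_R50_bs4_10k.yaml OUTPUT_DIR "output/eval_pretrained" MODEL.WEIGHTS /path/to/model_final.pth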
The code is based on MaskCLIP, Mask2Former, and CLIP.
Please consider citing MasQCLIP and MaskCLIP if you find the code useful:
@inproceedings{xu2023masqclip,
    author    = {Xu, Xin and Xiong, Tianyi and Ding, Zheng and Tu, Zhuowen},
    title     = {MasQCLIP for Open-Vocabulary Universal Image Segmentation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {887-898},
}

@inproceedings{ding2023maskclip,
    author    = {Ding, Zheng and Wang, Jieke and Tu, Zhuowen},
    title     = {Open-Vocabulary Universal Image Segmentation with MaskCLIP},
    booktitle = {International Conference on Machine Learning},
    year      = {2023},
}