A curated list of prompt/adapter learning methods for vision-language models (e.g., CLIP).
- If you find that a paper published in a top conference (CVPR, ICCV, ECCV, ICML, NeurIPS, ICLR) or journal (TPAMI, IJCV, TIP) is missing from this list, please feel free to contact me at any time, either by email (zhengli97[at]qq.com) or by opening an issue.
- We would appreciate more people joining us in maintaining this list of papers.
- Note that papers without open-source code are not recommended.
- Use text-based prompts/adapters.
- Use image-based prompts/adapters.
- Use both text- and image-based prompts/adapters.
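To make the first category concrete, the sketch below shows the hand-crafted text-prompt baseline ("a photo of a {class}.") that text-based prompt learning methods such as CoOp replace with learnable context vectors. It is a minimal example assuming the Hugging Face transformers CLIP wrappers; the image path and label set are placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Zero-shot CLIP classification with a hand-crafted text prompt template.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch16")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch16")

classes = ["cat", "dog", "car"]                    # example label set
prompts = [f"a photo of a {c}." for c in classes]  # the template prompt methods learn to replace
image = Image.open("example.jpg")                  # placeholder path

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image     # (1, n_classes) image-text similarity scores
probs = logits.softmax(dim=-1)
print(classes[probs.argmax().item()])
```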
- A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models. [Paper]
- Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey. [Paper]
Base-to-Novel Generalization (ViT-B/16 CLIP).
Methods | Pub | Base | Novel | HM (main) | Code |
---|---|---|---|---|---|
CLIP | ICML 21 | 69.34 | 74.22 | 71.70 | Link |
CoOp | IJCV 22 | 82.69 | 63.22 | 71.66 | Link |
CoCoOp | CVPR 22 | 80.47 | 71.69 | 75.83 | Link |
ProDA | CVPR 22 | 81.56 | 72.30 | 76.65 | Link |
KgCoOp | CVPR 23 | 80.73 | 73.60 | 77.00 | Link |
RPO | ICCV 23 | 81.13 | 75.00 | 77.78 | Link |
MaPLe | CVPR 23 | 82.28 | 75.14 | 78.55 | Link |
DePT | CVPR 24 | 83.62 | 75.04 | 79.10 | Link |
TCP | CVPR 24 | 84.13 | 75.36 | 79.51 | Link |
MMA | CVPR 24 | 83.20 | 76.80 | 79.87 | Link |
PromptSRC | ICCV 23 | 84.26 | 76.10 | 79.97 | Link |
HPT | AAAI 24 | 84.32 | 76.86 | 80.23 | Link |
CoPrompt | ICLR 24 | 84.00 | 77.23 | 80.48 | Link |
CasPL | ECCV 24 | 86.11 | 79.54 | 82.69 | Link |
PromptKD | CVPR 24 | 86.96 | 80.73 | 83.73 | Link |
Table 1. Average results on 11 datasets. (Only works with open-source code will be listed.)
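The HM column is the harmonic mean of base- and novel-class accuracy (computed per dataset and then averaged), which rewards methods that do well on both splits:

```python
def harmonic_mean(base: float, novel: float) -> float:
    """HM metric of Table 1: harmonic mean of base- and novel-class accuracy."""
    return 2 * base * novel / (base + novel)

# For the CLIP row, the harmonic mean of the averaged accuracies matches the reported HM:
assert round(harmonic_mean(69.34, 74.22), 2) == 71.70
```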
CoOp
Learning to Prompt for Vision-Language Models. IJCV 2022. (See the context-vector sketch after this block.)
[Paper] [Code]
CoCoOp
Conditional Prompt Learning for Vision-Language Models. CVPR 2022.
[Paper] [Code]
ProDA
Prompt Distribution Learning. CVPR 2022.
[Paper] [Code]
VPT
Visual Prompt Tuning. ECCV 2022.
[Paper] [Code]
VP
Exploring Visual Prompts for Adapting Large-Scale Models. arXiv 2022.
[Paper] [Code]
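As referenced above, here is a minimal sketch of the core idea behind CoOp-style text prompt learning: a small set of learnable context vectors, shared across classes, is prepended to each class name's token embeddings before the frozen text encoder. The module and shapes are illustrative, not CoOp's actual implementation.

```python
import torch
import torch.nn as nn

class LearnableContext(nn.Module):
    """CoOp-style soft prompt: n_ctx learnable context vectors shared
    across classes, prepended to the class-name token embeddings."""
    def __init__(self, n_ctx: int = 16, dim: int = 512):
        super().__init__()
        self.ctx = nn.Parameter(torch.empty(n_ctx, dim))
        nn.init.normal_(self.ctx, std=0.02)  # random initialization

    def forward(self, class_embeds: torch.Tensor) -> torch.Tensor:
        # class_embeds: (n_classes, n_tokens, dim) embeddings of tokenized class names
        n_cls = class_embeds.size(0)
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)
        # Only self.ctx receives gradients; CLIP's encoders stay frozen.
        return torch.cat([ctx, class_embeds], dim=1)  # (n_classes, n_ctx + n_tokens, dim)
```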
MaPLe
MaPLe: Multi-modal Prompt Learning. CVPR 2023.
[Paper] [Code]
KgCoOp
Visual-Language Prompt Tuning with Knowledge-guided Context Optimization. CVPR 2023.
[Paper] [Code]
LASP
LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models. CVPR 2023.
[Paper]
DAM-VP
Diversity-Aware Meta Visual Prompting. CVPR 2023.
[Paper] [Code]
TaskRes
Task Residual for Tuning Vision-Language Models. CVPR 2023.
[Paper] [Code]
RPO
Read-only Prompt Optimization for Vision-Language Few-shot Learning. ICCV 2023.
[Paper] [Code]
KAPT
Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models. ICCV 2023.
[Paper]
CuPL
What does a platypus look like? Generating customized prompts for zero-shot image classification. ICCV 2023.
[Paper] [Code]
ProGrad
Prompt-aligned Gradient for Prompt Tuning. ICCV 2023.
[Paper] [Code]
PromptSRC
Self-regulating Prompts: Foundational Model Adaptation without Forgetting. ICCV 2023.
[Paper] [Code]
DeFo
Learning to Decompose Visual Features with Latent Textual Prompts. ICLR 2023.
[Paper]
PLOT
PLOT: Prompt Learning with Optimal Transport for Vision-Language Models. ICLR 2023.
[Paper] [Code]
POMP
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition. NeurIPS 2023.
[Paper] [Code]
MetaPrompt
Learning Domain Invariant Prompt for Vision-Language Models. TIP 2024.
[Paper]
ProVP
Progressive Visual Prompt Learning with Contrastive Feature Re-formation. IJCV 2024.
[Paper] [Code]
SA2VP
SA2VP: Spatially Aligned-and-Adapted Visual Prompt. AAAI 2024.
[Paper] [Code]
HPT
Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models. AAAI 2024.
[Paper] [Code]
LaViP
LaViP: Language-Grounded Visual Prompts. AAAI 2024.
[Paper]
CoPrompt
Consistency-guided Prompt Learning for Vision-Language Models. ICLR 2024.
[Paper] [Code]
ProText
Learning to Prompt with Text Only Supervision for Vision-Language Models. arXiv 2024.
[Paper] [Code]
PromptKD
PromptKD: Unsupervised Prompt Distillation for Vision Language Models. CVPR 2024.
[Paper] [Code]
DePT
DePT: Decoupled Prompt Tuning. CVPR 2024.
[Paper] [Code]
ArGue
ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models. CVPR 2024.
[Paper]
TCP
TCP: Textual-based Class-aware Prompt tuning for Visual-Language Model. CVPR 2024.
[Paper] [Code]
MMA
MMA: Multi-Modal Adapter for Vision-Language Models. CVPR 2024.
[Paper] [Code]
KDPL
Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation. ECCV 2024.
[Paper] [Code]
CoCoLe
Conceptual Codebook Learning for Vision-Language Models. ECCV 2024.
[Paper]
CasPL
Cascade Prompt Learning for Vision-Language Model Adaptation. ECCV 2024.
[Paper] [Code]
AWT
AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation. NeurIPS 2024.
[Paper] [Code]
CPT
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models. arXiv 2021.
[Paper] [Code]
DetPro
Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model. CVPR 2022.
[Paper] [Code]
PromptDet
PromptDet: Towards Open-vocabulary Detection using Uncurated Images. ECCV 2022.
[Paper] [Code]
- Visual Prompting via Image Inpainting. NeurIPS 2022.
[Paper]
OVSeg
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP. CVPR 2023.
[Paper] [Code]LoGoPrompt
LoGoPrompt: Synthetic Text Images Can Be Good Visual Prompts for Vision-Language Models. ICCV 2023.
[Paper]
RedCircle
What does CLIP know about a red circle? Visual prompt engineering for VLMs. ICCV 2023.
[Paper]
FGVP
Fine-Grained Visual Prompting. NeurIPS 2023.
[Paper] [Code]
SoM
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V. arXiv 2023.
[Paper] [Code]
Alpha-CLIP
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want. CVPR 2024.
[Paper] [Code]
ViP-LLaVA
Making Large Multimodal Models Understand Arbitrary Visual Prompts. CVPR 2024.
[Paper] [Code]
SSC
Segment, Select, Correct: A Framework for Weakly-Supervised Referring Segmentation. ECCV 2024.
[Paper] [Code]
Methods | Pub | ImageNet | -A | -V2 | -R | -S | OOD Avg. (main) | Code |
---|---|---|---|---|---|---|---|---|
CoOp | IJCV 22 | 71.51 | 49.71 | 64.20 | 75.21 | 47.99 | 59.28 | Link |
CoCoOp | CVPR 22 | 71.02 | 50.63 | 64.07 | 76.18 | 48.75 | 59.91 | Link |
TPT | NeurIPS 22 | 68.98 | 54.77 | 63.45 | 77.06 | 47.94 | 60.81 | Link |
TPT+CoOp | NeurIPS 22 | 73.61 | 57.95 | 66.83 | 77.27 | 49.29 | 62.84 | Link |
PromptAlign | NeurIPS 23 | --- | 59.37 | 65.29 | 79.33 | 50.23 | 63.55 | Link |
TPS+CoOp | arXiv 24 | 73.73 | 60.49 | 66.84 | 77.44 | 49.08 | 63.46 | Link |
RLCF | ICLR 24 | 73.23 | 65.45 | 69.77 | 83.35 | 54.74 | 68.33 | Link |
RLCF+CoOp | ICLR 24 | 76.05 | 69.74 | 70.62 | 84.51 | 56.49 | 70.34 | Link |
Table 2. Test-time prompt tuning methods on OOD data.
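For reference, the OOD Avg. column is the plain mean over the four distribution-shifted test sets (-A, -V2, -R, -S); the source ImageNet column is excluded:

```python
# Reproducing the CoOp row of Table 2 from its four OOD accuracies.
coop_ood = [49.71, 64.20, 75.21, 47.99]         # ImageNet-A, -V2, -R, -Sketch
print(round(sum(coop_ood) / len(coop_ood), 2))  # 59.28, matching the table
```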
TPT
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models. NeurIPS 2022. (See the sketch after this block.)
[Paper] [Code]
SwapPrompt
SwapPrompt: Test-Time Prompt Adaptation for Vision-Language Models. NeurIPS 2023.
[Paper]
PromptAlign
Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization. NeurIPS 2023.
[Paper] [Code]
TPS
Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models. arXiv 2024.
[Paper] [Code]
RLCF
Test-time Adaptation with CLIP Reward for Zero-shot Generalization in Vision-Language Models. ICLR 2024.
[Paper] [Code]
InTTA
Invariant Test-Time Adaptation for Vision-Language Model Generalization. arXiv 2024.
[Paper] [Code]
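As referenced above, the recipe these methods build on (popularized by TPT) adapts the prompt at test time by minimizing prediction entropy over augmented views of a single test image. Below is a minimal sketch under assumed interfaces: `model` maps a batch of views to class logits, and `optimizer` updates only the prompt parameters, not CLIP's weights.

```python
import torch

def entropy(probs: torch.Tensor) -> torch.Tensor:
    # Shannon entropy per row of an (n, n_classes) probability matrix.
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)

def tpt_step(model, views: torch.Tensor, optimizer, keep_ratio: float = 0.1):
    """One TPT-style step: keep the most confident augmented views of a test
    image, then minimize the entropy of their averaged prediction with
    respect to the learnable prompt only."""
    probs = model(views).softmax(dim=-1)                  # (n_views, n_classes)
    k = max(1, int(keep_ratio * views.size(0)))
    idx = entropy(probs).topk(k, largest=False).indices   # low entropy = confident views
    avg = probs[idx].mean(dim=0)                          # marginal distribution over kept views
    loss = entropy(avg.unsqueeze(0)).squeeze()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```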
CLIP-Adapter
CLIP-Adapter: Better Vision-Language Models with Feature Adapters. arXiv 2021.
[Paper] [Code]
Tip-Adapter
Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification. ECCV 2022. (See the cache sketch after this block.)
[Paper] [Code]
APE
Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement. ICCV 2023.
[Paper] [Code]
CaFo
Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners. CVPR 2023.
[Paper] [Code]
Meta-Adapter
Meta-Adapter: An Online Few-shot Learner for Vision-Language Model. NeurIPS 2023.
[Paper] [Code]
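As referenced above, here is a minimal sketch of the training-free key-value cache behind Tip-Adapter: few-shot image features act as keys, their one-hot labels as values, and cache predictions are blended with zero-shot CLIP logits. `alpha` (residual ratio) and `beta` (sharpness) are the paper's hyperparameters; the default values here are placeholders.

```python
import torch

def tip_adapter_logits(test_feat, cache_keys, cache_values, clip_logits,
                       alpha: float = 1.0, beta: float = 5.5):
    """test_feat:    (B, D) L2-normalized CLIP features of test images
    cache_keys:   (N, D) L2-normalized features of the few-shot training set
    cache_values: (N, C) one-hot labels of the cached shots
    clip_logits:  (B, C) zero-shot CLIP logits for the same batch"""
    affinity = test_feat @ cache_keys.t()                      # cosine similarities, (B, N)
    cache_logits = torch.exp(-beta * (1.0 - affinity)) @ cache_values
    return clip_logits + alpha * cache_logits                  # residual blend
```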
Efficient-Prompt
Prompting Visual-Language Models for Efficient Video Understanding. ECCV 2022.
[Paper] [Code]
X-CLIP
Expanding Language-Image Pretrained Models for General Video Recognition. ECCV 2022.
[Paper] [Code]
RePro
Compositional Prompt Tuning with Motion Cues for Open-vocabulary Video Relation Detection. ICLR 2023.
[Paper] [Code]
L2P
Learning to Prompt for Continual Learning. CVPR 2022.
[Paper] [Code]
DualPrompt
DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning. ECCV 2022.
[Paper] [Code]
EvoPrompt
Evolving Parameterized Prompt Memory for Continual Learning. AAAI 2024.
[Paper]
CPrompt
Consistent Prompting for Rehearsal-Free Continual Learning. CVPR 2024.
[Paper] [Code]
DIKI
Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models. ECCV 2024.
[Paper] [Code]