One-Shot Open Affordance Learning with Foundation Models

Abstract

We introduce One-shot Open Affordance Learning (OOAL), where a model is trained with just one example per base object category, but is expected to identify novel objects and affordances. While vision-language models excel at recognizing novel objects and scenes, they often struggle to understand finer levels of granularity such as affordances. To handle this issue, we conduct a comprehensive analysis of existing foundation models, to explore their inherent understanding of affordances and assess the potential for data-limited affordance learning. We then propose a vision-language framework with simple and effective designs that boost the alignment between visual features and affordance text embeddings. Experiments on two affordance segmentation benchmarks show that the proposed method outperforms state-of-the-art models with less than 1% of the full training data, and exhibits reasonable generalization capability on unseen objects and affordances.

Usage

1. Requirements

Code is tested under PyTorch 1.12.1, Python 3.7, and CUDA 11.3.

pip install -r requirements.txt
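
To compare an installed environment against the tested versions above, a small check like the following may help. This is only a sketch and not part of the repository: the function name `environment_report` is hypothetical, and it assumes PyTorch exposes its version through `torch.__version__` and `torch.version.cuda`.

```python
import sys


def environment_report():
    """Collect the Python, PyTorch, and CUDA versions visible to this interpreter."""
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    try:
        import torch  # imported lazily so the check also works before installation
        report["torch"] = torch.__version__
        report["cuda"] = torch.version.cuda  # None for CPU-only builds
    except ImportError:
        report["torch"] = None
        report["cuda"] = None
    return report


if __name__ == "__main__":
    # Tested versions according to this README: Python 3.7, PyTorch 1.12.1, CUDA 11.3.
    for name, version in environment_report().items():
        print(name, version)
```

Other versions may also work; the README only states which combination was tested.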

2. Dataset

Download the AGD20K dataset from Google Drive or Baidu Pan (g23n).

Download the one-shot data from Google Drive (you can also annotate your own one-shot data in the same format).

Put the data in the dataset folder with the following structure:

dataset
├── one-shot-seen
├── one-shot-unseen
├── Seen
└── Unseen
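
Before training, it can be useful to confirm that the folder layout above is in place. The snippet below is a minimal sketch, not part of the repository: the helper name `missing_dataset_dirs` is hypothetical, and it only checks that the four top-level subfolders exist, not their contents.

```python
from pathlib import Path

# Subfolder names taken from the layout listed above.
EXPECTED_SUBDIRS = ("one-shot-seen", "one-shot-unseen", "Seen", "Unseen")


def missing_dataset_dirs(root="dataset"):
    """Return the expected subfolders that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

Run it from the repository root so that the relative path `dataset` resolves correctly.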

3. Train and Test

Run the following commands to start training or testing:

python train.py
python test.py --model_file <PATH_TO_MODEL>
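
If you prefer to launch these steps from another Python script, the two commands above can be wrapped as shown below. This is a hedged sketch: the helper names `build_train_command`, `build_test_command`, and `run` are hypothetical, and the only flag used is `--model_file`, as in the command above.

```python
import subprocess
import sys


def build_train_command():
    """Command list equivalent to `python train.py`."""
    return [sys.executable, "train.py"]


def build_test_command(model_file):
    """Command list equivalent to `python test.py --model_file <PATH_TO_MODEL>`."""
    return [sys.executable, "test.py", "--model_file", str(model_file)]


def run(command):
    """Run one command from the current directory and raise if it exits with an error."""
    subprocess.run(command, check=True)
```

For example, calling `run(build_train_command())` from the repository root starts training, and `run(build_test_command(path))` evaluates the checkpoint stored at `path`.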

Our pretrained model can be downloaded from Google Drive.

Limitations

Text prompt learning improves the performance on unseen objects but diminishes the framework’s ability to handle unseen affordances. For open-vocabulary affordance usage, we suggest either removing text prompts or combining learnable prompts with manually designed prompts.
The performance is notably influenced by the selection of the one-shot example. Instances with heavy occlusion or inferior lighting conditions can impact the learning performance.

Citation

@inproceedings{li:ooal:2024,
  title = {One-Shot Open Affordance Learning with Foundation Models},
  author = {Li, Gen and Sun, Deqing and Sevilla-Lara, Laura and Jampani, Varun},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}

Acknowledgement

Some code is borrowed from CoOp and ZegCLIP. Thanks for their great work!

