Detailed Object Description with Controllable Dimensions

Code for our paper "Detailed Object Description with Controllable Dimensions".

Figure: diagram of Dimension Tailor, our description refinement pipeline.

Install

Clone this repository and navigate to the ObjectDescription folder:

git clone https://anonymous.4open.science/r/145501C3/
cd ObjectDescription

Install packages

conda create -n tailor python=3.10 -y
conda activate tailor
pip install --upgrade pip  # enable PEP 660 support
pip install -r requirements.txt

Weights

We used BLIP-itm and Llama 3 to score and process the descriptions.
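BLIP-itm scores how well a description matches an image. As a rough illustration (not necessarily how description_refiner.py computes its scores), the Hugging Face transformers BLIP retrieval model exposes a two-way image-text matching (ITM) head whose softmax gives a match probability. The function names `itm_match_probability` and `score_caption` are hypothetical; `blip_model_dir` is assumed to point at the downloaded blip-itm-large-coco weights.

```python
import math
from typing import Sequence


def itm_match_probability(itm_logits: Sequence[float]) -> float:
    """Softmax over the two ITM logits [no-match, match]; returns P(match)."""
    no_match, match = itm_logits
    # Subtract the max before exponentiating for numerical stability.
    m = max(no_match, match)
    e_no, e_yes = math.exp(no_match - m), math.exp(match - m)
    return e_yes / (e_no + e_yes)


def score_caption(image_path: str, caption: str, blip_model_dir: str) -> float:
    """Hypothetical helper: score a caption against an image with BLIP's ITM head.

    Heavy dependencies are imported lazily so the helper above works without them.
    The model is reloaded on every call for brevity; cache it for real use.
    """
    import torch
    from PIL import Image
    from transformers import BlipForImageTextRetrieval, BlipProcessor

    processor = BlipProcessor.from_pretrained(blip_model_dir)
    model = BlipForImageTextRetrieval.from_pretrained(blip_model_dir).eval()
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, text=caption, return_tensors="pt")
    with torch.no_grad():
        # With use_itm_head=True, itm_score holds logits of shape (batch, 2).
        itm_logits = model(**inputs, use_itm_head=True).itm_score
    return itm_match_probability(itm_logits[0].tolist())
```

Index 1 of the ITM output is the "match" class, so a higher probability means the description is better grounded in the image.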

Please download the weights of BLIP-itm from https://huggingface.co/Salesforce/blip-itm-large-coco/tree/main.

For Llama 3, please follow the official download guidance to obtain the weights of Meta-Llama-3-8B-Instruct.
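The inference commands expect both checkpoints under a common root, `$HUGGINGFACE_DIR/blip-itm-large-coco` and `$HUGGINGFACE_DIR/Meta-Llama-3-8B-Instruct`. A minimal sketch of fetching them with `huggingface_hub.snapshot_download` into that layout, assuming you use the Hugging Face Hub rather than another download route; `local_model_dir` and `download_weights` are hypothetical helper names. The Llama 3 repository is gated, so you must accept Meta's license on the Hub and log in (for example with `huggingface-cli login`) first.

```python
from pathlib import Path

# Hub repo ids; the folder names below match the paths used in the inference commands.
MODEL_REPOS = {
    "blip": "Salesforce/blip-itm-large-coco",
    "llama": "meta-llama/Meta-Llama-3-8B-Instruct",  # gated: requires license acceptance
}


def local_model_dir(repo_id: str, hf_root: str) -> Path:
    """Map a repo id to <hf_root>/<model name>, e.g. .../blip-itm-large-coco."""
    return Path(hf_root) / repo_id.split("/")[-1]


def download_weights(hf_root: str) -> dict:
    """Hypothetical helper: download both checkpoints into hf_root."""
    from huggingface_hub import snapshot_download  # imported lazily

    paths = {}
    for name, repo_id in MODEL_REPOS.items():
        target = local_model_dir(repo_id, hf_root)
        snapshot_download(repo_id=repo_id, local_dir=str(target))
        paths[name] = target
    return paths
```

Calling `download_weights(os.environ["HUGGINGFACE_DIR"])` would then make `--llama_model_dir` and `--blip_model_dir` resolve as written in the commands.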

Inference

You can refine the caption of any image with our model. In our experiments we used images from the COCO2017 dataset. Captions were generated with LLaVA in advance and saved in a JSON file.
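Before a long run it can help to confirm that every image referenced in the pre-generated LLaVA JSON actually exists under `--image_root`. The schema of that JSON is not documented here, so this sketch assumes a JSON list of objects and takes the field holding the image file name as a parameter; the default key `"image"` and both function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, List


def load_descriptions(path: str) -> list:
    """Load the pre-generated captions (assumed to be a JSON list of objects)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def missing_images(records: Iterable[dict], image_root: str, image_key: str = "image") -> List[str]:
    """Return image file names referenced by records that are absent from image_root.

    image_key is a hypothetical field name: check your LLaVA output and pass
    the key that actually holds the image file name.
    """
    root = Path(image_root)
    return [r[image_key] for r in records if not (root / r[image_key]).is_file()]
```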

Download the COCO2017 val dataset

You can download the COCO2017 val split from the official COCO website.
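The commands read images from `$DATASET_DIR/COCO/val2017`. A minimal sketch of downloading and unpacking the official val2017 archive into that location using only the standard library; `download_coco_val2017` and `coco_image_path` are hypothetical helper names. COCO 2017 image files are named after the image id zero-padded to 12 digits.

```python
import urllib.request
import zipfile
from pathlib import Path

COCO_VAL2017_URL = "http://images.cocodataset.org/zips/val2017.zip"


def coco_image_path(image_root: str, image_id: int) -> Path:
    """COCO 2017 file names are the image id zero-padded to 12 digits."""
    return Path(image_root) / f"{image_id:012d}.jpg"


def download_coco_val2017(coco_dir: str) -> Path:
    """Download and unzip val2017 so the images land in <coco_dir>/val2017."""
    coco = Path(coco_dir)
    coco.mkdir(parents=True, exist_ok=True)
    zip_path = coco / "val2017.zip"
    if not zip_path.exists():
        urllib.request.urlretrieve(COCO_VAL2017_URL, zip_path)  # large download
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(coco)  # the archive contains a top-level val2017/ folder
    return coco / "val2017"
```

For example, `download_coco_val2017(os.environ["DATASET_DIR"] + "/COCO")` yields the directory passed as `--image_root`.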

Detailed Object Descriptions Inference

CUDA_VISIBLE_DEVICES=$gpu python description_refiner.py \
    --llama_model_dir $HUGGINGFACE_DIR/Meta-Llama-3-8B-Instruct \
    --blip_model_dir $HUGGINGFACE_DIR/blip-itm-large-coco \
    --image_root $DATASET_DIR/COCO/val2017 \
    --f_description LLaVA/llava_detailed_answer.json \
    --f_result results/llava_refined_detailed_answer.json \
    --f_ovad_anno ovad/ovad2000.json \
    --complete True \
    --complete_threshold 0.3 \
    --l_threshold 0.1 \
    --seed 33 \
    --f_obj_dim_attr_comb pred_obj_dim_attr_comb.json
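If you want to launch runs from Python (for example to sweep seeds or thresholds), a hypothetical wrapper can assemble the same command. The flags and values below are copied from the command above; `build_refiner_command`, `detailed_args`, and `run_refiner` are illustrative names, not part of the repository.

```python
import os
import subprocess
from typing import Dict, List


def build_refiner_command(args: Dict[str, str]) -> List[str]:
    """Turn {flag: value} pairs into a description_refiner.py argv list."""
    cmd = ["python", "description_refiner.py"]
    for flag, value in args.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


def detailed_args(hf_dir: str, dataset_dir: str) -> Dict[str, str]:
    """Flags of the detailed-description command, with the shell variables filled in."""
    return {
        "llama_model_dir": f"{hf_dir}/Meta-Llama-3-8B-Instruct",
        "blip_model_dir": f"{hf_dir}/blip-itm-large-coco",
        "image_root": f"{dataset_dir}/COCO/val2017",
        "f_description": "LLaVA/llava_detailed_answer.json",
        "f_result": "results/llava_refined_detailed_answer.json",
        "f_ovad_anno": "ovad/ovad2000.json",
        "complete": "True",
        "complete_threshold": "0.3",
        "l_threshold": "0.1",
        "seed": "33",
        "f_obj_dim_attr_comb": "pred_obj_dim_attr_comb.json",
    }


def run_refiner(args: Dict[str, str], gpu: str = "0") -> None:
    """Run on one GPU, mirroring CUDA_VISIBLE_DEVICES=$gpu in the shell command."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    subprocess.run(build_refiner_command(args), env=env, check=True)
```

Passing an argument list (rather than a shell string) to `subprocess.run` avoids quoting issues with paths that contain spaces.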

Controllable Object Descriptions Inference

CUDA_VISIBLE_DEVICES=$gpu python description_refiner.py \
    --llama_model_dir $HUGGINGFACE_DIR/Meta-Llama-3-8B-Instruct \
    --blip_model_dir $HUGGINGFACE_DIR/blip-itm-large-coco \
    --image_root $DATASET_DIR/COCO/val2017 \
    --f_description LLaVA/llava_ctrl_detailed_answer.json \
    --f_result results/llava_refined_ctrl_detailed_answer.json \
    --f_ovad_anno ovad/ovad2000.json \
    --refine_control True \
    --complete True \
    --complete_threshold 0.3 \
    --l_threshold 0.2 \
    --seed 33 \
    --f_obj_dim_attr_comb pred_obj_dim_attr_comb.json
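Compared with the detailed run, the controllable run changes only its input and output files, adds `--refine_control True`, and raises `--l_threshold` from 0.1 to 0.2. The sketch below encodes both flag sets as dictionaries (shared model and image paths omitted) and computes the difference; it is purely illustrative and says nothing about what each flag does inside description_refiner.py. `flag_diff` is a hypothetical name.

```python
from typing import Dict, Optional, Tuple

# Flags shared by both commands plus the detailed-run values (model/image paths omitted).
DETAILED = {
    "f_description": "LLaVA/llava_detailed_answer.json",
    "f_result": "results/llava_refined_detailed_answer.json",
    "f_ovad_anno": "ovad/ovad2000.json",
    "complete": "True",
    "complete_threshold": "0.3",
    "l_threshold": "0.1",
    "seed": "33",
    "f_obj_dim_attr_comb": "pred_obj_dim_attr_comb.json",
}

# Controllable run: start from the detailed flags and override or add the differences.
CONTROLLABLE = {
    **DETAILED,
    "f_description": "LLaVA/llava_ctrl_detailed_answer.json",
    "f_result": "results/llava_refined_ctrl_detailed_answer.json",
    "refine_control": "True",
    "l_threshold": "0.2",
}


def flag_diff(base: Dict[str, str], other: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {flag: (base_value, other_value)} for flags that differ or appear in only one set."""
    keys = sorted(base.keys() | other.keys())
    return {k: (base.get(k), other.get(k)) for k in keys if base.get(k) != other.get(k)}
```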

Acknowledgement

OVAD: some of our code is borrowed from OVAD; please refer to the OVAD repository for more information.

Related models

Here's the model we tested in our paper:

Citation

If this work is helpful to you, please consider citing the article below.

@misc{wang2024detailedobjectdescriptioncontrollable,
      title={Detailed Object Description with Controllable Dimensions}, 
      author={Xinran Wang and Haiwen Zhang and Baoteng Li and Kongming Liang and Hao Sun and Zhongjiang He and Zhanyu Ma and Jun Guo},
      year={2024},
      eprint={2411.19106},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.19106}, 
}

xin-ran-w/ControllableObjectDescription