xin-ran-w/ControllableObjectDescription

Detailed Object Description with Controllable Dimensions

Code for our paper "Detailed Object Description with Controllable Dimensions".


The diagram of our description refinement pipeline, Dimension Tailor

Install

  1. Clone this repository and navigate to the ObjectDescription folder
git clone https://anonymous.4open.science/r/145501C3/
cd ObjectDescription
  2. Install the required packages
conda create -n tailor python=3.10 -y
conda activate tailor
pip install --upgrade pip  # enable PEP 660 support
pip install -r requirements.txt

Weights

We used BLIP-itm and Llama 3 to score and process the descriptions.

Please download the weights of BLIP-itm from https://huggingface.co/Salesforce/blip-itm-large-coco/tree/main.

For Llama 3, please follow the model's download instructions to obtain the weights of Meta-Llama-3-8B-Instruct.
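
One possible way to fetch both checkpoints into a single directory is the huggingface_hub client. The sketch below is an assumption and not part of this repository: it presumes huggingface_hub is installed, that the Llama 3 repository id is meta-llama/Meta-Llama-3-8B-Instruct (a gated repository that requires an accepted license and a logged-in token), and that the local folder names match those used in the inference commands. The helper names local_dirs and download_all are hypothetical.

```python
from pathlib import Path

# Local folder name -> Hub repository id.
# The BLIP id comes from the URL above; the Llama id is an assumption.
REPOS = {
    "blip-itm-large-coco": "Salesforce/blip-itm-large-coco",
    "Meta-Llama-3-8B-Instruct": "meta-llama/Meta-Llama-3-8B-Instruct",
}


def local_dirs(root):
    """Map each Hub repo id to the local folder the inference commands expect."""
    root = Path(root)
    return {repo_id: root / folder for folder, repo_id in REPOS.items()}


def download_all(root):
    """Download every checkpoint under root (needs network access and a Hub login for Llama 3)."""
    # Imported lazily so the path helper above works without the third-party package.
    from huggingface_hub import snapshot_download

    for repo_id, target in local_dirs(root).items():
        snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Calling download_all with the directory you export as HUGGINGFACE_DIR should leave the two folders that --llama_model_dir and --blip_model_dir point to.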

Inference

You can refine the caption of any image with our pipeline. For our experiments, we used images from the COCO2017 dataset. The LLaVA captions were generated in advance and saved in a JSON file.
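
The layout of the caption JSON is not documented here, so before refining your own captions it may help to compare your file against the provided LLaVA/llava_detailed_answer.json. The helper below is a hypothetical utility (summarize_json is not part of this repository) that only reports the top-level structure and makes no assumption about field names.

```python
import json
from pathlib import Path


def summarize_json(path, max_keys=5):
    """Report the top-level shape of a JSON file without assuming its schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    summary = {"type": type(data).__name__, "size": None, "sample_keys": []}
    if isinstance(data, (list, dict)):
        summary["size"] = len(data)
    # Inspect the dict itself, or the first element when the top level is a list.
    if isinstance(data, dict):
        first = data
    elif isinstance(data, list) and data:
        first = data[0]
    else:
        first = None
    if isinstance(first, dict):
        summary["sample_keys"] = list(first)[:max_keys]
    return summary
```

Running it on both the provided file and your own file, then comparing the two summaries, is a quick way to spot a mismatched layout before launching a long refinement run.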

  1. Download the COCO2017 val dataset

Download the COCO2017 val images and place them under $DATASET_DIR/COCO/val2017, the path passed to --image_root in the commands below.

  2. Detailed Object Descriptions Inference
CUDA_VISIBLE_DEVICES=$gpu python description_refiner.py \
    --llama_model_dir $HUGGINGFACE_DIR/Meta-Llama-3-8B-Instruct \
    --blip_model_dir $HUGGINGFACE_DIR/blip-itm-large-coco \
    --image_root $DATASET_DIR/COCO/val2017 \
    --f_description LLaVA/llava_detailed_answer.json \
    --f_result results/llava_refined_detailed_answer.json \
    --f_ovad_anno ovad/ovad2000.json \
    --complete True \
    --complete_threshold 0.3 \
    --l_threshold 0.1 \
    --seed 33 \
    --f_obj_dim_attr_comb pred_obj_dim_attr_comb.json
  3. Controllable Object Descriptions Inference
CUDA_VISIBLE_DEVICES=$gpu python description_refiner.py \
    --llama_model_dir $HUGGINGFACE_DIR/Meta-Llama-3-8B-Instruct \
    --blip_model_dir $HUGGINGFACE_DIR/blip-itm-large-coco \
    --image_root $DATASET_DIR/COCO/val2017 \
    --f_description LLaVA/llava_ctrl_detailed_answer.json \
    --f_result results/llava_refined_ctrl_detailed_answer.json \
    --f_ovad_anno ovad/ovad2000.json \
    --refine_control True \
    --complete True \
    --complete_threshold 0.3 \
    --l_threshold 0.2 \
    --seed 33 \
    --f_obj_dim_attr_comb pred_obj_dim_attr_comb.json
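
The two commands above differ only in their input and output file names, the --l_threshold value, and the extra --refine_control flag. A minimal sketch of a launcher that makes this difference explicit is shown here; build_command is a hypothetical helper, not part of this repository, and every flag and value in it is copied from the commands above.

```python
import shlex

# Arguments shared by both modes, copied from the commands above.
COMMON_ARGS = {
    "--f_ovad_anno": "ovad/ovad2000.json",
    "--complete": "True",
    "--complete_threshold": "0.3",
    "--seed": "33",
    "--f_obj_dim_attr_comb": "pred_obj_dim_attr_comb.json",
}


def build_command(hf_dir, dataset_dir, controllable=False):
    """Return the argv list for description_refiner.py in either mode."""
    suffix = "ctrl_detailed_answer" if controllable else "detailed_answer"
    args = {
        "--llama_model_dir": f"{hf_dir}/Meta-Llama-3-8B-Instruct",
        "--blip_model_dir": f"{hf_dir}/blip-itm-large-coco",
        "--image_root": f"{dataset_dir}/COCO/val2017",
        "--f_description": f"LLaVA/llava_{suffix}.json",
        "--f_result": f"results/llava_refined_{suffix}.json",
    }
    args.update(COMMON_ARGS)
    # The controllable mode uses a different threshold and one extra flag.
    args["--l_threshold"] = "0.2" if controllable else "0.1"
    if controllable:
        args["--refine_control"] = "True"
    cmd = ["python", "description_refiner.py"]
    for flag, value in args.items():
        cmd += [flag, value]
    return cmd


# Print the controllable-mode command as a single shell-quoted line.
print(shlex.join(build_command("/path/to/weights", "/path/to/datasets", controllable=True)))
```

The returned list can be passed to subprocess.run with CUDA_VISIBLE_DEVICES set in the environment, which mirrors how the shell commands select a GPU.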

Acknowledgement

  • OVAD: Some of our code is adapted from OVAD; see the OVAD project for more information.

Related models

Here's the model we tested in our paper:

Citation

If this work is helpful to you, please consider citing the article below.

@misc{wang2024detailedobjectdescriptioncontrollable,
      title={Detailed Object Description with Controllable Dimensions}, 
      author={Xinran Wang and Haiwen Zhang and Baoteng Li and Kongming Liang and Hao Sun and Zhongjiang He and Zhanyu Ma and Jun Guo},
      year={2024},
      eprint={2411.19106},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.19106}, 
}

About

A training-free pipeline to control dimension details in object description.
