Follow the steps below to install the required packages and set up the environment.
Open your terminal and clone the repository:

```bash
git clone https://github.com/behavior-in-the-wild/ad-memorability.git
```
Create and activate the Conda environment, then install the dependencies:

```bash
conda create -n admem python=3.10 -y
conda activate admem
pip install --upgrade pip  # Enable PEP 660 support
pip install -e .
pip install ninja
pip install flash-attn --no-build-isolation
pip install opencv-python
pip install numpy==1.26.4
```
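Optionally, verify the environment with a quick import check (a minimal sketch; it only confirms the packages installed above are importable and that a CUDA device is visible):

```python
# Quick sanity check for the environment created above.
import cv2
import numpy as np
import torch

print("numpy:", np.__version__)                  # pinned to 1.26.4 above
print("opencv:", cv2.__version__)
print("CUDA available:", torch.cuda.is_available())

try:
    import flash_attn  # built with --no-build-isolation above
    print("flash-attn:", flash_attn.__version__)
except ImportError as err:
    print("flash-attn not importable:", err)
```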
Create directories and download the required models:

```bash
mkdir model_zoo
mkdir model_zoo/LAVIS
cd ./model_zoo/LAVIS
wget https://storage.googleapis.com/sfr-vision-language-research/LAVIS/models/BLIP2/eva_vit_g.pth

cd path/to/ad-memorability
mkdir work_dirs
cd work_dirs
git lfs install
git clone https://huggingface.co/YanweiLi/llama-vid-13b-full-224-video-fps-1
```
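Optionally, confirm the downloads landed where the scripts expect them (a minimal sketch; loading eva_vit_g.pth with torch.load is an assumption used only to verify the file is readable):

```python
# Check that the downloaded weights are in place.
import os

import torch

vit_path = "model_zoo/LAVIS/eva_vit_g.pth"
llamavid_dir = "work_dirs/llama-vid-13b-full-224-video-fps-1"

print("EVA ViT checkpoint present:", os.path.isfile(vit_path))
print("LLaMA-VID checkpoint present:", os.path.isdir(llamavid_dir))

# Assumption: the checkpoint is a plain PyTorch state dict; this only verifies it deserializes.
state = torch.load(vit_path, map_location="cpu")
print("EVA ViT entries:", len(state))
```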
Set up the data directory:

```bash
cd path/to/ad-memorability
mkdir data
cd ./data
```

LAMBDA videos are available on Hugging Face!
LAMBDA sampled frames coming soon!
- Create .npy files of your videos. A sample file is given in the sample folder.
- Store them as ./data/videos/video_scenes/{id}.npy (see the frame-sampling sketch after this list).
- Install the desired version of DeepSpeed.
- Update the train.sh script: replace the --data_path argument with one of the following, depending on your training task: lambda_bs_train.json, lambda_combine_train.json, or lambda_cs_train.json.
- If you don't have the frames and want to train directly on the video at 1 FPS, reformat your data as given here and replace the --data_path argument with lambda_train.json.
- If you're training on your own dataset, create a train.json file. Each entry should contain an id and a conversation; you can use lambda_bs_train.json as a reference for formatting (see the sketch after this list).
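For the frame .npy files mentioned above, here is a minimal sketch that samples frames with OpenCV and saves one array per video to ./data/videos/video_scenes/{id}.npy. The frame count (8) and the (frames, H, W, 3) layout are assumptions; match them to the sample file in the sample folder.

```python
# Sketch: turn a video into ./data/videos/video_scenes/{id}.npy.
# The number of frames and the array layout are assumptions; mirror the provided sample .npy file.
import os

import cv2
import numpy as np


def video_to_npy(video_path, video_id, num_frames=8, out_dir="./data/videos/video_scenes"):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_frames).astype(int)  # uniform sampling

    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()

    if not frames:
        raise RuntimeError(f"Could not read frames from {video_path}")

    arr = np.stack(frames)  # shape: (num_frames, H, W, 3), dtype uint8
    os.makedirs(out_dir, exist_ok=True)
    out_path = os.path.join(out_dir, f"{video_id}.npy")
    np.save(out_path, arr)
    return out_path


# Example: video_to_npy("my_ad.mp4", "12345") -> ./data/videos/video_scenes/12345.npy
```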
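If you build a custom train.json, the sketch below shows one way to write entries with an id and a conversation. The field names ("conversations", "from", "value") and the prompt text are assumptions; copy the real schema from lambda_bs_train.json.

```python
# Sketch: write a custom train.json. Field names and prompt text below are placeholders;
# use lambda_bs_train.json as the authoritative reference for the schema.
import json

entries = [
    {
        "id": "12345",  # should match ./data/videos/video_scenes/12345.npy
        "conversations": [
            {"from": "human", "value": "How memorable is this ad?"},  # hypothetical prompt
            {"from": "gpt", "value": "..."},                          # target response
        ],
    }
]

with open("./data/train.json", "w") as f:
    json.dump(entries, f, indent=2)
```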
Then launch training:

```bash
bash train.sh
```
- For predicting memorability scores: `bash eval_bs.sh`
- For generating memorable videos: `bash eval_cs.sh`
If you find this repo useful for your research, please consider citing the paper:

```bibtex
@misc{s2024longtermadmemorabilityunderstanding,
  title={Long-Term Ad Memorability: Understanding and Generating Memorable Ads},
  author={Harini S I and Somesh Singh and Yaman K Singla and Aanisha Bhattacharyya and Veeky Baths and Changyou Chen and Rajiv Ratn Shah and Balaji Krishnamurthy},
  year={2024},
  eprint={2309.00378},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2309.00378}
}
```
We would like to thank the following repos for their great work:
- This work is built upon LLaMA-VID.
- This work is built upon LLaVA.
- This work utilizes LLMs from Vicuna.
- This work utilizes pretrained weights from InstructBLIP.
- We perform video-based evaluation following Video-ChatGPT.
The data and checkpoints are intended and licensed for research use only. They are also restricted to uses that follow the license agreements of LLaMA-VID, LLaVA, LLaMA, Vicuna, and GPT-4.