[AAAI-24] VVS: Video-to-Video Retrieval With Irrelevant Frame Suppression [Project Page]

Official PyTorch implementation of VVS: Video-to-Video Retrieval With Irrelevant Frame Suppression

Paper: Video-to-Video Retrieval With Irrelevant Frame Suppression

⏩ For a fast evaluation

For quick verification, follow the simple evaluation protocol below.

Fast evaluation of VVS on FIVR5K takes three steps:
1. Download the data from the Google Drive link.
2. Place the data as follows:
  - Place pca.pkl inside the VVS/data/vcdb folder
  - Place fivr5k_resnet50_l4imac inside the VVS/features folder
  - Place table_benchmark_dim_3840 inside the VVS/jobs folder
3. Run the command below to evaluate VVS on FIVR5K:
  - bash experiments/review/fast_evaluation_fivr5k.sh
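Before launching the script, it can help to confirm that the three downloads landed where step 2 expects them. The following is a minimal sketch, not part of the repository: the helper name `missing_fast_eval_paths` is hypothetical, and it assumes the paths are relative to the VVS repository root.

```python
from pathlib import Path

# Paths taken from the placement step above, relative to the VVS repo root.
REQUIRED_PATHS = (
    Path("data/vcdb/pca.pkl"),
    Path("features/fivr5k_resnet50_l4imac"),
    Path("jobs/table_benchmark_dim_3840"),
)


def missing_fast_eval_paths(repo_root):
    """Return the required paths (relative to repo_root) that do not exist yet.

    Hypothetical helper for illustration; it only checks existence,
    not the contents of the downloaded data.
    """
    root = Path(repo_root)
    return [p for p in REQUIRED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_fast_eval_paths(".")
    if missing:
        for path in missing:
            print(f"missing: {path}")
    else:
        print("All data in place; run: bash experiments/review/fast_evaluation_fivr5k.sh")
```

Run it from the VVS root; an empty result means the layout matches the steps above.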

🎞 Data Preparation

Getting the Dataset

Download the raw video dataset you want. The supported options are:
- CC_WEB_VIDEO
- VCDB
- FIVR
- EVVE
If any videos turn out to be missing during the download, contact the author about them.
Arrange the raw video data in the structure below.
Preparing raw video is not essential, however, because we provide the features we used.

├── videos
   ├── fivr
      └── videos
         ├── video_1
         ├── video_2
         └── ...
   ├── cc_web
      └── videos
         ├── video_1
         ├── video_2
         └── ...
   ├── evve
      └── videos
         ├── video_1
         ├── video_2
         └── ...

➡️ Getting the Feature

For convenience, we provide the features we used. You can find them here.
- CC_WEB_VIDEO
- VCDB
- FIVR
- EVVE
Before running, place the features inside the VVS/features folder.

├── features
   └── vcdb_resnet50_l4imac
      ├── features
         ├── feat_1
         ├── feat_2
         └── ...
   └── fivr_resnet50_l4imac
      ├── features
         ├── feat_1
         ├── feat_2
         └── ...
   └── cc_web_resnet50_l4imac
      ├── features
         ├── feat_1
         ├── feat_2
         └── ...
   └── evve_resnet50_l4imac
      ├── features
         ├── feat_1
         ├── feat_2
         └── ...
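The layout above can be checked with a short script. This is a sketch under assumptions: `count_feature_files` is a hypothetical helper, not something the repository provides, and it only counts entries without inspecting the feature files themselves.

```python
from pathlib import Path

# Folder names as shown in the tree above.
FEATURE_SETS = (
    "vcdb_resnet50_l4imac",
    "fivr_resnet50_l4imac",
    "cc_web_resnet50_l4imac",
    "evve_resnet50_l4imac",
)


def count_feature_files(features_root):
    """Map each feature set to the number of entries in <set>/features.

    A value of None means the <set>/features folder is absent.
    """
    counts = {}
    for name in FEATURE_SETS:
        feature_dir = Path(features_root) / name / "features"
        if feature_dir.is_dir():
            counts[name] = sum(1 for _ in feature_dir.iterdir())
        else:
            counts[name] = None
    return counts


if __name__ == "__main__":
    for name, count in count_feature_files("features").items():
        status = "not found" if count is None else f"{count} entries"
        print(f"{name}: {status}")
```

Only the datasets you plan to evaluate need to be present, so a `None` entry is not necessarily an error.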

🔨 Prerequisites

Recommended Environment

OS : Ubuntu 18.04
CUDA : 10.2
Python : 3.7
PyTorch : 1.8.1, Torchvision : 0.9.1
GPU : NVIDIA Tesla V100 (32G)

Required packages are listed in environment.yaml. You can install them by running:

conda env create -f environment.yaml
conda activate VVS

If your GPU only supports CUDA 11.0 or above, install the packages by running:

conda env create -f environment_cuda11.yaml
conda activate VVS
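The choice between the two environment files can be written down as a small rule. The sketch below is illustrative only: `pick_environment_file` is a hypothetical helper, and it assumes that any CUDA major version of 11 or higher should use environment_cuda11.yaml, which goes slightly beyond what is stated above.

```python
def pick_environment_file(cuda_version):
    """Return the conda environment file for a CUDA version string.

    Assumption: CUDA 11.0 and above use environment_cuda11.yaml,
    anything older uses environment.yaml.
    """
    major = int(cuda_version.split(".")[0])
    return "environment_cuda11.yaml" if major >= 11 else "environment.yaml"


if __name__ == "__main__":
    for version in ("10.2", "11.0"):
        env_file = pick_environment_file(version)
        print(f"CUDA {version}: conda env create -f {env_file}")
```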

🔄 Running

Before running, place pca.pkl inside the VVS/data/vcdb folder, or calculate the PCA weights directly by running python cal_pca.py.
You can then evaluate the model by running the provided script.

Please follow the instructions in README.md for training and evaluation.

🔑 Models

We provide checkpoints for reproducing our benchmark experiments.

Run the script that matches the feature dimension, replacing {dim} with that dimension.

Dataset	script
FIVR5K	`$ bash experiments/main_script/train/table_benchmark/eval_benchmark_fivr5k_dim_{dim}.sh`
FIVR200K	`$ bash experiments/main_script/train/table_benchmark/eval_benchmark_fivr200k_dim_{dim}.sh`
CC_WEB_VIDEO	`$ bash experiments/main_script/train/table_benchmark/eval_benchmark_cc_web_dim_{dim}.sh`
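The `{dim}` placeholder in the table can be filled in programmatically. This is a minimal sketch: `benchmark_command` is a hypothetical helper, and the dimensions used in the example (500, 512, 1024, 3840) are an assumption taken from the VVS variants reported in the Experiments section, so a script may not exist for every combination.

```python
# Directory and dataset keys copied from the script table above.
SCRIPT_DIR = "experiments/main_script/train/table_benchmark"
DATASET_KEYS = {
    "FIVR5K": "fivr5k",
    "FIVR200K": "fivr200k",
    "CC_WEB_VIDEO": "cc_web",
}


def benchmark_command(dataset, dim):
    """Build the evaluation command for a dataset and feature dimension."""
    if dataset not in DATASET_KEYS:
        raise ValueError(f"unsupported dataset: {dataset}")
    if not isinstance(dim, int) or dim <= 0:
        raise ValueError(f"dim must be a positive integer, got {dim!r}")
    key = DATASET_KEYS[dataset]
    return f"bash {SCRIPT_DIR}/eval_benchmark_{key}_dim_{dim}.sh"


if __name__ == "__main__":
    # Assumed dimensions, based on the VVS variants in the results tables.
    for dim in (500, 512, 1024, 3840):
        print(benchmark_command("FIVR5K", dim))
```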

📑 Experiments

FIVR200K

Usage	Method	train dataset	DSVR	CSVR	ISVR
frame	TN	VCDB	0.724	0.699	0.589
	DP	VCDB	0.775	0.740	0.632
	TCA_sym	VCDB	0.728	0.698	0.592
	TCA_f	VCDB	0.877	0.830	0.703
	SCFV+NIP₂₅₆	VCDB	0.819	0.764	0.622
	SCFV+TNIP₂₅₆	VCDB	0.896	0.833	0.674
	ViSiL_sym	VCDB	0.833	0.792	0.654
	ViSiL_f	VCDB	0.843	0.797	0.660
	ViSiL_v	VCDB	0.892	0.841	0.702
	DnS(S^f_A)	DnS-100K	0.921	0.875	0.741
video	HC	VCDB	0.265	0.247	0.193
	DML	VCDB	0.398	0.378	0.309
	TMK	VCDB	0.417	0.394	0.319
	LAMV	VCDB	0.489	0.459	0.364
	VRAG	VCDB	0.484	0.470	0.399
	TCA_c	VCDB	0.570	0.553	0.473
	DnS(S^c)	DnS-100K	0.574	0.558	0.476
	VVS₅₀₀(Ours)	VCDB	0.606	0.588	0.502
	VVS₅₁₂(Ours)	VCDB	0.608	0.590	0.505
	VVS₁₀₂₄(Ours)	VCDB	0.645	0.627	0.536
	VVS₃₈₄₀(Ours)	VCDB	0.711	0.689	0.590

CC_WEB_VIDEO

Usage	Method	train dataset	cc_web	cc_web^*	cc_web_c	cc_web_c^*
frame	TN	VCDB	0.978	0.965	0.991	0.987
	DP	VCDB	0.975	0.958	0.990	0.982
	CTE	VCDB	0.996	-	-	-
	TCA_sym	VCDB	0.982	0.962	0.992	0.981
	TCA_f	VCDB	0.983	0.969	0.994	0.990
	SCFV+NIP₂₅₆	VCDB	0.973	0.953	0.976	0.959
	SCFV+TNIP₂₅₆	VCDB	0.978	0.969	0.983	0.975
	ViSiL_sym	VCDB	0.982	0.969	0.991	0.988
	ViSiL_f	VCDB	0.984	0.969	0.993	0.987
	ViSiL_v	VCDB	0.985	0.971	0.996	0.993
	DnS(S^f_A)	DnS-100K	0.984	0.973	0.995	0.992
video	HC	VCDB	0.958	-	-	-
	DML	VCDB	0.971	0.941	0.979	0.959
	VRAG	VCDB	0.971	0.952	0.980	0.967
	TCA_c	VCDB	0.973	0.947	0.983	0.965
	DnS(S^c)	DnS-100K	0.972	0.952	0.980	0.967
	VVS₅₀₀(Ours)	VCDB	0.973	0.952	0.981	0.966
	VVS₅₁₂(Ours)	VCDB	0.973	0.952	0.981	0.967
	VVS₁₀₂₄(Ours)	VCDB	0.973	0.952	0.982	0.969
	VVS₃₈₄₀(Ours)	VCDB	0.975	0.955	0.984	0.973

👍 References

We referenced the repos below for the code.

ViSiL
FIVR
CC_WEB

✉ Contact

If you have any questions or comments, please open an issue.
