[AAAI-24] VVS: Video-to-Video Retrieval With Irrelevant Frame Suppression [Project Page]

Paper: Video-to-Video Retrieval With Irrelevant Frame Suppression


⏩ Fast Evaluation

  • For quick verification, we provide a simple evaluation protocol.

  • Fast evaluation of VVS on FIVR5K takes three steps:

    1. Download the data from the Google Drive link.

    2. Place the data as follows:

      • Place pca.pkl inside the VVS/data/vcdb folder.
      • Place fivr5k_resnet50_l4imac inside the VVS/features folder.
      • Place table_benchmark_dim_3840 inside the VVS/jobs folder.
    3. Run the following command to evaluate VVS on FIVR5K:

      • bash experiments/review/fast_evaluation_fivr5k.sh
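Before running the script, it can help to confirm that the three downloaded items are where the script expects them. The function below is a hypothetical pre-flight check (check_fast_eval_layout is not part of the VVS repository); it assumes you pass it the VVS repository root and only tests that each path exists, not that its contents are valid.

```bash
# Hypothetical pre-flight check (not part of VVS) for the fast FIVR5K evaluation.
# Pass the VVS repository root as the first argument (default: current directory).
check_fast_eval_layout() {
  local root="${1:-.}"
  local missing=0
  local path
  for path in \
    "data/vcdb/pca.pkl" \
    "features/fivr5k_resnet50_l4imac" \
    "jobs/table_benchmark_dim_3840"; do
    # -e accepts files and directories alike, since the README does not say
    # which of the downloaded items are folders
    if [[ ! -e "${root}/${path}" ]]; then
      echo "missing: ${root}/${path}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

Source the snippet, then run `check_fast_eval_layout . && bash experiments/review/fast_evaluation_fivr5k.sh` from the repository root; the evaluation only starts if nothing is reported missing.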

🎞 Data Preparation

Getting the Dataset

  • Download the raw video datasets you need. The supported options are:

  • If any videos are missing during the download process, please contact the author.

  • Organize the raw videos according to the structure below.

  • Preparing the raw videos is optional, however, because we also provide the features we used.

videos
├── fivr
│   └── videos
│       ├── video_1
│       ├── video_2
│       └── ...
├── cc_web
│   └── videos
│       ├── video_1
│       ├── video_2
│       └── ...
└── evve
    └── videos
        ├── video_1
        ├── video_2
        └── ...
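If you do prepare the raw videos, the skeleton above can be created in one go. The helper below is a hypothetical sketch (prepare_video_dirs is not provided by VVS) that creates the three dataset folders under a root you choose and prints how many entries each one currently holds, which is a quick way to spot an incomplete download.

```bash
# Hypothetical helper (not part of VVS): create the raw-video skeleton and
# print how many entries each dataset folder holds.
prepare_video_dirs() {
  local root="${1:-.}"
  local name count
  for name in fivr cc_web evve; do
    mkdir -p "${root}/videos/${name}/videos"
    # count only direct children (one per video file or per-video folder)
    count=$(find "${root}/videos/${name}/videos" -mindepth 1 -maxdepth 1 | wc -l)
    # strip the padding some wc implementations add
    echo "${name}: ${count// /}"
  done
}
```

Running `prepare_video_dirs .` twice is safe: `mkdir -p` leaves existing folders and their contents untouched.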

➡️ Getting the Features

  • For convenience, we provide the features we used. You can find them here.

  • Before running, place the features inside the VVS/features folder as follows.

features
├── vcdb_resnet50_l4imac
│   └── features
│       ├── feat_1
│       ├── feat_2
│       └── ...
├── fivr_resnet50_l4imac
│   └── features
│       ├── feat_1
│       ├── feat_2
│       └── ...
├── cc_web_resnet50_l4imac
│   └── features
│       ├── feat_1
│       ├── feat_2
│       └── ...
└── evve_resnet50_l4imac
    └── features
        ├── feat_1
        ├── feat_2
        └── ...
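A similar hypothetical check (check_features is an illustrative name, not a VVS script) can confirm that each feature set in the tree above was extracted into place. It assumes exactly the folder names shown in the tree; the fivr5k_resnet50_l4imac folder used for fast evaluation does not appear there, so it is not checked.

```bash
# Hypothetical check (not part of VVS): every feature set exists under
# features/ and contains at least one entry.
check_features() {
  local root="${1:-.}"
  local status=0
  local name dir count
  for name in vcdb fivr cc_web evve; do
    dir="${root}/features/${name}_resnet50_l4imac/features"
    if [[ ! -d "${dir}" ]]; then
      echo "${name}: missing ${dir}"
      status=1
      continue
    fi
    count=$(find "${dir}" -mindepth 1 -maxdepth 1 | wc -l)
    count="${count// /}"
    echo "${name}: ${count}"
    # an empty folder usually means the download or extraction did not finish
    [[ "${count}" -gt 0 ]] || status=1
  done
  return "${status}"
}
```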

🔨 Prerequisites

Recommended Environment

  • OS: Ubuntu 18.04
  • CUDA: 10.2
  • Python: 3.7
  • PyTorch: 1.8.1, Torchvision: 0.9.1
  • GPU: NVIDIA Tesla V100 (32 GB)

Required packages are listed in environment.yaml. You can install them by running:

conda env create -f environment.yaml
conda activate VVS

If your GPU only supports CUDA 11.0 or later, install from environment_cuda11.yaml instead:

conda env create -f environment_cuda11.yaml
conda activate VVS
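After activating the environment, you can compare the installed versions with the recommended ones listed above. The function below is a hypothetical sketch (report_env is not part of VVS) that only reports versions; it assumes the `python` on your PATH is the one from the VVS environment (override it with `PYTHON=...`) and calls nvidia-smi only if it is installed.

```bash
# Hypothetical environment report (not part of VVS). It prints versions only;
# compare them with the recommended environment by hand.
report_env() {
  local py="${PYTHON:-python}"
  local mod ver
  if ! command -v "${py}" >/dev/null 2>&1; then
    echo "python: not found"
    return 1
  fi
  echo "python: $("${py}" -c 'import platform; print(platform.python_version())')"
  for mod in torch torchvision; do
    # __version__ of the module, or "not installed" if the import fails
    ver=$("${py}" -c "import ${mod}; print(${mod}.__version__)" 2>/dev/null) || ver="not installed"
    echo "${mod}: ${ver}"
  done
  # CUDA version torch was built with (None for CPU builds) and GPU visibility
  ver=$("${py}" -c 'import torch; print(torch.version.cuda, torch.cuda.is_available())' 2>/dev/null) \
    || ver="unknown"
  echo "torch CUDA build, GPU available: ${ver}"
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
  fi
  return 0
}
```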

🔄 Running

  • Before running, place pca.pkl inside the VVS/data/vcdb folder, or compute the PCA weights yourself with python cal_pca.py.
  • You can then evaluate the model by running the provided scripts.

Please follow the instructions in README.md for training and evaluation.
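The PCA step above can be wrapped so that the provided pca.pkl is reused when present and cal_pca.py is run only otherwise. This is a hypothetical sketch (ensure_pca is an illustrative name): it assumes cal_pca.py lives in the repository root, needs no arguments, and writes its output to data/vcdb/pca.pkl, none of which this README states explicitly.

```bash
# Hypothetical helper (not part of VVS): reuse data/vcdb/pca.pkl if present,
# otherwise compute the PCA weights with cal_pca.py.
# ASSUMPTION: cal_pca.py sits in the repository root, takes no required
# arguments, and writes data/vcdb/pca.pkl.
ensure_pca() {
  local root="${1:-.}"
  if [[ -f "${root}/data/vcdb/pca.pkl" ]]; then
    echo "using existing ${root}/data/vcdb/pca.pkl"
    return 0
  fi
  echo "pca.pkl not found; running cal_pca.py"
  # run in a subshell so the caller's working directory is unchanged;
  # the function returns cal_pca.py's exit status
  (cd "${root}" && "${PYTHON:-python}" cal_pca.py)
}
```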

🔑 Models

We provide checkpoints for reproducing our benchmark experiments.

  • Run the script for the target dataset, replacing {dim} with the desired feature dimension.

| Dataset | Script |
| --- | --- |
| FIVR5K | $ bash experiments/main_script/train/table_benchmark/eval_benchmark_fivr5k_dim_{dim}.sh |
| FIVR200K | $ bash experiments/main_script/train/table_benchmark/eval_benchmark_fivr200k_dim_{dim}.sh |
| CC_WEB_VIDEO | $ bash experiments/main_script/train/table_benchmark/eval_benchmark_cc_web_dim_{dim}.sh |
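To reproduce every VVS row of a benchmark table, the per-dimension scripts can be run in a loop. In this hypothetical wrapper, eval_benchmark is an illustrative name and the dimension list (500, 512, 1024, 3840) is taken from the VVS variants in the Experiments tables; it assumes a script exists for every dataset and dimension pair. Passing 1 as the second argument only prints the commands.

```bash
# Hypothetical wrapper (not part of VVS) around the evaluation scripts above.
# ASSUMPTION: one script exists per dataset/dimension pair; the dimensions are
# the VVS variants reported in the Experiments tables.
eval_benchmark() {
  local dataset="$1"        # fivr5k, fivr200k or cc_web
  local dry_run="${2:-0}"   # 1 = print the commands instead of running them
  local dims=(500 512 1024 3840)
  local dim script

  case "${dataset}" in
    fivr5k|fivr200k|cc_web) ;;
    *) echo "unknown dataset: ${dataset}" >&2; return 2 ;;
  esac

  for dim in "${dims[@]}"; do
    script="experiments/main_script/train/table_benchmark/eval_benchmark_${dataset}_dim_${dim}.sh"
    if [[ "${dry_run}" == 1 ]]; then
      echo "bash ${script}"
    else
      # stop at the first failing evaluation
      bash "${script}" || return 1
    fi
  done
}
```

From the repository root, `eval_benchmark fivr200k` runs all four FIVR200K evaluations in turn, while `eval_benchmark cc_web 1` only lists them.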

📑 Experiments

FIVR200K

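| Level | Method | Train dataset | DSVR | CSVR | ISVR |
| --- | --- | --- | --- | --- | --- |
| frame | TN | VCDB | 0.724 | 0.699 | 0.589 |
| | DP | VCDB | 0.775 | 0.740 | 0.632 |
| | TCA_sym | VCDB | 0.728 | 0.698 | 0.592 |
| | TCA_f | VCDB | 0.877 | 0.830 | 0.703 |
| | SCFV+NIP_256 | VCDB | 0.819 | 0.764 | 0.622 |
| | SCFV+TNIP_256 | VCDB | 0.896 | 0.833 | 0.674 |
| | ViSiL_sym | VCDB | 0.833 | 0.792 | 0.654 |
| | ViSiL_f | VCDB | 0.843 | 0.797 | 0.660 |
| | ViSiL_v | VCDB | 0.892 | 0.841 | 0.702 |
| | DnS (S^f_A) | DnS-100K | 0.921 | 0.875 | 0.741 |
| video | HC | VCDB | 0.265 | 0.247 | 0.193 |
| | DML | VCDB | 0.398 | 0.378 | 0.309 |
| | TMK | VCDB | 0.417 | 0.394 | 0.319 |
| | LAMV | VCDB | 0.489 | 0.459 | 0.364 |
| | VRAG | VCDB | 0.484 | 0.470 | 0.399 |
| | TCA_c | VCDB | 0.570 | 0.553 | 0.473 |
| | DnS (S^c) | DnS-100K | 0.574 | 0.558 | 0.476 |
| | VVS_500 (Ours) | VCDB | 0.606 | 0.588 | 0.502 |
| | VVS_512 (Ours) | VCDB | 0.608 | 0.590 | 0.505 |
| | VVS_1024 (Ours) | VCDB | 0.645 | 0.627 | 0.536 |
| | VVS_3840 (Ours) | VCDB | 0.711 | 0.689 | 0.590 |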

CC_WEB_VIDEO

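| Level | Method | Train dataset | cc_web | cc_web* | cc_web_c | cc_web_c* |
| --- | --- | --- | --- | --- | --- | --- |
| frame | TN | VCDB | 0.978 | 0.965 | 0.991 | 0.987 |
| | DP | VCDB | 0.975 | 0.958 | 0.990 | 0.982 |
| | CTE | VCDB | 0.996 | - | - | - |
| | TCA_sym | VCDB | 0.982 | 0.962 | 0.992 | 0.981 |
| | TCA_f | VCDB | 0.983 | 0.969 | 0.994 | 0.990 |
| | SCFV+NIP_256 | VCDB | 0.973 | 0.953 | 0.976 | 0.959 |
| | SCFV+TNIP_256 | VCDB | 0.978 | 0.969 | 0.983 | 0.975 |
| | ViSiL_sym | VCDB | 0.982 | 0.969 | 0.991 | 0.988 |
| | ViSiL_f | VCDB | 0.984 | 0.969 | 0.993 | 0.987 |
| | ViSiL_v | VCDB | 0.985 | 0.971 | 0.996 | 0.993 |
| | DnS (S^f_A) | DnS-100K | 0.984 | 0.973 | 0.995 | 0.992 |
| video | HC | VCDB | 0.958 | - | - | - |
| | DML | VCDB | 0.971 | 0.941 | 0.979 | 0.959 |
| | VRAG | VCDB | 0.971 | 0.952 | 0.980 | 0.967 |
| | TCA_c | VCDB | 0.973 | 0.947 | 0.983 | 0.965 |
| | DnS (S^c) | DnS-100K | 0.972 | 0.952 | 0.980 | 0.967 |
| | VVS_500 (Ours) | VCDB | 0.973 | 0.952 | 0.981 | 0.966 |
| | VVS_512 (Ours) | VCDB | 0.973 | 0.952 | 0.981 | 0.967 |
| | VVS_1024 (Ours) | VCDB | 0.973 | 0.952 | 0.982 | 0.969 |
| | VVS_3840 (Ours) | VCDB | 0.975 | 0.955 | 0.984 | 0.973 |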

👍 References

We referenced the repos below for the code.

✉ Contact

If you have any questions or comments, please open an issue.