Event-AHU/EventVOT_Benchmark

The First High Definition (HD) Event based Visual Object Tracking Benchmark Dataset


[arXiv] [Baselines] [DemoVideo] [Tutorial]

Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline. Xiao Wang, Shiao Wang, Chuanming Tang, Lin Zhu, Bo Jiang, Yonghong Tian, Jin Tang (2023). arXiv preprint arXiv:2309.14611. [Paper] [Code] [DemoVideo]

🎯 Abstract

Tracking with bio-inspired event cameras has drawn increasing attention in recent years. Existing works either utilize aligned RGB and event data for accurate tracking or directly learn an event-based tracker. The first category incurs a higher inference cost, while the second is easily affected by noisy events and sparse spatial resolution. In this paper, we propose a novel hierarchical knowledge distillation framework that fully utilizes multi-modal / multi-view information during training to facilitate knowledge transfer, enabling high-speed, low-latency visual tracking at test time using only event signals. Specifically, a teacher Transformer-based multi-modal tracking framework is first trained by feeding the RGB frames and event streams simultaneously. Then, we design a new hierarchical knowledge distillation strategy that includes pairwise-similarity, feature-representation, and response-map based knowledge distillation to guide the learning of the student Transformer network. Moreover, since existing event-based tracking datasets are all low-resolution ($346 \times 260$), we propose the first large-scale high-resolution ($1280 \times 720$) dataset, named EventVOT. It contains 1141 videos and covers a wide range of categories such as pedestrians, vehicles, UAVs, ping-pong balls, etc. Extensive experiments on both the low-resolution datasets (FE240hz, VisEvent, COESOT) and our newly proposed high-resolution EventVOT dataset fully validate the effectiveness of the proposed method.
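To make the three distillation terms named above concrete, here is a minimal PyTorch-style sketch of how such losses are commonly combined. This is an illustration only, not the repository's actual implementation: the tensor shapes, loss weights, and function names are our own assumptions.

import torch
import torch.nn.functional as F

def hierarchical_kd_loss(feat_s, feat_t, resp_s, resp_t,
                         w_sim=1.0, w_feat=1.0, w_resp=1.0):
    # Illustrative sketch of a hierarchical KD objective (assumed shapes:
    # features (B, N, C), response maps (B, H*W)); not the official code.
    # Teacher tensors are treated as fixed targets.
    feat_t, resp_t = feat_t.detach(), resp_t.detach()

    # 1) Pairwise-similarity KD: match token-to-token similarity structure.
    ns = F.normalize(feat_s, dim=-1)
    nt = F.normalize(feat_t, dim=-1)
    loss_sim = F.mse_loss(ns @ ns.transpose(1, 2), nt @ nt.transpose(1, 2))

    # 2) Feature-representation KD: match intermediate features directly.
    loss_feat = F.mse_loss(feat_s, feat_t)

    # 3) Response-map KD: match the trackers' score distributions.
    loss_resp = F.kl_div(F.log_softmax(resp_s, dim=-1),
                         F.softmax(resp_t, dim=-1), reduction='batchmean')

    return w_sim * loss_sim + w_feat * loss_feat + w_resp * loss_resp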

💥 Update Log

  • 🔥 [2024.03.12] A New Long-term RGB-Event based Visual Object Tracking Benchmark Dataset (termed FELT) is available at [Paper] [Code] [DemoVideo]

  • 🔥 [2024.02.28] Our code, visualizations and other experimental results have been updated.

  • 🔥 [2024.02.27] Our work is accepted by CVPR-2024!

  • 🔥 [2023.12.04] The EventVOT_eval_toolkit is released; download it from EventVOT_eval_toolkit (Passcode: wsad)

  • 🔥 [2023.09.26] arXiv paper, dataset, pre-trained models, and benchmark results are all released [arXiv]

📹 Demo Video

A demo video on YouTube can be viewed by clicking the image below:

DemoVideo

EventVOT_samples

EventVOT_gif

🔨 Environment

A distillation framework for Event Stream-based Visual Object Tracking.

[HDETrack_S_ep0050.pth] Passcode: wsad

[Raw Results] Passcode: wsad

Framework

Install env

conda create -n hdetrack python=3.8
conda activate hdetrack
bash install.sh

Run the following command to set paths for this project

python tracking/create_default_local_file.py --workspace_dir . --data_dir ./data --save_dir ./output

After running this command, you can also modify the paths by editing these two files:

lib/train/admin/local.py  # paths about training
lib/test/evaluation/local.py  # paths about testing
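For orientation, these files hold dataset and output paths as plain Python assignments. The sketch below shows only the general shape; the class and attribute names are assumptions, so check the files actually generated by create_default_local_file.py for the real ones.

# Illustrative only: attribute names are assumptions, not the generated file's
# actual contents -- verify against lib/test/evaluation/local.py in your checkout.
class LocalSettings:
    def __init__(self):
        self.save_dir = './output'              # where checkpoints/results are written
        self.eventvot_path = './data/EventVOT'  # root of the EventVOT dataset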

Then, put the EventVOT tracking dataset in ./data.

Download the pre-trained MAE ViT-Base weights and put them under $/pretrained_models.

Download the pre-trained teacher model CEUTrack_ep0050.pth and put it under $/pretrained_models.

Download the trained model weights from [HDETrack_S_ep0050.pth] and put them under $/output/checkpoints/train/hdetrack/hdetrack_eventvot to run the test directly.

You can also download these weight files from Weight files in Dropbox.

Train & Test

# train
python tracking/train.py --script hdetrack --config hdetrack_eventvot --save_dir ./output --mode single --nproc_per_node 1 --use_wandb 0

# test
python tracking/test.py hdetrack hdetrack_eventvot --dataset eventvot --threads 1 --num_gpus 1

Test FLOPs and Speed

Note: The speeds reported in our paper were tested on a single RTX 3090 GPU.
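The exact profiling entry point is not listed here, so as a generic, hedged sketch (the model constructor and input shapes below are placeholders, not the repository's API), single-GPU inference speed can be measured along these lines:

import time
import torch

@torch.no_grad()
def measure_fps(model, example_inputs, n_warmup=20, n_iters=100):
    # Rough FPS measurement on a single GPU (generic sketch, not the official script).
    model = model.cuda().eval()
    example_inputs = [x.cuda() for x in example_inputs]
    for _ in range(n_warmup):        # warm up kernels / cuDNN autotuning
        model(*example_inputs)
    torch.cuda.synchronize()         # wait for all queued GPU work to finish
    start = time.perf_counter()
    for _ in range(n_iters):
        model(*example_inputs)
    torch.cuda.synchronize()
    return n_iters / (time.perf_counter() - start)

# Usage with placeholder shapes (e.g. a 128x128 template and 256x256 search region):
# fps = measure_fps(net, [torch.randn(1, 3, 128, 128), torch.randn(1, 3, 256, 256)])
# FLOPs can be counted with e.g. thop: macs, params = thop.profile(net, inputs=(x, z))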

📀 EventVOT Dataset

  • Event Image version (train.zip 28.16GB, val.zip 703M, test.zip 9.94GB)

💾 Baidu Netdisk: link:https://pan.baidu.com/s/1NLSnczJ8gnHqF-69bE7Ldg?pwd=wsad code:wsad

  • Complete version (Event Image + Raw Event data, train.zip 180.7GB, val.zip 4.34GB, test.zip 64.88GB)

💾 Baidu Netdisk: link:https://pan.baidu.com/s/1ZTX7O5gWlAdpKmd4R9VhYA?pwd=wsad code:wsad

💾 Dropbox: https://www.dropbox.com/scl/fo/fv2e3i0ytrjt14ylz81dx/h?rlkey=6c2wk2z7phmbiwqpfhhe29i5p&dl=0

  • If you want to download the dataset directly in an Ubuntu terminal using a script, try this:

wget -O EventVOT_dataset.zip "https://www.dropbox.com/scl/fo/fv2e3i0ytrjt14ylz81dx/h?rlkey=6c2wk2z7phmbiwqpfhhe29i5p&dl=1"

The directory should have the following format:

├── EventVOT dataset
    ├── Training Subset (841 videos, 180.7GB)
        ├── recording_2022-10-10_17-28-38
            ├── img
            ├── recording_2022-10-10_17-28-38.csv
            ├── groundtruth.txt
            ├── absent.txt
        ├── ... 
    ├── Testing Subset (282 videos, 64.88GB)
        ├── recording_2022-10-10_17-28-24
            ├── img
            ├── recording_2022-10-10_17-28-24.csv
            ├── groundtruth.txt
            ├── absent.txt
        ├── ...
    ├── Validation Subset (18 videos, 4.34GB)
        ├── recording_2022-10-10_17-31-07
            ├── img
            ├── recording_2022-10-10_17-31-07.csv
            ├── groundtruth.txt
            ├── absent.txt
        ├── ... 

Normally, only the "img" and "..._voxel" files from the EventVOT dataset are needed for training. During testing, only "img" is used as input for inference, as shown in the following figure:

EventVOT_files
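For readers working with the raw event CSVs in the complete version, a "_voxel" representation like the one mentioned above is typically built by binning events into a fixed number of temporal slices. Below is a minimal, hedged sketch of such a conversion; the CSV column order (timestamp, x, y, polarity), the absence of a header row, and the bin count are assumptions, not the dataset's documented schema.

import numpy as np

def events_to_voxel(csv_path, height=720, width=1280, n_bins=5):
    # Bin raw events into a (n_bins, H, W) voxel grid (illustrative sketch only).
    # Assumes each CSV row is: timestamp, x, y, polarity, with no header row --
    # verify against the actual EventVOT recording_*.csv files before use.
    events = np.loadtxt(csv_path, delimiter=',')   # (N, 4) array
    t = events[:, 0]
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = np.where(events[:, 3] > 0, 1.0, -1.0)      # map polarity to +/-1

    # Normalize timestamps into [0, 1] and assign each event a temporal bin.
    t_norm = (t - t.min()) / max(float(t.max() - t.min()), 1e-9)
    b = np.minimum((t_norm * n_bins).astype(int), n_bins - 1)

    voxel = np.zeros((n_bins, height, width), dtype=np.float32)
    np.add.at(voxel, (b, y, x), p)                 # accumulate signed polarities
    return voxel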

Note: Our EventVOT dataset is a unimodal event dataset. If you need a multimodal RGB-Event dataset, please refer to [COESOT], [VisEvent], or [FELT].

📐 Evaluation Toolkit

  1. Download the EventVOT_eval_toolkit from EventVOT_eval_toolkit (Passcode: wsad) and open it with Matlab (R2020 or later).
  2. Add your tracking results and the baseline results (Passcode: wsad) to $/eventvot_tracking_results/ and modify the tracker names in $/utils/config_tracker.m.
  3. Run Evaluate_EventVOT_benchmark_SP_PR_only.m for the overall performance evaluation, including SR, PR, and NPR.
  4. Run plot_BOC.m for the BOC score evaluation and figure plotting.
  5. Run plot_radar.m to plot the attribute radar figure.
  6. Run Evaluate_EventVOT_benchmark_attributes.m for the attribute analysis; figures are saved in $/res_fig/.

Radar

📈 Benchmark Results

The overall performance evaluation, including SR, PR, and NPR:

SR / PR / NPR
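As a quick reference for what these metrics measure: the success rate (SR) is the area under the curve of the fraction of frames whose predicted-vs-ground-truth IoU exceeds a sweep of thresholds; precision (PR) is the fraction of frames whose center error falls below a pixel threshold (commonly 20 px); and NPR is the analogous precision with the center error normalized by the ground-truth box size. The hedged Python sketch below only makes the definitions concrete; use the official Matlab toolkit above for reported numbers.

import numpy as np

def iou(boxes_a, boxes_b):
    # IoU between per-frame boxes in (x, y, w, h) format, each of shape (N, 4).
    ax1, ay1 = boxes_a[:, 0], boxes_a[:, 1]
    ax2, ay2 = ax1 + boxes_a[:, 2], ay1 + boxes_a[:, 3]
    bx1, by1 = boxes_b[:, 0], boxes_b[:, 1]
    bx2, by2 = bx1 + boxes_b[:, 2], by1 + boxes_b[:, 3]
    iw = np.clip(np.minimum(ax2, bx2) - np.maximum(ax1, bx1), 0, None)
    ih = np.clip(np.minimum(ay2, by2) - np.maximum(ay1, by1), 0, None)
    inter = iw * ih
    union = boxes_a[:, 2] * boxes_a[:, 3] + boxes_b[:, 2] * boxes_b[:, 3] - inter
    return inter / np.maximum(union, 1e-9)

def success_rate(pred, gt):
    # AUC of the success curve over IoU thresholds in [0, 1] (sketch of SR).
    overlaps = iou(pred, gt)
    thresholds = np.linspace(0, 1, 21)
    return np.mean([np.mean(overlaps > th) for th in thresholds])

def precision(pred, gt, px_threshold=20):
    # Fraction of frames with center error below px_threshold (sketch of PR).
    pred_centers = pred[:, :2] + pred[:, 2:] / 2
    gt_centers = gt[:, :2] + gt[:, 2:] / 2
    err = np.linalg.norm(pred_centers - gt_centers, axis=1)
    return np.mean(err < px_threshold)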

💘 Acknowledgement

📰 Citation

@inproceedings{wang2024event,
  title={Event stream-based visual object tracking: A high-resolution benchmark dataset and a novel baseline},
  author={Wang, Xiao and Wang, Shiao and Tang, Chuanming and Zhu, Lin and Jiang, Bo and Tian, Yonghong and Tang, Jin},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={19248--19257},
  year={2024}
}

Star History

Star History Chart