This repository contains the tracking research results for VTT. The tracker keeps the same ID for a target even when a scene change occurs in the video. The pipeline is largely based on multi-object tracking with a re-ID method; the image2text method is a separate module.
We mainly use the Image-Text-Embedding method and the Person ReID baseline.
The current code borrows heavily from Image-Text-Embedding. The images were taken from the CUHK-PEDES dataset.
- NVIDIA GPU + CUDA + cuDNN
- MatConvNet (Unzip matlab) + MATLAB 2017b
- PyTorch 1.0 + Python 3.6
- Install requirements
- For Visual Tracking
- Download friends2.zip and unzip it into /MOT_Re-Id/
- Download the pre-trained model
- Place the pre-trained model in /MOT_Re-Id/model/ft_resNet50/
- For Image2Text
- Download GoogleNews
- Download CUHK-PEDES
- Pre-trained model (upload in progress)
- Visual Tracking
Dataset structure:
/MOT_Re-Id/Friends
└ ep1
└ gallery
└ 0001 (frame), 0002, 0003, ....
└ data
└ 0001.png, 0002.png, ... (detection results)
Run /MOT_Re-Id/MOT_reid.py
- Image2Text
Run src/find_pic_feature_word2_plus
tracker_results.json contains the tracking coordinates.
Each entry in "coordinates" has the following format:
"coordinates" : x1, y1, x2, y2, id_number
{
"dataset": "Friends_EP1",
"coordinates": [
[
252, 338, 584, 819, 1
],
[
688, 376, 951, 748, 2
],
[
...
}
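As a minimal sketch of how this output might be consumed, the snippet below groups the boxes by ID. It assumes the file holds a single JSON object with the "dataset" and "coordinates" keys shown above, and that each entry is exactly `[x1, y1, x2, y2, id_number]`. The helper names `load_tracks` and `group_by_id` are hypothetical and are not part of this repository.

```python
import json
from collections import defaultdict


def group_by_id(coordinates):
    """Group [x1, y1, x2, y2, id_number] entries by their ID.

    Hypothetical helper: assumes every entry has exactly five values.
    """
    tracks = defaultdict(list)
    for x1, y1, x2, y2, track_id in coordinates:
        tracks[track_id].append(
            {"box": (x1, y1, x2, y2), "width": x2 - x1, "height": y2 - y1}
        )
    return dict(tracks)


def load_tracks(path):
    """Read a tracker_results.json-style file and group its boxes by ID."""
    with open(path, encoding="utf-8") as f:
        results = json.load(f)
    return group_by_id(results["coordinates"])


# In-memory sample using the two complete entries from the example above.
sample = {
    "dataset": "Friends_EP1",
    "coordinates": [
        [252, 338, 584, 819, 1],
        [688, 376, 951, 748, 2],
    ],
}
tracks = group_by_id(sample["coordinates"])
for track_id, boxes in tracks.items():
    print(track_id, boxes)
```

With a real results file, `load_tracks("tracker_results.json")` would return the same kind of dictionary, provided the file matches the layout assumed here.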
- Write more examples
- Upload in progress
This work was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (2017-0-01780, The technology development for event recognition/relational reasoning and learning knowledge based system for video understanding).
MIT