This is the demo implementation of the paper "An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos". [NOT OFFICIAL!!]
Original Paper Project Page | Paper
- PyTorch (ver. 0.4+ required)
- FFmpeg
- Python3
- PyQt5
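A quick way to confirm the environment matches the requirements above (a minimal sketch; it only checks that the listed dependencies are importable and that FFmpeg is on PATH):

```python
# Minimal environment check for the requirements listed above.
import shutil
import sys

import torch               # PyTorch (0.4+ expected)
from PyQt5 import QtCore   # PyQt5 (listed in the requirements)

assert sys.version_info[0] == 3, "Python 3 is required"
assert shutil.which("ffmpeg"), "FFmpeg must be installed and on PATH"
print("Python :", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("Qt     :", QtCore.QT_VERSION_STR)
```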
If you just want to use the demo, this step is not necessary; downloading the pre-trained and trained models is enough.
- Download the videos here. (official)
- Video pre-processing with `/tools/processing.py` (mp4 to jpg, add n_frames information, generate the annotation file in JSON format, and mp4 to mp3); an illustrative sketch of these steps appears after the download links below.
- We also provide the processed dataset, including VideoEmotion8-imgs (split by FFmpeg) and VideoEmotion8-mp3, so that you can train your own model more easily:
VideoEmotion8-imgs: here (extraction code: fhom)
VideoEmotion8-mp3: here (extraction code: 7tn3)
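If you prefer to pre-process the videos yourself, the sketch below illustrates the four steps listed above (frame extraction, n_frames, JSON annotation, mp3 extraction). It is not the actual `/tools/processing.py`; the directory names, the n_frames file convention, and the annotation layout are assumptions, so treat the official script as authoritative.

```python
# Illustrative sketch of the pre-processing steps; not the official /tools/processing.py.
# Paths, the n_frames convention, and the annotation layout here are assumptions.
import json
import subprocess
from pathlib import Path

def process_video(mp4_path: Path, img_root: Path, mp3_root: Path) -> dict:
    img_dir = img_root / mp4_path.stem
    img_dir.mkdir(parents=True, exist_ok=True)
    # 1) mp4 -> jpg frames
    subprocess.run(["ffmpeg", "-i", str(mp4_path), str(img_dir / "image_%05d.jpg")],
                   check=True)
    # 2) add n_frames information (count of extracted frames)
    n_frames = len(list(img_dir.glob("*.jpg")))
    (img_dir / "n_frames").write_text(str(n_frames))
    # 3) mp4 -> mp3 (audio track only)
    mp3_root.mkdir(parents=True, exist_ok=True)
    subprocess.run(["ffmpeg", "-i", str(mp4_path), "-vn",
                    str(mp3_root / (mp4_path.stem + ".mp3"))], check=True)
    return {"video": mp4_path.stem, "n_frames": n_frames}

# 4) generate an annotation file in JSON format (structure is only a placeholder)
entries = [process_video(p, Path("images"), Path("mp3"))
           for p in Path("videos").glob("*.mp4")]
Path("annotation.json").write_text(json.dumps(entries, indent=2))
```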
- resnet-101-kinetics.pth: pre-trained model download here (extraction code:0bi8)
- save_30.pth: trained model download here (extraction code:uq82)
- ve8_01.json: download here (extraction code:s567)
Assume the structure of the data directories is the following:
```
~/
├── data/
│   └── Joy/
│       └── .../ (video name)
│           ├── images/ (jpg files)
│           └── mp3/
│               └── ... (mp3 file)
├── results/
├── resnet-101-kinetics.pth
├── save_30.pth
└── ve8_01.json
```
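Before launching the demo, you can sanity-check that these files are in place (a small sketch; the file names come from the layout above, adjust `root` if your data lives elsewhere):

```python
# Check that the paths from the directory layout above exist.
from pathlib import Path

root = Path.home()  # adjust to wherever "~/" points in your setup
for name in ["data", "results", "resnet-101-kinetics.pth", "save_30.pth", "ve8_01.json"]:
    path = root / name
    print(f"{'ok' if path.exists() else 'MISSING':8s}{path}")
```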
Confirm all options in `~/opts.py`, then run:

```
python Emotion.py
```
See the next section for details.
To see another branch, click here: Tutorial (Chinese version).