VTT video emotion recognition

Video emotion recognition for the VTT Friends dataset, trained on the RAF and AffectNet datasets.



Preliminaries

Packages

  • python==2.7.1
  • pytorch==1.3.0
  • numpy==1.16.5
  • opencv-python==4.1.0
  • matplotlib==2.2.4
  • face_recognition==1.2.3

Datasets

  • AffectNet dataset
  • RAF dataset
  • VTT Friends dataset

Preprocessing

  • The AffectNet dataset can be preprocessed with the parse_AffectNet.py script

    • The parsed AffectNet dictionary has the structure [af_dict] -> [emotion label] -> [example index] -> ['img': image filename], ['gt': ground-truth bounding box], ['emo': emotion label], ['aro': arousal intensity]
  • The RAF dataset can be preprocessed with the parse_RAF.py script

    • The parsed RAF dictionary has the structure [raf_dict] -> [example index] -> ['img': image filename], ['gt': ground-truth bounding box], ['em': emotion label]
  • The VTT Friends dataset (episodes 1-10) is used as the validation set

    • All frames of the videos should be extracted and saved as image files, as in extract_friends.py
    • Metadata (json) files are processed as in parse_friends_new.ipynb to extract facial-region images and emotion labels
    • The dataset can be indexed through friends_parsed_new.npy, with the structure [val_dict] -> [emotion label] -> [example index] -> ['img': image filename], ['pos': episode number, character id], ['emo': emotion label]; a loading sketch follows this list
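The parsed dictionaries can be loaded and indexed as in the minimal sketch below. It assumes the parsing scripts pickle their dictionaries into .npy files; friends_parsed_new.npy is named in this README, while affectnet_parsed.npy and raf_parsed.npy are hypothetical placeholders for the outputs of parse_AffectNet.py and parse_RAF.py.

import numpy as np

# Load the pickled dictionaries (allow_pickle is required for dict payloads)
af_dict = np.load('affectnet_parsed.npy', allow_pickle=True).item()   # hypothetical filename
raf_dict = np.load('raf_parsed.npy', allow_pickle=True).item()        # hypothetical filename
val_dict = np.load('friends_parsed_new.npy', allow_pickle=True).item()

# AffectNet: [emotion label] -> [example index] -> fields
ex = af_dict[0][0]   # key types (int vs. string) depend on the parser
print(ex['img'], ex['gt'], ex['emo'], ex['aro'])

# RAF: [example index] -> fields (note the 'em' key rather than 'emo')
print(raf_dict[0]['img'], raf_dict[0]['gt'], raf_dict[0]['em'])

# VTT Friends validation set: [emotion label] -> [example index] -> fields
ex = val_dict[0][0]
print(ex['img'], ex['pos'], ex['emo'])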

Train

model_train.py and model_tsm_train.py are the training scripts; training variables are configured in ops.py. Training can be performed by running

python model_train.py

and the training progress (error and accuracy) can be plotted by running

python plot_errval.py

Test

Testing on a video frame sequence can be performed by importing the network model function:

from resnet_tsm import resnet18 as resnet

where the model function is resnet(input_img) and input_img is a cropped RGB face sequence of shape 4x3x224x224 (4 is the buffer size for the video sequence).
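A minimal inference sketch under these assumptions follows; the constructor call resnet() and the checkpoint path model_best.pth are hypothetical and should be adapted to the actual training output.

import torch
from resnet_tsm import resnet18 as resnet

model = resnet()   # assumed torchvision-style constructor
model.load_state_dict(torch.load('model_best.pth'))   # hypothetical checkpoint path
model.eval()

# Four buffered frames, each a 3x224x224 cropped RGB face image
input_img = torch.randn(4, 3, 224, 224)

with torch.no_grad():
    logits = model(input_img)
print(logits.argmax(dim=-1))   # predicted emotion class index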

Results

Results for the VTT Friends dataset can be obtained with create_data.ipynb and are stored as in friends_s01_ep00.json:

{"type": "emotion", "class": "happy", "seconds": 15.0, "object": Object}, 
...
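A minimal sketch for consuming such a results file is shown below; it assumes the file holds a JSON array of entries in the format above (the repository does not document the exact top-level layout).

import json

# Assumes the file contains a JSON array of entries like the one above
with open('friends_s01_ep00.json') as f:
    results = json.load(f)

for entry in results:
    if entry['type'] == 'emotion':
        print(entry['seconds'], entry['class'])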

References

  • Cohn-Kanade (CK+) dataset

    • Kanade, T., Cohn, J. F., & Tian, Y. (2000). Comprehensive database for facial expression analysis. Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (FG'00), Grenoble, France, 46-53.
    • Lucey, P., Cohn, J. F., Kanade, T., Saragih, J., Ambadar, Z., & Matthews, I. (2010). The Extended Cohn-Kanade Dataset (CK+): A complete expression dataset for action unit and emotion-specified expression. Proceedings of the Third International Workshop on CVPR for Human Communicative Behavior Analysis (CVPR4HB 2010), San Francisco, USA, 94-101.
  • AffectNet dataset

    • Mollahosseini, A., Hasani, B., & Mahoor, M. H. (2017). AffectNet: A new database for facial expression, valence, and arousal computation in the wild. IEEE Transactions on Affective Computing.
  • RAF dataset

    • Li, S., Deng, W., & Du, J. (2017). Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017).

Acknowledgements

This work was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (2017-0-01780, The technology development for event recognition/relational reasoning and learning knowledge based system for video understanding).