Part of the Foundations of Deep Learning project | UniMiB
The purpose of the project is to develop several classification algorithms that recognize simple human actions and to compare their performance.
The selected dataset is HMDB (Human Motion Database), available at the following link. Each observation corresponds to one video clip, for a total of 6849 clips. Each clip is labelled with one of 51 possible classes, each identifying a specific human action. The action classes can be grouped into:
- general facial actions, such as smiling or laughing;
- facial actions with object manipulation, such as smoking;
- general body movements, such as running;
- body movements with object interaction, such as golfing;
- body movements for human interaction, such as kissing.
Due to computational constraints, we selected only the 19 classes of general body movements on which to train the human activity recognition algorithms.
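As a concrete illustration of this preprocessing step, the following is a minimal sketch of how clips could be loaded and restricted to the selected classes. The folder layout, the class names listed, the sequence length, and the frame size are illustrative assumptions, not the project's actual configuration.

```python
import os
import cv2
import numpy as np

# Hypothetical subset of the HMDB "general body movement" classes;
# the actual 19 classes used in the project may differ.
SELECTED_CLASSES = ["run", "walk", "jump", "climb_stairs", "somersault"]

SEQUENCE_LENGTH = 20      # frames sampled per clip (assumption)
FRAME_SIZE = (64, 64)     # spatial resolution fed to the network (assumption)

def extract_frames(video_path, n_frames=SEQUENCE_LENGTH, size=FRAME_SIZE):
    """Sample n_frames evenly spaced frames from a clip and normalise them."""
    capture = cv2.VideoCapture(video_path)
    total = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
    step = max(total // n_frames, 1)
    frames = []
    for i in range(n_frames):
        capture.set(cv2.CAP_PROP_POS_FRAMES, i * step)
        ok, frame = capture.read()
        if not ok:
            break
        frames.append(cv2.resize(frame, size).astype("float32") / 255.0)
    capture.release()
    return np.array(frames)

def build_dataset(root_dir):
    """Assumes clips are stored as <root_dir>/<class_name>/<clip>.avi."""
    sequences, labels = [], []
    for label, class_name in enumerate(SELECTED_CLASSES):
        class_dir = os.path.join(root_dir, class_name)
        for file_name in os.listdir(class_dir):
            frames = extract_frames(os.path.join(class_dir, file_name))
            if len(frames) == SEQUENCE_LENGTH:
                sequences.append(frames)
                labels.append(label)
    return np.array(sequences), np.array(labels)
```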
LRCN (Long-term Recurrent Convolutional Network) is a class of architectures that combines convolutional layers with Long Short-Term Memory (LSTM) layers.
BASIC LRCN
- Convolutional2D Layer
- LSTM Layer
- Dense Layer (fully connected)
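A minimal Keras sketch of what such a basic LRCN could look like; the layer sizes, sequence length, and frame resolution are illustrative assumptions, not necessarily the values used in the project.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

SEQUENCE_LENGTH = 20            # frames per clip (assumption)
FRAME_HEIGHT = FRAME_WIDTH = 64 # frame resolution (assumption)
NUM_CLASSES = 19                # general body movement classes

basic_lrcn = models.Sequential([
    # Conv2D applied frame by frame via TimeDistributed
    layers.TimeDistributed(
        layers.Conv2D(16, (3, 3), padding="same", activation="relu"),
        input_shape=(SEQUENCE_LENGTH, FRAME_HEIGHT, FRAME_WIDTH, 3)),
    layers.TimeDistributed(layers.MaxPooling2D((4, 4))),
    layers.TimeDistributed(layers.Flatten()),
    # LSTM aggregates the per-frame features over time
    layers.LSTM(32),
    # Fully connected classification head
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

basic_lrcn.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])
```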
ADVANCED LRCN
- 3 Convolutional2D Layers
- LSTM Layer
- Dense Layer (fully connected)
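The advanced variant can be sketched in the same way by stacking three convolutional blocks before the recurrent layer; again, the filter counts and other hyperparameters are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

SEQUENCE_LENGTH, FRAME_HEIGHT, FRAME_WIDTH, NUM_CLASSES = 20, 64, 64, 19

advanced_lrcn = models.Sequential()
advanced_lrcn.add(layers.Input(shape=(SEQUENCE_LENGTH, FRAME_HEIGHT, FRAME_WIDTH, 3)))

# Three Conv2D blocks applied frame by frame via TimeDistributed
for filters in (16, 32, 64):
    advanced_lrcn.add(layers.TimeDistributed(
        layers.Conv2D(filters, (3, 3), padding="same", activation="relu")))
    advanced_lrcn.add(layers.TimeDistributed(layers.MaxPooling2D((2, 2))))

advanced_lrcn.add(layers.TimeDistributed(layers.Flatten()))
advanced_lrcn.add(layers.LSTM(32))                                   # temporal modelling
advanced_lrcn.add(layers.Dense(NUM_CLASSES, activation="softmax"))   # classification head

advanced_lrcn.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
```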
MoveNet is an ultra-fast and accurate model that detects 17 keypoints on a body. The model is offered in two variants, known as Lightning and Thunder. Lightning is intended for latency-critical applications, while Thunder is intended for applications that require high accuracy.
MoveNet is a bottom-up estimation model that uses heatmaps to accurately localize human keypoints. The architecture consists of two components: a feature extractor and a set of prediction heads.
The feature extractor in MoveNet is MobileNetV2 with an attached feature pyramid network (FPN), which allows for a high-resolution, semantically rich feature map output. There are four prediction heads attached to the feature extractor, responsible for densely predicting:
- person center heatmap: predicts the geometric center of person instances;
- keypoint regression field: predicts the full set of keypoints for a person, used for grouping keypoints into instances;
- person keypoint heatmap: predicts the location of all keypoints, independent of person instances;
- 2D per-keypoint offset field: predicts local offsets from each output feature map pixel to the precise sub-pixel location of each keypoint.
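As an illustration of how MoveNet can be used in practice, here is a minimal sketch that loads the single-pose Lightning variant from TensorFlow Hub and extracts the 17 keypoints of a single frame; the project's actual variant and pipeline may differ.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Load the single-pose Lightning variant from TensorFlow Hub
module = hub.load("https://tfhub.dev/google/movenet/singlepose/lightning/4")
movenet = module.signatures["serving_default"]

def detect_keypoints(frame):
    """frame: uint8 RGB image tensor of shape (H, W, 3)."""
    # Lightning expects a 192x192 int32 input; Thunder uses 256x256.
    image = tf.image.resize_with_pad(tf.expand_dims(frame, axis=0), 192, 192)
    image = tf.cast(image, dtype=tf.int32)
    outputs = movenet(image)
    # Output shape (1, 1, 17, 3): 17 keypoints as (y, x, confidence score)
    return outputs["output_0"][0, 0]
```

The resulting per-frame keypoint sequences can then be fed to a sequence classifier in place of raw frames.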
| Network | Validation Accuracy |
|---|---|
| Basic LRCN | 34% |
| Advanced LRCN | 41% |
| MoveNet | 70% |