Multiresolution CNNs for Video Classification

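Implemented a multiresolution CNN for video classification on the Sports-1M dataset, based on the architecture in [1]. The model processes each frame with two separate streams. The ‘fovea’ stream sees the central region of the frame at full resolution, and the ‘context’ stream sees the whole frame downsampled to half resolution. The features from the two streams are concatenated and passed to the fully connected layers. Because each stream receives a smaller input than the full frame, training is faster, while the two views together keep both fine central detail and the global layout of the scene.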
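A minimal sketch of how one frame could be split into the two stream inputs. It assumes NumPy, an (H, W, C) frame, and the halving scheme of [1]: the fovea is the central half-size crop at full resolution, and the context is the whole frame downsampled by 2. The 2x2 average pooling used for downsampling and the names `split_fovea_context`, `stand_in_features` and `fuse` are hypothetical. `stand_in_features` only stands in for each stream's convolutional layers so that the concatenation step can be shown.

```python
from __future__ import annotations

import numpy as np


def split_fovea_context(frame: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split an (H, W, C) frame into fovea and context inputs (sketch).

    fovea:   central (H/2, W/2) region at full resolution
    context: whole frame at half resolution (2x2 average pooling, assumed)
    """
    h, w, c = frame.shape
    if h % 2 or w % 2:
        raise ValueError("frame height and width must be even")
    # Fovea stream input: centre crop with half the side length, no downsampling.
    top, left = h // 4, w // 4
    fovea = frame[top:top + h // 2, left:left + w // 2]
    # Context stream input: full field of view, downsampled 2x by averaging
    # each non-overlapping 2x2 block of pixels.
    context = frame.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
    return fovea, context


def stand_in_features(x: np.ndarray) -> np.ndarray:
    # Hypothetical placeholder for one stream's conv layers: per-channel mean.
    return x.mean(axis=(0, 1))


def fuse(fovea: np.ndarray, context: np.ndarray) -> np.ndarray:
    # The two streams' feature vectors are concatenated before the
    # fully connected layers.
    return np.concatenate([stand_in_features(fovea), stand_in_features(context)])


frame = np.random.default_rng(0).random((170, 170, 3))
fovea, context = split_fovea_context(frame)
print(fovea.shape, context.shape)  # (85, 85, 3) (85, 85, 3)
print(fuse(fovea, context).shape)  # (6,)
```

With the 170x170 crops used in this project, both streams receive 85x85 inputs. Each stream therefore processes a quarter of the pixels of the full crop, which is where the speed-up comes from.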

Model architecture from [1]

Preprocessing and training settings:
Images are resized to 200x200 pixels
170x170 crops are sampled at random
Horizontal flipping is applied with probability 0.5
The mean is subtracted from each pixel
Optimization: mini-batch size = 32, momentum = 0.9, weight decay = 0.0005, learning rate = 0.001
Local Response Normalization layers are replaced by Batch Normalization layers
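The preprocessing steps above can be sketched for a single frame as follows, assuming NumPy and an (H, W, C) array. The README does not specify the resize interpolation or which mean is subtracted. Nearest-neighbour resizing and a per-channel mean are therefore assumptions, and `resize_nearest` and `preprocess` are hypothetical names, not functions taken from multi-res-CNN.py. The mean values in the usage line are placeholders.

```python
from __future__ import annotations

import numpy as np


def resize_nearest(img: np.ndarray, size: int) -> np.ndarray:
    # Nearest-neighbour resize to (size, size); an assumed stand-in for
    # whatever interpolation the original code uses.
    h, w = img.shape[:2]
    rows = (np.arange(size) * h) // size
    cols = (np.arange(size) * w) // size
    return img[rows[:, None], cols]


def preprocess(img: np.ndarray, mean: np.ndarray, rng: np.random.Generator,
               resize_to: int = 200, crop: int = 170,
               flip_prob: float = 0.5) -> np.ndarray:
    """Resize, random-crop, randomly flip and mean-subtract one (H, W, C) frame."""
    img = resize_nearest(img, resize_to)
    # Random crop: top-left corner drawn uniformly over all valid positions.
    top = rng.integers(0, resize_to - crop + 1)
    left = rng.integers(0, resize_to - crop + 1)
    img = img[top:top + crop, left:left + crop]
    # Horizontal flip with probability flip_prob (mirror the width axis).
    if rng.random() < flip_prob:
        img = img[:, ::-1]
    # Mean subtraction; `mean` is assumed to be a per-channel training-set mean.
    return img.astype(np.float32) - np.asarray(mean, dtype=np.float32)


rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(240, 320, 3), dtype=np.uint8)
x = preprocess(frame, mean=np.array([120.0, 115.0, 105.0]), rng=rng)
print(x.shape, x.dtype)  # (170, 170, 3) float32
```

The README does not name the framework. If the model is written in PyTorch (an assumption), the optimization settings would correspond to `torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005)` with a `DataLoader` batch size of 32. The normalization change would correspond to using `torch.nn.BatchNorm2d` where [1] uses `torch.nn.LocalResponseNorm`-style layers.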

The sports video dataset can be downloaded from this link

This implementation reached a best validation accuracy of 65%, which is comparable to the results reported in [1].

The sample video outputs can be seen here

Reference

[1] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar and L. Fei-Fei, "Large-Scale Video Classification with Convolutional Neural Networks," 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 2014, pp. 1725-1732, doi: 10.1109/CVPR.2014.223.

nirmal-25/Multi-Res-CNN
