Tag 8 million YouTube videos

The YouTube-8M challenge is a multi-label classification problem: given video-level and frame-level audio and RGB features, we predict the set of categories each video belongs to. Our subset of the data has 3807 classes.

Approach

classifier.ipynb shows our approach to the problem. The main idea is to process video-level and frame-level features separately, and to apply context gating (a non-linear learnable unit that models interdependencies between activations) [1] for video classification.
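As a rough illustration of context gating as defined in [1], y = sigmoid(Wx + b) * x with an element-wise product, here is a minimal NumPy sketch. The feature size, the random weights, and the function names are assumptions made for this example; they are not taken from classifier.ipynb.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def context_gating(x, W, b):
    """Context gating from [1]: y = sigmoid(x @ W + b) * x (element-wise)."""
    gates = sigmoid(x @ W + b)  # one gate in (0, 1) per activation
    return gates * x            # re-weight each activation by its gate

rng = np.random.default_rng(0)
dim = 8                                     # hypothetical feature size
x = rng.normal(size=(4, dim))               # batch of 4 activation vectors
W = rng.normal(scale=0.1, size=(dim, dim))  # learnable in a real model
b = np.zeros(dim)

y = context_gating(x, W, b)
```

Because every gate is computed from the whole input vector, one activation can suppress or pass another, which is the interdependency the unit is meant to capture. The output keeps the shape of the input, so the unit can be placed after any layer.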

Video level features

mean rgb and mean audio are the video-level features. We pass them through Dense layers.
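A minimal sketch of a video-level branch is shown here in NumPy. It assumes the two features are concatenated before a single ReLU dense layer, that they have the 1024-d and 128-d sizes of the public YouTube-8M features, and a hypothetical hidden size of 256; the notebook may wire this differently.

```python
import numpy as np

def dense_relu(x, W, b):
    """One fully connected layer followed by ReLU."""
    return np.maximum(0.0, x @ W + b)

rng = np.random.default_rng(0)
batch = 2
mean_rgb = rng.normal(size=(batch, 1024))   # stand-in for the mean rgb feature
mean_audio = rng.normal(size=(batch, 128))  # stand-in for the mean audio feature

# Assumption: both features are concatenated into one input vector.
video_in = np.concatenate([mean_rgb, mean_audio], axis=1)  # (batch, 1152)

hidden = 256  # hypothetical layer width
W = rng.normal(scale=0.01, size=(video_in.shape[1], hidden))
b = np.zeros(hidden)

video_out = dense_relu(video_in, W, b)  # (batch, 256)
```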

Frame level features

mean frame rgb and mean frame audio are the frame-level features. We pass them through Bi-LSTM layers.
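To make the Bi-LSTM step concrete, the sketch below runs a plain LSTM over the frame sequence once forwards and once backwards and concatenates the two final hidden states. It is written in NumPy with random weights; the per-frame input size (1024 + 128), the hidden size, and the choice to keep only the final state are assumptions for illustration, not details confirmed by the notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_last_state(frames, W, U, b):
    """Run one LSTM over frames of shape (T, d_in); return the final hidden state."""
    hidden = U.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in frames:
        z = x_t @ W + h @ U + b        # pre-activations for all four gates
        i, f, o, g = np.split(z, 4)    # input, forget, output, candidate
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g              # update the cell state
        h = o * np.tanh(c)             # update the hidden state
    return h

def bilstm(frames, fwd, bwd):
    """Concatenate the final states of a forward and a backward pass."""
    h_fwd = lstm_last_state(frames, *fwd)
    h_bwd = lstm_last_state(frames[::-1], *bwd)  # same frames, reversed in time
    return np.concatenate([h_fwd, h_bwd])

def init_params(rng, d_in, hidden):
    W = rng.normal(scale=0.1, size=(d_in, 4 * hidden))
    U = rng.normal(scale=0.1, size=(hidden, 4 * hidden))
    b = np.zeros(4 * hidden)
    return W, U, b

rng = np.random.default_rng(0)
T, d_in, hidden = 10, 1152, 16           # hypothetical sizes
frames = rng.normal(size=(T, d_in))      # stand-in for per-frame rgb + audio
fwd = init_params(rng, d_in, hidden)
bwd = init_params(rng, d_in, hidden)

frame_out = bilstm(frames, fwd, bwd)     # shape (2 * hidden,)
```

The backward pass lets the summary of a video depend on later frames as well as earlier ones, which a single forward LSTM cannot do.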

Merge

In the end we merge the outputs of the video-level and frame-level branches, pass them through a dense layer, and use a sigmoid layer to predict the tags for the video.
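The merge step can be sketched as follows in NumPy, assuming the two branch outputs are concatenated. The branch output sizes, the hidden width, the 0.5 threshold, and the top-5 cut are hypothetical values chosen for the example; only the 3807 classes and the sigmoid output come from the description above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
num_classes = 3807                        # number of classes in our subset
batch = 2

video_out = rng.normal(size=(batch, 256))  # stand-in for the video-level branch
frame_out = rng.normal(size=(batch, 32))   # stand-in for the frame-level branch

# Assumption: the branches are merged by concatenation.
merged = np.concatenate([video_out, frame_out], axis=1)   # (batch, 288)

W_h = rng.normal(scale=0.05, size=(merged.shape[1], 128))
b_h = np.zeros(128)
hidden = np.maximum(0.0, merged @ W_h + b_h)               # dense layer with ReLU

W_o = rng.normal(scale=0.05, size=(128, num_classes))
b_o = np.zeros(num_classes)
probs = sigmoid(hidden @ W_o + b_o)                        # one probability per tag

predicted = probs >= 0.5                      # hypothetical decision threshold
top5 = np.argsort(-probs, axis=1)[:, :5]      # indices of the 5 most likely tags
```

A sigmoid output scores each tag independently, so the probabilities in a row do not need to sum to 1 and a video can receive several tags at once.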

[1] Miech, Antoine, Ivan Laptev, and Josef Sivic. "Learnable pooling with Context Gating for video classification." arXiv preprint arXiv:1706.06905 (2017).


Shivam-Miglani/youtube_8m_analysis
