Using a dataset obtained from Kaggle, this project attempts to identify different genres of videos by analyzing the tags, descriptions, and titles of trending U.S. YouTube videos.
The data is trimmed, and common or otherwise unnecessary words are weeded out before vectorization.
The words removed include common English words such as "to" and "a", as well as words such as "youtube.com", "instagram", and "patreon" that provide no meaningful information about a video's genre.
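The filtering and vectorization steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the notebook's actual code. The stop list contains only the five words this README names, and the function names (`tokenize`, `clean`, `build_vocabulary`, `vectorize`) and the `max_size` parameter are hypothetical.

```python
import re
from collections import Counter

# Assumed stop list: only the words named in this README.
# The project's full list is not shown here.
STOP_WORDS = {"to", "a", "youtube.com", "instagram", "patreon"}


def tokenize(text):
    """Lowercase the text and split it into tokens, keeping dotted names such as youtube.com whole."""
    return re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower())


def clean(text):
    """Drop every token that appears in the stop list."""
    return [tok for tok in tokenize(text) if tok not in STOP_WORDS]


def build_vocabulary(documents, max_size=1000):
    """Map the most frequent remaining tokens to column indices."""
    counts = Counter(tok for doc in documents for tok in clean(doc))
    return {tok: i for i, (tok, _) in enumerate(counts.most_common(max_size))}


def vectorize(text, vocabulary):
    """Return a bag-of-words count vector for one document."""
    vector = [0] * len(vocabulary)
    for tok in clean(text):
        if tok in vocabulary:
            vector[vocabulary[tok]] += 1
    return vector
```

In this sketch, a title, description, and tag string would each be passed through `vectorize` (or concatenated first) to produce the numeric input for a model.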
This project used binary classification and compared two models: a fully connected neural network and a convolutional neural network.
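To make the idea of a fully connected binary classifier concrete, here is a framework-free sketch of the forward pass only. It is an assumption-laden illustration: this README does not state the layer sizes, activations, framework, or training procedure the project used, so the single hidden layer, the ReLU and sigmoid activations, and the helper names (`dense`, `init_layer`, `predict`) are all hypothetical.

```python
import math
import random


def sigmoid(x):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Rectified linear unit: pass positives through, clamp negatives to zero."""
    return max(0.0, x)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: every output unit sees every input."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Create small random weights and zero biases (assumed initialization)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def predict(vector, hidden, output):
    """Return the probability that the input belongs to the positive class."""
    activations = dense(vector, *hidden, relu)
    return dense(activations, *output, sigmoid)[0]
```

A word-count vector goes in, and a single probability comes out; thresholding it (for example at 0.5) gives the yes/no answer for one genre. Training the weights is deliberately left out of this sketch.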
This project was written as a Python notebook. For the best results, it can also be viewed in Jupyter's nbviewer.
Link to the project's nbviewer
While this project used only the U.S. dataset of trending videos, the full set gathered by the user Mitchell J. can be found online and downloaded here:
Trending Youtube Video Statistics
The Documentation and presentations folder contains the final report, which covers the project's hypothesis and conclusion. I have also included the PowerPoint presentation used.