- This project aims to identify which genre a song belongs to on the basis of its lyrics
- The dataset contains 62155 songs
- Each song is classified into one of 5 genres: Country, Rock, Hip-Hop, Pop, and Rhythm and Blues (RnB)
- Two major text classification methods were tested: Logistic Regression and Naïve Bayes
- Under Naïve Bayes, 3 configurations were tested: Count Vectorizer, TF-IDF, and Text Cleaning
- Upon testing, Multinomial Naïve Bayes with Text Cleaning was found to give the highest accuracy
- Further analysis was also done to find top songs and generate WordClouds
- To let music enthusiasts have some fun by giving lyrics of their choice as input to the classifier and getting the closest matching genre as output
- To find out if there is any direct correlation between lyrics and genres of songs
- To organize songs on the basis of their genre, for example top hip-hop songs, the most hip-hop-like rock songs, hip-hop-like country songs, hip-hop-like RnB songs, and hip-hop-like pop songs; similarly for rock, country and RnB
- To create WordClouds, a technique for showing which words are the most frequent in the given dataset, and to find the most common lyrics used in a particular genre
- Works on bulk data
- Consistent and meaningful results
- Improved accuracy of the models
- Top songs in each genre apart from genre classification
- Confusion matrix: gives a basic idea of how predicted genres are distributed against the actual genres in the given dataset
- Basic building block for more sophisticated music genre prediction systems
- Applications include music retrieval and recommendation
The code can be found here
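
To make the Multinomial Naïve Bayes approach and the "hip-hop-like rock songs" ranking described above more concrete, here is a minimal, self-contained sketch in plain Python. It is not the project's code: the toy corpus, the function names (`clean`, `train`, `genre_probabilities`, `predict`, `most_like`), and the smoothing value `alpha=1.0` are all assumptions made for illustration, and it uses raw word counts rather than a library vectorizer.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus; the project's real 62155-song dataset is not reproduced here.
TRAIN = [
    ("Country", "dusty road pickup truck whiskey and my old guitar"),
    ("Country", "small town heart broken by the river road"),
    ("Rock", "loud guitar scream tonight we burn the stage"),
    ("Rock", "electric guitar thunder rolling through the night"),
    ("Hip-Hop", "money on the block rhymes on the beat"),
    ("Hip-Hop", "spit rhymes count money run the block"),
]


def clean(text):
    """Basic text cleaning (assumed): lowercase, keep alphabetic tokens only."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples, alpha=1.0):
    """Fit multinomial Naive Bayes on word counts with additive smoothing."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for genre, lyrics in examples:
        tokens = clean(lyrics)
        word_counts[genre].update(tokens)
        doc_counts[genre] += 1
        vocab.update(tokens)

    total_docs = sum(doc_counts.values())
    model = {}
    for genre in doc_counts:
        denom = sum(word_counts[genre].values()) + alpha * len(vocab)
        model[genre] = {
            "log_prior": math.log(doc_counts[genre] / total_docs),
            "log_lik": {
                w: math.log((word_counts[genre][w] + alpha) / denom)
                for w in vocab
            },
        }
    return model


def genre_probabilities(model, lyrics):
    """Return P(genre | lyrics); words unseen in training are ignored."""
    tokens = clean(lyrics)
    scores = {}
    for genre, params in model.items():
        score = params["log_prior"]
        for tok in tokens:
            if tok in params["log_lik"]:
                score += params["log_lik"][tok]
        scores[genre] = score
    # Normalise log scores into probabilities in a numerically stable way.
    top = max(scores.values())
    exp_scores = {g: math.exp(s - top) for g, s in scores.items()}
    total = sum(exp_scores.values())
    return {g: v / total for g, v in exp_scores.items()}


def predict(model, lyrics):
    """Return the genre with the highest posterior probability."""
    probs = genre_probabilities(model, lyrics)
    return max(probs, key=probs.get)


def most_like(model, songs, target_genre):
    """Rank songs from most to least similar to target_genre."""
    return sorted(
        songs,
        key=lambda s: genre_probabilities(model, s)[target_genre],
        reverse=True,
    )


model = train(TRAIN)
prediction = predict(model, "Money and rhymes on the block!")

# Hypothetical rock songs, ranked by how hip-hop-like their lyrics are.
rock_songs = [
    "electric guitar thunder tonight",
    "loud guitar on the block with money",
]
ranked = most_like(model, rock_songs, "Hip-Hop")
print(prediction, ranked[0])
```

Ranking by posterior probability is one plausible way to obtain lists such as "most hip-hop-like rock songs"; the project may compute them differently. On a real dataset, the counting step would typically be replaced by a Count Vectorizer or TF-IDF representation.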