This project explores the Netflix dataset using SQL to uncover key insights about the platform's content. The analysis answers several business-relevant questions, helping to identify trends in content distribution, release patterns, and characteristics of movies and TV shows.
📁 Dataset Overview
The dataset includes information about Netflix's movies and TV shows, such as:
- type: Movie or TV Show
- title: Title of the content
- director: Director's name
- cast: Main actors and actresses
- country: Country of production
- date_added: When the content was added to Netflix
- release_year: The year the content was released
- rating: Content rating (e.g., PG-13, TV-MA)
- duration: Duration of movies or number of seasons for TV shows
- listed_in: Genres or categories
- description: A short description of the content
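As a rough illustration of how the fields above might map to a table, here is a minimal sketch using Python's built-in sqlite3 module. The table name `netflix`, the column types, and the choice of SQLite are assumptions made for this example; the project's actual schema and database engine may differ.

```python
import sqlite3

# Hypothetical table name `netflix`; column names follow the field list above.
# Column types are assumptions: the dataset description does not specify them.
SCHEMA = """
CREATE TABLE netflix (
    type         TEXT,     -- 'Movie' or 'TV Show'
    title        TEXT,
    director     TEXT,
    "cast"       TEXT,     -- quoted because CAST is an SQL keyword
    country      TEXT,
    date_added   TEXT,
    release_year INTEGER,
    rating       TEXT,
    duration     TEXT,     -- minutes for movies, seasons for TV shows
    listed_in    TEXT,
    description  TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Read the column names back to confirm the table matches the field list.
columns = [row[1] for row in conn.execute("PRAGMA table_info(netflix)")]
print(columns)
```

Keeping duration as text reflects that it holds two kinds of values (a movie length or a season count), so queries on it need to parse out the number first.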
🛠️ Analysis Questions
The SQL queries in this project explore the dataset by answering the following business questions:
- Different types of content Netflix carries
- Percentage of each content type on Netflix
- Find the most common rating for the movies and TV shows
- List all the movies released during Covid
- Which year has the highest number of releases
- Find the top 10 countries with the most content on Netflix
- Identify the longest movie
- Identify the longest TV show
- Average duration of movies and TV shows
- Find the content that was added in the last 5 years
- Find all the movies/TV shows by director Rajiv Chilaka
- List all the TV shows with more than 5 seasons
- Most common genres
- Count the number of content items in each genre
- Find the movies, with their titles, that are listed in multiple genres
- Find the TV shows, with their titles, that are listed in multiple genres
- For each year, find the average number of content items released by India on Netflix
- List all the movies that are documentaries
- Find all the content without a director
- Find how many movies actor Amitabh Bachchan appeared in over the last 10 years
- Find the top actors who have appeared in the highest number of movies produced in India
- Find the top directors and their most frequent actors/actresses
- Which directors have the most content on Netflix
- Categorize the content based on keywords such as 'kill', 'violence', and 'sex': label matching content as '18+' or 'bad', and the rest as 'good'
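To give a feel for how a few of the questions above translate into SQL, here is a minimal sketch that runs four of them through Python's built-in sqlite3 module. These are not the project's own queries. The table name `netflix`, the SQLite dialect, the placeholder rows, and the keyword-to-label mapping ('sex' gives '18+', 'kill' or 'violence' gives 'bad') are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table `netflix` with only the columns these queries need.
conn.execute(
    "CREATE TABLE netflix (type TEXT, title TEXT, director TEXT, "
    "duration TEXT, description TEXT)"
)

# Made-up placeholder rows for illustration only; not from the real dataset.
rows = [
    ("Movie", "Sample Movie A", "Director X", "90 min", "A quiet family drama."),
    ("Movie", "Sample Movie B", None, "120 min", "A detective hunts a man hired to kill."),
    ("Movie", "Sample Movie C", "Director Y", "100 min", "A story of violence and revenge."),
    ("TV Show", "Sample Show D", "Director X", "7 Seasons", "Friends run a bakery."),
    ("TV Show", "Sample Show E", None, "2 Seasons", "A candid look at sex and dating."),
]
conn.executemany("INSERT INTO netflix VALUES (?, ?, ?, ?, ?)", rows)

# Percentage of each content type.
type_share = conn.execute("""
    SELECT type,
           ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM netflix), 1) AS pct
    FROM netflix
    GROUP BY type
    ORDER BY type
""").fetchall()

# TV shows with more than 5 seasons: parse the number before the first space.
long_shows = conn.execute("""
    SELECT title
    FROM netflix
    WHERE type = 'TV Show'
      AND CAST(substr(duration, 1, instr(duration, ' ') - 1) AS INTEGER) > 5
""").fetchall()

# Content without a director (NULL or blank).
no_director = conn.execute("""
    SELECT title
    FROM netflix
    WHERE director IS NULL OR trim(director) = ''
    ORDER BY title
""").fetchall()

# Keyword labels. Assumed mapping: 'sex' -> '18+', 'kill'/'violence' -> 'bad'.
labels = conn.execute("""
    SELECT title,
           CASE
               WHEN lower(description) LIKE '%sex%' THEN '18+'
               WHEN lower(description) LIKE '%kill%'
                 OR lower(description) LIKE '%violence%' THEN 'bad'
               ELSE 'good'
           END AS label
    FROM netflix
    ORDER BY title
""").fetchall()

for name, result in [("type_share", type_share), ("long_shows", long_shows),
                     ("no_director", no_director), ("labels", labels)]:
    print(name, result)
```

Two caveats: `substr` and `instr` are SQLite functions, so other engines would need their own string-splitting equivalents, and the `LIKE '%kill%'` pattern matches substrings, so a word such as "skill" would also be labeled 'bad'.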
🚀 Conclusion
This project demonstrates how SQL can be used to analyze a real-world dataset and derive actionable business insights. From content distribution to trends in releases and average durations, this analysis helps to inform strategic decisions for platforms like Netflix.
Feel free to explore the SQL queries and adapt them for your own analysis!
🛠️ Tools Used
SQL: Structured Query Language for querying the dataset, data exploration and analysis.
📈 Future Work
- Deeper analysis of content ratings and viewer preferences.
- Time-series analysis to explore the growth of content over time.
- Sentiment analysis of descriptions to identify content themes.