Skip to content

This project leverages SQL to perform a comprehensive analysis of Netflix's movies and TV shows dataset, with the goal of uncovering actionable insights and addressing critical business questions.

Notifications You must be signed in to change notification settings

SriSurya-DA/netflix_analytics_sql

Repository files navigation

Exploring Netflix's Content Catalog with Data-Driven Insights

Netflix

Overview

This project utilizes SQL to conduct an in-depth analysis of Netflix's movies and TV shows dataset, aiming to extract valuable insights and answer key business questions. This README provides a detailed account of the project's objectives, business problems, solutions, findings, and conclusions.

Objective

  • Content Type Distribution: Analyze the distribution of content types (movies vs TV shows) to understand the composition of Netflix's catalog.
  • Rating Analysis: Identify the most common ratings for movies and TV shows to inform content acquisition and recommendation strategies.
  • Content Attributes: List and analyze content based on release years, countries, and durations to identify trends and patterns.
  • Content Categorization: Explore and categorize content based on specific criteria and keywords to enable targeted content discovery and recommendation.

Dataset

The Data for this Project sourced from the Kaggle Dataset:

Schema

   Drop table if exists netflix
   create table netflix
(
                show_id varchar(10),
  		type varchar(10),
		title varchar(150),
		director varchar(220),
		casts varchar(1000),
		country varchar(150),
		date_added varchar(150),
		release_year int,
		rating varchar(10),
		duration varchar(20),
		listed_in varchar(100),
		description varchar(270)

);

Business Problems and Solutions

1. Count the number of Movies and number of TV shows

select type, count(*) as shows_count
from netflix
group by type;

Objective: Determine the distribution of content types on Netflix.

2. Find the most common ratings for movies and TV Shows

select t1.type, t1.rating
	from (
select type,rating,count(*),
rank() over (partition by type order by count(*) desc) as ranking
from netflix
group by 1,2) as t1

where t1.ranking =1;

Objective: Identify the most frequently occurring rating for each type of content.

3. List all the movies released in the specific year like 2020

select n.*
from netflix n
where release_year = 2020 and type = 'Movie';

Objective: Retrieve all movies released in a specific year.

4. Find the top 5 countries with the most content on Netflix

  • Some rows having multiple entires like (india, canada,us) in same row
  • For this we using an uncut and unnest function --> (unnest(string_to_array(country, ',')))
select unnest(string_to_array(country, ',')) as country_name, count(show_id) as total_count
from netflix
group by 1
order by 2 desc
limit 5;

Objective: Identify the top 5 countries with the highest number of content items.

5. Identify the longest movie

select movie_list.type, movie_list.duration, movie_list.title
from (
select type, duration, title,
rank () over(order by duration desc) total_duration
from netflix
where type = 'Movie' and duration <> '') as movie_list

where movie_list.total_duration = 1

Objective: Find the movie with the longest duration.

6. Find the Content released in the last 5-years

-- Here need to convert the date from varchar to string

select *
from netflix
where to_date(date_added, 'month DD,YYY') >= current_date - interval '5 years'

Objective: Retrieve content added to Netflix in the last 5 years.

7. Find all the Movie and Series directed by Rajiv Chilaka

select type, title
from netflix
where director = 'Rajiv Chilaka'

-- If there are more then 1-direcor then we need to slipt the rows which have more than 1 name


select new_one.director_name, new_one.type, new_one.title
from
(select unnest(string_to_array(director, ',')) as director_name, type, title
from netflix) as new_one

where new_one.director_name = 'Rajiv Chilaka';

-- or simple

select *
from netflix
where director Ilike '%Rajiv Chilaka%' --Here Ilike will check the lower case also

Objective: List all content directed by 'Rajiv Chilaka'.

8. List the TV shows which have seasons >5

-- In text we not use the '>= are <, <=, >' it may give false result
-- Hence using an split function

select kt.*
from
(select *, split_part(duration, ' ',1):: numeric as seasons
from netflix
where type = 'TV Show') as kt

where kt.seasons > 5;

--or

select *
from netflix
where type = 'TV Show' and split_part(duration, ' ',1):: numeric > 5;

Objective: Identify TV shows with more than 5 seasons.

9. Count the number of content items in each genre

select gn.genre, count(gn.title)
from(select *, unnest(string_to_array(listed_in, ',')) as genre
from netflix) as gn
group by 1
order by 2 desc;

Objective: Count the number of content items in each genre.

10. Find each year and the average numbers of content release only in India on netflix.Return top 5 year with highest avg content release !

SELECT Extract(Year from to_date(date_added, 'Month DD, YYYY')) as years, count(*) as yearly_content,
Round(count(*)::numeric / (select count(*) from netflix where country = 'India')*100,2) as Avg_content
FROM netflix  
WHERE country = 'India'  
group by 1
order by avg_content desc
limit 5;

Objective: Calculate and rank years by the average number of content releases by India.

11. List all the movies that are Documentry

select *
from netflix
where type = 'Movie' and listed_in Ilike '%Documentaries%';

Objective: Retrieve all movies classified as documentaries.

12. Find all the content without director

select *
from netflix
where director is null;

Objective: List content that does not have a director.

13. Find How Many movie actor "Salman Khan" appeared in last "10-years"

select *
from netflix
where casts Ilike '%Salman Khan%' and release_year > extract(year from current_date) - 10;

-- For static datasets

select *
from netflix
where casts Ilike '%Salman Khan%' and release_year between 2014 and 2024

Objective: Count the number of movies featuring 'Salman Khan' in the last 10 years.

14. Find top 10 actors who appeared in the highest number of movies produced in India

select count(*), jk.actors
from(select *, unnest(string_to_array(casts, ',')) as actors
from netflix
where country Ilike '%India%' and type = 'Movie') as jk
group by 2
order by 1 desc
limit 10

Objective: Identify the top 10 actors with the most appearances in Indian-produced movies.

15. Categorize Content Based on the Presence of 'Kill' and 'Violence' Keywords

select cont.category, count(cont.description) as Total_count
from
(select *,
	case
	when
	  description Ilike '%kill%' or description Ilike '%violence%' then 'Bad Content'
	  else 'Good Content'
	end category
from netflix) as cont
group by 1

Objective: Categorize content as 'Bad' if it contains 'kill' or 'violence' and 'Good' otherwise. Count the number of items in each category.

Findings and Conclusion

This analysis of the Netflix dataset reveals several key insights:

  • Content Distribution: The dataset showcases a diverse range of movies and TV shows, with varying ratings and genres, catering to a broad audience.
  • Common Ratings: The most common ratings provide valuable insights into the target audience for each content type, informing content acquisition and recommendation strategies.
  • Geographical Insights: The top countries and average content releases by India highlight regional content distribution patterns, enabling targeted content localization.
  • Content Categorization: Categorizing content based on specific keywords provides a deeper understanding of the nature of content available on Netflix, facilitating content discovery and recommendation.

Contact

For inquiries and collaborations, feel free to reach out to srisuryaofficial@gmail.com

About

This project leverages SQL to perform a comprehensive analysis of Netflix's movies and TV shows dataset, with the goal of uncovering actionable insights and addressing critical business questions.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published