This project involves a comprehensive analysis of Netflix's movies and TV shows data using SQL. The goal is to extract valuable insights and answer various business questions based on the dataset. This README provides a detailed account of the project's objectives, dataset, business problems, and findings.
- Analyze the distribution of content types (Movies vs. TV Shows).
- Identify the most common ratings for Movies and TV Shows.
- Explore content trends by release year, country, and duration.
- Categorize and analyze content based on specific criteria and keywords.
The data used for this project is sourced from Kaggle.
Dataset Link: Movies Dataset
CREATE TABLE netflix (
show_id VARCHAR(5),
type VARCHAR(10),
title VARCHAR(250),
director VARCHAR(550),
casts VARCHAR(1050),
country VARCHAR(550),
date_added VARCHAR(55),
release_year INT,
rating VARCHAR(15),
duration VARCHAR(15),
listed_in VARCHAR(250),
description VARCHAR(550)
);
SELECT type, COUNT(*) AS total
FROM netflix
GROUP BY type;
Objective: Determine the share of Movies and TV Shows on Netflix.
WITH RatingCounts AS (
SELECT type, rating, COUNT(*) AS rating_count
FROM netflix
GROUP BY type, rating
),
RankedRatings AS (
SELECT type, rating, rating_count,
RANK() OVER (PARTITION BY type ORDER BY rating_count DESC) AS rank
FROM RatingCounts
)
SELECT type, rating AS most_frequent_rating
FROM RankedRatings
WHERE rank = 1;
Objective: Identify the most frequent ratings for each content type.
SELECT country, COUNT(*) AS total_content
FROM (
SELECT UNNEST(STRING_TO_ARRAY(country, ',')) AS country
FROM netflix
) AS countries
GROUP BY country
ORDER BY total_content DESC
LIMIT 5;
Objective: Identify the countries producing the most content for Netflix.
SELECT title, duration
FROM netflix
WHERE type = 'Movie'
ORDER BY SPLIT_PART(duration, ' ', 1)::INT DESC
LIMIT 1;
Objective: Find the movie with the longest duration.
SELECT
*
FROM netflix
WHERE type = 'Movie'
ORDER BY SPLIT_PART(duration, ' ', 1)::INT DESC;
Objective: Find the movie with the longest duration.
SELECT *
FROM netflix
WHERE TO_DATE(date_added, 'Month DD, YYYY') >= CURRENT_DATE - INTERVAL '5 years';
Objective: Retrieve content added to Netflix in the last 5 years.
SELECT *
FROM (
SELECT
*,
UNNEST(STRING_TO_ARRAY(director, ',')) AS director_name
FROM netflix
) AS t
WHERE director_name = 'Rajiv Chilaka';
Objective: List all content directed by 'Rajiv Chilaka'.
SELECT *
FROM netflix
WHERE type = 'TV Show'
AND SPLIT_PART(duration, ' ', 1)::INT > 5;
Objective: Identify TV shows with more than 5 seasons.
SELECT
UNNEST(STRING_TO_ARRAY(listed_in, ',')) AS genre,
COUNT(*) AS total_content
FROM netflix
GROUP BY 1;
Objective: Count the number of content items in each genre.
return top 5 year with highest avg content release!
SELECT
country,
release_year,
COUNT(show_id) AS total_release,
ROUND(
COUNT(show_id)::numeric /
(SELECT COUNT(show_id) FROM netflix WHERE country = 'India')::numeric * 100, 2
) AS avg_release
FROM netflix
WHERE country = 'India'
GROUP BY country, release_year
ORDER BY avg_release DESC
LIMIT 5;
Objective: Calculate and rank years by the average number of content releases by India.
SELECT *
FROM netflix
WHERE listed_in LIKE '%Documentaries';
Objective: Retrieve all movies classified as documentaries.
SELECT *
FROM netflix
WHERE director IS NULL;
Objective: List content that does not have a director.
SELECT *
FROM netflix
WHERE casts LIKE '%Salman Khan%'
AND release_year > EXTRACT(YEAR FROM CURRENT_DATE) - 10;
Objective: Count the number of movies featuring 'Salman Khan' in the last 10 years.
SELECT
UNNEST(STRING_TO_ARRAY(casts, ',')) AS actor,
COUNT(*)
FROM netflix
WHERE country = 'India'
GROUP BY actor
ORDER BY COUNT(*) DESC
LIMIT 10;
Objective: Identify the top 10 actors with the most appearances in Indian-produced movies.
SELECT
category,
COUNT(*) AS content_count
FROM (
SELECT
CASE
WHEN description ILIKE '%kill%' OR description ILIKE '%violence%' THEN 'Bad'
ELSE 'Good'
END AS category
FROM netflix
) AS categorized_content
GROUP BY category;
Objective: Categorize content as 'Bad' if it contains 'kill' or 'violence' and 'Good' otherwise. Count the number of items in each category.
- Content Distribution: Netflix has a balanced mix of Movies and TV Shows catering to diverse audiences.
- Common Ratings: PG-13 and TV-MA are the most frequent ratings, indicating a focus on mature content.
- Top Countries: USA, India, and Canada lead in content production.
- Longest Movie: Identifying the longest movie helps highlight Netflix's unique offerings.
This project demonstrates the power of SQL in extracting meaningful insights from complex datasets. The analysis provides valuable information about Netflix's content distribution, trends, and audience focus, aiding strategic decisions in content planning.