Spotify Music Data Analysis Project🎧

Introduction

The Spotify Data Analysis Python Project analyzes music data with Python to surface trends, patterns, and correlations within a music dataset. All data was collected directly from the Spotify API. Data analysis plays a central role in many domains, including music streaming services like Spotify, and this project is a data science exploration focused on extracting meaningful insights from Spotify data.
Feel free to reach out! LinkedIn | Cristina Genduso

Tools Used🛠️:

Programming Language: Python
Libraries: Pandas, NumPy, Matplotlib, Seaborn
IDE: Jupyter Notebook
Dataset: Personal Spotify Dataset

Import Required Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import numpy as np: This imports the NumPy library and aliases it as 'np'. NumPy is used for numerical computations and provides support for arrays and matrices.
import pandas as pd: This imports the Pandas library and aliases it as 'pd'. Pandas is used for data manipulation and analysis, providing data structures like DataFrames for tabular data.
import matplotlib.pyplot as plt: This imports the Pyplot module from the Matplotlib library and aliases it as 'plt'. Matplotlib is a popular plotting library in Python, and Pyplot provides a convenient interface to create visualizations.
import seaborn as sns: This imports the Seaborn library and aliases it as 'sns'. Seaborn is built on top of Matplotlib and offers a higher-level interface for creating attractive statistical visualizations.

Exploring the Dataset

Data Collection

The dataset used in this project was collected directly from the Spotify API and consists of my personal liked songs. It covers a diverse range of tracks that reflect my musical preferences, so every song in it is one I chose to save.

tracks = pd.read_csv('./inputs/saved_tracks_with_audio_features.csv')

Overview

The dataset describes my music library through audio features, artist information, genre classifications, and temporal attributes of each track. The exploration below looks for patterns, correlations, and trends among my favorite songs on Spotify.

tracks.head()

NOTE: The screenshot shows only part of the table, because the full table does not fit in a single screenshot. To see the complete table, please refer to the Jupyter notebook folder within this repository.

Output:

This line of code calls the head() method on the 'tracks' DataFrame. This method is used to display the first few (5 by default) rows of the DataFrame. This is useful for quickly getting an overview of the data.

Identifying Null Values in the Dataset

#checking null in tracks data
pd.isnull(tracks).sum()

Output:

This line of code uses the pd.isnull() function on the 'tracks' DataFrame to create a new boolean DataFrame where each cell contains:

True if the corresponding cell in the original DataFrame ('tracks') is null;
False otherwise.

The .sum() function is then used to count the number of True values in each column, effectively giving the count of missing values in each column.
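As a small illustration of the mask-then-sum idea described above, here is a minimal sketch on a made-up three-row DataFrame. The 'Track Name' column and all values are hypothetical stand-ins, not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for the 'tracks' DataFrame.
sample = pd.DataFrame({
    "Track Name": ["Song A", "Song B", None],
    "Popularity": [72, np.nan, 55],
    "Acousticness": [0.12, 0.80, 0.45],
})

# Boolean DataFrame: True where the original cell is missing.
mask = pd.isnull(sample)

# True counts as 1 when summed, so this gives missing values per column.
null_counts = mask.sum()
print(null_counts)
```

Here 'Track Name' and 'Popularity' each report one missing value, while 'Acousticness' reports zero.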

Dataset Info

#checking info in tracks data
tracks.info()

Output:

This line of code calls the info() method on the 'tracks' DataFrame. The info() method provides a concise summary of the DataFrame, including the data types of each column, the number of non-null values, and memory usage.

Extracting Insights from the Dataset through Analysis

Discovering the Top 10 Popular Songs in the Spotify Dataset

most = tracks.sort_values(by='Popularity', ascending=False).head(10)

Output:

This line of code sorts the 'tracks' DataFrame by the 'Popularity' column in descending order (ascending=False). head(10) then selects the first 10 rows of the sorted DataFrame, which are the 10 most popular tracks, and the result is stored in the variable most.
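The sort-then-head pattern can be tried on a tiny example. The sketch below uses made-up data (the 'Track Name' column and values are hypothetical) and also shows nlargest(), a pandas alternative that returns the same rows here.

```python
import pandas as pd

# Hypothetical sample data; only the 'Popularity' column name comes from the project.
sample = pd.DataFrame({
    "Track Name": ["Song A", "Song B", "Song C", "Song D"],
    "Popularity": [40, 95, 10, 70],
})

# Sort from most to least popular, then keep the first two rows.
top2 = sample.sort_values(by="Popularity", ascending=False).head(2)

# Alternative: ask pandas directly for the two largest values.
top2_alt = sample.nlargest(2, "Popularity")

print(top2)
```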

Descriptive Statistics

# Display summary statistics of the numerical columns in the dataset
tracks.describe().transpose()

Output:

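This line of code generates a concise summary of the numerical features in the tracks DataFrame (count, mean, standard deviation, minimum, quartiles, and maximum), providing insights into the central tendency, dispersion, and distribution of the data across different attributes. transpose() flips the result so that each attribute appears as a row.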

Average popularity of the tracks

average_popularity = tracks['Popularity'].mean()

Output:

This line of code calculates the average popularity of all the tracks in the tracks DataFrame and stores this value in the variable average_popularity.

Visualization: Pearson Correlation Heatmap of the Numeric Variables

numeric_columns = tracks.select_dtypes(include=['float64', 'int64']).columns
td = tracks[numeric_columns].corr(method='pearson')
hmap = sns.heatmap(td, annot=True, fmt='.1g', vmin=-1, vmax=1, center=0,
                   cmap='crest', linewidths=0.1, linecolor='black')

Output:

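numeric_columns = tracks.select_dtypes(include=['float64', 'int64']).columns: This line of code selects the columns from the tracks DataFrame that have numeric data types ('float64', 'int64').
td = tracks[numeric_columns].corr(method='pearson'): This line of code computes the Pearson correlation coefficient for every pair of numeric columns and stores the resulting correlation matrix in td.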
hmap = sns.heatmap(td, annot=True, fmt='.1g', vmin=-1, vmax=1, center=0, cmap='crest', linewidths=0.1, linecolor='black'): This line of code uses Seaborn's heatmap() function to create a heatmap visualization of the correlation matrix. It displays the correlation values as annotations, uses a color map ('crest') to represent the correlation strength, and sets the range of correlation values to be between -1 and 1.
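To see what the correlation matrix behind the heatmap looks like, here is a minimal sketch on made-up data. The 'Track Name' and 'Energy' columns and all values are hypothetical; the sketch also uses include='number', which is an assumption-free pandas shorthand for selecting every numeric dtype rather than only 'float64' and 'int64'.

```python
import pandas as pd

# Hypothetical sample; values are chosen so the correlations are easy to read.
sample = pd.DataFrame({
    "Track Name": ["Song A", "Song B", "Song C", "Song D"],
    "Popularity": [10, 20, 30, 40],
    "Energy": [0.2, 0.4, 0.6, 0.8],
    "Acousticness": [0.9, 0.7, 0.5, 0.3],
})

# Keep only numeric columns; the text column is dropped.
numeric = sample.select_dtypes(include="number")

# Pairwise Pearson correlation: a square matrix with 1.0 on the diagonal.
corr = numeric.corr(method="pearson")
print(corr.round(2))
```

In this toy data 'Popularity' rises in lockstep with 'Energy' (correlation 1) and falls in lockstep with 'Acousticness' (correlation -1), which are the two extremes of the heatmap's color scale.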

Regression Plot of Popularity vs. Acousticness with Regression Line

sns.set_style('darkgrid')
plt.figure(figsize=(10, 6))
sns.regplot(data=tracks, x='Acousticness', y='Popularity',
            color='orange').set(title='Popularity vs Acousticness Correlation')
plt.show()

Output:

sns.regplot(data=tracks, x='Acousticness', y='Popularity', color='orange'): This line of code uses Seaborn's regplot() function to create a regression plot. It draws a scatter plot of the 'Acousticness' (x-axis) and 'Popularity' (y-axis) columns from the tracks DataFrame together with a fitted linear regression line.
.set(title='Popularity vs Acousticness Correlation'): This line of code sets the title for the regression plot.
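The plot shows the fitted line but not its numbers. As a rough sketch, a straight-line fit of the same kind can be computed with NumPy's polyfit(); the arrays below are made-up values, not the project's data, and this is an illustration rather than the exact computation Seaborn performs internally.

```python
import numpy as np

# Hypothetical values standing in for the two columns.
acousticness = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
popularity = np.array([80.0, 70.0, 60.0, 50.0, 40.0])

# Least-squares straight line: popularity ~ slope * acousticness + intercept.
slope, intercept = np.polyfit(acousticness, popularity, deg=1)

# Pearson correlation coefficient between the two variables.
r = np.corrcoef(acousticness, popularity)[0, 1]

print(f"slope={slope:.1f}, intercept={intercept:.1f}, r={r:.2f}")
```

A negative slope means popularity tends to drop as acousticness rises; the sign of r matches the sign of the slope.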

Top 5 Genres in the Spotify Dataset

popular = genre_counts.sort_values(ascending=False).head(5)
sns.barplot(y=popular.index, x=popular.values, palette='viridis',
            legend=False).set(title='Top 5 Genres by Frequency')

Output:

popular = genre_counts.sort_values(ascending=False).head(5): This line of code sorts the genre counts in descending order and keeps the 5 most frequent genres in the dataset.
sns.barplot(y=popular.index, x=popular.values, palette='viridis', legend=False): This line of code uses Seaborn's barplot() function to create a horizontal bar plot, with the genres on the y-axis and their counts on the x-axis.
.set(title='Top 5 Genres by Frequency'): This line of code sets the title for the bar plot.
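The snippet in this section uses genre_counts without showing how it is built. One possible way to build it is sketched below, assuming a hypothetical 'Genres' column that holds comma-separated genre names per track; the column name, its format, and the sample values are all assumptions and may differ from the real dataset.

```python
import pandas as pd

# Hypothetical 'Genres' column: one comma-separated string per track.
sample = pd.DataFrame({
    "Genres": ["indie pop, rock", "rock", "indie pop, jazz", "rock"],
})

# Split each string into a list, give every genre its own row,
# trim stray spaces, and count how often each genre occurs.
genre_counts = (
    sample["Genres"]
    .str.split(",")
    .explode()
    .str.strip()
    .value_counts()
)

# Same selection as in the project, with 2 instead of 5 for the tiny sample.
popular = genre_counts.sort_values(ascending=False).head(2)
print(popular)
```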

Monthly additions of favorite tracks

monthly_additions = tracks['Month'].value_counts().reindex(
    ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
     'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
sns.barplot(x=monthly_additions.index, y=monthly_additions.values,
            palette='flare')

Output:

monthly_additions = tracks['Month'].value_counts().reindex(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']): This line of code counts how many tracks were added in each month with value_counts(), then uses reindex() to arrange the counts in calendar order from Jan to Dec. A month with no tracks appears as NaN.
sns.barplot(x=monthly_additions.index, y=monthly_additions.values, palette='flare'): This line of code uses Seaborn's barplot() function to create a bar plot.
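The count-then-reindex step can be tried on a small example. The sketch below assumes the 'Month' column is derived from an 'Added At' timestamp; that derivation, the timestamp format, and the sample values are assumptions, since the project does not show how 'Month' is created.

```python
import pandas as pd

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical timestamps for three saved tracks.
sample = pd.DataFrame({
    "Added At": ["2023-01-15T10:00:00Z",
                 "2023-03-02T18:30:00Z",
                 "2023-03-20T09:12:00Z"],
})

# Assumed derivation: map the month number (1-12) to its abbreviation.
added = pd.to_datetime(sample["Added At"])
sample["Month"] = added.dt.month.map(lambda m: MONTHS[m - 1])

# value_counts() orders by frequency; reindex() restores calendar order.
monthly_additions = sample["Month"].value_counts().reindex(MONTHS)

# Optional variant: show empty months as 0 instead of NaN.
monthly_additions_filled = sample["Month"].value_counts().reindex(MONTHS, fill_value=0)
print(monthly_additions_filled)
```

Passing fill_value=0 is a design choice: it makes empty months show up as zero-height bars rather than missing values.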

Days with Peaks of Indie Tracks

daily_indie_additions = indie_tracks['Added At'].dt.date.value_counts().sort_index()
sns.lineplot(x=daily_indie_additions.index, y=daily_indie_additions.values,
             marker='o', color='purple')

Output:
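The code in this section relies on an indie_tracks DataFrame whose construction is not shown. A minimal sketch of one way to obtain it and count additions per day follows, assuming a hypothetical 'Genres' text column and an 'Added At' column already parsed as datetimes; both assumptions and all sample values are illustrative only.

```python
import pandas as pd

# Hypothetical sample with an assumed 'Genres' column.
sample = pd.DataFrame({
    "Genres": ["indie pop", "rock", "indie rock", "indie folk"],
    "Added At": pd.to_datetime([
        "2023-03-02 18:30", "2023-03-02 19:00",
        "2023-03-02 20:15", "2023-03-05 08:00",
    ]),
})

# Assumed filter: keep tracks whose genre text mentions "indie".
is_indie = sample["Genres"].str.contains("indie", case=False, na=False)
indie_tracks = sample[is_indie]

# Drop the time of day, count tracks per calendar date, sort chronologically.
daily_indie_additions = indie_tracks["Added At"].dt.date.value_counts().sort_index()
print(daily_indie_additions)
```

The .dt accessor only works on datetime columns, so 'Added At' must be converted with pd.to_datetime() first if it was loaded from CSV as text.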

Top 5 indie artists in the dataset

indie_artists = indie_tracks['Artists'].str.split(',').explode().value_counts().head(5)
sns.barplot(y=indie_artists.index, x=indie_artists.values,
            palette='cubehelix')
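The split-and-explode chain above turns a comma-separated 'Artists' string into one row per artist before counting. The sketch below shows the idea on made-up artist names, and it also illustrates a possible pitfall: if the separator in the data is a comma followed by a space (an assumption, since the raw file is not shown), the same artist can be counted under two different labels unless the names are stripped.

```python
import pandas as pd

# Hypothetical artist strings; real separators may differ.
artists = pd.Series(["Artist A, Artist B", "Artist B", "Artist C,Artist B"])

# Split on commas and give each artist its own row, as in the project code.
raw_counts = artists.str.split(",").explode().value_counts()

# Same chain with str.strip() so " Artist B" and "Artist B" are merged.
clean_counts = artists.str.split(",").explode().str.strip().value_counts()

print(raw_counts)
print(clean_counts)
```

Without stripping, "Artist B" is split across two labels; with stripping, all three occurrences are counted together.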
