This project dives into the world of fragrance sales, uncovering key insights such as trends, price distributions, brand dominance, and consumer preferences. The analysis covers both men's and women's fragrances, using Python-based data science tools to clean, explore, and visualize large datasets. The goal is to provide actionable insights into market trends, product positioning, and consumer behavior within the fragrance industry.
- Fragrance Analysis: Insights into the Perfume Industry with Data
- Project Overview
- Tools and Technologies
- Step-by-Step Code Overview
- Graphical Insights
- Key Findings
- References
- Python: The core language for data manipulation and analysis.
- Pandas: A powerful library for data cleaning, structuring, and analysis.
- Matplotlib/Seaborn: Visualization libraries used to create plots and graphs that reveal trends and patterns.
- Jupyter Notebooks: A platform for documenting the data exploration process and organizing analysis workflows.
- GitHub: For version control, collaboration, and code sharing.
This section covers the key Python skills demonstrated in the project, focusing on data cleaning, aggregation, and visualization using Pandas, Matplotlib, and Seaborn. The full code can be found in the GitHub repository.
The first step was to load the fragrance sales dataset and perform an initial exploration to understand its structure. You can view the full code for this part here.
import pandas as pd
# Load dataset
data = pd.read_csv('fragrance_sales.csv')
# Display first few rows of the dataset
data.head()
Explanation:
- Skills: Using Pandas to load and explore data.
- Purpose: The `.head()` method provides a quick look at the dataset's structure, helping to identify key columns and understand the overall data composition.
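Beyond `.head()`, a few more checks help size up the cleaning work ahead. The sketch below is illustrative only: the inline CSV is hypothetical sample data standing in for `fragrance_sales.csv`, and its column names are assumed from the ones used later in this README.

```python
import io

import pandas as pd

# Hypothetical sample standing in for 'fragrance_sales.csv'; the columns
# (brand, category, price, sales) are assumed from the later steps.
csv_text = """brand,category,price,sales
Brand A,Eau de Toilette,29.99,120
Brand B,Eau de Parfum,64.50,85
Brand C,Eau de Toilette,,60
Brand B,Eau de Parfum,64.50,85
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Size of the table, missing values per column, and fully duplicated rows
n_rows, n_cols = sample.shape
missing_per_column = sample.isna().sum()
duplicate_rows = int(sample.duplicated().sum())

print(sample.dtypes)
print(missing_per_column)
print(f"{n_rows} rows, {n_cols} columns, {duplicate_rows} duplicate row(s)")
```

Counting missing values and duplicates up front shows how many rows the cleaning step is likely to remove.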
Next, the dataset was cleaned by removing rows with missing values and eliminating duplicate records to ensure data accuracy. Full code for data cleaning can be accessed here.
# Convert 'price' to a numeric type first; unparseable values become NaN
data['price'] = pd.to_numeric(data['price'], errors='coerce')
# Remove rows with missing values in key columns (including prices that failed to parse)
data = data.dropna(subset=['price', 'brand', 'category'])
# Drop duplicate entries
data = data.drop_duplicates()
Explanation:
- Skills: Cleaning data by handling missing values (`dropna()`) and removing duplicates (`drop_duplicates()`).
- Purpose: These operations improve the quality of the data, ensuring that the analysis is reliable and accurate.
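One detail worth noting: `pd.to_numeric(..., errors='coerce')` turns unparseable values into `NaN`, so running it before `dropna()` lets those rows be removed in the same pass. The following is a minimal sketch on hypothetical rows (the brand names and prices are made up for illustration), not the project's actual data.

```python
import pandas as pd

# Hypothetical raw rows: one unparseable price, one missing brand, one duplicate
raw = pd.DataFrame({
    "brand": ["Brand A", "Brand B", None, "Brand A", "Brand C"],
    "category": ["Eau de Toilette", "Eau de Parfum", "Cologne",
                 "Eau de Toilette", "Cologne"],
    "price": ["29.99", "N/A", "15.00", "29.99", "18.50"],
})

cleaned = raw.copy()
# Coerce first so that "N/A" becomes NaN and is caught by dropna below
cleaned["price"] = pd.to_numeric(cleaned["price"], errors="coerce")
cleaned = cleaned.dropna(subset=["price", "brand", "category"])
cleaned = cleaned.drop_duplicates().reset_index(drop=True)

rows_removed = len(raw) - len(cleaned)
print(cleaned)
print(f"Removed {rows_removed} of {len(raw)} rows")
```

Reporting how many rows were removed makes it easier to judge whether the cleaning rules are too aggressive for the dataset at hand.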
After cleaning, the data was grouped by relevant categories to calculate total sales, providing insights into the most profitable product categories. The full code for this can be found here.
# Group data by category and calculate total sales
total_sales_by_category = data.groupby('category')['sales'].sum()
# Display the results
total_sales_by_category
Explanation:
- Skills: Using `groupby()` and `sum()` to aggregate data.
- Purpose: This aggregation helps to summarize sales by category, providing an understanding of which categories perform best.
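The same aggregation can be extended to rank categories and express each one as a share of the total. This is a sketch under assumptions: the sample rows are hypothetical, and the `listings` and `share_pct` columns are names introduced here for illustration.

```python
import pandas as pd

# Hypothetical sales rows using the 'category' and 'sales' columns from above
sales = pd.DataFrame({
    "category": ["Eau de Toilette", "Eau de Parfum", "Eau de Toilette", "Cologne"],
    "sales": [120, 85, 60, 40],
})

# Total sales and number of listings per category, largest total first
summary = (
    sales.groupby("category")["sales"]
    .agg(total="sum", listings="count")
    .sort_values("total", ascending=False)
)
# Each category's share of overall sales, in percent
summary["share_pct"] = (summary["total"] / summary["total"].sum() * 100).round(1)

print(summary)
```

Sorting by the total puts the best-performing category on the first row, which is convenient when the result feeds a bar chart.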
This section showcases the key visualizations created during the analysis of the fragrance sales data. Each graph was generated using Python libraries like Matplotlib and Seaborn, and they provide valuable insights into brand dominance, price distribution, consumer preferences, and market trends.
This scatter plot shows the most frequently used fragrance notes in perfumes. Musk, Jasmine, and Amber are among the top fragrance notes, indicating their popularity among perfume brands.
import matplotlib.pyplot as plt
import seaborn as sns

# Scatter plot for most popular fragrance notes
sns.scatterplot(data=notes, x='percent_of_all', y='count', palette='magma', hue='percent_of_all', legend=False)
plt.xlabel('Percent of all Perfumes')
plt.ylabel('Total of Perfumes')
plt.title('Most Popular Notes')
# Label each point with its note name
for i in range(len(notes)):
    plt.text(notes['percent_of_all'].iloc[i],
             notes['count'].iloc[i],
             notes['notes_str'].iloc[i],
             fontsize=10, ha='right')
# Bar plot of the ten most popular notes
sns.barplot(data=top10_notes, x='count', y='notes_str', palette='magma')
sns.despine()
plt.title('Most Popular Notes')
plt.ylabel('')
plt.xlabel('Total of Perfumes')
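The plotting code above expects a `notes` table with `notes_str`, `count`, and `percent_of_all` columns, but the README does not show how it was built. One possible way to derive it, assuming a hypothetical `perfumes` table whose `notes` column holds comma-separated note names, is sketched below.

```python
import pandas as pd

# Hypothetical input: one row per perfume, notes as a comma-separated string
perfumes = pd.DataFrame({
    "perfume": ["P1", "P2", "P3", "P4"],
    "notes": ["Musk, Jasmine", "Musk, Amber", "Jasmine, Musk, Vanilla", "Amber"],
})

# One row per (perfume, note) pair
exploded = (
    perfumes.assign(notes_str=perfumes["notes"].str.split(","))
    .explode("notes_str")
)
exploded["notes_str"] = exploded["notes_str"].str.strip()

# Number of distinct perfumes containing each note, most common first
notes = (
    exploded.groupby("notes_str")["perfume"]
    .nunique()
    .rename("count")
    .reset_index()
    .sort_values("count", ascending=False, ignore_index=True)
)
# Share of all perfumes that contain the note, in percent
notes["percent_of_all"] = notes["count"] / len(perfumes) * 100
top10_notes = notes.head(10)

print(notes)
```

Counting distinct perfumes per note, instead of raw occurrences, keeps a note listed twice for the same perfume from inflating its total.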
This bar chart highlights the brands that have created the highest number of perfume variations. Brands such as Avon and Demeter Fragrance stand out for the breadth of their catalogs, showing their strong presence in the fragrance market.
# Bar plot of the brands with the most perfume variations
sns.barplot(data=top10_size, x='perfume', y='brand', palette='magma')
sns.despine()
plt.title('Brands that created the most perfume variations')
plt.ylabel('')
plt.xlabel('Total of Perfumes')
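The `top10_size` table used above is not defined in this README. A plausible construction, assuming a hypothetical `catalog` table with one row per listing and `brand` and `perfume` columns, counts distinct perfumes per brand and keeps the ten largest.

```python
import pandas as pd

# Hypothetical catalog; "A3" appears twice to show that repeats are not double-counted
catalog = pd.DataFrame({
    "brand": ["Brand A", "Brand A", "Brand A", "Brand A",
              "Brand B", "Brand B", "Brand C"],
    "perfume": ["A1", "A2", "A3", "A3", "B1", "B2", "C1"],
})

# Distinct perfumes per brand, then the ten brands with the most
top10_size = (
    catalog.groupby("brand")["perfume"]
    .nunique()
    .reset_index()
    .nlargest(10, "perfume")
)

print(top10_size)
```

After this step the `perfume` column holds a count, which matches how the bar plot uses it on the x-axis.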
This boxplot shows the price distribution of different types of perfumes. It highlights how Eau de Parfum generally has the highest price range, while Eau de Toilette and Cologne are more affordable.
sns.boxplot(data=med_category, y='type_cleaned', x='price', palette='magma')
plt.ylabel('')
plt.title('Price Distribution by Perfume Type')
plt.xlabel('Price (USD)')
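A numeric summary can back up what the boxplot shows. The sketch below uses hypothetical prices with the `type_cleaned` and `price` column names from the plot; the real dataset will give different figures.

```python
import pandas as pd

# Hypothetical prices per perfume type
prices = pd.DataFrame({
    "type_cleaned": ["Eau de Parfum"] * 3 + ["Eau de Toilette"] * 3 + ["Cologne"] * 2,
    "price": [60.0, 80.0, 100.0, 30.0, 40.0, 50.0, 20.0, 30.0],
})

# Count and quartiles per type, ordered by median price (highest first)
price_summary = (
    prices.groupby("type_cleaned")["price"]
    .describe()[["count", "25%", "50%", "75%"]]
    .sort_values("50%", ascending=False)
)
type_order = price_summary.index.tolist()

print(price_summary)
```

The resulting `type_order` list can be passed to the `order` parameter of `sns.boxplot` so the boxes appear from the most to the least expensive type.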
This bar chart shows the top 25 men's perfumes ranked by the number of sales. Calvin Klein, Versace, and Davidoff are among the most popular brands in men's fragrance.
sns.barplot(data=top_brand, x='sold', y='brand', palette='magma')
sns.despine()
plt.title('Top 25 Men Perfumes by number of sales')
plt.ylabel('')
plt.xlabel('Number of Sales')
This line chart shows the upward trend in the number of perfumes launched over the past two decades. The data reveals a steady increase in perfume launches since the early 2000s.
sns.lineplot(data=year, x='launch_year', y='total', color='red')
plt.xticks(rotation=45)
sns.despine()
plt.xlabel('Launch Year')
plt.ylabel('Total of Perfumes')
plt.title('Trend of Perfumes Over Last 20 Years')
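The `year` table behind the line chart needs one row per launch year with a `total` column. A minimal sketch of that aggregation follows, assuming a hypothetical `launches` table with one row per perfume.

```python
import pandas as pd

# Hypothetical launches; one row per perfume
launches = pd.DataFrame({
    "perfume": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "launch_year": [2004, 2004, 2010, 2015, 2015, 2015],
})

# Number of perfumes launched in each year, in chronological order
year = (
    launches.groupby("launch_year")
    .size()
    .reset_index(name="total")
)

print(year)
```

Years with no launches do not appear in the result, so the line plot simply connects the years that are present.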
This bar chart provides a breakdown of the total sales for different perfume categories. Eau de Toilette leads the market in terms of sales, followed by Eau de Parfum and Cologne.
sns.barplot(data=top_category, x='type_cleaned', y='sold', palette='magma')
sns.despine()
plt.title('Total sales for different perfume categories')
plt.xlabel('')
plt.ylabel('Number of Sales')
plt.figtext(0.5, -0.2, """Typical fragrance concentrations for each type:
1. Perfume (Parfum): 20% - 40%
2. Eau de Parfum: 15% - 20%
3. Eau de Toilette: 5% - 15%
4. Cologne (Eau de Cologne): 2% - 4%""",
ha="center", fontsize=10)
- Brand Dominance: Brands like Avon and Demeter Fragrance dominate the market with a wide variety of products.
- Price Distribution: Eau de Parfum commands higher prices, while Eau de Toilette and Cologne offer more affordable options.
- Consumer Preferences: Popular fragrance notes such as Musk, Jasmine, and Amber are consistently in demand.
- Market Growth: The fragrance market has seen significant growth, with a steady increase in new perfume launches over the last 20 years.
- Top Performers: Calvin Klein, Versace, and Davidoff lead the men's fragrance market in terms of sales.
- Perfume dataset: Web scraped from Fragrantica and analyzed by this GitHub repository.
- Sales dataset: Web scraped sales data from eBay (2024), available on Kaggle.