inderpalk/WholeSale_data-analysis

Wholesale Customer Segmentation and Analysis

Overview

This project analyzes a wholesale customer dataset containing annual spending (in monetary units) across several product categories. The primary objective is to segment customers by their purchasing behavior and gain insight into their preferences.

Project Description

The project involves the following key steps:

  • Exploratory Data Analysis (EDA): This phase focuses on understanding the dataset, cleaning and preprocessing the data, and generating insights through various visualizations and statistical summaries.

  • Clustering Analysis: The dataset is clustered using unsupervised machine learning techniques, such as K-means and hierarchical clustering, to group similar customers together based on their spending behavior.

  • Principal Component Analysis (PCA): PCA is applied to identify the principal components that best describe the variance in the data and reduce dimensionality.

  • Results: The findings from the analysis, including customer segments and insights gained, are presented in the README and in the project's documentation.

Getting Started

Prerequisites

Before running the project, ensure you have the following prerequisites:

  • Python
  • Jupyter Notebook

Installation

  1. Clone this repository (inderpalk/WholeSale_data-analysis).

  2. Install the required Python packages:

  • numpy
  • pandas
  • matplotlib
  • seaborn
  • scikit-learn

You can install these packages using pip:

    pip install numpy pandas matplotlib seaborn scikit-learn

Exploratory Data Analysis (EDA)

The EDA phase involves data cleaning, visualization, and summary statistics to gain insights into the dataset. Key visualizations and observations include:

  • Histograms and box plots to understand the distribution of spending in each product category.
  • Correlation analysis to identify relationships between variables.
  • Outlier detection and handling.
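The EDA steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data is synthetic, the column names (Fresh, Milk, Grocery) are hypothetical placeholders, and the IQR rule with k=1.5 is one common outlier convention assumed here because the README does not say which method was used.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the wholesale data (assumption: real data is loaded elsewhere).
# Column names are hypothetical placeholders for product categories.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.lognormal(mean=8.0, sigma=1.0, size=(200, 3)),
    columns=["Fresh", "Milk", "Grocery"],
)

# Summary statistics (count, mean, std, quartiles) per product category.
summary = df.describe()

# Pairwise Pearson correlation between categories.
corr = df.corr()


def iqr_outlier_mask(frame, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR], column by column."""
    q1 = frame.quantile(0.25)
    q3 = frame.quantile(0.75)
    iqr = q3 - q1
    return (frame < q1 - k * iqr) | (frame > q3 + k * iqr)


outliers = iqr_outlier_mask(df)

# One possible handling (an assumption): drop rows with an outlier in any category.
df_clean = df[~outliers.any(axis=1)]

print(summary)
print(corr)
print(outliers.sum())  # number of flagged values per category
```

For the plots themselves, df.hist() and seaborn.boxplot(data=df) would give per-category histograms and box plots. Dropping outlier rows is only one option; a log transform of the spending columns is another common choice for skewed monetary data.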

Clustering Analysis

The dataset is segmented into clusters using the following methods:

  • K-means Clustering: The optimal number of clusters is determined using the Elbow Method, and customers are grouped accordingly.
  • Hierarchical Clustering: Clusters are formed based on hierarchical relationships between data points.
  • Cluster Interpretation: Each cluster is described, and insights into customer behavior are provided.
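As a rough sketch of the clustering steps listed above, the following uses scikit-learn on synthetic data. The three-blob data, the standardization step, the range of k values, the chosen k of 3, and Ward linkage are all assumptions made for illustration; the README does not state the settings used in the project.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic data: three well-separated blobs standing in for customer spending.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Standardize so that no single feature dominates the distance computation.
X_scaled = StandardScaler().fit_transform(X)

# Elbow Method: record the within-cluster sum of squares (inertia) for each k.
# The "elbow" is the k after which inertia stops dropping sharply.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias[k] = model.inertia_

# Assumed choice for this synthetic data; in practice it is read off the elbow plot.
chosen_k = 3
kmeans_labels = KMeans(
    n_clusters=chosen_k, n_init=10, random_state=0
).fit_predict(X_scaled)

# Hierarchical (agglomerative) clustering with Ward linkage, cut at the same k.
hier_labels = AgglomerativeClustering(
    n_clusters=chosen_k, linkage="ward"
).fit_predict(X_scaled)

# Cluster interpretation: mean of the original features within each K-means cluster.
cluster_means = np.array(
    [X[kmeans_labels == c].mean(axis=0) for c in range(chosen_k)]
)

print(inertias)
print(cluster_means)
```

Plotting the inertias against k (for example with matplotlib) gives the elbow curve. Comparing kmeans_labels with hier_labels is a quick check on whether both methods recover similar segments.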

Principal Component Analysis (PCA)

PCA is applied to understand the underlying structure of the data and reduce dimensionality. Key components are identified and interpreted.
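A minimal PCA sketch along these lines is shown below. The data is synthetic (six features driven by two latent factors), and both the standardization step and the choice of two components are assumptions for illustration rather than the project's documented settings.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: six features generated from two latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
loadings = np.array(
    [
        [1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
    ]
)
X = latent @ loadings + 0.1 * rng.normal(size=(300, 6))

# Standardize first so each feature contributes on the same scale.
X_scaled = StandardScaler().fit_transform(X)

# Keep two components (an assumed choice for this example).
pca = PCA(n_components=2)
scores = pca.fit_transform(X_scaled)

# Share of total variance captured by each component, in descending order.
print(pca.explained_variance_ratio_)

# Rows are components; columns are the weights on each original feature.
print(pca.components_)
```

The rows of pca.components_ show how strongly each original feature loads on a component, which is the usual starting point for interpreting it. The scores array holds the reduced two-dimensional representation of each observation.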

Results

The key findings and insights from the analysis are presented, including:

  • Customer segments and their characteristics.
  • Optimal number of clusters.
  • Principal components and their interpretations.

Contributing

Contributions to this project are welcome. If you have suggestions, bug reports, or feature requests, please create an issue or submit a pull request.

Feel free to reach out if you have any questions or feedback about the project.
