Predicting product sales based on advertising expenditures across various platforms.
## Problem Statement

Sales prediction means estimating how much of a product people will buy based on factors such as the amount spent on advertising, the audience segment targeted, and the platform on which the product is advertised.
Product- and service-based businesses typically need data scientists to predict how future sales will respond to each change in advertising spend. So let's approach sales prediction with machine learning using Python.
To predict how much of a product consumers will purchase based on advertising budgets, a machine learning analysis was performed using a dataset of 200 entries. The analysis focused on the relationship between advertising spend across three channels (TV, Radio, and Newspaper) and the resulting Sales.
The objective of this analysis was to determine how advertising expenditures influence product sales to help the business optimize its marketing budget for maximum revenue.
- Advertising Allocation: On average, the business spends significantly more on TV advertising ($147.04) than on Radio ($23.26) or Newspaper ($30.55).
- Sales Performance: Sales figures range from a minimum of 1.6 to a maximum of 27.0 units, with an average of 14.02 units.
- Data Integrity: The dataset was found to be clean, with no null or duplicate values, ensuring a reliable basis for statistical modeling.
- Exploratory Data Analysis (EDA): Histograms and boxplots were used to inspect distributions and identify potential outliers.
- Data Preprocessing: Redundant columns (such as index identifiers) were removed to streamline the modeling process.
- Modeling: The project fits a linear regression model (using the statsmodels library) to quantify the relationship between the independent variables (ad spend per channel) and the dependent variable (Sales).
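A minimal sketch of the cleaning and modeling steps above, assuming the CSV contains the columns `TV`, `Radio`, `Newspaper` and `Sales` plus an unnamed index column (which pandas reads as `Unnamed: 0`). The helper names `clean` and `fit_sales_model` are hypothetical and not taken from the notebook.

```python
import pandas as pd
import statsmodels.formula.api as smf

FEATURES = ["TV", "Radio", "Newspaper"]
TARGET = "Sales"


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop index-like columns, then verify there are no nulls or duplicates."""
    # Assumption: the redundant index column is the unnamed first column,
    # which pandas labels "Unnamed: 0" when reading the CSV.
    df = df.drop(columns=[c for c in df.columns if str(c).startswith("Unnamed")])
    if df.isnull().values.any():
        raise ValueError("dataset contains null values")
    if df.duplicated().any():
        raise ValueError("dataset contains duplicate rows")
    return df


def fit_sales_model(df: pd.DataFrame):
    """Fit an OLS regression of Sales on the three advertising channels."""
    formula = f"{TARGET} ~ " + " + ".join(FEATURES)  # "Sales ~ TV + Radio + Newspaper"
    return smf.ols(formula, data=df).fit()
```

Usage would then be along the lines of `model = fit_sales_model(clean(pd.read_csv("Advertising.csv")))` followed by `print(model.summary())`, which reports each channel's coefficient, its p-value and the model's R².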
The analysis provides a framework for data-driven decision-making. Using the regression models developed in this project, the business can adjust planned advertising spend across platforms, forecast the resulting sales, and distribute its marketing budget more efficiently and effectively.
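To illustrate how a fitted model could be used to compare budget splits, the sketch below scores a few candidate allocations and ranks them by predicted sales. `forecast_sales` and the `CANDIDATES` figures are hypothetical illustrations, not results from this project; the function assumes a results object from the statsmodels formula API fitted on `Sales ~ TV + Radio + Newspaper`.

```python
import pandas as pd

# Hypothetical candidate allocations, each totalling 200 in the dataset's spend units.
CANDIDATES = pd.DataFrame({
    "TV": [150.0, 100.0, 120.0],
    "Radio": [20.0, 50.0, 30.0],
    "Newspaper": [30.0, 50.0, 50.0],
})


def forecast_sales(model, allocations: pd.DataFrame) -> pd.DataFrame:
    """Return the candidate budgets with a PredictedSales column, best first.

    `model` is assumed to be a fitted statsmodels formula-API result, and
    `allocations` must contain the TV, Radio and Newspaper columns.
    """
    result = allocations.copy()  # leave the caller's DataFrame untouched
    result["PredictedSales"] = model.predict(allocations)
    return result.sort_values("PredictedSales", ascending=False)
```

For example, `forecast_sales(model, CANDIDATES)` returns the three allocations ranked by predicted sales. Because a linear model extrapolates without limit, allocations far outside the spend ranges observed in the data should be treated with caution.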
## Project Structure

```text
├── Advertising.csv # Dataset containing advertising spend and sales data
├── Sales Prediction.ipynb # Jupyter Notebook containing the full analysis
└── README.md # Project documentation
```
## Requirements

- Python 3.8 or higher
- Jupyter Notebook or JupyterLab
The following Python libraries are required to run the analysis:
- pandas
- numpy
- seaborn
- matplotlib
- statsmodels
## Installation and Usage

- Clone the repository to your local machine.
- Ensure the dataset `Advertising.csv` is in the same directory as the notebook.
- Install the required dependencies via pip:

  ```bash
  pip install pandas numpy seaborn matplotlib statsmodels
  ```

- Launch the Jupyter Notebook and execute the cells sequentially to reproduce the analysis.
## Methodology

The project follows a standard data science pipeline:
- Data Loading: Importing the dataset into a pandas DataFrame.
- Cleaning: Dropping unnecessary columns and verifying data quality.
- Visualization: Creating boxplots and histograms to identify data spread and outliers.
- Statistical Analysis: Computing summary statistics (mean, standard deviation, and range) for each variable.
- Regression Modeling: Building models to quantify the impact of each advertising medium on total sales.
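The visualization and summary-statistics steps of this pipeline could look roughly like the sketch below. It uses plain matplotlib (seaborn's `histplot` and `boxplot` could be used instead), assumes the cleaned DataFrame has the columns `TV`, `Radio`, `Newspaper` and `Sales`, and the function names are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd

COLUMNS = ["TV", "Radio", "Newspaper", "Sales"]


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, standard deviation, minimum and maximum for each column."""
    return df[COLUMNS].describe().loc[["mean", "std", "min", "max"]]


def plot_distributions(df: pd.DataFrame):
    """Histogram (top row) and boxplot (bottom row) for every column."""
    fig, axes = plt.subplots(2, len(COLUMNS), figsize=(4 * len(COLUMNS), 6))
    for i, col in enumerate(COLUMNS):
        values = df[col].to_numpy()
        axes[0, i].hist(values, bins=20)  # shape of the distribution
        axes[0, i].set_title(col)
        axes[1, i].boxplot(values)        # spread and potential outliers
    fig.tight_layout()
    return fig
```

In the notebook, calling `summarize(data)` and `plot_distributions(data)` on the cleaned DataFrame would display the summary table and the figure inline.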
## License

This project is licensed under the MIT License. See the LICENSE file for more details.