The client for this project is a large retailer based in the United States. The company has identified issues in its warehouse operations, leading to losses and stock-outs for several products. The objective is to implement a forecasting model using artificial intelligence algorithms to predict the appropriate stock levels for at least the next 8 days. This initiative aims to enhance operational efficiency and increase the company’s profitability.
The primary objective is to develop a forecasting model that uses a set of machine learning algorithms to predict sales for the next 8 days at the store-product level. The algorithms are trained on the three-year history available in the retail company’s SQL database, using massive modeling, that is, a separate model fitted for each store-product combination.
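The idea of massive modeling can be illustrated with a minimal sketch. The records, the store identifiers, and the `MeanModel` class below are hypothetical stand-ins chosen for illustration; the project itself trains machine learning regressors on data read from the SQL database.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily sales records: (store_id, product_id, units_sold).
# In the project these rows would come from the company's SQL database.
records = [
    ("store_1", "prod_090", 12),
    ("store_1", "prod_090", 15),
    ("store_1", "prod_120", 3),
    ("store_2", "prod_090", 7),
    ("store_2", "prod_090", 9),
    ("store_2", "prod_090", 8),
]


class MeanModel:
    """Stand-in for a real regressor: always predicts the training mean."""

    def fit(self, sales):
        self.mean_ = mean(sales)
        return self

    def predict(self):
        return self.mean_


def train_per_group(rows):
    """Fit one independent model for each (store, product) pair."""
    series = defaultdict(list)
    for store, product, units in rows:
        series[(store, product)].append(units)
    return {key: MeanModel().fit(values) for key, values in series.items()}


models = train_per_group(records)
for key, model in sorted(models.items()):
    print(key, model.predict())
```

Keeping the models in a dictionary keyed by store and product is one possible design; it makes it straightforward to look up the right model when a new store-product series has to be forecast.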
Several insights have been uncovered through the exploratory data analysis. The main actionable initiatives are summarized below.
- With the current information collected by the company, it is challenging to determine whether intermittent demand is due to stock-outs or simply zero demand for the product. Therefore, it is highly recommended to track warehouse data closely.
- To optimize sales and reduce warehouse costs, implementing a forecasting model based on machine learning algorithms can be highly effective. A bottom-up approach, starting at the product level, is particularly recommended in this context.
- Discounts have proven to be quite successful, especially for product 090, which is the top seller. It may be worthwhile to apply this strategy to other products and evaluate the overall impact.
- Saturdays and Sundays show the highest sales volumes, presenting an opportunity to develop targeted strategies and campaigns.
- Special days like Thanksgiving, Labor Day, and Easter generate significant sales. These occasions should be thoroughly analyzed, and marketing strategies should be developed to enhance the company’s profitability.
In this project, we have developed a robust recursive forecasting model based on a set of machine learning algorithms for sales prediction. This model analyzes each product-store combination individually and tailors the algorithm to predict demand for the next 8 days. LightGBM, with standard hyperparameters, was identified as the best option for predictive performance. The model was tested with new data from December 2015, yielding satisfactory results, with a mean absolute error of approximately 4.73.
This forecasting model will help to reduce warehouse costs and stock-outs, significantly boosting the company's performance and profitability.
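As a rough illustration of recursive forecasting and of the mean absolute error metric, the sketch below predicts one day at a time and feeds each prediction back in as an input for the next step. The number of lags, the sample history, and the `predict_one_step` function are assumptions made for this example; in the project that role is played by the trained LightGBM model for each product-store combination.

```python
from statistics import mean

HORIZON = 8  # days to forecast, as in the project
N_LAGS = 3   # number of lagged values used as inputs (assumed for this sketch)


def predict_one_step(lags):
    """Stand-in for a trained regressor: predicts the mean of the lags."""
    return mean(lags)


def recursive_forecast(history, horizon=HORIZON, n_lags=N_LAGS):
    """Forecast `horizon` steps, feeding each prediction back as a lag."""
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        next_value = predict_one_step(window)
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the new prediction.
        window = window[1:] + [next_value]
    return forecasts


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


history = [10, 12, 11, 13, 12, 14]  # hypothetical daily sales
forecast = recursive_forecast(history)
print(forecast)
```

Because later steps depend on earlier predictions, errors can accumulate over the horizon, which is one reason to evaluate the whole 8-day forecast against held-out data rather than a single step.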
- 📁 00_Imagenes: Contains project images.
- 📁 01_Documentos: Contains basic project files:
  - retail.yml: Project environment file.
  - FaseDesarrollo_Transformaciones.xlsx: Support file for designing feature transformation processes.
  - FaseProduccion_Procesos.xlsx: Support file for designing final production script.
- 📁 02_Datos
  - 📁 01_Originales
    - hipermercado.db: Original SQL dataset.
  - 📁 02_Validacion
    - validacion.csv: Sample extracted from the original dataset at the beginning of the project, used to check that the model performs correctly once it is put into production.
    - DatosParaProduccion.csv: Support file for the execution of the recursive forecasting model. It also shows the structure required for the files passed to the forecasting model.
  - 📁 03_Trabajo
    - This folder contains the datasets resulting from each of the stages of the project (data quality, exploratory data analysis, variable transformation, ...).
- 📁 01_Originales
- 📁 03_Notebooks
  - 📁 01_Funciones
    - FuncionesRetail.ipynb: Notebook containing all custom functions used in the training and production of the model.
  - 📁 02_Desarrollo
    - 01_Set Up.ipynb: Notebook used for the initial setup of the project.
    - 02_Calidad de Datos.ipynb: Notebook detailing and executing all data quality processes.
    - 03_EDA.ipynb: Notebook used for the exploratory data analysis.
    - 04_Transformacion de datos.ipynb: Notebook that details and executes the data transformation processes needed to prepare the variables for the models.
    - 05_Preseleccion de variables.ipynb: Notebook used for the variable selection process.
    - 06_Modelizacion para Regresion.ipynb: Notebook for building the forecasting model. It covers model selection, hyperparameter tuning, and evaluation of results, for both individual and massive modeling.
    - 07_Preparacion del codigo de produccion.ipynb: Notebook that compiles all the quality, transformation, and variable selection processes, together with the final model and the execution and retraining processes. It is used to create the final retraining and execution pipelines that condense all of these processes.
  - 📁 03_Sistema
    - This folder contains the files (production script, models, functions, ...) used in the model's deployment.
- 📁 01_Funciones
- 📁 04_Modelos
- lista_modelos_retail.pickle: File containing all of the developed models for each product-store combination.
- ohe_retail.pickle: File containing the one-hot encoding pipeline.
- te_retail.pickle: File containing the target encoding pipeline.
- 📁 05_Resultados
- FuncionesRetail.py: Python script that contains all custom functions needed when training or executing the model.
- Codigo de ejecucion.py: Python script to execute the model and obtain the results.
- Codigo de reentrenamiento.py: Python script to retrain the model with new data when necessary.
- lista_modelos_retail.pickle: File containing all of the developed models for each product-store combination.
- variables_finales.pickle: Names of the final selected variables after training.
The project should be run using the same environment in which it was created.
- The project environment can be replicated using the retail.yml file, which was created during the setup phase of the project. It can be found in the 01_Documentos folder.
- To replicate the environment, copy the retail.yml file to the directory from which the command will be run, then execute the following in a terminal or Anaconda Prompt:
  - conda env create --file retail.yml --name project_name
- Once created, the environment can be activated with:
  - conda activate project_name
Also remember to update the project_path variable in the notebooks to the path where you have replicated the project.
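As a minimal sketch of how a single project_path variable can drive every file location, the example below builds paths with pathlib. The example path value and the variable names `db_path` and `validation_path` are assumptions for illustration, and the folder nesting is inferred from the folder list above; how the notebooks actually build their paths may differ.

```python
from pathlib import Path

# Assumed value: replace with the folder where the project was replicated.
project_path = "/home/user/retail_forecasting"

root = Path(project_path)

# Folder and file names taken from the project structure; nesting is assumed.
db_path = root / "02_Datos" / "01_Originales" / "hipermercado.db"
validation_path = root / "02_Datos" / "02_Validacion" / "validacion.csv"

print(db_path)
print(validation_path)
```

Deriving all locations from one root means that only project_path has to change when the project is moved to another machine.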