This repository contains all code, saved models and plots used in the CZ 4032 Data Analytics and Mining course at NTU.
The Rossmann Store Sales problem is a Kaggle Competition. The challenge requires participants to forecast sales of Rossmann over a period of 6 weeks given historical data of 1,115 stores located across Germany.
CZ 4032 Group 7, AY 17/18 Semester 1
- Nikhil Venkatesh (nikv96)
- Suyash Lakhotia (SuyashLakhotia)
- Virat Chopra (chopravirat)
- Priyanshu Singh (singhpriyanshu5)
- Shantanu Kamath (ShantanuKamath)
All code for this project is written in Python 3. The list of dependencies can be found in requirements.txt
. To set up your development environment, navigate to this repository and run the following on a terminal:
$ pip install -r requirements.txt
The data/
directory contains all data files downloaded from Kaggle. The plots/
directory contains all plots generated and all output files are recorded in the predictions/
directory.
The first step is to analyze the datasets for relationships and trends. To generate all plots we used for data analysis, run the following:
$ python DataAnalysis/generate_plots.py
To generate more plots, please extend the code in DataAnalysis/generate_plots.py
.
For each type of model, the codes can be found in the respective directories. The private leaderboard RMSPE score of each model is reported in the docstring of the Python file. Description of the model and assumptions made can also be found in the Python file. To run the code, run the following from the root of the repository:
$ python ModelType/model-name.py
The code will automatically generate the output predictions in predictions/
. The generated .csv
file can be uploaded to Kaggle to get the private & public RMSPE scores.