https://zrp.github.io/challenges/data-science/
- PyCharm with Jupyter Notebook
- Anaconda
- Python 3.9.10
- Windows 10
- All packages used are listed in the `requirements.txt` file
- Diagram:

- The `assets` folder contains the dataframes used in the project.
- The `pipeline` folder contains the funnel of the project, described by the diagram above.
- `prepare_data.ipynb` contains the code that prepares the data for Exploratory Analysis and Prediction.
- `eda.ipynb` contains the Exploratory Analysis of the dataset.
- `methodology 1` contains the code used to evaluate traditional machine learning models for the prediction of the inference.
- `methodology 2` contains the code used to evaluate a deep learning model (an LSTM recurrent neural network) for the prediction of the inference.
Observations:
- `prepare_data.ipynb` is the file that contains the code to prepare the data for Exploratory Analysis and Prediction.
- The resulting dataframes of this file are already stored in `assets`.
- Therefore, you can skip the execution of `prepare_data.ipynb` and go directly to the Exploratory Analysis (`eda.ipynb`) and the prediction files (`methodology 1` and `methodology 2`).
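Before skipping the preparation step, it may help to confirm that the prepared dataframes are really present. The following is a minimal sketch, not part of the repository: the helper `missing_assets` is hypothetical, and the list of expected files is an assumption based only on the file names that appear in the notebooks' `read_csv` calls (`df_diff.csv` and `df_start_end.csv`).

```python
from pathlib import Path

# Assumption: these are the prepared dataframes the notebooks read.
# The names come from the read_csv calls; the real list may be longer.
EXPECTED_ASSETS = ["df_diff.csv", "df_start_end.csv"]


def missing_assets(assets_dir, expected=EXPECTED_ASSETS):
    """Return the expected file names that are not present in assets_dir."""
    assets_dir = Path(assets_dir)
    return [name for name in expected if not (assets_dir / name).is_file()]


# Hypothetical usage from the repository root.
missing = missing_assets("assets")
if missing:
    print("Run prepare_data.ipynb first; missing files:", missing)
else:
    print("Prepared dataframes found; prepare_data.ipynb can be skipped.")
```

If the notebooks are run from a subfolder, pass the matching relative path (for example `../assets`) instead.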
- Download the zip of the repository and upload it to your Google Drive
- Extract it
- Open any .ipynb file and click on the button "Open in Colab" at the top of the file.
- Add the following code to mount your Google Drive in the Google Colab environment:
from google.colab import drive
drive.mount('/content/drive')
- In the cells that contain code such as:
df_diff = pd.read_csv('../assets/df_diff.csv', index_col=0)
# or
df_diff = pd.read_csv('assets/df_start_end.csv', index_col=0)
# or
model = keras.models.load_model('../../assets/model')
Change the path so that it points to where you stored the extracted zip of the repository. Here is an example:
df_diff = pd.read_csv('/content/drive/MyDrive/Empresas/ZRP/Desafio Técnico/zrp_case-main/assets/df_diff.csv', index_col=0)
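Instead of editing every path by hand, the repository location can be defined once and reused. This is only a sketch under stated assumptions: `PROJECT_ROOT` and `asset_path` are hypothetical names that do not exist in the notebooks, and the Drive location shown is a placeholder that you should replace with your own.

```python
from pathlib import Path

# Assumption: placeholder location of the extracted repository on Google Drive.
# Replace it with the folder you actually extracted the zip into.
PROJECT_ROOT = Path("/content/drive/MyDrive/zrp_case-main")


def asset_path(name, root=PROJECT_ROOT):
    """Build the path of a file or folder inside the repository's assets folder."""
    return (Path(root) / "assets" / name).as_posix()


# Hypothetical usage inside a notebook cell:
# df_diff = pd.read_csv(asset_path("df_diff.csv"), index_col=0)
# model = keras.models.load_model(asset_path("model"))
```

With this approach only `PROJECT_ROOT` changes when the repository is moved, and the relative prefixes (`../assets`, `../../assets`) no longer matter.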
You are all set to execute the project in Google Colab!
Comparing the best traditional model (KNN) with the RNN model (LSTM), the RNN model achieved better accuracy: 75.85% against 73% for KNN. In addition, the RNN model's hyperparameters can be tuned and the model can be trained for more epochs, which may yield better results.