
EugenTheMachine/DL-git-project


Homework for module 8: MLE

This DL-git-project repository contains Yevhen Ponomarov's assignment for H/W #8.

The goal of this project is to create, train, save, and test a neural network on the Iris flower dataset, loaded from sklearn.

The project consists of several directories and modules. Each module has a brief description and all the necessary notes and comments related to the code.

Note that each of the project folders includes an empty __init__.py file, which is required for the folders to work as importable Python packages.

Additional remarks

The notes below explain why some parts of this homework were completed the way they are. Please read them carefully to follow the reasoning:

  1. Although data_process is not meant to be included in the final repository, the directory was kept to show the data-downloading script required by paragraph #1 of the homework instructions. It is not used at runtime and can be deleted without harming the project.
  2. No matter how many times you run the training script to create different models, only the last model is saved. This happens because the model name is pre-defined in the settings and never changes; it was done to simplify the work a little. The instructions say that the training container should "...save the trained model" without requiring it to save multiple models, so in my opinion no rules were broken.
  3. Exception handling and unit tests are present, but only minimally, enough to demonstrate their purpose in the process.

Project structure:

This project has a modular structure, where each folder has a specific duty. Note that some folders are not present in the original repository; they are created when the corresponding code modules are executed, as specified in the instructions below.

MLE_basic_example
├── data                      # Data files used for training and inference (it can be generated with download_data.py script)
│   ├── test_data.csv
│   └── train_data.csv
├── data_process              # Scripts used for data processing and generation
│   ├── download_data.py
│   └── __init__.py           
├── inference                 # Scripts and Dockerfiles used for inference
│   ├── Dockerfile
│   ├── run.py
│   └── __init__.py
├── models                    # Folder where the last trained model is stored
│   └── prod_model.keras
├── net                       # Folder where the neural network class definition is stored
│   ├── __init__.py
│   └── net.py
├── results                   # Folder where the test dataset with model predictions is stored
│   └── tested_data.csv
├── training                  # Scripts and Dockerfiles used for training
│   ├── Dockerfile
│   ├── train.py
│   └── __init__.py
├── utils.py                  # Utility functions and classes that are used in scripts
├── __init__.py
├── settings.json             # All configurable parameters and settings
├── requirements.txt          # Required libraries and their pinned versions
├── .gitignore                # The git-ignored files and directories
└── README.md

Settings:

The configuration for the project is managed using the settings.json file, which stores important variables that control the behaviour of the whole project. Note that changing any of them may break the code running inside the Docker containers.
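As a rough sketch, the project's scripts can read these values with the standard json module. The key names used in this example (model_name, data_dir) are illustrative assumptions, not necessarily the keys in the actual settings.json:

```python
import json
import os
import tempfile

def load_settings(path):
    """Load configuration values from a JSON settings file."""
    with open(path) as f:
        return json.load(f)

# Demo with a sample file written to a temp directory;
# the keys below are illustrative assumptions, not the repo's real keys.
settings_path = os.path.join(tempfile.mkdtemp(), "settings.json")
with open(settings_path, "w") as f:
    json.dump({"model_name": "prod_model.keras", "data_dir": "data"}, f)

settings = load_settings(settings_path)
print(settings["model_name"])  # prod_model.keras
```

Loading all configurable values from one file like this is what lets the Docker build pass the file name via the settings_name build argument.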

Data:

In this project the Iris flower dataset is used, loaded from the sklearn library. Following the homework instructions, the script that creates the train/test datasets can be found in data_process/download_data.py. However, there is no need to run it, because all the data is already prepared and stored in the data folder, as the homework instructions require.
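For illustration, a script like download_data.py might look roughly like the sketch below: load the Iris dataset from sklearn, do a stratified train/test split, and write two CSV files. The split ratio, random seed, and output directory here are assumptions, not necessarily what the repository's script uses:

```python
import os
import tempfile

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset as a DataFrame: 150 rows,
# 4 feature columns plus a "target" column.
iris = load_iris(as_frame=True)
df = iris.frame

# Stratified 80/20 split; ratio and seed are illustrative assumptions.
train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df["target"]
)

# Write the CSVs (a temp dir here; the real script targets data/).
out_dir = tempfile.mkdtemp()
train_df.to_csv(os.path.join(out_dir, "train_data.csv"), index=False)
test_df.to_csv(os.path.join(out_dir, "test_data.csv"), index=False)
print(len(train_df), len(test_df))  # 120 30
```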

Getting started using Docker containers

The instructions below assume that you already have the Docker Desktop application installed and set up properly on your local machine.

Training:

The training phase of the ML pipeline includes data preprocessing, the actual training of the model, and the evaluation and validation of the model's performance. All of these steps are performed by the script training/train.py. The result of this phase is the models/prod_model.keras file, which contains a trained neural network that can be used for further predictions.

To train the model using Docker:

  1. Open a command prompt inside the project directory and build the training Docker image:
docker build -f ./training/Dockerfile --build-arg settings_name=settings.json -t training_image .
  2. Run the container to train the model, save it, and copy the resulting model to your local machine. Note that the "models" folder will be created automatically if it does not exist yet:
docker run -v %cd%/models:/app/models training_image
  3. Optional: if, for some reason, you want to copy a model from a specific container, use the command below to move the trained model from /app/models inside the Docker container to the local machine:
docker cp <container_id>:/app/models/<model_name>.keras ./models

Replace <container_id> with your running Docker container ID and <model_name>.keras with your model's name.

Inference:

Once a model has been trained, it can be used to make predictions on new data in the inference stage. The inference stage is implemented in inference/run.py. The result of this stage is a results/tested_data.csv file which contains the original dataset + a new column with the model's predictions.
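To illustrate only the output format of this stage, the sketch below appends a prediction column to a test CSV using the standard csv module. The real inference/run.py uses the trained Keras model; the dummy_predict function and tiny stand-in file here are assumptions for the example:

```python
import csv
import os
import tempfile

workdir = tempfile.mkdtemp()
test_path = os.path.join(workdir, "test_data.csv")
out_path = os.path.join(workdir, "tested_data.csv")

# Create a tiny stand-in test file (two Iris-like rows).
with open(test_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sepal_length", "sepal_width", "petal_length", "petal_width"])
    writer.writerow([5.1, 3.5, 1.4, 0.2])
    writer.writerow([6.7, 3.0, 5.2, 2.3])

def dummy_predict(row):
    # Placeholder for the real model's predict(); splits on petal length.
    return 0 if float(row[2]) < 2.5 else 2

# Copy the original rows, appending a "prediction" column.
with open(test_path, newline="") as src, open(out_path, "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    writer.writerow(next(reader) + ["prediction"])
    for row in reader:
        writer.writerow(row + [dummy_predict(row)])

with open(out_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows[0][-1], rows[1][-1], rows[2][-1])  # prediction 0 2
```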

  1. Open a command prompt inside the project directory and build the inference Docker image:
docker build -f ./inference/Dockerfile --build-arg model_name=prod_model.keras --build-arg settings_name=settings.json -t inference_image .
  2. Run the inference Docker container. Note that the "results" folder will be created automatically if it does not exist yet:
docker run -v %cd%/models:/app/models -v %cd%/data:/app/data -v %cd%/results:/app/results inference_image

Getting started on your local Windows machine

Note: this section is optional and was created to provide an additional way of working with the project. The instructions below assume that you have Python (and an IDE, if necessary) installed on your local machine. Also, remember that on other operating systems the commands may differ; check the official documentation for your OS for more information.

Should you want to start using this project on your local machine, please complete the steps listed below:

  1. On the main page of the DL-git-project repository, click the green "Code" button and copy the web URL from the opened tab;
  2. Choose the directory where you want the project folder to be located and open a terminal there. Then run the git clone <URL> command, replacing <URL> with the project URL copied in step 1;
  3. Run the cd DL-git-project command in the terminal to enter the project folder;
  4. Run the pip install --no-cache-dir -r requirements.txt command in the terminal to install the modules the project depends on;
  5. To train and save the model, run python training/train.py in the Windows command prompt, or run the file from your IDE;
  6. To evaluate the model, run python inference/run.py in the Windows command prompt, or run the file from your IDE.

Wrap Up

That is the end of the project description!

About

Completed by Yevhen Ponomarov
