This is a simple, modular machine learning project template designed for training purposes. It follows common best practices for structuring Python and machine learning code.
MachineLearningWorkflow/
├── data/                 # Folder for datasets
├── notebooks/            # Jupyter notebooks for experimentation
├── src/                  # Main Python package
│   ├── __init__.py
│   ├── preprocess.py     # Data cleaning and transformation
│   ├── model.py          # Define train/test functions and model architectures
│   ├── evaluate.py       # Functions for accuracy, precision, recall, etc.
│   └── utils/            # Utility functions
│       ├── __init__.py
│       └── helpers.py
├── tests/                # Unit tests
│   └── run_tests.py
├── README.md             # Project overview
└── requirements.txt      # Dependencies
- Python 3.8 or higher
- Libraries: install the dependencies listed in requirements.txt using: pip install -r requirements.txt
- Clone the repository.
- Place your dataset in the data/ folder.
- Follow the Jupyter notebooks in the notebooks/ folder to understand the pipeline.
- Modify the modules in the src/ folder to customize the pipeline.
Located in src/preprocess.py. This module includes:
- Functions for data cleaning, missing value handling, and feature scaling.
- Splitting datasets into training and testing sets.
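
The preprocessing steps listed above could look roughly like the following. This is a minimal sketch, not the template's actual code: the function names `fill_missing`, `standardize`, and `train_test_split` are hypothetical, and it uses only the standard library so that it makes no assumption about what requirements.txt contains.

```python
import random
from statistics import mean, pstdev


def fill_missing(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def standardize(column):
    """Scale a column to zero mean and unit variance (population std)."""
    mu = mean(column)
    sigma = pstdev(column) or 1.0  # fall back to 1.0 for constant columns
    return [(v - mu) / sigma for v in column]


def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Illustrative data only (assumption, not from the template).
ages = fill_missing([20, None, 40])
scaled = standardize(ages)
train, test = train_test_split(list(range(10)), test_ratio=0.2)
```

In a real pipeline, scaling statistics are normally computed on the training split only and then applied to the test split, so that no information from the test data leaks into training.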
Located in src/model.py. This module includes:
- Definitions for different machine learning models.
- Training and testing pipelines.
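
As an illustration of how a model definition and its training/testing functions might fit together, here is a small sketch. Everything in it is an assumption made for the example: the `NearestCentroidClassifier` class and the `train_model` and `score_model` functions are hypothetical names, and the toy classifier stands in for whatever models the template actually defines.

```python
class NearestCentroidClassifier:
    """Toy classifier: predicts the label whose feature centroid is closest."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for features, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(features))
            for i, value in enumerate(features):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        # Per-label mean of each feature.
        self.centroids_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_, key=lambda label: sq_dist(x, self.centroids_[label]))
            for x in X
        ]


def train_model(model, X_train, y_train):
    """Fit the model on the training split and return it."""
    return model.fit(X_train, y_train)


def score_model(model, X_test, y_test):
    """Return the fraction of test samples predicted correctly."""
    predictions = model.predict(X_test)
    correct = sum(p == t for p, t in zip(predictions, y_test))
    return correct / len(y_test)


# Illustrative data only (assumption, not from the template).
X_train = [[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]]
y_train = ["a", "a", "b", "b"]
model = train_model(NearestCentroidClassifier(), X_train, y_train)
accuracy = score_model(model, [[0.1, 0.0], [1.0, 0.9]], ["a", "b"])
```

Keeping `fit` and `predict` as the only methods the pipeline functions rely on means a different model class can be swapped in without touching `train_model` or `score_model`.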
Located in src/evaluate.py. This module includes:
- Functions for calculating performance metrics like accuracy, precision, recall, and F1-score.
- Visualization tools for confusion matrices and learning curves.
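
The metrics named above all derive from the four confusion-matrix counts for a binary problem. The sketch below shows one way to compute them; the function names `confusion_counts` and `classification_metrics` are hypothetical, and plotting is left out because it would depend on a plotting library the template may or may not use.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    # Guard against empty denominators (e.g. no positive predictions at all).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative labels only (assumption, not from the template).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```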
Located in src/utils/helpers.py. This module includes:
- Helper functions for logging, model saving/loading, and miscellaneous utilities.
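
A possible shape for the logging and model persistence helpers is sketched below. The names `get_logger`, `save_model`, and `load_model` are hypothetical, and `pickle` is an assumption chosen because it ships with Python; the template may use a different serialization format.

```python
import logging
import pickle
import tempfile
from pathlib import Path


def get_logger(name="ml_workflow"):
    """Return a logger that writes timestamped messages to the console."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


def save_model(model, path):
    """Serialize a model object to disk, creating parent folders as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(model, fh)
    get_logger().info("Saved model to %s", path)


def load_model(path):
    """Load a model object previously written by save_model."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


# Round trip through a temporary directory; the dict stands in for a real model.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "models" / "demo.pkl"
    save_model({"weights": [0.1, 0.2]}, model_path)
    restored = load_model(model_path)
```

Unpickling can execute arbitrary code, so `load_model` should only be used on files from a trusted source.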
Trainees are encouraged to:
- Extend modules by adding new functionalities.
- Experiment with different datasets and models.
- Write unit tests for their additions in the tests/ folder.
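
A unit test for a newly added function might look like the sketch below. It assumes the standard `unittest` module, which may or may not be what tests/run_tests.py uses, and `min_max_scale` is a hypothetical trainee-added helper defined inline so that the example is self-contained.

```python
import unittest


def min_max_scale(values):
    """Hypothetical trainee-added helper: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant input has no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


class MinMaxScaleTest(unittest.TestCase):
    def test_scales_to_unit_range(self):
        self.assertEqual(min_max_scale([0, 2, 4]), [0.0, 0.5, 1.0])

    def test_constant_input_maps_to_zero(self):
        self.assertEqual(min_max_scale([3, 3, 3]), [0.0, 0.0, 0.0])


# Run the test case explicitly and keep the result object for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MinMaxScaleTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In the actual project the helper would live in src/ and be imported by the test module rather than defined next to the test.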