This repository provides starter code for a finance deep learning project using PyTorch.
The repository is organized as follows.
```
├── 📂 config/
│   ├── 📄 dvc_pipeline.yaml
│   ├── 📄 tune.yaml
│   ├── 📄 train.yaml
│   └── 📄 inference.yaml
├── 📂 dataset/
│   └── 📂 raw/
│       └── 📄 sample_data.csv
├── 📂 notebook/
│   └── 📄 inference.ipynb
├── 📂 src/
│   ├── 📄 __init__.py
│   ├── 📄 data.py
│   ├── 📄 model.py
│   ├── 📄 utils.py
│   ├── 📄 tune.py
│   ├── 📄 train.py
│   ├── 📄 app.py
│   ├── 📄 serve.py
│   └── 📄 deploy.py
├── 📄 dvc.yaml
└── 📄 requirements.txt
```
- `config/` contains the default YAML configuration files for the scripts.
  - `config/dvc_pipeline.yaml` contains the hyperparameters and configurations for the DVC pipeline in `dvc.yaml`.
  - `config/tune.yaml` contains the hyperparameters and configurations for `src/tune.py`.
  - `config/train.yaml` contains the hyperparameters and configurations for `src/train.py`.
  - `config/inference.yaml` contains the configurations for `notebook/inference.ipynb`, `src/app.py`, `src/serve.py`, and `src/deploy.py`.
- `dataset/` is the directory for raw and processed data files (to be tracked by DVC).
  - `dataset/raw/` is the directory for placing raw data files.
  - `dataset/raw/sample_data.csv` contains sample PSE data.
- `notebook/` contains the Jupyter notebooks.
  - `notebook/inference.ipynb` is the notebook for performing model inference and other analysis.
- `src/` contains the Python scripts.
  - `src/data.py` contains data-related classes and functions.
  - `src/model.py` contains model-related classes and functions.
  - `src/utils.py` contains additional utility functions.
  - `src/tune.py` contains the code for hyperparameter tuning.
  - `src/train.py` contains the code for training the model.
  - `src/app.py` contains the code for an inference front end for the model.
  - `src/serve.py` contains the code for serving the model locally.
  - `src/deploy.py` contains the code for deploying the model with Docker.
- `dvc.yaml` defines the DVC pipeline for hyperparameter tuning and model training.
- `requirements.txt` lists the Python package dependencies for this repository.
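The schema of `dataset/raw/sample_data.csv` is not documented here. As a stdlib-only illustration of the kind of loading `src/data.py` might perform (the column names `date` and `close` and the sample rows are assumptions, not the actual file contents):

```python
# Illustrative loader; the real src/data.py and the real CSV schema may differ.
import csv
import io


def load_prices(fileobj):
    """Read (date, close) rows into a list of dicts with float prices."""
    return [
        {"date": row["date"], "close": float(row["close"])}
        for row in csv.DictReader(fileobj)
    ]


# Stand-in for open("dataset/raw/sample_data.csv") with made-up rows.
sample = io.StringIO("date,close\n2024-01-02,100.5\n2024-01-03,101.0\n")
rows = load_prices(sample)
```

Inspect the CSV itself for the actual columns before wiring it into the data pipeline.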
To install the dependencies of this repository, run

```shell
pip install -r requirements.txt
```
To use the DVC pipeline, first stop tracking the raw dataset with Git by running

```shell
git rm -r --cached dataset/raw/sample_data.csv
git commit -m 'Stopped Tracking Dataset'
```
To perform hyperparameter tuning and model training using the pipeline, edit the configuration `config/dvc_pipeline.yaml` and then run

```shell
dvc exp run
```

This will create `params.yaml`, which contains the final configurations used in the pipeline.
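The actual stages live in `dvc.yaml`; as an illustration only, a two-stage pipeline of this shape is typical (the stage names, dependencies, and output paths below are assumptions, not the repository's actual contents):

```yaml
# Illustrative dvc.yaml: a tuning stage followed by a training stage.
stages:
  tune:
    cmd: python -m src.tune
    deps:
      - dataset/raw/sample_data.csv
      - src/tune.py
    params:
      - config/dvc_pipeline.yaml:
  train:
    cmd: python -m src.train
    deps:
      - dataset/raw/sample_data.csv
      - src/train.py
    params:
      - config/dvc_pipeline.yaml:
    outs:
      - models/model.pt
```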
To perform hyperparameter tuning on its own, edit the configuration `config/tune.yaml` to specify the search space and then run

```shell
python -m src.tune
```
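As an illustration of a search-space configuration (the keys below are hypothetical; the actual ones depend on how `src/tune.py` parses the file):

```yaml
# Hypothetical tune.yaml fields; match these to what src/tune.py reads.
n_trials: 20
search_space:
  learning_rate: [1.0e-4, 1.0e-2]   # sampled range
  hidden_size: [32, 64, 128]        # categorical choices
  num_layers: [1, 3]                # integer range
```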
After tuning, edit the configuration `config/train.yaml` with the best hyperparameters and then run

```shell
python -m src.train
```
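A training configuration might look like the following; every key here is an illustrative assumption, so align it with what `src/train.py` actually reads:

```yaml
# Hypothetical train.yaml fields.
seed: 42
data:
  path: dataset/raw/sample_data.csv
model:
  hidden_size: 64
  num_layers: 2
training:
  learning_rate: 1.0e-3
  batch_size: 32
  epochs: 50
```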
To perform model inference, edit the configuration `config/inference.yaml` to specify the MLflow and other settings. Then run the notebook `notebook/inference.ipynb` to perform model inference and other analysis.
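As an illustration, an inference configuration might hold MLflow and serving settings like these (all keys and values below are assumptions; the actual ones are whatever the inference scripts read):

```yaml
# Hypothetical inference.yaml fields.
mlflow:
  tracking_uri: http://127.0.0.1:5000
  model_uri: models:/finance-model/Production
serve:
  host: 0.0.0.0
  port: 8000
```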
Run the following to serve the model with a Gradio front-end app:

```shell
python -m src.app
```
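A minimal sketch of the shape `src/app.py` might take, assuming a Gradio `Interface` wrapping a predict function; the function, its input format, and the toy averaging logic are all illustrative stand-ins for the real model:

```python
# Hypothetical Gradio front-end sketch; not the repository's actual app.
def predict(closing_prices: str) -> str:
    """Toy stand-in for the model: average a comma-separated price list."""
    values = [float(v) for v in closing_prices.split(",")]
    return f"next-day estimate: {sum(values) / len(values):.2f}"


try:  # guarded so the sketch still runs where Gradio is not installed
    import gradio as gr

    demo = gr.Interface(fn=predict, inputs="text", outputs="text")
    # demo.launch()  # uncomment to serve the app locally
except ImportError:
    demo = None
```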
Run the following to serve the model locally with an API:

```shell
python -m src.serve
```
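A stdlib-only sketch of what `src/serve.py` might expose: a single POST `/predict` endpoint returning JSON. The real script may use a framework such as FastAPI, and the endpoint, payload shape, and placeholder model are all assumptions:

```python
# Hypothetical serving sketch; not the repository's actual server.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    # Placeholder for the trained PyTorch model's forward pass.
    return {"prediction": sum(features) / len(features)}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(predict(payload["features"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging


# To serve: HTTPServer(("127.0.0.1", 8000), PredictHandler).serve_forever()
```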
Run the following to deploy the model with Docker:

```shell
python -m src.deploy
```
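As an illustration, the image that `src/deploy.py` builds might resemble the Dockerfile below; the base image, copied paths, port, and entry command are assumptions, and the script may generate something different:

```dockerfile
# Hypothetical Dockerfile for serving the model.
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ src/
COPY config/ config/
EXPOSE 8000
CMD ["python", "-m", "src.serve"]
```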