The raw data was recorded by the weather station of the Max Planck Institute for Biogeochemistry in Jena, Germany. The Jena Weather dataset is made up of many different quantities (such as air temperature, atmospheric pressure, humidity, wind direction, and so on) that were recorded every 10 minutes over several years. This dataset covers data from January 1st 2004 to December 31st 2020. The data used here is a copy published for academic purposes as a Kaggle dataset. Link: kaggle/Weather Station Beutenberg Dataset. The primary data is stored as a single .csv file, which is later converted into a processed.csv file that is used for training.
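As a rough sanity check on the size of the raw data, the sketch below counts how many 10-minute records the stated date range would contain if the station had logged without any gaps. The function name `expected_record_count` is hypothetical, and the real .csv may hold a different number of rows because of outages or duplicated timestamps.

```python
from datetime import datetime, timedelta


def expected_record_count(start: datetime, end: datetime, step: timedelta) -> int:
    """Count the timestamps in [start, end) that are spaced `step` apart."""
    return (end - start) // step


# Assumption: coverage runs from 2004-01-01 00:00 up to, but not including,
# 2021-01-01 00:00, so all of December 31st 2020 is counted.
start = datetime(2004, 1, 1)
end = datetime(2021, 1, 1)
step = timedelta(minutes=10)

n_records = expected_record_count(start, end, step)
print(n_records)  # 894240 (6210 days x 144 records per day)
```

Comparing this figure with the row count of the downloaded file gives a quick indication of how many readings are missing.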
The data is tracked with DVC (Data Version Control), so the repository can be used flexibly: the data is not added directly to the repo but is fetched from a remote source such as AWS S3 or Google Drive. In this case, the data is stored on Google Drive.
The repository follows a strict data science project structure.
.
└── root/
    ├── .dvc/
    ├── config/
    ├── mlruns/
    ├── models/
    ├── notebooks/
    ├── results/
    └── src/
        ├── data
        ├── features
        ├── models
        └── visualization
- Create a virtual environment: Tutorial
- Clone the repository by running this command.
git clone https://github.com/sagnik1511/samay_yantra.git
- Open the directory with cmd.
- Copy this command in terminal to install dependencies.
pip install -r requirements.txt
- Installing from requirements.txt may fail with an error caused by an outdated MS Visual C++ Build installation. You can fix this problem using this.
- Go to the root directory using the cd command.
- The first step is to download the actual data into the project. Copy and run this command.
dvc pull
- If you want to run the training process, change the configuration in config/pt_training.yaml and then run this command. Keep in mind that you must run it from the root directory.
python -m src.engine.pytorch_trainer
- Further usage will be updated soon...
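Because the training command has to be launched from the repository root, a small pre-flight check can catch a wrong working directory before the trainer starts. The sketch below is not part of the repository. The helper name `missing_paths` and the choice of paths to check are assumptions, based only on the config file and module path mentioned in the steps above.

```python
from pathlib import Path

# Paths expected relative to the repository root, taken from the steps above.
# Treating these two entries as sufficient is an assumption.
REQUIRED = ("config/pt_training.yaml", "src")


def missing_paths(root: Path, required=REQUIRED) -> list[str]:
    """Return the entries of `required` that do not exist under `root`."""
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths(Path.cwd())
    if missing:
        print("Not at the repository root; missing:", ", ".join(missing))
    else:
        print("Looks like the repository root; run: python -m src.engine.pytorch_trainer")
```

Running this script from the directory where you intend to start training shows which of the expected paths, if any, cannot be found.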
You can visit the reports directory, where all the runs are stored. Currently, due to privacy concerns, the MLflow runs are not shared here.