Restructure from scripts #8

jatkinson1000 · 2023-02-16T15:22:26Z

The code is currently run using scripts as main entry points (cmip26.py and trainScript.py).

This is not ideal for distribution: they are long scripts in which a user has to edit details ('magic numbers') before running them.
These will also not work in the proposed src/ layout.

Scripts should be broken up into functions with a simple entry point/call in the main package, and grouped with other relevant code.
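As a rough illustration of the kind of change meant here (a minimal sketch, not code from this repository): the function `coarsen_series` and its `factor` parameter are hypothetical, standing in for a processing step where a script would otherwise hardcode a number that users edit by hand.

```python
from typing import List, Sequence


def coarsen_series(values: Sequence[float], factor: int = 4) -> List[float]:
    """Average consecutive blocks of `factor` values.

    Hypothetical stand-in for a processing step; `factor` replaces a
    'magic number' that a script would hardcode.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    # Split the input into blocks of at most `factor` items each.
    blocks = [values[i:i + factor] for i in range(0, len(values), factor)]
    return [sum(block) / len(block) for block in blocks]


def main() -> None:
    """Simple entry point: callers pass settings instead of editing the file."""
    print(coarsen_series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], factor=2))
```

With this shape, the same function can be imported by other code in the package, and the setting is visible in the call rather than buried in the script body.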

However, there is a larger discussion that needs to be held to decide how to break up the code.

- Separate libraries within this repo for data acquisition/processing and model?
- Separate repositories for data acquisition/processing and model?
  - This starts to feel verbose, but is what is required for Huggingface?
- cmip26.py probably needs to be grouped with data acquisition and processing: src/gz_ocean_momentum/data/
- trainScript.py probably needs to be grouped with src/gz_ocean_momentum/train/

Remove MLFlow framework #4


raehik · 2023-05-16T16:21:16Z

This is the next large refactoring that we should look at. Mostly a question of how we configure runs, and optionally MLflow. (The training stage in particular I think depends on MLflow to locate data to process. The data processing stage cmip26.py I was able to run without MLflow.)

raehik · 2023-09-20T15:30:03Z

#85 addresses this for the data step.

MarionBWeinzierl · 2023-09-28T08:18:35Z

#95 does the training step refactor

raehik · 2023-11-08T16:48:08Z

We're closing this issue as partially done:

- The data step was refactored in #85 (refactor data step script into library (API) and consumer (CLI)), which will be properly merged in #97 (Refactor data step, inference step, Jupyter notebooks). It defines a bunch of library functions which the CLI then calls, so there is much less top-level code.
- The training step was improved in #85 / #97, with a lot of hardcoded values removed. We didn't perform a larger refactor.
- We're not splitting up the code into separate repositories. Lots of coupling remains between steps, and it would be a bit complicated to untangle this (and would mean lots more maintenance stress).
- We did general file renamings (again in #85 and #97).
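For readers unfamiliar with the library (API) and consumer (CLI) split mentioned above, here is a minimal sketch of the pattern. It is not the code from #85: `scale_values`, `build_parser`, and the `--factor` option are hypothetical names chosen for illustration.

```python
import argparse
from typing import List, Optional


# --- library (API): importable, no argument parsing, no printing ---
def scale_values(values: List[float], factor: float) -> List[float]:
    """Hypothetical library function standing in for a data-processing step."""
    return [v * factor for v in values]


# --- consumer (CLI): parses arguments, then calls the library ---
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical data-step CLI")
    parser.add_argument("values", type=float, nargs="+", help="input values")
    parser.add_argument("--factor", type=float, default=1.0, help="scaling factor")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Thin wrapper: all real work happens in the library function."""
    args = build_parser().parse_args(argv)
    print(scale_values(args.values, args.factor))
    return 0
```

Keeping `main` this thin means notebooks and tests can call the library function directly, while the command line remains only one of several possible consumers.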

jatkinson1000 added the "question" (Further information is requested) and "code review" (Resulting from Feb 2023 code review) labels on Mar 1, 2023

jatkinson1000 mentioned this issue Mar 2, 2023

Add basic pyproject.toml for installing dependencies #13

Merged

mondus added this to the First release deliverable of the project milestone May 9, 2023

raehik closed this as completed Nov 8, 2023
