Feature/experiment tracking #122

Open
wants to merge 22 commits into develop from feature/experiment_tracking
22 commits
a9bc8dc
Initial version of experiment_tracking.py
antoniogonzalezsuarez Jan 11, 2024
2900bc8
Lazy imports
antoniogonzalezsuarez Jan 11, 2024
7e7a320
Merge branch 'development' into feature/experiment_tracking
antoniogonzalezsuarez Jan 11, 2024
495fd7e
Merge branch 'development' into feature/experiment_tracking
antoniogonzalezsuarez Jan 11, 2024
5f6ef4d
Merge remote-tracking branch 'origin/feature/experiment_tracking' int…
antoniogonzalezsuarez Jan 11, 2024
db6e65f
Added documentation
antoniogonzalezsuarez Jan 11, 2024
f2eae14
Added metrics.py to avoid sklearn dependency
antoniogonzalezsuarez Jan 11, 2024
0816e67
Added metrics.py to avoid sklearn dependency
antoniogonzalezsuarez Jan 11, 2024
de3815f
Missing documentation. Added first version of MLTracking
antoniogonzalezsuarez Jan 12, 2024
fc65ba9
Added docs.
antoniogonzalezsuarez Jan 23, 2024
b6da2a5
Bugfix with fig.close()
antoniogonzalezsuarez Jan 23, 2024
5f0991e
Update experiment_tracking.py
antoniogonzalezsuarez Feb 5, 2024
6a297c9
Make all metrics round to 4 decimals
antoniogonzalezsuarez Feb 6, 2024
26b7388
Added predict methods
antoniogonzalezsuarez Feb 14, 2024
317c9d0
Merge branch 'development' into feature/experiment_tracking
antoniogonzalezsuarez Feb 14, 2024
37932b2
Fix small issues in predict
antoniogonzalezsuarez Feb 14, 2024
f294e53
Merge remote-tracking branch 'origin/feature/experiment_tracking' int…
antoniogonzalezsuarez Feb 14, 2024
cac5120
Merge branch 'develop' into feature/experiment_tracking
antoniogonzalezsuarez Jul 22, 2024
3502189
Merge branch 'develop' into feature/experiment_tracking
antoniogonzalezsuarez Jul 23, 2024
8107ddc
Merge branch 'develop' into feature/experiment_tracking
antoniogonzalezsuarez Jul 23, 2024
8a05e53
Fixed issues on testing
antoniogonzalezsuarez Jul 24, 2024
045af73
Update docs
antoniogonzalezsuarez Jul 24, 2024
151 changes: 151 additions & 0 deletions docs/source/dev/models.rst
@@ -62,3 +62,154 @@ The following functions are tools made to work with optimization models created
the `pyomo library. <http://www.pyomo.org/>`_

.. automodule:: mango.models.pyomo

Machine Learning
================

Metrics
~~~~~~~~

As part of mango, we have implemented several metrics to evaluate the performance of the models. They are implemented in the following module.

.. automodule:: mango.models.metrics
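
For reference, the metric names that appear in the summaries below (``r2``, ``rmse``, ``mae``) correspond to the usual regression formulas. The snippet below is only a plain-numpy illustration of those formulas, not the actual API of ``mango.models.metrics``; refer to the listing above for the real function names and signatures.

.. code-block:: python

    import numpy as np

    def rmse(y_true, y_pred):
        # Root mean squared error
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

    def mae(y_true, y_pred):
        # Mean absolute error
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        return float(np.mean(np.abs(y_true - y_pred)))

    def r2(y_true, y_pred):
        # Coefficient of determination
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        ss_res = np.sum((y_true - y_pred) ** 2)
        ss_tot = np.sum((y_true - y_true.mean()) ** 2)
        return float(1 - ss_res / ss_tot)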

Enumerations
~~~~~~~~~~~~

The enumerations define the problem type and the model library.

.. automodule:: mango.models.enums
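
For example, both enumerations can be created from plain strings, and ``ProblemType`` accepts any casing thanks to its ``_missing_`` hook (see ``mango/models/enums.py``):

.. code-block:: python

    from mango.models.enums import ModelLibrary, ProblemType

    # Case-insensitive lookup handled by ProblemType._missing_
    assert ProblemType("Regression") is ProblemType.REGRESSION
    assert ProblemType("classification") is ProblemType.CLASSIFICATION

    # ModelLibrary values are plain lowercase strings
    assert ModelLibrary("scikit-learn") is ModelLibrary.SCIKIT_LEARN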

Experiment tracking
~~~~~~~~~~~~~~~~~~~~

During training, the user may develop many models, and it is important to keep track of their results. For this purpose, we have implemented several classes to track experiments. They are implemented in the following module.

The main class is MLExperiment. It keeps track of the results of an experiment, saves them in a folder structure, and provides methods to analyze them.

.. autoclass:: mango.models.experiment_tracking.MLExperiment
    :members:
    :undoc-members:
    :private-members:
    :show-inheritance:

MLTracker is a class that keeps track of the experiments. It is a simple manager built on top of the folder where all the experiments are saved, and it provides methods to analyze the results and compare experiments.

.. autoclass:: mango.models.experiment_tracking.MLTracker
    :members:
    :undoc-members:
    :private-members:
    :show-inheritance:


If the user does not want to use the MLExperiment class, the following function can be used to save the results of a trained model into a folder structure. The model is saved as a pickle file, the data as CSV files, and a summary of the model as a JSON file. This way many models (experiments) can be saved in the same folder and easily compared.

.. autofunction:: mango.models.export_model
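
A minimal sketch of a call is shown below. Note that the keyword names used here (``base_path``, ``save_dataset``, ``zip_files``, and the train/test arguments) are illustrative assumptions; the actual signature is the one documented by the autofunction directive above.

.. code-block:: python

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    from mango.models import export_model

    # Toy data, only to make the sketch self-contained
    X = np.random.rand(100, 2)
    y = X @ np.array([1.5, -2.0]) + 0.1
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = LinearRegression().fit(X_train, y_train)

    # Hypothetical call: keyword names are assumptions, check the signature above
    export_model(
        model,
        X_train=X_train,
        y_train=y_train,
        X_test=X_test,
        y_test=y_test,
        base_path="/home/user/experiments",
        save_dataset=True,
        zip_files=False,
    )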

The subfolder structure after running export_model is the following:

If not zipped:

.. code-block:: bash

    base_path
    |-- experiment_LinearRegression_20240111-133955
    |   |-- summary.json
    |   |-- data
    |   |   |-- X_test.csv
    |   |   |-- X_train.csv
    |   |   |-- y_test.csv
    |   |   `-- y_train.csv
    |   `-- model
    |       |-- hyperparameters.json
    |       `-- model.pkl

If zipped:

.. code-block:: bash

    base_path
    |-- experiment_LinearRegression_20240111-133955
    |   |-- summary.json
    |   |-- data.zip
    |   `-- model.zip

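Because a registered experiment is just files on disk, an unzipped experiment can be reloaded with the standard library alone. The sketch below assumes the unzipped layout shown above, using the timestamped folder name from the example tree:

.. code-block:: python

    import json
    import pickle
    from pathlib import Path

    experiment_path = Path("base_path") / "experiment_LinearRegression_20240111-133955"

    # Read the summary written for the experiment
    with open(experiment_path / "summary.json") as f:
        summary = json.load(f)

    # Load the pickled model back into memory
    with open(experiment_path / "model" / "model.pkl", "rb") as f:
        model = pickle.load(f)

    print(summary["results"]["test"])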

The following is an example of the summary.json file:

.. code-block:: json

    {
        "model": {
            "name": "LinearRegression",
            "problem_type": "regression",
            "input": "X_train.csv",
            "target": "y_train.csv",
            "hyperparameters": {
                "fit_intercept": true,
                "normalize": false,
                "copy_X": true,
                "n_jobs": null
            },
            "library": "sklearn"
        },
        "results": {
            "train": {
                "r2": 0.9999999999999999,
                "rmse": 0.0,
                "mae": 0.0
            },
            "test": {
                "r2": 0.9999999999999999,
                "rmse": 0.0,
                "mae": 0.0
            }
        }
    }

If save_dataset is set to True, the JSON file will also contain the following:

.. code-block:: json

    {
        "data": {
            "X_train": {
                "path": "X_train.csv",
                "shape": [100, 2]
            },
            "y_train": {
                "path": "y_train.csv",
                "shape": [100, 1]
            },
            "X_test": {
                "path": "X_test.csv",
                "shape": [100, 2]
            },
            "y_test": {
                "path": "y_test.csv",
                "shape": [100, 1]
            }
        }
    }

Model experiments
119 changes: 119 additions & 0 deletions docs/source/experiment_tracking.rst
@@ -0,0 +1,119 @@
Experiment Tracking
-------------------

This section describes how to use the experiment tracking system.

We will use the California housing dataset from sklearn as an example.

.. code-block:: python

    from sklearn.datasets import fetch_california_housing
    from sklearn.model_selection import train_test_split

    X, y = fetch_california_housing(return_X_y=True, as_frame=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.3)
    X_validation, X_test, y_validation, y_test = train_test_split(X_test, y_test, random_state=0, test_size=0.5)

Now we will create a simple pipeline to train a linear regression model and wrap it in an instance of :class:`MLExperiment<mango.models.experiment_tracking.MLExperiment>`.

.. code-block:: python

    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    from mango.models import MLExperiment

    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('regressor', LinearRegression())
    ])

    pipeline.fit(X_train, y_train)
    experiment = MLExperiment(
        model=pipeline,
        name='California Housing LinearRegression',
        description='LinearRegression on California Housing dataset',
        problem_type='regression',
        X_train=X_train,
        X_test=X_test,
        y_train=y_train,
        y_test=y_test,
        X_validation=X_validation,
        y_validation=y_validation
    )

Once the model is wrapped, several metrics are pre-computed and stored in the experiment object.

.. code-block:: python

    print(experiment.metrics)

    {
        "train_score": {
            "r2_score": 0.606,
            "mean_squared_error": 0.524,
            "mean_absolute_error": 0.524,
            "median_absolute_error": 0.524,
            "explained_variance_score": 0.606
        },
        "test_score": {
            "r2_score": 0.606,
            "mean_squared_error": 0.524,
            "mean_absolute_error": 0.524,
            "median_absolute_error": 0.524,
            "explained_variance_score": 0.606
        }
    }

This experiment can be registered with the experiment tracking system by calling the :meth:`register_experiment<mango.models.experiment_tracking.MLExperiment.register_experiment>` method.

.. code-block:: python

    experiments_folder = "/home/user/experiments"
    experiment.register_experiment(experiments_folder)


The experiment is now registered and can be viewed in the experiment tracking system.

The tracking system is used in Python through :class:`MLTracker<mango.models.experiment_tracking.MLTracker>`.

.. code-block:: python

    from mango.models import MLTracker

    tracker = MLTracker(experiments_folder)
    tracker.scan_for_experiments(experiments_folder)

If we now create another experiment using a RandomForestRegressor, we can register it with the tracking system and view it. This time we will use another way of adding the experiment to the tracking system: the :meth:`add_experiment<mango.models.experiment_tracking.MLTracker.add_experiment>` method, which adds the experiment to the tracking system and also registers it (saves it into a subfolder) for future use.

.. code-block:: python

    from sklearn.ensemble import RandomForestRegressor

    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('regressor', RandomForestRegressor())
    ])

    pipeline.fit(X_train, y_train)
    experiment = MLExperiment(
        model=pipeline,
        name='California Housing RandomForestRegressor',
        description='RandomForestRegressor on California Housing dataset',
        problem_type='regression',
        X_train=X_train,
        X_test=X_test,
        y_train=y_train,
        y_test=y_test
    )
    tracker.add_experiment(experiment, experiments_folder)


Once we have added different experiments to the tracking system, we can use the :meth:`create_compare_df<mango.models.experiment_tracking.MLTracker.create_compare_df>` method to create a dataframe that compares the experiments and shows their metrics.

.. code-block:: python

    tracker.create_compare_df()
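
Assuming the method returns the comparison as a pandas DataFrame (an assumption of this sketch, suggested by the method name), the result can be captured and inspected like any other DataFrame; the column name used below is hypothetical:

.. code-block:: python

    comparison = tracker.create_compare_df()

    # "r2_score" is a hypothetical column name, adjust it to the columns actually returned
    print(comparison.sort_values(by="r2_score", ascending=False).head())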

For more information about other methods and usages, see :class:`MLTracker<mango.models.experiment_tracking.MLTracker>`.

.. note::
    This module is still under development and some of the features described in this documentation may not be implemented yet. If you find any bug or have any suggestion, please open an issue in the `GitHub repository <https://github.com/baobabsoluciones/mango>`_.
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -12,6 +12,7 @@ Welcome to mango's documentation!

readme
changelog
experiment_tracking
genetic/index
dev/index
bib
1 change: 1 addition & 0 deletions mango/models/__init__.py
@@ -1,2 +1,3 @@
from .neural_networks import calculate_network_output
from .activations import sigmoid, tanh
from .experiment_tracking import MLExperiment, MLTracker, export_model
28 changes: 28 additions & 0 deletions mango/models/enums.py
@@ -0,0 +1,28 @@
from enum import Enum


class ProblemType(Enum):
    """
    Enum to represent the problem type.
    """

    REGRESSION = "regression"
    CLASSIFICATION = "classification"

    # When creating a new one convert to lowercase
    @classmethod
    def _missing_(cls, value: str):
        for member in cls:
            if member.value.lower() == value.lower():
                return member
        return super()._missing_(value)


class ModelLibrary(Enum):
    """
    Enum to represent the model library.
    """

    SCIKIT_LEARN = "scikit-learn"
    CATBOOST = "catboost"
    LIGHTGBM = "lightgbm"