Feature/experiment tracking #122

Open
wants to merge 22 commits into develop from feature/experiment_tracking
22 commits
a9bc8dc
Initial version of experiment_tracking.py
antoniogonzalezsuarez Jan 11, 2024
2900bc8
Lazy imports
antoniogonzalezsuarez Jan 11, 2024
7e7a320
Merge branch 'development' into feature/experiment_tracking
antoniogonzalezsuarez Jan 11, 2024
495fd7e
Merge branch 'development' into feature/experiment_tracking
antoniogonzalezsuarez Jan 11, 2024
5f6ef4d
Merge remote-tracking branch 'origin/feature/experiment_tracking' int…
antoniogonzalezsuarez Jan 11, 2024
db6e65f
Added documentation
antoniogonzalezsuarez Jan 11, 2024
f2eae14
Added metrics.py to avoid sklearn dependency
antoniogonzalezsuarez Jan 11, 2024
0816e67
Added metrics.py to avoid sklearn dependency
antoniogonzalezsuarez Jan 11, 2024
de3815f
Missing documentation. Added first version of MLTracking
antoniogonzalezsuarez Jan 12, 2024
fc65ba9
Added docs.
antoniogonzalezsuarez Jan 23, 2024
b6da2a5
Bugfix with fig.close()
antoniogonzalezsuarez Jan 23, 2024
5f0991e
Update experiment_tracking.py
antoniogonzalezsuarez Feb 5, 2024
6a297c9
Make all metrics round to 4 decimals
antoniogonzalezsuarez Feb 6, 2024
26b7388
Added predict methods
antoniogonzalezsuarez Feb 14, 2024
317c9d0
Merge branch 'development' into feature/experiment_tracking
antoniogonzalezsuarez Feb 14, 2024
37932b2
Fix small issues in predict
antoniogonzalezsuarez Feb 14, 2024
f294e53
Merge remote-tracking branch 'origin/feature/experiment_tracking' int…
antoniogonzalezsuarez Feb 14, 2024
cac5120
Merge branch 'develop' into feature/experiment_tracking
antoniogonzalezsuarez Jul 22, 2024
3502189
Merge branch 'develop' into feature/experiment_tracking
antoniogonzalezsuarez Jul 23, 2024
8107ddc
Merge branch 'develop' into feature/experiment_tracking
antoniogonzalezsuarez Jul 23, 2024
8a05e53
Fixed issues on testing
antoniogonzalezsuarez Jul 24, 2024
045af73
Update docs
antoniogonzalezsuarez Jul 24, 2024
151 changes: 151 additions & 0 deletions docs/source/dev/models.rst
@@ -62,3 +62,154 @@ The following functions are tools made to work with optimization models created
the `pyomo library. <http://www.pyomo.org/>`_

.. automodule:: mango.models.pyomo

Machine Learning
================

Metrics
~~~~~~~~

As part of mango, we have implemented several metrics to evaluate the performance of the models. They are implemented in the following module.

.. automodule:: mango.models.metrics
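
For reference, the metric names that appear in the summaries below (``r2``, ``rmse``, ``mae``) correspond to the usual regression formulas. The snippet below is only a plain-numpy illustration of those formulas, not the actual API of ``mango.models.metrics``; refer to the listing above for the real function names and signatures.

.. code-block:: python

    import numpy as np

    def rmse(y_true, y_pred):
        # Root mean squared error
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

    def mae(y_true, y_pred):
        # Mean absolute error
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        return float(np.mean(np.abs(y_true - y_pred)))

    def r2(y_true, y_pred):
        # Coefficient of determination
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        ss_res = np.sum((y_true - y_pred) ** 2)
        ss_tot = np.sum((y_true - y_true.mean()) ** 2)
        return float(1 - ss_res / ss_tot)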

Enumerations
~~~~~~~~~~~~

The enumerations define the problem type and the model library.

.. automodule:: mango.models.enums
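
For example, both enumerations can be created from plain strings, and ``ProblemType`` accepts any casing thanks to its ``_missing_`` hook (see ``mango/models/enums.py``):

.. code-block:: python

    from mango.models.enums import ModelLibrary, ProblemType

    # Case-insensitive lookup handled by ProblemType._missing_
    assert ProblemType("Regression") is ProblemType.REGRESSION
    assert ProblemType("classification") is ProblemType.CLASSIFICATION

    # ModelLibrary values are plain lowercase strings
    assert ModelLibrary("scikit-learn") is ModelLibrary.SCIKIT_LEARN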

Experiment tracking
~~~~~~~~~~~~~~~~~~~~

During training, the user may develop many models, and it is important to keep track of their results. For this purpose, we have implemented several classes to track experiments. They are implemented in the following module.

The main class is MLExperiment. It keeps track of the results of an experiment, saves them in a folder structure, and provides methods to analyze them.

.. autoclass:: mango.models.experiment_tracking.MLExperiment
    :members:
    :undoc-members:
    :private-members:
    :show-inheritance:

MLTracker is a class that keeps track of the experiments. It is a simple manager built on top of the folder where all the experiments are saved, and it provides methods to analyze the results and compare experiments.

.. autoclass:: mango.models.experiment_tracking.MLTracker
    :members:
    :undoc-members:
    :private-members:
    :show-inheritance:


If the user does not want to use the MLExperiment class, the following function can be used to save the results of a trained model into a folder structure. The model is saved as a pickle file, the data as CSV files, and a summary of the model as a JSON file. This way many models (experiments) can be saved in the same folder and easily compared.

.. autofunction:: mango.models.export_model
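
A minimal sketch of a call is shown below. Note that the keyword names used here (``base_path``, ``save_dataset``, ``zip_files``, and the train/test arguments) are illustrative assumptions; the actual signature is the one documented by the autofunction directive above.

.. code-block:: python

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    from mango.models import export_model

    # Toy data, only to make the sketch self-contained
    X = np.random.rand(100, 2)
    y = X @ np.array([1.5, -2.0]) + 0.1
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = LinearRegression().fit(X_train, y_train)

    # Hypothetical call: keyword names are assumptions, check the signature above
    export_model(
        model,
        X_train=X_train,
        y_train=y_train,
        X_test=X_test,
        y_test=y_test,
        base_path="/home/user/experiments",
        save_dataset=True,
        zip_files=False,
    )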

The subfolder structure after running export_model is the following:

If not zipped:

.. code-block:: bash

    base_path
    |-- experiment_LinearRegression_20240111-133955
    |   |-- summary.json
    |   |-- data
    |   |   |-- X_test.csv
    |   |   |-- X_train.csv
    |   |   |-- y_test.csv
    |   |   `-- y_train.csv
    |   `-- model
    |       |-- hyperparameters.json
    |       `-- model.pkl

If zipped:

.. code-block:: bash

    base_path
    |-- experiment_LinearRegression_20240111-133955
    |   |-- summary.json
    |   |-- data.zip
    |   `-- model.zip

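Because a registered experiment is just files on disk, an unzipped experiment can be reloaded with the standard library alone. The sketch below assumes the unzipped layout shown above, using the timestamped folder name from the example tree:

.. code-block:: python

    import json
    import pickle
    from pathlib import Path

    experiment_path = Path("base_path") / "experiment_LinearRegression_20240111-133955"

    # Read the summary written for the experiment
    with open(experiment_path / "summary.json") as f:
        summary = json.load(f)

    # Load the pickled model back into memory
    with open(experiment_path / "model" / "model.pkl", "rb") as f:
        model = pickle.load(f)

    print(summary["results"]["test"])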

The following is an example of the summary.json file:

.. code-block:: json

    {
        "model": {
            "name": "LinearRegression",
            "problem_type": "regression",
            "input": "X_train.csv",
            "target": "y_train.csv",
            "hyperparameters": {
                "fit_intercept": true,
                "normalize": false,
                "copy_X": true,
                "n_jobs": null
            },
            "library": "sklearn"
        },
        "results": {
            "train": {
                "r2": 0.9999999999999999,
                "rmse": 0.0,
                "mae": 0.0
            },
            "test": {
                "r2": 0.9999999999999999,
                "rmse": 0.0,
                "mae": 0.0
            }
        }
    }

If save_dataset is set to True, the JSON file will also contain the following:

.. code-block:: json

    {
        "data": {
            "X_train": {
                "path": "X_train.csv",
                "shape": [100, 2]
            },
            "y_train": {
                "path": "y_train.csv",
                "shape": [100, 1]
            },
            "X_test": {
                "path": "X_test.csv",
                "shape": [100, 2]
            },
            "y_test": {
                "path": "y_test.csv",
                "shape": [100, 1]
            }
        }
    }

Model experiments
119 changes: 119 additions & 0 deletions docs/source/experiment_tracking.rst
@@ -0,0 +1,119 @@
Experiment Tracking
-------------------

This section describes how to use the experiment tracking system.

We will use the California housing dataset from sklearn as an example.

.. code-block:: python

    from sklearn.datasets import fetch_california_housing
    from sklearn.model_selection import train_test_split

    X, y = fetch_california_housing(return_X_y=True, as_frame=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.3)
    X_validation, X_test, y_validation, y_test = train_test_split(X_test, y_test, random_state=0, test_size=0.5)

Now we will create a simple pipeline to train a linear regression model and wrap it in an instance of :class:`MLExperiment<mango.models.experiment_tracking.MLExperiment>`.

.. code-block:: python

    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    from mango.models import MLExperiment

    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('regressor', LinearRegression())
    ])

    pipeline.fit(X_train, y_train)
    experiment = MLExperiment(
        model=pipeline,
        name='California Housing LinearRegression',
        description='LinearRegression on California Housing dataset',
        problem_type='regression',
        X_train=X_train,
        X_test=X_test,
        y_train=y_train,
        y_test=y_test,
        X_validation=X_validation,
        y_validation=y_validation
    )

Once the model is wrapped, several metrics are pre-computed and stored in the experiment object.

.. code-block:: python

    print(experiment.metrics)

    {
        "train_score": {
            "r2_score": 0.606,
            "mean_squared_error": 0.524,
            "mean_absolute_error": 0.524,
            "median_absolute_error": 0.524,
            "explained_variance_score": 0.606
        },
        "test_score": {
            "r2_score": 0.606,
            "mean_squared_error": 0.524,
            "mean_absolute_error": 0.524,
            "median_absolute_error": 0.524,
            "explained_variance_score": 0.606
        }
    }

This experiment can be registered with the experiment tracking system by calling the :meth:`register_experiment<mango.models.experiment_tracking.MLExperiment.register_experiment>` method.

.. code-block:: python

    experiments_folder = "/home/user/experiments"
    experiment.register_experiment(experiments_folder)


The experiment is now registered and can be viewed in the experiment tracking system.

The tracking system is used in Python through :class:`MLTracker<mango.models.experiment_tracking.MLTracker>`.

.. code-block:: python

    from mango.models import MLTracker

    tracker = MLTracker(experiments_folder)
    tracker.scan_for_experiments(experiments_folder)

If we now create another experiment using a RandomForestRegressor, we can register it with the tracking system and view it. This time we will use another way of adding the experiment to the tracking system: the :meth:`add_experiment<mango.models.experiment_tracking.MLTracker.add_experiment>` method, which adds the experiment to the tracking system and also registers it (saves it into a subfolder) for future use.

.. code-block:: python

    from sklearn.ensemble import RandomForestRegressor

    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('regressor', RandomForestRegressor())
    ])

    pipeline.fit(X_train, y_train)
    experiment = MLExperiment(
        model=pipeline,
        name='California Housing RandomForestRegressor',
        description='RandomForestRegressor on California Housing dataset',
        problem_type='regression',
        X_train=X_train,
        X_test=X_test,
        y_train=y_train,
        y_test=y_test
    )
    tracker.add_experiment(experiment, experiments_folder)


Once we have added different experiments to the tracking system, we can use the :meth:`create_compare_df<mango.models.experiment_tracking.MLTracker.create_compare_df>` method to create a dataframe that compares the experiments and shows their metrics.

.. code-block:: python

    tracker.create_compare_df()
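
Assuming the method returns the comparison as a pandas DataFrame (an assumption of this sketch, suggested by the method name), the result can be captured and inspected like any other DataFrame; the column name used below is hypothetical:

.. code-block:: python

    comparison = tracker.create_compare_df()

    # "r2_score" is a hypothetical column name, adjust it to the columns actually returned
    print(comparison.sort_values(by="r2_score", ascending=False).head())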

For more information about other methods and usages, see :class:`MLTracker<mango.models.experiment_tracking.MLTracker>`.

.. note::
    This module is still under development and some of the features described in this documentation may not be implemented yet. If you find any bug or have any suggestion, please open an issue in the `GitHub repository <https://github.com/baobabsoluciones/mango>`_.
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -12,6 +12,7 @@ Welcome to mango's documentation!

readme
changelog
experiment_tracking
genetic/index
dev/index
bib
1 change: 1 addition & 0 deletions mango/models/__init__.py
@@ -1,2 +1,3 @@
from .neural_networks import calculate_network_output
from .activations import sigmoid, tanh
from .experiment_tracking import MLExperiment, MLTracker, export_model
28 changes: 28 additions & 0 deletions mango/models/enums.py
@@ -0,0 +1,28 @@
from enum import Enum


class ProblemType(Enum):
    """
    Enum to represent the problem type.
    """

    REGRESSION = "regression"
    CLASSIFICATION = "classification"

    # When creating a new one convert to lowercase
    @classmethod
    def _missing_(cls, value: str):
        for member in cls:
            if member.value.lower() == value.lower():
                return member
        return super()._missing_(value)


class ModelLibrary(Enum):
    """
    Enum to represent the model library.
    """

    SCIKIT_LEARN = "scikit-learn"
    CATBOOST = "catboost"
    LIGHTGBM = "lightgbm"