From 413970cfe74006b66a572fb0a1cb7d6e545e7bce Mon Sep 17 00:00:00 2001
From: Niels Nuyttens
Date: Thu, 9 Nov 2023 15:35:37 +0100
Subject: [PATCH] Re-align docs with main version

---
 .../multiclass_performance_estimation.rst          | 151 +++---------------
 .../standard_metric_estimation.rst                 |  13 +-
 2 files changed, 24 insertions(+), 140 deletions(-)

diff --git a/docs/tutorials/performance_estimation/multiclass_performance_estimation.rst b/docs/tutorials/performance_estimation/multiclass_performance_estimation.rst
index 49e575b0..e87136bc 100644
--- a/docs/tutorials/performance_estimation/multiclass_performance_estimation.rst
+++ b/docs/tutorials/performance_estimation/multiclass_performance_estimation.rst
@@ -1,144 +1,29 @@
 .. _multiclass-performance-estimation:
 
-====================================================
+====================================================
 Estimating Performance for Multiclass Classification
-====================================================
+====================================================
 
-This tutorial explains how to use NannyML to estimate the performance of binary classification
-models in the absence of target data. To find out how :class:`~nannyml.performance_estimation.confidence_based.cbpe.CBPE` estimates performance, read the :ref:`explanation of Confidence-based
-Performance Estimation`.
+We currently support the following **standard** metrics for multiclass classification performance estimation:
 
-.. note::
-    The following example uses :term:`timestamps`.
-    These are optional but have an impact on the way data is chunked and results are plotted.
-    You can read more about them in the :ref:`data requirements`.
+    * **roc_auc** - one-vs-the-rest, macro-averaged
+    * **f1** - macro-averaged
+    * **precision** - macro-averaged
+    * **recall** - macro-averaged
+    * **specificity** - macro-averaged
+    * **accuracy**
 
+For more information about estimating these metrics, refer to the :ref:`multiclass_standard-metric-estimation` section.
 
-Just The Code
--------------
+We also support the following *complex* metrics for multiclass classification performance estimation:
 
-.. nbimport::
-    :path: ./example_notebooks/Tutorial - Estimating Performance - Multiclass Classification.ipynb
-    :cells: 1 3 4 6
+    * **confusion_matrix**
 
-.. admonition:: **Advanced configuration**
-    :class: hint
+For more information about estimating the confusion matrix for multiclass problems,
+refer to the :ref:`multiclass-confusion-matrix-estimation` section.
 
-    - To learn how :class:`~nannyml.chunk.Chunk` works and to set up custom chunkings check out the :ref:`chunking tutorial `
-    - To learn how :class:`~nannyml.thresholds.ConstantThreshold` works and to set up custom threshold check out the :ref:`thresholds tutorial `
+.. toctree::
+    :maxdepth: 2
 
-Walkthrough
------------
-
-
-For simplicity this guide is based on a synthetic dataset where the monitored model predicts
-which type of credit card product new customers should be assigned to.
-Check out :ref:`Credit Card Dataset` to learn more about this dataset.
-
-In order to monitor a model, NannyML needs to learn about it and set expectations from a reference dataset.
-Then it can monitor the data that is subject to actual analysis, provided as the analysis dataset.
-You can read more about this in our section on :ref:`data periods`.
-
-.. nbimport::
-    :path: ./example_notebooks/Tutorial - Estimating Performance - Multiclass Classification.ipynb
-    :cells: 1
-
-.. nbtable::
-    :path: ./example_notebooks/Tutorial - Estimating Performance - Multiclass Classification.ipynb
-    :cell: 2
-
-Next we create the Confidence-based Performance Estimation (:class:`~nannyml.performance_estimation.confidence_based.cbpe.CBPE`)
-estimator with a list of metrics, and an optional :term:`chunking` specification. For more information about
-chunking check out the :ref:`chunking tutorial` and it's :ref:`advanced guide`.
-
-.. note::
-    The list of metrics specifies which performance metrics of the monitored model will be estimated.
-    The following metrics are currently supported:
-
-    - ``roc_auc`` - one-vs-the-rest, macro-averaged
-    - ``f1`` - macro-averaged
-    - ``precision`` - macro-averaged
-    - ``recall`` - macro-averaged
-    - ``specificity`` - macro-averaged
-    - ``accuracy``
-
-
-.. nbimport::
-    :path: ./example_notebooks/Tutorial - Estimating Performance - Multiclass Classification.ipynb
-    :cells: 3
-
-The :class:`~nannyml.performance_estimation.confidence_based.cbpe.CBPE`
-estimator is then fitted using the
-:meth:`~nannyml.performance_estimation.confidence_based.cbpe.CBPE.fit` method on the reference data.
-
-The fitted ``estimator`` can be used to estimate performance on other data, for which performance cannot be calculated.
-Typically, this would be used on the latest production data where target is missing. In our example this is
-the ``analysis_df`` data.
-
-NannyML can then output a dataframe that contains all the results. Let's have a look at the results for analysis period
-only.
-
-.. nbimport::
-    :path: ./example_notebooks/Tutorial - Estimating Performance - Multiclass Classification.ipynb
-    :cells: 4
-
-.. nbtable::
-    :path: ./example_notebooks/Tutorial - Estimating Performance - Multiclass Classification.ipynb
-    :cell: 5
-
-Apart from chunk-related data, the results data have the following columns for each metric
-that was estimated:
-
-    - **value** - the estimate of a metric for a specific chunk.
-    - **sampling_error** - the estimate of the :term:`Sampling Error`.
-    - **realized** - when **target** values are available for a chunk, the realized performance metric will also
-      be calculated and included within the results.
-    - **upper_confidence_boundary** and **lower_confidence_boundary** - These values show the :term:`Confidence Band` of the relevant metric
-      and are equal to estimated value +/- 3 times the estimated :term:`Sampling Error`.
-    - **upper_threshold** and **lower_threshold** - crossing these thresholds will raise an alert on significant
-      performance change. The thresholds are calculated based on the actual performance of the monitored model on chunks in
-      the reference partition. By default, the thresholds are 3 standard deviations away from the mean performance calculated on
-      chunks. They are calculated during ``fit`` phase. You can also set up custom thresholds using constant or standard deviations thresholds,
-      to learn more about it check out our :ref:`tutorial on thresholds`.
-    - **alert** - flag indicating potentially significant performance change. ``True`` if estimated performance crosses
-      upper or lower threshold.
-
-
-These results can be also plotted. Our plot contains several key elements.
-
-* The purple dashed step plot shows the estimated performance in each chunk of the analysis period. Thick squared point
-  markers indicate the middle of these chunks.
-
-* The black vertical line splits the reference and analysis periods.
-
-* The low-saturated colored area around the estimated performance indicates the :ref:`sampling error`.
-
-* The red horizontal dashed lines show upper and lower thresholds for alerting purposes.
-
-* If the estimated performance crosses the upper or lower threshold an alert is raised which is indicated with a red
-  diamond-shaped point marker in the middle of the chunk.
-
-Description of tabular results above explains how the
-:term:`confidence bands` and thresholds are calculated. Additional information is shown in the hover (these are
-interactive plots, though only static views are included here).
-
-
-.. nbimport::
-    :path: ./example_notebooks/Tutorial - Estimating Performance - Multiclass Classification.ipynb
-    :cells: 6
-
-.. image:: ../../_static/tutorials/performance_estimation/multiclass_synthetic.svg
-
-Insights
---------
-
-After reviewing the performance estimation results, we should be able to see any indications of performance change that
-NannyML has detected based upon the model's inputs and outputs alone.
-
-
-What's next
------------
-
-The :ref:`Data Drift` functionality can help us to understand whether data drift is causing the performance problem.
-When the target values become available we can
-:ref:`compared realized and performance results`.
+    multiclass_performance_estimation/standard_metric_estimation
+    multiclass_performance_estimation/confusion_matrix_estimation
diff --git a/docs/tutorials/performance_estimation/multiclass_performance_estimation/standard_metric_estimation.rst b/docs/tutorials/performance_estimation/multiclass_performance_estimation/standard_metric_estimation.rst
index beccf876..75ae505f 100644
--- a/docs/tutorials/performance_estimation/multiclass_performance_estimation/standard_metric_estimation.rst
+++ b/docs/tutorials/performance_estimation/multiclass_performance_estimation/standard_metric_estimation.rst
@@ -1,8 +1,8 @@
 .. _multiclass_standard-metric-estimation:
 
-=====================================================================
-Estimating Standard Performance Metrics for Multiclass Classification
-=====================================================================
+====================================================
+Estimating Performance for Multiclass Classification
+====================================================
 
 This tutorial explains how to use NannyML to estimate the performance of binary classification
 models in the absence of target data. To find out how :class:`~nannyml.performance_estimation.confidence_based.cbpe.CBPE` estimates performance, read the :ref:`explanation of Confidence-based
@@ -19,7 +19,7 @@ Just The Code
 
 .. nbimport::
     :path: ./example_notebooks/Tutorial - Estimating Performance - Multiclass Classification.ipynb
-    :cells: 1 3 4 6 8
+    :cells: 1 3 4 6
 
 .. admonition:: **Advanced configuration**
     :class: hint
@@ -140,6 +140,5 @@ What's next
 -----------
 
 The :ref:`Data Drift` functionality can help us to understand whether data drift is causing the performance problem.
-When the target values become available we can use
-:ref:`realized performance calculation` to
-:ref:`compare realized and estimated confusion matrix results`.
+When the target values become available we can
+:ref:`compare realized and estimated performance results`.