Re-align docs with main version
nnansters committed Nov 9, 2023
1 parent 22f47a8 commit 413970c
Showing 2 changed files with 24 additions and 140 deletions.
@@ -1,144 +1,29 @@
.. _multiclass-performance-estimation:

====================================================
================================================
Estimating Performance for Multiclass Classification
====================================================
================================================

This tutorial explains how to use NannyML to estimate the performance of multiclass classification
models in the absence of target data. To find out how :class:`~nannyml.performance_estimation.confidence_based.cbpe.CBPE` estimates performance, read the :ref:`explanation of Confidence-based
Performance Estimation<performance-estimation-deep-dive>`.
We currently support the following **standard** metrics for multiclass classification performance estimation:

.. note::
The following example uses :term:`timestamps<Timestamp>`.
These are optional but have an impact on the way data is chunked and results are plotted.
You can read more about them in the :ref:`data requirements<data_requirements_columns_timestamp>`.
* **roc_auc** - one-vs-the-rest, macro-averaged
* **f1** - macro-averaged
* **precision** - macro-averaged
* **recall** - macro-averaged
* **specificity** - macro-averaged
* **accuracy**

For more information about estimating these metrics, refer to the :ref:`standard-metric-estimation` section.
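
As an illustration of what "macro-averaged" means for the metrics listed above, the following is a minimal sketch in plain Python. It is not NannyML code: the function name ``macro_precision_recall`` and the toy labels are assumptions made for this example. Each class is scored one-vs-the-rest and the per-class scores are averaged with equal weight.

```python
def macro_precision_recall(y_true, y_pred):
    """Return (macro precision, macro recall) for multiclass labels.

    Hypothetical helper for illustration only, not part of NannyML.
    """
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Treat an undefined ratio (no predictions or no samples) as 0.0.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    # Macro-averaging: every class contributes equally, regardless of size.
    return sum(precisions) / len(classes), sum(recalls) / len(classes)


y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]
macro_precision, macro_recall = macro_precision_recall(y_true, y_pred)
print(macro_precision, macro_recall)
```

Because every class carries the same weight, a poorly predicted rare class lowers a macro-averaged metric as much as a poorly predicted common one.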

Just The Code
-------------
We also support the following *complex* metrics for multiclass classification performance estimation:

.. nbimport::
:path: ./example_notebooks/Tutorial - Estimating Performance - Multiclass Classification.ipynb
:cells: 1 3 4 6
* **confusion_matrix**

.. admonition:: **Advanced configuration**
:class: hint
For more information about estimating the confusion matrix for multiclass problems,
refer to the :ref:`multiclass-confusion-matrix-estimation` section.

- To learn how :class:`~nannyml.chunk.Chunk` works and to set up custom chunking, check out the :ref:`chunking tutorial <chunking>`.
- To learn how :class:`~nannyml.thresholds.ConstantThreshold` works and to set up custom thresholds, check out the :ref:`thresholds tutorial <thresholds>`.
.. toctree::
:maxdepth: 2

Walkthrough
-----------


For simplicity this guide is based on a synthetic dataset where the monitored model predicts
which type of credit card product new customers should be assigned to.
Check out :ref:`Credit Card Dataset<dataset-synthetic-multiclass>` to learn more about this dataset.

In order to monitor a model, NannyML needs to learn about it and set expectations from a reference dataset.
Then it can monitor the data that is subject to actual analysis, provided as the analysis dataset.
You can read more about this in our section on :ref:`data periods<data-drift-periods>`.

.. nbimport::
:path: ./example_notebooks/Tutorial - Estimating Performance - Multiclass Classification.ipynb
:cells: 1

.. nbtable::
:path: ./example_notebooks/Tutorial - Estimating Performance - Multiclass Classification.ipynb
:cell: 2

Next we create the Confidence-based Performance Estimation (:class:`~nannyml.performance_estimation.confidence_based.cbpe.CBPE`)
estimator with a list of metrics, and an optional :term:`chunking<Data Chunk>` specification. For more information about
chunking, check out the :ref:`chunking tutorial<chunking>` and its :ref:`advanced guide<chunk-data>`.

.. note::
The list of metrics specifies which performance metrics of the monitored model will be estimated.
The following metrics are currently supported:

- ``roc_auc`` - one-vs-the-rest, macro-averaged
- ``f1`` - macro-averaged
- ``precision`` - macro-averaged
- ``recall`` - macro-averaged
- ``specificity`` - macro-averaged
- ``accuracy``


.. nbimport::
:path: ./example_notebooks/Tutorial - Estimating Performance - Multiclass Classification.ipynb
:cells: 3

The :class:`~nannyml.performance_estimation.confidence_based.cbpe.CBPE`
estimator is then fitted using the
:meth:`~nannyml.performance_estimation.confidence_based.cbpe.CBPE.fit` method on the reference data.

The fitted ``estimator`` can be used to estimate performance on other data, for which performance cannot be calculated.
Typically, this would be used on the latest production data where target is missing. In our example this is
the ``analysis_df`` data.
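
To give some intuition for how performance can be estimated without targets, here is a minimal sketch of the idea behind confidence-based estimation of accuracy. It is not NannyML's implementation: the function ``estimate_accuracy`` and the toy data are assumptions, and the sketch assumes the predicted probabilities are well calibrated. Under that assumption, the probability assigned to the predicted class is the chance that the prediction is correct, so the mean of those probabilities over a chunk is an estimate of accuracy.

```python
def estimate_accuracy(predicted_labels, predicted_probabilities):
    """Estimate accuracy for one chunk without using targets.

    Hypothetical helper for illustration; assumes calibrated probabilities.
    predicted_probabilities is a list of dicts mapping class -> probability.
    """
    confidences = [
        probas[label]
        for label, probas in zip(predicted_labels, predicted_probabilities)
    ]
    return sum(confidences) / len(confidences)


predicted_labels = ["a", "b", "c"]
predicted_probabilities = [
    {"a": 0.7, "b": 0.2, "c": 0.1},
    {"a": 0.1, "b": 0.6, "c": 0.3},
    {"a": 0.25, "b": 0.25, "c": 0.5},
]
estimated_accuracy = estimate_accuracy(predicted_labels, predicted_probabilities)
print(round(estimated_accuracy, 3))
```

The estimate is only as good as the calibration of the probabilities, which is why the estimator is first fitted on reference data where targets are available.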

NannyML can then output a dataframe that contains all the results. Let's have a look at the results for the analysis period
only.

.. nbimport::
:path: ./example_notebooks/Tutorial - Estimating Performance - Multiclass Classification.ipynb
:cells: 4

.. nbtable::
:path: ./example_notebooks/Tutorial - Estimating Performance - Multiclass Classification.ipynb
:cell: 5

Apart from chunk-related data, the results data have the following columns for each metric
that was estimated:

- **value** - the estimate of a metric for a specific chunk.
- **sampling_error** - the estimate of the :term:`Sampling Error`.
- **realized** - when **target** values are available for a chunk, the realized performance metric will also
be calculated and included within the results.
- **upper_confidence_boundary** and **lower_confidence_boundary** - These values show the :term:`Confidence Band` of the relevant metric
and are equal to estimated value +/- 3 times the estimated :term:`Sampling Error`.
- **upper_threshold** and **lower_threshold** - crossing these thresholds will raise an alert on significant
performance change. The thresholds are calculated during the ``fit`` phase, based on the actual performance of the
monitored model on chunks in the reference partition. By default, the thresholds are 3 standard deviations away from
the mean performance calculated on those chunks. You can also set up custom thresholds using constant or standard deviation thresholds.
To learn more, check out our :ref:`tutorial on thresholds<thresholds>`.
- **alert** - flag indicating potentially significant performance change. ``True`` if estimated performance crosses
upper or lower threshold.
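
The relationships between the columns described above can be sketched in plain Python. This is an illustration under stated assumptions, not NannyML's implementation: the helper names ``threshold_bounds`` and ``describe_chunk``, the toy numbers, and the use of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev


def threshold_bounds(reference_values, n_std=3.0):
    """Lower and upper thresholds: mean +/- n_std standard deviations
    of the per-chunk performance on the reference data.

    Assumption: population standard deviation (pstdev) is used here.
    """
    center = mean(reference_values)
    spread = pstdev(reference_values)
    return center - n_std * spread, center + n_std * spread


def describe_chunk(value, sampling_error, lower_threshold, upper_threshold):
    """Derive confidence boundaries and the alert flag for one chunk."""
    return {
        "value": value,
        # Confidence band: estimate +/- 3 times the sampling error.
        "lower_confidence_boundary": value - 3 * sampling_error,
        "upper_confidence_boundary": value + 3 * sampling_error,
        # Alert when the estimate falls outside the thresholds.
        "alert": value < lower_threshold or value > upper_threshold,
    }


reference_performance = [0.74, 0.76, 0.75, 0.75]  # toy per-chunk values
lower, upper = threshold_bounds(reference_performance)

degraded_chunk = describe_chunk(0.70, 0.01, lower, upper)
stable_chunk = describe_chunk(0.76, 0.01, lower, upper)
print(degraded_chunk["alert"], stable_chunk["alert"])
```

Note that the alert depends only on the estimated value and the thresholds. The confidence band expresses the uncertainty of the estimate and does not trigger alerts by itself.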


These results can also be plotted. Our plot contains several key elements.

* The purple dashed step plot shows the estimated performance in each chunk of the analysis period. Thick squared point
markers indicate the middle of these chunks.

* The black vertical line splits the reference and analysis periods.

* The low-saturated colored area around the estimated performance indicates the :ref:`sampling error<estimation_of_standard_error>`.

* The red horizontal dashed lines show upper and lower thresholds for alerting purposes.

* If the estimated performance crosses the upper or lower threshold an alert is raised which is indicated with a red
diamond-shaped point marker in the middle of the chunk.

The description of the tabular results above explains how the
:term:`confidence bands<Confidence Band>` and thresholds are calculated. Additional information is shown on hover (these are
interactive plots, though only static views are included here).


.. nbimport::
:path: ./example_notebooks/Tutorial - Estimating Performance - Multiclass Classification.ipynb
:cells: 6

.. image:: ../../_static/tutorials/performance_estimation/multiclass_synthetic.svg

Insights
--------

After reviewing the performance estimation results, we should be able to see any indications of performance change that
NannyML has detected based upon the model's inputs and outputs alone.


What's next
-----------

The :ref:`Data Drift<data-drift>` functionality can help us to understand whether data drift is causing the performance problem.
When the target values become available, we can
:ref:`compare realized and estimated performance results<compare_estimated_and_realized_performance>`.
multiclass_performance_estimation/standard_metric_estimation
multiclass_performance_estimation/confusion_matrix_estimation
@@ -1,8 +1,8 @@
.. _multiclass_standard-metric-estimation:

=====================================================================
Estimating Standard Performance Metrics for Multiclass Classification
=====================================================================
====================================================
Estimating Performance for Multiclass Classification
====================================================

This tutorial explains how to use NannyML to estimate the performance of multiclass classification
models in the absence of target data. To find out how :class:`~nannyml.performance_estimation.confidence_based.cbpe.CBPE` estimates performance, read the :ref:`explanation of Confidence-based
@@ -19,7 +19,7 @@ Just The Code

.. nbimport::
:path: ./example_notebooks/Tutorial - Estimating Performance - Multiclass Classification.ipynb
:cells: 1 3 4 6 8
:cells: 1 3 4 6

.. admonition:: **Advanced configuration**
:class: hint
@@ -140,6 +140,5 @@ What's next
-----------

The :ref:`Data Drift<data-drift>` functionality can help us to understand whether data drift is causing the performance problem.
When the target values become available we can use
:ref:`realized performance calculation<multiclass-standard-metric-calculation>` to
:ref:`compare realized and estimated confusion matrix results<compare_estimated_and_realized_performance>`.
When the target values become available, we can
:ref:`compare realized and estimated performance results<compare_estimated_and_realized_performance>`.
