doc fixes

nikml · nikml · commit abc1d7cab4f7 · 2024-02-05T21:48:34.000+02:00
diff --git a/docs/how_it_works/performance_estimation.rst b/docs/how_it_works/performance_estimation.rst
@@ -400,7 +400,7 @@ does have it's limitations:
 **There is no covariate shift to previously unseen regions in the input space.**
     The algorithm will most likely not work if the drift happens to subregions previously unseen in the model input
     space. Mathematically we can also state that the support of the chunk data needs to be a subset of the
-    support of the reference data. In those cases density ratio estimation is theoritically not defined.
+    support of the reference data. If not, density ratio estimation is theoretically not defined.
     Practically if we don't have data from a chunk region in the reference data we can't account for that
     shift with a weighted calculation from reference data.
 
@@ -411,9 +411,9 @@ does have it's limitations:
 
 .. _how-it-works-dle:
 
------------------
-Direct Loss (DLE)
------------------
+----------------------------
+Direct Loss Estimation (DLE)
+----------------------------
 
 The Intuition
 =============
diff --git a/docs/tutorials/performance_estimation/binary_performance_estimation/business_value_estimation/cbpe.rst b/docs/tutorials/performance_estimation/binary_performance_estimation/business_value_estimation/cbpe.rst
@@ -1,8 +1,8 @@
 .. _business-value-estimation-cbpe:
 
-===================================================
-Estimating Business Value for Binary Classification
-===================================================
+=======================================
+Confidence Based Performance Estimation
+=======================================
 
 Let's see how to use NannyML how to use NannyML to estimate business value for binary classification
 models in the absence of target data. To find out how :class:`~nannyml.performance_estimation.confidence_based.cbpe.CBPE`
@@ -16,15 +16,15 @@ estimates performance, read the :ref:`explanation of Confidence-based Performanc
 .. _business-value-estimation-binary-just-the-code-cbpe:
 
 Just The Code
-----------------
+-------------
 
 .. nbimport::
     :path: ./example_notebooks/Tutorial - Estimating Business Value - Binary Classification.ipynb
     :cells: 1 3 4 5 7
 
 
 Walkthrough
---------------
+-----------
 
 For simplicity this guide is based on a synthetic dataset included in the library, where the monitored model
 predicts whether a customer will repay a loan to buy a car.
diff --git a/docs/tutorials/performance_estimation/binary_performance_estimation/business_value_estimation/iw.rst b/docs/tutorials/performance_estimation/binary_performance_estimation/business_value_estimation/iw.rst
@@ -20,7 +20,7 @@ Just The Code
 ----------------
 
 .. nbimport::
-    :path: ./example_notebooks/Tutorial - Estimating Business Value - Binary Classification.ipynb
+    :path: ./example_notebooks/Tutorial - Estimating Business Value - IW - Binary Classification.ipynb
     :cells: 1 3 4 5 7
 
 
@@ -37,11 +37,11 @@ You can read more about this in our section on :ref:`data periods<data-drift-per
 We start by loading the dataset we'll be using:
 
 .. nbimport::
-    :path: ./example_notebooks/Tutorial - Estimating Business Value - Binary Classification.ipynb
+    :path: ./example_notebooks/Tutorial - Estimating Business Value - IW - Binary Classification.ipynb
     :cells: 1
 
 .. nbtable::
-    :path: ./example_notebooks/Tutorial - Estimating Business Value - Binary Classification.ipynb
+    :path: ./example_notebooks/Tutorial - Estimating Business Value - IW - Binary Classification.ipynb
     :cell: 2
 
 Next we create the Importance Weighting
@@ -112,15 +112,15 @@ parameters:
     the business value matrix, check out the :ref:`Business Value "How it Works" page<business-value-deep-dive>`.
 
 .. nbimport::
-    :path: ./example_notebooks/Tutorial - Estimating Business Value - Binary Classification.ipynb
+    :path: ./example_notebooks/Tutorial - Estimating Business Value - IW - Binary Classification.ipynb
     :cells: 3
 
 The :class:`~nannyml.performance_estimation.importance_weighting.iw.IW`
 estimator is then fitted using the
 :meth:`~nannyml.performance_estimation.importance_weighting.iw.IW.fit` method on the reference data.
 
 .. nbimport::
-    :path: ./example_notebooks/Tutorial - Estimating Business Value - Binary Classification.ipynb
+    :path: ./example_notebooks/Tutorial - Estimating Business Value - IW - Binary Classification.ipynb
     :cells: 4
 
 The fitted ``estimator`` can be used to estimate performance on other data, for which performance cannot be calculated.
@@ -131,11 +131,11 @@ NannyML can then output a dataframe that contains all the results. Let's have a
 only.
 
 .. nbimport::
-    :path: ./example_notebooks/Tutorial - Estimating Business Value - Binary Classification.ipynb
+    :path: ./example_notebooks/Tutorial - Estimating Business Value - IW - Binary Classification.ipynb
     :cells: 5
 
 .. nbtable::
-    :path: ./example_notebooks/Tutorial - Estimating Business Value - Binary Classification.ipynb
+    :path: ./example_notebooks/Tutorial - Estimating Business Value - IW - Binary Classification.ipynb
     :cell: 6
 
 Apart from chunk-related data, the results data have the following columns for each metric
@@ -170,7 +170,7 @@ These results can be also plotted. Our plots contains several key elements.
 * *The red diamond-shaped point markers* in the middle of a chunk indicate that an alert has been raised. Alerts are caused by the estimated performance crossing the upper or lower threshold.
 
 .. nbimport::
-    :path: ./example_notebooks/Tutorial - Estimating Business Value - Binary Classification.ipynb
+    :path: ./example_notebooks/Tutorial - Estimating Business Value - IW - Binary Classification.ipynb
     :cells: 7
 
 .. image:: ../../../../_static/tutorials/performance_estimation/binary/tutorial-business-value-estimation-iw-car-loan-analysis-with-ref.svg
diff --git a/docs/tutorials/performance_estimation/binary_performance_estimation/custom_metric_estimation.rst b/docs/tutorials/performance_estimation/binary_performance_estimation/custom_metric_estimation.rst
@@ -4,7 +4,8 @@
 Creating and Estimating a Custom Binary Classification Metric
 ========================================================================================
 This tutorial explains how to use NannyML to estimate a custom metric based on :term:`confusion matrix<Confusion Matrix>` for binary classification
-models in the absence of target data. In particular, we will be creating a **balanced accuracy** metric.
+models in the absence of target data. In particular, we will be creating the **balanced accuracy** metric.
+We will do this using the CBPE algorithm, but custom metrics can also be created with the IW algorithm.
 To find out how CBPE estimates the confusion matrix components, read the :ref:`explanation of Confidence-based
 Performance Estimation<performance-estimation-deep-dive>`.
 
diff --git a/docs/tutorials/performance_estimation/multiclass_performance_estimation.rst b/docs/tutorials/performance_estimation/multiclass_performance_estimation.rst
@@ -1,8 +1,8 @@
 .. _multiclass-performance-estimation:
 
-================================================
+====================================================
 Estimating Performance for Multiclass Classification
-================================================
+====================================================
 
 We currently support the following **standard** metrics for multiclass classification performance estimation:
 
diff --git a/nannyml/performance_estimation/importance_weighting/iw.py b/nannyml/performance_estimation/importance_weighting/iw.py
@@ -263,33 +263,54 @@ def __init__(
 
         Examples
         --------
-        Using CBPE to estimate the perfomance of a model for a binary classification problem.
+
+        Using IW to estimate the performance of a model for a binary classification problem.
 
         >>> import nannyml as nml
         >>> from IPython.display import display
         >>> reference_df = nml.load_synthetic_car_loan_dataset()[0]
         >>> analysis_df = nml.load_synthetic_car_loan_dataset()[1]
         >>> display(reference_df.head(3))
-        >>> estimator = nml.CBPE(
-        ...     y_pred_proba='y_pred_proba',
-        ...     y_pred='y_pred',
-        ...     y_true='repaid',
+        >>> estimator = nml.IW(
         ...     timestamp_column_name='timestamp',
-        ...     metrics=['roc_auc', 'accuracy', 'f1'],
-        ...     chunk_size=5000,
+        ...     feature_column_names=[
+        ...         "car_value",
+        ...         "debt_to_income_ratio",
+        ...         "loan_length",
+        ...         "driver_tenure",
+        ...         "salary_range",
+        ...         "repaid_loan_on_prev_car",
+        ...         "size_of_downpayment"
+        ...     ],
+        ...     y_true='repaid',
+        ...     y_pred='y_pred',
+        ...     y_pred_proba='y_pred_proba',
+        ...     metrics=['accuracy', 'roc_auc', 'f1'],
         ...     problem_type='classification_binary',
+        ...     chunk_size=5000
         >>> )
         >>> estimator.fit(reference_df)
         >>> results = estimator.estimate(analysis_df)
         >>> display(results.filter(period='analysis').to_df())
         >>> metric_fig = results.plot()
         >>> metric_fig.show()
 
-        Using CBPE to estimate the perfomance of a model for a multiclass classification problem.
+        Using IW to estimate the performance of a model for a multiclass classification problem.
 
         >>> import nannyml as nml
+        >>> from IPython.display import display
         >>> reference_df, analysis_df, _ = nml.load_synthetic_multiclass_classification_dataset()
-        >>> estimator = nml.CBPE(
+        >>> display(reference_df.head(3))
+        >>> estimator = nml.IW(
+        ...     feature_column_names=[
+        ...             "app_behavioral_score",
+        ...             "requested_credit_limit",
+        ...             "credit_bureau_score",
+        ...             "stated_income",
+        ...             "acq_channel",
+        ...             "app_channel",
+        ...             "is_customer"
+        ...         ],
         ...     y_pred_proba={
         ...         'prepaid_card': 'y_pred_proba_prepaid_card',
         ...         'highstreet_card': 'y_pred_proba_highstreet_card',
@@ -303,6 +324,7 @@ def __init__(
         >>> )
         >>> estimator.fit(reference_df)
         >>> results = estimator.estimate(analysis_df)
+        >>> display(results.filter(period='analysis').to_df())
         >>> metric_fig = results.plot()
         >>> metric_fig.show()
         """