From 3246e76c4ba3db1ad8bbf31926abb06e047710d0 Mon Sep 17 00:00:00 2001
From: Kishan Savant <66986430+NeoKish@users.noreply.github.com>
Date: Sat, 24 Feb 2024 15:47:17 +0530
Subject: [PATCH] Minor fixes (#366)

* Minor typo fixes

* Fixed some broken links

* Fixed with correct link
---
 README.md                                          | 6 +++---
 docs/how_it_works/estimation_of_standard_error.rst | 2 +-
 docs/how_it_works/multivariate_drift.rst           | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 965313bb..54da26e0 100644
--- a/README.md
+++ b/README.md
@@ -45,7 +45,7 @@
 NannyML is an open-source python library that allows you to **estimate post-deployment model performance** (without access to targets), detect data drift, and intelligently link data drift alerts back to changes in model performance. Built for data scientists, NannyML has an easy-to-use interface, interactive visualizations, is completely model-agnostic and currently supports all tabular use cases, classification and **regression**.
 
 The core contributors of NannyML have researched and developed multiple novel algorithms for estimating model performance: [confidence-based performance estimation (CBPE)](https://nannyml.readthedocs.io/en/stable/how_it_works/performance_estimation.html#confidence-based-performance-estimation-cbpe) and [direct loss estimation (DLE)](https://nannyml.readthedocs.io/en/stable/how_it_works/performance_estimation.html#direct-loss-estimation-dle).
-The nansters also invented a new approach to detect [multivariate data drift](https://nannyml.readthedocs.io/en/stable/how_it_works/data_reconstruction.html) using PCA-based data reconstruction.
+The nansters also invented a new approach to detect [multivariate data drift](https://nannyml.readthedocs.io/en/stable/how_it_works/multivariate_drift.html#data-reconstruction-with-pca) using PCA-based data reconstruction.
 
 If you like what we are working on, be sure to become a Nanster yourself, join our [community slack](https://join.slack.com/t/nannymlbeta/shared_invite/zt-16fvpeddz-HAvTsjNEyC9CE6JXbiM7BQ) and support us with a GitHub star ⭐.
 
@@ -98,9 +98,9 @@ NannyML can also **track the realised performance** of your machine learning mod
 
 ### 2. Data drift detection
 
-To detect **multivariate feature drift** NannyML uses [PCA-based data reconstruction](https://nannyml.readthedocs.io/en/main/how_it_works/data_reconstruction.html). Changes in the resulting reconstruction error are monitored over time and data drift alerts are logged when the reconstruction error in a certain period exceeds a threshold. This threshold is calculated based on the reconstruction error observed in the reference period.
+To detect **multivariate feature drift** NannyML uses [PCA-based data reconstruction](https://nannyml.readthedocs.io/en/stable/how_it_works/multivariate_drift.html#data-reconstruction-with-pca). Changes in the resulting reconstruction error are monitored over time and data drift alerts are logged when the reconstruction error in a certain period exceeds a threshold. This threshold is calculated based on the reconstruction error observed in the reference period.
 
-
+
 NannyML utilises statistical tests to detect **univariate feature drift**. We have just added a bunch of new univariate tests including Jensen-Shannon Distance and L-Infinity Distance, check out the [comprehensive list](https://nannyml.readthedocs.io/en/stable/how_it_works/univariate_drift_detection.html#methods-for-continuous-features). The results of these tests are tracked over time, properly corrected to counteract multiplicity and overlayed on the temporal feature distributions. (It is also possible to visualise the test-statistics over time, to get a notion of the drift magnitude.)
 
diff --git a/docs/how_it_works/estimation_of_standard_error.rst b/docs/how_it_works/estimation_of_standard_error.rst
index cd1cc727..0c9b8955 100644
--- a/docs/how_it_works/estimation_of_standard_error.rst
+++ b/docs/how_it_works/estimation_of_standard_error.rst
@@ -205,7 +205,7 @@ Through a simple application of error propagation:
 
 which means that the standard error of the sum is the standard error of the mean multiplied by sample size.
 
-Stnadard Deviation
+Standard Deviation
 ------------------
 
 The standard error of the variance of a random variable is given by the following exact formula:
diff --git a/docs/how_it_works/multivariate_drift.rst b/docs/how_it_works/multivariate_drift.rst
index 997d08e1..93355b1f 100644
--- a/docs/how_it_works/multivariate_drift.rst
+++ b/docs/how_it_works/multivariate_drift.rst
@@ -177,7 +177,7 @@ The classifier cross validation part uses the data created and consists of the f
 - Optionally, hyperparameter tuning is performed. The hyperparameters learnt
   during this step will be used in the model training steps below. If hyperparameter tuning
-  is not requested, user specified hyperpatameters can be used instead of the default LightGBM optioms.
+  is not requested, user specified hyperparameters can be used instead of the default LightGBM options.
 - Stratified split is used to split the data into validation folds
 - For each split NannyML trains an `LGBMClassifier` and saves its predicted scores in the validation fold.
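The README hunk above describes NannyML's multivariate drift detection idea: fit PCA on a reference period, monitor the reconstruction error over time, and raise an alert when it exceeds a threshold derived from the reference data. The sketch below illustrates that idea only; it is not NannyML's implementation, and the synthetic features, chunk size, and the mean plus three standard deviations threshold are assumptions chosen for the example.

```python
# Conceptual sketch of PCA-based reconstruction error monitoring.
# Not NannyML's code: feature names, chunking and the threshold rule are
# assumptions made purely for illustration.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)


def make_data(n: int, correlated: bool) -> pd.DataFrame:
    """Reference data has f3 tightly coupled to f1; drifted data breaks that link."""
    f1 = rng.normal(size=n)
    f2 = rng.normal(size=n)
    f3 = f1 + 0.1 * rng.normal(size=n) if correlated else rng.normal(size=n)
    return pd.DataFrame({"f1": f1, "f2": f2, "f3": f3})


def chunk_errors(df: pd.DataFrame, scaler: StandardScaler, pca: PCA, chunk_size: int) -> list[float]:
    """Mean Euclidean distance between rows and their PCA reconstruction, per chunk."""
    errors = []
    for start in range(0, len(df), chunk_size):
        x = scaler.transform(df.iloc[start:start + chunk_size])
        x_hat = pca.inverse_transform(pca.transform(x))
        errors.append(float(np.mean(np.linalg.norm(x - x_hat, axis=1))))
    return errors


reference = make_data(5000, correlated=True)
analysis = make_data(5000, correlated=False)  # the f1-f3 relationship has broken down

# Fit the scaler and PCA on the reference period only.
scaler = StandardScaler().fit(reference)
pca = PCA(n_components=2).fit(scaler.transform(reference))

# Threshold from reference chunks: mean plus three standard deviations (an assumption).
ref_errors = chunk_errors(reference, scaler, pca, chunk_size=500)
threshold = np.mean(ref_errors) + 3 * np.std(ref_errors)

for i, err in enumerate(chunk_errors(analysis, scaler, pca, chunk_size=500)):
    print(f"analysis chunk {i}: reconstruction error={err:.3f} alert={err > threshold}")
```

Because the reference correlation structure no longer holds in the analysis data, the reconstruction error of the analysis chunks jumps well above the reference-based threshold, which is exactly the signal the README paragraph describes.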
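The same README section mentions univariate drift methods such as Jensen-Shannon distance. A minimal way to compute a Jensen-Shannon drift score for one continuous feature could look like the following; the binning scheme, smoothing, and thresholds are assumptions for the sketch, not NannyML's defaults.

```python
# Sketch of a univariate drift score using Jensen-Shannon distance on binned
# histograms of a single continuous feature.
import numpy as np
from scipy.spatial.distance import jensenshannon


def js_distance(reference: np.ndarray, chunk: np.ndarray, n_bins: int = 10) -> float:
    # Bin edges come from the reference data only, so every chunk is compared
    # against the same discretisation.
    edges = np.histogram_bin_edges(reference, bins=n_bins)
    p, _ = np.histogram(reference, bins=edges)
    q, _ = np.histogram(chunk, bins=edges)
    # Add-one smoothing avoids zero-probability bins before normalising.
    p = (p + 1) / (p + 1).sum()
    q = (q + 1) / (q + 1).sum()
    return float(jensenshannon(p, q, base=2))


rng = np.random.default_rng(1)
reference = rng.normal(0.0, 1.0, size=10_000)
drifted_chunk = rng.normal(0.8, 1.0, size=2_000)

print(js_distance(reference, reference[:2_000]))  # small: same distribution
print(js_distance(reference, drifted_chunk))      # larger: the feature has shifted
```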
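The estimation_of_standard_error.rst hunk keeps the statement that the standard error of the sum is the standard error of the mean multiplied by the sample size, i.e. SE(sum) = n * (sigma / sqrt(n)) = sqrt(n) * sigma. A quick numerical check of that identity, with made-up sample size and spread:

```python
# Verify SE(sum) = n * SEM = sqrt(n) * sigma by simulation.
import numpy as np

rng = np.random.default_rng(42)
n, sigma, n_trials = 400, 2.0, 20_000

# Draw many samples of size n and look at how much their sums vary.
samples = rng.normal(0.0, sigma, size=(n_trials, n))
se_of_sum_empirical = samples.sum(axis=1).std()
se_of_sum_from_sem = n * (sigma / np.sqrt(n))

print(se_of_sum_empirical)  # roughly sqrt(400) * 2.0 = 40
print(se_of_sum_from_sem)   # exactly 40.0
```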
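The multivariate_drift.rst hunk edits the description of the classifier cross-validation steps: stratified folds, an LGBMClassifier trained per split, and predicted scores saved on each validation fold. A simplified sketch of that loop, using AUC on the out-of-fold scores as the drift signal, might look as follows; hyperparameter tuning is skipped here and default LightGBM settings stand in for tuned values, so this is an illustration of the steps rather than NannyML's internal code.

```python
# Simplified domain-classifier loop: label reference rows 0 and analysis rows 1,
# collect out-of-fold scores from an LGBMClassifier, and score separability with AUC.
# An AUC near 0.5 means the two periods are hard to tell apart (little drift).
import numpy as np
import pandas as pd
from lightgbm import LGBMClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def domain_classifier_auc(reference: pd.DataFrame, chunk: pd.DataFrame, n_splits: int = 5) -> float:
    X = pd.concat([reference, chunk], ignore_index=True)
    y = np.concatenate([np.zeros(len(reference)), np.ones(len(chunk))])
    oof_scores = np.zeros(len(X))
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, valid_idx in folds.split(X, y):
        # One classifier per split; its scores are kept only for the validation fold.
        model = LGBMClassifier(n_estimators=100)
        model.fit(X.iloc[train_idx], y[train_idx])
        oof_scores[valid_idx] = model.predict_proba(X.iloc[valid_idx])[:, 1]
    return float(roc_auc_score(y, oof_scores))


rng = np.random.default_rng(2)
cols = ["f1", "f2", "f3"]
reference = pd.DataFrame(rng.normal(size=(2000, 3)), columns=cols)
similar = pd.DataFrame(rng.normal(size=(2000, 3)), columns=cols)
shifted = pd.DataFrame(rng.normal(loc=1.0, size=(2000, 3)), columns=cols)

print(domain_classifier_auc(reference, similar))  # close to 0.5: no drift
print(domain_classifier_auc(reference, shifted))  # well above 0.5: periods are separable
```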