Merge branch 'refs/heads/main' into add_more_multiclass

NannyML · Jul 19, 2024 · 238ae8d
2 parents 882e230 + 3b0242b
Showing 19 changed files with 616 additions and 827 deletions.
diff --git a/.bumpversion.cfg b/.bumpversion.cfg
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.10.7
+current_version = 0.11.0
 commit = True
 tag = True
 

diff --git a/.github/workflows/dev.yml b/.github/workflows/dev.yml
@@ -20,7 +20,7 @@ jobs:
     # The type of runner that the job will run on
     strategy:
       matrix:
-        python-versions: ['3.7', '3.8', '3.9', '3.10', '3.11']
+        python-versions: ['3.8', '3.9', '3.10', '3.11']
         os: [ubuntu-20.04]
 #        os: [ubuntu-18.04, macos-latest, windows-latest]
     runs-on: ${{ matrix.os }}

diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml
@@ -12,6 +12,9 @@ on:
   # Allows you to run this workflow manually from the Actions tab
   workflow_dispatch:
 
+permissions:
+  contents: write
+
 # A workflow run is made up of one or more jobs that can run sequentially or in parallel
 jobs:
   # This workflow contains a single job called "release"
@@ -87,4 +90,4 @@ jobs:
         with:
           user: __token__
           password: ${{ secrets.PYPI_API_TOKEN }}
-          skip_existing: true
+          skip-existing: true
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,30 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.11.0] - 2024-07-19
+
+### Changed
+
+- Updated `Pydantic` to `^2.7.4`, `SQLModel` to `^0.0.19`. [(#401)](https://github.com/NannyML/nannyml/issues/401)
+- Removed the `drop_duplicates` step from the `DomainClassifier` for a further speedup. [(#402)](https://github.com/NannyML/nannyml/issues/402)
+- Reverted to previous working dependency configuration for `matplotlib` as the current one causes issues in `conda`. [(#403)](https://github.com/NannyML/nannyml/issues/403)
+
+### Fixed
+
+- Added `DomainClassifier` method for drift detection to be run in the CLI.
+- Fixed `NaN` handling for multiclass confusion matrix estimation in CBPE. [(#400)](https://github.com/NannyML/nannyml/issues/400)
+- Fixed incorrect handling of columns marked as categorical in Wasserstein and Hellinger drift detection methods.
+  The `treat_as_categorical` value was ignored. We've also added a `treat_as_continuous` column to explicitly mark columns as continuous.
+  [(#404)](https://github.com/NannyML/nannyml/issues/404)
+- Fixed an issue with multiclass `AUROC` calculation and estimation when not all classes are available in a
+  reference chunk during fitting. [(#405)](https://github.com/NannyML/nannyml/issues/405)
+
+### Added
+
+- Added a new data quality calculator to check if continuous values in analysis data are within the ranges
+  encountered in the reference data. Big thanks to [@jnesfield](https://github.com/jnesfield)! Still needs some documentation...
+  [(#408)](https://github.com/NannyML/nannyml/issues/408)
+
 ## [0.10.7] - 2024-06-07
 
 ### Changed
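
Note on the 0.11.0 "Added" entry above: the new data quality calculator checks whether continuous values in analysis data stay within the ranges seen in reference data. A minimal sketch of that idea follows. It is not NannyML's implementation; the helper names `fit_ranges` and `count_out_of_range` are hypothetical, and plain dictionaries of lists stand in for DataFrames.

```python
from typing import Dict, Mapping, Sequence, Tuple

Ranges = Dict[str, Tuple[float, float]]


def fit_ranges(reference: Mapping[str, Sequence[float]]) -> Ranges:
    """Record the (min, max) observed per column in the reference data."""
    return {col: (min(vals), max(vals)) for col, vals in reference.items()}


def count_out_of_range(
    analysis: Mapping[str, Sequence[float]], ranges: Ranges
) -> Dict[str, int]:
    """Count analysis values strictly below the reference min or above the max."""
    counts: Dict[str, int] = {}
    for col, (low, high) in ranges.items():
        counts[col] = sum(1 for v in analysis[col] if v < low or v > high)
    return counts


# Hypothetical toy data, for illustration only.
reference = {"age": [21, 35, 48, 62], "income": [1200.0, 3400.0, 5100.0]}
analysis = {"age": [19, 40, 70], "income": [2000.0, 5100.0, 9000.0]}

ranges = fit_ranges(reference)
out_of_range = count_out_of_range(analysis, ranges)
print(out_of_range)
```

In this sketch values equal to a reference bound count as in range, and the check runs over the whole analysis set at once. The changelog entry does not say how the calculator treats either point, so both are assumptions; a per-chunk version would apply the same count to each chunk.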

diff --git a/README.md b/README.md
@@ -71,15 +71,15 @@ Allowing you to have the following benefits:
 | 🔬 **[Technical reference]**                                                                                    | Monitor the performance of your ML models.                                             |
 | 🔎 **[Blog]**                                                                                                   | Thoughts on post-deployment data science from the NannyML team.                        |
 | 📬 **[Newsletter]**                                                                                             | All things post-deployment data science. Subscribe to see the latest papers and blogs. |
-| 💎 **[New in v0.10.7]**                                                                                          | New features, bug fixes.                                                               |
+| 💎 **[New in v0.11.0]**                                                                                          | New features, bug fixes.                                                               |
 | 🧑‍💻 **[Contribute]**                                                                                             | How to contribute to the NannyML project and codebase.                                 |
 | <img src="https://raw.githubusercontent.com/NannyML/nannyml/main/media/slack.png" height='15'> **[Join slack]** | Need help with your specific use case? Say hi on slack!                                |
 
 [nannyml 101]: https://nannyml.readthedocs.io/en/stable/
 [performance estimation]: https://nannyml.readthedocs.io/en/stable/how_it_works/performance_estimation.html
 [key concepts]: https://nannyml.readthedocs.io/en/stable/glossary.html
 [technical reference]: https://nannyml.readthedocs.io/en/stable/nannyml/modules.html
-[new in v0.10.7]: https://github.com/NannyML/nannyml/releases/latest/
+[new in v0.11.0]: https://github.com/NannyML/nannyml/releases/latest/
 [real world example]: https://nannyml.readthedocs.io/en/stable/examples/california_housing.html
 [blog]: https://www.nannyml.com/blog
 [newsletter]: https://mailchi.mp/022c62281d13/postdeploymentnewsletter
@@ -264,11 +264,11 @@ Curious what we are working on next? Have a look at our [roadmap](https://bit.ly
 
 To cite NannyML in academic papers, please use the following BibTeX entry.
 
-### Version 0.10.7
+### Version 0.11.0
 
 ```
     @misc{nannyml,
-        title = {{N}anny{ML} (release 0.10.7)},
+        title = {{N}anny{ML} (release 0.11.0)},
         howpublished = {\url{https://github.com/NannyML/nannyml}},
         month = mar,
         year = 2023,

diff --git a/nannyml/__init__.py b/nannyml/__init__.py
@@ -31,15 +31,15 @@
 # Dev branch marker is: 'X.Y.dev' or 'X.Y.devN' where N is an integer.
 # 'X.Y.dev0' is the canonical version of 'X.Y.dev'
 #
-__version__ = '0.10.7'
+__version__ = '0.11.0'
 
 import logging
 
 from dotenv import load_dotenv
 
 from .calibration import Calibrator, IsotonicCalibrator, needs_calibration
 from .chunk import Chunk, Chunker, CountBasedChunker, DefaultChunker, PeriodBasedChunker, SizeBasedChunker
-from .data_quality import MissingValuesCalculator, UnseenValuesCalculator
+from .data_quality import MissingValuesCalculator, UnseenValuesCalculator, NumericalRangeCalculator
 from .datasets import (
     load_modified_california_housing_dataset,
     load_synthetic_binary_classification_dataset,

diff --git a/nannyml/data_quality/__init__.py b/nannyml/data_quality/__init__.py
@@ -7,3 +7,4 @@
 
 from .missing import MissingValuesCalculator
 from .unseen import UnseenValuesCalculator
+from .range import NumericalRangeCalculator
diff --git a/nannyml/data_quality/missing/calculator.py b/nannyml/data_quality/missing/calculator.py
@@ -76,8 +76,7 @@ def __init__(
         ...     timestamp_column_name='timestamp',
         ... ).fit(reference_df)
         >>> res = calc.calculate(analysis_df)
-        >>> for column_name in res.feature_column_names:
-        ...     res = res.filter(period='analysis', column_name=column_name).plot().show()
+        >>> res.filter(period='analysis').plot().show()
         """
         super(MissingValuesCalculator, self).__init__(
             chunk_size, chunk_number, chunk_period, chunker, timestamp_column_name

diff --git a/nannyml/data_quality/missing/result.py b/nannyml/data_quality/missing/result.py
@@ -79,8 +79,7 @@ def plot(
         ...     timestamp_column_name='timestamp',
         ... ).fit(reference)
         >>> res = calc.calculate(analysis)
-        >>> for column_name in res.column_names:
-        ...     res = res.filter(period='analysis', column_name=column_name).plot().show()
+        >>> res.filter(period='analysis').plot().show()
 
         """
         return plot_metrics(

diff --git a/nannyml/data_quality/range/__init__.py b/nannyml/data_quality/range/__init__.py
@@ -0,0 +1,8 @@
+#  Author:  James Nesfield <jamesnesfield@live.com>
+#
+#  License: Apache Software License 2.0
+
+"""Package containing the Data Quality Calculators implementation."""
+
+from .calculator import NumericalRangeCalculator
+from .result import Result