diff --git a/docs/user-guide/feature-selection.md b/docs/user-guide/feature-selection.md
index 8d10611a7..a2b62d053 100644
--- a/docs/user-guide/feature-selection.md
+++ b/docs/user-guide/feature-selection.md
@@ -2,6 +2,8 @@
 
 ## Maximum Relevance Minimum Redundancy
 
+!!! info "New in version 0.8.0"
+
 The [`Maximum Relevance Minimum Redundancy`][MaximumRelevanceMinimumRedundancy-api] (MRMR) is an iterative feature selection method commonly used in data science to select a subset of features from a larger feature set. The goal of MRMR is to choose features that have high *relevance* to the target variable while minimizing *redundancy* among the already selected features.
 
 MRMR is heavily dependent on the two functions used to determine relevace and redundancy. However, the paper [Maximum Relevanceand Minimum Redundancy Feature Selection Methods for a Marketing Machine Learning Platform](https://arxiv.org/pdf/1908.05376.pdf) shows that using [f_classif](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_classif.html) or [f_regression](https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.f_regression.html) as relevance function and Pearson correlation as redundancy function is the best choice for a variety of different problems and in general is a good choice.
@@ -57,7 +59,7 @@ Feature selection method: mrmr_smile
 F1 score: 0.849
 ```
 
-The MRMR feature selection model provides better results compared against the other methods, although the smile technique performs rather good as well.
+The MRMR feature selection model provides better results compared against the other methods, although the smile technique performs rather good as well.
 
 Finally, we can take a look at the selected features.
 
diff --git a/docs/user-guide/meta-models.md b/docs/user-guide/meta-models.md
index 97c55286c..08d4c032c 100644
--- a/docs/user-guide/meta-models.md
+++ b/docs/user-guide/meta-models.md
@@ -136,7 +136,7 @@ Note that these predictions seems to yield the lowest error but take it with a g
 
 ### Specialized Estimators
 
-!!! info "New in version 0.7.5"
+!!! info "New in version 0.8.0"
 
 Instead of using the generic `GroupedPredictor` directly, it is possible to work with _task specific_ estimators, namely: [`GroupedClassifier`][grouped-classifier-api] and [`GroupedRegressor`][grouped-regressor-api].
 
diff --git a/pyproject.toml b/pyproject.toml
index 7f32f6267..824739835 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "scikit-lego"
-version = "0.7.4"
+version = "0.8.0"
 description="A collection of lego bricks for scikit-learn pipelines"
 license = {file = "LICENSE"}
 
diff --git a/sklego/feature_selection/mrmr.py b/sklego/feature_selection/mrmr.py
index 5cb1c842e..4eceb39cd 100644
--- a/sklego/feature_selection/mrmr.py
+++ b/sklego/feature_selection/mrmr.py
@@ -83,6 +83,8 @@ class MaximumRelevanceMinimumRedundancy(SelectorMixin, BaseEstimator):
         - np.ndarray, shape = (len(left), ), The array containing the redundancy score using the custom function.
 
+    !!! info "New in version 0.8.0"
+
     Parameters
     ----------
     k : int
diff --git a/sklego/meta/grouped_predictor.py b/sklego/meta/grouped_predictor.py
index 08279c49d..1d6b2cde3 100644
--- a/sklego/meta/grouped_predictor.py
+++ b/sklego/meta/grouped_predictor.py
@@ -397,7 +397,7 @@ class GroupedRegressor(GroupedPredictor, RegressorMixin):
     Its spec is the same as [`GroupedPredictor`][sklego.meta.grouped_predictor.GroupedPredictor] but it is available only for regression models.
 
-    !!! info "New in version 0.7.5"
+    !!! info "New in version 0.8.0"
     """
 
     def fit(self, X, y):
@@ -434,7 +434,7 @@ class GroupedClassifier(GroupedPredictor, ClassifierMixin):
     Its equivalent to [`GroupedPredictor`][sklego.meta.grouped_predictor.GroupedPredictor] with `shrinkage=None` but it is available only for classification models.
 
-    !!! info "New in version 0.7.5"
+    !!! info "New in version 0.8.0"
     """
 
     def __init__(