feat: feature selector (#17)
* feat: feature-selector

* unit tests

* docs update

* bump version
FBruzzesi authored Jun 8, 2024
1 parent 25cc6db commit 91ba7ce
Showing 8 changed files with 79 additions and 10 deletions.
13 changes: 12 additions & 1 deletion README.md
@@ -20,7 +20,7 @@ How can you use it?
smith forge
```

-🚧 As a TUI (terminal user interface): Working in progress!
+🚧 As a TUI (terminal user interface): Work in progress!

All these tools will prompt a series of questions regarding the estimator you want to create, and then it will generate the boilerplate code for you.

@@ -66,6 +66,17 @@ and it should be compatible with scikit-learn Pipeline, GridSearchCV, etc.
Scikit-learn documentation on how to
[develop estimators](https://scikit-learn.org/dev/developers/develop.html#developing-scikit-learn-estimators).

+## Supported estimators
+
+The following types of scikit-learn estimator are supported:
+
+- Classifier
+- Regressor
+- Transformer
+- Feature Selector
+- Outlier Detector
+- Clusterer
+
## Installation

sklearn-smithy is available on [pypi](https://pypi.org/project/sklearn-smithy), so you can install it directly from there:
13 changes: 12 additions & 1 deletion docs/index.md
@@ -13,10 +13,21 @@ How can you use it?
smith forge
```

-- [ ] As a TUI (terminal user interface): [Working in progress](https://github.com/FBruzzesi/sklearn-smithy/issues/1)!
+- [ ] As a TUI (terminal user interface): [Work in progress](https://github.com/FBruzzesi/sklearn-smithy/issues/1)!

All these tools will prompt a series of questions regarding the estimator you want to create, and then it will generate the boilerplate code for you.

+## Supported estimators
+
+The following types of scikit-learn estimator are supported:
+
+- Classifier
+- Regressor
+- Transformer
+- Feature Selector
+- Outlier Detector
+- Clusterer
+
## Origin story

The idea for this tool originated from [scikit-lego #660](https://github.com/koaning/scikit-lego/pull/660){:target="_blank"}, which I cannot better explain than quoting the PR description itself:
4 changes: 2 additions & 2 deletions docs/user-guide.md
@@ -28,14 +28,14 @@ Let's see an example of how to use `smith forge` command:
```console
$ <font color="#4E9A06">smith</font> forge
# 🐍 How would you like to name the estimator?:$ MightyClassifier
-# 🎯 Which kind of estimator is it? (classifier, outlier, regressor, transformer, cluster):$ classifier
+# 🎯 Which kind of estimator is it? (classifier, outlier, regressor, transformer, cluster, feature-selector):$ classifier
# 📜 Please list the required parameters (comma-separated) []:$ alpha,beta
# 📑 Please list the optional parameters (comma-separated) []:$ mu,sigma
# 📶 Does the `.fit()` method support `sample_weight`? [y/N]:$ y
# 📏 Is the estimator linear? [y/N]:$ N
# 🎲 Should the estimator implement a `predict_proba` method? [y/N]:$ N
# ❓ Should the estimator implement a `decision_function` method? [y/N]:$ y
-# 🧪 We are almost there... Is there any tag you want to add? (comma-separated) []:$ binary_only
+# 🧪 We are almost there... Is there any tag you want to add? (comma-separated) []:$ binary_only,non_deterministic
# 📂 Where would you like to save the class? [mightyclassifier.py]:$ path/to/file.py
<span style="color: green; font-weight: bold;">Template forged at path/to/file.py </span>
```
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "sklearn-smithy"
-version = "0.0.9"
+version = "0.0.10"
description = "Toolkit to forge scikit-learn compatible estimators."
requires-python = ">=3.10"

1 change: 1 addition & 0 deletions sksmithy/_models.py
@@ -13,6 +13,7 @@ class EstimatorType(str, Enum):
RegressorMixin = "regressor"
TransformerMixin = "transformer"
ClusterMixin = "cluster"
+SelectorMixin = "feature-selector"


class TagType(str, Enum):
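
To illustrate how the new enum member behaves, here is a minimal sketch. It replicates only the four `EstimatorType` members visible in this hunk (the real enum has more above the hunk boundary), and it assumes, based on the class signature asserted in `tests/test_render.py`, that the member name is what ends up as the mixin class name in the rendered code.

```python
from enum import Enum


class EstimatorType(str, Enum):
    """Partial replica: only the members visible in this diff hunk."""

    RegressorMixin = "regressor"
    TransformerMixin = "transformer"
    ClusterMixin = "cluster"
    SelectorMixin = "feature-selector"


# Look a member up by value, e.g. from the answer typed at the prompt.
selected = EstimatorType("feature-selector")

# Assumption: the member name doubles as the scikit-learn mixin class name.
mixin_name = selected.name  # "SelectorMixin"

# Values in definition order, the way a prompt could list the choices.
choices = ", ".join(member.value for member in EstimatorType)
```

Because the enum also subclasses `str`, `selected == "feature-selector"` holds, which is what lets the Jinja template compare `estimator_type` against plain string literals.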
2 changes: 1 addition & 1 deletion sksmithy/app.py
@@ -124,7 +124,7 @@
estimator = st.selectbox(
label=PROMPT_ESTIMATOR,
options=tuple(e.value for e in EstimatorType),
-format_func=lambda x: x.capitalize(),
+format_func=lambda v: " ".join(x.capitalize() for x in v.split("-")),
index=None,
key="estimator",
)
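
The reason for the new `format_func` can be seen in a small standalone sketch. The helper name `format_option` is hypothetical (the commit uses an inline lambda), and the option tuple is copied from the prompt shown in `docs/user-guide.md`.

```python
def format_option(value: str) -> str:
    """Capitalize each hyphen-separated part and join the parts with spaces."""
    return " ".join(part.capitalize() for part in value.split("-"))


options = ("classifier", "outlier", "regressor", "transformer", "cluster", "feature-selector")

# Old behaviour: str.capitalize() leaves the hyphen and lowercases the rest.
old_label = "feature-selector".capitalize()  # "Feature-selector"

# New behaviour: each part is capitalized separately.
new_label = format_option("feature-selector")  # "Feature Selector"

labels = [format_option(option) for option in options]
```

Single-word values such as `"classifier"` are unaffected, so the change only alters the label of the new hyphenated option.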
30 changes: 27 additions & 3 deletions sksmithy/template.py.jinja
@@ -1,4 +1,4 @@
-{%- if estimator_type=='classifier' %}
+{%- if estimator_type in ('classifier', 'feature-selector') %}
import numpy as np
{% endif -%}
{%- if estimator_type == 'classifier' and linear %}
@@ -7,6 +7,9 @@ from sklearn.linear_model._base import LinearClassifierMixin
{% elif estimator_type == 'regressor' and linear%}
from sklearn.base import {{ mixin }}
from sklearn.linear_model._base import LinearModel
+{% elif estimator_type == 'feature-selector'%}
+from sklearn.base import BaseEstimator
+from sklearn.feature_selection import SelectorMixin
{% else %}
from sklearn.base import BaseEstimator, {{ mixin }}
{% endif -%}
@@ -56,7 +59,7 @@ class {{ name }}(
{% endfor -%}
{% endif %}

-def fit(self, X, y{% if estimator_type == 'transformer' %}=None{% endif %}{% if sample_weight %}, sample_weight=None{% endif %}):
+def fit(self, X, y{% if estimator_type in ('transformer', 'feature-selector') %}=None{% endif %}{% if sample_weight %}, sample_weight=None{% endif %}):
"""
Fit {{name}} estimator.

@@ -82,7 +85,7 @@ class {{ name }}(
self : {{name}}
Fitted {{name}} estimator.
"""
-{%- if estimator_type == 'transformer' %}
+{%- if estimator_type in ('transformer', 'feature-selector') %}
X = check_array(X, ...) #TODO: Fill in `check_array` arguments
{% else %}
X, y = check_X_y(X, y, ...) #TODO: Fill in `check_X_y` arguments
@@ -105,6 +108,13 @@ class {{ name }}(
{% if 'max_iter' in parameters -%}self.n_iter_ = ...{%- endif %}
{% if estimator_type=='outlier' -%}self.offset_ = ...{%- endif %}
{% if estimator_type=='cluster' -%}self.labels_ = ...{%- endif %}
+{% if estimator_type=='feature-selector'%}
+self.selected_features_ = ... # TODO: Indexes of selected features
+self.support_ = np.isin(
+np.arange(0, self.n_features_in_), # all_features
+self.selected_features_
+)
+{%- endif %}

return self

@@ -255,6 +265,20 @@ class {{ name }}(
return X_ts
{%- endif %}

+{% if estimator_type=='feature-selector' -%}
+def _get_support_mask(self, X):
+"""Get the boolean mask indicating which features are selected.
+
+Returns
+-------
+support : boolean array of shape [# input features]
+An element is True iff its corresponding feature is selected for retention.
+"""
+
+check_is_fitted(self)
+return self.support_
+{%- endif %}
+
{% if tags %}
def _more_tags(self):
return {
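
As a rough idea of what a filled-in feature selector could look like once the TODOs are completed, here is a minimal sketch. The class name `TopVarianceSelector`, the parameter `k`, and the variance-based selection rule are all hypothetical and not part of the commit. One deliberate difference from the template: as far as I can tell, scikit-learn's `SelectorMixin.get_support` calls `self._get_support_mask()` with no arguments, so the sketch omits the `X` parameter that the template declares.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.feature_selection import SelectorMixin
from sklearn.utils.validation import check_array, check_is_fitted


class TopVarianceSelector(SelectorMixin, BaseEstimator):
    """Hypothetical selector keeping the k features with the largest variance."""

    def __init__(self, k=2):
        self.k = k

    def fit(self, X, y=None):
        X = check_array(X)
        self.n_features_in_ = X.shape[1]
        # Indexes of the k highest-variance columns (illustrative rule only).
        self.selected_features_ = np.argsort(X.var(axis=0))[-self.k:]
        # Same mask construction as in the template: True where the column
        # index appears among the selected indexes.
        self.support_ = np.isin(
            np.arange(0, self.n_features_in_),
            self.selected_features_,
        )
        return self

    def _get_support_mask(self):
        # No X argument here: get_support() calls this hook without arguments.
        check_is_fitted(self)
        return self.support_


X = np.array([[0.0, 1.0, 10.0], [0.0, 2.0, 20.0], [0.0, 3.0, 30.0]])
selector = TopVarianceSelector(k=2).fit(X)
mask = selector.get_support().tolist()  # [False, True, True]
X_selected = selector.transform(X)  # keeps the last two columns
```

Only `fit` and `_get_support_mask` need to be written by hand; `get_support`, `transform` and `fit_transform` come from `SelectorMixin`.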
24 changes: 23 additions & 1 deletion tests/test_render.py
@@ -65,7 +65,7 @@ def test_common_estimator(name: str, estimator: EstimatorType, sample_weight: bo
     assert ("sample_weight = _check_sample_weight(sample_weight)" in result) == sample_weight

     match estimator:
-        case EstimatorType.TransformerMixin:
+        case EstimatorType.TransformerMixin | EstimatorType.SelectorMixin:
             assert "X = check_array(X, ...)" in result
             assert ("def fit(self, X, y=None, sample_weight=None)" in result) == (sample_weight)
             assert ("def fit(self, X, y=None)" in result) == (not sample_weight)
@@ -191,6 +191,28 @@ def test_transformer(name: str) -> None:
     assert "def predict(self, X)" not in result


+def test_feature_selector(name: str) -> None:
+    """Tests feature selector specific rendering."""
+    estimator_type = EstimatorType.SelectorMixin
+
+    result = render_template(
+        name=name,
+        estimator_type=estimator_type,
+        required=[],
+        optional=[],
+        sample_weight=False,
+        linear=False,
+        predict_proba=False,
+        decision_function=False,
+        tags=None,
+    )
+    # Feature selector specific
+    assert "class MightyEstimator(SelectorMixin, BaseEstimator)" in result
+    assert "def _get_support_mask(self, X)" in result
+    assert "self.support_" in result
+    assert "def predict(self, X)" not in result
+
+
 def test_cluster(name: str) -> None:
     """Tests cluster specific rendering."""
     estimator_type = EstimatorType.ClusterMixin