Release status
The current public release on PyPI is clusterlens==0.1.0. This is an early alpha / preview release of the library.
Expect small API adjustments and visual polish as we gather feedback.
ClusterLens is an interpretability engine for clustered / segmented data.
You already have clusters - customer segments, user personas, product tiers, risk bands.
ClusterLens answers the harder questions:
- What actually drives each cluster?
- How is Cluster 1 different from Cluster 3 in a statistically meaningful way?
- Which features make Cluster A "high value" or "high risk" compared to others?
- How can I turn a big table into cluster narratives that non-ML stakeholders can read?
ClusterLens sits on top of any clustering method (k-means, GMM, HDBSCAN, rule-based labels, etc.).
All it requires is a DataFrame with a column that holds the cluster labels.
ClusterLens wraps a train-once, reuse-everywhere pipeline:
- One shared train/test split, stratified by cluster.
- A one-vs-rest classifier per cluster (RandomForest by default, optional LightGBM / XGBoost).
- SHAP values computed on a held-out evaluation set for each cluster.
- A set of utilities that reuse this shared state to give you interpretations and exports:
  - Global & per-cluster classification metrics: get_cluster_classification_stats()
  - Per-cluster feature rankings: get_top_shap_features(...), plot_cluster_shap(...)
  - Contrastive importance between two clusters: contrastive_importance(...)
  - Distribution plots across clusters: compare_feature_across_clusters(...)
  - Markdown-ready cluster narratives: generate_cluster_narratives(...)
  - A cluster summary table and export helpers
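To make the first two pipeline steps concrete, here is a minimal, dependency-free sketch of a split stratified by cluster and of the binary targets a one-vs-rest setup would train on. It is an illustration only, not ClusterLens's internal code: the helper names stratified_split and one_vs_rest_targets, the 25% test fraction, and the toy labels are all assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with each cluster represented proportionally."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for idx, label in enumerate(labels):
        by_cluster[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_cluster):
        members = by_cluster[label][:]
        rng.shuffle(members)
        # Keep at least one row per cluster on the held-out side.
        n_test = max(1, round(len(members) * test_fraction))
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


def one_vs_rest_targets(labels):
    """Map each cluster to a binary target: 1 if the row is in that cluster, else 0."""
    return {c: [int(label == c) for label in labels] for c in sorted(set(labels))}


# Toy cluster labels (hypothetical): 8 rows in A, 4 in B, 4 in C.
labels = ["A"] * 8 + ["B"] * 4 + ["C"] * 4
train_idx, test_idx = stratified_split(labels, test_fraction=0.25, seed=42)
targets = one_vs_rest_targets(labels)
```

Because the split is computed once and shared, every per-cluster classifier would be evaluated on the same held-out rows, which keeps metrics and SHAP values comparable across clusters.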
It is built to be:
- Model-agnostic on the clustering side: ClusterLens never clusters; it interprets the labels you already have.
- Numerically honest: Combines SHAP with effect sizes (Cohen's d, standardized median gaps, Cramér’s V, lifts).
- Report-friendly: Outputs narratives and tables you can drop directly into notebooks, dashboards, or slide decks.
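For readers unfamiliar with the effect sizes named above, the sketch below shows the textbook definitions of three of them using only the standard library. The exact variants ClusterLens computes (for example, any bias correction, or how the standardized median gap is defined) are not stated here, so treat these functions and their names as assumptions for illustration.

```python
import math
from collections import Counter
from statistics import mean, variance


def cohens_d(in_cluster, rest):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(in_cluster), len(rest)
    pooled_var = ((n1 - 1) * variance(in_cluster) + (n2 - 1) * variance(rest)) / (n1 + n2 - 2)
    return (mean(in_cluster) - mean(rest)) / math.sqrt(pooled_var)


def cramers_v(xs, ys):
    """Cramér's V (no bias correction) between two categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_counts, y_counts = Counter(xs), Counter(ys)
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    k = min(len(x_counts), len(y_counts)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


def lift(in_cluster_flags, feature_flags):
    """P(feature | cluster) divided by P(feature) over all rows."""
    overall = sum(feature_flags) / len(feature_flags)
    members = [f for c, f in zip(in_cluster_flags, feature_flags) if c]
    return (sum(members) / len(members)) / overall
```

Cohen's d suits numeric features, Cramér's V suits categorical features against cluster membership, and lift reads naturally in narratives ("members of this cluster are 1.3x as likely to have the feature").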
Interactive, full documentation for the library is available here. The documentation app mirrors the API, shows example calls, and is the best place to explore ClusterLens features end to end.
We also provide a full, end-to-end notebook that illustrates how to use ClusterLens on a real credit customer segmentation dataset.
It walks through data preparation, fitting ClusterAnalyzer, inspecting SHAP plots, and generating narratives. The notebook can be found here.
- Current release - 0.1.0: The first public version focuses on:
  - Core ClusterAnalyzer API (fit, SHAP integration, narratives, contrastive stats).
  - RandomForest OVR models with optional LightGBM / XGBoost.
  - Summary exports and basic SHAP bar plots for each cluster.
  - A minimal, opinionated interface that works out-of-the-box on most clustered tables.
- Next planned release - 0.1.1 (upcoming): Planned improvements for the next minor version include:
  - Removing deprecation warnings (e.g., upcoming seaborn changes) so notebooks stay clean.
  - Improved stability & error messages around input validation and edge cases.
  - Better visual defaults for SHAP and distribution plots (clearer labels, tighter layouts, more readable colors).
  - Minor bug fixes and doc updates based on community feedback.
If you hit an issue or have a request for 0.1.1, please open a GitHub issue - that's what will drive the next releases.
- From PyPI (recommended):
# Fresh install:
pip install clusterlens
# Upgrade to the latest version:
pip install -U clusterlens
# With optional extras (LightGBM, XGBoost):
pip install -U "clusterlens[lightgbm,xgboost]"
# To pin a specific version:
pip install "clusterlens==0.1.0"
- From GitHub (latest main):
# Install directly from the GitHub repo:
pip install "git+https://github.com/akthammomani/ClusterLens.git"
# With extras:
pip install "clusterlens[lightgbm,xgboost] @ git+https://github.com/akthammomani/ClusterLens.git"
- From a local clone:
git clone https://github.com/akthammomani/ClusterLens.git
cd ClusterLens
# standard install:
pip install .
# or editable (developer) install:
pip install -e .
- Inside a conda or virtual environment (recommended practice):
# Create and activate an environment, then install via pip:
conda create -n clusterlens-env python=3.10
conda activate clusterlens-env
pip install -U clusterlens  # or use any of the commands above
After installation you should be able to do:
from clusterlens import ClusterAnalyzer
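If that import fails, it can help to confirm which version, if any, is visible to the active interpreter. The snippet below is a small sketch using only the standard library's importlib.metadata; the helper name installed_version is an assumption for this example and is not part of ClusterLens.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed distribution version, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("clusterlens")
if version is None:
    print("clusterlens is not installed in this environment")
else:
    print(f"clusterlens {version} is installed")
```

Running this inside the same conda or virtual environment you installed into is a quick way to catch the common case where pip and the notebook kernel point at different interpreters.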