ClusterLens

Release status

The current public release on PyPI is clusterlens==0.1.0. This is an early alpha / preview release of the library.
Expect small API adjustments and visual polish as we gather feedback.

ClusterLens

ClusterLens is an interpretability engine for clustered / segmented data.

You already have clusters - customer segments, user personas, product tiers, risk bands.
ClusterLens answers the harder questions:

What actually drives each cluster?
How is Cluster 1 different from Cluster 3 in a statistically meaningful way?
Which features make Cluster A "high value" or "high risk" compared to others?
How can I turn a big table into cluster narratives that non-ML stakeholders can read?

ClusterLens sits on top of any clustering method (k-means, GMM, HDBSCAN, rule-based labels, etc.).
All it requires is a DataFrame with a column that holds the cluster labels.
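Because ClusterLens only reads an existing label column, even hand-written business rules count as "clusters". Here is a minimal plain-Python sketch of that idea. The toy rows, the risk_band rule, its thresholds, and the "cluster" column name are all made up for illustration and are not part of the ClusterLens API.

```python
# Hypothetical rule-based segmentation: any labelling scheme works,
# as long as every row ends up with a value in a label column.
rows = [  # made-up toy records, for illustration only
    {"customer_id": 1, "balance": 120.0, "utilization": 0.15},
    {"customer_id": 2, "balance": 5400.0, "utilization": 0.92},
    {"customer_id": 3, "balance": 2300.0, "utilization": 0.48},
]

def risk_band(row):
    """Assign a risk band from credit utilization (thresholds are illustrative)."""
    if row["utilization"] >= 0.8:
        return "high_risk"
    if row["utilization"] >= 0.4:
        return "medium_risk"
    return "low_risk"

for row in rows:
    row["cluster"] = risk_band(row)  # "cluster" is a hypothetical label column name

print([row["cluster"] for row in rows])  # ['low_risk', 'high_risk', 'medium_risk']
```

With pandas, the same rule becomes a single assignment such as df["cluster"] = df.apply(risk_band, axis=1), which yields a DataFrame with a label column of the kind described above.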

Key ideas

ClusterLens wraps a train-once, reuse-everywhere pipeline:

One shared train/test split, stratified by cluster.
A one-vs-rest classifier per cluster (RandomForest by default, optional LightGBM / XGBoost).
SHAP values computed on a held-out evaluation set for each cluster.
A set of utilities that reuse this shared state to give you interpretations and exports:
- Global & per-cluster classification metrics - get_cluster_classification_stats()
- Per-cluster feature rankings - get_top_shap_features(...), plot_cluster_shap(...)
- Contrastive importance between two clusters - contrastive_importance(...)
- Distribution plots across clusters - compare_feature_across_clusters(...)
- Markdown-ready cluster narratives - generate_cluster_narratives(...)
- A cluster summary table and export helpers.
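To make the train-once idea concrete, here is a minimal plain-Python sketch of the shared state described above: a split stratified by cluster, one binary target per cluster for one-vs-rest models, and a per-cluster feature ranking by mean absolute SHAP value. It assumes nothing about ClusterLens internals. The helper names are hypothetical, and the SHAP matrix is taken as given (in practice it would come from the shap package run on each cluster's held-out rows).

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.25, seed=42):
    """Return (train_idx, test_idx) so each cluster keeps roughly its share in both sets."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for i, label in enumerate(labels):
        by_cluster[label].append(i)
    train_idx, test_idx = [], []
    for idx in by_cluster.values():
        rng.shuffle(idx)
        # At least one held-out row per cluster (clusters need >= 2 rows to also train).
        n_test = max(1, round(len(idx) * test_size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

def one_vs_rest_targets(labels):
    """Map each cluster to a binary target: 1 for rows in that cluster, 0 otherwise."""
    return {c: [int(label == c) for label in labels] for c in sorted(set(labels))}

def top_features_by_mean_abs_shap(shap_rows, feature_names, k=3):
    """Rank features by mean |SHAP| over the evaluation rows of one cluster's model."""
    n = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

labels = ["A"] * 8 + ["B"] * 4
train_idx, test_idx = stratified_split(labels, test_size=0.25)
print(len(train_idx), len(test_idx))  # 9 3
```

The same split is reused for every cluster's model, which is what keeps metrics and SHAP values comparable across clusters. If scikit-learn is available, train_test_split(X, y, test_size=0.25, stratify=labels) performs this kind of stratified split in one call.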

It is built to be:

Model-agnostic on the clustering side: ClusterLens never clusters; it interprets the labels you already have.
Numerically honest: Combines SHAP with effect sizes (Cohen's d, standardized median gaps, Cramér’s V, lifts).
Report-friendly: Outputs narratives and tables you can drop directly into notebooks, dashboards, or slide decks.
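For readers who want to see what some of the effect sizes named above measure, here is a plain-Python sketch of three of them using their textbook formulas: Cohen's d with a pooled standard deviation, Cramér's V from a chi-square statistic, and lift for one categorical level. This is an illustration only, not ClusterLens's own implementation (which may, for example, apply bias corrections), and all function names are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, stdev

def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (each sample needs >= 2 values)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * stdev(a) ** 2 + (n2 - 1) * stdev(b) ** 2) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

def cramers_v(x, y):
    """Cramér's V between two categorical sequences (no bias correction).

    Both sequences need at least two distinct values, otherwise k - 1 is zero.
    """
    n = len(x)
    joint = Counter(zip(x, y))  # observed counts; missing pairs count as 0
    row_totals, col_totals = Counter(x), Counter(y)
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))

def lift(categories, in_cluster, value):
    """P(value | cluster) / P(value overall) for one categorical level."""
    cluster_vals = [c for c, flag in zip(categories, in_cluster) if flag]
    p_cluster = cluster_vals.count(value) / len(cluster_vals)
    p_overall = categories.count(value) / len(categories)
    return p_cluster / p_overall

# Cluster members vs. the rest on a numeric feature: a large |d| means a big standardized gap.
print(cohens_d([1, 2, 3], [4, 5, 6]))  # -3.0
```

SHAP tells you which features a cluster's model relies on; effect sizes like these tell you how large the underlying difference in the raw data actually is, which is why the two are reported together.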

Library documentation

Full interactive documentation for the library is available here. The documentation app mirrors the API, shows example calls, and is the best place to explore ClusterLens features end to end.

Notebook example: credit customer segmentation

We also provide a full, end-to-end notebook that illustrates how to use ClusterLens on a real credit customer segmentation dataset.
It walks through data preparation, fitting ClusterAnalyzer, inspecting SHAP plots, and generating narratives. The notebook can be found here.

Release roadmap

Current release - 0.1.0: The first public version focuses on:
- Core ClusterAnalyzer API (fit, SHAP integration, narratives, contrastive stats).
- RandomForest OVR models with optional LightGBM / XGBoost.
- Summary exports and basic SHAP bar plots for each cluster.
- A minimal, opinionated interface that works out-of-the-box on most clustered tables.
Next planned release - 0.1.1 (upcoming): Improvements planned for this patch release include:
- Removing deprecation warnings (e.g., upcoming seaborn changes) so notebooks stay clean.
- Improved stability & error messages around input validation and edge cases.
- Better visual defaults for SHAP and distribution plots (clearer labels, tighter layouts, more readable colors).
- Minor bug fixes and doc updates based on community feedback.

If you hit an issue or have a request for 0.1.1, please open a GitHub issue - that's what will drive the next releases.

Installation

From PyPI (recommended):

# Fresh install:
pip install clusterlens

# Upgrade to the latest version:
pip install -U clusterlens

# With optional extras (LightGBM, XGBoost):
pip install -U "clusterlens[lightgbm,xgboost]"

# To pin a specific version:
pip install "clusterlens==0.1.0"

From GitHub (latest main):

# Install directly from the GitHub repo:
pip install "git+https://github.com/akthammomani/ClusterLens.git"

# With extras:
pip install "clusterlens[lightgbm,xgboost] @ git+https://github.com/akthammomani/ClusterLens.git"

From a local clone:

git clone https://github.com/akthammomani/ClusterLens.git
cd ClusterLens

# standard install:
pip install .

# or editable (developer) install:
pip install -e .

Inside a conda or virtual environment (recommended practice):

# Create and activate an environment, then install via pip:
conda create -n clusterlens-env python=3.10
conda activate clusterlens-env
pip install -U clusterlens       # or use any of the commands above

After installation you should be able to do:

from clusterlens import ClusterAnalyzer
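If the import fails, or you want to confirm which release is installed (0.1.0 at the time of writing), a quick check with the standard library can help. This is a minimal sketch: installed_version is a hypothetical helper, and it only assumes the distribution is named clusterlens on PyPI and that the optional extras install the lightgbm and xgboost modules.

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec

def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

if __name__ == "__main__":
    print("clusterlens:", installed_version("clusterlens"))
    # Optional backends pulled in by the [lightgbm,xgboost] extras.
    for module in ("lightgbm", "xgboost"):
        print(module, "available:", find_spec(module) is not None)
```

Running this inside the activated environment (for example clusterlens-env) confirms that pip installed into the same interpreter you use in your notebook.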
