akthammomani/ClusterLens

Release status

The current public release on PyPI is clusterlens==0.1.0. This is an early alpha / preview release of the library.
Expect small API adjustments and visual polish as we gather feedback.

ClusterLens

ClusterLens is an interpretability engine for clustered / segmented data.

You already have clusters - customer segments, user personas, product tiers, risk bands.
ClusterLens answers the harder questions:

  • What actually drives each cluster?
  • How is Cluster 1 different from Cluster 3 in a statistically meaningful way?
  • Which features make Cluster A "high value" or "high risk" compared to others?
  • How can I turn a big table into cluster narratives that non-ML stakeholders can read?

ClusterLens sits on top of any clustering method (k-means, GMM, HDBSCAN, rule-based labels, etc.).
All it requires is a DataFrame with a column that holds the cluster labels.
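
As a minimal sketch of that input contract, the example below builds a small synthetic table and attaches rule-based segment labels as one extra column. The column names (`monthly_spend`, `tenure_months`, `cluster`) and the tier thresholds are assumptions made for illustration; they are not required by ClusterLens.

```python
import numpy as np
import pandas as pd

# Synthetic customer table (illustrative columns, not a real dataset).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "monthly_spend": rng.gamma(shape=2.0, scale=150.0, size=500),
    "tenure_months": rng.integers(1, 72, size=500),
})

# Rule-based segmentation into three spend tiers. Any labeling method
# (k-means, GMM, HDBSCAN, business rules) works the same way: the result
# is stored as a single label column next to the features.
df["cluster"] = pd.cut(
    df["monthly_spend"],
    bins=[0, 150, 400, np.inf],
    labels=["low", "mid", "high"],
    include_lowest=True,
).astype(str)

print(df["cluster"].value_counts())
```

Labels produced by a clustering algorithm would be stored in exactly the same way, for example by assigning the output of a fitted model's `fit_predict` to the label column.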

Key ideas

ClusterLens wraps a train-once, reuse-everywhere pipeline:

  1. One shared train/test split, stratified by cluster.

  2. A one-vs-rest classifier per cluster (RandomForest by default, optional LightGBM / XGBoost).

  3. SHAP values computed on a held-out evaluation set for each cluster.

  4. A set of utilities that reuse this shared state to give you interpretations and exports:

    • Global & per-cluster classification metrics - get_cluster_classification_stats()
    • Per-cluster feature rankings - get_top_shap_features(...), plot_cluster_shap(...)
    • Contrastive importance between two clusters - contrastive_importance(...)
    • Distribution plots across clusters - compare_feature_across_clusters(...)
    • Markdown-ready cluster narratives - generate_cluster_narratives(...)
    • A cluster summary table and export helpers.
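
Steps 1 and 2 above can be sketched with plain scikit-learn. This is an illustration of the general technique under stated assumptions (synthetic blobs, a 25% test split, 100 trees, ROC AUC as the metric), not ClusterLens's internal code. Step 3 (SHAP on the held-out rows) is left out to keep the example dependency-light.

```python
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a clustered table (feature names are illustrative).
X_raw, labels = make_blobs(n_samples=600, centers=3, n_features=4, random_state=0)
X = pd.DataFrame(X_raw, columns=[f"f{i}" for i in range(4)])
y = pd.Series(labels, name="cluster")

# Step 1: one shared split, stratified by cluster label, so every
# cluster is represented in both the training and evaluation rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 2: one binary "this cluster vs. the rest" classifier per cluster.
models, auc = {}, {}
for c in sorted(y.unique()):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, (y_train == c).astype(int))
    models[c] = clf
    # Every cluster's model is scored on the same held-out rows.
    proba = clf.predict_proba(X_test)[:, 1]
    auc[c] = roc_auc_score((y_test == c).astype(int), proba)

print(auc)
```

Because the split is made once and shared, metrics and explanations for different clusters are computed on the same evaluation rows and are directly comparable.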

It is built to be:

  • Model-agnostic on the clustering side: ClusterLens never clusters; it interprets the labels you already have.
  • Numerically honest: Combines SHAP with effect sizes (Cohen's d, standardized median gaps, Cramér's V, lifts).
  • Report-friendly: Outputs narratives and tables you can drop directly into notebooks, dashboards, or slide decks.
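
For readers unfamiliar with the effect sizes named above, here is a small sketch of their textbook definitions in NumPy. The function names are hypothetical and the formulas are the standard ones (pooled-variance Cohen's d, uncorrected Cramér's V, rate ratio for lift); ClusterLens may compute them differently. The standardized median gap is omitted because its exact definition is not given here.

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference between two samples (pooled std)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def cramers_v(table):
    """Association strength (0 to 1) from a contingency table of counts."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    # Expected counts under independence of rows and columns.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (n * k))

def lift(in_cluster_rate, overall_rate):
    """How many times more frequent a category is inside the cluster."""
    return in_cluster_rate / overall_rate

# Numeric feature: cluster vs. rest.
print(cohens_d([2, 4, 6], [1, 3, 5]))      # 0.5
# Categorical feature: rows = in/out of cluster, columns = category.
print(cramers_v([[10, 0], [0, 10]]))       # 1.0 (perfect association)
print(lift(0.4, 0.2))                      # 2.0
```

Pairing these with SHAP matters because a feature can rank highly for a model while the actual gap between clusters is small; the effect size reports the size of that gap in the data itself.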

Library documentation

Interactive, full documentation for the library is available here. The documentation app mirrors the API, shows example calls, and is the best place to explore ClusterLens features end to end.

Notebook example: credit customer segmentation

We also provide a full, end-to-end notebook that illustrates how to use ClusterLens on a real credit customer segmentation dataset.
It walks through data preparation, fitting ClusterAnalyzer, inspecting SHAP plots, and generating narratives. The notebook can be found here.

Release roadmap

  • Current release - 0.1.0: The first public version focuses on:

    • Core ClusterAnalyzer API (fit, SHAP integration, narratives, contrastive stats).
    • RandomForest OVR models with optional LightGBM / XGBoost.
    • Summary exports and basic SHAP bar plots for each cluster.
    • A minimal, opinionated interface that works out-of-the-box on most clustered tables.
  • Next planned release - 0.1.1 (upcoming): Planned improvements include:

    • Removing deprecation warnings (e.g., upcoming seaborn changes) so notebooks stay clean.
    • Improved stability & error messages around input validation and edge cases.
    • Better visual defaults for SHAP and distribution plots (clearer labels, tighter layouts, more readable colors).
    • Minor bug fixes and doc updates based on community feedback.

If you hit an issue or have a request for 0.1.1, please open a GitHub issue - that's what will drive the next releases.

Installation

  • From PyPI:
# Fresh install:
pip install clusterlens

# Upgrade to the latest version:
pip install -U clusterlens

# With optional extras (LightGBM, XGBoost):
pip install -U "clusterlens[lightgbm,xgboost]"

# To pin a specific version:
pip install "clusterlens==0.1.0"

  • From GitHub (latest main):
# Install directly from the GitHub repo:
pip install "git+https://github.com/akthammomani/ClusterLens.git"

# With extras:
pip install "clusterlens[lightgbm,xgboost] @ git+https://github.com/akthammomani/ClusterLens.git"

  • From a local clone:
git clone https://github.com/akthammomani/ClusterLens.git
cd ClusterLens

# standard install:
pip install .

# or editable (developer) install:
pip install -e .

  • Inside a conda or virtual environment (recommended practice):
# Create and activate an environment, then install via pip:
conda create -n clusterlens-env python=3.10
conda activate clusterlens-env
pip install -U clusterlens       # or use any of the commands above

After installation you should be able to do:

from clusterlens import ClusterAnalyzer
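
If the import fails, or you want to confirm which release pip installed, a check along these lines may help. It uses only the standard library; the helper name `installed_version` is hypothetical and not part of ClusterLens.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Prints the installed release (for example "0.1.0"), or None if the
# package is not installed in the active environment.
print(installed_version("clusterlens"))
```

A result of None usually means pip installed into a different environment than the one running the interpreter.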