Releases: chrbailey/transaction-forensics
v3.0.0 — Elite: BERTopic + Network + Temporal
Transaction Forensics v3.0.0
Elite-tier upgrade. Semantic NLP, graph analysis, temporal detection.
New Capabilities
BERTopic Clustering (replaces TF-IDF + KMeans as primary)
- Sentence-Transformers (all-MiniLM-L6-v2) for semantic embeddings
- UMAP dimensionality reduction + HDBSCAN density-based clustering
- 371 semantic topics found (vs 16 KMeans clusters)
- Silhouette score: 0.09 (about 3.5x the KMeans score of 0.026)
- Topics map to recognizable themes, e.g. "kubernetes/pod/cluster", "graphql/schema", "onboarding/ux"
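The stack listed above could be wired together roughly as follows. This is a minimal sketch, not the project's actual configuration: only the model name all-MiniLM-L6-v2 comes from these notes, while every UMAP and HDBSCAN parameter value and the helper names build_topic_model and fit_topics are assumptions chosen for illustration.

```python
# Sketch only: all parameter values below are assumptions, not the project's settings.
UMAP_PARAMS = {
    "n_neighbors": 15,
    "n_components": 5,
    "min_dist": 0.0,
    "metric": "cosine",
    "random_state": 42,
}
HDBSCAN_PARAMS = {
    "min_cluster_size": 10,
    "metric": "euclidean",
    "cluster_selection_method": "eom",
    "prediction_data": True,
}


def build_topic_model():
    """Wire SBERT embeddings, UMAP, and HDBSCAN into a BERTopic model."""
    # Imported lazily so this module loads even where the heavy packages are absent.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from umap import UMAP

    return BERTopic(
        embedding_model=SentenceTransformer("all-MiniLM-L6-v2"),
        umap_model=UMAP(**UMAP_PARAMS),
        hdbscan_model=HDBSCAN(**HDBSCAN_PARAMS),
    )


def fit_topics(docs):
    """Return (topic id per document, fitted model); topic -1 marks outliers."""
    model = build_topic_model()
    topics, _probs = model.fit_transform(docs)
    return topics, model
```

Fixing random_state on UMAP makes repeated runs comparable, which matters when topic counts are reported between releases.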
Network Analysis (new)
- Communication graph via NetworkX
- Louvain community detection (7 communities found)
- Bridge user identification (betweenness centrality)
- Product silo detection (isolated products with no shared users)
- Graph density: 0.062
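Two of the measures above are simple enough to illustrate without a graph library. The sketch below assumes an undirected graph, for which density is 2E / (N(N - 1)), and treats a product as a silo when its user set shares nobody with any other product. The product and user names are hypothetical, and the project itself computes its graph metrics with NetworkX.

```python
from itertools import combinations


def graph_density(num_nodes, num_edges):
    """Density of an undirected simple graph: 2E / (N * (N - 1))."""
    if num_nodes < 2:
        return 0.0
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


def product_silos(users_by_product):
    """Return products whose user set is disjoint from every other product's."""
    shared = set()
    for (p1, u1), (p2, u2) in combinations(users_by_product.items(), 2):
        if u1 & u2:  # at least one user is active on both products
            shared.update((p1, p2))
    return sorted(p for p in users_by_product if p not in shared)


# Hypothetical data for illustration only.
users_by_product = {
    "alpha": {"ann", "bob"},
    "beta": {"bob", "cho"},
    "gamma": {"dee"},
}
print(product_silos(users_by_product))  # ['gamma']
print(graph_density(4, 3))              # 0.5
```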
Temporal Change-Point Detection (new)
- Ruptures PELT algorithm with RBF kernel
- 3 change-points detected across 549-day window
- Per-product trend analysis (increasing/stable/decreasing)
- Peak activity identification (208 messages, Oct 8 2026)
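The trend and peak bullets can be sketched in plain Python. This does not reproduce the PELT change-point step, which the project delegates to Ruptures. It only shows one plausible way to label a series as increasing, stable, or decreasing from its least-squares slope. The tolerance value and the function names are assumptions.

```python
def trend(counts, tolerance=0.05):
    """Classify a series by its least-squares slope relative to its mean level."""
    n = len(counts)
    if n < 2:
        return "stable"
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(counts))
    var = sum((i - mean_x) ** 2 for i in range(n))
    slope = cov / var
    # Normalise so the tolerance reads as "fraction of the mean level per step".
    relative = slope / mean_y if mean_y else 0.0
    if relative > tolerance:
        return "increasing"
    if relative < -tolerance:
        return "decreasing"
    return "stable"


def peak(counts_by_day):
    """Return the (day, count) pair with the highest count."""
    return max(counts_by_day.items(), key=lambda item: item[1])
```

For example, trend([1, 2, 3, 4, 5]) yields "increasing", and peak({"d1": 3, "d2": 9}) yields ("d2", 9).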
Pipeline (10 stages)
Ingest → Normalize → Embed (SBERT) → BERTopic → Network → Temporal → KMeans (stability baseline) → Bootstrap → Measure → Report
Total: ~109s on 37K docs (CPU only)
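A sequential pipeline with per-stage timing, as the total above implies, might look like the following sketch. The run_pipeline helper and the three placeholder stages are hypothetical; the real stages embed, cluster, and analyze rather than just cleaning strings.

```python
import time


def run_pipeline(stages, data):
    """Run stages in order, feeding each output to the next, and time each one."""
    timings = {}
    for name, stage in stages:
        start = time.perf_counter()
        data = stage(data)
        timings[name] = time.perf_counter() - start
    return data, timings


# Placeholder stages for illustration only.
stages = [
    ("ingest", lambda docs: list(docs)),
    ("normalize", lambda docs: [d.strip().lower() for d in docs]),
    ("measure", lambda docs: {"documents": len(docs)}),
]
result, timings = run_pipeline(stages, ["  Hello ", "WORLD"])
print(result)  # {'documents': 2}
```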
Live Demo
v2.0.0 — Evidence-Backed Scoring
Transaction Forensics v2.0.0
Addresses all reviewer feedback. Moves from pattern detection to evidence-backed scoring.
What Changed
1. Computed Metrics Replace Keyword Heuristics
- Severity now computed from: cross-team entropy, author Gini coefficient, customer density, source diversity, temporal span, reaction rate
- Every metric includes HOW it was computed (method field)
- No keyword matching anywhere in the severity pipeline
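Two of the listed inputs, cross-team entropy and the author Gini coefficient, have standard definitions, so a sketch is possible without knowing the project's code. The function names and the exact wording of the method strings are assumptions; only the idea of returning a method field alongside each value comes from these notes.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy in bits of the label distribution, with provenance."""
    counts = Counter(labels)
    total = sum(counts.values())
    value = sum((c / total) * math.log2(total / c) for c in counts.values())
    return {"value": value, "method": "Shannon entropy (bits) over label frequencies"}


def gini(values):
    """Gini coefficient of non-negative values: 0 is equal, near 1 is concentrated."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        value = 0.0
    else:
        # Rank-weighted form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
        weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
        value = 2 * weighted / (n * total) - (n + 1) / n
    return {"value": value, "method": "Gini coefficient from rank-weighted sorted values"}
```

Messages spread evenly over four teams give an entropy of 2 bits, and four authors with message counts [0, 0, 0, 4] give a Gini coefficient of 0.75, the maximum for four authors.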
2. Bootstrap Cluster Stability
- 10 runs with different random seeds
- Jaccard similarity scoring per cluster across runs
- Only clusters that are stable in more than 50% of runs are surfaced
- 4 of 15 clusters pruned as unstable in latest run
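The stability filter described above can be sketched as follows, assuming each cluster is a set of document ids and that a cluster counts as recovered in a run when its best Jaccard match reaches a threshold. The match threshold of 0.5 and all function names are assumptions; only the 50% run cutoff is stated in these notes.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: intersection size over union size."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stability(reference, runs, match_threshold=0.5):
    """Fraction of runs containing a cluster that matches `reference`."""
    if not runs:
        return 0.0
    hits = 0
    for clusters in runs:
        best = max((jaccard(reference, c) for c in clusters), default=0.0)
        if best >= match_threshold:
            hits += 1
    return hits / len(runs)


def stable_clusters(reference_clusters, runs, min_fraction=0.5):
    """Keep clusters recovered in more than `min_fraction` of runs."""
    return [c for c in reference_clusters if stability(c, runs) > min_fraction]


# Hypothetical document ids; three bootstrap runs instead of ten for brevity.
reference_clusters = [{1, 2, 3, 4}, {5, 6}]
runs = [
    [{1, 2, 3}, {5, 9}],
    [{1, 2, 3, 4}, {6, 7, 8}],
    [{2, 3, 4, 5}, {6}],
]
print(stable_clusters(reference_clusters, runs))  # [{1, 2, 3, 4}]
```

Here the second cluster is matched in only one of three runs, so it is pruned.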
3. Relational Joins
- Employee master (530 employees) cross-referenced with message authors
- Customer master (120 accounts) joined with message content
- Product-to-customer relationships surfaced
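A cross-reference of this kind amounts to a left join on the author identifier. The sketch below assumes messages carry an author_id and the employee master maps ids to a team; both field names are hypothetical and do not come from the dataset schema.

```python
def join_authors(messages, employees):
    """Attach the author's team to each message (left join on author id)."""
    joined = []
    for msg in messages:
        employee = employees.get(msg["author_id"])
        # Unknown authors are kept with team=None rather than dropped.
        joined.append({**msg, "team": employee["team"] if employee else None})
    return joined


# Hypothetical records for illustration only.
employees = {"e1": {"team": "platform"}, "e2": {"team": "support"}}
messages = [
    {"author_id": "e1", "text": "deploy finished"},
    {"author_id": "e9", "text": "who owns this?"},
]
print([m["team"] for m in join_authors(messages, employees)])  # ['platform', None]
```

Keeping unmatched rows makes gaps in the master data visible instead of silently shrinking the message count.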
Results
- 37,064 documents analyzed across 30 products
- 11 patterns surfaced, 4 pruned for instability
- Pipeline: 33 seconds, no GPU required
Live Demo
v1.0.0 — Initial Release
Transaction Forensics v1.0.0
Enterprise communication pattern analysis using NLP clustering.
What's Included
- Pattern engine (analyze.py): TF-IDF + KMeans pipeline
- Interactive viewer (public/index.html): zero-dependency static frontend
- Pre-computed analysis of Salesforce/HERB dataset (37,064 documents)
- Pipeline transparency showing computation provenance
Analysis Results
- 37,064 documents analyzed (32,781 Slack messages, 321 transcripts, 400 documents, 3,562 PRs)
- 30 products, 120 customers, 18 team members
- 10 patterns surfaced across compliance, bottleneck, communication, and approval categories
- 18-second total pipeline execution (no GPU required)
Quick Start
pip install -r requirements.txt
python analyze.py
open public/index.html
Live Demo
transaction-forensics.vercel.app
Data Source
Salesforce/HERB — Heterogeneous Enterprise Reasoning Benchmark (CC-BY-NC-4.0)