Releases: chrbailey/transaction-forensics

v3.0.0 — Elite: BERTopic + Network + Temporal

25 Mar 21:23

Transaction Forensics v3.0.0

Elite-tier upgrade adding semantic NLP, graph analysis, and temporal change-point detection.

New Capabilities

BERTopic Clustering (replaces TF-IDF + KMeans as primary)

  • Sentence-Transformers (all-MiniLM-L6-v2) for semantic embeddings
  • UMAP dimensionality reduction + HDBSCAN density-based clustering
  • 371 semantic topics found (vs 16 KMeans clusters)
  • Silhouette score: 0.09 (about 3.5x the KMeans score of 0.026)
  • Topics are semantically coherent, e.g. "kubernetes/pod/cluster", "graphql/schema", "onboarding/ux"
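The silhouette figures above can be read against the standard definition of the score. The following is a minimal pure-Python sketch of that definition, not the project's code: the Euclidean distance, the function name `silhouette_score`, and the toy points are assumptions, since the release does not state which metric or embedding space was used.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])   # labels follow the geometry
bad = silhouette_score(points, [0, 1, 0, 1])    # labels cut across it

# Ratio of the two scores reported in the release notes.
reported_ratio = 0.09 / 0.026
```

Scores range from -1 to 1, so both reported values sit close to zero; the comparison that matters here is the ratio between them, roughly 3.5.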

Network Analysis (new)

  • Communication graph via NetworkX
  • Louvain community detection (7 communities found)
  • Bridge user identification (betweenness centrality)
  • Product silo detection (isolated products with no shared users)
  • Graph density: 0.062
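Two of the measures above are simple enough to sketch without a graph library. The example below is illustrative only and assumes an undirected simple graph; the helper names (`graph_density`, `find_product_silos`) and the toy data are hypothetical, and the release itself uses NetworkX for the actual analysis.

```python
def graph_density(num_nodes: int, num_edges: int) -> float:
    """Density of an undirected simple graph: 2m / (n * (n - 1))."""
    if num_nodes < 2:
        return 0.0
    return 2 * num_edges / (num_nodes * (num_nodes - 1))

def find_product_silos(product_users: dict[str, set[str]]) -> list[str]:
    """Return products whose users do not overlap with any other product's."""
    silos = []
    for product, users in product_users.items():
        shares_users = any(
            users & other_users
            for other, other_users in product_users.items()
            if other != product
        )
        if not shares_users:
            silos.append(product)
    return sorted(silos)

# Hypothetical toy data: which users posted about which product.
product_users = {
    "alpha": {"ana", "bo"},
    "beta": {"bo", "cy"},
    "gamma": {"dee"},
}
silos = find_product_silos(product_users)          # "gamma" shares no users
density = graph_density(num_nodes=4, num_edges=2)  # 2 of 6 possible edges
```

A density of 0.062, as reported, means roughly 6% of all possible user-to-user edges are present, which is typical of a sparse communication graph.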

Temporal Change-Point Detection (new)

  • Ruptures PELT algorithm with RBF kernel
  • 3 change-points detected across a 549-day window
  • Per-product trend analysis (increasing/stable/decreasing)
  • Peak activity identification (208 messages, Oct 8 2026)
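The release does not say how the increasing/stable/decreasing labels are assigned. One plausible approach, sketched here as an assumption rather than the project's method, is to fit a least-squares line to each product's message counts and compare the slope to the series mean. The function name `classify_trend` and the 5% `tolerance` are hypothetical.

```python
def classify_trend(counts, tolerance=0.05):
    """Label a series by its least-squares slope relative to its mean.

    `tolerance` is a hypothetical cut-off: the slope per time step must
    exceed this fraction of the mean before the series counts as trending.
    """
    n = len(counts)
    if n < 2:
        return "stable"
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    if mean_y == 0:
        return "stable"
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    relative_slope = (sxy / sxx) / mean_y
    if relative_slope > tolerance:
        return "increasing"
    if relative_slope < -tolerance:
        return "decreasing"
    return "stable"

# Hypothetical weekly message counts for three products.
trends = {
    "alpha": classify_trend([10, 12, 14, 16, 18]),
    "beta": classify_trend([20, 19, 21, 20, 20]),
    "gamma": classify_trend([30, 24, 18, 12, 6]),
}
```

Normalizing the slope by the mean keeps the threshold comparable between high-volume and low-volume products.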

Pipeline (10 stages)

Ingest → Normalize → Embed (SBERT) → BERTopic → Network → Temporal → KMeans (stability baseline) → Bootstrap → Measure → Report

Total: ~109s on 37K docs (CPU only)

Live Demo

transaction-forensics.vercel.app

v2.0.0 — Evidence-Backed Scoring

25 Mar 21:05

Transaction Forensics v2.0.0

Addresses all reviewer feedback. Moves from pattern detection to evidence-backed scoring.

What Changed

1. Computed Metrics Replace Keyword Heuristics

  • Severity now computed from: cross-team entropy, author Gini coefficient, customer density, source diversity, temporal span, reaction rate
  • Every metric records how it was computed (method field)
  • No keyword matching anywhere in the severity pipeline
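Two of the listed inputs, entropy and the Gini coefficient, have standard definitions that can be sketched in a few lines. This is a minimal illustration under assumptions: the function names, the record keys other than `method`, and the toy data are hypothetical, and the release does not specify how the individual metrics are weighted into a severity score.

```python
import math
from collections import Counter

def shannon_entropy(labels) -> float:
    """Shannon entropy (bits) of a label distribution, e.g. team per message."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gini(values) -> float:
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n

# Hypothetical cluster: the team of each message and messages per author.
teams = ["platform", "platform", "billing", "billing"]
messages_per_author = [0, 0, 0, 10]

# A metric record carrying its own provenance in a `method` field.
metric = {
    "name": "author_gini",
    "value": gini(messages_per_author),
    "method": "Gini coefficient over per-author message counts",
}
team_entropy = shannon_entropy(teams)
```

High cross-team entropy indicates a topic spread evenly across teams, while a high author Gini indicates that a few authors dominate the conversation.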

2. Bootstrap Cluster Stability

  • 10 runs with different random seeds
  • Jaccard similarity scoring per cluster across runs
  • Only clusters that are stable in more than 50% of runs are surfaced
  • 4 of 15 clusters pruned as unstable in latest run
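The stability filter can be sketched as follows. This is an assumption-laden illustration, not the project's implementation: it treats a cluster as recovered in a run when its best Jaccard match reaches a hypothetical `match_threshold` of 0.5, then keeps clusters recovered in more than half of the runs. The function names and toy data are also hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b| of two sets of document ids."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def stable_clusters(reference, runs, match_threshold=0.5, min_run_fraction=0.5):
    """Map each surviving reference cluster id to its recovery rate.

    reference: dict of cluster id -> set of document ids
    runs: list of clusterings, each a list of sets of document ids
    """
    if not runs:
        return {}
    kept = {}
    for cluster_id, members in reference.items():
        hits = sum(
            1
            for run in runs
            if max((jaccard(members, other) for other in run), default=0.0)
            >= match_threshold
        )
        fraction = hits / len(runs)
        if fraction > min_run_fraction:
            kept[cluster_id] = fraction
    return kept

# Hypothetical toy data: two reference clusters and four reseeded runs.
reference = {"A": {1, 2, 3, 4}, "B": {5, 6, 7}}
runs = [
    [{1, 2, 3, 4}, {5, 6}, {7}],
    [{1, 2, 3}, {4, 5}, {6, 7}],
    [{1, 2, 3, 4, 5}, {6}, {7}],
    [{1, 2, 3, 4}, {5}, {6}, {7}],
]
kept = stable_clusters(reference, runs)  # "B" is recovered in only 2 of 4 runs
```

Because the filter requires strictly more than 50%, a cluster recovered in exactly half of the runs is pruned.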

3. Relational Joins

  • Employee master (530 employees) cross-referenced with message authors
  • Customer master (120 accounts) joined with message content
  • Product-to-customer relationships surfaced

Results

  • 37,064 documents analyzed across 30 products
  • 11 patterns surfaced, 4 pruned for instability
  • Pipeline: 33 seconds, no GPU required

Live Demo

transaction-forensics.vercel.app

v1.0.0 — Initial Release

25 Mar 20:14

Transaction Forensics v1.0.0

Enterprise communication pattern analysis using NLP clustering.

What's Included

  • Pattern engine (analyze.py) — TF-IDF + KMeans pipeline
  • Interactive viewer (public/index.html) — zero-dependency static frontend
  • Pre-computed analysis of Salesforce/HERB dataset (37,064 documents)
  • Pipeline transparency showing computation provenance

Analysis Results

  • 37,064 documents analyzed (32,781 Slack messages, 321 transcripts, 400 documents, 3,562 PRs)
  • 30 products, 120 customers, 18 team members
  • 10 patterns surfaced across compliance, bottleneck, communication, and approval categories
  • 18-second total pipeline execution (no GPU required)

Quick Start

pip install -r requirements.txt
python analyze.py
open public/index.html

Live Demo

transaction-forensics.vercel.app

Data Source

Salesforce/HERB — Heterogeneous Enterprise Reasoning Benchmark (CC-BY-NC-4.0)