Modular, streaming-native architecture integrating Kafka, Pathway, River Online ML, Neo4j Graph Intelligence, SHAP Explainability, and an LLM-powered Ambient Agent — engineered for sub-second predictions and production-grade fraud detection.
The system is a fully modular, real-time fraud detection platform designed to operate at high throughput with sub-second latency. It integrates Kafka, Pathway, River online ML, Neo4j graph intelligence, LLM-powered reasoning, and a complete HITL (Human-in-the-Loop) feedback loop. All services are fully dockerized and continuously observable via OpenTelemetry + Grafana, enabling production-grade reliability and transparency.
The data flow begins with real-time transactions entering Kafka, which serves as the high-throughput ingestion backbone. These events are consumed by a distributed Pathway ETL pipeline, where schema validation, enrichment, rule-based classification, feature computation, and user/device profile updates occur in streaming mode. Pathway then publishes enriched transactions back to Kafka, while simultaneously persisting them to Delta Lake for auditability and offline analytics, and exporting feature snapshots to BigQuery for monthly retraining workflows.

The enriched stream is consumed by downstream services that update MySQL, keeping the dashboard and APIs in sync with the latest fraud scores. In parallel, Pathway updates a dynamic fraud graph in Neo4j, where metrics such as PageRank, centrality, community structure, and cycle detection are computed and fed back into the ML pipeline. Predictions generated inside the Pathway model wrapper are streamed through Kafka and ultimately written to MySQL along with SHAP explanations and metadata. Analyst feedback travels through the same channels, enabling the ML engine to learn online.

This unified streaming loop (Kafka → Pathway → Kafka → Consumers → MySQL) ensures that every component of the system receives consistent, real-time updates while maintaining exactly-once semantics, low latency, and high reliability.
The system follows a modular, event-driven architecture centered around Kafka and Pathway, where each subsystem—ingestion, enrichment, machine learning, explainability, monitoring, and storage—operates independently but remains connected through streaming interfaces. Kafka handles real-time transaction ingestion, while Pathway performs continuous enrichment and transformation without disrupting downstream components. The ML layer, built on River, generates fraud predictions and confidence scores that are synchronized with SHAP-based explanations and the Human-in-the-Loop module for transparent decision-making. Verified transactions are stored in Delta Lake and streamed to Neo4j to build entity-relationship graphs for fraud ring detection. Grafana, the Ambient Agent, and the dashboard provide monitoring, alerting, forensic insights, and analyst-facing visualizations. This modular, stream-oriented design enables scalable, explainable, and production-ready fraud detection with minimal coupling and high maintainability.
| Category | Description |
|---|---|
| Real-Time Streaming | Built on Pathway + Kafka with exactly-once ingestion and low latency. |
| Adaptive Learning | Online ML ensemble (Adaptive Random Forest + two Logistic Regressions → Meta Logistic Regression) using the River framework. |
| Explainability (XAI) | SHAP plots highlight feature influences; RCA generates natural-language summaries for transparency. |
| Human-in-the-Loop (HITL) | Analysts review uncertain predictions and send real-time feedback that updates the model instantly. |
| Ambient Agents | Autonomous background agents monitor latency, CPU, and logs; detect errors, perform root-cause analysis, and auto-recover. |
| Fault Tolerance | Deterministic checkpointing ensures recovery from crashes without data loss. |
| Graph Analytics | Neo4j visualizes user–merchant–device relationships for fraud-ring detection. |
| Dockerized Deployment | Single docker-compose stack for complete reproducibility. |
| Monitoring & Telemetry | Integrated metrics and OpenTelemetry for observability. |
The ML system uses adaptive online learners that update on every transaction (see the sketch below):

- **Models**
  - Base models: an Adaptive Random Forest (AdaRF), which detects evolving, non-linear fraud patterns and adapts to concept drift, plus two Logistic Regression models.
  - Meta-classifier: a Logistic Regression that learns the optimal combination of the three base models and outputs the final calibrated fraud probability.
- **Dynamic Thresholding**: the fraud threshold is recalculated in real time from the most recent N predictions, keeping the system drift-aware and stable under rapidly changing transaction patterns.
- **Online Learning with HITL Feedback**: analyst corrections are streamed back through Kafka → ML Engine → Database, enabling continuous incremental improvement.
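A minimal sketch of this ensemble using River's incremental-learning API (class names follow recent River releases; the feature dict, window size, and threshold quantile are illustrative assumptions, not the project's actual configuration):

```python
# Minimal sketch of the online stacking ensemble described above.
from collections import deque

from river import forest, linear_model, preprocessing

# Base learners: drift-aware Adaptive Random Forest + two logistic regressions.
base_models = [
    forest.ARFClassifier(n_models=10, seed=42),
    preprocessing.StandardScaler() | linear_model.LogisticRegression(),
    preprocessing.StandardScaler() | linear_model.LogisticRegression(),
]
meta = linear_model.LogisticRegression()  # combines the base probabilities

recent_scores = deque(maxlen=1000)  # sliding window for dynamic thresholding


def score(x: dict) -> float:
    """Return the meta-calibrated fraud probability for one transaction."""
    meta_features = {
        f"p{i}": m.predict_proba_one(x).get(True, 0.0)
        for i, m in enumerate(base_models)
    }
    p = meta.predict_proba_one(meta_features).get(True, 0.0)
    recent_scores.append(p)
    return p


def learn(x: dict, y: bool) -> None:
    """Incremental update, e.g. from analyst (HITL) ground truth."""
    meta_features = {
        f"p{i}": m.predict_proba_one(x).get(True, 0.0)
        for i, m in enumerate(base_models)
    }
    meta.learn_one(meta_features, y)
    for m in base_models:
        m.learn_one(x, y)


def dynamic_threshold(q: float = 0.95) -> float:
    """Recompute the alert threshold from the most recent N scores."""
    ordered = sorted(recent_scores)
    return ordered[int(q * (len(ordered) - 1))] if ordered else 0.5
```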
Analysts can override low-confidence flagged predictions. If the analyst feedback contradicts the model:
- The feedback is streamed back to Kafka
- Stored as ground truth in MySQL
- ML model is incrementally updated
- Future predictions shift closer to human judgment
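The first hop of that path can be as simple as publishing the override to a feedback topic. A hedged sketch using kafka-python (the topic name and payload fields are assumptions, not the project's actual schema):

```python
# Publish an analyst override to Kafka so the ML engine can consume it as
# ground truth for online learning. Topic and payload are illustrative.
import json

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)


def submit_override(txn_id: str, analyst_label: int) -> None:
    """Stream an analyst verdict back into the pipeline."""
    producer.send("fraud.feedback", {"txn_id": txn_id, "label": analyst_label})
    producer.flush()  # block until the broker acknowledges the message
```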
For every suspicious transaction, the frontend displays:
- complete user & device profile
- transaction metadata
- SHAP feature importance values with graphs
- natural language explanation (LLM-generated)
- RCA report
This gives full transparency into why a transaction was flagged.
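The sketch below illustrates how per-transaction SHAP values like these can be computed with SHAP's model-agnostic explainer; the predict function is a stand-in for the ensemble's scoring function, and the feature names and values are assumptions:

```python
# Generate per-transaction SHAP contributions with a model-agnostic explainer.
import numpy as np
import shap

feature_names = ["amount", "txn_velocity", "device_age_days"]  # illustrative


def predict_fn(X: np.ndarray) -> np.ndarray:
    # Stand-in for the real fraud-probability function served by the model.
    return 1.0 / (1.0 + np.exp(-(0.0005 * X[:, 0] + 0.4 * X[:, 1] - 0.05 * X[:, 2])))


background = np.zeros((1, len(feature_names)))  # neutral baseline transaction
explainer = shap.KernelExplainer(predict_fn, background)

txn = np.array([[4200.0, 7.0, 3.0]])  # one suspicious transaction
shap_values = explainer.shap_values(txn)
for name, value in zip(feature_names, shap_values[0]):
    print(f"{name}: {value:+.4f}")  # per-feature contribution to the score
```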
Every enriched transaction updates a dynamic user–device–merchant graph, enabling advanced fraud-ring detection. Neo4j periodically computes:
- PageRank (influence estimation)
- Betweenness Centrality (bridge nodes)
- Community Detection (Louvain)
- Cycle Detection for suspicious loops
- Neighborhood anomaly scores
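A minimal sketch of such a periodic pass, assuming the Neo4j Graph Data Science (GDS) plugin and a projected graph named `fraudGraph` (the projection name and node `id` property are assumptions; connection details mirror the `.env` values shown later):

```python
# Periodic graph-analytics pass over the fraud graph via the Neo4j driver.
from neo4j import GraphDatabase

driver = GraphDatabase.driver(
    "bolt://neo4j-fraud:7687", auth=("neo4j", "frauddetection")
)

with driver.session() as session:
    # PageRank: influence estimation over user-device-merchant nodes.
    pagerank = session.run(
        "CALL gds.pageRank.stream('fraudGraph') "
        "YIELD nodeId, score "
        "RETURN gds.util.asNode(nodeId).id AS entity, score "  # assumes an `id` property
        "ORDER BY score DESC LIMIT 10"
    ).data()

    # Louvain community detection: candidate fraud rings.
    communities = session.run(
        "CALL gds.louvain.stream('fraudGraph') "
        "YIELD nodeId, communityId "
        "RETURN communityId, count(*) AS size ORDER BY size DESC"
    ).data()

driver.close()
```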
The Ambient Agent is the autonomous supervisory layer of our fraud detection system. It continuously monitors model health, analyses fraud patterns, recalculates risk, and orchestrates alerts by coordinating three sub-agents:
- Health Agent
- Re-analyser Agent
- Alert Agent
The Ambient Agent updates the dashboard in real time with:
- Recalculated fraud scores
- Alert summaries
- Model health insights
- Neo4j graph changes
This supports full system transparency and analyst trust.
A key innovation validated during experimentation was the introduction of an Ambient Health Check Agent. This agent continuously monitors model metrics, drift indicators, and confidence distributions. If degradation is detected, it triggers a retraining request via RabbitMQ. The retraining pipeline automatically generates an updated model, logs it to Weights & Biases, and pushes a signal back to Pathway to load the new model.
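A hedged sketch of that retraining trigger using pika (the queue name and payload are assumptions; the AMQP URL mirrors the `.env` value shown later):

```python
# Publish a durable retrain request to RabbitMQ when degradation is detected.
import json

import pika  # pip install pika

params = pika.URLParameters("amqp://my_user:my_password@rabbitmq:5672/")
connection = pika.BlockingConnection(params)
channel = connection.channel()
channel.queue_declare(queue="model.retrain", durable=True)  # assumed queue name

channel.basic_publish(
    exchange="",
    routing_key="model.retrain",
    body=json.dumps({"reason": "drift_detected", "metric": "precision", "value": 0.81}),
    properties=pika.BasicProperties(delivery_mode=2),  # persist the message
)
connection.close()
```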
We conducted extensive experiments to compare different fraud-detection approaches across offline, online, and adaptive learning settings. Traditional supervised models such as XGBoost, Random Forests, and standalone Adaptive Random Forest achieved strong accuracy during offline evaluation on PaySim. However, they lacked the ability to continuously adapt, recalibrate thresholds, or respond to drift in real streaming environments.
To address these limitations, we designed a stacking ensemble composed of:
- Adaptive Random Forest (captures non-linear fraud patterns and reacts to concept drift)
- Logistic Regression (Calibrated Model)
- Logistic Regression (Fraud-Sensitive Model)
These base learners are combined through a Meta Logistic Regression classifier that learns the optimal weighted combination of all of them.
This ensemble is wrapped inside a Pathway streaming wrapper, enabling real-time ingestion, inference, and incremental updates without batch recomputation.
During experiments, the ensemble produced a highly calibrated fraud probability which was further stabilized by dynamic thresholding. HITL (human-in-the-loop) overrides were fed back into the system to correct misclassifications and improve downstream thresholds.
This validation demonstrated that the final architecture is not only accurate but also:
- Fully streaming-native
- Self-correcting and drift-aware
- Explainable through structured LLM reasoning
- Production-ready with automated monitoring and retraining
Overall, the ensemble + ambient agent + retraining loop proved significantly more robust and scalable than any standalone offline or unsupervised solution we tested.
The Ambient Agent monitors model health (drift indicators, precision/recall trends, confidence distribution).
- When performance degrades, it publishes a retrain request to RabbitMQ.
- A separate training worker consumes this request, retrains the stacking ensemble on recent data, and logs the new version to Weights & Biases (W&B) as a production candidate.
- Once approved, the inference service is restarted or updated to load the latest model from W&B.
Pathway continues to stream transactions and features, calling the updated model for scoring without changing the streaming layer.
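An illustrative sketch of the training worker's W&B logging step from the flow above; the artifact name and aliases mirror the `REGISTRY_NAME` and `COLLECTION_NAME` values configured in `.env`, though the exact flow is an assumption:

```python
# Log the retrained stacking ensemble to W&B as a production candidate.
import wandb

run = wandb.init(project="fraud-detection", job_type="retrain")
artifact = wandb.Artifact("Model", type="model")
artifact.add_file("stacking_ensemble.pkl")  # hypothetical serialized ensemble
run.log_artifact(artifact, aliases=["Production", "candidate"])
run.finish()
```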
Pathway serves as the streaming backbone of our entire fraud detection system, enabling real-time ingestion, feature computation, model inference and learning, and monitoring. It ensures that every component, from ML decisions to agent workflows, operates consistently on live, continuously updating data streams.
Pathway consumes transactions from Kafka with checkpointed offsets, guaranteeing:
- no duplicate processing,
- no missing events, and
- strict ordering of financial transactions.
This reliability is essential for fraud scoring and downstream auditability.
Incremental Checkpointing
- Pathway snapshots operator state at regular intervals.
- On crash/restart, Pathway replays Kafka events up to the checkpointed offset.
- This guarantees consistent reconstruction of the streaming state.
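A hedged sketch of enabling this persistence when launching a Pathway pipeline (the storage path is an assumption, and the config constructor has varied across Pathway versions; older releases use `Config.simple_config`):

```python
# Enable Pathway persistence so operator state is snapshotted to disk and
# replayed after a crash/restart.
import pathway as pw

backend = pw.persistence.Backend.filesystem("./checkpoints")  # assumed path
pw.run(persistence_config=pw.persistence.Config(backend))
```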
Pathway maintains live transaction tables and performs:
- rolling window aggregations,
- frequency/velocity feature updates,
- joins with account metadata,
- lookups against blocklists.
These enriched feature streams become the single source of truth for the ML pipeline, without batch recomputation or stale data.
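A minimal sketch of this stage, assuming Pathway's Kafka connector and an illustrative transaction schema (topic names, fields, and broker address are assumptions):

```python
# Consume transactions from Kafka, compute rolling per-user features, and
# publish the enriched stream back to Kafka -- all incrementally.
import pathway as pw


class TxnSchema(pw.Schema):
    txn_id: str
    user_id: str
    amount: float


rdkafka_settings = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "fraud-etl",
    "auto.offset.reset": "earliest",
}

transactions = pw.io.kafka.read(
    rdkafka_settings, topic="transactions", format="json", schema=TxnSchema
)

# Per-user velocity/volume features, updated on every new event.
user_features = transactions.groupby(pw.this.user_id).reduce(
    user_id=pw.this.user_id,
    txn_count=pw.reducers.count(),
    total_amount=pw.reducers.sum(pw.this.amount),
)

pw.io.kafka.write(user_features, rdkafka_settings, topic_name="enriched", format="json")
pw.run()
```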
Pathway wraps our final ML architecture:
- Adaptive Random Forest
- Logistic Regression (Calibrated)
- Logistic Regression (Fraud-Sensitive)
- Meta Logistic Regression
It ensures:
- temporally aligned inputs across all learners,
- deterministic scoring,
- low-latency inference,
- consistent fraud probabilities for downstream components,
- on-the-fly threshold recalculation.
The wrapper allows the model to behave as a true streaming ensemble rather than a batch classifier bolted onto a stream.
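Continuing the earlier Pathway sketch, the wrapper idea can be approximated with a `@pw.udf` that scores each row of the live feature table; the scoring body below is a stand-in for the actual River ensemble call:

```python
# Score the streaming feature table in-place with a Pathway UDF.
import pathway as pw


@pw.udf
def fraud_score(total_amount: float, txn_count: int) -> float:
    # Stand-in: the real wrapper invokes the online stacking ensemble here.
    return min(1.0, 0.0001 * total_amount + 0.01 * txn_count)


scored = user_features.select(
    user_id=pw.this.user_id,
    score=fraud_score(pw.this.total_amount, pw.this.txn_count),
)
```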
Pathway emits events to the Ambient Health Check Agent, enabling:
- live monitoring of drift, accuracy, and score distributions,
- immediate health feedback loops,
- coordination with the Re-analyser and Alert Agents,
- dynamic recalculation of fraud labels when HITL overrides occur.
Every update flows through Pathway tables, ensuring a consistent state across the system.
We use the Pathway Vector Store to support real-time document intelligence:
- PDF embeddings are generated and inserted as vectors
- Top-k similarity retrieval runs each time a new document arrives
- Retrieved chunks power LLM-based classification & RCA
This provides fast, low-latency document relevance verification at ingestion speed.
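A minimal client-side sketch of that retrieval step, assuming the Pathway Vector Store is served at the host/port configured in `.env` (the query text is illustrative):

```python
# Top-k similarity retrieval against the Pathway Vector Store.
from pathway.xpacks.llm.vector_store import VectorStoreClient

client = VectorStoreClient(host="127.0.0.1", port=8765)
results = client.query("chargeback dispute letter", k=3)
for doc in results:
    # Each hit carries the matched chunk text and a distance score.
    print(doc["dist"], doc["text"][:80])
```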
Pathway also serves as the MCP server for our system, hosting tool logic through `@pw.udf` functions:
- `McpServable` runs the MCP server
- each tool is implemented as a `@pw.udf` function
- `register_mcp` exposes the tool to the LLM
- inputs and outputs exchanged with the LLM are validated for safety
- memory access is tightly controlled
- together, these measures enforce LLM guardrails
Grafana alerts trigger these tools to fetch logs, send them to the LLM for analysis, and stream summarized outputs to the frontend.
Pathway abstracts away the complexities of:
- state management,
- fault tolerance,
- recovery,
- rollback and replay,
- ordering guarantees,
- concurrency control.
This allows the entire fraud platform to behave like a single, consistent, fully streaming system, drastically reducing engineering overhead.
| Layer | Technologies |
|---|---|
| Streaming & Compute | Apache Kafka, Pathway (Vectorstore + Real-time pipelines), Redis Streams |
| Offline ML Engine | Stacking Ensemble (Adaptive Random Forest + Logistic Regression + Logistic Regression → Meta Logistic Regression), Scikit-Learn |
| Online ML & Explainability | River (Adaptive Models, Thresholding), SHAP, Model Drift Monitors |
| Document AI / PDF Verification | Pathway PDF Parser, OpenAI Embeddings, Similarity Search (Vectorstore), Django Document Verification Agent |
| Databases & Storage | MySQL (metadata), Neo4j (Graph Fraud Network), Delta Lake / Parquet (batch data), Redis (caching), RabbitMQ (queueing), W&B (model registry) |
| Backend & APIs | Django REST Framework, FastAPI microservices, Uvicorn (ASGI server), WebSockets (real-time chat & streaming) |
| Monitoring & RCA & Observability | Grafana, Prometheus, Ambient Agent, Kafka UI, Neo4j Browser |
| Frontend | React (TypeScript), Tailwind / CSS / JS, WebSocket Client, Axios |
| Containerization & Deployment | Docker, Docker Compose, Ngrok Tunneling |
```bash
git clone <repo-url>
cd <repo-directory>
```
Create a `.env` file by copying the example environment file:

```bash
cp .env.example .env
```

Follow the instructions below to generate all required API keys and credentials for the platform.
**Hugging Face**

How to get it:
1. Go to https://huggingface.co/settings/tokens
2. Click **New token**.
3. Choose permissions: **Read** (pull models) or **Write** (push checkpoints).
4. Click **Generate Token** and copy it.

Add to `.env`:

```env
HUGGINGFACE_HUB_TOKEN="hf_xxxxx"
HF_REPO_NAME="username/repo-name"
```

**OpenAI**

How to get it:
1. Go to https://platform.openai.com/settings/organization/api-keys
2. Click **Create new secret key**.
3. Copy the key.

Add to `.env`:

```env
OPENAI_API_KEY="sk-xxxxxx"
```

**Weights & Biases**

How to get it:
1. Go to https://wandb.ai/authorize
2. Copy the API key displayed after login.

Add to `.env`:

```env
WANDB_KEY="xxxxxxxx"
WANDB_PROJECT="fraud-detection"
REGISTRY_NAME="Model"
COLLECTION_NAME="Production"
```

**AWS**

How to get them:
1. Go to https://console.aws.amazon.com/iam/home
2. IAM → Users → choose/create user.
3. Open the **Security Credentials** tab.
4. Click **Create Access Key** → choose **CLI**.
5. Copy the Access Key ID + Secret Key.

Add to `.env`:

```env
AWS_ACCESS_KEY_ID="AKIAxxxxx"
AWS_SECRET_ACCESS_KEY="xxxxxxxx"
AWS_S3_ACCESS_KEY="AKIAxxxxx"
AWS_S3_SECRET_ACCESS_KEY="xxxxxxxx"
AWS_REGION="ap-south-1"
BUCKET_NAME="your-s3-bucket"
RCA_BUCKET_NAME="shapreport"
AWS_S3_SIGNATURE_VERSION="s3v4"
```

**Google Cloud**

How to get it:
1. Go to https://console.cloud.google.com
2. IAM & Admin → Service Accounts.
3. Create/select an account → Keys → Add Key → JSON.
4. Download the key file.

Add to `.env`:

```env
GOOGLE_APPLICATION_CREDENTIALS="/app/key.json"
```

**Neo4j**

How to set:
1. Open Neo4j Browser → http://localhost:7474
2. Log in with the default credentials → set a new password.

Add to `.env`:

```env
NEO4J_URI="bolt://neo4j-fraud:7687"
NEO4J_USER="neo4j"
NEO4J_PASSWORD="frauddetection"
```

**Pinecone**

How to get it:
1. Go to https://app.pinecone.io
2. Create an index.
3. Go to **API Keys** → copy the Key & Environment.

Add to `.env`:

```env
PINECONE_API_KEY="xxxxx"
PINECONE_INDEX="fraud-assistant"
PINECONE_ENVIRONMENT="gcp-starter"
```

**Grafana Cloud**

How to get credentials:
1. Go to https://grafana.com/auth/sign-in
2. Create a Cloud Stack → Connections → Send Data → OpenTelemetry.
3. Copy:
   - OTLP Metrics endpoint
   - OTLP Logs endpoint
   - OTLP Traces endpoint
   - Access Token

Add to `.env`:

```env
GRAFANA_CLOUD_ACCESS_TOKEN="xxxx"
PROMETHEUS_ENDPOINT="https://prometheus-prod-xxx.grafana.net/api/prom/push"
PROMETHEUS_INSTANCE_ID="12345"
LOKI_ENDPOINT="https://logs-prod-xxx.grafana.net/loki/api/v1/push"
LOKI_INSTANCE_ID="12345"
TEMPO_ENDPOINT="https://tempo-prod-xxx.grafana.net/tempo/api/push"
TEMPO_INSTANCE_ID="tempo-12345"
```

**RabbitMQ**

How to configure:
1. Go to http://localhost:15672
2. Create a new user if needed.

Add to `.env`:

```env
RABBITMQ_URL="amqp://my_user:my_password@rabbitmq:5672/"
```

**MCP Server**

Add to `.env`:
```env
MCP_BROKER_URL="http://localhost:8123"
MCP_HOST="localhost"
MCP_PORT=8123
MCP_SERVER_NAME="Ambient MCP Server"
MCP_GATEWAY_URL="http://127.0.0.1:8123/mcp"
MCP_REQUEST_TIMEOUT=20
```

**Pathway Vector Store**

Add to `.env`:

```env
PATHWAY_VECTORSTORE_URL="http://127.0.0.1:8765"
PATHWAY_VECTORSTORE_API_KEY=""
```

To validate the configuration, run:

```bash
docker compose config
```

If anything is missing, Docker will show warnings or errors.
(Optional) Create a local Python virtual environment:

```bash
python -m venv venv
source venv/bin/activate
```

The full stack is dockerized. To start all services, simply run:

```bash
cd deploy/
./start.sh
```

This launches:
- Kafka + Zookeeper
- Schema Registry
- Multi-instance Pathway Cluster
- MySQL
- Neo4j + Graph Service
- Redis
- ML Engine
- Ambient Agent
- Kafka Producer/Consumers
- Fraud Ingest API
- Frontend Dashboard
- OTEL Collector
- Grafana Cloud integration
To produce transactions from a CSV file into the streaming pipeline, upload your CSV file to `fraud_ingest/data`, then run:

```bash
cd ..
cd fraud_ingest/
./start.sh produce data/file.csv
```

The backend runs on http://localhost:8000 and the frontend dashboard on http://localhost:3000.

| Tab | Description (short) |
|---|---|
| Summary | KPI cards (Fraud Alerts, Protected Amount, Transactions, Suspected Txns), total transactions chart and a live transaction stream table with quick actions. |
| Analytics | Model evaluation: confusion matrix, ROC / PR curves, partial dependence (Amount vs Fraud probability), feature importance trends and per-slice metrics for root-cause. |
| Audits | Live audit log stream from Pathway, Kafka, ML engine and Ambient Agent. Search, filter by service/level, and link log lines to transactions. |
| Metrics | System metrics and historical charts: Pathway CPU/Memory, Kafka lag, model latency, consumer throughput. Time-range selector + service filter. |
| Observability | Web Vitals, traces, request latency percentiles (p50/p90/p99). Correlates traces with transaction IDs for RCA. |
| AI Assistant | Natural language queries to fetch transactions, create custom plots, compare user/device behavior, and verify uploaded documents (PDF). |
| Transaction Details | Inspect a transaction: user/device profiles, SHAP waterfall & bar charts, NLP explanation, graph context, verdict & history. Includes SHAP PDF/report link. |
| Graph Explorer | Interactive Neo4j visualizer – search nodes, view communities (Louvain), PageRank, centrality and detect suspicious cycles. |
| Feedback | Analyst feedback UI (Verify modal). Low-confidence predictions automatically go to analyst queue. Feedback is streamed to ground truth and triggers online learning. |
| Ambient Agent | LLM-powered background agent producing RCA summaries, retrain suggestions, and automated remediation playbooks. |
| Security & Audit | All actions are authenticated and role-based. All analyst overrides and model changes are timestamped and auditable. |
The frontend is a real-time, analyst-centric investigation console designed for transparency, rapid triage, and deep forensic analysis. It brings together live transaction streams, ML explanations, graph intelligence, system metrics, and LLM reasoning into one unified interface.
The dashboard transforms raw streaming data into actionable insights, enabling analysts to understand what happened, why it happened, and how to respond, all within a few seconds.
| Metric | Result |
|---|---|
| Latency | ~40 ms |
| Throughput | ~70,000 transactions/min (simulated) |
| Precision | 0.923 (Stacking Ensemble) |
| Recall | 0.923 (Stacking Ensemble) |
| F1 Score | 0.923 (Stacking Ensemble) |
| Accuracy | 99.9% (PaySim dataset) |
Our platform is engineered for enterprise-grade reliability, fault tolerance, and real-time performance.
- Distributed Kafka ingestion with high throughput
- Exactly-once semantics via Pathway offset checkpointing
- Multi-instance Pathway cluster for horizontal scaling and auto-recovery
- Millisecond-level transformations and validation
- Real-time rule-based classification inside Pathway
- Continuous profile updates (user, device, merchant)
- Live streaming joins and feature computation
- Delta Lake for durable, versioned storage
- BigQuery for historical analysis, drift detection, and retraining
- Scheduled monthly Vertex AI training pipeline
- Neo4j fraud graph updated in real time
- PageRank, Centrality, Community Detection, Cycle Detection
- Optimized reads for analyst-facing tools
- River-based online learning updated continuously
- Dynamic Thresholding adapting to streaming distributions
- Human-in-the-loop feedback correcting drift instantly
- Suspicious predictions routed to analysts
- Delayed finalization: model decisions are finalized only when no analyst feedback overrides them
- Ground-truth ingestion improves online model performance
- Strict schemas and sanitization for safe LLM execution
- Autonomous root-cause reasoning from system logs
- ML health assessment and drift monitoring
- OpenTelemetry instrumentation across all services
- Unified logs, metrics, and traces via Grafana
- Dashboard visibility into latency, Kafka lag, system health, error rates
- Frontend displays live audit logs for transparency
- Authentication required for all backend endpoints
- Role-based access for analysts
- Complete audit trails of decisions and overrides
- Single docker-compose stack with ~20 interconnected services
- Reproducible environments with isolated networks & persistent volumes
- One-command startup (`./start.sh`) and data injection
- SHAP explanation graphs for every transaction
- NLP-powered reasoning summaries
- Real-time fraud trend analytics, confusion matrix, PR curves
- AI assistant for querying transactions, generating insights & document verification
- Transaction explorer with SHAP, graph context, user/device profiles
- Feedback portal updating the model instantly
A brain-inspired LLM using scale-free neuron graphs and Hebbian plasticity to enable transaction-level adaptation without retraining.
Future pipeline: Kafka → Pathway → Neo4j → BDH (Hebbian) → Score + Synaptic Explanation.
Introduce a Temporal Graph Network (TGN) layer on top of the Neo4j transaction graph to model:
- Long-term temporal patterns
- Entity evolution (user, device, merchant)
- Cross-community fraud signals
- High-order structural anomalies
TGN will run in parallel with online ML to generate deeper risk embeddings and detect coordinated fraud rings.
Enhance background graph processing with:
- Multi-hop cycle detection (fraud loops)
- Temporal community drift analysis
- Subgraph anomaly detection for collusion patterns
- Fraud ring “core node” identification using centrality deltas
These computations will run asynchronously and feed back enriched graph features into Pathway.
As the system handles sensitive financial information, the next improvements include:
- Full encryption-at-rest (Delta Lake, MySQL, Neo4j)
- Field-level encryption for personally identifiable information
- Encrypted feature generation pipelines for KYC & device metadata
- Secure audit logging with tamper-evident storage
Upgrade the entire microservice ecosystem to strict TLS:
- mTLS between backend → Kafka → Pathway → Neo4j
- TLS termination at ingress for API and frontend
- Certificate rotation automation
- Zero-trust service communication policies
This reduces the attack surface and protects privacy-critical financial flows.
As the pipeline contains highly sensitive user and transaction information:
- Migration toward RBAC + OAuth2
- Request lineage tracking
- SIEM integration for anomaly detection
- Compliance alignment for financial standards (PCI-DSS, ISO 27001)
- Pathway Documentation
- Pathway-microservices
- Pathway-kafka
- Pathway MCP Server
- Pathway Vector Store
- River ML Documentation
- Apache Kafka
- SHAP Library
- PaySim Dataset (Kaggle)
- Neo4j Graph
- Neo4j-AWS
- SHAP research paper to interpret model predictions
- Applying Simulation to the Problem of Detecting Financial Fraud
- AWS SageMaker Stack for Fraud Detection
- AWS FinTech Blog: Real-time Fraud Detection at Scale Using Streaming Analytics
- Real-time fraud detection using AWS
- PayPal throughput and latency
- Adaptive Random Forest
- RabbitMQ
- Grafana OpenTelemetry
- Delta Lake














