A hybrid ML (Isolation Forest) + GenAI system that detects sensor anomalies and generates human-readable root cause narratives
Anomaly detection systems in industrial operations typically stop at flagging an event. Engineers then spend significant time manually interpreting sensor readings to determine the root cause and next action, which is a slow, expertise-dependent process.
This project explores whether an LLM layer on top of a standard ML detector (Isolation Forest) can close that gap by automatically generating structured root cause narratives and recommending immediate actions from raw anomaly context, in plain language.
- Detection: Isolation Forest trained on normalized multi-sensor readings from the NASA CMAPSS Jet Engine dataset. Rolling statistics are computed per engine unit to capture degradation trends, not just point values.
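The detection step can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the column names (`unit`, `cycle`, `s11`), window size, and toy data are assumptions standing in for the CMAPSS readings.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

def add_rolling_features(df, sensor_cols, window=10):
    """Per-unit rolling mean/std so the detector sees degradation trends,
    not just point values."""
    for col in sensor_cols:
        grouped = df.groupby("unit")[col]
        df[f"{col}_rmean"] = grouped.transform(
            lambda s: s.rolling(window, min_periods=1).mean())
        df[f"{col}_rstd"] = grouped.transform(
            lambda s: s.rolling(window, min_periods=1).std().fillna(0.0))
    return df

# Toy data standing in for one engine unit's normalized sensor readings
df = pd.DataFrame({
    "unit": [1] * 50,
    "cycle": range(1, 51),
    "s11": [0.5 + 0.005 * i for i in range(50)],
})
df = add_rolling_features(df, ["s11"])

model = IsolationForest(contamination=0.05, random_state=42)
feature_cols = ["s11", "s11_rmean", "s11_rstd"]
df["anomaly"] = model.fit_predict(df[feature_cols])  # -1 = anomaly, 1 = normal
```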
- Context extraction: For each flagged anomaly, the system extracts the top deviating sensors relative to that unit's early-cycle baseline, expressed as % deviation from baseline.
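A sketch of the baseline-relative extraction, assuming a per-unit DataFrame with `cycle` and sensor columns (names and the 20-cycle baseline window are illustrative):

```python
import pandas as pd

def top_deviating_sensors(unit_df, sensor_cols, anomaly_cycle,
                          baseline_cycles=20, k=3):
    """Compare an anomalous cycle's readings to the unit's early-cycle
    baseline and return the k sensors with the largest % deviation."""
    baseline = unit_df[unit_df["cycle"] <= baseline_cycles][sensor_cols].mean()
    row = unit_df.loc[unit_df["cycle"] == anomaly_cycle, sensor_cols].iloc[0]
    pct_dev = ((row - baseline) / baseline.abs()) * 100
    return pct_dev.abs().sort_values(ascending=False).head(k)

# Toy unit: s11 drifts upward after cycle 60, s4 stays flat
df = pd.DataFrame({
    "cycle": range(1, 101),
    "s11": [0.50 + (0.003 * i if i > 60 else 0) for i in range(100)],
    "s4": [1.0] * 100,
})
deviations = top_deviating_sensors(df, ["s11", "s4"], anomaly_cycle=100)
```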
- LLM narration: A prompt is built from the anomaly context and passed to a Mistral/Llama model, which returns three fields in JSON format: `likely_cause`, `risk_level`, and `recommended_action`.
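A minimal sketch of how such a prompt might be assembled; the wording, unit/cycle numbers, and deviation values are illustrative, not the project's actual prompt:

```python
def build_prompt(unit, cycle, deviations):
    """Turn anomaly context into a prompt requesting a fixed JSON schema."""
    context = ", ".join(f"{s}: {d:+.1f}% vs baseline"
                        for s, d in deviations.items())
    return (
        f"Engine unit {unit}, cycle {cycle} was flagged as anomalous. "
        f"Deviating sensors: {context}. "
        "Respond with JSON only, using exactly these fields: "
        '{"likely_cause": str, "risk_level": "Low"|"Medium"|"High", '
        '"recommended_action": str}'
    )

prompt = build_prompt(unit=12, cycle=187, deviations={"s11": 34.0, "s4": -12.5})
```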
| Layer | Tool | Reason |
|---|---|---|
| Anomaly Detection | Isolation Forest (scikit-learn) | Unsupervised; no labelled anomaly data required. |
| Feature Engineering | Rolling mean, std per unit | Captures degradation trends, not just point values. |
| LLM | Mistral-small | Low cost and reliable structured output. |
| UI | Streamlit | Fast iteration; deployable to Hugging Face Spaces. |
| Visualization | Plotly | Interactive time series with anomaly overlay. |
- Isolation Forest at contamination=0.05 flags ~1,000 anomaly events
- LLM narrative generation averages ~1.2s per explanation (Mistral free tier)
- JSON parse success rate: ~96% on first attempt; regex fallback handles the remainder
- Risk level distribution across flagged events: ~60% Medium, ~25% High, ~15% Low
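The regex fallback mentioned above can be sketched like this: parse the model reply as JSON directly, and if surrounding prose breaks the parse, extract the first `{...}` block. A minimal version, assuming the reply contains a single JSON object:

```python
import json
import re

def parse_llm_json(text):
    """Parse the model's reply as JSON; if extra prose surrounds it,
    fall back to extracting the first {...} block with a regex."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", text, re.DOTALL)
        if match:
            return json.loads(match.group(0))
        raise

# A clean reply parses directly; a chatty reply hits the fallback.
clean = ('{"likely_cause": "sensor drift", "risk_level": "Medium", '
         '"recommended_action": "inspect bleed valve"}')
chatty = "Sure! Here is the analysis:\n" + clean + "\nLet me know if you need more."
```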
# 1. Clone and set up environment
git clone https://github.com/Pinghsuanlin/llm-anomaly-explainer
cd llm-anomaly-explainer
conda create -n anomaly-explainer python=3.11 -y
conda activate anomaly-explainer
pip install -r requirements.txt
# 2. Add your API key (Mistral free tier — console.mistral.ai)
cp .env.example .env
# Edit .env: MISTRAL_API_KEY=your_key_here
# Optional: run fully local (no API key needed)
# Install Ollama from ollama.com, then:
ollama pull mistral
# Set LLM_BACKEND=ollama in your .env
# 3. Download dataset
# Place train_FD001.txt in data/raw/
# Dataset: https://data.nasa.gov/dataset/cmapss-jet-engine-simulated-data
# 4. Run
streamlit run app.py
- Why Isolation Forest over LSTM? CMAPSS has no labelled anomaly ground truth, making supervised approaches hard to validate. Isolation Forest is fast, unsupervised, and produces a continuous anomaly score. An LSTM would add signal on degradation trajectory, but it requires more engineering and compute, and can be slow to train.
- Why not send raw sensor values to the LLM? Raw normalized floats (e.g. `s11: 0.743`) are not meaningful to an LLM without domain context. Baseline-relative deviations (`s11: +34% vs normal operating range`) give the model something it can reason about linguistically.
- Why structured JSON output? Enforcing a schema makes the LLM output programmatically usable: the Streamlit UI can render risk level as a colour-coded badge rather than parsing free text, which makes the system's behaviour testable and predictable.
- Isolation Forest has no temporal memory; it treats each cycle as independent. A proper production system would use sequence-aware detection (e.g. an LSTM autoencoder or Prophet) for forecasting.
- LLM explanations are plausible, not verified. They reflect the model's training on engineering text, not ground truth fault labels.
- CMAPSS FD001 uses a single operating condition; real-world sensor data is noisier and multi-regime.
