This repository demonstrates a machine learning pipeline for detecting MITRE ATT&CK techniques from logs and enriching the output using a local LLM.
The project is divided into two main components:
- Machine Learning Model (ML) – classifies logs to MITRE ATT&CK techniques
- LLM Enrichment – enriches the ML prediction with analyst-friendly explanations and actionable insights
The ML model is trained to predict a MITRE technique (or BENIGN) from log events.
This allows automation of detection and categorization of potentially malicious behavior.
- Feature: commandline field from logs
- Vectorization: TF-IDF (Term Frequency – Inverse Document Frequency)
- Model: RandomForestClassifier (robust, interpretable, CPU-friendly)
- Target: mitre_label (e.g., T1059.001, T1105, BENIGN)
python scripts/train_mitre_model.py

This script:
- Loads the dataset dataset_full_160k.csv
- Splits data into train/test sets
- Converts command lines to TF-IDF vectors
- Trains the Random Forest model
- Evaluates performance (precision, recall, F1-score, confusion matrix)
- Saves the trained model and vectorizer:
  - models/mitre_ml_model.pkl
  - models/tfidf_vectorizer.pkl
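
The steps above can be sketched roughly as follows. This is not the contents of scripts/train_mitre_model.py. It is a minimal illustration under several assumptions: the column names `commandline` and `mitre_label`, the 80/20 split, the hyperparameters, and the use of `pickle` for saving are all guesses. A tiny synthetic frame and a temporary directory stand in for dataset_full_160k.csv and models/ so the sketch runs on its own.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split


def train(df: pd.DataFrame, out_dir: Path):
    """Train TF-IDF + Random Forest and save both artifacts to out_dir."""
    # Stratified split keeps every label present in both sets.
    X_train, X_test, y_train, y_test = train_test_split(
        df["commandline"], df["mitre_label"],
        test_size=0.2, random_state=42, stratify=df["mitre_label"],
    )

    # Fit the vectorizer on training data only, then reuse it on the test set.
    vectorizer = TfidfVectorizer()
    X_train_vec = vectorizer.fit_transform(X_train)
    X_test_vec = vectorizer.transform(X_test)

    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train_vec, y_train)

    # Precision, recall, F1-score, and the confusion matrix.
    y_pred = model.predict(X_test_vec)
    print(classification_report(y_test, y_pred, zero_division=0))
    print(confusion_matrix(y_test, y_pred))

    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / "mitre_ml_model.pkl", "wb") as f:
        pickle.dump(model, f)
    with open(out_dir / "tfidf_vectorizer.pkl", "wb") as f:
        pickle.dump(vectorizer, f)
    return model, vectorizer


# Synthetic stand-in data; the real script would load the CSV instead,
# e.g. pd.read_csv("dataset_full_160k.csv").
df = pd.DataFrame({
    "commandline": (
        [f"powershell.exe -enc payload{i}" for i in range(5)]
        + [f"certutil.exe -urlcache -f http://example.com/{i}.exe" for i in range(5)]
        + [f"ping 10.0.0.{i}" for i in range(5)]
    ),
    "mitre_label": ["T1059.001"] * 5 + ["T1105"] * 5 + ["BENIGN"] * 5,
})

with tempfile.TemporaryDirectory() as tmp:
    model, vectorizer = train(df, Path(tmp) / "models")
    saved = sorted(p.name for p in (Path(tmp) / "models").iterdir())
```

Saving the vectorizer alongside the model matters because inference has to apply the same vocabulary and weighting that were fitted during training.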
Once the ML model predicts a MITRE technique, the LLM enriches the result by providing:
- Technique explanation
- Why the command matches the technique
- Attacker intent
- Recommended investigation steps
- Suggested detection rules
This step bridges the gap between the raw ML prediction and insights a SOC analyst can act on.
A local LLM (e.g., Phi-3 via Ollama) is called with a prompt containing:
- ML prediction
- Raw command line
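
As an illustration of this call, the sketch below builds a prompt from the two inputs and sends it to Ollama's local `/api/generate` endpoint. The prompt wording, the `build_prompt` and `enrich` helper names, the `phi3:mini` model tag, and the timeout are all assumptions made for this example, not taken from scripts/enrich_with_llm.py.

```python
import requests

# Ollama's default local endpoint for single-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumed model tag; change it to match the model you pulled.
MODEL_NAME = "phi3:mini"


def build_prompt(technique: str, commandline: str) -> str:
    """Combine the ML prediction and the raw command line into one prompt."""
    return (
        "You are assisting a SOC analyst.\n"
        f"Predicted MITRE ATT&CK technique: {technique}\n"
        f"Command line: {commandline}\n\n"
        "Provide:\n"
        "1. An explanation of the technique\n"
        "2. Why the command matches the technique\n"
        "3. The likely attacker intent\n"
        "4. Recommended investigation steps\n"
        "5. Suggested detection rules\n"
    )


def enrich(technique: str, commandline: str, timeout: float = 120.0) -> str:
    """Ask the local model for an analyst-friendly explanation (hypothetical helper)."""
    payload = {
        "model": MODEL_NAME,
        "prompt": build_prompt(technique, commandline),
        "stream": False,  # return one JSON object instead of a token stream
    }
    resp = requests.post(OLLAMA_URL, json=payload, timeout=timeout)
    resp.raise_for_status()
    # With streaming disabled, the generated text is in the "response" field.
    return resp.json()["response"]
```

With Ollama running locally, a call such as `enrich("T1059.001", "powershell.exe -enc SQBFAFgA")` would return the model's text. Keeping prompt construction in its own function makes it possible to inspect or unit-test the prompt without a running model.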
python scripts/enrich_with_llm.py

Install the dependencies:

pip3 install -r requirements.txt

Libraries:
- pandas
- scikit-learn
- requests
- test_model.py is an optional script for testing your model.
- The scripts are designed to be run locally, in a Python 3.13+ environment with the listed dependencies.
- Install Ollama on your machine and add the Phi-3 mini model.

