Host-Based AI EDR – Proof of Concept
A proof-of-concept host-based Endpoint Detection & Response (EDR) system demonstrating how machine learning, explainable AI, and realtime alerting can be integrated into a single monitoring pipeline.
This project focuses on system design, data flow, and explainability, and is not intended as a production security solution.
- Agent: Python
- Backend API: Python (FastAPI)
- Machine Learning: scikit-learn (Random Forest)
- Explainability: SHAP
- Database: Supabase (PostgreSQL + Realtime)
- Dashboard: Web-based dashboard (HTML, CSS, JavaScript, Chart.js, Supabase Realtime)
The system follows a simple event-driven pipeline where host-level process activity is collected by an agent, analyzed by a backend API, stored in a database, and visualized through a web dashboard.
flowchart LR
A[Agent] -->|Sends Data| B[Backend API]
B -->|Stores Data| C[Database]
C -->|Reads Data| D[Dashboard]
- Agent replays dataset events to simulate host activity
- Backend API processes incoming telemetry
- Alerts are stored in the database and streamed to the dashboard in realtime
Note: This project is a proof-of-concept and is not packaged for public deployment.
The machine learning model is trained using the LANL Cyber Security Dataset:
- proc.txt – normal process activity
- redteam.txt – malicious / attack-related activity
The model only recognizes patterns present in this dataset.
Unseen or unknown malware will typically be classified as Normal, because the model has no training examples for it.
- Primary model: Random Forest (supervised)
Encoded features include:
- user
- computer
- process
- event type
- time-based features (hour, weekday)
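The encoding step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `LabelIndexer`, `encode_event`, and `FIELDS` are hypothetical names, and it assumes each event carries an integer `time` in seconds counted from the start of the capture, so `hour` and `weekday` are relative to that start rather than to a calendar.

```python
from typing import Dict, List

# Hypothetical names for the categorical columns listed above.
FIELDS = ("user", "computer", "process", "event_type")


class LabelIndexer:
    """Maps each distinct string to a stable integer code (illustrative helper)."""

    def __init__(self) -> None:
        self._codes: Dict[str, int] = {}

    def encode(self, value: str) -> int:
        # A value seen for the first time receives the next free integer.
        return self._codes.setdefault(value, len(self._codes))


def encode_event(event: Dict[str, object], indexers: Dict[str, LabelIndexer]) -> List[int]:
    """Turn one event into a numeric feature vector.

    Assumption: event["time"] is seconds since the start of the capture.
    """
    t = int(event["time"])
    hour = (t // 3600) % 24      # hour of day, relative to capture start
    weekday = (t // 86400) % 7   # day index within a 7-day cycle
    categorical = [indexers[name].encode(str(event[name])) for name in FIELDS]
    return categorical + [hour, weekday]


indexers = {name: LabelIndexer() for name in FIELDS}
sample = {"time": 90000, "user": "U1@DOM1", "computer": "C1",
          "process": "P16", "event_type": "Start"}
vector = encode_event(sample, indexers)
```

At inference time the same indexers would have to be reused, otherwise the integer codes would not match the ones the model was trained on.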
An Isolation Forest model was added experimentally to explore unsupervised anomaly detection. Due to the strongly encoded and dataset-specific nature of the LANL data, it did not provide meaningful results and is not relied upon for final detection decisions.
Primary classification is performed using the Random Forest model.
SHAP (SHapley Additive exPlanations) is used to explain why an event was classified as malicious. Feature contributions are stored and displayed in the dashboard to avoid black-box decisions.
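To make the idea of a feature contribution concrete, the sketch below computes exact Shapley values for a tiny hand-written scoring function by enumerating feature subsets. It is a standard-library illustration only: it does not use the SHAP library, which computes such attributions efficiently for tree models, and `toy_score` together with its feature names is a made-up stand-in for the trained classifier.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict

Features = Dict[str, float]


def shapley_values(score: Callable[[Features], float],
                   x: Features, baseline: Features) -> Dict[str, float]:
    """Exact Shapley values of score(x) relative to score(baseline).

    Features outside a coalition are replaced by their baseline value.
    Cost grows exponentially with the number of features, so this is
    only practical for toy examples.
    """
    names = list(x)
    n = len(names)
    phi: Dict[str, float] = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                without = {m: (x[m] if m in subset else baseline[m]) for m in names}
                with_feature = dict(without, **{name: x[name]})
                total += weight * (score(with_feature) - score(without))
        phi[name] = total
    return phi


def toy_score(f: Features) -> float:
    # Hypothetical "maliciousness" score with one interaction term.
    return (0.1 + 0.5 * f["process_rare"] + 0.2 * f["odd_hour"]
            + 0.2 * f["process_rare"] * f["odd_hour"])


event = {"process_rare": 1.0, "odd_hour": 1.0}
reference = {"process_rare": 0.0, "odd_hour": 0.0}
contributions = shapley_values(toy_score, event, reference)
```

The contributions sum to the difference between the event's score and the baseline score, which is the additive property that lets a dashboard present them as a breakdown of a single prediction.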
This proof-of-concept focuses on user-space, process-level telemetry. The following threat categories are considered out of scope:
- Kernel-level malware
- Memory-only attacks
- Living-off-the-land binaries (LOLBins)
- Adversarial ML evasion
- Monitors local process creation events
- Collects user, host, and process metadata
- Sends structured logs to the backend API
- Acts as a telemetry collector (no prevention or blocking)
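As a rough sketch of how an agent might ship one structured event to the backend, the example below uses only the standard library. The endpoint URL, the field names, and the function names are assumptions made for illustration; the project's real routes and payload schema are not documented here.

```python
import json
import urllib.request
from typing import Dict

# Hypothetical endpoint; the actual route of the backend API is not specified.
API_URL = "http://localhost:8000/events"


def build_request(event: Dict[str, object], url: str = API_URL) -> urllib.request.Request:
    """Wrap one event as a JSON POST request without sending it."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_event(event: Dict[str, object], url: str = API_URL, timeout: float = 5.0) -> int:
    """Send the event and return the HTTP status code (requires a running backend)."""
    with urllib.request.urlopen(build_request(event, url), timeout=timeout) as response:
        return response.status


# Field names below are illustrative, mirroring the metadata listed above.
request = build_request({
    "time": 90000,
    "user": "U1@DOM1",
    "computer": "C1",
    "process": "P16",
    "event_type": "Start",
})
```

Separating `build_request` from `send_event` keeps the payload construction testable without a live server.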
- Feature encoding
- ML inference
- SHAP explanation generation
- Writes alerts to the database
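The last of these steps, turning a model score and its explanation into a stored alert, might look roughly like the sketch below. The record layout, the `build_alert` name, and the 0.5 threshold are all assumptions for illustration; the actual table schema and decision rule are not described in this document.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Optional


def build_alert(event: Dict[str, object],
                malicious_probability: float,
                contributions: Dict[str, float],
                threshold: float = 0.5) -> Optional[dict]:
    """Return a JSON-serializable alert record, or None if below the threshold.

    The threshold and field names are hypothetical.
    """
    if malicious_probability < threshold:
        return None
    # Rank features by absolute contribution so the strongest drivers come first.
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "event": event,
        "score": round(malicious_probability, 4),
        "explanation": [{"feature": name, "contribution": value} for name, value in ranked],
    }


alert = build_alert(
    {"user": "U1@DOM1", "computer": "C1", "process": "P16"},
    malicious_probability=0.93,
    contributions={"hour": 0.05, "process": 0.6, "user": -0.1},
)
row = json.dumps(alert)  # what would be written to the alerts table
```

Sorting by absolute value keeps features that push the score down visible next to those that push it up.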
- Stores alerts, scores, and explanations
- Publishes realtime updates for the dashboard
- Displays live alerts
- Shows total detections and response time
- Allows viewing SHAP-based explanations
Demo scripts replay malicious events from the dataset. These events are detected by the model, stored in the database, and pushed to the dashboard in realtime together with their explanations.
- Dataset-dependent detection only
- No zero-day or unknown malware detection
- No response or blocking actions
- Built as a proof-of-concept for learning and demonstration
LANL Cyber Security Dataset, Los Alamos National Laboratory: https://csr.lanl.gov/data/
This project was developed as part of a college mini-project. The author was primarily responsible for system design, model training, the backend API, database integration, and dashboard implementation.
This project is for educational and research purposes only. It is not intended to replace commercial EDR solutions.

