Host-Based AI EDR

Host-Based AI EDR – Proof of Concept

A proof-of-concept host-based Endpoint Detection & Response (EDR) system demonstrating how machine learning, explainable AI, and realtime alerting can be integrated into a single monitoring pipeline.

This project focuses on system design, data flow, and explainability, and is not intended as a production security solution.

Tech Stack

Agent: Python
Backend API: Python (FastAPI)
Machine Learning: scikit-learn (Random Forest)
Explainability: SHAP
Database: Supabase (PostgreSQL + Realtime)
Dashboard: Web-based dashboard (HTML, CSS, JavaScript, Chart.js, Supabase Realtime)

System Overview

The system follows a simple event-driven pipeline where host-level process activity is collected by an agent, analyzed by a backend API, stored in a database, and visualized through a web dashboard.

System Architecture

flowchart LR
    A[Agent] -->|Sends Data| B[Backend API]
    B -->|Stores Data| C[Database]
    C -->|Reads Data| D[Dashboard]

Demo Flow

Agent replays dataset events to simulate host activity
Backend API processes incoming telemetry and scores each event
Alerts are stored in the database and streamed to the dashboard in realtime

Note: This project is a proof-of-concept and is not packaged for public deployment.

Detection Approach

Dataset

The machine learning model is trained using the LANL Cyber Security Dataset:

proc.txt – normal process activity
redteam.txt – malicious / attack-related activity

The model only recognizes patterns present in this dataset.
Malware or attack behavior that is not represented in the training data is likely to be classified as Normal.
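
For illustration, a minimal sketch of parsing one process record. The comma-separated layout (time, user@domain, computer, process name, Start/End) is an assumption based on the public LANL dataset description and should be checked against the actual files. `ProcEvent` and `parse_proc_line` are hypothetical names, not taken from this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcEvent:
    time: int        # seconds since the start of the capture (assumed)
    user: str        # e.g. "U1@DOM1"
    computer: str    # e.g. "C1"
    process: str     # e.g. "P16"
    event_type: str  # "Start" or "End"


def parse_proc_line(line: str) -> ProcEvent:
    """Parse one comma-separated record using the assumed five-field layout."""
    parts = line.strip().split(",")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    time, user, computer, process, event_type = parts
    return ProcEvent(int(time), user, computer, process, event_type)


# Example with a made-up record in the assumed layout.
event = parse_proc_line("1,U1@DOM1,C1,P16,Start")
print(event)
```

Rejecting malformed lines early keeps bad records out of training and replay instead of silently shifting fields.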

Machine Learning

Primary model: Random Forest (supervised)

Encoded features include:

user
computer
process
event type
time-based features (hour, weekday)
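
The feature list above could be turned into a model input roughly as follows. This is a sketch under stated assumptions, not the project's actual training code: the rows and labels are made up, timestamps are assumed to be seconds relative to the start of the capture (so "weekday" is a day index relative to the first day), and `OrdinalEncoder` is one possible encoding choice among several.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OrdinalEncoder

# Toy rows: (time_seconds, user, computer, process, event_type). Labels are made up.
rows = [
    (3600, "U1@DOM1", "C1", "P16", "Start"),
    (7200, "U2@DOM1", "C2", "P4", "Start"),
    (90000, "U1@DOM1", "C1", "P16", "End"),
    (180000, "U9@DOM1", "C7", "P99", "Start"),
]
labels = [0, 0, 0, 1]  # 0 = normal, 1 = malicious


def time_features(t):
    """Derive (hour, weekday index) from a relative timestamp in seconds."""
    return (t // 3600) % 24, (t // 86400) % 7


# Categories never seen during fitting are mapped to -1 instead of raising an error.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)


def to_matrix(rows, fit=False):
    """Encode categorical columns and append the two time-based columns."""
    cats = np.array([r[1:] for r in rows], dtype=object)
    enc = encoder.fit_transform(cats) if fit else encoder.transform(cats)
    times = np.array([time_features(r[0]) for r in rows], dtype=float)
    return np.hstack([enc, times])


X = to_matrix(rows, fit=True)
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, labels)

# Score a new event whose user was not present in the training rows.
new_event = [(5400, "U404@DOM1", "C1", "P16", "Start")]
proba = model.predict_proba(to_matrix(new_event))[0]
print(dict(zip(model.classes_, proba)))
```

Mapping unseen categories to a sentinel value keeps inference from failing on new users or hosts, although the model has no learned signal for them.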

Anomaly Detection (Experimental)

An Isolation Forest model was added experimentally to explore unsupervised anomaly detection. Because the LANL data is heavily encoded and dataset-specific, it did not produce meaningful results and is not used for final detection decisions, which rely on the Random Forest model alone.

Explainability

SHAP (SHapley Additive exPlanations) is used to explain why an event was classified as malicious. Feature contributions are stored and displayed in the dashboard to avoid black-box decisions.
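
As a rough illustration of how stored feature contributions might be prepared for display, the sketch below ranks per-feature values by absolute magnitude. It does not call the SHAP library. The contribution values are made up, and `top_contributions` and the record shape are hypothetical, not the project's actual schema.

```python
def top_contributions(contributions, k=3):
    """Return the k features with the largest absolute contribution."""
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [
        {"feature": name, "contribution": round(value, 4)}
        for name, value in ranked[:k]
    ]


# Made-up contributions for one event; positive values push toward "malicious".
example = {
    "user": 0.21,
    "computer": -0.05,
    "process": 0.34,
    "event_type": 0.01,
    "hour": -0.12,
    "weekday": 0.02,
}
print(top_contributions(example))
```

Ranking by absolute value surfaces features that argue against the malicious label as well as those that argue for it, which can help an analyst judge a borderline alert.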

Threat Model (Out of Scope)

This proof-of-concept focuses on user-space, process-level telemetry. The following threat categories are considered out of scope:

Kernel-level malware
Memory-only attacks
Living-off-the-land binaries (LOLBins)
Adversarial ML evasion

Components

Agent

Monitors local process creation events
Collects user, host, and process metadata
Sends structured logs to the backend API
Acts as a telemetry collector (no prevention or blocking)
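
A structured log of this kind could look like the sketch below. The field names and the `ProcessEvent` and `to_payload` helpers are assumptions for illustration; the README does not specify the agent's actual payload schema or the backend endpoint.

```python
import json
import time
from dataclasses import asdict, dataclass


@dataclass
class ProcessEvent:
    timestamp: float
    user: str
    host: str
    process: str
    event_type: str


def to_payload(event: ProcessEvent) -> bytes:
    """Serialize an event as UTF-8 JSON, suitable as an HTTP POST body."""
    return json.dumps(asdict(event), sort_keys=True).encode("utf-8")


event = ProcessEvent(time.time(), "U1@DOM1", "C1", "P16", "Start")
body = to_payload(event)
print(body.decode("utf-8"))

# Sending is intentionally left out. With the standard library it could be done
# with urllib.request.Request(url, data=body,
# headers={"Content-Type": "application/json"}), where url is the backend's
# ingestion endpoint (not specified in this README).
```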

Backend API

Feature encoding
ML inference
SHAP explanation generation
Writes alerts to the database
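
The four steps above can be sketched as a single framework-free handler. This is not the project's FastAPI code: the `handle_event` function, the injected callables, the alert shape, and the 0.5 threshold are all assumptions, and the stubs stand in for the real encoder, model, SHAP explainer, and database client.

```python
from typing import Callable, Dict, List, Optional


def handle_event(
    event: Dict[str, object],
    encode: Callable[[Dict[str, object]], List[float]],
    score: Callable[[List[float]], float],
    explain: Callable[[List[float]], Dict[str, float]],
    store: Callable[[Dict[str, object]], None],
    threshold: float = 0.5,  # assumed cut-off, not taken from the project
) -> Optional[Dict[str, object]]:
    """Encode, score, explain, and store one event; return the alert, if any."""
    features = encode(event)
    probability = score(features)
    if probability < threshold:
        return None  # treated as normal: nothing is written
    alert = {
        "event": event,
        "score": probability,
        "explanation": explain(features),
    }
    store(alert)
    return alert


# Stub wiring for demonstration only.
stored: List[Dict[str, object]] = []
alert = handle_event(
    {"user": "U9@DOM1", "host": "C7", "process": "P99", "event_type": "Start"},
    encode=lambda e: [1.0, 2.0, 3.0],
    score=lambda f: 0.9,
    explain=lambda f: {"process": 0.4, "user": 0.2},
    store=stored.append,
)
print(alert)
```

Passing the model, explainer, and storage in as callables keeps the handler easy to test without a trained model or a live database.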

Database

Stores alerts, scores, and explanations
Publishes realtime updates for the dashboard

Dashboard

Displays live alerts
Shows total detections and response time
Allows viewing SHAP-based explanations

Dashboard – Live Alerts

Detection Explanation (SHAP)

Demo Behavior

Demo scripts replay malicious events from the dataset. These events are detected by the model, stored in the database, and appear on the dashboard in realtime together with their SHAP explanations.

Limitations

Dataset-dependent detection only
No zero-day or unknown malware detection
No response or blocking actions
Built as a proof-of-concept for learning and demonstration

Dataset Citation

LANL Cyber Security Dataset, Los Alamos National Laboratory. https://csr.lanl.gov/data/

Project Context

This project was developed as part of a college mini-project. The author was primarily responsible for system design, model training, the backend API, database integration, and dashboard implementation.

Disclaimer

This project is for educational and research purposes only. It is not intended to replace commercial EDR solutions.
