shaaz1n/Host-based-AI-EDR

Host-Based AI EDR

Host-Based AI EDR – Proof of Concept

A proof-of-concept host-based Endpoint Detection & Response (EDR) system demonstrating how machine learning, explainable AI, and realtime alerting can be integrated into a single monitoring pipeline.

This project focuses on system design, data flow, and explainability, and is not intended as a production security solution.


Tech Stack

  • Agent: Python
  • Backend API: Python (FastAPI)
  • Machine Learning: scikit-learn (Random Forest)
  • Explainability: SHAP
  • Database: Supabase (PostgreSQL + Realtime)
  • Dashboard: HTML, CSS, JavaScript, Chart.js, Supabase Realtime

System Overview

The system follows a simple event-driven pipeline where host-level process activity is collected by an agent, analyzed by a backend API, stored in a database, and visualized through a web dashboard.


System Architecture

flowchart LR
    A[Agent] -->|Sends Data| B[Backend API]
    B -->|Stores Data| C[Database]
    C -->|Reads Data| D[Dashboard]

Demo Flow

  1. Agent replays dataset events to simulate host activity
  2. Backend API processes the incoming telemetry
  3. Alerts are stored in the database and streamed to the dashboard in realtime

Note: This project is a proof-of-concept and is not packaged for public deployment.


Detection Approach

Dataset

The machine learning model is trained using the LANL Cyber Security Dataset:

  • proc.txt – normal process activity
  • redteam.txt – malicious / attack-related activity

The model recognizes only patterns present in this dataset.
Malware whose behavior is not represented there will most likely be classified as Normal.


Machine Learning

  • Primary model: Random Forest (supervised)

Encoded features include:

  • user
  • computer
  • process
  • event type
  • time-based features (hour, weekday)
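
As an illustration of the feature list above, here is a minimal sketch of training a Random Forest on encoded events with scikit-learn. The rows, the labels, the column order, and the hyperparameters are synthetic assumptions made for this example; they are not taken from the project's training code or from the LANL data.

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical encoded events, one row per event:
# [user_id, computer_id, process_id, event_type, hour, weekday]
# All values are made up for illustration.
X_train = [
    [0, 0, 0, 0, 9, 1],
    [1, 1, 1, 1, 10, 2],
    [0, 0, 2, 0, 14, 3],
    [2, 2, 1, 1, 11, 4],
    [3, 5, 7, 0, 2, 6],
    [3, 6, 7, 0, 3, 6],
    [4, 5, 8, 0, 1, 0],
    [4, 6, 8, 0, 2, 0],
]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = normal, 1 = malicious

# random_state is fixed only to make the sketch reproducible.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Score one new encoded event.
new_event = [[3, 5, 7, 0, 2, 6]]
label = int(model.predict(new_event)[0])
# Column 1 of predict_proba corresponds to model.classes_[1], here the label 1.
malicious_probability = float(model.predict_proba(new_event)[0][1])
```

The probability for the malicious class can serve as an alert score alongside the hard label.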

Anomaly Detection (Experimental)

An Isolation Forest model was added experimentally to explore unsupervised anomaly detection. Due to the strongly encoded and dataset-specific nature of the LANL data, it did not provide meaningful results and is not relied upon for final detection decisions.

Primary classification is performed using the Random Forest model.
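
For reference, an experiment like the Isolation Forest one described above could look roughly like the following sketch. The data and parameters are assumptions for illustration only. Isolation Forest isolates points by splitting on numeric thresholds, so arbitrary integer codes for users, computers, and processes carry no meaningful order for it, which is one plausible contributing factor to weak results on this kind of data.

```python
from sklearn.ensemble import IsolationForest

# Hypothetical encoded events (same column layout as the supervised model):
# [user_id, computer_id, process_id, event_type, hour, weekday]
X = [
    [0, 0, 0, 0, 9, 1],
    [1, 1, 1, 1, 10, 2],
    [0, 0, 2, 0, 14, 3],
    [2, 2, 1, 1, 11, 4],
    [1, 0, 1, 0, 13, 2],
    [4, 6, 8, 0, 2, 0],
]

# No labels are used; the model is fitted on the events alone.
detector = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
detector.fit(X)

scores = detector.decision_function(X)  # lower score = more anomalous
flags = detector.predict(X)             # -1 = anomaly, 1 = inlier
```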

Explainability

SHAP (SHapley Additive exPlanations) is used to explain why an event was classified as malicious. Feature contributions are stored and displayed in the dashboard to avoid black-box decisions.
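
To make the idea of a feature contribution concrete, the sketch below computes exact Shapley values by brute force for a tiny, hypothetical scoring function. It is not the SHAP library and not the project's code: `toy_score`, its three features, and the all-zero baseline are assumptions chosen so the result can be checked by hand. The SHAP library uses far more efficient algorithms for tree models, but the contributions it reports have the same additive property shown here.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to baseline.

    Features absent from a coalition take their baseline value.
    Cost grows as 2^n, so this is only practical for a few features.
    """
    n = len(x)
    features = list(range(n))

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Weight of a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical scoring function standing in for a trained model.
def toy_score(features):
    unusual_process, off_hours, rare_user = features
    return 0.1 + 0.5 * unusual_process + 0.2 * off_hours + 0.2 * unusual_process * rare_user


event = [1, 1, 1]
baseline = [0, 0, 0]
contributions = shapley_values(toy_score, event, baseline)
```

The contributions sum to the difference between the score for the event and the score for the baseline, so each stored value can be read as that feature's share of the alert score.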


Threat Model (Out of Scope)

This proof-of-concept focuses on user-space, process-level telemetry. The following threat categories are considered out of scope:

  • Kernel-level malware
  • Memory-only attacks
  • Living-off-the-land binaries (LOLBins)
  • Adversarial ML evasion

Components

Agent

  • Monitors local process creation events
  • Collects user, host, and process metadata
  • Sends structured logs to the backend API
  • Acts as a telemetry collector (no prevention or blocking)
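
A structured log of the kind listed above might be built as in the following sketch. The comma-separated record layout (time, user, computer, process, event type), the `ProcessEvent` field names, and the sample record are assumptions for illustration; the project's actual payload schema is not shown here, and the HTTP call to the backend is left out.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ProcessEvent:
    time: int          # seconds since the start of collection (assumed)
    user: str
    computer: str
    process: str
    event_type: str


def parse_proc_line(line: str) -> ProcessEvent:
    """Parse one record of the assumed form: time,user,computer,process,event type."""
    time_s, user, computer, process, event_type = line.strip().split(",")
    return ProcessEvent(int(time_s), user, computer, process, event_type)


def to_payload(event: ProcessEvent) -> str:
    """Serialize the event as a JSON string ready to be posted to the backend."""
    return json.dumps(asdict(event), sort_keys=True)


sample_line = "1,C1$@DOM1,C1,P16,Start"  # illustrative record, not real data
payload = to_payload(parse_proc_line(sample_line))
```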

Backend API

  • Feature encoding
  • ML inference
  • SHAP explanation generation
  • Writes alerts to the database
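
The feature-encoding step could be sketched as below. Everything here is a hypothetical illustration rather than the project's implementation: the `CategoryEncoder` class, the choice of -1 for values never seen in training, and the assumption that timestamps are seconds counted from the start of collection, so that the derived weekday is relative to day 0 of the data rather than a calendar weekday.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400


class CategoryEncoder:
    """Maps category strings to integer IDs learned at training time."""

    UNKNOWN = -1  # ID returned for values never seen during fit (assumption)

    def __init__(self):
        self.ids = {}

    def fit(self, values):
        for v in values:
            self.ids.setdefault(v, len(self.ids))
        return self

    def transform(self, value):
        return self.ids.get(value, self.UNKNOWN)


def time_features(t):
    """Derive (hour, weekday) from a timestamp in seconds since collection start."""
    hour = (t % SECONDS_PER_DAY) // SECONDS_PER_HOUR
    weekday = (t // SECONDS_PER_DAY) % 7
    return hour, weekday


def encode_event(event, encoders):
    """Turn one event dict into the numeric feature vector passed to the model."""
    hour, weekday = time_features(event["time"])
    return [
        encoders["user"].transform(event["user"]),
        encoders["computer"].transform(event["computer"]),
        encoders["process"].transform(event["process"]),
        encoders["event_type"].transform(event["event_type"]),
        hour,
        weekday,
    ]


encoders = {
    "user": CategoryEncoder().fit(["U1", "U2"]),
    "computer": CategoryEncoder().fit(["C1", "C2"]),
    "process": CategoryEncoder().fit(["P16", "P4"]),
    "event_type": CategoryEncoder().fit(["Start", "End"]),
}
event = {"time": 90000, "user": "U2", "computer": "C9", "process": "P16", "event_type": "Start"}
vector = encode_event(event, encoders)
```

Here the computer `C9` was never seen during fitting, so it is encoded as -1; how unseen values are handled is a design choice that directly affects what the model can flag.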

Database

  • Stores alerts, scores, and explanations
  • Publishes realtime updates for the dashboard

Dashboard

  • Displays live alerts
  • Shows total detections and response time
  • Allows viewing SHAP-based explanations

Dashboard – Live Alerts

[Screenshot: Live Dashboard]


Detection Explanation (SHAP)

[Screenshot: Detection Explanation]


Demo Behavior

Demo scripts replay malicious events from the dataset. These events are detected by the model, stored in the database, and pushed to the dashboard in realtime together with their explanations.

Limitations

  • Dataset-dependent detection only
  • No zero-day or unknown malware detection
  • No response or blocking actions
  • Built as a proof-of-concept for learning and demonstration

Dataset Citation

LANL Cyber Security Dataset
Los Alamos National Laboratory
https://csr.lanl.gov/data/

Project Context

This project was developed as a college mini-project. The author was primarily responsible for system design, model training, the backend API, database integration, and the dashboard implementation.

Disclaimer

This project is for educational and research purposes only. It is not intended to replace commercial EDR solutions.
