This repository contains the "AI brain" for the TracerAI platform. It is a data-processing backend built in Python using FastAPI and scikit-learn, designed to work with the go-Tracer sensor.
Its job is to ingest, enrich, analyze, and store the FlowEvent summaries sent by one or more go-Tracer agents.
- High-Speed Ingestion: Built with FastAPI to asynchronously handle thousands of incoming flow events.
- Real-time Enrichment: Automatically enriches incoming flows with GeoIP data (e.g., "Local" -> "USA").
- Advanced Feature Engineering: Includes an offline script (`feature_engineer.py`) to aggregate raw flows into behavioral "fingerprints" for each host (e.g., `port_entropy`, `unique_dest_ips`).
- AI Anomaly Detection: Uses an `IsolationForest` model (from scikit-learn) trained on these advanced fingerprints to detect anomalous host behavior.
- Simple Dashboard: Includes a basic HTML dashboard (served by Jinja2) to view the latest raw flow data.
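
To make the "fingerprint" idea concrete, here is a minimal sketch of how the two named features could be computed per host. It is not the code in `feature_engineer.py`: the flow field names (`src_ip`, `dst_ip`, `dst_port`) and both function names are assumptions for illustration, and the entropy is plain Shannon entropy over destination ports.

```python
import math
from collections import Counter, defaultdict


def port_entropy(ports):
    """Shannon entropy (in bits) of a host's destination-port distribution."""
    counts = Counter(ports)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    probs = [c / total for c in counts.values()]
    return sum(-p * math.log2(p) for p in probs)


def build_fingerprints(flows):
    """Group flows by source host and compute one fingerprint per host.

    Each flow is a dict with the hypothetical keys src_ip, dst_ip, dst_port.
    """
    by_host = defaultdict(list)
    for flow in flows:
        by_host[flow["src_ip"]].append(flow)

    fingerprints = {}
    for host, host_flows in by_host.items():
        fingerprints[host] = {
            "port_entropy": port_entropy(f["dst_port"] for f in host_flows),
            "unique_dest_ips": len({f["dst_ip"] for f in host_flows}),
        }
    return fingerprints
```

A host that spreads its traffic evenly over many ports gets a high `port_entropy`, while a host that only ever talks to one port gets 0, which is the kind of contrast an anomaly detector can pick up on.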
The backend uses a multi-stage pipeline, separating "fast" collection from "slow" analysis.
- Stage 1: Collect (Real-time)
  - The `go-Tracer` agent sends a `FlowEvent` to the `POST /ingest` endpoint.
  - FastAPI enriches the event with GeoIP data and saves it to the `flow_events` table.
- Stage 2: Engineer Features (Offline)
  - The `python feature_engineer.py` script is run (e.g., every 10 minutes via cron).
  - It queries the `flow_events` table, aggregates data by host, and saves the "fingerprints" to the `host_behavior_summary` table.
- Stage 3: Train AI (Offline)
  - The `python train.py` script is run.
  - It loads all "fingerprints" from the `host_behavior_summary` table and trains the `IsolationForest` AI model, saving it to a `.pkl` file.
- Stage 4: Detect (On-Demand)
  - A request to `POST /run-analysis` triggers a background task.
  - This task runs the feature engineering (Stage 2) and then runs the AI (Stage 3) on the new data, printing any 🚨 [AI ALERT] 🚨 messages.
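
The "fast" collection path in Stage 1 comes down to enriching an event and then storing it. The sketch below shows that flow using only the standard library, so it runs without FastAPI or a GeoIP database. It is an illustration under assumptions, not the project's code: the column names, the `enrich_country` and `ingest` helpers, and the rule that private addresses are labeled "Local" are all hypothetical. Only the `flow_events` table name comes from this README.

```python
import ipaddress
import sqlite3


def enrich_country(ip, lookup=None):
    """Return a country label for an IP address.

    Private and loopback addresses are labeled "Local" (an assumption).
    `lookup` is a hypothetical callable standing in for a GeoIP reader.
    """
    addr = ipaddress.ip_address(ip)
    if addr.is_private or addr.is_loopback:
        return "Local"
    if lookup is not None:
        return lookup(ip) or "Unknown"
    return "Unknown"


def init_db(conn):
    """Create the flow_events table (column layout is assumed)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flow_events ("
        "id INTEGER PRIMARY KEY, src_ip TEXT, dst_ip TEXT, "
        "dst_port INTEGER, byte_count INTEGER, dst_country TEXT)"
    )


def ingest(conn, event, lookup=None):
    """Enrich one event with a destination country and store it."""
    enriched = dict(event, dst_country=enrich_country(event["dst_ip"], lookup))
    conn.execute(
        "INSERT INTO flow_events "
        "(src_ip, dst_ip, dst_port, byte_count, dst_country) "
        "VALUES (:src_ip, :dst_ip, :dst_port, :byte_count, :dst_country)",
        enriched,
    )
    conn.commit()
    return enriched
```

Keeping enrichment as a small pure function separate from the write makes the ingest path easy to test, and leaves the heavier aggregation and model work to the offline stages.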
- Clone the repository:
  git clone https://github.com/codetheuri/TracerAI.git
  cd TracerAI
- Create and activate a virtual environment:
  python3 -m venv venv
  source venv/bin/activate
- Install dependencies:
  pip install -r requirements.txt
- Download GeoIP Database:
  - Sign up for a free MaxMind account at maxmind.com.
  - Download the `GeoLite2-Country.mmdb` file and place it in the root of this project folder.
You must run these steps in order.
Step 1: Collect Data (Let it run for 10-15 minutes)
- Start the FastAPI server (this creates the database):
uvicorn app.main:app --reload --port 8000
- Start your go-Tracer agent (in its own terminal), making sure its `AGENT_ENDPOINT_URL` is pointed to `http://127.0.0.1:8000/ingest`.
- Let this run and collect raw flow data.
- Once done, stop the `uvicorn` server (Ctrl+C) to unlock the database.
Step 2: Engineer Features
- Run the feature engineering script to process the raw data:
python feature_engineer.py
Step 3: Train the AI
- Run the training script to build your AI model:
python train.py
Step 4: Go "Live"
- Restart the FastAPI server. It will now load the new AI models.
uvicorn app.main:app --reload --port 8000
- You can view the live data stream on the dashboard at `http://127.0.0.1:8000/`.
- To trigger an analysis of the latest data, send a POST request:
curl -X POST http://127.0.0.1:8000/run-analysis
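
If you would rather trigger the analysis from a script than from `curl`, the same request can be built with Python's standard library. This is a minimal sketch: `build_post_request` and `send` are hypothetical helper names, and the optional JSON payload is only there in case you also want to post hand-made test events to `/ingest` (whose exact schema is not documented here).

```python
import json
import urllib.request


def build_post_request(path, payload=None, base_url="http://127.0.0.1:8000"):
    """Build a POST request for the backend; payload (if any) is sent as JSON."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url + path, data=data, headers=headers, method="POST"
    )


def send(request, timeout=10):
    """Send the request and return the HTTP status code (server must be running)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

With the server running, `send(build_post_request("/run-analysis"))` is the equivalent of the `curl` command above.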