Skip to content
/ SOC Public

Intelligent SOC automation framework powered by LangGraph multi-agent workflows for alert triage, correlation, and incident response

License

Notifications You must be signed in to change notification settings

gapilongo/SOC

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

96 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

SOC POC with LangGraph

๐Ÿ”’ Intelligent Security Operations Center - Multi-Agent Workflow System

An advanced Security Operations Center (SOC) proof-of-concept built on LangGraph, featuring autonomous AI agents for alert processing, threat analysis, and incident response. This system demonstrates how modern AI can enhance security operations through intelligent automation, human-AI collaboration, and continuous learning.

๐Ÿ—๏ธ Architecture Overview

The SOC POC implements a sophisticated multi-layered architecture designed for scalability, reliability, and intelligent decision-making:

Core Framework

  • State Management: Centralized state handling with versioning and audit trails
  • Workflow Engine: LangGraph-powered orchestration for complex multi-agent workflows
  • Configuration: Dynamic policy-driven configuration management

Agent Ecosystem

Six specialized AI agents working in harmony:

  1. Ingestion Agent - Multi-source alert collection with deduplication
  2. Triage Agent - Intelligent alert prioritization and initial classification
  3. Analysis Agent - Deep threat investigation using ReAct reasoning loops
  4. Human-in-the-Loop - Structured analyst collaboration and escalation
  5. Response Agent - Automated containment and remediation actions
  6. Learning Agent - Continuous model improvement and knowledge extraction

Tool Orchestration Layer

Unified interface to security tools:

  • SIEM Integration - Splunk, QRadar, Sentinel connectivity
  • Threat Intelligence - IOC enrichment and reputation services
  • Sandbox Analysis - Automated malware detonation
  • EDR/XDR Tools - Endpoint investigation and response

๐Ÿš€ Key Features

Intelligent Decision Making

  • Confidence-Based Routing - Alerts flow through agents based on confidence thresholds
  • Policy-Driven Workflows - Configurable decision points with business logic
  • Adaptive Thresholds - Dynamic adjustment based on historical performance

Advanced State Management

Enhanced State Object:
โ”œโ”€โ”€ alert_id & raw_alert
โ”œโ”€โ”€ enriched_data & triage_status  
โ”œโ”€โ”€ confidence_score & FP/TP_indicators
โ”œโ”€โ”€ workflow_history & agent_executions
โ”œโ”€โ”€ human_feedback & response_actions
โ””โ”€โ”€ metadata & audit_trail

Asynchronous Processing

  • Concurrent Tool Execution - Parallel security tool queries
  • Retry Mechanisms - Robust error handling and fallback strategies
  • Caching Layer - Redis-powered performance optimization
  • Rate Limiting - API protection and resource management

Human-AI Collaboration

  • Structured Feedback Interface - Clear escalation and review processes
  • Role-Based Access Control - Analyst, senior analyst, and manager workflows
  • SLA Tracking - Response time monitoring and alerting
  • Knowledge Transfer - Human insights fed back to learning systems

๐Ÿ”„ Workflow Process

graph TD
    %% Enhanced State Definition
    State[<b>Enhanced State Object</b><br/>โ€ข alert_id<br/>โ€ข raw_alert<br/>โ€ข enriched_data<br/>โ€ข triage_status<br/>โ€ข confidence_score<br/>โ€ข FP/TP_indicators<br/>โ€ข workflow_history<br/>โ€ข agent_executions<br/>โ€ข human_feedback<br/>โ€ข response_actions<br/>โ€ข metadata]
    
    %% Enhanced Nodes with Async Support
    Ingestion[<b>Ingestion Agent</b><br/>โ€ข Async Sources<br/>โ€ข Batching<br/>โ€ข Deduplication<br/>โ€ข Rate Limiting]
    Triage[<b>Triage Agent</b><br/>โ€ข Rule Engine<br/>โ€ข ML Scoring<br/>โ€ข Thresholds<br/>โ€ข Fallback]
    Correlation[<b>Correlation Agent</b><br/>โ€ข Async Queries<br/>โ€ข Caching<br/>โ€ข Timeouts<br/>โ€ข Retry Logic]
    Analysis[<b>Analysis Agent</b><br/>โ€ข ReAct Loop<br/>โ€ข Tool Orchestration<br/>โ€ข Fallback to Human<br/>โ€ข Reasoning Logs]
    HumanLoop[<b>Human-in-the-Loop</b><br/>โ€ข Structured Feedback<br/>โ€ข Role-Based Access<br/>โ€ข Escalation Levels<br/>โ€ข SLA Tracking]
    Response[<b>Response Agent</b><br/>โ€ข Playbook Engine<br/>โ€ข Approval Workflow<br/>โ€ข Rollback Support<br/>โ€ข Action Audit]
    Learning[<b>Learning Agent</b><br/>โ€ข Model Versioning<br/>โ€ข Training Pipeline<br/>โ€ข Performance Metrics<br/>โ€ข A/B Testing]
    Close[<b>Close Alert</b><br/>โ€ข State Validation<br/>โ€ข Audit Trail<br/>โ€ข Archive Process<br/>โ€ข Metrics Collection]
    
    %% Enhanced Decision Points with Thresholds
    IsFP{Confidence > 80%<br/>AND FP Indicators?<br/>Policy: FP_THRESHOLD}
    NeedsCorrelation{Confidence 40-70%<br/>OR Needs Context?<br/>Policy: CORRELATION_POLICY}
    NeedsAnalysis{Confidence < 60%<br/>OR Complex Alert?<br/>Policy: ANALYSIS_POLICY}
    NeedsHuman{Confidence Grey Zone<br/>OR High Risk?<br/>Policy: HUMAN_REVIEW_POLICY}
    NeedsResponse{Confirmed Threat<br/>AND Auto-Response<br/>Enabled?<br/>Policy: RESPONSE_POLICY}
    
    %% Enhanced Tool Orchestration
    Tools[<b>Tool Orchestration Layer</b><br/>โ€ข Async Execution<br/>โ€ข Retry Strategies<br/>โ€ข Caching Layer<br/>โ€ข Metrics Collection<br/>โ€ข Timeout Handling<br/>โ€ข Fallback Logic]
    
    %% Storage Layer
    Storage[<b>Storage Layer</b><br/>โ€ข PostgreSQL<br/>โ€ข Redis Cache<br/>โ€ข Vector DB<br/>โ€ข State History<br/>โ€ข Audit Logs]
    
    %% Monitoring & Observability
    Monitoring[<b>Monitoring & Observability</b><br/>โ€ข Metrics<br/>โ€ข Tracing<br/>โ€ข Logging<br/>โ€ข Alerts<br/>โ€ข Dashboards]
    
    %% Enhanced Flow with Async and Feedback Loops
    Ingestion -->|Initialize State| State
    State -->|New Alert| Triage
    Triage -->|Update State| State
    State -->|Triage Complete| IsFP
    IsFP -->|Yes| Close
    IsFP -->|No| NeedsCorrelation
    NeedsCorrelation -->|Yes| Correlation
    NeedsCorrelation -->|No| NeedsAnalysis
    Correlation -->|Update State| State
    State -->|Correlation Complete| NeedsAnalysis
    NeedsAnalysis -->|Yes| Analysis
    NeedsAnalysis -->|No| NeedsHuman
    Analysis -->|ReAct Loop| State
    State -->|Analysis Complete| NeedsHuman
    NeedsHuman -->|Yes| HumanLoop
    NeedsHuman -->|No| NeedsResponse
    HumanLoop -->|Update State| State
    State -->|Human Feedback| NeedsResponse
    NeedsResponse -->|Yes| Response
    NeedsResponse -->|No| Learning
    Response -->|Update State| State
    State -->|Response Complete| Learning
    Learning -->|Update State| State
    State -->|Learning Complete| Close
    
    %% Enhanced Tool Connections with Orchestration
    Triage -.->|Call via Orchestrator| Tools
    Correlation -.->|Async Call via Orchestrator| Tools
    Analysis -.->|ReAct via Orchestrator| Tools
    HumanLoop -.->|Call via Orchestrator| Tools
    Response -.->|Call via Orchestrator| Tools
    Learning -.->|Call via Orchestrator| Tools
    
    %% Storage Connections
    State <-->|Persist/Load| Storage
    Tools <-->|Cache/Store| Storage
    
    %% Monitoring Connections
    State -.->|State Changes| Monitoring
    Tools -.->|Tool Metrics| Monitoring
    Triage -.->|Agent Metrics| Monitoring
    Correlation -.->|Agent Metrics| Monitoring
    Analysis -.->|Agent Metrics| Monitoring
    HumanLoop -.->|Agent Metrics| Monitoring
    Response -.->|Agent Metrics| Monitoring
    Learning -.->|Agent Metrics| Monitoring
    
    %% Feedback Loops
    HumanLoop -.->|Feedback| Learning
    Learning -.->|Improved Models| Analysis
    Learning -.->|Improved Models| Triage
    Learning -.->|Improved Models| Correlation
    
    %% Styling
    classDef agent fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    classDef decision fill:#fff3e0,stroke:#e65100,stroke-width:2px
    classDef state fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    classDef tools fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px
    classDef storage fill:#fff8e1,stroke:#ff8f00,stroke-width:2px
    classDef monitoring fill:#fce4ec,stroke:#880e4f,stroke-width:2px
    classDef terminal fill:#ffebee,stroke:#b71c1c,stroke-width:2px
    
    class Ingestion,Triage,Correlation,Analysis,HumanLoop,Response,Learning agent
    class IsFP,NeedsCorrelation,NeedsAnalysis,NeedsHuman,NeedsResponse decision
    class State state
    class Tools tools
    class Storage storage
    class Monitoring monitoring
    class Close terminal
Loading

Architecture Layers

graph TB
    subgraph "Core Framework"
        Core[Core Framework]
        State[State Management]
        Workflow[Workflow Engine]
        Config[Configuration]
    end
    
    subgraph "Agent Layer"
        IngestionMod[Ingestion Module]
        TriageMod[Triage Module]
        AnalysisMod[Analysis Module]
        HumanMod[Human Loop Module]
        ResponseMod[Response Module]
        LearningMod[Learning Module]
    end
    
    subgraph "Tool Layer"
        ToolOrchestrator[Tool Orchestrator]
        SIEMTools[SIEM Tools]
        IntelTools[Intel Tools]
        SandboxTools[Sandbox Tools]
        EDRTools[EDR Tools]
    end
    
    subgraph "Storage Layer"
        StateStorage[State Storage]
        CacheLayer[Cache Layer]
        VectorDB[Vector DB]
        AuditLogs[Audit Logs]
    end
    
    subgraph "Monitoring Layer"
        Metrics[Metrics Collection]
        Tracing[Distributed Tracing]
        Logging[Structured Logging]
        Alerting[Alerting System]
    end
    
    Core --> State
    Core --> Workflow
    Core --> Config
    
    Workflow --> IngestionMod
    Workflow --> TriageMod
    Workflow --> AnalysisMod
    Workflow --> HumanMod
    Workflow --> ResponseMod
    Workflow --> LearningMod
    
    IngestionMod --> ToolOrchestrator
    TriageMod --> ToolOrchestrator
    AnalysisMod --> ToolOrchestrator
    HumanMod --> ToolOrchestrator
    ResponseMod --> ToolOrchestrator
    LearningMod --> ToolOrchestrator
    
    ToolOrchestrator --> SIEMTools
    ToolOrchestrator --> IntelTools
    ToolOrchestrator --> SandboxTools
    ToolOrchestrator --> EDRTools
    
    State --> StateStorage
    ToolOrchestrator --> CacheLayer
    LearningMod --> VectorDB
    State --> AuditLogs
    
    IngestionMod --> Metrics
    TriageMod --> Metrics
    AnalysisMod --> Metrics
    HumanMod --> Metrics
    ResponseMod --> Metrics
    LearningMod --> Metrics
    
    Metrics --> Tracing
    Metrics --> Logging
    Metrics --> Alerting
Loading

1. Alert Ingestion

graph LR
    A[Alert Sources] --> B[Ingestion Agent]
    B --> C[State Initialization]
    C --> D[Deduplication]
    D --> E[Rate Limiting]
    E --> F[Triage Queue]
Loading

2. Intelligent Triage

graph TD
    A[Alert Input] --> B{Rule Engine}
    B --> C[ML Scoring]
    C --> D{Confidence Score}
    D -->|>80%| E[Auto-Close FP]
    D -->|40-80%| F[Correlation Queue]
    D -->|<40%| G[Analysis Queue]
    
    E --> H[Learning Feedback]
    F --> I[Correlation Agent]
    G --> J[Analysis Agent]
Loading

3. Contextual Correlation

graph LR
    A[Alert] --> B[Correlation Agent]
    B --> C[Async Queries]
    C --> D[Threat Intel APIs]
    C --> E[SIEM Historical Data]
    C --> F[Asset Information]
    
    D --> G[Enrichment Results]
    E --> G
    F --> G
    
    G --> H[Temporal Analysis]
    H --> I[Entity Resolution]
    I --> J[Updated State]
Loading

4. Deep Analysis (ReAct Loop)

graph TD
    A[Analysis Agent] --> B[Reasoning]
    B --> C{Need More Data?}
    C -->|Yes| D[Action: Query Tools]
    D --> E[Tool Execution]
    E --> F[Observation]
    F --> B
    
    C -->|No| G[Conclusion]
    G --> H{Confidence Level}
    H -->|High| I[Auto Response]
    H -->|Medium| J[Human Review]
    H -->|Low| K[Escalate to Senior]
    
    I --> L[Response Agent]
    J --> M[Human-in-Loop]
    K --> M
Loading

5. Human Escalation

graph TD
    A[Human Review Required] --> B{Risk Level}
    B -->|Critical| C[Immediate Escalation]
    B -->|High| D[Senior Analyst Queue]
    B -->|Medium| E[Standard Review Queue]
    
    C --> F[Manager/CISO Alert]
    D --> G[Senior Analyst]
    E --> H[Analyst]
    
    F --> I[Structured Feedback]
    G --> I
    H --> I
    
    I --> J{Decision}
    J -->|Approve| K[Response Agent]
    J -->|Deny| L[Close Alert]
    J -->|Need More Info| M[Back to Analysis]
Loading

6. Automated Response

graph TD
    A[Response Triggered] --> B[Playbook Selection]
    B --> C{Approval Required?}
    C -->|Yes| D[Approval Workflow]
    C -->|No| E[Execute Actions]
    
    D --> F{Approved?}
    F -->|Yes| E
    F -->|No| G[Log Decision & Close]
    
    E --> H[Block IP/Domain]
    E --> I[Quarantine File]
    E --> J[Disable Account]
    E --> K[Network Isolation]
    
    H --> L[Action Audit]
    I --> L
    J --> L
    K --> L
    
    L --> M{Success?}
    M -->|Yes| N[Update State]
    M -->|No| O[Rollback & Alert]
    
    N --> P[Learning Agent]
    O --> Q[Human Intervention]
Loading

7. Continuous Learning

graph TD
    A[Learning Agent] --> B[Collect Feedback]
    B --> C[Human Feedback]
    B --> D[Agent Performance]
    B --> E[Response Outcomes]
    
    C --> F[Model Training Data]
    D --> F
    E --> F
    
    F --> G[Model Versioning]
    G --> H{A/B Testing}
    H -->|Champion| I[Deploy New Model]
    H -->|Challenger| J[Performance Analysis]
    
    J --> K{Better Performance?}
    K -->|Yes| I
    K -->|No| L[Keep Current Model]
    
    I --> M[Update Agent Configs]
    L --> N[Log Results]
    
    M --> O[Performance Monitoring]
    N --> O
    
    O --> P[Metrics Dashboard]
Loading

๐Ÿ“Š Monitoring & Observability

Comprehensive Telemetry

  • Metrics Collection: Agent performance, tool latency, accuracy rates
  • Distributed Tracing: End-to-end workflow visibility
  • Structured Logging: Searchable audit trails and debugging
  • Real-time Dashboards: Operations center visibility

Key Performance Indicators

  • Mean Time to Detection (MTTD)
  • Mean Time to Response (MTTR)
  • False Positive Rate
  • Agent Accuracy Scores
  • Human Escalation Rate
  • Tool Utilization Metrics

๐Ÿ› ๏ธ Technology Stack

Core Technologies

  • LangGraph: Multi-agent workflow orchestration
  • LangChain: LLM integration and tool connectivity
  • PostgreSQL: Primary state and audit storage
  • Redis: Caching and session management
  • Vector Database: Similarity search and embeddings

AI/ML Components

  • Large Language Models: GPT-4, Claude for reasoning
  • Custom ML Models: Specialized threat detection
  • Embedding Models: Semantic similarity analysis
  • Classification Models: Alert categorization

Security Tool Integrations

  • SIEM Platforms: Splunk, IBM QRadar, Microsoft Sentinel
  • Threat Intelligence: VirusTotal, MISP, commercial feeds
  • Sandbox Solutions: Cuckoo, Joe Sandbox, Falcon Sandbox
  • EDR/XDR: CrowdStrike, SentinelOne, Microsoft Defender

๐Ÿšฆ Getting Started

Prerequisites

Python 3.11+
PostgreSQL 14+
Redis 7+
Docker & Docker Compose
Security tool API credentials

Quick Setup

# Clone repository
git clone https://github.com/your-org/soc-langgraph-poc
cd soc-langgraph-poc

# Install dependencies
pip install -r requirements.txt

# Configure environment
cp .env.example .env
# Edit .env with your API keys and database URLs

# Initialize database
python scripts/init_db.py

# Start services
docker-compose up -d

# Launch SOC workflow
python main.py

Configuration

Key configuration areas:

  • Agent Policies: Confidence thresholds and routing logic
  • Tool Credentials: API keys and connection strings
  • Workflow Rules: Business logic and escalation procedures
  • Learning Settings: Model update frequencies and training data

๐Ÿ“ˆ Use Cases & Benefits

Automated Threat Detection

  • 24/7 Operations: Continuous alert processing without human fatigue
  • Consistent Analysis: Standardized investigation procedures
  • Rapid Response: Sub-minute detection-to-containment cycles

Analyst Augmentation

  • Decision Support: AI-powered recommendations with explanations
  • Workload Optimization: Focus on high-value investigative work
  • Knowledge Scaling: Junior analysts with senior-level insights

Operational Excellence

  • Reduced False Positives: Intelligent filtering and correlation
  • Improved MTTR: Faster incident response through automation
  • Audit Compliance: Complete workflow documentation and traceability

๐Ÿ”ฎ Roadmap & Future Enhancements

Planned Features

  • Multi-Tenant Architecture: Support for multiple customer environments
  • Advanced ML Models: Custom threat detection model training
  • Integration Marketplace: Plug-and-play security tool connectors
  • Mobile Interface: Analyst mobile app for on-the-go response

Research Areas

  • Federated Learning: Cross-organization threat intelligence sharing
  • Explainable AI: Enhanced transparency in AI decision-making
  • Attack Graph Analysis: Multi-stage attack detection and visualization

๐Ÿค Contributing

We welcome contributions from the security and AI communities:

  1. Issue Reporting: Bug reports and feature requests
  2. Code Contributions: Pull requests with improvements
  3. Tool Integrations: New security tool connectors
  4. Documentation: Enhanced guides and examples

๐Ÿ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • LangGraph Team: For the excellent multi-agent framework
  • Security Community: For threat intelligence and best practices
  • Open Source Contributors: For the foundational tools and libraries

Built with โค๏ธ for the cybersecurity community

Empowering security teams with intelligent automation while keeping humans in control of critical decisions.

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •