
LogSentinelAI Wiki

Welcome to the LogSentinelAI Wiki! This comprehensive guide covers everything you need to know about using LogSentinelAI for intelligent log analysis.

📚 Table of Contents

  • Core Concepts
  • User Guides
  • Advanced Usage
  • Reference
  • Development


Declarative Extraction: Schema-Driven AI Log Structuring

LogSentinelAI's core feature is Declarative Extraction. In each analyzer, you simply declare the result structure (Pydantic class) you want, and the LLM automatically analyzes logs and returns results in that structure as JSON. No complex parsing or post-processing—just declare the fields you want, and the AI fills them in.

Basic Usage

  1. In your analyzer script, declare the result structure (Pydantic class) you want to receive.
  2. When you run the analysis command, the LLM automatically generates JSON matching that structure.

Example: Customizing HTTP Access Log Analyzer

from pydantic import BaseModel

class MyAccessLogResult(BaseModel):
    ip: str
    url: str
    is_attack: bool

Just define the fields you want, and the LLM will generate results like:

{
  "ip": "192.168.0.1",
  "url": "/admin.php",
  "is_attack": true
}
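
Because the declared structure is an ordinary Pydantic model, the LLM's JSON can be validated directly against it. A minimal sketch, assuming Pydantic v2 (the exact plumbing inside LogSentinelAI may differ):

from pydantic import BaseModel, ValidationError

class MyAccessLogResult(BaseModel):
    ip: str
    url: str
    is_attack: bool

# Example payload, as returned by the LLM above
llm_json = '{"ip": "192.168.0.1", "url": "/admin.php", "is_attack": true}'

try:
    result = MyAccessLogResult.model_validate_json(llm_json)
    print(result.ip, result.url, result.is_attack)
except ValidationError as e:
    # Malformed or incomplete LLM output is caught here instead of downstream
    print("LLM output did not match the declared schema:", e)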

Example: Customizing Apache Error Log Analyzer

from pydantic import BaseModel

class MyApacheErrorResult(BaseModel):
    log_level: str
    event_message: str
    is_critical: bool

Example: Customizing Linux System Log Analyzer

from pydantic import BaseModel

class MyLinuxLogResult(BaseModel):
    event_type: str
    user: str
    is_anomaly: bool

By declaring only the result structure you want in each analyzer, the LLM automatically returns results in that structure—no manual parsing required.


Analyzing Different Log Types

Apache/Nginx Access Logs

# Basic analysis
logsentinelai-httpd-access /var/log/apache2/access.log

# With Elasticsearch output
logsentinelai-httpd-access /var/log/nginx/access.log --output elasticsearch

# Real-time monitoring
logsentinelai-httpd-access /var/log/apache2/access.log --monitor

What it detects:

  • SQL injection attempts
  • XSS attacks
  • Brute force attacks
  • Suspicious user agents
  • Unusual request patterns
  • Geographic anomalies

Apache Error Logs

logsentinelai-httpd-server /var/log/apache2/error.log

What it detects:

  • Configuration errors
  • Module failures
  • Security-related errors
  • Performance issues

Linux System Logs

logsentinelai-linux-system /var/log/syslog

What it detects:

  • Authentication failures
  • Service crashes
  • Security events
  • System anomalies

LLM Provider Setup

OpenAI Setup Guide

  1. Get an API key from https://platform.openai.com/

  2. Configure LogSentinelAI

    [llm]
    provider = "openai"
    model = "gpt-4o-mini"
    api_key = "sk-your-key-here"
  3. Test Configuration

    logsentinelai-httpd-access sample-logs/access-100.log

Ollama Setup Guide

  1. Install Ollama

    curl -fsSL https://ollama.com/install.sh | sh
  2. Pull Model

    ollama pull llama3.1:8b
  3. Configure LogSentinelAI

    [llm]
    provider = "ollama"
    model = "llama3.1:8b"
    base_url = "http://localhost:11434"

Model Recommendations

| Use Case      | OpenAI        | Ollama       | Performance |
|---------------|---------------|--------------|-------------|
| High Accuracy | gpt-4o        | llama3.1:70b | Excellent   |
| Balanced      | gpt-4o-mini   | llama3.1:8b  | Good        |
| Fast/Local    | gpt-3.5-turbo | mistral:7b   | Fast        |

Elasticsearch Integration

📋 Installation: See INSTALL.ko.md for complete Docker-ELK setup instructions.

Quick Usage After Installation

Once your Elasticsearch is running (via Docker-ELK or standalone), configure LogSentinelAI:

[elasticsearch]
enabled = true
host = "localhost"
port = 9200
index_prefix = "logsentinelai"
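
Behind the scenes, each analysis result is indexed as a JSON document. A minimal sketch of the equivalent call with the official elasticsearch Python client (host and document fields here are illustrative):

from datetime import datetime, timezone

from elasticsearch import Elasticsearch  # pip install elasticsearch

es = Elasticsearch("http://localhost:9200")

# One analysis result, shaped like the analyzer's declared model (illustrative)
doc = {
    "@timestamp": datetime.now(timezone.utc).isoformat(),
    "event_type": "suspicious_access",
    "severity": "high",
    "source_ips": ["192.168.1.100"],
}

# Daily index following the logsentinelai-security-YYYY.MM.DD convention
# described under Automatic Index Management below
index_name = f"logsentinelai-security-{datetime.now(timezone.utc):%Y.%m.%d}"
es.index(index=index_name, document=doc)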

Automatic Index Management

LogSentinelAI automatically creates optimized index templates:

  • Security Events: logsentinelai-security-YYYY.MM.DD
  • Raw Logs: logsentinelai-logs-YYYY.MM.DD
  • Metadata: logsentinelai-metadata-YYYY.MM.DD

Index Lifecycle Management (ILM)

Default retention policy automatically applied:

  • Hot Phase: 7 days (frequent searches)
  • Warm Phase: 30 days (occasional searches)
  • Cold Phase: 90 days (rare searches)
  • Delete: 365 days (automatic cleanup)
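
For reference, retention tiers like these map onto an ILM policy. A hedged sketch with the Python client (the ages only approximate the list above; the policy LogSentinelAI actually installs may use different actions and ages):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Rough approximation of the retention tiers described above
policy = {
    "phases": {
        "hot":    {"min_age": "0ms",  "actions": {}},
        "warm":   {"min_age": "7d",   "actions": {"set_priority": {"priority": 50}}},
        "cold":   {"min_age": "90d",  "actions": {"set_priority": {"priority": 0}}},
        "delete": {"min_age": "365d", "actions": {"delete": {}}},
    }
}
es.ilm.put_lifecycle(name="logsentinelai-retention", policy=policy)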

Usage Tips

Real-time Indexing:

# Stream analysis results directly to Elasticsearch
logsentinelai-httpd-access /var/log/apache2/access.log --output elasticsearch --mode realtime

Bulk Processing:

# Process multiple log files into Elasticsearch
logsentinelai-httpd-access /var/log/apache2/access.log.* --output elasticsearch

Index Monitoring:

# Check index status
curl "http://localhost:9200/_cat/indices/logsentinelai-*?v"

# View today's security events count
curl "http://localhost:9200/logsentinelai-security-$(date +%Y.%m.%d)/_count"

Kibana Dashboard Setup

📋 Installation: See INSTALL.ko.md for complete Kibana setup with Docker-ELK.

Quick Setup After Installation

  1. Import Pre-built Dashboard

    # Dashboard file is included in the repository
    # Note: --form sends multipart/form-data; do not set Content-Type: application/json
    curl -X POST "localhost:5601/api/saved_objects/_import" \
      -H "kbn-xsrf: true" \
      --form file=@Kibana-9.0.3-Dashboard-LogSentinelAI.ndjson
  2. Configure Index Patterns

    • Go to Kibana → Stack Management → Data Views (called Index Patterns before Kibana 8)
    • Create pattern: logsentinelai-*
    • Set time field: @timestamp

Dashboard Features

  • 🚨 Security Overview: Real-time threat detection with severity breakdown
  • 🌍 Geographic Analysis: Attack origin mapping with coordinates
  • 📈 Timeline Analysis: Event chronology and trend analysis
  • 👥 Top Attackers: Most active threat sources ranked
  • 🎯 Attack Types: Categorized threat analysis with drill-down

Usage Tips

Custom Time Ranges:

  • Use Kibana's time picker for specific analysis periods
  • Set up auto-refresh for real-time monitoring

Filtering and Searching:

# Example KQL queries for LogSentinelAI data
severity: "high" OR severity: "critical"
source_ips: 192.168.*    # wildcards must be unquoted to match as patterns
event_type: "sql_injection"

Dashboard Customization:

  • Clone existing dashboard for custom views
  • Add new visualizations based on your specific log patterns
  • Set up custom alerts based on threat patterns

Remote Log Analysis via SSH

⚠️ Important: For SSH connections, the target host must be added to your system's known_hosts file first. Run ssh-keyscan -H <hostname> >> ~/.ssh/known_hosts or manually connect once to accept the host key.

Configuration

[ssh]
enabled = true
host = "remote-server.com"
username = "loguser"
key_file = "~/.ssh/id_rsa"

Usage

# Analyze remote logs
logsentinelai-httpd-access \
  --ssh-host remote-server.com \
  --ssh-user loguser \
  --ssh-key ~/.ssh/id_rsa \
  /var/log/apache2/access.log

Security Best Practices

  • Use SSH keys, not passwords
  • Limit SSH user permissions
  • Use dedicated log analysis user
  • Consider SSH tunneling for security
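
Conceptually, remote analysis boils down to running a read command over SSH. A minimal sketch with paramiko (LogSentinelAI's internal implementation may differ; hostname and paths are illustrative):

import os

import paramiko  # pip install paramiko

client = paramiko.SSHClient()
# Honors ~/.ssh/known_hosts; the host must already be listed (see warning above)
client.load_system_host_keys()
client.connect(
    hostname="remote-server.com",
    username="loguser",
    key_filename=os.path.expanduser("~/.ssh/id_rsa"),
)

# Read the last 100 lines of the remote log and print them
stdin, stdout, stderr = client.exec_command("tail -n 100 /var/log/apache2/access.log")
for line in stdout:
    print(line.rstrip())

client.close()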

Real-time Monitoring

Real-time Mode Behavior

Real-time monitoring in LogSentinelAI works with new logs only:

  • Starts monitoring from the current end of the log file
  • Only processes newly added log entries after the monitoring starts
  • Past logs are never processed - this ensures true real-time behavior
  • If monitoring is stopped and restarted, it continues from the current file position (not from where it was previously stopped)
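
This behavior corresponds to a simple seek-to-end polling loop. A minimal sketch (illustrative only; the real implementation also handles sampling, buffering, and file rotation):

import os
import time

POLLING_INTERVAL = 5  # seconds, cf. REALTIME_POLLING_INTERVAL

def follow(path):
    """Yield only lines appended after monitoring starts."""
    with open(path, "r") as f:
        f.seek(0, os.SEEK_END)  # start at end of file: past logs are never processed
        while True:
            line = f.readline()
            if line:
                yield line.rstrip("\n")
            else:
                time.sleep(POLLING_INTERVAL)  # wait for new data

for entry in follow("/var/log/apache2/access.log"):
    print("new log line:", entry)  # hand each new entry to the analyzer here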

Monitor Mode

# Monitor Apache logs in real-time
logsentinelai-httpd-access --mode realtime

# With custom sampling threshold
logsentinelai-httpd-access --mode realtime --sampling-threshold 200

Monitoring Features

  • Live Analysis: Process logs as they're written
  • Sampling: Reduce load on high-traffic systems
  • Real-time Alerts: Immediate threat detection
  • Continuous Indexing: Stream to Elasticsearch

Custom Prompts

Modifying Prompts

Edit src/logsentinelai/core/prompts.py:

HTTPD_ACCESS_PROMPT = """
Analyze this Apache/Nginx access log for security threats:

Focus on:
1. SQL injection patterns
2. XSS attempts
3. Your custom criteria here

Log entry: {log_entry}
"""

Language Support

Change analysis language in config:

[analysis]
language = "korean"  # korean, japanese, spanish, etc.

Performance Optimization

Batch Processing

# Process multiple files
logsentinelai-httpd-access /var/log/apache2/access.log.* --batch

# Parallel processing
logsentinelai-httpd-access /var/log/*.log --parallel 4

Memory Optimization

[analysis]
batch_size = 100  # Process 100 entries at once
max_tokens = 2000  # Reduce token limit

LLM Optimization

  • Use smaller models for high-volume analysis
  • Enable sampling for real-time monitoring
  • Cache results for repeated patterns
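
Result caching can be as simple as keying on a hash of the log chunk, so identical chunks skip the LLM call entirely. A hedged sketch (not a shipped feature; analyze_with_llm is a hypothetical stand-in):

import hashlib

_cache = {}  # chunk hash -> analysis result

def analyze_with_llm(chunk):
    # Placeholder standing in for the structured LLM request
    return '{"severity": "low"}'

def analyze_cached(chunk):
    """Skip the LLM call entirely when an identical chunk was seen before."""
    key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = analyze_with_llm(chunk)
    return _cache[key]

print(analyze_cached("192.168.0.1 GET /admin.php"))  # computed via the LLM
print(analyze_cached("192.168.0.1 GET /admin.php"))  # served from cache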

CLI Commands Reference

Core Commands

logsentinelai-httpd-access

logsentinelai-httpd-access [OPTIONS] LOG_FILE

Options:
  --output [json|elasticsearch|stdout]  Output format
  --monitor                            Real-time monitoring
  --sample-rate INTEGER               Sampling rate for monitoring
  --ssh-host TEXT                     SSH hostname
  --ssh-user TEXT                     SSH username
  --ssh-key TEXT                      SSH key file path
  --help                              Show help message

logsentinelai-httpd-server

logsentinelai-httpd-server [OPTIONS] LOG_FILE
# Similar options to httpd-access

logsentinelai-linux-system

logsentinelai-linux-system [OPTIONS] LOG_FILE
# Similar options to httpd-access

Utility Commands

logsentinelai-geoip-download

logsentinelai-geoip-download [OPTIONS]

Options:
  --force    Force re-download even if database exists
  --help     Show help message

Global Options

All commands support:

  • --config PATH: Custom configuration file
  • --verbose: Enable verbose logging
  • --quiet: Suppress output except errors

Configuration Options

Complete Configuration Reference

LogSentinelAI uses environment variables for configuration. Copy config.template to config and customize:

# Copy configuration template
cp config.template config
# Edit configuration
nano config

Key Configuration Sections:

LLM Provider Configuration:

# Provider Selection
LLM_PROVIDER=openai          # openai, ollama, vllm, gemini

# Model Selection (per provider)
LLM_MODEL_OPENAI=gpt-4o-mini
LLM_MODEL_OLLAMA=qwen2.5-coder:3b
LLM_MODEL_GEMINI=gemini-1.5-pro

# API Configuration
OPENAI_API_KEY=sk-your-key-here
LLM_API_HOST_OPENAI=https://api.openai.com/v1
LLM_API_HOST_OLLAMA=http://127.0.0.1:11434

# Generation Parameters
LLM_TEMPERATURE=0.1          # Consistency for log analysis
LLM_TOP_P=0.3               # Focus on high-probability tokens

Analysis Configuration:

# Language and Mode
RESPONSE_LANGUAGE=english    # korean, japanese, etc.
ANALYSIS_MODE=batch         # batch or realtime

# Chunk Sizes (entries per LLM request)
CHUNK_SIZE_HTTPD_ACCESS=10
CHUNK_SIZE_LINUX_SYSTEM=10
CHUNK_SIZE_GENERAL_LOG=10

# Default Log Paths
LOG_PATH_HTTPD_ACCESS=sample-logs/access-10k.log
LOG_PATH_LINUX_SYSTEM=sample-logs/linux-2k.log
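
Each chunk becomes one LLM request, so chunk size trades request count against per-request context. A minimal sketch of the batching logic (illustrative, not the project's internal code):

from itertools import islice

CHUNK_SIZE = 10  # cf. CHUNK_SIZE_HTTPD_ACCESS

def chunks(path, size=CHUNK_SIZE):
    """Group log lines into fixed-size batches, one batch per LLM request."""
    with open(path) as f:
        while True:
            batch = [line.rstrip("\n") for line in islice(f, size)]
            if not batch:
                return
            yield batch

for batch in chunks("sample-logs/access-10k.log"):
    pass  # each batch becomes a single structured-analysis request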

Real-time Monitoring:

# Polling Configuration
REALTIME_POLLING_INTERVAL=5      # Check interval (seconds)
REALTIME_MAX_LINES_PER_BATCH=50  # Max lines per poll
REALTIME_BUFFER_TIME=2           # Wait for complete lines

# Sampling Control
REALTIME_SAMPLING_THRESHOLD=100  # Auto-sampling trigger

GeoIP Configuration:

GEOIP_ENABLED=true
GEOIP_DATABASE_PATH=~/.logsentinelai/GeoLite2-City.mmdb
GEOIP_INCLUDE_PRIVATE_IPS=false
GEOIP_CACHE_SIZE=1000
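
These settings drive lookups against the MaxMind City database. A minimal sketch with the geoip2 library, using lru_cache to stand in for GEOIP_CACHE_SIZE (the project's own wrapper lives in logsentinelai.core.geoip and may differ):

import os
from functools import lru_cache

import geoip2.database  # pip install geoip2

reader = geoip2.database.Reader(
    os.path.expanduser("~/.logsentinelai/GeoLite2-City.mmdb")  # GEOIP_DATABASE_PATH
)

@lru_cache(maxsize=1000)  # cf. GEOIP_CACHE_SIZE
def lookup(ip):
    """Resolve an IP to country/city/coordinates (City DB includes lat/lon)."""
    city = reader.city(ip)
    return {
        "country_code": city.country.iso_code,
        "city": city.city.name,
        "latitude": city.location.latitude,
        "longitude": city.location.longitude,
    }

print(lookup("8.8.8.8"))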

Elasticsearch Configuration:

ELASTICSEARCH_HOST=http://localhost:9200
ELASTICSEARCH_USER=elastic
ELASTICSEARCH_PASSWORD=changeme
ELASTICSEARCH_INDEX=logsentinelai-analysis

SSH Remote Access:

REMOTE_LOG_MODE=local       # local or ssh
REMOTE_SSH_HOST=server.com
REMOTE_SSH_USER=loguser
REMOTE_SSH_KEY_PATH=~/.ssh/id_rsa
REMOTE_SSH_TIMEOUT=10

📋 Full Reference: See config.template in the repository for all available options with detailed comments.


Output Format

🚀 Key Advantage: LogSentinelAI uses Declarative Extraction - you simply declare the output structure you want (Pydantic models), and the LLM automatically extracts relevant information from logs to match that structure. No manual parsing or field mapping required!

How Declarative Extraction Works

Traditional Log Analysis:

# Manual parsing, regex patterns, field mapping
grep "ERROR" /var/log/app.log | awk '{print $1, $3}' | sed 's/[^a-zA-Z0-9]//g'

LogSentinelAI Approach:

# Just declare what you want
from pydantic import BaseModel

class SecurityEvent(BaseModel):
    severity: str
    threat_type: str
    source_ip: str
    confidence_score: float
    description: str

The LLM automatically fills these fields from any log format - no parsing rules needed!

Example Output Structures

Each analyzer produces structured output based on its declared Pydantic model:

HTTP Access Log Analysis:

{
  "events": [
    {
      "event_type": "suspicious_access",
      "severity": "high", 
      "source_ips": ["192.168.1.100"],
      "url_pattern": "/admin.php",
      "attack_patterns": ["sql_injection"],
      "confidence_score": 0.85,
      "description": "SQL injection attempt detected",
      "recommended_actions": ["Block IP", "Review security rules"]
    }
  ],
  "statistics": {
    "total_requests": 1500,
    "unique_ips": 45,
    "error_rate": 0.12
  }
}

Linux System Log Analysis:

{
  "events": [
    {
      "event_type": "auth_failure", 
      "severity": "medium",
      "username": "admin",
      "source_ips": ["10.0.0.5"],
      "process": "sshd",
      "confidence_score": 0.9,
      "description": "Multiple authentication failures detected"
    }
  ],
  "statistics": {
    "total_events": 25,
    "auth_failures": 8,
    "unique_users": 3
  }
}

JSON Output Structure

{
  "timestamp": "2024-01-15T10:30:45Z",
  "log_type": "httpd_access",
  "original_log": "192.168.1.100 - - [15/Jan/2024:10:30:45 +0000] \"GET /admin.php HTTP/1.1\" 200 1234",
  "analysis": {
    "threat_detected": true,
    "threat_type": "suspicious_access",
    "severity": "medium",
    "confidence": 0.85,
    "description": "Access to admin interface from unusual IP",
    "recommendations": [
      "Monitor this IP for further suspicious activity",
      "Consider implementing IP-based access controls"
    ]
  },
  "parsed_fields": {
    "ip_address": "192.168.1.100",
    "timestamp": "15/Jan/2024:10:30:45 +0000",
    "method": "GET",
    "path": "/admin.php",
    "status_code": 200,
    "response_size": 1234
  },
  "enrichment": {
    "geoip": {
      "ip": "192.168.1.100",
      "country_code": "US",
      "country_name": "United States",
      "city": "New York",
      "latitude": 40.7128,
      "longitude": -74.0060
    },
    "reputation": {
      "is_known_bad": false,
      "threat_score": 0.3
    }
  },
  "metadata": {
    "analyzer_version": "0.2.3",
    "model_used": "gpt-4o-mini",
    "processing_time": 1.2
  }
}

Customizing Output Fields

The Power of Declaration: Want different fields? Just declare them in your analyzer's Pydantic model:

# Custom Security Event Structure
from typing import List

from pydantic import BaseModel

class MySecurityEvent(BaseModel):
    timestamp: str
    risk_level: int  # 1-10 scale
    attack_vector: str
    affected_service: str
    remediation_steps: List[str]
    business_impact: str

The LLM will automatically extract and populate these fields from your logs, regardless of the original log format!

Security Event Fields

| Field           | Type    | Description                                            | Auto-Extracted            |
|-----------------|---------|--------------------------------------------------------|---------------------------|
| threat_detected | boolean | Whether a threat was detected                          | ✅ From log patterns       |
| threat_type     | string  | Type of threat (sql_injection, xss, brute_force, etc.) | ✅ From attack signatures  |
| severity        | string  | Severity level (low, medium, high, critical)           | ✅ From impact analysis    |
| confidence      | float   | Confidence score (0.0-1.0)                             | ✅ From pattern matching   |
| description     | string  | Human-readable description                             | ✅ From log context        |
| recommendations | array   | Recommended actions                                    | ✅ From threat intelligence |

✨ Key Insight: All these fields are automatically extracted by the LLM based on your declared structure. Change the structure, get different data - no code changes needed!


Troubleshooting

Common Issues

1. "LLM API Error"

Problem: API calls to LLM provider failing

Solutions:

  • Check API key validity
  • Verify network connectivity
  • Check provider status page
  • Increase timeout in config

# Test connectivity
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models

2. "GeoIP Database Not Found"

Problem: GeoIP lookups failing

Solutions:

# Re-download database (City database includes coordinates)
logsentinelai-geoip-download

# Check database location and verify it's the City database
ls -la ~/.logsentinelai/GeoLite2-City.mmdb

# Test GeoIP functionality
python -c "from logsentinelai.core.geoip import get_geoip_lookup; g=get_geoip_lookup(); print(g.lookup_geoip('8.8.8.8'))"

3. "Elasticsearch Connection Failed"

Problem: Cannot connect to Elasticsearch

Solutions:

  • Check Elasticsearch status: curl http://localhost:9200
  • Verify configuration in config file
  • Check network connectivity

4. "Permission Denied on Log Files"

Problem: Cannot read log files

Solutions:

# Add user to log group
sudo usermod -a -G adm $USER

# Change log file permissions
sudo chmod 644 /var/log/apache2/access.log

Performance Issues

High Memory Usage

  • Reduce batch_size in config
  • Use smaller LLM models
  • Enable sampling for large files

Slow Processing

  • Use local LLM (Ollama) instead of API
  • Reduce max_tokens
  • Enable parallel processing

Contributing

Adding New Analyzers

  1. Create analyzer file: src/logsentinelai/analyzers/your_analyzer.py
  2. Define Pydantic models for structured output
  3. Create LLM prompts in src/logsentinelai/core/prompts.py
  4. Add CLI entry point in pyproject.toml
  5. Add tests in tests/
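
A hypothetical skeleton for steps 1-2 (all names here are illustrative; study the existing analyzers for the project's actual wiring):

# src/logsentinelai/analyzers/your_analyzer.py (hypothetical skeleton)
from typing import List

from pydantic import BaseModel

class YourLogEvent(BaseModel):
    """Declared output structure: the LLM fills these fields."""
    event_type: str
    severity: str
    description: str
    recommended_actions: List[str]

class YourLogAnalysisResult(BaseModel):
    events: List[YourLogEvent]

def main():
    # Entry point referenced from pyproject.toml (step 4); wire the declared
    # models and your prompt into the shared analysis pipeline here
    ...

if __name__ == "__main__":
    main()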

Submitting Changes

  1. Fork the repository
  2. Create feature branch
  3. Make changes following style guide
  4. Add tests
  5. Submit pull request

Data Flow

  1. Input: Log files (local/remote)
  2. Parsing: Extract structured data
  3. Analysis: LLM-powered threat detection
  4. Enrichment: GeoIP, reputation data
  5. Output: JSON, Elasticsearch, stdout
  6. Visualization: Kibana dashboards
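
Expressed as code, the flow is a straightforward pipeline. A conceptual sketch only; every function here is an illustrative stub, not the project's API:

def read_chunks(path):
    # 1. Input: read the log (locally here; could also stream over SSH)
    with open(path) as f:
        yield [line.rstrip("\n") for line in f]

def parse(chunk):
    # 2. Parsing: extract structured data from raw lines
    return [{"raw": line} for line in chunk]

def llm_analyze(entries):
    # 3. Analysis: placeholder for the structured LLM request
    return {"events": [], "entries": entries}

def enrich_geoip(result):
    # 4. Enrichment: add GeoIP / reputation data per source IP
    return result

def emit(result):
    # 5. Output: JSON to stdout here; could index into Elasticsearch instead
    print(result)

def analyze_log_file(path):
    for chunk in read_chunks(path):
        emit(enrich_geoip(llm_analyze(parse(chunk))))
    # 6. Visualization happens downstream in Kibana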

This wiki provides comprehensive documentation for LogSentinelAI. For specific questions or issues, please open an issue on the project's GitHub repository.

Happy Log Analyzing! 🚀
