Home
Welcome to the LogSentinelAI Wiki! This comprehensive guide covers everything you need to know about using LogSentinelAI for intelligent log analysis.
LogSentinelAI's core feature is Declarative Extraction. In each analyzer, you simply declare the result structure (Pydantic class) you want, and the LLM automatically analyzes logs and returns results in that structure as JSON. No complex parsing or post-processing—just declare the fields you want, and the AI fills them in.
- In your analyzer script, declare the result structure (Pydantic class) you want to receive.
- When you run the analysis command, the LLM automatically generates JSON matching that structure.
```python
from pydantic import BaseModel

class MyAccessLogResult(BaseModel):
    ip: str
    url: str
    is_attack: bool
```
Just define the fields you want, and the LLM will generate results like:
```json
{
  "ip": "192.168.0.1",
  "url": "/admin.php",
  "is_attack": true
}
```
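Because the returned JSON matches the declared model, it can be validated straight back into the class. A minimal sketch, assuming Pydantic v2 (`model_validate_json`):

```python
from pydantic import BaseModel

class MyAccessLogResult(BaseModel):
    ip: str
    url: str
    is_attack: bool

# Validate the LLM's JSON output against the declared structure.
raw = '{"ip": "192.168.0.1", "url": "/admin.php", "is_attack": true}'
result = MyAccessLogResult.model_validate_json(raw)
if result.is_attack:
    print(f"Possible attack from {result.ip} targeting {result.url}")
```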
```python
from pydantic import BaseModel

class MyApacheErrorResult(BaseModel):
    log_level: str
    event_message: str
    is_critical: bool
```
```python
from pydantic import BaseModel

class MyLinuxLogResult(BaseModel):
    event_type: str
    user: str
    is_anomaly: bool
```
By declaring only the result structure you want in each analyzer, the LLM automatically returns results in that structure—no manual parsing required.
```bash
# Basic analysis
logsentinelai-httpd-access /var/log/apache2/access.log

# With Elasticsearch output
logsentinelai-httpd-access /var/log/nginx/access.log --output elasticsearch

# Real-time monitoring
logsentinelai-httpd-access /var/log/apache2/access.log --monitor
```
What it detects:
- SQL injection attempts
- XSS attacks
- Brute force attacks
- Suspicious user agents
- Unusual request patterns
- Geographic anomalies
```bash
logsentinelai-httpd-server /var/log/apache2/error.log
```
What it detects:
- Configuration errors
- Module failures
- Security-related errors
- Performance issues
```bash
logsentinelai-linux-system /var/log/syslog
```
What it detects:
- Authentication failures
- Service crashes
- Security events
- System anomalies
1. Get API Key
   - Visit https://platform.openai.com/api-keys
   - Create new API key
   - Copy the key
2. Configure LogSentinelAI
   ```toml
   [llm]
   provider = "openai"
   model = "gpt-4o-mini"
   api_key = "sk-your-key-here"
   ```
3. Test Configuration
   ```bash
   logsentinelai-httpd-access sample-logs/access-100.log
   ```
1. Install Ollama
   ```bash
   curl -fsSL https://ollama.com/install.sh | sh
   ```
2. Pull Model
   ```bash
   ollama pull llama3.1:8b
   ```
3. Configure LogSentinelAI
   ```toml
   [llm]
   provider = "ollama"
   model = "llama3.1:8b"
   base_url = "http://localhost:11434"
   ```
| Use Case | OpenAI | Ollama | Performance |
|---|---|---|---|
| High Accuracy | gpt-4o | llama3.1:70b | Excellent |
| Balanced | gpt-4o-mini | llama3.1:8b | Good |
| Fast/Local | gpt-3.5-turbo | mistral:7b | Fast |
📋 Installation: See INSTALL.ko.md for complete Docker-ELK setup instructions.
Once your Elasticsearch is running (via Docker-ELK or standalone), configure LogSentinelAI:
```toml
[elasticsearch]
enabled = true
host = "localhost"
port = 9200
index_prefix = "logsentinelai"
```
LogSentinelAI automatically creates optimized index templates:
- Security Events: `logsentinelai-security-YYYY.MM.DD`
- Raw Logs: `logsentinelai-logs-YYYY.MM.DD`
- Metadata: `logsentinelai-metadata-YYYY.MM.DD`
Default retention policy automatically applied:
- Hot Phase: 7 days (frequent searches)
- Warm Phase: 30 days (occasional searches)
- Cold Phase: 90 days (rare searches)
- Delete: 365 days (automatic cleanup)
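These phases map onto a standard Elasticsearch ILM policy. A hedged sketch of registering a comparable policy with the elasticsearch-py 8.x client — the policy name and exact age thresholds here are illustrative, not LogSentinelAI's actual defaults:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Illustrative ILM policy mirroring the hot/warm/cold/delete phases above;
# tune the min_age thresholds to match the retention windows listed.
es.ilm.put_lifecycle(
    name="logsentinelai-retention",  # hypothetical policy name
    policy={
        "phases": {
            "hot": {"min_age": "0ms", "actions": {"set_priority": {"priority": 100}}},
            "warm": {"min_age": "7d", "actions": {"set_priority": {"priority": 50}}},
            "cold": {"min_age": "30d", "actions": {"set_priority": {"priority": 0}}},
            "delete": {"min_age": "365d", "actions": {"delete": {}}},
        }
    },
)
```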
Real-time Indexing:
```bash
# Stream analysis results directly to Elasticsearch
logsentinelai-httpd-access /var/log/apache2/access.log --output elasticsearch --mode realtime
```
Bulk Processing:
```bash
# Process multiple log files into Elasticsearch
logsentinelai-httpd-access /var/log/apache2/access.log.* --output elasticsearch
```
Index Monitoring:
```bash
# Check index status
curl "http://localhost:9200/_cat/indices/logsentinelai-*?v"

# View today's security events count
curl "http://localhost:9200/logsentinelai-security-$(date +%Y.%m.%d)/_count"
```
📋 Installation: See INSTALL.ko.md for complete Kibana setup with Docker-ELK.
1. Import Pre-built Dashboard
   ```bash
   # Dashboard file is included in the repository; the import endpoint
   # expects multipart/form-data, which curl sets automatically with --form
   curl -X POST "localhost:5601/api/saved_objects/_import" \
     -H "kbn-xsrf: true" \
     --form file=@Kibana-9.0.3-Dashboard-LogSentinelAI.ndjson
   ```
2. Configure Index Patterns
   - Go to Kibana → Stack Management → Index Patterns
   - Create pattern: `logsentinelai-*`
   - Set time field: `@timestamp`
- 🚨 Security Overview: Real-time threat detection with severity breakdown
- 🌍 Geographic Analysis: Attack origin mapping with coordinates
- 📈 Timeline Analysis: Event chronology and trend analysis
- 👥 Top Attackers: Most active threat sources ranked
- 🎯 Attack Types: Categorized threat analysis with drill-down
Custom Time Ranges:
- Use Kibana's time picker for specific analysis periods
- Set up auto-refresh for real-time monitoring
Filtering and Searching:
```
# Example KQL queries for LogSentinelAI data
severity: "high" OR severity: "critical"
source_ips: "192.168.*"
event_type: "sql_injection"
```
Dashboard Customization:
- Clone existing dashboard for custom views
- Add new visualizations based on your specific log patterns
- Set up custom alerts based on threat patterns
⚠️ Important: For SSH connections, the target host must first be added to your system's known_hosts file. Run `ssh-keyscan -H <hostname> >> ~/.ssh/known_hosts` or connect manually once to accept the host key.
```toml
[ssh]
enabled = true
host = "remote-server.com"
username = "loguser"
key_file = "~/.ssh/id_rsa"
```
```bash
# Analyze remote logs
logsentinelai-httpd-access \
  --ssh-host remote-server.com \
  --ssh-user loguser \
  --ssh-key ~/.ssh/id_rsa \
  /var/log/apache2/access.log
```
- Use SSH keys, not passwords
- Limit SSH user permissions
- Use dedicated log analysis user
- Consider SSH tunneling for security
Real-time monitoring in LogSentinelAI works with new logs only:
- Starts monitoring from the current end of the log file
- Only processes newly added log entries after the monitoring starts
- Past logs are never processed - this ensures true real-time behavior
- If monitoring is stopped and restarted, it continues from the current file position (not from where it was previously stopped)
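Conceptually this is the classic "seek to the end, then poll" pattern. A minimal illustrative sketch of that behavior (not the project's actual implementation):

```python
import os
import time

def follow(path: str, poll_interval: float = 5.0):
    """Yield lines appended to `path`, starting from the current end of file."""
    with open(path, "r") as f:
        f.seek(0, os.SEEK_END)  # past logs are skipped: only new entries are seen
        while True:
            line = f.readline()
            if line:
                yield line.rstrip("\n")
            else:
                time.sleep(poll_interval)  # no new data yet; wait and re-poll

# for entry in follow("/var/log/apache2/access.log"):
#     analyze(entry)  # hypothetical per-line analysis hook
```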
```bash
# Monitor Apache logs in real-time
logsentinelai-httpd-access --mode realtime

# With custom sampling threshold
logsentinelai-httpd-access --mode realtime --sampling-threshold 200
```
- Live Analysis: Process logs as they're written
- Sampling: Reduce load on high-traffic systems
- Real-time Alerts: Immediate threat detection
- Continuous Indexing: Stream to Elasticsearch
Edit `src/logsentinelai/core/prompts.py`:
```python
HTTPD_ACCESS_PROMPT = """
Analyze this Apache/Nginx access log for security threats:

Focus on:
1. SQL injection patterns
2. XSS attempts
3. Your custom criteria here

Log entry: {log_entry}
"""
```
Change analysis language in config:
```toml
[analysis]
language = "korean"  # korean, japanese, spanish, etc.
```
```bash
# Process multiple files
logsentinelai-httpd-access /var/log/apache2/access.log.* --batch

# Parallel processing
logsentinelai-httpd-access /var/log/*.log --parallel 4
```
```toml
[analysis]
batch_size = 100   # Process 100 entries at once
max_tokens = 2000  # Reduce token limit
```
- Use smaller models for high-volume analysis
- Enable sampling for real-time monitoring
- Cache results for repeated patterns
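For the last point, one simple approach is memoizing analysis results keyed by a normalized form of the log line, so repeated patterns cost only one LLM call. A hedged sketch (`analyze_line` stands in for the real LLM-backed call):

```python
import re
from functools import lru_cache

def normalize(line: str) -> str:
    """Collapse variable parts (IPs, numbers) so similar lines share a cache key."""
    line = re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", "<IP>", line)
    return re.sub(r"\d+", "<N>", line)

def analyze_line(pattern: str) -> str:
    """Placeholder for the real LLM-backed analysis call."""
    return f"analysis of: {pattern}"

@lru_cache(maxsize=4096)
def cached_analysis(pattern: str) -> str:
    return analyze_line(pattern)  # executed once per unique pattern

def analyze_with_cache(line: str) -> str:
    return cached_analysis(normalize(line))
```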
```
logsentinelai-httpd-access [OPTIONS] LOG_FILE

Options:
  --output [json|elasticsearch|stdout]  Output format
  --monitor                             Real-time monitoring
  --sample-rate INTEGER                 Sampling rate for monitoring
  --ssh-host TEXT                       SSH hostname
  --ssh-user TEXT                       SSH username
  --ssh-key TEXT                        SSH key file path
  --help                                Show help message
```
```
logsentinelai-httpd-server [OPTIONS] LOG_FILE
# Similar options to httpd-access

logsentinelai-linux-system [OPTIONS] LOG_FILE
# Similar options to httpd-access
```
```
logsentinelai-geoip-download [OPTIONS]

Options:
  --force  Force re-download even if database exists
  --help   Show help message
```
All commands support:
- `--config PATH`: Custom configuration file
- `--verbose`: Enable verbose logging
- `--quiet`: Suppress output except errors
LogSentinelAI uses environment variables for configuration. Copy `config.template` to `config` and customize:
```bash
# Copy configuration template
cp config.template config

# Edit configuration
nano config
```
Key Configuration Sections:
LLM Provider Configuration:
```bash
# Provider Selection
LLM_PROVIDER=openai                # openai, ollama, vllm, gemini

# Model Selection (per provider)
LLM_MODEL_OPENAI=gpt-4o-mini
LLM_MODEL_OLLAMA=qwen2.5-coder:3b
LLM_MODEL_GEMINI=gemini-1.5-pro

# API Configuration
OPENAI_API_KEY=sk-your-key-here
LLM_API_HOST_OPENAI=https://api.openai.com/v1
LLM_API_HOST_OLLAMA=http://127.0.0.1:11434

# Generation Parameters
LLM_TEMPERATURE=0.1                # Consistency for log analysis
LLM_TOP_P=0.3                      # Focus on high-probability tokens
```
Analysis Configuration:
```bash
# Language and Mode
RESPONSE_LANGUAGE=english          # korean, japanese, etc.
ANALYSIS_MODE=batch                # batch or realtime

# Chunk Sizes (entries per LLM request)
CHUNK_SIZE_HTTPD_ACCESS=10
CHUNK_SIZE_LINUX_SYSTEM=10
CHUNK_SIZE_GENERAL_LOG=10

# Default Log Paths
LOG_PATH_HTTPD_ACCESS=sample-logs/access-10k.log
LOG_PATH_LINUX_SYSTEM=sample-logs/linux-2k.log
```
Real-time Monitoring:
```bash
# Polling Configuration
REALTIME_POLLING_INTERVAL=5        # Check interval (seconds)
REALTIME_MAX_LINES_PER_BATCH=50    # Max lines per poll
REALTIME_BUFFER_TIME=2             # Wait for complete lines

# Sampling Control
REALTIME_SAMPLING_THRESHOLD=100    # Auto-sampling trigger
```
GeoIP Configuration:
```bash
GEOIP_ENABLED=true
GEOIP_DATABASE_PATH=~/.logsentinelai/GeoLite2-City.mmdb
GEOIP_INCLUDE_PRIVATE_IPS=false
GEOIP_CACHE_SIZE=1000
```
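These settings map naturally onto the geoip2 reader plus a small in-process cache. A hedged sketch assuming the `geoip2` package and the database path above — it mirrors the idea of, but is not, the project's `get_geoip_lookup`:

```python
import ipaddress
from functools import lru_cache
from pathlib import Path

import geoip2.database

reader = geoip2.database.Reader(
    str(Path("~/.logsentinelai/GeoLite2-City.mmdb").expanduser())
)

@lru_cache(maxsize=1000)  # mirrors GEOIP_CACHE_SIZE=1000
def lookup(ip: str):
    # Mirrors GEOIP_INCLUDE_PRIVATE_IPS=false: skip RFC 1918 and similar ranges.
    if ipaddress.ip_address(ip).is_private:
        return None
    city = reader.city(ip)
    return {
        "country": city.country.iso_code,
        "city": city.city.name,
        "lat": city.location.latitude,
        "lon": city.location.longitude,
    }

print(lookup("8.8.8.8"))
```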
Elasticsearch Configuration:
```bash
ELASTICSEARCH_HOST=http://localhost:9200
ELASTICSEARCH_USER=elastic
ELASTICSEARCH_PASSWORD=changeme
ELASTICSEARCH_INDEX=logsentinelai-analysis
```
SSH Remote Access:
```bash
REMOTE_LOG_MODE=local              # local or ssh
REMOTE_SSH_HOST=server.com
REMOTE_SSH_USER=loguser
REMOTE_SSH_KEY_PATH=~/.ssh/id_rsa
REMOTE_SSH_TIMEOUT=10
```
📋 Full Reference: See `config.template` in the repository for all available options with detailed comments.
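If you need the same env-style settings in your own scripts, they can be read with python-dotenv. A minimal sketch (an assumption about the file format, not necessarily how LogSentinelAI loads its config internally):

```python
import os
from dotenv import load_dotenv

# Load the env-style `config` file created from config.template.
load_dotenv("config")

provider = os.getenv("LLM_PROVIDER", "openai")
chunk_size = int(os.getenv("CHUNK_SIZE_HTTPD_ACCESS", "10"))
print(f"provider={provider}, chunk_size={chunk_size}")
```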
🚀 Key Advantage: LogSentinelAI uses Declarative Extraction - you simply declare the output structure you want (Pydantic models), and the LLM automatically extracts relevant information from logs to match that structure. No manual parsing or field mapping required!
Traditional Log Analysis:
```bash
# Manual parsing, regex patterns, field mapping
grep "ERROR" /var/log/app.log | awk '{print $1, $3}' | sed 's/[^a-zA-Z0-9]//g'
```
LogSentinelAI Approach:
```python
# Just declare what you want
from pydantic import BaseModel

class SecurityEvent(BaseModel):
    severity: str
    threat_type: str
    source_ip: str
    confidence_score: float
    description: str
```
The LLM automatically fills these fields from any log format - no parsing rules needed!
Each analyzer produces structured output based on its declared Pydantic model:
HTTP Access Log Analysis:
```json
{
  "events": [
    {
      "event_type": "suspicious_access",
      "severity": "high",
      "source_ips": ["192.168.1.100"],
      "url_pattern": "/admin.php",
      "attack_patterns": ["sql_injection"],
      "confidence_score": 0.85,
      "description": "SQL injection attempt detected",
      "recommended_actions": ["Block IP", "Review security rules"]
    }
  ],
  "statistics": {
    "total_requests": 1500,
    "unique_ips": 45,
    "error_rate": 0.12
  }
}
```
Linux System Log Analysis:
```json
{
  "events": [
    {
      "event_type": "auth_failure",
      "severity": "medium",
      "username": "admin",
      "source_ips": ["10.0.0.5"],
      "process": "sshd",
      "confidence_score": 0.9,
      "description": "Multiple authentication failures detected"
    }
  ],
  "statistics": {
    "total_events": 25,
    "auth_failures": 8,
    "unique_users": 3
  }
}
```
```json
{
  "timestamp": "2024-01-15T10:30:45Z",
  "log_type": "httpd_access",
  "original_log": "192.168.1.100 - - [15/Jan/2024:10:30:45 +0000] \"GET /admin.php HTTP/1.1\" 200 1234",
  "analysis": {
    "threat_detected": true,
    "threat_type": "suspicious_access",
    "severity": "medium",
    "confidence": 0.85,
    "description": "Access to admin interface from unusual IP",
    "recommendations": [
      "Monitor this IP for further suspicious activity",
      "Consider implementing IP-based access controls"
    ]
  },
  "parsed_fields": {
    "ip_address": "192.168.1.100",
    "timestamp": "15/Jan/2024:10:30:45 +0000",
    "method": "GET",
    "path": "/admin.php",
    "status_code": 200,
    "response_size": 1234
  },
  "enrichment": {
    "geoip": {
      "ip": "192.168.1.100",
      "country_code": "US",
      "country_name": "United States",
      "city": "New York",
      "latitude": 40.7128,
      "longitude": -74.0060
    },
    "reputation": {
      "is_known_bad": false,
      "threat_score": 0.3
    }
  },
  "metadata": {
    "analyzer_version": "0.2.3",
    "model_used": "gpt-4o-mini",
    "processing_time": 1.2
  }
}
```
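Because the output is plain JSON, downstream filtering needs no special tooling. A small illustrative sketch that keeps only medium-or-worse findings — it assumes one record per line (JSON Lines) in a hypothetical `analysis-output.jsonl`:

```python
import json

SEVERITIES = {"medium", "high", "critical"}

with open("analysis-output.jsonl") as f:
    for line in f:
        record = json.loads(line)
        analysis = record.get("analysis", {})
        if analysis.get("threat_detected") and analysis.get("severity") in SEVERITIES:
            print(record["timestamp"], analysis["threat_type"], analysis["severity"])
```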
The Power of Declaration: Want different fields? Just declare them in your analyzer's Pydantic model:
```python
# Custom Security Event Structure
from typing import List
from pydantic import BaseModel

class MySecurityEvent(BaseModel):
    timestamp: str
    risk_level: int          # 1-10 scale
    attack_vector: str
    affected_service: str
    remediation_steps: List[str]
    business_impact: str
```
The LLM will automatically extract and populate these fields from your logs, regardless of the original log format!
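Under the hood, structured-output tooling typically derives a JSON Schema from the declared model and constrains the LLM's generation to it. You can inspect that schema yourself (Pydantic v2):

```python
import json
from typing import List

from pydantic import BaseModel

class MySecurityEvent(BaseModel):
    timestamp: str
    risk_level: int  # 1-10 scale
    attack_vector: str
    affected_service: str
    remediation_steps: List[str]
    business_impact: str

# This JSON Schema is what constrained-generation libraries hand to the LLM.
print(json.dumps(MySecurityEvent.model_json_schema(), indent=2))
```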
| Field | Type | Description | Auto-Extracted |
|---|---|---|---|
| `threat_detected` | boolean | Whether a threat was detected | ✅ From log patterns |
| `threat_type` | string | Type of threat (sql_injection, xss, brute_force, etc.) | ✅ From attack signatures |
| `severity` | string | Severity level (low, medium, high, critical) | ✅ From impact analysis |
| `confidence` | float | Confidence score (0.0-1.0) | ✅ From pattern matching |
| `description` | string | Human-readable description | ✅ From log context |
| `recommendations` | array | Recommended actions | ✅ From threat intelligence |
✨ Key Insight: All these fields are automatically extracted by the LLM based on your declared structure. Change the structure, get different data - no code changes needed!
Problem: API calls to LLM provider failing
Solutions:
- Check API key validity
- Verify network connectivity
- Check provider status page
- Increase timeout in config
```bash
# Test connectivity
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/models
```
Problem: GeoIP lookups failing
Solutions:
```bash
# Re-download database (City database includes coordinates)
logsentinelai-geoip-download

# Check database location and verify it's the City database
ls -la ~/.logsentinelai/GeoLite2-City.mmdb

# Test GeoIP functionality
python -c "from logsentinelai.core.geoip import get_geoip_lookup; g=get_geoip_lookup(); print(g.lookup_geoip('8.8.8.8'))"
```
Problem: Cannot connect to Elasticsearch
Solutions:
- Check Elasticsearch status: `curl http://localhost:9200`
- Verify configuration in config file
- Check network connectivity
Problem: Cannot read log files
Solutions:
```bash
# Add user to log group
sudo usermod -a -G adm $USER

# Change log file permissions
sudo chmod 644 /var/log/apache2/access.log
```
- Reduce `batch_size` in config
- Use smaller LLM models
- Enable sampling for large files
- Use local LLM (Ollama) instead of API
- Reduce `max_tokens`
- Enable parallel processing (see the sketch below)
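For the last point, LLM requests are network-bound, so chunks can be fanned out with a thread pool. A hedged sketch (`analyze_chunk` stands in for one real LLM request):

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_chunk(lines: list[str]) -> str:
    """Placeholder for one LLM request over a chunk of log lines."""
    return f"analyzed {len(lines)} lines"

def chunked(seq: list[str], size: int):
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

with open("/var/log/apache2/access.log") as f:
    lines = f.read().splitlines()

# Threads give real concurrency here because the work is I/O-bound.
with ThreadPoolExecutor(max_workers=4) as pool:
    for result in pool.map(analyze_chunk, chunked(lines, 100)):
        print(result)
```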
1. Create analyzer file: `src/logsentinelai/analyzers/your_analyzer.py`
2. Define Pydantic models for structured output
3. Create LLM prompts in `src/logsentinelai/core/prompts.py`
4. Add CLI entry point in `pyproject.toml`
5. Add tests in `tests/`
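A hedged skeleton of what such an analyzer module could look like — the module path matches step 1, but the class and function names are illustrative, not the project's actual API:

```python
# src/logsentinelai/analyzers/your_analyzer.py (illustrative skeleton)
from typing import List

from pydantic import BaseModel

class MyAppEvent(BaseModel):  # hypothetical declared result structure
    event_type: str
    severity: str
    description: str
    recommended_actions: List[str]

def main() -> None:
    """Hypothetical CLI entry point to be referenced from pyproject.toml."""
    # 1. Read log lines, 2. send them to the LLM constrained by MyAppEvent's
    # schema, 3. print or index the structured results.
    raise NotImplementedError("wire this into the project's analysis pipeline")

if __name__ == "__main__":
    main()
```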
- Fork the repository
- Create feature branch
- Make changes following style guide
- Add tests
- Submit pull request
- Input: Log files (local/remote)
- Parsing: Extract structured data
- Analysis: LLM-powered threat detection
- Enrichment: GeoIP, reputation data
- Output: JSON, Elasticsearch, stdout
- Visualization: Kibana dashboards
This wiki provides comprehensive documentation for LogSentinelAI. For questions or issues not covered here, please open an issue in the project repository.
Happy Log Analyzing! 🚀