A cross-platform Telegraf-based utilization agent that collects CPU, memory, and disk metrics from virtual machines and uploads them to AWS S3. This agent provides automated deployment scripts for multiple Linux distributions and Windows environments.
Linux (Ubuntu, Debian, CentOS, RHEL, Fedora, openSUSE, SLES, Alpine):
# Interactive setup
./setup-env.sh
# Install agent
sudo ./install.sh
Windows:
# Create environment file
cp env-template-windows.ps1 .env.ps1
# Edit .env.ps1 with your values
# Load environment and install
. .\.env.ps1
.\install.ps1
Linux:
sudo ./install.sh \
--telegraf-url "https://dl.influxdata.com/telegraf/releases/telegraf-1.34.4_linux_amd64.tar.gz" \
--bucket "your-s3-bucket" \
--access-key "YOUR_AWS_ACCESS_KEY" \
--secret-key "YOUR_AWS_SECRET_KEY"
Windows:
.\install.ps1 -TelegrafUrl "https://dl.influxdata.com/telegraf/releases/telegraf-1.34.4_windows_amd64.zip" `
-Bucket "your-s3-bucket" `
-AccessKey "YOUR_AWS_ACCESS_KEY" `
-SecretKey "YOUR_AWS_SECRET_KEY"
The Linux installer automatically detects and supports the following distributions:
- Ubuntu (18.04+) - apt package manager
- Debian (9+) - apt package manager
- CentOS (7+) - yum/dnf package manager
- RHEL (7+) - yum/dnf package manager
- Fedora (30+) - dnf package manager
- openSUSE - zypper package manager
- SLES - zypper package manager
- Alpine Linux - apk package manager
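
To illustrate how distribution detection of this kind can work, the sketch below maps the ID field of /etc/os-release to one of the package managers listed above. It is a Python illustration under stated assumptions, not the logic in install.sh: the helper names are hypothetical, and for CentOS and RHEL it simply assumes dnf from major version 8 onward and yum for version 7.

```python
"""Sketch: choose a package manager from /etc/os-release content."""

# Assumed mapping from os-release ID values to the package managers listed above.
PACKAGE_MANAGERS = {
    "ubuntu": "apt",
    "debian": "apt",
    "fedora": "dnf",
    "opensuse-leap": "zypper",
    "opensuse-tumbleweed": "zypper",
    "sles": "zypper",
    "alpine": "apk",
}


def parse_os_release(text):
    """Parse KEY=value lines into a dict, stripping optional quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def pick_package_manager(os_release_text):
    """Return the package manager name for the distribution described."""
    info = parse_os_release(os_release_text)
    distro = info.get("ID", "").lower()
    if distro in ("centos", "rhel"):
        # Assumption: major version 8 and later use dnf, version 7 uses yum.
        major = info.get("VERSION_ID", "0").split(".")[0]
        return "dnf" if major.isdigit() and int(major) >= 8 else "yum"
    if distro not in PACKAGE_MANAGERS:
        raise ValueError(f"unsupported distribution: {distro or 'unknown'}")
    return PACKAGE_MANAGERS[distro]
```

On a real host the input would come from reading /etc/os-release, for example `pick_package_manager(open("/etc/os-release").read())`.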
The installer automatically:
- Detects the distribution type
- Uses the appropriate package manager
- Handles different user creation methods
- Adapts file ownership commands
- Supports both x86_64 and aarch64 architectures
Agent features:
- Collects Metrics: CPU, memory, and disk utilization every 30 seconds
- Stores Locally: Metrics saved as newline-delimited JSON files with daily rotation
- Uploads to S3: Automated sync to AWS S3 every 5 minutes
- Self-Managing: Systemd services (Linux) or Windows Services for reliability
- Minimal Footprint: ~2-5 MB storage per day per VM
- Cross-Platform: Works on all major Linux distributions and Windows
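
The intervals and footprint quoted above can be related with a little arithmetic. The sketch below is only a back-of-the-envelope check, assuming 1 MB means 1,000,000 bytes and that the stated 2-5 MB per day holds; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_day(interval_seconds):
    """How many times a fixed-interval job fires in one day."""
    return SECONDS_PER_DAY // interval_seconds


def bytes_per_collection(daily_megabytes, interval_seconds=30):
    """Average bytes written per collection cycle (1 MB taken as 1,000,000 bytes)."""
    return daily_megabytes * 1_000_000 / events_per_day(interval_seconds)


collections = events_per_day(30)       # 30-second collection interval
uploads = events_per_day(5 * 60)       # 5-minute S3 sync
low, high = bytes_per_collection(2), bytes_per_collection(5)
print(f"{collections} collections/day, {uploads} syncs/day")
print(f"roughly {low:.0f}-{high:.0f} bytes per collection cycle")
```

That works out to 2,880 collection cycles and 288 sync runs per VM per day, or roughly 0.7 to 1.7 KB written per cycle if the stated footprint is accurate.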
Once metrics are collected and stored in S3, you can perform powerful analytics using AWS Athena:
- No infrastructure - Query directly from S3 using standard SQL
- Cost-effective - Pay only for queries you run
- Scalable - Handles petabytes of data automatically
- Fast - Optimized for JSON data with proper partitioning
Common use cases:
- Real-time Monitoring - Current system status across your fleet
- Historical Analysis - Trends and patterns over time
- Performance Optimization - Identify bottlenecks and inefficiencies
- Cost Optimization - Find underutilized resources for downsizing
- Capacity Planning - Predict future resource needs
- Custom Alerting - SQL-based thresholds and notifications
-- Get current CPU usage across all VMs
SELECT tags.host, AVG(fields.usage_active) as avg_cpu
FROM vm_metrics_db.vm_utilization
WHERE name = 'cpu' AND timestamp > to_unixtime(now()) - 300
GROUP BY tags.host;
-- Find underutilized VMs for cost savings
SELECT tags.host, AVG(fields.usage_active) as avg_cpu
FROM vm_metrics_db.vm_utilization
WHERE name = 'cpu' AND timestamp > to_unixtime(now()) - 604800
GROUP BY tags.host
HAVING AVG(fields.usage_active) < 15;
For complete SQL examples and setup instructions, see the Athena Analytics Guide (ATHENA-ANALYTICS.md).
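
The two example queries differ only in the length of the time window and the optional HAVING threshold, so they can be generated from one template. The helper below is a hypothetical sketch, not part of this repository. It assumes the table name used in the examples and that `timestamp` holds epoch seconds, and it uses `to_unixtime(now())`, the Athena (Trino) expression for the current epoch time.

```python
def avg_cpu_query(window_seconds, max_avg_cpu=None,
                  table="vm_metrics_db.vm_utilization"):
    """Build the average-CPU-per-host query for a trailing time window.

    `table` is inserted as-is, so it must come from trusted configuration.
    """
    window = int(window_seconds)  # int() keeps arbitrary text out of the SQL
    lines = [
        "SELECT tags.host, AVG(fields.usage_active) AS avg_cpu",
        f"FROM {table}",
        f"WHERE name = 'cpu' AND timestamp > to_unixtime(now()) - {window}",
        "GROUP BY tags.host",
    ]
    if max_avg_cpu is not None:
        # Keep only hosts whose average stays below the threshold.
        lines.append(f"HAVING AVG(fields.usage_active) < {float(max_avg_cpu)}")
    return "\n".join(lines) + ";"
```

For example, `avg_cpu_query(300)` corresponds to the five-minute query and `avg_cpu_query(7 * 24 * 3600, max_avg_cpu=15)` to the seven-day underutilization query.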
vm-utilization/
├── README.md                  # This file - getting started guide
├── LICENSE                    # MIT License
├── install.sh                 # Multi-distribution Linux installer
├── install.ps1                # Windows installation script
├── setup-env.sh               # Interactive environment setup (Linux)
├── env-template.txt           # Linux environment template
├── env-template-windows.ps1   # Windows environment template
├── ENVIRONMENT-SETUP.md       # Detailed environment setup guide
├── ATHENA-ANALYTICS.md        # SQL analytics with AWS Athena
├── LIVE-TESTING-REPORT.md     # Live Azure testing results
├── SECURITY.md                # Security considerations
├── CHANGELOG.md               # Version history and changes
└── docs/                      # Additional documentation
- Environment Setup Guide - Detailed configuration setup
- Live Testing Report - Real-world Azure deployment results
- Security Guide - Security best practices and compliance
- Athena Analytics Guide - SQL analytics and reporting with AWS Athena
Requirement | Linux | Windows | Description |
---|---|---|---|
Admin Rights | sudo access | Administrator | Required to install services |
Internet Access | HTTPS (443) | HTTPS (443) | For downloads and S3 sync |
AWS Credentials | S3 write permissions | S3 write permissions | For metrics upload |
S3 Bucket | Pre-existing | Pre-existing | Target for metrics storage |
System Requirements | systemd-based distro | Windows Server 2016+ | Service management |
The installation scripts support multiple configuration methods with the following priority:
- Environment Variables (highest priority)
- Command Line Arguments (fallback)
- Default Values (lowest priority)
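
The priority order above can be sketched as a small lookup. This is a Python illustration of the stated rule, not the installers' actual code: `resolve_setting` is a hypothetical helper, command-line values are assumed to be keyed by variable name, and the defaults are the ones this README documents for VM_AWS_REGION and VM_CUSTOMER_ID.

```python
import os

# Defaults documented in this README; required settings have none.
DEFAULTS = {"VM_AWS_REGION": "us-east-1", "VM_CUSTOMER_ID": "default-customer"}


def resolve_setting(name, cli_args, env=None):
    """Return the value for `name`: environment first, then CLI, then default.

    Empty strings are treated the same as unset values.
    """
    env = os.environ if env is None else env
    if env.get(name):            # 1. environment variable (highest priority)
        return env[name]
    if cli_args.get(name):       # 2. command-line argument (fallback)
        return cli_args[name]
    if name in DEFAULTS:         # 3. default value (lowest priority)
        return DEFAULTS[name]
    raise KeyError(f"{name} is required but was not provided")
```

With this rule, an exported VM_AWS_REGION wins over a region passed on the command line, and a required value such as VM_S3_BUCKET raises an error when neither source supplies it.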
Variable | Description | Example |
---|---|---|
VM_TELEGRAF_URL | Telegraf download URL | https://dl.influxdata.com/telegraf/releases/telegraf-1.34.4_linux_amd64.tar.gz |
VM_S3_BUCKET | S3 bucket for metrics storage | my-vm-metrics-bucket |
VM_AWS_ACCESS_KEY | AWS access key ID | AKIA... |
VM_AWS_SECRET_KEY | AWS secret access key | wJalrXUtn... |
Variable | Description | Default |
---|---|---|
VM_AWS_REGION | AWS region | us-east-1 |
VM_CUSTOMER_ID | Customer identifier | default-customer |
For detailed configuration options, see ENVIRONMENT-SETUP.md.
The agent collects the following metrics:
- cpu.usage_active - Active CPU percentage
- cpu.usage_idle - Idle CPU percentage
- Per-CPU core metrics (when available)
- mem.used_percent - Memory usage percentage
- mem.available - Available memory in bytes
- mem.total - Total memory in bytes
- disk.used_percent - Disk usage percentage per mount/drive
- disk.free - Free disk space in bytes
- disk.total - Total disk space in bytes
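
As an illustration of how these metrics can be read back from the newline-delimited JSON files, the sketch below averages cpu.usage_active per host. It assumes each line follows the layout of Telegraf's JSON serializer (`name`, `tags`, `fields`, `timestamp`), which matches the columns used in the Athena examples. The sample records and host name are invented for the example.

```python
import json
from collections import defaultdict


def average_cpu_by_host(ndjson_text):
    """Average cpu.usage_active per host from newline-delimited JSON metrics."""
    samples = defaultdict(list)
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if record.get("name") != "cpu":
            continue  # skip mem and disk measurements
        usage = record.get("fields", {}).get("usage_active")
        host = record.get("tags", {}).get("host")
        if usage is not None and host is not None:
            samples[host].append(usage)
    return {host: sum(values) / len(values) for host, values in samples.items()}


# Invented sample data in the assumed record layout.
sample = "\n".join([
    '{"name": "cpu", "tags": {"host": "vm-a"}, "fields": {"usage_active": 10.0}, "timestamp": 1700000000}',
    '{"name": "cpu", "tags": {"host": "vm-a"}, "fields": {"usage_active": 20.0}, "timestamp": 1700000030}',
    '{"name": "mem", "tags": {"host": "vm-a"}, "fields": {"used_percent": 41.5}, "timestamp": 1700000030}',
])
print(average_cpu_by_host(sample))  # {'vm-a': 15.0}
```

Like the Athena example, this averages every cpu record it sees. If per-core series are enabled, filtering on the `cpu` tag first may be preferable.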
VM Utilization Agent Architecture:
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Telegraf     │    │  Local Storage  │    │     AWS S3      │
│    Collector    │───▶│  (JSON files)   │───▶│     Bucket      │
│  (30s interval) │    │  Daily rotation │    │   (5min sync)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
Linux Implementation:
- Supports all major distributions automatically
- Telegraf runs as systemd service
- S3 sync via systemd timer (5-minute intervals)
- Secure credential storage with proper permissions
Windows Implementation:
- Telegraf runs as Windows Service
- S3 sync via Scheduled Task (5-minute intervals)
- Credentials stored as task environment variables
- Clone the repository:
  git clone https://github.com/ops-guru/vm-utilization.git
  cd vm-utilization
- Test on your distribution:
  # Linux (any supported distribution)
  sudo ./install.sh --telegraf-url "..." --bucket "test-bucket" --region "us-east-1" --access-key "..." --secret-key "..."
  # Windows
  .\install.ps1 -TelegrafUrl "..." -Bucket "test-bucket" -Region "us-east-1" -AccessKey "..." -SecretKey "..."
- Verify metrics collection:
  # Linux
  sudo ls -la /var/lib/vm-metrics/
  sudo systemctl status telegraf
  # Windows
  Get-ChildItem "C:\ProgramData\vm-metrics\"
  Get-Service -Name "Telegraf"
See LIVE-TESTING-REPORT.md for comprehensive testing results including:
- Azure cloud deployment testing
- Multi-VM environment validation
- Performance benchmarks (30MB memory, minimal CPU)
- Security compliance verification
- End-to-end metric flow validation
Linux (All Distributions):
# Check Telegraf service
sudo systemctl status telegraf
# Check S3 sync timer
sudo systemctl status vm-metrics-sync.timer
# View sync logs
sudo journalctl -u vm-metrics-sync -f
# Check distribution detection
./install.sh --help # Shows supported distributions
Windows:
# Check Telegraf service
Get-Service -Name "Telegraf"
# Check scheduled task
Get-ScheduledTask -TaskName "VM-Metrics-Sync"
# View task history
Get-WinEvent -FilterHashtable @{LogName='Microsoft-Windows-TaskScheduler/Operational'; ID=200,201} | Where-Object {$_.Message -like "*VM-Metrics-Sync*"}
Uninstall on Linux:
# Stop and disable services
sudo systemctl stop telegraf vm-metrics-sync.timer
sudo systemctl disable telegraf vm-metrics-sync.timer vm-metrics-sync.service
# Remove service files
sudo rm -f /etc/systemd/system/telegraf.service
sudo rm -f /etc/systemd/system/vm-metrics-sync.service
sudo rm -f /etc/systemd/system/vm-metrics-sync.timer
# Remove application files
sudo rm -rf /etc/telegraf
sudo rm -rf /var/lib/vm-metrics
sudo rm -rf /etc/vm-metrics
sudo rm -f /usr/local/bin/telegraf
# Reload systemd
sudo systemctl daemon-reload
Uninstall on Windows:
# Stop and remove Telegraf service
Stop-Service -Name "Telegraf" -Force
sc.exe delete "Telegraf"
# Remove scheduled task
Unregister-ScheduledTask -TaskName "VM-Metrics-Sync" -Confirm:$false
# Remove application directories
Remove-Item -Path "C:\Program Files\Telegraf" -Recurse -Force
Remove-Item -Path "C:\ProgramData\Telegraf" -Recurse -Force
Remove-Item -Path "C:\ProgramData\vm-metrics" -Recurse -Force
- Credentials: Stored with restricted permissions (root/SYSTEM only)
- Transport: HTTPS/TLS for all S3 communications
- File Permissions: Metrics files readable only by system accounts
- No Network Exposure: Agent only makes outbound connections
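
On Linux, the restricted-permissions claim can be spot-checked by looking at a file's mode bits. The sketch below is an illustrative check, not part of the agent. It only covers POSIX permissions (Windows ACLs need a different approach), and the credential file location is deliberately left to the caller because this README does not name it.

```python
import os
import stat


def is_owner_only(mode):
    """True when group and others have no permission bits set (e.g. 0o600)."""
    return (stat.S_IMODE(mode) & 0o077) == 0


def path_is_owner_only(path):
    """Apply the same check to a file on disk."""
    return is_owner_only(os.stat(path).st_mode)
```

A file with mode 0600 or 0400 passes, while 0640 or 0644 fails because group or other users could read it.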
Issue | Cause | Solution |
---|---|---|
Service won't start | Configuration error | Run telegraf --config <config> --test to validate the configuration |
S3 upload fails | Credentials/permissions | Verify IAM permissions and bucket access |
High disk usage | Sync failure | Check network connectivity and S3 permissions |
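
To judge whether disk usage is actually abnormal, it can help to compare the size of the local metrics directory with the expected daily footprint. The sketch below is a rough diagnostic under stated assumptions: it uses the upper bound of 5 MB per day quoted in this README and treats 1 MB as 1,000,000 bytes. Whether the agent prunes files after upload is not stated here, so treat the result as a signal, not a verdict.

```python
from pathlib import Path


def directory_size_bytes(root):
    """Total size of all regular files under `root`."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def backlog_days(size_bytes, daily_megabytes=5):
    """Express a size as days of output at the assumed daily footprint."""
    return size_bytes / (daily_megabytes * 1_000_000)
```

On a Linux host this might be called as `backlog_days(directory_size_bytes("/var/lib/vm-metrics"))`, which typically requires root to read the directory.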
Linux:
- Telegraf: sudo journalctl -u telegraf
- S3 Sync: sudo journalctl -u vm-metrics-sync
Windows:
- Telegraf: Event Viewer > Windows Logs > System
- S3 Sync: Event Viewer > Task Scheduler logs
This project is licensed under the MIT License - see the LICENSE file for details.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
For issues and support:
- GitHub Issues: Create an issue
- Documentation: See Installation Guide