o11y

Automated Full-Stack Observability deployment resources for Uhstray.io

Architecture

graph LR
    %% Add direction controls to subgraphs
    subgraph Observability Collection
        subgraph Data Sources
            apps[Applications]
            containers[Containers]
            servers[Servers]
            logs[Logs]
        end
        
        subgraph Instrumentation
            subgraph Exporters
                otel_agent[OpenTelemetry Agent - Ports :4317, :4318]
                nodeexp[Node Exporter - Ports :9090]
                cadvisor[cAdvisor - Ports :9090]
                promtail[Promtail - Ports :3100]
            end
            subgraph Local OTEL Collector
                otel_col[OpenTelemetry Collector - Ports :4317, :4318]
            end
            subgraph Local Time Series Database
                prom[Prometheus - Ports :9090]
            end
        end
    end
    
    subgraph Observability Pipeline

        subgraph Logging
            subgraph Logs Pipeline
                alloy_logs{"Alloy<br>:4318"}
            end
            loki[Loki - Ports :3100]
            
        end
        subgraph Tracing
            subgraph Tracing Pipeline
                alloy_trace{"Alloy<br>:12345/:4319"}
            end
            tempoDistributor["Tempo Distributor"]
            tempoIngesters["Tempo Ingesters"]
            tempoQuery["Tempo Query Frontend<br>:3200"]
            tempoQuerier["Tempo Querier"]
            tempoCompactor["Tempo Compactor"]
            tempoMetricsGen["Tempo Metrics Generator"]
        end
        
        
        subgraph Metrics Pipeline
            mimirLB{"mimir Load Balancer<br>:9009"}
            mimir1["mimir-1<br>:8080"]
            mimir2["mimir-2<br>:8080"]
            mimir3["mimir-3<br>:8080"]
        end
    end

    subgraph Observability Analytics

        subgraph Visualization and Analytics
            grafana[Grafana - Ports :3000]
        end

        subgraph Profiling
            pyroscope["Pyroscope<br>:4040"]
        end

    end

    subgraph Data Storage and Recovery
        subgraph Object Storage
            minio[MinIO S3 Object Storage - Ports :9000]
        end
        subgraph Relational Storage
            postgres[PostgreSQL - Ports :5432]
        end
    end
    
    %% Data flow connections
    apps --> otel_agent
    containers --> cadvisor
    servers --> nodeexp
    logs --> promtail
    
    otel_agent --> otel_col
    nodeexp --> prom
    cadvisor --> prom
    tempoMetricsGen --> mimirLB
    promtail --> alloy_logs

    otel_col --> alloy_trace
    prom --> mimirLB
    alloy_logs --> loki
    alloy_trace --> tempoDistributor
    alloy_trace --> pyroscope
    mimirLB --> mimir1
    mimirLB --> mimir2
    mimirLB --> mimir3
    mimir1 --> minio
    mimir2 --> minio
    mimir3 --> minio
    
    
    tempoDistributor --> tempoIngesters
    tempoQuery --> tempoQuerier
    tempoQuerier --> tempoIngesters
    tempoCompactor --> minio
    tempoIngesters --> minio

    grafana --> mimirLB
    grafana --> loki
    grafana --> tempoQuery
    grafana --> pyroscope
    grafana --> postgres

Overview

This repository contains the deployment resources for our observability stack, including Grafana, Prometheus, Mimir, Tempo, Loki, and OpenTelemetry components. The stack provides comprehensive monitoring, logging, tracing, and metrics collection for Uhstray.io services.

Contributing Guidelines

See CONTRIBUTING.md in the repository root.

Component Overview

Metrics Collection

Prometheus: Time-series database for storing metrics
Node Exporter: Hardware and OS metrics collection
cAdvisor: Container metrics collection
Mimir: Scalable, long-term metrics storage

Logs Management

Loki: Log aggregation system
Promtail: Log collection agent

Tracing

Tempo: Distributed tracing backend
OpenTelemetry Collector: Trace collection and processing
Alloy: Unified telemetry collector

Visualization

Grafana: Unified visualization platform for metrics, logs, and traces
Pyroscope: Continuous profiling platform

Getting Started

Prerequisites

Docker and Docker Compose installed
Minimum recommended resources: 8 CPU cores, 16 GB RAM
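
The recommended minimums above can be checked before deploying. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and the detection helpers assume a Linux host with `nproc` and `/proc/meminfo` available.

```shell
# Recommended minimums taken from the prerequisites above.
MIN_CORES=8
MIN_MEM_GB=16

# Succeeds (exit 0) when both the core count ($1) and memory in GB ($2)
# meet the recommended minimums.
meets_minimums() {
    [ "$1" -ge "$MIN_CORES" ] && [ "$2" -ge "$MIN_MEM_GB" ]
}

# Linux-only detection helpers (assumption: nproc and /proc/meminfo exist).
detect_cores() {
    nproc
}

# MemTotal is reported in kB; round to the nearest GiB because a 16 GB
# machine usually reports slightly less than 16 GiB of usable memory.
detect_mem_gb() {
    awk '/^MemTotal:/ { printf "%d\n", ($2 + 524288) / 1048576 }' /proc/meminfo
}

# Example usage:
#   if meets_minimums "$(detect_cores)" "$(detect_mem_gb)"; then
#       echo "host meets the recommended resources"
#   else
#       echo "host is below the recommended resources"
#   fi
```

`docker compose version` can be run alongside this check to confirm that Docker and the Compose plugin are installed.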

Deployment

Clone this repository and navigate to the main o11y directory:

git clone https://github.com/uhstray-io/o11y.git
cd ./o11y

Run docker compose:

docker compose up -d

Accessing Dashboards

Navigate to the following dashboards:

Grafana Dashboard: http://localhost:3000 (default credentials: admin/admin)
Prometheus Dashboard: http://localhost:9090
Mimir Dashboard: http://localhost:9009
cAdvisor Dashboard: http://localhost:9092
Tempo HTTP API: http://localhost:3200 (query traces through Grafana)
Loki HTTP API: http://localhost:3100 (query logs through Grafana)
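
To confirm that each endpoint listed above is answering after `docker compose up -d`, a small loop over the URLs can help. This is a sketch under stated assumptions: `curl` is installed, the function names are hypothetical, and any HTTP response (including a 404 on `/`) is treated as "up", because some of these services may not serve a page at the root path.

```shell
# Name/URL pairs copied from the list above.
ENDPOINTS="grafana=http://localhost:3000
prometheus=http://localhost:9090
mimir=http://localhost:9009
cadvisor=http://localhost:9092
tempo=http://localhost:3200
loki=http://localhost:3100"

# curl's %{http_code} is 000 when no HTTP response was received at all.
classify_status() {
    if [ "$1" = "000" ]; then
        echo "down"
    else
        echo "up"
    fi
}

# Requests each URL once and prints one status line per service.
check_endpoints() {
    printf '%s\n' "$ENDPOINTS" | while IFS='=' read -r name url; do
        code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")
        printf '%-12s %s (HTTP %s)\n' "$name" "$(classify_status "$code")" "$code"
    done
}

# Example usage:
#   check_endpoints
```

A service reported as "down" right after startup may simply still be initializing; `docker compose logs <service>` shows whether it is making progress.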

Testing and Developing

Fetch the current logs from the deployment for triage:

docker compose logs

Spin the current deployment down:

docker compose down

Spin down the deployment and remove all volumes:

docker compose down -v

Spin down the deployment and remove all images and volumes:

docker compose down --rmi="all" -v

Troubleshooting

Common Issues

Services fail to start: Check for port conflicts with docker ps -a and stop any conflicting services
Out of memory errors: Increase Docker memory allocation in Docker Desktop settings
Permission issues: Ensure proper file permissions for volume mounts
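
Port conflicts with processes outside Docker do not show up in `docker ps -a`. As a complementary check, the sketch below reports which of the host ports from the Accessing Dashboards list already have a listener. It assumes a Linux host with `ss` (from iproute2); the function name is hypothetical, and the port list should be extended to match whatever `compose.yml` actually publishes.

```shell
# Host ports taken from the Accessing Dashboards list.
STACK_PORTS="3000 3100 3200 9009 9090 9092"

# Reads `ss -ltn` output on stdin and prints each stack port that
# already has a listening socket.
busy_stack_ports() {
    # Column 4 is "address:port"; keep the part after the last colon.
    listening=$(awk 'NR > 1 { n = split($4, a, ":"); print a[n] }' | sort -un)
    for port in $STACK_PORTS; do
        if printf '%s\n' "$listening" | grep -qx "$port"; then
            echo "$port"
        fi
    done
}

# Example usage (run before `docker compose up -d`):
#   ss -ltn | busy_stack_ports
```

Running it while the stack is up lists the stack's own ports, so it is most useful before the first start or after `docker compose down`.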

Viewing Component Logs

# View logs for a specific service
docker compose logs grafana

# Follow logs live
docker compose logs -f prometheus
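
When a service logs heavily, narrowing the output to recent error-like lines can speed up triage. This is a sketch only: the function names and the `error|fatal|panic` pattern are assumptions and may need tuning for each component's log format.

```shell
# Keeps only lines that look like errors (case-insensitive match).
filter_errors() {
    grep -iE 'error|fatal|panic'
}

# Shows error-like lines for one service over a recent time window.
#   $1 = compose service name, $2 = window (defaults to 10m)
recent_errors() {
    docker compose logs --no-color --since "${2:-10m}" "$1" | filter_errors
}

# Example usage:
#   recent_errors grafana
#   recent_errors prometheus 1h
```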

Technology References

Grafana Ecosystem

Grafana - Visualization platform
Grafana Mimir - Scalable metrics storage
Grafana Alloy - Unified telemetry collector
Grafana Beyla - eBPF-based auto-instrumentation
Grafana Pyroscope - Continuous profiling

Prometheus Ecosystem

Prometheus - Metrics collection and storage
Node Exporter - System metrics collection
Promtail - Log collector (part of Grafana Loki)
Windows Exporter - Windows metrics collection
PostgreSQL Exporter - PostgreSQL metrics

OpenTelemetry

OpenTelemetry Collector - Telemetry collection
OpenTelemetry Protocol (OTLP) - Telemetry protocol specification
OTEL Go Instrumentation - Go instrumentation
OpenLLMetry - LLM observability

TODO

General Todo

Initial deployment with Grafana, Prometheus, and exporters
Upgrade Grafana to use Mimir Prometheus TSDB
Develop OpenTelemetry Collector process for WisBot
Deploy OpenTelemetry o11y collector integrated with Grafana
Upgrade Alertmanager storage to use GitHub Actions driven secrets

WisBot Todo

Upgrade to Alloy Collector where necessary for production needs
Migrate Mimir to Microservice Deployment Mode
Determine Beyla eBPF Instrumentation Targets
Add Pyroscope for WisBot profiling
Set up relabeling to streamline service discovery (see https://grafana.com/docs/loki/latest/send-data/promtail/scraping/)
Implement high availability configuration for production
Add custom dashboards for WisBot service monitoring

Security Enhancements

Implement proper secrets management through environment variables
Remove hardcoded credentials from configuration files
Configure TLS for exposed services
Review and secure default credentials

Operational Improvements

Enable Alertmanager integration with proper configuration
Develop comprehensive alerting rules beyond basic infrastructure monitoring
Set up proper backup/restore procedures for persistent data
Standardize configuration practices across components

Application Observability

Complete WisBot instrumentation to enable service-level metrics
Implement distributed tracing for application components
Configure application profiling via Pyroscope

Documentation

Create operational procedures and runbooks
Develop dashboard usage guidelines
Define metrics dictionary for key indicators
Document architectural decisions for component choices

alertmanager_storage:
  backend: s3
  s3:
    access_key_id: {{ .Values.minio.rootUser }}
    bucket_name: {{ include "mimir.minioBucketPrefix" . }}-ruler
    endpoint: {{ template "minio.fullname" .Subcharts.minio }}.{{ .Release.Namespace }}.svc:{{ .Values.minio.service.port }}
    insecure: true
    secret_access_key: {{ .Values.minio.rootPassword }}
