Complete observability platform demonstrating traces, metrics and logs across C#, Go, Python, Rust and C++ using OpenTelemetry.
# Start everything
make compose-start
# Or start specific services (Recommended)
make compose-infra
make compose-start SERVICES="python-otel-service go-otel-service csharp-otel-service rust-otel-service"
make compose-start SERVICES="cpp-otel-service" # NOTE: C++ service is resource- and time-consuming to build on first run. Use selective service startup to skip it initially.
# Generate test traffic and view results
make compose-test
make open-grafana # Open http://localhost:3000 (admin/admin)
# Clean up docker resources
make compose-clean

Open the matching dev container in any IDE that supports dev containers, then run:
# Terminal A - Deploy everything to Kind cluster
make k8s-deploy
# Terminal B - Port-forward everything (observability + services)
make k8s-fwd
# Terminal A - Generate test traffic and view results
make k8s-traffic
make open-grafana # Open http://localhost:3000 (admin/admin)
# Clean up k8s resources
make k8s-clean

Observability Stack:
- OpenTelemetry Collector (port 4317) - Central telemetry hub
- Jaeger (port 16686) - Distributed tracing
- Prometheus (port 9090) - Metrics storage
- Loki (port 3100) - Log aggregation
- Grafana (port 3000) - Unified visualization
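All of these are exposed on localhost once the stack is up. As a quick sanity check, here is a hypothetical Python sketch that probes each component's standard health endpoint (Jaeger has no dedicated health path, so its UI root is used):

```python
# Probe each observability component's health/readiness endpoint.
# Assumes the Docker Compose stack is running with the default port mappings.
import urllib.request

CHECKS = {
    "Grafana":    "http://localhost:3000/api/health",
    "Prometheus": "http://localhost:9090/-/healthy",
    "Loki":       "http://localhost:3100/ready",
    "Jaeger UI":  "http://localhost:16686/",
}

for name, url in CHECKS.items():
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            print(f"{name}: HTTP {resp.status}")
    except OSError as exc:
        print(f"{name}: unreachable ({exc})")
```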
Example Services:
| Language | Port | Endpoint |
|---|---|---|
| C# (ASP.NET) | 5001 | http://localhost:5001/api/hello |
| Go (Gin) | 5002 | http://localhost:5002/api/hello |
| Python (FastAPI) | 5003 | http://localhost:5003/api/hello |
| Rust (Actix) | 5004 | http://localhost:5004/api/hello |
| C++ (httplib) | 5005 | http://localhost:5005/api/hello |
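Since every service exposes the same /api/hello endpoint, the table above can be smoke-tested in one loop; a minimal sketch assuming the default localhost port mappings:

```python
# Hit each service's /api/hello once; every call should show up as a trace.
import urllib.request

PORTS = {"csharp": 5001, "go": 5002, "python": 5003, "rust": 5004, "cpp": 5005}

for name, port in PORTS.items():
    url = f"http://localhost:{port}/api/hello"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            body = resp.read().decode(errors="replace")
            print(f"{name} ({port}): {resp.status} {body[:60]}")
    except OSError as exc:
        print(f"{name} ({port}): failed ({exc})")
```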
Architecture:
Services → OTLP (gRPC) → Collector → Jaeger/Prometheus/Loki → Grafana
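In code, the first hop means each service points an OTLP exporter at the collector's gRPC port (4317). A minimal sketch with the Python SDK (the service and span names are illustrative; the other four languages wire up the same provider → processor → exporter chain):

```python
# Export spans to the collector over OTLP/gRPC (localhost:4317).
# Requires: pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-grpc
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider(resource=Resource.create({"service.name": "demo-service"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

with trace.get_tracer("demo").start_as_current_span("demo-span"):
    pass  # the span is batched and shipped to the collector, then on to Jaeger

provider.shutdown()  # flush pending spans before the process exits
```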
Why OpenTelemetry?
- Vendor-neutral standard (CNCF)
- Single API across all languages
- Auto-instrumentation for common frameworks
- Future-proof observability
- Backend-agnostic (easily switch Jaeger/Tempo, Prometheus/Graphite, Loki/Elasticsearch)
make open-grafana
# Navigate to: Explore → Jaeger → Select service

In Grafana Explore → Prometheus, try:

# Request rate (Go, C# services)
otel_http_server_request_duration_seconds_count
rate(otel_http_server_request_duration_seconds_count[5m]) # per-second request rate, averaged over the last 5 minutes
# Response size (Go service)
otel_http_server_response_body_size_bytes_count
rate(otel_http_server_response_body_size_bytes_count[5m]) # per-second rate of responses recorded by the response-size histogram, over the last 5 minutes
# 95th percentile latency (Go service)
histogram_quantile(0.95, rate(otel_http_server_request_duration_seconds_bucket[5m])) # 95th percentile request duration over the last 5 minutes
# Browse all HTTP metrics
{__name__=~"otel_http.*"}
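The same PromQL can be run outside Grafana against Prometheus's HTTP API on port 9090; a hypothetical sketch:

```python
# Run a PromQL query via Prometheus's /api/v1/query endpoint.
import json, urllib.parse, urllib.request

query = "rate(otel_http_server_request_duration_seconds_count[5m])"
url = "http://localhost:9090/api/v1/query?" + urllib.parse.urlencode({"query": query})

with urllib.request.urlopen(url, timeout=5) as resp:
    result = json.load(resp)

for series in result["data"]["result"]:
    print(series["metric"].get("job", "?"), series["value"])  # value is [timestamp, value]
```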
# Note: Not all services export the same metrics due to varying auto-instrumentation support:
# - Go: ✅ Full HTTP metrics (Gin auto-instrumentation)
# - C#: ⚠️ Partial (client metrics only; server metrics missing)
# - Python: ⚠️ Different metric names (query response_size metrics instead of duration)
# - Rust: ❌ No HTTP metrics (Actix has no auto-instrumentation)
# - C++: ⚠️ Manual counter only
#
# TODO: Add manual HTTP metric instrumentation for the Rust/C++ services (a sketch of the counter pattern follows below).
# Refer to OpenTelemetry examples for your language:
# - Rust: https://github.com/open-telemetry/opentelemetry-rust/tree/main/examples
# - Python: https://opentelemetry.io/docs/languages/python/instrumentation/
# - C++: https://github.com/open-telemetry/opentelemetry-cpp/tree/main/examples
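For reference, the manual-counter pattern that TODO calls for looks roughly like this in the Python metrics API (the metric and attribute names are illustrative; the Rust and C++ versions follow the same meter → counter → add shape):

```python
# Manual HTTP request counter (sketch; assumes a MeterProvider exporting
# OTLP is already configured, as in the running services).
from opentelemetry import metrics

meter = metrics.get_meter("hello-service")
request_counter = meter.create_counter(
    "http.server.request.count",
    unit="1",
    description="Number of HTTP requests handled",
)

def handle_hello() -> str:
    # One increment per request, tagged with route and method attributes.
    request_counter.add(1, {"http.route": "/api/hello", "http.request.method": "GET"})
    return "hello"
```

In Grafana Explore → Loki, try: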
{service_name="rust-otel-service"}
{service_name=~".*-service"} |= "error"
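The same LogQL also works against Loki's query_range HTTP API on port 3100; a hypothetical sketch covering the last hour:

```python
# Query Loki's HTTP API for recent error lines across all services.
import json, time, urllib.parse, urllib.request

now_ns = int(time.time() * 1e9)
params = urllib.parse.urlencode({
    "query": '{service_name=~".*-service"} |= "error"',
    "start": now_ns - 3_600_000_000_000,  # one hour ago, in nanoseconds
    "end": now_ns,
    "limit": 20,
})
url = f"http://localhost:3100/loki/api/v1/query_range?{params}"

with urllib.request.urlopen(url, timeout=5) as resp:
    for stream in json.load(resp)["data"]["result"]:
        for _ts, line in stream["values"]:
            print(line)
```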
Dev Containers:
Each service has a pre-configured dev container with debugging support:
1. Open the dev container for the chosen service in any IDE that supports dev containers.
2. Run make compose-infra inside the container to launch the external dependencies.
3. Set breakpoints in the service's source code and start debugging.
Available Commands:
Usage: make [target] [SERVICES="service1 service2"]
Common targets:
open-grafana Open Grafana in browser
open-jaeger Open Jaeger in browser
open-prometheus Open Prometheus in browser
Docker Compose targets:
compose-start Start services (use SERVICES="svc1 svc2" for specific)
compose-stop Stop services
compose-restart Restart services
compose-logs Show logs
compose-build Build service images
compose-clean Stop services and remove volumes
compose-status Show status of all services
compose-test Generate test traffic
compose-infra Start only infrastructure services
Kubernetes targets:
k8s-deploy Deploy all services to Kind cluster
k8s-clean Remove all deployments from Kind cluster
k8s-fwd-obs Port-forward observability stack only
k8s-fwd-svc Port-forward OpenTelemetry services only
k8s-fwd Port-forward everything (observability + services)
k8s-traffic Generate test traffic to all services

No telemetry data?
- Check collector: docker ps | grep otel-collector
- Generate traffic: make compose-test
- View logs: docker logs otel-collector
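If the container is up but services still export nothing, it may help to confirm the OTLP/gRPC port itself is reachable; a hypothetical check:

```python
# Verify the collector's OTLP/gRPC listener (port 4317) accepts connections.
import socket

with socket.socket() as s:
    s.settimeout(3)
    try:
        s.connect(("localhost", 4317))
        print("collector OTLP port 4317 is reachable")
    except OSError as exc:
        print(f"cannot reach collector on 4317: {exc}")
```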
Port conflicts?
Edit the host-side (left-hand) value of the port mappings in docker-compose.yml.
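Before editing, a quick hypothetical scan shows which of the stack's default ports are already taken:

```python
# Report which of the stack's default host ports are already in use.
import socket

DEFAULT_PORTS = (3000, 3100, 4317, 5001, 5002, 5003, 5004, 5005, 9090, 16686)

for port in DEFAULT_PORTS:
    with socket.socket() as s:
        s.settimeout(0.5)
        if s.connect_ex(("localhost", port)) == 0:
            print(f"port {port} is already in use")
```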
Service issues?
make compose-logs SERVICES="service-name"
make compose-restart

Resources:
- Jaeger Docs
- Grafana Docs
- OpenTelemetry Rust Examples
- OpenTelemetry C++ Examples
- OpenTelemetry Go Examples
- OpenTelemetry .NET Examples
- OpenTelemetry Python Getting Started
- OpenTelemetry Docs
- OpenTelemetry Collector Documentation
- Prometheus Docs
- Loki Docs
Debugging workflow (a Jaeger API sketch follows this list):
- Find the relevant trace by trace ID or time range in Grafana or Jaeger
- Identify slow span/operation
- Check metrics for that service in Grafana
- View logs from that timeframe in Grafana
- Fix and verify with new traces
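To script the first two steps, traces can also be pulled from Jaeger's HTTP API. Note this is Jaeger's internal, unversioned UI API, so treat the response shape as an assumption; a hypothetical sketch that prints span durations for the latest trace of one service:

```python
# Fetch the most recent trace for a service from Jaeger's UI API and
# print per-span durations to spot the slow operation.
import json, urllib.parse, urllib.request

params = urllib.parse.urlencode({"service": "go-otel-service", "limit": 1})
url = f"http://localhost:16686/api/traces?{params}"

with urllib.request.urlopen(url, timeout=5) as resp:
    payload = json.load(resp)

for trace in payload.get("data", []):
    for span in sorted(trace["spans"], key=lambda s: s["duration"], reverse=True):
        print(f'{span["operationName"]}: {span["duration"]} µs')  # duration is in microseconds
```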
Monitoring workflow:
- Set up a Grafana dashboard with key metrics
- Track request rates, error rates, latencies
- Set alerts on SLO violations
- Correlate metrics with traces for investigation
This PoC can be migrated to Kubernetes or deployed in the cloud:
- Cloud/On-prem: Deploy on managed or self-hosted Kubernetes; configure persistent storage, load balancers, and ingress (an Ingress controller or the Gateway API).
- Best practices: Enable TLS, authentication/authorization, resource limits, sampling, and backups for reliability.

