I used Claude Code to run Osprey. Here's a friction log of where I got stuck and the changes I made to get things working properly:
Osprey Setup Friction Log
This document records all issues encountered while setting up the Osprey project from scratch on December 12, 2025.
Environment
- OS: macOS (Darwin 24.5.0)
- Platform: Apple Silicon (ARM64)
- Python: 3.11
- Date: December 12, 2025
Issue 1: Postgres 18 Volume Mount Incompatibility
Problem
When running docker compose up -d, the Postgres container immediately crashed with exit code 1.
Root Cause
Postgres 18 changed its data storage structure to use major-version-specific directory names compatible with pg_ctlcluster. The docker-compose.yaml was configured with the old mount point /var/lib/postgresql/data, and there was existing data from a previous Postgres version in the volume.
Error Message
Error: in 18+, these Docker images are configured to store database data in a
format which is compatible with "pg_ctlcluster" (specifically, using
major-version-specific directory names).
Counter to that, there appears to be PostgreSQL data in:
/var/lib/postgresql/data (unused mount/volume)
Solution
- Updated docker-compose.yaml line 261:
  # Before
  - metadata_data:/var/lib/postgresql/data
  # After
  - metadata_data:/var/lib/postgresql
- Removed old volumes with docker compose down -v
Files Modified
docker-compose.yaml:261
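If the old data directory is still sitting in the named volume, the mount change alone isn't enough; a minimal reset sequence, assuming there is no metadata worth keeping:

```bash
# Remove containers and named volumes (this deletes the old Postgres data)
docker compose down -v

# Bring the stack back up with the corrected mount point
docker compose up -d

# Confirm Postgres started cleanly this time
docker compose logs postgres
```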
Issue 2: osprey-ui-api Container Startup Failures
Problem
The osprey-ui-api container kept crashing on startup with database connection errors.
Root Cause
The UI API service had a basic depends_on configuration that didn't wait for Postgres to be healthy before starting. It attempted to connect to Postgres before the database was ready to accept connections.
Error Message
psycopg2.OperationalError: could not translate host name "postgres" to address: Name or service not known
Solution
Updated docker-compose.yaml lines 144-156 to use health check conditions:
# Before
depends_on:
  - osprey-worker
  - druid-broker
  - postgres
  - snowflake-id-worker
  - bigtable
  - bigtable-initializer

# After
depends_on:
  osprey-worker:
    condition: service_started
  druid-broker:
    condition: service_started
  postgres:
    condition: service_healthy
  snowflake-id-worker:
    condition: service_healthy
  bigtable:
    condition: service_healthy
  bigtable-initializer:
    condition: service_completed_successfully

Files Modified
docker-compose.yaml:144-156
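For condition: service_healthy to have any effect, the gating service needs a healthcheck block. The repo's compose file appears to define these already; as a minimal sketch for Postgres (the image tag and timing values here are assumptions, not taken from the repo):

```yaml
postgres:
  image: postgres:18
  healthcheck:
    # pg_isready ships in the postgres image and exits 0 once the
    # server is accepting connections
    test: ["CMD-SHELL", "pg_isready -U postgres"]
    interval: 5s
    timeout: 5s
    retries: 10
```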
Issue 3: UI "No data for selected features"
Problem
Events were appearing in the Event Stream panel on the right side of the UI, but each event showed "No data for selected features" instead of the actual field values.
Root Cause
UX Issue: The Osprey UI requires users to explicitly select which fields they want to display in the event stream. This is not immediately obvious to new users.
Solution
Click the "Select Summary Features" button (top right of Event Stream panel) and select fields to display:
- ContainsHello
- PostText
- UserId
- EventType
Recommendation for Improvement
Consider either:
- Pre-selecting common fields by default
- Adding a tooltip/hint when events show "No data for selected features"
- Auto-selecting all available fields on first use
Issue 4: Timeseries Chart "No data available"
Problem
The Timeseries Chart consistently showed "No data available" even though the Event Stream showed matching events.
Root Cause
Time Range Mismatch: The query time range ended at 1:44pm EST, but data with the ContainsHello field only started being ingested at 1:50pm EST (after we fixed some conflicts between local rule experiments and the example data that was restored). This created a 6-minute gap where no matching data existed.
Additional Factor
Browser Caching: Earlier queries that returned empty results were cached, so even after data became available, the cached empty response was being shown.
Solution
- Adjust the end time in the date range selector to be after 1:50pm EST (current time)
- Hard refresh the page (Cmd+Shift+R on Mac, Ctrl+Shift+R on Windows)
- Resubmit the query
Verification
Direct Druid query confirmed data existed:
{
"total": 189,
"with_hello": 60
}
Recommendation for Improvement
- Default to "now" for end time instead of a fixed historical timestamp
- Add a "Last hour" or "Last 30 minutes" quick select option
- Show a more helpful message when no data is found (e.g., "No data in selected time range")
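For reference, the kind of direct query used in the verification above can be issued against Druid's SQL endpoint; a sketch, assuming the router/broker is reachable on localhost:8888 and a datasource named osprey_execution_results (both assumptions):

```bash
# Count total events and events with ContainsHello populated, straight from Druid.
# Port 8888 and the datasource name are assumptions for this sketch.
curl -s -X POST http://localhost:8888/druid/v2/sql \
  -H 'Content-Type: application/json' \
  -d '{"query": "SELECT COUNT(*) AS total, COUNT(ContainsHello) AS with_hello FROM osprey_execution_results"}'
```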
Issue 5: Platform Architecture Warnings
Problem
Multiple Docker containers showed platform mismatch warnings during startup:
The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8)
Affected Images
- gcr.io/google.com/cloudsdktool/cloud-sdk:latest (bigtable)
- apache/druid:34.0.0 (all druid containers)
- ghcr.io/ayubun/snowflake-id-worker:0
Impact
Non-blocking: The containers run successfully under emulation, but there may be performance implications.
Recommendation for Improvement
Consider providing ARM64 native images or documenting the expected platform in the README.
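Until native ARM64 images exist, the emulation can at least be made explicit by pinning the platform per service in docker-compose.yaml, which typically also suppresses the warning; a sketch using one of the affected services:

```yaml
druid-broker:
  image: apache/druid:34.0.0
  # Explicitly request the amd64 image so Docker runs it under emulation
  # rather than warning about a platform mismatch
  platform: linux/amd64
```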
Summary of Changes Required
Files Modified
- docker-compose.yaml
  - Line 261: Changed Postgres volume mount
  - Lines 144-156: Added health check conditions for osprey-ui-api dependencies
- druid/specs/execution_results.json
  - Line 8: Changed offset reset from "latest" to "earliest"
Files Restored (from git)
- example_rules/main.sml
- example_plugins/src/register_plugins.py
- example_rules/config/labels.yaml
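These can be restored with a plain checkout of the affected paths; a sketch (git restore would work equally well):

```bash
# Discard local experiments and restore the shipped example files
git checkout -- example_rules/main.sml \
  example_plugins/src/register_plugins.py \
  example_rules/config/labels.yaml
```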
Time to Resolution
| Issue | Discovery | Resolution | Time Spent |
|---|---|---|---|
| Postgres 18 mount | Immediate | 5 mins | 5 mins |
| UI API crashes | 2 mins | 3 mins | 5 mins |
| Wrong rules loaded | 10 mins | 5 mins | 15 mins |
| Druid schema | 5 mins | 10 mins | 15 mins |
| UI features | 2 mins | 1 min | 3 mins |
| Time range | 5 mins | 2 mins | 7 mins |
| Total | | | ~50 mins |
Positive Notes
What Went Well
- Dependencies installed cleanly - uv sync worked perfectly on first try
- Pre-commit hooks - Installed without issues
- Test data generator - Started and worked immediately once rules were fixed
- Documentation - README.md was clear and accurate for the basic setup
- Error messages - Most errors (especially Postgres) had clear, actionable error messages
Infrastructure Highlights
- Docker Compose setup is well-structured
- Health checks are properly configured (once we used them)
- Druid ingestion works reliably with schema discovery
- Worker processes events correctly once rules are loaded
Recommendations
For New Users
- Always run git status before assuming the repository is clean
- Check docker logs (docker compose logs <service>) when services fail
- Allow 30-60 seconds after restarting services for Druid to ingest new schema
- Use "now" or recent timestamps for query end times, not historical dates
For the Project
- Add a quickstart troubleshooting section to README covering:
  - Postgres volume issues on upgrade
  - How to reset Druid schema
  - Time range configuration in UI
- Improve onboarding UX:
  - Pre-select common fields in Event Stream
  - Default time ranges to "last hour" instead of fixed dates
  - Add inline help for "No data for selected features"
- Add a setup verification script (see the sketch after this list) that:
  - Checks if services are healthy
  - Verifies data is flowing through the pipeline
  - Confirms Druid has ingested recent data
- Document platform compatibility for ARM64/M-series Macs
- Consider adding .dockerignore patterns to prevent local rule changes from being mounted into containers, or add a warning in the README about uncommitted changes
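A minimal sketch of what such a verification script might look like, assuming the compose service names used in this log and a Druid SQL endpoint on localhost:8888 (the datasource name is an assumption):

```bash
#!/usr/bin/env bash
set -euo pipefail

# 1. Are all services up (and healthy, where healthchecks are defined)?
docker compose ps

# 2. Is data flowing? The worker logs should show events being processed.
docker compose logs --tail=20 osprey-worker

# 3. Has Druid ingested anything in the last 30 minutes? (datasource name assumed)
curl -s -X POST http://localhost:8888/druid/v2/sql \
  -H 'Content-Type: application/json' \
  -d '{"query": "SELECT COUNT(*) AS recent_rows FROM osprey_execution_results WHERE __time > TIMESTAMPADD(MINUTE, -30, CURRENT_TIMESTAMP)"}'
```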