Friction Log for setting up Osprey #96

@julietshen

I used Claude Code to run Osprey. Here's a friction log of where I got stuck and what changes I made to get things working:

Osprey Setup Friction Log

This document records all issues encountered while setting up the Osprey project from scratch on December 12, 2025.

Environment

  • OS: macOS (Darwin 24.5.0)
  • Platform: Apple Silicon (ARM64)
  • Python: 3.11
  • Date: December 12, 2025

Issue 1: Postgres 18 Volume Mount Incompatibility

Problem

When running docker compose up -d, the Postgres container immediately crashed with exit code 1.

Root Cause

Postgres 18 changed its data storage structure to use major-version-specific directory names compatible with pg_ctlcluster. The docker-compose.yaml was configured with the old mount point /var/lib/postgresql/data, and there was existing data from a previous Postgres version in the volume.

Error Message

Error: in 18+, these Docker images are configured to store database data in a
       format which is compatible with "pg_ctlcluster" (specifically, using
       major-version-specific directory names).

       Counter to that, there appears to be PostgreSQL data in:
         /var/lib/postgresql/data (unused mount/volume)

Solution

  1. Updated docker-compose.yaml line 261:

    # Before
    - metadata_data:/var/lib/postgresql/data

    # After
    - metadata_data:/var/lib/postgresql

  2. Removed the old volumes with docker compose down -v. Note that this deletes the project's named volumes, so any existing data in them is lost.

Files Modified

  • docker-compose.yaml:261
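
To spot the same problem before starting the stack, the compose file can be scanned for volume entries that still target the pre-18 data path. The following is a minimal sketch, not part of Osprey: the helper name find_old_postgres_mounts is hypothetical, and it assumes the short SOURCE:TARGET volume syntax shown above.

```python
# Hypothetical helper (not part of Osprey): flags short-syntax volume entries
# that still mount the pre-18 Postgres data path.
OLD_TARGET = "/var/lib/postgresql/data"


def find_old_postgres_mounts(compose_text: str) -> list[tuple[int, str]]:
    """Return (line_number, entry) pairs whose volume target is the old path."""
    hits = []
    for number, line in enumerate(compose_text.splitlines(), start=1):
        entry = line.strip()
        if not entry.startswith("- "):
            continue  # only list items can be short-syntax volume entries
        # Short syntax is SOURCE:TARGET[:MODE]; compare the target exactly.
        parts = entry[2:].strip().strip("'\"").split(":")
        if len(parts) >= 2 and parts[1].rstrip("/") == OLD_TARGET:
            hits.append((number, entry))
    return hits
```

Running it over the contents of docker-compose.yaml returns an empty list once the mount has been changed to /var/lib/postgresql.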

Issue 2: osprey-ui-api Container Startup Failures

Problem

The osprey-ui-api container kept crashing on startup with database connection errors.

Root Cause

The UI API service used the short-form depends_on list, which only controls start order and does not wait for a dependency to become healthy. As a result, the service tried to connect before Postgres was up and ready to accept connections.

Error Message

psycopg2.OperationalError: could not translate host name "postgres" to address: Name or service not known

Solution

Updated docker-compose.yaml lines 144-156 to use health check conditions:

# Before
depends_on:
  - osprey-worker
  - druid-broker
  - postgres
  - snowflake-id-worker
  - bigtable
  - bigtable-initializer

# After
depends_on:
  osprey-worker:
    condition: service_started
  druid-broker:
    condition: service_started
  postgres:
    condition: service_healthy
  snowflake-id-worker:
    condition: service_healthy
  bigtable:
    condition: service_healthy
  bigtable-initializer:
    condition: service_completed_successfully

Files Modified

  • docker-compose.yaml:144-156
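
The health check conditions are the fix used here. As a complementary, application-side illustration of the same idea, a client can retry until the database host resolves and accepts TCP connections. This is a sketch under assumptions: wait_for_port is a hypothetical helper, not Osprey code, and an open port does not guarantee that Postgres is ready for queries (a container health check such as pg_isready is the stronger signal).

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Retry a TCP connection until it succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # OSError covers DNS failures (socket.gaierror), refused
            # connections, and connection timeouts alike.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("postgres", 5432) would keep retrying through the "could not translate host name" phase instead of failing on the first attempt; 5432 is the Postgres default port and is assumed here.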

Issue 3: UI "No data for selected features"

Problem

Events were appearing in the Event Stream panel on the right side of the UI, but each event showed "No data for selected features" instead of the actual field values.

Root Cause

UX Issue: The Osprey UI requires users to explicitly select which fields they want to display in the event stream. This is not immediately obvious to new users.

Solution

Click the "Select Summary Features" button (top right of Event Stream panel) and select fields to display:

  • ContainsHello
  • PostText
  • UserId
  • EventType

Recommendation for Improvement

Consider either:

  1. Pre-selecting common fields by default
  2. Adding a tooltip/hint when events show "No data for selected features"
  3. Auto-selecting all available fields on first use

Issue 4: Timeseries Chart "No data available"

Problem

The Timeseries Chart consistently showed "No data available" even though the Event Stream showed matching events.

Root Cause

Time Range Mismatch: The query time range ended at 1:44pm EST, but data with the ContainsHello field only started being ingested at 1:50pm EST (after we fixed some conflicts between local rule experiments and the example data that was restored). The first matching data therefore arrived 6 minutes after the end of the query window, so the window contained no matching data.
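
The arithmetic behind the gap can be written out as a small check. This is only an illustration: gap_before_data is a hypothetical helper, and the timestamps are the two local times quoted above on the setup date.

```python
from datetime import datetime, timedelta


def gap_before_data(query_end: datetime, first_event: datetime) -> timedelta:
    """Return how long after the query window the first matching event arrived.

    Returns zero when the first event falls at or before the end of the window.
    """
    return max(first_event - query_end, timedelta(0))


# Local times taken from the log (both in the same time zone).
query_end = datetime(2025, 12, 12, 13, 44)
first_hello = datetime(2025, 12, 12, 13, 50)
gap = gap_before_data(query_end, first_hello)  # timedelta of 6 minutes
```

Any non-zero result means the selected range ends before the data begins, which is the situation that produced "No data available".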

Additional Factor

Browser Caching: Earlier queries that returned empty results were cached, so even after data became available, the cached empty response was being shown.

Solution

  1. Set the end time in the date range selector to a time after 1:50pm EST, such as the current time
  2. Hard refresh the page (Cmd+Shift+R on Mac, Ctrl+Shift+R on Windows)
  3. Resubmit the query

Verification

Direct Druid query confirmed data existed:

{
  "total": 189,
  "with_hello": 60
}
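
The exact query behind these numbers is not recorded in this log. As a hedged sketch, counts of this shape could be requested through Druid's SQL endpoint (POST /druid/v2/sql). The datasource name and the meaning of with_hello (rows where ContainsHello is not null) are assumptions, and build_count_query is a hypothetical helper.

```python
import json


def build_count_query(datasource: str, field: str = "ContainsHello") -> str:
    """Build a JSON request body for Druid's SQL endpoint.

    Assumption: "with_hello" counts rows where the field is present (not null);
    the query actually used in this log may have differed.
    """
    sql = (
        'SELECT COUNT(*) AS "total", '
        f'COUNT(*) FILTER (WHERE "{field}" IS NOT NULL) AS "with_hello" '
        f'FROM "{datasource}"'
    )
    return json.dumps({"query": sql})


# "execution_results" is a guess based on the ingestion spec filename.
body = build_count_query("execution_results")
```

The resulting body would be sent with a Content-Type of application/json to the broker or router of the local Druid cluster.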

Recommendation for Improvement

  1. Default to "now" for end time instead of a fixed historical timestamp
  2. Add a "Last hour" or "Last 30 minutes" quick select option
  3. Show a more helpful message when no data is found (e.g., "No data in selected time range")

Issue 5: Platform Architecture Warnings

Problem

Multiple Docker containers showed platform mismatch warnings during startup:

The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8)

Affected Images

  • gcr.io/google.com/cloudsdktool/cloud-sdk:latest (bigtable)
  • apache/druid:34.0.0 (all druid containers)
  • ghcr.io/ayubun/snowflake-id-worker:0

Impact

Non-blocking: The containers run successfully under emulation, but there may be performance implications.

Recommendation for Improvement

Consider providing ARM64 native images or documenting the expected platform in the README.


Summary of Changes Required

Files Modified

  1. docker-compose.yaml

    • Line 261: Changed Postgres volume mount
    • Lines 144-156: Added health check conditions for osprey-ui-api dependencies
  2. druid/specs/execution_results.json

    • Line 8: Changed offset reset from "latest" to "earliest"

Files Restored (from git)

  • example_rules/main.sml
  • example_plugins/src/register_plugins.py
  • example_rules/config/labels.yaml

Time to Resolution

| Issue | Discovery | Resolution | Time Spent |
|---|---|---|---|
| Postgres 18 mount | Immediate | 5 mins | 5 mins |
| UI API crashes | 2 mins | 3 mins | 5 mins |
| Wrong rules loaded | 10 mins | 5 mins | 15 mins |
| Druid schema | 5 mins | 10 mins | 15 mins |
| UI features | 2 mins | 1 min | 3 mins |
| Time range | 5 mins | 2 mins | 7 mins |
| Total | | | ~50 mins |

Positive Notes

What Went Well

  1. Dependencies installed cleanly - uv sync worked perfectly on first try
  2. Pre-commit hooks - Installed without issues
  3. Test data generator - Started and worked immediately once rules were fixed
  4. Documentation - README.md was clear and accurate for the basic setup
  5. Error messages - Most errors (especially Postgres) had clear, actionable error messages

Infrastructure Highlights

  • Docker Compose setup is well-structured
  • Health checks are properly configured (once we used them)
  • Druid ingestion works reliably with schema discovery
  • Worker processes events correctly once rules are loaded

Recommendations

For New Users

  1. Always run git status before assuming the repository is clean
  2. Check docker logs (docker compose logs <service>) when services fail
  3. Allow 30-60 seconds after restarting services for Druid to ingest new schema
  4. Use "now" or recent timestamps for query end times, not historical dates

For the Project

  1. Add a quickstart troubleshooting section to README covering:

    • Postgres volume issues on upgrade
    • How to reset Druid schema
    • Time range configuration in UI
  2. Improve onboarding UX:

    • Pre-select common fields in Event Stream
    • Default time ranges to "last hour" instead of fixed dates
    • Add inline help for "No data for selected features"
  3. Add a setup verification script that:

    • Checks if services are healthy
    • Verifies data is flowing through the pipeline
    • Confirms Druid has ingested recent data
  4. Document platform compatibility for ARM64/M-series Macs

  5. Consider adding .dockerignore patterns to prevent local rule changes from being mounted into containers, or add a warning in the README about uncommitted changes
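
The setup verification script suggested in item 3 could begin with a service health check along these lines. This is a rough sketch, not a finished tool: unhealthy_services is a hypothetical helper, and the field names (Service, State, Health, ExitCode) are assumed from the JSON output of docker compose ps --format json, which should be confirmed against the installed Compose version.

```python
import json


def unhealthy_services(ps_output: str) -> list[str]:
    """List services that are not running and healthy, given Compose JSON output."""
    text = ps_output.strip()
    if not text:
        return []
    # Depending on the Compose version, the output is either one JSON array
    # or one JSON object per line; accept both.
    if text.startswith("["):
        rows = json.loads(text)
    else:
        rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    bad = []
    for row in rows:
        state = row.get("State", "")
        health = row.get("Health", "")
        # One-shot containers (such as an initializer) exit with code 0 when done.
        if state == "exited" and row.get("ExitCode") == 0:
            continue
        # An empty Health value means the service defines no health check.
        if state != "running" or health not in ("", "healthy"):
            bad.append(row.get("Service", row.get("Name", "?")))
    return bad
```

The input would come from capturing the command's standard output, for example with subprocess.run. Checks that data is flowing and that Druid has recent data would need further steps not sketched here.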
