Generate Airlines Demo DB (Big Data) and Build a Flight Timetable to make a benchmark for PostgreSQL vs Doublets #11

@konard

Description

Summary

Set up the PostgresPro Airlines demo database, generate a large (“big data”) instance with the official generator, and implement flight timetable queries (departures/arrivals by airport/date/route). Use these queries to establish a repeatable benchmark and compare PostgreSQL 18 performance and ergonomics with a Doublets implementation on equivalent operations. The environment must be containerized, with two modes: default (durable) and embedded-like (WAL-light).

Scope

  • PostgreSQL side (Docker): data generation, schema inspection, timetable queries (with validity checks), basic indexes/constraints, timing runs in both modes.
  • Doublets side: functionally equivalent data model and queries (or pipelines), plus timing runs.
  • Deliver reproducible scripts, Docker assets, and result artifacts (logs/CSVs).

Deliverables

  • [docker]/docker-compose.yml pinned to postgres:18.
  • [docker]/compose.embedded.yml (override) for embedded-like mode (WAL-light).
  • [docker]/init/99_unlogged.sql (optional) to switch data tables to UNLOGGED in embedded-like runs.
  • [docs]/HOWTO.md — runbook for Docker, generation parameters, and how to run both benchmark modes.
  • [sql]/10_timetable_queries.sql — queries below.
  • [bench]/pg/run.sh — timings (EXPLAIN (ANALYZE, BUFFERS) + wall-clock), CSV output.
  • [bench]/doublets/run.* — equivalent Doublets operations, same CSV schema.
  • [bench]/schema-mapping.md — mapping Airlines entities/fields to Doublets structures.
  • README update with a “Benchmark: Timetable” section linking to the above.

Prerequisites (PostgreSQL)

  • PostgreSQL 18 (pinned) via Docker image postgres:18.
  • Disk space per chosen dataset size (3m/6m/1y/2y).
  • Extensions typically used by the generator: btree_gist, cube, earthdistance, dblink.

Containerized Environment

Durable (default) compose

# [docker]/docker-compose.yml
version: "3.9"
services:
  pg:
    image: postgres:18
    container_name: airlines-pg18
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: postgres
    ports: ["5432:5432"]
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./init:/docker-entrypoint-initdb.d
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 20
volumes:
  pgdata:

Embedded-like (WAL-light) override

This mode trades durability for speed to better match embedded DB behavior.

# [docker]/compose.embedded.yml
services:
  pg:
    command:
      - "postgres"
      - "-c"
      - "fsync=off"
      - "-c"
      - "synchronous_commit=off"
      - "-c"
      - "full_page_writes=off"
      - "-c"
      - "wal_level=minimal"
      - "-c"
      - "max_wal_senders=0"   # wal_level=minimal requires max_wal_senders=0

Optional (strongly recommended for WAL-free data writes): switch large tables to UNLOGGED in embedded mode.

-- [docker]/init/99_unlogged.sql (only applied when you want embedded-like runs)
DO $$
DECLARE r record;
BEGIN
  FOR r IN
    SELECT format('%I.%I', n.nspname, c.relname) AS fqname
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE n.nspname = 'bookings' AND c.relkind = 'r'
  LOOP
    EXECUTE 'ALTER TABLE '|| r.fqname || ' SET UNLOGGED';
  END LOOP;
END$$;

Revert to durable: replace SET UNLOGGED with SET LOGGED.

Run:

# Durable
docker compose -f docker/docker-compose.yml up -d

# Embedded-like (WAL-light)
docker compose -f docker/docker-compose.yml -f docker/compose.embedded.yml up -d

Setup Options

Option A — Load a ready snapshot

gunzip -c demo-YYYYMMDD-1y.sql.gz \
| docker compose -f docker/docker-compose.yml exec -T pg psql -U postgres

This recreates the demo database inside the container.

Option B — Generate your own “big data” (recommended)

From psql in the container (any DB, not demo yet):

-- creates the demo database, schemas, and extensions
\i install

-- Example: generate 1 year with 4 workers
CALL generate(now(), now() + interval '1 year', 4);

-- Progress
SELECT busy();

-- Post-checks
\i check.sql

What to implement (PostgreSQL)

A. Timetable queries (using the provided view)

bookings.timetable already handles airport-local times and the temporal validity join.

-- Departures from SVO on 2025-10-07
SELECT *
FROM bookings.timetable
WHERE departure_airport = 'SVO'
  AND (scheduled_departure AT TIME ZONE 'UTC')::date = DATE '2025-10-07'
ORDER BY scheduled_departure;

-- Arrivals to SVO on 2025-10-07
SELECT *
FROM bookings.timetable
WHERE arrival_airport = 'SVO'
  AND (scheduled_arrival AT TIME ZONE 'UTC')::date = DATE '2025-10-07'
ORDER BY scheduled_arrival;

-- Next available flight after model-time “now” for a route
SELECT *
FROM bookings.timetable t
WHERE t.departure_airport = 'SVX'
  AND t.arrival_airport   = 'WUH'
  AND t.scheduled_departure > bookings.now()
ORDER BY t.scheduled_departure
LIMIT 1;

B. Manual timetable (without the view, explicit validity check)

SELECT f.flight_id,
       r.route_no,
       r.departure_airport,
       r.arrival_airport,
       f.status,
       f.scheduled_departure,
       f.scheduled_arrival
FROM   bookings.flights  AS f
JOIN   bookings.routes   AS r
  ON   r.route_no = f.route_no
 AND   r.validity @> f.scheduled_departure  -- validity at composition time
WHERE  r.departure_airport = 'SVO'
  AND  (f.scheduled_departure AT TIME ZONE 'UTC')::date = DATE '2025-10-07'  -- same UTC bucketing as the view queries
ORDER BY f.scheduled_departure;

Benchmark plan

Metrics

  • Wall-clock time (median over N runs).
  • EXPLAIN (ANALYZE, BUFFERS) for PostgreSQL queries (saved to logs).
  • Result row counts and checksum (simple hash) for equivalence.
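
The equivalence checksum can be computed identically on both sides. A minimal sketch in Python (the function name and `|`-joined row rendering are illustrative choices, not part of any deliverable; both systems must stringify values the same way, e.g. ISO-8601 timestamps, for checksums to match):

```python
import hashlib

def rows_checksum(rows):
    """Order-insensitive checksum over result rows.

    `rows` is an iterable of tuples/lists as returned by a DB driver.
    Each row is hashed individually, the per-row digests are sorted,
    and the sorted digests are hashed again, so row order is irrelevant.
    """
    digests = sorted(
        hashlib.sha256("|".join(map(str, row)).encode()).hexdigest()
        for row in rows
    )
    return hashlib.sha256("".join(digests).encode()).hexdigest()

# Same logical result set in a different order yields the same checksum
a = [(1, "SVO", "2025-10-07"), (2, "LED", "2025-10-07")]
b = [(2, "LED", "2025-10-07"), (1, "SVO", "2025-10-07")]
assert rows_checksum(a) == rows_checksum(b)
```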

Datasets

  • Two scales (e.g., 6 months and 1 year).

Procedure

  1. Warm-up: run each query once (discard).
  2. Execute each query 10 times, recording wall-clock milliseconds.
  3. Store results in bench/results/*.csv with columns:
    system,durability_mode,dataset,query_id,run,rows,ms.
  4. Compute summary stats (min/median/p95) and check consistency.
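
The summary step can be sketched with the Python standard library; the grouping key follows the CSV schema above (the function name is illustrative):

```python
import csv
import statistics
from collections import defaultdict

def summarize(csv_path):
    """Aggregate bench CSVs (columns: system,durability_mode,dataset,
    query_id,run,rows,ms) into min/median/p95 per query."""
    groups = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            key = (row["system"], row["durability_mode"],
                   row["dataset"], row["query_id"])
            groups[key].append(float(row["ms"]))
    summary = {}
    for key, ms in groups.items():
        ms.sort()
        summary[key] = {
            "min": ms[0],
            "median": statistics.median(ms),
            # 95th percentile: last of 19 cut points (inclusive method)
            "p95": statistics.quantiles(ms, n=20, method="inclusive")[-1],
        }
    return summary
```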

Durability modes

  • durable: default compose (baseline, production-like).
  • embedded: compose override + UNLOGGED tables to eliminate WAL where possible.
    • Caveats: not crash-safe; UNLOGGED tables are truncated after a crash or unclean shutdown; changes to them are not replicated.

Doublets parity

  • Model the same entities/time attributes.
  • Return the same logical result sets as the SQL above.
  • Export CSV with the same columns (include durability_mode = embedded).
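
Both runners can share one timing loop. A hedged Python sketch, assuming `run_query` is a placeholder callable that executes one query on either system and returns its result rows (`bench/pg/run.sh` and `bench/doublets/run.*` may implement the same loop natively; only the CSV columns must match):

```python
import csv
import time

FIELDS = ["system", "durability_mode", "dataset", "query_id", "run", "rows", "ms"]

def bench(writer, system, mode, dataset, query_id, run_query, runs=10):
    """Time `run_query` and append one CSV row per measured run.

    `writer` is a csv.DictWriter opened on the shared FIELDS schema.
    The first execution is a warm-up and is discarded, per the procedure.
    """
    run_query()  # warm-up, discarded
    for run in range(1, runs + 1):
        t0 = time.perf_counter()
        rows = run_query()
        ms = (time.perf_counter() - t0) * 1000.0
        writer.writerow({"system": system, "durability_mode": mode,
                         "dataset": dataset, "query_id": query_id,
                         "run": run, "rows": len(rows), "ms": round(ms, 3)})
```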

Validation

  • SELECT bookings.now(); returns a timestamp (model time).
  • Sanity: row counts by status, min/max scheduled times, airports present.
  • Validity sanity: joining routes to flights without the view must use r.validity @> f.scheduled_departure.
  • Embedded mode check: verify relpersistence='u' for bookings tables:
    SELECT relname, relpersistence
    FROM pg_class
    WHERE relnamespace = 'bookings'::regnamespace AND relkind='r';

Acceptance Criteria

  • Dockerized environment with PostgreSQL 18 (postgres:18) and reproducible startup.
  • Big demo DB available (document exact method & parameters).
  • 10_timetable_queries.sql runnable on the generated dataset.
  • Manual queries that bypass the view include the validity check.
  • Benchmarks run in both durability modes and produce CSVs at two scales.
  • README updated with a short how-to and links to scripts & results.

Nice to have (optional)

  • Timing plots from CSVs.
  • Notes on indexes that materially change plans.
