-
Notifications
You must be signed in to change notification settings - Fork 0
Description
Summary
Set up the PostgresPro Airlines demo database, generate a large (“big data”) instance with the official generator, and implement flight timetable queries (departures/arrivals by airport/date/route). Use these queries to establish a repeatable benchmark and compare PostgreSQL 18 performance and ergonomics with a Doublets implementation on equivalent operations. The environment must be containerized, with two modes: default (durable) and embedded-like (WAL-light).
Scope
- PostgreSQL side (Docker): data generation, schema inspection, timetable queries (with validity checks), basic indexes/constraints, timing runs in both modes.
- Doublets side: functionally equivalent data model and queries (or pipelines), plus timing runs.
- Deliver reproducible scripts, Docker assets, and result artifacts (logs/CSVs).
Deliverables
[docker]/docker-compose.ymlpinned topostgres:18.[docker]/compose.embedded.yml(override) for embedded-like mode (WAL-light).[docker]/init/99_unlogged.sql(optional) to switch data tables to UNLOGGED in embedded-like runs.[docs]/HOWTO.md— runbook for Docker, generation parameters, and how to run both benchmark modes.[sql]/10_timetable_queries.sql— queries below.[bench]/pg/run.sh— timings (EXPLAIN (ANALYZE, BUFFERS)+ wall-clock), CSV output.[bench]/doublets/run.*— equivalent Doublets operations, same CSV schema.[bench]/schema-mapping.md— mapping Airlines entities/fields to Doublets structures.- README update with a “Benchmark: Timetable” section linking to the above.
Prerequisites (PostgreSQL)
- PostgreSQL 18 (pinned) via Docker image
postgres:18. - Disk space per chosen dataset size (3m/6m/1y/2y).
- Extensions typically used by the generator:
btree_gist,cube,earthdistance,dblink.
Containerized Environment
Durable (default) compose
# [docker]/docker-compose.yml
version: "3.9"
services:
pg:
image: postgres:18
container_name: airlines-pg18
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: postgres
ports: ["5432:5432"]
volumes:
- pgdata:/var/lib/postgresql/data
- ./init:/docker-entrypoint-initdb.d
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 5s
timeout: 5s
retries: 20
volumes:
pgdata:Embedded-like (WAL-light) override
This mode trades durability for speed to better match embedded DB behavior.
# [docker]/compose.embedded.yml
services:
pg:
command:
- "postgres"
- "-c"; "fsync=off"
- "-c"; "synchronous_commit=off"
- "-c"; "full_page_writes=off"
- "-c"; "wal_level=minimal"Optional (strongly recommended for WAL-free data writes): switch large tables to UNLOGGED in embedded mode.
-- [docker]/init/99_unlogged.sql (only applied when you want embedded-like runs)
DO $$
DECLARE r record;
BEGIN
FOR r IN
SELECT format('%I.%I', n.nspname, c.relname) AS fqname
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'bookings' AND c.relkind = 'r'
LOOP
EXECUTE 'ALTER TABLE '|| r.fqname || ' SET UNLOGGED';
END LOOP;
END$$;Revert to durable: replace SET UNLOGGED with SET LOGGED.
Run:
# Durable
docker compose -f docker/docker-compose.yml up -d
# Embedded-like (WAL-light)
docker compose -f docker/docker-compose.yml -f docker/compose.embedded.yml up -dSetup Options
Option A — Load a ready snapshot
gunzip -c demo-YYYYMMDD-1y.sql.gz \
| docker compose -f docker/docker-compose.yml exec -T pg psql -U postgresRecreates
demoinside the container.
Option B — Generate your own “big data” (recommended)
From psql in the container (any DB, not demo yet):
\i install; -- creates DB demo, schemas, extensions
-- Example: generate 1 year with 4 workers
CALL generate(now(), now() + interval '1 year', 4);
-- Progress
SELECT busy();
-- Post-checks
\i check.sql;What to implement (PostgreSQL)
A. Timetable queries (using the provided view)
bookings.timetablealready handles airport-local times and the temporal validity join.
-- Departures from SVO on 2025-10-07
SELECT *
FROM bookings.timetable
WHERE departure_airport = 'SVO'
AND (scheduled_departure AT TIME ZONE 'UTC')::date = DATE '2025-10-07'
ORDER BY scheduled_departure;
-- Arrivals to SVO on 2025-10-07
SELECT *
FROM bookings.timetable
WHERE arrival_airport = 'SVO'
AND (scheduled_arrival AT TIME ZONE 'UTC')::date = DATE '2025-10-07'
ORDER BY scheduled_arrival;
-- Next available flight after model-time “now” for a route
SELECT *
FROM bookings.timetable t
WHERE t.departure_airport = 'SVX'
AND t.arrival_airport = 'WUH'
AND t.scheduled_departure > bookings.now()
ORDER BY t.scheduled_departure
LIMIT 1;B. Manual timetable (without the view, explicit validity check)
SELECT f.flight_id,
r.route_no,
r.departure_airport,
r.arrival_airport,
f.status,
f.scheduled_departure,
f.scheduled_arrival
FROM bookings.flights AS f
JOIN bookings.routes AS r
ON r.route_no = f.route_no
AND r.validity @> f.scheduled_departure -- validity at composition time
WHERE r.departure_airport = 'SVO'
AND f.scheduled_departure::date = DATE '2025-10-07'
ORDER BY f.scheduled_departure;Benchmark plan
Metrics
- Wall-clock time (median over N runs).
EXPLAIN (ANALYZE, BUFFERS)for PostgreSQL queries (saved to logs).- Result row counts and checksum (simple hash) for equivalence.
Datasets
- Two scales (e.g., 6 months and 1 year).
Procedure
- Warm-up: run each query once (discard).
- Execute each query 10 runs, record wall-clock ms.
- Store results in
bench/results/*.csvwith columns:
system,durability_mode,dataset,query_id,run,rows,ms. - Compute summary stats (
min/median/p95) and check consistency.
Durability modes
durable: default compose (baseline, production-like).embedded: compose override + UNLOGGED tables to eliminate WAL where possible.- Caveats: not crash-safe;
UNLOGGEDtables are truncated after crash/restart; no replication.
- Caveats: not crash-safe;
Doublets parity
- Model the same entities/time attributes.
- Return the same logical result sets as the SQL above.
- Export CSV with the same columns (include
durability_mode=embedded).
Validation
SELECT bookings.now();returns a timestamp (model time).- Sanity: row counts by status, min/max scheduled times, airports present.
- Validity sanity: joining
routestoflightswithout the view must user.validity @> f.scheduled_departure. - Embedded mode check: verify
relpersistence='u'forbookingstables:SELECT relname, relpersistence FROM pg_class WHERE relnamespace = 'bookings'::regnamespace AND relkind='r';
Acceptance Criteria
- Dockerized environment with PostgreSQL 18 (
postgres:18) and reproducible startup. - Big demo DB available (document exact method & parameters).
-
10_timetable_queries.sqlrunnable on the generated dataset. - Manual queries that bypass the view include the validity check.
- Benchmarks run in both durability modes and produce CSVs at two scales.
- README updated with a short how-to and links to scripts & results.
Nice to have (optional)
- Timing plots from CSVs.
- Notes on indexes that materially change plans.
References
- PostgresPro Airlines demo DB (official): https://postgrespro.ru/education/demodb