Production-grade PostgreSQL operational playbooks from 20+ years of SLA-critical database operations.
Database teams under SLAs need repeatable, safe procedures to keep PostgreSQL highly available, recoverable, and performant. Ad-hoc fixes and tribal knowledge increase downtime and risk. These runbooks give you battle-tested, step-by-step guidance with built-in safety checks and rollback plans.
- DBAs and support engineers running PostgreSQL in production
- SRE/Platform engineers handling HA, failover, and zero-downtime changes
- Teams building a customer-facing knowledge base or on-call playbooks
- High Availability: Patroni cluster builds, failover/switchover, replication troubleshooting, split-brain recovery
- Backup & Recovery: pgBackRest, WAL-G, PITR, verification automation, DR drills
- Performance: Query optimization workflow, index strategy, autovacuum tuning, pooling patterns, slow query deep dives
- Incidents: Outage playbook, disk full, replication break, lock contention, RCA template
- Migrations: Zero-downtime version upgrades, logical cutovers, tablespace moves
- Monitoring & KB: PMM setup, critical metrics, alert tuning, common errors, Linux troubleshooting toolkit
- Scripts: Health checks, diagnostics, automation helpers
- Case Studies: Real-world scenarios with metrics (5 TB migration, Patroni failover, PITR with a 5-minute RPO, a 90% query-tuning win)
- Clone: `git clone https://github.com/yourusername/postgresql-dba-runbooks`
- Browse docs: start with `docs/operating-philosophy.md` and `docs/runbook-template.md` for structure and safety expectations.
- Pick a category: e.g., HA → `docs/high-availability/failover-procedures.md`.
- Practice: run non-destructive validation steps in staging before production.
- Adapt: update parameters (hosts, ports, paths) to your environment, and keep verification + rollback steps intact.
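
One way to handle the adaptation step is to keep each command as a template and fill in environment-specific values from a single place. The sketch below is illustrative only: the template, the placeholder names (`host`, `port`, `dbname`), and the staging hostname are assumptions, not part of the runbooks. It relies on `string.Template.substitute`, which raises `KeyError` when a placeholder is missing, so a half-configured command is never produced.

```python
from string import Template

# Hypothetical command template; the placeholder names are illustrative only.
CHECK_REPLICATION = Template(
    'psql -h $host -p $port -d $dbname -c "SELECT * FROM pg_stat_replication;"'
)


def render(template, params):
    """Fill every placeholder, or raise KeyError if one is missing."""
    return template.substitute(params)


# Assumed staging values; replace with your own hosts, ports, and paths.
staging = {
    "host": "pg-staging-1.example.internal",
    "port": "5432",
    "dbname": "postgres",
}

print(render(CHECK_REPLICATION, staging))
```

Keeping the parameters in one mapping per environment makes it easier to review what differs between staging and production before a command is run.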
- Always read prerequisites and safety checks first.
- Pause if any verification step fails; do not continue until resolved.
- Keep rollback instructions ready before executing changes.
- Record MTTR/RPO/RTO achieved and feed back into tuning.
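
The rules above can be sketched as a small step runner. This is a minimal illustration under stated assumptions, not the repository's tooling: `Step` and `run_runbook` are hypothetical names, and the actions here mutate an in-memory dictionary rather than a real database. Each step must declare a rollback up front, the runner halts at the first failed verification without continuing, and the elapsed time is returned as a rough input for recovery-time records.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    # All fields are required, so a step cannot be defined without a rollback.
    name: str
    action: Callable[[], None]    # the change itself
    verify: Callable[[], bool]    # must return True before the next step runs
    rollback: Callable[[], None]  # prepared before the action is executed


def run_runbook(steps: List[Step]) -> dict:
    """Run steps in order and halt at the first failed verification."""
    started = time.monotonic()
    applied: List[str] = []
    for step in steps:
        step.action()
        applied.append(step.name)
        if not step.verify():
            # Pause here: later steps are not run, and the rollback
            # decision is left to the operator.
            return {"ok": False, "halted_at": step.name, "applied": applied,
                    "elapsed_s": time.monotonic() - started}
    return {"ok": True, "halted_at": None, "applied": applied,
            "elapsed_s": time.monotonic() - started}


# Usage with in-memory stand-ins for real changes (illustrative only).
state = {"max_connections": 100}
steps = [
    Step("raise max_connections",
         action=lambda: state.update(max_connections=200),
         verify=lambda: state["max_connections"] == 200,
         rollback=lambda: state.update(max_connections=100)),
    Step("failing check",
         action=lambda: None,
         verify=lambda: False,
         rollback=lambda: None),
    Step("never reached",
         action=lambda: state.update(reached=True),
         verify=lambda: True,
         rollback=lambda: None),
]
result = run_runbook(steps)
print(result["ok"], result["halted_at"], result["applied"])
```

The runner deliberately does not roll back on its own; it stops and reports which steps were applied, so the operator can decide whether to resolve the failure or run the prepared rollbacks.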
- Add full runbook set per the documented structure
- Add scripts with usage examples and ShellCheck/SQL validation
- Wire CI for markdownlint, shellcheck, and SQL syntax checks
MIT. See LICENSE for details.