A practical PostgreSQL DBA playbook with runbooks, checklists, and scripts for keeping production clusters reliable.

fdaniel-alvarez-dev/postgresql-dba-runbooks

PostgreSQL DBA Runbooks

Production-grade PostgreSQL operational playbooks from 20+ years of SLA-critical database operations.

Problem This Repository Solves

Database teams working under SLAs need repeatable, safe procedures to keep PostgreSQL highly available, recoverable, and performant. Ad-hoc fixes and tribal knowledge increase both downtime and risk. These runbooks provide battle-tested, step-by-step guidance with built-in safety checks and rollback plans.

Who This Is For

  • DBAs and support engineers running PostgreSQL in production
  • SRE/Platform engineers handling HA, failover, and zero-downtime changes
  • Teams building a customer-facing knowledge base or on-call playbooks

What’s Inside (at a glance)

  • High Availability: Patroni cluster builds, failover/switchover, replication troubleshooting, split-brain recovery
  • Backup & Recovery: pgBackRest, WAL-G, PITR, verification automation, DR drills
  • Performance: Query optimization workflow, index strategy, autovacuum tuning, pooling patterns, slow query deep dives
  • Incidents: Outage playbook, disk full, replication break, lock contention, RCA template
  • Migrations: Zero-downtime version upgrades, logical cutovers, tablespace moves
  • Monitoring & KB: PMM setup, critical metrics, alert tuning, common errors, Linux troubleshooting toolkit
  • Scripts: Health checks, diagnostics, automation helpers
  • Case Studies: Real-world scenarios with metrics (5 TB migration, Patroni failover, PITR with a 5-minute RPO, query tuning with a 90% improvement)

Quick Start

  1. Clone: git clone https://github.com/fdaniel-alvarez-dev/postgresql-dba-runbooks
  2. Browse the docs: start with docs/operating-philosophy.md and docs/runbook-template.md to learn the structure and safety expectations.
  3. Pick a category: e.g., HA → docs/high-availability/failover-procedures.md.
  4. Practice: run non-destructive validation steps in staging before production.
  5. Adapt: update parameters (hosts, ports, paths) to your environment, and keep verification + rollback steps intact.
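Step 5 is easy to get wrong when a runbook is copied and edited by hand. As a minimal sketch, the Python check below flags placeholders that were never replaced and verification or rollback sections that went missing during adaptation. It assumes, hypothetically, that placeholders are written like `<HOST>` and that each runbook keeps headings named "Verification" and "Rollback"; neither convention is confirmed by this repository, so adjust both to whatever docs/runbook-template.md actually uses.

```python
import re
from typing import List

# Hypothetical conventions (assumptions, not taken from this repository):
# placeholders look like <HOST> or <PG_PORT>, and every runbook keeps
# "Verification" and "Rollback" headings.
PLACEHOLDER = re.compile(r"<[A-Z][A-Z0-9_]*>")
REQUIRED_HEADINGS = ("Verification", "Rollback")


def check_adapted_runbook(text: str) -> List[str]:
    """Return problems found in an adapted runbook; an empty list means OK."""
    problems: List[str] = []

    # Any placeholder still present means a host, port, or path was not adapted.
    leftovers = sorted(set(PLACEHOLDER.findall(text)))
    if leftovers:
        problems.append("unreplaced placeholders: " + ", ".join(leftovers))

    # Collect markdown heading titles ("## Rollback" -> "Rollback").
    headings = {
        line.lstrip("#").strip()
        for line in text.splitlines()
        if line.startswith("#")
    }
    # Adapting parameters must never drop the safety sections.
    for required in REQUIRED_HEADINGS:
        if required not in headings:
            problems.append(f"missing section: {required}")
    return problems
```

Running it over each adapted file before a staging rehearsal gives a cheap gate: a non-empty result means the runbook is not ready to execute.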

How to Use Safely

  • Always read prerequisites and safety checks first.
  • Pause if any verification step fails; do not continue until resolved.
  • Keep rollback instructions ready before executing changes.
  • Record the MTTR, RPO, and RTO you actually achieved, and feed them back into tuning.
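These rules can also be enforced mechanically. The following is a minimal Python sketch, not a tool shipped in this repository: `Step`, `run_steps`, and `mttr` are hypothetical names chosen for illustration. It refuses to start unless every step has a rollback, stops at the first failed verification instead of continuing, and computes MTTR as the mean of (resolved minus detected) over recorded incidents.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One runbook step (hypothetical structure, not from this repository)."""
    name: str
    action: Callable[[], None]
    verify: Callable[[], bool]
    rollback: Optional[Callable[[], None]] = None


def run_steps(steps: List[Step]) -> Tuple[Optional[str], List[str]]:
    """Execute steps in order and stop at the first failed verification.

    Returns (name of the failed step or None, names of steps already applied)
    so an operator can decide whether to fix forward or run the rollbacks.
    """
    # Rollback must be ready before any change is executed.
    missing = [s.name for s in steps if s.rollback is None]
    if missing:
        raise ValueError(f"steps without rollback: {missing}")

    applied: List[str] = []
    for step in steps:
        step.action()
        applied.append(step.name)
        if not step.verify():
            # Pause: do not continue until the failure is resolved.
            return step.name, applied
    return None, applied


def mttr(incidents: List[Tuple[datetime, datetime]]) -> timedelta:
    """Mean time to recovery over (detected_at, resolved_at) pairs."""
    if not incidents:
        raise ValueError("no incidents recorded")
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)
```

For example, two incidents resolved in 10 and 20 minutes give an MTTR of 15 minutes. Returning the applied steps rather than rolling back automatically is a deliberate choice: the guidance above says to pause until the failure is resolved, so the decision to roll back stays with the operator.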

Roadmap (next commits)

  • Add the full runbook set following the documented structure
  • Add scripts with usage examples and ShellCheck/SQL validation
  • Wire CI for markdownlint, shellcheck, and SQL syntax checks

License

MIT. See LICENSE for details.
