# Fly pg_dump to AWS S3

This is a **hacky** way to have a Fly app that dumps Postgres databases, also hosted on Fly, to AWS S3 buckets.
It uses a dedicated app for the *backup worker*, which is woken up to start the dump. When it finishes, it is scaled back to 0, meaning it is not billable when idle*.

*The machine is not billable, but any volumes will be. This could be improved further by deleting volumes after each run. Volumes are required because the temporary disk is small and of unknown size.

## Why this?

Fly's Postgres images do support wal-g backups to S3 via env vars. But I wanted a way to periodically create simple archives with pg_dump, making it easy for developers to replicate databases and have point-in-time recovery.

Since the backup worker runs on Fly, and not in some external service like AWS or GitHub Actions, backups can be created rather quickly. The latency/bandwidth from Fly to AWS is also quite good (in the regions I've tested).

And what about Fly machines? I haven't tried them.
## Requirements

1. A Fly Postgres instance and a user with read permissions.
   Create the `db_backup_worker` user with:
   ```sql
   CREATE USER db_backup_worker WITH PASSWORD '<password>';
   GRANT CONNECT ON DATABASE <db_name> TO db_backup_worker;
   -- For all schemas (example for public):
   GRANT USAGE ON SCHEMA public TO db_backup_worker;
   GRANT SELECT ON ALL TABLES IN SCHEMA public TO db_backup_worker;
   GRANT SELECT ON ALL SEQUENCES IN SCHEMA public TO db_backup_worker;
   ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO db_backup_worker;
   ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON SEQUENCES TO db_backup_worker;
   ```

2. An AWS S3 bucket and an access token with write permissions to it.
   IAM policy:
   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Sid": "WriteDatabaseBackups",
         "Effect": "Allow",
         "Action": [
           "s3:PutObject",
           "s3:AbortMultipartUpload",
           "s3:ListMultipartUploadParts"
         ],
         "Resource": [
           "arn:aws:s3:::your-s3-bucket/backup.tar.gz"
         ]
       }
     ]
   }
   ```

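If you manage AWS with the CLI, the policy above can be generated for your bucket and attached to a dedicated IAM user. A minimal sketch, where the bucket, key, and user names are placeholders to adjust:

```shell
#!/bin/sh
# Sketch: generate the policy above for your bucket and key.
# BUCKET, KEY, and the IAM user name are placeholders -- adjust to your setup.
BUCKET="your-s3-bucket"
KEY="backup.tar.gz"

cat > backup-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "WriteDatabaseBackups",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": ["arn:aws:s3:::${BUCKET}/${KEY}"]
    }
  ]
}
EOF

# Attach it and create the access key (requires the AWS CLI):
#   aws iam create-user --user-name db-backup-worker
#   aws iam put-user-policy --user-name db-backup-worker \
#     --policy-name WriteDatabaseBackups \
#     --policy-document file://backup-policy.json
#   aws iam create-access-key --user-name db-backup-worker
```

The access key from the last command is what goes into `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` below.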
## Installation

1. Launch your database backup worker with `fly launch --image ghcr.io/significa/fly-pg-dump-to-s3`

2. Create a volume for temporary files with `fly volumes create --no-encryption --size $SIZE_IN_GB temp_data`

3. Add the volume to your `fly.toml`:
   ```toml
   [mounts]
   destination = "/tmp/db-backups"
   source = "temp_data"
   ```

4. Set the required Fly secrets (env vars). Example:
   ```env
   AWS_ACCESS_KEY_ID=XXXX
   AWS_SECRET_ACCESS_KEY=XXXX
   DATABASE_URL=postgresql://username:password@my-fly-db-instance.internal:5432/my_database
   S3_DESTINATON=s3://your-s3-bucket/backup.tar.gz
   FLY_API_TOKEN=XXXX
   ```

5. Run `flyctl scale count 1` whenever you want to start a backup. To run it periodically, add this command to any scheduled runner, along with the `FLY_APP` and `FLY_API_TOKEN` env vars.

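For the periodic trigger, a minimal cron-friendly script might look like this; a sketch assuming `flyctl` is installed on the scheduler host (the app name in the crontab example is a placeholder):

```shell
#!/bin/sh
# Sketch: write a trigger script that wakes the backup worker.
# FLY_APP and FLY_API_TOKEN must be present in the scheduler's environment.
cat > trigger-backup.sh <<'EOF'
#!/bin/sh
set -eu
: "${FLY_APP:?set FLY_APP to the backup worker app name}"
: "${FLY_API_TOKEN:?set FLY_API_TOKEN to a Fly API token}"
# Scaling to 1 wakes the worker; it scales itself back to 0 when done.
flyctl scale count 1 --app "$FLY_APP"
EOF
chmod +x trigger-backup.sh

# Example crontab entry (nightly at 03:00; app name is a placeholder):
#   0 3 * * * FLY_APP=my-backup-worker FLY_API_TOKEN=XXXX /path/to/trigger-backup.sh
```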
## What about backup history?

You could add a date to the `S3_DESTINATON` filename (by changing the Docker `CMD`). But I recommend enabling versioning on your S3 bucket and managing retention via lifecycle policies.

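With bucket versioning enabled, retention can then be capped with a lifecycle rule. A sketch of one possible configuration for `aws s3api put-bucket-lifecycle-configuration` (the 30-day window is an arbitrary example):

```json
{
  "Rules": [
    {
      "ID": "ExpireOldBackupVersions",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionExpiration": { "NoncurrentDays": 30 }
    }
  ]
}
```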
## Backup multiple databases in one go?

Just use the env vars like so:

```env
BACKUP_CONFIGURATION_NAMES=ENV1,STAGING_ENVIRONMENT,test

ENV1_DATABASE_URL=postgresql://username:password@env1/my_database
ENV1_S3_DESTINATON=s3://sample-bucket/sample.tar.gz

STAGING_ENVIRONMENT_DATABASE_URL=postgresql://username:password@sample/staging
STAGING_ENVIRONMENT_S3_DESTINATON=s3://sample-db-backups/staging_backup.tar.gz

TEST_DATABASE_URL=postgresql://username:password@sample/test
TEST_S3_DESTINATON=s3://sample-db-backups/test_backup.tar.gz
```

It will back up all the databases to their respective S3 destinations. The AWS and Fly tokens are reused across configurations.

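The prefix convention amounts to something like the following lookup (a simplified sketch, not the actual worker code; values are the sample ones from above):

```shell
#!/bin/sh
# Simplified sketch of the prefix lookup -- not the actual worker code.
# Each name in BACKUP_CONFIGURATION_NAMES is upper-cased and used as a
# prefix for its own DATABASE_URL / S3_DESTINATON pair.
BACKUP_CONFIGURATION_NAMES="ENV1,test"
ENV1_DATABASE_URL="postgresql://username:password@env1/my_database"
ENV1_S3_DESTINATON="s3://sample-bucket/sample.tar.gz"
TEST_DATABASE_URL="postgresql://username:password@sample/test"
TEST_S3_DESTINATON="s3://sample-db-backups/test_backup.tar.gz"

echo "$BACKUP_CONFIGURATION_NAMES" | tr ',' '\n' | while read -r name; do
  prefix=$(echo "$name" | tr '[:lower:]' '[:upper:]')
  eval "db_url=\${${prefix}_DATABASE_URL}"
  eval "s3_dest=\${${prefix}_S3_DESTINATON}"
  echo "backing up ${db_url} -> ${s3_dest}"
done > backup-plan.txt
```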
## Env vars documentation

- `DATABASE_URL`: Postgres database URL. Example: `postgresql://username:password@test:5432/my_database`
- `S3_DESTINATON`: AWS S3 destination for the backup file. Example: `s3://your-s3-bucket/backup.tar.gz`
- `BACKUP_CONFIGURATION_NAMES`: Optional: configuration names/prefixes for `DATABASE_URL` and `S3_DESTINATON`
- `FLY_APP_NAME`: Optional, used to scale down the worker. Automatically set by Fly.
- `FLY_API_TOKEN`: Optional, used to scale down the worker. Fly API token created via flyctl or the UI.
- `BACKUPS_TEMP_DIR`: Optional: where the temp files should go. Defaults to `/tmp/db-backups`
- `PG_DUMP_ARGS`: Optional: override the default `pg_dump` args: `--no-owner --clean --no-privileges --no-sync --jobs=4 --format=directory --compress=0`

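Since the default args produce a directory-format dump, restoring a downloaded backup for local replication could look like this. A sketch, assuming the uploaded archive is a gzipped tarball of `pg_dump`'s directory-format output; bucket, paths, and the connection URL are placeholders (the download is simulated with an empty archive so the sketch is self-contained):

```shell
#!/bin/sh
# Sketch: restoring a backup produced with the default PG_DUMP_ARGS.
# Assumes the uploaded archive is a gzipped tarball of pg_dump's
# directory-format output; bucket and connection URL are placeholders.
set -eu

# 1. Download the archive (requires the AWS CLI):
#      aws s3 cp s3://your-s3-bucket/backup.tar.gz .
# (simulated here with an empty archive so the sketch runs standalone)
mkdir -p dump_dir && touch dump_dir/toc.dat
tar -czf backup.tar.gz dump_dir

# 2. Unpack it:
mkdir -p restore
tar -xzf backup.tar.gz -C restore

# 3. Restore in parallel (directory format supports --jobs):
#      pg_restore --no-owner --clean --jobs=4 \
#        --dbname 'postgresql://user:password@localhost:5432/restored_db' \
#        restore/dump_dir
```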
## Is this hacky? Does it work in production environments?

Yes. Yes :sweat_smile:

## Will this work outside Fly?

Yes: if `FLY_APP_NAME` or `FLY_API_TOKEN` are not present, the fly commands are ignored.