From 6761156025cfd0894101e770918c4a72042bb9e4 Mon Sep 17 00:00:00 2001
From: Mike Lonergan
Date: Sun, 5 May 2019 15:43:40 -0700
Subject: [PATCH] Remove the 2018 implementation docs from their old location

---
 ...create-backup-for-new-database-creation.md | 52 --------------
 ...ebuild-the-centralized-database-service.md | 68 -------------------
 2 files changed, 120 deletions(-)
 delete mode 100644 docs/HOWTO-create-backup-for-new-database-creation.md
 delete mode 100644 docs/HOWTO-rebuild-the-centralized-database-service.md

diff --git a/docs/HOWTO-create-backup-for-new-database-creation.md b/docs/HOWTO-create-backup-for-new-database-creation.md
deleted file mode 100644
index fe8df5a..0000000
--- a/docs/HOWTO-create-backup-for-new-database-creation.md
+++ /dev/null
@@ -1,52 +0,0 @@
-# HOWTO create a backup for new database creation
-
-When standing up a new database instance, the project team must submit that data in a specified format so that the database automation we support can ingest the initial data load. The currently-supported procedures for initial data load are [here](https://github.com/hackoregon/civic-devops/blob/master/docs/HOWTO-rebuild-the-centralized-database-service.md#restore-databases-from-backup)
-
-## Requirements
-
-These are the current requirements for submitting a database backup to the Hack Oregon DevOps squad that can be successfully used to perform the initial load of a new database instance.
-
-1. Coordinate with the Hack Oregon DevOps squad to finalize the name of the database instance and acquire the username that will own that database instance (e.g. a project named `mars-space-flight` might have two databases named `mars-space-flight-surface-lander` and `mars-space-flight-liftoff`).
-
-2. The current required format for database backups is compressed plain-text, and the current supported version of PostgreSQL at this time (March 2018) is 9.6.6 (latest supported on Amazon Linux 2). Other formats are sometime incompatible with PostgreSQL servers at the same release or newer. Note that database backups are not designed to be restored to older versions of PostgreSQL - e.g. don't take a backup from 10.3 and expect it to restore to 9.6.
-
- A plain-text backup is SQL code with some `psql` commands mixed in. You can read the backup's contents with a text editor - a compressed plain text backup can be read if you simply uncompress it first.
-
-3. A database on the source server should have the same owner name as it will have on the destination server. All the objects in the database should have that owner name too. This means the user / role must exist on the source when the backup is created and on the destination before it is restored.
-
- You can change owners of databases and the objects they contain with SQL script like the following - e.g. source database is `odot_crash_data`, original owner is `znmeb`, new owner is `transportation-systems`:
-
- ```
- ALTER DATABASE odot_crash_data OWNER TO "transportation-systems";
- REASSIGN OWNED BY znmeb TO "transportation-systems";
- ```
-
- Note: the double quotes are required because of the hyphen in the new owner's name.
-
-4. This command will create a compressed plain-text backup:
-
- ```
- pg_dump -Fp -v -C -c --if-exists -d \
- | gzip -c > .sql.gz
- ```
-
- `` is the database. Run this as the database superuser `postgres` on Linux. The parameters:
- * `-Fp`: plain text format
- * `-v`: verbose
- * `-C -c`: create a clean new database. This is done by DROPping the database objects. If they doesn't exist, the DROP will error, so ...
- * `--if-exists`: don't DROP if it doesn't exist. You'll get a `NOTICE` instead of an `ERROR` and the restore will continue!
-
-5. Command to restore the compressed backup:
-
- ```
- gzip -dc .sql.gz | psql -v ON_ERROR_STOP=1
- ```
-
- Run this as the database superuser `postgres`. As noted above, the owners of all the objects on the backup file must exist in the destination server or the restore will fail.
-
-6. Did it work? Usually you'll see `Restore completed` at the end of a successful restore. The `ON_ERROR_STOP=1` option will force `psql` to stop on its first error.
-
-7. The latest release of `data-science-pet-containers` has an option to do restore testing on an Amazon Linux 2 container running PostgreSQL from Amazon Linux Extras. See
-
-## Need Help?
-If you need help generating compatible backups from your development data, please post your questions to the #chat-databases channel on the [Hack Oregon 2018 Slack group](https://hackoregon2018.slack.com).
diff --git a/docs/HOWTO-rebuild-the-centralized-database-service.md b/docs/HOWTO-rebuild-the-centralized-database-service.md
deleted file mode 100644
index 3349011..0000000
--- a/docs/HOWTO-rebuild-the-centralized-database-service.md
+++ /dev/null
@@ -1,68 +0,0 @@
-# HOWTO: Rebuild the Centralized Database service
-
-This set of scripts and procedures is how we built the PostgreSQL database service for the 2018 Hack Oregon projects.
-
-1. Build an EC2 machine
-2. Install and configure PostgreSQL
-3. Create users and databases
-4. Restore databases from backup
-5. Extend the EBS data volume for new databases
-
-## Build an EC2 machine
-
-This procedure relies on the `create-ec2-machine-database.sh` script:
-
-* Prereq: run the script from an environment where AWS keys are available and the keys grant EC2 machine creation privileges
-* Prereq: the [awscli](https://docs.aws.amazon.com/cli/latest/userguide/installing.html) is installed
-* Step 1: edit any of the VARIABLES to match your intended EC2 virtual machine environment, especially the KEYNAME
-* Step 2: run `./create-ec2-machine-database.sh NAME_OF_RESULTING_MACHINE`, where NAME_OF_RESULTING_MACHINE will be used to populate the AWS "Name" tag of the machine
-
-## Install and configure PostgreSQL
-
-NOTE: "development" is meant to signify this was the build for a development, not a production, instance of the database.
-This procedure relies on the `create-database-development.sh` script that runs on the server:
-
-* Prereq: an EC2 machine has already been built
-* Assumption: no database service has been installed or configured on the machine
-* Prereq: the AWS "amazon-linux-extras" repo is available, and the version of PostgreSQL installed is the Amazon Linux package
-* Step 1: edit any of the VARIABLES to match your intended PostgreSQL service and environment
-* Step 2: run `./create-database-development.sh` with no parameters
-* Step 3: fill in the `postgres` database user password when prompted
-
-## Create users and databases
-
-This procedure relies on the `create-db.sh` script that runs on the PostgreSQL server
-
-* Prereq: PostgreSQL has been configured successfully and is running
-* Prereq: run the script from within the machine that hosts the targeted PostgreSQL service
-* Prereq: run the script from a shell where `sudo -u postgres` will silently succeed
-* Step 1: decide upon the name of the database instance, name of database user who will own that instance, and password for that user
-* Step 2: `scp` the script to the server and `ssh` in as `ec2-user` (or equivalent)
-* Step 3: run `./create-db.sh` with no parameters (or alternatively, feed the values from Step 1 in as the correct parameters)
-* Step 4: fill in the database user password when prompted (if running interactively)
-* Result: database user and associated database are created
-
-## Restore databases from backup
-
-This procedure relies on stepwise commands run on the PostgreSQL server to restore database data to an empty database instance.
-
-There are two scenarios: `.backup` file that was generated from `pgdump` native format, or `.sql.gz` file that was generated from `pgdump` in "compressed SQL" format
-
-### Restore from .sql.gz
-
-* Prereq: database instance has been created
-* Prereq: named database user is owner of the target database instance
-* Prereq: backup has been created from instance with the same database user name and database instance name
-* Step 1: `ssh` into the machine hosting the PostgreSQL service as `ec2-user` (or equivalent)
-* Step 2: run wget to download the backup file e.g. `wget https://s3-us-west-2.amazonaws.com/hacko-data-archive/2018-transportation-systems/data/interim/passenger_census.backup`
-* Step 3: run `gzip -dc BACKUP_FILE.sql.gz | sudo -u postgres psql`
-* Step 4: test that the restore succeeded (i.e. the database is not still empty) by running `sudo -u postgres psql -d [DATABASE_INSTANCE] -c 'SELECT COUNT(*) FROM [DATABASE_INSTANCE/TABLE_NAME];'` where [DATABASE_INSTANCE] is the database instance name e.g. `sudo -u postgres psql -d passenger_census -c 'SELECT COUNT(*) FROM passenger_census;'`
-* Note: the test command should result in a non-zero count
-
-# Extend the EBS data volume for new databases
-
-EBS volumes are allocated statically for EC2 machines - that is, if you need to allow your data files to grow to 500GB, then you need to explicitly allocate 500GB to the EBS volume housing those data files.
-
-Since EBS space isn't free, Hack Oregon is working to allocate only as much space as is needed for the existing databases. When a new database instance is added, chances are the EBS volume needs to be "grown" to accommodate the new space requirements.
-
-Detailed instructions on how to [increase the space on the data volume can be found here](https://github.com/hackoregon/civic-devops/blob/master/docs/HOWTO-extend-EBS-Volume-size.md).