Upgrading between major versions? #37
Comments
The "Usage" section of this document describes what |
I guess my concern was not communicated well in the first comment... Let's try again.
It would be fairly useful to have a section in the Readme regarding how to approach this. (And what to avoid?) |
I've been wondering about/looking out for a good way to handle this Perhaps for each "supported" version, we could add a variant that also |
@roosmaa: my comment wasn't so much directed at you as intended as background. |
+1 for major version upgrade instructions. Yet another alternative, for smaller databases (?), should be that the old container runs |
Yeah this is a major pain ... at the moment. Would be great to get some official way to do it. |
Yeah, just nuked my postgres install on accident with an upgrade. Are there any plans for this? |
I am not a docker guru, but I could imagine that it is possible to add some script to the container that is started when the container starts. That script should check the database files, and when it detects the correct version, simply startup the postgresql engine. If not, it should decide to run pgupgrade or dump/reload. Shouldn't be rocket science? |
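A minimal sketch of the version check suggested above, assuming the data directory contains the standard `PG_VERSION` file that PostgreSQL writes at `initdb` time. The function names and the three result words ("init", "start", "migrate") are made up for illustration; nothing like this ships in the official image.

```shell
# Sketch only: decide what a start-up script should do with a data directory.
# Usage: pg_startup_action <data-dir> <target-major-version>
# Prints "init" (no cluster yet), "start" (versions match) or "migrate".

# Print the version recorded in <data-dir>/PG_VERSION, or nothing if absent.
data_dir_version() {
    if [ -f "$1/PG_VERSION" ]; then
        # PG_VERSION holds e.g. "9.6" (before 10) or "15" (10 and later).
        head -n 1 "$1/PG_VERSION"
    fi
}

pg_startup_action() {
    found=$(data_dir_version "$1")
    if [ -z "$found" ]; then
        echo init
    elif [ "$found" = "$2" ]; then
        echo start
    else
        # Versions differ: pg_upgrade or dump/reload would be needed here.
        echo migrate
    fi
}
```

The hard part, as the following comments point out, is what happens in the "migrate" branch, not the detection itself.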
The main problem as I understand it is that we need both the old version
and the new version of the postgres binaries simultaneously in the same
container (otherwise you can't pgdump the old data).
|
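To illustrate why both sets of binaries are needed, here is a rough sketch of the `pg_upgrade` command line. It assumes Debian-style binary paths (`/usr/lib/postgresql/<version>/bin`) and a hypothetical data layout of `<root>/<version>/data`; the helper name `pg_upgrade_cmd` is invented, and it only prints the command rather than running it.

```shell
# Sketch only: print a pg_upgrade invocation for upgrading <old> to <new>.
# Usage: pg_upgrade_cmd <old-version> <new-version> [data-root]
# Assumptions: Debian binary paths, data kept under <data-root>/<version>/data.
pg_upgrade_cmd() {
    old=$1
    new=$2
    root=${3:-/var/lib/postgresql}
    # pg_upgrade is taken from the NEW version, but it must also be told
    # where the OLD version's binaries live.
    printf '%s ' \
        "/usr/lib/postgresql/$new/bin/pg_upgrade" \
        "--old-bindir=/usr/lib/postgresql/$old/bin" \
        "--new-bindir=/usr/lib/postgresql/$new/bin" \
        "--old-datadir=$root/$old/data" \
        "--new-datadir=$root/$new/data"
    # --link hard-links files instead of copying them (faster, same filesystem).
    printf '%s\n' "--link"
}
```

The real command has to run as the postgres user against a freshly initialised new cluster, with both servers stopped.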
+1. I'm just going to nuke what I have since it's just a personal server I don't really use, but this seems like a major pain maintenance wise. |
I've had success upgrading by launching another postgres instance with the new version, and then using pg_dumpall and psql to pipe all the data across:
It's rather simple :) pg_upgrade is supposed to be faster though, I'm interested if someone has an easy way of using it. |
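The dump-and-pipe approach described above could look roughly like this. It is a sketch, not the commenter's exact command: the container names are placeholders, and it assumes both containers are running and that the superuser is called `postgres`.

```shell
# Sketch only: stream every database from the old container into the new one.
# Usage: pg_copy_all <old-container> <new-container>
# Assumptions: both containers are running; superuser is "postgres".
pg_copy_all() {
    old_container=$1
    new_container=$2
    # pg_dumpall writes plain SQL to stdout; psql replays it from stdin.
    # -i keeps stdin open on the receiving "docker exec".
    docker exec "$old_container" pg_dumpall -U postgres \
        | docker exec -i "$new_container" psql -U postgres
}

# Example call (hypothetical container names):
#   pg_copy_all postgres-old postgres-new
```

Nothing is written to an intermediate file, so the new container's volume needs room for the restored data but no extra space is needed for a dump.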
I was attempting to hack up a bash script to manage the upgrade process between two containers of different versions by using
the idea is a bit convoluted, because I'm extracting the binaries from the old version and mounting them into a new temporary container created from the same image of the container with the new version, along with data volumes from existing containers (the old and the new ones).
Any ideas? |
I'm using @Dirbaio's trick to dump all data from the old container and restore it into the new container. Of course, this needs a separate data volume and only runs fast with small data sets. I hope that pg_upgrade can be more 'intelligent'; even better if Postgres itself could do the upgrade. I mean, when installing the new version, the app should also know what the old version's format looks like. Maybe it could also include the necessary old version binary to do the data 'upgrade', then delete that old binary after the upgrade has finished. |
+1 |
I think we should have a complete solution or none at all, as I doubt that most people always use the latest version.
Quoting Jonas Thiem: "Ok so why can't the container be changed to simply have both the latest and the second-to-latest postgresql version installed? (e.g. just put a chroot into the docker container and install the older package from the distribution there or something. Once a script for that has been made that just needs to be passed the name of the respective older apt package, it shouldn't really be a lengthy process.) At least that would cover the majority of users and it would allow the container to convert the database for them automatically." |
Why is none at all any better? I would assume most people running a production server with some basic security consciousness will probably use at least some version of somewhere in the latest release cycle - such major upgrades aren't happening that often, right? (I'm thinking of the usual docker environment here optimized for flexibility and repeatability where upgrading and reverting if it goes wrong is usually less painful than for a regular oldschool server administrator) And if doing it for all versions is not feasible, at least doing it for the respective second-to-newest one should be doable.. If for the other cases where the upgrade can't be done there is a descriptive error message like there is now, I don't see how this wouldn't be at least a considerable improvement - although I agree that being able to upgrade from any arbitrary previous version automatically is of course better if it can be done. |
It binds resources better invested into developing a general solution, and it possibly could stand in the way when coming up with the general solution. -Markus |
Yes, resignation is the right word. We just do not see how it could be solved in a good way. |
If you have time you can do whatever you like -- this is the nature of open source. Whether or not the project leads will accept your PR is a different question and not up to me to decide. |
I'm a strong -1 on including the "second to last version" in every tag
unless the size bloat by doing so is extremely minimal (which is why this
solution hasn't been implemented before now).
|
I spent a little time working on a generic way to run |
I haven't tested this yet but, since it's built on top of Debian, maybe an option would be to add some script to
Considering the current DB is on 9.4 and you want to migrate it to 9.5:
apt-get update
apt-get install -y postgresql-9.5 postgresql-contrib-9.5
pg_dropcluster 9.5 main
pg_upgradecluster 9.4 main -m upgrade -k
This is the basic idea. However, this will add some unnecessary downtime. We could save some time by creating a temporary image:
This is just to simplify the explanation, I'd probably use COPY rather than The container running from this new image would then only execute the remaining steps and would be run only once during the upgrade after stopping the previous 9.4 container and before starting the new 9.5 container. Maybe to reduce the impact on users, one might want to run the 9.4 cluster in a read-only transaction during the upgrade so that it would mostly work and handle the write errors in the application to tell the users the database is being upgraded and that it's currently not possible to write new data to it... If upgrading the cluster would take a while even with the |
@Jonast, The only complete way is to have images for each set of supported versions; to ensure users can upgrade between them. @tianon, has a set of them in tianon/docker-postgres-upgrade. The other major problem is that it requires two folders (volumes) in specific locations to upgrade your database, since The reason mariadb works to "upgrade automatically" is that the mariadb team has to put in extra work to make newer versions able to read older data directories, whereas postgres can make backwards incompatible changes in newer releases and not have to worry about file formats from older versions. This is why postgres data is usually stored under a directory of its version, but that would over complicate volume mounts if every version of postgres had a different volume. @rosenfeld, unfortunately
|
Hi @yosifkit. I don't understand why you think pg_upgradecluster being specific to Debian is a problem since the official PG images are based on Debian. As a matter of fact this is how I upgraded my PG servers from 9.5.4 to 9.6:
docker pull postgresql:9.6 # so we can easily start it after the upgrade is finished
docker run -it --rm --name pg-upgrade -v /mnt/pg/data:/var/lib/postgresql \
-v /mnt/pg/config:/etc/postgresql --entrypoint bash postgres:9.5.4
# in the bash session in the container I installed 9.6:
apt-get update && apt-get install -y postgresql-9.6 postgresql-contrib-9.6
# Then, in another session, I stopped the server running current PG in order to
# start the upgrade (it's always a good idea to perform a back-up of the data
# before upgrading anyway).
pg_upgradecluster -m upgrade --link 9.5 main
# optionally: pg_dropcluster 9.5 main
After the upgrade is finished just start the new 9.6 container with the same arguments. Here's how I run the container:
docker run --name pg -v /mnt/pg/scripts:/pg-scripts \
-v /mnt/pg/data:/var/lib/postgresql \
-v /mnt/pg/config:/etc/postgresql -p 5432:5432 \
postgresql:9.6 /pg-scripts/start-pg 9.6 10.0.1.0/24
I stop it with Here's how start-pg looks like:
#!/bin/bash
version=$1
net=$2
setup_db(){
  pg_createcluster $version main -o listen_addresses='*' -o wal_level=hot_standby \
    -o max_wal_senders=3 -o hot_standby=on -- -A trust
  pghba=/etc/postgresql/$version/main/pg_hba.conf
  echo -e "host\tall\tpguser\t$net\ttrust" >> $pghba
  echo -e "host\treplication\tpguser\t$net\ttrust" >> $pghba
  pg_ctlcluster $version main start
  psql -U postgres -c '\du' postgres|grep -q pguser || \
    createuser -U postgres -l -s pguser
  pg_ctlcluster $version main stop
}
[ -d /var/lib/postgresql/$version/main ] || setup_db
exec pg_ctlcluster --foreground $version main start
The server is actually managed by some systemd unit files, but this is basically how it works behind the scene. |
What about a solution where the new container is using an image with an older postgresql version to launch the older version, then use the regular flow to perform the update? It might be tricky to get it right, but at least the new postgresql image does not need to contain any previous postgresql version. |
I love the simplicity of @Dirbaio's approach. You wouldn't have to know what two versions you're migrating between for it to work. If you're worried about the speed of that approach, it sounds like the Upgrade via Replication approach described in the docs would be the best option, over pg_upgrade, in terms of actual downtime. |
@dschilling yes, if you can afford the time to set up slony (I started to read its documentation once but found it very complicated) or if you can't afford any downtime window, that's probably the best plan. Since I'm not comfortable with setting up Slony I preferred to use the method I explained above because the downtime is much smaller when you use pg_upgrade with |
Instead of Slony, you could use pglogical: I plan to try this approach, using a HAProxy to switch from the old to the new version. Pglogical should be easier to use than Slony, but I still need to validate. |
Yes, so you use If you do need some protection because you could "by mistake" change from
In most software (at least using semver) it's expected that major version is "breaking" (changes in API, data migration) so if someone explicitly changes from one major to another major then it should be assumed that user also expect some sort of migration... If you upgrade your OS or any app do you expect it to NOT perform data migration? |
To me, that sounds pretty sensible if the official PostgreSQL containers gain the upgrade ability. 😄
Typos. It sucks when they're in a critical thing like this could potentially be, and not noticed until too late.
Well, they probably shouldn't be using a container literally called |
Still, this feels like "deploy on Friday". If it's critical infrastructure you do QA and a dry run (basically, you prepare) and don't fiddle with the db in production. Besides, this all assumes that every postgres major upgrade breaks data/compatibility, which is not the case either... |
Yeah, that's fair. 😄 |
I would argue that if I set a version tag, there will never be any upgrade. Therefore, upgrading between major versions MUST be intentional because I force it by changing the major version tag. In this case, the upgrade should be handled automatically in some way. The same applies to the latest tag. Adding an environment variable to disable automatic upgrades and perform them manually wouldn't hurt anybody. Also, when an automatic upgrade process starts, it should dump the old database just in case, providing an easy way back. This could also be automated when downgrading back to the old major version tag. In my opinion, this is a solution that shouldn't hurt anybody but makes life much easier. The current way in a containerized world is just painful. |
And that's why I'm not posting comments on that project. The expectations for pgautoupgrade are very different from what I'd expect from a "docker-library/postgres" container. |
That would need to be thought through pretty carefully. For larger sized databases that could delay the upgrade process significantly, and could potentially cause other (serious) issues like running the filesystem out of space (etc). The pgautoupgrade container uses |
Postgres isn't most software. And just because I update a version of Word, I still expect to be able to read and write Word documents with the older version. Those storage formats are stable. There is a big difference between upgrading programs and data formats on disk. Changing data on disk (when upgrading) is surprising to most users. Hiding the upgrade behind an env flag is the most unsurprising thing that still makes the behavior available to those who want it. |
Pondering it a bit more, I think there is a consensus that it would be great if the original pg containers had some sort of automatic upgrade functionality. There also seems to be a consensus that there could/should be a flag/variable that controls the process. Right now the discussion seems to be about the preferred default (upgrade by default or warn/stop by default). IMHO, while I would prefer and argue for auto upgrade by default, if there is a flag I could set up my deployments with the flag set to "true", so I would get those upgrades "hassle" free, so that would also work for me (still, less than ideal ;) ). |
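As a sketch of the opt-in flag being debated, the decision could be as small as the function below. `PG_AUTO_UPGRADE` is a made-up variable name used purely for illustration, and the default shown here is the conservative one (refuse unless the flag is set to "true"); flipping the default is a one-word change.

```shell
# Sketch only: choose between starting, upgrading, or refusing to start.
# Usage: upgrade_policy <data-dir-version> <image-version>
# PG_AUTO_UPGRADE is a hypothetical environment variable, not an existing one.
upgrade_policy() {
    found=$1
    target=$2
    if [ "$found" = "$target" ]; then
        echo start
    elif [ "${PG_AUTO_UPGRADE:-false}" = "true" ]; then
        # The user opted in, so changing the on-disk data is expected.
        echo upgrade
    else
        # Keep the current behaviour: stop with an error, touch nothing.
        echo refuse
    fi
}
```

With the flag unset, `upgrade_policy 14 15` prints `refuse`; with `PG_AUTO_UPGRADE=true` it prints `upgrade`.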
That's (obviously) what I'd argue for. It's an extra flag for those who know what will happen (eg I default to being conservative when it comes to changing data on storage volumes. Especially when you have potentially non-downgradeable changes. (Upgraded binaries and ephemeral data are entirely different) |
On that note, I'm not sure that downgrades aren't possible. Haven't tried it out personally (yet), but I wouldn't be surprised if |
Ugh.
That was when attempting to go from 15 to 14. Same happens with 15 to 10. Seems like an unnecessary restriction (to me). 😦 |
New home for the "pgautoupgrade" project on GitHub. Created a "pgautoupgrade" organisation for it: https://github.com/pgautoupgrade/docker-pgautoupgrade If anyone wants to help out in the development, let me know. Happy to add people to the project to help kick it into shape. 😄 Also, some initial discussion (prior to implementation) for how to potentially do an (optional) backup prior to the automatic upgrade: pgautoupgrade/docker-pgautoupgrade#6 To me, the concept there sounds like a reasonable first go. But more people casting their eyes over it first would be good. 😄 |
As has already been pointed out, the issue is with how |
I'm aware, and it would be awesome if Postgres would support upgrading between versions like other databases (enabled conditioned on flag, off by default)… |
I agree completely -- however, we (the maintainers of this repository where we're all discussing this) don't maintain PostgreSQL, but merely package it for use in Docker containers (hence being "stuck" with the limitations of the upstream |
If anyone's interested in trying an "automatic upgrade" image that works for Debian based PostgreSQL containers, we've just added https://hub.docker.com/r/pgautoupgrade/pgautoupgrade These new Debian based images are thanks to the efforts of @andyundso, who's been diligently improving all kinds of stuff in the project. 😄 To use either of these two new Docker images, pick which version of PostgreSQL you want to upgrade to (15 or 16), and change your existing PostgreSQL docker image to match. For example, to upgrade your database to PostgreSQL 15 (and keep on 15):
Whereas to upgrade your database to PostgreSQL 16, and keep on 16, you should use:
To re-iterate, these new images are for people currently using a Debian based PostgreSQL image. If you're instead using an Alpine based PostgreSQL image, use our existing images instead:
|
And do not forget the images from @tianon that have been available on https://github.com/tianon/docker-postgres-upgrade (also on Docker Hub) for quite a while. Very recently I had to implement a generic automated (or, as I call it, script-assisted) postgres upgrade solution as part of a bigger project, and I used these images as the base for what we needed, customizing them to meet our specific requirements. |
Oh, looks like I forgot to mention that the pgautoupgrade images support both x86_64 and ARM64 these days too. The ARM64 support was added (by @andyundso 😀) a while ago, and is known to work on M-series macOS, and likely other ARM64 things. 😄 |
@tianon The "automatic upgrade" approach we've been working on in the pgautoupgrade repo seems to be working decently well. Would the team for the official Postgres repo (ie here) be open to adding some kind of official "automatic upgrade" images? We can create the needed PR's (etc). Note that the license of the pgautoupgrade repo is identical to the one here (on purpose), just to keep things easy for grabbing pieces from if needed. |
There doesn't seem to be a good way to upgrade between major versions of postgres. When sharing the volume with a new container with a newer version of postgres it won't run as the data directory hasn't been upgraded. pg_upgrade, on the other hand, requires (?) the old installation's binary files, so upgrading the data files from the new server container is also difficult.
It would be nice if there was some suggested way of doing this in the readme. Maybe even some meta container which does the upgrading from version to version?