Tools for creating replication feeds from the main OSM database.
These tools are only useful if you run an OSM database like the main central OSM database!
You need a C++17 compliant compiler. GCC 8 and later as well as clang 7 and later are known to work.
You also need the following libraries:
Libosmium (>= 2.15.0)
https://osmcode.org/libosmium
Debian/Ubuntu: libosmium2-dev
Fedora/CentOS: libosmium-devel
boost-program-options (>= 1.55)
https://www.boost.org/doc/libs/1_55_0/doc/html/program_options.html
Debian/Ubuntu: libboost-program-options-dev
Fedora/CentOS: boost-devel
bz2lib
http://www.bzip.org/
Debian/Ubuntu: libbz2-dev
Fedora/CentOS: bzip2-devel
zlib
https://www.zlib.net/
Debian/Ubuntu: zlib1g-dev
Fedora/CentOS: zlib-devel
Expat
https://libexpat.github.io/
Debian/Ubuntu: libexpat1-dev
Fedora/CentOS: expat-devel
cmake
https://cmake.org/
Debian/Ubuntu: cmake
Fedora/CentOS: cmake
yaml-cpp
https://github.com/jbeder/yaml-cpp
Debian/Ubuntu: libyaml-cpp-dev
Fedora/CentOS: yaml-cpp-devel
libpqxx (version 6)
https://github.com/jtv/libpqxx/
Debian/Ubuntu: libpqxx-dev
Fedora/CentOS: libpqxx-devel
Pandoc
(Needed to build documentation, optional)
https://pandoc.org/
Debian/Ubuntu: pandoc
Fedora/CentOS: pandoc
gettext
(envsubst command for tests)
Debian/Ubuntu: gettext-base
Fedora/CentOS: gettext
PostgreSQL database and pg_virtualenv
Debian/Ubuntu: postgresql-common, postgresql-server-dev-all
Fedora/CentOS: postgresql-server, postgresql-server-devel
On Linux systems most of these libraries are available through your package manager, see the list above for the names of the packages. But make sure to check the versions. If the packaged version available is not new enough, you'll have to install from source. Most likely this is the case for Libosmium.
Use CMake to build in the usual way, for instance:
mkdir build
cd build
cmake ..
cmake --build .
Note: Unlike the original osmdbt version,
this fork uses the built-in pgoutput
plugin. There's no need to build and deploy
a custom osm-logical
plugin anymore.
You need a PostgreSQL database with
- config option
wal_level=logical
, - config option
max_replication_slots
set to at least 1, - a user with REPLICATION attribute, and
- a database containing an OSM database where this user has access.
There is an unofficial test/structure.sql
provided in this repository to
set up an OSM database for testing. Do not use it for production, use the
official way of installing an OSM database instead.
(The original for this file is in the openstreetmap-website repository at https://github.com/openstreetmap/openstreetmap-website/raw/master/db/structure.sql)
There are several commands in the build/src
directory. They all can be
called with -h
or --help
to see how they are run. They all need a common
config file, a template is in osmdbt-config.yaml
. This will be found
automatically if it is in the current directory, use -c
or --config
to
set a different path.
There are man pages for all commands in man
and an overview page in
man/osmdbt.md
.
If you have pandoc
installed they will be built when running make
.
To run the tests after build call ctest
.
To create a Debian/Ubuntu package, call debuild -I
.
The Debian package will contain the executables and the man pages.
First set up the configuration file and make sure you can access the database by running:
osmdbt-testdb
Then enable replication:
osmdbt-enable-replication
After that get current log files once every minute (or whatever you want the update interval to be). Use cron or something like it to handle this:
osmdbt-get-log --catchup
To create an OSM change file from the log, call
osmdbt-create-diff
To disable replication, use:
osmdbt-disable-replication
For more details see the individual man pages.
This section describes how everything is supposed to work in detail. The whole process is controlled from one shell script run once per minute. It looks something like this:
#!/bin/sh
set -e
osmdbt-catchup
osmdbt-get-log
# optionally copy log file(s) to other hosts
osmdbt-catchup
osmdbt-create-diff
If there are complete log files left over from a crash, they will be in the
log_dir
directory and named *.log
.
osmdbt-catchup
is called without command line arguments. It finds those
left-over log files and tells the PostgreSQL database the largest of the LSNs
so that the database can "forget" all changes before that.
If there was no crash, no such log files are found and osmdbt-catchup
does
nothing.
Now osmdbt-get-log
is called which creates a log file in the log_dir
named
something like osm-repl-2020-03-18T14:18:49Z-lsn-0-1924DE0.log
. The file is
first created with the suffix .new
, synced to disk, then renamed and the
directory is synced.
If any of these steps fail or if the host crashes, a .new
file might be
left around, which should be flagged for the sysadmin to take care of. The
file can be removed without loosing data, but the circumstances should be
reviewed in case there is some systematic problem.
All files named *.log
in the log_dir
can now be copied (using scp or
rsync or so) to a separate host for safekeeping. These will only be used if
the local host crashes and log files on its disk are lost. In this case
manual intervention is necessary.
Now osmdbt-catchup
is called to catch up the database to the log file just
created in step 2.
If the system crashes in step 2, 3, or 4 a log file might be left around without the database being updated. In this case step 1 of the next cycle will pick this up and do the database update.
Now osmdbt-create-diff
is called which reads any log files in the log_dir
and creates replication diff files. Files are first created in the tmp_dir
directory and then moved into place in the changes_dir
and its
subdirectories. osmdbt-create-diff
will also read the state.txt
in the
changes_dir
file and create a new one. See the manual page for
osmdbt-create-diff
for the details on how this is done exactly.
- The programs
osmdbt-get-log
,osmdbt-fake-log
, andosmdbt-catchup
use the same PID/lock filerun_dir/osmdbt-log
making sure that only one of them is running. - The program
osmdbt-create-diff
uses a different PID/lock file, it can run in parallel to the other programs, but only one copy of it will run. osmdbt-create-diff
can handle any number of log files, so if it is not run for a while it will recover by reading all log files it finds and creating one replication diff file with the (sorted) data from all of them.- All programs will write files under different names or in separate directories, sync the files and only then move them into place atomically. After that directories are synced.
- After diff and state files are created in the
tmp_dir
, before they are moved to thechanges_dir
, a fileosmdbt-create-diff.lock
is created intmp_dir
. This is removed afterosmdbt-create-diff
finished moving all state and diff files to their final place and appending the.done
suffix to the log file(s). If this is left lying around,osmdbt-create-diff
will not run any more and the sysadmin needs to intervene.
When run in production you should regularly
- remove old log files marked as done (files in
log_dir
named*.log.done
) - remove log file copies you might have made on a separate host
To make sure everything runs smoothly, the age of the PID files can be checked
(should never be more than a few seconds) and the existence of older (more
than a minute or so) log files named *.log.new
.
Copyright (C) 2021-2022 Jochen Topf (jochen@topf.org)
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.
This program was written and is maintained by Jochen Topf (jochen@topf.org).