From 5e0424068b54c56d5782b7fd6b8b3f4578b9b6fc Mon Sep 17 00:00:00 2001
From: Colin Dellow
Date: Sun, 3 Dec 2023 14:46:42 -0500
Subject: [PATCH] warn the user when using sub-optimal PBFs

Tilemaker's documentation suggests that users can get PBFs from BBBike:
https://github.com/systemed/tilemaker/blob/1da4be97dc4ac7f11daf3f417c7ca0a6a34ae47f/docs/VECTOR_TILES.md?plain=1#L25

BBBike uses osmconvert to generate PBFs. Reading through the osmconvert
code [1], osmconvert will pack up to 31 MB of data into each block.

PBFs from Geofabrik use Osmium, which defaults to packing 8,000 objects
into each block, resulting in blocks that are more like ~60 KB in size.

More, smaller blocks are better for Tilemaker:

- fewer blocks mean less opportunity to use multiple cores
- larger blocks mean a higher baseline memory requirement

I emailed BBBike's maintainer. He wants to one day move to Osmium, and
is understandably not keen on patching osmconvert in the interim. He
did point out that a user can just run `osmium cat` on a PBF from BBBike
to rejig its innards.

This PR detects when a PBF is suboptimal, warns the user, and provides
an explanation of how to fix it.

For one of my BBBike PBFs, this results in processing time dropping
from 90 seconds to 28 seconds. (`osmium cat` itself only takes 17
seconds, and in any case, only has to be run a single time.)

[1]: http://m.m.i24.cc/osmconvert.c
---
 src/read_pbf.cpp | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/src/read_pbf.cpp b/src/read_pbf.cpp
index 5b1795cb..5745e9ed 100644
--- a/src/read_pbf.cpp
+++ b/src/read_pbf.cpp
@@ -311,8 +311,12 @@ int PbfReader::ReadPbfFile(unordered_set const &nodeKeys, unsigned int t
 
 	std::map > blocks;
 
+	// Track the filesize - note that we can't rely on tellg(), as
+	// it's meant to be an opaque token useful only for seeking.
+	size_t filesize = 0;
 	while (true) {
 		BlobHeader bh = readHeader(*infile);
+		filesize += bh.datasize();
 		if (infile->eof()) {
 			break;
 		}
@@ -327,6 +331,21 @@ int PbfReader::ReadPbfFile(unordered_set const &nodeKeys, unsigned int t
 
 	std::size_t total_blocks = blocks.size();
 
+	// PBFs generated by Osmium have 8,000 entities per block,
+	// and each block is about 64KB.
+	//
+	// PBFs generated by osmconvert (e.g., BBBike PBFs) have as
+	// many entities as fit in 31MB. Each block is about 16MB.
+	//
+	// Osmium PBFs seem to be processed about 3x faster than osmconvert
+	// PBFs, so try to hint to the user when they could speed up their
+	// pipeline.
+	if (filesize / total_blocks > 1000000) {
+		std::cout << "warning: PBF has very large blocks, which may slow processing" << std::endl;
+		std::cout << " to fix: osmium cat -f pbf your-file.osm.pbf -o optimized.osm.pbf" << std::endl;
+	}
+
+
 	std::vector<ReadPhase> all_phases = { ReadPhase::Nodes, ReadPhase::RelationScan, ReadPhase::Ways, ReadPhase::Relations };
 	for(auto phase: all_phases) {
 		// Launch the pool with threadNum threads
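
The block-size heuristic in the second hunk can be sketched in isolation as follows. This is a simplified illustration, not the code in the patch: the names `averageBlockSize`, `hasLargeBlocks`, and `kLargeBlockThreshold` are hypothetical, the input is assumed to be a plain list of blob sizes in bytes, and the sketch returns 0 for an empty list so that it never divides by zero.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical constant mirroring the 1,000,000-byte cutoff used in the patch.
constexpr std::uint64_t kLargeBlockThreshold = 1000000;

// Mean blob size in bytes. Returns 0 for an empty list, an assumption made
// here so the division is always well defined.
std::uint64_t averageBlockSize(const std::vector<std::uint64_t>& blobSizes) {
	if (blobSizes.empty()) {
		return 0;
	}
	std::uint64_t total = 0;
	for (std::uint64_t size : blobSizes) {
		total += size;
	}
	return total / blobSizes.size();
}

// True when the average blob is larger than the threshold, i.e. when the
// file looks like osmconvert output rather than Osmium output.
bool hasLargeBlocks(const std::vector<std::uint64_t>& blobSizes) {
	return averageBlockSize(blobSizes) > kLargeBlockThreshold;
}
```

With blobs of roughly 64 KB each (the Osmium case described above), `hasLargeBlocks` returns false; with blobs of roughly 16 MB each (the osmconvert case), it returns true.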