warn the user when using sub-optimal PBFs #601

Merged (1 commit) on Dec 5, 2023


@cldellow (Contributor) commented Dec 3, 2023

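Tilemaker's [documentation](https://github.com/systemed/tilemaker/blob/1da4be97dc4ac7f11daf3f417c7ca0a6a34ae47f/docs/VECTOR_TILES.md?plain=1#L25) suggests that users can get PBFs from BBBike: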

> You'll mostly use OpenStreetMap's openly-licensed map data to make your vector tiles. You get OSM data in a .pbf-format dump, rather than pulling it from an API. Several sites, such as [Geofabrik](https://download.geofabrik.de) and [BBBike](https://extract.bbbike.org), provide free country/city extracts of OSM data. (You can also download a dump of the whole planet [directly from OSM](https://planet.osm.org), but this is a massive file which is probably too much for tilemaker to process.)

BBBike uses osmconvert to generate PBFs. Judging by the osmconvert source code [1], osmconvert will pack up to 31 MB of data into each block.

Geofabrik's PBFs are produced with Osmium, which by default packs 8,000 objects into each block, resulting in blocks of roughly 60 KB.

More, smaller blocks are better for Tilemaker:

- fewer blocks mean fewer opportunities to use multiple cores
- larger blocks mean a higher baseline memory requirement
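Both points can be made concrete with a deliberately simplified model. This is an assumption for illustration, not a description of Tilemaker's actual scheduler: if each worker decodes one whole block at a time, at most `min(blocks, cores)` workers can be busy, and each busy worker holds one block in memory. The names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ParallelismEstimate:
    blocks: int              # number of independent units of work
    busy_workers: int        # workers that can run at the same time
    block_buffer_bytes: int  # rough floor on memory held in block buffers


def estimate(file_bytes: int, block_bytes: int, cores: int) -> ParallelismEstimate:
    """Toy model: one block per worker at a time, one block buffer per busy worker."""
    if block_bytes <= 0 or cores <= 0:
        raise ValueError("block_bytes and cores must be positive")
    blocks = (file_bytes + block_bytes - 1) // block_bytes  # integer ceiling
    busy = min(blocks, cores)
    return ParallelismEstimate(blocks, busy, busy * block_bytes)
```

With a hypothetical 310 MB file on 16 cores, 31 MB blocks give only 10 blocks (so 6 cores sit idle) and about 310 MB of block buffers, while 60 KB blocks give 5,167 blocks, keep all 16 cores busy, and need under 1 MB of buffers.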

I emailed BBBike's maintainer. He wants to one day move to Osmium, and is understandably not keen on patching osmconvert in the interim.

He did point out that a user can just run `osmium cat` on a PBF from BBBike to rejig its innards.
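A sketch of scripting that one-off conversion, assuming osmium-tool is installed and on `PATH`. It uses the `osmium cat INPUT -o OUTPUT` form (plus `--overwrite` to replace an existing output) and writes to a separate file; the wrapper functions and file names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def osmium_cat_command(src: Path, dst: Path, overwrite: bool = False) -> List[str]:
    """Build an `osmium cat SRC -o DST` command line that rewrites SRC into DST."""
    cmd = ["osmium", "cat", str(src), "-o", str(dst)]
    if overwrite:
        cmd.append("--overwrite")  # allow osmium to replace an existing DST
    return cmd


def repack(src: Path, dst: Path, overwrite: bool = False) -> None:
    """Run osmium cat; raises if osmium is missing or exits with an error."""
    if shutil.which("osmium") is None:
        raise FileNotFoundError("osmium-tool is not on PATH")
    subprocess.run(osmium_cat_command(src, dst, overwrite), check=True)
```

For example, `repack(Path("bbbike.osm.pbf"), Path("bbbike-repacked.osm.pbf"))`, then point Tilemaker at the repacked file.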

This PR detects when a PBF is suboptimal, warns the user, and provides an explanation of how to fix it.
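A minimal sketch of a detect-and-warn check, assuming the per-block sizes have already been read from the file. The 1,000,000-byte threshold and the warning wording are hypothetical placeholders, not the values or messages this PR adds to Tilemaker.

```python
from typing import Iterable, Optional

# Hypothetical cutoff: well above Osmium's ~60 KB blocks and well below
# osmconvert's blocks of up to 31 MB. Not Tilemaker's actual value.
LARGE_BLOCK_BYTES = 1_000_000


def suboptimal_pbf_warning(path: str, block_sizes: Iterable[int],
                           threshold: int = LARGE_BLOCK_BYTES) -> Optional[str]:
    """Return a warning string if the largest block exceeds `threshold`, else None."""
    largest = max(block_sizes, default=0)
    if largest <= threshold:
        return None
    return (
        f"{path} contains blocks of up to {largest} bytes, which limits how many "
        f"cores can be used and raises memory use. Consider repacking it first:\n"
        f"  osmium cat {path} -o repacked.osm.pbf"
    )
```

Checking the largest block rather than the average keeps a few oversized blocks from being hidden by many small ones.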

For one of my BBBike PBFs, running it through `osmium cat` first cut Tilemaker's processing time from 90 seconds to 28 seconds. (`osmium cat` itself takes only 17 seconds, and in any case only has to be run once.)
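Working only from those figures: even on the first run, converting and then processing takes half the original time, and each later run on the repacked file is about 3.2 times faster.

```latex
\begin{aligned}
\text{first run: } & 17\,\text{s (osmium cat)} + 28\,\text{s (Tilemaker)} = 45\,\text{s} = \tfrac{1}{2} \times 90\,\text{s} \\
\text{later runs: } & \frac{90\,\text{s}}{28\,\text{s}} \approx 3.2\times \text{ faster}
\end{aligned}
```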

[1]: http://m.m.i24.cc/osmconvert.c

@systemed merged commit f409626 into systemed:master on Dec 5, 2023 (5 checks passed).
@systemed (Owner) commented Dec 5, 2023

Good spot! I pretty much exclusively use Geofabrik PBFs, so I had never noticed this.
