metrics about file size #59
Data size depends on the compression used. The best compression comes from using something like Shuffly (…). Here are some numbers for different compressions, and for raw data, with and without the optional OSM identifier subarchive:
Interestingly, osmflat is much smaller than pbf when using shuffly + zstd (any replacement for zstd would work, since shuffly makes the data more compressible for any dictionary-based algorithm), even though that was not the main goal (performance and random access were). Regarding performance: another benefit of random access to the data is that parallelizing processing becomes much simpler. The biggest downside is that it requires a larger disk footprint after downloading. Being built on the cross-language IDL flatdata also has its benefits: no manual bit-shifting code is needed, multiple languages are supported from the get-go, and each archive is self-describing.
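To illustrate why a shuffling step helps a dictionary-based compressor, here is a minimal sketch of a generic byte-shuffle filter (the same idea as the HDF5/Blosc shuffle filter). It is an assumption that this resembles what Shuffly does internally; the function names and the sample data are hypothetical, and zlib stands in for zstd only because it ships with Python.

```python
import struct
import zlib


def byte_shuffle(data: bytes, itemsize: int) -> bytes:
    """Transpose fixed-width items: byte 0 of every item, then byte 1, and so on."""
    if len(data) % itemsize:
        raise ValueError("data length must be a multiple of itemsize")
    # data[k::itemsize] collects byte k of every item into one contiguous run
    return b"".join(data[k::itemsize] for k in range(itemsize))


def byte_unshuffle(data: bytes, itemsize: int) -> bytes:
    """Inverse of byte_shuffle: scatter each byte plane back into its items."""
    if len(data) % itemsize:
        raise ValueError("data length must be a multiple of itemsize")
    n = len(data) // itemsize
    out = bytearray(len(data))
    for k in range(itemsize):
        out[k::itemsize] = data[k * n:(k + 1) * n]
    return bytes(out)


# Hypothetical input: sorted 64-bit node IDs with small gaps, as in a node table.
ids = range(1_000_000_000, 1_000_150_000, 3)
raw = struct.pack(f"<{len(ids)}Q", *ids)
shuffled = byte_shuffle(raw, 8)

assert byte_unshuffle(shuffled, 8) == raw
print("raw     + zlib:", len(zlib.compress(raw, 9)))
print("shuffle + zlib:", len(zlib.compress(shuffled, 9)))
```

After shuffling, the high bytes of neighbouring IDs form long runs of identical values, which any LZ-style compressor encodes far more cheaply than the interleaved original.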
Thank you for your detailed reply. I defined my own binary format "FlatMap" many years ago, refined it over the years, and published it at the FOSSGIS 2022 conference. Size is
I use it uncompressed via memory mapping, which gives sub-microsecond access to nodes, ways, and relations. I would compress it only for transport. It holds exactly the OSM data found in planet.pbf, without metadata, but it stores node locations directly in ways while keeping the node IDs for development and debugging purposes. It is an OSM format rather than a geo format, which also shows in the 4-byte (100-nanodegree) resolution for lon/lat.
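The 4-byte coordinate encoding and memory-mapped random access described above can be sketched as follows. The record layout (a dense array of int32 lon/lat pairs indexed by position) and all names are hypothetical, not FlatMap's actual layout. At 1e-7 degree per unit, the largest magnitude is 180 × 10^7 = 1.8 × 10^9, which fits in a signed 32-bit integer (maximum 2,147,483,647), and one unit is roughly 1.1 cm at the equator.

```python
import mmap
import os
import struct
import tempfile

SCALE = 10_000_000             # 1 unit = 1e-7 degree = 100 nanodegrees
RECORD = struct.Struct("<ii")  # hypothetical record: int32 lon, int32 lat (8 bytes)


def encode(deg: float) -> int:
    """Convert degrees to a fixed-point int32 at 100-nanodegree resolution."""
    value = round(deg * SCALE)
    # |value| <= 180 * 1e7 = 1.8e9, which fits in int32 (max 2_147_483_647)
    if not -2**31 <= value < 2**31:
        raise ValueError(f"{deg} is out of range")
    return value


def write_nodes(path: str, coords: list[tuple[float, float]]) -> None:
    """Write (lon, lat) pairs as a dense array of fixed-size records."""
    with open(path, "wb") as f:
        for lon, lat in coords:
            f.write(RECORD.pack(encode(lon), encode(lat)))


def read_node(mm: mmap.mmap, index: int) -> tuple[float, float]:
    """Random access: record `index` starts at byte index * RECORD.size."""
    lon, lat = RECORD.unpack_from(mm, index * RECORD.size)
    return lon / SCALE, lat / SCALE


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "nodes.bin")
    write_nodes(path, [(13.3888599, 52.5170365), (-0.1275862, 51.5072178)])
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        print(read_node(mm, 1))
```

Because every record has the same size, locating node `i` takes a single multiplication, and the operating system pages in only the parts of the file that are actually touched.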
Nice! Having only 70GB "at rest" can make … It looks like the biggest difference between … If you want to, we could set up a simple benchmark (e.g. building a routing graph) and test it on all 3 formats.
Thank you, it looks like we are technically on the same level and made some similar and some different decisions. It would be fruitful to exchange notes and compare. I am busy with other things right now and will come back here later. Cheers
FYI: #70 makes the schema a bit more compact (especially if compressed with shuffly).
What size does such a file have if it holds the data from a planet.pbf file?
Are there any other metrics you can give to estimate performance or size?
Thank you