Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extend NextRelation/FindInRelation to nodes #632

Merged
merged 13 commits into from
Jan 7, 2024

Conversation

cldellow
Copy link
Contributor

@cldellow cldellow commented Jan 2, 2024

This also makes NextRelation return a tuple of ID, role rather than just ID.

According to https://www.lua.org/pil/5.1.html, I think that's not a breaking change.

Motivation: When I naively try to create labels from OSM entities with the place tag, I occasionally get duplicates due to relations.

For example, the city of Guelph has relation 7486148, which has node 36576733 as its label member.

They both have place=city, so my Lua script faithfully spits out two labels.

This PR allows me to suppress the label from the node, which is a start.

Ideally, I'd actually prefer to use the node, as it'll likely have a nicely human-curated location. For that, I'd need the relation to be able to interrogate its members.

Would you be open to me extending this PR to add NextMember, FindInMember and RestartMembers functions that mirror the ones used by nodes/ways ?

This also makes `NextRelation` return a tuple of ID, role rather than
just ID.

According to https://www.lua.org/pil/5.1.html, I think that's not a breaking change.

Motivation: When I naively try to create labels from OSM entities with the `place`
tag, I occasionally get duplicates due to relations.

For example, the city of Guelph has [relation
7486148](https://www.openstreetmap.org/relation/7486148), which has
[node 36576733](https://www.openstreetmap.org/node/36576733) as its
`label` member.

They both have `place=city`, so my Lua script faithfully spits out two
labels.

This PR allows me to suppress the label from the node, which is a start.

Ideally, I'd actually prefer to use the node, as it'll likely have a
nicely human-curated location. For that, I'd need the relation to be
able to interrogate its members.

Would you be open to me extending this PR to add `NextMember`,
`FindInMember` and `RestartMembers` functions that mirror the ones used
by nodes/ways ?
@systemed
Copy link
Owner

systemed commented Jan 2, 2024

Thanks! Reading roles, and extending support to nodes, is a big win. Providing NextMember etc. sounds good too.

Relations are generally a mess, and each type is basically sui generis. As a general rule I try to keep tag-specific logic out of the C++ code, and expect users to handle that in the Lua.

The only (reluctant) exceptions are multipolygon and boundary relations, and their constituent inner/outer roles, which are effectively geometries expressed in metadata (yuk) and therefore we have to give them special handling. (In an ideal world we'd have an OSM <Area> datatype rather than this kludge but there you go.) tilemaker transparently handles multipolygon relations as ways, and boundary relations as relations but with the multipolygon geometry correctly generated.

So I'm a little anxious about hard-coding admin_centre and label - they are of course already a part of the hard-coded boundary relation but I'd like to keep that to the minimum possible. If NextMember etc. would allow us to do that via Lua, that would be good. One other option might be to add support to LayerAsCentroid, so you could write something like this

LayerAsCentroid("places", "label", "admin_centre")

which would write a node to the places layer, with the geometry taken from (a) a label member, (b) failing that, an admin_centre member, (c) failing that, an automatically generated centroid.

@cldellow
Copy link
Contributor Author

cldellow commented Jan 2, 2024

Great feedback. I'll rework this:

  • keep NextRelation / FindInRelation for nodes, but remove knowledge of specific roles from C++ code; instead, blindly expose the role to Lua via NextRelation without checking its content
  • LayerAsCentroid(...) will accept an optional list of roles as hints for which point to emit
  • don't bother with NextMember, FindInMember, RestartMembers for now

Something that gives me pause: I am separately considering implementing support for polylabel. I think this would address issues #392 and #447 (and perhaps lessen the urgency of #467).

I hadn't thought about how that would surface to the user, though. Did you have anything in mind?

One option in line with what you'd proposed here would be:

LayerAsCentroid("places", "label", "admin_centre", "polylabel")

But I wonder if the user would almost always want polylabel as the fallback? So maybe LayerAsCentroid should just default to that. But then it's a bit bizarre that we have centroid in the name, so perhaps LayerAsCentroid should be renamed to LayerAsPoint. But then that's a breaking change, and it'd be worth trying to make it happen before pushing out v3.

@systemed
Copy link
Owner

systemed commented Jan 2, 2024

Polylabel support would be excellent. Maybe something like

LayerAsPoint("places", strategy, "admin_centre", "label" [...])

where the 2nd parameter onwards are optional, and strategy is an enum for the calculation algorithm: 0 for polylabel, 1 for centroid, 2 for Boost.Geometry's point_on_surface, and so on. Lua doesn't actually have enums of course, but we could potentially define constants somewhere.

I dunno what I was thinking, this is much better. (And I see now that
there are many more roles than these four.)
@cldellow
Copy link
Contributor Author

cldellow commented Jan 3, 2024

That ended up being pretty straightforward. No more duplicate labels, and I can now use the label node when present -- awesome.

Ugh, I guess this could be considered a breaking change after all.

If a user previously passed `LayerAsCentroid("layername", false)`,
we'd ignore the false.

With this change, the false causes us to fail, because we expect a
string.

I'd normally just say this was undefined behaviour, and those users
deserve to be broken...but it's tricky, since this is a script from the
official tilemaker repo.
@cldellow
Copy link
Contributor Author

cldellow commented Jan 3, 2024

D'oh, I guess this could be considered a breaking change after all: 154d3ab

@systemed
Copy link
Owner

systemed commented Jan 3, 2024

Looks good! I'm not too worried about breaking undocumented behaviour even if we do erroneously use that behaviour ourselves (oops) - and I think we're on the verge of 3.0 anyway.

Passing thought - is there any benefit in using some sort of table, or PooledString maybe, for relation roles rather than std::string? A quick check with taginfo suggests there's only 129 roles in widespread use (https://gist.github.com/systemed/29ea4c8d797a20dcdffee8ba907d62ea). (taginfo doesn't show roles for relations used <100 times.)

@cldellow
Copy link
Contributor Author

cldellow commented Jan 3, 2024

Passing thought - is there any benefit in using some sort of table, or PooledString maybe, for relation roles rather than std::string?

I guess there are two concerns:

  1. Memory used by the std::string in relationsForWays and relationsForNodes
  2. Runtime impact of Lua interop

Re 1 - these don't spill to disk and are each taking 24 bytes. PooledString would get us down to 16 bytes, so save 8 bytes. Or we could do some custom mapping that could fit in a short for 2 bytes. So if we did the most aggressive thing, we could save 22 bytes per relation member.

The UK has 3.9M relation members in a 1.6GB PBF. If I scale that to a 72GB planet file, that suggests 175M relation members, so the savings could be as much as 3.8GB.

Yeah, that's probably worth doing. I'll have a think on how to do it without introducing a lot of locking.

Re 2 - for the most popular roles, I think if we saved a reference to a string in the Lua stack, we could avoid the expense of Lua's defensive copies. Relation roles are a good candidate for this, as they're a small set used many times. I expect this to be relatively small compared to other Lua interop costs we're paying, so wouldn't do anything about it immediately.

@systemed
Copy link
Owner

systemed commented Jan 3, 2024

Yes - it was memory I was mostly thinking of, in particular for route relations. Way/node ID, plus an index into a vector of strings, should fit in a 64-bit struct.

cldellow added a commit to cldellow/tilemaker that referenced this pull request Jan 3, 2024
This is just the translation of the algorithm and some smoke testing
that it seems to do the right thing.

It's not integrated into Tilemaker yet, that's blocked on systemed#632
These seems to work well and, at least for large polygons (city parks,
national parks), is faster than Boost's centroid algorithm. That surprised
me! I haven't benchmarked it on building polygons yet.

There are several todos here around making it configurable, and making
it play well with lazy geometries.

Going to finish off the relation memory stuff then fix those.
You can now pass the preferred algorithm to use to LayerAsCentroid.

Still to do: teach lazy geometries about which algorithm was used.
This ensures that the user gets the same, correct behaviour both in
--materialize-geometries and --lazy-geometries.

We can extend support to materialize geometries, but this PR is already
getting big (and it has conflicts with the other 2 PRs), so I'm leery of
getting further out over my skis.
@cldellow
Copy link
Contributor Author

cldellow commented Jan 4, 2024

Updated to reduce memory use for tracking roles, and to add polylabel support.

The parks and forests in the Appalachians of America are a little better now. Note the locations of Cherokee National Forest and Pisgah National Forest. (This is a truncated extract - on the full map, Pisgah would likely be in a different location.)

Algorithm = Centroid:
Selection_591

Algorithm = Polylabel:
Selection_592

This switches polylabel to be the default.

There are some areas for improvement:

  • I haven't yet benchmarked the performance implications of using polylabel everywhere

  • --lazy-geometries materializes the centroid when the algorithm is polylabel

  • I've left the name as LayerAsCentroid, not LayerAsPoint

  • The user passes the algorithm as a string. I think this is probably fine? As you point out, we do control the runtime environment, so we could predeclare constants like CENTROID and POLYLABEL if we felt that was a better user experience.

These areas can all be improved -- I'm a little hesitant to do it while there are other Lua PRs outstanding that also have perf implications and will have code conflicts with this set of changes.

Maybe we should start merging the Lua things into a shared branch, or start merging them to master?

@cldellow
Copy link
Contributor Author

cldellow commented Jan 4, 2024

Reading through https://github.com/mapbox/polylabel/issues, I think there's also some risk of infinite loops if we don't pick the precision parameter as a function of the input geometry (and/or if we don't use centroid as a fallback for certain degenerate geometries).

It seemed to work for the areas I tried. I think before cutting a new release, I'd want to run tilemaker with all the Lua changes on the planet in order to confirm there are no regressions in runtime or memory usage.

@cldellow cldellow marked this pull request as draft January 4, 2024 03:57
@cldellow cldellow marked this pull request as ready for review January 4, 2024 04:19
@systemed
Copy link
Owner

systemed commented Jan 4, 2024

Excellent - that’s a great improvement in the screenshot. I’m AFK today/tomorrow but will run it on the planet at the weekend unless you beat me to it!

@systemed
Copy link
Owner

systemed commented Jan 6, 2024

I've just run this (only) over the planet and it works really well. No hangs and I couldn't see any impact on runtime speed. (Planet completed in 4h38m53 with no runtime options other than --store; that's about an hour faster than before #623.)

Label placement is pretty good. Here's Rutland Water, which is U-shaped. Before, it's over the peninsula:

Screenshot 2024-01-06 at 17 18 31

After, it's in the water as it should be:

Screenshot 2024-01-06 at 17 18 43

We don't get all the country labels we'd like at lower zoom levels (particularly the USA) but this is because the points are near tile boundaries - so more of a style/rendering issue than a tile generation issue. At some point in the 3.X series I'd like to ship a new, better style (possibly to go with Shortbread rather than OMT) but that's for the future!

@cldellow
Copy link
Contributor Author

cldellow commented Jan 6, 2024

Great, thanks for running it on the planet!

We don't get all the country labels we'd like at lower zoom levels (particularly the USA) but this is because the points are near tile boundaries - so more of a style/rendering issue than a tile generation issue.

I see this, too, using OSM Bright:

Selection_597

For OSM Bright, at least, I think the issue is unrelated to polylabel.

It looks like the OMT Lua script only writes to the place layer in the node_function:

node:Layer("place", false)
node:Attribute("class", place)
node:MinZoom(mz)
if rank then node:AttributeNumeric("rank", rank) end
if capital then node:AttributeNumeric("capital", capital) end
if place=="country" then node:Attribute("iso_a2", node:Find("ISO3166-1:alpha2")) end

OSM Bright only draws a rank 1 country label if there is an iso_a2 attribute: https://github.com/openmaptiles/osm-bright-gl-style/blob/8af4769692d0f9219d0936711609d580b34bf365/style.json#L2391-L2402

iso_a2 comes from ISO3166-1:alpha2 in OSM.

For Canada, both its relation and its label node have such an attribute.

For the US, only its relation has that attribute, not its label node.

Hey, that's exactly the sort of thing that this PR can help with. I've pushed a commit that uses the new support for FindInRelation in nodes to propagate it from the relation if needed, and now USA shows up in OSM Bright.

@systemed
Copy link
Owner

systemed commented Jan 7, 2024

Well hunted! Think this is ready to go?

@cldellow
Copy link
Contributor Author

cldellow commented Jan 7, 2024

Yup!

@systemed systemed merged commit a600524 into systemed:master Jan 7, 2024
5 checks passed
@systemed
Copy link
Owner

systemed commented Jan 7, 2024

Superb - thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants