Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lua: use non-member functions for interop #589

Closed
wants to merge 7 commits into from

Conversation

cldellow
Copy link
Contributor

Currently, Tilemaker uses member functions for interop:

function node_function(node)
    node:Layer(...)

This PR changes Tilemaker to use global functions:

function node_function()
    Layer(...)

The chief rationale is performance. Every member function call needs to push an extra pointer onto the stack when crossing the Lua/C++ boundary.

Kaguya serializes this pointer as a Lua userdata. That means every call into Lua has to malloc some memory, and every call back from Lua has to dereference through this pointer.

And there are a lot of calls! For OMT on the GB extract, I counted ~1.4B calls from Lua into C++.

A secondary rationale is that a global function is a bit more honest. A user might believe that this is currently permissible:

last_node = nil
function node_function(node)
    if last_node ~= nil
        -- do something with last_node
    end

    -- save the current node for later, for some reason
    last_node = node

But in reality, the OSM objects we pass into Lua don't behave quite like Lua objects. They're backed by OsmLuaProcessing, who will move on, invalidating whatever the user thinks they've got a reference to.

This PR has a noticeable decrease in reading time for me, measured on the OMT profile for GB, on a 16-core computer:

Before:

real	1m28.230s
user	19m30.281s
sys	0m29.610s

After:

real	1m21.728s
user	17m27.150s
sys	0m32.668s

The tradeoffs:

  • anyone with a custom Lua profile will need to update it, although the changes are fairly mechanical

  • Tilemaker now reserves several functions in the global namespace, causing the potential for conflicts

systemed and others added 7 commits December 4, 2023 21:48
Currently, Tilemaker uses member functions for interop:

```lua
function node_function(node)
    node:Layer(...)
```

This PR changes Tilemaker to use global functions:

```lua
function node_function()
    Layer(...)
```

The chief rationale is performance. Every member function call needs to
push an extra pointer onto the stack when crossing the Lua/C++ boundary.

Kaguya serializes this pointer as a Lua userdata. That means every
call into Lua has to malloc some memory, and every call back from Lua
has to dereference through this pointer.

And there are a lot of calls! For OMT on the GB extract, I counted
~1.4B calls from Lua into C++.

A secondary rationale is that a global function is a bit more honest.
A user might believe that this is currently permissible:

```lua
last_node = nil
function node_function(node)
    if last_node ~= nil
        -- do something with last_node
    end

    -- save the current node for later, for some reason
    last_node = node
```

But in reality, the OSM objects we pass into Lua don't behave quite
like Lua objects. They're backed by OsmLuaProcessing, who will move
on, invalidating whatever the user thinks they've got a reference to.

This PR has a noticeable decrease in reading time for me, measured
on the OMT profile for GB, on a 16-core computer:

Before:
```
real	1m28.230s
user	19m30.281s
sys	0m29.610s
```

After:
```
real	1m21.728s
user	17m27.150s
sys	0m32.668s
```

The tradeoffs:

- anyone with a custom Lua profile will need to update it, although the
  changes are fairly mechanical

- Tilemaker now reserves several functions in the global namespace,
  causing the potential for conflicts
Building a std::map for tags is somewhat expensive, especially when
we know that the number of tags is usually quite small.

Instead, use a custom structure that does a crappy-but-fast hash
to put the keys/values in one of 16 buckets, then linear search
the bucket.

For GB, before:
```
real 1m11.507s
user 16m49.604s
sys 0m17.381s
```

After:
```
real	1m9.557s
user	16m28.826s
sys	0m17.937s
```

Saving 2 seconds of wall clock and 20 seconds of user time doesn't
seem like much, but (a) it's not nothing and (b) having the tags
in this format will enable us to thwart some of Lua's defensive
copies in a subsequent commit.

A note about the hash function: hashing each letter of the string
using boost::hash_combine eliminated the time savings.
We (ab?)use kaguya's parameter serialization machinery. Rather than
take a `std::string`, we take a `KnownTagKey` and teach Lua how to
convert a Lua string into a `KnownTagKey`.

This avoids the need to do a defensive copy of the string when coming
from Lua.

It provides a modest boost:

```
real 1m8.859s
user 16m13.292s
sys 0m18.104s
```

Most keys are short enough to fit in the small-string optimization, so
this doesn't help us avoid mallocs. An exception is `addr:housenumber`,
which, at 16 bytes, exceeds g++'s limit of 15 bytes.

It should be possible to also apply a similar trick to the `Attribute(...)`
functions, to avoid defensive copies of strings that we've seen as keys
or values.
After:

```
real	1m8.124s
user	16m6.620s
sys	0m16.808s
```

Looks like we're solidly into diminishing returns at this point.
@cldellow
Copy link
Contributor Author

cldellow commented Dec 4, 2023

Closing to retarget against master

@cldellow cldellow closed this Dec 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants