[PoC] Speed up reading #1709
This will panic on bad font data, but it is good for measurements in HarfRust.
This change is not necessary for the performance gains in my measurements.
This causes invalid memory access on bad font data, but it is useful for measuring the HarfRust performance boost.
So this is doing two things: it skips any validation of the input data, and it subsequently does unsafe reads. I don't see that this actually does any access checks in the getters? I do see the benefit of doing bounds checking once over a full table graph instead of always needing to do it on each read. I can try to play around with what that might look like.
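To make the trade-off concrete, here is a minimal sketch in safe Rust of the two strategies: checking bounds on every read versus checking once up front and reading infallibly afterwards. The record layout (two big-endian `u16` fields) and the names `CheckedView` and `ValidatedView` are hypothetical and are not taken from this PR.

```rust
use std::convert::TryFrom;

/// Hypothetical 4-byte record: `version: u16`, `count: u16`, big-endian.
/// Strategy A: construction is free, every getter re-checks bounds.
pub struct CheckedView<'a> {
    data: &'a [u8],
}

impl<'a> CheckedView<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self { data }
    }

    /// Bounds are checked on each call; out-of-range reads yield `None`.
    fn read_u16(&self, offset: usize) -> Option<u16> {
        let end = offset.checked_add(2)?;
        let bytes = self.data.get(offset..end)?;
        Some(u16::from_be_bytes([bytes[0], bytes[1]]))
    }

    pub fn version(&self) -> Option<u16> {
        self.read_u16(0)
    }

    pub fn count(&self) -> Option<u16> {
        self.read_u16(2)
    }
}

/// Strategy B: one length check at construction; getters cannot fail.
pub struct ValidatedView<'a> {
    // The fixed-size array type records that the check already happened.
    data: &'a [u8; 4],
}

impl<'a> ValidatedView<'a> {
    pub fn new(data: &'a [u8]) -> Option<Self> {
        let head = data.get(..4)?;
        let data = <&[u8; 4]>::try_from(head).ok()?;
        Some(Self { data })
    }

    pub fn version(&self) -> u16 {
        u16::from_be_bytes([self.data[0], self.data[1]])
    }

    pub fn count(&self) -> u16 {
        u16::from_be_bytes([self.data[2], self.data[3]])
    }
}
```

For a flat record the up-front check is trivial. The harder case is a graph of tables linked by offsets, where the one-time check has to follow every offset.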
Right. Here's some unhooked vibe-coded
This is what I propose: an alternate reading path, full-on HarfBuzz style. All the simple codegen'ed tables & structs get a

OpenType has the following wording:
https://learn.microsoft.com/en-us/typography/opentype/spec/otff

To implement this, I suggest that we adopt the HarfBuzz null-object model, whereby if a null offset is dereferenced, a pointer to a shared singleton null object of that type is returned instead, removing codepath divergence in the callers for when an offset is null vs. points to an empty object.
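As a rough illustration of the null-object idea, the sketch below resolves an offset to a child table and falls back to a shared, all-zero object when the offset is null. The table layout and the names `Parent`, `Coverage`, and `NULL_BYTES` are hypothetical. Treating an out-of-range offset the same as a null one is an assumption of this sketch, not something stated in the proposal.

```rust
/// Shared zeroed backing store for the null object (hypothetical name).
static NULL_BYTES: [u8; 2] = [0; 2];

/// Hypothetical child table: a single big-endian `u16` glyph count.
pub struct Coverage<'a> {
    data: &'a [u8],
}

impl<'a> Coverage<'a> {
    /// The singleton null object: every field reads as zero.
    pub fn null() -> Coverage<'static> {
        Coverage { data: &NULL_BYTES }
    }

    pub fn glyph_count(&self) -> u16 {
        match self.data.get(..2) {
            Some(b) => u16::from_be_bytes([b[0], b[1]]),
            None => 0,
        }
    }
}

/// Hypothetical parent table: a big-endian `u16` offset to a `Coverage`,
/// measured from the start of the parent. Offset 0 means "no child".
pub struct Parent<'a> {
    data: &'a [u8],
}

impl<'a> Parent<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self { data }
    }

    /// Always returns a usable `Coverage`; callers never branch on null.
    pub fn coverage(&self) -> Coverage<'a> {
        let offset = match self.data.get(..2) {
            Some(b) => u16::from_be_bytes([b[0], b[1]]) as usize,
            None => 0,
        };
        if offset == 0 {
            return Coverage::null();
        }
        match self.data.get(offset..) {
            Some(rest) => Coverage { data: rest },
            // Assumption: a dangling offset is treated like a null one.
            None => Coverage::null(),
        }
    }
}
```

A caller can write `parent.coverage().glyph_count()` unconditionally. A null offset and an empty child both read as a count of zero.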
So the problem I'm seeing with the sanitize approach is that we have a TOCTOU problem unless you retain a sanitized reference to the whole checked subtree. For HR, this means that we have to run the sanitize pass every time we construct a

This might not matter for Chrome because I imagine we'll keep the

In either case, sanitize and read cannot really be separate: we need to sanitize on read to guarantee safety at the API level.
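One way to read "retain a sanitized reference" is a wrapper type that can only be constructed by a successful sanitize pass and that holds the borrow of the checked bytes for as long as it lives. The sketch below assumes a hypothetical table layout (a `u16` count followed by that many `u16` values); `Sanitized`, `SanitizeError`, and this `Shaper` are illustrative names, not the HarfRust types.

```rust
#[derive(Debug, PartialEq)]
pub enum SanitizeError {
    TooShort,
    BadCount,
}

/// Proof that the borrowed bytes passed the check. The shared borrow keeps
/// safe Rust code from mutating the bytes while this value exists.
pub struct Sanitized<'a> {
    data: &'a [u8],
}

impl<'a> Sanitized<'a> {
    /// The only constructor: runs the whole check once.
    pub fn new(data: &'a [u8]) -> Result<Self, SanitizeError> {
        let head = data.get(..2).ok_or(SanitizeError::TooShort)?;
        let count = u16::from_be_bytes([head[0], head[1]]) as usize;
        let needed = 2 + count * 2;
        if data.len() < needed {
            return Err(SanitizeError::BadCount);
        }
        Ok(Self { data: &data[..needed] })
    }

    pub fn count(&self) -> usize {
        (self.data.len() - 2) / 2
    }

    pub fn value(&self, index: usize) -> Option<u16> {
        if index >= self.count() {
            return None;
        }
        let pos = 2 + index * 2;
        Some(u16::from_be_bytes([self.data[pos], self.data[pos + 1]]))
    }
}

/// Sanitizes once at construction, then reuses the result for every call.
pub struct Shaper<'a> {
    table: Sanitized<'a>,
}

impl<'a> Shaper<'a> {
    pub fn new(data: &'a [u8]) -> Result<Self, SanitizeError> {
        Ok(Self { table: Sanitized::new(data)? })
    }

    /// Stand-in for `shape()`: no sanitize pass happens here.
    pub fn shape(&self) -> u32 {
        (0..self.table.count())
            .filter_map(|i| self.table.value(i))
            .map(u32::from)
            .sum()
    }
}
```

This sketch stays in safe Rust, so the slice indexing in `value` still carries its own bounds checks. Removing them would need `unsafe` reads justified by the invariant that `Sanitized::new` establishes.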
With mmapped files, you can't get around TOCTOU, since the data can change under you anyway. Just saying. Sanitizing per Shaper is fine, I think. As long as it's not per shape() call, we should be good.
Point taken. I believe there are still ongoing arguments about the combination of mmap and Rust. Same for anything else that lets you poke at process memory. |
This is the result of a few days of vibe-coding with codex. It moves access checks from table construction time to each individual field's getter method.
If there's an out-of-bounds access, this code crashes. We can see if we can codegen `sanitize` methods à la HB to address that and reject bad data at the font table level.

Accompanying HR PR: harfbuzz/harfrust#314
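For illustration only, the sketch below shows the shape of that trade-off with a hypothetical table: construction does no checking, getters crash on out-of-bounds data, and a separate `sanitize` method follows the table's offset so that bad data can be rejected up front. It uses safe slice indexing, which panics, where the PR uses unchecked reads. The layout and the name `RawTable` are assumptions.

```rust
/// Hypothetical table: a big-endian `u16` offset at position 0 that points
/// at a big-endian `u16` value elsewhere in the same buffer.
pub struct RawTable<'a> {
    data: &'a [u8],
}

impl<'a> RawTable<'a> {
    /// No validation at construction time.
    pub fn new(data: &'a [u8]) -> Self {
        Self { data }
    }

    /// Panics if `pos + 1` is out of bounds.
    fn u16_at(&self, pos: usize) -> u16 {
        u16::from_be_bytes([self.data[pos], self.data[pos + 1]])
    }

    pub fn child_offset(&self) -> u16 {
        self.u16_at(0)
    }

    pub fn child_value(&self) -> u16 {
        self.u16_at(self.child_offset() as usize)
    }

    /// Returns true only if every getter above can be called without
    /// panicking: the header is present and the offset target is in range.
    pub fn sanitize(&self) -> bool {
        if self.data.len() < 2 {
            return false;
        }
        let offset = self.child_offset() as usize;
        offset + 2 <= self.data.len()
    }
}
```

A caller would run `sanitize()` once per table and drop the table if it returns false, so the getters are only ever reached with data that passed the check.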
Experiment details:
https://docs.google.com/document/d/1LjYFjZj8Kw8zyhqsfZg0_VgzHf2zhYUsksPCkZL7GJI/edit