[PoC] Speed up reading #1709

behdad · 2025-12-30T23:30:09Z

This is the result of a few days vibe-coding with codex. This moves access checks from table construction time, to each individual fields's getter method.

If there's out of bounds access, this code crashes. We can see if we can codegen sanitize methods ala HB to address that and reject at font table level bad data.

Accompanying HR PR: harfbuzz/harfrust#314

Experiment details:
https://docs.google.com/document/d/1LjYFjZj8Kw8zyhqsfZg0_VgzHf2zhYUsksPCkZL7GJI/edit

This will panic for bad font data, but good for measurements in HarfRust.

This change is not necessary for performance gains in my measurement.

Invalid memory access on bad font data, but useful for measuring HarfRust performance boost.

cmyr · 2026-01-14T16:46:26Z

So this is doing two things: it is skipping any validation of the input data, and it is subsequently doing unsafe reads.

I don't see that this is actually doing any access checks in the getters?

I do see the benefit of trying to do bounds checking on some full table graph instead of always needing to do it on each read, I can try to play around with what that might look like..

behdad · 2026-01-14T16:56:31Z

I do see the benefit of trying to do bounds checking on some full table graph instead of always needing to do it on each read, I can try to play around with what that might look like..

Right. Here's some unhooked vibe-coded sanitize methods:
f9aeba4#diff-60c87c44076fdcbdd9d4c569cb2cebed11c15a8c3c7ab4a076f5380e747170f6

behdad · 2026-01-14T18:45:45Z

This is what I propose: an alternate reading path, full-on HarfBuzz style:

All the simple codegen'ed tables & structs get a sanitize() method, that returns a result object in the sanitized namespace. If the result is ok, then the entire subtree can be accessed infallibly and without bounds checks. The object is simply a typed pointer.

OpenType has the following wording:

In some cases, fields for subtable offsets are documented as permitting NULL values when the given subtable is optional. For example, in the BASE table header, the horizAxisOffset and vertAxisOffset fields may be NULL, meaning that either subtable (or both) is optional. A NULL subtable offset always indicates that the given subtable is not present. Applications must never interpret a NULL offset value as the offset to subtable data. For cases in which subtable offset fields are not documented as permitting NULL values, font compilers must include a subtable of the indicated format, even if it is a header stub without further data (for example, a coverage table with no glyph IDs). Applications parsing font data should, however, anticipate non-conformant font data that has a NULL subtable offset where only a non-NULL value is expected.

https://learn.microsoft.com/en-us/typography/opentype/spec/otff

To implement this, I suggest that we adopt the HarfBuzz null-object model, whereas if a null offset is dereferenced, a pointer to a shared singleton null object of that type is returned instead, removing codepath divergence in the callers for when an offset is null vs points to an empty object.

cc @raphlinus @dfrg @rsheeter

dfrg · 2026-01-14T23:59:03Z

So the problem I'm seeing with the sanitize approach is that we have a TOCTOU problem unless you retain a sanitized reference to the whole checked subtree.

For HR, this means that we have to run the sanitize pass every time we construct a Shaper since we receive a fresh reference to the font data and that might not be the same one that was used to initialize the shaping caches. In the current state, if a user provides different font data, the shaping will be incorrect but the code will still be memory safe.

This might not matter for Chrome because I imagine we'll keep the Shaper type around anyway (the HB/HR code currently does this) so it might be good enough.

In either case, sanitize and read cannot really be separate-- we need to sanitize on read to guarantee safety at the API level.

behdad · 2026-01-15T00:55:00Z

So the problem I'm seeing with the sanitize approach is that we have a TOCTOU problem unless you retain a sanitized reference to the whole checked subtree.

With mmaped files, you can't get around TOCTOU since the data can change under you anyway. Just saying.

Sanitizing per Shaper is fine I think. As long as it's not per shape() call, we should be good.

dfrg · 2026-01-15T01:13:49Z

With mmaped files, you can't get around TOCTOU since the data can change under you anyway. Just saying.

Point taken. I believe there are still ongoing arguments about the combination of mmap and Rust. Same for anything else that lets you poke at process memory.

behdad requested review from cmyr and dfrg December 30, 2025 23:30

This comment was marked as outdated.

Sign in to view

behdad force-pushed the unchecked-read branch from 20d8d45 to dd3873d Compare December 31, 2025 04:19

This comment was marked as outdated.

Sign in to view

behdad force-pushed the unchecked-read branch from fa51ac0 to 41e696b Compare January 1, 2026 02:43

behdad mentioned this pull request Jan 4, 2026

[PoC] Companion to Fontations change to speedup reading. harfbuzz/harfrust#314

Open

behdad force-pushed the unchecked-read branch from 41e696b to 6c6fd71 Compare January 4, 2026 23:17

behdad added 4 commits January 4, 2026 16:21

[codegen/read-fonts] Part 1 of remove bounds-checking from constructors

3180fb0

This will panic for bad font data, but good for measurements in HarfRust.

[codegen/read-fonts] Part 2 of remove bounds checking from constructors

e1f32b4

Remove unused shape from TableRef

0dd615b

This change is not necessary for performance gains in my measurement.

[code-gen/read-fonts] Use unsafe unchecked access across all reads

b71a265

Invalid memory access on bad font data, but useful for measuring HarfRust performance boost.

behdad force-pushed the unchecked-read branch from 6c6fd71 to b71a265 Compare January 4, 2026 23:22

cmyr mentioned this pull request Jan 16, 2026

[read] Add unsafe read for scalars and arrays #1715

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[PoC] Speed up reading #1709

[PoC] Speed up reading #1709

behdad commented Dec 30, 2025 •

edited

Loading

Uh oh!

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

cmyr commented Jan 14, 2026

Uh oh!

behdad commented Jan 14, 2026

Uh oh!

behdad commented Jan 14, 2026 •

edited

Loading

Uh oh!

dfrg commented Jan 14, 2026

Uh oh!

behdad commented Jan 15, 2026

Uh oh!

dfrg commented Jan 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[PoC] Speed up reading #1709

Are you sure you want to change the base?

[PoC] Speed up reading #1709

Conversation

behdad commented Dec 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

This comment was marked as outdated.

cmyr commented Jan 14, 2026

Uh oh!

behdad commented Jan 14, 2026

Uh oh!

behdad commented Jan 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dfrg commented Jan 14, 2026

Uh oh!

behdad commented Jan 15, 2026

Uh oh!

dfrg commented Jan 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

behdad commented Dec 30, 2025 •

edited

Loading

behdad commented Jan 14, 2026 •

edited

Loading