Architecture

This document records this project's design principles. Inspired by https://matklad.github.io/2021/02/06/ARCHITECTURE.md.html

Fun optimizations

The speed and memory footprint of image reference parsing is highly unlikely to ever matter to a program handling gigabytes of images. However, optimizing the parsing is fun, and fun is encouraged in this repository.

Parse ascii bytes

This library takes an input ASCII string (a slice of bytes) and parses the length of each section of an image reference. Working on bytes directly avoids decoding the input into Rust chars, which occupy 4 bytes each.
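As a rough illustration of byte-level parsing, the sketch below measures a section by walking the bytes of the input. The function name `component_len` and the accepted byte set are assumptions made for this example, not this library's actual API or grammar.

```rust
/// Hypothetical helper: returns the length of the leading run of bytes
/// that are allowed in a section. The accepted set (a-z, 0-9, '.', '_', '-')
/// is an assumption for illustration only.
fn component_len(src: &str) -> usize {
    src.as_bytes() // borrow the underlying bytes; nothing is decoded or copied
        .iter()
        .take_while(|b| matches!(**b, b'a'..=b'z' | b'0'..=b'9' | b'.' | b'_' | b'-'))
        .count()
}
```

Because the input is ASCII, a byte count is also a character count, so the returned length can be used directly as a slice index.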

Avoid backtracking

Re-parsing bytes costs time and memory. Peeking one byte ahead is ok. Re-parsing sections on error to find an invalid character is also ok as long as the benchmarks don't regress.
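A minimal sketch of a forward-only scan with a one-byte peek follows. The function `host_len` and the rule "a ':' followed by a digit starts a port" are simplifications assumed for this example. They are not taken from this library.

```rust
/// Hypothetical single-pass scan: returns the length of the host section
/// and whether a port appears to follow it.
fn host_len(src: &[u8]) -> (usize, bool) {
    let mut i = 0;
    while i < src.len() {
        match src[i] {
            b':' => {
                // Peek exactly one byte ahead. The index `i` never moves backwards.
                let port_follows = matches!(src.get(i + 1).copied(), Some(b'0'..=b'9'));
                return (i, port_follows);
            }
            b'/' => return (i, false),
            _ => i += 1,
        }
    }
    (i, false)
}
```

Each byte is visited at most once, and the peek reads `src[i + 1]` without consuming it, so the caller can resume parsing at the returned offset.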

Keep only one copy of a string slice

&strs are expensive: each one is a pointer plus a length, so it costs 2 usizes. Prefer holding one &str and many short lengths in memory, then splitting out new &strs using the lengths on demand.
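One way this principle could look in practice is sketched below. The struct `Reference`, its fields, and the "name:tag" layout are hypothetical and chosen only to keep the example short.

```rust
/// Hypothetical layout: one borrowed source string plus short section lengths.
#[allow(dead_code)]
struct Reference<'a> {
    src: &'a str,
    name_len: u8,
    tag_len: u8, // excludes the ':' separator; 0 means "no tag"
}

#[allow(dead_code)]
impl<'a> Reference<'a> {
    /// Splits the name out of `src` on demand.
    fn name(&self) -> &'a str {
        &self.src[..self.name_len as usize]
    }

    /// Splits the tag out of `src` on demand; empty if there is no tag.
    fn tag(&self) -> &'a str {
        if self.tag_len == 0 {
            return "";
        }
        let start = self.name_len as usize + 1; // skip the ':' separator
        &self.src[start..start + self.tag_len as usize]
    }
}
```

On a 64-bit target, storing `name` and `tag` as their own &strs would add 16 bytes each, whereas each u8 length adds a single byte before padding.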

Store short lengths

Use the smallest unsigned integer size that can represent the length of a section of an image reference. Since most sections of an image reference are at most 255 ASCII characters long, most lengths can be represented using a u8. The encoded section of the digest is technically unbounded, but in practice its length fits in a u16.
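The sketch below shows what a set of short lengths might look like and how a measured length could be narrowed without silent truncation. The struct `Lengths`, its field names, and the helper `short_len` are assumptions for illustration.

```rust
/// Hypothetical set of section lengths; the field names are assumptions.
#[allow(dead_code)]
struct Lengths {
    host: u8,
    path: u8,
    tag: u8,
    digest_algorithm: u8,
    digest_encoded: u16, // the one section given a wider integer
}

/// Narrows a measured length to a u8, returning None instead of truncating
/// when the section is longer than 255 bytes.
fn short_len(len: usize) -> Option<u8> {
    if len <= u8::MAX as usize {
        Some(len as u8)
    } else {
        None
    }
}
```

With these field types the whole struct needs only a handful of bytes, compared with 5 usizes if every length were pointer-sized.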

All lengths are implicitly optional

Since all lengths can be 0, treat 0 as the None value rather than using extra space for an Option<Length>. Temporarily converting a length to an Option<Length> is ok, since it's roughly equivalent to using a temporary bool while checking len == 0.
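A minimal sketch of the zero-as-None convention, assuming a hypothetical newtype `Length` with a `get` accessor (neither name comes from this library):

```rust
/// Hypothetical newtype: a stored value of 0 means "this section is absent".
#[derive(Clone, Copy)]
struct Length(u8);

impl Length {
    /// Temporarily widens the stored length into an Option for callers
    /// that want to use `match` or `?`.
    fn get(self) -> Option<u8> {
        if self.0 == 0 {
            None
        } else {
            Some(self.0)
        }
    }
}
```

The stored form stays one byte wide. An Option<u8> field would need more, because a u8 already uses all 256 bit patterns and leaves no spare value to encode None.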

Debug mode

Record invariants using debug_assert!(..) instead of assert!(..) to avoid extra computation in release mode. Put extra debugging variables behind #[cfg(debug_assertions)] conditional-compilation attributes.
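The sketch below combines both techniques in one hypothetical parser step. The function `port_len` and the counter `bytes_seen` are invented for this example.

```rust
/// Hypothetical parser step: counts the leading ASCII digits of a port,
/// stopping at 255 so the result always fits in a u8.
fn port_len(src: &[u8]) -> u8 {
    // Extra bookkeeping that only exists in debug builds.
    #[cfg(debug_assertions)]
    let mut bytes_seen: usize = 0;

    let mut len: u8 = 0;
    for b in src {
        #[cfg(debug_assertions)]
        {
            bytes_seen += 1;
        }
        if !b.is_ascii_digit() || len == u8::MAX {
            break;
        }
        len += 1;
    }

    // Checked in debug builds, skipped in release builds.
    debug_assert!(len as usize <= src.len());
    // `bytes_seen` does not exist in release builds, so this check needs
    // the same attribute: debug_assert! arguments are still type-checked.
    #[cfg(debug_assertions)]
    debug_assert!(bytes_seen >= len as usize);

    len
}
```

In a release build the counter, its increments, and the second assertion are removed at compile time, and the first assertion is not evaluated.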

0 dependencies

This library has no dependencies. That keeps the library size small and keeps ownership of all of the relevant logic in this repository.

I chose not to use the excellent regex crate since:

  1. writing the parsers as pure functions avoids issues of cross-thread resource contention.
  2. I think regex relies on pointer-sized offsets for capture groups, which cancels out the short-length optimizations. A scan through the regex and regex-automata docs and issues didn't reveal a way to use u8s. If you know a way to get regex to use custom offset sizes, please let me know in this repo's issues!