## Summary

Optimizes the turbo-persistence compaction and iteration paths with several targeted improvements:

### Iterator optimizations
**Flatten index block iteration** — The iterator previously used a `Vec<CurrentIndexBlock>` stack, but SST files have exactly one index level. Inline the index block fields (`index_entries`, `index_block_count`, `index_pos`) directly into `StaticSortedFileIter`, eliminating the stack allocation and `Option` overhead.
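An illustrative sketch of the flattening (field names follow the PR description; the surrounding types are stand-ins, not the actual turbo-persistence definitions):

```rust
// Before: index state lived in a heap-allocated stack even though the
// depth is always exactly one for SST files.
#[allow(dead_code)]
struct CurrentIndexBlock {
    index_entries: Vec<u64>,
    index_block_count: usize,
    index_pos: usize,
}

#[allow(dead_code)]
struct IterBefore {
    index_stack: Vec<CurrentIndexBlock>, // always 0 or 1 entries deep
}

// After: the single index level is inlined into the iterator itself, so
// constructing the iterator allocates nothing for index state and the
// hot loop skips the stack's last_mut()/Option checks.
struct IterAfter {
    index_entries: Vec<u64>,
    index_block_count: usize,
    index_pos: usize,
}

fn main() {
    let it = IterAfter {
        index_entries: vec![10, 20],
        index_block_count: 2,
        index_pos: 0,
    };
    // Direct field access where the old shape needed index_stack.last_mut().
    assert_eq!(it.index_entries[it.index_pos], 10);
    assert_eq!(it.index_block_count, 2);
    println!("ok");
}
```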
**Non-optional `CurrentKeyBlock`** — Parse the first key block during `try_into_iter()` construction so `current_key_block` is always populated, removing the `Option<CurrentKeyBlock>` wrapper and its `take()`/`Some()` ceremony in the hot loop.
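A minimal sketch of the pattern, assuming a simplified iterator over key blocks (names here are illustrative, not the PR's): the current block is parsed at construction and stays always populated, so the hot loop never `take()`s and re-wraps an `Option`.

```rust
// `blocks` stands in for the SST's key blocks; each inner Vec is one
// parsed block's entries.
struct KeyBlockIter {
    blocks: Vec<Vec<u32>>,
    block_idx: usize,
    current: Vec<u32>, // always populated: first block parsed up front
    pos: usize,
}

impl KeyBlockIter {
    // Construction fails (returns None) only when there are no blocks at
    // all, mirroring try_into_iter() doing the first parse eagerly.
    fn new(mut blocks: Vec<Vec<u32>>) -> Option<Self> {
        if blocks.is_empty() {
            return None;
        }
        let current = std::mem::take(&mut blocks[0]);
        Some(KeyBlockIter { blocks, block_idx: 0, current, pos: 0 })
    }
}

impl Iterator for KeyBlockIter {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        loop {
            // Hot path: plain index into always-present state, no
            // Option::take()/Some() round-trip per entry.
            if self.pos < self.current.len() {
                let v = self.current[self.pos];
                self.pos += 1;
                return Some(v);
            }
            // Cold path: advance to the next key block, if any.
            self.block_idx += 1;
            if self.block_idx >= self.blocks.len() {
                return None;
            }
            self.current = std::mem::take(&mut self.blocks[self.block_idx]);
            self.pos = 0;
        }
    }
}

fn main() {
    let iter = KeyBlockIter::new(vec![vec![1, 2], vec![], vec![3]]).unwrap();
    let collected: Vec<u32> = iter.collect();
    assert_eq!(collected, vec![1, 2, 3]);
    println!("ok");
}
```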
**Replace `ReadBytesExt` with direct byte indexing** — In `handle_key_match`, `parse_key_block`, and `next_internal`, replace `val.read_u16::<BE>()` etc. with `u16::from_be_bytes(val[0..2].try_into().unwrap())`. This eliminates the trait-dispatch overhead, dead error-handling code, and mutable slice pointer advancement.
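A sketch of the replacement pattern (helper names are illustrative): the caller has already validated the slice length, so the bounds are known and the `try_into().unwrap()` compiles down to a plain load.

```rust
// Reading big-endian integers by direct indexing instead of the
// byteorder ReadBytesExt trait. Assumes the caller has verified that
// `buf` is long enough, as the hot loop in the PR does.
fn read_u16_be(buf: &[u8], pos: usize) -> u16 {
    // try_into converts the 2-byte subslice into a [u8; 2]; the unwrap
    // cannot fail because the range length is exactly 2.
    u16::from_be_bytes(buf[pos..pos + 2].try_into().unwrap())
}

fn read_u32_be(buf: &[u8], pos: usize) -> u32 {
    u32::from_be_bytes(buf[pos..pos + 4].try_into().unwrap())
}

fn main() {
    let data = [0x12, 0x34, 0x56, 0x78];
    assert_eq!(read_u16_be(&data, 0), 0x1234);
    assert_eq!(read_u16_be(&data, 2), 0x5678);
    assert_eq!(read_u32_be(&data, 0), 0x1234_5678);
    println!("ok");
}
```

Unlike `ReadBytesExt::read_u16`, which returns an `io::Result` and advances a mutable slice reference, this form has no error path to branch on and leaves cursor arithmetic explicit at the call site.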
**Extract `read_offset_entry` helper** — Read type + offset from the key block offset table in a single `u32` load + shift, replacing two separate `ReadBytesExt` calls.
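A hypothetical sketch of such a helper. The packing used here (an 8-bit type tag in the high byte and a 24-bit offset in the low bytes of a big-endian `u32`) is an assumed layout for illustration; the real offset-table format is defined by the turbo-persistence SST code.

```rust
// Decode one offset-table entry with a single 4-byte load plus a shift
// and a mask, instead of separate read_u8 + read_u24 trait calls.
// ASSUMED layout: [type:u8][offset:u24], big-endian, 4 bytes per entry.
fn read_offset_entry(table: &[u8], index: usize) -> (u8, u32) {
    let base = index * 4;
    let raw = u32::from_be_bytes(table[base..base + 4].try_into().unwrap());
    ((raw >> 24) as u8, raw & 0x00FF_FFFF)
}

fn main() {
    // Entry 0: type 0x02, offset 0x000110; entry 1: type 0x01, offset 7.
    let table = [0x02, 0x00, 0x01, 0x10, 0x01, 0x00, 0x00, 0x07];
    assert_eq!(read_offset_entry(&table, 0), (0x02, 0x0110));
    assert_eq!(read_offset_entry(&table, 1), (0x01, 7));
    println!("ok");
}
```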
### Refcounting optimization
**Introduce `RcBytes`** — A single-threaded byte slice type using `Rc` instead of `Arc`, eliminating atomic refcount overhead during single-threaded SST iteration. The iteration path (`StaticSortedFileIter`) now produces `RcBytes` slices backed by an `Rc<Mmap>`, so per-entry clone/drop operations are plain integer increments rather than atomic operations.
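A hypothetical sketch of an `RcBytes`-style type: a cheaply cloneable view into shared backing storage, where cloning bumps a non-atomic `Rc` count rather than performing Arc's atomic read-modify-write. Here an `Rc<Vec<u8>>` stands in for the `Rc<Mmap>` used in the PR; because `Rc` is neither `Send` nor `Sync`, the type is confined to the single compaction thread by construction.

```rust
use std::rc::Rc;

// A byte-slice view over shared storage. Clone is a plain integer
// increment on the Rc strong count.
#[derive(Clone)]
struct RcBytes {
    buf: Rc<Vec<u8>>,
    start: usize,
    end: usize,
}

impl RcBytes {
    fn new(buf: Rc<Vec<u8>>) -> Self {
        let end = buf.len();
        RcBytes { buf, start: 0, end }
    }

    // Sub-slice view sharing the same backing buffer; no copying.
    fn slice(&self, start: usize, end: usize) -> Self {
        assert!(start <= end && self.start + end <= self.end);
        RcBytes {
            buf: self.buf.clone(),
            start: self.start + start,
            end: self.start + end,
        }
    }

    fn as_slice(&self) -> &[u8] {
        &self.buf[self.start..self.end]
    }
}

fn main() {
    let all = RcBytes::new(Rc::new(vec![1u8, 2, 3, 4, 5]));
    let mid = all.slice(1, 4);
    assert_eq!(mid.as_slice(), &[2, 3, 4]);
    // Two live views share one buffer: `all` plus `mid`'s Rc clone.
    assert_eq!(Rc::strong_count(&all.buf), 2);
    println!("ok");
}
```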
### Merge iterator simplification
**Optimize `MergeIter::next`** — Replaced the straightforward pop/push pattern with a `PeekMut`-based replace-top pattern, so the heap only needs to be adjusted once per iteration instead of twice.
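A sketch of the replace-top pattern on `std::collections::BinaryHeap` (the `Source`/`merge` names are illustrative, not the PR's): instead of `pop()`ing the smallest head and `push()`ing its successor (two sift operations), the top is mutated in place through `PeekMut`, which performs a single sift-down when the guard drops.

```rust
use std::cmp::Reverse;
use std::collections::binary_heap::PeekMut;
use std::collections::BinaryHeap;

// One sorted input stream of the k-way merge.
struct Source {
    items: Vec<u32>, // sorted ascending
    pos: usize,
}

fn merge(mut sources: Vec<Source>) -> Vec<u32> {
    // Reverse turns the max-heap into a min-heap on each source's head.
    let mut heap: BinaryHeap<Reverse<(u32, usize)>> = sources
        .iter()
        .enumerate()
        .filter(|(_, s)| !s.items.is_empty())
        .map(|(i, s)| Reverse((s.items[0], i)))
        .collect();
    let mut out = Vec::new();
    while let Some(mut top) = heap.peek_mut() {
        let Reverse((value, idx)) = *top;
        out.push(value);
        let src = &mut sources[idx];
        src.pos += 1;
        if src.pos < src.items.len() {
            // Replace the top in place; PeekMut's Drop restores the heap
            // invariant with one sift-down instead of a pop + push.
            *top = Reverse((src.items[src.pos], idx));
        } else {
            PeekMut::pop(top); // source exhausted: remove its heap entry
        }
    }
    out
}

fn main() {
    let merged = merge(vec![
        Source { items: vec![1, 4, 7], pos: 0 },
        Source { items: vec![2, 5], pos: 0 },
        Source { items: vec![3, 6, 8], pos: 0 },
    ]);
    assert_eq!(merged, vec![1, 2, 3, 4, 5, 6, 7, 8]);
    println!("ok");
}
```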
## Benchmark results

Compaction benchmarks (`key_8/value_4/entries_16.00Mi/commits_128`), canary baseline vs. optimized:

| Benchmark          | Baseline | Optimized | Change |
|--------------------|----------|-----------|--------|
| partial compaction | 1.985 s  | 1.545 s   | -22%   |
| full compaction    | 2.068 s  | 1.544 s   | -25%   |
On my machine, we now rarely hit 100% CPU usage during compaction (compaction is single-threaded), so the remaining time appears to be mostly IO-bound.
## Test plan

- `cargo test -p turbo-persistence` — 60/60 tests passing
- Compaction benchmarks run and compared against the canary baseline