perf(virtqueue): replace vec-based `MemPools` with bitmap-based `IndexAlloc` #2049

mkroening · 2025-11-04T08:19:03Z

This replaces the vec-based `MemPool` with a bitmap-based `IndexAlloc`. To track 256 indices, we now need 32 bytes instead of 512 bytes. I am not sure if this is worth it, though.
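
The size figures for 256 indices follow from simple arithmetic. A minimal sketch, assuming the bitmap stores one bit per index in `usize` words and the old pool stored each index as a two-byte `u16` (an assumption inferred from 512 bytes for 256 entries); `bitmap_bytes` and `vec_bytes` are hypothetical helper names:

```rust
use std::mem::size_of;

/// Bytes needed by a bitmap that tracks `len` indices with one bit each,
/// rounded up to whole `usize` words.
fn bitmap_bytes(len: usize) -> usize {
    let bits_per_word = usize::BITS as usize;
    len.div_ceil(bits_per_word) * size_of::<usize>()
}

/// Bytes needed by a vector that stores one `u16` per index.
/// Assumption: two bytes per entry, matching 512 bytes for 256 indices.
fn vec_bytes(len: usize) -> usize {
    len * size_of::<u16>()
}

fn main() {
    println!("bitmap: {} bytes", bitmap_bytes(256));
    println!("vec:    {} bytes", vec_bytes(256));
}
```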

These are measurements from my machine of creating the allocator, allocating all indices, and then deallocating them again:

len	size old	size new
256	512	32
1024	2048	64
2048	4096	128
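
For readers who want to reproduce the workload, here is a rough sketch of what such an allocator and the create, allocate-all, deallocate-all cycle might look like. The struct layout, `new`, and `deallocate` are assumptions for illustration and not the code from this PR; only the allocation loop mirrors the diff under review.

```rust
/// Number of bits in one bitmap word, as a `usize` for index arithmetic.
const USIZE_BITS: usize = usize::BITS as usize;

/// Hypothetical bitmap-based index allocator: a set bit marks an index in use.
struct IndexAlloc {
    bits: Box<[usize]>,
}

impl IndexAlloc {
    /// Creates an allocator for `len` indices, all initially free.
    /// Assumes `len` is a multiple of `USIZE_BITS` to keep the sketch short.
    fn new(len: usize) -> Self {
        assert!(len % USIZE_BITS == 0);
        Self {
            bits: vec![0; len / USIZE_BITS].into_boxed_slice(),
        }
    }

    /// Returns the lowest free index and marks it as used.
    fn allocate(&mut self) -> Option<usize> {
        for (word_index, word) in self.bits.iter_mut().enumerate() {
            let trailing_ones = word.trailing_ones();
            if trailing_ones < usize::BITS {
                // The first zero bit sits right after the trailing ones.
                *word |= 1usize << trailing_ones;
                return Some(word_index * USIZE_BITS + trailing_ones as usize);
            }
        }
        None
    }

    /// Marks `index` as free again.
    fn deallocate(&mut self, index: usize) {
        let mask = 1usize << (index % USIZE_BITS);
        debug_assert!(self.bits[index / USIZE_BITS] & mask != 0, "double free");
        self.bits[index / USIZE_BITS] &= !mask;
    }
}

fn main() {
    // Create the allocator, allocate every index, then free them all again.
    let mut alloc = IndexAlloc::new(256);
    let indices: Vec<usize> = std::iter::from_fn(|| alloc.allocate()).collect();
    assert_eq!(indices.len(), 256);
    assert_eq!(alloc.allocate(), None);
    for index in indices {
        alloc.deallocate(index);
    }
    assert_eq!(alloc.allocate(), Some(0));
}
```

Timing this cycle would need a benchmark harness on top, which is left out here.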

github-actions

Benchmark Results

Benchmark	Current: `78d301c`	Previous: `9d1e4dd`	Performance Ratio
startup_benchmark Build Time	`112.64` s	`111.98` s	`1.01`
startup_benchmark File Size	`0.91` MB	`0.91` MB	`1.00`
Startup Time - 1 core	`0.92` s (`±0.03` s)	`0.94` s (`±0.02` s)	`0.98`
Startup Time - 2 cores	`0.93` s (`±0.02` s)	`0.94` s (`±0.03` s)	`0.99`
Startup Time - 4 cores	`0.93` s (`±0.02` s)	`0.94` s (`±0.02` s)	`0.99`
multithreaded_benchmark Build Time	`113.19` s	`112.12` s	`1.01`
multithreaded_benchmark File Size	`1.02` MB	`1.02` MB	`1.00`
Multithreaded Pi Efficiency - 2 Threads	`87.29` % (`±6.22` %)	`88.00` % (`±7.30` %)	`0.99`
Multithreaded Pi Efficiency - 4 Threads	`44.24` % (`±3.39` %)	`43.95` % (`±3.32` %)	`1.01`
Multithreaded Pi Efficiency - 8 Threads	`25.40` % (`±2.24` %)	`25.25` % (`±2.26` %)	`1.01`
micro_benchmarks Build Time	`293.42` s	`315.31` s	`0.93`
micro_benchmarks File Size	`1.02` MB	`1.02` MB	`1.00`
Scheduling time - 1 thread	`166.87` ticks (`±27.41` ticks)	`181.72` ticks (`±30.16` ticks)	`0.92`
Scheduling time - 2 threads	`101.62` ticks (`±22.94` ticks)	`107.77` ticks (`±18.94` ticks)	`0.94`
Micro - Time for syscall (getpid)	`10.25` ticks (`±4.71` ticks)	`13.22` ticks (`±5.19` ticks)	`0.77`
Memcpy speed - (built_in) block size 4096	`60354.71` MByte/s (`±43217.33` MByte/s)	`55204.30` MByte/s (`±40404.17` MByte/s)	`1.09`
Memcpy speed - (built_in) block size 1048576	`13715.40` MByte/s (`±11182.02` MByte/s)	`14269.94` MByte/s (`±12083.95` MByte/s)	`0.96`
Memcpy speed - (built_in) block size 16777216	`10001.37` MByte/s (`±8088.85` MByte/s)	`7485.41` MByte/s (`±6068.00` MByte/s)	`1.34`
Memset speed - (built_in) block size 4096	`60529.57` MByte/s (`±43362.49` MByte/s)	`55494.03` MByte/s (`±40595.71` MByte/s)	`1.09`
Memset speed - (built_in) block size 1048576	`14069.07` MByte/s (`±11388.11` MByte/s)	`14670.46` MByte/s (`±12334.51` MByte/s)	`0.96`
Memset speed - (built_in) block size 16777216	`10232.46` MByte/s (`±8219.09` MByte/s)	`7599.70` MByte/s (`±6128.76` MByte/s)	`1.35`
Memcpy speed - (rust) block size 4096	`54089.57` MByte/s (`±40501.99` MByte/s)	`51056.01` MByte/s (`±38411.57` MByte/s)	`1.06`
Memcpy speed - (rust) block size 1048576	`13805.14` MByte/s (`±11305.04` MByte/s)	`13857.34` MByte/s (`±11384.86` MByte/s)	`1.00`
Memcpy speed - (rust) block size 16777216	`10006.77` MByte/s (`±8110.63` MByte/s)	`7529.44` MByte/s (`±6166.85` MByte/s)	`1.33`
Memset speed - (rust) block size 4096	`54844.52` MByte/s (`±41047.82` MByte/s)	`51991.84` MByte/s (`±39176.17` MByte/s)	`1.05`
Memset speed - (rust) block size 1048576	`14068.24` MByte/s (`±11428.28` MByte/s)	`14087.35` MByte/s (`±11505.86` MByte/s)	`1.00`
Memset speed - (rust) block size 16777216	`10263.74` MByte/s (`±8266.58` MByte/s)	`7599.50` MByte/s (`±6197.41` MByte/s)	`1.35`
alloc_benchmarks Build Time	`293.71` s	`312.10` s	`0.94`
alloc_benchmarks File Size	`0.98` MB	`0.98` MB	`1.00`
Allocations - Allocation success	`100.00` %	`100.00` %	`1`
Allocations - Deallocation success	`100.00` %	`100.00` %	`1`
Allocations - Pre-fail Allocations	`100.00` %	`100.00` %	`1`
Allocations - Average Allocation time	`20171.66` Ticks (`±975.73` Ticks)	`20044.74` Ticks (`±1076.26` Ticks)	`1.01`
Allocations - Average Allocation time (no fail)	`20171.66` Ticks (`±975.73` Ticks)	`20044.74` Ticks (`±1076.26` Ticks)	`1.01`
Allocations - Average Deallocation time	`2914.05` Ticks (`±1268.86` Ticks)	`2980.59` Ticks (`±1256.07` Ticks)	`0.98`
mutex_benchmark Build Time	`292.81` s	`296.51` s	`0.99`
mutex_benchmark File Size	`1.02` MB	`1.02` MB	`1.00`
Mutex Stress Test Average Time per Iteration - 1 Threads	`36.14` ns (`±3.33` ns)	`36.10` ns (`±4.90` ns)	`1.00`
Mutex Stress Test Average Time per Iteration - 2 Threads	`30.00` ns (`±3.13` ns)	`29.58` ns (`±2.65` ns)	`1.01`

This comment was automatically generated by workflow using github-action-benchmark.

src/drivers/virtio/virtqueue/mod.rs

cagatay-y · 2025-11-06T16:24:15Z

+			for (word_index, word) in self.bits.iter_mut().enumerate() {
+				let trailing_ones = word.trailing_ones();
+				if trailing_ones < usize::BITS {
+					let mask = 1 << trailing_ones;
+					*word |= mask;
+					let index = word_index * USIZE_BITS + usize::try_from(trailing_ones).unwrap();
+					return Some(index);
+				}
+			}
+
+			None


Suggested change

-			for (word_index, word) in self.bits.iter_mut().enumerate() {
-				let trailing_ones = word.trailing_ones();
-				if trailing_ones < usize::BITS {
-					let mask = 1 << trailing_ones;
-					*word |= mask;
-					let index = word_index * USIZE_BITS + usize::try_from(trailing_ones).unwrap();
-					return Some(index);
-				}
-			}
-
-			None
+			let (word_index, trailing_ones) = self
+				.bits
+				.iter()
+				.copied()
+				.map(usize::trailing_ones)
+				.enumerate()
+				.find(|(_, trailing_ones)| *trailing_ones < usize::BITS)?;
+			let mask = 1 << trailing_ones;
+			self.bits[word_index] |= mask;
+			let index = word_index * USIZE_BITS + usize::try_from(trailing_ones).unwrap();
+			Some(index)

I am not sure if it would be an improvement but wanted to offer it as an option. It would save us from some nesting.

mkroening · 2025-11-06

Interesting! I have looked into this, and the compiler fails to optimize away the bounds check when setting the bit. Also, maybe because the trailing-ones calculation is now too far away, the compiler no longer optimizes the masking from `shl` and `or` to `bts`.

For details, see Compiler Explorer.

So I'd keep it as is, even though the performance difference is small, of course (about 5%). :D
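
To check that the two variants behave the same apart from code generation, both can be run against identical bitmaps. This is a standalone sketch: the free functions `allocate_loop` and `allocate_find` and their slice parameter are hypothetical stand-ins for the method operating on `self.bits`. It does not measure the roughly 5% difference or the `bts` lowering, which depend on the compiler and target.

```rust
use std::convert::TryFrom;

const USIZE_BITS: usize = usize::BITS as usize;

/// Loop-based variant, following the diff under review.
fn allocate_loop(bits: &mut [usize]) -> Option<usize> {
    for (word_index, word) in bits.iter_mut().enumerate() {
        let trailing_ones = word.trailing_ones();
        if trailing_ones < usize::BITS {
            *word |= 1usize << trailing_ones;
            return Some(word_index * USIZE_BITS + usize::try_from(trailing_ones).unwrap());
        }
    }
    None
}

/// Iterator-chain variant, following the suggested change.
fn allocate_find(bits: &mut [usize]) -> Option<usize> {
    let (word_index, trailing_ones) = bits
        .iter()
        .copied()
        .map(usize::trailing_ones)
        .enumerate()
        .find(|(_, trailing_ones)| *trailing_ones < usize::BITS)?;
    // This indexing is where the bounds check mentioned above comes from.
    bits[word_index] |= 1usize << trailing_ones;
    Some(word_index * USIZE_BITS + usize::try_from(trailing_ones).unwrap())
}

fn main() {
    // First word full, second partially used, third empty.
    let mut a = [usize::MAX, 0b1011, 0];
    let mut b = a;
    loop {
        let (x, y) = (allocate_loop(&mut a), allocate_find(&mut b));
        assert_eq!(x, y);
        assert_eq!(a, b);
        if x.is_none() {
            break;
        }
    }
}
```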

mkroening self-assigned this Nov 4, 2025

mkroening changed the title ~~perf(virtqueue): remove unused MemPool::limit field~~ perf(virtqueue): replace vec-based MemPools with bitmap-based IndexAlloc Nov 4, 2025

mkroening force-pushed the mempool-bitvec branch 2 times, most recently from 7320231 to fc37370 Compare November 4, 2025 08:53

mkroening marked this pull request as ready for review November 4, 2025 08:54

mkroening requested review from Gelbpunkt and cagatay-y and removed request for Gelbpunkt November 4, 2025 08:54

github-actions bot reviewed Nov 4, 2025

mkroening marked this pull request as draft November 4, 2025 13:05

mkroening force-pushed the mempool-bitvec branch 2 times, most recently from e3958f3 to 34f9060 Compare November 4, 2025 18:06

cagatay-y approved these changes Nov 6, 2025

perf(virtqueue): remove unused MemPool::limit field

a52430a

mkroening force-pushed the mempool-bitvec branch from 34f9060 to a52430a Compare November 6, 2025 16:41

perf(virtqueue): replace vec-based MemPools with bitmap-based `IndexAlloc`

78d301c

mkroening marked this pull request as ready for review November 6, 2025 17:05
