perf(pvirtq): combine `u15` offset and `u1` wrap counter in more places #2047

mkroening · 2025-11-03T17:20:34Z

This PR uses the combined offset + wrap bitfield in more places, saving one byte per index. The performance improvement might not be measurable, but combining these related fields also makes the code more readable.

Depends on #2046.
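For readers unfamiliar with the idea, here is a minimal sketch of what combining a 15-bit offset and a 1-bit wrap counter in a single `u16` can look like. The type name `PackedIdx`, its methods, and the bit layout (offset in bits 0 to 14, wrap counter in bit 15) are assumptions made for illustration; they are not taken from this PR's diff.

```rust
/// Hypothetical packed ring index (illustration only, not the PR's type).
/// Assumed layout: bits 0..=14 hold the offset, bit 15 holds the wrap counter.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct PackedIdx(u16);

impl PackedIdx {
    const OFF_MASK: u16 = 0x7fff;
    const WRAP_BIT: u16 = 1 << 15;

    /// Pack an offset and a wrap counter into one `u16`.
    fn new(off: u16, wrap: bool) -> Self {
        debug_assert!(off <= Self::OFF_MASK);
        Self((off & Self::OFF_MASK) | ((wrap as u16) << 15))
    }

    /// Extract the 15-bit offset.
    fn off(self) -> u16 {
        self.0 & Self::OFF_MASK
    }

    /// Extract the 1-bit wrap counter.
    fn wrap(self) -> bool {
        self.0 & Self::WRAP_BIT != 0
    }

    /// Advance by one slot in a ring of `size` entries.
    /// The wrap counter toggles whenever the offset wraps back to zero.
    fn next(self, size: u16) -> Self {
        let off = self.off() + 1;
        if off >= size {
            Self::new(0, !self.wrap())
        } else {
            Self::new(off, self.wrap())
        }
    }
}
```

With a ring of four entries, `PackedIdx::new(3, false).next(4)` yields offset `0` with the wrap counter set, and the whole index fits in two bytes.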

github-actions

Benchmark Results

| Benchmark | Current: `57f95d4` | Previous: `9d1e4dd` | Performance Ratio |
| --- | --- | --- | --- |
| startup_benchmark Build Time | `114.72` s | `111.98` s | `1.02` |
| startup_benchmark File Size | `0.91` MB | `0.91` MB | `1.00` |
| Startup Time - 1 core | `0.94` s (`±0.02` s) | `0.94` s (`±0.02` s) | `1.00` |
| Startup Time - 2 cores | `0.94` s (`±0.02` s) | `0.94` s (`±0.03` s) | `1.00` |
| Startup Time - 4 cores | `0.94` s (`±0.03` s) | `0.94` s (`±0.02` s) | `1.00` |
| multithreaded_benchmark Build Time | `115.29` s | `112.12` s | `1.03` |
| multithreaded_benchmark File Size | `1.02` MB | `1.02` MB | `1.00` |
| Multithreaded Pi Efficiency - 2 Threads | `89.87` % (`±7.80` %) | `88.00` % (`±7.30` %) | `1.02` |
| Multithreaded Pi Efficiency - 4 Threads | `45.54` % (`±3.47` %) | `43.95` % (`±3.32` %) | `1.04` |
| Multithreaded Pi Efficiency - 8 Threads | `25.92` % (`±2.62` %) | `25.25` % (`±2.26` %) | `1.03` |
| micro_benchmarks Build Time | `294.80` s | `315.31` s | `0.93` |
| micro_benchmarks File Size | `1.02` MB | `1.02` MB | `1.00` |
| Scheduling time - 1 thread | `187.76` ticks (`±30.58` ticks) | `181.72` ticks (`±30.16` ticks) | `1.03` |
| Scheduling time - 2 threads | `103.66` ticks (`±21.44` ticks) | `107.77` ticks (`±18.94` ticks) | `0.96` |
| Micro - Time for syscall (getpid) | `10.81` ticks (`±6.33` ticks) | `13.22` ticks (`±5.19` ticks) | `0.82` |
| Memcpy speed - (built_in) block size 4096 | `58059.52` MByte/s (`±41478.23` MByte/s) | `55204.30` MByte/s (`±40404.17` MByte/s) | `1.05` |
| Memcpy speed - (built_in) block size 1048576 | `13798.77` MByte/s (`±11407.26` MByte/s) | `14269.94` MByte/s (`±12083.95` MByte/s) | `0.97` |
| Memcpy speed - (built_in) block size 16777216 | `8440.61` MByte/s (`±6978.73` MByte/s) | `7485.41` MByte/s (`±6068.00` MByte/s) | `1.13` |
| Memset speed - (built_in) block size 4096 | `57061.47` MByte/s (`±41206.31` MByte/s) | `55494.03` MByte/s (`±40595.71` MByte/s) | `1.03` |
| Memset speed - (built_in) block size 1048576 | `14212.20` MByte/s (`±11676.23` MByte/s) | `14670.46` MByte/s (`±12334.51` MByte/s) | `0.97` |
| Memset speed - (built_in) block size 16777216 | `8647.22` MByte/s (`±7111.15` MByte/s) | `7599.70` MByte/s (`±6128.76` MByte/s) | `1.14` |
| Memcpy speed - (rust) block size 4096 | `48221.83` MByte/s (`±35963.38` MByte/s) | `51056.01` MByte/s (`±38411.57` MByte/s) | `0.94` |
| Memcpy speed - (rust) block size 1048576 | `13676.87` MByte/s (`±11230.07` MByte/s) | `13857.34` MByte/s (`±11384.86` MByte/s) | `0.99` |
| Memcpy speed - (rust) block size 16777216 | `8183.51` MByte/s (`±6689.26` MByte/s) | `7529.44` MByte/s (`±6166.85` MByte/s) | `1.09` |
| Memset speed - (rust) block size 4096 | `49285.30` MByte/s (`±36845.92` MByte/s) | `51991.84` MByte/s (`±39176.17` MByte/s) | `0.95` |
| Memset speed - (rust) block size 1048576 | `13969.29` MByte/s (`±11407.85` MByte/s) | `14087.35` MByte/s (`±11505.86` MByte/s) | `0.99` |
| Memset speed - (rust) block size 16777216 | `8404.33` MByte/s (`±6846.10` MByte/s) | `7599.50` MByte/s (`±6197.41` MByte/s) | `1.11` |
| alloc_benchmarks Build Time | `295.19` s | `312.10` s | `0.95` |
| alloc_benchmarks File Size | `0.98` MB | `0.98` MB | `1.00` |
| Allocations - Allocation success | `100.00` % | `100.00` % | `1` |
| Allocations - Deallocation success | `100.00` % | `100.00` % | `1` |
| Allocations - Pre-fail Allocations | `100.00` % | `100.00` % | `1` |
| Allocations - Average Allocation time | `19837.35` Ticks (`±1087.80` Ticks) | `20044.74` Ticks (`±1076.26` Ticks) | `0.99` |
| Allocations - Average Allocation time (no fail) | `19837.35` Ticks (`±1087.80` Ticks) | `20044.74` Ticks (`±1076.26` Ticks) | `0.99` |
| Allocations - Average Deallocation time | `2869.62` Ticks (`±1189.04` Ticks) | `2980.59` Ticks (`±1256.07` Ticks) | `0.96` |
| mutex_benchmark Build Time | `293.94` s | `296.51` s | `0.99` |
| mutex_benchmark File Size | `1.02` MB | `1.02` MB | `1.00` |
| Mutex Stress Test Average Time per Iteration - 1 Threads | `36.78` ns (`±4.34` ns) | `36.10` ns (`±4.90` ns) | `1.02` |
| Mutex Stress Test Average Time per Iteration - 2 Threads | `31.70` ns (`±3.53` ns) | `29.58` ns (`±2.65` ns) | `1.07` |

This comment was automatically generated by workflow using github-action-benchmark.

Gelbpunkt

I'm not sure about the performance here: bitfields should generally have worse access times because every access needs bit shifts, as opposed to doing the shifts once and holding the values in an intermediate struct. I don't really mind either way, but with this version it at least feels like we're not reinventing virtio-spec.
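To make the trade-off concrete, the following sketch shows the "decode once" alternative: a raw `u16` is unpacked into an intermediate struct whose fields can then be read without further masking or shifting. The names `Unpacked`, `decode`, `encode`, and `sizes`, as well as the bit layout (offset in bits 0 to 14, wrap counter in bit 15), are hypothetical and chosen only for illustration; this is not the code under review.

```rust
/// Hypothetical intermediate struct (illustration only).
/// Fields are decoded once and afterwards read without masks or shifts.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Unpacked {
    off: u16,
    wrap: bool,
}

/// Decode a raw `u16`, assuming bits 0..=14 are the offset and bit 15 is the wrap counter.
fn decode(raw: u16) -> Unpacked {
    Unpacked {
        off: raw & 0x7fff,
        wrap: raw >> 15 != 0,
    }
}

/// Encode back into the raw representation: one mask, one shift, one OR.
fn encode(idx: Unpacked) -> u16 {
    (idx.off & 0x7fff) | ((idx.wrap as u16) << 15)
}

/// Sizes in bytes of the unpacked struct and of the raw packed `u16`.
fn sizes() -> (usize, usize) {
    (std::mem::size_of::<Unpacked>(), std::mem::size_of::<u16>())
}
```

Taken on its own, the unpacked struct occupies four bytes because of alignment padding, while the packed form occupies two. The packed form instead pays a mask or shift on each field access. Whether either cost matters in practice is something only a benchmark of the real code paths can show.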

mkroening self-assigned this Nov 3, 2025

mkroening force-pushed the pvirtq-ring-idx branch from 360b927 to a89d96c Compare November 3, 2025 17:39

mkroening changed the title ~~perf(pvirtq): use bitfields for RingIdx~~ perf(pvirtq): combine u15 offset and u1 wrap counter in more places Nov 3, 2025

github-actions bot reviewed Nov 3, 2025

mkroening marked this pull request as ready for review November 4, 2025 08:54

mkroening requested review from Gelbpunkt and cagatay-y November 4, 2025 08:54

Gelbpunkt approved these changes Nov 6, 2025

mkroening added 3 commits November 6, 2025 17:39

perf(pvirtq): use bitfields for RingIdx

b20103e

perf(pvirtq): use bitfields for DescriptorRing::write_index and `drv_wc`

ca2d433

perf(pvirtq): use bitfields for DescriptorRing::poll_index and `dev_wc`

57f95d4

mkroening force-pushed the pvirtq-ring-idx branch from 96ca7e8 to b20103e Compare November 6, 2025 16:39
