Simplify Thumb context switch by folding Thumb bit into adr offset#328
Simplify Thumb context switch by folding Thumb bit into adr offset#328
Conversation
Use 'adr r2, 0f + 1' instead of separate 'adr r2, 0f' + 'adds r2, #1' to set the Thumb mode LSB in the return address.
📝 WalkthroughWalkthroughThe PR consolidates the Thumb return-address calculation in the switchContext assembly routine by using a single Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/coro/coroutines.zig`:
- Around line 440-441: Update the PR description and any comment near the
assembly in src/coro/coroutines.zig to state that replacing "adr r2, 0f; adds
r2, `#1`" with "adr r2, 0f + 1" reduces the instruction count by one but does not
reduce assembled size (both encode to 4 bytes because the single form uses
Thumb2 T3 encoding). Reference the exact symbols "adr r2, 0f + 1", "adr r2, 0f",
and "adds r2, `#1`" when fixing the text and add a short compatibility note that
"adr ... + 1" requires Thumb2 (ARMv7+) and will not assemble on Thumb‑1/ARMv6‑M
(Cortex‑M0/M0+) targets if you want to document portability considerations.
| // Calculate return address with Thumb bit (LSB=1) set via adr offset | ||
| \\ adr r2, 0f + 1 |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, locate and examine build.zig to understand target configuration
find . -name "build.zig" -type f | head -5Repository: lalinsky/zio
Length of output: 67
🏁 Script executed:
# Check the actual lines 440-441 in the file to confirm the context
wc -l src/coro/coroutines.zigRepository: lalinsky/zio
Length of output: 84
🏁 Script executed:
# Examine lines around 440-441 and the .balign directive mentioned
sed -n '435,465p' src/coro/coroutines.zigRepository: lalinsky/zio
Length of output: 974
🏁 Script executed:
# Search for ARM/Thumb target declarations in build files and config
rg -i "(armv6|cortex-m|thumb|arm)" --type zig --type toml -C2 | head -100Repository: lalinsky/zio
Length of output: 5260
🏁 Script executed:
# Check if cross-compilation tools are available and usable
which arm-linux-gnueabihf-as arm-linux-gnueabihf-objdump 2>&1 || echo "Tools not available"Repository: lalinsky/zio
Length of output: 119
🏁 Script executed:
# Examine build.zig to understand CPU/target configuration
head -100 build.zigRepository: lalinsky/zio
Length of output: 3834
🏁 Script executed:
# Search for cpu/target related code in build.zig
rg -n "cpu|target|arm|thumb" build.zig -A2 -B2 | head -150Repository: lalinsky/zio
Length of output: 904
Code-size claim in PR is misleading: the change saves 1 instruction, not 2 bytes
The adr r2, 0f + 1 instruction requires Thumb2 T3 encoding (32-bit, 4 bytes) because the offset is misaligned. The old sequence was adr r2, 0f (T1, 2 bytes) + adds r2, #1`` (T1, 2 bytes) = 4 bytes total. Assembled byte count is identical. The instruction count drops from 2 to 1, which is real, but the assembled size does not shrink. The PR description's "2 bytes saved" is incorrect.
Side note: T3 encoding requires Thumb2 (ARMv7+), so this code would fail to assemble on ARMv6-M targets (Cortex-M0/M0+, Thumb-1 only). The zio project doesn't declare any ARMv6-M targets and is clearly Linux-focused, so this isn't a practical concern—but worth knowing if someone tries to port it to bare-metal Cortex-M0 later.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/coro/coroutines.zig` around lines 440 - 441, Update the PR description
and any comment near the assembly in src/coro/coroutines.zig to state that
replacing "adr r2, 0f; adds r2, `#1`" with "adr r2, 0f + 1" reduces the
instruction count by one but does not reduce assembled size (both encode to 4
bytes because the single form uses Thumb2 T3 encoding). Reference the exact
symbols "adr r2, 0f + 1", "adr r2, 0f", and "adds r2, `#1`" when fixing the text
and add a short compatibility note that "adr ... + 1" requires Thumb2 (ARMv7+)
and will not assemble on Thumb‑1/ARMv6‑M (Cortex‑M0/M0+) targets if you want to
document portability considerations.
There was a problem hiding this comment.
Hmmm, so for cortex-m0 we needs the adds?
There was a problem hiding this comment.
@lalinsky Yes, exactly. Cortex-M0/M0+ (ARMv6-M) only support the Thumb-1 instruction set, which doesn't have the T3 encoding needed for adr with a misaligned offset like 0f + 1. So on those processors, you'd need the two-instruction sequence: adr r2, 0f (Thumb-1 T1 encoding) followed by adds r2, #1``.
However, this isn't relevant for your project. The zio codebase targets Linux on ARM (32-bit ARMv7+ with Thumb2 support), not bare-metal ARMv6-M microcontrollers. The single-instruction form works fine for all your target platforms and does reduce instruction count by 1 (even though the byte count stays at 4 bytes due to the encoding).
The main correction from my previous comment is just about the PR description claiming "2 bytes saved" when the actual benefit is reducing instruction count, not code size.
🧠 Learnings used
Learnt from: lalinsky
Repo: lalinsky/zio PR: 48
File: src/core/ConcurrentAwaitableList.zig:254-269
Timestamp: 2025-10-16T14:33:19.718Z
Learning: In the zio repository's `src/core/ConcurrentAwaitableList.zig` and `src/core/Awaitable.zig`, the `in_list` field should remain debug-only (conditional compilation with `builtin.mode == .Debug`). Double-removal detection in `ConcurrentAwaitableList.remove()` uses pointer-based checks (comparing `prev`/`next` against `head`/`tail`) rather than relying on a regular `in_list` field.
Learnt from: lalinsky
Repo: lalinsky/zio PR: 250
File: src/utils/simple_stack.zig:10-14
Timestamp: 2026-01-29T05:49:14.622Z
Learning: In the zio repository, the `in_list` field in node structs (e.g., src/utils/simple_stack.zig, src/utils/simple_queue.zig, src/runtime/awaitable.zig, etc.) should use `std.debug.runtime_safety` for the conditional type, not `builtin.mode == .Debug`. This ensures the field exists in both Debug and ReleaseSafe modes (where runtime safety checks are enabled) and is void only in ReleaseFast and ReleaseSmall builds.
Learnt from: lalinsky
Repo: lalinsky/zio PR: 118
File: src/core/task.zig:33-115
Timestamp: 2025-11-04T10:29:56.949Z
Learning: In zio's task implementation (src/core/task.zig), the Closure struct intentionally caps result_len and context_len at 4096 bytes (using u12 fields) and alignment at 16 bytes (using u4 padding fields). This is a deliberate design decision to avoid bloating task structures. Users needing larger result or context storage should heap-allocate and pass pointers instead. These limits are more generous than std lib's fixed storage (64 bytes for results, 1024 bytes for args).
<!-- [add_learning]
The task storage limits in zio (4 KiB for result/context, 16-byte alignment) are intentional design constraints to prevent task bloat, not correctness issues. The maintainer's position is that larger data should be heap-allocated.
Summary
adr r2, 0f + 1instead of separateadr r2, 0f+adds r2, #1to set the Thumb mode LSB in the return addressTest plan
thumb-linux