ed25519: {x86_64, aarch64}: codegen required s2n-bignum assembly #74

phlip9 · 2025-04-19T19:00:52Z

I might have gotten a bit over excited and started implementing ed25519 support (#73) 😅. I'm pretty excited about graviola -- it would seriously simplify the build issues and reduce the compile overhead we experience with aws-lc-rs and (patched) ring while yielding aws-lc-level perf. Our internal CA/TLS uses ed25519 certs exclusively, so ed25519 support is a requirement.

My plan is to spend the next several weekends implementing ed25519 support in graviola, if you'll let me. If you're not comfortable with that, I totally understand. Hopefully this PR can at least get things started.

This PR adds Rust codegen for all s2n-bignum assembly subroutines required to support ed25519 gen/sign/verify:

edwards25519_decode: decodes a 32B compressed edwards curve point. also tests for canonicality.
- verify: deserialize the pubkey
edwards25519_scalarmulbase: computes P := [n]B, where n is a scalar and B is the basepoint.
- gen: compute our pubkey A := [s]B
- sign: compute R := [r]B
edwards25519_scalarmuldouble: computes P' := [n]P + [m]B, where n and m are scalars, B is the basepoint, and P is an edwards curve point.
- verify: compute R' := [S]B + [k](-A')
bignum_mod_n25519: modular reduction mod L, the order of the edwards25519 basepoint.
- sign+verify: mod reduce r and k scalars
bignum_madd_n25519: computes z := x * y + c (mod L)
- sign: compute S := r + k * s (mod L)
bignum_neg_p25519: computes z := -x (mod 2^255-19)
- verify: compute -A' := (-(A'_X), A'_Y)

You can see how aws-lc uses these in curve25519.c and curve25519_s2n_bignum_asm.c.

There's also edwards25519_encode (compress+serialize an edwards point), but s2n-bignum's arm impl is fairly pessimistic so as to be endian independent: (https://github.com/awslabs/s2n-bignum/blob/main/arm/curve25519/edwards25519_encode.S#L61-L123). I wrote this one in Rust to avoid that.

Testing

There's a basic sanity check test for edwards25519_decode to at least convince myself that the codegen works, but it's probably higher leverage to test the mid level ed25519 impl, so I've left further testing for later.

Generated assembly

I took a a pass auditing the codegen'd inline assembly. As far as I can tell, everything looks like it translated over correctly -- no missing clobbers, instructions, hoisting looks ok, etc.

The only funny thing I spotted is the somewhat wasteful 16 KiB page aligned constant tables in the generated aarch64 code. Guess that's something to optimize later :)

ctz · 2025-04-19T19:14:10Z

The only funny thing I spotted is the somewhat wasteful 16 KiB page aligned constant tables in the generated aarch64 code. Guess that's something to optimize later :)

AIUI this alignment needs to be the highest in common use for a given architecture. As it currently stands 16KB pages are needed on macos, but theoretically the architecture supports 64KB pages. But it was kind of unclear to me whether adrp actually respects the page size, or is a remnant of the every-page-is-4KB history.

phlip9 · 2025-04-20T22:50:50Z

AIUI this alignment needs to be the highest in common use for a given architecture. As it currently stands 16KB pages are needed on macos, but theoretically the architecture supports 64KB pages. But it was kind of unclear to me whether adrp actually respects the page size, or is a remnant of the every-page-is-4KB history.

I did a little experimental validation and can confirm adrp is not special on Apple silicon and works with 4KiB alignment. Apple Silicon CPU Optimization Guide indirectly corroborates this:

2.8.3 Long Address Generation
The Arm ISA requires a two-step instruction sequence to generate long distance PC-
relative addresses. ADRP adds a 21-bit signed immediate value to the page address
(sometimes referred to as page number) of the PC while clearing the bottom 12 bits.
This allows an address to be generated +/- 4 GiB from top of the current PC page.
ADRP is often followed by an ADD instruction with a page offset as an immediate value to
generate a full address. In many cases, the assembly will include a name instead of the
immediate values, which will be converted to immediate values by the linker.
000000000000132c ADRP X0, #0x84 ; 0x84000 + 0x001000 = 0x85000
0000000000001330 ADD X0, X0, #0xa00 ; 0x85000 + 0x000a00 = 0x85a00

So looks like just ADRP + 4KiB-aligned tables should work.

There's also ADRP+ADD which doesn't require any alignment:

; aarch64-apple-darwin
adrp     x0, .my_label@PAGE
add      x0, x0, .my_label@PAGEOFF

; aarch64-unknown-linux-gnu
adrp     x0, .my_label
add      x0, x0, :lo12:.my_label

ctz · 2025-04-23T18:02:11Z

Ah, that's good to know - thanks for looking into that. I know that s2n-bignum upstream is working on obviating all pain in this area, so hopefully that ends up in a situation where we don't need to rewrite any assembly soon.

I will keep tabs on this PR!

codspeed-hq · 2025-04-23T18:05:59Z

CodSpeed Performance Report

Merging #74 will not alter performance

_{Comparing phlip9:phlip9/ed25519-low-codegen (c4a7cd8) with main (a0f733b)}

Summary

✅ 134 untouched benchmarks

phlip9 · 2025-04-23T19:39:26Z

Hmm looks like some low::aarch64::edwards25519_decode label references didn't get wired up correctly. I can at least manually edit to fix compilation. Will need to take a look at the driver later

diff --git a/graviola/src/low/aarch64/edwards25519_decode.rs b/graviola/src/low/aarch64/edwards25519_decode.rs
index 5f2fbd1f..0dc9bc81 100644
--- a/graviola/src/low/aarch64/edwards25519_decode.rs
+++ b/graviola/src/low/aarch64/edwards25519_decode.rs
@@ -91,7 +91,7 @@ macro_rules! mulp {
         "add x0, " $dest ";\n"
         "add x1, " $src1 ";\n"
         "add x2, " $src2 ";\n"
-        "bl " Label!("edwards25519_decode_alt_mul_p25519", 0, Before)
+        "bl " Label!("edwards25519_decode_alt_mul_p25519", 3, After)
     )}
 }

@@ -100,7 +100,7 @@ macro_rules! nsqr {
         "add x0, " $dest ";\n"
         "mov x1, " $n ";\n"
         "add x2, " $src ";\n"
-        "bl " Label!("edwards25519_decode_alt_nsqr_p25519", 0, Before)
+        "bl " Label!("edwards25519_decode_alt_nsqr_p25519", 4, After)
     )}
 }

ctz · 2025-04-24T09:45:55Z

Hmm looks like some low::aarch64::edwards25519_decode label references didn't get wired up correctly. I can at least manually edit to fix compilation. Will need to take a look at the driver later

Ah, there is a false assumption here that jumps only happen in the function body; not in a macro. Assuming there is one solution to resolving the label direction, we could fix this by finding that in a later pass and using it when outputting the macro. Alternatively we could expand macros that contain jumps.

phlip9 · 2025-05-11T01:26:26Z

Rebased and added a "macro fixup" pass that runs on the generated Rust code if there are any macro definitions that reference labels.

codecov · 2025-05-17T10:49:18Z

Codecov Report

❌ Patch coverage is 99.91327% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 99.46%. Comparing base (a0f733b) to head (c4a7cd8).
⚠️ Report is 59 commits behind head on main.

Files with missing lines	Patch %	Lines
graviola/src/mid/ed25519.rs	99.53%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #74      +/-   ##
==========================================
+ Coverage   99.31%   99.46%   +0.14%     
==========================================
  Files         168      181      +13     
  Lines       40269    51529   +11260     
==========================================
+ Hits        39994    51252   +11258     
- Misses        275      277       +2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Ex: `#define foo() bl edwards25519_decode_alt_mul_p25519` Before this change, the generated rust macro didn't wire up the local label reference properly: ```rust macro_rules! foo { () => { Q!( "bl " Label!("edwards25519_decode_alt_mul_p25519", 0, Before) )} } ``` After this change: ```rust macro_rules! foo { () => { Q!( "bl " Label!("edwards25519_decode_alt_mul_p25519", 4, After) )} } ``` To support this, we now track which macros reference labels. We then track which blocks call these macros. As long as all macro callsites are uniformly before or after the labels they reference, we can safely replace the local label id and search direction in the original macro definition. It was challenging to track macro callsites in the main `RustFormatter` and also difficult to "pre-locate" callsites while dealing with hoisting, so the actual macro fixing happens on the generated Rust code from the `RustFormatter` pass.

phlip9 · 2025-11-25T21:15:34Z

Replaced by #120

phlip9 force-pushed the phlip9/ed25519-low-codegen branch from 8b1f482 to 4fb676b Compare May 11, 2025 01:23

phlip9 mentioned this pull request May 17, 2025

driver.py: aarch64: constants only need 4 KiB alignment #87

Merged

phlip9 added 16 commits May 18, 2025 18:44

ed25519: x86_64: edwards25519_decode codegen

1e43d72

ed25519: aarch64: edwards25519_decode codegen

21341e3

ed25519: tiny test for edwards25519_decode to sanity check

bd6b42f

ed25519: add edwards25519_encode in Rust

f3b25fd

ed25519: x86_64: edwards25519_scalarmulbase codegen

d3bc7f3

ed25519: aarch64: edwards25519_scalarmulbase codegen

ddcc040

ed25519: x86_64: bignum_mod_n25519 codegen

77ab7bb

ed25519: aarch64: bignum_mod_n25519 codegen

bee0950

ed25519: x86_64: bignum_madd_n25519 codegen

8daac11

ed25519: aarch64: bignum_madd_n25519 codegen

d12407e

ed25519: x86_64: bignum_neg_p25519 codegen

f0ff95b

ed25519: aarch64: bignum_neg_p25519 codegen

4bc5c8c

ed25519: x86_64: edwards25519_scalarmuldouble codegen

322086f

ed25519: aarch64: edwards25519_scalarmuldouble codegen

6985629

WIP

06e69fd

phlip9 force-pushed the phlip9/ed25519-low-codegen branch from 4ec2dd5 to 06e69fd Compare May 19, 2025 10:03

phlip9 added 2 commits May 19, 2025 19:35

WIP

38a579d

WIP

c4a7cd8

ctz mentioned this pull request Nov 15, 2025

Support RFC5958 PKCS#8 v2 private keys #117

Merged

ctz mentioned this pull request Nov 23, 2025

Ed25519 support #120

Draft

2 tasks

phlip9 closed this Nov 25, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ed25519: {x86_64, aarch64}: codegen required s2n-bignum assembly #74

ed25519: {x86_64, aarch64}: codegen required s2n-bignum assembly #74

Uh oh!

phlip9 commented Apr 19, 2025 •

edited

Loading

Uh oh!

ctz commented Apr 19, 2025 •

edited

Loading

Uh oh!

phlip9 commented Apr 20, 2025

Uh oh!

ctz commented Apr 23, 2025 •

edited

Loading

Uh oh!

codspeed-hq bot commented Apr 23, 2025 •

edited

Loading

Uh oh!

phlip9 commented Apr 23, 2025 •

edited

Loading

Uh oh!

ctz commented Apr 24, 2025

Uh oh!

phlip9 commented May 11, 2025

Uh oh!

codecov bot commented May 17, 2025 •

edited

Loading

Uh oh!

phlip9 commented Nov 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ed25519: {x86_64, aarch64}: codegen required s2n-bignum assembly #74

ed25519: {x86_64, aarch64}: codegen required s2n-bignum assembly #74

Uh oh!

Conversation

phlip9 commented Apr 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Testing

Generated assembly

Uh oh!

ctz commented Apr 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

phlip9 commented Apr 20, 2025

Uh oh!

ctz commented Apr 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codspeed-hq bot commented Apr 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CodSpeed Performance Report

Merging #74 will not alter performance

Summary

Uh oh!

phlip9 commented Apr 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ctz commented Apr 24, 2025

Uh oh!

phlip9 commented May 11, 2025

Uh oh!

codecov bot commented May 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

phlip9 commented Nov 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

phlip9 commented Apr 19, 2025 •

edited

Loading

ctz commented Apr 19, 2025 •

edited

Loading

ctz commented Apr 23, 2025 •

edited

Loading

codspeed-hq bot commented Apr 23, 2025 •

edited

Loading

phlip9 commented Apr 23, 2025 •

edited

Loading

codecov bot commented May 17, 2025 •

edited

Loading