ed25519: {x86_64, aarch64}: codegen required s2n-bignum assembly #74
AIUI this alignment needs to be the highest in common use for a given architecture. As it currently stands, 16 KiB pages are needed on macOS, but theoretically the architecture supports 64 KiB pages. But it was kind of unclear to me whether
I did a little experimental validation and can confirm
So it looks like just ADRP + 4 KiB-aligned tables should work. There's also ADRP + ADD, which doesn't require any alignment:

```asm
// aarch64-apple-darwin
adrp x0, .my_label@PAGE
add x0, x0, .my_label@PAGEOFF

// aarch64-unknown-linux-gnu
adrp x0, .my_label
add x0, x0, :lo12:.my_label
```
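To make the alignment argument concrete, here is a small arithmetic sketch. The function names are hypothetical, not part of graviola. It assumes the architectural behaviour of ADRP, which always works in 4 KiB units regardless of the OS page size: it yields the label's address with the low 12 bits cleared, and `@PAGEOFF` / `:lo12:` supply exactly those low 12 bits. A 4 KiB-aligned table therefore has a zero low part, so ADRP alone reaches it, while ADRP + ADD reconstructs any address.

```rust
/// ADRP always addresses 4 KiB "pages", independent of the OS page size (16 KiB, 64 KiB, ...).
const ADRP_PAGE: u64 = 4096;

/// What `adrp xN, label` materialises: the label's address with the low 12 bits cleared.
fn adrp_page(label_addr: u64) -> u64 {
    label_addr & !(ADRP_PAGE - 1)
}

/// What `label@PAGEOFF` (Mach-O) or `:lo12:label` (ELF) supplies: the low 12 bits.
fn lo12(label_addr: u64) -> u64 {
    label_addr & (ADRP_PAGE - 1)
}

/// The address produced by the ADRP + ADD pair: page base plus in-page offset.
fn adrp_add(label_addr: u64) -> u64 {
    adrp_page(label_addr) + lo12(label_addr)
}
```

For a table at `0x4000_1000` (4 KiB-aligned), `lo12` is zero and `adrp_page` already equals the table address; for an arbitrary label, only the ADRP + ADD pair gets the exact address.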
Ah, that's good to know, thanks for looking into that. I know that s2n-bignum upstream is working on removing the pain in this area, so hopefully that ends up in a situation where we don't need to rewrite any assembly soon. I will keep tabs on this PR!
CodSpeed Performance Report: merging #74 will not alter performance.
Hmm, looks like some

```diff
diff --git a/graviola/src/low/aarch64/edwards25519_decode.rs b/graviola/src/low/aarch64/edwards25519_decode.rs
index 5f2fbd1f..0dc9bc81 100644
--- a/graviola/src/low/aarch64/edwards25519_decode.rs
+++ b/graviola/src/low/aarch64/edwards25519_decode.rs
@@ -91,7 +91,7 @@ macro_rules! mulp {
         "add x0, " $dest ";\n"
         "add x1, " $src1 ";\n"
         "add x2, " $src2 ";\n"
-        "bl " Label!("edwards25519_decode_alt_mul_p25519", 0, Before)
+        "bl " Label!("edwards25519_decode_alt_mul_p25519", 3, After)
     )}
 }
@@ -100,7 +100,7 @@ macro_rules! nsqr {
         "add x0, " $dest ";\n"
         "mov x1, " $n ";\n"
         "add x2, " $src ";\n"
-        "bl " Label!("edwards25519_decode_alt_nsqr_p25519", 0, Before)
+        "bl " Label!("edwards25519_decode_alt_nsqr_p25519", 4, After)
     )}
 }
```
Ah, there is a false assumption here that jumps only happen in the function body, not inside a macro. Assuming there is a single consistent resolution for the label direction, we could fix this by finding it in a later pass and using it when outputting the macro. Alternatively, we could expand macros that contain jumps.
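A minimal sketch of the "one solution" check, assuming callsite and label positions are indices into the expanded function body, and that `After` means the referenced label lies after the reference (so the assembler searches forward). `Direction` and `resolve_direction` are hypothetical names, not graviola's code.

```rust
/// Search direction for a numeric local label reference
/// (a hypothetical mirror of the `Before`/`After` argument of the generated `Label!`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Direction {
    Before,
    After,
}

/// Returns the single direction that is correct for every macro callsite,
/// or `None` if there are no callsites or they straddle the label.
fn resolve_direction(callsites: &[usize], label_pos: usize) -> Option<Direction> {
    let mut resolved = None;
    for &site in callsites {
        // A callsite ahead of the label must search forward; one past it searches backward.
        let dir = if site < label_pos {
            Direction::After
        } else {
            Direction::Before
        };
        match resolved {
            None => resolved = Some(dir),
            Some(prev) if prev != dir => return None, // no uniform answer
            _ => {}
        }
    }
    resolved
}
```

When this returns `None`, the macro definition cannot be patched in place, and expanding the macro at each callsite is the fallback.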
Force-pushed from 8b1f482 to 4fb676b.
Rebased and added a "macro fixup" pass that runs on the generated Rust code whenever any macro definitions reference labels.
Codecov Report ❌ Patch coverage is

```diff
@@            Coverage Diff             @@
##             main      #74      +/-   ##
==========================================
+ Coverage   99.31%   99.46%   +0.14%
==========================================
  Files         168      181      +13
  Lines       40269    51529   +11260
==========================================
+ Hits        39994    51252   +11258
- Misses        275      277       +2
```
For example, given `#define foo() bl edwards25519_decode_alt_mul_p25519`:
before this change, the generated Rust macro didn't wire up the local label
reference properly:
```rust
macro_rules! foo { () => { Q!(
"bl " Label!("edwards25519_decode_alt_mul_p25519", 0, Before)
)} }
```
After this change:
```rust
macro_rules! foo { () => { Q!(
"bl " Label!("edwards25519_decode_alt_mul_p25519", 4, After)
)} }
```
To support this, we now track which macros reference labels. We then track
which blocks call these macros. As long as all macro callsites are uniformly
before or after the labels they reference, we can safely replace the local label
id and search direction in the original macro definition.
It was challenging to track macro callsites in the main `RustFormatter` pass, and
also difficult to "pre-locate" callsites while dealing with hoisting, so the actual
macro fixup runs on the Rust code that the `RustFormatter` pass generates.
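Because the fixup runs on generated Rust text, the core rewrite can be a plain string substitution. A minimal sketch, assuming every reference appears as `Label!("<name>", <id>, <dir>)` with no nested parentheses in the arguments. `fixup_label` is a hypothetical helper, and the real pass in this PR may be structured differently.

```rust
/// Replaces the `<id>, <dir>` arguments of every `Label!("<name>", ...)` occurrence
/// for the given label name in a generated macro body; other labels are left untouched.
fn fixup_label(body: &str, name: &str, id: u32, dir: &str) -> String {
    let needle = format!("Label!(\"{name}\", ");
    let mut out = String::with_capacity(body.len());
    let mut rest = body;
    while let Some(start) = rest.find(&needle) {
        let args_start = start + needle.len();
        // Locate the `)` closing this Label! invocation; stop if the text is malformed.
        let Some(rel_close) = rest[args_start..].find(')') else { break };
        // Keep everything up to and including `Label!("<name>", `, then the new arguments.
        out.push_str(&rest[..args_start]);
        out.push_str(&format!("{id}, {dir}"));
        // Resume at the closing parenthesis, dropping the old arguments.
        rest = &rest[args_start + rel_close..];
    }
    out.push_str(rest);
    out
}
```

Applied to the `foo` example above with id `4` and direction `After`, this turns `Label!("edwards25519_decode_alt_mul_p25519", 0, Before)` into `Label!("edwards25519_decode_alt_mul_p25519", 4, After)`.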
Force-pushed from 4ec2dd5 to 06e69fd.
Replaced by #120.
I might have gotten a bit overexcited and started implementing `ed25519` support (#73) 😅. I'm pretty excited about graviola: it would seriously simplify the build issues and reduce the compile overhead we experience with `aws-lc-rs` and (patched) `ring`, while yielding `aws-lc`-level perf. Our internal CA/TLS uses `ed25519` certs exclusively, so `ed25519` support is a requirement.

My plan is to spend the next several weekends implementing `ed25519` support in `graviola`, if you'll let me. If you're not comfortable with that, I totally understand. Hopefully this PR can at least get things started.

This PR adds Rust codegen for all s2n-bignum assembly subroutines required to support `ed25519` gen/sign/verify:

- `edwards25519_decode`: decodes a 32-byte compressed Edwards curve point; also tests for canonicality.
- `edwards25519_scalarmulbase`: computes `P := [n]B`, where `n` is a scalar and `B` is the basepoint.
  - `A := [s]B`
  - `R := [r]B`
- `edwards25519_scalarmuldouble`: computes `P' := [n]P + [m]B`, where `n` and `m` are scalars, `B` is the basepoint, and `P` is an Edwards curve point.
  - `R' := [S]B + [k](-A')`
- `bignum_mod_n25519`: modular reduction mod `L`, the order of the edwards25519 basepoint.
  - reducing the `r` and `k` scalars
- `bignum_madd_n25519`: computes `z := x * y + c (mod L)`
  - `S := r + k * s (mod L)`
- `bignum_neg_p25519`: computes `z := -x (mod 2^255-19)`
  - `-A' := (-(A'_X), A'_Y)`

You can see how `aws-lc` uses these in `curve25519.c` and `curve25519_s2n_bignum_asm.c`.

There's also `edwards25519_encode` (compress + serialize an Edwards point), but s2n-bignum's Arm impl is fairly pessimistic so as to be endian-independent (https://github.com/awslabs/s2n-bignum/blob/main/arm/curve25519/edwards25519_encode.S#L61-L123). I wrote this one in Rust to avoid that.

Testing
There's a basic sanity-check test for `edwards25519_decode`, to at least convince myself that the codegen works, but it's probably higher leverage to test the `mid`-level ed25519 implementation, so I've left further testing for later.

Generated assembly
I took a pass auditing the codegen'd inline assembly. As far as I can tell, everything translated over correctly: no missing clobbers or instructions, hoisting looks OK, etc.

The only funny thing I spotted is the somewhat wasteful 16 KiB page-aligned constant tables in the generated aarch64 code. Guess that's something to optimize later :)
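One possible shape for that later optimization, assuming the constant tables were emitted as Rust statics (an assumption; the generated code may instead align them inside the assembly, e.g. with `.p2align 12`). A `#[repr(align(4096))]` wrapper gives exactly the 4 KiB alignment that an ADRP-only address computation needs, rather than a full 16 KiB page. `Aligned4K`, `TABLE` and `table_addr` are hypothetical names.

```rust
/// Hypothetical wrapper forcing 4 KiB alignment: ADRP addresses 4 KiB pages,
/// so a table aligned like this is reachable with ADRP alone on any OS page size.
#[repr(C, align(4096))]
struct Aligned4K<T>(T);

/// Example constant table (placeholder contents, not real curve constants).
static TABLE: Aligned4K<[u64; 4]> = Aligned4K([1, 2, 3, 4]);

/// Address of the table data, for checking its alignment.
fn table_addr() -> usize {
    &TABLE.0 as *const [u64; 4] as usize
}
```

The trade-off is padding: each such table still wastes up to 4 KiB, so the ADRP + ADD form, which needs no alignment at all, may be the better end state.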