WFE: Correct Error Handling for Nonce Redemption RPCs with Unknown Prefixes #7004

beautifulentropy · 2023-07-18T22:21:19Z

Fix an issue related to the custom gRPC Picker implementation introduced in #6618. When a nonce contained a prefix not associated with a known backend, the Picker would continuously rebuild, re-resolve DNS, and eventually throw a 500 "Server Error" when the RPC timed out. The Picker now fails promptly and the WFE returns a 400 "Bad Nonce" error as expected; in response, the requesting client should retry its request with a fresh nonce.

Additionally:

- WFE unit tests use derived nonces when "BOULDER_CONFIG_DIR" == "test/config-next".
- Balancer.Build() in "noncebalancer" forces a rebuild until at least one backend is available. This matches the balancer/roundrobin implementation.
- Nonces with no matching backend increment "jose_errors" with label "type": "JWSInvalidNonce", as well as "nonce_no_backend_found".
- Nonces of incorrect length are now rejected at the WFE and increment "jose_errors" with label "type": "JWSMalformedNonce" instead of "type": "JWSInvalidNonce".
- Nonces not encoded as base64url are now rejected at the WFE and increment "jose_errors" with label "type": "JWSMalformedNonce" instead of "type": "JWSInvalidNonce".

Fixes #6969
Part of #6974

wfe2/verify.go

wfe2/verify_test.go

jsha · 2023-07-25T22:10:09Z

I was concerned this would cause a lot of spurious badNonce errors during normal rolling restarts of nonce-service, because one WFE would learn about a new nonce-service instance before the others know about it. However, @jcjones mentioned in #6404 (comment):

> We've already taken the efforts to ensure smooth roll-off of the boulder-nonce-redeem-grpc and boulder-nonce-generate-grpc services: the generate service is stopped at least 30 seconds before the redeem service stops, so that nonces in flight can be serviced.

So I think we're covered here. Though we should probably find someplace to document this as best practice for deploying Boulder.

grpc/noncebalancer/noncebalancer.go

wfe2/verify.go

grpc/noncebalancer/noncebalancer.go

test/integration/nonce_test.go

wfe2/wfe_test.go

sheurich · 2023-08-23T18:38:46Z

> I was concerned this would cause a lot of spurious badNonce errors during normal rolling restarts of nonce-service, because one WFE would learn about a new nonce-service instance before the others know about it. However, @jcjones mentioned in #6404 (comment):
>
> > We've already taken the efforts to ensure smooth roll-off of the boulder-nonce-redeem-grpc and boulder-nonce-generate-grpc services: the generate service is stopped at least 30 seconds before the redeem service stops, so that nonces in flight can be serviced.
>
> So I think we're covered here. Though we should probably find someplace to document this as best practice for deploying Boulder.

This sounds like a great approach to minimizing badNonce errors after nonce-service restarts. How does the generate service get stopped ahead of the redeem service?

WFE: Return 400 for well-formed nonces with unroutable prefixes

b9b85b8

beautifulentropy force-pushed the noncebalancer-timeout branch from 29d746f to b9b85b8 July 18, 2023 23:15

Comply with gRFC A54

673fa6c

beautifulentropy changed the title ~~WFE: Return 400 for well-formed nonces with unroutable prefixes~~ WFE: Correct Error Handling for Nonce Redemption RPCs with Unmatched Prefixes Jul 19, 2023

beautifulentropy changed the title ~~WFE: Correct Error Handling for Nonce Redemption RPCs with Unmatched Prefixes~~ WFE: Correct Error Handling for Nonce Redemption RPCs with Unknown Prefixes Jul 19, 2023

beautifulentropy added 2 commits July 19, 2023 14:01

Fix broken unit test.

12434a5

Add missing TODOs

f8e4da0

beautifulentropy marked this pull request as ready for review July 19, 2023 18:38

beautifulentropy requested a review from a team as a code owner July 19, 2023 18:38

beautifulentropy requested a review from aarongable July 19, 2023 18:38

aarongable reviewed Jul 21, 2023

wfe2/verify.go (outdated, resolved)

wfe2/verify_test.go (resolved)

wfe2/verify_test.go (resolved)

Address comments and consolidate some error handling.

b108a72

beautifulentropy requested review from aarongable and a team July 24, 2023 17:17

aarongable reviewed Jul 24, 2023

wfe2/verify_test.go (resolved)

Revert config-next conditionals.

0b4b7bc

beautifulentropy requested review from aarongable and a team July 24, 2023 20:14

jsha requested changes Jul 25, 2023

beautifulentropy added 2 commits July 26, 2023 17:28

Addressing comments.

0dd94a4

Add unit test and fix new metrics.

c81a414

beautifulentropy requested review from jsha, a team, aarongable and pgporada and removed request for aarongable July 26, 2023 22:02

jsha approved these changes Jul 27, 2023

pgporada reviewed Jul 28, 2023

wfe2/wfe_test.go (resolved)

pgporada approved these changes Jul 28, 2023

beautifulentropy merged commit b141fa7 into main Jul 28, 2023
21 checks passed

beautifulentropy deleted the noncebalancer-timeout branch July 28, 2023 16:07
