Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WFE: Correct Error Handling for Nonce Redemption RPCs with Unknown Prefixes #7004

Merged
merged 8 commits into from
Jul 28, 2023

Conversation

beautifulentropy
Copy link
Member

@beautifulentropy beautifulentropy commented Jul 18, 2023

Fix an issue related to the custom gRPC Picker implementation introduced in #6618. When a nonce contained a prefix not associated with a known backend, the Picker would continuously rebuild, re-resolve DNS, and eventually throw a 500 "Server Error" at RPC timeout. The Picker now promptly returns a 400 "Bad Nonce" error as expected, in response the requesting client should retry their request with a fresh nonce.

Additionally:

  • WFE unit tests use derived nonces when "BOULDER_CONFIG_DIR" == "test/config-next".
  • Balancer.Build() in "noncebalancer" forces a rebuild until non-zero backends are available. This matches the balancer/roundrobin implementation.
  • Nonces with no matching backend increment "jose_errors" with label "type": "JWSInvalidNonce" and "nonce_no_backend_found".
  • Nonces of incorrect length are now rejected at the WFE and increment "jose_errors" with label "type": "JWSMalformedNonce" instead of "type": "JWSInvalidNonce".
  • Nonces not encoded as base64url are now rejected at the WFE and increment "jose_errors" with label "type": "JWSMalformedNonce" instead of "type": "JWSInvalidNonce".

Fixes #6969
Part of #6974

@beautifulentropy beautifulentropy changed the title WFE: Return 400 for well-formed nonces with unroutable prefixes WFE: Correct Error Handling for Nonce Redemption RPCs with Unmatched Prefixes Jul 19, 2023
@beautifulentropy beautifulentropy changed the title WFE: Correct Error Handling for Nonce Redemption RPCs with Unmatched Prefixes WFE: Correct Error Handling for Nonce Redemption RPCs with Unknown Prefixes Jul 19, 2023
@beautifulentropy beautifulentropy marked this pull request as ready for review July 19, 2023 18:38
@beautifulentropy beautifulentropy requested a review from a team as a code owner July 19, 2023 18:38
wfe2/verify.go Outdated Show resolved Hide resolved
wfe2/verify_test.go Show resolved Hide resolved
wfe2/verify_test.go Show resolved Hide resolved
@beautifulentropy beautifulentropy requested review from aarongable and a team July 24, 2023 17:17
@beautifulentropy beautifulentropy requested review from aarongable and a team July 24, 2023 20:14
@jsha
Copy link
Contributor

jsha commented Jul 25, 2023

I was concerned this would cause a lot of spurious badNonce errors during normal rolling restarts of nonce-service, because one WFE would learn about a new nonce-service instance before the others know about it. However, @jcjones mentioned in #6404 (comment):

We've already taken the efforts to ensure smooth roll-off of the boulder-nonce-redeem-grpc and boulder-nonce-generate-grpc services: the generate service is stopped at least 30 seconds before the redeem service stops, so that nonces in flight can be serviced.

So I think we're covered here. Though we should probably find someplace to document this as best practice for deploying Boulder.

grpc/noncebalancer/noncebalancer.go Outdated Show resolved Hide resolved
wfe2/verify.go Outdated Show resolved Hide resolved
grpc/noncebalancer/noncebalancer.go Outdated Show resolved Hide resolved
grpc/noncebalancer/noncebalancer.go Outdated Show resolved Hide resolved
grpc/noncebalancer/noncebalancer.go Outdated Show resolved Hide resolved
test/integration/nonce_test.go Show resolved Hide resolved
test/integration/nonce_test.go Show resolved Hide resolved
@beautifulentropy beautifulentropy requested review from jsha, a team, aarongable and pgporada and removed request for aarongable July 26, 2023 22:02
@beautifulentropy beautifulentropy merged commit b141fa7 into main Jul 28, 2023
21 checks passed
@beautifulentropy beautifulentropy deleted the noncebalancer-timeout branch July 28, 2023 16:07
@sheurich
Copy link
Contributor

I was concerned this would cause a lot of spurious badNonce errors during normal rolling restarts of nonce-service, because one WFE would learn about a new nonce-service instance before the others know about it. However, @jcjones mentioned in #6404 (comment):

We've already taken the efforts to ensure smooth roll-off of the boulder-nonce-redeem-grpc and boulder-nonce-generate-grpc services: the generate service is stopped at least 30 seconds before the redeem service stops, so that nonces in flight can be serviced.

So I think we're covered here. Though we should probably find someplace to document this as best practice for deploying Boulder.

This sounds like a great approach to minimizing badNonce errors after nonce-service restarts. How does the generate service get stopped ahead of the redeem service?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Modify noncebalancer to return 400 "bad nonce" when no nonce server matches the prefix
5 participants