crl-updater: CRL number not strictly increasing sometimes in integration test #7590
From the full test log, extracting just the lines relevant to the failed shard:
This shows the following sequence of events:
The obvious question is: where did CRL 77 come from? The crl_test.go logs show no reference to it, except for the two failure messages. We can conclude that it wasn't generated by crl_test.go. However, if we look earlier in the integration test output, we can see this collection of log lines:
What we see here is the following sequence of events:
We know that the log lines in (3) came from the normal instance of crl-updater started by the integration test environment's start.py, because those log lines show up at all: all of the crl-updater log lines created by crl_test.go get captured by the test runner and are only printed if something goes wrong (which is why the crl-storer log lines in (1) and (4) aren't interleaved with corresponding crl-updater logs). So this is just an instance of the integration tests getting unlucky: the crl_test.go integration test raced against the semi-random execution of the normal crl-updater, and lost the race.

This does raise the question: why does the CRL uploaded by the normal continuous instance of crl-updater have a slightly larger (i.e. later, since our CRL numbers are just timestamps) number than the one created by crl_test.go's batch process? I think this is simply because there's some delay between the batch process selecting its time and actually doing the work, while there's much less delay in the continuous case. So the continuous process updated one shard in the gap between the batch process picking its time/number and getting around to updating that same shard.

The only real solution I've thought of so far is "don't run crl-updater in continuous mode in the integration test environment". I don't love this idea, because it means that we lose coverage of crl-updater running in continuous mode. Does anyone have other ideas for how to prevent crl-updater continuous from racing against crl-updater batch in this test?
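To make the timing concrete, here is a minimal sketch (not Boulder's actual code) of the race described above, assuming only what's already stated in this issue: CRL numbers are derived from the chosen update timestamp, and the batch run picks its timestamp before doing the shard work.

```go
// Hypothetical illustration of the race: both instances derive the CRL number
// from the thisUpdate timestamp they chose, but the batch run picks its
// timestamp up front and only uploads after doing all of the shard work,
// leaving a window in which the continuous run can pick a slightly later
// timestamp and upload the same shard first.
package main

import (
	"fmt"
	"math/big"
	"time"
)

// crlNumber mirrors the "CRL numbers are just timestamps" convention: the
// number is the nanosecond timestamp of the chosen update time.
func crlNumber(thisUpdate time.Time) *big.Int {
	return big.NewInt(thisUpdate.UnixNano())
}

func main() {
	// Batch run: choose the update time first...
	batchUpdate := time.Now()

	// ...then spend some time generating, signing, and uploading shards
	// (the gap observed in the logs below is on the order of 20ms).
	time.Sleep(20 * time.Millisecond)

	// Continuous run: wakes up during that gap, picks a later time, and
	// uploads its shard immediately.
	continuousUpdate := time.Now()

	// When the batch run finally uploads the same shard, its number is
	// smaller, so the shard's CRL number is not strictly increasing.
	fmt.Println("batch CRL number:     ", crlNumber(batchUpdate))
	fmt.Println("continuous CRL number:", crlNumber(continuousUpdate))
	fmt.Println("batch < continuous:", crlNumber(batchUpdate).Cmp(crlNumber(continuousUpdate)) < 0)
}
```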
Your analysis makes sense. The difference between those two timestamps is 20ms, which is on the order of thread scheduling overhead. The unwritten(?) assumption of crl-updater is that it is never competing with another instance of crl-updater, and we're violating that assumption in our integration tests. One way to start meeting that assumption would be to split the config into two: one for the normal continuous instance, and a separate crl-updater-batch.json for the batch runs driven by crl_test.go.
We could do that if we had one covering RSA and the other covering ECDSA. Otherwise we don't have a way to guarantee that the cert which crl_test.go issues-and-revokes would be issued by the issuer which crl-updater-batch.json is responsible for.
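For illustration only, a hedged sketch of that guarantee (the config split and all names here are hypothetical, not Boulder code): if the continuous config covered one issuer key type and crl-updater-batch.json covered the other, crl_test.go could check that the certificate it issues-and-revokes chains to the key type the batch config owns before relying on the batch run.

```go
// Hypothetical check, assuming a split where crl-updater-batch.json is
// responsible for exactly one issuer key algorithm.
package main

import (
	"crypto/x509"
	"fmt"
)

// issuedByBatchIssuer reports whether the leaf's immediate issuer (the second
// cert in the chain) uses the key algorithm that the batch config covers.
func issuedByBatchIssuer(chain []*x509.Certificate, batchIssuerAlgo x509.PublicKeyAlgorithm) bool {
	if len(chain) < 2 {
		return false
	}
	return chain[1].PublicKeyAlgorithm == batchIssuerAlgo
}

func main() {
	// Stand-in chain: only PublicKeyAlgorithm matters for this check, so the
	// certs are otherwise empty placeholders rather than real issued certs.
	leaf := &x509.Certificate{PublicKeyAlgorithm: x509.RSA}
	intermediate := &x509.Certificate{PublicKeyAlgorithm: x509.ECDSA}
	chain := []*x509.Certificate{leaf, intermediate}

	// If the batch config (hypothetically) covered only the ECDSA issuer,
	// the test could assert this before issuing-and-revoking.
	fmt.Println("issued by batch issuer:", issuedByBatchIssuer(chain, x509.ECDSA))
}
```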
Seen here