Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix intermittent startup issues on slow systems (#3803)
With a suitably slow system -- say, if you're waiting on a Windows Terminal to output text inside a virtual machine -- the expected order of things can be thrown off. One specific way this can happen is that the worker can attempt to dial and have its dial fail because of the slowness causing the dial context to time out. If, because of the slowness, we also haven't released the log gate yet, then the event with the failure information will be queued, along with the context that was used. Unfortunately, in some of these error cases, the context that was used was the dial context instead of the system base context. In many other places in the function it was the system base context so this is just a mismatch, probably from code written at different times. Normally this wouldn't be a problem as we'd fall back to the underlying logger, but when we release the log gate, things happen differently: it's a synchronous function and on error it causes us to abandon system startup entirely. This commit fixes the issue with using the incorrect context. It's an open question whether we should change the behavior around errors when replaying queued events, falling back to the underlying logger instead of erroring.
- Loading branch information