-
Notifications
You must be signed in to change notification settings - Fork 29.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib: rewrite AsyncLocalStorage without async_hooks #48528
lib: rewrite AsyncLocalStorage without async_hooks #48528
Conversation
Review requested:
|
Also worth considering that as this no longer depends on |
b35d35c
to
8b30200
Compare
Some additional notes:
|
Woo! Thanks for getting this started. The need for the flag is unfortunate but definitely required given how chromium is using the API and the conflict that causes with electron. But otherwise this should be a major improvement over the current state. Will do a thorough review this weekend. |
Another note: This currently uses a model of building a linked-list of AsyncContextFrames which each contain only the key and value they were constructed with. The alternative which is what was originally proposed was to use a map which is cloned from the parent frame at each frame construction and then the single additional entry for that frame is added but it is considered otherwise immutable. I went with the current model mostly because it was easier and made debugging simpler but I plan to spend some time benchmarking the two approaches to identify the trade-offs and figure out which approach would be generally better. The one definite downside to the current model is it holds past values for the same store from further up the call graph alive longer which may be undesirable but may only really apply to top-level scopes that will need to be kept alive anyway. Further analysis should provide some better insight into the behaviour. |
This is awesome!
That'd be a welcome change. We've been trying to eliminate async_hooks usage in our apps (as we've detected performance issues when async_hooks are misused), and having to import AsyncLocalStorage from the |
Thenables propagation fix is here: https://chromium-review.googlesource.com/c/v8/v8/+/4674242 |
I propose tagging this as semver-major. |
@mcollina Could do that. Though the way in which we land it is still not entirely decided. Due to the Electron conflict it needs build-flagging anyway, so this could be landed as a minor if it's by default turned off. It could land off by default and then have a follow-up to turn it on by default as a major. What do you think? |
That works |
I've got all but
|
The This is essentially what |
e009382
to
6d26f70
Compare
6d26f70
to
5a7b995
Compare
|
||
> Stability: 1 - Experimental | ||
|
||
Enables the use of AsyncLocalStorage backed by AsyncContextFrame rather than |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enables the use of AsyncLocalStorage backed by AsyncContextFrame rather than | |
Enables the use of `AsyncLocalStorage` backed by `AsyncContextFrame` rather than |
maybe even convert it to links.
} | ||
|
||
disable() { | ||
AsyncContextFrame.disable(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This disable
variant seems quite different to the other one as this version effects only the current frame (which might be even undefined
) whereas the other variant disables the ALS instance globally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not quite accurate as run
/enterWith
will enable it again automatically. The existing disable
also doesn't actually change the value stored on the associated resource so, for example, you could call disable()
and then follow it with a run()
and the previously entered context before the disable would be restored after the run()
exits, which is kind of a bug. I'm intentionally leaving the existing ALS behaviour as-is, despite having several bugs. Those incorrect behaviours can be fixed in follow-up PRs.
The API is also considered experimental, so can be changed at any time. Support for the existing experimental APIs is somewhat best-effort and I would prefer to deprecate these APIs entirely unless some specific effort goes into fixing and stabilizing these APIs.
I don't think exact bug-for-bug reproduction of the existing ALS should be a goal here. The stabilized parts should be expected to behave the same, but the rest is best effort and likely to deviate a bit as the new model allows fixing some issues in the design of the prior model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, run/enterWith
reenable it but e.g. a timers or task_queue items do not use these APIs.
Anyhow, I agree that disable
/exit
APIs are strange (e.g. exit
isn't the pendent of enter
,...) already now. I also think they have no real use case besides testing maybe. So fine for me if we have behavior changes or even remove them later on.
Just want to make sure that these differences are not slipping in unintended and people complain. Maybe worth to mention this in the initial PR comment and/or commit message to avoid people say this was a silent change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I intend to do some follow-up improvements, like there's currently a bunch of enabled checks in hot-paths that I want to restructure somewhat to have already made that check ahead of time.
I do think we should actually keep the exit
behaviour at least, but I want to work on making the behaviour more coherent and consistent.
I'm not quite so convinced we should be keeping disable
as it not being scoped doesn't match up evenly with the rest of the API and so the behaviour is a bit confusing given that it gets re-enabled automatically in certain cases. If it required an explicit enable then it might be more reasonable, but with the implicit re-enable it functions essentially the same as exit
but in a much more edge-case-y way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exit()
without the internal disable()
is mostly an enter(undefined)
.
I would prefer a single set()
instead but I guess it's a bit too late :o)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually not, and I do plan on doing that eventually, but some more work is needed to make that happen. Basically if we split the scoping part of run() from the mutating part we can raise the scoping part to provide implicit scopes in many places (like around every callback) and eliminate the need for all the closures. 😅
I'm working on an RFC that proves the correctness of the model which I should hopefully be able to share soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if native.js is a good name.
@@ -41,6 +42,10 @@ const { | |||
|
|||
const { AsyncResource } = require('async_hooks'); | |||
|
|||
const AsyncContextFrame = require('internal/async_context_frame'); | |||
|
|||
const async_context_frame = Symbol('asyncContextFrame'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const async_context_frame = Symbol('asyncContextFrame'); | |
const async_context_frame = Symbol('kAsyncContextFrame'); |
Not sure but I think there was some naming guideline somewhere. But maybe I'm wrong so feel free to ignore.
if you go for the change there are a few similar places.
} | ||
|
||
AsyncResource::~AsyncResource() { | ||
CHECK_NOT_NULL(env_); | ||
env_->RemoveAsyncResourceContextFrame(reinterpret_cast<std::uintptr_t>(this)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we change the sequence here?
In most places you enter context frame before asyncId handling and exit/restore after. But here it's the other way around.
I guess this has impact to anyone using a destroy hook for whatever reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Timing consistency between ALS and async_hooks was never deterministic. Promises also produce a similar timing inconsistency. I can move that around if you feel strongly about it, but I don't think it really matters much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a fast test and it seems destroy hook is not running with the the store as of now. So actually this sequence seems to be less breaking one.
In general I think that changing the implementation details of ALS should not result in too many side effects to other node APIs.
I know AsyncHooks API is experimental and their use for ALS will/should go away. But there are other async hooks users which might also rely on ALS therefore the result of als.getStore()
seen within init/before/after/destroy hooks should stay the same if possible.
I expect there are no tests for this because till now async hooks were fully bound to ALS and quite a lot testing happens on async hooks level as of now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so my long-term plan is to:
- migrate ALS off async_hooks
- create some new API for resource tracking that doesn't conflate consumable and non-consumable resource, plus probably have some filtering mechanism to subscribe to specific resource types and skip the barrier hop altogether when not interested.
- eventually deprecate async_hooks
This will be a very long-term thing as I only have so much free time to work on this stuff, but that's my general aim with this. 🙂
namespace node { | ||
namespace async_context_frame { | ||
|
||
class Scope { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we disallow copy/assignment? Or is it anyway implicit because of the members?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. Any @nodejs/cpp-reviewers have opinions on this? I don't have strong opinions either way.
I think my older comments regarding ABI stability are all solved by moving the relevant parts into But this comment seems to be still relevant as the new/old behavior is likely different. As this is behind a flag it should be fine. |
The scope thing should cover that. I've tested against some native modules already and it seems to be working fine, as far as I can tell. My plan was just to land this behind a flag, get some customers to try it out, and shake out any inconsistencies we haven't already found. I've done my best making things as consistent as I can, but there were a few things that needed to change a bit to fit the different model. As far as I can tell though, none of the changes should ever have user-visible impact. |
I'm going to just land this and start a follow-up PR with further improvements. What's here works, and I'd like to get it into users' hands sooner rather than later to start gathering feedback. The remaining suggestions are largely cosmetic, so I will just defer those to the follow-up rather than dealing with yet another round of CI flakiness and endless restarts. 😅 |
I accidentally added the |
Landed in d1229ee |
@Qard Hey! Did you ever get to benchmarking the change? |
Yep. See: #48528 (comment) |
@Qard thanks, this looks great! |
PR-URL: #48528 Reviewed-By: Matteo Collina <matteo.collina@gmail.com> Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Benjamin Gruenbaum <benjamingr@gmail.com> Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com> Reviewed-By: Chengzhong Wu <legendecas@gmail.com> Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com> Reviewed-By: Gerhard Stöbich <deb2001-github@yahoo.de>
Notable changes: buffer: * use fast API for writing one-byte strings (Robert Nagy) #54311 * optimize createFromString (Robert Nagy) #54324 * use native copy impl (Robert Nagy) #54087 inspector: * (SEMVER-MINOR) support `Network.loadingFailed` event (Kohei Ueno) #54246 lib: * (SEMVER-MINOR) rewrite AsyncLocalStorage without async_hooks (Stephen Belanger) #48528 module: * (SEMVER-MINOR) unflag detect-module (Geoffrey Booth) #53619 PR-URL: TODO
Notable changes: buffer: * use fast API for writing one-byte strings (Robert Nagy) #54311 * optimize createFromString (Robert Nagy) #54324 * use native copy impl (Robert Nagy) #54087 inspector: * (SEMVER-MINOR) support `Network.loadingFailed` event (Kohei Ueno) #54246 lib: * (SEMVER-MINOR) rewrite AsyncLocalStorage without async_hooks (Stephen Belanger) #48528 module: * (SEMVER-MINOR) unflag detect-module (Geoffrey Booth) #53619 PR-URL: #54452
Notable changes: buffer: * use fast API for writing one-byte strings (Robert Nagy) #54311 * optimize createFromString (Robert Nagy) #54324 * use native copy impl (Robert Nagy) #54087 inspector: * (SEMVER-MINOR) support `Network.loadingFailed` event (Kohei Ueno) #54246 lib: * (SEMVER-MINOR) rewrite AsyncLocalStorage without async_hooks (Stephen Belanger) #48528 module: * (SEMVER-MINOR) add --experimental-transform-types flag (Marco Ippolito) #54283 * (SEMVER-MINOR) unflag detect-module (Geoffrey Booth) #53619 PR-URL: #54452
Notable changes: buffer: * use fast API for writing one-byte strings (Robert Nagy) #54311 * optimize createFromString (Robert Nagy) #54324 * use native copy impl (Robert Nagy) #54087 inspector: * (SEMVER-MINOR) support `Network.loadingFailed` event (Kohei Ueno) #54246 lib: * (SEMVER-MINOR) rewrite AsyncLocalStorage without async_hooks (Stephen Belanger) #48528 module: * (SEMVER-MINOR) add --experimental-transform-types flag (Marco Ippolito) #54283 * (SEMVER-MINOR) unflag detect-module (Geoffrey Booth) #53619 PR-URL: #54452
Notable changes: buffer: * use fast API for writing one-byte strings (Robert Nagy) #54311 * optimize createFromString (Robert Nagy) #54324 * use native copy impl (Robert Nagy) #54087 inspector: * (SEMVER-MINOR) support `Network.loadingFailed` event (Kohei Ueno) #54246 lib: * (SEMVER-MINOR) rewrite AsyncLocalStorage without async_hooks (Stephen Belanger) #48528 module: * (SEMVER-MINOR) add --experimental-transform-types flag (Marco Ippolito) #54283 * (SEMVER-MINOR) unflag detect-module (Geoffrey Booth) #53619 PR-URL: #54452
I'm working on rewriting AsyncLocalStorage to use v8::SetContinuationPreservedEmbedderData(...) in a similar model to Cloudflare and in anticipation of the upcoming AsyncContext proposal.
This is not done yet but is mostly working as-is. I have not got to benchmarking or writing tests for it yet, but it seems to be passing most of the existing AsyncLocalStorage. 😄
cc @nodejs/diagnostics @nodejs/performance