-
Notifications
You must be signed in to change notification settings - Fork 79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Worker Slots interface and Resource Based Autotuner First Cut #719
Worker Slots interface and Resource Based Autotuner First Cut #719
Conversation
Pausing while removing wf state inputs
9042276
to
d848b99
Compare
d9821b2
to
add670f
Compare
add670f
to
7d6e505
Compare
341d3c3
to
eb23321
Compare
core/src/abstractions.rs
Outdated
// TODO: Real release reason | ||
supp_c_c.release_slot(SlotReleaseReason::TaskComplete); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This turns out to be... quite annoying to deal with. Since permits simply being drop
ed causes them to be released, it's incredibly annoying to find every place that might ever happen and attach a reason.
My inclination is to simply not bother with this for now, possibly never, unless users ask for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may not be understanding, but can you make permit release a more explicit step where it occurs instead of relying on drop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can, but it's worse from a safety perspective (could still use it as a backup though, with some default reason), and it can potentially happen in many, many places. It's just a lot of gruntwork for something I don't know that users will actually care about yet. Hence my preference to just defer it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Can we just get rid of reason for now then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that was what I was going to do based on discussion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we remove this from Java then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah probably for consistencies' sake (I'll ask specifically for feedback on this one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM, though I am only judging from the pub API and behavior POV (with the knowledge that this is not truly public to users, so langs can clarify, not use abbreviations, etc).
core-api/src/worker.rs
Outdated
/// Set a [SlotSupplier] for workflow tasks. | ||
#[builder(setter(into = false))] | ||
pub workflow_task_slot_supplier: | ||
Arc<dyn SlotSupplier<SlotKind = WorkflowSlotKind> + Send + Sync>, | ||
/// Set a [SlotSupplier] for activity tasks. | ||
#[builder(setter(into = false))] | ||
pub activity_task_slot_supplier: | ||
Arc<dyn SlotSupplier<SlotKind = ActivitySlotKind> + Send + Sync>, | ||
/// Set a [SlotSupplier] for local activity tasks. | ||
#[builder(setter(into = false))] | ||
pub local_activity_task_slot_supplier: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think, as I did on the Java SDK, that these should be one worker tuner object that a user can provide (even if the thing is just these three). It is better for callers. Granted this is more of a comment about lang than core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are tradeoffs with both. I did encounter a couple patterns where having just one would have made things slightly easier - but there are just as many situations where having them separate was really useful. Ultimately I think this is just a matter of taste.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It matters from a caller POV too. I suspect many callers will configure workers with tuners created by someone else (same with interceptors). Setting a single resource-based tuner is easier than three resource-based slot suppliers (same with interceptors). This also makes instance reuse across workers easier since it's a single instance shared instead of three instances shared (same with interceptors).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm gonna try it out before merging it just to see if it's much simpler. If I like the end result we'll go that way but if it ends up uglier I'll keep the three options.
My main concern is what happens when you want to use different kinds of suppliers for different types. EX: you want fixed size for workflow and resource based for activity. Obviously you can just stuff those inside your own implementation, but in the (I would guess fairly common) case that you want to do exactly that, you now have to make your own implementation where before you could just set ours - or we have to provide some combinators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 FWIW your main concern also applies to interceptors and data converters in many langs (interceptors are just combinations of client, activity, and worker interceptors, and data converters are just combinations of payload codecs, payload converters, and failure converters). So there is a precedent for this type of combining. And many languages already have these combinators assuming you provide a simple 3-item struct instead of just the interface (e.g. with
keyword in C# or dataclasses.replace
in Python or ..
in Rust or ...
in JS). But I think we should optimize for what I think will be the common use case of users wanting to just provide a "tuner".
core-api/src/worker.rs
Outdated
/// Core will call this at worker initialization time, allowing the implementation to hook up to | ||
/// metrics if any are configured. If not, it will not be called. | ||
fn attach_metrics(&self, metrics: TemporalMeter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrmm. I know everywhere else implementers want to use metrics, they access via runtime (e.g. .telemetry().get_metric_meter()
). Do we need this here? Can't the user wire up metrics themselves? Is there a general purpose initialization method that may be needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least the way TS is structured it's not really possible, and is likely to be annoying in other langs. The problem is you need to have initialized the runtime before you construct worker options to be able to provide the telemetry instance when constructing your slot supplier - but since the runtime is often constructed implicitly when a user initializes a worker, the worker config is already created before the runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but since the runtime is often constructed implicitly when a user initializes a worker
Hrmm, is this true? It seems from the TS bridge code I see that the client already has the runtime before worker is created. I would expect it can provide the metric meter to the slot supplier instances it creates at worker creation time.
For TS-side implementations, when they get around to temporalio/sdk-typescript#1229, should provide an accessor to the metric meter off of the runtime object (and if that object is created implicitly, so be it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah fair point, it's not impossible, I actually had it kinda like this originally but it just made for a bit of a mess in TS's bridge because of type erasure stuff, where you only had access to the erased slot supplier, so the trait needed this way to attach the metrics without changing things such that the slot suppliers are passed separately from the worker config which would've been annoying.
The nice thing about this is that lang doesn't need to do anything, this is all handled inside Core and there's going to have to be some kind of LangWrapper
implementation of SlotSupplier
anyway, and lang can provide the attached meter to the user's implementation in a context object or however else it feels like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lang can provide the attached meter to the user's implementation in a context object or however else it feels like
I think lang may want to ask users to do this themselves and not provide them anything in this abstraction specifically. At least in .NET and Python, any user that wants to emit metrics from their custom tuner implementation can of course use their own metrics impl, or use ours the same way they'd do it anywhere else outside of workflows/activities (using the runtime metric meter).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, they can still do that just fine, as well as we can provide it to them for access whenever they like outside of that. For now, within core, this makes the most sense and doesn't have any impact on lang.
core/src/abstractions.rs
Outdated
// TODO: Real release reason | ||
supp_c_c.release_slot(SlotReleaseReason::TaskComplete); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may not be understanding, but can you make permit release a more explicit step where it occurs instead of relying on drop?
@@ -395,6 +406,7 @@ pub(super) const ACT_SCHED_TO_START_LATENCY_NAME: &str = "activity_schedule_to_s | |||
pub(super) const ACT_EXEC_LATENCY_NAME: &str = "activity_execution_latency"; | |||
pub(super) const NUM_POLLERS_NAME: &str = "num_pollers"; | |||
pub(super) const TASK_SLOTS_AVAILABLE_NAME: &str = "worker_task_slots_available"; | |||
pub(super) const TASK_SLOTS_USED_NAME: &str = "worker_task_slots_used"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this metric, can we make an issue to make sure all SDKs get it? (or ignore if this is implicitly assumed as part of the general tuning project)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this all should get this as part of this work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm is Java supposed to have this metric? I don't recall it getting added as part of your PR then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, but I'll add it
|
||
/// Implements [SlotSupplier] and attempts to maintain certain levels of resource usage when | ||
/// under load. | ||
pub struct ResourceBasedSlots<MI> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once you get the algorithm settled, I think we should document the exact algorithm so that all languages can get the same (or at least real close)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added more to the docstring here, but of course we might still come back and decide to change the algo later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have an issue for a resource based slot supplier in java?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one in Jira at least
eaeb4cc
to
b650e3d
Compare
b650e3d
to
aadb636
Compare
7ad344b
to
e934332
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I like the tuner abstraction. I can already see this will have a good interface look in langs.
In Python/TS we can have a tuner
be FixedTunerConfig | ResourceBasedTunerConfig | CustomTuner
(the existing 3 fixed values are just shortcuts for the first one, the latter is just an interface), but in .NET I can't really make unions or sum types, so there'll probably be TunerOptions
and 3 mutually exclusive properties of Fixed
, ResourceBased
, and Custom
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arguably could change the dir/package from slot_suppler
to tuner
, but meh. Could also have a public FixedSizeTuner
in this package and just remove those 3 worker options and force everyone to create tuners if you want, but no prob leaving as is too.
e934332
to
0e0544e
Compare
0e0544e
to
42140a6
Compare
Implements the first portions of the slot management proposal and an initial take on resource-based auto-tuning.
Here the auto-tuning is done based on PID controllers which target specified CPU and RAM usage levels. This will very likely need more tweaking as we proceed and do more testing, but initial tests are quite promising for your typical sorts of relatively-low-resource-usage activities.
I have already successfully made use of the resource-based tuner in Typescript without much challenge. The lang layer needs little adjustment to use it. Passing through the callbacks to allow users to implement the interface with their own implementation will be more involved, and we've decided to do that after shipping the auto-tuner and letting users try that out.