RFC 0002: MT Execution Contexts #2
base: main
The distinction between execution context and scheduler could use a bit of refinement. There is definitely some overlap in functionality, just by comparing the API. I guess execution contexts might take over some features of the current scheduler?
Absolutely lovely, well-written proposal. I completely agree with the design intent here, and only have a few (mostly overlapping) comments about event loops and the default context. The vast majority of this design is exactly what I would like to see in Crystal.
0002-execution-contexts.md (Outdated)

> ## Default context configuration
>
> This proposal doesn’t solve the inherent problem of: how can applications configure the default context at runtime (e.g. number of MT schedulers) since we create the context before the application’s main can start.
This proposal supports creating multiple execution contexts: let the application configure its own EC and start fibers in it if required. That allows all the complexity the app needs when configuring the context the application actually runs in, because it's initialized by the application. The root context doesn't have to be heavily used by the application.
@RX14 Tell me if I'm wrong, the plan would be: Crystal 1
Crystal 2
Isolated context: I think I see that context for UI loops only, and want to prevent blocking behaviors, but there's nothing wrong with doing blocking calls in other use cases, and using the event loop normally is fine. Still, spawning a fiber without an explicit context should either raise or the default context should be configured (as you suggest):

```crystal
abstract class ExecutionContext
  class Isolated < ExecutionContext
    def initialize(name : String, @spawn_context : ExecutionContext? = nil, &)
      @thread = Thread.new(name) { yield }
    end

    def spawn(**args, &) : Fiber
      if ctx = @spawn_context
        ctx.spawn(**args) { yield }
      else
        raise RuntimeError.new("Can't spawn in isolated context (need a spawn context)")
      end
    end
  end
end

mt = ExecutionContext::MultiThreaded.new
ui = ExecutionContext::Isolated.new("GTK", spawn_context: mt) { Gtk.main }
```

Instead of raising, the spawn context could be the default EC.
@RX14 I applied your suggestions to the RFC.
There's no such method
@ysbaddaden I think whether the root execution context is MT or ST is a moot point, because every well-architected app has a single

If we all agree, maybe we can start on the other 90% of the RFC: bikeshedding naming. I like
@RX14 The

I wouldn't bikeshed the namings just yet. As I'm experimenting with the types, I feel that the difference is getting thinner and thinner. In fact, Kotlin only has a single scheduler implementation, and a couple of constructors to start execution contexts with one thread (ST) or many threads (MT).

I'm also struggling with the inheritance:
This pull request has been mentioned on Crystal Forum. There might be relevant details there: https://forum.crystal-lang.org/t/crystal-multithreading-support/6622/8

I forgot again but MT:1 would break

I think we can accept breakage with
I like this, I like this a lot.
0002-execution-contexts.md (Outdated)

> Such a group of fibers will never run in parallel. This can vastly simplify the synchronization logic since you don’t have to deal with parallelism anymore, only concurrency, which is much easier & faster to deal with. For example no need for costly atomic operations, you can simply access a value directly. Parallelism issues and their impact on the application performance are limited to the global communication.
>
> ## Issues
I think data and especially execution locality could show up on the negative side as well, as the round robin takes away a lot of programmatic control over data locality. It is... possible... to manually schedule fibers to dedicated threads, but that really is not the way it is currently meant to be used.
0002-execution-contexts.md (Outdated)

> - a scheduler to run the fibers (or many schedulers for a MT context);
> - an event loop (IO & timers):
>
> => this might be complex: I don’t think we can share a libevent across event bases? we already need to have a “thread local” libevent object for IO objects as well as for PCRE2 (though this is an optimization).
I'm in the "let the event loop decide if it wants to be instantiated at the thread, execution context, or global level" camp. How that would look API-wise I'm less sure, especially if a dynamic number of threads in a context is to be supported.

> This might be complex: I don’t think we can share a libevent across event bases

From what I have gathered from the libevent docs, it is possible, but it would necessitate a lot more synchronization when IO happens (*), so it is probably slower.

But yes, it is complex. Windows, and its weird file handles, says hi. Each open file handle is specific to each instance of whatever it uses, so there needs to be only one global event instance there.

(*) We already enable some structures for thread safety but then create separate bases for each thread anyhow, IIRC. It was quite a while since I looked at it. I think we can remove that enabling without danger (they should really only be used when actually reusing a libevent base between threads). We don't use the specialized MT-safe libevent functions that make use of it.
0002-execution-contexts.md (Outdated)

```crystal
def initialize(@name : String, @minimum : Int32, @maximum : Int32)
  # TODO: start @minimum threads
end
```
Allowing a dynamic amount of threads requires more synchronization and complexity than having a static amount. While it sounds nice to be able to adjust, it probably warrants its own separate class. Making certain all threads are in a waiting state before starting to actually queue stuff allows a bunch of simplifications with less mutexes and risks a lot fewer possible race conditions too.
Enqueuing doesn't need much synchronization. Go pushes to a bounded local queue (per scheduler) with overflow to a global queue; threads can be started at any time: when they reach the run loop they will grab a batch of fibers from the global queue or steal from another scheduler. Stopping isn't more complex: schedulers aren't tied to a specific thread, so the thread detaches the scheduler and returns itself to the thread pool.
The complexity is more in when to start / stop a thread.
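As a rough illustration of that enqueue scheme (hypothetical `GlobalQueue` and `Scheduler` types, invented for this sketch; a real implementation would use lock-free operations for the local queue instead of plain `Deque`):

```crystal
# Sketch of the fast/slow enqueue path: a bounded per-scheduler queue
# that offloads half its contents to a shared global queue on overflow.
class GlobalQueue
  def initialize
    @mutex = Mutex.new
    @queue = Deque(Fiber).new
  end

  def bulk_push(fibers : Array(Fiber)) : Nil
    # one locked operation for the whole batch
    @mutex.synchronize { fibers.each { |fiber| @queue.push(fiber) } }
  end
end

class Scheduler
  LOCAL_CAPACITY = 256

  def initialize(@global : GlobalQueue)
    @local = Deque(Fiber).new
  end

  def enqueue(fiber : Fiber) : Nil
    if @local.size < LOCAL_CAPACITY
      @local.push(fiber) # fast path: the local queue is owned by this scheduler
    else
      # slow path: move half of the local queue plus the new fiber
      # to the global queue in a single locked operation
      batch = Array(Fiber).new(LOCAL_CAPACITY // 2 + 1)
      (LOCAL_CAPACITY // 2).times { batch << @local.shift }
      batch << fiber
      @global.bulk_push(batch)
    end
  end
end
```
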
And that complexity pushes for it to become a "future evolution".
As someone who isn't familiar with this stuff at all, my random question is:
Co-authored-by: Linus Sellberg <sellberg@gmail.com>
@Blacksmoke16 From what I read, specifying a thread priority can hint the OS to schedule the thread on a big (performance) or little (efficiency) core. We can also set a thread affinity to a given core, but we must detect the core type beforehand.
Adds notes about wrapping an existing EC, and thread affinities (to pin a thread to a core) in addition to set priorities (still no API). Simplifies the EC API to remove `yield` and `sleep` that may not be needed (the `Fiber.yield` and `sleep` methods can create the resume events), but adds `spawn(same_thread)` to handle the transition.
Co-authored-by: Johannes Müller <straightshoota@gmail.com>
This pull request has been mentioned on Crystal Forum. There might be relevant details there: https://forum.crystal-lang.org/t/timeline-for-multithreading-support/3604/21

This pull request has been mentioned on Crystal Forum. There might be relevant details there: https://forum.crystal-lang.org/t/upcoming-release-1-14-0/7199/1

This pull request has been mentioned on Crystal Forum. There might be relevant details there: https://forum.crystal-lang.org/t/new-event-loop-unix-call-for-reviews-tests/7207/6

This pull request has been mentioned on Crystal Forum. There might be relevant details there: https://forum.crystal-lang.org/t/charting-the-route-to-multi-threading-support/7320/1
Just wanted to share some thoughts/ideas I'm using in my own Fiber-based multi-threaded language. There are two distinct access patterns that can be isolated into different types of Fiber construction:

How I chose to handle this is with two distinct Fiber types: a local Fiber, which can access any in-scope data but can only execute on the thread in which it was created, and an ownership-claiming Fiber, which specifically avoids running on the thread in which it was created, taking ownership of a passed-in value and bringing that over with it to whatever thread it gets dispatched to. On top of that I have a work-stealing mechanism which can pull a Fiber back to its origin thread when it's not busy, but the Fiber would otherwise be guaranteed to run on a different thread. This design focuses very strongly on maintaining locality to achieve the best performance and to avoid synchronization complexity as much as possible, while making distribution over threads certain when loop utilization would benefit from it.

In a typical app server you're going to see HTTP requests come in which are largely isolated from each other and can be immediately distributed to other threads, but then the processing within those tends to need to share more with each other, and what you need within the scope of parts of that request is more likely just concurrency and not actually parallelism. If you need parallelism you can reach for that tool, but if a more restrictive tool is provided to ensure actual isolation of the work for that Fiber then it can be much more effectively distributed.

Another thing you see often with HTTP requests is that most object references within a request are to objects allocated in that request, but there are a small number of exceptions which typically fit neatly into a shared interface category, such as a database connection (or pool) or an app server exposing utilities to each route. For these types I opted to build actors into my language and have any interaction with these objects go through a shim copied into the ownership-taking fibers, which dispatch these interactions across threads through Fiber yields as the "lock" mechanism.

A task simply gets passed over to the thread owning the object to do that interaction, and the result of that interaction resolves the yield on the other end. In my language I went for an explicit form of this actor type specialization, but you could also go for explicit construction of a proxy type when passing to an ownership-taking Fiber. In any case, what I wanted to convey is that in my experience there are two distinct ways in which people want to distribute work with Fibers, and isolating those two behaviours can actually have a lot of benefits. 🙂
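For what it's worth, that actor-style shim can be approximated today with Crystal's existing primitives: a proxy that funnels every interaction through a channel owned by a single fiber, so the wrapped resource itself never needs locks. A hypothetical sketch (the `Proxied` type is invented for illustration, not part of any proposal):

```crystal
# Serialize all access to a shared resource by funneling calls through
# a channel consumed by a single owner fiber.
class Proxied(T)
  def initialize(@resource : T)
    @requests = Channel(Tuple(Proc(T, String), Channel(String))).new
    spawn do
      while request = @requests.receive?
        op, reply = request
        reply.send op.call(@resource) # only this fiber ever touches @resource
      end
    end
  end

  # Runs `op` on the owner fiber and blocks the caller until the result
  # comes back: the channel round-trip acts as the "lock".
  def call(&op : T -> String) : String
    reply = Channel(String).new(1)
    @requests.send({op, reply})
    reply.receive
  end
end

db = Proxied.new("connection") # stand-in for e.g. a DB connection
puts db.call { |conn| "using #{conn}" }
```

The caller blocks on `reply.receive` while the owning fiber performs the operation, which is essentially the fiber-yield-as-lock mechanism described above.
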
Upgrades the IOCP event loop for Windows to be on par with the polling event loops (epoll, kqueue) on UNIX. After a few low hanging fruits (enqueue multiple fibers on each call, for example) the last commit completely rewrites the `#run` method:

- store events in pairing heaps;
- high resolution timers (`CreateWaitableTimer`);
- block forever/never (no need for timeout);
- cancelling timeouts (no more dead fibers);
- thread safety (parallel timer de/enqueues) for [RFC #2];
- interrupt run using a completion key instead of a UserAPC for [RFC #2] (untested).

[RFC #2]: crystal-lang/rfcs#2
This pull request has been mentioned on Crystal Forum. There might be relevant details there: https://forum.crystal-lang.org/t/upcoming-release-1-15-0/7537/1
In a MT environment such as proposed in crystal-lang/rfcs#2, the main thread's fiber may be resumed by any thread, and it may return, which would terminate the program... but it might return from _another thread_ than the process's main thread, which may be unexpected by the OS. This patch instead explicitly exits from `main` and `wmain`. For backward compatibility reasons (win32 `wmain` and wasi `__main_argc_argv` both call `main` and are documented to do so), the default `main` still returns, but is being replaced for UNIX targets by one that exits. Maybe the OS's actual entrypoint could merely call `Crystal.main` instead of `main` and explicitly exit (there wouldn't be a global `main` except for UNIX), but this is out of scope for this PR.
Integrates the skeleton as per crystal-lang/rfcs#2

- Add the `ExecutionContext` module;
- Add the `ExecutionContext::Scheduler` module;
- Add the `execution_context` compile-time flag.

When the `execution_context` flag is set:

- Don't load `Crystal::Scheduler`;
- Plug `ExecutionContext` instead of `Crystal::Scheduler` in `spawn`, `Fiber`, ...

This is only the skeleton: there are no implementations (yet). Trying to compile anything with `-Dexecution_context` will fail until the ST and/or MT context are implemented.

Co-authored-by: Johannes Müller <straightshoota@gmail.com>
The current constructors are rather complex and inconvenient to type, and pushing

```crystal
ctx = Fiber::ExecutionContext::Isolated.new(name) { }
ctx = Fiber::ExecutionContext::SingleThreaded.new(name)
ctx = Fiber::ExecutionContext::MultiThreaded.new(name, size: 1..4)
```

I'm wondering if we could have simpler helpers. For example (thinking out loud, don't take things for granted):

```crystal
ctx = Fiber.start_isolated(name) { } # or #spawn_isolated ?
ctx = Fiber.start_concurrent_context(name)
ctx = Fiber.start_parallel_context(name, size: 1..4)
```

I'm not sure it's really better. Maybe a bit more explicit thanks to the

The naming might also indicate that maybe the ST and MT classes should be named
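For illustration, such helpers could plausibly be thin wrappers over the existing constructors (a sketch only, with nothing decided about names or signatures):

```crystal
# Hypothetical convenience constructors, delegating to the proposed
# execution context classes. Reopens Fiber, so ExecutionContext::*
# resolves to Fiber::ExecutionContext::*.
class Fiber
  def self.start_isolated(name : String, &block) : ExecutionContext::Isolated
    ExecutionContext::Isolated.new(name, &block)
  end

  def self.start_concurrent_context(name : String) : ExecutionContext::SingleThreaded
    ExecutionContext::SingleThreaded.new(name)
  end

  def self.start_parallel_context(name : String, size : Range(Int32, Int32)) : ExecutionContext::MultiThreaded
    ExecutionContext::MultiThreaded.new(name, size: size)
  end
end
```
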
This pull request has been mentioned on Crystal Forum. There might be relevant details there: https://forum.crystal-lang.org/t/upcoming-release-1-16-0/7883/1
The `@[ThreadLocal]` annotation only works on some targets and doesn't allow registering a destructor callback to be invoked when a thread shuts down. We currently don't have threads shutting down, but with [RFC 2] it will start happening (at least isolated contexts are expected to shut down; others should eventually evolve to shut down too). The alternative is the complex `Crystal::ThreadLocalValue` to tie a value to a thread, which in turn requires finalize methods.

[RFC 2]: crystal-lang/rfcs#2
I updated the phrasing and names of the proposed execution contexts. Reading the RFC again, I believe that its purpose, to propose a new API to achieve parallelism in Crystal, can be considered done, even though the feature isn't fully released. I propose to archive RFC 0002 and to start a new one with details on the actual execution contexts (Concurrent, Parallel, Isolated), what guarantees they bring and how they differ, and what still needs to be done: resize, shutdown, thread pool and detach on syscall, mostly.
0002-execution-contexts.md

> The following are the potential contexts that Crystal could implement in stdlib.
>
> **Concurrent Context**: fibers will never run in parallel, they can use simpler and faster synchronization primitives internally (no atomics, no thread safety) and still communicate with other contexts with the default thread-safe primitives (e.g. `Channel`); the drawback is that a blocking fiber will block the other fibers from progressing. The concurrency limitation doesn't mean that the fibers will keep running on the same system thread forever.
polish: Maybe this wording is a bit more clear?

```suggestion
**Concurrent Context**: fibers will never run in parallel, they can use simpler and faster synchronization primitives internally (no atomics, no thread safety) and still communicate with other contexts with the default thread-safe primitives (e.g. `Channel`); the drawback is that a blocking fiber will block the other fibers from progressing. This concurrency limitation however does not guarantee that fibers stay on the same system thread.
```
```diff
@@ -212,40 +213,36 @@ Ideally, anybody could implement an execution context that suits their applicati

 2. We can create an execution context dedicated to handle the UI or game loop of an application, and keep the threads of the default context to handle calculations or requests, never impacting the responsiveness of the UI.

-3. We can create an MT execution context for CPU heavy algorithms, that would block the current thread (e.g. hashing passwords using BCrypt with a high cost), and let the operating system preempt the threads, so the default context running a webapp backend won't be blocked when thousands of users try to login at the same time.
+3. We can create a MT execution context for CPU heavy algorithms, that would block the current thread (e.g. hashing passwords using BCrypt with a high cost), and let the operating system preempt the threads, so the default context running a webapp backend won't be blocked when thousands of users try to login at the same time.
```
issue: `an` is actually correct because `MT` is pronounced `em-tee`.
```crystal
def spawn(*, name : String?, execution_context : Fiber::ExecutionContext = Fiber::ExecutionContext.current, &block) : Fiber
  execution_context.spawn(name: name, &block)
end
```
issue: There's actually no `execution_context` parameter. Some of the other implementations are also not accurate (e.g. `::sleep` only delegates to `Fiber::ExecutionContext.current.sleep`). We should validate all of them.

```suggestion
def spawn(*, name : String?, &block) : Fiber
  Fiber::ExecutionContext::Scheduler.current.spawn(name: name, &block)
end
```
```crystal
# For example "running", "event-loop" or "parked".
abstract def status : String
```
thought: I'm wondering if this should be an enum instead of arbitrary strings?
> In practice, the concurrent context might not bring much performance improvement over the parallel context with a single thread, and both might share the same base.
thought: I'm still not sure if it's even worth having a separate concurrent context when there's not much practical benefit over a parallel context with `max=1`.
> advantage of having a single fiber to run; in practice we want to have a distinct implementation since we should only have to deal with the thread's main fiber and the isolated fiber.
thought: I suppose it would probably be more effort and not worth it to implement something different.
But technically, we probably don't even need a separate fiber for the event loop. In an isolated context, at any point we can only wait on a single event at most. So that could potentially just be interweaved directly into the isolated fiber. Avoiding fiber swapping. Probably not worth it, just mentioning the possibility.
```crystal
ncpu.times do
  codegen.spawn do
    # (runs in the codegen context)
    while unit = channel.receive?
      unit.compile
    end
  end
end
```
suggestion: The codegen fibers should probably inform the waitgroup.

```suggestion
wg.add ncpu
ncpu.times do
  codegen.spawn do
    # (runs in the codegen context)
    while unit = channel.receive?
      unit.compile
    end
    wg.done
  end
end
```
> ~~Since it is a breaking change from stable Crystal, unrelated to `preview_mt`, MT and execution contexts may only become default with a Crystal major release?~~
>
> ~~Maybe MT could be limited to 1 thread by default, and `spawn(same_thread)` be deprecated, but what to do with `spawn(same_thread: true)`?~~
>
> ## Default context configuration
>
> This proposal doesn’t solve the inherent problem of: how can applications configure the default context at runtime (e.g. number of MT schedulers) since we create the context before the application’s main can start.
thought: When we implement rescaling for parallel contexts, the default context can be parallel with `max=1` but resizable from user code. And I believe at least upscaling shouldn't be much more complicated than setting `Parallel#capacity` to a higher number and initializing the associated schedulers.
> # Summary
>
> Reinvent MT support in Crystal to be more efficient, for example by avoiding blocked fibers while there are sleeping threads, all the while empowering developers with different Execution Contexts to run fibers in.
suggestion: The summary should be dead simple and understandable without reading up on terms only defined in this document.

```suggestion
Reinvent Crystal's concurrency execution model with respect to multi-threading. It's getting more efficient, for example by avoiding blocked fibers while there are sleeping threads. And it empowers developers with different strategies for scheduling fibers, called _Execution Contexts_.
```
> An execution context creates and manages a dedicated pool of 1 or more threads where fibers can be executed into. Each context manages the rules to run, suspend and swap fibers internally.
>
> Applications can create any number of execution contexts in parallel. These contexts are isolated but they shall still be capable to communicate together with the usual synchronization primitives (e.g. Channel, Mutex) that must be thread-safe.
suggestion: simplify:

```suggestion
Applications can create any number of execution contexts in parallel. These contexts are isolated but they shall still be capable to communicate together with the usual thread-safe synchronization primitives (e.g. `Channel`, `Mutex`).
```