GGML threadpool blocked when thread contention #2725
Comments
Can you reproduce with any of the |
The repro that I added here is just a simple C++ example (no whisper.net involved): sandrohanea@07edc83#diff-8012be6a28736dd93582661245965820609c958d8d97ae03750257a2e11f35e1R34 It's basically just starting multiple inference calls in parallel:
|
The thread sanitizer reports a couple of data races, though I'm not sure if these could cause the problem that you observe:
Does it work with |
That doesn't look like a deadlock; it seems more likely that the performance is just very bad when using more threads than available. This is not completely unexpected: many parts need the cooperation of all the threads to proceed, and if there are not enough threads available, a thread will spin for a while and lose its time slice, then the next thread will be scheduled and waste more time spinning, and so on. I am not sure that this is something that needs to be fixed; just do not use more threads than available. |
It seems that So, most likely the problem is with GGML_OPENMP=OFF and clang just couldn't find openmp while MSVC can. |
It ran for more than 5 hours on GitHub Actions (here: https://github.com/sandrohanea/whisper.net/actions/runs/12726221171/job/35474406671) and not a single test succeeded (they usually finish in a couple of seconds). It's not impossible that this is a performance problem, but such a big difference seems too much to me. |
Much of the synchronization without OpenMP uses spin locks, which is great for performance when all threads are running at the same time, but it is susceptible to very bad results when there is thread contention. A thread waits for the other threads to finish their job by wasting time spinning, but no other threads are actually running, and the only way for them to finish their jobs is for the current thread to give up its time slice. It may actually be a deadlock in this case (that is not completely out of the question), but it should be easy to avoid by not using more threads than available. |
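To illustrate the spinning behavior described above, here is a minimal sketch of a wait loop that spins for a bounded number of iterations and then yields its time slice, so that a descheduled thread gets a chance to run. This is not ggml's actual threadpool code; `wait_for_flag`, `run_workers`, and `spin_limit` are hypothetical names introduced only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper (not part of ggml): block until `flag` becomes true.
// The first `spin_limit` checks are a pure busy-wait; after that, every
// failed check gives up the time slice so other threads can be scheduled.
inline void wait_for_flag(const std::atomic<bool>& flag, int spin_limit) {
    int spins = 0;
    while (!flag.load(std::memory_order_acquire)) {
        if (++spins > spin_limit) {
            std::this_thread::yield();
        }
    }
}

// Hypothetical driver: start `n_threads` workers that all wait on one flag,
// release them, and return how many of them finished.
inline int run_workers(int n_threads, int spin_limit) {
    assert(n_threads > 0);
    std::atomic<bool> go{false};
    std::atomic<int> done{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < n_threads; ++i) {
        workers.emplace_back([&] {
            wait_for_flag(go, spin_limit);
            done.fetch_add(1, std::memory_order_relaxed);
        });
    }
    go.store(true, std::memory_order_release);  // release all waiters
    for (auto& t : workers) {
        t.join();
    }
    return done.load();
}
```

With a very large `spin_limit` this behaves like a pure spin wait, which is the case that suffers when there are more runnable threads than cores. A small `spin_limit` trades some latency in the uncontended case for better behavior under contention.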
@max-krasnyansky tagging you in case this is something you are interested in improving. The problem seems to be that when using the ggml threadpool when there is thread contention, the threads fail to make any progress. |
In general, the number of running threads is smaller than the number of physical threads. The default parameters are already set to limit the number of threads to at most the number of physical threads, as seen here: Line 4644 in 2ab2eb5
However, in some environments, it might not be possible to determine the number of available threads (e.g., within certain containers, such as in a GitHub Actions environment), where the thread affinity is managed by the host. I agree it is not ideal, and we can mitigate this by hardcoding the max number of threads to 1 if we don't control the environment where these will run. However, it would be great if ggml could handle thread contention cases more gracefully, allowing the processing to resume even under such conditions. (renamed the issue as it is not related to clang) Thanks Georgi and Diego for the quick answers and investigation! |
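As a rough sketch of the mitigation mentioned above, the caller can clamp the requested thread count to what the standard library reports and fall back to a single thread when nothing is reported. The helper names `clamp_thread_count` and `safe_thread_count` are hypothetical and not part of whisper.cpp or ggml. Note that `std::thread::hardware_concurrency()` is only a hint: it returns 0 when the value cannot be determined, and inside a container it may not reflect the CPU share actually granted by the host.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper: limit `requested` to `hw_threads`.
// If `hw_threads` is 0 (unknown), limit to `fallback` instead.
inline int clamp_thread_count(int requested, unsigned hw_threads, int fallback = 1) {
    assert(fallback >= 1);
    requested = std::max(requested, 1);  // never return fewer than one thread
    if (hw_threads == 0) {
        return std::min(requested, fallback);
    }
    return std::min(requested, static_cast<int>(hw_threads));
}

// Hypothetical convenience wrapper using the standard library's hint.
inline int safe_thread_count(int requested) {
    return clamp_thread_count(requested, std::thread::hardware_concurrency());
}
```

The result would then be used as the thread count passed to the inference call. When several inference calls run in parallel, the total number of worker threads is the sum across all calls, so the per-call count may need to be reduced further.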
Yep. Sorry for the delayed reply. |
Hello @ggerganov , @slaren ,
I initially identified this issue when I tried to upgrade Whisper.net to the latest Whisper.cpp and ggml in sandrohanea/whisper.net#319 and observed that support for MSVC was removed.
I started to investigate this issue but I'm a bit blocked as I'm not that proficient in C++.
I added a simple repro example here: sandrohanea@07edc83
The same code, compiled with MSVC, works:
But when using clang, it's not working:
where ../cmake/x64-windows-llvm.cmake is: https://github.com/ggerganov/llama.cpp/blob/master/cmake/x64-windows-llvm.cmake
The same setup works with the MSVC build:
What's more interesting is that the deadlocked clang process blocks that entire CPU and causes all other whisper processes using the same CPU to fail (however, when the clang-build process is killed with Ctrl + C, the other one resumes):
I'm not sure whether the same behavior will be observed with Clang on Arm64 or just on x64, but Arm64 can now only be built with Clang, so it would be great if someone could test this.
Thank you!