prb-wm – too many concurrent requests with many workers #1285
Comments
I can confirm the problem exists in several farms. It gets worse as more workers connect. I was looking for the specific code that triggers this, so good find @ngrewe ... I now have more to finally get the devs moving on this.
Oh, I also noticed that using headers-cache solves it ... BUT you must set up HC correctly (synced, a proper bind address rather than 127.0.0.1, etc.). It then limits the load on the node that is hitting this client-slots issue. So to reproduce, just remove HC from the config and use only 1 node as backend, with a WM that has ~150 workers.
The thing about headers-cache is that the version in the latest docker image is a bit unreliable because it tends to also import unfinalized blocks, which causes sync issues. There's a fix for this in-tree, though... just not released.
I made a release; could you help try it? Just change the Docker image to jasl123/phala-headers-cache:23062301. If everything is good, I will move it to the phalanetwork org.
We'll deploy it in a test environment. I'll get back to you after soak testing it for a few days.
I've got the same issue, using 1 node without a header-cache: just a single archive node and 178 + 116 workers on two different PRBv3 WMs.
Can someone maybe refactor the prb code so it just uses a small number of RPC clients instead of several per worker?
Sorry for the late response, but we're testing a workaround (#1388): jasl123/phala-prb:23091801
I've already patched my prb version to allow a higher connection count, but that just overloads the node more. Please implement connection pooling or some other way to avoid every worker establishing a new node connection. Other PRBv3 users have also mentioned that it's more stable when using a header cache, which is not mentioned on the public mining wiki.
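To make the pooling request above concrete, here is a minimal sketch of a fixed-size pool that hands the same few clients out to many workers in round-robin order. It is not the prb implementation: `ClientPool` and `MockClient` are hypothetical names, and `MockClient` only stands in for whatever RPC client type prb actually uses.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// A fixed-size pool that hands out shared client handles in round-robin order.
/// (Hypothetical type for illustration, not part of prb.)
struct ClientPool<C> {
    clients: Vec<Arc<C>>,
    next: AtomicUsize,
}

impl<C> ClientPool<C> {
    /// Build a pool of `size` clients, creating each one with `connect`.
    fn new(size: usize, mut connect: impl FnMut(usize) -> C) -> Self {
        assert!(size > 0, "pool needs at least one client");
        let clients = (0..size).map(|i| Arc::new(connect(i))).collect();
        Self {
            clients,
            next: AtomicUsize::new(0),
        }
    }

    /// Return the next client in rotation. Workers call this instead of
    /// opening their own connection to the node.
    fn get(&self) -> Arc<C> {
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.clients.len();
        Arc::clone(&self.clients[i])
    }

    /// Number of underlying clients (and therefore node connections).
    fn len(&self) -> usize {
        self.clients.len()
    }
}

/// Stand-in for a real RPC client (assumption: the real type would hold a
/// connection to the node).
struct MockClient {
    id: usize,
}
```

With a pool of 4, for example, 150 workers calling `pool.get()` would share 4 connections instead of opening several each. Note that each shared client would still be subject to its own concurrent-request limit, so pooling reduces connection count but does not by itself remove the limit described in this issue.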
When the new prb-wm service handles more than ~250 workers with default settings, we start seeing the following kind of error:
This error comes from jsonrpsee and occurs because some client is built with a low limit of concurrent requests (the default is 256) and the prb is using many of them. I suspect that it's this client here, but I'm not familiar enough with the architecture to be sure:
https://github.com/Phala-Network/phala-blockchain/blob/8fe05eb72b76f4939d0c03e62a3fc7e58b260a5c/standalone/prb/src/datasource.rs#LL555C1-L555C85
We seem to have worked around this (somewhat) by increasing CACHE_SIZE, but it would be nice to have the number of concurrent connections configurable.
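As a rough illustration of what a configurable limit could look like, the sketch below reads the value from an environment variable and falls back to the 256 default mentioned above. The variable name `PRB_MAX_CONCURRENT_REQUESTS` and both function names are assumptions made for this example, not existing prb settings. The resulting value would then need to be passed to the client builder where the client is constructed; to my knowledge jsonrpsee's client builders have a `max_concurrent_requests` option for this, but that wiring is not shown here.

```rust
use std::env;

/// Default limit of concurrent requests, as stated in the issue.
const DEFAULT_MAX_CONCURRENT_REQUESTS: usize = 256;

/// Turn an optional raw setting into a request limit.
/// Missing, unparsable, or zero values fall back to the default.
fn parse_max_concurrent_requests(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&n| n > 0)
        .unwrap_or(DEFAULT_MAX_CONCURRENT_REQUESTS)
}

/// Read the limit from a hypothetical environment variable
/// (the name is an assumption, not an existing prb option).
fn max_concurrent_requests_from_env() -> usize {
    let raw = env::var("PRB_MAX_CONCURRENT_REQUESTS").ok();
    parse_max_concurrent_requests(raw.as_deref())
}
```

Falling back to the default on bad input keeps existing deployments unchanged unless the operator explicitly sets the variable. As noted in the comments, raising the limit alone may simply shift the load onto the node.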