Copies references to data structures to avoid extra ARC traffic. #42
Note: this change (by itself) does not reduce ARC traffic, but in concert with `UnmanagedBuffer` (#41), we see a ~15x performance improvement in the parallelFor benchmark.
A couple of words describing the relationship you're establishing by mentioning that issue here would be appreciated.
dabrahams left a comment
I have to say, the phrasing of the change description was rather non-obvious to me on its own. I think I roughly understand what you're doing, but it's also very non-obvious why it makes sense to store shared data structures in something called PerThreadState. Maybe you could at least segregate those properties or comment them to make it clear what's happening. I approve of this change as a step along the path of progress, but I think once things settle down a bit it would be good for us to do a full walkthrough of all of this code. In the meantime, how about turning on thread sanitizer for your tests of parallel components? ;-)
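For illustration, here is a minimal sketch of what segregating and commenting those properties could look like. Every name below except `PerThreadState` and `totalThreadCount` is a hypothetical stand-in (`TaskQueue`, `ThreadPool`, `queue`, `threadIndex`); this is not the code from the PR, and it makes no claim about ARC behavior on its own.

```swift
// Hypothetical shared data structure owned by the pool.
final class TaskQueue {
    var pending: [Int] = []
}

// Hypothetical pool type; the real pool in this PR has a different shape.
final class ThreadPool {
    let queue = TaskQueue()
    let totalThreadCount: Int

    init(totalThreadCount: Int) {
        self.totalThreadCount = totalThreadCount
    }
}

struct PerThreadState {
    // Shared across all threads. These are copies of references and values
    // owned by the pool, kept here so hot paths need not go through `pool`.
    let queue: TaskQueue
    let totalThreadCount: Int

    // Genuinely per-thread data.
    let threadIndex: Int

    init(pool: ThreadPool, threadIndex: Int) {
        self.queue = pool.queue
        self.totalThreadCount = pool.totalThreadCount
        self.threadIndex = threadIndex
    }
}

let pool = ThreadPool(totalThreadCount: 4)
let state = PerThreadState(pool: pool, threadIndex: 0)
```

Grouping the shared copies under one comment and the per-thread fields under another makes it clear at a glance which properties alias pool-owned state.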
  func spin() -> Task? {
-   let spinCount = pool.threads.count > 0 ? Constants.spinCount / pool.threads.count : 0
+   let spinCount = totalThreadCount > 0 ? Constants.spinCount / totalThreadCount : 0
This change looks suspect. Elsewhere you have replaced pool.totalThreadCount with totalThreadCount but here you're replacing pool.threads.count which is presumably a different quantity (if it isn't, I question the value of pool.totalThreadCount!)
Good eye! This should be the number of worker threads and not the totalThreadCount. Thanks!
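A minimal sketch of the corrected calculation, assuming a hypothetical `workerThreadCount` parameter for the number of worker threads and a hypothetical `spinBudget` standing in for `Constants.spinCount` (whose actual value is not shown in this thread):

```swift
/// Divides a fixed spin budget across the worker threads only.
/// `workerThreadCount` and `spinBudget` are hypothetical names for illustration.
/// Returns 0 when there are no worker threads, which also avoids dividing by zero.
func perThreadSpinCount(workerThreadCount: Int, spinBudget: Int) -> Int {
    return workerThreadCount > 0 ? spinBudget / workerThreadCount : 0
}

// Example: a budget of 10_000 spins shared by 4 workers.
let spins = perThreadSpinCount(workerThreadCount: 4, spinBudget: 10_000)
```

Dividing by the worker count rather than the total thread count keeps the per-thread spin budget tied to the threads that actually spin looking for work.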
@saeta You didn't address my suggestion to turn on tsan. Building parallel stuff and not testing with tsan seems really super inadvisable. At least file an issue?
Note: this change (by itself) does not significantly reduce ARC traffic, but in concert with `UnmanagedBuffer` (#41), we see a ~15x performance improvement in the parallelFor benchmark.

Performance for NonBlockingThreadPool: parallel for, one level:

Note: the variability in timing increases substantially as performance improves. This is (I believe) due to the lower frequency of stealing.