Implement GET protocol for dependencies #420
base: master
Have you looked at performance at all? I think I've discussed the implications for communication prioritization with @bosilca ad nauseam. I might pull a version of this into my branch at some point so I can test.
There could very well be a bug in the GET implementation for the MPI comm engine that has gone undiagnosed since it's not been well-tested.
I implemented this to have a better baseline in the comparison with TTG (which does GET instead of PUT). As far as I remember, there was little to no benefit in terms of performance (didn't get worse though).
That's relevant! How much did you scale and were you using
George's hypotheses regarding this are, if I recall correctly, along the lines that a sender can more easily regulate how much data it pushes onto the network than a receiver can: with a GET protocol the sender has less ability to prioritize communications, so multiple receivers can end up competing for one sender's bandwidth. On the other hand, a PUT protocol can overwhelm a receiver with many incoming messages, but that shouldn't be the case for most PaRSEC applications, since the receiver also regulates which data it requests to be sent.
83623c1 to 1b856ca
Sweet, it seems to have been a problem with the termination detection @therault :) All checks pass now.
Adds runtime_comm_get MCA parameter to enable use of the GET protocol. Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
1b856ca to edbd8fe
Argh, of course not, CI doesn't test the GET protocol. Tests fail if run with
I think I might have found the bug. When a process gets too many internal GET AMs (or is asked to do too many PUTs), then it defers starting the Like I said, no one has tested GET really.
Sigh, thanks for checking @omor1. I'm fairly sure I had it working at some point before the big merge. The MPI backend is still student research quality, at best. I hate the fact that we're putting data into random fields; it makes the code unmaintainable. I guess it's good to have a pointer though, will have to take a closer look at it again...
"research quality" The LCI backend is better, in my humble opinion, and is certainly better-documented. It still has a decent amount of jankiness from various things I tried and haven't fully ripped out, but is more maintainable.
parsec_type_size(dtt, &dtt_size);
parsec_ce.mem_register(dataptr, PARSEC_MEM_TYPE_CONTIGUOUS,
                       -1, NULL,
                       dtt_size,
Aren't we missing the count here? parsec_type_size returns the size of the dtt type, but it does not account for nbdtt, so we need to scale the size up for the mem_register call in the contiguous case.
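The size computation this comment asks for can be sketched as follows. This is a minimal illustration, not the fix referenced elsewhere in the thread: `contiguous_region_bytes` is a hypothetical helper that does not exist in PaRSEC, `elem_size` stands in for the value that parsec_type_size writes into dtt_size, and `count` stands in for nbdtt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a PaRSEC function): number of bytes covered by
 * `count` contiguous elements of a datatype whose single-element size is
 * `elem_size`. Returns 0 if the product would overflow size_t. */
static size_t contiguous_region_bytes(size_t elem_size, int count)
{
    assert(count >= 0);  /* a negative element count is a caller bug */

    size_t n = (size_t)count;
    if (elem_size != 0 && n > SIZE_MAX / elem_size) {
        return 0;        /* overflow: refuse rather than wrap around */
    }
    return elem_size * n;
}
```

Under these assumptions, the value passed as the registered length would be `contiguous_region_bytes(dtt_size, nbdtt)` rather than `dtt_size` alone, so a message of 4 elements of 8 bytes registers 32 bytes instead of 8.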
Yes—this has long been fixed on my branch, see 948aa58.
@omor1 if you have a fix, would you mind upstreaming it to this branch?
@omor1 any fixes that you have are more than welcome.
/* Retrieve deps from callback_data */
remote_dep_cb_data_t *cb_data = (remote_dep_cb_data_t *)msg;
parsec_remote_deps_t* deps = cb_data->deps;
parsec_execution_stream_t* es = &parsec_comm_es;
indentation.
ce->mem_unregister(cb_data->memory_handle);
parsec_thread_mempool_free(parsec_remote_dep_cb_data_mempool->thread_mempools, cb_data);

parsec_comm_puts--;
puts or gets?
I don't think that any of parsec_comm_gets_max, parsec_comm_gets, parsec_comm_puts_max, and parsec_comm_puts are actually being used in any way, so the point is moot; the number of concurrent communications is being managed by each communication engine, not at the upper layer.
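To make "managed by each communication engine" concrete, here is a minimal sketch of a per-engine cap on in-flight transfers, with excess requests deferred until a slot frees up. It is an assumption-laden illustration, not code from the MPI or LCI backends: `xfer_limiter_t` and the `limiter_*` functions are hypothetical names, and the real engines may track deferred work quite differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-engine limiter; none of these names exist in PaRSEC. */
typedef struct {
    int in_flight;  /* transfers currently posted to the network */
    int max;        /* cap on simultaneously posted transfers */
    int deferred;   /* requests waiting for a free slot */
} xfer_limiter_t;

static void limiter_init(xfer_limiter_t *l, int max)
{
    l->in_flight = 0;
    l->max = max;
    l->deferred = 0;
}

/* Returns true if the transfer may start now; otherwise records it as
 * deferred and returns false. */
static bool limiter_try_start(xfer_limiter_t *l)
{
    if (l->in_flight < l->max) {
        l->in_flight++;
        return true;
    }
    l->deferred++;
    return false;
}

/* Called when a transfer completes. Returns true if a deferred request
 * was promoted into the slot that just became free. */
static bool limiter_complete(xfer_limiter_t *l)
{
    assert(l->in_flight > 0);
    l->in_flight--;
    if (l->deferred > 0) {
        l->deferred--;
        l->in_flight++;
        return true;
    }
    return false;
}
```

The property worth checking in any such scheme is that every completion re-examines the deferred set: if a deferred GET or PUT is never promoted, the transfer never starts and the run hangs instead of failing loudly.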
parsec_type_size(dtt, &dtt_size);
parsec_ce.mem_register(PARSEC_DATA_COPY_GET_PTR(deps->output[k].data.data), PARSEC_MEM_TYPE_CONTIGUOUS,
                       -1, NULL,
                       dtt_size,
Same comment as above: we need to account for nbdtt in the contiguous case.
receiver_memory_handle,
receiver_memory_handle_size );

// TODO: fix the profiling!
There are still TODOs left in the code.
Avoids a round-trip by directly fetching data when a dependency release arrives.
Adds the runtime_comm_get MCA parameter to enable use of the GET protocol. Currently enabled so that CI exercises it. I'm not sure I am using the right datatypes, since the reshape and redistribute tests are failing...
Signed-off-by: Joseph Schuchart <schuchart@icl.utk.edu>
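The round-trip saving in the description can be pictured with a simplified model of the two handshakes. This is a sketch under assumptions: the step lists below are my reading of the description (in the PUT path the receiver first asks the sender for the data, in the GET path it fetches it directly), and `step_t`, `proto_t`, and `control_messages_before_data` are hypothetical names, not PaRSEC identifiers. Completion notifications are left out.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PROTO_PUT, PROTO_GET } proto_t;

/* One step of the simplified handshake (illustrative model only). */
typedef struct {
    char initiator;    /* 's' = sender (data owner), 'r' = receiver */
    const char *what;  /* human-readable description of the step */
    int moves_data;    /* 1 if this step transfers the payload */
} step_t;

static const step_t put_steps[] = {
    { 's', "activation (dependency release)",      0 },
    { 'r', "request asking the sender for data",   0 },
    { 's', "one-sided PUT into receiver's memory", 1 },
};

static const step_t get_steps[] = {
    { 's', "activation (dependency release)",      0 },
    { 'r', "one-sided GET from sender's memory",   1 },
};

/* Counts the control messages exchanged before the payload moves. */
static size_t control_messages_before_data(proto_t p)
{
    const step_t *steps = (p == PROTO_PUT) ? put_steps : get_steps;
    size_t n = (p == PROTO_PUT) ? sizeof put_steps / sizeof put_steps[0]
                                : sizeof get_steps / sizeof get_steps[0];
    assert(n > 0);

    size_t control = 0;
    for (size_t i = 0; i < n && !steps[i].moves_data; i++) {
        control++;
    }
    return control;
}
```

In this model the PUT path needs two control messages before any data moves and the GET path needs one, which is the saved round-trip. It also shows the trade-off raised in the conversation: under GET the receiver initiates the transfer, so the sender has less say over when its bandwidth is used.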