-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
register memory that larger than 4GB #39
Comments
@yuvaldeg you're the ODP expert, what do you think? |
@lynus is correct, this should work and allow us to register >4GB blocks. We should change it to long and break the build if this is not a 64bit system. |
@lynus, please report any other issues you may be experiencing with ODP - are these mostly functional or performance issues? |
I have encountered following pitfalls that are not documented in any materials I could find in the Internet:
The infiniband card in my experiements is MT2770 (Mellanox ConnectX-4). |
Re #1: You are correct, rnr_retry_count should be non-zero. ODP utilizes RNR for it to function as expected. E.g., if a user tries to read/write from/to remote memory, whereas the remote memory is registered with ODP, the remote side will respond with an RNR so that the initiator will back off until the page fault process is complete. This is why RNRs are a requirement for ODP. Re #2: For the reasons above, page fault is a lengthy process that should be avoided as much as possible. ODP is intended to be used with buffers that get reused frequently - thus they do not get evicted too frequently hence minimizing the occurrence of page faults. A better approach for ODP is to use it on larger data buffers and then use prefetch wisely and make any effort to reuse those buffers and not release them. Everything else should be kept registered without ODP (e.g. QP WR buffers, receive WQEs, etc...). Regarding the ConnectX-4 card - generally speaking, ConnectX-5 will get better performance with ODP, though ConnectX-4 also shows very nice improvements with ODP. Re IBV_WC_WR_FLUSH_ERR - this is usually a symptom of some other issue. Whenever a QP gets an error, it will report the error using a WC, and will then flush the rest of the posted WRs with an IBV_WC_WR_FLUSH_ERR. So, you should fish out the first WR that fails and see what reason for failure it reports - we can further assist if needed. |
@yuvaldeg Thanks for the comments. So there are no prior error to IBV_WC_WR_FLUSH_ERR and the only failed WR is B with flush error. This kind of error is really hard to debug. |
Thanks for the details, @lynus |
Right now, I'm working on a experimental project aiming at using DiSNI directly writing to remote JVM managed heap. The entire code includes customed JVM (https://github.com/lynus/JikesRVM/tree/RDMA) and my disni fork (https://github.com/lynus/disni). So it may take too many time to dive in. I will write a short C codes to reproduce this exact situation. Thanks for your kindness :-) |
I vote for 64bit registration support. |
@patrickstuedi I agree with @lynus, I think we should support it. It will be needed sooner or later, so we might as well do it now. |
Recently, I extended disni enabling On Demand Paging (ODP) feature. With this feeature, it is practical to register large memory whose size should be hold in long type instead of int. In fact, native call ibv_reg_mr() use size_t which is 64-bit wide in 64-bit architecture system.
So what would think of breaking the api and updating IbvMr.length to long typed?
P.S. I encountered some pitfalls to enable ODP. API imcompatibility is just one them.
The text was updated successfully, but these errors were encountered: