work completion event received with wrong value #44
Comments
Does the problem also occur with the SendRecv Benchmark (https://github.com/zrlio/disni/blob/master/src/test/java/com/ibm/disni/benchmarks/SendRecvClient.java, https://github.com/zrlio/disni/blob/master/src/test/java/com/ibm/disni/benchmarks/SendRecvServer.java)? |
I didn't try it, I have made changes in these classes: |
The issue is that DiSNI reuses the wc object. In your code you need to do: lastEvent.set(wc.clone()); After changing this I got the correct result: SEND -> RECV -> SEND -> RECV, etc. |
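To illustrate why storing the reference fails, here is a minimal sketch. It assumes only what the comment above states, namely that the same completion object is refilled for every event. `FakeWc` and `WcReuseDemo` are hypothetical stand-ins, not DiSNI classes, and the opcode constants are illustrative.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a work completion object; NOT the DiSNI IbvWC class.
class FakeWc implements Cloneable {
    long wrId;
    int opcode;

    @Override
    public FakeWc clone() {
        try {
            return (FakeWc) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: the class is Cloneable
        }
    }
}

class WcReuseDemo {
    static final int SEND = 0;   // illustrative opcode values
    static final int RECV = 128;

    // Simulates a poller that refills one object for every completion and
    // returns the opcode remembered from the first event after the second
    // event has been delivered.
    static int rememberedOpcode(boolean cloneBeforeStoring) {
        FakeWc reused = new FakeWc();
        AtomicReference<FakeWc> lastEvent = new AtomicReference<>();

        // First completion: a send.
        reused.wrId = 2000;
        reused.opcode = SEND;
        lastEvent.set(cloneBeforeStoring ? reused.clone() : reused);

        // Second completion: the poller overwrites the same object.
        reused.wrId = 2001;
        reused.opcode = RECV;

        return lastEvent.get().opcode;
    }

    public static void main(String[] args) {
        System.out.println("stored by reference: " + rememberedOpcode(false));
        System.out.println("stored as clone:     " + rememberedOpcode(true));
    }
}
```

When the reference is stored, the "last" event silently becomes the current one, so comparing the two opcodes always reports a repeat. The clone keeps the values of the earlier event.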
I've changed my code as follows: public synchronized void dispatchCqEvent(IbvWC wc) throws IOException { I am using Soft-RoCE to do this check; are you using real hardware? |
@yaelna Can you please run this function:
public void dispatchCqEvent(IbvWC wc) throws IOException {
    System.out.println("Server got event " + wc.getWr_id() + " : " + IbvWC.IbvWcOpcode.valueOf(wc.getOpcode()) +
            "Old " + lastEvent.get() + "\n\n");
    int newOpCode = wc.getOpcode();
    IbvWC old = lastEvent.get();
    if (old == null) {
        // First event: remember a copy, because DiSNI reuses the wc object.
        lastEvent.set(wc.clone());
        System.out.println(" Last event1 = " + lastEvent.get());
    } else if (old.getOpcode() == newOpCode) {
        System.out.println("Last event: " + old + "current :" + wc);
        // Report both the previous and the current work request id.
        throw new RuntimeException("*******server got " + IbvWC.IbvWcOpcode.valueOf(newOpCode) + " event twice in a row. last id = " + old.getWr_id() + ", current id " + wc.getWr_id() + "***********");
    }
    lastEvent.set(wc.clone());
    System.out.println("Last event1 = " + lastEvent.get());
    wcEvents.add(wc);
}
Yes, I'm running on a real RoCE device. Regarding Soft-RoCE, this may be related to #37. Please see my comment: |
I am testing Flink with DiSNI and noticed some sporadic duplicated receives. We are already cloning the WC in endpoint classes as suggested in this bug.
All receives from seq 0 through 405014 arrived without any repetition. The issue happens only once in a while. The server did not repeat the sequence during send; however, on the client it is received twice. Any ideas why this could happen? Additional information: it is tested on InfiniBand HCA mlx-4 |
The issue could be with the way This immediate Imagining the HCA reading the WR from the command address and current WR being written to the same address as an interleaved operation, this could explain why we see duplicate WR ids, data in the receive or send completions. The solution could be to hold command reference until WC is received and call |
Hi Venkatsc, thanks for posting this. Let me double check the code as well and find out if this is really an issue of the buffer being freed too early, that would be a major bug. |
Can you point me to the line in the example with the problematic early free? |
@patrickstuedi I have run into this issue irregularly while testing code that posts many WRs at once. As per my reading of the code, it is more of a documentation fix. Method The below line of code is taken from here endpoint.postSend(endpoint.getWrList_send()).execute().free(); I think the flow below causes the issue when the execute and free methods are used in immediate succession.
In (2), when exec is called, the WR list is actually copied by the user driver to a dedicated memory area from which the NIC fetches the WR list via DMA. The copied WR list also needs to have the specific format matching the device, so the user driver also creates a slightly different layout. Therefore, the application memory backing the WR list can be freed and re-used right away after exec(). The copying of the WR list can be seen in the default post_send call of libibverbs that eventually calls into the kernel: https://github.com/linux-rdma/rdma-core/blob/master/libibverbs/cmd.c#L1247 This is not the post_send call that typically gets executed if you run on IB or RoCE, because normally a specific user driver for a particular NIC directly memory-maps the NIC queues and populates WRs without kernel involvement, but the copying will be similar. At least that's my understanding of how things work, happy to discuss. |
Note that we are talking about copying WRs, not data, just to not create confusion here. |
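A toy model of the copy-at-post behaviour described above may help. This is a sketch under stated assumptions, not libibverbs or DiSNI code: `ToyWorkQueue` and the `{wrId, length}` descriptor layout are invented for illustration, and the only property modelled is that posting copies the descriptor into queue-owned memory.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only: stands in for a driver-owned work queue.
class ToyWorkQueue {
    private final Deque<long[]> queueOwnedEntries = new ArrayDeque<>();

    // Posting copies the caller's descriptor into memory owned by the queue.
    void post(long[] callerDescriptor) {
        queueOwnedEntries.addLast(callerDescriptor.clone());
    }

    // What the device would later fetch from the queue.
    long[] fetchNext() {
        return queueOwnedEntries.pollFirst();
    }
}

class CopyOnPostDemo {
    static long wrIdSeenByDevice() {
        ToyWorkQueue queue = new ToyWorkQueue();
        long[] descriptor = {2000L, 64L}; // hypothetical layout: {wrId, length}

        queue.post(descriptor);
        descriptor[0] = -1L; // caller reuses ("frees") its descriptor right away

        return queue.fetchNext()[0];
    }

    public static void main(String[] args) {
        System.out.println("wrId fetched by device: " + wrIdSeenByDevice());
    }
}
```

Because the queue holds its own copy, overwriting the caller-side descriptor after the post does not change what is fetched later. As the comment above stresses, this applies to the WR descriptors only, not to the data buffers they point at.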
-----"Patrick Stuedi" <notifications@github.com> wrote: -----
Date: 11/05/2019 04:42PM
I agree. The work request list is only needed to create driver
specific work queue elements. The post_send() or post_receive()
call is synchronous. If the call finishes, appropriate work
requests got instantiated within the work queue of the driver,
or the call fails since the work queue was full. In any case, with
the call return, the WR list is under control of the caller
again and can be re-used/overwritten, or free'd.
Best regards,
Bernard.
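The synchronous contract described in this reply (the post either enqueues the request or fails because the queue is full, and the caller owns its WR again on return) can be sketched with a bounded queue. This is only an analogy under assumed names: `BoundedPostDemo` and its capacity parameter are hypothetical and do not reflect the real verbs API.

```java
import java.util.concurrent.ArrayBlockingQueue;

class BoundedPostDemo {
    // Posts three requests into a queue of the given capacity, reusing the
    // same caller-side WR each time, and reports which posts were accepted.
    static boolean[] postThree(int capacity) {
        ArrayBlockingQueue<long[]> workQueue = new ArrayBlockingQueue<>(capacity);
        long[] wr = new long[1]; // caller-owned WR, overwritten for every post
        boolean[] accepted = new boolean[3];

        for (int i = 0; i < 3; i++) {
            wr[0] = 2000 + i;
            // offer() returns immediately: true if enqueued, false if full.
            accepted[i] = workQueue.offer(wr.clone());
        }
        return accepted;
    }

    public static void main(String[] args) {
        boolean[] result = postThree(2);
        for (int i = 0; i < result.length; i++) {
            System.out.println("post " + i + " accepted: " + result[i]);
        }
    }
}
```

With a capacity of two, the third post is rejected immediately rather than blocking, which mirrors the "call fails since the work queue was full" case.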
Indeed, you are right. I was under the assumption that libibverbs does not copy the request from the given address. As I mentioned before, the issue occurs rarely, and after I stopped freeing the resource until completion I didn't see it. Hence, thought Recently, after posting the comment, I faced the duplicate response again. The issue is due to an application bug that overwrites the header in the same memory location. |
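The thread does not show the application code, so the following is only one hypothetical way a header overwrite in a shared memory location can produce duplicates. It assumes a sequence number stored at offset 0 of a data buffer that is read after later messages have been prepared; `HeaderOverwriteDemo` is an invented name.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class HeaderOverwriteDemo {
    // Prepares two messages with sequence numbers 1 and 2, then "transfers"
    // them afterwards, and returns the sequence numbers actually delivered.
    static List<Integer> deliveredSequences(boolean separateBuffers) {
        ByteBuffer shared = ByteBuffer.allocate(8);
        List<ByteBuffer> inFlight = new ArrayList<>();

        for (int seq = 1; seq <= 2; seq++) {
            ByteBuffer buf = separateBuffers ? ByteBuffer.allocate(8) : shared;
            buf.putInt(0, seq); // header: sequence number at offset 0
            inFlight.add(buf);  // posted by reference; the data is read later
        }

        List<Integer> delivered = new ArrayList<>();
        for (ByteBuffer buf : inFlight) {
            delivered.add(buf.getInt(0)); // the transfer reads the buffer now
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println("shared buffer:    " + deliveredSequences(false));
        System.out.println("separate buffers: " + deliveredSequences(true));
    }
}
```

With the shared buffer, the second header write lands before the first message is read, so the same sequence number is delivered twice even though the sender never intended to repeat it.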
Should I remove the comments to avoid misdirection? As you explained, |
Please leave the comments, I think it's a good discussion. And it would be great to find the actual cause for this problem.. |
We tried implementing a server and client on top of send/recv operations and noticed that sometimes we get a receive completion event following a send operation (or vice versa).
I've managed to reproduce the issue on top of RDMAvsTcpBenchmark (changed files attached) by saving the last completion event and comparing each new event against the old one.
We expect that events will always be send-recv-send-recv-... and found that we get two consecutive similar events.
server output:
2305 [Thread-0] INFO com.ibm.disni - got event type + RDMA_CM_EVENT_ESTABLISHED, srcAddress /192.168.33.137:8881, dstAddress /192.168.33.137:51085
RDMAvsTcpBenchmarkServer::client connection accepted
2308 [Thread-1] INFO com.ibm.disni - cq processing, caught exception but keep going server got IBV_WC_SEND event twice in a row. last id = 2000, current id 2000****
java.lang.RuntimeException: server got IBV_WC_SEND event twice in a row. last id = 2000, current id 2000****
at com.ibm.disni.examples.SendRecvServer$CustomServerEndpoint.dispatchCqEvent(SendRecvServer.java:212)
at com.ibm.disni.RdmaActiveCqProcessor.dispatchCqEvent(RdmaActiveCqProcessor.java:37)
at com.ibm.disni.RdmaActiveCqProcessor.dispatchCqEvent(RdmaActiveCqProcessor.java:29)
at com.ibm.disni.RdmaCqProcessor.dispatchCqEvent(RdmaCqProcessor.java:106)
at com.ibm.disni.RdmaCqProcessor.run(RdmaCqProcessor.java:136)
at java.lang.Thread.run(Thread.java:745)
2309 [Thread-1] INFO com.ibm.disni - cq processing, caught exception but keep going server got IBV_WC_RECV event twice in a row. last id = 2001, current id 2001****
java.lang.RuntimeException: server got IBV_WC_RECV event twice in a row. last id = 2001, current id 2001****
at com.ibm.disni.examples.SendRecvServer$CustomServerEndpoint.dispatchCqEvent(SendRecvServer.java:212)
at com.ibm.disni.RdmaActiveCqProcessor.dispatchCqEvent(RdmaActiveCqProcessor.java:37)
at com.ibm.disni.RdmaActiveCqProcessor.dispatchCqEvent(RdmaActiveCqProcessor.java:29)
at com.ibm.disni.RdmaCqProcessor.dispatchCqEvent(RdmaCqProcessor.java:106)
at com.ibm.disni.RdmaCqProcessor.run(RdmaCqProcessor.java:136)
at java.lang.Thread.run(Thread.java:745)
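The alternation check described in this report can also be written so that it never holds on to the event object. The sketch below is not the attached code: `AlternationChecker` is a hypothetical helper and the opcode values in `main` are illustrative. It stores only the primitive opcode, which sidesteps any object reuse by the caller.

```java
class AlternationChecker {
    private static final int NONE = -1; // no event seen yet
    private int lastOpcode = NONE;

    // Returns true if the opcode differs from the previous one (or is the
    // first event), false if the same opcode arrived twice in a row.
    boolean accept(int opcode) {
        boolean alternates = lastOpcode == NONE || lastOpcode != opcode;
        lastOpcode = opcode; // a primitive copy, unaffected by later reuse
        return alternates;
    }

    public static void main(String[] args) {
        final int send = 0;   // illustrative opcode values
        final int recv = 128;
        AlternationChecker checker = new AlternationChecker();
        int[] events = {send, recv, send, send};
        for (int opcode : events) {
            System.out.println("opcode " + opcode + " alternates: " + checker.accept(opcode));
        }
    }
}
```

A real handler would call `accept(wc.getOpcode())` at the top of its dispatch method and raise an error on `false`, assuming the opcode getter behaves as in the snippets quoted in this thread.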
changedFiles.zip