Zero-copy support #359
Changing the API would require redesigning the entire implementation :-) It's also almost impossible to make a two-stage approach thread-safe in the MPMC case while keeping it efficient. It sounds like you only have one producer and one consumer; a single-producer/single-consumer (SPSC) queue, like my ReaderWriterQueue, would be more efficient in that case. In any case, objects are moved on both enqueue and dequeue if the type supports it (using the move constructor and the move assignment operator, respectively). Instead of copying entire buffers, consider using pointers to buffers. With two queues, you can have one containing pointers to buffers to be processed, and one containing pointers to free buffers (essentially acting as a pool, so that no buffers are dynamically allocated on the hot path).
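A minimal sketch of that two-queue pattern, assuming a fixed pool of 64 preallocated 1500-byte buffers (roughly one Ethernet MTU each). `StandInQueue` is a hypothetical mutex-based placeholder that only mirrors the `enqueue`/`try_dequeue` method names of `moodycamel::ConcurrentQueue` so the example compiles on its own; `PacketBuffer`, `BufferPipeline`, `produce` and `consume` are likewise illustrative names. Only pointers travel through the queues, so the payload bytes stay where they were first written.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for moodycamel::ConcurrentQueue<T> so the sketch is
// self-contained. It is mutex-based, NOT lock-free; only the method names match.
template <typename T>
class StandInQueue {
public:
    bool enqueue(T item) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push_back(std::move(item));
        return true;
    }
    bool try_dequeue(T& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop_front();
        return true;
    }
private:
    std::mutex m_;
    std::deque<T> q_;
};

// Hypothetical fixed-size packet buffer.
struct PacketBuffer {
    std::array<std::uint8_t, 1500> data;
    std::size_t len = 0;
};

// All buffers are allocated once up front; the queues only carry pointers.
struct BufferPipeline {
    std::vector<PacketBuffer> storage;         // never resized, so pointers stay valid
    StandInQueue<PacketBuffer*> free_buffers;  // the pool
    StandInQueue<PacketBuffer*> ready_buffers; // filled buffers awaiting decode

    BufferPipeline() : storage(64) {
        for (auto& b : storage) free_buffers.enqueue(&b);
    }
};

// Producer: take a free buffer, fill it in place, publish the pointer.
bool produce(BufferPipeline& p, const std::uint8_t* bytes, std::size_t n) {
    PacketBuffer* buf = nullptr;
    if (!p.free_buffers.try_dequeue(buf)) return false;  // pool exhausted: drop packet
    n = n < buf->data.size() ? n : buf->data.size();
    std::memcpy(buf->data.data(), bytes, n);  // stands in for recv() writing into buf
    buf->len = n;
    return p.ready_buffers.enqueue(buf);
}

// Consumer: decode the buffer in place, then hand it back to the pool.
template <typename Decode>
bool consume(BufferPipeline& p, Decode decode) {
    PacketBuffer* buf = nullptr;
    if (!p.ready_buffers.try_dequeue(buf)) return false;
    decode(*buf);
    buf->len = 0;
    return p.free_buffers.enqueue(buf);
}
```

In a real deployment, the `memcpy` would be replaced by `recv()` writing directly into `buf->data`, and the stand-in by a real concurrent queue; the pool size bounds how many packets can be in flight before the producer has to drop.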
No, I just laid out a simplified example. In the real world, my network stream is split over a dozen UDP multicast groups, with one producer thread per multicast group.
Is it really? I did not read the actual code, but I would expect, e.g., the producer's enqueue to look something like:
Which could be changed to something like:
Basically, I would expect the decomposition of enqueuing and dequeuing to follow naturally from the flow of the existing functions. It would just be a matter of returning the intermediate states to the caller during … If splitting by returning intermediate states is annoying to implement, I guess it could also be done by having enqueue/dequeue accept a function pointer or lambda that operates on the slot instead of an element to copy. Something along the lines of:
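The lambda-based variant can be sketched for the single-producer/single-consumer case only; making it both safe and efficient with multiple producers and consumers is exactly the hard part pointed out in the thread. `SlotRing`, `enqueue_with` and `dequeue_with` are hypothetical names, not part of any existing queue's API. The callable works on the slot in place, and the index is published only after it returns, so the other side never observes a half-written slot.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical SPSC ring: enqueue/dequeue take a callable that operates on the
// slot in place instead of a value to copy or move. One slot is always left
// empty to tell "full" from "empty", so the usable capacity is N - 1.
template <typename T, std::size_t N>
class SlotRing {
    static_assert(N >= 2, "need at least two slots");
public:
    // Producer thread only. Returns false if the ring is full.
    template <typename F>
    bool enqueue_with(F&& fill) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % N;
        if (next == head_.load(std::memory_order_acquire)) return false;  // full
        fill(slots_[tail]);                            // write directly into the slot
        tail_.store(next, std::memory_order_release);  // publish to the consumer
        return true;
    }

    // Consumer thread only. Returns false if the ring is empty.
    template <typename F>
    bool dequeue_with(F&& use) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;  // empty
        use(slots_[head]);                                       // read in place
        head_.store((head + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::array<T, N> slots_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};
```

A design consequence of this shape: the callable runs between reserving and publishing the slot, so a slow decode on the consumer side directly delays the producer's reuse of that slot.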
It's not that simple, as there's quite a bit of state beyond just the tail index. This style of API is also easy to accidentally misuse, leading to bugs in the calling application. If you want to use the queue as backing storage for a C-style API (e.g.
I see, indeed. And I agree with you that two-stage operations are more error-prone. I still think it's worth it, though: packet processing is a common task for concurrent queues. For instance, here is what a two-stage (zero-copy) operation looks like in DPDK's ring buffer:
The problem I have with that is that creating an efficient, thread-safe memory pool is not a trivial task, whereas the queue's buffer is naturally contiguous, already in the cache, and avoids further dereferences.
I would like to use the concurrent queue for a low-latency network application. Overall, I need to enqueue network packets on one thread and decode them on another (in practice there would be multiple producers/consumers, but that does not matter here). The queuing is necessary because, at high packet rates, decoding synchronously would take too long and incoming packets would start being dropped.
In pseudocode, my flow is roughly:
My actual packets are copied all over the place here.
Now, I am more of a C programmer than a C++ one, so the subtleties of move operations are a bit magical to me. Is there some way to ensure that p will not be copied upon enqueuing and dequeuing? Also, as far as I understand, a move can only happen on the producer side, not on the consumer side.
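To make the move behavior concrete: moving a C++ type that owns heap memory (such as `std::vector`) transfers the pointer, not the bytes. The sketch below mimics what the maintainer describes a queue doing for movable types, move construction on enqueue and move assignment on dequeue, using `std::deque` as a stand-in for the queue's internal storage; `Packet` and `round_trip` are hypothetical names. The payload address survives the round trip, which shows that no payload bytes were copied on either side.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical packet type: the payload lives on the heap, so moving a Packet
// only transfers the vector's internal pointer, never the payload bytes.
struct Packet {
    std::vector<std::uint8_t> payload;
};

// Mimics a queue holding a movable type: move-construct into storage on
// enqueue, move-assign out of storage on dequeue. Returns the payload address
// seen by the consumer.
const std::uint8_t* round_trip(std::deque<Packet>& storage, Packet& in, Packet& out) {
    storage.push_back(std::move(in));  // "enqueue": move constructor
    out = std::move(storage.front());  // "dequeue": move assignment
    storage.pop_front();
    return out.payload.data();
}
```

The caveat for C-style packets: if `Packet` held its payload inline (for example `std::array<std::uint8_t, 1500>` or a plain `uint8_t[1500]`), there is no pointer to steal, so the "move" degenerates into copying all 1500 bytes on each side. A move only avoids the copy when the payload lives behind a pointer.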
I would propose an explicit API for that case, which splits enqueue and dequeue into two halves: the first half grabs a slot in the queue, and the second half commits the operation.
That would look like:
This would allow true zero-copy behavior on both the producer and the consumer side, without relying on compiler optimizations.
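A minimal sketch of the proposed two-halves API, restricted to one producer thread and one consumer thread, where it is straightforward; `TwoPhaseRing`, `begin_enqueue`, `commit_enqueue`, `begin_dequeue` and `commit_dequeue` are hypothetical names, not an existing API. The producer gets a pointer into the ring's own storage, fills it in place (for example with `recv()`), then commits; the consumer reads that same slot in place and commits to release it.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical SPSC ring with a two-phase API: begin_* hands out a pointer to a
// slot, commit_* publishes or releases it. Exactly one producer thread and one
// consumer thread; usable capacity is N - 1.
template <typename T, std::size_t N>
class TwoPhaseRing {
    static_assert(N >= 2, "need at least two slots");
public:
    // Producer: next free slot, or nullptr if full. Invisible to the consumer
    // until commit_enqueue() is called.
    T* begin_enqueue() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if ((tail + 1) % N == head_.load(std::memory_order_acquire)) return nullptr;
        return &slots_[tail];
    }
    // Producer: publish the slot from the last successful begin_enqueue().
    void commit_enqueue() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        tail_.store((tail + 1) % N, std::memory_order_release);
    }
    // Consumer: oldest filled slot, or nullptr if empty.
    T* begin_dequeue() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return nullptr;
        return &slots_[head];
    }
    // Consumer: release the slot from the last successful begin_dequeue().
    void commit_dequeue() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        head_.store((head + 1) % N, std::memory_order_release);
    }
private:
    std::array<T, N> slots_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};
```

This also illustrates the misuse risk raised in the thread: calling `commit_enqueue()` after `begin_enqueue()` returned `nullptr` makes a full ring look empty and loses its contents, and forgetting to commit stalls the other side; nothing in this API prevents either mistake.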