Questions on internal path of an SQE #950
-
Hello to all.
Based on my current understanding (which may be wrong) the execution modes of a request are:
However, in trying to understand more about the async path, I wrote a relatively simple program that opens a file and reads 512 bytes with the [...]
buf = malloc(BUF_SIZE);
fd = open("test.txt", O_RDONLY);
io_uring_queue_init(32, &ring, 0);
sqe = io_uring_get_sqe(&ring);
io_uring_sqe_set_flags(sqe, IOSQE_ASYNC);
io_uring_prep_read(sqe, fd, buf, BUF_SIZE, 0);
io_uring_submit(&ring);
[...] And the results are the following, no
Additionally, I've included an (incomplete) visualization of what I described above. I'm aware that it may not be correct, so I'd appreciate any corrections to help me better understand the entire process. Thank you in advance!
Replies: 5 comments
-
I think you'll have better luck if you do:
io_uring_prep_read(sqe, fd, buf, BUF_SIZE, 0);
io_uring_sqe_set_flags(sqe, IOSQE_ASYNC);
as any prep handler will clear the sqe, hence overwriting whatever you've put in it upfront (like the ASYNC flag).
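To see why the order matters, here is a toy model of the behaviour described in this reply. It is only a sketch: `toy_sqe`, `toy_prep_read`, `toy_sqe_set_flags` and `TOY_FLAG_ASYNC` are hypothetical stand-ins invented for illustration, not liburing's types or functions. The one assumption it encodes is the one stated above, that a prep helper rewrites the whole SQE, flags included.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT liburing's types or functions. */
#define TOY_FLAG_ASYNC (1u << 4) /* plays the role of IOSQE_ASYNC */

struct toy_sqe {
    uint8_t  opcode;
    uint8_t  flags;
    int32_t  fd;
    uint64_t off;
    uint64_t addr;
    uint32_t len;
};

/* Like a prep handler: rewrites every field, which resets flags to 0. */
static void toy_prep_read(struct toy_sqe *sqe, int fd, void *buf,
                          uint32_t len, uint64_t off)
{
    assert(sqe != NULL);
    memset(sqe, 0, sizeof(*sqe));
    sqe->opcode = 1; /* arbitrary opcode for the toy */
    sqe->fd = fd;
    sqe->addr = (uint64_t)(uintptr_t)buf;
    sqe->len = len;
    sqe->off = off;
}

/* Like a flag setter: touches only the flags field. */
static void toy_sqe_set_flags(struct toy_sqe *sqe, uint8_t flags)
{
    assert(sqe != NULL);
    sqe->flags = flags;
}

/* Order used in the original program: the flag is wiped by the prep call. */
uint8_t flags_then_prep(void *buf, uint32_t len)
{
    struct toy_sqe sqe;
    toy_sqe_set_flags(&sqe, TOY_FLAG_ASYNC);
    toy_prep_read(&sqe, 3, buf, len, 0);
    return sqe.flags;
}

/* Suggested order: prep first, then set the flag, so it survives. */
uint8_t prep_then_flags(void *buf, uint32_t len)
{
    struct toy_sqe sqe;
    toy_prep_read(&sqe, 3, buf, len, 0);
    toy_sqe_set_flags(&sqe, TOY_FLAG_ASYNC);
    return sqe.flags;
}
```

In this model `flags_then_prep` returns 0 and `prep_then_flags` returns `TOY_FLAG_ASYNC`, which is the same outcome the thread reports for the real calls: the request only takes the async path once the flag is set after the prep call.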
-
Yes, this works as expected, thank you @axboe. If anyone has any tips on the figure I shared in my previous post about the SQE path, it would be really helpful to share them!
-
For your diagram, the most important thing it's missing is the fast poll implementation that io_uring has. For that '???' box, the question before that should be "Can this file type be polled for data/space readiness?" and if the answer is yes and you go to that ??? box, the answer is "Arm a poll handler to get notified when we can retry the operation".
The io-wq / io worker side generally isn't very interesting, as most things should never hit that side. They are just a fallback for slower operations, generally, if we can't do sane nonblocking operations on them or they are not pollable (like regular files, for example).
Speaking of regular files, reading or writing to them with buffered IO has some special support as well. On the read side, it works kind of like polling, in that we kick off a read-ahead window that includes the requested range, and then we retry that similarly to reading from a socket when we get a callback that the readahead has finished. For writing, it's mostly just dirtying data. If we need to balance dirty pages, then it'll get punted to io-wq.
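The decision flow described in this reply can be sketched as a small function. This is a coarse model for reading alongside the diagram, not kernel code: every name in it (`toy_request`, `toy_next_step`, the enums) is hypothetical, and `would_block` is an assumed simplification standing for "the first nonblocking attempt could not finish" (for a buffered write, "dirty pages need balancing").

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical categories; the kernel does not use these names. */
enum toy_file_kind {
    TOY_POLLABLE,         /* e.g. sockets, pipes */
    TOY_REGULAR_BUFFERED, /* regular file, buffered IO */
    TOY_NOT_POLLABLE      /* anything else that cannot be polled */
};

enum toy_dir { TOY_READ, TOY_WRITE };

enum toy_step {
    TOY_COMPLETE_INLINE,     /* nonblocking attempt succeeded */
    TOY_ARM_POLL_AND_RETRY,  /* fast poll: retry when the file is ready */
    TOY_READAHEAD_AND_RETRY, /* retry when the readahead callback fires */
    TOY_PUNT_TO_IO_WQ        /* fallback: hand off to an io-wq worker */
};

struct toy_request {
    enum toy_file_kind kind;
    enum toy_dir dir;
    bool would_block; /* assumed: the inline attempt could not finish */
};

/* Pick the next step for a request, following the reply's description. */
enum toy_step toy_next_step(const struct toy_request *req)
{
    assert(req != NULL);

    if (!req->would_block)
        return TOY_COMPLETE_INLINE;

    switch (req->kind) {
    case TOY_POLLABLE:
        /* "Can this file type be polled?" Yes: arm a poll handler. */
        return TOY_ARM_POLL_AND_RETRY;
    case TOY_REGULAR_BUFFERED:
        /* Reads wait for readahead; writes that must balance dirty
         * pages are punted to io-wq. */
        return req->dir == TOY_READ ? TOY_READAHEAD_AND_RETRY
                                    : TOY_PUNT_TO_IO_WQ;
    case TOY_NOT_POLLABLE:
        return TOY_PUNT_TO_IO_WQ;
    }
    return TOY_PUNT_TO_IO_WQ; /* unreachable for valid enum values */
}
```

The model deliberately leaves out forced-async submission (`IOSQE_ASYNC`) and direct IO, since the reply does not describe how they fit into the diagram.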
-
That was a very informative response. Thanks!
This action takes place in the process context of the process that made the submission, right? Say the process submits an IORING_OP_READ on a file opened in buffered mode: the kernel runs in that process's context, fires a readahead (if the data is not present in the page cache), and returns control to the process. When the readahead has finished, it runs a callback to fulfill the CQE. Does this callback run in interrupt context?
-
The readahead callback can indeed be in interrupt context, but finishing the original read operation and posting the CQE runs in the context of the original issuing task.