configs: Consider adding CONFIG_UDMABUF=y #6706
Comments
Any thoughts, @jc-kynesim, @cillian64, @naushir and @6by9?
P.S. in case you wonder: That makes it much more attractive for apps/toolkits/libraries to support.
I'm not opposed to this change, but our libcamera based camera applications already do their own buffer allocations through the dma heap instead of relying on the kernel drivers to do this. This provides all the efficiency (i.e. userland managed and cached) that we require. So I doubt we would change to using UDMABUF allocations for these applications.
Kodi can support udmabuf in addition to dma heap. Currently we go through the dma heap path, but it looks like if udmabuf is available it will take precedence. It's possible the cacheability may be different (and, according to the PR, whether the buffer is contiguous, although I would hope it would be if we are trying to display it), so I'd like to test whether anything goes wrong with udmabuf enabled.
I can't see that there is a downside to enabling it as long as the dma_heap method remains available. I doubt it is usable if you need CMA, given that the mem is pre-allocated by memfd. Again, given that you have mem from memfd, I'd assume it was cacheable? Is there more doc than the header file? If I google udmabuf I get an entirely different udmabuf interface!
If, as I suspect, this cannot do CMA, we should only enable it on IOMMU capable Pis, otherwise it will just be confusing.
That's not straightforward. bcm2711_defconfig is used on 64-bit pi3, pi4 and pi5 (with 4k pagesize).
Which of 2711 / 2712 do we ship by default on Pi5? If it is 2712 then I'd be in favour of just adding it to 2712_defconfig but I can see there are other valid opinions here.
We ship both. 2712 (16k pagesize) is the default on Pi5 but there are some compatibility issues with software that assumes a 4k pagesize, so some users run with 2711 / 4k (
Cursory googling suggests UDMABUFS do allocate contiguous blocks.
You've found a different udmabuf interface! (same as I did first time)
I'm happy to leave the decision to someone else, but I do know that passing a non-CMA dma buffer to something expecting CMA produces very confusing errors, especially as it sometimes works when the memory just happens to be contiguous (this normally happens on the first run through, then everything fails on subsequent runs).
How do Kodi et al detect UDMABUF support, and is there scope for enabling via a cmdline.txt setting?
FWIW, I was just able to confirm that my main use-case, improving performance for sw-decoded video, works as expected on the RPi5. With this GStreamer MR I can smoothly play 1080p AV1 (8bit) on a 4K screen as the udmabuf buffers are passed through from GStreamer -> GTK4 -> Gnome-Shell/Mutter -> KMS:
Playing 4K@30FPS unfortunately is still not smooth, but passthrough works as well. So both GPU and display engine seem to handle the buffers pretty well already. Will quickly test the RPi4 - IIUC the display engine there shouldn't support the import, however the GPU should.
FWIW I think this https://github.com/torvalds/linux/blob/master/include/uapi/linux/udmabuf.h is the interface under discussion
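For readers who have not used that header, here is a minimal sketch of the memfd-to-dma-buf flow it describes, written in Python for brevity. The struct layout and ioctl number are transcribed from `udmabuf.h`; the function name `create_udmabuf`, the `dev_path` parameter and the asm-generic ioctl encoding (as used on x86, arm and arm64) are assumptions of this sketch, not something taken from the thread.

```python
import fcntl
import mmap
import os
import struct

# Constants transcribed from include/uapi/linux/udmabuf.h.
UDMABUF_FLAGS_CLOEXEC = 0x01
# struct udmabuf_create { __u32 memfd; __u32 flags; __u64 offset; __u64 size; }
UDMABUF_CREATE_FMT = "=IIQQ"


def _iow(type_char, nr, size):
    """Encode a write-direction ioctl number (asm-generic layout, an assumption)."""
    return (1 << 30) | (size << 16) | (ord(type_char) << 8) | nr


UDMABUF_CREATE = _iow("u", 0x42, struct.calcsize(UDMABUF_CREATE_FMT))


def create_udmabuf(size, dev_path="/dev/udmabuf"):
    """Return a dma-buf fd backed by a sealed memfd, or None if that fails.

    create_udmabuf and dev_path are illustrative names, not an existing API.
    """
    # Offset and size must be page aligned, so round up to whole pages.
    page = mmap.PAGESIZE
    size = max(page, (size + page - 1) // page * page)
    dev_fd = memfd = -1
    try:
        dev_fd = os.open(dev_path, os.O_RDWR | os.O_CLOEXEC)
        memfd = os.memfd_create("udmabuf-sketch",
                                os.MFD_CLOEXEC | os.MFD_ALLOW_SEALING)
        os.ftruncate(memfd, size)
        # The driver requires the memfd to be sealed against shrinking.
        fcntl.fcntl(memfd, fcntl.F_ADD_SEALS, fcntl.F_SEAL_SHRINK)
        # A mutable buffer makes fcntl.ioctl return the ioctl's own return
        # value, which for UDMABUF_CREATE is the new dma-buf fd.
        arg = bytearray(struct.pack(UDMABUF_CREATE_FMT, memfd,
                                    UDMABUF_FLAGS_CLOEXEC, 0, size))
        return fcntl.ioctl(dev_fd, UDMABUF_CREATE, arg)
    except OSError:
        return None
    finally:
        # The dma-buf keeps its own reference to the pages, so both helper
        # fds can be closed here.
        for fd in (memfd, dev_fd):
            if fd >= 0:
                os.close(fd)
```

A caller would treat a `None` result as "udmabuf not usable here" and fall back to another allocator; on success the returned fd must eventually be closed with `os.close`.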
@rmader You might not want to do it this way but allocating dmabufs via /dev/dma_heap/vidbuf_cached (a symlink to system or linux,cma depending on Pi variant) will work on all Pis and does work well for s/w decode buffers
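As a rough illustration of that dma heap path, the sketch below allocates a buffer from the heap node named above. The struct layout and ioctl number follow `include/uapi/linux/dma-heap.h`; the function name `alloc_from_heap`, its parameters and the asm-generic ioctl encoding are assumptions made for this example.

```python
import fcntl
import os
import struct

# struct dma_heap_allocation_data {
#     __u64 len; __u32 fd; __u32 fd_flags; __u64 heap_flags; }
DMA_HEAP_ALLOC_FMT = "=QIIQ"


def _iowr(type_char, nr, size):
    """Encode a read/write ioctl number (asm-generic layout, an assumption)."""
    return (3 << 30) | (size << 16) | (ord(type_char) << 8) | nr


DMA_HEAP_IOCTL_ALLOC = _iowr("H", 0x0, struct.calcsize(DMA_HEAP_ALLOC_FMT))


def alloc_from_heap(length, heap_path="/dev/dma_heap/vidbuf_cached"):
    """Return a dma-buf fd of `length` bytes from a dma heap, or None.

    alloc_from_heap and heap_path are illustrative names, not an existing API.
    """
    try:
        heap_fd = os.open(heap_path, os.O_RDWR | os.O_CLOEXEC)
    except OSError:
        return None
    try:
        # fd is an output field; fd_flags are the open flags for the new fd.
        arg = bytearray(struct.pack(DMA_HEAP_ALLOC_FMT, length, 0,
                                    os.O_RDWR | os.O_CLOEXEC, 0))
        fcntl.ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, arg)
        _, buf_fd, _, _ = struct.unpack(DMA_HEAP_ALLOC_FMT, arg)
        return buf_fd
    except OSError:
        return None
    finally:
        os.close(heap_fd)
```

Because the heap node is a symlink chosen per Pi variant, the same call would hand back CMA memory on some boards and system memory on others without the caller changing anything.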
In the context of Wayland, GL/VK and KMS - shouldn't that usually just gracefully fail? In any case, the udmabuf we're talking about here uses virtual memory.
Wow, that's wild^^
By checking whether we can open |
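A hedged sketch of what such an open-based probe could look like follows. It assumes the node is `/dev/udmabuf` (the path mentioned elsewhere in this thread) and that the fallback is the `vidbuf_cached` dma heap; `probe_node` and `pick_allocator` are hypothetical helper names, and the udmabuf-first ordering only mirrors the precedence described for Kodi above.

```python
import os


def probe_node(path):
    """Classify a device node as 'ok', 'missing', 'denied' or 'error'."""
    try:
        fd = os.open(path, os.O_RDWR | os.O_CLOEXEC)
    except FileNotFoundError:
        return "missing"   # driver not built or not loaded
    except PermissionError:
        return "denied"    # node exists but this user may not open it
    except OSError:
        return "error"
    os.close(fd)
    return "ok"


def pick_allocator(udmabuf="/dev/udmabuf",
                   heap="/dev/dma_heap/vidbuf_cached"):
    """Prefer udmabuf when it can be opened, else the dma heap, else None."""
    if probe_node(udmabuf) == "ok":
        return "udmabuf"
    if probe_node(heap) == "ok":
        return "dma_heap"
    return None
```

Distinguishing "missing" from "denied" matters in practice, since a permissions problem on the node looks the same to the application as a kernel built without the driver unless the error is inspected.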
That suggests we could use a module parameter set from bootargs in Device Tree to enable it on a model-specific basis.
I think that by the time you have a dmabuf handle in your hand you must have all the memory locked down - almost no bit of h/w is going to cope with paging requests. Failure tends to happen late in the process when a driver tries to actually get a h/w address from the handle with a call it is expecting to succeed.
Seen it happen
That sounds like a solid idea
Thanks, that's great to know - for my current project it's kinda an anti-goal though :) The context here is that we are exploring whether sw-decoding with udmabuf could be an almost universal baseline for video playback on semi-recent Linux devices. I previously tested the same GST patches on old Intel and AMD laptops, now I'm trying various ARM devices. In theory - and so far results are very promising - the approach should allow us to make software decoding quite a bit faster compared to previous generic approaches, by avoiding unnecessary copies in the graphics stack whenever at least the GPU has an MMU. The buffers can just get passed through the same code paths that many apps already have for HW decoding, and a copy happens only once we need to composite, be it in the app, the system compositor or the display engine. The RPi5 is an interesting case because it lacks HW decoders for common formats BUT has a powerful display engine, unlike older laptops that usually need to use the GPU for the final blit.
Tested the RPi4 now. As expected the display engine doesn't support / accept udmabuf buffers - it seems to handle that perfectly though, failing in the test-only KMS commits, not in real ones (I didn't test long though). The GPU in turn imports the buffers just fine as expected, meaning that clients can pass the buffers through, up to the Wayland compositor when possible, ensuring a minimal amount of copies. Thus I suggest enabling udmabuf on the 4 as well. P.S.: With the GST patches Showtime (the upcoming Gnome default player) outperformed mpv (upstream version) and just managed to play 1080p30fps-8bit AV1 (low quality) on a 2560x1440 screen smoothly. And the same should be possible in other apps/players - notably Firefox.
Beware that our Wayland now has passthrough for dmabufs direct to HVS if fullscreen. This may confuse that.
I tried enabling this and testing Kodi. Initially I see in the Kodi log
because the user isn't in the kvm group:
Adding the user to the kvm group does mean Kodi uses the /dev/udmabuf node, but h264 (which is software decode on Pi5) is very corrupt. Checking logs shows:
I'm guessing this is trying to use CMA (which is set quite low on Pi5, due to availability of IOMMUs). Ah yes, dmesg shows (many of):
and increasing CMA does get the file to play. But this is suboptimal - Pi5 doesn't require CMA for this use case (dma heap can allocate from non-contiguous system memory using /dev/dma_heap/vidbuf_cached).
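When chasing this kind of failure it can help to look at how much CMA is configured and how much is free. The sketch below parses the `CmaTotal` and `CmaFree` fields that `/proc/meminfo` exposes on kernels built with CMA; `parse_cma` is a hypothetical helper name, and free CMA is only an upper bound, since a contiguous allocation can still fail when the free space is fragmented.

```python
def parse_cma(meminfo_text):
    """Return (CmaTotal, CmaFree) in KiB from /proc/meminfo text, or None.

    None means the fields are absent, e.g. a kernel built without CMA.
    """
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in ("CmaTotal", "CmaFree"):
            # Values look like "   65536 kB"; keep the number only.
            fields[key] = int(rest.split()[0])
    if "CmaTotal" not in fields or "CmaFree" not in fields:
        return None
    return fields["CmaTotal"], fields["CmaFree"]
```

On a running system the input would come from reading `/proc/meminfo`; comparing the free figure before and during playback shows whether the allocator is drawing from CMA at all.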
Those traces look like you are still using dma_heap but have fallen back to linux,cma, not udmabuf at all
How does fixing permissions on /dev/udmabuf make it use dma_heap? I'm pretty sure this code is being used and the UDMABUF_CREATE ioctl is causing a CMA allocation.
OK - but the debug you quote says "ioctl DMA_HEAP_IOCTL_ALLOC failed" not "ioctl UDMABUF_CREATE failed"
Turning on udmabuf seems sensible. I agree that if there's a neat way to only enable it on Pi5 then that might avoid some confusion. As an aside, does anyone know why udmabuf only works with memfd and not regular SHM allocations? Is it just so we can enforce appropriate seals (🦭🦭)? It would be really useful if we could use udmabuf to do DRM scanout on regular Wayland SHM buffers.
No objection from me if it works in a useful manner. I remember @cillian64 looking at it and getting some benefit, but largely on Pi5 as HVS then has an IOMMU. On earlier boards it is less useful.
Describe the bug
From the docs:
It is becoming increasingly popular for multimedia related tasks and, from Fedora 41 and systemd 257 on, is available by default to users.
The RPi5 can notably benefit from it as it lacks hardware video decoders for most common codecs such as H264, VP9 and AV1 - udmabuf allows software decoders to allocate buffers that can be used by the powerful display engine.
Thus I'd like to kindly request adding
CONFIG_UDMABUF=y
to the configs :)
Some more context:
udmabuf was recently enabled by default in systemd and is used by the libcamera software ISP, mesa-llvmpipe and, hopefully soon, GStreamer. The latter MR is an experiment, allowing software decoded video to be displayed much more efficiently compared to other approaches.
See also:
Steps to reproduce the behaviour
Device (s)
Raspberry Pi 5
System