Null pointer dereference on concurrent VC_SM_CMA_IOCTL_MEM_IMPORT_DMABUF ioctl #6701
Comments
Investigating. It looks like it fails and is still holding sm_state->lock, so …
#6703 fixes one issue. Allocating and freeing kernel IDs took the spinlock, but looking up the value didn't. Another thread importing or freeing could therefore corrupt the idr whilst a thread was doing a lookup, resulting in a duff buffer pointer. Your test case ran for just over 100000 iterations (compared to a few thousand before), but still failed.
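For illustration, here is a minimal sketch of the locking pattern described above, where the lookup takes the same spinlock as allocation and removal, so a concurrent import or free can no longer mutate the idr mid-lookup. The state struct and function names are hypothetical, modelled on the names mentioned in this thread rather than taken from the driver:

```c
#include <linux/gfp.h>
#include <linux/idr.h>
#include <linux/spinlock.h>

/* Hypothetical driver state, modelled on the names used in this thread. */
struct sm_state {
	spinlock_t lock;
	struct idr kernelid_map;
};

/* Allocation (and removal) already took the spinlock. */
static int vc_sm_alloc_kernel_id(struct sm_state *state, void *buffer)
{
	int id;

	spin_lock(&state->lock);
	id = idr_alloc(&state->kernelid_map, buffer, 0, 0, GFP_ATOMIC);
	spin_unlock(&state->lock);

	return id;
}

/* The essence of the fix described above: take the same lock around the lookup. */
static void *vc_sm_lookup_kernel_id(struct sm_state *state, int id)
{
	void *buffer;

	spin_lock(&state->lock);
	buffer = idr_find(&state->kernelid_map, id);
	spin_unlock(&state->lock);

	return buffer;
}
```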
Definitely timing-related, as adding logging reduces the rate of reproduction :-/
The logging output is leaving me confused at the moment. I've tweaked the test case to have a separate dmabuf with a different size per thread, so that I can identify which log messages are associated with which thread. In the failure case I have the "attempt to import" message, but I don't get the log line from … One to look at further tomorrow.
Not wasting my time at all - more eyes on code is always a good thing.
The error was fairly infrequent (every few days), but when triggered it resulted in all the monitors on a police CCTV system being blanked! They weren't too happy, to say the least.

Although in this case I've already been down that path in trying to work out how we can have skipped the vc_sm_add_resource call. Neither of the pr_debug messages from vc_sm_cma_vchi_import failing appears in my logs, and I've added a log message if …

Regarding threading, the userspace ioctl call will call … On close of the fd, …
I did a bit of debugging and found the following:
I explicitly removed the kfree(buffer), so addresses should be unique and not identical due to reuse. Two things:
Continuing:
I traced further into …
I give up for today. After staring at the code for the last two hours, I tried replacing …
The weird part is that the vpu_event call shouldn't happen until all firmware references to the memory block have been released, and that includes the one that vc-sm-cma has just taken whilst importing the dmabuf.

I have just spotted that there is a slightly surprising loop in … I'm adding logging / error handling to check that …
I guess that's the one mentioned here?
So switching to … With …

The bit I still don't understand is how we're getting that callback twice (I think that's more likely than the sequencing getting messed up during allocation).

I am also seeing numerous mailbox calls timing out, but checking the VPU logs, the VPU is maxed out dealing with all these buffer mapping calls, so that's not that surprising.
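As a purely defensive measure while the double callback is being chased, one option (a hypothetical sketch reusing the made-up state struct from the earlier sketch, not the driver's actual code) is to make the release path idempotent, so a second callback for the same kernel ID is reported rather than dereferenced:

```c
#include <linux/bug.h>
#include <linux/idr.h>
#include <linux/spinlock.h>

/* Same hypothetical state as in the earlier sketch. */
struct sm_state {
	spinlock_t lock;
	struct idr kernelid_map;
};

static void vc_sm_release_callback(struct sm_state *state, int kernel_id)
{
	void *buffer;

	spin_lock(&state->lock);
	/* idr_remove() returns NULL if the id is already gone, so a
	 * duplicate callback can't hand us the same pointer twice. */
	buffer = idr_remove(&state->kernelid_map, kernel_id);
	spin_unlock(&state->lock);

	if (!buffer) {
		WARN_ONCE(1, "vc-sm-cma: duplicate release callback for id %d\n",
			  kernel_id);
		return;
	}

	/* ... tear down the buffer exactly once here ... */
}
```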
Something is certainly going wrong and stopping messages from being handled.
So this might not even be a Linux-side bug but something in whatever is handling the mailbox replies on the other end? I would assume that the part responsible for handling the 'import' or 'release' message also doesn't accidentally produce two responses.
It's something in the handling of VCHI. It's most likely that "slots" aren't getting freed under some conditions, and we end up with none available. I'll be talking to pelwell in the morning about it - he's the man who knows how that is all meant to work.
Describe the bug
I've observed kernel null pointer dereferences while using the VC_SM_CMA_IOCTL_MEM_IMPORT_DMABUF ioctl. A traceback might look like this:
Once that has happened, other calls interfacing with the hardware may lock up, and in my case the hardware watchdog then resets the CPU. See also the discussion on the Pi forum.
Steps to reproduce the behaviour
On a Pi4, run the code from https://gist.github.com/dividuum/da0a9a7038b592898ea269f19917e438. After a few seconds, the program will stop showing output and the kernel log will likely show a traceback similar to the one above. Using more threads seems to shorten the time it takes to crash.
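For readers who can't follow the gist, the pattern it exercises is roughly the following: several threads, each with its own dmabuf (per the per-thread-size tweak mentioned in the comments above), hammering the import ioctl on /dev/vcsm-cma in a loop. This is only a skeleton under stated assumptions; the real request code and struct definition come from the kernel's vc_sm_cma_ioctl.h header, and the dmabuf setup is omitted here (see the gist for the actual reproducer):

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/ioctl.h>

/* Stand-in request struct and request code: the real ones
 * (struct vc_sm_cma_ioctl_import_dmabuf and
 * VC_SM_CMA_IOCTL_MEM_IMPORT_DMABUF) live in the kernel's
 * vc_sm_cma_ioctl.h UAPI header. */
struct import_req {
	int dmabuf_fd;
	/* ... remaining fields per the UAPI header ... */
};
#define IMPORT_IOCTL 0	/* placeholder for VC_SM_CMA_IOCTL_MEM_IMPORT_DMABUF */

#define NUM_THREADS 8

struct thread_ctx {
	int vcsm_fd;	/* fd of /dev/vcsm-cma */
	int dmabuf_fd;	/* one distinct dmabuf per thread */
};

static void *hammer(void *arg)
{
	struct thread_ctx *ctx = arg;

	for (;;) {
		struct import_req req = { .dmabuf_fd = ctx->dmabuf_fd };

		if (ioctl(ctx->vcsm_fd, IMPORT_IOCTL, &req) < 0)
			perror("import");
		/* the reproducer then releases the returned handle again */
	}

	return NULL;
}

int main(void)
{
	pthread_t threads[NUM_THREADS];
	struct thread_ctx ctx[NUM_THREADS];

	for (int i = 0; i < NUM_THREADS; i++) {
		ctx[i].vcsm_fd = open("/dev/vcsm-cma", O_RDWR);
		ctx[i].dmabuf_fd = -1;	/* obtain a real dmabuf fd here */
		pthread_create(&threads[i], NULL, hammer, &ctx[i]);
	}
	for (int i = 0; i < NUM_THREADS; i++)
		pthread_join(threads[i], NULL);

	return 0;
}
```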
Device(s)
Raspberry Pi 4 Mod. B
System
Tested on
Logs
No response
Additional context
No response