NULL pointer dereference in kfd_dbgmgr_wave_control #70

Closed
misos1 opened this issue Feb 8, 2019 · 3 comments

misos1 commented Feb 8, 2019

Calling hsaKmtDbgWavefrontControl causes a kernel bug. After this, ROCm seems to be somehow "blocked" and the system cannot be soft-rebooted, so probably some locked mutex was never unlocked.

main.cpp:

#include <hc.hpp>
#include <hsa.h>
#include <hsakmt.h>

int main()
{
	// Pick the default GPU through HC; constructing the accelerator brings up the HSA runtime.
	hc::accelerator_view view = hc::accelerator().get_default_view();
	hsa_agent_t agent = *static_cast<hsa_agent_t*>(view.get_hsa_agent());

	// Query the agent's node id, used here as the KFD node id for the thunk call.
	unsigned int node;
	hsa_agent_get_info(agent, HSA_AGENT_INFO_NODE, &node);

	// Issue a wavefront control request (TRAP, single wave mode, trap id 2).
	// This ioctl triggers the kernel oops shown below.
	HsaDbgWaveMessage msg = {0};
	hsaKmtDbgWavefrontControl(node, HSA_DBG_WAVEOP_TRAP, HSA_DBG_WAVEMODE_SINGLE, 2, &msg);

	return 0;
}

Run:

hcc -hc -lhsa-runtime64 -lhsakmt main.cpp
./a.out
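
A possible hint from the oops below: the faulting address is 0 and RDI (the first argument of kfd_dbgmgr_wave_control) is 0, which suggests the kernel-side debug manager pointer passed in by the ioctl handler is NULL, consistent with no debugger having been registered for this process. This is not part of the original report, but the thunk also exposes hsaKmtDbgRegister/hsaKmtDbgUnregister in hsakmt.h; a minimal sketch that wraps the wave-control call with them, assuming (untested) that registering first would create that object and sidestep the NULL dereference:

// Sketch only, not from the original report: register this process as a
// debugger on the node before issuing wave control, unregister afterwards.
// Whether this avoids the NULL dereference is an assumption and untested.
hsaKmtDbgRegister(node);
HsaDbgWaveMessage msg = {0};
hsaKmtDbgWavefrontControl(node, HSA_DBG_WAVEOP_TRAP, HSA_DBG_WAVEMODE_SINGLE, 2, &msg);
hsaKmtDbgUnregister(node);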

dmesg:

[  279.910283] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  279.910345] IP: kfd_dbgmgr_wave_control+0x12/0x60 [amdgpu]
[  279.910347] PGD 7e8155067 P4D 7e8155067 PUD 81419b067 PMD 0 
[  279.910352] Oops: 0000 [#1] SMP NOPTI
[  279.910422] CPU: 17 PID: 7520 Comm: a.out Tainted: G           OE    4.15.0-45-generic #48-Ubuntu
[  279.910424] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Professional Gaming, BIOS P3.30 08/14/2018
[  279.910477] RIP: 0010:kfd_dbgmgr_wave_control+0x12/0x60 [amdgpu]
[  279.910478] RSP: 0018:ffff9c339056fd28 EFLAGS: 00010246
[  279.910481] RAX: ffff8dee7ce4b800 RBX: ffff9c339056fdb0 RCX: 0000000000000000
[  279.910482] RDX: 000000000000800b RSI: ffff9c339056fd38 RDI: 0000000000000000
[  279.910484] RBP: ffff9c339056fd28 R08: ffff9c3390570000 R09: 0000000000000020
[  279.910485] R10: 0000000000000020 R11: 0000000000000fa0 R12: ffff8deebcf27800
[  279.910486] R13: ffff8dee760cb440 R14: ffff8dee7ce4b800 R15: ffff8dee82a73200
[  279.910489] FS:  00007f284a99ec00(0000) GS:ffff8deedcc40000(0000) knlGS:0000000000000000
[  279.910490] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  279.910492] CR2: 0000000000000000 CR3: 000000084e608000 CR4: 00000000003406e0
[  279.910493] Call Trace:
[  279.910544]  kfd_ioctl_dbg_wave_control+0x120/0x1a0 [amdgpu]
[  279.910593]  kfd_ioctl+0x271/0x450 [amdgpu]
[  279.910640]  ? kfd_ioctl_destroy_queue+0x70/0x70 [amdgpu]
[  279.910645]  ? __handle_mm_fault+0x478/0x5c0
[  279.910650]  do_vfs_ioctl+0xa8/0x630
[  279.910652]  ? handle_mm_fault+0xb1/0x1f0
[  279.910655]  ? __do_page_fault+0x270/0x4d0
[  279.910658]  SyS_ioctl+0x79/0x90
[  279.910662]  do_syscall_64+0x73/0x130
[  279.910666]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[  279.910668] RIP: 0033:0x7f2848e1c5d7
[  279.910670] RSP: 002b:00007ffd97cd0f38 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  279.910672] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f2848e1c5d7
[  279.910673] RDX: 00000000010b6600 RSI: 0000000040104b10 RDI: 0000000000000003
[  279.910675] RBP: 00000000010b6600 R08: 00007ffd97cd0fd0 R09: 0000000000000000
[  279.910676] R10: 0000000001003010 R11: 0000000000000246 R12: 0000000040104b10
[  279.910677] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
[  279.910679] Code: c7 c8 bf 83 c0 e8 bf 0d 28 e5 48 c7 c0 ea ff ff ff eb d2 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 8b 06 48 89 e5 8b 90 90 00 00 00 <39> 17 75 11 48 8b 7f 10 48 8b 47 38 e8 9d fe 9b e5 48 98 5d c3 
[  279.910759] RIP: kfd_dbgmgr_wave_control+0x12/0x60 [amdgpu] RSP: ffff9c339056fd28
[  279.910760] CR2: 0000000000000000
[  279.910763] ---[ end trace 33bd6cf8014cbbaf ]---
@ppanchad-amd

@misos1 Apologies for the lack of response. Can you please check if your issue still exists with the latest ROCm 6.2? If not, please close the ticket. Thanks!

@ppanchad-amd

@misos1 Closing ticket. Please feel free to re-open the ticket if you still see the issue with the latest ROCm. Thanks!

ppanchad-amd closed this as not planned on Oct 16, 2024

misos1 commented Oct 16, 2024

Yes, I forgot; this seems to be resolved now, and so is #71.
