virtio-gpu + spice-guest-agent #3

jfouquart · 2022-07-05T22:23:31Z

Hi,

I'm using this channel to give you an information, I hope you won't mind too much. Indeed I've been following your work for a while now and I've noticed in particular that you're trying to make wayland work on VM. So I would like to give you a link that I hope will help you to progress freebsd/drm-kmod#119. I also have a port to submit to you if you want https://github.com/jfouquart/spice-guest-agent to simplify the integration with spices clients.

Thanks for your great work and good luck to you.
jfouquart.

ayeedshaikh · 2022-07-12T21:10:51Z

@jfouquart Thank you for the link.

The capture code is typically run entirely in the fence signalling critical path. We're about to add lockdep annotation in an upcoming patch which reveals a lockdep splat similar to the below one. Fix the associated potential deadlocks using __GFP_KSWAPD_RECLAIM (which is the same as GFP_WAIT, but open-coded for clarity) rather than GFP_KERNEL for memory allocation in the capture path. This has the potential drawback that capture might fail in situations with memory pressure. [ 234.842048] WARNING: possible circular locking dependency detected [ 234.842050] 5.15.0-rc7+ #20 Tainted: G U W [ 234.842052] ------------------------------------------------------ [ 234.842054] gem_exec_captur/1180 is trying to acquire lock: [ 234.842056] ffffffffa3e51c00 (fs_reclaim){+.+.}-{0:0}, at: __kmalloc+0x4d/0x330 [ 234.842063] but task is already holding lock: [ 234.842064] ffffffffa3f57620 (dma_fence_map){++++}-{0:0}, at: i915_vma_snapshot_resource_pin+0x27/0x30 [i915] [ 234.842138] which lock already depends on the new lock. [ 234.842140] the existing dependency chain (in reverse order) is: [ 234.842142] -> #2 (dma_fence_map){++++}-{0:0}: [ 234.842145] __dma_fence_might_wait+0x41/0xa0 [ 234.842149] dma_resv_lockdep+0x1dc/0x28f [ 234.842151] do_one_initcall+0x58/0x2d0 [ 234.842154] kernel_init_freeable+0x273/0x2bf [ 234.842157] kernel_init+0x16/0x120 [ 234.842160] ret_from_fork+0x1f/0x30 [ 234.842163] -> #1 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}: [ 234.842166] fs_reclaim_acquire+0x6d/0xd0 [ 234.842168] __kmalloc_node+0x51/0x3a0 [ 234.842171] alloc_cpumask_var_node+0x1b/0x30 [ 234.842174] native_smp_prepare_cpus+0xc7/0x292 [ 234.842177] kernel_init_freeable+0x160/0x2bf [ 234.842179] kernel_init+0x16/0x120 [ 234.842181] ret_from_fork+0x1f/0x30 [ 234.842184] -> #0 (fs_reclaim){+.+.}-{0:0}: [ 234.842186] __lock_acquire+0x1161/0x1dc0 [ 234.842189] lock_acquire+0xb5/0x2b0 [ 234.842192] fs_reclaim_acquire+0xa1/0xd0 [ 234.842193] __kmalloc+0x4d/0x330 [ 234.842196] i915_vma_coredump_create+0x78/0x5b0 [i915] [ 234.842253] intel_engine_coredump_add_vma+0x36/0xe0 [i915] [ 234.842307] __i915_gpu_coredump+0x290/0x5e0 [i915] [ 234.842365] i915_capture_error_state+0x57/0xa0 [i915] [ 234.842415] intel_gt_handle_error+0x348/0x3e0 [i915] [ 234.842462] intel_gt_debugfs_reset_store+0x3c/0x90 [i915] [ 234.842504] simple_attr_write+0xc1/0xe0 [ 234.842507] full_proxy_write+0x53/0x80 [ 234.842509] vfs_write+0xbc/0x350 [ 234.842513] ksys_write+0x58/0xd0 [ 234.842514] do_syscall_64+0x38/0x90 [ 234.842516] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 234.842519] other info that might help us debug this: [ 234.842521] Chain exists of: fs_reclaim --> mmu_notifier_invalidate_range_start --> dma_fence_map [ 234.842526] Possible unsafe locking scenario: [ 234.842528] CPU0 CPU1 [ 234.842529] ---- ---- [ 234.842531] lock(dma_fence_map); [ 234.842532] lock(mmu_notifier_invalidate_range_start); [ 234.842535] lock(dma_fence_map); [ 234.842537] lock(fs_reclaim); [ 234.842539] *** DEADLOCK *** [ 234.842540] 4 locks held by gem_exec_captur/1180: [ 234.842543] #0: ffff9007812d9460 (sb_writers#17){.+.+}-{0:0}, at: ksys_write+0x58/0xd0 [ 234.842547] #1: ffff900781d9ecb8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_write+0x3a/0xe0 [ 234.842552] #2: ffffffffc11913a8 (capture_mutex){+.+.}-{3:3}, at: i915_capture_error_state+0x1a/0xa0 [i915] [ 234.842602] #3: ffffffffa3f57620 (dma_fence_map){++++}-{0:0}, at: i915_vma_snapshot_resource_pin+0x27/0x30 [i915] [ 234.842656] stack backtrace: [ 234.842658] CPU: 0 PID: 1180 Comm: gem_exec_captur Tainted: G U W 5.15.0-rc7+ #20 [ 234.842661] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021 [ 234.842664] Call Trace: [ 234.842666] dump_stack_lvl+0x57/0x72 [ 234.842669] check_noncircular+0xde/0x100 [ 234.842672] ? __lock_acquire+0x3bf/0x1dc0 [ 234.842675] __lock_acquire+0x1161/0x1dc0 [ 234.842678] lock_acquire+0xb5/0x2b0 [ 234.842680] ? __kmalloc+0x4d/0x330 [ 234.842683] ? finish_task_switch.isra.0+0xf2/0x360 [ 234.842686] ? i915_vma_coredump_create+0x78/0x5b0 [i915] [ 234.842734] fs_reclaim_acquire+0xa1/0xd0 [ 234.842737] ? __kmalloc+0x4d/0x330 [ 234.842739] __kmalloc+0x4d/0x330 [ 234.842742] i915_vma_coredump_create+0x78/0x5b0 [i915] [ 234.842793] ? capture_vma+0xbe/0x110 [i915] [ 234.842844] intel_engine_coredump_add_vma+0x36/0xe0 [i915] [ 234.842892] __i915_gpu_coredump+0x290/0x5e0 [i915] [ 234.842939] i915_capture_error_state+0x57/0xa0 [i915] [ 234.842985] intel_gt_handle_error+0x348/0x3e0 [i915] [ 234.843032] ? __mutex_lock+0x81/0x830 [ 234.843035] ? simple_attr_write+0x3a/0xe0 [ 234.843038] ? __lock_acquire+0x3bf/0x1dc0 [ 234.843041] intel_gt_debugfs_reset_store+0x3c/0x90 [i915] [ 234.843083] ? _copy_from_user+0x45/0x80 [ 234.843086] simple_attr_write+0xc1/0xe0 [ 234.843089] full_proxy_write+0x53/0x80 [ 234.843091] vfs_write+0xbc/0x350 [ 234.843094] ksys_write+0x58/0xd0 [ 234.843096] do_syscall_64+0x38/0x90 [ 234.843098] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 234.843101] RIP: 0033:0x7fa467480877 [ 234.843103] Code: 75 05 48 83 c4 58 c3 e8 37 4e ff ff 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [ 234.843108] RSP: 002b:00007ffd14d79b08 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 234.843112] RAX: ffffffffffffffda RBX: 00007ffd14d79b60 RCX: 00007fa467480877 [ 234.843114] RDX: 0000000000000014 RSI: 00007ffd14d79b60 RDI: 0000000000000007 [ 234.843116] RBP: 0000000000000007 R08: 0000000000000000 R09: 00007ffd14d79ab0 [ 234.843119] R10: ffffffffffffffff R11: 0000000000000246 R12: 0000000000000014 [ 234.843121] R13: 0000000000000000 R14: 00007ffd14d79b60 R15: 0000000000000005 v5: - Use __GFP_KSWAPD_RECLAIM rather than __GFP_NOWAIT for clarity. (Daniel Vetter) v6: - Include an instance in execlists_capture_work(). - Rework the commit message due to patch reordering. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Ramalingam C <ramalingam.c@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211108174547.979714-3-thomas.hellstrom@linux.intel.com

We have a debugfs hook to directly call into i915_gem_shrink() with the fs_reclaim acquire annotations to simulate hitting direct reclaim. However we should also annotate this with memalloc_noreclaim, which will set PF_MEMALLOC for us on the current context, to ensure we can't re-enter direct reclaim(just like "real" direct reclaim does). This is an issue now that ttm_bo_validate could potentially be called here, which might try to allocate a tiny amount of memory to hold the new ttm_resource struct, as per the below splat: [ 2507.913844] WARNING: possible recursive locking detected [ 2507.913848] 5.16.0-rc4+ #5 Tainted: G U [ 2507.913853] -------------------------------------------- [ 2507.913856] gem_exec_captur/1825 is trying to acquire lock: [ 2507.913861] ffffffffb9df2500 (fs_reclaim){..}-{0:0}, at: kmem_cache_alloc_trace+0x30/0x390 [ 2507.913875] but task is already holding lock: [ 2507.913879] ffffffffb9df2500 (fs_reclaim){..}-{0:0}, at: i915_drop_caches_set+0x1c9/0x2c0 [i915] [ 2507.913962] other info that might help us debug this: [ 2507.913966] Possible unsafe locking scenario: [ 2507.913970] CPU0 [ 2507.913973] ---- [ 2507.913975] lock(fs_reclaim); [ 2507.913979] lock(fs_reclaim); [ 2507.913983] DEADLOCK *** [ 2507.913988] May be due to missing lock nesting notation [ 2507.913992] 4 locks held by gem_exec_captur/1825: [ 2507.913997] #0: ffff888101f6e460 (sb_writers#17){..}-{0:0}, at: ksys_write+0xe9/0x1b0 [ 2507.914009] #1: ffff88812d99e2b8 (&attr->mutex){..}-{3:3}, at: simple_attr_write+0xbb/0x220 [ 2507.914019] #2: ffffffffb9df2500 (fs_reclaim){..}-{0:0}, at: i915_drop_caches_set+0x1c9/0x2c0 [i915] [ 2507.914085] #3: ffff8881b4a11b20 (reservation_ww_class_mutex){..}-{3:3}, at: ww_mutex_trylock+0x43f/0xcb0 [ 2507.914097] stack backtrace: [ 2507.914102] CPU: 0 PID: 1825 Comm: gem_exec_captur Tainted: G U 5.16.0-rc4+ #5 [ 2507.914109] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 0403 01/26/2021 [ 2507.914115] Call Trace: [ 2507.914118] <TASK> [ 2507.914121] dump_stack_lvl+0x59/0x73 [ 2507.914128] __lock_acquire.cold+0x227/0x3b0 [ 2507.914135] ? lockdep_hardirqs_on_prepare+0x410/0x410 [ 2507.914141] ? __lock_acquire+0x23ca/0x5000 [ 2507.914147] lock_acquire+0x19c/0x4b0 [ 2507.914152] ? kmem_cache_alloc_trace+0x30/0x390 [ 2507.914157] ? lock_release+0x690/0x690 [ 2507.914163] ? lock_is_held_type+0xe4/0x140 [ 2507.914170] ? ttm_sys_man_alloc+0x47/0xb0 [ttm] [ 2507.914178] fs_reclaim_acquire+0x11a/0x160 [ 2507.914183] ? kmem_cache_alloc_trace+0x30/0x390 [ 2507.914188] kmem_cache_alloc_trace+0x30/0x390 [ 2507.914192] ? lock_release+0x37f/0x690 [ 2507.914198] ttm_sys_man_alloc+0x47/0xb0 [ttm] [ 2507.914206] ttm_bo_pipeline_gutting+0x70/0x440 [ttm] [ 2507.914214] ? ttm_mem_io_free+0x150/0x150 [ttm] [ 2507.914221] ? lock_is_held_type+0xe4/0x140 [ 2507.914227] ttm_bo_validate+0x2fb/0x370 [ttm] [ 2507.914234] ? lock_acquire+0x19c/0x4b0 [ 2507.914239] ? ttm_bo_bounce_temp_buffer.constprop.0+0xf0/0xf0 [ttm] [ 2507.914246] ? lock_acquire+0x131/0x4b0 [ 2507.914251] ? lock_is_held_type+0xe4/0x140 [ 2507.914257] i915_ttm_shrinker_release_pages+0x2bc/0x490 [i915] [ 2507.914339] ? i915_ttm_swap_notify+0x130/0x130 [i915] [ 2507.914429] ? i915_gem_object_release_mmap_offset+0x32/0x250 [i915] [ 2507.914529] i915_gem_shrink+0xb14/0x1290 [i915] [ 2507.914616] ? ___i915_gem_object_make_shrinkable+0x3e0/0x3e0 [i915] [ 2507.914698] ? _raw_spin_unlock_irqrestore+0x2d/0x60 [ 2507.914705] ? track_intel_runtime_pm_wakeref+0x180/0x230 [i915] [ 2507.914777] i915_gem_shrink_all+0x4b/0x70 [i915] [ 2507.914857] i915_drop_caches_set+0x227/0x2c0 [i915] Reported-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211213125530.3960007-1-matthew.auld@intel.com

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

virtio-gpu + spice-guest-agent #3

virtio-gpu + spice-guest-agent #3

jfouquart commented Jul 5, 2022

ayeedshaikh commented Jul 12, 2022

virtio-gpu + spice-guest-agent #3

virtio-gpu + spice-guest-agent #3

Comments

jfouquart commented Jul 5, 2022

ayeedshaikh commented Jul 12, 2022