dGPU SR-IOV support for virtio-GPU #18

phreer · 2024-08-07T06:49:09Z

No description provided.

Tile-4 is supported by Intel DG2. Supporting it could improve performance. Tracked-On: OAM-123143 Signed-off-by: HeYue <yue.he@intel.com>

...which allows setting attach->peer2peer without implementing dynamic importer_ops. Tracked-On: OAM-123143 Signed-off-by: Weifeng Liu <weifeng.liu@intel.com>

TODO: We must always use DMA addresses for the following two reasons:

1. By design we are not allowed to access the struct page backing a scatter list, especially when config DMABUF_DEBUG is turned on in which case the addresses will be mangled by the core.
2. DMA addresses are required for dGPU local memory sharing between host and guest.

Tracked-On: OAM-123143
Signed-off-by: Weifeng Liu <weifeng.liu@intel.com>

This feature is mainly for dGPU local memory sharing between host and guest. Presence of this capability means that the virtio-GPU backend is expecting local memory buffers for scan-out. Tracked-On: OAM-123143 Signed-off-by: Weifeng Liu <weifeng.liu@intel.com>

Set allow_peer2peer flag when capability VIRTIO_GPU_F_ALLOW_P2P is exposed by the device back-end. This allows other devices to share memory residing in device local memory. Tracked-On: OAM-123143 Signed-off-by: Weifeng Liu <weifeng.liu@intel.com>
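The gating described above can be pictured with a small user-space sketch. It is not the driver code from this series: the bit position of VIRTIO_GPU_F_ALLOW_P2P is not stated in this thread, so `EXAMPLE_F_ALLOW_P2P_BIT` is a made-up value, and every `example_*` name is hypothetical. The sketch only shows the shape of the logic: record at probe time whether the back-end offered the feature, then copy that decision into each attachment's peer-to-peer flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position. The real value of VIRTIO_GPU_F_ALLOW_P2P is
 * defined by the patch series and is not given in this thread. */
#define EXAMPLE_F_ALLOW_P2P_BIT 5u

/* Hypothetical stand-in for the per-device state of the guest driver. */
struct example_gpu_device {
    uint64_t features;   /* feature bits offered by the device back-end */
    bool has_allow_p2p;  /* cached result of the feature check */
};

/* Hypothetical stand-in for a dma-buf attachment. */
struct example_attachment {
    bool peer2peer;
};

/* Return true if the given feature bit is set. */
static bool example_has_feature(uint64_t features, unsigned int bit)
{
    assert(bit < 64);
    return ((features >> bit) & 1u) != 0;
}

/* Probe step: remember whether the back-end exposed the P2P capability. */
static void example_probe(struct example_gpu_device *dev)
{
    dev->has_allow_p2p =
        example_has_feature(dev->features, EXAMPLE_F_ALLOW_P2P_BIT);
}

/* Attach step: allow peer-to-peer only when the capability was exposed. */
static void example_attach(const struct example_gpu_device *dev,
                           struct example_attachment *attach)
{
    attach->peer2peer = dev->has_allow_p2p;
}
```

With the bit set in `features`, `example_attach()` leaves `peer2peer` true. With it clear, the attachment falls back to the non-P2P path.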

Tracked-On: OAM-123143 Signed-off-by: Weifeng Liu <weifeng.liu@intel.com>

[ Upstream commit f8bbc07ac535593139c875ffa19af924b1084540 ]

vhost_worker will call tun call backs to receive packets. If too many illegal packets arrives, tun_do_read will keep dumping packet contents. When console is enabled, it will costs much more cpu time to dump packet and soft lockup will be detected.

net_ratelimit mechanism can be used to limit the dumping rate.

PID: 33036 TASK: ffff949da6f20000 CPU: 23 COMMAND: "vhost-32980"
 #0 [fffffe00003fce50] crash_nmi_callback at ffffffff89249253
 #1 [fffffe00003fce58] nmi_handle at ffffffff89225fa3
 #2 [fffffe00003fceb0] default_do_nmi at ffffffff8922642e
 #3 [fffffe00003fced0] do_nmi at ffffffff8922660d
 #4 [fffffe00003fcef0] end_repeat_nmi at ffffffff89c01663
    [exception RIP: io_serial_in+20]
    RIP: ffffffff89792594 RSP: ffffa655314979e8 RFLAGS: 00000002
    RAX: ffffffff89792500 RBX: ffffffff8af428a0 RCX: 0000000000000000
    RDX: 00000000000003fd RSI: 0000000000000005 RDI: ffffffff8af428a0
    RBP: 0000000000002710 R8: 0000000000000004 R9: 000000000000000f
    R10: 0000000000000000 R11: ffffffff8acbf64f R12: 0000000000000020
    R13: ffffffff8acbf698 R14: 0000000000000058 R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
 #5 [ffffa655314979e8] io_serial_in at ffffffff89792594
 #6 [ffffa655314979e8] wait_for_xmitr at ffffffff89793470
 #7 [ffffa65531497a08] serial8250_console_putchar at ffffffff897934f6
 #8 [ffffa65531497a20] uart_console_write at ffffffff8978b605
 #9 [ffffa65531497a48] serial8250_console_write at ffffffff89796558
 #10 [ffffa65531497ac8] console_unlock at ffffffff89316124
 #11 [ffffa65531497b10] vprintk_emit at ffffffff89317c07
 #12 [ffffa65531497b68] printk at ffffffff89318306
 #13 [ffffa65531497bc8] print_hex_dump at ffffffff89650765
 #14 [ffffa65531497ca8] tun_do_read at ffffffffc0b06c27 [tun]
 #15 [ffffa65531497d38] tun_recvmsg at ffffffffc0b06e34 [tun]
 #16 [ffffa65531497d68] handle_rx at ffffffffc0c5d682 [vhost_net]
 #17 [ffffa65531497ed0] vhost_worker at ffffffffc0c644dc [vhost]
 #18 [ffffa65531497f10] kthread at ffffffff892d2e72
 #19 [ffffa65531497f50] ret_from_fork at ffffffff89c0022f

Fixes: ef3db4a ("tun: avoid BUG, dump packet on GSO errors")
Signed-off-by: Lei Chen <lei.chen@smartx.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20240415020247.2207781-1-lei.chen@smartx.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
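To illustrate why rate limiting bounds the cost of dumping bad packets, here is a minimal user-space sketch of a windowed limiter. It is only loosely modelled on the idea behind net_ratelimit() and is not the kernel implementation: the `example_*` names, the tick-based clock passed in as `now`, and the interval and burst values used with it are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter state: at most `burst` messages per `interval` ticks. */
struct example_ratelimit {
    uint64_t interval;     /* window length, in arbitrary ticks (must be > 0) */
    unsigned int burst;    /* messages allowed per window */
    uint64_t begin;        /* tick at which the current window started */
    unsigned int printed;  /* messages let through in the current window */
    unsigned int missed;   /* messages suppressed so far */
};

/* Return true if the caller may emit a message at time `now`. */
static bool example_ratelimit_ok(struct example_ratelimit *rl, uint64_t now)
{
    assert(rl->interval > 0);
    if (now - rl->begin >= rl->interval) {
        /* The window has expired: start a new one. */
        rl->begin = now;
        rl->printed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    rl->missed++;
    return false;
}

/* Model a flood of `n` malformed packets that all arrive at tick `now`.
 * Returns how many would actually be dumped to the console. */
static unsigned int example_dump_bad_packets(struct example_ratelimit *rl,
                                             unsigned int n, uint64_t now)
{
    unsigned int dumped = 0;

    for (unsigned int i = 0; i < n; i++) {
        if (example_ratelimit_ok(rl, now))
            dumped++;  /* a real driver would hex-dump the packet here */
    }
    return dumped;
}
```

However many malformed packets arrive within one window, the number of expensive console dumps stays capped at `burst`. That is the property the commit relies on to avoid the soft lockup.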

sysopenci · 2024-08-23T01:54:58Z

Program name for this PR is not compatible with other dependent PRs. For more details, please check tracked_on.

commit be346c1a6eeb49d8fda827d2a9522124c2f72f36 upstream.

The code in ocfs2_dio_end_io_write() estimates number of necessary transaction credits using ocfs2_calc_extend_credits(). This however does not take into account that the IO could be arbitrarily large and can contain arbitrary number of extents.

Extent tree manipulations do often extend the current transaction but not in all of the cases. For example if we have only single block extents in the tree, ocfs2_mark_extent_written() will end up calling ocfs2_replace_extent_rec() all the time and we will never extend the current transaction and eventually exhaust all the transaction credits if the IO contains many single block extents. Once that happens a WARN_ON(jbd2_handle_buffer_credits(handle) <= 0) is triggered in jbd2_journal_dirty_metadata() and subsequently OCFS2 aborts in response to this error. This was actually triggered by one of our customers on a heavily fragmented OCFS2 filesystem.

To fix the issue make sure the transaction always has enough credits for one extent insert before each call of ocfs2_mark_extent_written().

Heming Zhao said:

------
PANIC: "Kernel panic - not syncing: OCFS2: (device dm-1): panic forced after error"

PID: xxx TASK: xxxx CPU: 5 COMMAND: "SubmitThread-CA"
 #0 machine_kexec at ffffffff8c069932
 #1 __crash_kexec at ffffffff8c1338fa
 #2 panic at ffffffff8c1d69b9
 #3 ocfs2_handle_error at ffffffffc0c86c0c [ocfs2]
 #4 __ocfs2_abort at ffffffffc0c88387 [ocfs2]
 #5 ocfs2_journal_dirty at ffffffffc0c51e98 [ocfs2]
 #6 ocfs2_split_extent at ffffffffc0c27ea3 [ocfs2]
 #7 ocfs2_change_extent_flag at ffffffffc0c28053 [ocfs2]
 #8 ocfs2_mark_extent_written at ffffffffc0c28347 [ocfs2]
 #9 ocfs2_dio_end_io_write at ffffffffc0c2bef9 [ocfs2]
 #10 ocfs2_dio_end_io at ffffffffc0c2c0f5 [ocfs2]
 #11 dio_complete at ffffffff8c2b9fa7
 #12 do_blockdev_direct_IO at ffffffff8c2bc09f
 #13 ocfs2_direct_IO at ffffffffc0c2b653 [ocfs2]
 #14 generic_file_direct_write at ffffffff8c1dcf14
 #15 __generic_file_write_iter at ffffffff8c1dd07b
 #16 ocfs2_file_write_iter at ffffffffc0c49f1f [ocfs2]
 #17 aio_write at ffffffff8c2cc72e
 #18 kmem_cache_alloc at ffffffff8c248dde
 #19 do_io_submit at ffffffff8c2ccada
 #20 do_syscall_64 at ffffffff8c004984
 #21 entry_SYSCALL_64_after_hwframe at ffffffff8c8000ba

Link: https://lkml.kernel.org/r/20240617095543.6971-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240614145243.8837-1-jack@suse.cz
Fixes: c15471f ("ocfs2: fix sparse file & data ordering issue in direct io")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Heming Zhao <heming.zhao@suse.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
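The credit-exhaustion problem and the "top up before each extent" fix can be modelled in a few lines of user-space C. This is a toy model, not OCFS2 or JBD2 code: the `example_*` names, the flat per-extent cost, and the way a handle is extended are all simplifying assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a journal handle with a credit budget. */
struct example_handle {
    int credits;       /* credits still available in the transaction */
    int extend_calls;  /* how often the handle had to be extended */
};

/* Make sure at least `need` credits are available, extending if necessary. */
static void example_ensure_credits(struct example_handle *h, int need)
{
    assert(need >= 0);
    if (h->credits < need) {
        h->credits = need;  /* model: extend the handle up to `need` */
        h->extend_calls++;
    }
}

/* Mark `nr_extents` extents as written, each costing `per_extent` credits.
 * With `top_up` set, credits are re-checked before every extent.
 * Returns 0 on success, -1 if the handle runs out of credits. */
static int example_mark_extents_written(struct example_handle *h,
                                        int nr_extents, int per_extent,
                                        bool top_up)
{
    for (int i = 0; i < nr_extents; i++) {
        if (top_up)
            example_ensure_credits(h, per_extent);
        if (h->credits < per_extent)
            return -1;  /* the point where the real code would abort */
        h->credits -= per_extent;
    }
    return 0;
}
```

An up-front estimate is a fixed budget that a sufficiently fragmented I/O can always exceed. Re-checking before each extent makes the required headroom independent of how many extents the I/O contains.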

[ Upstream commit f0c18025693707ec344a70b6887f7450bf4c826b ]

When running BPF selftests (./test_progs -t sockmap_basic) on a Loongarch platform, the following kernel panic occurs:

  [...]
  Oops[#1]:
  CPU: 22 PID: 2824 Comm: test_progs Tainted: G OE 6.10.0-rc2+ #18
  Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
  ... ...
  ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
  ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
  PRMD: 0000000c (PPLV0 +PIE +PWE)
  EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
  ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
  ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
  BADV: 0000000000000040
  PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
  Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack
  Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...)
  Stack : ...
  Call Trace:
  [<9000000004162774>] copy_page_to_iter+0x74/0x1c0
  [<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
  [<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
  [<90000000049aae34>] inet_recvmsg+0x54/0x100
  [<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
  [<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
  [<900000000481e27c>] sys_recvfrom+0x1c/0x40
  [<9000000004c076ec>] do_syscall+0x8c/0xc0
  [<9000000003731da4>] handle_syscall+0xc4/0x160
  Code: ...
  ---[ end trace 0000000000000000 ]---
  Kernel panic - not syncing: Fatal exception
  Kernel relocated by 0x3510000
  .text @ 0x9000000003710000
  .data @ 0x9000000004d70000
  .bss @ 0x9000000006469400
  ---[ end Kernel panic - not syncing: Fatal exception ]---
  [...]

This crash happens every time when running sockmap_skb_verdict_shutdown subtest in sockmap_basic.

This crash is because a NULL pointer is passed to page_address() in the sk_msg_recvmsg(). Due to the different implementations depending on the architecture, page_address(NULL) will trigger a panic on Loongarch platform but not on x86 platform. So this bug was hidden on x86 platform for a while, but now it is exposed on Loongarch platform. The root cause is that a zero length skb (skb->len == 0) was put on the queue.

This zero length skb is a TCP FIN packet, which was sent by shutdown(), invoked in test_sockmap_skb_verdict_shutdown():

  shutdown(p1, SHUT_WR);

In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no page is put to this sge (see sg_set_page in sg_set_page), but this empty sge is queued into ingress_msg list.

And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is got by sg_page(sge). Pass this NULL page to copy_page_to_iter(), which passes it to kmap_local_page() and to page_address(), then kernel panics.

To solve this, we should skip this zero length skb. So in sk_msg_recvmsg(), if copy is zero, that means it's a zero length skb, skip invoking copy_page_to_iter(). We are using the EFAULT return triggered by copy_page_to_iter to check for is_fin in tcp_bpf.c.

Fixes: 604326b ("bpf, sockmap: convert to generic sk_msg interface")
Suggested-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/e3a16eacdc6740658ee02a33489b1b9d4912f378.1719992715.git.tanggeliang@kylinos.cn
Signed-off-by: Sasha Levin <sashal@kernel.org>
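The core of the fix, never touching the backing page of a zero-length entry, can be shown with a user-space analogue. This is a simplified sketch, not the sk_msg_recvmsg() code: `example_sge` and `example_recv` are hypothetical names, and a plain `memcpy` stands in for copy_page_to_iter(). The sketch simply skips empty entries and does not model the EFAULT-based FIN detection mentioned in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a scatter-gather element. A zero-length
 * element (such as the one queued for a TCP FIN) may have no page. */
struct example_sge {
    const char *page;  /* backing data; may be NULL when length is 0 */
    size_t length;     /* number of valid bytes at `page` */
};

/* Copy queued data into `dst`, skipping zero-length elements so that a
 * NULL page is never dereferenced. Returns the number of bytes copied. */
static size_t example_recv(const struct example_sge *sg, size_t nsge,
                           char *dst, size_t dst_len)
{
    size_t copied = 0;

    assert(dst != NULL);
    for (size_t i = 0; i < nsge && copied < dst_len; i++) {
        size_t copy = sg[i].length;

        if (copy > dst_len - copied)
            copy = dst_len - copied;
        if (copy == 0)
            continue;  /* empty element: do not touch sg[i].page */
        memcpy(dst + copied, sg[i].page, copy);
        copied += copy;
    }
    return copied;
}
```

Without the `copy == 0` check, the empty element would hand a NULL pointer to the copy routine. That is the user-space counterpart of passing a NULL page to page_address().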

In binder_add_freeze_work() we iterate over the proc->nodes with the proc->inner_lock held. However, this lock is temporarily dropped to acquire the node->lock first (lock nesting order). This can race with binder_deferred_release() which removes the nodes from the proc->nodes rbtree and adds them into binder_dead_nodes list. This leads to a broken iteration in binder_add_freeze_work() as rb_next() will use data from binder_dead_nodes, triggering an out-of-bounds access:

  ==================================================================
  BUG: KASAN: global-out-of-bounds in rb_next+0xfc/0x124
  Read of size 8 at addr ffffcb84285f7170 by task freeze/660

  CPU: 8 UID: 0 PID: 660 Comm: freeze Not tainted 6.11.0-07343-ga727812a8d45 #18
  Hardware name: linux,dummy-virt (DT)
  Call trace:
   rb_next+0xfc/0x124
   binder_add_freeze_work+0x344/0x534
   binder_ioctl+0x1e70/0x25ac
   __arm64_sys_ioctl+0x124/0x190

  The buggy address belongs to the variable:
   binder_dead_nodes+0x10/0x40
  [...]
  ==================================================================

This is possible because proc->nodes (rbtree) and binder_dead_nodes (list) share entries in binder_node through a union:

  struct binder_node {
  [...]
          union {
                  struct rb_node rb_node;
                  struct hlist_node dead_node;
          };

Fix the race by checking that the proc is still alive. If not, simply break out of the iteration.

Fixes: d579b04a52a1 ("binder: frozen notification")
Cc: stable@vger.kernel.org
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Bug: 366003708
Link: https://lore.kernel.org/all/20240924184401.76043-3-cmllamas@google.com/
Change-Id: I5ec9d49277a23b864862665b52213460750c535e
Signed-off-by: Carlos Llamas <cmllamas@google.com>

phreer force-pushed the main-dgpu-sriov branch from 45f0d5e to 7bd619c Compare August 7, 2024 07:06

iViggyPrabhu force-pushed the main branch 2 times, most recently from 90339be to 4c19e9e Compare August 7, 2024 09:59

iViggyPrabhu closed this Aug 7, 2024

iViggyPrabhu reopened this Aug 7, 2024

sysopenci added the Valid commit message, Pending Developer Approval, Pending PR Review and Engineering Build Not Started labels Aug 7, 2024

sysopenci requested review from feijiang1, gkdeepa, JeevakaPrabu, kaushlen, sgnanase, shyjumon-n, xyzhao2018 and xzhan34 August 7, 2024 11:23

yhe39 and others added 6 commits August 8, 2024 01:51

drm/virtio: Support tile-4 modifier

f0990dd

Tile-4 is supported by Intel DG2. Supporting it could improve performance. Tracked-On: OAM-123143 Signed-off-by: HeYue <yue.he@intel.com>

dma-buf: Add internal dynamic mapping function

253f24c

...which allows setting attach->peer2peer without implementing dynamic importer_ops. Tracked-On: OAM-123143 Signed-off-by: Weifeng Liu <weifeng.liu@intel.com>

drm/virtio: Show more capabilities in debugfs

3b9be43

Tracked-On: OAM-123143 Signed-off-by: Weifeng Liu <weifeng.liu@intel.com>

phreer force-pushed the main-dgpu-sriov branch from 7bd619c to 3b9be43 Compare August 8, 2024 01:51

sysopenci added Valid commit message and removed Valid commit message labels Aug 8, 2024

phreer mentioned this pull request Aug 23, 2024

dGPU SR-IOV patches #30

Merged

phreer closed this Aug 23, 2024
