
ksmbd rdma reads hitting kernel panic 6.5.0 on arm server #499

Open
varadakari opened this issue Jan 3, 2025 · 22 comments

@varadakari

Testing ksmbd on a 64-bit ARM server with Ubuntu's 6.5.0 kernel hits the following panic.

Jan  2 11:58:29 ss193 kernel: [10638.510907] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038
Jan  2 11:58:29 ss193 kernel: [10638.519728] Mem abort info:
Jan  2 11:58:29 ss193 kernel: [10638.522526]   ESR = 0x0000000096000004
Jan  2 11:58:29 ss193 kernel: [10638.526268]   EC = 0x25: DABT (current EL), IL = 32 bits
Jan  2 11:58:29 ss193 kernel: [10638.531573]   SET = 0, FnV = 0
Jan  2 11:58:29 ss193 kernel: [10638.534622]   EA = 0, S1PTW = 0
Jan  2 11:58:29 ss193 kernel: [10638.537758]   FSC = 0x04: level 0 translation fault
Jan  2 11:58:29 ss193 kernel: [10638.542630] Data abort info:
Jan  2 11:58:29 ss193 kernel: [10638.545508]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
Jan  2 11:58:29 ss193 kernel: [10638.550987]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
Jan  2 11:58:29 ss193 kernel: [10638.556032]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
Jan  2 11:58:29 ss193 kernel: [10638.561339] user pgtable: 4k pages, 48-bit VAs, pgdp=00000006ee4f3000
Jan  2 11:58:29 ss193 kernel: [10638.567774] [0000000000000038] pgd=0000000000000000, p4d=0000000000000000
Jan  2 11:58:29 ss193 kernel: [10638.574562] Internal error: Oops: 0000000096000004 [#1] SMP
Jan  2 11:58:31 ss193 kernel: [10638.580123] Modules linked in: ksmbd(OE) nls_utf8 libdes rpcrdma rdma_cm iw_cm ib_cm sbsa_gwdt ipmi_ssif ipmi_devintf ipmi_msghandler nvme_fabrics target_core_mod 8021q garp mrp stp llc overlay binfmt_misc nls_iso8859_1 sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua zfs(POE) spl(OE) efi_pstore drm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear dw_mmc_bluefield dw_mmc_pltfm dw_mmc mlx5_ib ib_uverbs ib_core mlx5_core mlxfw crct10dif_ce nvme psample nvme_core tls sdhci_of_dwcmshc nvme_common sdhci_pltfm vitesse pci_hyperv_intf sdhci  aes_neon_bs aes_neon_blk [last unloaded: crc32_generic]
Jan  2 11:58:31 ss193 kernel: [10638.656521] CPU: 8 PID: 575244 Comm: ksmbd:r445 Tainted: P           OE      6.5.0-45-generic #45~22.04.1-Ubuntu
Jan  2 11:58:31 ss193 kernel: [10638.679529] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Jan  2 11:58:31 ss193 kernel: [10638.686478] pc : smb_direct_read+0x1cc/0x3f8 [ksmbd]
Jan  2 11:58:31 ss193 kernel: [10638.691446] lr : ksmbd_conn_handler_loop+0x18c/0x440 [ksmbd]
Jan  2 11:58:31 ss193 kernel: [10638.697102] sp : ffff8000f9313d00
Jan  2 11:58:31 ss193 kernel: [10638.700403] x29: ffff8000f9313d00 x28: 0000000000000000 x27: 0000000000000000
Jan  2 11:58:31 ss193 kernel: [10638.707527] x26: 0000000000000000 x25: ffffc8e19edd9188 x24: ffff00027e4aec70
Jan  2 11:58:31 ss193 kernel: [10638.714650] x23: ffffc8e19ede9f68 x22: 0000000000000004 x21: ffff8000f9313e64
Jan  2 11:58:31 ss193 kernel: [10638.721773] x20: 0000000000000004 x19: ffff00027e4aec00 x18: ffff80008d8fd018
Jan  2 11:58:31 ss193 kernel: [10638.728896] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000005
Jan  2 11:58:31 ss193 kernel: [10638.736019] x14: 0000000000000000 x13: 0000000000000000 x12: 0000003d63a78000
Jan  2 11:58:31 ss193 kernel: [10638.743142] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc8e19edab11c
Jan  2 11:58:31 ss193 kernel: [10638.750264] x8 : ffff0008140231c0 x7 : 0000000000000000 x6 : 0000000000000000
Jan  2 11:58:31 ss193 kernel: [10638.757386] x5 : 0000000000000000 x4 : ffffc8e19edcf208 x3 : ffff00028298c300
Jan  2 11:58:31 ss193 kernel: [10638.764509] x2 : 0000000000000039 x1 : 0000000000000001 x0 : ffff00027e4aec70
Jan  2 11:58:31 ss193 kernel: [10638.771632] Call trace:
Jan  2 11:58:31 ss193 kernel: [10638.774068]  smb_direct_read+0x1cc/0x3f8 [ksmbd]
Jan  2 11:58:31 ss193 kernel: [10638.778682]  ksmbd_conn_handler_loop+0x18c/0x440 [ksmbd]
Jan  2 11:58:31 ss193 kernel: [10638.783989]  kthread+0x100/0x118
Jan  2 11:58:31 ss193 kernel: [10638.787212]  ret_from_fork+0x10/0x20
Jan  2 11:58:31 ss193 kernel: [10638.790778] Code: 54000920 f9403a63 d100207b 9100e762 (3940e377)
Jan  2 11:58:31 ss193 kernel: [10638.796860] ---[ end trace 0000000000000000 ]---

The module was compiled on the same machine at commit 7390347.
We are hitting this issue consistently. The client machine runs Rocky Linux 8.10. Mount and writes are successful; the issue occurs on reads only, running fio with 8 threads on the client against 8 different files.

Please let me know if you need any more information.

@varadakari
Author

Here is the modinfo output:

$ modinfo ./ksmbd.ko
filename:       /home/ubuntu/varada/ksmbd/./ksmbd.ko
softdep:        pre: crc32
softdep:        pre: gcm
softdep:        pre: ccm
softdep:        pre: aead2
softdep:        pre: sha512
softdep:        pre: sha256
softdep:        pre: cmac
softdep:        pre: aes
softdep:        pre: nls
softdep:        pre: md5
softdep:        pre: md4
softdep:        pre: hmac
softdep:        pre: ecb
license:        GPL
description:    Linux kernel CIFS/SMB SERVER
version:        3.5.0
author:         Namjae Jeon <linkinjeon@kernel.org>
srcversion:     C0C68EEF6DC0D42CAE6C55F
depends:        ib_core,rdma_cm
name:           ksmbd
vermagic:       6.5.0-45-generic SMP preempt mod_unload modversions aarch64

@mmakassikis

Can you provide the full fio script / command line?

Can you rebuild the module with this patch?

diff --git a/transport_rdma.c b/transport_rdma.c
index 8f8013475fdf..5cce295a9f74 100644
--- a/transport_rdma.c
+++ b/transport_rdma.c
@@ -708,6 +708,8 @@ again:
 		offset = st->first_entry_offset;
 		while (data_read < size) {
 			recvmsg = get_first_reassembly(st);
+			if (!recvmsg)
+				return data_read;
 			data_transfer = smb_direct_recvmsg_payload(recvmsg);
 			data_length = le32_to_cpu(data_transfer->data_length);
 			remaining_data_length =

@varadakari
Author

Sure, will give it a try and update here. Thanks for the prompt response.

@varadakari
Author

The read test is successful, but I'm hitting the following trace during the run.

[Fri Jan  3 11:36:05 2025] ksmbd: smb_direct: Send error. status='WR flushed (5)', opcode=0
[Fri Jan  3 11:36:05 2025] =============================================================================
[Fri Jan  3 11:36:06 2025] BUG smb_direct_resp_00000000194c8d60 (Tainted: P           OE     ): Objects remaining in smb_direct_resp_00000000194c8d60 on __kmem_cache_shutdown()
[Fri Jan  3 11:36:06 2025] -----------------------------------------------------------------------------

[Fri Jan  3 11:36:06 2025] Slab 0x0000000099e5912b objects=22 used=1 fp=0x0000000022e170df flags=0x17fff8000010200(slab|head|node=0|zone=2|lastcpupid=0xffff)
[Fri Jan  3 11:36:06 2025] CPU: 10 PID: 3889139 Comm: ksmbd:r445 Tainted: P           OE      6.5.0-45-generic #45~22.04.1-Ubuntu
[Fri Jan  3 11:36:06 2025] Call trace:
[Fri Jan  3 11:36:06 2025]  dump_backtrace+0xa4/0x150
[Fri Jan  3 11:36:06 2025]  show_stack+0x24/0x50
[Fri Jan  3 11:36:06 2025]  dump_stack_lvl+0x78/0xf8
[Fri Jan  3 11:36:06 2025]  dump_stack+0x1c/0x38
[Fri Jan  3 11:36:06 2025]  slab_err+0xd0/0x140
[Fri Jan  3 11:36:06 2025]  free_partial+0x124/0x400
[Fri Jan  3 11:36:06 2025]  __kmem_cache_shutdown+0x68/0xf8
[Fri Jan  3 11:36:06 2025]  kmem_cache_destroy+0x98/0x1c0
[Fri Jan  3 11:36:06 2025]  smb_direct_destroy_pools+0x104/0x160 [ksmbd]
[Fri Jan  3 11:36:06 2025]  free_transport+0x168/0x290 [ksmbd]
[Fri Jan  3 11:36:06 2025]  smb_direct_disconnect+0x60/0x138 [ksmbd]
[Fri Jan  3 11:36:06 2025]  ksmbd_conn_handler_loop+0x294/0x440 [ksmbd]
[Fri Jan  3 11:36:06 2025]  kthread+0x100/0x118
[Fri Jan  3 11:36:06 2025]  ret_from_fork+0x10/0x20
[Fri Jan  3 11:36:06 2025] Object 0x000000002752450b @offset=27968
[Fri Jan  3 11:36:06 2025] ------------[ cut here ]------------
[Fri Jan  3 11:36:06 2025] kmem_cache_destroy smb_direct_resp_00000000194c8d60: Slab cache still has objects when called from smb_direct_destroy_pools+0x104/0x160 [ksmbd]
[Fri Jan  3 11:36:06 2025] WARNING: CPU: 10 PID: 3889139 at mm/slab_common.c:498 kmem_cache_destroy+0x1b8/0x1c0
[Fri Jan  3 11:36:06 2025] Modules linked in: ksmbd(OE) nls_utf8 libdes rpcrdma rdma_cm iw_cm ib_cm sbsa_gwdt ipmi_ssif ipmi_devintf ipmi_msghandler nvme_fabrics target_core_mod 8021q garp mrp stp llc overlay binfmt_misc nls_iso8859_1 sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua zfs(POE) spl(OE) efi_pstore drm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear dw_mmc_bluefield dw_mmc_pltfm dw_mmc mlx5_ib ib_uverbs ib_core mlx5_core mlxfw psample crct10dif_ce nvme sdhci_of_dwcmshc nvme_core tls nvme_common sdhci_pltfm vitesse pci_hyperv_intf sdhci aes_neon_bs aes_neon_blk [last unloaded: ksmbd(OE)]
[Fri Jan  3 11:36:06 2025] CPU: 10 PID: 3889139 Comm: ksmbd:r445 Tainted: P    B      OE      6.5.0-45-generic #45~22.04.1-Ubuntu
[Fri Jan  3 11:36:06 2025] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[Fri Jan  3 11:36:06 2025] pc : kmem_cache_destroy+0x1b8/0x1c0
[Fri Jan  3 11:36:06 2025] lr : kmem_cache_destroy+0x1b8/0x1c0
[Fri Jan  3 11:36:06 2025] sp : ffff8000a414bc90
[Fri Jan  3 11:36:06 2025] x29: ffff8000a414bc90 x28: 00000000424d53fe x27: ffffa27bd39d5c40
[Fri Jan  3 11:36:06 2025] x26: ffffa27bd39d6000 x25: ffffa27bd39d6188 x24: dead000000000100
[Fri Jan  3 11:36:06 2025] x23: dead000000000122 x22: 0000000040002000 x21: 2e91a27bd3a69bf4
[Fri Jan  3 11:36:06 2025] x20: ffffa27c3914c738 x19: ffff0000b2f4c700 x18: ffff80008bd9d030
[Fri Jan  3 11:36:06 2025] x17: 0000000000000000 x16: 0000000000000000 x15: 0720072007380736
[Fri Jan  3 11:36:06 2025] x14: 0000000000000001 x13: 206d6f7266206465 x12: 0000000000000000
[Fri Jan  3 11:36:06 2025] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[Fri Jan  3 11:36:06 2025] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[Fri Jan  3 11:36:06 2025] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[Fri Jan  3 11:36:06 2025] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[Fri Jan  3 11:36:06 2025] Call trace:
[Fri Jan  3 11:36:06 2025]  kmem_cache_destroy+0x1b8/0x1c0
[Fri Jan  3 11:36:06 2025]  smb_direct_destroy_pools+0x104/0x160 [ksmbd]
[Fri Jan  3 11:36:06 2025]  free_transport+0x168/0x290 [ksmbd]
[Fri Jan  3 11:36:06 2025]  smb_direct_disconnect+0x60/0x138 [ksmbd]
[Fri Jan  3 11:36:06 2025]  ksmbd_conn_handler_loop+0x294/0x440 [ksmbd]
[Fri Jan  3 11:36:06 2025]  kthread+0x100/0x118
[Fri Jan  3 11:36:06 2025]  ret_from_fork+0x10/0x20
[Fri Jan  3 11:36:06 2025] ---[ end trace 0000000000000000 ]---

@namjaejeon
Owner

Can you check the following change? And let me know how to reproduce it using fio.

diff --git a/transport_rdma.c b/transport_rdma.c
index 8f80134..683ee82 100644
--- a/transport_rdma.c
+++ b/transport_rdma.c
@@ -864,13 +864,6 @@ static void send_done(struct ib_cq *cq, struct ib_wc *wc)
                    ib_wc_status_msg(wc->status), wc->status,
                    wc->opcode);
 
-       if (wc->status != IB_WC_SUCCESS || wc->opcode != IB_WC_SEND) {
-               pr_err("Send error. status='%s (%d)', opcode=%d\n",
-                      ib_wc_status_msg(wc->status), wc->status,
-                      wc->opcode);
-               smb_direct_disconnect_rdma_connection(t);
-       }
-
        if (atomic_dec_and_test(&t->send_pending))
                wake_up(&t->wait_send_pending);
 
@@ -885,6 +878,13 @@ static void send_done(struct ib_cq *cq, struct ib_wc *wc)
 
        sibling = container_of(pos, struct smb_direct_sendmsg, list);
        smb_direct_free_sendmsg(t, sibling);
+
+       if (wc->status != IB_WC_SUCCESS || wc->opcode != IB_WC_SEND) {
+               pr_err("Send error. status='%s (%d)', opcode=%d\n",
+                      ib_wc_status_msg(wc->status), wc->status,
+                      wc->opcode);
+               smb_direct_disconnect_rdma_connection(t);
+       }
 }

@varadakari
Author

varadakari commented Jan 5, 2025

Still seeing the error on both writes and reads:

[Sun Jan  5 15:32:14 2025] ksmbd: smb_direct: Send error. status='WR flushed (5)', opcode=0
[Sun Jan  5 15:32:14 2025] ksmbd: smb_direct: Send error. status='WR flushed (5)', opcode=0
[Sun Jan  5 15:32:14 2025] ksmbd: Failed to send message: -107
[Sun Jan  5 15:32:14 2025] ksmbd: Failed to send message: -107
[Sun Jan  5 15:32:14 2025] ksmbd: Failed to send message: -107
[Sun Jan  5 15:32:14 2025] ksmbd: Failed to send message: -107
[Sun Jan  5 15:32:14 2025] ksmbd: Failed to send message: -107
[Sun Jan  5 15:32:14 2025] ksmbd: Failed to send message: -107
[Sun Jan  5 15:32:14 2025] ksmbd: bad smb2 signature
[Sun Jan  5 15:32:14 2025] ksmbd: Failed to send message: -107
[Sun Jan  5 15:32:14 2025] ksmbd: Failed to send message: -107
[Sun Jan  5 15:32:14 2025] =============================================================================
[Sun Jan  5 15:32:14 2025] BUG smb_direct_resp_00000000ad1ba2f7 (Tainted: P    B   W  OE     ): Objects remaining in smb_direct_resp_00000000ad1ba2f7 on __kmem_cache_shutdown()
[Sun Jan  5 15:32:14 2025] -----------------------------------------------------------------------------

[Sun Jan  5 15:32:14 2025] Slab 0x00000000cf438c8c objects=22 used=1 fp=0x00000000e85bd784 flags=0x17fff8000010200(slab|head|node=0|zone=2|lastcpupid=0xffff)
[Sun Jan  5 15:32:14 2025] CPU: 0 PID: 116448 Comm: ksmbd:r445 Tainted: P    B   W  OE      6.5.0-45-generic #45~22.04.1-Ubuntu
[Sun Jan  5 15:32:14 2025] Call trace:
[Sun Jan  5 15:32:14 2025]  dump_backtrace+0xa4/0x150
[Sun Jan  5 15:32:14 2025]  show_stack+0x24/0x50
[Sun Jan  5 15:32:14 2025]  dump_stack_lvl+0x78/0xf8
[Sun Jan  5 15:32:14 2025]  dump_stack+0x1c/0x38
[Sun Jan  5 15:32:14 2025]  slab_err+0xd0/0x140
[Sun Jan  5 15:32:14 2025]  free_partial+0x124/0x400
[Sun Jan  5 15:32:14 2025]  __kmem_cache_shutdown+0x68/0xf8
[Sun Jan  5 15:32:14 2025]  kmem_cache_destroy+0x98/0x1c0
[Sun Jan  5 15:32:14 2025]  smb_direct_destroy_pools+0x104/0x160 [ksmbd]
[Sun Jan  5 15:32:14 2025]  free_transport+0x168/0x290 [ksmbd]
[Sun Jan  5 15:32:14 2025]  smb_direct_disconnect+0x60/0x138 [ksmbd]
[Sun Jan  5 15:32:14 2025]  ksmbd_conn_handler_loop+0x294/0x440 [ksmbd]
[Sun Jan  5 15:32:14 2025]  kthread+0x100/0x118
[Sun Jan  5 15:32:14 2025]  ret_from_fork+0x10/0x20
[Sun Jan  5 15:32:14 2025] Object 0x0000000069b8a14a @offset=26496
[Sun Jan  5 15:32:14 2025] =============================================================================
[Sun Jan  5 15:32:14 2025] BUG smb_direct_resp_00000000ad1ba2f7 (Tainted: P    B   W  OE     ): Objects remaining in smb_direct_resp_00000000ad1ba2f7 on __kmem_cache_shutdown()
[Sun Jan  5 15:32:14 2025] -----------------------------------------------------------------------------

[Sun Jan  5 15:32:14 2025] Slab 0x00000000d295c7a0 objects=22 used=1 fp=0x00000000d9136519 flags=0x17fff8000010200(slab|head|node=0|zone=2|lastcpupid=0xffff)
[Sun Jan  5 15:32:14 2025] CPU: 0 PID: 116448 Comm: ksmbd:r445 Tainted: P    B   W  OE      6.5.0-45-generic #45~22.04.1-Ubuntu
[Sun Jan  5 15:32:14 2025] Call trace:
[Sun Jan  5 15:32:14 2025]  dump_backtrace+0xa4/0x150
[Sun Jan  5 15:32:14 2025]  show_stack+0x24/0x50
[Sun Jan  5 15:32:14 2025]  dump_stack_lvl+0x78/0xf8
[Sun Jan  5 15:32:14 2025]  dump_stack+0x1c/0x38
[Sun Jan  5 15:32:14 2025]  slab_err+0xd0/0x140
[Sun Jan  5 15:32:14 2025]  free_partial+0x124/0x400
[Sun Jan  5 15:32:14 2025]  __kmem_cache_shutdown+0x68/0xf8
[Sun Jan  5 15:32:14 2025]  kmem_cache_destroy+0x98/0x1c0
[Sun Jan  5 15:32:14 2025]  smb_direct_destroy_pools+0x104/0x160 [ksmbd]
[Sun Jan  5 15:32:14 2025]  free_transport+0x168/0x290 [ksmbd]
[Sun Jan  5 15:32:14 2025]  smb_direct_disconnect+0x60/0x138 [ksmbd]
[Sun Jan  5 15:32:14 2025]  ksmbd_conn_handler_loop+0x294/0x440 [ksmbd]
[Sun Jan  5 15:32:14 2025]  kthread+0x100/0x118
[Sun Jan  5 15:32:14 2025]  ret_from_fork+0x10/0x20
[Sun Jan  5 15:32:14 2025] Object 0x00000000634003a1 @offset=10304
[Sun Jan  5 15:32:14 2025] ------------[ cut here ]------------
[Sun Jan  5 15:32:14 2025] kmem_cache_destroy smb_direct_resp_00000000ad1ba2f7: Slab cache still has objects when called from smb_direct_destroy_pools+0x104/0x160 [ksmbd]
[Sun Jan  5 15:32:14 2025] WARNING: CPU: 0 PID: 116448 at mm/slab_common.c:498 kmem_cache_destroy+0x1b8/0x1c0
[Sun Jan  5 15:32:14 2025] Modules linked in: ksmbd(OE) nls_utf8 libdes rpcrdma rdma_cm iw_cm ib_cm sbsa_gwdt ipmi_ssif ipmi_devintf ipmi_msghandler nvme_fabrics target_core_mod 8021q garp mrp stp llc overlay binfmt_misc nls_iso8859_1  sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua zfs(POE) spl(OE) efi_pstore drm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear dw_mmc_bluefield dw_mmc_pltfm dw_mmc mlx5_ib ib_uverbs ib_core mlx5_core mlxfw psample crct10dif_ce nvme sdhci_of_dwcmshc nvme_core tls nvme_common sdhci_pltfm vitesse pci_hyperv_intf sdhci aes_neon_bs aes_neon_blk [last unloaded: ksmbd(OE)]
[Sun Jan  5 15:32:14 2025] CPU: 0 PID: 116448 Comm: ksmbd:r445 Tainted: P    B   W  OE      6.5.0-45-generic #45~22.04.1-Ubuntu
[Sun Jan  5 15:32:14 2025] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[Sun Jan  5 15:32:14 2025] pc : kmem_cache_destroy+0x1b8/0x1c0
[Sun Jan  5 15:32:14 2025] lr : kmem_cache_destroy+0x1b8/0x1c0
[Sun Jan  5 15:32:14 2025] sp : ffff80008d863c90
[Sun Jan  5 15:32:14 2025] x29: ffff80008d863c90 x28: 00000000424d53fe x27: ffffb591101d2c40
[Sun Jan  5 15:32:14 2025] x26: ffffb591101d3000 x25: ffffb591101d3188 x24: dead000000000100
[Sun Jan  5 15:32:14 2025] x23: dead000000000122 x22: 0000000040002000 x21: e1f7b59110266bf4
[Sun Jan  5 15:32:14 2025] x20: ffffb5914e23c738 x19: ffff00008b536000 x18: ffff800083211030
[Sun Jan  5 15:32:14 2025] x17: 0000000000000000 x16: 0000000000000000 x15: 0720072007340730
[Sun Jan  5 15:32:14 2025] x14: 0000000000000001 x13: 206d6f7266206465 x12: 0000000000000000
[Sun Jan  5 15:32:14 2025] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[Sun Jan  5 15:32:14 2025] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[Sun Jan  5 15:32:14 2025] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[Sun Jan  5 15:32:14 2025] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[Sun Jan  5 15:32:14 2025] Call trace:
[Sun Jan  5 15:32:14 2025]  kmem_cache_destroy+0x1b8/0x1c0
[Sun Jan  5 15:32:14 2025]  smb_direct_destroy_pools+0x104/0x160 [ksmbd]
[Sun Jan  5 15:32:14 2025]  free_transport+0x168/0x290 [ksmbd]
[Sun Jan  5 15:32:14 2025]  smb_direct_disconnect+0x60/0x138 [ksmbd]
[Sun Jan  5 15:32:14 2025]  ksmbd_conn_handler_loop+0x294/0x440 [ksmbd]
[Sun Jan  5 15:32:14 2025]  kthread+0x100/0x118
[Sun Jan  5 15:32:14 2025]  ret_from_fork+0x10/0x20
[Sun Jan  5 15:32:14 2025] ---[ end trace 0000000000000000 ]---

@varadakari
Author

With the above fixes, the issue is not solved. I'm hitting the first panic mentioned above while reads are in progress, and read and write performance is not consistent. Is any debug information needed to fix the issue?
With the default Ubuntu ksmbd.ko module, the SMB share does not display the contents of directories and files, although reads and writes work fine. Is there a particular version of ksmbd-tools needed for directories and files to be visible? Below are the module and ksmbd-tools versions:

$ modinfo ksmbd
filename:       /lib/modules/6.5.0-45-generic/kernel/fs/smb/server/ksmbd.ko
description:    Linux kernel CIFS/SMB SERVER
version:        3.4.2
$ sudo /usr/sbin/ksmbd.control -V
ksmbd-tools version : 3.4.4

@namjaejeon
Owner

Any debug information needed to fix the issue?

I need to reproduce it. Can you analyze it and suggest a patch to me?

With the default Ubuntu ksmbd.ko module, the SMB share does not display the contents of directories and files, although reads and writes work fine.

Can you answer the following questions?

  1. Client: Windows or ?
  2. RDMA or TCP ?
  3. Have you tried github ksmbd(https://github.com/cifsd-team/ksmbd) ?

@varadakari
Author

varadakari commented Jan 11, 2025

I need to reproduce it. Can you analyze it and suggest a patch to me?

This is the current top-of-tree code base. I am using a Linux client running Rocky Linux 8.10, over RDMA. The server is an ARM server running the Ubuntu 6.5 kernel.

With the default Ubuntu ksmbd.ko module, the SMB share does not display the contents of directories and files, although reads and writes work fine.

Can you answer the following questions?

  1. Client: Windows or ?
    The client is Rocky Linux 8.10.
  2. RDMA or TCP ?
    RDMA.
  3. Have you tried github ksmbd(https://github.com/cifsd-team/ksmbd) ?
    Yes, but only with the latest code base: top-of-tree ksmbd (3.5) and ksmbd-tools (3.5.2). With those versions I am hitting the above issues.

@namjaejeon
Owner

@varadakari

Regarding the kernel oops in ksmbd, I will try to reproduce it using cifs.ko.

And you seem to be using cifs.ko as the SMB client. RDMA in cifs.ko is broken.
If you want to use RDMA mode with cifs.ko, you need to downgrade the kernel to Linux 5.15.

@varadakari
Author

varadakari commented Jan 11, 2025

@namjaejeon is there any alternative to cifs.ko for the latest kernels? We wish to use RDMA.

@namjaejeon
Owner

@varadakari
No, there is no other option on Linux; only cifs.ko supports RDMA among Linux SMB clients.
Can you check whether RDMA works fine between cifs.ko on a 5.15 kernel and ksmbd on a 6.5 kernel?

@varadakari
Author

@varadakari No, there is no other option on Linux; only cifs.ko supports RDMA among Linux SMB clients. Can you check whether RDMA works fine between cifs.ko on a 5.15 kernel and ksmbd on a 6.5 kernel?

Sure @namjaejeon will test and get back to you.

@varadakari
Author

Hi @namjaejeon, I have tested the Ubuntu 5.15.0-94-generic kernel with the latest ksmbd module and ksmbd-tools version 3.5.2, and I don't see any read or write issues with the same client, which runs Rocky Linux kernel 4.18.0-553.16.1.el8_10.x86_64 with cifs module version 2.29.
The issue happens with the same client against the 6.5 kernel ksmbd module (compiled from the latest code, version 3.5).

@namjaejeon
Owner

You are saying that
the ksmbd module with 5.15 and cifs.ko in the 6.5 kernel have no problem? I don't know and am not interested in the Rocky Linux version.

@varadakari
Author

Sorry for the confusion; I kept the same client for both ksmbd servers. The client here runs 4.18.0-553.16.1.el8_10.x86_64 with cifs.ko version 2.29.
The server with 5.15 works fine, with both RDMA and TCP, with the client mentioned above.
The server with the 6.5 kernel is not working as expected, hitting the above issues.

@namjaejeon
Owner

Strange..
Client 4.18.0-553.16.1.el8_10.x86_64 => the kernel version is 4.18?

@namjaejeon
Owner

And can you tell me what RDMA NIC you use? Mellanox or Chelsio?

@varadakari
Author

The client kernel version is 4.18.
The RDMA NIC is a Mellanox CX-5.

@namjaejeon
Owner

Please give me more information.

  1. Let me know steps to reproduce it.
    a. mount -t cifs
    b. cp file /mnt/
    c. etc..
  2. I want to know the minor version of the 5.15 kernel you tested, i.e. 5.15.xx.
  3. I cannot install a 4.18 kernel; it is too old. If you use cifs.ko in a 5.15 kernel, can the problem happen?
  4. What is the endianness of the client?

@varadakari
Author

varadakari commented Jan 15, 2025

Please give me more information.

  1. Let me know steps to reproduce it.
    a. mount -t cifs

sudo mount -t cifs /// /mnt/ -o user=,pass=,uid=,domain=WORKGROUP,sec=ntlmssp,rdma

b. cp file /mnt/

This is the fio run,
which creates 8 files of 8 GB each with direct I/O, sequential writes for 2 minutes:
[global]
bs=1M
iodepth=1
ioengine=psync
randrepeat=0
group_reporting
time_based
runtime=120
filesize=8G
rw=write
name=write
numjobs=1
direct=1
[job0]
filename=/mnt//FILE0

c. etc..
2. I want to know the minor version of the 5.15 kernel you tested, i.e. 5.15.xx.

kernel version 5.15.0-94-generic

  3. I cannot install a 4.18 kernel; it is too old. If you use cifs.ko in a 5.15 kernel, can the problem happen?

I will test and update.

  4. What is the endianness of the client?

Little endian
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian

@namjaejeon
Owner

@varadakari

I will test and update.

Okay, let me know your test results.
Thanks!
