nvme: testcases for TLS support #158

Open
wants to merge 5 commits into
base: master

hreinecke
Contributor

This pull request adds two new testcases for nvme TLS support, one for 'plain' TLS with TLS PSKs, and the other one for testing 'secure concatenation' where TLS is started after DH-HMAC-CHAP authentication.

return 1
fi

systemctl start tlshd

Contributor

I think you need to check that it exists as a dependency

Contributor

Maybe also check the version of ktls-utils?
Or just explain in a comment if you have any expectations from it?

Contributor Author

Yeah, good point. Will check what we can do here.

Collaborator

According to the "EXIT STATUS" section of "man systemctl", the systemctl command returns exit status 4 for "no such unit". So checking whether the exit status of "systemctl status tlshd" is 4 would work.

I use Fedora and needed to install the "ktls-utils" package to run the test case. It would be better to mention "ktls-utils" in the SKIP_REASONS message to help users understand what is missing.
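
A minimal sketch of such an existence check, based only on the exit status described above. The function names `service_unit_exists` and `require_tlshd` are hypothetical and not part of blktests; a real helper would likely append the message to SKIP_REASONS instead of printing it.

```shell
# Hypothetical helper (not part of blktests): succeed if systemd knows the unit.
# "systemctl status" exits with 4 when there is no such unit.
service_unit_exists() {
	rc=0
	systemctl status "$1" >/dev/null 2>&1 || rc=$?
	[ "$rc" -ne 4 ]
}

# Hypothetical requirement check: print a skip reason that names the package
# providing tlshd, and return non-zero so the caller can skip the test.
require_tlshd() {
	if ! service_unit_exists tlshd; then
		echo "tlshd service not found; install ktls-utils"
		return 1
	fi
	return 0
}
```

Any other exit status (for example 3 for a unit that exists but is not running) is treated as "unit present", since the test starts the service itself.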

_nvmet_target_setup --blkdev file --tls

# Test unencrypted connection
echo "Test unencrypted connection w/ tls not required"

Contributor

umm, looks pretty useless...

Contributor Author

Don't think so. This is testing the 'not required' setting in nvmet, which should accept both TLS and non-TLS connections even if TLS is enabled on the target.

echo "WARNING: connection is not encrypted"
fi

_nvme_disconnect_subsys

Contributor

is there any room to test passing explicit keys and private keyrings to this test?

Contributor Author

I'd rather not do that here. This is for testing the 'default' case, where PSKs are pre-populated in the keyring and the connection picks up the keys automatically. Explicit keys and keyrings are really just for testing.
But we should have a separate testcase for that, true.

Collaborator

kawasaki left a comment

Commit 320b9b6 does not seem to add value. The helper function requires more typing than a direct call of "_require_nvme_trtype tcp".

Collaborator

kawasaki left a comment

Commit bc544f8 introduces the --concat option of _nvme_connect_subsys(), but it is not used anywhere. Do we need this commit in this PR? If it is preparation for the next PR, I suggest moving this commit to that PR.

@kawasaki
Collaborator

@hreinecke Thanks for rebasing the series. I ran the test case in my environment using the kernel v6.13 and the latest nvme-cli (2.10.2-77-gb4628c3, with libnvme 1.11.1-48-gacc19fc), but it fails.

nvme/059 (tr=tcp) (Create TLS-encrypted connections)         [failed]
    runtime    ...  4.690s
    --- tests/nvme/059.out      2025-01-29 17:10:17.090513738 +0900
    +++ /home/shin/Blktests/blktests/results/nodev_tr_tcp/nvme/059.out.bad      2025-01-30 13:21:58.468322103 +0900
    @@ -2,9 +2,13 @@
     Test unencrypted connection w/ tls not required
     disconnected 1 controller(s)
     Test encrypted connection w/ tls not required
    -disconnected 1 controller(s)
    +cat: /sys/class/nvme//tls_key: No such file or directory
    +WARNING: connection is not encrypted
    +disconnected 0 controller(s)
    ...
    (Run 'diff -u tests/nvme/059.out /home/shin/Blktests/blktests/results/nodev_tr_tcp/nvme/059.out.bad' to see the entire diff)

The 059.full file contains the following logs:

NQN:blktests-subsystem-1 disconnected 1 controller(s)
NQN:blktests-subsystem-1 disconnected 0 controller(s)
NQN:blktests-subsystem-1 disconnected 0 controller(s)
NQN:blktests-subsystem-1 disconnected 0 controller(s)

The kernel messages were as follows:

[   53.709438][ T1008] run blktests nvme/059 at 2025-01-30 13:21:53
[   53.852869][ T1088] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[   53.871778][ T1089] nvmet: Allow non-TLS connections while TLS1.3 is enabled
[   53.882422][ T1092] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[   53.956688][ T1099] nvme nvme1: failed to connect socket: -512
[   53.966599][   T47] nvmet_tcp: failed to allocate queue, error -107
[   53.972570][  T225] nvmet: creating nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.20.
[   53.978282][ T1099] nvme nvme1: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port dev.
[   53.981615][ T1099] nvme nvme1: creating 4 I/O queues.
[   53.985261][ T1099] nvme nvme1: mapped 4/0/0 default/read/poll queues.
[   53.988654][ T1099] nvme nvme1: new ctrl: NQN "blktests-subsystem-1", addr 127.0.0.1:4420, hostnqn: nq9
[   54.139181][ T1118] nvme nvme1: Removing ctrl: NQN "blktests-subsystem-1"
[   54.319783][ T1125] nvme nvme1: failed to connect socket: -512
[   54.329235][   T47] nvmet_tcp: failed to allocate queue, error -107
[   55.691522][ T1182] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[   55.714859][ T1186] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[   55.776929][ T1193] nvme_tcp: queue 0: failed to receive icresp, error -4
[   57.044777][ T1237] nvme nvme1: failed to connect socket: -512

I'm not sure if this catches a kernel bug. Still the test case may need improvement, or I may be missing something. If you have any insights about this failure, please let me know.

@hreinecke
Contributor Author

Commit 320b9b6 does not seem to add value. The helper function requires more typing than a direct call of "_require_nvme_trtype tcp".

Okay, I'll fix it up.

@hreinecke
Contributor Author

Commit bc544f8 introduces the --concat option of _nvme_connect_subsys(), but it is not used anywhere. Do we need this commit in this PR? If it is preparation for the next PR, I suggest moving this commit to that PR.

It would if I had pushed the testcase for secure concatenation...

@hreinecke
Contributor Author

@hreinecke Thanks for rebasing the series. I ran the test case in my environment using the kernel v6.13 and the latest nvme-cli (2.10.2-77-gb4628c3, with libnvme 1.11.1-48-gacc19fc), but it fails.

_check_ctrl_tls() needs to redirect stderr to /dev/null, not stdout (as it currently does in two places). I will fix up the testcase.
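
A sketch of what the stderr redirect could look like. The real _check_ctrl_tls() is not shown in this conversation, so `check_ctrl_tls`, its arguments, and the overridable sysfs root are assumptions made for illustration; only the `tls_key` attribute path and the warning text come from the failure log above.

```shell
# Hypothetical stand-in for _check_ctrl_tls(); the real helper is not shown here.
# $1: controller name (e.g. nvme1)
# $2: sysfs root, assumed to default to /sys/class/nvme (overridable for testing)
check_ctrl_tls() {
	ctrl="${1:-}"
	sysfs="${2:-/sys/class/nvme}"

	# An empty name means no controller was found, so nothing can be encrypted.
	if [ -z "$ctrl" ]; then
		echo "WARNING: connection is not encrypted"
		return 1
	fi

	# Discard stderr only: a missing attribute must not leak a "cat:" error
	# into the test output, while stdout still carries the key value.
	tls_key=$(cat "$sysfs/$ctrl/tls_key" 2>/dev/null) || tls_key=""
	if [ -z "$tls_key" ]; then
		echo "WARNING: connection is not encrypted"
		return 1
	fi
	return 0
}
```

Redirecting stdout instead would throw away the key value and still let the "No such file or directory" message through, which matches the unexpected line in the 059.out.bad diff.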

To start TLS-encrypted connections.

Signed-off-by: Hannes Reinecke <hare@suse.de>

Add --tls option to _create_nvmet_subsystem and allow to specify
the tls requirements in _create_nvmet_port.

Signed-off-by: Hannes Reinecke <hare@suse.de>
@kawasaki
Collaborator

@hreinecke Thanks for updating the patches. One question: which kernel should I use to run the test case?
I used the kernel with the tag "nvme-6.14-2025-01-28" with your patch series titled "[PATCHv14 00/10] nvme: implement secure concatenation". Blktests is hreinecke/tls.v3 at git hash 990fc84. But I still see the test cases fail. Do you see the new test cases pass?

nvme/059 failure looks like this.

nvme/059 (tr=tcp) (Create TLS-encrypted connections)         [failed]
    runtime  4.666s  ...  6.473s
    --- tests/nvme/059.out      2025-01-31 11:10:39.925656241 +0900
    +++ /home/shin/Blktests/blktests/results/nodev_tr_tcp/nvme/059.out.bad      2025-01-31 17:39:08.291736971 +0900
    @@ -2,9 +2,11 @@
     Test unencrypted connection w/ tls not required
     disconnected 1 controller(s)
     Test encrypted connection w/ tls not required
    -disconnected 1 controller(s)
    +WARNING: connection is not encrypted
    +disconnected 0 controller(s)
     Test unencrypted connection w/ tls required (should fail)
    ...
    (Run 'diff -u tests/nvme/059.out /home/shin/Blktests/blktests/results/nodev_tr_tcp/nvme/059.out.bad' to see the entire diff)

Also, the nvme/060 run terminated midway, and the kernel reported a BUG.

[  112.851185] [   T1360] run blktests nvme/060 at 2025-01-31 17:39:14
[  112.984431] [   T1462] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[  113.001125] [   T1463] nvmet: Allow non-TLS connections while TLS1.3 is enabled
[  113.008784] [   T1466] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
[  113.146659] [   T1477] nvme nvme1: failed to connect socket: -512
[  113.155644] [     T68] nvmet_tcp: failed to allocate queue, error -107
[  113.164733] [     T65] nvmet: Created nvm controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:0f01fb42-9f7f-4856-b0b3-51e60b8de349 with DH-HMAC-CHAP.
[  113.176065] [     T65] ==================================================================
[  113.176747] [     T65] BUG: KASAN: slab-out-of-bounds in vsnprintf+0x1589/0x18f0
[  113.177324] [     T65] Write of size 1 at addr ffff88812effdec3 by task kworker/2:1H/65

[  113.178094] [     T65] CPU: 2 UID: 0 PID: 65 Comm: kworker/2:1H Not tainted 6.13.0-rc4+ #397
[  113.178687] [     T65] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
[  113.179412] [     T65] Workqueue: nvmet_tcp_wq nvmet_tcp_io_work [nvmet_tcp]
[  113.179924] [     T65] Call Trace:
[  113.180200] [     T65]  <TASK>
[  113.180417] [     T65]  dump_stack_lvl+0x6a/0x90
[  113.180762] [     T65]  ? vsnprintf+0x1589/0x18f0
[  113.181126] [     T65]  print_report+0x174/0x505
[  113.181464] [     T65]  ? vsnprintf+0x1589/0x18f0
[  113.181800] [     T65]  ? __virt_addr_valid+0x208/0x430
[  113.182207] [     T65]  ? vsnprintf+0x1589/0x18f0
[  113.183526] [     T65]  kasan_report+0xa7/0x170
[  113.184840] [     T65]  ? format_decode+0x676/0xa40
[  113.186188] [     T65]  ? vsnprintf+0x1589/0x18f0
[  113.187502] [     T65]  vsnprintf+0x1589/0x18f0
[  113.188795] [     T65]  ? __pfx_vsnprintf+0x10/0x10
[  113.190137] [     T65]  sprintf+0xb5/0xf0
[  113.191349] [     T65]  ? __pfx_sprintf+0x10/0x10
[  113.192578] [     T65]  ? __kmalloc_noprof+0x3c4/0x550
[  113.193834] [     T65]  ? nvme_auth_derive_tls_psk+0x15c/0x2df [nvme_auth]
[  113.195242] [     T65]  nvme_auth_derive_tls_psk+0x1da/0x2df [nvme_auth]
[  113.196606] [     T65]  nvmet_auth_insert_psk+0x2fb/0x680 [nvmet]
[  113.197937] [     T65]  ? __pfx_nvmet_auth_insert_psk+0x10/0x10 [nvmet]
[  113.199303] [     T65]  ? rcu_is_watching+0x11/0xb0
[  113.200488] [     T65]  ? nvmet_execute_auth_send+0x157a/0x3380 [nvmet]
[  113.201785] [     T65]  ? __asan_memcpy+0x38/0x60
[  113.202956] [     T65]  nvmet_execute_auth_send+0x2f92/0x3380 [nvmet]
[  113.204260] [     T65]  ? sock_recvmsg+0x179/0x220
[  113.205397] [     T65]  nvmet_tcp_io_work+0x19d1/0x2970 [nvmet_tcp]
[  113.206602] [     T65]  ? __pfx_nvmet_tcp_io_work+0x10/0x10 [nvmet_tcp]
[  113.207843] [     T65]  ? __pfx_lock_release+0x10/0x10
[  113.208985] [     T65]  process_one_work+0x85a/0x1460
[  113.210122] [     T65]  ? __pfx_lock_acquire+0x10/0x10
[  113.211222] [     T65]  ? __pfx_process_one_work+0x10/0x10
[  113.212327] [     T65]  ? assign_work+0x16c/0x240
[  113.213383] [     T65]  ? lock_is_held_type+0xd5/0x130
[  113.214468] [     T65]  worker_thread+0x5e2/0xfc0
[  113.215513] [     T65]  ? __kthread_parkme+0xb1/0x1d0
[  113.216571] [     T65]  ? __pfx_worker_thread+0x10/0x10
[  113.217636] [     T65]  ? __pfx_worker_thread+0x10/0x10
[  113.218681] [     T65]  kthread+0x2d1/0x3a0
[  113.219644] [     T65]  ? _raw_spin_unlock_irq+0x24/0x50
[  113.220696] [     T65]  ? __pfx_kthread+0x10/0x10
[  113.221695] [     T65]  ret_from_fork+0x30/0x70
[  113.222660] [     T65]  ? __pfx_kthread+0x10/0x10
[  113.223597] [     T65]  ret_from_fork_asm+0x1a/0x30
[  113.224536] [     T65]  </TASK>

[  113.226064] [     T65] Allocated by task 65:
[  113.226906] [     T65]  kasan_save_stack+0x2c/0x50
[  113.227809] [     T65]  kasan_save_track+0x10/0x30
[  113.228696] [     T65]  __kasan_kmalloc+0xa6/0xb0
[  113.229536] [     T65]  __kmalloc_noprof+0x1c6/0x550
[  113.230391] [     T65]  nvme_auth_derive_tls_psk+0x15c/0x2df [nvme_auth]
[  113.231375] [     T65]  nvmet_auth_insert_psk+0x2fb/0x680 [nvmet]
[  113.232336] [     T65]  nvmet_execute_auth_send+0x2f92/0x3380 [nvmet]
[  113.233325] [     T65]  nvmet_tcp_io_work+0x19d1/0x2970 [nvmet_tcp]
[  113.234294] [     T65]  process_one_work+0x85a/0x1460
[  113.235182] [     T65]  worker_thread+0x5e2/0xfc0
[  113.236194] [     T65]  kthread+0x2d1/0x3a0
[  113.237011] [     T65]  ret_from_fork+0x30/0x70
[  113.237877] [     T65]  ret_from_fork_asm+0x1a/0x30

[  113.239492] [     T65] The buggy address belongs to the object at ffff88812effde80
                           which belongs to the cache kmalloc-96 of size 96
[  113.241571] [     T65] The buggy address is located 0 bytes to the right of
                           allocated 67-byte region [ffff88812effde80, ffff88812effdec3)

[  113.244454] [     T65] The buggy address belongs to the physical page:
[  113.245464] [     T65] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12effd
[  113.246691] [     T65] ksm flags: 0x17ffffc0000000(node=0|zone=2|lastcpupid=0x1fffff)
[  113.247846] [     T65] page_type: f5(slab)
[  113.248740] [     T65] raw: 0017ffffc0000000 ffff888100042280 ffffea0004d9d8c0 0000000000000007
[  113.249964] [     T65] raw: 0000000000000000 0000000080200020 00000001f5000000 0000000000000000
[  113.251188] [     T65] page dumped because: kasan: bad access detected

[  113.252996] [     T65] Memory state around the buggy address:
[  113.254004] [     T65]  ffff88812effdd80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
[  113.255205] [     T65]  ffff88812effde00: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
[  113.256376] [     T65] >ffff88812effde80: 00 00 00 00 00 00 00 00 03 fc fc fc fc fc fc fc
[  113.257538] [     T65]                                            ^
[  113.258574] [     T65]  ffff88812effdf00: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
[  113.259775] [     T65]  ffff88812effdf80: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
[  113.260954] [     T65] ==================================================================
[  113.262217] [     T65] Disabling lock debugging due to kernel taint
[  113.264174] [    T113] nvme nvme1: qid 0: authenticated with hash hmac(sha256) dhgroup ffdhe2048
[  113.265694] [   T1477] nvme nvme1: qid 0: authenticated
[  113.268242] [   T1477] nvme nvme1: Please enable CONFIG_NVME_MULTIPATH for full support of multi-port devices.

@kawasaki
Collaborator

As for the tlshd service existence check, I quickly created a patch that introduces a helper function. Using this, nvme/059 can do the check like this. If you think it's useful, feel free to pick it up.

TCP connections can be encrypted using in-kernel TLS, so add a
testcase to exercise the various combinations.

Signed-off-by: Hannes Reinecke <hare@suse.de>

To start secure concatenation the option '--concat' has to be passed
to the 'nvme connect' command.

Signed-off-by: Hannes Reinecke <hare@suse.de>

NVMe-TCP has a 'secure concatenation' mode, where the TLS PSK is
generated from the secret negotiated by the DH-HMAC-CHAP authentication,
and the TLS connection is started after authentication.

Signed-off-by: Hannes Reinecke <hare@kernel.org>
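
As a rough illustration of the commit messages above, the sketch below assembles an 'nvme connect' command line that requests secure concatenation. The helper `build_connect_cmd` is hypothetical and only prints the command rather than running it; the address and port are taken from the kernel logs in this conversation, and the `--concat` flag is used as described in the commit message and needs an nvme-cli build that supports it.

```shell
# Hypothetical helper: print (do not run) an "nvme connect" command line.
# $1: subsystem NQN
# $2: DH-HMAC-CHAP secret (a placeholder in this sketch)
# $3: "concat" to request secure concatenation, anything else for plain auth
build_connect_cmd() {
	cmd="nvme connect --transport tcp --traddr 127.0.0.1 --trsvcid 4420 --nqn $1"
	cmd="$cmd --dhchap-secret $2"
	# With --concat, the TLS PSK is derived from the secret negotiated by
	# DH-HMAC-CHAP and TLS is started after authentication.
	if [ "${3:-}" = "concat" ]; then
		cmd="$cmd --concat"
	fi
	echo "$cmd"
}
```

For example, `build_connect_cmd blktests-subsystem-1 "<secret>" concat` prints a command line ending in `--concat`, while omitting the third argument leaves the flag out.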