Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync with master #3

Open
wants to merge 704 commits into
base: rfc-vfio-over-socket
Choose a base branch
from
Open

Sync with master #3

wants to merge 704 commits into from

Conversation

benlwalker
Copy link
Collaborator

This PR merges master into rfc-vfio-over-socket and adds a few commits afterward to deal with some of the relevant changes. The two biggest changes are that the listen_associate transport callback is now synchronous, which was easy to deal with, and spdk_nvmf_request_exec_fabrics has been removed.

To work around not having spdk_nvmf_request_exec_fabrics anymore, delay sending the fabrics CONNECT command for the admin queue until the first property get/set.

This is definitely not all sufficiently tested.

yidong0635 and others added 30 commits August 17, 2020 08:20
Put the NVMF_SECOND_PORT and NVMF_THIRD_PORT to common file.
And clean some wrong comments.

Signed-off-by: yidong0635 <dongx.yi@intel.com>
Change-Id: I2ccd420551bfa686a0d46e317b7a8767b171c2d9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3761
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
The target changed how we deal with custom AERs
so we can't post them and expect to get an
immediate return anymore. The fuzzing application
doesn't expect to not get a return value from the
target so it just spins forever until it times out
and fails. The solution is to not send AERs over
the wire with the fuzzer.

Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I24fc890dfc05601953552c98690950d0981d70fc
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3739
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Maciej Wawryk <maciejx.wawryk@intel.com>
These suppressions are only valid for the course
of one release. Now that we have started with a
new release, remove all of the suppressions.

Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I1a36bc49bb3a16f98de870cc06e56dbfa75d72d6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3722
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Signed-off-by: Monica Kenguva <monica.kenguva@intel.com>
Change-Id: I78ec57d7c0105f5c16261342ddce1294c1a9a2d8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3468
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Signed-off-by: Monica Kenguva <monica.kenguva@intel.com>
Change-Id: I9db49b40a4d8276ae754f89257cbcdeae7a50914
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3683
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
As written in doc/iscsi.md, typically the login redirection feature
will be used in scale out iSCSI target system, which runs multiple
SPDK iSCSI target applications.

In scale out iSCSI target system, the initial portal, the current
redirect portal, and the next redirect portal are likely to be in
different SPDK iSCSI target applications.

In this case, asynchronous logout request should be sent independently
from the iSCSI target application which has the current redirect portal.

However, we had added asynchronous logout request into the iSCSI target
application which has the next redirect portal. This idea works only
for the case that login is redirected from the initial portal to a
redirect portal.

We remove asynchronous logout request from iscsi_target_node_redirect()
in this patch, and update the corresponding help documents.

The next patch will add an new RPC to send asynchronous logout
request to all connections to the specified portal group and the
specified target.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ib0ac72e8cdad7e8c64e446b7495e572fac4b5bae
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3779
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
For the login redirection feature, the current implementation works
only if a portal is redirected from an initial portal to a redirect
portal. However, the login redirection feature should work even if a
portal is redirected from one redirect portal to another redirect
portal.

A public portal group knows only a redirect portal and does not know
the portal group of the redirect portal.

Moreover, it is very likely that an initial portal and a redirect portal
exist in different SPDK iSCSI target applications.

To cover all these concerns, add an new iscsi_target_node_request_logout
RPC to request connections whose portal group tag match for the target
node.

To cover potential use cases, make the second parameter portal group
tag optional.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I612672490722fb22fd4eba055998b7408ab84ca5
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3780
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
By the previous patches, updating redirect portal and requesting
logout are done using different RPCs. Update description to reflect
this change.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Id1f00bde39446bc2a8de9635135136b8f0194faf
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3781
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Helps us avoid adding a new I/O qpair while the ctrlr
is being destroyed.

Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: I3bf9318b075125b9d432b885fa9f6f2f44d422d7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3686
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Originally the idea was to disable error checking,
to match output from Kernel and SPDK NVMe cuse.
This includes passing test commands and failures.

Any discrepancy would be caught by log output diff
at the end.

Flaw in this logic is that test command itself might
be incorrect. We shouldn't depend on that, nor
attempt to cover up some of the failures even if
they occur on both interfaces.

Most probable cause for this at all, was NVMe emulated
in QEMU not really working with all the nvme-cli commands
from this test.

Since the original creation of this test, CUSE executes
on physical devices (to be able to support namespace management).

The behavior there is predictable and works with current
test commands, thus the test exits on any error with this patch.

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I086faf38b2cbbb6225935cc50d4fad14e81f1972
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3032
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Karol Latecki <karol.latecki@intel.com>
When browsing https://spdk.io/doc/changelog.html
left hand side navigation bar is missing some of the
releases.
This is due to missing 'title' of particular release
that is signified by ":" right after version number.

This patch adds missing ":" and fills out missing titles
for all releases, so they show up properly on the website.

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I3bcb0b2e819d311a033d78101034a7adb2c3395a
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3748
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Get rid of WITH_DPDK_DIR and SPDK_RUN_INSTALLED_DPDK,
introduce SPDK_RUN_EXTERNAL_DPDK which can point to
a DPDK dir. It's an empty string by default.

Change-Id: Iff2b3773a4614db07f4196165087a79472e02b9a
Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/867
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
This is to avoid including non-matching pattern as an actual item.

Change-Id: Ie4fbb27e66efa1f56618959bb7db6f0fccfc2847
Signed-off-by: Michal Berger <michalx.berger@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3290
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Mellanox Build Bot
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
…nfigurable

For some use case that there is heavy large read I/O, the performance
bottleneck due to MAX_LARGE_DATAIN_PER_CONNECTION was reported.

The following assumes that all I/Os are large read.

Large read primary task whose I/O size is more than
SPDK_BDEV_LARGE_BUF_MAX_SIZE (=64KB) is split into multiple
read subtasks.

spdk_iscsi_globals::MaxQueueDepth limits maximum number of outstanding
read primary tasks, and MAX_LARGE_DATAIN_PER_CONNECTION (=64)
limits maximum number of outstanding read subtasks.

MAX_LARGE_DATAIN_PER_CONNECTION is also used to calculate PDU pool.

To remove the performance bottleneck, change the macro constant
MAX_LARGE_DATAIN_PER_CONNECTION to a global variable
spdk_iscsi_globals::MaxLargeDataInPerConnection.

We don't see any negative side effect if we set
spdk_iscsi_globals::MaxLargeDataInPerConnection to 64.

The use case that reported the performance issue will change the
value of spdk_iscsi_globals::MaxLargeDataInPerConnection by its own
responsibility.

The next patch will add the value of
spdk_iscsi_globals::MaxLargeDataInPerConnection to iSCSI options,
and make it configurable by JSON RPC.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ifc30cdb8e00d50f4d3755ff399263cf5d0b681b6
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3755
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
…ptions RPC

The next patch will add an new parameter max_large_datain_per_connection
to the iscsi_set_options RPC. It is longer than the column width of
the parameter table.

As a preparation, increase the column width of the parameter table
for the iscsi_set_options RPC first.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Id0f27d608f9c186166cf7a132ae786ba70e398d8
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3782
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Add MaxLargeDataInPerConnection to iSCSI global options and make
it configurable by JSON RPC.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ibcd16da2eac64241217bedeb89a7929bbdc67871
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3756
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Other count variables in iSCSI library have used uint32_t rather
than int.

Change the type of spdk_iscsi_conn::pending_r2t from int to uint32_t
and add assert to check if pending_r2t is not negative.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I9bd296c0142b0808ae822952277c9ecc133e5f62
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3775
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
It is likely that the raw number 8 in the macro NUM_PDU_PER_CONNECTION
means 2 * DEFAULT_MAXR2T and the raw number 2 means R2T and Data Out, but
is not certain.

On the other hand, the next patch will make the max number of outstanding
R2Ts per connection configurable.

As a preparation to the next patch, add 2 * DEFAULT_MAXR2T explicitly
to the macro NUM_PDU_PER_CONNECTION.

The next patch will replace DEFAULT_MAXR2T by an new variable.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I8a3be14d53c0abf11d7aade401386601d8fe6c11
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3783
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
By the recent refactoring, we have no static size array for outstanding
R2Ts per connection. It looks that we do not have any critical reason
to prohibit us from making max outstanding R2Ts per connection configurable.

There are some use cases to use large write I/O intensively (e.g. 128KB).
Let such use cases change the value of max R2Ts per connection by their
responsibility to do performance tuning.

Maximum outstanding R2Ts per task are defined both for iSCSI target
and NVMe-TCP target but maximum outstanding R2Ts per connection is
unique for iSCSI target.

The next patch will add the corresponding iSCSI option.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I4f6fd3c750a9a0a99bcf23064fe43a3389829aa9
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3776
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Add MaxR2TPerConnection to iSCSI global options and make it configurable
by JSON RPC.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ida95e5c7dac301a22520656709e1aa4d611f31ef
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3777
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
…umped.

Originally, config->offset was defined as int type.
When the capacity of SSD is very large, such as 8T(P4510),
then bdev->blockcnt2 is 7814037168, config->offset is 3907018584.
At this time, it exceeds the maximum int range of 2147483647 and
becomes a negative number, resulting in core dumped.

Debug info:
config->filename is Nvme1n1.
make_cli_job_config offset is -387948712.
This should be:
config->filename is Nvme1n1.
make_cli_job_config offset is 3907018584.

Change-Id: Ia83d88cc4e56d6c97a6d3fc1a2593b6fc31655b2
Signed-off-by: WANGHAILIANG <hailiangx.e.wang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3818
Community-CI: Broadcom CI
Reviewed-by: GangCao <gang.cao@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Generally, this patch did the following work:

Remove the destruct poller. I think that we do not need this,
the destruct poller is specially for Softwaare RoCE case.
Since SoftRoCE will not have IBV_EVENT_QP_LAST_WQE_REACHED event,
we will not wait the last_wqe_reached flag when srq is enabled.
So we can avoid using the poller.

And the purpose of this patch is to solve the coredump issue.
For example, if we run rdma local test such as, e.g.,

test/nvmf/host/bdevperf.sh --transport=rdma

The coredump reason: the qpair is freed twice. Because for RDMA transport,
we do not really remove the qpair from the group if the upper layer
does it.

The first time is called by nvmf_rdma_destroy_drained_qpair in nvmf_rdma_poller_poll,
and the second time is called by nvmf_rdma_qpair_reject_connection in
in nvme_rdma_close_qpair. Since nvme_rdma_close_qpair will always called,
so we need make sure that the qpair will be close after calling this function.
Otherwise we will have the double free qpair. So our approach here is add a flag
("to_close")in rqpair structure and make sure the rqpair be freed after the
"to_close" is set  nvme_rdma_close_qpair

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I6f97debbcd29bbb7c6e3f9725907b4102a1d2892
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3661
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Seth Howell <seth.howell@intel.com>
This allows for much more granular control over the timeout.

Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: Ib23de21e60eec4207c55320579699edf284f4e16
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3794
Community-CI: Mellanox Build Bot
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Signed-off-by: Seth Howell <seth.howell@intel.com>
Change-Id: If829d399882ef948d95673c17e5689c91386c21d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3795
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
If the transport is broken, we should set errno code in
spdk_nvme_ctrlr_process_admin_completions instead of keeping silence.

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Ie73763e1329e12a8c82a0223d360991f86c39be3
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3773
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
When using nvme perf program to test against NVME-oF target,
the nvme perf program will hang if we kill the NVMe-oF target.

For example, if we run the following command:

1 On the target side, start a SPDK NVMe-oF target;

2 On the initiator side, we run:
./build/examples/perf -r 'trtype:rdma adrfam:IPv4 traddr:192.168.7.55 trsvcid:4420' -q 128
-o 4096 -w randwrite -t 100

3 Then we kill the NVMe-oF target on the target, the nvme perf program
will hang.

For NVMe perf program, I think that we should check it in

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: Ia864394acdb6e705484dd0db6f015b567eb527a7
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3774
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Change-Id: Ic30fe63a60d2a151a47444fa84e1c99d9b69a454
Signed-off-by: WANGHAILIANG <hailiangx.e.wang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3829
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
When opal_revert_and_init() is interrupted for some reason,
the spdk_tgt still exists, but it should be killed at the same time.

Change-Id: I8546d3b0b4d6a0fda1687558a664decb535ef2b4
Signed-off-by: WANGHAILIANG <hailiangx.e.wang@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3830
Community-CI: Mellanox Build Bot
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
The bdev_examine_bdev api will examine a bdev explicitly. After
disabling the auto_examine feature, a user could call
bdev_examine_bdev to examine a specific bdev he/she wants.

Signed-off-by: Peng Yu <yupeng0921@gmail.com>
Change-Id: Ifbbfb6f667287669ddf6175b8208efee39762933
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3219
Community-CI: Mellanox Build Bot
Community-CI: Broadcom CI
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Michael Haeuptle <michaelhaeuptle@gmail.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
According to page35 in recent NVMe-oF spec (
NVMe-over-Fabrics-1.1-2019.10.22-Ratified), ioccsz is used
to restrict the incapsule size of I/O command, so do not
restrict the NVMe-oF OPC command and also the admin command.

We accidently trigger an bug in kernel since we do not send
the fabrics command with the incapsule and make the kernel
coredump, though the kernel has bugs.

Signed-off-by: Ziye Yang <ziye.yang@intel.com>
Change-Id: I869a2c8ab7b9c2ac1e5cc5b603920662591c2c64
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3837
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: <dongx.yi@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
karlatec and others added 20 commits September 15, 2020 07:14
Signed-off-by: Karol Latecki <karol.latecki@intel.com>
Change-Id: Ia029df5c8c75eff6ba8db9f21756b91a8a0d9720
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4104
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Michal Berger <michalx.berger@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
When net.core.optmem_max is not set high enough, a call to sendmsg()
might fail with ENOBUFS. Currently this is treated as an error.

When we have no more buffer space left, we should continue to process
any completions and by doing so, free up the auxiliary buffers we ran out
of.

With this change I was able to run perf against the spdk target with a
purposely set to a low, value of optmem_max, where previously it would
fail.

This fixes github issue spdk#1592

Signed-off-by: Jeffry Molanus <jeffry.molanus@gmail.com>
Change-Id: Ieeeed4fbecd827d0da815456b57fbe81495fe54d
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4129
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
If the ANA reporting feature is enabled for the subsystem,
- set ANA Change Notice of Asynchronous Event Configuration to 1
- set ANA Change Notice of Optional Asynchronus Event Supported to 1
- set ANA Non-Optimized state and ANA Inaccessible state of ANA
  Capability to 1.

ANA Change state is not used and ANA Persistent Loss state is not
supported for now.

The next patch will actually support ANA Change Notice using an new
RPC.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I4db2e33dd2879cdf995adcab41ef53728b27a201
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4087
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Add an internal API nvmf_subsystem_set_ana_state() to change the
ANA state of the subsystem listener whose trid matches.

ANA optimized state, ANA non-optimized state, and ANA inaccessible
state are supported. ANA change state is not used and ANA persistent
loss state is not supported.

After changing the ANA state of the subsystem listener, on each poll
group, controllers, whose the subsystem listener match, send ANA
change notice.

Initiators query ANA log page anyway if they receive ANA change
notification. False positive notification should be avoided but is
acceptable.

To avoid any concurrency conflict, simply compare ctrlr->listener and
the passed listener.

It may be better to execute nvmf_subsystem_set_ana_state() on the
subsystem thread but currently the RPC thread adds and removes a
listener to and from the subsystem, respectively, and the subsystem
has been suspended while executing nvmf_subsystem_set_ana_state().
Hence we keep this as a future enhancement.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: If1910b79dd33d904114e258ae2c5e868947cdc52
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4079
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Anil Veerabhadrappa <anil.veerabhadrappa@broadcom.com>
Add an new RPC, nvmf_subsystem_listener_set_ana_state.

Find the specified subsystem listener, and then set the ANA state
of the listener by calling nvmf_subsystem_listener_set_ana_state().

By adding a string and an enum to the existing context structure,
nvmf_rpc_listener_ctx, and adding an operation type to the existng
enum, nvmf_rpc_listen_op, reuse the existing code and data as much
as possible.

Besides, insert line break into a few long lines and fix wrong
error log.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I6fb2dfbb1f9c5f56848eba21d2a733fbed802614
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4080
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
…gth array

In C language, we cannot use constant at compile time. Hence the
local array _ana_desc[] is not a fixed size array but a variable
length array.

We can avoid using variable length array by changing const variable
to macro constant.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: I7333a8078d3102c4bd5088f56f6530846854c85f
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4093
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Monica Kenguva <monica.kenguva@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
$USER gives the login name who logged in the terminal and
logname prints the login name who logged in the session.

test/nvmf/common.sh has used $USER to start the SPDK application,
as non-root but test/nvmf/host/perf.sh and
test/nvmf/target/nvmf_example.sh have used logname to start the
example application as non-root.

It looks OK to use $USER to start application as non-root like other
cases.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ia0a17e3bd37a76e4d808e5816ba6716920f8f340
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4090
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Community-CI: Broadcom CI
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
…ariable

Following the last patch, simply set NVMF_EXAMPLE directly without
using echo.

Signed-off-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Change-Id: Ibc3b9268263f44c0b865a15a1ed1a15a404ed23e
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4091
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Aleksey Marchuk <alexeymar@mellanox.com>
uint32_t supports at most 2TB at most, we need to handle
the larger blobstores, fix this overflow problem.

Signed-off-by: Sochin Jiang <jiangxiaoqing.sochin@bytedance.com>
Change-Id: I27950eb759e9cb9ad48fa4aa8dd1976b4e852832
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4075
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Signed-off-by: Maciej Wawryk <maciejx.wawryk@intel.com>
Change-Id: I094f8e3c98fb9be4f952efa9ff751e0b1144e391
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4125
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Pawel Piatek <pawelx.piatek@intel.com>
Reviewed-by: Maciej Szwed <maciej.szwed@intel.com>
The issue happens when SPDK RDMA initiator is connected to a remote
target and this target reports rather small (or zero) ICD and we try
to send several SGL descriptors.
Since SGL descriptors are located in ICD, we should check that their
total length fits into ICD. In other case sending such a command
will cause RDMA errors (local length error)

Change-Id: I8c0e8375dae799bc442ed2fab249cad2c4ccce51
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4131
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Signed-off-by: Maciej Wawryk <maciejx.wawryk@intel.com>
Change-Id: I68766e986268352159114dc2ddc9104a17a31718
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4137
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Maciej Szwed <maciej.szwed@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
DPDK compilation fails when we configure SPDK using --with-reduce:
../drivers/common/qat/qat_qp.c:18:10: fatal error: qat_sym.h: No such file or directory
 #include "qat_sym.h"
          ^~~~~~~~~~~
But when SPDK is also configured using --with-crypto, DPDK is built successfully
--with-crypto enables build of crypto/qat driver and meson config for
this driver adds dpdk/driver/crypto/qat directory to QAT includes. This
directory contains the missing qat_sym.h and qat_sym_pmd.h files.

Group common drivers required by crypto/reduce

Fixes issue spdk#1529

Change-Id: I7a53411798091e9a3c16442f3951ff73138d7337
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4179
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Reviewed-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This makes it easier to zerofy ordering bits.

Change-Id: If5696bfedfff1bf75e41c1449eac7fccb469e98b
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4201
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Paul Luse <paul.e.luse@intel.com>
Rename ordering bit r2t_recv to h2c_send_waiting_ack, that is more
descriptive name.

Change-Id: I6d6143ff4c1cccc74e11226b7974706808092f9a
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/4202
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
This was missed in CI, due to disabled SPDK_TEST_BLOBFS
test flag in UT job that performs check_so_deps.sh.

Error seen when it is enabled:
there was a dependency mismatch in the library blobfs_bdev
The makefile lists: 'bdev blob_bdev blobfs json jsonrpc log rpc thread util'
readelf outputs   : 'bdev blob_bdev blobfs event json jsonrpc log rpc thread util'

Temporary workarounds have been added, ignoring blobfs_dev
in the abi check test. This will allow for better transition
on CI side. After this patch is merged, UT job flags on CI will be
updated to include SPDK_TEST_BLOBFS and abi reference repository
will be recompiled with this flag.

Next patch in series removes this workaround. It will be merged
after work on CI side is done.

Signed-off-by: Tomasz Zawadzki <tomasz.zawadzki@intel.com>
Change-Id: I4753a918d5760f154d4a59349747a0b1356e9c91
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/3961
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Note that the code no longer works after this merge. The
errors will be fixed in the upcoming commits

Change-Id: Ic83dfa99678da535a631e9d91d40b33feb2a6711
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I4a87182a8058cb567bcdd1882c7b4db4e5fdf062
This way, if the target crashes, the file is cleaned up.

Change-Id: Id9682c770c17e23a58aba7f21948bd7adeaa865f
Signed-off-by: Ben Walker <benjamin.walker@intel.com>
We had previously added a hack to work around this in
spdk_nvmf_request_exec_fabrics, but we'd like to move away from that. We
have to wait until the controller is fully initialized (i.e. added to a
poll group) before we can send the connect command to do this correctly,
so we delay until the first property get/set occurs.

Signed-off-by: Ben Walker <benjamin.walker@intel.com>
Change-Id: I5cb910982d1356de15dab78b78243cf59ef76ffe
Copy link

@changpe1 changpe1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For line 1504, the spdk_nvmf_request_exec should not work, because the queue state in NVMf library isn't ACTIVE, so we may call nvmf_ctrlr_process_fabrics_cmd() instead here.

Copy link

@changpe1 changpe1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 1367 struct muser_req *req = cb_arg; should be struct muser_req *req = connect_req->cb_arg;

@changpe1
Copy link

For line 1504, the spdk_nvmf_request_exec should not work, because the queue state in NVMf library isn't ACTIVE, so we may call nvmf_ctrlr_process_fabrics_cmd() instead here.

OK, this can work, only line 1367 need to be fixed.

@changpe1
Copy link

This commit still has an issue, because the libmuser assume that the MMIO R/W is running in synchronous mode, so it still needs some extra changes. let me figure it out.

Copy link

@changpe1 changpe1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit doesn't work for me, we can replace spdk_nvmf_request_exec_fabrics with nvmf_ctrlr_process_fabrics_cmd directly based on original implementation.

We can't connect the ADMIN queue in bar0_access callback, because the function is executed synchronously.

@changpe1
Copy link

This commit doesn't work for me, we can replace spdk_nvmf_request_exec_fabrics with nvmf_ctrlr_process_fabrics_cmd directly based on original implementation.

We can't connect the ADMIN queue in bar0_access callback, because the function is executed synchronously.

Oops, failed again, nvmf_ctrlr_process_fabrics_cmd doesn't work too.

tmakatos pushed a commit that referenced this pull request Feb 19, 2021
There is a chance that admin qpair is being destroyed at
the moment when IO qpair is added to a controller due to e.g.
expired keep alive timer. Part of the qpair destruction process
is change of qpair's state to DEACTIVATING and removing it
from poll group. We can check admin qpair's state and poll
group pointer before sending a message to poll group's thread
and fail connect command.

Logs and backtrace from one CI build that hit this problem:
00:10:53.192  [2021-01-22 15:29:46.671869] ctrlr.c: 185:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host from subsystem nqn.2016-06.io.spdk:cnode1 due to keep alive timeout.
00:10:53.374  [2021-01-22 15:29:46.854223] ctrlr.c: 185:nvmf_ctrlr_keep_alive_poll: *NOTICE*: Disconnecting host from subsystem nqn.2016-06.io.spdk:cnode2 due to keep alive timeout.
00:10:53.374  ctrlr.c:587:41: runtime error: member access within null pointer of type 'struct spdk_nvmf_poll_group'
00:10:53.486      #0 0x7f9307d3d3d8 in _nvmf_ctrlr_add_io_qpair /home/vagrant/spdk_repo/spdk/lib/nvmf/ctrlr.c:587
00:10:53.486      #1 0x7f93077ea3cd in msg_queue_run_batch /home/vagrant/spdk_repo/spdk/lib/thread/thread.c:553
00:10:53.486      #2 0x7f93077eb66f in thread_poll /home/vagrant/spdk_repo/spdk/lib/thread/thread.c:631
00:10:53.486      #3 0x7f93077ede54 in spdk_thread_poll /home/vagrant/spdk_repo/spdk/lib/thread/thread.c:740
00:10:53.486      spdk#4 0x7f93078366c3 in _reactor_run /home/vagrant/spdk_repo/spdk/lib/event/reactor.c:677
00:10:53.486      spdk#5 0x7f9307836ec8 in reactor_run /home/vagrant/spdk_repo/spdk/lib/event/reactor.c:721
00:10:53.486      spdk#6 0x7f9307837dfb in spdk_reactors_start /home/vagrant/spdk_repo/spdk/lib/event/reactor.c:838
00:10:53.486      spdk#7 0x7f930782f1c4 in spdk_app_start /home/vagrant/spdk_repo/spdk/lib/event/app.c:580
00:10:53.486      spdk#8 0x4024fa in main /home/vagrant/spdk_repo/spdk/app/nvmf_tgt/nvmf_main.c:75
00:10:53.486      spdk#9 0x7f930716d1a2 in __libc_start_main (/lib64/libc.so.6+0x271a2)
00:10:53.486      spdk#10 0x40228d in _start (/home/vagrant/spdk_repo/spdk/build/bin/nvmf_tgt+0x40228d)

Change-Id: I0968eabd1bcd532b8d69434ad5503204c0a2d92b
Signed-off-by: Alexey Marchuk <alexeymar@mellanox.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/6071
Community-CI: Broadcom CI
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: <dongx.yi@intel.com>
tmakatos pushed a commit that referenced this pull request Feb 4, 2022
`struct spdk_vhost_dev vdev` in `struct spdk_vhost_scsi_dev` can be
unregistered in `vhost_scsi_dev_remove`, so we can't use it
anymore in other places after `vhost_dev_unregister`.

Ideally `state->remove_cb` should not take the `vdev` as
the input parameter either, but I don't find it's used
anywhere, so leave it unchanged.

==29555==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000006df0
READ of size 2 at 0x602000006df0 thread T0 (reactor_0)
    #0 0x7f3c246c0f0a  (/lib64/libasan.so.5+0x9cf0a)
    #1 0x7f3c246c3c15 in vsnprintf (/lib64/libasan.so.5+0x9fc15)
    #2 0xa55cfa in spdk_vlog /spdk/lib/log/log.c:158
    #3 0xa5596f in spdk_log /spdk/lib/log/log.c:110
    spdk#4 0x842e43 in remove_scsi_tgt /spdk/lib/vhost/vhost_scsi.c:208
    spdk#5 0x851508 in vhost_scsi_dev_remove_tgt_cpl_cb /spdk/lib/vhost/vhost_scsi.c:1149
    spdk#6 0x8383f1 in foreach_session_finish_cb /spdk/lib/vhost/vhost.c:1144
    spdk#7 0x9d3223 in msg_queue_run_batch /spdk/lib/thread/thread.c:703
    spdk#8 0x9d73fe in thread_poll /spdk/lib/thread/thread.c:919
    spdk#9 0x9d7c3b in spdk_thread_poll /spdk/lib/thread/thread.c:979
    spdk#10 0x8812fe in _reactor_run /spdk/lib/event/reactor.c:920
    spdk#11 0x881bf1 in reactor_run /spdk/lib/event/reactor.c:958
    spdk#12 0x88292b in spdk_reactors_start /spdk/lib/event/reactor.c:1060
    spdk#13 0x873ff9 in spdk_app_start /spdk/lib/event/app.c:585
    spdk#14 0x408044 in main /spdk/app/vhost/vhost.c:105
    spdk#15 0x7f3c23691f42 in __libc_start_main (/lib64/libc.so.6+0x23f42)
    spdk#16 0x407add in _start (/spdk/build/bin/vhost+0x407add)

0x602000006df0 is located 0 bytes inside of 8-byte region [0x602000006df0,0x602000006df8)
freed by thread T0 (reactor_0) here:
    #0 0x7f3c2473191f in __interceptor_free (/lib64/libasan.so.5+0x10d91f)
    #1 0x8369f2 in vhost_dev_unregister /spdk/lib/vhost/vhost.c:1024
    #2 0x84f32d in vhost_scsi_dev_remove /spdk/lib/vhost/vhost_scsi.c:913
    #3 0x83cdb7 in spdk_vhost_dev_remove /spdk/lib/vhost/vhost.c:1494
    spdk#4 0x83ed66 in vhost_fini /spdk/lib/vhost/vhost.c:1644
    spdk#5 0x9d3223 in msg_queue_run_batch /spdk/lib/thread/thread.c:703
    spdk#6 0x9d73fe in thread_poll /spdk/lib/thread/thread.c:919
    spdk#7 0x9d7c3b in spdk_thread_poll /spdk/lib/thread/thread.c:979
    spdk#8 0x8812fe in _reactor_run /spdk/lib/event/reactor.c:920
    spdk#9 0x881bf1 in reactor_run /spdk/lib/event/reactor.c:958
    spdk#10 0x88292b in spdk_reactors_start /spdk/lib/event/reactor.c:1060
    spdk#11 0x873ff9 in spdk_app_start /spdk/lib/event/app.c:585
    spdk#12 0x408044 in main /spdk/app/vhost/vhost.c:105
    spdk#13 0x7f3c23691f42 in __libc_start_main (/lib64/libc.so.6+0x23f42)

Change-Id: I511c4316a838cd92961d57c9193d384acd49d760
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/10141
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Community-CI: Mellanox Build Bot
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Shuhei Matsumoto <shuhei.matsumoto.xt@hitachi.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
tmakatos pushed a commit that referenced this pull request Feb 8, 2022
The controller data structure may be freed before subsystem resume done
callback, we can take endpoint as the input parameter to avoid this issue.

AddressSanitizer: heap-use-after-free on address 0x625000046100 at pc 0x00000082818f bp 0x7fff7b09bd10 sp 0x7fff7b09bd00
READ of size 8 at 0x625000046100 thread T0 (reactor_0)
    #0 0x82818e in vfio_user_dev_quiesce_resume_done /spdk/lib/nvmf/vfio_user.c:2147
    #1 0x782cc0 in subsystem_state_change_done /spdk/lib/nvmf/subsystem.c:634
    #2 0xad047b in _call_completion /spdk/lib/thread/thread.c:2344
    #3 0xabc48d in msg_queue_run_batch /spdk/lib/thread/thread.c:710
    spdk#4 0xac0670 in thread_poll /spdk/lib/thread/thread.c:926
    spdk#5 0xac0ead in spdk_thread_poll /spdk/lib/thread/thread.c:986
    spdk#6 0x9a5b4f in _reactor_run /spdk/lib/event/reactor.c:920
    spdk#7 0x9a6442 in reactor_run /spdk/lib/event/reactor.c:958
    spdk#8 0x9a717c in spdk_reactors_start /spdk/lib/event/reactor.c:1060
    spdk#9 0x99884a in spdk_app_start /spdk/lib/event/app.c:643
    spdk#10 0x407e82 in main /spdk/app/nvmf_tgt/nvmf_main.c:75
    spdk#11 0x7f822095ff42 in __libc_start_main (/lib64/libc.so.6+0x23f42)
    spdk#12 0x407abd in _start (/spdk/build/bin/nvmf_tgt+0x407abd)

0x625000046100 is located 0 bytes inside of 8320-byte region [0x625000046100,0x625000048180)
freed by thread T0 (reactor_0) here:
    #0 0x7f82219ff91f in __interceptor_free (/lib64/libasan.so.5+0x10d91f)
    #1 0x837059 in _free_ctrlr /spdk/lib/nvmf/vfio_user.c:2976
    #2 0x837327 in free_ctrlr /spdk/lib/nvmf/vfio_user.c:2996
    #3 0x843541 in nvmf_vfio_user_close_qpair /spdk/lib/nvmf/vfio_user.c:3742
    spdk#4 0x7d1d91 in nvmf_transport_qpair_fini /spdk/lib/nvmf/transport.c:604
    spdk#5 0x7ad922 in _nvmf_qpair_destroy /spdk/lib/nvmf/nvmf.c:1055
    spdk#6 0x761362 in nvmf_qpair_request_cleanup /spdk/lib/nvmf/ctrlr.c:4026
    spdk#7 0x761906 in spdk_nvmf_request_free /spdk/lib/nvmf/ctrlr.c:4041
    spdk#8 0x75a931 in nvmf_qpair_free_aer /spdk/lib/nvmf/ctrlr.c:3576
    spdk#9 0x7ae626 in spdk_nvmf_qpair_disconnect /spdk/lib/nvmf/nvmf.c:1127
    spdk#10 0x83db36 in _vfio_user_qpair_disconnect /spdk/lib/nvmf/vfio_user.c:3433
    spdk#11 0xabc48d in msg_queue_run_batch /spdk/lib/thread/thread.c:710
    spdk#12 0xac0670 in thread_poll /spdk/lib/thread/thread.c:926
    spdk#13 0xac0ead in spdk_thread_poll /spdk/lib/thread/thread.c:986
    spdk#14 0x9a5b4f in _reactor_run /spdk/lib/event/reactor.c:920
    spdk#15 0x9a6442 in reactor_run /spdk/lib/event/reactor.c:958
    spdk#16 0x9a717c in spdk_reactors_start /spdk/lib/event/reactor.c:1060
    spdk#17 0x99884a in spdk_app_start /spdk/lib/event/app.c:643
    spdk#18 0x407e82 in main /spdk/app/nvmf_tgt/nvmf_main.c:75
    spdk#19 0x7f822095ff42 in __libc_start_main (/lib64/libc.so.6+0x23f42)

previously allocated by thread T0 (reactor_0) here:
    #0 0x7f82219fff16 in __interceptor_calloc (/lib64/libasan.so.5+0x10df16)
    #1 0x837413 in nvmf_vfio_user_create_ctrlr /spdk/lib/nvmf/vfio_user.c:3010
    #2 0x83bc68 in nvmf_vfio_user_accept /spdk/lib/nvmf/vfio_user.c:3313
    #3 0xabfbd8 in thread_execute_timed_poller /spdk/lib/thread/thread.c:872
    spdk#4 0xac0c75 in thread_poll /spdk/lib/thread/thread.c:960
    spdk#5 0xac0ead in spdk_thread_poll /spdk/lib/thread/thread.c:986
    spdk#6 0x9a5b4f in _reactor_run /spdk/lib/event/reactor.c:920
    spdk#7 0x9a6442 in reactor_run /spdk/lib/event/reactor.c:958
    spdk#8 0x9a717c in spdk_reactors_start /spdk/lib/event/reactor.c:1060
    spdk#9 0x99884a in spdk_app_start /spdk/lib/event/app.c:643
    spdk#10 0x407e82 in main /spdk/app/nvmf_tgt/nvmf_main.c:75
    spdk#11 0x7f822095ff42 in __libc_start_main (/lib64/libc.so.6+0x23f42)

SUMMARY: AddressSanitizer: heap-use-after-free /spdk/lib/nvmf/vfio_user.c:2147 in vfio_user_dev_quiesce_resume_done

Change-Id: Icf5e5b360b9107a3c5eb960ae59b7fe10ace1c66
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
tmakatos pushed a commit that referenced this pull request Feb 23, 2022
The controller data structure may be freed before subsystem resume done
callback, we can take endpoint as the input parameter to avoid this issue.

AddressSanitizer: heap-use-after-free on address 0x625000046100 at pc 0x00000082818f bp 0x7fff7b09bd10 sp 0x7fff7b09bd00
READ of size 8 at 0x625000046100 thread T0 (reactor_0)
    #0 0x82818e in vfio_user_dev_quiesce_resume_done /spdk/lib/nvmf/vfio_user.c:2147
    #1 0x782cc0 in subsystem_state_change_done /spdk/lib/nvmf/subsystem.c:634
    #2 0xad047b in _call_completion /spdk/lib/thread/thread.c:2344
    #3 0xabc48d in msg_queue_run_batch /spdk/lib/thread/thread.c:710
    spdk#4 0xac0670 in thread_poll /spdk/lib/thread/thread.c:926
    spdk#5 0xac0ead in spdk_thread_poll /spdk/lib/thread/thread.c:986
    spdk#6 0x9a5b4f in _reactor_run /spdk/lib/event/reactor.c:920
    spdk#7 0x9a6442 in reactor_run /spdk/lib/event/reactor.c:958
    spdk#8 0x9a717c in spdk_reactors_start /spdk/lib/event/reactor.c:1060
    spdk#9 0x99884a in spdk_app_start /spdk/lib/event/app.c:643
    spdk#10 0x407e82 in main /spdk/app/nvmf_tgt/nvmf_main.c:75
    spdk#11 0x7f822095ff42 in __libc_start_main (/lib64/libc.so.6+0x23f42)
    spdk#12 0x407abd in _start (/spdk/build/bin/nvmf_tgt+0x407abd)

0x625000046100 is located 0 bytes inside of 8320-byte region [0x625000046100,0x625000048180)
freed by thread T0 (reactor_0) here:
    #0 0x7f82219ff91f in __interceptor_free (/lib64/libasan.so.5+0x10d91f)
    #1 0x837059 in _free_ctrlr /spdk/lib/nvmf/vfio_user.c:2976
    #2 0x837327 in free_ctrlr /spdk/lib/nvmf/vfio_user.c:2996
    #3 0x843541 in nvmf_vfio_user_close_qpair /spdk/lib/nvmf/vfio_user.c:3742
    spdk#4 0x7d1d91 in nvmf_transport_qpair_fini /spdk/lib/nvmf/transport.c:604
    spdk#5 0x7ad922 in _nvmf_qpair_destroy /spdk/lib/nvmf/nvmf.c:1055
    spdk#6 0x761362 in nvmf_qpair_request_cleanup /spdk/lib/nvmf/ctrlr.c:4026
    spdk#7 0x761906 in spdk_nvmf_request_free /spdk/lib/nvmf/ctrlr.c:4041
    spdk#8 0x75a931 in nvmf_qpair_free_aer /spdk/lib/nvmf/ctrlr.c:3576
    spdk#9 0x7ae626 in spdk_nvmf_qpair_disconnect /spdk/lib/nvmf/nvmf.c:1127
    spdk#10 0x83db36 in _vfio_user_qpair_disconnect /spdk/lib/nvmf/vfio_user.c:3433
    spdk#11 0xabc48d in msg_queue_run_batch /spdk/lib/thread/thread.c:710
    spdk#12 0xac0670 in thread_poll /spdk/lib/thread/thread.c:926
    spdk#13 0xac0ead in spdk_thread_poll /spdk/lib/thread/thread.c:986
    spdk#14 0x9a5b4f in _reactor_run /spdk/lib/event/reactor.c:920
    spdk#15 0x9a6442 in reactor_run /spdk/lib/event/reactor.c:958
    spdk#16 0x9a717c in spdk_reactors_start /spdk/lib/event/reactor.c:1060
    spdk#17 0x99884a in spdk_app_start /spdk/lib/event/app.c:643
    spdk#18 0x407e82 in main /spdk/app/nvmf_tgt/nvmf_main.c:75
    spdk#19 0x7f822095ff42 in __libc_start_main (/lib64/libc.so.6+0x23f42)

previously allocated by thread T0 (reactor_0) here:
    #0 0x7f82219fff16 in __interceptor_calloc (/lib64/libasan.so.5+0x10df16)
    #1 0x837413 in nvmf_vfio_user_create_ctrlr /spdk/lib/nvmf/vfio_user.c:3010
    #2 0x83bc68 in nvmf_vfio_user_accept /spdk/lib/nvmf/vfio_user.c:3313
    #3 0xabfbd8 in thread_execute_timed_poller /spdk/lib/thread/thread.c:872
    spdk#4 0xac0c75 in thread_poll /spdk/lib/thread/thread.c:960
    spdk#5 0xac0ead in spdk_thread_poll /spdk/lib/thread/thread.c:986
    spdk#6 0x9a5b4f in _reactor_run /spdk/lib/event/reactor.c:920
    spdk#7 0x9a6442 in reactor_run /spdk/lib/event/reactor.c:958
    spdk#8 0x9a717c in spdk_reactors_start /spdk/lib/event/reactor.c:1060
    spdk#9 0x99884a in spdk_app_start /spdk/lib/event/app.c:643
    spdk#10 0x407e82 in main /spdk/app/nvmf_tgt/nvmf_main.c:75
    spdk#11 0x7f822095ff42 in __libc_start_main (/lib64/libc.so.6+0x23f42)

SUMMARY: AddressSanitizer: heap-use-after-free /spdk/lib/nvmf/vfio_user.c:2147 in vfio_user_dev_quiesce_resume_done

Change-Id: Icf5e5b360b9107a3c5eb960ae59b7fe10ace1c66
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-on: https://review.spdk.io/gerrit/c/spdk/spdk/+/11420
Community-CI: Broadcom CI <spdk-ci.pdl@broadcom.com>
Tested-by: SPDK CI Jenkins <sys_sgci@intel.com>
Reviewed-by: Dong Yi <dongx.yi@intel.com>
Reviewed-by: John Levon <levon@movementarian.org>
Reviewed-by: Ben Walker <benjamin.walker@intel.com>
Reviewed-by: Jim Harris <james.r.harris@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.