From 2ceea334db72a57d31800f9f04acb2c1ffa65df1 Mon Sep 17 00:00:00 2001
From: Isaac Khor
Date: Tue, 10 Sep 2024 16:53:42 +0000
Subject: [PATCH] Remove outdated sections of the readme

---
 README.md | 143 +++++++++--------------------------------------------
 1 file changed, 22 insertions(+), 121 deletions(-)

diff --git a/README.md b/README.md
index 6f02677a..62ebf3da 100644
--- a/README.md
+++ b/README.md
@@ -13,14 +13,6 @@
 Note that although individual disk performance is important, the main goal is
 to be able to support higher aggregate client IOPS against a given backend OSD
 pool.
 
-## what's here
-
-This builds `liblsvd.so`, which provides most of the basic RBD API; you can use
-`LD_PRELOAD` to use this in place of RBD with `fio`, KVM/QEMU, and a few other
-tools. It also includes some tests and tools described below.
-
-The repository also includes scripts to setup a SPDK NVMeoF target.
-
 ## Stability
 
 This is NOT production-ready code; it still occasionally crashes, and some
@@ -32,21 +24,32 @@ other less well-trodden paths.
 
 ## How to run
 
+Note that the examples here use the fish shell, assume the local NVMe cache is
+`/dev/nvme0n1`, and assume the Ceph config files are available in `/etc/ceph`.
+
 ```
 echo 4096 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
-docker run --net host -v /dev/hugepages:/dev/hugepages -v /etc/ceph:/etc/ceph -v /var/tmp:/var/tmp -v /dev/shm:/dev/shm -i -t --privileged --entrypoint /bin/bash ghcr.io/cci-moc/lsvd-rbd:main
+sudo docker run --net host -v /dev/hugepages:/dev/hugepages -v /etc/ceph:/etc/ceph -v /var/tmp:/var/tmp -v /dev/shm:/dev/shm -v /mnt/nvme0:/lsvd -i -t --privileged --entrypoint /usr/bin/fish ghcr.io/cci-moc/lsvd-rbd:main
 ```
 
-If the cpu is too old, you might have to rebuild the image:
+If you run into an error, you might need to rebuild the image:
 
 ```
 git clone https://github.com/cci-moc/lsvd-rbd.git
 cd lsvd-rbd
 docker build -t lsvd-rbd .
-docker run --net host -v /dev/hugepages:/dev/hugepages -v /etc/ceph:/etc/ceph -v /var/tmp:/var/tmp -v /dev/shm:/dev/shm -i -t --privileged --entrypoint /bin/bash lsvd-rbd
+sudo docker run --net host -v /dev/hugepages:/dev/hugepages -v /etc/ceph:/etc/ceph -v /var/tmp:/var/tmp -v /dev/shm:/dev/shm -v /mnt/nvme0:/lsvd -i -t --privileged --entrypoint /usr/bin/fish lsvd-rbd
+```
+
+To start the gateway:
+
+```
+./build-rel/lsvd_tgt
 ```
 
-To setup lsvd images:
+The target will listen for RPC commands on `/var/tmp/spdk.sock`.
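+
+To check that the target is up, you can issue any RPC over that socket;
+`spdk_get_version` is a cheap read-only call (this check is a suggestion, not
+part of the setup; the `rpc.py` path assumes you are in the repo root, with
+the script living in the SPDK subproject used below):
+
+```
+./subprojects/spdk/scripts/rpc.py -s /var/tmp/spdk.sock spdk_get_version
+```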
+
+To create an lsvd image on the backend:
 
 ```
 #./imgtool create --size 100g
@@ -56,21 +59,21 @@
 To configure nvmf:
 
 ```
-export gateway_ip=0.0.0.0
+cd subprojects/spdk/scripts
 ./rpc.py nvmf_create_transport -t TCP -u 16384 -m 8 -c 8192
 ./rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001 -d SPDK_Controller1
-./rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t tcp -a $gateway_ip -s 9922
+./rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t tcp -a 0.0.0.0 -s 9922
 ```
 
 To mount images on the gateway:
 
 ```
 export PYTHONPATH=/app/src/
-./rpc.py --plugin rpc_plugin bdev_lsvd_create lsvd-ssd benchtest1 -c '{"rcache_dir":"/var/tmp/lsvd","wlog_dir":"/var/tmp/lsvd"}'
+./rpc.py --plugin rpc_plugin bdev_lsvd_create lsvd-ssd benchtest1 -c '{"rcache_dir":"/lsvd","wlog_dir":"/lsvd"}'
 ./rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 benchtest1
 ```
 
-To kill gracefully shutdown gateway:
+To gracefully shut down the gateway:
 
 ```
 ./rpc.py --plugin rpc_plugin bdev_lsvd_delete benchtest1
@@ -80,10 +83,12 @@ docker kill
 ```
 
 ## Mount a client
 
+Fill in the appropriate IP address:
+
 ```
 modprobe nvme-fabrics
 nvme disconnect -n nqn.2016-06.io.spdk:cnode1
-gw_ip=${gw_ip:-10.1.0.5}
+export gw_ip=${gw_ip:-192.168.52.109}
 nvme connect -t tcp --traddr $gw_ip -s 9922 -n nqn.2016-06.io.spdk:cnode1 -o normal
 sleep 2
 nvme list
@@ -91,7 +96,6 @@ dev_name=$(nvme list | perl -lane 'print @F[0] if /SPDK/')
 printf "Using device $dev_name\n"
 ```
 
-
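+Once connected, the volume behaves like any other NVMe device. As a quick
+smoke test you could run fio against it (hypothetical parameters, and note
+that a randrw job writes to the volume; adjust the device name and runtime
+to taste):
+
+```
+fio --name=smoke --filename=$dev_name --rw=randrw --bs=4k --iodepth=32 \
+    --ioengine=libaio --direct=1 --runtime=30 --time_based
+```
+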
 ## Build
 
 This project uses `meson` to manage the build system. Run `make setup` to
@@ -199,106 +203,3 @@ Allowed options:
 ```
 
 Other tools live in the `tools` subdirectory - see the README there for more details.
-
-## Usage
-
-### Running SPDK target
-
-You might need to enable hugepages:
-```
-sudo sh -c 'echo 4096 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages'
-```
-
-Now we start the target, with or without `LD_PRELOAD`, potentially under the debugger. Run `spdk_tgt --help` for more options - in particular, the RPC socket defaults to `/var/tmp/spdk.sock`, but a different one can be specified, which might allow running multiple instances of SPDK. Also the rpc command has a `--help` option, which is about 500 lines long.
-
-```
-SPDK=/mnt/nvme/ceph-nvmeof/spdk
-sudo LD_PRELOAD=$PWD/liblsvd.so $SPDK/build/bin/spdk_tgt
-```
-
-Here's a simple setup - the first two steps are handled in the ceph-nvmeof python code, and it may be worth looking through the code more to see what options they use.
-
-```
-sudo $SPDK/scripts/rpc.py nvmf_create_transport -t TCP -u 16384 -m 8 -c 8192
-sudo $SPDK/scripts/rpc.py bdev_rbd_register_cluster rbd_cluster
-sudo $SPDK/scripts/rpc.py bdev_rbd_create rbd rbd/fio-target 4096 -c rbd_cluster
-sudo $SPDK/scripts/rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001 -d SPDK_Controller1
-sudo $SPDK/scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Ceph0
-sudo $SPDK/scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t tcp -a 10.1.0.8 -s 5001
-```
-
-Note also that you can create a ramdisk test by (1) creating a ramdisk with brd, and (2) creating another bdev / namespace with `bdev_aio_create`. With the version of SPDK I have, it does 4KB random read/write at about 100K IOPS, or at least it did, a month or two ago, on the HP machines.
-
-Finally, I'm not totally convinced that the options I used are the best ones - the -u/-m/-c options for `create_transport` were blindly copied from a doc page. I'm a little more convinced that specifying a 4KB block size in `bdev_rbd_create` is a good idea.
-
-## Tests
-
-There are two tests included: `lsvd_rnd_test` and `lsvd_crash_test`.
-They do random writes of various sizes, with random data, and each 512-byte sector is "stamped" with its LBA and a sequence number for the write.
-CRCs are saved for each sector, and after a bunch of writes we read everything back and verify that the CRCs match.
-
-### `lsvd_rnd_test`
-
-```
-build$ bin/lsvd_rnd_test --help
-Usage: lsvd_rnd_test [OPTION...] RUNS
-
-  -c, --close                close and re-open
-  -d, --cache-dir=DIR        cache directory
-  -D, --delay                add random backend delays
-  -k, --keep                 keep data between tests
-  -l, --len=N                run length
-  -O, --rados                use RADOS
-  -p, --prefix=PREFIX        object prefix
-  -r, --reads=FRAC           fraction reads (0.0-1.0)
-  -R, --reverse              reverse NVMe completion order
-  -s, --seed=S               use this seed (one run)
-  -v, --verbose              print LBAs and CRCs
-  -w, --window=W             write window
-  -x, --existing             don't delete existing cache
-  -z, --size=S               volume size (e.g. 1G, 100M)
-  -Z, --cache-size=N         cache size (K/M/G)
-  -?, --help                 Give this help list
-      --usage                Give a short usage message
-```
-
-Unlike the normal library, it defaults to storing objects on the filesystem; the image name is just the path to the superblock object (the --prefix argument), and other objects live in the same directory.
-If you use this, you probably want to use the `--delay` flag, to have object read/write requests subject to random delays.
-It creates a volume of --size bytes, does --len random writes of random lengths, and then reads it all back and checks CRCs.
-It can do multiple runs; if you don't specify --keep it will delete and recreate the volume between runs.
-The --close flag causes it to close and re-open the image between runs; otherwise it stays open.
-
-### `lsvd_crash_test`
-
-This is pretty similar, except that it does the writes in a subprocess which kills itself with `_exit` rather than finishing gracefully, and it has an option to delete the cache before restarting.
-
-This one needs to be run with the file backend, because some of the test options crash the writer, recover the image to read and verify it, then restore it back to its crashed state before starting the writer up again.
-
-It uses the write sequence numbers to figure out which writes made it to disk before the crash, scanning all the sectors to find the highest sequence-number stamp, and then it verifies that the image matches what you would get if you apply all writes up to and including that sequence number.
-
-```
-build$ bin/lsvd_crash_test --help
-Usage: lsvd_crash_test [OPTION...] RUNS
-
-  -2, --seed2                seed-generating seed
-  -d, --cache-dir=DIR        cache directory
-  -D, --delay                add random backend delays
-  -k, --keep                 keep data between tests
-  -l, --len=N                run length
-  -L, --lose-writes=N        delete some of last N cache writes
-  -n, --no-wipe              don't clear image between runs
-  -o, --lose-objs=N          delete some of last N objects
-  -p, --prefix=PREFIX        object prefix
-  -r, --reads=FRAC           fraction reads (0.0-1.0)
-  -R, --reverse              reverse NVMe completion order
-  -s, --seed=S               use this seed (one run)
-  -S, --sleep                child sleeps for debug attach
-  -v, --verbose              print LBAs and CRCs
-  -w, --window=W             write window
-  -W, --wipe-cache           delete cache on restart
-  -x, --existing             don't delete existing cache
-  -z, --size=S               volume size (e.g. 1G, 100M)
-  -Z, --cache-size=N         cache size (K/M/G)
-  -?, --help                 Give this help list
-      --usage                Give a short usage message
-```