
Remove outdated sections of the readme
IsaacKhor committed Sep 10, 2024
1 parent fd659a8 commit 2ceea33
Showing 1 changed file with 22 additions and 121 deletions.
README.md
@@ -13,14 +13,6 @@
Note that although individual disk performance is important, the main goal is to
be able to support higher aggregate client IOPS against a given backend OSD
pool.

## What's here

This builds `liblsvd.so`, which provides most of the basic RBD API; you can use
`LD_PRELOAD` to substitute it for librbd with `fio`, KVM/QEMU, and a few other
tools. It also includes some tests and tools described below.
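
For example, a minimal `fio` run through the preload path might look like this (a sketch; the pool and image names are illustrative, not required values):

```
# Sketch: intercept librbd calls with liblsvd.so and drive I/O with fio's rbd engine.
sudo LD_PRELOAD=$PWD/liblsvd.so fio --name=randwrite --ioengine=rbd \
    --clientname=admin --pool=lsvd-ssd --rbdname=benchtest1 \
    --rw=randwrite --bs=4k --iodepth=32 --runtime=60 --time_based
```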

The repository also includes scripts to set up an SPDK NVMe-oF target.

## Stability

This is NOT production-ready code; it still occasionally crashes, and some
@@ -32,21 +24,32 @@
other less well-trodden paths.

## How to run

Note that the examples here use the fish shell, assume the local NVMe cache is
`/dev/nvme0n1`, and assume the Ceph config files are available in `/etc/ceph`.

```
echo 4096 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
docker run --net host -v /dev/hugepages:/dev/hugepages -v /etc/ceph:/etc/ceph -v /var/tmp:/var/tmp -v /dev/shm:/dev/shm -i -t --privileged --entrypoint /bin/bash ghcr.io/cci-moc/lsvd-rbd:main
sudo docker run --net host -v /dev/hugepages:/dev/hugepages -v /etc/ceph:/etc/ceph -v /var/tmp:/var/tmp -v /dev/shm:/dev/shm -v /mnt/nvme0:/lsvd -i -t --privileged --entrypoint /usr/bin/fish ghcr.io/cci-moc/lsvd-rbd:main
```
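
To confirm the hugepage reservation took effect before starting the container, a quick check (standard Linux, nothing LSVD-specific):

```
grep HugePages_ /proc/meminfo
```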

If the CPU is too old, you might have to rebuild the image:
If you run into an error, you might need to rebuild the image:

```
git clone https://github.com/cci-moc/lsvd-rbd.git
cd lsvd-rbd
docker build -t lsvd-rbd .
docker run --net host -v /dev/hugepages:/dev/hugepages -v /etc/ceph:/etc/ceph -v /var/tmp:/var/tmp -v /dev/shm:/dev/shm -i -t --privileged --entrypoint /bin/bash lsvd-rbd
sudo docker run --net host -v /dev/hugepages:/dev/hugepages -v /etc/ceph:/etc/ceph -v /var/tmp:/var/tmp -v /dev/shm:/dev/shm -v /mnt/nvme0:/lsvd -i -t --privileged --entrypoint /usr/bin/fish lsvd-rbd
```

To start the gateway:

```
./build-rel/lsvd_tgt
```

To setup lsvd images:
The target will start listening for RPC commands on `/var/tmp/spdk.sock`.
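
As a quick sanity check that the target is up (a sketch, assuming the SPDK `rpc.py` from the `subprojects/spdk` submodule):

```
cd subprojects/spdk/scripts
./rpc.py -s /var/tmp/spdk.sock spdk_get_version
```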

To create an lsvd image on the backend:

```
#./imgtool create <pool> <imgname> --size 100g
@@ -56,21 +59,21 @@
To configure nvmf:

```
export gateway_ip=0.0.0.0
cd subprojects/spdk/scripts
./rpc.py nvmf_create_transport -t TCP -u 16384 -m 8 -c 8192
./rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001 -d SPDK_Controller1
./rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t tcp -a $gateway_ip -s 9922
./rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t tcp -a 0.0.0.0 -s 9922
```
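
To verify that the subsystem and listener were created as expected, you can dump the NVMe-oF state (standard SPDK RPC):

```
./rpc.py nvmf_get_subsystems
```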

To mount images on the gateway:

```
export PYTHONPATH=/app/src/
./rpc.py --plugin rpc_plugin bdev_lsvd_create lsvd-ssd benchtest1 -c '{"rcache_dir":"/var/tmp/lsvd","wlog_dir":"/var/tmp/lsvd"}'
./rpc.py --plugin rpc_plugin bdev_lsvd_create lsvd-ssd benchtest1 -c '{"rcache_dir":"/lsvd","wlog_dir":"/lsvd"}'
./rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 benchtest1
```
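
A quick way to confirm the image is now exposed as a bdev (standard SPDK RPC; `benchtest1` is the image created above):

```
./rpc.py bdev_get_bdevs -b benchtest1
```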

To kill gracefully shutdown gateway:
To gracefully shut down the gateway:

```
./rpc.py --plugin rpc_plugin bdev_lsvd_delete benchtest1
@@ -80,18 +83,19 @@
docker kill <container id>

## Mount a client

Fill in the appropriate IP address:

```
modprobe nvme-fabrics
nvme disconnect -n nqn.2016-06.io.spdk:cnode1
gw_ip=${gw_ip:-10.1.0.5}
export gw_ip=${gw_ip:-192.168.52.109}
nvme connect -t tcp --traddr $gw_ip -s 9922 -n nqn.2016-06.io.spdk:cnode1 -o normal
sleep 2
nvme list
dev_name=$(nvme list | perl -lane 'print @F[0] if /SPDK/')
printf "Using device $dev_name\n"
```
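
From here the namespace behaves like any other local block device; for example, a quick read test with `fio` (a sketch; the runtime and queue depth are arbitrary):

```
sudo fio --name=readtest --filename=$dev_name --direct=1 --ioengine=libaio \
    --rw=randread --bs=4k --iodepth=32 --runtime=30 --time_based
```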


## Build

This project uses `meson` to manage the build system. Run `make setup` to
@@ -199,106 +203,3 @@
Allowed options:
```

Other tools live in the `tools` subdirectory - see the README there for more details.
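
A minimal build sketch, assuming the release directory is the `build-rel` used by the commands above:

```
make setup
meson compile -C build-rel
```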

## Usage

### Running SPDK target

You might need to enable hugepages:
```
sudo sh -c 'echo 4096 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages'
```

Now we start the target, with or without `LD_PRELOAD`, potentially under the debugger. Run `spdk_tgt --help` for more options - in particular, the RPC socket defaults to `/var/tmp/spdk.sock`, but a different one can be specified, which allows running multiple instances of SPDK. The `rpc.py` command also has a `--help` option, which is about 500 lines long.

```
SPDK=/mnt/nvme/ceph-nvmeof/spdk
sudo LD_PRELOAD=$PWD/liblsvd.so $SPDK/build/bin/spdk_tgt
```
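
For instance, a second instance can be pointed at its own RPC socket (a sketch; `-r` and `-s` are the standard SPDK socket flags):

```
sudo LD_PRELOAD=$PWD/liblsvd.so $SPDK/build/bin/spdk_tgt -r /var/tmp/spdk2.sock
sudo $SPDK/scripts/rpc.py -s /var/tmp/spdk2.sock spdk_get_version
```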

Here's a simple setup - the first two steps are handled in the ceph-nvmeof Python code, and it may be worth looking through that code to see what options they use.

```
sudo $SPDK/scripts/rpc.py nvmf_create_transport -t TCP -u 16384 -m 8 -c 8192
sudo $SPDK/scripts/rpc.py bdev_rbd_register_cluster rbd_cluster
sudo $SPDK/scripts/rpc.py bdev_rbd_create rbd rbd/fio-target 4096 -c rbd_cluster
sudo $SPDK/scripts/rpc.py nvmf_create_subsystem nqn.2016-06.io.spdk:cnode1 -a -s SPDK00000000000001 -d SPDK_Controller1
sudo $SPDK/scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Ceph0
sudo $SPDK/scripts/rpc.py nvmf_subsystem_add_listener nqn.2016-06.io.spdk:cnode1 -t tcp -a 10.1.0.8 -s 5001
```

Note also that you can set up a ramdisk test by (1) creating a ramdisk with brd, and (2) creating another bdev / namespace with `bdev_aio_create`. With the version of SPDK I have, it did 4KB random read/write at about 100K IOPS, at least when I last measured a month or two ago on the HP machines.
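
A sketch of that ramdisk path (the brd module parameters and the 4 GiB size are assumptions):

```
sudo modprobe brd rd_nr=1 rd_size=4194304      # one 4 GiB ramdisk at /dev/ram0 (rd_size is in KiB)
sudo $SPDK/scripts/rpc.py bdev_aio_create /dev/ram0 Ram0 4096
sudo $SPDK/scripts/rpc.py nvmf_subsystem_add_ns nqn.2016-06.io.spdk:cnode1 Ram0
```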

Finally, I'm not totally convinced that the options I used are the best ones - the -u/-m/-c options for `nvmf_create_transport` were blindly copied from a doc page. I'm a little more convinced that specifying a 4KB block size in `bdev_rbd_create` is a good idea.

## Tests

There are two tests included: `lsvd_rnd_test` and `lsvd_crash_test`.
They do random writes of various sizes, with random data, and each 512-byte sector is "stamped" with its LBA and a sequence number for the write.
CRCs are saved for each sector, and after a bunch of writes we read everything back and verify that the CRCs match.

### `lsvd_rnd_test`

```
build$ bin/lsvd_rnd_test --help
Usage: lsvd_rnd_test [OPTION...] RUNS
-c, --close close and re-open
-d, --cache-dir=DIR cache directory
-D, --delay add random backend delays
-k, --keep keep data between tests
-l, --len=N run length
-O, --rados use RADOS
-p, --prefix=PREFIX object prefix
-r, --reads=FRAC fraction reads (0.0-1.0)
-R, --reverse reverse NVMe completion order
-s, --seed=S use this seed (one run)
-v, --verbose print LBAs and CRCs
-w, --window=W write window
-x, --existing don't delete existing cache
-z, --size=S volume size (e.g. 1G, 100M)
-Z, --cache-size=N cache size (K/M/G)
-?, --help Give this help list
--usage Give a short usage message
```

Unlike the normal library, it defaults to storing objects on the filesystem; the image name is just the path to the superblock object (the --prefix argument), and other objects live in the same directory.
If you use this, you probably want to use the `--delay` flag, to have object read/write requests subject to random delays.
It creates a volume of --size bytes, does --len random writes of random lengths, and then reads it all back and checks CRCs.
It can do multiple runs; if you don't specify --keep it will delete and recreate the volume between runs.
The --close flag causes it to close and re-open the image between runs; otherwise it stays open.
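
A representative invocation against the file backend (a sketch; the paths and sizes are arbitrary):

```
mkdir -p /tmp/lsvd-test /tmp/lsvd-cache
bin/lsvd_rnd_test --delay --size=1G --len=10000 --window=16 \
    --prefix=/tmp/lsvd-test/img --cache-dir=/tmp/lsvd-cache 5
```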

### `lsvd_crash_test`

This is pretty similar, except that it does the writes in a subprocess which kills itself with `_exit` rather than finishing gracefully, and it has an option to delete the cache before restarting.

This one needs to be run with the file backend, because some of the test options crash the writer, recover the image so it can be read and verified, and then restore it to its crashed state before starting the writer up again.

It uses the write sequence numbers to figure out which writes made it to disk before the crash, scanning all the sectors to find the highest sequence-number stamp, and then verifies that the image matches what you would get by applying all writes up to and including that sequence number.

```
build$ bin/lsvd_crash_test --help
Usage: lsvd_crash_test [OPTION...] RUNS
-2, --seed2 seed-generating seed
-d, --cache-dir=DIR cache directory
-D, --delay add random backend delays
-k, --keep keep data between tests
-l, --len=N run length
-L, --lose-writes=N delete some of last N cache writes
-n, --no-wipe don't clear image between runs
-o, --lose-objs=N delete some of last N objects
-p, --prefix=PREFIX object prefix
-r, --reads=FRAC fraction reads (0.0-1.0)
-R, --reverse reverse NVMe completion order
-s, --seed=S use this seed (one run)
-S, --sleep child sleeps for debug attach
-v, --verbose print LBAs and CRCs
-w, --window=W write window
-W, --wipe-cache delete cache on restart
-x, --existing don't delete existing cache
-z, --size=S volume size (e.g. 1G, 100M)
-Z, --cache-size=N cache size (K/M/G)
-?, --help Give this help list
--usage Give a short usage message
```
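
And a representative crash-test run (again a sketch; the flag values are arbitrary):

```
mkdir -p /tmp/lsvd-test /tmp/lsvd-cache
bin/lsvd_crash_test --delay --wipe-cache --lose-objs=4 --size=1G \
    --len=20000 --prefix=/tmp/lsvd-test/img --cache-dir=/tmp/lsvd-cache 5
```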
