Sherman is a B+Tree on disaggregated memory; it uses one-sided RDMA verbs to perform all index operations. Sherman includes three techniques to boost write performance:
- Hierarchical locks leveraging the on-chip memory of RDMA NICs (a sketch follows this list)
- Coalescing dependent RDMA commands
- Two-level version layout in leaf nodes
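The hierarchical on-chip lock is the most involved of the three. Below is a minimal, hypothetical C++ sketch of the idea (not Sherman's actual code): threads on the same compute node first serialize in local DRAM, so only one thread per node contends on the lock word that lives in the memory-side NIC's on-chip memory. The `rdma_cas`/`rdma_write` calls are stand-ins for one-sided verbs, simulated here with a local atomic so the sketch compiles.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <thread>

// Stand-ins for one-sided RDMA verbs issued to the memory node's NIC
// on-chip memory; simulated with a local atomic so the sketch runs.
static std::atomic<uint64_t> fake_onchip_lock_word{0};

static bool rdma_cas(uint64_t expected, uint64_t desired) {
  return fake_onchip_lock_word.compare_exchange_strong(expected, desired);
}
static void rdma_write(uint64_t value) { fake_onchip_lock_word.store(value); }

// Level 1: a per-compute-node lock table in local DRAM.
constexpr size_t kLocalLockTableSize = 1024;
static std::mutex local_lock_table[kLocalLockTableSize];

void lock_btree_node(uint64_t lock_id) {
  // Local level: threads of the same compute node contend without
  // generating any network traffic.
  local_lock_table[lock_id % kLocalLockTableSize].lock();
  // Global level: the single local winner spins with RDMA CAS on the
  // on-chip lock word (0 = free, 1 = held).
  while (!rdma_cas(/*expected=*/0, /*desired=*/1)) {
    std::this_thread::yield();
  }
}

void unlock_btree_node(uint64_t lock_id) {
  rdma_write(0);  // release the global on-chip lock
  local_lock_table[lock_id % kLocalLockTableSize].unlock();
}
```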
For more details, please refer to our paper:
[SIGMOD'22] Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory. Qing Wang, Youyou Lu, and Jiwu Shu.
Please use Deft for evaluation; it improves Sherman's performance and provides correct synchronization.
Environment requirements:
- Mellanox ConnectX-5 NICs and above
- RDMA driver: MLNX_OFED_LINUX-4.7-3.2.9.0 (if you use MLNX_OFED_LINUX-5.x, you will need to modify the code to resolve interface incompatibilities)
- NIC firmware: version 16.26.4012 and above, to support on-chip memory (use `ibstat` to check the version)
- memcached (to exchange QP information)
- cityhash
- boost 1.53 (to support `boost::coroutines::symmetric_coroutine`)
Before building, configure the following:
1. RDMA NIC Selection.
Modify this line (line 28 in commit 9bb9508) according to the RDMA NIC you want to use, where `ibv_get_device_name(deviceList[i])` is the name of the RNIC (e.g., `mlx5_0`); a sketch of the selection logic follows.
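For illustration, a hypothetical helper (not part of Sherman) that opens the RNIC matching a given name; `mlx5_0` is only an example name.

```cpp
#include <infiniband/verbs.h>
#include <cstdio>
#include <cstring>

// Hypothetical helper: open the RDMA device whose name matches `wanted`
// (e.g., "mlx5_0"); returns nullptr if no such device exists.
struct ibv_context *open_rnic_by_name(const char *wanted) {
  int num = 0;
  struct ibv_device **list = ibv_get_device_list(&num);
  if (list == nullptr) return nullptr;

  struct ibv_context *ctx = nullptr;
  for (int i = 0; i < num; ++i) {
    if (strcmp(ibv_get_device_name(list[i]), wanted) == 0) {
      ctx = ibv_open_device(list[i]);  // open the matching device
      break;
    }
  }
  ibv_free_device_list(list);
  if (ctx == nullptr) {
    fprintf(stderr, "RNIC %s not found\n", wanted);
  }
  return ctx;
}
```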
2. Gid Selection.
If you use RoCE, modify `gidIndex` in this line (line 60 in commit c5ee9d8) according to the output of the shell command `show_gids`; the correct index is usually 3. A sanity-check sketch follows.
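As a sanity check, a hypothetical helper (not in the repository) that verifies the configured GID index actually resolves to a non-zero GID on the given port; the port number and index are whatever you use in your setup.

```cpp
#include <infiniband/verbs.h>
#include <cstdio>
#include <cstring>

// Hypothetical helper: returns true if `gid_index` on `port` resolves to a
// non-zero GID, i.e. the index reported by `show_gids` actually exists.
bool check_gid(struct ibv_context *ctx, uint8_t port, int gid_index) {
  union ibv_gid gid;
  if (ibv_query_gid(ctx, port, gid_index, &gid) != 0) {
    fprintf(stderr, "ibv_query_gid failed for index %d\n", gid_index);
    return false;
  }
  union ibv_gid zero;
  memset(&zero, 0, sizeof(zero));
  return memcmp(&gid, &zero, sizeof(gid)) != 0;  // all-zero means unused entry
}
```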
3. MTU Selection.
If you use RoCE and the MTU of your NIC is not 4200 (check with `ifconfig`), modify the value of `path_mtu` in `src/rdma/StateTrans.cpp`; see the mapping sketch below.
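A hypothetical helper (not part of the repository) showing the mapping this step asks for: translate the interface MTU reported by `ifconfig` into the `ibv_mtu` value you would assign to `path_mtu`. The thresholds are heuristic and leave headroom for RoCE protocol headers.

```cpp
#include <infiniband/verbs.h>

// Hypothetical mapping from the Ethernet interface MTU (bytes, as shown by
// ifconfig) to the RDMA path MTU enum; the path MTU plus protocol headers
// must fit inside the interface MTU, so pick the next value down.
enum ibv_mtu mtu_to_enum(int interface_mtu) {
  if (interface_mtu >= 4200) return IBV_MTU_4096;
  if (interface_mtu >= 2200) return IBV_MTU_2048;
  if (interface_mtu >= 1200) return IBV_MTU_1024;
  if (interface_mtu >= 600)  return IBV_MTU_512;
  return IBV_MTU_256;
}
// Example: a typical RoCE interface MTU of 1500 maps to IBV_MTU_1024, so
// path_mtu in src/rdma/StateTrans.cpp would be set to IBV_MTU_1024.
```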
4. On-Chip Memory Size Selection.
Change the constant `kLockChipMemSize` in `include/Common.h` so that it is no larger than the maximum size of the NIC's on-chip memory.
Build Sherman as follows:
- `cd Sherman`
- `./script/hugepage.sh` to request huge pages from the OS (use `./script/clear_hugepage.sh` to return them)
- `mkdir build; cd build; cmake ..; make -j`
- `cp ../script/restartMemc.sh .`
- Configure `../memcached.conf`, where the 1st line is the memcached server IP and the 2nd line is the memcached server port; an example follows.
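For example, a hypothetical `../memcached.conf` (replace the IP and port with those of your own memcached server):

```
10.0.0.1
11211
```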
For each run with `kNodeCount` servers:
- Execute `./restartMemc.sh` to initialize the memcached server.
- On each server, execute `./benchmark kNodeCount kReadRatio kThreadCount` (for example, `./benchmark 2 50 16` runs with 2 servers, a 50% read ratio, and 16 client threads per server).

We emulate each server as one compute node and one memory node: on each server, as the compute node, we launch `kThreadCount` client threads; as the memory node, we launch one memory thread. `kReadRatio` is the ratio of `get` operations.
In `./test/benchmark.cpp`, we can modify `kKeySpace` and `zipfan` to generate different workloads. In addition, we can enable the macro `USE_CORO` to bind `kCoroCnt` coroutines to each client thread. A sketch of these knobs follows.
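A sketch of what these knobs could look like in `./test/benchmark.cpp`; the names match the ones above, but the types and values here are illustrative assumptions, not the repository's defaults.

```cpp
#include <cstdint>

// Illustrative settings only (not the repository's actual defaults).
#define USE_CORO                       // enable coroutine mode on client threads

const uint64_t kKeySpace = 64ull * 1024 * 1024;  // number of distinct keys
const double   zipfan    = 0.99;                 // Zipfian skew (0 means uniform)
const int      kCoroCnt  = 8;                    // coroutines per client thread
```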
Known issues and TODOs:
- The two-level version layout may induce inconsistency in some concurrent cases; refer to this SIGMOD'23 paper.
- Rewrite `delete` operations.