Merge pull request #534 from baidu/master
Merge master
bluebore authored Oct 25, 2016
2 parents 57e5a9b + aaaeeb8 commit 7eebcf6
Showing 18 changed files with 358 additions and 147 deletions.
138 changes: 65 additions & 73 deletions README.md
@@ -5,78 +5,70 @@

The Baidu File System (BFS) is a distributed file system designed to support real-time applications. Like many other distributed file systems, BFS is highly fault-tolerant. Unlike them, however, BFS provides low read/write latency while maintaining high throughput. Together with [Galaxy](https://github.com/baidu/galaxy) and [Tera](http://github.com/baidu/tera), BFS supports many real-time products at Baidu, including the Baidu webpage database, the Baidu incremental indexing system, and the Baidu user behavior analysis system.

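## Background
[Tera](http://github.com/baidu/tera), Baidu's core database, persists its data on a distributed file system, so the performance, availability, and scalability of that file system are critical to the stability and quality of the search services built on top of it. Existing distributed file systems could not meet these requirements well, so we started from Tera's needs and developed Baidu's own distributed file system, the Baidu File System (BFS).

## Design goals
1. High reliability and high availability
Data replicas are stored redundantly across multiple data centers and regions, so no data is lost even if a single data center or region suffers a severe disaster.
The metadata service is distributed: multiple replicas make it highly available, and a consensus protocol such as Raft synchronizes the metadata operation log to keep the replicas consistent.
2. High throughput and low latency
A high-performance single-node engine maximizes the I/O throughput of the storage media; global replica and traffic scheduling provides load balancing.
3. Horizontal scalability to ten thousand machines
The design supports deployment across three data centers in two regions and management of more than 10,000 machines.

## System architecture
The system consists mainly of the NameServer, the ChunkServer, the SDK, and bfs_client.
The NameServer is the central control module and manages the directory tree; the ChunkServer is the data node and serves reads and writes of file blocks; the SDK is a static library that provides the user-facing API; bfs_client is a binary management tool.
![Architecture diagram](resources/images/bfs-arch.png)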

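## Build
Inside Baidu, you can run directly:
sh internal_build.sh
For external builds, follow the steps in .travis.yml.

## Single-machine sandbox testing
The sandbox directory contains the environment and scripts for single-machine testing.
deploy.sh: deploys a local cluster with 4 chunkservers and 1 nameserver
start.sh: starts the deployed cluster
clear.sh: cleans up the cluster
small_test.sh: a simple automated test script that calls the three scripts above and uses bfs_client to test the basic functions of the file system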

## Cluster setup
1. Set up the NameServer
The nameserver deployment needs 1 to 3 machines ($nshost1~3)
Required nameserver flags:
--nameserver_nodes=$nshost1:8828,$nshost2:8828,$nshost3:8828
--node_index=$hostid
Start command:
./nameserver --flagfile=./bfs.flag
2. Set up the Chunkserver
To ensure availability, at least 4 chunkserver machines are needed (the cluster remains writable when one of them goes down)
Required chunkserver flags:
--nameserver_nodes=$nshost1:8828,$nshost2:8828,$nshost3:8828
--chunkserver_port=8825
--block_store_path=/home/disk1/bfs,/home/disk2/bfs
Start command:
./chunkserver --flagfile=./bfs.flag
3. Inspect the cluster
There are two ways to inspect the cluster:
a) Command line
./bfs_client stat -a
b) Web
Open http://$nshost1:8828/dfs in a browser
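
The required flags from steps 1 and 2 can be collected into the file passed via `--flagfile`. The sketch below is a minimal, hypothetical helper rather than anything shipped with BFS: the function names `write_nameserver_flags` and `write_chunkserver_flags`, the host names `ns1` to `ns3`, and the node index `0` are assumptions for illustration. It writes one `--flag=value` per line, which is the layout gflags flag files accept.

```shell
#!/bin/bash
# Hypothetical helpers: emit gflags-style flag files (one --flag=value per line).

# usage: write_nameserver_flags <nameserver_nodes> <node_index> <output_file>
write_nameserver_flags() {
    local nodes="$1" index="$2" out="$3"
    {
        echo "--nameserver_nodes=${nodes}"
        echo "--node_index=${index}"
    } > "${out}"
}

# usage: write_chunkserver_flags <nameserver_nodes> <port> <store_paths> <output_file>
write_chunkserver_flags() {
    local nodes="$1" port="$2" paths="$3" out="$4"
    {
        echo "--nameserver_nodes=${nodes}"
        echo "--chunkserver_port=${port}"
        echo "--block_store_path=${paths}"
    } > "${out}"
}

# Example values; the host names and the index are placeholders.
NODES="ns1:8828,ns2:8828,ns3:8828"
FLAG_TMP="$(mktemp -d)"
write_nameserver_flags "${NODES}" 0 "${FLAG_TMP}/ns.flag"
write_chunkserver_flags "${NODES}" 8825 "/home/disk1/bfs,/home/disk2/bfs" "${FLAG_TMP}/cs.flag"
cat "${FLAG_TMP}/ns.flag" "${FLAG_TMP}/cs.flag"
```

Each generated file would then be copied to the matching host as `bfs.flag`. The nameserver file differs per host only in `--node_index`, which is why the index is a separate argument.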

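## Logging conventions
To simplify log printing and make logs easy to grep:
All block ids are printed in the format "#%ld " (prefixed with # and followed by a space)
All chunkserver ids are printed in the format "C%d "
All entry ids are printed in the format "E%ld "
All block versions are printed in the format "V%ld "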
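
Because every id carries a fixed prefix and a trailing space, a fixed-string grep for one id does not also match longer ids that share its leading digits. The sketch below shows this on made-up log lines; the log text and the helper name `block_pattern` are assumptions for illustration, not real BFS output.

```shell
#!/bin/bash
# Build the search token for a block id, mirroring the "#%ld " log format
# (leading '#', trailing space).
block_pattern() {
    printf '#%s ' "$1"
}

# Made-up log lines for illustration only; not real BFS output.
SAMPLE_LOG="$(mktemp)"
cat > "${SAMPLE_LOG}" <<'EOF'
write block #12 V3 to C1 done
write block #123 V1 to C2 done
remove block #1234 V7 from C12 done
EOF

# -F treats the pattern as a fixed string, so '#' and the space are literal.
# The trailing space keeps "#12 " from matching "#123 " or "#1234 ".
MATCHES="$(grep -F -- "$(block_pattern 12)" "${SAMPLE_LOG}")"
echo "${MATCHES}"
```

The same approach works for the other prefixes: searching for `C1 ` finds chunkserver 1 without also returning lines for chunkserver 12.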

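## Origins
Suddenly felt like writing a distributed file system~
1. Support persistent data storage for table systems
2. Support temporary data storage for mixed-deployment systems
3. Support large-file storage for mapreduce


Anyone who wants to join, leave your name here:

yanshiguang~
yuanyi~
yuyangquan~
leiliyuan~
yangce~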
## Features
1. Continuous availability
* Nameserver is implemented as a `raft group`, so there is no single point of failure.
2. High throughput
* High-performance data engine to maximize I/O utilization.
3. Low latency
* Global load balancing and slow node detection.
4. Linear scalability
* Supports multi-data-center deployment and up to 10,000 data nodes.

## Architecture
![Architecture diagram](resources/images/bfs-arch2-mini.png)

## Quick Start
#### Build
./build.sh
#### Standalone BFS
cd sandbox; ./deploy.sh; ./start.sh

## How to Contribute
1. Please read the [RoadMap](docs/roadmap.md) or source code.
2. Find something you are interested in and start working on it.
3. Test your code by simply running `make test` and `make check`.
4. Make a pull request.
5. Once your code has passed code review and been merged, it will run on thousands of servers :)


## Contact us
opensearch@baidu.com

====

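[Baidu File System](http://github.com/baidu/bfs)
====

Baidu's core services and database systems all rely on a distributed file system as their underlying storage, so the availability and performance of the file system are critical to the stability and quality of the search services above it. Existing distributed file systems (such as HDFS) were designed for offline batch processing and cannot deliver low latency and continuous availability while sustaining high throughput, so we designed the Baidu File System around the characteristics of search workloads.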

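## Key features
1. Continuous availability
Data is stored redundantly across multiple data centers and regions, and metadata consistency is maintained with Raft; the outage of a single data center does not affect overall availability.
2. High throughput
A high-performance single-node engine maximizes the I/O throughput of the storage media.
3. Low latency
Global load balancing and automatic avoidance of slow nodes.
4. Horizontal scalability
The design supports deployment across three data centers in two regions and management of more than 10,000 machines.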

## Architecture
![Architecture diagram](resources/images/bfs-arch2-mini.png)

## Quick start
#### Build
./build.sh
#### Standalone BFS
cd sandbox; ./deploy.sh; ./start.sh

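## How to contribute
1. Read the [RoadMap](docs/roadmap.md) file or the source code to learn our current development direction
2. Find a feature or module you are interested in developing
3. Develop it, then verify the functionality yourself and run make test and make check to confirm that the existing test cases pass
4. Open a pull request
5. Once it passes code review, your code has a chance to run on tens of thousands of Baidu servers~


## Contact us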
opensearch@baidu.com

195 changes: 195 additions & 0 deletions build.sh
@@ -0,0 +1,195 @@
#!/bin/bash

set -e -u -E # this script will exit if any sub-command fails

########################################
# download & build dependencies
########################################

WORK_DIR=`pwd`
DEPS_SOURCE=`pwd`/thirdsrc
DEPS_PREFIX=`pwd`/thirdparty
DEPS_CONFIG="--prefix=${DEPS_PREFIX} --disable-shared --with-pic"
FLAG_DIR=`pwd`/.build

export PATH=${DEPS_PREFIX}/bin:$PATH
mkdir -p ${DEPS_SOURCE} ${DEPS_PREFIX} ${FLAG_DIR}

if [ ! -f "${FLAG_DIR}/dl_third" ] || [ ! -d "${DEPS_SOURCE}/.git" ]; then
rm -rf ${DEPS_SOURCE}
mkdir ${DEPS_SOURCE}
git clone https://github.com/yvxiang/thirdparty.git thirdsrc
touch "${FLAG_DIR}/dl_third"
fi

cd ${DEPS_SOURCE}

# boost
if [ ! -f "${FLAG_DIR}/boost_1_57_0" ] \
|| [ ! -d "${DEPS_PREFIX}/boost_1_57_0/boost" ]; then
tar zxf boost_1_57_0.tar.gz
rm -rf ${DEPS_PREFIX}/boost_1_57_0
mv boost_1_57_0 ${DEPS_PREFIX}/boost_1_57_0
touch "${FLAG_DIR}/boost_1_57_0"
fi

# protobuf
if [ ! -f "${FLAG_DIR}/protobuf_2_6_1" ] \
|| [ ! -f "${DEPS_PREFIX}/lib/libprotobuf.a" ] \
|| [ ! -d "${DEPS_PREFIX}/include/google/protobuf" ]; then
tar zxf protobuf-2.6.1.tar.gz
cd protobuf-2.6.1
./configure ${DEPS_CONFIG}
make -j4
make install
cd -
touch "${FLAG_DIR}/protobuf_2_6_1"
fi

#leveldb
if [ ! -f "${FLAG_DIR}/leveldb" ] \
|| [ ! -f "${DEPS_PREFIX}/lib/libleveldb.a" ] \
|| [ ! -d "${DEPS_PREFIX}/include/leveldb" ]; then
rm -rf leveldb
git clone https://github.com/lylei/leveldb.git leveldb
cd leveldb
echo "PREFIX=${DEPS_PREFIX}" > config.mk
make -j4
make install
cd -
touch "${FLAG_DIR}/leveldb"
fi

# snappy
if [ ! -f "${FLAG_DIR}/snappy_1_1_1" ] \
|| [ ! -f "${DEPS_PREFIX}/lib/libsnappy.a" ] \
|| [ ! -f "${DEPS_PREFIX}/include/snappy.h" ]; then
tar zxf snappy-1.1.1.tar.gz
cd snappy-1.1.1
./configure ${DEPS_CONFIG}
make -j4
make install
cd -
touch "${FLAG_DIR}/snappy_1_1_1"
fi

# sofa-pbrpc
if [ ! -f "${FLAG_DIR}/sofa-pbrpc" ] \
|| [ ! -f "${DEPS_PREFIX}/lib/libsofa-pbrpc.a" ] \
|| [ ! -d "${DEPS_PREFIX}/include/sofa/pbrpc" ]; then
rm -rf sofa-pbrpc

git clone --depth=1 https://github.com/baidu/sofa-pbrpc.git sofa-pbrpc
cd sofa-pbrpc
sed -i '/BOOST_HEADER_DIR=/ d' depends.mk
sed -i '/PROTOBUF_DIR=/ d' depends.mk
sed -i '/SNAPPY_DIR=/ d' depends.mk
echo "BOOST_HEADER_DIR=${DEPS_PREFIX}/boost_1_57_0" >> depends.mk
echo "PROTOBUF_DIR=${DEPS_PREFIX}" >> depends.mk
echo "SNAPPY_DIR=${DEPS_PREFIX}" >> depends.mk
echo "PREFIX=${DEPS_PREFIX}" >> depends.mk
make -j4
make install
cd -
touch "${FLAG_DIR}/sofa-pbrpc"
fi

# cmake for gflags
if ! which cmake ; then
cd CMake-3.2.1
./configure --prefix=${DEPS_PREFIX}
make -j4
make install
cd -
fi

# gflags
if [ ! -f "${FLAG_DIR}/gflags_2_1_1" ] \
|| [ ! -f "${DEPS_PREFIX}/lib/libgflags.a" ] \
|| [ ! -d "${DEPS_PREFIX}/include/gflags" ]; then
tar zxf gflags-2.1.1.tar.gz
cd gflags-2.1.1
cmake -DCMAKE_INSTALL_PREFIX=${DEPS_PREFIX} -DGFLAGS_NAMESPACE=google -DCMAKE_CXX_FLAGS=-fPIC
make -j4
make install
cd -
touch "${FLAG_DIR}/gflags_2_1_1"
fi

# gtest
if [ ! -f "${FLAG_DIR}/gtest_1_7_0" ] \
|| [ ! -f "${DEPS_PREFIX}/lib/libgtest.a" ] \
|| [ ! -d "${DEPS_PREFIX}/include/gtest" ]; then
cd gtest-1.7.0
./configure ${DEPS_CONFIG}
make
cp -a lib/.libs/* ${DEPS_PREFIX}/lib
cp -a include/gtest ${DEPS_PREFIX}/include
cd -
touch "${FLAG_DIR}/gtest_1_7_0"
fi

# libunwind for gperftools
if [ ! -f "${FLAG_DIR}/libunwind_0_99" ] \
|| [ ! -f "${DEPS_PREFIX}/lib/libunwind.a" ] \
|| [ ! -f "${DEPS_PREFIX}/include/libunwind.h" ]; then
tar zxf libunwind-0.99.tar.gz
cd libunwind-0.99
./configure ${DEPS_CONFIG}
make CFLAGS=-fPIC -j4
make CFLAGS=-fPIC install
cd -
touch "${FLAG_DIR}/libunwind_0_99"
fi

# gperftools (tcmalloc)
if [ ! -f "${FLAG_DIR}/gperftools_2_2_1" ] \
|| [ ! -f "${DEPS_PREFIX}/lib/libtcmalloc_minimal.a" ]; then
tar zxf gperftools-2.2.1.tar.gz
cd gperftools-2.2.1
./configure ${DEPS_CONFIG} CPPFLAGS=-I${DEPS_PREFIX}/include LDFLAGS=-L${DEPS_PREFIX}/lib
make -j4
make install
cd -
touch "${FLAG_DIR}/gperftools_2_2_1"
fi

# common
if [ ! -f "${FLAG_DIR}/common" ] \
|| [ ! -f "${DEPS_PREFIX}/lib/libcommon.a" ]; then
rm -rf common
git clone https://github.com/baidu/common
cd common
sed -i 's/^PREFIX=.*/PREFIX=..\/..\/thirdparty/' config.mk
sed -i '/^INCLUDE_PATH=*/s/$/ -I..\/..\/thirdparty\/boost_1_57_0/g' Makefile
make -j4
make install
cd -
touch "${FLAG_DIR}/common"
fi


cd ${WORK_DIR}

########################################
# config depends.mk
########################################

echo "PBRPC_PATH=./thirdparty" > depends.mk
echo "PROTOBUF_PATH=./thirdparty" >> depends.mk
echo "PROTOC_PATH=./thirdparty/bin/" >> depends.mk
echo 'PROTOC=$(PROTOC_PATH)protoc' >> depends.mk
echo "PBRPC_PATH=./thirdparty" >> depends.mk
echo "BOOST_PATH=./thirdparty/boost_1_57_0" >> depends.mk
echo "GFLAG_PATH=./thirdparty" >> depends.mk
echo "GTEST_PATH=./thirdparty" >> depends.mk
echo "COMMON_PATH=./thirdparty" >> depends.mk
echo "TCMALLOC_PATH=./thirdparty" >> depends.mk

########################################
# build bfs
########################################

make clean
make -j4
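
Each dependency block in the script above follows the same pattern: rebuild if the stamp file under `.build` is missing or an expected output is missing, then `touch` the stamp. A minimal sketch of that check factored into one function follows. `needs_build` is a hypothetical helper name, not part of the commit, and it only tests that paths exist, whereas the script distinguishes files from directories with `-f` and `-d`.

```shell
#!/bin/bash
# needs_build <stamp_file> [artifact...]
# Succeeds (returns 0) when the stamp file or any listed artifact is missing,
# i.e. when the dependency has to be (re)built.
needs_build() {
    local stamp="$1"
    shift
    [ -f "${stamp}" ] || return 0
    local artifact
    for artifact in "$@"; do
        [ -e "${artifact}" ] || return 0
    done
    return 1
}

# Demonstration in a scratch directory instead of a real build tree.
DEMO_DIR="$(mktemp -d)"
STAMP="${DEMO_DIR}/.build/demo"
LIB="${DEMO_DIR}/lib/libdemo.a"
mkdir -p "${DEMO_DIR}/.build" "${DEMO_DIR}/lib"

if needs_build "${STAMP}" "${LIB}"; then
    touch "${LIB}"      # stands in for "make && make install"
    touch "${STAMP}"    # record success so the next run skips this step
fi

if needs_build "${STAMP}" "${LIB}"; then
    SECOND_RUN="rebuild"
else
    SECOND_RUN="skip"
fi
echo "second run: ${SECOND_RUN}"
```

Checking the installed artifact as well as the stamp means a deleted `thirdparty` directory triggers a rebuild even when the stamp is still present.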

1 change: 0 additions & 1 deletion build_version.sh
@@ -19,7 +19,6 @@ gen_info_template_foot ()
echo "extern const char kBuildTime[] = \"$BUILD_DATE_TIME\";"
echo "extern const char kBuilderName[] = \"$USER\";"
echo "extern const char kHostName[] = \"$BUILD_HOSTNAME\";"
echo "extern const char kCompiler[] = \"$BUILD_GCC_VERSION\";"
}

gen_info_print_template ()
23 changes: 0 additions & 23 deletions docs/README.md

This file was deleted.

25 changes: 25 additions & 0 deletions docs/roadmap.md
@@ -0,0 +1,25 @@
# Roadmap

## Basic functions
- [x] Basic file and directory operations (Create/Delete/Read/Write/Rename)
- [x] Automatic recovery
- [x] Nameserver HA
- [ ] Split the Metaserver from the Nameserver
- [ ] Disk load balancing
- [ ] Dynamic load balancing of chunkservers
- [ ] File Lock & Directory Lock
- [x] Simple multi-geographical replica placement
- [ ] SDK lease
- [ ] Skip slow nodes while reading a file

## POSIX interface
- [x] Mount support
- [ ] FUSE low-level API
- [x] Basic read and write operations (excluding random writes)
- [x] Small-file random writes, supporting vim, gcc, and other applications
- [ ] Large-file random writes

## Application support
- [x] Tera
- [ ] Shuttle
- [ ] Galaxy
Binary file added resources/images/bfs-arch2-mini.png