diff --git a/README.md b/README.md index 68182fbf..86c168a9 100644 --- a/README.md +++ b/README.md @@ -5,78 +5,70 @@ The Baidu File System (BFS) is a distributed file system designed to support real-time applications. Like many other distributed file systems, BFS is highly fault-tolerant. But different from others, BFS provides low read/write latency while maintains high throughout rates. Together with [Galaxy](https://github.com/baidu/galaxy) and [Tera](http://github.com/baidu/tera), BFS supports many real-time products in Baidu, including Baidu webpage database, Baidu incremental indexing system, Baidu user behavior analysis system, etc. -##背景 -百度的核心数据库[Tera](http://github.com/baidu/tera)将数据持久化在分布式文件系统上,分布式文件系统的性能、可用性和扩展性对整个上层搜索业务的稳定性与效果有着至关重要的影响。现有的分布式文件系统无法很好地满足这几方面的要求,所以我们从Tera需求出发,开发了百度自己的分布式文件系统Baidu File System (BFS)。 - -##设计目标 -1. 高可靠、高可用 -通过将数据副本进行多机房、多地域冗余,实现单个机房、地域遇到严重灾害的情况下,不丢失数据。 -将元数据服务分布化,通过多副本实现元数据服务的高可用,通过Raft等一致性协议同元数据操作日志,实现多副本的一致性。 -2. 高吞吐、低延迟 -通过高性能的单机引擎,最大化存储介质IO吞吐;通过全局副本、流量调度,实现负载均衡。 -3. 可水平扩展至万台规模 -设计支持两地三机房,1万+台机器管理。 - -##系统架构 -系统主要由NameServer、ChunkServer、SDK和bfs_client等几个模块构成。 -其中NameServer是中心控制模块,负责目录树的管理;ChunkServer是数据节点负责提供文件块的读写服务;SDK以静态库的形式提供了用户使用的API;bfs_client是一个二进制的管理工具。 -![架构图](resources/images/bfs-arch.png) - -## 构建 -在百度内部,可以直接运行: -sh internal_build.sh -外部构建请参考.travis.yml中的步骤。 - -## 单机Sandbox测试 -Sandbox目录下包含了运行单机测试的环境和脚本。 -deploy.sh: 在本地部署一个包含4个chunkserver、1个nameserver的集群 -start.sh: 启动部署好的集群 -clear.sh: 清理集群 -small_test.sh 简单的自动化测试脚本,会调用上面三个脚本,并使用bfs_client测试文件系统的基本功能 - -## 系统搭建 -1. 搭建NameServer -Nameserver部署需要1~3台机器($nshost1~3) -Nameserver必须指定的flag: ---nameserver_nodes=$nshost1:8828,$nshost2:8828,$nshost3:8828 ---node_index=$hostid -启动命令: -./nameserver --flagfile=./bfs.flag -2. 搭建Chunkserver -为了保证可用性,chunkserver至少需要4台机器(一台挂掉的情况下,仍然可写) -Chunkserver必须指定的flag: ---nameserver_nodes=$nshost1:8828,$nshost2:8828,$nshost3:8828 ---chunkserver_port=8825 ---block_store_path=/home/disk1/bfs,/home/disk2/bfs -启动命令: -./chunkserver --flagfile=./bfs.flag -3. 查看集群 -有两种方式可以查看集群: -a) 命令行方式 - ./bfs_client stat -a -b) Web方式 - 用浏览器访问http://$nshost1:8828/dfs - -## 日志规则与说明 -为了简化日志打印,并便于grep, -所有block id的打印使用“#%ld "的格式(即前加#,后加空格) -所有chunkserver id打印使用"C%d "的格式 -所有entry id打印使用"E%ld "的格式 -所有block version打印使用"V%ld "的格式 - -##前世 -突然想写个分布式文件系统~ - 1. 支持表格系统的持久化数据存储 - 2. 支持混布系统的临时数据存储 - 3. 支持mapreduce的大文件存储 - - -想加入的人在这留个名吧: - -yanshiguang~ -yuanyi~ -yuyangquan~ -leiliyuan~ -yangce~ +## Features +1. Continuous availability + * Nameserver is implemented as a `raft group`, no single point failure. +2. High throughput + * High performance data engine to maximize IO utils. +3. Low latency + * Global load balance and slow node detection. +4. Linear scalability + * Support multi data center deployment and up to 10,000 data nodes. +## Architecture +![架构图](resources/images/bfs-arch2-mini.png) + +## Quick Start +#### Build +./build.sh +#### Standalone BFS +cd sandbox; ./deploy.sh; ./start.sh + +## How to Contribute +1. Please read the [RoadMap](docs/roadmap.md) or source code. +2. Find something you are interested in and start working on it. +3. Test your code by simply running `make test` and `make check`. +4. Make a pull request. +5. Once your code has passed the code-review and merged, it will be run on thousands of servers :) + + +## Contact us +opensearch@baidu.com + +==== + +[百度文件系统](http://github.com/baidu/bfs) +==== + +百度的核心业务和数据库系统都依赖分布式文件系统作为底层存储,文件系统的可用性和性能对上层搜索业务的稳定性与效果有着至关重要的影响。现有的分布式文件系统(如HDFS等)是为离线批处理设计的,无法在保证高吞吐的情况下做到低延迟和持续可用,所以我们从搜索的业务特点出发,设计了百度文件系统。 + +## 核心特点 +1. 持续可用 +数据多机房、多地域冗余,元数据通过Raft维护一致性,单个机房宕机,不影响整体可用性。 +2. 高吞吐 +通过高性能的单机引擎,最大化存储介质IO吞吐; +3. 低延时 +全局负载均衡、慢节点自动规避 +4. 水平扩展 +设计支持两地三机房,1万+台机器管理。 + +## 架构 +![架构图](resources/images/bfs-arch2-mini.png) + +## 快速试用 +#### 构建 +./build.sh +#### 单机版BFS +cd sandbox; ./deploy.sh; ./start.sh + +## 如何参与开发 +1. 阅读[RoadMap](docs/roadmap.md)文件或者源代码,了解我们当前的开发方向 +2. 找到自己感兴趣开发的的功能或模块 +3. 进行开发,开发完成后自测功能是否正确,并运行make test及make check检查是否可以通过已有的测试case +4. 发起pull request +5. 在code-review通过后,你的代码便有机会运行在百度的数万台服务器上~ + + +## 联系我们 +opensearch@baidu.com diff --git a/build.sh b/build.sh new file mode 100755 index 00000000..b4a59839 --- /dev/null +++ b/build.sh @@ -0,0 +1,195 @@ +#!/bin/bash + +set -e -u -E # this script will exit if any sub-command fails + +######################################## +# download & build depend software +######################################## + +WORK_DIR=`pwd` +DEPS_SOURCE=`pwd`/thirdsrc +DEPS_PREFIX=`pwd`/thirdparty +DEPS_CONFIG="--prefix=${DEPS_PREFIX} --disable-shared --with-pic" +FLAG_DIR=`pwd`/.build + +export PATH=${DEPS_PREFIX}/bin:$PATH +mkdir -p ${DEPS_SOURCE} ${DEPS_PREFIX} ${FLAG_DIR} + +if [ ! -f "${FLAG_DIR}/dl_third" ] || [ ! -d "${DEPS_SOURCE}/.git" ]; then + rm -rf ${DEPS_SOURCE} + mkdir ${DEPS_SOURCE} + git clone https://github.com/yvxiang/thirdparty.git thirdsrc + touch "${FLAG_DIR}/dl_third" +fi + +cd ${DEPS_SOURCE} + +# boost +if [ ! -f "${FLAG_DIR}/boost_1_57_0" ] \ + || [ ! -d "${DEPS_PREFIX}/boost_1_57_0/boost" ]; then + tar zxf boost_1_57_0.tar.gz + rm -rf ${DEPS_PREFIX}/boost_1_57_0 + mv boost_1_57_0 ${DEPS_PREFIX}/boost_1_57_0 + touch "${FLAG_DIR}/boost_1_57_0" +fi + +# protobuf +if [ ! -f "${FLAG_DIR}/protobuf_2_6_1" ] \ + || [ ! -f "${DEPS_PREFIX}/lib/libprotobuf.a" ] \ + || [ ! -d "${DEPS_PREFIX}/include/google/protobuf" ]; then + tar zxf protobuf-2.6.1.tar.gz + cd protobuf-2.6.1 + ./configure ${DEPS_CONFIG} + make -j4 + make install + cd - + touch "${FLAG_DIR}/protobuf_2_6_1" +fi + +#leveldb +if [ ! -f "${FLAG_DIR}/leveldb" ] \ + || [ ! -f "${DEPS_PREFIX}/lib/libleveldb.a" ] \ + || [ ! -d "${DEPS_PREFIX}/include/leveldb" ]; then + rm -rf leveldb + git clone https://github.com/lylei/leveldb.git leveldb + cd leveldb + echo "PREFIX=${DEPS_PREFIX}" > config.mk + make -j4 + make install + cd - + touch "${FLAG_DIR}/leveldb" +fi + +# snappy +if [ ! -f "${FLAG_DIR}/snappy_1_1_1" ] \ + || [ ! -f "${DEPS_PREFIX}/lib/libsnappy.a" ] \ + || [ ! -f "${DEPS_PREFIX}/include/snappy.h" ]; then + tar zxf snappy-1.1.1.tar.gz + cd snappy-1.1.1 + ./configure ${DEPS_CONFIG} + make -j4 + make install + cd - + touch "${FLAG_DIR}/snappy_1_1_1" +fi + +# sofa-pbrpc +if [ ! -f "${FLAG_DIR}/sofa-pbrpc" ] \ + || [ ! -f "${DEPS_PREFIX}/lib/libsofa-pbrpc.a" ] \ + || [ ! -d "${DEPS_PREFIX}/include/sofa/pbrpc" ]; then + rm -rf sofa-pbrpc + + git clone --depth=1 https://github.com/baidu/sofa-pbrpc.git sofa-pbrpc + cd sofa-pbrpc + sed -i '/BOOST_HEADER_DIR=/ d' depends.mk + sed -i '/PROTOBUF_DIR=/ d' depends.mk + sed -i '/SNAPPY_DIR=/ d' depends.mk + echo "BOOST_HEADER_DIR=${DEPS_PREFIX}/boost_1_57_0" >> depends.mk + echo "PROTOBUF_DIR=${DEPS_PREFIX}" >> depends.mk + echo "SNAPPY_DIR=${DEPS_PREFIX}" >> depends.mk + echo "PREFIX=${DEPS_PREFIX}" >> depends.mk + make -j4 + make install + cd - + touch "${FLAG_DIR}/sofa-pbrpc" +fi + +# cmake for gflags +if ! which cmake ; then + cd CMake-3.2.1 + ./configure --prefix=${DEPS_PREFIX} + make -j4 + make install + cd - +fi + +# gflags +if [ ! -f "${FLAG_DIR}/gflags_2_1_1" ] \ + || [ ! -f "${DEPS_PREFIX}/lib/libgflags.a" ] \ + || [ ! -d "${DEPS_PREFIX}/include/gflags" ]; then + tar zxf gflags-2.1.1.tar.gz + cd gflags-2.1.1 + cmake -DCMAKE_INSTALL_PREFIX=${DEPS_PREFIX} -DGFLAGS_NAMESPACE=google -DCMAKE_CXX_FLAGS=-fPIC + make -j4 + make install + cd - + touch "${FLAG_DIR}/gflags_2_1_1" +fi + +# gtest +if [ ! -f "${FLAG_DIR}/gtest_1_7_0" ] \ + || [ ! -f "${DEPS_PREFIX}/lib/libgtest.a" ] \ + || [ ! -d "${DEPS_PREFIX}/include/gtest" ]; then + cd gtest-1.7.0 + ./configure ${DEPS_CONFIG} + make + cp -a lib/.libs/* ${DEPS_PREFIX}/lib + cp -a include/gtest ${DEPS_PREFIX}/include + cd - + touch "${FLAG_DIR}/gtest_1_7_0" +fi + +# libunwind for gperftools +if [ ! -f "${FLAG_DIR}/libunwind_0_99" ] \ + || [ ! -f "${DEPS_PREFIX}/lib/libunwind.a" ] \ + || [ ! -f "${DEPS_PREFIX}/include/libunwind.h" ]; then + tar zxf libunwind-0.99.tar.gz + cd libunwind-0.99 + ./configure ${DEPS_CONFIG} + make CFLAGS=-fPIC -j4 + make CFLAGS=-fPIC install + cd - + touch "${FLAG_DIR}/libunwind_0_99" +fi + +# gperftools (tcmalloc) +if [ ! -f "${FLAG_DIR}/gperftools_2_2_1" ] \ + || [ ! -f "${DEPS_PREFIX}/lib/libtcmalloc_minimal.a" ]; then + tar zxf gperftools-2.2.1.tar.gz + cd gperftools-2.2.1 + ./configure ${DEPS_CONFIG} CPPFLAGS=-I${DEPS_PREFIX}/include LDFLAGS=-L${DEPS_PREFIX}/lib + make -j4 + make install + cd - + touch "${FLAG_DIR}/gperftools_2_2_1" +fi + +# common +if [ ! -f "${FLAG_DIR}/common" ] \ + || [ ! -f "${DEPS_PREFIX}/lib/libcommon.a" ]; then + rm -rf common + git clone https://github.com/baidu/common + cd common + sed -i 's/^PREFIX=.*/PREFIX=..\/..\/thirdparty/' config.mk + sed -i '/^INCLUDE_PATH=*/s/$/ -I..\/..\/thirdparty\/boost_1_57_0/g' Makefile + make -j4 + make install + cd - + touch "${FLAG_DIR}/common" +fi + + +cd ${WORK_DIR} + +######################################## +# config depengs.mk +######################################## + +echo "PBRPC_PATH=./thirdparty" > depends.mk +echo "PROTOBUF_PATH=./thirdparty" >> depends.mk +echo "PROTOC_PATH=./thirdparty/bin/" >> depends.mk +echo 'PROTOC=$(PROTOC_PATH)protoc' >> depends.mk +echo "PBRPC_PATH=./thirdparty" >> depends.mk +echo "BOOST_PATH=./thirdparty/boost_1_57_0" >> depends.mk +echo "GFLAG_PATH=./thirdparty" >> depends.mk +echo "GTEST_PATH=./thirdparty" >> depends.mk +echo "COMMON_PATH=./thirdparty" >> depends.mk +echo "TCMALLOC_PATH=./thirdparty" >> depends.mk + +######################################## +# build tera +######################################## + +make clean +make -j4 + diff --git a/build_version.sh b/build_version.sh index 1fd54332..5e30501e 100755 --- a/build_version.sh +++ b/build_version.sh @@ -19,7 +19,6 @@ gen_info_template_foot () echo "extern const char kBuildTime[] = \"$BUILD_DATE_TIME\";" echo "extern const char kBuilderName[] = \"$USER\";" echo "extern const char kHostName[] = \"$BUILD_HOSTNAME\";" - echo "extern const char kCompiler[] = \"$BUILD_GCC_VERSION\";" } gen_info_print_template () diff --git a/docs/README.md b/docs/README.md deleted file mode 100644 index 8d3b2ed6..00000000 --- a/docs/README.md +++ /dev/null @@ -1,23 +0,0 @@ -TODO -======== -* chunkserver支持全异步写 -* 整理错误码 -* 目录移动(In progress) -* 副本恢复 -* master主从同步 -* Docker image -* 负载均衡 -* 机架&数据中心感知 -* 规避慢节点 - -DONE -======== -* 文件移动 -* 滑动窗口 -* 定时器 -* Tera env -* 目录删除 -* 增量Report -* 自定义副本数 -* 多盘支持 -* 扇出写支持 diff --git a/docs/roadmap.md b/docs/roadmap.md new file mode 100644 index 00000000..084ff365 --- /dev/null +++ b/docs/roadmap.md @@ -0,0 +1,25 @@ +# Roadmap + +## Basic functions +- [x] Basic files, directory operations(Create/Delete/Read/Write/Rename) +- [x] automatic recovery +- [x] Nameserver HA +- [ ] Split the Metaserver from the Nameserver +- [ ] disk loadbalance +- [ ] Dynamic load balancing of chunkserver +- [ ] File Lock & Directory Lock +- [x] Simple multi-geographical replica placement +- [ ] sdk lease +- [ ] Skip slow nodes while reading a file + +## Posix interface +- [x] mount support +- [ ] fuse lowlevel +- [x] Basic read and write operations(not include random writes) +- [x] Small file random write, support vim, gcc and other applications +- [ ] Large file random write + +## Application support +- [x] Tera +- [ ] Shuttle +- [ ] Galaxy diff --git a/resources/images/bfs-arch2-mini.png b/resources/images/bfs-arch2-mini.png new file mode 100644 index 00000000..8e2e3e8f Binary files /dev/null and b/resources/images/bfs-arch2-mini.png differ diff --git a/src/chunkserver/block_manager.cc b/src/chunkserver/block_manager.cc index 013120b4..a7d1ad54 100644 --- a/src/chunkserver/block_manager.cc +++ b/src/chunkserver/block_manager.cc @@ -234,15 +234,15 @@ bool BlockManager::SetNameSpaceVersion(int64_t version) { return true; } -bool BlockManager::ListBlocks(std::vector* blocks, int64_t offset, int32_t num) { +int64_t BlockManager::ListBlocks(std::vector* blocks, int64_t offset, int32_t num) { leveldb::Iterator* it = metadb_->NewIterator(leveldb::ReadOptions()); + int64_t largest_id = 0; for (it->Seek(BlockId2Str(offset)); it->Valid(); it->Next()) { int64_t block_id = 0; if (1 != sscanf(it->key().data(), "%ld", &block_id)) { LOG(WARNING, "[ListBlocks] Unknown meta key: %s\n", it->key().ToString().c_str()); - delete it; - return false; + break; } BlockMeta meta; bool ret = meta.ParseFromArray(it->value().data(), it->value().size()); @@ -254,13 +254,14 @@ bool BlockManager::ListBlocks(std::vector* blocks, int64_t offset, in } assert(meta.block_id() == block_id); blocks->push_back(meta); + largest_id = block_id; // LOG(DEBUG, "List block %ld", block_id); if (--num <= 0) { break; } } delete it; - return true; + return largest_id; } Block* BlockManager::CreateBlock(int64_t block_id, int64_t* sync_time, StatusCode* status) { diff --git a/src/chunkserver/block_manager.h b/src/chunkserver/block_manager.h index 47bc98cb..b7e5c18a 100644 --- a/src/chunkserver/block_manager.h +++ b/src/chunkserver/block_manager.h @@ -37,7 +37,7 @@ class BlockManager { bool LoadStorage(); int64_t NameSpaceVersion() const; bool SetNameSpaceVersion(int64_t version); - bool ListBlocks(std::vector* blocks, int64_t offset, int32_t num); + int64_t ListBlocks(std::vector* blocks, int64_t offset, int32_t num); Block* CreateBlock(int64_t block_id, int64_t* sync_time, StatusCode* status); Block* FindBlock(int64_t block_id); std::string BlockId2Str(int64_t block_id); diff --git a/src/chunkserver/chunkserver_impl.cc b/src/chunkserver/chunkserver_impl.cc index 92a911ac..919c8eca 100644 --- a/src/chunkserver/chunkserver_impl.cc +++ b/src/chunkserver/chunkserver_impl.cc @@ -81,6 +81,8 @@ ChunkServerImpl::ChunkServerImpl() blockreport_task_id_(-1), last_report_blockid_(-1), report_id_(0), + is_first_round_(true), + first_round_report_start_(-1), service_stop_(false) { data_server_addr_ = common::util::GetLocalHostName() + ":" + FLAGS_chunkserver_port; params_.set_report_interval(FLAGS_blockreport_interval); @@ -184,6 +186,8 @@ void ChunkServerImpl::Register() { assert (response.chunkserver_id() != -1); chunkserver_id_ = response.chunkserver_id(); report_id_ = response.report_id() + 1; + first_round_report_start_ = last_report_blockid_; + is_first_round_ = true; LOG(INFO, "Connect to nameserver version= %ld, cs_id = C%d report_interval = %d " "report_size = %d report_id = %ld", block_manager_->NameSpaceVersion(), chunkserver_id_, @@ -256,7 +260,19 @@ void ChunkServerImpl::SendBlockReport() { int64_t last_report_id = report_id_; std::vector blocks; - block_manager_->ListBlocks(&blocks, last_report_blockid_ + 1, params_.report_size()); + int32_t num = is_first_round_ ? 10000 : params_.report_size(); + int32_t end = block_manager_->ListBlocks(&blocks, last_report_blockid_ + 1, num); + // last id + 1 <= first found report start <= end -> first round ends + if (is_first_round_ && + (last_report_blockid_ + 1) <= first_round_report_start_ && + first_round_report_start_ <= end) { + is_first_round_ = false; + LOG(INFO, "First round report done"); + } + // bugfix, need an elegant implementation T_T + if (is_first_round_ && first_round_report_start_ == -1) { + first_round_report_start_ = 0; + } int64_t blocks_num = blocks.size(); for (int64_t i = 0; i < blocks_num; i++) { @@ -266,28 +282,18 @@ void ChunkServerImpl::SendBlockReport() { info->set_version(blocks[i].version()); } - if (blocks_num < params_.report_size()) { + if (blocks_num < num) { last_report_blockid_ = -1; } else { - if (blocks_num) { - last_report_blockid_ = blocks[blocks_num - 1].block_id(); - } - } - if (blocks_num == 0) { - request.set_end(0); - } else { - request.set_end(blocks[blocks_num - 1].block_id()); + last_report_blockid_ = end; } + request.set_end(end); BlockReportResponse response; - int64_t before_report = common::timer::get_micros(); + common::timer::TimeChecker checker; bool ret = nameserver_->SendRequest(&NameServer_Stub::BlockReport, &request, &response, FLAGS_block_report_timeout); - int64_t after_report = common::timer::get_micros(); - if (after_report - before_report > 20 * 1000 * 1000) { - LOG(WARNING, "Block report use %ld ms last_id %lu (%lu)", - (after_report - before_report) / 1000, last_report_id, request.sequence_id()); - } + checker.Check(20 * 1000 * 1000, "[SendBlockReport] SendRequest"); if (!ret) { LOG(WARNING, "Block report fail last_id %lu (%lu)\n", last_report_id, request.sequence_id()); } else { diff --git a/src/chunkserver/chunkserver_impl.h b/src/chunkserver/chunkserver_impl.h index ca8a9a98..6c3fc771 100644 --- a/src/chunkserver/chunkserver_impl.h +++ b/src/chunkserver/chunkserver_impl.h @@ -94,6 +94,8 @@ class ChunkServerImpl : public ChunkServer { volatile int64_t blockreport_task_id_; int64_t last_report_blockid_; int64_t report_id_; + bool is_first_round_; + int64_t first_round_report_start_; volatile bool service_stop_; Params params_; diff --git a/src/chunkserver/data_block.cc b/src/chunkserver/data_block.cc index 2ac2cbc2..fa617653 100644 --- a/src/chunkserver/data_block.cc +++ b/src/chunkserver/data_block.cc @@ -240,15 +240,16 @@ int64_t Block::Read(char* buf, int64_t len, int64_t offset) { /// Write operation. bool Block::Write(int32_t seq, int64_t offset, const char* data, int64_t len, int64_t* add_use) { - if (finished_) { - LOG(INFO, "Write a finish block #%ld V%ld %ld, seq: %d, offset: %ld", - meta_.block_id(), meta_.version(), meta_.block_size(), seq, offset); + MutexLock lock(&mu_, "BlockWrite", 1000); + if (finished_ || deleted_) { + LOG(INFO, "Write a finish block #%ld V%ld %ld, seq: %d, offset: %ld, finished: %d, deleted: %d", + meta_.block_id(), meta_.version(), meta_.block_size(), seq, offset, finished_, deleted_); return false; } if (offset < meta_.block_size()) { - assert (offset + len <= meta_.block_size()); LOG(INFO, "Write #%ld size %ld, seq: %d, wrong offset: %ld", meta_.block_id(), meta_.block_size(), seq, offset); + assert (offset + len <= meta_.block_size()); return true; } char* buf = NULL; @@ -394,12 +395,7 @@ bool Block::IsRecover() { } /// Append to block buffer StatusCode Block::Append(int32_t seq, const char* buf, int64_t len) { - MutexLock lock(&mu_, "BlockAppend", 1000); - if (finished_ || deleted_) { - LOG(INFO, "[Append] block #%ld closed, do not append to blockbuf_. finished_=%d, deleted_=%d", - meta_.block_id(), finished_, deleted_); - return kBlockClosed; - } + mu_.AssertHeld(); if (blockbuf_ == NULL) { buflen_ = FLAGS_write_buf_size; blockbuf_ = new char[buflen_]; diff --git a/src/client/bfs_client.cc b/src/client/bfs_client.cc index b4a9cc28..3b986a38 100644 --- a/src/client/bfs_client.cc +++ b/src/client/bfs_client.cc @@ -37,11 +37,8 @@ void print_usage() { printf("\t put : copy file from local to bfs\n"); printf("\t rmdir : remove empty directory\n"); printf("\t rmr : remove directory recursively\n"); - printf("\t change_replica_num : change replica num of to \n"); printf("\t du : count disk usage for path\n"); printf("\t stat : list current stat of the file system\n"); - printf("\t shutdownchunkserver : shutdownt chunkservers in the list file\n"); - printf("\t shutdownstat : display stat of shutdown chunkserver progress\n"); } int BfsMkdir(baidu::bfs::FS* fs, int argc, char* argv[]) { @@ -269,7 +266,8 @@ int BfsDu(baidu::bfs::FS* fs, int argc, char* argv[]) { std::string path = argv[0]; assert(path.size() > 0); if (path[path.size() - 1] != '*') { - return BfsDuV2(fs, path); + int64_t du_size = BfsDuV2(fs, path); + return du_size >= 0 ? 0 : -1; } // Wildcard diff --git a/src/flags.cc b/src/flags.cc index 6f87831e..86f8cf98 100644 --- a/src/flags.cc +++ b/src/flags.cc @@ -23,7 +23,7 @@ DEFINE_int32(keepalive_timeout, 10, "Chunkserver keepalive timeout"); DEFINE_int32(default_replica_num, 3, "Default replica num of data block"); DEFINE_int32(nameserver_log_level, 4, "Nameserver log level"); DEFINE_string(nameserver_warninglog, "./wflog", "Warning log file"); -DEFINE_int32(nameserver_start_recover_timeout, 120, "Nameserver starts recover in second"); +DEFINE_int32(nameserver_start_recover_timeout, 3600, "Nameserver starts recover in second"); DEFINE_int32(recover_speed, 100, "Max num of block to recover for one chunkserver"); DEFINE_int32(recover_dest_limit, 5, "Number of recover dest"); DEFINE_int32(hi_recover_timeout, 180, "Recover timeout for high priority blocks"); diff --git a/src/nameserver/nameserver_impl.cc b/src/nameserver/nameserver_impl.cc index 8d5730fa..33e641d4 100644 --- a/src/nameserver/nameserver_impl.cc +++ b/src/nameserver/nameserver_impl.cc @@ -38,6 +38,7 @@ DECLARE_int32(blockmapping_bucket_num); DECLARE_int32(hi_recover_timeout); DECLARE_int32(lo_recover_timeout); DECLARE_int32(block_report_timeout); +DECLARE_bool(clean_redundancy); namespace baidu { namespace bfs { @@ -110,7 +111,7 @@ void NameServerImpl::CheckRecoverMode() { work_thread_pool_->DelayTask(1000, boost::bind(&NameServerImpl::CheckRecoverMode, this)); } void NameServerImpl::LeaveReadOnly() { - LOG(INFO, "Nameserver leave safemode"); + LOG(INFO, "Nameserver leave read only"); if (readonly_) { readonly_ = false; } @@ -1022,27 +1023,30 @@ bool NameServerImpl::WebService(const sofa::pbrpc::HTTPRequest& request, ListRecover(&response); return true; } else if (path == "/dfs/hi_only") { + recover_timeout_ = 0; LOG(INFO, "ChangeRecoverMode hi_only"); recover_mode_ = kHiOnly; response.content->Append(""); return true; } else if (path == "/dfs/recover_all") { + recover_timeout_ = 0; LOG(INFO, "ChangeRecoverMode recover_all"); recover_mode_ = kRecoverAll; response.content->Append(""); return true; } else if (path == "/dfs/stop_recover") { + recover_timeout_ = 0; LOG(INFO, "ChangeRecoverMode stop_recover"); recover_mode_ = kStopRecover; response.content->Append(""); return true; - } else if (path == "/dfs/entry_read_only") { - LOG(INFO, "ChangeStatus entry_read_only"); + } else if (path == "/dfs/leave_read_only") { + LOG(INFO, "ChangeStatus leave_read_only"); LeaveReadOnly(); response.content->Append(""); return true; - } else if (path == "/dfs/leave_read_only") { - LOG(INFO, "ChangeStatus leave_read_only"); + } else if (path == "/dfs/entry_read_only") { + LOG(INFO, "ChangeStatus entry_read_only"); readonly_ = true; response.content->Append(""); return true; @@ -1069,7 +1073,10 @@ bool NameServerImpl::WebService(const sofa::pbrpc::HTTPRequest& request, std::map::const_iterator it = request.query_params->begin(); Params p; if (it != request.query_params->end()) { - int32_t v = boost::lexical_cast(it->second); + int32_t v = 0; + if (it->first != "clean_redundancy") { + v = boost::lexical_cast(it->second); + } if (it->first == "report_interval") { if (v < 1 || v > 3600) { response.content->Append("

Bad Parameter : 1 <= report_interval <= 3600

"); @@ -1100,6 +1107,12 @@ bool NameServerImpl::WebService(const sofa::pbrpc::HTTPRequest& request, return true; } FLAGS_block_report_timeout = v; + } else if (it->first == "clean_redundancy") { + if (it->second != "true" && it->second != "false") { + response.content->Append("

Bad Parameter : clean_redundancy == true || false"); + return true; + } + FLAGS_clean_redundancy = it->second == "true" ? true : false; } else { response.content->Append("

Bad Parameter :"); response.content->Append(it->first); @@ -1236,7 +1249,7 @@ bool NameServerImpl::WebService(const sofa::pbrpc::HTTPRequest& request, str += "
"; str += "Total: " + common::HumanReadableString(total_quota) + "B
"; str += "Used: " + common::HumanReadableString(total_data) + "B
"; - str += "Pending tasks: " + str += "Pending: (r/w/rp/h)
" + common::NumToString(read_thread_pool_->PendingNum()) + " " + common::NumToString(work_thread_pool_->PendingNum()) + " " + common::NumToString(report_thread_pool_->PendingNum()) + " " @@ -1249,13 +1262,13 @@ bool NameServerImpl::WebService(const sofa::pbrpc::HTTPRequest& request, str += "
"; str += "Status: "; if (readonly_) { - str += "Read Only
LeaveSafeMode"; + str += "Read Only
LeaveReadOnly"; } else { - str += "Normal
EnterSafeMode"; + str += "Normal
EnterReadOnly"; } str += "
"; if (recover_timeout_ > 1) { - str += "RecoverCountdown: " + common::NumToString(recover_timeout_) + "
"; + str += "Recover: " + common::NumToString(recover_timeout_) + " Stop
"; } str += "RecoverMode: "; if (recover_mode_ == kRecoverAll) { diff --git a/src/nameserver/namespace.cc b/src/nameserver/namespace.cc index 2007df9b..d94dcb9e 100644 --- a/src/nameserver/namespace.cc +++ b/src/nameserver/namespace.cc @@ -251,6 +251,10 @@ StatusCode NameSpace::CreateFile(const std::string& path, int flags, int mode, i LOG(INFO, "CreateFile %s fail: already exist!", fname.c_str()); return kFileExists; } else { + if (IsDir(file_info.type())) { + LOG(INFO, "CreateFile %s fail: directory with same name exist", fname.c_str()); + return kFileExists; + } for (int i = 0; i < file_info.blocks_size(); i++) { blocks_to_remove->push_back(file_info.blocks(i)); } diff --git a/src/sdk/fs_impl.cc b/src/sdk/fs_impl.cc index f2a18d58..05f689a6 100644 --- a/src/sdk/fs_impl.cc +++ b/src/sdk/fs_impl.cc @@ -399,6 +399,7 @@ int32_t FSImpl::Rename(const char* oldpath, const char* newpath) { return OK; } int32_t FSImpl::ChangeReplicaNum(const char* file_name, int32_t replica_num) { + /* ChangeReplicaNumRequest request; ChangeReplicaNumResponse response; request.set_file_name(file_name); @@ -416,7 +417,8 @@ int32_t FSImpl::ChangeReplicaNum(const char* file_name, int32_t replica_num) { file_name, replica_num, StatusCode_Name(response.status()).c_str()); return GetErrorCode(response.status()); } - return OK; + */ + return PERMISSION_DENIED; } int32_t FSImpl::SysStat(const std::string& stat_name, std::string* result) { SysStatRequest request; diff --git a/src/test/mark.cc b/src/test/mark.cc index 7f3dd972..88301dbd 100644 --- a/src/test/mark.cc +++ b/src/test/mark.cc @@ -20,7 +20,7 @@ DEFINE_string(mode, "put", "[put | read]"); DEFINE_int64(count, 0, "put/read/delete file count"); DEFINE_int32(thread, 5, "thread num"); DEFINE_int32(seed, 301, "random seed"); -DEFINE_int32(file_size, 1024, "file size in KB"); +DEFINE_int64(file_size, 1024, "file size in KB"); DEFINE_string(folder, "test", "write data to which folder"); DEFINE_bool(break_on_failure, true, "exit when error occurs"); @@ -87,7 +87,7 @@ void Mark::Put(const std::string& filename, const std::string& base, int thread_ int64_t len = 0; int64_t base_size = (1 << 20) / 2; while (len < file_size_) { - uint32_t w = base_size + rand_[thread_id]->Uniform(base_size); + uint64_t w = base_size + rand_[thread_id]->Uniform(base_size); uint32_t write_len = file->Write(base.c_str(), w); if (write_len != w) { @@ -145,7 +145,7 @@ void Mark::Read(const std::string& filename, const std::string& base, int thread int64_t bytes = 0; int32_t len = 0; while (1) { - uint32_t r = base_size + rand_[thread_id]->Uniform(base_size); + uint64_t r = base_size + rand_[thread_id]->Uniform(base_size); len = file->Read(buf, r); if (len < 0) { if (FLAGS_break_on_failure) { diff --git a/src/version.h b/src/version.h index 1bef16af..c748402c 100644 --- a/src/version.h +++ b/src/version.h @@ -8,6 +8,7 @@ static const int kMajorVersion = 1; static const int kMinorVersion = 0; static const int kRevision = 0; +static const char kCompiler[] = __VERSION__; extern void PrintSystemVersion();