This repository contains the group project for the Distributed Database Systems (DDBS) course at Tsinghua University. The project implements a distributed database system designed to manage both structured and unstructured data. Key features include:
- Data Partitioning: Efficiently distribute data across nodes.
- Replication: Ensure data availability and fault tolerance.
- Query Execution: Support for complex queries over distributed datasets.
- System Monitoring: Tools for tracking system performance and health.
- Integration: Combines relational database management systems (RDBMSs) with Hadoop HDFS for hybrid data management.
We use Docker to start the MySQL service for this project. The MySQL version used is 5.7.18. Follow these steps to set up MySQL in Docker:
- Pull MySQL Docker Image:

  ```bash
  docker pull mysql:5.7.18
  ```

- Create a Docker Container for MySQL:

  ```bash
  docker run --name mysql_server -e MYSQL_ROOT_PASSWORD=rootpassword -d mysql:5.7.18
  ```
- Start MySQL on Multiple Ports (3310, 3311, 3312):

  - Enter the Docker container:

    ```bash
    docker exec -it mysql_server bash
    ```

  - Install MySQL server (if not installed):

    ```bash
    apt update
    apt install mysql-server
    ```

  - Edit the MySQL configuration file:

    ```bash
    vi /etc/mysql/mysql.conf.d/mysqld.cnf
    ```

    Add or modify the port setting. For example:

    ```ini
    [mysqld]
    port = 3310  # For instance 1
    ```

    Repeat this step for ports `3311` and `3312`.

  - Restart MySQL:

    ```bash
    mkdir -p /var/run/mysqld
    chown mysql /var/run/mysqld/
    service mysql restart
    ```
- Authorize Remote Access:

  - Log in to MySQL:

    ```bash
    mysql -u root -p
    ```

  - Update user permissions:

    ```sql
    ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY '123456';
    FLUSH PRIVILEGES;
    exit;
    ```

- Connect to MySQL on Custom Ports:

  To connect to MySQL running on port `3310`, `3311`, or `3312`:

  ```bash
  mysql -h 127.0.0.1 -P 3310 -u root -p
  ```

  An optional check that loops over all three ports is sketched right after this list.
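As an optional sanity check, the sketch below loops over the three ports and asks each instance for its port number. It assumes the root password `123456` set in the step above; adjust it if you chose a different password.

```bash
# Optional check: confirm each MySQL instance answers on its custom port.
# Assumes the root password '123456' configured above.
for port in 3310 3311 3312; do
  mysql -h 127.0.0.1 -P "$port" -u root -p123456 -e "SELECT @@port;"
done
```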
We use JDK 8 and Hadoop 3.4.1. Follow these steps to set up Hadoop:
- Install OpenJDK 8:

  ```bash
  sudo apt update
  sudo apt install openjdk-8-jdk
  java -version
  ```

- Install Hadoop 3.4.1:

  ```bash
  wget https://downloads.apache.org/hadoop/common/hadoop-3.4.1/hadoop-3.4.1.tar.gz
  tar -xzvf hadoop-3.4.1.tar.gz
  mv hadoop-3.4.1 /home/user/
  ```
- Configure Environment:

  Edit `~/.bashrc` to add the Hadoop and Java environment variables:

  ```bash
  # Hadoop Environment Variables
  export PATH=$PATH:/home/user/hadoop-3.4.1/bin:/home/user/hadoop-3.4.1/sbin

  # Java Environment Variables
  export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
  export JRE_HOME=${JAVA_HOME}/jre
  export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
  export PATH=${JAVA_HOME}/bin:$PATH
  ```

  Apply the changes:

  ```bash
  source ~/.bashrc
  ```
- Configure Hadoop:

  Edit Hadoop's `core-site.xml` and `hdfs-site.xml` files based on your requirements. Refer to the official Hadoop documentation or setup guides; a minimal single-node example is sketched after this list.

- Start HDFS:

  ```bash
  start-dfs.sh
  ```

  Verify the services:

  ```bash
  jps
  ```

- Basic HDFS Commands:

  ```bash
  hdfs dfs -mkdir input
  hdfs dfs -put test.txt input/
  hdfs dfs -cat input/test.txt
  hdfs dfs -get input/test.txt ./test.txt
  ```
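The sketch below shows one minimal single-node (pseudo-distributed) configuration for reference; the `HADOOP_CONF` path, the `localhost:9000` NameNode address, and the replication factor of 1 are assumptions to adapt to your cluster.

```bash
# Hypothetical single-node HDFS configuration; paths and ports are examples.
HADOOP_CONF=/home/user/hadoop-3.4.1/etc/hadoop

cat > "$HADOOP_CONF/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

cat > "$HADOOP_CONF/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

# A fresh HDFS needs the NameNode formatted once before the first start-dfs.sh.
hdfs namenode -format
```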
Install the necessary dependencies:

```bash
sudo apt update
sudo apt install libmysqlcppconn-dev systemctl
sudo systemctl start mysql
```
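Before compiling, you can optionally confirm that the MySQL Connector/C++ development files are in place; the package and library names below assume a Debian/Ubuntu system.

```bash
# Optional check (Debian/Ubuntu assumption): the connector package is installed
# and its shared library is known to the dynamic linker.
dpkg -s libmysqlcppconn-dev | grep Status
ldconfig -p | grep mysqlcppconn
```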
To compile the project:

```bash
mkdir build
cd build
cmake ..
make -j
```

Run the compiled binaries:
```bash
cd build
./server
./client
```

The client supports the following commands:
```text
Enter a command:
QUERY <query statement>
BEREAD
POPULAR ["daily", "weekly", "monthly"]
MONITOR
REGISTER <HOST:port>,<user>,<password>,<schema>
DUMP <node_num>
EXIT
```
Commands:

- QUERY: Execute an SQL query.

  Example: `QUERY SELECT * FROM users;`

- BEREAD: Retrieve data related to user or article activity.

- POPULAR: Fetch the top 5 most popular articles for a specified period (`daily`, `weekly`, or `monthly`).

  Example: `POPULAR daily`

- MONITOR: Monitor the database nodes, including connection status and workloads.

- REGISTER: Register a new database node.

  Example: `REGISTER 127.0.0.1:3312,root,123456,standby1`

- DUMP: Dump a specific node. If a standby node is ready, it will be promoted to primary.

  Example: `DUMP 0`

- EXIT: Exit the client session.
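For orientation, a hypothetical client session could look like the following; the commands are the examples above, and the actual output depends on your data and registered nodes.

```text
Enter a command: REGISTER 127.0.0.1:3312,root,123456,standby1
Enter a command: QUERY SELECT * FROM users;
Enter a command: POPULAR daily
Enter a command: MONITOR
Enter a command: EXIT
```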
- Ensure all dependencies are correctly installed before running the project.
- Follow the configuration steps carefully to avoid setup issues.
- Refer to the official documentation for any additional configurations or troubleshooting.