Dense or Sparse: Optimal SPMM-as-a-Service for Big-Data Processing
sudo apt-get update -y
sudo apt-get remove docker docker-engine docker.io
sudo apt-get install docker.io -y
sudo service docker start
sudo chmod 666 /var/run/docker.sock
sudo usermod -a -G docker ubuntu
sudo docker pull tensorflow/tensorflow:2.5.0
sudo docker run -it tensorflow/tensorflow:2.5.0 bash
apt-get update -y
apt-get install git -y
DEBIAN_FRONTEND=noninteractive apt-get install r-base -y
cd /home
git clone https://github.com/kmu-bigdata/dos.git
cd dos
pip install -r requirements.txt
cd data-generation
./generate-and-optimize-lhs-data.sh
# Run SPMM based on /data/optimal-lhs-data.csv to collect latency measurements
./generate-trainset-testset.sh
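- For reference, the Latin Hypercube Sampling that generate-and-optimize-lhs-data.sh is named after can be sketched in Python as follows; the parameter bounds and column ordering here are illustrative assumptions, not the script's actual values:

# Illustrative sketch of Latin Hypercube Sampling over the SPMM parameter
# space (matrix dimensions and densities). Bounds are assumptions; the real
# generation logic lives in data-generation/.
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=5, seed=24)
unit = sampler.random(n=1000)                  # 1000 points in [0, 1)^5
l_bounds = [1e3, 1e3, 1e3, 1e-4, 1e-4]         # nr_l, nc_l, nc_r, d_l, d_r
u_bounds = [1e5, 1e5, 1e5, 1e-1, 1e-1]
samples = qmc.scale(unit, l_bounds, u_bounds)

# Nonzero counts follow from dimensions and densities.
nnz_l = samples[:, 0] * samples[:, 1] * samples[:, 3]
nnz_r = samples[:, 1] * samples[:, 2] * samples[:, 4]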
cd ../dos
python3 train.py
python3 test.py
python3 inference.py --nr_l 10000 --nc_l 60000 --nc_r 20000 --d_l 0.0001 --d_r 0.03 --nnz_l 60000 --nnz_r 36000000
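- Note that the nnz arguments are consistent with the dimension and density arguments; a quick check:

# Sanity check: nnz = rows * cols * density for each input matrix.
nr_l, nc_l, nc_r = 10_000, 60_000, 20_000
d_l, d_r = 0.0001, 0.03
assert round(nr_l * nc_l * d_l) == 60_000          # --nnz_l
assert round(nc_l * nc_r * d_r) == 36_000_000      # --nnz_r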
sudo apt-get update -y
sudo apt-get install git -y
sudo apt-get install awscli -y
git clone https://github.com/kmu-bigdata/dos.git
sudo apt-get remove docker docker-engine docker.io
sudo apt-get install docker.io -y
sudo service docker start
sudo chmod 666 /var/run/docker.sock
sudo usermod -a -G docker ubuntu
cd dos/microservice
docker build -t "image-name" .
- Create an Amazon ECR repository to store the container image, as sketched below.
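- A minimal boto3 sketch of this step; "ecr-name" and "region-name" are placeholders matching the commands below, and AWS credentials are assumed to be configured (e.g. via aws configure in the next step):

# Hypothetical sketch: create the ECR repository that will hold the image.
import boto3

ecr = boto3.client("ecr", region_name="region-name")
ecr.create_repository(repositoryName="ecr-name")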
aws configure
export ACCOUNT_ID=$(aws sts get-caller-identity --output text --query Account)
echo "export ACCOUNT_ID=${ACCOUNT_ID}" | tee -a ~/.bash_profile
docker tag "image-name" $ACCOUNT_ID.dkr.ecr."region-name".amazonaws.com/"ecr-name"
aws ecr get-login-password --region "region-name" | docker login --username AWS --password-stdin $ACCOUNT_ID.dkr.ecr."region-name".amazonaws.com
docker push $ACCOUNT_ID.dkr.ecr."region-name".amazonaws.com/"ecr-name"
- The container image in Amazon ECR is specified as the runtime environment of the AWS Lambda function.
- When configuring the Lambda function, set the memory to 512 MB and the timeout to 1 minute (a boto3 sketch of this step follows).
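- A minimal boto3 sketch of this configuration; the function name and role ARN are placeholders, not values from this repository:

# Hypothetical sketch: create the Lambda function from the ECR container
# image with 512 MB of memory and a 1-minute timeout.
import os
import boto3

account_id = os.environ["ACCOUNT_ID"]
image_uri = account_id + ".dkr.ecr.region-name.amazonaws.com/ecr-name:latest"

aws_lambda = boto3.client("lambda", region_name="region-name")
aws_lambda.create_function(
    FunctionName="dos-spmm-recommender",                 # placeholder name
    PackageType="Image",
    Code={"ImageUri": image_uri},
    Role="arn:aws:iam::123456789012:role/lambda-role",   # placeholder role ARN
    MemorySize=512,                                      # 512 MB
    Timeout=60,                                          # 1 minute
)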
7. Write a Lambda function that recommends an optimal multiplication method based on matrix multiplication information
- The Lambda function uses the Sparse X Sparse Latency prediction model and the Sparse X Dense Latency prediction model to predict the optimal SPMM method for the given matrix multiplication arguments.
- The Lambda function can be written based on dos/microservice/lambda_function.py.
- The AWS Lambda function receives the matrix multiplication arguments from Amazon API Gateway.
- It then returns the optimal SPMM method for those arguments to Amazon API Gateway, as sketched below.
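- A minimal sketch of the handler shape; the model file names, feature order, and response format are assumptions for illustration (the features mirror the inference.py arguments above), and the actual implementation is dos/microservice/lambda_function.py:

# Hypothetical handler sketch: load the two latency prediction models once
# per container, predict both latencies, and return the faster SPMM method.
import pickle

with open("sxs_model.pkl", "rb") as f:   # Sparse X Sparse model (assumed file name)
    sxs_model = pickle.load(f)
with open("sxd_model.pkl", "rb") as f:   # Sparse X Dense model (assumed file name)
    sxd_model = pickle.load(f)

FEATURES = ["nr_l", "nc_l", "nc_r", "d_l", "d_r", "nnz_l", "nnz_r"]

def lambda_handler(event, context):
    # Matrix multiplication arguments arrive from Amazon API Gateway.
    params = event["queryStringParameters"]
    x = [[float(params[k]) for k in FEATURES]]
    sxs_latency = sxs_model.predict(x)[0]
    sxd_latency = sxd_model.predict(x)[0]
    method = "sparse-sparse" if sxs_latency < sxd_latency else "sparse-dense"
    return {"statusCode": 200, "body": method}

- The returned method can then be used on the Spark side to choose between the SparseMatrix and BlockMatrix multiplications shown below.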
sudo yum update -y
sudo yum install git -y
cd /home/hadoop
git clone https://github.com/kmu-bigdata/dos.git
cd dos/spark-3.1.2 && ./build/mvn -pl :spark-mllib_2.12 -DskipTests clean install
sudo mv /home/hadoop/dos/spark-3.1.2/mllib/target/spark-mllib_2.12-3.1.2.jar /usr/lib/spark/jars/spark-mllib_2.12-3.1.2-amzn-0.jar
spark-shell
- SparseMatrix Multiplication
import org.apache.spark.mllib.linalg.SparseMatrix
import java.util.Random

// Generate two random sparse matrices with the given dimensions and
// densities, then multiply them locally.
val NumRow_L = 2
val NumCol_L = 3
val NumCol_R = 3
val D_L = 0.001
val D_R = 0.005
val l_sm = SparseMatrix.sprand(NumRow_L, NumCol_L, D_L, new Random(24))
val r_sm = SparseMatrix.sprand(NumCol_L, NumCol_R, D_R, new Random(24))
l_sm.multiply(r_sm)
- BlockMatrix Multiplication
import org.apache.spark.mllib.linalg.distributed.{CoordinateMatrix, MatrixEntry}

val NumRow_L = 2
val NumCol_L = 3
val NumCol_R = 3
val BlockRow_L = 1
val BlockCol_L = 1
val BlockCol_R = 1

// Build the left and right matrices from coordinate entries, convert them to
// BlockMatrix form, and multiply; validate checks the result's invariants.
val l_entries = sc.parallelize(Seq((0, 0, 1.0), (1, 1, 2.0), (0, 2, 3.0), (1, 2, 4.0))).map { case (i, j, v) => MatrixEntry(i, j, v) }
val l_block_matrix = new CoordinateMatrix(l_entries, NumRow_L, NumCol_L).toBlockMatrix(BlockRow_L, BlockCol_L).cache
val r_entries = sc.parallelize(Seq((1, 0, 5.0), (2, 0, 6.0), (0, 1, 7.0), (2, 1, 8.0), (1, 2, 9.0))).map { case (i, j, v) => MatrixEntry(i, j, v) }
val r_block_matrix = new CoordinateMatrix(r_entries, NumCol_L, NumCol_R).toBlockMatrix(BlockCol_L, BlockCol_R).cache
l_block_matrix.multiply(r_block_matrix).validate