GitHub - thestarivore/openmp-mpi_k-means-clustering: K-Means Clustering algorithm in OpenMP/MPI

K-Means Clustering algorithm in OpenMP/MPI

Requirements:

First install the required libraries:

sudo apt install openmpi-bin
sudo apt install libopenmpi-dev

CodeBlocks can be used to compile the program, provided the following changes are made:

Create a new copy of the GNU GCC Compiler in "Settings" --> "Compiler" and change the following in "Toolchain Executables":
- C compiler: mpicc
- C++ compiler: mpicxx
- Linker for dynamic libs: mpicxx
Add -std=c++03 (to disable C++11) in "Compiler Settings" --> "Other Compiler Settings";
Add the -fopenmp flag (for OpenMP support) in "Project build options" (all three tabs);

To compile manually, use the following command:

mpicxx main.cpp  -o main -fopenmp

Running Modes:

The K-Means Clustering algorithm can be run in four Modes:

Normal Mode;
OpenMP Mode;
MPI Mode;
MPI + OpenMP Mode;

The first two can be run by launching the program normally. The third and fourth Modes, however, require the following command (run from the repository's root):

mpiexec -n 4 bin/Debug/openmp_cpp_project    # If compiling with CodeBlocks
mpiexec -n 4 ./main                          # If compiling from the terminal

The program first asks for the number of clusters to use, then runs one Mode at a time, keeping the same dataset and the same initial clusters for every Mode.

Once it has finished, the execution time of each Mode is displayed so that they can be compared fairly.
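As a rough illustration of what separates the Normal and OpenMP Modes, here is a minimal sketch of the K-Means assignment step, where each point is labeled with its nearest centroid. The names `Point`, `squaredDistance` and `assignClusters` are hypothetical and are not taken from main.cpp; the actual program may organize this differently. The code sticks to C++03 so it also builds with the -std=c++03 setting mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2-D point type; main.cpp may represent points differently.
struct Point {
    double x;
    double y;
};

// Squared Euclidean distance between two points.
static double squaredDistance(const Point& a, const Point& b) {
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Assignment step: labels[i] receives the index of the centroid nearest to points[i].
// When built with -fopenmp and useOpenMP is true, the outer loop is split across threads.
// Without -fopenmp the pragma is ignored and the loop runs serially.
void assignClusters(const std::vector<Point>& points,
                    const std::vector<Point>& centroids,
                    std::vector<int>& labels,
                    bool useOpenMP) {
    assert(!centroids.empty());
    (void)useOpenMP;  // only read by the pragma below
    labels.resize(points.size());
    const long n = static_cast<long>(points.size());
    #pragma omp parallel for if(useOpenMP)
    for (long i = 0; i < n; ++i) {
        int best = 0;
        double bestDist = squaredDistance(points[i], centroids[0]);
        for (std::size_t c = 1; c < centroids.size(); ++c) {
            const double d = squaredDistance(points[i], centroids[c]);
            if (d < bestDist) {
                bestDist = d;
                best = static_cast<int>(c);
            }
        }
        labels[i] = best;
    }
}
```

Each iteration writes only to its own `labels[i]`, so the loop needs no locking. Calling the function with the same points and centroids gives the same labels with `useOpenMP` set to either value, which is what makes a timing comparison between the two Modes meaningful.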

To pass the number of centroids directly, append it to the command as an argument:

mpiexec -n 4 bin/Debug/openmp_cpp_project N    # If compiling with CodeBlocks
mpiexec -n 4 ./main N                          # If compiling from the terminal
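One way such an optional argument could be handled is sketched below. This is an assumption about the approach, not the code in main.cpp: `parseClusterCount` is a hypothetical helper that returns the value of N when it is a valid positive integer, and -1 otherwise so the caller can fall back to asking interactively.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: returns the cluster count given as the first
// command-line argument, or -1 if it is missing or not a positive integer.
int parseClusterCount(int argc, char* argv[]) {
    if (argc < 2) {
        return -1;  // no argument: the caller would prompt the user instead
    }
    char* end = 0;
    errno = 0;
    const long value = std::strtol(argv[1], &end, 10);
    // Reject overflow, empty input and trailing non-digit characters.
    if (errno != 0 || end == argv[1] || *end != '\0') {
        return -1;
    }
    if (value <= 0 || value > INT_MAX) {
        return -1;
    }
    return static_cast<int>(value);
}
```

Using `strtol` instead of `atoi` makes it possible to tell a malformed argument such as `abc` apart from a legitimate number.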

Plotting

The program contains three defines that can be uncommented to enable three types of plotting:

#define PRELOOP_PRINT_AND_PLOT     
#define LOOP_PRINT_AND_PLOT  
#define POSTLOOP_PRINT_AND_PLOT

PRELOOP_PRINT_AND_PLOT enables the plotting of the initial dataset and the initial centroids chosen;
LOOP_PRINT_AND_PLOT enables the plotting of the dataset, clusters and centroids on every iteration of the execution (the number of iterations can vary based on the initial conditions);
POSTLOOP_PRINT_AND_PLOT enables only the plotting of the final dataset, clusters and centroids;
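Defines like these are typically consumed through conditional compilation. The sketch below shows the general mechanism only: the three macro names come from the program, while `enabledPlotStages` and the stage labels are hypothetical and exist just to make the effect visible.

```cpp
#include <string>
#include <vector>

// Remove the leading "//" from a line to enable that plotting stage.
// #define PRELOOP_PRINT_AND_PLOT
// #define LOOP_PRINT_AND_PLOT
// #define POSTLOOP_PRINT_AND_PLOT

// Hypothetical helper: lists the plotting stages compiled into the binary.
std::vector<std::string> enabledPlotStages() {
    std::vector<std::string> stages;
#ifdef PRELOOP_PRINT_AND_PLOT
    stages.push_back("pre-loop");
#endif
#ifdef LOOP_PRINT_AND_PLOT
    stages.push_back("loop");
#endif
#ifdef POSTLOOP_PRINT_AND_PLOT
    stages.push_back("post-loop");
#endif
    return stages;
}
```

With all three defines commented out, as written here, the function returns an empty list and no plotting code is compiled in. The program has to be recompiled after changing a define.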

Amazon AWS - StarCluster

To get meaningful data, the algorithm needs to be executed on a real cluster. We'll use Amazon AWS's EC2 service and StarCluster to achieve that.

Follow the instructions at the following link for a brief guide on how to configure StarCluster on Amazon AWS: http://mpitutorial.com/tutorials/launching-an-amazon-ec2-mpi-cluster/

For more detail, see the StarCluster documentation: https://media.readthedocs.org/pdf/starcluster/latest/starcluster.pdf

The installation process requires filling in the ".starcluster/config" file. A copy of the configuration file, called starcluster_config, can be found in the root of the project.

Once the configuration is complete, follow these steps to execute the algorithm on the cluster and retrieve the results:

Use StarCluster to create a cluster of 4 c3.xlarge nodes (4 vCPUs and 7.5 GiB of RAM each) on Amazon EC2 (StarCluster must be configured first):

starcluster terminate -f smallcluster    # Terminate the last session, if any
starcluster start smallcluster           # Create the cluster on AWS nodes

Use the following command to connect via SSH to the master node of the cluster:
```
starcluster sshmaster smallcluster 
```
Switch to the sgeadmin user to control the cluster:
```
su - sgeadmin 
```
Create a file named hostfile in sgeadmin's home directory (/home/sgeadmin) with the following content:
```
master
node001
node002
node003
```
Clone the git repository into the same home directory;

Compile and execute:

mpicxx main.cpp  -o main -fopenmp
mpiexec -np 4 -hostfile /home/sgeadmin/hostfile ./main

Run the script to perform multiple executions:
```
sh multiple_executions.sh 
```
Save the results;

Testing: Cumulative Results

The following script runs the program multiple times, producing results that can later be used for statistics or any other kind of data analysis (such as the one done in the Jupyter Notebook):

sh multiple_executions.sh

Plots need to be disabled by commenting out the defines described above, otherwise the script will be interrupted by the plots.

RESULTS: Jupyter Notebook

The data analysis is documented in a Jupyter Notebook, openmp_mpi_k-means.ipynb (open with Jupyter Notebook). It is also available as a Markdown file, openmp_mpi_k-means.md (open with Typora or a similar Markdown reader), and as a PDF, openmp_mpi_k-means.pdf.

To export the notebook to PDF, you might need to install the following:

pip install nbconvert
sudo apt-get install texlive-generic-recommended
sudo apt-get install pandoc

# Then use the following command for the conversion
jupyter nbconvert --to pdf MyNotebook.ipynb

