MPI-ML

MPI-ML is a parallel implementation of a classification task.
It runs several different classifiers on a given dataset (loaded from a CSV file).
The classifiers are trained in parallel on the same dataset.

The testing dataset is divided equally and distributed among the processes.
Each process runs the classification task on the part of the testing data it receives.
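
The split-and-distribute step described above could look roughly like the sketch below. This is only an illustration, not the code in main.py: the helper names `split_evenly` and `scatter_demo` are hypothetical, and a list of integers stands in for the rows of the testing dataset. It relies on mpi4py's `comm.scatter`, which expects the root process to supply a list with exactly one item per process.

```python
def split_evenly(rows, n_parts):
    """Split rows into n_parts contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(rows), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks each take one additional row.
        size = base + (1 if i < extra else 0)
        chunks.append(rows[start:start + size])
        start += size
    return chunks


def scatter_demo():
    """Hypothetical demo: root splits placeholder data and scatters one chunk per process."""
    # Imported here so split_evenly can be used without MPI installed.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    # Only the root process prepares the chunks; placeholder data stands in for the test set.
    chunks = split_evenly(list(range(10)), size) if rank == 0 else None

    # scatter delivers exactly one chunk to each process.
    local_rows = comm.scatter(chunks, root=0)
    print(f"rank {rank} received {len(local_rows)} rows")
```

To try it, add a call to `scatter_demo()` at the end of a script and launch that script with mpirun. When the number of rows is not divisible by the number of processes, the leading chunks are one row larger than the rest.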

The project was created to compare the performance and accuracy of different classifiers using the Message Passing Interface (MPI) in Python.

Dependencies

  • MPI
  • numpy
  • pandas
  • sklearn
  • mpi4py

Build & Run

Install MPI (e.g. on Ubuntu)

sudo apt install libmpich-dev

Install Python dependencies

sudo pip3 install scikit-learn pandas numpy mpi4py

Run

mpirun -n 4 python3 main.py

Note:
main.py must be launched with mpirun for the execution to be parallel. Otherwise only one process is created and, as a result, only one classifier is run.

The number of processes used for computation (4 in the example) depends on the number of classifiers you want to run in parallel.
The current version contains four classifiers (KNeighborsClassifier, DecisionTreeClassifier, MLPClassifier and SVC), so 4 processes were used for computation.
If you want to run more classifiers in parallel, you may want to use more processes, depending on your hardware.
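
One way to tie the number of processes to the number of classifiers is to let each MPI rank pick its classifier by index. The sketch below assumes that mapping; it is not taken from main.py, and the names `CLASSIFIER_NAMES`, `classifier_name_for_rank` and `build_classifier` are hypothetical. The classifiers are created with scikit-learn's default parameters.

```python
# The four classifiers named in this README, in an assumed rank order.
CLASSIFIER_NAMES = [
    "KNeighborsClassifier",
    "DecisionTreeClassifier",
    "MLPClassifier",
    "SVC",
]


def classifier_name_for_rank(rank, names=CLASSIFIER_NAMES):
    """Return the classifier assigned to this rank, or None when there are more ranks than classifiers."""
    return names[rank] if rank < len(names) else None


def build_classifier(name):
    """Create a scikit-learn estimator with default parameters from its class name."""
    # Imported lazily so the rank mapping above works without scikit-learn installed.
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    factories = {
        "KNeighborsClassifier": KNeighborsClassifier,
        "DecisionTreeClassifier": DecisionTreeClassifier,
        "MLPClassifier": MLPClassifier,
        "SVC": SVC,
    }
    return factories[name]()
```

Under this assumption, adding a fifth entry to the list and launching with `mpirun -n 5` would give every classifier its own process, while a run with fewer processes than entries would leave the trailing classifiers unused.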