Skip to content

Parallel implementation for the Blocked Floyd Warshall APSP algorithm. Designed and implemented in C++ for CUDA GPUs

Notifications You must be signed in to change notification settings

zucchi99/Blocked-AllPairsShortestPath

Repository files navigation

Blocked-AllPairsShortestPath

In this repository we propose our C++ implementation for the Blocked Floyd Warshall APSP algorithm. The code is parallel and is designed for CUDA GPUs. It has been tested only on linux machines, the portability on Windows or MacOS is not guaranteed.

Repository structure:

Folders:

  • .: contains readme.md, python launchers, our report, original paper, make file, python notebook for data analysis, html file which links to the overleaf of the report, .gitignore, changelog.md
  • bin: the binaries and the objects compiled (files *.o and *.out)
  • csv: contains all csv output (from chrono, nvprof or generate_and_print_n_b)
  • input_graphs: instances of csv graphs to import (actually the code works with any folder)
  • main: contains the *.cu, *.cpp containing the main function. Which are: the floyd warshall versions and the generate_and_print_n_b.cpp file
  • include: contains all the headers *.hpp and *.cuh
  • src: contains all the file *.cpp and *cu of the headers specified in the include folder
  • png: contains the images of the plots exported after the data analysis

Compilation with MakeFile

With the Makefile command is possible to compile:

  • make fwb_host

    • Binary path file name: bin/fwb_host.out
    • Floyd Warshall sequential on CPU, both classic and blocked versions.
  • make fwb_dev VERSION=<version>

    • Binary path file name: bin/fwb_dev_v_<version>.out
    • Floyd Warshall parallel on GPU. Since we developed many difference versions with difference performance, is mandatory to specify which version you want to compile. The first compilation will take some time since it has to compile all the objects of the various cpp files. The compilation of the remaining, if desired, is fast since only the cpp of the version still needs to be compiled. NB: The version parameter must be equal to one of the files of floyd warshall inside the main directory, for example 2_1.
    • Example: make fwb_dev VERSION=2_1 will produce bin/fwb_dev_v_2_1.out.
  • make generate_n_b

    • Binary path file name: bin/generate_and_print_n_b.out
    • Generates a list of couples (n,B). Useful for correctness and efficiency testing purposes.

Floyd Warshall Binary Execution

Both host and device Floyd Warshall binaries share the same parameters, which are handled in the file handle_arguments_and_execute.cu. It is possible also to pass "--help" to read the guide. The parameters are the following:

  • only one is mandatory: exec_option=<test|perf|launch>.

    • launch: just executes the matrix given as input. Additional params:
      • (mandatory) --input-file=<file>: matrix csv input file
      • (mandatory) -b=<b>: blocking factor

    Example: bin/fwb_dev_v_2_1.out launch --input-file="input_graphs/example_1.csv" -b=2

    • perf: executes the algorithm and calculates the performances. It is possible to pass the matrix in input or to generate randomly the matrixes. Numbers of tests is useful to execute many different times. If random matrixes are used then each test will use a different input matrix otherwise always the input csv one. Is suggested to use t >= 10 to have consistency in the data. Also, in case of chrono, The execution is repeated 20*t times. 20 timings are saved. The execution time returned is the mean of the 20 values. These 20 different times allows to calculate the variancy of the timings. Note that if you to use nvprof as analyzer is not sufficient to specify it in the command line but also to launch nvprof passing the binary.

      • (mandatory) -b=<b>: blocking factor
      • (mandatory) -t=<t>: number of tests
      • (optional) -n=<n>: matrix size (mandatory to generate random matrix)
      • (optional) --input-file=<file>: matrix csv input file (mandatory for use a csv matrix)
      • (optional) --output-file=<file>: csv of all outputs (only in case of analyzer is chrono, with nvprof every couple (n,B) will produce its output csv) (default is csv/chrono_performances.csv)
      • (optional) --analyzer=<chrono|nvprof>: the analyzer to use (default is chrono)
      • (optional) -s=<s>: the seed of the matrix to generate randomly (default is RANDOM, only in case of matrix random generation)

    Main file: src/performance_test.cu

    Example: bin/fwb_dev_v_2_1.out perf --input-file="input_graphs/example_1.csv" -b=2 -t=10

    Example: bin/fwb_dev_v_2_1.out perf -n=20 -b=2 -t=10 -s=16362 --output_file="csv/chrono_performances.out" --analyzer=chrono

    Example: bin/fwb_dev_v_2_1.out perf -n=20 -b=2 -t=10

    Example: nvprof bin/fwb_dev_v_2_1.out perf -n=20 -b=2 -t=10 --analyzer=nvprof

    • test: will execute automatically 500 different tests per each random couple (n,B), comparing the version compiled and the host function. If the two matrixes are not equal, a counter of the number errors is increased and the seed used as input is printed. At the end of each couple the number of errors (new and total) is printed. Since this was designed to check the correctness during the developments part, the parameters used cannot be passed through the terminal. If you desire to see or change their values you can set defaults in the statistical_test.hpp file or customize them inside the handle_arguments_and_execute.cpp.

    Main file: src/statistical_test.cpp

    Example: bin/fwb_dev_v_2_1.out test

Generate and Print (n,b) Binary Execution

Binary file name: bin/generate_and_print_n_b.out Since this was designed to generate random values during the developments part, not all the parameters can be passed through the terminal. There are two ways to generate the couples:

  • randomly: the code generates the random couples of (n,b) in this way: next(n) = to_mul * n + to_sum, using min_n as first n and max_n as upper bound. The b are taken randomly between its divisors. There is no control if n obtained is a prime number, in case no b are found the current n is discarded. The to_mul value is a random double between 1.3 and 1.6, the to_sum value is a random integer between 0 and 100. Note that the parameters regarding the random generation (such as min_n, max_n, to_mul, to_sum ecc.) must be setted in the cpp file.

    Example: bin/generate_and_print_n_b.out

  • as parameter: the code generates all the possible couples of (n,b) given two lists, the first one for all n and the second one for all b. In this case all couples of (n,b) s.t. b is a divisor of n are printed to the output file.

    Example: bin/generate_and_print_n_b.out 80,160,240,320,480,640,720,1200 8,16,24,32

In both cases the output is printed to a default csv output file: csv/list_of_n_b.csv

Python Launchers Execution

Instead of launching a single binary we developed two python scripts, one for testing and one for performance, which automatically compile and execute all the versions using random matrixes. For the python tests no parameters are needed, for the one of perfomances only the analyzer (chrono, default, or nvprof).

The python for the tests don't generates itself all the values for n and b since they already randomly obtained at the start of the multi_size_statistical_test function in the file statistical_test.cpp.

The python for the performance analysis instead calls the itself the generate_and_print_n_b.cpp generating all couples of (n,b) given the two lists (see the example before). Then it reads the csv with pandas and iterates over the rows launching the binary with the current n,b values.

We remember that python is an interpreted language, so there is no need of compilation, just invoke directly the python interpreter with the desired parameters.

Examples: python launch_test.py, python launch_perf.py chrono, python launch_perf.py nvprof.

Data Analysis with Python Notebook

To analyse the execution timings obtained during the performances we used a python notebook. It automatically generates the dataframe reading the output csv both for chrono and nvprof and prints the plots both in the notebook and to image files in the png folder. It is easily possible to filter the df to see a detail of some specific versions or matrix sizes, inside the notebook there are two cells where can be specified two lists: versions_to_remove and sizes_to_remove.

About

Parallel implementation for the Blocked Floyd Warshall APSP algorithm. Designed and implemented in C++ for CUDA GPUs

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published