This work aims to improve Sparse Matrix-Vector Multiplication (SpMV) by using mixed precision (FP32 + FP64). In doing so, it permutes the matrix so that threads are better load balanced for the mixed-precision computations.
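To make the idea of row-wise mixed precision concrete, here is a minimal CPU sketch in pure Python. It is not the CUDA kernel from this repository: the function names `to_fp32` and `spmv_mixed`, the CSR argument names, and the per-row `use_fp64` flags are all hypothetical, and how the flags are chosen is left to the caller. FP32 arithmetic is emulated by rounding through `struct`.

```python
import struct

def to_fp32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def spmv_mixed(row_ptr, col_idx, vals, x, use_fp64):
    """CSR SpMV y = A x with a per-row precision choice (illustrative only).

    Row i is accumulated in FP64 when use_fp64[i] is True,
    otherwise in emulated FP32.
    """
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if use_fp64[i]:
                acc += vals[k] * x[col_idx[k]]
            else:
                # Round operands, product, and running sum to FP32.
                prod = to_fp32(to_fp32(vals[k]) * to_fp32(x[col_idx[k]]))
                acc = to_fp32(acc + prod)
        y.append(acc)
    return y

# 3x3 example matrix in CSR form:
# [[1, 0,   2],
#  [0, 0.1, 0],
#  [3, 0,   4]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 0.1, 3.0, 4.0]
x = [1.0, 1.0, 1.0]

y_mixed = spmv_mixed(row_ptr, col_idx, vals, x, [True, False, True])
y_double = spmv_mixed(row_ptr, col_idx, vals, x, [True, True, True])
```

In this sketch the middle row is computed in FP32, so its result differs slightly from the all-FP64 result because 0.1 is not exactly representable in either format and is rounded more coarsely in FP32.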
Use make to build the CUDA binary at bin/spmv. By default the compiler uses --arch=sm_70 for the NVIDIA V100, but you can change that to suit your own GPU with the MYGPU_ARCH environment variable, e.g. export MYGPU_ARCH=sm_50. We used cuda/11.2, python/3.7.4 and gcc/9.3.0 to compile our program and run the Python scripts. You also need to install and compile the HSL_MC64 static library with gfortran.
The file structure of this project is as follows:
- batch has shell scripts for cluster commands, such as queueing a job.
- bin for binary executables.
- build for build files.
- diagnostic has several scripts to check the program via Valgrind, cudamemcheck etc.
- evaluations is where we store the execution output. This is later read by Python scripts to make plots.
- img stores the output from Python files, such as plot images.
- include has header files.
- logs has log outputs, generally from the diagnostic tools.
- res has resources, such as MatrixMarket files.
- scripts has a variety of Python scripts, mostly for plotting and automated running of the code.
- src has the source files.
- templates has the source files for template functions.
The Makefile creates a binary called spmv under the bin folder within the same directory, with object files under build. Run the executable with the -h or --help option to see usage.
For both kuacc and simula under batches we have the following:
- final_experiment.sh runs the final experiments, as used for the paper.
- spmv_all.sh runs the SpMV test on all matrices (from the allpruned index).
- _srun_gpu.sh asks for an interactive shell with one Tesla V100.
- _check_queue.sh checks the queue for my jobs.
- _load_modules.sh loads the necessary modules, though this does not always work.
Matrices are stored under the res folder, with the following scripts:
- download.sh <MatrixMarketURL> downloads the matrix from the given URL. See SuiteSparse.
- download-from-md.sh <path> downloads the matrices that appear in the provided Markdown file.
- generate.sh under architect generates a specific set of matrices using the architect.py script.
- parsehtml.sh <path-to-html> <output-name> parses an HTML file from http://yifanhu.net/GALLERY/GRAPHS/search.html to create an index file.
The scripts below are under the diagnostics folder:
- eval_architect.sh uses evaluator.py on matrices under res/architect.
- eval_res.sh uses evaluator.py on matrices under res.
- cudamemcheck.sh runs cudamemcheck with a matrix under res/architect.
- valgrind.sh <matrix> runs valgrind for the provided matrix.
- nvprof.sh <matrix> profiles SpMV kernels for the provided matrix.
- run_random.sh selects a random matrix under res and runs it.
The following are stored under the scripts folder:
- architect.py creates random MatrixMarket matrices.
- evaluator.py runs the binary and parses its outputs to create plots. Saves the resulting dictionary to a file.
- exporter.py reads a dictionary output by evaluator.py and exports csv files.
- interpreter.py reads a dictionary output by evaluator.py and plots the results.
- interpret.ipynb is a notebook to plot the results from another evaluation output.
- analyser.py analyses a specific matrix with Python.
- plots.py has helper functions for plotting.
- utility.py has utility functions.
- prints.py has helper functions for printing.

The plottype folder has generic plotting functions such as bar, heatmap, density etc., and the plotspecial folder has specific plots.
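For readers unfamiliar with the MatrixMarket format that architect.py produces, the sketch below writes a random sparse matrix in MatrixMarket coordinate form. It is an illustration only and does not reproduce architect.py: the function name `write_random_matrix_market`, its parameters, and the uniform value range are all assumptions.

```python
import random

def write_random_matrix_market(path, rows, cols, nnz, seed=0):
    """Write a random sparse real matrix in MatrixMarket coordinate format.

    Hypothetical helper for illustration; requires nnz <= rows * cols.
    """
    rng = random.Random(seed)
    # Sample distinct flat positions so that no entry is duplicated.
    positions = rng.sample(range(rows * cols), nnz)
    with open(path, "w") as f:
        # Banner line: object, format, field, symmetry.
        f.write("%%MatrixMarket matrix coordinate real general\n")
        # Size line: number of rows, columns, and stored entries.
        f.write(f"{rows} {cols} {nnz}\n")
        for p in sorted(positions):
            r, c = divmod(p, cols)
            # MatrixMarket indices are 1-based.
            f.write(f"{r + 1} {c + 1} {rng.uniform(-1.0, 1.0):.17g}\n")
```

Calling, for example, `write_random_matrix_market("random.mtx", 100, 100, 500)` would produce a file with one banner line, one size line, and 500 entry lines; the file name here is only an example.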
To be published in SBAC-PAD'22, the IEEE 34th International Symposium on Computer Architecture and High Performance Computing.
Erhan Tezcan, Tugba Torun, Fahrican Koşar, Kamer Kaya, and Didem Unat (2022). Mixed and Multi-Precision SpMV for GPUs with Row-wise Precision Selection. IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD’22), November 2-5, 2022, Bordeaux, France.