This repository contains the implementation of an academic experiment that compares CPU and GPU performance using OpenMP and CUDA. Three algorithms are reimplemented in both OpenMP and CUDA, with each version optimized to run as fast as possible. The three algorithms are:
- Standard Deviation Calculation
- Image Convolution using Sobel Operator
- Histogram-Based Data Structure
This experiment is part of the coursework for the module Parallel Computing with Graphical Processing Units (GPUs). It aims to assess the ability to implement and optimize parallel algorithms using OpenMP and CUDA.
- CUDA-Enabled GPU
- CUDA Toolkit
- Visual Studio with CUDA Installed
- OpenMP-Compatible Compiler
git clone https://github.com/Vivek-Tate/GPU-Parallel-Computing-using-CUDA-and-OpenMP.git
Ensure you have CUDA installed and Visual Studio set up with CUDA integration.
- Open the solution file in Visual Studio.
- Set the configuration to Release.
- Build the solution.
The executables can be run with specific command-line arguments for different algorithms and inputs.
# Using random seed and population size
CPU SD 12 100000 -b
# Using a CSV input file
CPU SD sd_in.csv -b
# Using a PNG input file
CPU C c_in.png
# Optional output file
CPU C c_in.png c_out.png
# Using random seed and array length
CPU DS 12 100000
# Using a CSV input file
CPU DS ds_in.csv
# Optional output file
CPU DS ds_in.csv ds_out.csv
For CUDA implementations, ensure you have a CUDA-capable GPU. The command-line arguments follow the same structure as the CPU versions, but replace CPU with CUDA.
CUDA SD 12 100000 -b
CUDA C c_in.png
CUDA DS ds_in.csv
Ensure your compiler supports OpenMP. The command-line arguments follow the same structure as the CPU versions, but replace CPU with OpenMP.
OpenMP SD 12 100000 -b
OpenMP C c_in.png
OpenMP DS ds_in.csv
The standard deviation is computed in two main stages using OpenMP and CUDA:
- Mean Calculation
- Sum of Squared Differences Calculation
The convolution algorithm applies the Sobel operator to an image to detect edges. The horizontal and vertical gradients are computed, and the gradient magnitude is calculated.
A histogram of a sorted integer array is computed, followed by the calculation of boundary indices. This involves atomic operations in both OpenMP and CUDA, and guided scheduling in OpenMP.
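One possible reading of this structure is sketched below: count how often each key occurs, then take an exclusive prefix sum so that `boundaries[k]` is the index at which key `k` starts in the sorted array. The function name `build_boundaries`, the parameter names, and the layout of the output arrays are assumptions, not taken from the repository.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper. keys must be sorted with every value < num_bins.
 * histogram has num_bins entries; boundaries has num_bins + 1 entries. */
void build_boundaries(const unsigned int *keys, size_t n,
                      unsigned int *histogram, unsigned int *boundaries,
                      unsigned int num_bins)
{
    assert(keys != NULL && histogram != NULL && boundaries != NULL);

    for (unsigned int b = 0; b < num_bins; ++b) {
        histogram[b] = 0;
    }

    /* Stage 1: histogram. Several threads may hit the same bin,
     * so each increment is atomic. Guided scheduling hands out
     * progressively smaller chunks of the iteration space. */
    #pragma omp parallel for schedule(guided)
    for (long i = 0; i < (long)n; ++i) {
        #pragma omp atomic
        histogram[keys[i]]++;
    }

    /* Stage 2: exclusive prefix sum, kept serial here for clarity. */
    unsigned int running = 0;
    for (unsigned int b = 0; b < num_bins; ++b) {
        boundaries[b] = running;
        running += histogram[b];
    }
    boundaries[num_bins] = running; /* equals n */
}
```

With this layout, the elements with key `k` occupy indices `boundaries[k]` up to, but not including, `boundaries[k + 1]`. A CUDA version would usually use `atomicAdd` for the counting stage.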
- This repository is for an academic experiment and not a complete project.
- The performance of each implementation is benchmarked, and results are documented in the final report in docs.
See LICENSE for more information.