UNIVERSITY OF WEST ATTICA
SCHOOL OF ENGINEERING
DEPARTMENT OF COMPUTER ENGINEERING AND INFORMATICS
University of West Attica · Department of Computer Engineering and Informatics
Parallel Systems
Vasileios Evangelos Athanasiou
Student ID: 19390005
Supervision
Supervisor: Vasileios Mamalis, Professor
Co-supervisor: Michalis Iordanakis, Academic Scholar
Athens, December 2024
This repository provides an overview of a parallel computing project using OpenMP. The project focuses on matrix manipulation, diagonal dominance verification, and parallel reduction techniques.
| Section | Folder/File | Description |
|---|---|---|
| 1 | assign/ |
Assignment material for the OpenMP workshop |
| 1.1 | assign/_Par_Sys_Ask_1_2024-25.pdf |
Assignment description in English |
| 1.2 | assign/_Παρ_Συσ_Ασκ_1_2024-25.pdf |
Assignment description in Greek |
| 2 | docs/ |
Documentation and performance data for OpenMP parallel computing |
| 2.1 | docs/Parallel-Computing-using-OpenMP.pdf |
English documentation for OpenMP parallel computing |
| 2.2 | docs/Παράλληλος-Υπολογισμός-με-OpenMP.pdf |
Greek documentation for OpenMP parallel computing |
| 2.3 | docs/Times.xlsx |
Performance timing results |
| 3 | src/ |
Source code and input/output data for OpenMP exercises |
| 3.1 | src/omp.c |
Main OpenMP program |
| 3.2 | src/A/ |
Input data files for exercise A |
| 3.2.1 | src/A/A_no_strict_diagonial.txt |
Special test case for A |
| 3.2.2 | src/A/A_T1_N10_CZ2.txt |
Input for 1 thread, N=10, CZ=2 |
| 3.2.3 | src/A/A_T1_N16_CZ3.txt |
Input for 1 thread, N=16, CZ=3 |
| 3.2.4 | src/A/A_T1_N25_CZ5.txt |
Input for 1 thread, N=25, CZ=5 |
| 3.2.5 | src/A/A_T1_N400_CZ20.txt |
Input for 1 thread, N=400, CZ=20 |
| 3.2.6 | src/A/A_T1_N1000_CZ100.txt |
Input for 1 thread, N=1000, CZ=100 |
| 3.2.7 | src/A/A_T12_N10_CZ2.txt |
Input for 12 threads, N=10, CZ=2 |
| 3.2.8 | src/A/A_T12_N16_CZ3.txt |
Input for 12 threads, N=16, CZ=3 |
| 3.2.9 | src/A/A_T12_N25_CZ5.txt |
Input for 12 threads, N=25, CZ=5 |
| 3.2.10 | src/A/A_T12_N400_CZ20.txt |
Input for 12 threads, N=400, CZ=20 |
| 3.2.11 | src/A/A_T12_N1000_CZ100.txt |
Input for 12 threads, N=1000, CZ=100 |
| 3.2.12 | src/A/A_T2_N10_CZ2.txt |
Input for 2 threads, N=10, CZ=2 |
| 3.2.13 | src/A/A_T2_N16_CZ3.txt |
Input for 2 threads, N=16, CZ=3 |
| 3.2.14 | src/A/A_T2_N25_CZ5.txt |
Input for 2 threads, N=25, CZ=5 |
| 3.2.15 | src/A/A_T2_N400_CZ20.txt |
Input for 2 threads, N=400, CZ=20 |
| 3.2.16 | src/A/A_T2_N1000_CZ100.txt |
Input for 2 threads, N=1000, CZ=100 |
| 3.2.17 | src/A/A_T4_N10_CZ2.txt |
Input for 4 threads, N=10, CZ=2 |
| 3.2.18 | src/A/A_T4_N16_CZ4.txt |
Input for 4 threads, N=16, CZ=4 |
| 3.2.19 | src/A/A_T4_N25_CZ5.txt |
Input for 4 threads, N=25, CZ=5 |
| 3.2.20 | src/A/A_T4_N400_CZ20.txt |
Input for 4 threads, N=400, CZ=20 |
| 3.2.21 | src/A/A_T8_N10_CZ2.txt |
Input for 8 threads, N=10, CZ=2 |
| 3.2.22 | src/A/A_T8_N16_CZ3.txt |
Input for 8 threads, N=16, CZ=3 |
| 3.2.23 | src/A/A_T8_N25_CZ5.txt |
Input for 8 threads, N=25, CZ=5 |
| 3.2.24 | src/A/A_T8_N400_CZ20.txt |
Input for 8 threads, N=400, CZ=20 |
| 3.3 | src/B/ |
Input data files for exercise B (same pattern as A) |
| 3.4 | src/Output/ |
Output result files from OpenMP execution |
| 4 | README.md |
Project documentation |
| 5 | INSTALL.md |
Usage instructions |
This project evaluates the performance and implementation of a multithreaded program using the fork-join parallel execution model. Key highlights:
- High-level OpenMP techniques
- Task division across threads
- Synchronized communication
- Performance scaling and speedup measurement between sequential and parallel execution
The software performs operations on square matrices (N \times N):
Check if matrix A is strictly diagonally dominant:
Find the maximum absolute element (m) on the diagonal of A.
Create matrix B based on (m):
Compute the minimum absolute value in B using three methods:
- OpenMP reduction directive
- Critical region protection
- Binary Tree Algorithm for parallel reduction
- A, B: Dynamic 2D arrays storing matrix data
- M: Dynamic 1D array for the binary tree reduction
- chunk: Number of iterations per thread (parameter CZ)
Manually replaces the OpenMP reduction. Threads synchronize in phases, comparing pairs until a single minimum remains in the first position.
#pragma omp parallel→ Activates threads#pragma omp for schedule(static, chunk)→ Distributes loop iterations#pragma omp atomic→ Ensures atomic updates (e.g., dominance flag)reduction(max: m)→ Handles maximum calculation automatically
- Execution times are measured to calculate Speedup (S):
- Matrix Size (N): 10×10 up to 10,000×10,000
- Thread Count (T): 1 to 12 threads
- Chunk Size (CZ): Number of elements per thread
-
Generate Matrix
Matrices can be strictly diagonally dominant or random based on a seed. -
Execute Tasks
Program executes Tasks A–D using parallel regions for each. -
Results
Output saved to text files (e.g.,Output_T8_N400_CZ20.txt) for verification and analysis.
