This repository presents a modified variant of the Bruck/Dissemination algorithm for commutative reduction operations in MPI. The implementation performs a global merge of sorted local blocks across all processes, achieving the same asymptotic communication and computation complexity as standard MPI_Allgather-based approaches.
Although this implementation uses element-wise merging as the reduction operator, the structure of the algorithm supports any commutative operation.
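As a minimal sketch of what merging as a reduction operator can look like, the helper below combines two sorted blocks with `std::merge`. The name `merge_blocks` is hypothetical and is not taken from `merge.cpp`; the repository's routine may differ in signature and buffer handling.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper (not the repository's API): merges two sorted blocks
// into one sorted block, which is the role the reduction operator plays here.
std::vector<int> merge_blocks(const std::vector<int>& a, const std::vector<int>& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(out));
    return out;
}
```

For `int` blocks the result does not depend on argument order, so the operator is commutative. It is not idempotent: merging a block with itself duplicates every element, so each local block must enter the global result exactly once.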
Unlike circulant-style Allreduce variants, whose communication partners depend on the number of processes, this version preserves the fixed partner structure of Bruck’s algorithm. This makes it suitable for systems with pre-optimized communication schedules or strict communication policies.

To support correct aggregation, the Bruck/Dissemination algorithm is modified to transmit additional data only in specific, predetermined rounds, precisely when it is required to avoid loss of information due to overlapping reductions. This extra data is accumulated in a separate buffer and sent in a final additional round. The selection of rounds and data is optimal: it represents the minimal necessary overhead, and in the worst case the total data volume is at most 1.5× that of the standard Bruck’s algorithm.
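The fixed partner structure can be illustrated with the textbook dissemination pattern, where round k uses distance 2^k. The sketch below is not taken from `algorithm1.cpp`: the names `RoundPartners`, `dissemination_partners`, and `naive_overlap` are hypothetical, and the send direction (`rank + distance`) is an assumption, since the mirrored direction is equally common.

```cpp
#include <cassert>
#include <vector>

// Partners of one rank in a single round.
struct RoundPartners {
    int send_to;
    int recv_from;
};

// Textbook dissemination schedule for `rank` among `p` processes:
// distances 1, 2, 4, ... while distance < p, i.e. ceil(log2(p)) rounds.
// Assumption: data flows towards higher ranks (send to rank + distance).
std::vector<RoundPartners> dissemination_partners(int rank, int p) {
    assert(p > 0 && rank >= 0 && rank < p);
    std::vector<RoundPartners> rounds;
    for (int dist = 1; dist < p; dist *= 2) {
        rounds.push_back({(rank + dist) % p, (rank - dist + p) % p});
    }
    return rounds;
}

// With plain doubling and no correction, the window of blocks held by a rank
// grows 1, 2, 4, ... and ends at the smallest power of two >= p. The excess
// over p is the number of blocks that would be reduced twice.
int naive_overlap(int p) {
    assert(p > 0);
    int window = 1;
    while (window < p) window *= 2;
    return window - p;
}
```

For example, with `p = 6` rank 0 sends to ranks 1, 2, 4 and receives from ranks 5, 4, 2, and an uncorrected doubling scheme would cover 8 blocks where only 6 exist. For a non-idempotent operator such as merging, that overlap has to be avoided, which is the purpose of the extra data described above. How the repository selects that data is covered in its PDF, not in this sketch.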
For pseudocode and a detailed space and time complexity analysis of the three implemented algorithms, see `details.pdf`.
| ID | Name | Description |
|---|---|---|
| 0 | Baseline | Standard `MPI_Allgather` followed by a local sort |
| 1 | Bruck | Dissemination-based allgather and merge with extra data sends |
| 2 | Circulant | Communication with roughly halving skips; optimal, but with different communication pairs |
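To make the difference between the Bruck and Circulant rows more concrete, the sketch below compares two skip sequences. The power-of-two sequence is the standard dissemination pattern. The roughly halving sequence uses repeated ceiling-halving of the process count, which is one common construction for circulant schemes; whether `algorithm2.cpp` computes its skips exactly this way is an assumption, and both function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Dissemination skips: 1, 2, 4, ... while skip < p.
// For any p this is a prefix of the same fixed sequence.
std::vector<int> power_of_two_skips(int p) {
    assert(p > 0);
    std::vector<int> skips;
    for (int s = 1; s < p; s *= 2) skips.push_back(s);
    return skips;
}

// Assumed circulant construction: start at p and repeatedly halve,
// rounding up, until reaching 1. The whole sequence depends on p.
std::vector<int> roughly_halving_skips(int p) {
    assert(p > 0);
    std::vector<int> skips;
    for (int s = p; s > 1;) {
        s = (s + 1) / 2;  // ceil(s / 2)
        skips.push_back(s);
    }
    return skips;
}
```

Under this assumption, `p = 6` gives skips 3, 2, 1 and `p = 9` gives 5, 3, 2, 1, while the power-of-two skips for both start with 1, 2, 4. Both sequences have ceil(log2(p)) entries, but only the second one changes its partners when the process count changes.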
Each algorithm is implemented in its own source file and selected at runtime via command-line arguments.
- C++17-compatible compiler (e.g., `g++`, `clang++`)
- MPI library (e.g., OpenMPI or MPICH)
- CMake version 3.10 or higher
git clone https://github.com/jerelang/mpi-bruck-reduce.git
cd mpi-bruck-reduce
cmake -S sources -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j
This will build the `dissemination_reduce` executable in `build/`.
Run with `mpirun`/`mpiexec`:
mpirun -np <num_processes> ./build/dissemination_reduce [msg_size] [algorithm] [--check] [--warmup <int>] [--repeat <int>]
- `msg_size` (default: `1000`): Number of integers sent by each process
- `algorithm` (default: `0`):
  - `0` = Baseline
  - `1` = Bruck
  - `2` = Circulant
- `--check`: Enables correctness verification against a sorted `MPI_Allgather` reference
- `--warmup <int>`: Number of warm-up iterations before timing (default: `2`)
- `--repeat <int>`: Number of timed iterations to average (default: `10`)
mpirun -np 4 ./build/dissemination_reduce 10000 1 --check --repeat 5
Runs the Bruck algorithm with 10,000 integers per process, performs correctness verification, and averages over 5 timed iterations.
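The warm-up and repeat semantics can be sketched as a small timing helper. This is an illustration only: `time_average` is a hypothetical name, and it uses `std::chrono` so that it stays self-contained. An MPI benchmark driver would more typically time with `MPI_Wtime` and synchronize ranks with `MPI_Barrier` before each iteration, and how `main.cpp` actually measures is not shown here.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: runs `fn` `warmup` times untimed, then `repeat` times
// timed, and returns the mean wall-clock seconds per timed iteration.
double time_average(const std::function<void()>& fn, int warmup, int repeat) {
    assert(warmup >= 0 && repeat > 0);
    for (int i = 0; i < warmup; ++i) fn();  // warm-up runs are not measured

    using clock = std::chrono::steady_clock;
    double total = 0.0;
    for (int i = 0; i < repeat; ++i) {
        const auto start = clock::now();
        fn();
        total += std::chrono::duration<double>(clock::now() - start).count();
    }
    return total / repeat;
}
```

With the defaults listed above (`--warmup 2`, `--repeat 10`), such a helper would invoke the algorithm 12 times and average over the last 10.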
├── sources/
│ ├── main.cpp # Entry point and benchmark driver
│ ├── algorithm0.cpp # Baseline allgather + sort
│ ├── algorithm1.cpp # Bruck-style dissemination algorithm
│ ├── algorithm2.cpp # Circulant algorithm
│ ├── merge.cpp # Sorted merge routine
│ ├── algorithms.h # Common declarations
│ └── CMakeLists.txt # Build configuration
└── details.pdf