- Uses Blelloch's algorithm (work-efficient exclusive scan)
- Not limited to 2048 items (a restriction of the initial implementation: each thread processes two elements, and a thread block on current GPUs can run at most 1024 threads)
- Not limited to input sizes that are powers of 2 (a restriction of the initial implementation, due to the binary-tree structure of the algorithm)
- Free of shared-memory bank conflicts, using the index-padding method described in this paper.
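The features above can be sketched sequentially in C++. This is not the repository's CUDA code. It is a CPU emulation of what the kernels would do, built on several assumptions. `BLOCK_ELEMS` (2048, i.e. two elements per thread for 1024 threads), `LOG_NUM_BANKS` (5, for 32 banks), the offset formula `i >> LOG_NUM_BANKS`, and the names `conflict_free_offset`, `block_scan` and `exclusive_scan` are all hypothetical. Each "block" zero-pads its slice up to the next power of two. It then runs the up-sweep and down-sweep on a padded scratch array that stands in for shared memory, and returns its total. The block totals are scanned (recursively here) and added back to every block. That is one common way to lift the 2048-item limit; whether this repository uses it is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed: 32 shared-memory banks.
constexpr int LOG_NUM_BANKS = 5;
// Assumed: 1024 threads per block, two elements per thread.
constexpr int BLOCK_ELEMS = 2048;

// Hypothetical padding offset: one extra slot after every 32 elements.
constexpr int conflict_free_offset(int i) { return i >> LOG_NUM_BANKS; }

// Emulates one thread block: writes the exclusive scan of in[0..n) to out[0..n)
// and returns the block's total (needed by the next level).
long long block_scan(const long long* in, long long* out, int n) {
    assert(n >= 0 && n <= BLOCK_ELEMS);
    int m = 1;
    while (m < n) m <<= 1;  // round n up to a power of two
    // Stand-in for shared memory, sized for the padded indices.
    std::vector<long long> s(m + conflict_free_offset(m - 1), 0);
    auto at = [&s](int i) -> long long& { return s[i + conflict_free_offset(i)]; };
    for (int i = 0; i < n; ++i) at(i) = in[i];  // slots n..m-1 remain 0

    // Up-sweep (reduce): build partial sums in place up the tree.
    for (int stride = 1; stride < m; stride <<= 1)
        for (int i = 2 * stride - 1; i < m; i += 2 * stride)
            at(i) += at(i - stride);

    long long total = at(m - 1);
    at(m - 1) = 0;  // clear the root for an exclusive scan

    // Down-sweep: push prefixes back down the tree.
    for (int stride = m / 2; stride >= 1; stride >>= 1)
        for (int i = 2 * stride - 1; i < m; i += 2 * stride) {
            long long left = at(i - stride);
            at(i - stride) = at(i);
            at(i) += left;
        }

    for (int i = 0; i < n; ++i) out[i] = at(i);
    return total;
}

// Arbitrary-size exclusive scan: scan each block, scan the block totals,
// then add each block's offset to its elements.
std::vector<long long> exclusive_scan(const std::vector<long long>& in) {
    const int n = static_cast<int>(in.size());
    std::vector<long long> out(n, 0);
    if (n == 0) return out;
    const int blocks = (n + BLOCK_ELEMS - 1) / BLOCK_ELEMS;
    std::vector<long long> sums(blocks);
    for (int b = 0; b < blocks; ++b) {
        const int start = b * BLOCK_ELEMS;
        const int len = std::min(BLOCK_ELEMS, n - start);
        sums[b] = block_scan(in.data() + start, out.data() + start, len);
    }
    if (blocks > 1) {
        // Recurse on the block totals to get each block's starting offset.
        const std::vector<long long> offsets = exclusive_scan(sums);
        for (int b = 1; b < blocks; ++b) {
            const int end = std::min(n, (b + 1) * BLOCK_ELEMS);
            for (int i = b * BLOCK_ELEMS; i < end; ++i) out[i] += offsets[b];
        }
    }
    return out;
}
```

For example, `exclusive_scan(std::vector<long long>(5000, 1))` yields `0, 1, ..., 4999`. That input spans three blocks, and its last block holds a non-power-of-2 number of elements. On a GPU the padding offset spreads the strided shared-memory accesses across banks. In this CPU emulation it only changes where each element is stored, so the results are the same with or without it.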
forked from mark-poscablo/gpu-prefix-sum
jkalloor3/gpu-prefix-sum
About
CUDA implementation of exclusive prefix sum via Blelloch's algorithm
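The two phases of Blelloch's scan are easiest to see in a hand-worked trace on an 8-element array. The input below is illustrative and is not taken from the repository. In the up-sweep with stride `s`, each index `i = 2s-1, 4s-1, ...` performs `a[i] += a[i-s]`. After the last element is cleared, the down-sweep with stride `s` sets `a[i-s]` to `a[i]` and adds the old `a[i-s]` to `a[i]`. The final row is the exclusive prefix sum of the input.

```math
\begin{array}{l|cccccccc}
\text{step} & a_0 & a_1 & a_2 & a_3 & a_4 & a_5 & a_6 & a_7 \\ \hline
\text{input} & 3 & 1 & 7 & 0 & 4 & 1 & 6 & 3 \\
\text{up-sweep, stride } 1 & 3 & 4 & 7 & 7 & 4 & 5 & 6 & 9 \\
\text{up-sweep, stride } 2 & 3 & 4 & 7 & 11 & 4 & 5 & 6 & 14 \\
\text{up-sweep, stride } 4 & 3 & 4 & 7 & 11 & 4 & 5 & 6 & 25 \\
\text{clear last} & 3 & 4 & 7 & 11 & 4 & 5 & 6 & 0 \\
\text{down-sweep, stride } 4 & 3 & 4 & 7 & 0 & 4 & 5 & 6 & 11 \\
\text{down-sweep, stride } 2 & 3 & 0 & 7 & 4 & 4 & 11 & 6 & 16 \\
\text{down-sweep, stride } 1 & 0 & 3 & 4 & 11 & 11 & 15 & 16 & 22
\end{array}
```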
Languages
- Cuda 77.4%
- C++ 19.4%
- C 1.9%
- Makefile 1.3%