CUDA implementation of exclusive prefix sum via Blelloch's algorithm

jkalloor3/gpu-prefix-sum

GPU Prefix Sum

  • Uses Blelloch's Algorithm (exclusive scan)
  • Not limited to 2048 items (a restriction of the initial implementation, which ran in a single thread block: current GPUs allow at most 1024 threads per block, and each thread handles two elements)
  • Not limited to input sizes that are powers of 2 (a former restriction due to the inherent binary-tree approach of the algorithm)
  • Avoids shared-memory bank conflicts by using the index-padding method described in this paper.
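
The points above can be illustrated with a small CPU-side sketch. This is not the repository's kernel code; it is a sequential C++ model of the same ideas, written under a few assumptions: 32 shared-memory banks, a chunk size standing in for the 2048 elements a single thread block can handle, and the hypothetical names `padded_index`, `blelloch_exclusive_scan`, and `chunked_exclusive_scan`. In a CUDA kernel, each iteration of the inner loops would typically be carried out by one thread, with a barrier between strides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: 32 banks, so one padding slot is skipped every 32 elements.
constexpr std::size_t kLogNumBanks = 5;

// Maps a logical index to a padded index (the index-padding idea).
inline std::size_t padded_index(std::size_t i) {
    return i + (i >> kLogNumBanks);
}

// Smallest power of two that is >= n (for n >= 1).
inline std::size_t next_pow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Exclusive scan via up-sweep (reduce) and down-sweep phases.
// Inputs whose size is not a power of two are padded with zeros.
std::vector<int> blelloch_exclusive_scan(const std::vector<int>& in) {
    const std::size_t n = in.size();
    if (n == 0) return {};
    const std::size_t m = next_pow2(n);

    std::vector<int> buf(padded_index(m - 1) + 1, 0);
    for (std::size_t i = 0; i < n; ++i) buf[padded_index(i)] = in[i];

    // Up-sweep: build partial sums up the implicit binary tree.
    for (std::size_t stride = 1; stride < m; stride <<= 1) {
        for (std::size_t i = 2 * stride - 1; i < m; i += 2 * stride) {
            buf[padded_index(i)] += buf[padded_index(i - stride)];
        }
    }

    // Down-sweep: clear the root, then push prefixes back down the tree.
    buf[padded_index(m - 1)] = 0;
    for (std::size_t stride = m >> 1; stride >= 1; stride >>= 1) {
        for (std::size_t i = 2 * stride - 1; i < m; i += 2 * stride) {
            const int left = buf[padded_index(i - stride)];
            buf[padded_index(i - stride)] = buf[padded_index(i)];
            buf[padded_index(i)] += left;
        }
    }

    std::vector<int> out(n);
    for (std::size_t i = 0; i < n; ++i) out[i] = buf[padded_index(i)];
    return out;
}

// Lifts the per-block size limit: scan each chunk independently, scan the
// per-chunk totals, then add each chunk's offset to its elements.
// `chunk` models the per-block capacity and must be at least 2.
std::vector<int> chunked_exclusive_scan(const std::vector<int>& in,
                                        std::size_t chunk = 2048) {
    assert(chunk >= 2);
    const std::size_t n = in.size();
    if (n <= chunk) return blelloch_exclusive_scan(in);

    std::vector<int> out(n);
    std::vector<int> sums;  // total of each chunk
    for (std::size_t start = 0; start < n; start += chunk) {
        const std::size_t end = std::min(start + chunk, n);
        const std::vector<int> part(in.begin() + start, in.begin() + end);
        const std::vector<int> scanned = blelloch_exclusive_scan(part);
        std::copy(scanned.begin(), scanned.end(), out.begin() + start);
        sums.push_back(scanned.back() + part.back());
    }

    const std::vector<int> offsets = chunked_exclusive_scan(sums, chunk);
    for (std::size_t i = 0; i < n; ++i) out[i] += offsets[i / chunk];
    return out;
}
```

For example, `blelloch_exclusive_scan({3, 1, 7, 0, 4, 1, 6, 3})` yields `{0, 3, 4, 11, 11, 15, 16, 22}`, and `chunked_exclusive_scan` gives the same result as a single-pass scan for any chunk size of 2 or more. On a CPU the padding has no performance effect; it is kept here only to show how the index mapping leaves the scan result unchanged.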

Languages

  • Cuda 77.4%
  • C++ 19.4%
  • C 1.9%
  • Makefile 1.3%