Applications

For an application to run faster on a GPU than on a CPU, it must:

  • be highly parallelizable
  • do a lot of work per input byte, because IO (moving data to and from the device) is very expensive
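The second requirement can be made concrete with a rough cost model. The sketch below is an illustration, not a benchmark: it assumes the CPU time is `ops / cpu_ops_per_s` and the GPU time is the transfer time plus `ops / gpu_ops_per_s`, then solves for the number of operations per transferred byte at which the two are equal. The function name and all rates are hypothetical.

```python
def break_even_ops_per_byte(cpu_ops_per_s, gpu_ops_per_s, bus_bytes_per_s):
    """Ops per transferred byte above which the modeled GPU time beats the CPU.

    Model (an assumption, not a measurement):
        t_cpu = ops / cpu_ops_per_s
        t_gpu = bytes / bus_bytes_per_s + ops / gpu_ops_per_s
    Setting t_cpu == t_gpu and solving for ops / bytes gives the threshold.
    """
    if gpu_ops_per_s <= cpu_ops_per_s:
        return float("inf")  # in this model the GPU never wins
    return (1.0 / bus_bytes_per_s) / (1.0 / cpu_ops_per_s - 1.0 / gpu_ops_per_s)


# Hypothetical round numbers, chosen only for illustration.
threshold = break_even_ops_per_byte(
    cpu_ops_per_s=1e10, gpu_ops_per_s=1e12, bus_bytes_per_s=1e10
)
print(f"break-even: {threshold:.2f} ops per byte")
```

Under these made-up rates, a workload needs roughly one operation per transferred byte before the GPU can pay for the transfer. Real thresholds depend on the hardware and on whether data can stay on the device between kernels.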

Minimal example request: http://stackoverflow.com/questions/7663343/simplest-possible-example-to-show-gpu-outperform-cpu-using-cuda

Specific applications

Matrix multiplication

Major example:

Not surprising, since rendering is essentially a long series of matrix multiplications, with fixed matrices and varying vectors.
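Matrix multiplication also does well on the "work per input byte" criterion, because the work grows faster than the input. A minimal sketch, assuming a naive product of two square `n x n` matrices of 4-byte floats (the function name is hypothetical):

```python
def matmul_ops_per_input_byte(n, bytes_per_element=4):
    """Arithmetic operations per input byte for a naive n x n matrix product."""
    # Each of the n * n output elements needs n multiplications and n - 1 additions.
    ops = n * n * (2 * n - 1)
    # Two input matrices of n * n elements each.
    input_bytes = 2 * n * n * bytes_per_element
    return ops / input_bytes


for n in (16, 256, 4096):
    print(n, matmul_ops_per_input_byte(n))
```

The ratio is `(2n - 1) / 8`, so it grows linearly with `n`: larger matrices amortize the transfer cost better.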

Sparse: http://stackoverflow.com/questions/3438826/sparse-matrix-multiplication-on-gpu-or-cpu

Bolt: a GPU-powered C++ template library by AMD, modeled on the STL: http://developer.amd.com/tools-and-sdks/opencl-zone/bolt-c-template-library/

Other suggestions: http://stackoverflow.com/questions/16438099/high-level-gpu-programming-in-c

Non-applications

Vector addition: too little work per input byte (about 1 CPU cycle per addition), so the IO cost dominates. See:

  • https://forums.khronos.org/showthread.php/7741-CPU-faster-in-vector-addition-than-GPU
  • http://stackoverflow.com/questions/15194798/vector-step-addition-slower-on-cuda
  • http://hpclab.blogspot.fr/2011/09/is-gpu-good-for-large-vector-addition.html
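A back-of-the-envelope estimate shows why. The sketch below gives the GPU its best case by treating its compute time as zero and counting only the transfer of both inputs and the result. The function name and both rates are hypothetical round numbers, not measurements.

```python
def vector_add_times(n, cpu_adds_per_s, bus_bytes_per_s, bytes_per_element=4):
    """Return (cpu_time, gpu_transfer_time) in seconds for adding two vectors.

    gpu_transfer_time only counts moving both inputs to the device and the
    result back; GPU compute time is taken as zero, which is its best case.
    """
    cpu_time = n / cpu_adds_per_s
    transfer_bytes = 3 * n * bytes_per_element  # two inputs + one output
    return cpu_time, transfer_bytes / bus_bytes_per_s


# Hypothetical rates: 2e9 additions/s on the CPU, 1e10 bytes/s over the bus.
cpu_time, transfer_time = vector_add_times(
    n=10**8, cpu_adds_per_s=2e9, bus_bytes_per_s=1e10
)
print(f"CPU: {cpu_time:.3f} s, GPU transfer alone: {transfer_time:.3f} s")
```

With these assumed rates the transfer alone takes longer than the whole CPU computation, and the ratio does not improve as `n` grows, since both times scale linearly with `n`.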

Projects using OpenCL

Master list: https://en.wikipedia.org/wiki/List_of_OpenCL_applications

Notable users:

Application areas

General purpose wrappers
