Deep Learning C++ Libraries

07/01/2020 - Update Single Layer Perceptron
07/03/2020 - Ongoing Variable, cudaMat, Functions
07/05/2020 - Variable
07/07/2020 ~ 07/12/2020 - src/cuda_repo/repo5 for CUDA C
07/15/2020 ~ - src/cuda_repo/repo6 for CUDA C
__ restrict __ in CUDA / Pointer Aliasing
- It is an optimizing way for pointer aliasing
- C/C++ code cannot match FORTRAN performance, pointer aliasing is an important topic to understand when considering optimizations for C/C++ code.
- Two pointers alias if the memory to which they point overlaps. When a compiler can't determine whether pointers alias, it has to assume that they do.
```
void example1(float *a, float *b, float *c, int i) {
        a[i] = a[i] + c[i];
        b[i] = b[i] + c[i];
}
```
- Above simple functions shows why this is potentially harmful to performance.
  - It assumes that c[i] can be reused once it is loaded. Consider the case where a and c point to the same address. In this case the first line modifies the value c[i] when writing to a[i].
  - Therefore, the compiler must generate code to reload c[i] on the second line, in case it has been modified.
  - Because the compiler must conservatively assume the pointers alias, it will compile the above code ineffectively, even if the programmer knows that the pointers never alias.
- C99 standard includes the keyword 'restrict' for use in C. In C++, there is no standard keyword, but most compilers allow the keywords __ restrict __ or __ restrict to be used for the same purpose as restrict in C.
- If we know at compile time that three pointers are not used to access overlapping regions, we can add __ restrict __ to our pointers. It can optimize the inner loop by storing the running sum in a local variable and only writing it once at the end.
- Pointer aliasing is something developers of high-performance code need to be aware of on both the GPU and the CPU, proper use can significantly improve performance.
- Due to potential aliasing, the compiler can't be sure a pointer references read-only data unless the pointer is marked with both const and __ restrict __.
- In this case, there are no redundant memory accesses due to potential pointer aliasing. Each thread reads one element of c and a and writes one element of b. However, because both a and c are read-only, and I know that the data does not overlap, I can add const and __ restrict __ to the code.
- ```
__global__ void example3b(const float* __restrict__ a, float* __restrict__ b, const int*  __restrict__ c) {
      int index = blockIdx.x * blockDim.x + threadIdx.x;
      b[index] = a[c[index]];
}
```
CUDA Environement Comments
- Command environments
nvcc -o main main.cpp -std=c++11

./main

-ldlib -L/usr/local/cuda-9.0/lib64 -lcudnn -lpthread -lcuda -lcudart -lcublas -lcurand -lcusolver -lopencv_core -lopencv_objdetect -lopencv_highgui -lopencv_imgproc -lopencv_videoio -lopencv_imgcodecs -lopencv_dnn -lcudart -lgomp -lm -lstdc++
- CMakeLists environments
-#include_directories(/usr/local/cuda-9.0/targets/x86_64-linux/include) -#include_directories(/usr/local/cuda/targets/x86_64-linux/include) -#include_directories(/usr/local/cuda-9.0/lib64) -#include_directories(/usr/local/cuda/extras/CUPTI/lib64) -#include_directories(/usr/local/cuda-9.0) -#find_package(CUDA REQUIRED) -#include_directories("${CUDA_INCLUDE_DIRS}")
Reference

https://github.com/takezo5096

https://developer.nvidia.com/blog/cuda-pro-tip-optimize-pointer-aliasing/

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
.idea		.idea
cmake-build-debug		cmake-build-debug
src		src
CMakeLists.txt		CMakeLists.txt
README.md		README.md
example_slp.h		example_slp.h
main		main
main.cpp		main.cpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Deep Learning C++ Libraries

About

Uh oh!

Releases

Packages

Languages

jinparksj/deeplearning_cpp_libraries

Folders and files

Latest commit

History

Repository files navigation

Deep Learning C++ Libraries

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages