Faial is a static code analyzer for CUDA kernels that can help you detect data-race free kernels and racy kernels without running the program and for all possible inputs.
- Installing Faial (external link)
- Setting up a GitHub Actions project
- Using command line options
- Adding assumptions
The file saxpy-buggy.cu
is a buggy SAXPY CUDA kernel.
__global__ void saxpy(int n, float a, float *x, float *y)
{
int i = blockIdx.x*blockDim.x + threadIdx.x;
if (i < n) y[i] = a*x[i] + y[i + 1];
}
Running faial-drf saxpy-buggy.cu
shows us the root cause and a program state
that triggers the data-race.
The error report consists of:
- the source location (line 5) of the error, along with both access being highlighted (here underline)
- In the table labelled
Global
, we have the state of thread-global program variables. An important characteristic of thread-global variables, is that threads observe the same value. Variable index represents the array index being accessed. Next, we have the value of variablesblockDim.x
andn
. - Next, we have the runtime state of thread-local variables (table
Locals
). There is one column per thread causing the data-race. In this case one threadthreadIdx.x=0
races with treadthreadIdx.x=1
.
The fixed version of our example is in saxpy.cu
. Here is how these two
files differ:
$ diff -u saxpy-buggy.cu saxpy.cu
--- saxpy-buggy.cu 2021-04-20 10:41:26.317324409 -0400
+++ saxpy.cu 2021-04-19 16:28:24.407379028 -0400
@@ -2,5 +2,5 @@
void saxpy(int n, float a, float *x, float *y)
{
int i = blockIdx.x*blockDim.x + threadIdx.x;
- if (i < n) y[i] = a*x[i] + y[i + 1];
+ if (i < n) y[i] = a*x[i] + y[i];
}
\ No newline at end of file
Running faial-drf saxpy.cu
informs us that there are no data races in all
possible executions.