This is a CUDA-compatible wrapper for Auburn's popular FastNoiseLite library. It allows for the high-performance generation of various noise types directly on the GPU within CUDA kernels.
This port is designed to be a drop-in replacement for the original single-header file, enabling its use in both host (__host__) and device (__device__) code.
The library's behavior changes depending on whether you are compiling a .cu file with NVCC or a standard .cpp file with a host compiler (like GCC/Clang/MSVC). This is crucial for understanding where you can safely call the GetNoise functions.
The reason for this is the NOISE_CONSTANT macro, which places large lookup tables into __constant__ GPU memory when processed by NVCC, but compiles them as regular CPU memory otherwise.
-
In
.cufiles (compiled with NVCC): Any function that uses the lookup tables (likeGetNoise) is prepared by NVCC for device execution. If you call such a function from host code within a.cufile, the CPU will try to access the__constant__memory on the GPU, which is an illegal operation and will not work.- Rule: Inside
.cufiles, only configureFastNoiseLiteon the host. Noise generation must happen inside a__global__kernel.
- Rule: Inside
-
In
.cppfiles (or other non-NVCC compiled sources): When you includeFastNoiseLiteCUDA.hin a standard.cppfile,NOISE_CONSTANTis empty. The lookup tables are just standard global arrays in CPU memory.- Rule: Inside
.cppfiles, you can safely use the entireFastNoiseLiteobject on the host, including callingGetNoise. This allows for CPU-based noise generation for testing or other logic.
- Rule: Inside
Recommendation:
For consistency and to avoid mistakes, the best practice is to configure your FastNoiseLite instance on the CPU and then pass it by value to your kernels for noise generation, as shown in Example 2. Use host-side GetNoise calls from .cpp files only when you have a specific need for them.
To make the library compatible with CUDA, several key changes were made to the original source code:
-
CUDA Function Specifiers: All class methods and helper functions are now decorated with
__device__ __host__(via theNOISE_DHmacro). This allows them to be called from both CPU (host) and GPU (device) code seamlessly. -
Lookup Table Refactoring: The original
Lookupstruct, which was a nested static member of the class, caused compilation issues with NVCC. The compiler struggles with the definition of static device-side members. To resolve this:- The large lookup arrays (for gradients and random vectors) have been moved into a global
detailnamespace. - These arrays are declared in
__constant__memory using theNOISE_CONSTANTmacro. Constant memory is a cached, read-only memory space on the GPU, making it highly efficient for data that is accessed uniformly by all threads in a warp. - A new
FastNoisenamespace now contains theLookupstruct, which safely references these constant memory arrays. This restructuring resolves compilation errors while improving performance on the GPU.
- The large lookup arrays (for gradients and random vectors) have been moved into a global
Include the FastNoiseLiteCUDA.h header in your .cu file. You can then instantiate and use the FastNoiseLite object directly inside your CUDA kernels.
Here is a simple example of a kernel that generates 2D OpenSimplex2 noise for a grid by creating the noise object on the device.
#include "FastNoiseLiteCUDA.h"
__global__ void generate_noise_kernel(float* output, int width, int height)
{
int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;
if (x >= width || y >= height)
{
return;
}
// Create a FastNoiseLite instance on the stack for each thread
FastNoiseLite noise(1337); // Seed
noise.SetNoiseType(FastNoiseLite::NoiseType_OpenSimplex2);
noise.SetFrequency(0.05f);
// Calculate noise value
float noiseValue = noise.GetNoise((float)x, (float)y);
// Write the result to the output array
output[y * width + x] = noiseValue;
}A more common pattern is to configure the noise generator on the host and pass it by value to the kernel.
#include "FastNoiseLiteCUDA.h"
#include <cuda_runtime.h>
// Kernel accepts a configured FastNoiseLite object
__global__ void generate_noise_from_host_config(float* output, int width, int height, FastNoiseLite noise)
{
int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;
if (x >= width || y >= height)
{
return;
}
// Use the pre-configured noise object passed from the host
float noiseValue = noise.GetNoise((float)x, (float)y);
output[y * width + x] = noiseValue;
}
int main()
{
int width = 1024;
int height = 1024;
size_t bufferSize = width * height * sizeof(float);
float* d_output;
cudaMalloc(&d_output, bufferSize);
// 1. Configure FastNoiseLite on the host
FastNoiseLite host_noise_generator;
host_noise_generator.SetNoiseType(FastNoiseLite::NoiseType_Perlin);
host_noise_generator.SetFrequency(0.02f);
host_noise_generator.SetFractalType(FastNoiseLite::FractalType_FBm);
host_noise_generator.SetFractalOctaves(5);
dim3 threadsPerBlock(16, 16);
dim3 numBlocks((width + threadsPerBlock.x - 1) / threadsPerBlock.x, (height + threadsPerBlock.y - 1) / threadsPerBlock.y);
// 2. Pass the configured object by value to the kernel
generate_noise_from_host_config<<<numBlocks, threadsPerBlock>>>(d_output, width, height, host_noise_generator);
// ... copy data back to host and process ...
cudaFree(d_output);
return 0;
}This project is a wrapper and is entirely based on the fantastic work by Jordan Peck (Auburn). All noise generation algorithms and logic belong to the original author.
For more in-depth documentation on the noise algorithms, features, and settings, please refer to the official repository.
This wrapper is distributed under the MIT License, consistent with the original FastNoiseLite library. See the LICENSE file for more detail.