This repository contains Hephflow, a moment-based implementation of the Lattice Boltzmann Method (LBM) for GPU acceleration using CUDA.
Hephflow is the successor to the previous project MR-LBM, and builds upon its concepts and codebase, introducing new features and improvements.
- Project Overview
- Installation Guide
- Simulation
- Post Processing
- File Structure
- Creating a Boundary Case
- Using Voxels Immersed Bodies
- Benchmark
- Gallery
- Publications
- Updates
- License
- Contact
Hephflow implements the moment-based Lattice Boltzmann method, where the collision is carried out across moments from the 0th to the 2nd order.
This project is intended primarily as a proof of concept, with many features still under development.
Hephflow is heavily based on the previous version VISCOPLASTIC-LBM, and therefore shares similar licensing terms and design philosophy.
To compile and run the project, you will need:
- C++ compiler (e.g., MSVC)
- Nvidia GPU drivers
- CUDA toolkit
The CUDA toolkit, provided by Nvidia, includes both the GPU drivers and the necessary CUDA libraries. Ensure that your GPU supports compute capability 3.5 or higher. Note that the software is designed to run on a single GPU, and multi-GPU setups are not currently supported.
To compile the project, you can use the provided bash script. The script includes instructions on how to modify it based on your GPU’s compute capability and the necessary arguments.
- Install the CUDA toolkit from Nvidia's website.
- Modify the
compile.shfile if necessary for your GPU when on Windows.compile_linux.shfor linux machines - Run the script to compile the project.
The program generates binary output files containing macroscopic quantities such as density and velocity, as well as an information file that describes the simulation parameters (lattice size, tau, velocity set, etc.).
To convert these binary files into a format that can be interpreted by other software (e.g., Paraview), a Python post-processing tool is provided.
To process the output binary files, you will need Python 3 and the following dependencies:
- glob
- numpy
- os
- pyevtk
- matplotlib
To convert the binary files to
.vtrfiles (used in Paraview), use the following command:
python exampleVtk.py "PATH_FILES/ID_SIM"| No | File Name | Details |
|---|---|---|
| 1 | main | Main application code |
| 2 | mlbm | Core kernel with streaming-collision operations |
| 3 | var | Simulation parameters |
| 4 | compile.sh | Compilation script (edit for correct CUDA version) |
| 5 | definitions.h | Constants used within LBM |
| 6 | arrayIndex.h | Index calculation for moments |
| 7 | auxFunctions.h | Auxiliary functions |
| 8 | cases | Folder for different simulation cases |
| 9 | cases/bc_definition | Boundary condition definitions for lattices |
| 10 | cases/bc_initialization | Boundary condition flags |
| 11 | cases/constants | Constants specific to the case |
| 12 | cases/flow_initialization | Flow initialization parameters |
| 13 | cases/model | Model parameters (e.g., velocity set, collision model) |
| 14 | cases/output | Data export parameters |
| 15 | colrec | Collision and reconstruction files for moments |
| 16 | includeFiles/popSave | Load population from global memory |
| 17 | includeFiles/popLoad | Save population to global memory |
| 18 | includeFiles/interface | Definition of boundary conditions (wall or periodic) |
| 19 | checkpoint | Functions for generating simulation checkpoints |
| 20 | errorDef | Error handling functions |
| 21 | globalFunctions | Index functions |
| 22 | globalStructs | Structures for device and host |
| 23 | lbmInitialization | Field initialization functions |
| 24 | nnf | Non-Newtonian fluid definitions |
| 25 | nodeTypeMap | Boundary condition node type map |
| 26 | particleTracer | Functions for particle tracing |
| 27 | reduction | Parallel reduction functions |
| 28 | saveData | Functions to save simulation data |
To define a new boundary case, you need to create files within the cases folder. Each case should contain the following files:
bc_definition: Defines the mathematical equations for the moments (0th to 2nd order).bc_initialization: Defines the boundary condition flags.constants: Defines simulation parameters (mesh size, velocity, etc.).flow_initialization: Defines how the flow is initialized.model: Defines model parameters (e.g., velocity set, collision model).output: Defines output parameters for data export.
Additional models like thermal or viscoelastic will require additional files.
To use voxels for immersed bodies, follow these steps:
- Create a CSV file containing the coordinates of the solid nodes.
- Add an
includedirective forVOXEL_FILENAMEin theconstantsfile. - Include
VOXEL_BC_DEFINEin thebc_definition.
The following table summarizes performance benchmarks on various GPUs:
| GPu | sm | GPU Clock (GHz) | Memory Clock (GHz) | Block Size | MLUPs | Observations |
|---|---|---|---|---|---|---|
| RTX 4090 OC | 89 | 3.0 GHz | 1.5 GHz | 8x8x8 | 9075 | |
| RTX 4090 | 89 | 2.8 GHz | 1.3 GHz | 8x8x8 | 7899 | |
| RTX 4060 | 89 | 2.8 GHz | 2.1 GHz | 8x8x8 | 2167 | |
| RTX 4060 | 89 | 2.8 GHz | 2.1 GHz | 8x8x4 | 1932 | |
| --------------- | ---- | ----------------- | -------------------- | ----------------- | ----------- | --------------------------------------- |
| RTX 3060 OC | 86 | 2.0 GHz | 2.0 GHz | 8x8x8 | 3083 | |
| RTX 3060 | 86 | 1.8 GHz | 1.8 GHz | 8x8x8 | 2755 | |
| --------------- | ---- | ----------------- | -------------------- | ----------------- | ----------- | --------------------------------------- |
| A100 | 80 | 1.3 GHz | 1.5 GHz | 16x8x8 (D) | 10243 | |
| A100 | 80 | 1.3 GHz | 1.5 GHz | 8x8x8 | 11390 | |
| --------------- | ---- | ----------------- | -------------------- | ----------------- | ----------- | --------------------------------------- |
| RTX 2060 | 75 | 1.9 GHz | 1.7 GHz | 8x8x8 | 2397 | |
| GTX 1660 | 75 | 1.9 GHz | 2.0 GHz | 8x8x8 | 1323 | |
| GTX 1660 | 75 | 1.9 GHz | 2.0 GHz | 8x8x4 | 1248 | |
| GTX 1660 | 75 | 1.9 GHz | 2.0 GHz | 16x4x4 | 1213 | |
| T600 | 75 | 1.5 GHz | 1.25 GHz | 8x8x8 | 881 | Limited by GPU, Memory controler 84% |
| --------------- | ---- | ----------------- | -------------------- | ----------------- | ----------- | --------------------------------------- |
| K20x | 35 | 0.7 GHz | 1.3 GHz | 8x8x4 | 730 | Limited by GPU, Memory controler 47% |
| K20x | 35 | 0.7 GHz | 1.3 GHz | 8x8x8 | 670 | Limited by GPU, Memory controler 40% |
| K80 | 35 | 0.8 GHz | 1.2 GHz | 8x8x8 | 1142 | Using 1 Chip |
| --------------- | ---- | ----------------- | -------------------- | ----------------- | ----------- | --------------------------------------- |
Currently tested speed using benchmark case FP32, differences may occur due to frequencies and thermal throttle.
- D refers to dynamic allocation of shared memory, which is required to increase the maximum shared memory per block to 48 KB.
In progress
- DOI: 10.1063/5.0209802
- DOI: 10.1016/j.jnnfm.2024.105198
- ResearchGate: Evaluating the Impact of Boundary Conditions on MR-LBM
- DOI: 10.1016/j.jnnfm.2023.105030
- DOI: 10.1002/fld.5185
Hephflow is a direct evolution of the MR-LBM project.
If you are looking for the original MR-LBM implementation, see MR-LBM on GitHub.
The TODO.todo file contains list of other objectives that may or may not be implemented. Either contain items to improve performance or enhance the capabilities of the model.
This software is provided under the GPLv2 license.
For bug reports or issues, please use the GitHub issue tracker for better visibility. You can also contact the maintainers via email at: