This repository has been archived by the owner on Jan 3, 2020. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 15
Understanding the Efficiency of Ray Traversal on GPUs
License
matt77hias/GPURayTraversal
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
Fast GPU Ray Traversal 1.4 -------------------------- Implementation by Tero Karras, Timo Aila, and Samuli Laine Copyright 2009-2012 NVIDIA Corporation This package contains full source code for the fast GPU-based ray traversal routines used in the following paper: "Understanding the Efficiency of Ray Traversal on GPUs", Timo Aila and Samuli Laine, Proc. High-Performance Graphics 2009 http://www.tml.tkk.fi/~timo/publications/aila2009hpg_paper.pdf In addition to the original kernels that were optimized for NVIDIA GTX 285, the package also includes kernels specifically hand-tuned for GTX 480 (GF100/Fermi) and GTX 680 (GK104/Kepler). The results for these GPUs have been published in the following technical report: "Understanding the Efficiency of Ray Traversal on GPUs - Kepler and Fermi Addendum", Timo Aila, Samuli Laine, and Tero Karras, NVIDIA Technical Report NVR-2012-02, http://research.nvidia.com/publication/understanding-efficiency-ray-traversal-gpus-kepler-and-fermi-addendum The accompanying benchmark application and test scenes aim to replicate the published results as accurately as possible, although there are slight differences in the test setup (e.g. BVH builder, CUDA version). See results.txt for details. The source code is licensed under New BSD License (see LICENSE), and hosted by Google Code: http://code.google.com/p/understanding-the-efficiency-of-ray-traversal-on-gpus/ System requirements ------------------- - Microsoft Windows XP, Vista, or 7. - At least 1GB of system memory. - NVIDIA CUDA-compatible GPU with compute capability 1.2 and at least 1.5 gigabytes of RAM. GeForce GTX 480 or GTX 680 is recommended. - Microsoft Visual Studio 2010. Required even if you do not plan to build the source code, as the runtime CUDA compilation mechanism depends on it. Instructions ------------ 1. Install Visual Studio 2010. The Express edition can be downloaded from: http://www.microsoft.com/visualstudio/en-us/products/2010-editions/visual-cpp-express 2. Install the latest NVIDIA GPU drivers and CUDA Toolkit. http://developer.nvidia.com/object/cuda_archive.html 3. Run rt.exe to start the application in interactive mode. The first run executes certain initialization tasks that may take a while to complete. 4. If you get an error during initialization, the most probable explanation is that the application is unable to launch nvcc.exe contained in the CUDA Toolkit. In this case, you should: - Set CUDA_BIN_PATH to point to the CUDA Toolkit "bin" directory, e.g. "set CUDA_BIN_PATH=C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\CUDA\v4.2\bin". - Set CUDA_INC_PATH to point to the CUDA Toolkit "include" directory, e.g. "set CUDA_INC_PATH=C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\CUDA\v4.2\include". - Run vcvars32.bat to setup Visual Studio paths, e.g. "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\vcvars32.bat". 5. Run benchmark.cmd to measure the performance of all test scenes with the same settings that were used in the paper. Expected performance numbers for different GPUs are listed in "results.txt". Note that a 64-bit build is required to benchmark the San Miguel scene. 6. Optional: Build the application yourself. - Open rt.sln in Visual Studio 2010. - Right-click the "rt" project and select "Set as StartUp Project". - Select Release build. Debug build is very slow, especially when sorting secondary rays during the benchmark. - Build and run. Test setup ---------- Camera positions: - Results of each scene are averaged over 5 distinct camera positions. - Camera positions are specified on the command line using "signature strings". - To generate a signature string, click "Show camera controls" in the interactive mode and then "Export camera signature..." Ray generation: - Viewport of 1024x768 pixels. - Primary rays that hit scene geometry generate 32 secondary rays (AO or diffuse) distributed according to a cosine-weighted Halton sequence on the hemisphere. - Primary rays that miss the geometry generate dummy secondary rays that are ignored by the ray traversal kernel and excluded from the rays-per-second figures. - AO rays are of limited length and terminate immediately after encountering an intersection. - Diffuse interreflection rays are very long and continue the traversal until they find the closest intersection. - See src/rt/ray/RayGen.cpp Batches and sorting: - All primary rays are traced in a single launch. - Secondary rays are divided into batches of 2^20 rays, and each batch is traced in a separate launch. - Primary rays are generated according to 2D Morton order in screen space. - Each batch of secondary rays is sorted according to 6D Morton order based on ray origin and direction vectors. - The sorting of is performed by the CPU and is a quite heavy operation, taking roughly one second per batch. It is disabled in the interactive mode. - See src/rt/ray/PixelTable.cpp and src/rt/ray/RayBuffer.cpp Acceleration structure: - AABB-based binary bounding volume hierarchy. - Built using spatial triangle splits (Stich et al.) to improve quality: http://www.nvidia.com/docs/IO/77714/sbvh.pdf - Triangles are represented using Woop's affine triangle transformation: http://www.sven-woop.de/publications/Diplom_SvenWoop_Final.pdf - Memory layout varies between individual traversal kernels. - See src/rt/bvh/SplitBVHBuilder.cpp and src/cuda/CudaBVH.cpp Ray traversal: - Performance results are based on the time spent in ray traversal for the selected ray type. Ray generation and sorting are excluded from the measurements. - The code for launching the traversal kernels can be found in src/cuda/CudaTracer.cpp - The kernels themselves are located in src/rt/kernels: fermi_speculative_while_while Hand-tuned to yield the best performance on GTX 480. Works on older GPUs as well, but is not optimal. kepler_dynamic_fetch Hand-tuned to yield the best performance on GTX 680. Works on older GPUs as well, but is not optimal. tesla_persistent_packet "Persistent packet" kernel from the paper. tesla_persistent_speculative_while_while "Persistent speculative while-while" kernel from the paper. This is the fastest kernel on GTX 285. tesla_persistent_while_while "Persistent while-while" kernel from the paper. Version history --------------- Version 1.4, May 22, 2012 - Include hand-tuned kernels for Kepler-based GPUs. - Improve fermi_speculative_while_while perf using vmin/vmax PTX instructions. - Include San Miguel test scene in the package. - Improve robustness of the BVH builder with degenerate input. - Switch to New BSD License (previously Apache License 2.0). - Upgrade to Visual Studio 2010 (previously 2008). - Fix a CUDA compilation issue with Visual Studio Express. - General bugfixes and improvements to framework. Version 1.3, Jul 08, 2011 - Fix compatibility issues with CUDA 4.0. Version 1.2, Dec 17, 2010 - Fix issues with nvcc path autodetection with CUDA 3.2. Version 1.1, Dec 01, 2010 - Update the codebase to support GF104 and CUDA 3.2. - Speed up ray sorting significantly by utilizing all available CPU cores. - Minor stability improvements. Version 1.0, Jun 29, 2010 - Initial release. Known issues ------------ - When using CUDA 3.2 or later, the performance of device-side code drops slightly in 64-bit builds. This is because CUDA 3.2 disallows "mixed-bitness mode", which we utilize on earlier CUDA versions to get maximum performance. With CUDA 3.2, we must always compile device code with the same bitness as host code, which generally results in higher register pressure in 64-bit builds. For more information, see: http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/CUDA_3.2_Readiness_Tech_Brief.pdf - The mesh importer only supports a limited subset of the Wavefront OBJ file format. If you have trouble importing a mesh, you may want to try enabling WAVEFRONT_DEBUG in src/framework/io/MeshWavefrontIO.cpp. Acknowledgements ---------------- Anat Grynberg and Greg Ward for the Conference room model. University of Utah for the Fairy scene. Marko Dabrovic (www.rna.hr) for the Sibenik cathedral model. Guillermo M. Leal Llaguno (www.evvisual.com) for the San Miguel model.
About
Understanding the Efficiency of Ray Traversal on GPUs
Topics
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published