Skip to content

KelvinYYY/KdTree_Traversal

Repository files navigation

KdTree_Traversal

GPU Traversal of KdTree for N-body problems

This is my senior year undergrad research project. It is an implementation of this paper:

https://engineering.purdue.edu/~milind/docs/sc13.pdf

Basically it uses autorope and lockstep traversal to improve the performance for kdtree traversal on GPU

1.Reference

The kdtree construction method and CPU traversal of kdTree is downloaded from http://nghiaho.com/?p=437 for comparation.

The correctness of the original code is tested by the ANN library. So I assume the result is correct. Then I removed the ANN library and add my autorope traversal method and compare the result, and the result is correct. Since it uses same dataset every time, the result of my implementation is also correct. Then I delete the original traversal and computation function and added my lockstep traversal method, and compare it with the result from autorope. The results are also same.

2.Files and Functions

Main.cu: handle input files and points sizes, call kdtree build function, traversal functions, calculate time for each method(lockstep and autorope).

CUDA_KDtree.cu: Lockstep and autorope traversal for KNN. Kernal launch functions, memory copy function, kdtree transfer to gpu function, distance computation function. Local stack, can only support for up to 50000 data points, but no limit for data points as long as the GPU global memory is enough.

SearchAtNode function: the main traversal function Search function: extra function, will be deleted Search Batch: Kernal function of traversal CUDA_KDTree::CreateKDTree: copy the content of the KDtree created by main function, and then memcpy that to GPU pointer CUDA_KDTree::Search: handle memory allocate, kernal launch, memory copy and transfer back, and free.

CUDA_KDtree1.cu: Lockstep and autorope traversal for KNN. Stack with global memory, significantly slow, but can support large number of data points. Functions similar with CUDA_KDtree.cu

KDtree.cu: the kdtree layout and build functions on CPU.

Timers.cpp: the timer function used

KdTree traversal implementation based on Michael Goldfarb's paper: General Transformations for GPU Execution of Tree Traversals

The result is from 50k points, 100k query points for 5 nearest neighbor.

Result of local stack

capture

Result of Global memory stack

capture1

Tested Result for Autorope and Lockstep Traversal(KNN)

The input data is from dataset directory

knn_10000 knn_10000 knn_10000

Tested code is in GPU_tests folder(compile with CUDA_KDTree_KNN)

Result Compare with CPU(NN using ANN library)

capture

Tested code is in NN_Compare folder

Graph of compute time(Autorope, Lockstep, ANN)

Tested on G2-D3-1000000.csv

capture-2

Compare results for real_dataset: 5 nearest neighbor

file:IHEPC.csv

N: 2075259

autorope: 12.7538 s

lockstep: 13.4201 s

kNN_kdtree: 114.39 s

About

GPU Traversal of KdTree for N-body problems

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published