KdTree_Traversal

GPU Traversal of KdTree for N-body problems

This is my senior year undergrad research project. It is an implementation of this paper:

https://engineering.purdue.edu/~milind/docs/sc13.pdf

Basically it uses autorope and lockstep traversal to improve the performance for kdtree traversal on GPU

1.Reference

The kdtree construction method and CPU traversal of kdTree is downloaded from http://nghiaho.com/?p=437 for comparation.

The correctness of the original code is tested by the ANN library. So I assume the result is correct. Then I removed the ANN library and add my autorope traversal method and compare the result, and the result is correct. Since it uses same dataset every time, the result of my implementation is also correct. Then I delete the original traversal and computation function and added my lockstep traversal method, and compare it with the result from autorope. The results are also same.

2.Files and Functions

Main.cu: handle input files and points sizes, call kdtree build function, traversal functions, calculate time for each method(lockstep and autorope).

CUDA_KDtree.cu: Lockstep and autorope traversal for KNN. Kernal launch functions, memory copy function, kdtree transfer to gpu function, distance computation function. Local stack, can only support for up to 50000 data points, but no limit for data points as long as the GPU global memory is enough.

SearchAtNode function: the main traversal function Search function: extra function, will be deleted Search Batch: Kernal function of traversal CUDA_KDTree::CreateKDTree: copy the content of the KDtree created by main function, and then memcpy that to GPU pointer CUDA_KDTree::Search: handle memory allocate, kernal launch, memory copy and transfer back, and free.

CUDA_KDtree1.cu: Lockstep and autorope traversal for KNN. Stack with global memory, significantly slow, but can support large number of data points. Functions similar with CUDA_KDtree.cu

KDtree.cu: the kdtree layout and build functions on CPU.

Timers.cpp: the timer function used

KdTree traversal implementation based on Michael Goldfarb's paper: General Transformations for GPU Execution of Tree Traversals

The result is from 50k points, 100k query points for 5 nearest neighbor.

Result of local stack

Result of Global memory stack

Tested Result for Autorope and Lockstep Traversal(KNN)

The input data is from dataset directory

Tested code is in GPU_tests folder(compile with CUDA_KDTree_KNN)

Result Compare with CPU(NN using ANN library)

Tested code is in NN_Compare folder

Graph of compute time(Autorope, Lockstep, ANN)

Tested on G2-D3-1000000.csv

Compare results for real_dataset: 5 nearest neighbor

file:IHEPC.csv

N: 2075259

autorope: 12.7538 s

lockstep: 13.4201 s

kNN_kdtree: 114.39 s

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
NN_Comapre		NN_Comapre
CUDA_KDtree_KNN.cu		CUDA_KDtree_KNN.cu
CUDA_KDtree_KNN.h		CUDA_KDtree_KNN.h
Header copy.h		Header copy.h
Header.h		Header.h
KDtree.cu		KDtree.cu
KDtree.h		KDtree.h
README.md		README.md
main.cu		main.cu
times.cpp		times.cpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

KdTree_Traversal

About

Releases

Packages

Languages

KelvinYYY/KdTree_Traversal

Folders and files

Latest commit

History

Repository files navigation

KdTree_Traversal

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages