54 changes: 47 additions & 7 deletions README.md
@@ -1,10 +1,50 @@
# University of Pennsylvania, CIS 565: GPU Programming and Architecture

## Project 1 - Flocking
* Liang Peng
* Tested on: Windows 10, i7-6700HQ @ 2.6GHz, 8GB, GTX 960M (Personal Computer)

## Screenshots
* Rendering
<br><img src="images/flocking.gif" width="500">
* Profiling
<br><img src="images/profiling.PNG" width="500">

## Performance Analysis

### With Visualization

Algorithm | Max Boid Count | Framerate (FPS)
:---:|:---:|:---:
Brute-Force | 5,000 | 60
Scattered Uniform Grid | 55,000 | 60
Coherent Uniform Grid | 120,000 | 60

### Without Visualization

Algorithm | Boid Count | Framerate (FPS)
:---:|:---:|:---:
Brute-Force | 5,000 | 72
Scattered Uniform Grid | 5,000 | 590
Coherent Uniform Grid | 5,000 | 640

### Block Size

Boid Count | Block Size | Framerate (FPS)
:---:|:---:|:---:
50,000 | 16 | 109
50,000 | 32 | 170
50,000 | 64 | 182
50,000 | 128 | 180
50,000 | 256 | 170
50,000 | 1024 | 170

### Conclusion
* When the boid count is small, the framerate stays at 60 FPS. As the boid count increases, the framerate eventually drops below 60 FPS.

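* The algorithm used to update boid positions and velocities has a large influence on simulation performance.
* From 1.2 (brute force) to 2.1 (scattered uniform grid), neighbor search becomes far more efficient: using the grid index information, each boid checks only the boids in nearby grid cells instead of every other boid.
* From 2.1 to 2.3 (coherent uniform grid), performance improves further because the data that neighboring threads access is grouped together in memory, which raises the cache-hit rate.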
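
The grid-based neighbor search described above can be sketched on the CPU. This is a minimal illustration under stated assumptions, not the project's CUDA code: the names `UniformGrid`, `buildGrid`, `candidateNeighbors`, and `reorderCoherent` are hypothetical, each loop stands in for a per-boid GPU kernel, and `std::stable_sort` stands in for a device-side sort by key. `buildGrid` mirrors the scattered grid (boid data is reached through a sorted index array), while `reorderCoherent` copies positions into cell order, as the coherent grid does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical CPU sketch; each loop stands in for one GPU kernel launch.
struct Vec3 { float x, y, z; };

struct UniformGrid {
    Vec3 minCorner;   // lower corner of the simulation volume
    float cellWidth;  // assumed >= the largest neighbor radius, so 27 cells suffice
    int resolution;   // number of cells along each axis

    // Cell coordinate along one axis, clamped into [0, resolution - 1].
    int axisCell(float p, float lo) const {
        int c = static_cast<int>(std::floor((p - lo) / cellWidth));
        return std::max(0, std::min(c, resolution - 1));
    }
    // Flatten (x, y, z) into a 1D cell index, x varying fastest.
    int index3Dto1D(int x, int y, int z) const {
        return x + y * resolution + z * resolution * resolution;
    }
    int cellOf(const Vec3& p) const {
        return index3Dto1D(axisCell(p.x, minCorner.x), axisCell(p.y, minCorner.y),
                           axisCell(p.z, minCorner.z));
    }
};

struct GridTables {
    std::vector<int> boidIndices;  // boid ids sorted by cell index
    std::vector<int> cellStart;    // first slot of each cell in boidIndices, -1 if empty
    std::vector<int> cellEnd;      // one past the last slot of each cell, -1 if empty
};

// Scattered grid: sort boid ids by cell, then record where each cell's run begins and ends.
GridTables buildGrid(const UniformGrid& g, const std::vector<Vec3>& pos) {
    const int n = static_cast<int>(pos.size());
    std::vector<int> cellIdx(n);
    for (int i = 0; i < n; ++i) cellIdx[i] = g.cellOf(pos[i]);

    GridTables t;
    t.boidIndices.resize(n);
    std::iota(t.boidIndices.begin(), t.boidIndices.end(), 0);
    std::stable_sort(t.boidIndices.begin(), t.boidIndices.end(),
                     [&](int a, int b) { return cellIdx[a] < cellIdx[b]; });

    const int numCells = g.resolution * g.resolution * g.resolution;
    t.cellStart.assign(numCells, -1);
    t.cellEnd.assign(numCells, -1);
    for (int k = 0; k < n; ++k) {
        const int c = cellIdx[t.boidIndices[k]];
        if (k == 0 || cellIdx[t.boidIndices[k - 1]] != c) t.cellStart[c] = k;
        if (k == n - 1 || cellIdx[t.boidIndices[k + 1]] != c) t.cellEnd[c] = k + 1;
    }
    return t;
}

// Ids of boids in the query point's cell and its 26 neighbors (includes the query boid itself).
std::vector<int> candidateNeighbors(const UniformGrid& g, const GridTables& t, const Vec3& p) {
    const int cx = g.axisCell(p.x, g.minCorner.x);
    const int cy = g.axisCell(p.y, g.minCorner.y);
    const int cz = g.axisCell(p.z, g.minCorner.z);
    std::vector<int> out;
    for (int z = cz - 1; z <= cz + 1; ++z)
        for (int y = cy - 1; y <= cy + 1; ++y)
            for (int x = cx - 1; x <= cx + 1; ++x) {
                if (x < 0 || y < 0 || z < 0 || x >= g.resolution || y >= g.resolution ||
                    z >= g.resolution)
                    continue;  // outside the grid
                const int c = g.index3Dto1D(x, y, z);
                if (t.cellStart[c] < 0) continue;  // empty cell
                for (int k = t.cellStart[c]; k < t.cellEnd[c]; ++k) out.push_back(t.boidIndices[k]);
            }
    return out;
}

// Coherent grid: copy positions into cell order so each cell's boids are contiguous in memory.
std::vector<Vec3> reorderCoherent(const GridTables& t, const std::vector<Vec3>& pos) {
    std::vector<Vec3> sorted(pos.size());
    for (size_t k = 0; k < pos.size(); ++k) sorted[k] = pos[t.boidIndices[k]];
    return sorted;
}
```

Per boid, the scan shrinks from all N boids to the contents of at most 27 cells. The only difference between the two grid variants here is whether the inner loop reads `pos[t.boidIndices[k]]` (scattered) or `sorted[k]` (coherent); the latter walks memory sequentially.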

* Block size (and therefore block count) has a noticeable impact on performance. As the block size increases and the block count decreases, the framerate rises, peaks at 182 FPS with a block size of 64, and then drops slightly.
* _My speculation:_ If the block size is too small, the block count becomes large. Since each block runs on a single streaming multiprocessor (SM) and the number of SMs is limited, more rounds are needed to process all blocks. A 16-thread block also fills only half of a 32-thread warp, so half of each warp's lanes sit idle. If the block size is too large, the threads of a block compete for the limited cache on their SM, so cached data is replaced more often, the cache-hit rate falls, and performance suffers.
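
One factor behind the poor results at small block sizes can be estimated with a theoretical-occupancy calculation. The sketch below assumes the per-SM limits the CUDA C Programming Guide lists for compute capability 5.x (Maxwell, e.g. the GTX 960M): at most 32 resident blocks, 64 resident warps (2048 threads), and 32-thread warps. It ignores register and shared-memory usage, which also cap occupancy in practice, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Assumed per-SM limits for compute capability 5.x; register and shared-memory limits ignored.
constexpr int kWarpSize = 32;
constexpr int kMaxBlocksPerSM = 32;
constexpr int kMaxWarpsPerSM = 64;  // 64 warps * 32 threads = 2048 threads per SM

// Warps one block occupies; a partial warp still takes a full warp slot.
int warpsPerBlock(int blockSize) { return (blockSize + kWarpSize - 1) / kWarpSize; }

// Resident blocks per SM, bounded by both the block-count limit and the warp limit.
int residentBlocks(int blockSize) {
    return std::min(kMaxBlocksPerSM, kMaxWarpsPerSM / warpsPerBlock(blockSize));
}

// Active threads divided by the SM's maximum resident threads.
double threadOccupancy(int blockSize) {
    return static_cast<double>(residentBlocks(blockSize) * blockSize) /
           (kMaxWarpsPerSM * kWarpSize);
}
```

Under these assumptions, a block size of 16 reaches only 25% occupancy (32 blocks × 16 threads = 512 of 2048) and a block size of 32 reaches 50%, while 64 or more threads per block can fill the SM. This is consistent with the jump from 16 to 64 in the table, but the model does not explain the slight decline at larger block sizes.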
Binary file added images/flocking.gif
Binary file added images/profiling.PNG
2 changes: 1 addition & 1 deletion src/CMakeLists.txt
```diff
@@ -10,5 +10,5 @@ set(SOURCE_FILES
 
 cuda_add_library(src
 ${SOURCE_FILES}
-OPTIONS -arch=sm_20
+OPTIONS -arch=sm_50
 )
```