@thecapablesnakekeeper
Repo Link

Features implemented:

  • CPU: scan, and stream compaction both without and with scan
  • GPU: Naive scan, Work-Efficient scan, Stream compaction
  • Thrust scan
  • Attempted to optimize the work-efficient scan by launching, in each iteration, only as many threads as there are elements being updated
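To make the CPU items concrete, here is a minimal sketch of an exclusive scan and of compaction without and with scan. It assumes, as a hypothetical choice, that compaction removes zero elements; the function names `cpuScan`, `cpuCompactWithoutScan`, and `cpuCompactWithScan` are illustrative and not taken from the repo.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Exclusive prefix sum: out[0] = 0, out[i] = in[0] + ... + in[i-1].
std::vector<int> cpuScan(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;
        running += in[i];
    }
    return out;
}

// Compaction without scan: one sequential pass that keeps nonzero elements
// (hypothetical predicate: "remove zeros").
std::vector<int> cpuCompactWithoutScan(const std::vector<int>& in) {
    std::vector<int> out;
    for (int x : in) {
        if (x != 0) out.push_back(x);
    }
    return out;
}

// Compaction with scan: map each element to a 0/1 flag, exclusive-scan the
// flags to get each kept element's output index, then scatter.
std::vector<int> cpuCompactWithScan(const std::vector<int>& in) {
    std::vector<int> flags(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) flags[i] = in[i] != 0 ? 1 : 0;
    std::vector<int> indices = cpuScan(flags);
    // Output size = last scanned index + last flag.
    const int count = in.empty() ? 0 : indices.back() + flags.back();
    std::vector<int> out(count);
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (flags[i]) out[indices[i]] = in[i];
    }
    return out;
}
```

The scan-based version does more work on a CPU, but its map, scan, and scatter steps are each data-parallel, which is what makes it the natural shape for the GPU compaction.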
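The naive GPU scan can be sketched as a CPU simulation of the Hillis-Steele algorithm, assuming ping-pong buffers as in a typical CUDA version. Each iteration of the inner loop stands in for one GPU thread, and each outer iteration stands in for one kernel launch; `naiveScanExclusive` is a hypothetical name, not the repo's kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CPU simulation of the naive (Hillis-Steele) scan. Reading only from `src`
// and writing only to `dst` mirrors how the GPU version avoids races between
// threads within one level.
std::vector<int> naiveScanExclusive(const std::vector<int>& in) {
    const std::size_t n = in.size();
    std::vector<int> src = in, dst(n);
    for (std::size_t offset = 1; offset < n; offset *= 2) {
        // One "kernel launch" per level: n threads each, O(n log n) adds total.
        for (std::size_t k = 0; k < n; ++k) {
            dst[k] = k >= offset ? src[k - offset] + src[k] : src[k];
        }
        std::swap(src, dst);
    }
    // src holds the inclusive scan; shift right by one for an exclusive scan.
    std::vector<int> out(n);  // out[0] stays 0
    for (std::size_t k = 1; k < n; ++k) out[k] = src[k - 1];
    return out;
}
```

The O(n log n) total work is the reason this variant is called naive compared with the work-efficient scan.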
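The thread-count optimization in the last bullet can be illustrated with a CPU simulation of the work-efficient (Blelloch) scan. This is a sketch under assumptions: the PR does not show its kernels, so one common approach is used, where each level launches only `n / stride` threads and thread `t` computes its target index as `(t + 1) * stride - 1` instead of `n` threads filtering themselves with a modulo check. The names `workEfficientScan`, `nextPow2`, and `upSweepThreads` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Smallest power of two >= n (n >= 1).
std::size_t nextPow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p *= 2;
    return p;
}

// CPU simulation of a work-efficient (Blelloch) exclusive scan. Each inner
// loop iteration stands in for one GPU thread; only n / stride "threads" are
// launched per level. If upSweepThreads is non-null, the per-level up-sweep
// thread counts are recorded.
std::vector<int> workEfficientScan(const std::vector<int>& in,
                                   std::vector<std::size_t>* upSweepThreads = nullptr) {
    if (in.empty()) return {};
    const std::size_t n = nextPow2(in.size());
    std::vector<int> data(n, 0);  // zero padding up to a power of two
    for (std::size_t i = 0; i < in.size(); ++i) data[i] = in[i];

    // Up-sweep (reduce): build partial sums in place.
    for (std::size_t stride = 2; stride <= n; stride *= 2) {
        const std::size_t threads = n / stride;  // halves each level
        if (upSweepThreads) upSweepThreads->push_back(threads);
        for (std::size_t t = 0; t < threads; ++t) {
            const std::size_t idx = (t + 1) * stride - 1;
            data[idx] += data[idx - stride / 2];
        }
    }

    // Down-sweep: clear the root, then push prefix sums back down the tree.
    data[n - 1] = 0;
    for (std::size_t stride = n; stride >= 2; stride /= 2) {
        const std::size_t threads = n / stride;  // doubles each level
        for (std::size_t t = 0; t < threads; ++t) {
            const std::size_t idx = (t + 1) * stride - 1;
            const std::size_t left = idx - stride / 2;
            const int tmp = data[left];
            data[left] = data[idx];
            data[idx] += tmp;
        }
    }
    data.resize(in.size());  // drop the padding
    return data;
}
```

For `n = 8` the up-sweep launches 4, 2, then 1 thread, rather than 8 threads at every level with most of them idle, which reduces wasted warps at the deeper levels.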

Feedback:

  • It would have been good to have us work on making the GPU implementations faster than the CPU ones.
  • This project could also have been a good opportunity to show us how to use profiling tools to identify the different kinds of bottlenecks that can occur, and how to address them once found.
