@utkarshdwivedi3997 utkarshdwivedi3997 commented Sep 20, 2023

Repo Link

Features implemented:

  • CPU: scan, stream compaction with and without scan
  • GPU: Naive scan, Work-Efficient scan, Stream compaction using work-efficient scan
  • Thrust scan
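For readers unfamiliar with the CPU items in the list above, here is a minimal C++ sketch of what they typically look like. It is not the code from this PR: the function names are hypothetical, the scan is assumed to be exclusive, and compaction is assumed to mean "drop the zero elements", none of which the description states.

```cpp
#include <cstddef>
#include <vector>

// Exclusive prefix sum (assumed variant): out[i] = in[0] + ... + in[i-1], out[0] = 0.
std::vector<int> exclusiveScan(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;   // sum of everything before element i
        running += in[i];
    }
    return out;
}

// Compaction without scan: copy the kept elements in order.
// Assumption: "kept" means non-zero.
std::vector<int> compactWithoutScan(const std::vector<int>& in) {
    std::vector<int> out;
    for (int v : in) {
        if (v != 0) out.push_back(v);
    }
    return out;
}

// Compaction with scan: map to 0/1 flags, scan the flags to get each kept
// element's output index, then scatter. This mirrors the structure that a
// GPU version parallelizes step by step.
std::vector<int> compactWithScan(const std::vector<int>& in) {
    const std::size_t n = in.size();
    std::vector<int> flags(n);
    for (std::size_t i = 0; i < n; ++i) {
        flags[i] = (in[i] != 0) ? 1 : 0;
    }
    const std::vector<int> indices = exclusiveScan(flags);
    // Number of kept elements = last exclusive-scan value + last flag.
    const std::size_t count = (n == 0) ? 0 : indices[n - 1] + flags[n - 1];
    std::vector<int> out(count);
    for (std::size_t i = 0; i < n; ++i) {
        if (flags[i]) out[indices[i]] = in[i];
    }
    return out;
}
```

Both compaction variants should return the same result for any input, which makes the scan-free version a convenient reference when checking the scan-based one.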

Note: my naive GPU scan is faster than my work-efficient GPU scan. This makes NO sense to me, and I've spent an exorbitant amount of time debugging it. As far as I can tell my code is correct, so the problem seems to lie elsewhere. The project README lists my theoretical guesses as to why this is happening; I'm adding more information here that seemed out of place in a publicly available project. This is more of an in-class debugging note.
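One way to reason about the comparison is to count work on the CPU. The sketch below simulates both algorithms sequentially and records the total number of additions and the number of passes, where one pass is assumed to stand in for one kernel launch. It is not the PR's kernel code: the names are hypothetical, the scan is assumed to be exclusive, and the work-efficient version assumes a power-of-two input length.

```cpp
#include <cstddef>
#include <vector>

struct ScanStats {
    std::vector<int> result; // exclusive scan of the input
    long adds;               // total additions over all passes
    int passes;              // passes over the array (proxy for kernel launches)
};

// Naive scan: in each pass, element i adds the element `offset` slots to its left.
ScanStats naiveScan(const std::vector<int>& in) {
    const std::size_t n = in.size();
    std::vector<int> cur(in), next(n);
    long adds = 0;
    int passes = 0;
    for (std::size_t offset = 1; offset < n; offset *= 2) {
        for (std::size_t i = 0; i < n; ++i) {
            if (i >= offset) { next[i] = cur[i - offset] + cur[i]; ++adds; }
            else             { next[i] = cur[i]; }
        }
        cur.swap(next);
        ++passes;
    }
    // The loop yields an inclusive scan; shift right to make it exclusive.
    std::vector<int> out(n, 0);
    for (std::size_t i = 1; i < n; ++i) out[i] = cur[i - 1];
    return {out, adds, passes};
}

// Work-efficient scan: up-sweep (reduce) then down-sweep, in place.
// Assumption: data.size() is a power of two.
ScanStats workEfficientScan(std::vector<int> data) {
    const std::size_t n = data.size();
    long adds = 0;
    int passes = 0;
    for (std::size_t stride = 1; stride < n; stride *= 2) {        // up-sweep
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            data[i] += data[i - stride];
            ++adds;
        }
        ++passes;
    }
    if (n > 0) data[n - 1] = 0;                                    // clear the root
    for (std::size_t stride = n / 2; stride >= 1; stride /= 2) {   // down-sweep
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            const int left = data[i - stride];
            data[i - stride] = data[i];
            data[i] += left;
            ++adds;
        }
        ++passes;
    }
    return {data, adds, passes};
}
```

For an 8-element input the naive version performs 17 additions in 3 passes, while the work-efficient version performs 14 additions in 6 passes. The work-efficient scan does fewer additions but needs twice as many passes, and most of its passes touch only a small fraction of the array. Whether that accounts for the timings reported here is not established by this sketch; it only shows that fewer total additions do not automatically mean fewer launches.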

After a lot of debugging, I eventually ran the deviceQuery sample from Nvidia's cuda-samples and found a CUDA version mismatch: my CUDA driver version was 12.1 but my CUDA runtime version was 11.8. I thought this might explain the weird behaviour of the naive vs. work-efficient algorithms, so I updated both the driver and the runtime to CUDA 12.2 and re-ran my project. The problem was still there. I'm not sure what is happening; if there is an explanation for this, I'd appreciate hearing it.
