Project 2: Utkarsh Dwivedi #23
Open
Repo Link
Features implemented:
Note: my naive GPU scan is faster than the work-efficient GPU scan. This makes NO sense to me, and I've spent an exorbitant amount of time debugging it. My code is correct, so the problem seems to be somewhere else. I've listed my guesses about why this is happening in the project README, but I'm adding more information here that I felt made less sense to share in a publicly available project. This is more of an in-class debugging note.
After a lot of debugging, I eventually ran the deviceQuery sample from NVIDIA's cuda-samples and found a CUDA version mismatch: my CUDA driver version was 12.1, but my CUDA runtime version was 11.8. I thought this might explain the weird behaviour of the naive vs. work-efficient algorithms, so I updated both the driver and the runtime to CUDA 12.2 and re-ran the project. The problem was still there. I'm not sure what is happening; if there's any explanation for this, I'd appreciate it.
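One commonly cited reason a work-efficient (Blelloch-style up-sweep/down-sweep) scan can run slower than a naive scan on the GPU is thread count, not arithmetic. If every up-sweep and down-sweep level launches one thread per element and most threads exit early because their index is not a multiple of the stride, the work-efficient version launches about twice as many threads as the naive one. This is offered only as a hypothesis. The sketch below is a back-of-the-envelope model in Python, not a reproduction of this project's kernels. Its function names and its assumption that the kernels launch `n` threads per level are hypothetical. It counts threads launched and adds performed for `n = 2^k` elements, and it ignores memory traffic, launch overhead, and warp divergence.

```python
import math


def naive_scan_counts(n):
    """Naive (Hillis-Steele style) scan over n = 2^k elements.

    Hypothetical model: one thread is launched per element at each of the
    log2(n) levels. At level d (1-based), only elements with index >= 2^(d-1)
    perform an add. Returns (threads_launched, adds_performed).
    """
    levels = int(math.log2(n))
    threads = n * levels
    adds = sum(n - 2 ** (d - 1) for d in range(1, levels + 1))
    return threads, adds


def work_efficient_counts(n, compact_threads):
    """Blelloch-style up-sweep + down-sweep over n = 2^k elements.

    At up-sweep level d (0-based), n / 2^(d+1) tree nodes do one add; the
    down-sweep visits the same nodes in reverse order. If compact_threads is
    False, assume n threads are launched at every level and most exit early;
    if True, assume only the active threads are launched.
    Returns (threads_launched, adds_performed).
    """
    levels = int(math.log2(n))
    active_per_level = [n >> (d + 1) for d in range(levels)]
    adds = 2 * sum(active_per_level)  # up-sweep adds + down-sweep adds
    if compact_threads:
        threads = adds  # one thread per active node per level
    else:
        threads = 2 * n * levels  # n threads per level, 2 * log2(n) levels
    return threads, adds


if __name__ == "__main__":
    n = 2 ** 20
    print("naive:", naive_scan_counts(n))
    print("work-efficient, n threads/level:", work_efficient_counts(n, False))
    print("work-efficient, active threads only:", work_efficient_counts(n, True))
```

Under this model and with `n = 2^20`, the naive scan launches about 21 million threads. The uncompacted work-efficient scan launches about 42 million, even though it performs roughly a tenth of the adds. Launching only the active threads at each level, with the index computed from the thread ID and the stride, is the usual way to let the lower operation count show up in timings.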