- Update to the cudart version of the occupancy API
- Unified memory examples
- Consider abstraction for shared memory
- Make nbody example work without CUDA?
- Consider launch_bounds support...
- Multi-dimensional thread/block accessors
- Add a version of parallel_for that takes an ExecutionPolicy
- Combine tests into a small number of binaries
- Add streams to ExecutionPolicy
- Add tests for cudaLaunch with and without nvcc
- Add tests for other APIs
- Provide portable utility functions for cudaDeviceReset, etc.
- Fix/rename index accessors
- Move accessors to device_api.h