CodeFest Jlab January 2018
maddyscientist edited this page Jan 13, 2018 · 7 revisions
- Different precision for halo and body (`colorspinor::FieldOrder`)
- Add support for 8-bit fixed point in QUDA by adding a new `QUDA_QUARTER_PRECISION`
- 8-bit halos for the smoother (combining the two items above)
- Multi-right-hand-side MG setup for fine and coarse grids (bigger effect on coarse grids)
- Add support for non-Hermitian chronological prediction
- Investigate stability of chronological subspace evolution (over-refinement issues seen on pure gauge?)
- Try CG for null-space finding?
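As a rough illustration of what 8-bit fixed point storage could look like, here is a minimal sketch assuming a per-site scale factor (the largest component magnitude) stored next to the 8-bit values. The names `QuarterSite`, `quantize`, and `dequantize` are hypothetical and are not QUDA's API; the actual field layout and normalization in QUDA may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container (not a QUDA type): one float scale per site
// plus the site's components stored as signed 8-bit integers.
struct QuarterSite {
  float scale;                    // max |component| on this site
  std::vector<std::int8_t> data;  // components mapped to [-127, 127]
};

// Quantize one site's real components to 8-bit fixed point.
QuarterSite quantize(const std::vector<float>& v) {
  assert(!v.empty());
  float maxAbs = 0.0f;
  for (float x : v) maxAbs = std::max(maxAbs, std::fabs(x));

  QuarterSite q{maxAbs, std::vector<std::int8_t>(v.size(), 0)};
  if (maxAbs == 0.0f) return q;  // all-zero site: nothing to scale

  for (std::size_t i = 0; i < v.size(); ++i) {
    // v[i] / maxAbs lies in [-1, 1], so the rounded value fits in int8.
    q.data[i] = static_cast<std::int8_t>(std::lround(v[i] / maxAbs * 127.0f));
  }
  return q;
}

// Recover approximate floats from the 8-bit representation.
std::vector<float> dequantize(const QuarterSite& q) {
  std::vector<float> v(q.data.size());
  for (std::size_t i = 0; i < v.size(); ++i) {
    v[i] = q.scale * static_cast<float>(q.data[i]) / 127.0f;
  }
  return v;
}
```

With this scheme the round-trip error per component is bounded by roughly `scale / 254`, which is why such a format is more plausible for halos and smoothers than for the outer solver.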
Memory reduction strategies:
- Thrust memory allocations don't seem to be routed through QUDA's allocators
- remove the fp32 null-space temporary during prolongator construction
- use the same smoother for pre- and post-smoothing
- can chrono vectors be in single precision?
- run the GCR in half precision?
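To get a feel for what dropping chrono vectors from double to single precision would save, here is a small back-of-the-envelope helper. It assumes Wilson-type spinors with 24 real numbers per site (4 spins x 3 colors x complex); the vector count and local volume used below are hypothetical, and the function `vectorSetBytes` is illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to store nVec vectors, each with `sites` lattice sites
// and `realsPerSite` real numbers per site, at `bytesPerReal` precision.
// Ignores padding, ghost zones, and any per-site norm arrays.
constexpr std::size_t vectorSetBytes(std::size_t nVec, std::size_t sites,
                                     std::size_t realsPerSite,
                                     std::size_t bytesPerReal) {
  return nVec * sites * realsPerSite * bytesPerReal;
}
```

For example, `vectorSetBytes(8, 16*16*16*16, 24, 8)` versus `vectorSetBytes(8, 16*16*16*16, 24, 4)` compares 8 chrono vectors on a 16^4 local volume in double and single precision; the single-precision set takes half the memory.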
The copy gauge kernels are not using fine-grained parallelization and hence run very slowly. E.g., for 2^4, Nc=24, copy ghost takes 4 ms per direction on a P100 vs. 10 µs for the coarse dslash.
Update: after applying fine-grained parallelization, these kernels now run in 10-30 µs. Problem fixed!
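The idea behind fine-grained parallelization can be sketched as follows: instead of one thread per lattice site, each thread handles a single matrix element. This sketch assumes 2^4 means a 16-site volume and that each site carries one Nc x Nc matrix per direction (real coarse links also carry spin indices, omitted here). The names `FineIndex`, `decompose`, and the item-count helpers are hypothetical and do not correspond to QUDA's kernels.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical index triple identifying one matrix element on one site.
struct FineIndex {
  std::size_t site;
  std::size_t row;
  std::size_t col;
};

// Independent work items when parallelizing over sites only.
constexpr std::size_t siteOnlyItems(std::size_t sites) { return sites; }

// Independent work items when parallelizing over (site, row, col).
constexpr std::size_t fineGrainedItems(std::size_t sites, std::size_t nc) {
  return sites * nc * nc;
}

// Map a flat thread index to (site, row, col), with col running fastest.
inline FineIndex decompose(std::size_t tid, std::size_t nc) {
  assert(nc > 0);
  FineIndex idx;
  idx.col = tid % nc;
  idx.row = (tid / nc) % nc;
  idx.site = tid / (nc * nc);
  return idx;
}
```

Under these assumptions, a 16-site volume with Nc=24 exposes only 16 work items when threading over sites, but 16 * 24 * 24 = 9216 when threading over matrix elements, which is far closer to what a GPU like the P100 needs to stay busy.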