Venado optimizations #297

mewall · 2025-01-14T23:24:22Z

Venado optimizations with many new functionalities added

Notable revisions:

o gpmdk example code for graph-based QMD using the kernel method
o updates to use the new bml_transpose Fortran interface
o additional PROGRESS subroutines to support gpmdk functionalities
o bug fixes
o modifications to decrease memory allocations, speeding up the code


mewall and others added 30 commits January 14, 2025 15:57

Add gpmdk for the Venado hackathon

9d12c58

Add build script for hackathon

f2a3c08

Update bml submodule

3efdf37

Update bml submodule

8c1bcab

Add electrons.dat to latteTBparams

48b8022

Add MPI barrier to pin down performance issue

05314a5

Add venado build, env, and run scripts

ff1bd08

Bug fix

7525e35

Add TrpCage example for gpmdk

17ad747

Fixed line truncations

639473f

Added sedacs partition and field induced forces

61a5be8

Added main for field

a61c7a6

NVTX tags

f62b952

Introducing nvtx tags and some optimizations

c1aaa69

Debug resizing and resize two more arrays

e3f9a44

Resize zqt array and use new bml_transpose_inplace to avoid allocation

2c63bed

Move response into gpmdk in preparation for GPU kernel

a044282

Add some useful build scripts

dfeb3a1

Update bml submodule

1aebd59

Reduce allocations in hcsf method

15594cf

Update build script

11695ec

Modifications to support the new bml_transpose Fortran API

f1d152c

Add nvtx tags for charges and thread charge calculation

6195172

Add nvtx tag methods to gpmdk

2c72a40

Debug cray build and work on omp offload

2c7afd7

Working omp offload for gpmdcov_response

0ae84e1

Bug fix

468bca1

Allocate smaller array for work in kernel

07f9dcb

Code decorations to investigate MPI imbalance

5edf99b

Build updates

b941d17

mewall and others added 30 commits January 14, 2025 15:59

Eliminate debug output of FORCESS

6a2cb5f

Update bml submodule

6722bda

Update bml submodule and move build scripts to scripts/ dir

87cf0c8

breaking lines

63828d5

fixing gpmd.py

52e7fd6

Add more nvtx tags

23c3af7

Added err_var

69d937a

Workaround for Cray matmul bug in gpmdk

a4dc3a2

Report max time and rank for dH+dS

26f34d8

Fix kernel bug. Better matmul fix.

e2b232c

o Kernel before rank update was being used. Fix improves SCF error
o Fix the matmul equation by specifying index range on lhs

Clean up comments and begin working on offload of get_dH_or_dS_vect

fde467f

Added protections against double alloc

59a45c5

Prepare for OMP offload optimization

49154ca

Fix bug in graph update

9fff9ab

Use MAGMA pointer in offload response kernel

0315f8c

Working on openACC

9b5b4bd

Working offload of response using magma pointer

829eb87

fixing compilation bug gfortran

578dec1

Added voltage option

bdc7776

Added dos

7779af8

changes for coarse MD

c5dc023

Update bml submodule

7b2ece1

Added formatting statements to prg_system_mod XYZ trajectory writing code to fix issue with XYZ coordinates being printed across two lines instead of on a single line

bfd58e4

Opts for nvidia build

8a35116

o Pass subarrays of ham and over to get_skblock
o Change get_hsmat nested loop to enable collapse(2)

In particular, the collapse() clause improves performance
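For readers unfamiliar with the clause, here is a minimal sketch of what `collapse(2)` does on a perfectly nested loop pair. It is written in C rather than the project's Fortran, and `fill_pairwise` and its formula are hypothetical stand-ins, not the actual get_hsmat code. The point is only that both loops must be directly nested, with no statements between them, before the clause can merge them into one iteration space.

```c
#include <stddef.h>

/* Hypothetical example: fill an n-by-n row-major matrix where each entry
 * depends only on its (i, j) indices.
 *
 * collapse(2) merges the two loops into a single iteration space of n*n
 * iterations, which gives the runtime more parallel work to distribute
 * than parallelizing the outer loop alone. The loops must be perfectly
 * nested: no statements may sit between the two for-headers. */
void fill_pairwise(double *h, int n)
{
    #pragma omp parallel for collapse(2)
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            int d = (i > j) ? (i - j) : (j - i);
            h[(size_t)i * n + j] = 1.0 / (1.0 + (double)d);
        }
    }
}
```

Compiled without OpenMP support, the pragma is ignored and the loops run serially with the same result.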

NVIDIA opt get_hsmat

89f9eaa

o Modify get_skblock_inplace to assume zero elements on entry
o Modify get_hsmat to zero the ham and over matrices before calc
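The zero-on-entry contract described in this commit can be sketched as follows. This is a C analog under stated assumptions: `fill_block_inplace` and `build_matrix` are hypothetical names, and the diagonal-only block is a placeholder for the real Slater-Koster block. The caller clears the full matrix once, so the per-block routine writes only its nonzero entries and skips clearing on every call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callee: writes only the nonzero entries of one bs-by-bs
 * block (here just its diagonal) into the parent row-major matrix with
 * leading dimension ld. It never clears the block, so it is correct only
 * if the caller zeroed those elements beforehand. */
void fill_block_inplace(double *m, int ld, int row0, int col0, int bs,
                        double diag)
{
    for (int k = 0; k < bs; ++k)
        m[(size_t)(row0 + k) * ld + (col0 + k)] = diag;
}

/* Hypothetical caller: zero the whole n-by-n matrix once, then fill each
 * block in place. Assumes n is a multiple of bs. All-bits-zero is 0.0 for
 * IEEE 754 doubles, so memset is sufficient on such platforms. */
void build_matrix(double *m, int n, int bs)
{
    memset(m, 0, (size_t)n * n * sizeof *m);
    for (int i = 0; i < n; i += bs)
        for (int j = 0; j < n; j += bs)
            fill_block_inplace(m, n, i, j, bs, (double)(i / bs + j / bs + 1));
}
```

The trade-off is that the callee is no longer safe to call on an uninitialized buffer, which is why the zeroing moves into the caller in the same change.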

NVIDIA opt hsderivative

305dbc9

o Create get_dH_or_dS using new get_skblock_inplace method
  - Initialize matrices with zero
o Use new method in gpmdk
o Performance is now on par with cray vectorized method

OpenACC accelerated nonorthocoul

cb20d3d

o Rewrite CPU code to eliminate packs
o Offload the calc for nvidia build

Add variables to select MD steps for nsys profiling

33cf90a

Disable bisection in gpmd. Nonorthocoul opts.

0933417

Fix merge error

03ef662


mewall commented Jan 14, 2025 (edited by nicolasbock)
