CCSD T2_8 w/ just DGEMM #882
@jeffhammond the code is not ready for the case when
okay, sorry, i thought i tested without but apparently i didn't clean the environment. i'll fix it.
No problem. My only suggestion is to run GitHub Actions in your own fork first, so that you avoid burning GitHub Actions cycles that could be used by other pull requests and/or scheduled tests.
@jeffhammond Merge is dangerous. This is what has caused repository corruption in the past.
Could you try the following?
As an alternative, I could do the
I thought the GitHub update would do a rebase. I'll fix it.
Actually, feel free to fix it. It's late here.
37be848 to 8d8ad2e
Done.
Do you agree this is ready for review? It seems to pass CI on my end.
I have added
What about threading control of the threaded BLAS now used by these kernels? Is the number of threads set anywhere? (Line 180 in 7b06d34)
No, I haven't made that change yet, because nobody will ever run that code. I wrote it as a prototype at Intel, and with Intel OpenMP, MKL will figure out the threading properly. I will clean it up eventually, probably by removing the code in ccsd_kernels.F altogether.
CCSD_T2_8 should never have used transpose/sort and I should have done this 15 years ago, but here we are.
Karol wrote the transpose-free loop version, but it's equivalent to DGEMM. This change uses DGEMM.
For the ICSD/NTS version, I just pulled in the whole routine from CCSD because it's much cleaner.
In single-node testing, this appears to be 20-30% faster (0.4s versus 0.6s for H2O cc-pVQZ).
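To illustrate why a transpose-free loop nest can be equivalent to DGEMM, here is a minimal Python sketch. It assumes a ladder-type contraction R[a,b,i,j] = sum over c,d of V[a,b,c,d] * T[c,d,i,j] with row-major flat storage; the index layout, the function names, and the plain-Python `gemm` (standing in for a BLAS DGEMM call) are all illustrative assumptions, not the actual NWChem kernel. When the contracted indices (c,d) are contiguous in both operands, grouping (a,b), (c,d) and (i,j) into compound indices turns the contraction into a single matrix product with no sorting step.

```python
import random


def contract_loops(v, t, nv, no):
    """R[a,b,i,j] = sum_{c,d} V[a,b,c,d] * T[c,d,i,j] using explicit index loops."""
    r = [0.0] * (nv * nv * no * no)
    for a in range(nv):
        for b in range(nv):
            for i in range(no):
                for j in range(no):
                    acc = 0.0
                    for c in range(nv):
                        for d in range(nv):
                            acc += (v[((a * nv + b) * nv + c) * nv + d]
                                    * t[((c * nv + d) * no + i) * no + j])
                    r[((a * nv + b) * no + i) * no + j] = acc
    return r


def gemm(m, n, k, a, b):
    """Row-major C = A * B with A (m x k) and B (k x n); stand-in for DGEMM."""
    c = [0.0] * (m * n)
    for row in range(m):
        for p in range(k):
            a_rp = a[row * k + p]
            for col in range(n):
                c[row * n + col] += a_rp * b[p * n + col]
    return c


def contract_gemm(v, t, nv, no):
    """Same contraction as one matrix product: (ab) x (cd) times (cd) x (ij)."""
    return gemm(nv * nv, no * no, nv * nv, v, t)


def max_abs_diff(x, y):
    """Largest elementwise absolute difference between two flat arrays."""
    return max(abs(p - q) for p, q in zip(x, y))


if __name__ == "__main__":
    rng = random.Random(0)
    nv, no = 4, 3  # hypothetical virtual / occupied dimensions
    v = [rng.uniform(-1.0, 1.0) for _ in range(nv ** 4)]
    t = [rng.uniform(-1.0, 1.0) for _ in range(nv * nv * no * no)]
    print(max_abs_diff(contract_loops(v, t, nv, no), contract_gemm(v, t, nv, no)))
```

In a real code the `gemm` helper would be replaced by a call into an optimized BLAS, which is where the speedup would come from; the sketch only checks that the two formulations produce the same numbers.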