-
Notifications
You must be signed in to change notification settings - Fork 39
TREXIO + MCSCF wavefunction #92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Thank you @NastaMauger ! I noticed that you do not register the CI determinants as part of your new |
I guess the best test of the validity of the produced multi-configurational TREXIO wavefunction is via an attempt to import and use it in some software that handles such wave functions (e.g. QP2)? @NastaMauger @scemama |
@@ -434,10 +464,6 @@ def det_to_trexio(mcscf, norb, nelec, filename, backend='h5', ci_threshold=0., c | |||
with trexio.File(filename, 'u', back_end=_mode(backend)) as tf: | |||
if trexio.has_determinant(tf): | |||
trexio.delete_determinant(tf) | |||
trexio.write_mo_num(tf, mo_num) | |||
trexio.write_electron_up_num(tf, len(a)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this removed? We need the number of up and down electrons to perform the CI determinants "correctness" check before writing them in the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now handled by the _mcscf_to_trexio
function, which calls _mol_to_trexio
.
Reshape write_eri function to ensusre conssitency between other codes mcscf_function directly register the determinant list now
|
For the ERIs - I agree, @kgasperich submitted a draft PR which takes care of the AO/MO inconsistency. |
Hello, I am following this issue as it's a feature I really want to add. After discussing with @q-posev and @scemama, we observed that although we correctly registered all necessary data into TREXIO, there is still a difference between the CASSCF energy from PySCF and the exported TREXIO file. Upon examining the convergence of the energy with respect to the CAS space, it seems there may be a hidden selected CI procedure within the CASSCF routine. @sunqm Could you please confirm and provide guidance on how to correctly register the CASCI wavefunction if some selected CI is implicitly performed? Best |
There is no selected CI procedure in CASSCF that I am aware of. There is a procedure to approximate the CI step during the orbital optimization, but that should have no effect on the evaluation of the CASCI energy, which is entirely determined by the MO coefficients, one- and two-electron integrals, and CI vectors. Could you elaborate on what you are seeing? |
Are you able to reproduce the CASSCF energy for small systems like H2 molecule with (2,2) CAS? |
I think the differences I got stems for different version I was using. Now it should be ok |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not an expert of PySCF, but on the TREXIO side everything looks OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NastaMauger thank you!
However, test_trexio
is failing in the CI. Have you tried to run pytest
locally on your branch? Maybe it's because of the modifications in the mol_from_trexio
?
In general, did you manage to reproduce the CI energy between QP2 and PySCF via TREXIO using the code in this branch? That could be a good test to add to test_trexio.py
here if you or @scemama have a small CIPSI wavefunction in the TREXIO format.
@@ -329,6 +365,12 @@ def write_eri(eri, filename, backend='h5'): | |||
idx[:,:,2:] = idx_pair[None,:,:] | |||
idx = idx[np.tril_indices(npair)] | |||
|
|||
idx=idx.reshape((num_integrals,4)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to reshape the integrals here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is due to the difference between chemist and physicist notation.
QP2 uses the physicist's notation, whereas PySCF uses the chemist's.
There is a PR where Kevin is modifying this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, I think there should be a possibility with TREXIO to register data in one format or another, and to have an HDF5 field indicating which format the data is stored in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you test this part?
I am asking because
- I see that you swap the indices to go from one notation into another, but you do not swap the
eri
array accordingly (which remains in the original chemist notation of PySCF). - I see that
read_eri
part is untouched. So when reading the ERI's in physicist notation, one may encounter issues as PySCF may expect chemist notation. So the the index swap might be needed inread_eri
too.
I guess one of these points leads to tests failing in the CI. But maybe I missed something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@q-posev This is something I worked on with Anthony during a Zoom call a while ago because we noticed some weird behavior. I'm not sure if it's still necessary, so I left it here as a comment just in case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding your point 2: exactly. That's why I think something should be done in the TREXIO format to ensure that the formats are properly registered and can be used seamlessly across different software.
I never really had the opportunity to finish this, and I know Kevin is working on something similar on the QP2 side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a small comment in case someone wants to follow this. If it turns out to be useless, I guess it can easily be removed by either you or Anthony in the next PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if we enforce one format or another in TREXIO, the user can still confuse them and accidentally export the wrong info. But we have trexio-tools
package which can be used to validate some TREXIO fields and people can contribute more robust tests to the repo (e.g. computing the HF energy from the one- and two-electron integrals)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, I think there should be a possibility with TREXIO to register data in one format or another, and to have an HDF5 field indicating which format the data is stored in.
The whole point of TREXIO is to have a single and consistent way of doing things so the data is easy to interpret. We chose physicist's notation because chemist's notation can't generalize to more than 2-electron integrals. Reordering indices from 1,2,3,4 to 1,3,2,4 is not really a big issue...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@q-posev This is something I worked on with Anthony during a Zoom call a while ago because we noticed some weird behavior. I'm not sure if it's still necessary, so I left it here as a comment just in case.
In this zoom call we realized we had to reorder the indices because we checked the enrgy with quantum package. So we fixed the ordering for the export. I don't recall we checked importing integrals into pyscf. The reordering should also be done in the reading part.
pyscf/tools/trexio.py
Outdated
from trexio_tools.group_tools import determinant as trexio_det | ||
|
||
mo_num = norb | ||
ncore = mcscf.ncore | ||
int64_num = int((mo_num - 1) / 64) + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user registers the determinant in a new/different HDF5 file, this will not work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible that in previous versions of trexio you could bypass the need of storing the number of MOs in the trexio file, but it was because some internal consistency checking was lacking in TREXIO. In the current version, you can't write determinants if you don't write the number of MOs in the MO group to ensure that the determinants you store are consistent. So it should be totally equivalent.
I would rather use the library function call than compute it, because it will always return the value expected by the library. Here it is possible that user makes a mistake, and that norb is not the same as the value stored in the mo group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I guess, I need to update TREXIO on my own laptop because int64_num = trexio.get_int64_num(trexio_file)
does not work on my end
I have no clue why this CI failed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NastaMauger thanks! Indeed, the CI was failing before because of the broken ERI import. Now it's fixed and the CI failed only for Python 3.12 for the reasons unknown to me and unrelated to TREXIO, which is good news :-)
So if you ask me, I think this PR can be merged, even though the ERIs are exported in the chemist notation and not in the physicist one at the moment. I believe this issue can be addressed in a separate PR.
What do you think @sunqm @MatthewRHermes @scemama ?
Please do not merge this PR yet !!! I think this PR should not be merged until the ERIs are written in phisicist notation, as is specified in the trexio format. Here PySCF is only compatible with itself, are we are back to the problem that trexio was created for. It is dangerous if people start to write wrong trexio files and start to write interfaces from those wrong files!!! |
If you want to merge quickly, please re-activate the correct reordering for ERI indices for exporting, and set the importing of ERIs as |
Ok @scemama, this is exactly what I was thinking. The correct ERIs are missing, and without them, it's useless to have a QP2 (or any other) test export. I'm a bit confused about this part: "set the importing of ERIs as |
pyscf/tools/trexio.py
Outdated
# x = idx[:,0]*(idx[:,0]+1)//2 + idx[:,1] | ||
# y = idx[:,2]*(idx[:,2]+1)//2 + idx[:,3] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, NotImplementedError
breaks the tests. So it was not a good idea.
You can reactivate the read_eri
as it was before, replacing those two lines with
x = idx[:,0]*(idx[:,0]+1)//2 + idx[:,2]
y = idx[:,1]*(idx[:,1]+1)//2 + idx[:,3]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@scemama, , are the modifications you requested included in my PR? For some reason, I can’t see your comments. :( |
TREXIO CI is green now! I think this PR is ready to be merged |
@sunqm We believe that this PR is finally ready to be merged. Would it be possible to get another round of review from one of the |
@sunqm I have no clue why version 3.12 is having issues with the check, since the error arises in a part I haven't modified at all. |
Hello,
In this PR, you will find modifications that allow users to register an MCSCF function in the TREXIO format.
I am uncertain if additional data must be stored for TREXIO to function correctly with other software and to return the corresponding wave function produced by PySCF before registering it into TREXIO. However, the corresponding HDF5 file generated using the following code:
seems to register the data correctly.
Please let me know if further modifications or improvements are necessary. I am also uncertain whether the one-electron MO group is essential, as mentione here. I would also appreciate any feedback on whether the
write_eri
function needs to be modified to handle CASCI/CASSCF, particularly for the corresponding active spaceBest