Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve unphysical (greater than 1) occupancy handling in CifParser and add missing site label if not check_occu #3819

Merged
merged 21 commits into from
May 30, 2024

Conversation

DanielYang59
Copy link
Contributor

@DanielYang59 DanielYang59 commented May 11, 2024

Summary

To fix #3816.

  • Raise ValueError if occupancy exceeds occupancy_tolerance and if check_occu, with a more descriptive error message.
  • [Need Confirmation] Allow any occupancy (ignore occupancy_tolerance) in cif when not check_occu in 7cf5ba9, a warning would still be raised.
  • Fix a bug where site label is lost if not check_occu by 05c11d4.

A more descriptive error message for unphysical sites

With currently implementation (with check_occu = False), ValueError for occupancy beyond tolerance would be raised by the checker inside Structure (repored in #3816), with a misleading error message (Some occupancies ([2.0, 2.0, 2.0, 1.0, 1.0]) sum to > 1), where it should not be > 1, but > occupancy_tolerance instead.

Clarify check_occu

pymatgen/pymatgen/io/cif.py

Lines 1263 to 1266 in bb68c78

check_occu (bool): Whether to check site for unphysical occupancy > 1.
Useful for experimental results in which occupancy was allowed to
refine to unphysical values. Warning: unphysical occupancies are
incompatible with many pymatgen features. Defaults to True.

The name of check_occu does not reflect its functionality. With current implementation, occupancy would be checked regardless of the value of check_occu.

pymatgen/pymatgen/io/cif.py

Lines 1067 to 1081 in bb68c78

# Check occupancy
_sum_occupancies: list[float] = [
sum(comp.values())
for comp in coord_to_species.values()
if set(comp.elements) != {Element("O"), Element("H")}
]
if any(occu > 1 for occu in _sum_occupancies):
msg = (
f"Some occupancies ({_sum_occupancies}) sum to > 1! If they are within "
"the occupancy_tolerance, they will be rescaled. "
f"The current occupancy_tolerance is set to: {self._occupancy_tolerance}"
)
warnings.warn(msg)
self.warnings.append(msg)

We would need to clarify its behavior in docstring.

@@ -735,10 +735,10 @@ def test_bad_cif(self):
filepath = f"{TEST_FILES_DIR}/cif/bad_occu.cif"
parser = CifParser(filepath)
with pytest.raises(
ValueError, match="No structure parsed for section 1 in CIF.\nSpecies occupancies sum to more than 1!"
Copy link
Contributor Author

@DanielYang59 DanielYang59 May 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous error message might be misleading, as this fails because the occupancy is greater than tolerance, not 1.

@@ -1196,8 +1203,7 @@ def parse_structures(
"in the CIF file as is. If you want the primitive cell, please set primitive=True explicitly.",
UserWarning,
)
if not check_occu: # added in https://github.com/materialsproject/pymatgen/pull/2836
warnings.warn("Structures with unphysical site occupancies are not compatible with many pymatgen features.")
Copy link
Contributor Author

@DanielYang59 DanielYang59 May 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This warning should not be raised just because if not check_occu, it should really check the occupancy and only warn if the occupancy is "unphysical".

The following code has already done this, so I would suggest removing it:

pymatgen/pymatgen/io/cif.py

Lines 1011 to 1018 in 2e1c301

if any(occu > 1 for occu in sum_occu):
msg = (
f"Some occupancies ({sum_occu}) sum to > 1! If they are within "
"the occupancy_tolerance, they will be rescaled. "
f"The current occupancy_tolerance is set to: {self._occupancy_tolerance}"
)
warnings.warn(msg)
self.warnings.append(msg)

@DanielYang59 DanielYang59 marked this pull request as ready for review May 11, 2024 09:21
@DanielYang59
Copy link
Contributor Author

@janosh. Can you please review (and comment) on this? Thanks.

@DanielYang59 DanielYang59 changed the title Improve occupancy handling in io.cif.CifParser Improve occupancy (greater than 1) handling in CifParser May 12, 2024
@DanielYang59 DanielYang59 changed the title Improve occupancy (greater than 1) handling in CifParser Improve unphysical (greater than 1) occupancy handling in CifParser May 13, 2024
@DanielYang59 DanielYang59 changed the title Improve unphysical (greater than 1) occupancy handling in CifParser Improve unphysical (greater than 1) occupancy handling in CifParser and add missing site label if not check_occu May 15, 2024
@DanielYang59 DanielYang59 marked this pull request as draft May 16, 2024 00:52
@DanielYang59 DanielYang59 marked this pull request as ready for review May 16, 2024 03:07
pymatgen/io/cif.py Outdated Show resolved Hide resolved
pymatgen/io/cif.py Outdated Show resolved Hide resolved
@DanielYang59 DanielYang59 marked this pull request as draft May 16, 2024 04:57
@@ -1149,7 +1149,10 @@ def get_matching_coord(
all_species_noedit = all_species.copy() # save copy before scaling in case of check_occu=False, used below
for idx, species in enumerate(all_species):
total_occu = sum(species.values())
if 1 < total_occu <= self._occupancy_tolerance:
if check_occu and total_occu > self._occupancy_tolerance:
raise ValueError(f"Occupancy {total_occu} exceeded tolerance.")
Copy link
Contributor Author

@DanielYang59 DanielYang59 May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would help capture the occupancy > tolerance error.

Without this, if an occupancy is greater than tolerance, it would not be scaled, and passed directly into Structure. The Exception raised by Structure (because of unphysical occupancy) would be compressed and replaced with a general message which doesn't show the reason for failure:

pymatgen/pymatgen/io/cif.py

Lines 1291 to 1301 in bb68c78

for idx, data in enumerate(self._cif.data.values()):
try:
if struct := self._get_structure(data, primitive, symmetrized, check_occu=check_occu):
structures.append(struct)
except (KeyError, ValueError) as exc:
msg = f"No structure parsed for section {idx + 1} in CIF.\n{exc}"
if on_error == "raise":
raise ValueError(msg) from exc
if on_error == "warn":
warnings.warn(msg)

For example the following error message provided in #3816:

Some occupancies ([2.0, 2.0, 2.0, 1.0, 1.0]) sum to > 1! If they are within the occupancy_tolerance, they will be rescaled. The current occupancy_tolerance is set to: 1.0
No structure parsed for section 1 in CIF.
...
in CifParser.parse_structures(self, primitive, symmetrized, check_occu, on_error)
   1219     warnings.warn("Issues encountered while parsing CIF: " + "\n".join(self.warnings))
   1221 if len(structures) == 0:
-> 1222     raise ValueError("Invalid CIF file with no structures!")
   1223 return structures

ValueError: Invalid CIF file with no structures!

@DanielYang59 DanielYang59 marked this pull request as ready for review May 16, 2024 08:44
@shyuep shyuep merged commit a772ddb into materialsproject:master May 30, 2024
23 checks passed
@DanielYang59 DanielYang59 deleted the fix-3816-cif branch May 30, 2024 23:53
@janosh janosh added needs discussion Needs discussion to agree on actionable next steps ux User experience cif Crystallographic information file labels May 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cif Crystallographic information file needs discussion Needs discussion to agree on actionable next steps ux User experience
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Argument check_occu is not working as intended (in _get_structure and parse_structures)
4 participants