Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip observables contained in particle groups #4615

Merged

Conversation

PythonFZ
Copy link
Contributor

@PythonFZ PythonFZ commented Jun 15, 2024

This is not a fix for #4598 but enables reading files that contain data in

observables
 \-- <group1>
 |    \-- <observable2>
 |         \-- step: Integer[variable]
 |         \-- time: Float[variable]
 |         \-- value: <type>[variable][D][D]

Changes made in this Pull Request:

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Developers certificate of origin


📚 Documentation preview 📚: https://mdanalysis--4615.org.readthedocs.build/en/4615/

This file hasn't been touched in 9 months and I don't think there are some open PRs so there would be a chance to format the entire file using ruff which I accidently did here faafe8b - is this of interest?

@pep8speaks
Copy link

pep8speaks commented Jun 15, 2024

Hello @PythonFZ! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2024-07-15 13:51:03 UTC

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on GitHub Discussions so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS as part of this PR.

Copy link

github-actions bot commented Jun 15, 2024

Linter Bot Results:

Hi @PythonFZ! Thanks for making this PR. We linted your code and found the following:

Some issues were found with the formatting of your code.

Code Location Outcome
main package ⚠️ Possible failure
testsuite ⚠️ Possible failure

Please have a look at the darker-main-code and darker-test-code steps here for more details: https://github.com/MDAnalysis/mdanalysis/actions/runs/9940672284/job/27457930405


Please note: The black linter is purely informational, you can safely ignore these outcomes if there are no flake8 failures!

Copy link

codecov bot commented Jun 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.59%. Comparing base (cfda8b7) to head (66b9ecb).
Report is 29 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #4615      +/-   ##
===========================================
- Coverage    93.61%   93.59%   -0.02%     
===========================================
  Files          171      183      +12     
  Lines        21243    22316    +1073     
  Branches      3934     3936       +2     
===========================================
+ Hits         19886    20886    +1000     
- Misses         898      971      +73     
  Partials       459      459              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@PythonFZ PythonFZ force-pushed the h5md-skip-per-particles-observables branch from faafe8b to 078d168 Compare June 15, 2024 07:52
@orbeckst
Copy link
Member

@edisj and @ljwoods2 could you have a quick look at this PR and comment?

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of using the logger but logging at every step is potentially a lot of output and performance hit. Furthermore, I don’t think users care. Thus I suggest you remove the logging and just pass.

We also need at least one test case.

@orbeckst
Copy link
Member

@hmacdope would you be able to shepherd this PR? If not switch it to me.

prec = 3
ext = 'h5md'

@pytest.fixture(scope='class')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be mistaken, but I think the test_n_frames method isn't testing the issue- if the reader was going to except out due to a key error, this would happen during the line in the universe fixture:

reader = H5MDReader(H5MD_energy, convert_units=True)

when the H5MDReader's __init__ is called and the first call to _read_next_timestep is made. The test method doesn't call any code that reads a frame, it just ends up calling the reader's n_frames method. I agree with your approach to fixing the issue, just maybe have the test initialize or initialize and iterate through the reader so that the potential issue will occur in the scope of the test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also make sure to update the CHANGELOG!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed now. The CI failing seems not related to this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nitpick- can you change the name of the test to make it represent what it is testing? Also, the assert statement seems unrelated to what the test is testing. Maybe write an except and pytest.fail statement upon raising a KeyError but otherwise don't have an assert statement. Here's an example from testsuite/coordinates/base.py:

    def test_frame_jump_issue1942(self, ref, reader):
        """Test for issue 1942 (especially XDR on macOS)"""
        reader.rewind()
        try:
            for ii in range(ref.n_frames + 2):
                reader[0]
        except StopIteration:
            pytest.fail("Frame-seeking wrongly iterated (#1942)")

Otherwise LGTM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the test and used pytest.fail. I hope the test structure now is a bit cleaner. Thanks for the review.

Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! This is a good start until we decide what to do with the other form of observables.

@@ -231,6 +232,7 @@ class MockH5pyFile:
HAS_H5PY = True



Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undo blank line change.

Comment on lines 912 to 915
try:
_ = H5MDReader(H5MD_energy, convert_units=True)
except KeyError:
pytest.fail("Could not read H5MD file with observables.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just have

_ = H5MDReader(H5MD_energy, convert_units=True)
As the body of the test, no need to try, except. Pytest can catch all the errors.

See also comment about warnings, we should catch suggested warning.

package/MDAnalysis/coordinates/H5MD.py Show resolved Hide resolved
Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW @PythonFZ the following will successfully read into ts.data with the key form group/observable

    def _copy_to_data(self):
        """assigns values to keys in data dictionary"""

        if "observables" in self._file:
            for key in self._file["observables"].keys():
                try:
                    # if has value as subkey read directly into data
                    if "value" in self._file["observables"][key]:
                        self.ts.data[key] = self._file["observables"][key][
                            "value"
                        ][self._frame]
                    # if value is not a subkey, read dict of subkeys
                    else:
                        for subkey in self._file["observables"][key].keys():
                            self.ts.data[key + "/" + subkey] = self._file["observables"][key][
                                subkey
                            ]["value"][self._frame]
                except KeyError:
                    warnings.warn(f"Could not read {key} from observables group, not a legal H5MD observable specification or data not ")

@PythonFZ
Copy link
Contributor Author

@hmacdope What is the expected way to auto-format the code? I'm used to running either black or ruff format but once I run that on the code base it will yield to changes in most of the files. I haven't found anything at https://userguide.mdanalysis.org/stable/contributing.html?

Maybe there are some pre-commit hooks I haven't found?

@@ -372,6 +373,7 @@
TPR_xvf = (_data_ref / 'cobrotoxin.tpr').as_posix()
TRR_xvf = (_data_ref / 'cobrotoxin.trr').as_posix()
H5MD_xvf = (_data_ref / 'cobrotoxin.h5md').as_posix()
H5MD_energy = (_data_ref / 'cu.h5').as_posix()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it the convention for other h5md files out there to use the *.h5 extension rather than *.h5md?

Copy link
Contributor Author

@PythonFZ PythonFZ Jun 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can rename the file to *.h5md. I've seen both and personally prefer *.h5 because I was lazy and did not configure the VS Code H5WEB extension to also read *.h5md. Afaik H5MD is the only format specification for H5 files, so there is no risk of confusion.

Edit: I haven't seen a recommendation for the *.h5md suffix in the H5MD definition so I would actually prefer keeping the more general *.h5.

@hmacdope
Copy link
Member

@PythonFZ lets go with .h5md as I think the missing extension may be causing some test failures. Other than that this is very close.

@PythonFZ
Copy link
Contributor Author

@PythonFZ lets go with .h5md as I think the missing extension may be causing some test failures. Other than that this is very close.

I renamed the files and ensured everything is properly formatted. I hope everything is good to go?

@PythonFZ PythonFZ requested a review from hmacdope June 27, 2024 18:34
@hmacdope
Copy link
Member

There seems to be some failing test on the azure runner. Can your reproduce locally @PythonFZ ?

@PythonFZ
Copy link
Contributor Author

There seems to be some failing test on the azure runner. Can your reproduce locally @PythonFZ ?

All tests are passing for me locally. From what I see in the logs, it is the following test (correct me if I am wrong)

/home/vsts/work/1/s/testsuite/MDAnalysisTests/parallelism/test_multiprocessing.py

All tests from that file run locally for me. Also, these tests should not have been affected by this PR and the tests I changed also all run locally. The same seems to be true for the GH hosted runners. I'm a bit lost here on how to fix this, because I don't know the differences between the Azure runner and the GH runners.

@orbeckst
Copy link
Member

orbeckst commented Jun 28, 2024 via email

@hmacdope
Copy link
Member

Let me kick CI IIRC there were some failing tests on mdanalysis-CI (Azure_Tests Linux-Python310-64bit-full-wheel)

Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congrats on your first contribution @PythonFZ! Much appreciated. Will be useful with some exciting H5MD developments around the corner (CC @ljwoods2).

@hmacdope hmacdope enabled auto-merge (squash) July 2, 2024 03:35
@hmacdope hmacdope disabled auto-merge July 2, 2024 03:35
@hmacdope
Copy link
Member

hmacdope commented Jul 2, 2024

@orbeckst are you happy with changes here?

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good overall, just two docs/comments requests.

"observables"
][key][subkey]["value"][self._frame]
except KeyError:
warnings.warn(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's not legal, shouldn't we be failing? Or is the reasoning that we don't overly care about observables as opposed to x/v/f?

Perhaps add a comment explaining your rationale.

Also, given that this is not tested, add a #pragma: nocover (or whatever you have to do to show that this code block isn't covered). It will tell anyone else reading the code that it's un-tested.

Tested would be better, of course, but if we don't care about observables then I don't insist on a test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be in favor of raising an Error here as well. Shall I create an additional h5md file which contains wrong data to test against this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be ideal!

package/MDAnalysis/coordinates/H5MD.py Show resolved Hide resolved
@orbeckst
Copy link
Member

orbeckst commented Jul 9, 2024

@PythonFZ if you can have a quick look at my comments then we should be able to merge this PR soon.

@PythonFZ
Copy link
Contributor Author

@orbeckst I've added a file that tests for AttributeError. I can add another file for testing against KeyError but this is an unlikely case. The code will account for this though. Let me know if you prefer the additional file or are good with this.

I'm not sure how to account for

Add this limitation to the documentation.

If you are referring to

We should issue a warning that observables of the form observables/group1/observable1 are being ignored if present.

This should not longer be the case and all valid, non-fixed time step, H5MD datasets should work.

@orbeckst
Copy link
Member

lgtm, thanks for adding the test @PythonFZ !

@orbeckst orbeckst merged commit 1bf58cc into MDAnalysis:develop Jul 17, 2024
19 of 23 checks passed
@orbeckst
Copy link
Member

Thank you for your contribution @PythonFZ ! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants