Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pathlib object handling for Universe, SingleFrameReaderBase and Toplogy parsers (Issue #3937) #4535

Open
wants to merge 18 commits into
base: develop
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 16 commits
Commits
Show all changes
18 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions package/CHANGELOG
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,9 @@ The rules for this file:
-------------------------------------------------------------------------------
??/??/?? IAlibay, HeetVekariya, marinegor, lilyminium, RMeli,
ljwoods2, aditya292002, pstaerk, PicoCentauri, BFedder,
tyler.je.reddy, SampurnaM, leonwehrhan, kainszs, orionarcher

tyler.je.reddy, SampurnaM, leonwehrhan, kainszs, orionarcher,
talagayev, hmacdope

* 2.8.0

Fixes
Expand All @@ -39,6 +40,7 @@ Fixes
* Fix groups.py doctests using sphinx directives (Issue #3925, PR #4374)

Enhancements
* Handling of Pathlib.Path in SingleFrameReaderBase, Topology and Universe (Issue #3937)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit vague and doesn't actually tell me what is fixed or what is now possible. e.g. Is it that pathlib.Paths are now correctly handled?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, originally it was only targeting pathlib object handling of SingleFrameReaderBase in specific cases, but with the adjustment of @hmacdope it would cover Pathlib.Paths for all cases if the errros that appear currently due to the changes in the base.py in topology and also the base.py in coordinates, which could be fixed with

elif hasattr(filename, 'topology'):
self.filename = filename

but this would just exclude the cases of topology so I am not sure if this would fix it and cover then all the Pathlib.Path cases

* Added a tqdm progress bar for `MDAnalysis.analysis.pca.PCA.transform()`
(PR #4531)
* Improved performance of PDBWriter (Issue #2785, PR #4472)
Expand Down
6 changes: 5 additions & 1 deletion package/MDAnalysis/coordinates/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -1663,7 +1663,11 @@ class SingleFrameReaderBase(ProtoReader):
def __init__(self, filename, convert_units=True, n_atoms=None, **kwargs):
super(SingleFrameReaderBase, self).__init__()

self.filename = filename
if isinstance(filename, NamedStream):
self.filename = filename
else:
self.filename = str(filename)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here errors if we use self.filename = str(filename) in the test_mmtf.py
The lead to errors similar to this one:

FileNotFoundError: [Errno 2] No such file or directory: '<mmtf.api.mmtf_reader.MMTFDecoder object at 0x135416780>'

I managed to fix the error by adding:
else:
self.filename = str(filename)
if "MMTF" in self.filename:
self.filename = filename

basically putting a specific condition for MMTF cases, not the cleanest approach, but maybe this helps @hmacdope fixing the error in a better way :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to look more into it now, the "MMTF" is a specific case, the other errors appear mainly because of the parmed parser which has both problems with the base.py in the coordinates and topology. It is possible to use in the base.py in the coordinates:

elif hasattr(filename, 'topology'):
self.filename = filename

This helps to fix the errors, although it is quite similar as to just directly using self.filename = filename without any if/elif/else conditions. For the base.py in topology I will try to figure out if there is any option to fix it, since again having there only self.filename = filename also fixes the errors with the parmed parser

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@talagayev I think the FileNotFoundError for mmtf above is because instead of a filename, sometimes a MMTFDecoder object is encountered. I think what you've got is fixing it, but it might be clearer to have a block like:

if isinstance(filename, NamedStream):
    self.filename = filename
elif 'mmtf' in str(filename.__class__):
    # mmtf case, where we've avoided a mmtf import in detection
    self.filename = filename

Where you've enumerated all the special cases explicitly in each elif branch, rather than hiding some special cases inside other branches

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, yes the option that I got worked, but was definitely not the best option, but yes was connected with mmtf and yes elif 'mmtf' in str(filename.__class__): looks much better :)


self.convert_units = convert_units

self.n_frames = 1
Expand Down
4 changes: 4 additions & 0 deletions package/MDAnalysis/core/universe.py
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,7 @@
import warnings
import contextlib
import collections
import pathlib

import MDAnalysis
import sys
Expand Down Expand Up @@ -99,6 +100,9 @@ def _check_file_like(topology):
else:
_name = None
return NamedStream(topology, _name)

elif isinstance(topology, pathlib.Path):
return str(topology)
return topology

def _topology_from_file_like(topology_file, topology_format=None,
Expand Down
8 changes: 7 additions & 1 deletion package/MDAnalysis/topology/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@
from ..lib import util



class _Topologymeta(type):
"""Internal: Topology Parser registration voodoo

Expand Down Expand Up @@ -115,7 +116,12 @@ class TopologyReaderBase(IOBase, metaclass=_Topologymeta):
Added keyword 'universe' to pass to Atom creation.
"""
def __init__(self, filename):
self.filename = filename

if isinstance(filename, util.NamedStream):
self.filename = filename
else:
self.filename = str(filename)


def parse(self, **kwargs): # pragma: no cover
raise NotImplementedError("Override this in each subclass")
Expand Down
18 changes: 18 additions & 0 deletions testsuite/MDAnalysisTests/coordinates/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,12 +25,14 @@

import numpy as np
import pytest
from pathlib import Path
from unittest import TestCase
from numpy.testing import (assert_equal, assert_almost_equal,
assert_array_almost_equal, assert_allclose)

import MDAnalysis as mda
from MDAnalysis.coordinates.timestep import Timestep
from MDAnalysis.coordinates.memory import MemoryReader
from MDAnalysis.transformations import translate


Expand Down Expand Up @@ -126,6 +128,12 @@ def test_pickle_singleframe_reader(self):
assert_equal(reader.ts, reader_p.ts,
"Single-frame timestep is changed after pickling")

def test_pathlib_input_single(self):
path = Path(self.filename)
u_str = mda.Universe(self.filename)
u_path = mda.Universe(path)
assert u_str.atoms.n_atoms == u_path.atoms.n_atoms


class BaseReference(object):
def __init__(self):
Expand Down Expand Up @@ -521,6 +529,16 @@ def test_timeseries_atomgroup_asel_mutex(self, reader):
atoms = mda.Universe(reader.filename).select_atoms("index 1")
with pytest.raises(ValueError, match="Cannot provide both"):
timeseries = reader.timeseries(atomgroup=atoms, asel=atoms, order='fac')

def test_pathlib_input_base(self, reader):
if isinstance(reader, MemoryReader):
if isinstance(reader, MemoryReader):
skip_reason = "MemoryReader"
pytest.skip(f"Skipping test for Pathlib input with reason: {skip_reason}")
path = Path(reader.filename)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here adding an additional Line:
path = path.as_posix())

Fixes the following Failed Pytest in 4 cases:
AttributeError: 'PosixPath' object has no attribute 'encode'

u_str = mda.Universe(reader.filename)
u_path = mda.Universe(path)
assert u_str.atoms.n_atoms == u_path.atoms.n_atoms


class MultiframeReaderTest(BaseReaderTest):
Expand Down
1 change: 0 additions & 1 deletion testsuite/MDAnalysisTests/coordinates/test_gro.py
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,6 @@
assert_equal,
)
import pytest
from io import StringIO


class TestGROReaderOld(RefAdK):
Expand Down
8 changes: 8 additions & 0 deletions testsuite/MDAnalysisTests/topology/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -100,3 +100,11 @@ def test_creates_universe(self, filename):
"""Check that Universe works with this Parser"""
u = mda.Universe(filename)
assert isinstance(u, mda.Universe)

def test_pathlib_input(self, filename):
"""Check that pathlib.Path objects are accepted"""
import pathlib
path = pathlib.Path(filename)
u_str = mda.Universe(filename)
u_path = mda.Universe(path)
assert u_str.atoms.n_atoms == u_path.atoms.n_atoms
Loading