-
Notifications
You must be signed in to change notification settings - Fork 661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/dssp #4304
Feature/dssp #4304
Conversation
Hello @marinegor! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2024-06-13 05:15:32 UTC |
Linter Bot Results:Hi @marinegor! Thanks for making this PR. We linted your code and found the following: Some issues were found with the formatting of your code.
Please have a look at the Please note: The |
There's a problem I don't know the workaround for yet: prolines. In the original
However, in my implementation all prolines (and N-terminal residue) are assigned '-' (=unstructured loop). This is likely less correct than original implementation, but I'm not sure how to fix that.
|
…cture in trajectory
The prolines problem is now dealt with, exactly as I described above -- if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks already very promising.
For an initial quick review please see inline comments.
Additional requests
- Fix the tests: AttributeError: module 'MDAnalysisTests.datafiles' has no attribute 'DSSP'
- We also need a test with a trajectory.
- Compress the PDF files with bz2 or gz.
- Update CHANGELOG.
- Add a
.. versionadded:: 2.7.0
to the docs. - Fix PEP8 complaints.
Initially distinguishing H,E,- is good but eventually it would be good to distinguish other secondary structures, too, in particular 310 and π helices.
package/MDAnalysis/analysis/dssp.py
Outdated
h3 = h3 * ~np.roll(helix4, -1, 1) * ~helix4 # helix4 is higher prioritized | ||
h5 = h5 * ~np.roll(helix4, -1, 1) * ~helix4 # helix4 is higher prioritized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get 3_10 and π helix, too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not certain, but seems like yes (used this paper's Fig 1):
from MDAnalysis.analysis.dssp import DSSP
import MDAnalysis as mda
u = mda.Universe('./1FUR.pdb')
r = DSSP(u).run()
r.results.resids[151:158]
# array([155, 156, 157, 158, 159, 160, 161])
''.join(r.results.dssp[0])[151:158]
# 'HHHH-HH'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it can detect 3-10 and pi helixes indeed -- I generated idealized structures with mmtbx
from here and ran DSSP on them:
dssp/o_beta_seq.pdb --------------------
dssp/o_helix310_seq.pdb -HHHHHHHHHHHHHHHHHH-
dssp/o_helix_seq.pdb -HHHHHHHHHHHHHHHHHH-
o_helix_pi_seq.pdb -HHHHHHHHHHHHHHHHHH-
manual test a top level, should not be in source
@IAlibay would you mind taking a quick final glance at the DSSP PR? I think your comments were addressed but would prefer you confirm instead of me dismissing a totally valid review. Would be great if we could finally get that one in. |
@IAlibay any comments on this one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the long review delay.
I just have the one docstring confusion, once addressed then it's good to go.
Tagging @orbeckst here - please do dismiss my review if this gets addressed and I don't get a chance to re-review.
Hi @IAlibay -- couldn't follow your link, maybe you explain in more detail? I updated class's documentation slightly to match the description of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@marinegor and @IAlibay I think I understand the point of your discussion: As far as I can tell, I agree with @IAlibay that the current ValueError
check does not exactly check that there's exactly one HN. If we could change the code then that would be safer, please see comments.
I think I also found a potential issue with how _heavy_atoms
and _hydrogens
are constructed. Please check.
self._heavy_atoms: dict[str, "AtomGroup"] = { | ||
t: ag.atoms[ | ||
np.isin( | ||
ag.names, t.split() | ||
) # need split() since `np.isin` takes an iterable as second argument | ||
# and "N".split() -> ["N"] | ||
] | ||
for t in heavyatom_names | ||
} | ||
self._hydrogens: list["AtomGroup"] = [ | ||
res.atoms.select_atoms(f"name {hydrogen_name}") for res in ag.residues | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the following code I see that there's the assumption that the order of atoms in _heavy_atoms["CA"]
is the same as in _hydrogens
.
However, that may not be true: _hydrogens
is created from a select_atoms()
which always sorts the atoms by index in increasing order whereas ag's in _heavy_atoms
are created by look up and slicing. If the original input ag
is NOT ORDERED then _hydrogens
will be ORDERED whereas _heavy_atoms
remain in the CUSTOM ORDER of the original ag. At least that's what it looks like to me.
It may be safer to just sort the original ag
first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be safer to just sort the original ag first.
but there's line 274 that says:
ag: AtomGroup = atoms.select_atoms("protein")
and then all operations are on this ag
. I even think that you suggested this, but I'm not entirely sure :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right: ag
is guaranteed to be ordered.
Maybe add a comment somewhere that the code is guaranteed to work because ag
is ordered. Just in case someone later wants to make it more general (eg so that one doesn't have to rely on MDAnalysis's definition of "protein").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment on Dr. Richard Hipp's obsession with code comments https://discord.com/channels/807348386012987462/807712498249236520/1237819983833337917
@marinegor do you have any question regarding my last review? Please ping me if you need feedback or want to discuss anything. |
Co-authored-by: Oliver Beckstein <orbeckst@gmail.com>
@orbeckst sorry for some silence -- I don't have any questions, and also have added few lines that improve the guess-hydrogens check, following your suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am happy! Great work, @marinegor !
self._heavy_atoms: dict[str, "AtomGroup"] = { | ||
t: ag.atoms[ | ||
np.isin( | ||
ag.names, t.split() | ||
) # need split() since `np.isin` takes an iterable as second argument | ||
# and "N".split() -> ["N"] | ||
] | ||
for t in heavyatom_names | ||
} | ||
self._hydrogens: list["AtomGroup"] = [ | ||
res.atoms.select_atoms(f"name {hydrogen_name}") for res in ag.residues | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right: ag
is guaranteed to be ordered.
Maybe add a comment somewhere that the code is guaranteed to work because ag
is ordered. Just in case someone later wants to make it more general (eg so that one doesn't have to rely on MDAnalysis's definition of "protein").
self._heavy_atoms: dict[str, "AtomGroup"] = { | ||
t: ag.atoms[ | ||
np.isin( | ||
ag.names, t.split() | ||
) # need split() since `np.isin` takes an iterable as second argument | ||
# and "N".split() -> ["N"] | ||
] | ||
for t in heavyatom_names | ||
} | ||
self._hydrogens: list["AtomGroup"] = [ | ||
res.atoms.select_atoms(f"name {hydrogen_name}") for res in ag.residues | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my comment on Dr. Richard Hipp's obsession with code comments https://discord.com/channels/807348386012987462/807712498249236520/1237819983833337917
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(just adding doc fixes) still ✅
@IAlibay I resolved most of the comments that you had because @marinegor had addressed them. There are only three or so for you to quickly look at. Would be great to get it shipped once you are satisfied. Cheers! |
- corrected docs for DSSP.results.dssp_ndarray and results.dssp - added docs for DSSP.results.resids - more reST fixes and updates
Congratulations @marinegor , a big effort merged and one of the oldest outstanding issues closed! Well done. Thank you for all your work! |
Thanks for your support and guidance, that was fun!
|
Fixes #1612
Changes made in this Pull Request:
MDAnalysis.analysis.dssp.DSSP
class for secondary structure analysis, using code implemented inpydssp
package available for secondary structure annotationPR Checklist
Developers certificate of origin
📚 Documentation preview 📚: https://mdanalysis--4304.org.readthedocs.build/en/4304/