-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use unique string IDs #142
Conversation
The following BGC attributes are updated: - product_prediction - mibig_bgc_class - smiles
Parameters before "/" are positional-only parameters, see https://docs.python.org/3/glossary.html#term-parameter.
to figure out how `spectrum_id` is set
Ready for review ;-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work! I'm just a bit confused by the testing of NPLinker scores and Metcalf scores - it seems like some of the tests are duplicated, or almost at least?
strains.remove(bgc.strain) | ||
if bgc.strain is not None: | ||
strains.remove(bgc.strain) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the scenario where the strain object of the bgc is None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The strain is None by default, so I guess this happens in all cases in which the strain is not set through the strain setter method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Giulia is right. The strain might be None when BGC instances are not created using BGC Loaders (in the loaders, the strain will be set).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What a huge work! Great :D
I added few minor comments, and I checked the code starting from the tests and trying then to dive deep into the internals. Since I don't know those parts of the package well, I am missing something here and there but overall I got the scope of the PR and I tried to check stuff like type hinting, doc strings, and comments. As I already mentioned, maybe next time it would be good to go through a PR like this together before reviewing it.
strains.remove(bgc.strain) | ||
if bgc.strain is not None: | ||
strains.remove(bgc.strain) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The strain is None by default, so I guess this happens in all cases in which the strain is not set through the strain setter method
One is testing NPLinker object, another is testing Metcalf object. Though some tests look similar, they are serving different purposes and testing different objects. |
This is a very big PR with a lot of changes. It's very important to keep in mind that the purpose is to use unique string ID for scoring.
Brief background:
GCF.id
andGCF.gcf_id
Spectrum.id
andSpectrum.spectrum_id
MolecularFamily.id
andMolecularFamily.family_id
*.id
) are used for scoring part. But it caused a lot trouble because[gcf1, gcf2]
gcf1.id is 0; but[gcf2, gcf1]
gcf1.id will be 1.The solution is to only use string ID (unique) and remove the changeable integer ID.
Major Changes
change
GCF.gcf_id
,Specrum.spectrum_id
andMolecularFamily.family_id
to string IDuse string ID for scoring (
DataLinks
,LinkFinder
,MetcalfScoring
)remove the assignment of integer ID (
*.id
)remove the use of integer ID (
*.id
) inStrainCollection
add docstrings and unit tests for scoring related classes
add TODO comments to files for further refactoring
Important but not involved in this PR
*.id
from all objects