Refactor gnps classes #169

Merged: CunliangGeng merged 50 commits into dev from refactor_gnps_classes on Aug 29, 2023


@CunliangGeng CunliangGeng commented Aug 25, 2023

This PR checks GNPS data; refactors the classes and functions for downloading, extracting and loading GNPS data; adds detailed docstrings explaining how GNPS data is processed; and updates the unit tests to cover all types of GNPS data.

👉 It's easier to review the code commit by commit or class by class.
Although this PR has many commits, the changes to the GNPS classes all follow a similar pattern.

Major changes

  • Removed the dependency requests; only httpx is now used to handle HTTP requests
  • Added the dependency pyteomics to parse MGF files
  • Added the missing GNPS workflow (SNETS-V2) to enum GNPSFormat
  • Refactored all classes and functions in the module gnps, including GNPSDownloader, GNPSExtractor, GNPSAnnotationLoader, GNPSFileMappingLoader, GNPSMolecularFamilyLoader, GNPSSpectrumLoader, and functions from gnps_format.py
    • Updated the relevant download/extract/load logic to consider all GNPS workflows in GNPSFormat
    • Added data validation before loading
    • Added detailed docstrings for each GNPS data type
    • Fixed some bugs and invalid URLs
  • Refactored all unit tests relevant to the gnps module
    • Replaced minimal working test data with complete GNPS data (zip file)
    • Restructured and refactored module-level fixtures to make all tests consistent with each other and easier to understand
    • Refactored all tests to test all types of GNPS workflows
  • Added, removed or updated some util functions in utils.py
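
The enum change listed above (workflow names as enum values, plus an Unknown member) could look roughly like the sketch below. This is an illustration, not the code from this PR: only the workflow name METABOLOMICS-SNETS-V2 and the Unknown case are mentioned here, so the other member names and values, and the helper `detect_format_from_name`, are assumptions made for the example.

```python
from enum import Enum, unique


@unique
class GNPSFormat(Enum):
    # Values are GNPS workflow names. Only METABOLOMICS-SNETS-V2 and the
    # Unknown case are named in this PR; the other members are assumptions.
    SNETS = "METABOLOMICS-SNETS"
    SNETSV2 = "METABOLOMICS-SNETS-V2"
    FBMN = "FEATURE-BASED-MOLECULAR-NETWORKING"
    Unknown = "Unknown-GNPS-Workflow"


def detect_format_from_name(zip_name: str) -> GNPSFormat:
    """Hypothetical helper: guess the workflow from a ProteoSAFe archive name."""
    known = [fmt for fmt in GNPSFormat if fmt is not GNPSFormat.Unknown]
    # Try longer workflow names first, because "METABOLOMICS-SNETS" is a
    # prefix of "METABOLOMICS-SNETS-V2" and would otherwise match it.
    for fmt in sorted(known, key=lambda f: len(f.value), reverse=True):
        if zip_name.startswith(f"ProteoSAFe-{fmt.value}-"):
            return fmt
    return GNPSFormat.Unknown
```

Returning an explicit Unknown member instead of raising lets callers decide how to handle unsupported archives, for example by refusing to extract them.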

Expected failing tests

  • All tests in the pairedomics folder
  • test_loader.py

Add the detection of GNPS workflow "METABOLOMICS-SNETS-V2"
- rename enum names
- change enum values to GNPS workflow names
- add docstrings
Add the scenario of GNPSFormat.Unknown
- replace workflow name with enum value
- update docstrings
- replace workflow name with enum value
- update docstrings
- move existing GNPS files
- add new file `ProteoSAFe-METABOLOMICS-SNETS-V2-189e8bf1-download_clustered_spectra.zip`
- add new file `ProteoSAFe-Unknown.zip`
Move some fixtures from top level to metabolomics level
- add tests for workflow SNETSV2 and Unknown
- use new fixtures `gnps_zip_files` and `gnps_file_mappings_files`
- use httpx to replace urllib
- remove private functions not needed any more
Add GNPSFormat checking
Checking GNPSFormat has been moved to `__init__`
- add detection of gnps website availability
- add tests for all supported GNPS workflows
- add detection of GNPS workflow at initialization
- change method `extract` to private `_extract`
- rewrite the private methods based on GNPS workflow types
- add detailed docstring to explain which file is extracted and renamed

update extractor
- update URL for GNPS USI
- add `_validate` to validate annotation file (.tsv)
- add `_load` method to modularise loading code
- change method `get_annotations` to property `annotations`
- update docstring
- add `_validate` to validate file mappings file
- add `_load*` methods to modularise loading code
- change method `mappings` to property
- add detailed docstring
- add `_validate` method
- change method `families` to property
- refactor `_load` method to make loading code more modular
- add detailed docstring
- replace mgf parser with community package `pyteomics`
- add `_validate` method
- add `_load` method
- add detailed docstring
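
The loader commits above share one pattern: a private `_validate` step, a private `_load` step, and a read-only property instead of a getter method. A minimal sketch of that pattern follows. It is not the actual GNPSAnnotationLoader; the class name `AnnotationLoaderSketch` and the required column names are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import csv
from pathlib import Path


class AnnotationLoaderSketch:
    """Illustrative loader; not the real GNPSAnnotationLoader."""

    # Hypothetical required columns, chosen only for this example.
    REQUIRED_COLUMNS = ("#Scan#", "Compound_Name")

    def __init__(self, file: str | Path) -> None:
        self._file = Path(file)
        self._annotations: dict[str, dict[str, str]] = {}
        self._validate()  # fail early, before any data is loaded
        self._load()

    @property
    def annotations(self) -> dict[str, dict[str, str]]:
        """Read-only access, replacing a `get_annotations()` style method."""
        return self._annotations

    def _validate(self) -> None:
        # Only the header row is inspected here.
        with self._file.open(newline="") as f:
            header = next(csv.reader(f, delimiter="\t"), [])
        missing = [c for c in self.REQUIRED_COLUMNS if c not in header]
        if missing:
            raise ValueError(f"{self._file} is missing columns: {missing}")

    def _load(self) -> None:
        # Key each row of the tab-separated file by its scan identifier.
        with self._file.open(newline="") as f:
            for row in csv.DictReader(f, delimiter="\t"):
                self._annotations[row["#Scan#"]] = row
```

Running validation in `__init__` means a malformed file is rejected before any partially loaded state can be observed through the property.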
…t_antismash_data`

to make sure deletion of extract_path can always happen.
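
The commit title above is truncated, so the code it touched is not shown here. One common way to guarantee that a directory is always deleted is a `try`/`finally` block, sketched below. The helper name `run_with_temp_extract_path` is hypothetical and is not taken from this PR.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_temp_extract_path(action):
    """Hypothetical helper: run `action(extract_path)` and always clean up."""
    extract_path = Path(tempfile.mkdtemp())
    try:
        return action(extract_path)
    finally:
        # Runs whether `action` returned or raised, so the directory is
        # removed even when an assertion or extraction step fails.
        shutil.rmtree(extract_path, ignore_errors=True)
```

In a pytest suite the same guarantee is often obtained with the built-in `tmp_path` fixture, which avoids manual cleanup.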
@CunliangGeng CunliangGeng self-assigned this Aug 25, 2023

@gcroci2 gcroci2 left a comment


Minor comments :)

src/nplinker/metabolomics/gnps/gnps_format.py (resolved)
src/nplinker/metabolomics/gnps/gnps_format.py (resolved)
src/nplinker/utils.py (outdated, resolved)
src/nplinker/metabolomics/gnps/gnps_annotation_loader.py (outdated, resolved)
@CunliangGeng CunliangGeng merged commit 4d230e9 into dev Aug 29, 2023
2 of 4 checks passed
@CunliangGeng CunliangGeng deleted the refactor_gnps_classes branch August 29, 2023 13:58