Refactor initialisation of project root and data folders [Track issue] #163

Closed
12 tasks done
CunliangGeng opened this issue Jul 14, 2023 · 0 comments

CunliangGeng commented Jul 14, 2023

An update on the following work was given at the community meeting on March 5th, 2024; see the slides for details.

Tasks

Project directory structure

NPLinker uses this directory structure to define the default paths.

For clarity, only the folders and files required by NPLinker are shown; all others are omitted.

root_dir
    │
    ├── nplinker.toml
    ├── strain_mappings.json
    ├── strains_selected.json
    │
    ├── downloads
    │   ├── paired_datarecord_4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4.json
    │   ├── GCF_000016425.1.zip
    │   ├── GCF_0000514975.1.zip
    │   ├── c22f44b14a3d450eb836d607cb9521bb.zip
    │   ├── genome_status.json
    │   └── mibig_json_3.1.tar.gz
    │
    ├── gnps
    │   ├── spectra.mgf
    │   ├── molecular_families.tsv
    │   ├── annotations.tsv
    │   └── file_mappings.tsv
    │
    ├── antismash
    │   ├── GCF_000016425.1
    │   │   ├── xxx.region001.gbk
    │   │   └── ...
    │   ├── GCF_0000514975.1
    │   │   ├── xxxx.region001.gbk
    │   │   └── ...
    │   └── ...
    │
    ├── bigscape
    │   ├── mix_clustering_c0.30.tsv
    │   └── bigscape_running_output
    │       └── ...
    │
    ├── mibig
    │   ├── BGC0000001.json
    │   ├── BGC0000002.json
    │   └── ...
    │
    └── ...
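
As a rough illustration of how default paths could be derived from this layout, here is a minimal Python sketch. The folder and file names come from the tree above; `DATA_DIRS`, `ROOT_FILES`, `default_paths` and `create_data_dirs` are hypothetical names chosen for the example and are not taken from NPLinker's code.

```python
from pathlib import Path

# Folder and file names are copied from the tree above.
# The constant and function names are hypothetical, not NPLinker's API.
DATA_DIRS = ("downloads", "gnps", "antismash", "bigscape", "mibig")
ROOT_FILES = ("nplinker.toml", "strain_mappings.json", "strains_selected.json")


def default_paths(root_dir):
    """Map each fixed folder or file name to its path under root_dir."""
    root = Path(root_dir)
    return {name: root / name for name in DATA_DIRS + ROOT_FILES}


def create_data_dirs(root_dir):
    """Create the data folders under an existing root_dir.

    Folders that already exist are left untouched.
    """
    root = Path(root_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"root_dir does not exist: {root}")
    created = []
    for name in DATA_DIRS:
        path = root / name
        path.mkdir(exist_ok=True)
        created.append(path)
    return created
```

Because every path is derived from `root_dir` and a fixed name, no per-folder configuration is needed in this sketch.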
@CunliangGeng CunliangGeng added this to dev Feb 16, 2024
@CunliangGeng CunliangGeng moved this to In progress in dev Feb 16, 2024
@CunliangGeng CunliangGeng changed the title from "project folder creation" to "Refactor initialisation of project root and data folders [Track issue]" Feb 23, 2024
CunliangGeng added a commit that referenced this issue Mar 5, 2024
This is a big PR that implements the data-arranging pipelines, which enable the local and podp modes.

Arranging data means
- creating data folders in the `root_dir`
- downloading the dataset if needed (e.g. in podp mode)
- validating the dataset, whether downloaded or provided by the user

In short, it covers all the steps needed to make the data ready for loading.
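
The three steps above could be sketched roughly as follows. This is not the code in `arranger.py`: the function `arrange` and its `download` and `validate` hooks are hypothetical, and the folder names are the ones listed in #163.

```python
from pathlib import Path


def arrange(root_dir, mode, download, validate):
    """Run the arranging steps and return the names of the steps performed.

    `download` and `validate` are caller-supplied callables (hypothetical
    hooks for this sketch); each receives the root path.
    """
    root = Path(root_dir)
    steps = []

    # Step 1: create the data folders in root_dir (root_dir must exist).
    for name in ("downloads", "gnps", "antismash", "bigscape", "mibig"):
        (root / name).mkdir(exist_ok=True)
    steps.append("create_folders")

    # Step 2: download only when needed, e.g. in podp mode.
    if mode == "podp":
        download(root)
        steps.append("download")

    # Step 3: validate the data, whether downloaded or user-provided.
    validate(root)
    steps.append("validate")
    return steps
```

In local mode the download step is skipped, so only folder creation and validation run.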

The pipelines of arranging data for different types of data are displayed in the diagram of #117.

To keep the data-arranging workflow simple, we use a fixed project directory structure (see #163) with fixed directory and file names (see `globals.py`).

To use nplinker, users are required to
- create a `root_dir` manually and use it as the root directory of the nplinker project
- provide a config file `nplinker.toml` and put it in the `root_dir` 
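
A minimal sketch of checking these two prerequisites before arranging any data is shown below. The helper `check_project_root` is hypothetical and only illustrates the requirement; it is not part of nplinker.

```python
from pathlib import Path


def check_project_root(root_dir):
    """Return the path to nplinker.toml, or raise if a prerequisite is missing.

    Hypothetical helper for illustration only.
    """
    root = Path(root_dir)
    # The user must have created root_dir manually.
    if not root.is_dir():
        raise NotADirectoryError(f"root_dir is not a directory: {root}")
    # The config file must sit directly inside root_dir.
    config = root / "nplinker.toml"
    if not config.is_file():
        raise FileNotFoundError(f"config file not found: {config}")
    return config
```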

**Major changes**
- Added `arranger.py`, which contains the class `DatasetArranger` and some validation functions that implement the data-arranging pipelines

- Cleaned, removed, or updated some files to make the arranger work (some may need further refactoring in future PRs)
    - Cleaned `runbigscape.py`
    - Deleted `downloader.py` and its tests, which are replaced by `DatasetArranger`
    - Updated `loader.py` and `nplinker.py` to use `DatasetArranger`

- Added integration tests for the arranger (tests passed)
    - Created `nplinker_local_mode.toml`
    - Updated `tests/conftest.py`
    - Updated `test_nplinker_local.py` to test the local mode
 

Tests in podp mode also passed on my local machine. Because running bigscape is costly, those tests will be added to the codebase in later PRs.
@github-project-automation github-project-automation bot moved this from In progress to Done in dev Mar 5, 2024