# NPLinker

NPLinker is a python framework for data mining microbial natural products by integrating genomics and metabolomics data. For a deep understanding of NPLinker, please refer to the original paper.

Under Development: NPLinker v2 is under active development (see its pre-releases). The documentation is not complete yet. If you have any questions, please contact us via GitHub Issues.

## Installation

### Requirements

- Python ≥3.9
- ~4.5GB of disk space to install all the dependencies

NPLinker is a python package that has both PyPI packages and non-PyPI packages as dependencies.

### Install

Install the `nplinker` package as follows:

```shell
# Check python version (>=3.9)
python --version

# Create a new virtual environment
python -m venv env          # (1)
source env/bin/activate     # (2)

# Install the nplinker package (requires ~300MB of disk space)
pip install --pre nplinker  # (3)

# Install nplinker non-pypi dependencies and databases (~4GB)
install-nplinker-deps
```
"},{"location":"install/#install-from-source-code","title":"Install from source code","text":"conda
to create a new environment. But NPLinker is not available on conda yet.pip
command and make sure it is provided by the activated virtual environment. --pre
option. pip install git+https://github.com/nplinker/nplinker@dev # (1)!\ninstall-nplinker-deps\n
"},{"location":"logging/","title":"How to setup logging","text":"@dev
is the branch name. You can replace it with the branch name, commit or tag.nplinker.toml
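For illustration, such a configuration could look like the sketch below. This is a minimal sketch, not a verified excerpt of the config schema: the `[log]` table and its key names are assumptions mirroring the arguments of `setup_logging` shown below, so check the Config File page for the authoritative setting names.

```toml
[log]                  # hypothetical section name
level = "DEBUG"        # logging level: DEBUG, INFO, WARNING, ...
file = "nplinker.log"  # write log messages to this file
use_console = true     # also print log messages to the console
```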
If you're using NPLinker as a library, you're using only some functions and classes of NPLinker in your script. By default, NPLinker will not log any messages. However, you can set up logging in your script to log messages:

```python
# Set up logging configuration first
from nplinker import setup_logging

setup_logging(level="DEBUG", file="nplinker.log", use_console=True)  # (1)

# Your business code here
# e.g. download and extract nplinker example data
from nplinker.utils import download_and_extract_archive

download_and_extract_archive(
    url="https://zenodo.org/records/10822604/files/nplinker_local_mode_example.zip",
    download_root=".",
)
```

1. The `setup_logging` function sets up the logging configuration. The `level` argument sets the logging level, the `file` argument sets the log file, and the `use_console` argument sets whether to log messages to the console.

The log messages will be written to the log file `nplinker.log` and displayed in the console with a format like `[Date Time] Level Log-message Module:Line`:

```shell
# Run your script
$ python your_script.py
Downloading nplinker_local_mode_example.zip ━━━━━━━━━━━━━━━ 100.0% • 195.3/195.3 MB • 2.6 MB/s • 0:00:00 • 0:01:02
[2024-05-10 15:14:48] INFO     Extracting nplinker_local_mode_example.zip to .    utils.py:401

# Check the log file
$ cat nplinker.log
[2024-05-10 15:14:48] INFO     Extracting nplinker_local_mode_example.zip to .    utils.py:401
```
"},{"location":"quickstart/","title":"Quickstart","text":"local
modepodp
mode local
mode assumes that the data required by NPLinker is available on your local machine.
METABOLOMICS-SNETS
,METABOLOMICS-SNETS-V2
FEATURE-BASED-MOLECULAR-NETWORKING
podp
mode assumes that you use an identifier of Paired Omics Data Platform (PODP) as the input for NPLinker. Then NPLinker will download and prepare all data necessary based on the PODP id which refers to the metadata of the dataset.nplinker_quickstart
:mkdir nplinker_quickstart\n
### 3. Prepare input data (`local` mode only)

Skip this step if you choose to use the `podp` mode.

If you choose to use the `local` mode, meaning you have the input data of NPLinker stored on your local machine, you need to move the input data to the working directory created in the previous step.

#### GNPS data

NPLinker accepts data from the output of the following GNPS workflows:

- `METABOLOMICS-SNETS`
- `METABOLOMICS-SNETS-V2`
- `FEATURE-BASED-MOLECULAR-NETWORKING`

NPLinker provides the tools `GNPSDownloader` and `GNPSExtractor` to download and extract the GNPS data with ease. What you need to give is a valid GNPS task ID, referring to a task of the GNPS workflows supported by NPLinker.

Given the example GNPS task at https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=c22f44b14a3d450eb836d607cb9521bb, the task ID is the last part of this URL, i.e. `c22f44b14a3d450eb836d607cb9521bb`. If you open this link, you can find the workflow info in the row "Workflow" of the table "Job Status"; for this case, it is `METABOLOMICS-SNETS`.

```python
from nplinker.metabolomics.gnps import GNPSDownloader, GNPSExtractor

# Run this script inside the working directory, e.g. nplinker_quickstart

# Download GNPS data & get the path to the downloaded archive
downloader = GNPSDownloader("gnps_task_id", "downloads")  # (1)
downloaded_archive = downloader.download().get_download_file()

# Extract GNPS data to the `gnps` directory
extractor = GNPSExtractor(downloaded_archive, "gnps")  # (2)
```

1. Replace `gnps_task_id` with your GNPS task ID. The archive is downloaded to the `downloads` subdirectory of the working directory.
2. Replace `downloaded_archive` with the actual path to your GNPS data archive if you skipped the download step. The required data for NPLinker will be extracted to the `gnps` subdirectory of the working directory.

Info: Not all GNPS data are required by NPLinker, and only the necessary data will be extracted. During the extraction, these data will be renamed to the standard names used by NPLinker. See the page GNPS Data for more information.

If you have GNPS data but it is not in the archive format as downloaded from GNPS, it's recommended to re-download the data from GNPS. If (re-)downloading is not possible, you can manually prepare data for the `gnps` directory. In this case, you must make sure that the data is organized as expected by NPLinker. See the page GNPS Data for examples of how to prepare the data.
#### antiSMASH data

NPLinker requires antiSMASH BGC data as input, organized in the `antismash` subdirectory of the working directory. For each output of an antiSMASH run, the BGC data must be stored in a subdirectory named after the NCBI accession number (e.g. `GCF_000514975.1`), and only the `*.region*.gbk` files are required by NPLinker.

When manually preparing antiSMASH data for NPLinker, you must make sure that the data is organized as expected by NPLinker. See the page Working Directory Structure for more information.

#### BigScape data

It is optional to provide the output of BigScape to NPLinker. If the output of BigScape is not provided, NPLinker will run BigScape automatically to generate the data from the antiSMASH BGC data. If you have the output of BigScape, you can put its `mix_clustering_c{cutoff}.tsv` file in the `bigscape` subdirectory of the NPLinker working directory, where `{cutoff}` is the cutoff value used in the BigScape run.

#### Strain mappings file
The strain mappings file `strain_mappings.json` is required by NPLinker to map strains to genomics and metabolomics data:

```json
{
    "strain_mappings": [
        {
            "strain_id": "strain_id_1",  # (1)
            "strain_alias": ["bgc_id_1", "spectrum_id_1", ...]  # (2)
        },
        {
            "strain_id": "strain_id_2",
            "strain_alias": ["bgc_id_2", "spectrum_id_2", ...]
        },
        ...
    ],
    "version": "1.0"  # (3)
}
```

1. `strain_id` is the unique identifier of the strain.
2. `strain_alias` is a list of aliases of the strain, which are the identifiers of the BGCs and spectra of the strain.
3. `version` is the schema version of this file. It is recommended to use the latest version of the schema; the current latest version is `1.0`.

The BGC id is the same as the name of the BGC file in the `antismash` directory; for example, given a BGC file `xxxx.region001.gbk`, the BGC id is `xxxx.region001`.

The spectrum id is the same as the scan number in the `spectra.mgf` file in the `gnps` directory; for example, given a spectrum in the mgf file with a scan `SCANS=1`, the spectrum id is `1`.

If you labelled the mzXML files (input for GNPS) with the strain id, you may need the function `extract_mappings_ms_filename_spectrum_id` to extract the mappings from the mzXML file names to the spectrum ids.

For the `local` mode, you need to create this file manually and put it in the working directory. It takes some effort to prepare this file manually, especially when you have a large number of strains.
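To reduce that effort you can script the file's creation. The sketch below is a minimal illustration using only Python's standard `json` module; the strain ids and aliases are made-up placeholders that you would fill in from your own records (BGC file names and spectrum scan numbers):

```python
import json

# Hypothetical example data: map each strain id to the ids of its BGCs
# (BGC file names without the .gbk suffix) and of its spectra (scan numbers).
aliases = {
    "strain_id_1": ["xxxx.region001", "1"],
    "strain_id_2": ["yyyy.region001", "2"],
}

strain_mappings = {
    "strain_mappings": [
        {"strain_id": sid, "strain_alias": alias} for sid, alias in aliases.items()
    ],
    "version": "1.0",  # current latest schema version
}

# Write the file into the working directory
with open("strain_mappings.json", "w") as f:
    json.dump(strain_mappings, f, indent=4)
```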
### Create the configuration file

The configuration file `nplinker.toml` is required by NPLinker to specify the working directory, mode, and other settings for the run of NPLinker. You can put the `nplinker.toml` file in any place, but it is recommended to put it in the working directory created in step 2.

The details of all settings can be found on the page Config File. To keep it simple, default settings will be used automatically by NPLinker if you don't set them in your `nplinker.toml` config file. What you need to do is to set the `root_dir` and `mode` in the `nplinker.toml` file.

For the `local` mode:

```toml
# nplinker.toml
root_dir = "absolute/path/to/working/directory"  # (1)
mode = "local"
# and other settings you want to override the default settings
```

1. Replace `absolute/path/to/working/directory` with the absolute path to the working directory created in step 2.

For the `podp` mode:

```toml
# nplinker.toml
root_dir = "absolute/path/to/working/directory"  # (1)
mode = "podp"
podp_id = "podp_id"  # (2)
# and other settings you want to override the default settings
```
"},{"location":"quickstart/#4-run-nplinker","title":"4. Run NPLinker","text":"absolute/path/to/working/directory
with the absolute path to the working directory created in step 2.podp_id
with the identifier of the dataset in the Paired Omics Data Platform (PODP).from nplinker import NPLinker\n\n# create an instance of NPLinker\nnpl = NPLinker(\"nplinker.toml\") # (1)!\n\n# load data\nnpl.load_data()\n\n# check loaded data\nprint(npl.bgcs)\nprint(npl.gcfs)\nprint(npl.spectra)\nprint(npl.mfs)\nprint(npl.strains)\n\n# compute the links for the first 3 GCFs using metcalf scoring method\nlink_graph = npl.get_links(npl.gcfs[:3], \"metcalf\") # (2)!\n\n# get links as a list of tuples\nlink_graph.links \n\n# get the link data between two objects or entities\nlink_graph.get_link_data(npl.gcfs[0], npl.spectra[0]) \n\n# Save data to a pickle file\nnpl.save_data(\"npl.pkl\", link_graph)\n
nplinker.toml
with the actual path to your configuration file.get_links
returns a LinkGraph object that represents the calculated links between the GCFs and other entities as a graph.AntismashBGCLoader(data_dir: str | PathLike)\n
Bases: `BGCLoaderBase`

Data loader for AntiSMASH BGC genbank (.gbk) files.

Parameters:

- `data_dir` (`str | PathLike`) – Path to AntiSMASH directory that contains a collection of AntiSMASH outputs.

Notes: The input `data_dir` must follow the structure defined in the Working Directory Structure for AntiSMASH data, e.g.:

```shell
antismash
├── genome_id_1       # one AntiSMASH output, e.g. GCF_000514775.1
│   ├── NZ_AZWO01000004.region001.gbk
│   └── ...
├── genome_id_2
│   ├── ...
└── ...
```

Source code in `src/nplinker/genomics/antismash/antismash_loader.py`:

````python
def __init__(self, data_dir: str | PathLike) -> None:
    """Initialize the AntiSMASH BGC loader.

    Args:
        data_dir: Path to AntiSMASH directory that contains a collection of AntiSMASH outputs.

    Notes:
        The input `data_dir` must follow the structure defined in the
        [Working Directory Structure][working-directory-structure] for AntiSMASH data, e.g.:
        ```shell
        antismash
        ├── genome_id_1       # one AntiSMASH output, e.g. GCF_000514775.1
        │   ├── NZ_AZWO01000004.region001.gbk
        │   └── ...
        ├── genome_id_2
        │   ├── ...
        └── ...
        ```
    """
    self.data_dir = str(data_dir)
    self._file_dict = self._parse_data_dir(self.data_dir)
    self._bgcs = self._parse_bgcs(self._file_dict)
````
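A minimal usage sketch of the loader (the directory path is an example following the structure above, not a fixed value):

```python
from nplinker.genomics.antismash import AntismashBGCLoader

# Point the loader at the `antismash` directory of a working directory
loader = AntismashBGCLoader("nplinker_quickstart/antismash")

bgcs = loader.get_bgcs()                   # list of BGC objects
mapping = loader.get_bgc_genome_mapping()  # BGC name -> genome id
files = loader.get_files()                 # BGC name -> path to .gbk file
```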
"},{"location":"api/antismash/#nplinker.genomics.antismash.AntismashBGCLoader.data_dir","title":"data_dir instance-attribute
","text":"data_dir = str(data_dir)\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.AntismashBGCLoader.get_bgc_genome_mapping","title":"get_bgc_genome_mapping","text":"get_bgc_genome_mapping() -> dict[str, str]\n
Get the mapping from BGC to genome.
Info: The directory name of the gbk files is treated as genome id.
Returns:
`dict[str, str]` – The key is BGC name (gbk file name) and value is genome id (the directory name of the gbk file).
Source code in `src/nplinker/genomics/antismash/antismash_loader.py`:
def get_bgc_genome_mapping(self) -> dict[str, str]:\n \"\"\"Get the mapping from BGC to genome.\n\n !!! info\n The directory name of the gbk files is treated as genome id.\n\n Returns:\n The key is BGC name (gbk file name) and value is genome id (the directory name of the\n gbk file).\n \"\"\"\n return {\n bid: os.path.basename(os.path.dirname(bpath)) for bid, bpath in self._file_dict.items()\n }\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.AntismashBGCLoader.get_files","title":"get_files","text":"get_files() -> dict[str, str]\n
Get BGC gbk files.
Returns:
`dict[str, str]` – The key is BGC name (gbk file name) and value is path to the gbk file.
Source code in `src/nplinker/genomics/antismash/antismash_loader.py`:
def get_files(self) -> dict[str, str]:\n \"\"\"Get BGC gbk files.\n\n Returns:\n The key is BGC name (gbk file name) and value is path to the gbk file.\n \"\"\"\n return self._file_dict\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.AntismashBGCLoader.get_bgcs","title":"get_bgcs","text":"get_bgcs() -> list[BGC]\n
Get all BGC objects.
Returns:
`list[BGC]` – A list of BGC objects.
Source code in `src/nplinker/genomics/antismash/antismash_loader.py`:
def get_bgcs(self) -> list[BGC]:\n \"\"\"Get all BGC objects.\n\n Returns:\n A list of BGC objects\n \"\"\"\n return self._bgcs\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.GenomeStatus","title":"GenomeStatus","text":"GenomeStatus(\n original_id: str,\n resolved_refseq_id: str = \"\",\n resolve_attempted: bool = False,\n bgc_path: str = \"\",\n)\n
Class to represent the status of a single genome.
The status of genomes is tracked in the file GENOME_STATUS_FILENAME.
Parameters:
- `original_id` (`str`) – The original ID of the genome.
- `resolved_refseq_id` (`str`, default: `''`) – The resolved RefSeq ID of the genome. Defaults to "".
- `resolve_attempted` (`bool`, default: `False`) – A flag indicating whether an attempt to resolve the RefSeq ID has been made. Defaults to False.
- `bgc_path` (`str`, default: `''`) – The path to the downloaded BGC file for the genome. Defaults to "".
Source code in `src/nplinker/genomics/antismash/podp_antismash_downloader.py`:
def __init__(\n self,\n original_id: str,\n resolved_refseq_id: str = \"\",\n resolve_attempted: bool = False,\n bgc_path: str = \"\",\n):\n \"\"\"Initialize a GenomeStatus object for the given genome.\n\n Args:\n original_id: The original ID of the genome.\n resolved_refseq_id: The resolved RefSeq ID of the\n genome. Defaults to \"\".\n resolve_attempted: A flag indicating whether an\n attempt to resolve the RefSeq ID has been made. Defaults to False.\n bgc_path: The path to the downloaded BGC file for\n the genome. Defaults to \"\".\n \"\"\"\n self.original_id = original_id\n self.resolved_refseq_id = \"\" if resolved_refseq_id == \"None\" else resolved_refseq_id\n self.resolve_attempted = resolve_attempted\n self.bgc_path = bgc_path\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.GenomeStatus.original_id","title":"original_id instance-attribute
","text":"original_id = original_id\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.GenomeStatus.resolved_refseq_id","title":"resolved_refseq_id instance-attribute
","text":"resolved_refseq_id = (\n \"\"\n if resolved_refseq_id == \"None\"\n else resolved_refseq_id\n)\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.GenomeStatus.resolve_attempted","title":"resolve_attempted instance-attribute
","text":"resolve_attempted = resolve_attempted\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.GenomeStatus.bgc_path","title":"bgc_path instance-attribute
","text":"bgc_path = bgc_path\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.GenomeStatus.read_json","title":"read_json staticmethod
","text":"read_json(\n file: str | PathLike,\n) -> dict[str, \"GenomeStatus\"]\n
Get a dict of GenomeStatus objects by loading given genome status file.
Note that an empty dict is returned if the given file doesn't exist.
Parameters:
- `file` (`str | PathLike`) – Path to genome status file.
Returns:
`dict[str, 'GenomeStatus']` – Dict keys are genome original id and values are GenomeStatus objects. An empty dict is returned if the given file doesn't exist.
Source code in `src/nplinker/genomics/antismash/podp_antismash_downloader.py`:
@staticmethod\ndef read_json(file: str | PathLike) -> dict[str, \"GenomeStatus\"]:\n \"\"\"Get a dict of GenomeStatus objects by loading given genome status file.\n\n Note that an empty dict is returned if the given file doesn't exist.\n\n Args:\n file: Path to genome status file.\n\n Returns:\n Dict keys are genome original id and values are GenomeStatus\n objects. An empty dict is returned if the given file doesn't exist.\n \"\"\"\n genome_status_dict = {}\n if Path(file).exists():\n with open(file, \"r\") as f:\n data = json.load(f)\n\n # validate json data before using it\n validate(data, schema=GENOME_STATUS_SCHEMA)\n\n genome_status_dict = {\n gs[\"original_id\"]: GenomeStatus(**gs) for gs in data[\"genome_status\"]\n }\n return genome_status_dict\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.GenomeStatus.to_json","title":"to_json staticmethod
","text":"to_json(\n genome_status_dict: Mapping[str, \"GenomeStatus\"],\n file: str | PathLike | None = None,\n) -> str | None\n
Convert the genome status dictionary to a JSON string.
If a file path is provided, the JSON string is written to the file. If the file already exists, it is overwritten.
Parameters:
- `genome_status_dict` (`Mapping[str, 'GenomeStatus']`) – A dictionary of genome status objects. The keys are the original genome IDs and the values are GenomeStatus objects.
- `file` (`str | PathLike | None`, default: `None`) – The path to the output JSON file. If None, the JSON string is returned but not written to a file.
Returns:
`str | None` – The JSON string if `file` is None, otherwise None.
Source code in `src/nplinker/genomics/antismash/podp_antismash_downloader.py`:
@staticmethod\ndef to_json(\n genome_status_dict: Mapping[str, \"GenomeStatus\"], file: str | PathLike | None = None\n) -> str | None:\n \"\"\"Convert the genome status dictionary to a JSON string.\n\n If a file path is provided, the JSON string is written to the file. If\n the file already exists, it is overwritten.\n\n Args:\n genome_status_dict: A dictionary of genome\n status objects. The keys are the original genome IDs and the values\n are GenomeStatus objects.\n file: The path to the output JSON file.\n If None, the JSON string is returned but not written to a file.\n\n Returns:\n The JSON string if `file` is None, otherwise None.\n \"\"\"\n gs_list = [gs._to_dict() for gs in genome_status_dict.values()]\n json_data = {\"genome_status\": gs_list, \"version\": \"1.0\"}\n\n # validate json object before dumping\n validate(json_data, schema=GENOME_STATUS_SCHEMA)\n\n if file is not None:\n with open(file, \"w\") as f:\n json.dump(json_data, f)\n return None\n return json.dumps(json_data)\n
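A small round-trip sketch for the two static methods (the genome ids and file name are made-up examples; the import path follows this page's module, `nplinker.genomics.antismash`):

```python
from nplinker.genomics.antismash import GenomeStatus

# Two status records keyed by their original genome ids (example ids)
gs_dict = {
    "GCF_000514775.1": GenomeStatus("GCF_000514775.1"),
    "GCF_000514975.1": GenomeStatus("GCF_000514975.1", resolve_attempted=True),
}

# Write the statuses to a JSON file (an existing file is overwritten)
GenomeStatus.to_json(gs_dict, "genome_status.json")

# Load them back; an empty dict would be returned for a missing file
loaded = GenomeStatus.read_json("genome_status.json")
```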
"},{"location":"api/antismash/#nplinker.genomics.antismash.download_and_extract_antismash_data","title":"download_and_extract_antismash_data","text":"download_and_extract_antismash_data(\n antismash_id: str,\n download_root: str | PathLike,\n extract_root: str | PathLike,\n) -> None\n
Download and extract antiSMASH BGC archive for a specified genome.
The antiSMASH database (https://antismash-db.secondarymetabolites.org/) is used to download the BGC archive, and antiSMASH uses the RefSeq assembly id of a genome as the id of the archive.
Parameters:
- `antismash_id` (`str`) – The id used to download the BGC archive from the antiSMASH database. If the id is versioned (e.g., "GCF_004339725.1") please be sure to specify the version as well.
- `download_root` (`str | PathLike`) – Path to the directory to place the downloaded archive in.
- `extract_root` (`str | PathLike`) – Path to the directory data files will be extracted to. Note that an `antismash` directory will be created in the specified `extract_root` if it doesn't exist. The files will be extracted to the `<extract_root>/antismash/<antismash_id>` directory.
Raises:
- `ValueError` – if the `<extract_root>/antismash/<refseq_assembly_id>` dir is not empty.
Examples:
>>> download_and_extract_antismash_metadata(\"GCF_004339725.1\", \"/data/download\", \"/data/extracted\")\n
Source code in src/nplinker/genomics/antismash/antismash_downloader.py
def download_and_extract_antismash_data(\n antismash_id: str, download_root: str | PathLike, extract_root: str | PathLike\n) -> None:\n \"\"\"Download and extract antiSMASH BGC archive for a specified genome.\n\n The antiSMASH database (https://antismash-db.secondarymetabolites.org/)\n is used to download the BGC archive. And antiSMASH use RefSeq assembly id\n of a genome as the id of the archive.\n\n Args:\n antismash_id: The id used to download BGC archive from antiSMASH database.\n If the id is versioned (e.g., \"GCF_004339725.1\") please be sure to\n specify the version as well.\n download_root: Path to the directory to place downloaded archive in.\n extract_root: Path to the directory data files will be extracted to.\n Note that an `antismash` directory will be created in the specified `extract_root` if\n it doesn't exist. The files will be extracted to `<extract_root>/antismash/<antismash_id>` directory.\n\n Raises:\n ValueError: if `<extract_root>/antismash/<refseq_assembly_id>` dir is not empty.\n\n Examples:\n >>> download_and_extract_antismash_metadata(\"GCF_004339725.1\", \"/data/download\", \"/data/extracted\")\n \"\"\"\n download_root = Path(download_root)\n extract_root = Path(extract_root)\n extract_path = extract_root / \"antismash\" / antismash_id\n\n try:\n if extract_path.exists():\n _check_extract_path(extract_path)\n else:\n extract_path.mkdir(parents=True, exist_ok=True)\n\n for base_url in [ANTISMASH_DB_DOWNLOAD_URL, ANTISMASH_DBV2_DOWNLOAD_URL]:\n url = base_url.format(antismash_id, antismash_id + \".zip\")\n download_and_extract_archive(url, download_root, extract_path, antismash_id + \".zip\")\n break\n\n # delete subdirs\n for subdir_path in list_dirs(extract_path):\n shutil.rmtree(subdir_path)\n\n # delete unnecessary files\n files_to_keep = list_files(extract_path, suffix=(\".json\", \".gbk\"))\n for file in list_files(extract_path):\n if file not in files_to_keep:\n os.remove(file)\n\n logger.info(\"antiSMASH BGC data of %s is downloaded and extracted.\", antismash_id)\n\n except Exception as e:\n shutil.rmtree(extract_path)\n logger.warning(e)\n raise e\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.parse_bgc_genbank","title":"parse_bgc_genbank","text":"parse_bgc_genbank(file: str | PathLike) -> BGC\n
Parse a single BGC gbk file to BGC object.
Parameters:
- `file` (`str | PathLike`) – Path to BGC gbk file.
Returns:
`BGC` – BGC object.
Examples:
>>> bgc = AntismashBGCLoader.parse_bgc(\n... \"/data/antismash/GCF_000016425.1/NC_009380.1.region001.gbk\")\n
Source code in src/nplinker/genomics/antismash/antismash_loader.py
def parse_bgc_genbank(file: str | PathLike) -> BGC:\n \"\"\"Parse a single BGC gbk file to BGC object.\n\n Args:\n file: Path to BGC gbk file\n\n Returns:\n BGC object\n\n Examples:\n >>> bgc = AntismashBGCLoader.parse_bgc(\n ... \"/data/antismash/GCF_000016425.1/NC_009380.1.region001.gbk\")\n \"\"\"\n file = Path(file)\n fname = file.stem\n\n record = SeqIO.read(file, format=\"genbank\")\n description = record.description # \"DEFINITION\" in gbk file\n antismash_id = record.id # \"VERSION\" in gbk file\n features = _parse_antismash_genbank(record)\n product_prediction = features.get(\"product\")\n if product_prediction is None:\n raise ValueError(f\"Not found product prediction in antiSMASH Genbank file {file}\")\n\n # init BGC\n bgc = BGC(fname, *product_prediction)\n bgc.description = description\n bgc.antismash_id = antismash_id\n bgc.antismash_file = str(file)\n bgc.antismash_region = features.get(\"region_number\")\n bgc.smiles = features.get(\"smiles\")\n bgc.strain = Strain(fname)\n return bgc\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.get_best_available_genome_id","title":"get_best_available_genome_id","text":"get_best_available_genome_id(\n genome_id_data: Mapping[str, str]\n) -> str | None\n
Get the best available ID from genome_id_data dict.
Parameters:
- `genome_id_data` (`Mapping[str, str]`) – dictionary containing information for each genome record present.
Returns:
`str | None` – ID for the genome, if present, otherwise None.
Source code in `src/nplinker/genomics/antismash/podp_antismash_downloader.py`:
def get_best_available_genome_id(genome_id_data: Mapping[str, str]) -> str | None:\n \"\"\"Get the best available ID from genome_id_data dict.\n\n Args:\n genome_id_data: dictionary containing information for each genome record present.\n\n Returns:\n ID for the genome, if present, otherwise None.\n \"\"\"\n if \"RefSeq_accession\" in genome_id_data:\n best_id = genome_id_data[\"RefSeq_accession\"]\n elif \"GenBank_accession\" in genome_id_data:\n best_id = genome_id_data[\"GenBank_accession\"]\n elif \"JGI_Genome_ID\" in genome_id_data:\n best_id = genome_id_data[\"JGI_Genome_ID\"]\n else:\n best_id = None\n\n if best_id is None or len(best_id) == 0:\n logger.warning(f\"Failed to get valid genome ID in genome data: {genome_id_data}\")\n return None\n return best_id\n
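For instance (a hypothetical record; the key names are the accession types named above, and RefSeq is preferred over GenBank and JGI ids):

```python
from nplinker.genomics.antismash import get_best_available_genome_id

genome_id_data = {
    "RefSeq_accession": "GCF_004339725.1",
    "GenBank_accession": "GCA_004339725.1",
}

# Returns "GCF_004339725.1" because the RefSeq accession takes priority
best_id = get_best_available_genome_id(genome_id_data)
```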
"},{"location":"api/antismash/#nplinker.genomics.antismash.podp_download_and_extract_antismash_data","title":"podp_download_and_extract_antismash_data","text":"podp_download_and_extract_antismash_data(\n genome_records: Sequence[\n Mapping[str, Mapping[str, str]]\n ],\n project_download_root: str | PathLike,\n project_extract_root: str | PathLike,\n)\n
Download and extract antiSMASH BGC archive for the given genome records.
Parameters:
- `genome_records` (`Sequence[Mapping[str, Mapping[str, str]]]`) – list of dicts representing genome records. The dict of each genome record contains a key of genome ID with a value of another dict containing information about genome type, label and accession ids (RefSeq, GenBank, and/or JGI).
- `project_download_root` (`str | PathLike`) – Path to the directory to place the downloaded archive in.
- `project_extract_root` (`str | PathLike`) – Path to the directory the downloaded archive will be extracted to. Note that an `antismash` directory will be created in the specified `extract_root` if it doesn't exist. The files will be extracted to the `<extract_root>/antismash/<antismash_id>` directory.
Warns:
- `UserWarning` – when no antiSMASH data is found for some genomes.
Source code in `src/nplinker/genomics/antismash/podp_antismash_downloader.py`:
def podp_download_and_extract_antismash_data(\n genome_records: Sequence[Mapping[str, Mapping[str, str]]],\n project_download_root: str | PathLike,\n project_extract_root: str | PathLike,\n):\n \"\"\"Download and extract antiSMASH BGC archive for the given genome records.\n\n Args:\n genome_records: list of dicts representing genome records.\n\n The dict of each genome record contains a key of genome ID with a value\n of another dict containing information about genome type, label and\n accession ids (RefSeq, GenBank, and/or JGI).\n project_download_root: Path to the directory to place\n downloaded archive in.\n project_extract_root: Path to the directory downloaded archive will be extracted to.\n\n Note that an `antismash` directory will be created in the specified\n `extract_root` if it doesn't exist. The files will be extracted to\n `<extract_root>/antismash/<antismash_id>` directory.\n\n Warnings:\n UserWarning: when no antiSMASH data is found for some genomes.\n \"\"\"\n if not Path(project_download_root).exists():\n # otherwise in case of failed first download, the folder doesn't exist and\n # genome_status_file can't be written\n Path(project_download_root).mkdir(parents=True, exist_ok=True)\n\n gs_file = Path(project_download_root, GENOME_STATUS_FILENAME)\n gs_dict = GenomeStatus.read_json(gs_file)\n\n for i, genome_record in enumerate(genome_records):\n # get the best available ID from the dict\n genome_id_data = genome_record[\"genome_ID\"]\n raw_genome_id = get_best_available_genome_id(genome_id_data)\n if raw_genome_id is None or len(raw_genome_id) == 0:\n logger.warning(f'Invalid input genome record \"{genome_record}\"')\n continue\n\n # check if genome ID exist in the genome status file\n if raw_genome_id not in gs_dict:\n gs_dict[raw_genome_id] = GenomeStatus(raw_genome_id)\n\n gs_obj = gs_dict[raw_genome_id]\n\n logger.info(\n f\"Checking for antismash data {i + 1}/{len(genome_records)}, \"\n f\"current genome ID={raw_genome_id}\"\n )\n # first, check if BGC data is downloaded\n if gs_obj.bgc_path and Path(gs_obj.bgc_path).exists():\n logger.info(f\"Genome ID {raw_genome_id} already downloaded to {gs_obj.bgc_path}\")\n continue\n # second, check if lookup attempted previously\n if gs_obj.resolve_attempted:\n logger.info(f\"Genome ID {raw_genome_id} skipped due to previous failed attempt\")\n continue\n\n # if not downloaded or lookup attempted, then try to resolve the ID\n # and download\n logger.info(f\"Start lookup process for genome ID {raw_genome_id}\")\n gs_obj.resolved_refseq_id = _resolve_refseq_id(genome_id_data)\n gs_obj.resolve_attempted = True\n\n if gs_obj.resolved_refseq_id == \"\":\n # give up on this one\n logger.warning(f\"Failed lookup for genome ID {raw_genome_id}\")\n continue\n\n # if resolved id is valid, try to download and extract antismash data\n try:\n download_and_extract_antismash_data(\n gs_obj.resolved_refseq_id, project_download_root, project_extract_root\n )\n\n gs_obj.bgc_path = str(\n Path(project_download_root, gs_obj.resolved_refseq_id + \".zip\").absolute()\n )\n\n output_path = Path(project_extract_root, \"antismash\", gs_obj.resolved_refseq_id)\n if output_path.exists():\n Path.touch(output_path / \"completed\", exist_ok=True)\n\n except Exception:\n gs_obj.bgc_path = \"\"\n\n # raise and log warning for failed downloads\n failed_ids = [gs.original_id for gs in gs_dict.values() if not gs.bgc_path]\n if failed_ids:\n warning_message = (\n f\"Failed to download antiSMASH data for the following genome IDs: {failed_ids}\"\n )\n 
logger.warning(warning_message)\n warnings.warn(warning_message, UserWarning)\n\n # save updated genome status to json file\n GenomeStatus.to_json(gs_dict, gs_file)\n\n if len(failed_ids) == len(genome_records):\n raise ValueError(\"No antiSMASH data found for any genome\")\n
"},{"location":"api/arranger/","title":"Dataset Arranger","text":""},{"location":"api/arranger/#nplinker.arranger","title":"nplinker.arranger","text":""},{"location":"api/arranger/#nplinker.arranger.PODP_PROJECT_URL","title":"PODP_PROJECT_URL module-attribute
","text":"PODP_PROJECT_URL = \"https://pairedomicsdata.bioinformatics.nl/api/projects/{}\"\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger","title":"DatasetArranger","text":"DatasetArranger(config: Dynaconf)\n
Arrange datasets based on the fixed working directory structure with the given configuration.
Concept and Diagram: see the Working Directory Structure and Dataset Arranging Pipeline pages.
\"Arrange datasets\" means:
- For `local` mode (`config.mode` is `local`), the datasets provided by users are validated.
- For `podp` mode (`config.mode` is `podp`), the datasets are automatically downloaded or generated, then validated.

The datasets include MIBiG, GNPS, antiSMASH, and BiG-SCAPE data.
Attributes:
- `config` – A Dynaconf object that contains the configuration settings.
- `root_dir` – The root directory of the datasets.
- `downloads_dir` – The directory to store downloaded files.
- `mibig_dir` – The directory to store MIBiG metadata.
- `gnps_dir` – The directory to store GNPS data.
- `antismash_dir` – The directory to store antiSMASH data.
- `bigscape_dir` – The directory to store BiG-SCAPE data.
- `bigscape_running_output_dir` – The directory to store the running output of BiG-SCAPE.
Parameters:
- `config` (`Dynaconf`) – A Dynaconf object that contains the configuration settings.
Examples:
>>> from nplinker.config import load_config\n>>> from nplinker.arranger import DatasetArranger\n>>> config = load_config(\"nplinker.toml\")\n>>> arranger = DatasetArranger(config)\n>>> arranger.arrange()\n
See Also: DatasetLoader – Load all data from files to memory.

Source code in `src/nplinker/arranger.py`:
def __init__(self, config: Dynaconf) -> None:\n \"\"\"Initialize the DatasetArranger.\n\n Args:\n config: A Dynaconf object that contains the configuration settings.\n\n\n Examples:\n >>> from nplinker.config import load_config\n >>> from nplinker.arranger import DatasetArranger\n >>> config = load_config(\"nplinker.toml\")\n >>> arranger = DatasetArranger(config)\n >>> arranger.arrange()\n\n See Also:\n [DatasetLoader][nplinker.loader.DatasetLoader]: Load all data from files to memory.\n \"\"\"\n self.config = config\n self.root_dir = config.root_dir\n self.downloads_dir = self.root_dir / defaults.DOWNLOADS_DIRNAME\n self.downloads_dir.mkdir(exist_ok=True)\n\n self.mibig_dir = self.root_dir / defaults.MIBIG_DIRNAME\n self.gnps_dir = self.root_dir / defaults.GNPS_DIRNAME\n self.antismash_dir = self.root_dir / defaults.ANTISMASH_DIRNAME\n self.bigscape_dir = self.root_dir / defaults.BIGSCAPE_DIRNAME\n self.bigscape_running_output_dir = (\n self.bigscape_dir / defaults.BIGSCAPE_RUNNING_OUTPUT_DIRNAME\n )\n\n self.arrange_podp_project_json()\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.config","title":"config instance-attribute
","text":"config = config\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.root_dir","title":"root_dir instance-attribute
","text":"root_dir = root_dir\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.downloads_dir","title":"downloads_dir instance-attribute
","text":"downloads_dir = root_dir / DOWNLOADS_DIRNAME\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.mibig_dir","title":"mibig_dir instance-attribute
","text":"mibig_dir = root_dir / MIBIG_DIRNAME\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.gnps_dir","title":"gnps_dir instance-attribute
","text":"gnps_dir = root_dir / GNPS_DIRNAME\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.antismash_dir","title":"antismash_dir instance-attribute
","text":"antismash_dir = root_dir / ANTISMASH_DIRNAME\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.bigscape_dir","title":"bigscape_dir instance-attribute
","text":"bigscape_dir = root_dir / BIGSCAPE_DIRNAME\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.bigscape_running_output_dir","title":"bigscape_running_output_dir instance-attribute
","text":"bigscape_running_output_dir = (\n bigscape_dir / BIGSCAPE_RUNNING_OUTPUT_DIRNAME\n)\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.arrange","title":"arrange","text":"arrange() -> None\n
Arrange all datasets according to the configuration.
The datasets include MIBiG, GNPS, antiSMASH, and BiG-SCAPE.
Source code in `src/nplinker/arranger.py`:
def arrange(self) -> None:\n \"\"\"Arrange all datasets according to the configuration.\n\n The datasets include MIBiG, GNPS, antiSMASH, and BiG-SCAPE.\n \"\"\"\n # The order of arranging the datasets matters, as some datasets depend on others\n self.arrange_mibig()\n self.arrange_gnps()\n self.arrange_antismash()\n self.arrange_bigscape()\n self.arrange_strain_mappings()\n self.arrange_strains_selected()\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.arrange_podp_project_json","title":"arrange_podp_project_json","text":"arrange_podp_project_json() -> None\n
Arrange the PODP project JSON file.
This method only works for the `podp` mode. If the JSON file does not exist, download it first; then the downloaded or existing JSON file will be validated according to the PODP_ADAPTED_SCHEMA.

Source code in `src/nplinker/arranger.py`:
def arrange_podp_project_json(self) -> None:\n \"\"\"Arrange the PODP project JSON file.\n\n This method only works for the `podp` mode. If the JSON file does not exist, download it\n first; then the downloaded or existing JSON file will be validated according to the\n [PODP_ADAPTED_SCHEMA][nplinker.schemas.PODP_ADAPTED_SCHEMA].\n \"\"\"\n if self.config.mode == \"podp\":\n file_name = f\"paired_datarecord_{self.config.podp_id}.json\"\n podp_file = self.downloads_dir / file_name\n if not podp_file.exists():\n download_url(\n PODP_PROJECT_URL.format(self.config.podp_id),\n self.downloads_dir,\n file_name,\n )\n\n with open(podp_file, \"r\") as f:\n json_data = json.load(f)\n validate_podp_json(json_data)\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.arrange_mibig","title":"arrange_mibig","text":"arrange_mibig() -> None\n
Arrange the MIBiG metadata.
If `config.mibig.to_use` is `True`, download and extract the MIBiG metadata and override the existing MIBiG metadata if it exists. This ensures that the MIBiG metadata is always up-to-date to the specified version in the configuration.

Source code in `src/nplinker/arranger.py`:
def arrange_mibig(self) -> None:\n \"\"\"Arrange the MIBiG metadata.\n\n If `config.mibig.to_use` is `True`, download and extract the MIBiG metadata and override\n the existing MIBiG metadata if it exists. This ensures that the MIBiG metadata is always\n up-to-date to the specified version in the configuration.\n \"\"\"\n if self.config.mibig.to_use:\n if self.mibig_dir.exists():\n # remove existing mibig data\n shutil.rmtree(self.mibig_dir)\n download_and_extract_mibig_metadata(\n self.downloads_dir,\n self.mibig_dir,\n version=self.config.mibig.version,\n )\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.arrange_gnps","title":"arrange_gnps","text":"arrange_gnps() -> None\n
Arrange the GNPS data.
For `local` mode, validate the GNPS data directory.

For `podp` mode, if the GNPS data does not exist, download it; if it exists but is not valid, remove the data and re-download it.

The validation process includes:

- Check if the GNPS data directory exists.
- Check if the required files exist in the GNPS data directory, including:
    - `file_mappings.tsv` or `file_mappings.csv`
    - `spectra.mgf`
    - `molecular_families.tsv`
    - `annotations.tsv`

Source code in `src/nplinker/arranger.py`:
def arrange_gnps(self) -> None:\n \"\"\"Arrange the GNPS data.\n\n For `local` mode, validate the GNPS data directory.\n\n For `podp` mode, if the GNPS data does not exist, download it; if it exists but not valid,\n remove the data and re-downloads it.\n\n The validation process includes:\n\n - Check if the GNPS data directory exists.\n - Check if the required files exist in the GNPS data directory, including:\n - `file_mappings.tsv` or `file_mappings.csv`\n - `spectra.mgf`\n - `molecular_families.tsv`\n - `annotations.tsv`\n \"\"\"\n pass_validation = False\n if self.config.mode == \"podp\":\n # retry downloading at most 3 times if downloaded data has problems\n for _ in range(3):\n try:\n validate_gnps(self.gnps_dir)\n pass_validation = True\n break\n except (FileNotFoundError, ValueError):\n # Don't need to remove downloaded archive, as it'll be overwritten\n shutil.rmtree(self.gnps_dir, ignore_errors=True)\n self._download_and_extract_gnps()\n\n if not pass_validation:\n validate_gnps(self.gnps_dir)\n\n # get the path to file_mappings file (csv or tsv)\n self.gnps_file_mappings_file = self._get_gnps_file_mappings_file()\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.arrange_antismash","title":"arrange_antismash","text":"arrange_antismash() -> None\n
Arrange the antiSMASH data.
For `local` mode, validate the antiSMASH data.

For `podp` mode, if the antiSMASH data does not exist, download it; if it exists but is not valid, remove the data and re-download it.

The validation process includes:

- Check if the antiSMASH data directory exists.
- Check if the antiSMASH data directory contains at least one sub-directory, and each sub-directory contains at least one BGC file (with the suffix `.region???.gbk` where `???` is a number).

The antiSMASH BGC directory must follow the structure below:
```shell
antismash
├── genome_id_1 (one AntiSMASH output, e.g. GCF_000514775.1)
│   ├── GCF_000514775.1.gbk
│   ├── NZ_AZWO01000004.region001.gbk
│   └── ...
├── genome_id_2
│   ├── ...
└── ...
```
Source code in src/nplinker/arranger.py
def arrange_antismash(self) -> None:\n \"\"\"Arrange the antiSMASH data.\n\n For `local` mode, validate the antiSMASH data.\n\n For `podp` mode, if the antiSMASH data does not exist, download it; if it exists but not\n valid, remove the data and re-download it.\n\n The validation process includes:\n\n - Check if the antiSMASH data directory exists.\n - Check if the antiSMASH data directory contains at least one sub-directory, and each\n sub-directory contains at least one BGC file (with the suffix `.region???.gbk` where\n `???` is a number).\n\n AntiSMASH BGC directory must follow the structure below:\n ```\n antismash\n \u251c\u2500\u2500 genome_id_1 (one AntiSMASH output, e.g. GCF_000514775.1)\n \u2502\u00a0 \u251c\u2500\u2500 GCF_000514775.1.gbk\n \u2502\u00a0 \u251c\u2500\u2500 NZ_AZWO01000004.region001.gbk\n \u2502\u00a0 \u2514\u2500\u2500 ...\n \u251c\u2500\u2500 genome_id_2\n \u2502\u00a0 \u251c\u2500\u2500 ...\n \u2514\u2500\u2500 ...\n ```\n \"\"\"\n pass_validation = False\n if self.config.mode == \"podp\":\n for _ in range(3):\n try:\n validate_antismash(self.antismash_dir)\n pass_validation = True\n break\n except FileNotFoundError:\n shutil.rmtree(self.antismash_dir, ignore_errors=True)\n self._download_and_extract_antismash()\n\n if not pass_validation:\n validate_antismash(self.antismash_dir)\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.arrange_bigscape","title":"arrange_bigscape","text":"arrange_bigscape() -> None\n
Arrange the BiG-SCAPE data.
For `local` mode, validate the BiG-SCAPE data.

For `podp` mode, if the BiG-SCAPE data does not exist, run BiG-SCAPE to generate the clustering file; if it exists but is not valid, remove the data and re-run BiG-SCAPE to generate the data.

The running output of BiG-SCAPE will be saved to the directory `bigscape_running_output` in the default BiG-SCAPE directory, and the clustering file `mix_clustering_c{self.config.bigscape.cutoff}.tsv` will be copied to the default BiG-SCAPE directory.

The validation process includes:

- Check if the default BiG-SCAPE data directory exists.
- Check if the clustering file `mix_clustering_c{self.config.bigscape.cutoff}.tsv` exists in the BiG-SCAPE data directory.
- Check if the `data_sqlite.db` file exists in the BiG-SCAPE data directory.

Source code in `src/nplinker/arranger.py`:
def arrange_bigscape(self) -> None:\n \"\"\"Arrange the BiG-SCAPE data.\n\n For `local` mode, validate the BiG-SCAPE data.\n\n For `podp` mode, if the BiG-SCAPE data does not exist, run BiG-SCAPE to generate the\n clustering file; if it exists but not valid, remove the data and re-run BiG-SCAPE to generate\n the data.\n\n The running output of BiG-SCAPE will be saved to the directory `bigscape_running_output`\n in the default BiG-SCAPE directory, and the clustering file\n `mix_clustering_c{self.config.bigscape.cutoff}.tsv` will be copied to the default BiG-SCAPE\n directory.\n\n The validation process includes:\n\n - Check if the default BiG-SCAPE data directory exists.\n - Check if the clustering file `mix_clustering_c{self.config.bigscape.cutoff}.tsv` exists in the\n BiG-SCAPE data directory.\n - Check if the `data_sqlite.db` file exists in the BiG-SCAPE data directory.\n \"\"\"\n pass_validation = False\n if self.config.mode == \"podp\":\n for _ in range(3):\n try:\n validate_bigscape(self.bigscape_dir, self.config.bigscape.cutoff)\n pass_validation = True\n break\n except FileNotFoundError:\n shutil.rmtree(self.bigscape_dir, ignore_errors=True)\n self._run_bigscape()\n\n if not pass_validation:\n validate_bigscape(self.bigscape_dir, self.config.bigscape.cutoff)\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.arrange_strain_mappings","title":"arrange_strain_mappings","text":"arrange_strain_mappings() -> None\n
Arrange the strain mappings file.
For `local` mode, validate the strain mappings file.

For `podp` mode, always generate a new strain mappings file and validate it.
The validation checks if the strain mappings file exists and if it is a valid JSON file according to STRAIN_MAPPINGS_SCHEMA.
Source code in `src/nplinker/arranger.py`:
def arrange_strain_mappings(self) -> None:\n \"\"\"Arrange the strain mappings file.\n\n For `local` mode, validate the strain mappings file.\n\n For `podp` mode, always generate the new strain mappings file and validate it.\n\n The validation checks if the strain mappings file exists and if it is a valid JSON file\n according to [STRAIN_MAPPINGS_SCHEMA][nplinker.schemas.STRAIN_MAPPINGS_SCHEMA].\n \"\"\"\n if self.config.mode == \"podp\":\n self._generate_strain_mappings()\n\n self._validate_strain_mappings()\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.arrange_strains_selected","title":"arrange_strains_selected","text":"arrange_strains_selected() -> None\n
Arrange the strains selected file.
If the file exists, validate it according to the schema defined in `user_strains.json`.

Source code in `src/nplinker/arranger.py`:
def arrange_strains_selected(self) -> None:\n \"\"\"Arrange the strains selected file.\n\n If the file exists, validate it according to the schema defined in `user_strains.json`.\n \"\"\"\n strains_selected_file = self.root_dir / defaults.STRAINS_SELECTED_FILENAME\n if strains_selected_file.exists():\n with open(strains_selected_file, \"r\") as f:\n json_data = json.load(f)\n validate(instance=json_data, schema=USER_STRAINS_SCHEMA)\n
"},{"location":"api/arranger/#nplinker.arranger.validate_gnps","title":"validate_gnps","text":"validate_gnps(gnps_dir: str | PathLike) -> None\n
Validate the GNPS data directory and its contents.
The GNPS data directory must contain the following files:
- `file_mappings.tsv` or `file_mappings.csv`
- `spectra.mgf`
- `molecular_families.tsv`
- `annotations.tsv`
Parameters:
- `gnps_dir` (`str | PathLike`) – Path to the GNPS data directory.
Raises:
- `FileNotFoundError` – If the GNPS data directory is not found or any of the required files is not found.
- `ValueError` – If both file_mappings.tsv and file_mappings.csv are found.

Source code in `src/nplinker/arranger.py`:
```python
def validate_gnps(gnps_dir: str | PathLike) -> None:
    """Validate the GNPS data directory and its contents.

    The GNPS data directory must contain the following files:

    - `file_mappings.tsv` or `file_mappings.csv`
    - `spectra.mgf`
    - `molecular_families.tsv`
    - `annotations.tsv`

    Args:
        gnps_dir: Path to the GNPS data directory.

    Raises:
        FileNotFoundError: If the GNPS data directory is not found or any of the required files
            is not found.
        ValueError: If both file_mappings.tsv and file_mappings.csv are found.
    """
    gnps_dir = Path(gnps_dir)
    if not gnps_dir.exists():
        raise FileNotFoundError(f"GNPS data directory not found at {gnps_dir}")

    file_mappings_tsv = gnps_dir / defaults.GNPS_FILE_MAPPINGS_TSV
    file_mappings_csv = gnps_dir / defaults.GNPS_FILE_MAPPINGS_CSV
    if file_mappings_tsv.exists() and file_mappings_csv.exists():
        raise ValueError(
            f"Both {file_mappings_tsv.name} and {file_mappings_csv.name} found in GNPS directory "
            f"{gnps_dir}, only one is allowed."
        )
    elif not file_mappings_tsv.exists() and not file_mappings_csv.exists():
        raise FileNotFoundError(
            f"Neither {file_mappings_tsv.name} nor {file_mappings_csv.name} found in GNPS directory"
            f" {gnps_dir}"
        )

    required_files = [
        gnps_dir / defaults.GNPS_SPECTRA_FILENAME,
        gnps_dir / defaults.GNPS_MOLECULAR_FAMILY_FILENAME,
        gnps_dir / defaults.GNPS_ANNOTATIONS_FILENAME,
    ]
    list_not_found = [f.name for f in required_files if not f.exists()]
    if list_not_found:
        # fixed: the joined file names are now interpolated into the f-string
        raise FileNotFoundError(
            f"Files not found in GNPS directory {gnps_dir}: {', '.join(list_not_found)}"
        )
```
"},{"location":"api/arranger/#nplinker.arranger.validate_antismash","title":"validate_antismash","text":"validate_antismash(antismash_dir: str | PathLike) -> None\n
Validate the antiSMASH data directory and its contents.
The validation only checks the structure of the antiSMASH data directory and file names. It does not check:

- the content of the BGC files
- the consistency between the antiSMASH data and the PODP project JSON file for the `podp` mode

The antiSMASH data directory must exist and contain at least one sub-directory. The names of the sub-directories must not contain any space. Each sub-directory must contain at least one BGC file (with the suffix `.region???.gbk` where `???` is the region number).
Parameters:
- `antismash_dir` (`str | PathLike`) – Path to the antiSMASH data directory.
Raises:
- `FileNotFoundError` – If the antiSMASH data directory is not found, or no sub-directories are found in the antiSMASH data directory, or no BGC files are found in any sub-directory.
- `ValueError` – If any sub-directory name contains a space.

Source code in `src/nplinker/arranger.py`:
```python
def validate_antismash(antismash_dir: str | PathLike) -> None:
    """Validate the antiSMASH data directory and its contents.

    The validation only checks the structure of the antiSMASH data directory and file names.
    It does not check

    - the content of the BGC files
    - the consistency between the antiSMASH data and the PODP project JSON file for the `podp` mode

    The antiSMASH data directory must exist and contain at least one sub-directory. The name of the
    sub-directories must not contain any space. Each sub-directory must contain at least one BGC
    file (with the suffix `.region???.gbk` where `???` is the region number).

    Args:
        antismash_dir: Path to the antiSMASH data directory.

    Raises:
        FileNotFoundError: If the antiSMASH data directory is not found, or no sub-directories
            are found in the antiSMASH data directory, or no BGC files are found in any
            sub-directory.
        ValueError: If any sub-directory name contains a space.
    """
    antismash_dir = Path(antismash_dir)
    if not antismash_dir.exists():
        raise FileNotFoundError(f"antiSMASH data directory not found at {antismash_dir}")

    sub_dirs = list_dirs(antismash_dir)
    if not sub_dirs:
        # fixed: added the missing f-prefix so the path is interpolated
        raise FileNotFoundError(
            f"No BGC directories found in antiSMASH data directory {antismash_dir}"
        )

    for sub_dir in sub_dirs:
        dir_name = Path(sub_dir).name
        if " " in dir_name:
            raise ValueError(
                f"antiSMASH sub-directory name {dir_name} contains space, which is not allowed"
            )

        gbk_files = list_files(sub_dir, suffix=".gbk", keep_parent=False)
        bgc_files = fnmatch.filter(gbk_files, "*.region???.gbk")
        if not bgc_files:
            raise FileNotFoundError(f"No BGC files found in antiSMASH sub-directory {sub_dir}")
```
"},{"location":"api/arranger/#nplinker.arranger.validate_bigscape","title":"validate_bigscape","text":"validate_bigscape(\n bigscape_dir: str | PathLike, cutoff: str\n) -> None\n
Validate the BiG-SCAPE data directory and its contents.
The BiG-SCAPE data directory must exist and contain the clustering file `mix_clustering_c{self.config.bigscape.cutoff}.tsv`, where `{self.config.bigscape.cutoff}` is the BiG-SCAPE cutoff value set in the config file.
Alternatively, the directory can contain the BiG-SCAPE database file generated by BiG-SCAPE v2. At the moment, all the family assignments in the database will be used, so this database should contain results from a single run with the desired cutoff.
Parameters:
- `bigscape_dir` (`str | PathLike`) – Path to the BiG-SCAPE data directory.
- `cutoff` (`str`) – The BiG-SCAPE cutoff value.
Raises:
- `FileNotFoundError` – If the BiG-SCAPE data directory or the clustering file is not found.

Source code in `src/nplinker/arranger.py`:
def validate_bigscape(bigscape_dir: str | PathLike, cutoff: str) -> None:\n \"\"\"Validate the BiG-SCAPE data directory and its contents.\n\n The BiG-SCAPE data directory must exist and contain the clustering file\n `mix_clustering_c{self.config.bigscape.cutoff}.tsv` where `{self.config.bigscape.cutoff}` is the\n bigscape cutoff value set in the config file.\n\n Alternatively, the directory can contain the BiG-SCAPE database file generated by BiG-SCAPE v2.\n At the moment, all the family assignments in the database will be used, so this database should\n contain results from a single run with the desired cutoff.\n\n Args:\n bigscape_dir: Path to the BiG-SCAPE data directory.\n cutoff: The BiG-SCAPE cutoff value.\n\n Raises:\n FileNotFoundError: If the BiG-SCAPE data directory or the clustering file is not found.\n \"\"\"\n bigscape_dir = Path(bigscape_dir)\n if not bigscape_dir.exists():\n raise FileNotFoundError(f\"BiG-SCAPE data directory not found at {bigscape_dir}\")\n\n clustering_file = bigscape_dir / f\"mix_clustering_c{cutoff}.tsv\"\n database_file = bigscape_dir / \"data_sqlite.db\"\n if not clustering_file.exists() and not database_file.exists():\n raise FileNotFoundError(f\"BiG-SCAPE data not found in {clustering_file} or {database_file}\")\n
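A usage sketch tying the three validators together (the paths and cutoff are examples; each validator raises on problems instead of returning a status):

```python
from pathlib import Path

from nplinker.arranger import validate_antismash, validate_bigscape, validate_gnps

# Hypothetical working directory laid out as in the quickstart
root = Path("nplinker_quickstart")

try:
    validate_gnps(root / "gnps")
    validate_antismash(root / "antismash")
    validate_bigscape(root / "bigscape", cutoff="0.30")
except (FileNotFoundError, ValueError) as e:
    print(f"Dataset not ready: {e}")
```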
"},{"location":"api/bigscape/","title":"BigScape","text":""},{"location":"api/bigscape/#nplinker.genomics.bigscape","title":"nplinker.genomics.bigscape","text":""},{"location":"api/bigscape/#nplinker.genomics.bigscape.BigscapeGCFLoader","title":"BigscapeGCFLoader","text":"BigscapeGCFLoader(cluster_file: str | PathLike)\n
Bases: GCFLoaderBase
Data loader for BiG-SCAPE GCF cluster file.
Attributes:
- `cluster_file` (`str`) – path to the BiG-SCAPE cluster file.
Parameters:
- `cluster_file` (`str | PathLike`) – Path to the BiG-SCAPE cluster file; the filename has a pattern of `<class>_clustering_c0.xx.tsv`.

Source code in `src/nplinker/genomics/bigscape/bigscape_loader.py`:
def __init__(self, cluster_file: str | PathLike, /) -> None:\n \"\"\"Initialize the BiG-SCAPE GCF loader.\n\n Args:\n cluster_file: Path to the BiG-SCAPE cluster file,\n the filename has a pattern of `<class>_clustering_c0.xx.tsv`.\n \"\"\"\n self.cluster_file: str = str(cluster_file)\n self._gcf_list = self._parse_gcf(self.cluster_file)\n
"},{"location":"api/bigscape/#nplinker.genomics.bigscape.BigscapeGCFLoader.cluster_file","title":"cluster_file instance-attribute
","text":"cluster_file: str = str(cluster_file)\n
"},{"location":"api/bigscape/#nplinker.genomics.bigscape.BigscapeGCFLoader.get_gcfs","title":"get_gcfs","text":"get_gcfs(\n keep_mibig_only: bool = False,\n keep_singleton: bool = False,\n) -> list[GCF]\n
Get all GCF objects.
Parameters:
- `keep_mibig_only` (`bool`, default: `False`) – True to keep GCFs that contain only MIBiG BGCs.
- `keep_singleton` (`bool`, default: `False`) – True to keep singleton GCFs. A singleton GCF is a GCF that contains only one BGC.
Returns:
`list[GCF]` – A list of GCF objects.

Source code in `src/nplinker/genomics/bigscape/bigscape_loader.py`:
def get_gcfs(self, keep_mibig_only: bool = False, keep_singleton: bool = False) -> list[GCF]:\n \"\"\"Get all GCF objects.\n\n Args:\n keep_mibig_only: True to keep GCFs that contain only MIBiG\n BGCs.\n keep_singleton: True to keep singleton GCFs. A singleton GCF\n is a GCF that contains only one BGC.\n\n Returns:\n A list of GCF objects.\n \"\"\"\n gcf_list = self._gcf_list\n if not keep_mibig_only:\n gcf_list = [gcf for gcf in gcf_list if not gcf.has_mibig_only()]\n if not keep_singleton:\n gcf_list = [gcf for gcf in gcf_list if not gcf.is_singleton()]\n return gcf_list\n
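A short usage sketch (the file path is an example following the naming pattern described above):

```python
from nplinker.genomics.bigscape import BigscapeGCFLoader

loader = BigscapeGCFLoader("bigscape/mix_clustering_c0.30.tsv")

# By default, MIBiG-only and singleton GCFs are filtered out
gcfs = loader.get_gcfs()

# Keep everything instead
all_gcfs = loader.get_gcfs(keep_mibig_only=True, keep_singleton=True)
```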
"},{"location":"api/bigscape/#nplinker.genomics.bigscape.BigscapeV2GCFLoader","title":"BigscapeV2GCFLoader","text":"BigscapeV2GCFLoader(db_file: str | PathLike)\n
Bases: GCFLoaderBase
Data loader for BiG-SCAPE v2 database file.
Attributes:
- `db_file` – Path to the BiG-SCAPE database file.
Parameters:
- `db_file` (`str | PathLike`) – Path to the BiG-SCAPE v2 database file.

Source code in `src/nplinker/genomics/bigscape/bigscape_loader.py`:
def __init__(self, db_file: str | PathLike, /) -> None:\n \"\"\"Initialize the BiG-SCAPE v2 GCF loader.\n\n Args:\n db_file: Path to the BiG-SCAPE v2 database file\n \"\"\"\n self.db_file = str(db_file)\n self._gcf_list = self._parse_gcf(self.db_file)\n
"},{"location":"api/bigscape/#nplinker.genomics.bigscape.BigscapeV2GCFLoader.db_file","title":"db_file instance-attribute
","text":"db_file = str(db_file)\n
"},{"location":"api/bigscape/#nplinker.genomics.bigscape.BigscapeV2GCFLoader.get_gcfs","title":"get_gcfs","text":"get_gcfs(\n keep_mibig_only: bool = False,\n keep_singleton: bool = False,\n) -> list[GCF]\n
Get all GCF objects.
Parameters:

- keep_mibig_only (bool, default: False) – True to keep GCFs that contain only MIBiG BGCs.
- keep_singleton (bool, default: False) – True to keep singleton GCFs. A singleton GCF is a GCF that contains only one BGC.

Returns:

- list[GCF] – A list of GCF objects.

Source code in src/nplinker/genomics/bigscape/bigscape_loader.py

```python
def get_gcfs(self, keep_mibig_only: bool = False, keep_singleton: bool = False) -> list[GCF]:
    """Get all GCF objects.

    Args:
        keep_mibig_only: True to keep GCFs that contain only MIBiG BGCs.
        keep_singleton: True to keep singleton GCFs.
            A singleton GCF is a GCF that contains only one BGC.

    Returns:
        A list of GCF objects.
    """
    gcf_list = self._gcf_list
    if not keep_mibig_only:
        gcf_list = [gcf for gcf in gcf_list if not gcf.has_mibig_only()]
    if not keep_singleton:
        gcf_list = [gcf for gcf in gcf_list if not gcf.is_singleton()]
    return gcf_list
```
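Usage mirrors the v1 loader; a sketch assuming a `data_sqlite.db` file from a single BiG-SCAPE v2 run (the path is hypothetical):

```python
from nplinker.genomics.bigscape import BigscapeV2GCFLoader

# Hypothetical path to the SQLite database written by a BiG-SCAPE v2 run.
loader = BigscapeV2GCFLoader("./bigscape/data_sqlite.db")
gcfs = loader.get_gcfs(keep_singleton=True)  # keep singleton GCFs this time
```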
"},{"location":"api/bigscape/#nplinker.genomics.bigscape.run_bigscape","title":"run_bigscape","text":"run_bigscape(\n antismash_path: str | PathLike,\n output_path: str | PathLike,\n extra_params: str,\n version: Literal[1, 2] = 1,\n) -> bool\n
Runs BiG-SCAPE to cluster BGCs.
The behavior of this function is slightly different depending on the version of BiG-SCAPE that is set to run using the configuration file. Mostly this means a different set of parameters is used between the two versions.
The AntiSMASH output directory should be a directory that contains GBK files. The directory can contain subdirectories, in which case BiG-SCAPE will search recursively for GBK files. E.g.:
example_folder\n \u251c\u2500\u2500 organism_1\n \u2502\u00a0 \u251c\u2500\u2500 organism_1.region001.gbk\n \u2502\u00a0 \u251c\u2500\u2500 organism_1.region002.gbk\n \u2502\u00a0 \u251c\u2500\u2500 organism_1.region003.gbk\n \u2502\u00a0 \u251c\u2500\u2500 organism_1.final.gbk <- skipped!\n \u2502\u00a0 \u2514\u2500\u2500 ...\n \u251c\u2500\u2500 organism_2\n \u2502\u00a0 \u251c\u2500\u2500 ...\n \u2514\u2500\u2500 ...\n
By default, only GBK Files with \"cluster\" or \"region\" in the filename are accepted. GBK Files with \"final\" in the filename are excluded.
Parameters:

- antismash_path (str | PathLike) – Path to the antismash output directory.
- output_path (str | PathLike) – Path to the output directory where BiG-SCAPE will write its results.
- extra_params (str) – Additional parameters to pass to BiG-SCAPE.
- version (Literal[1, 2], default: 1) – The version of BiG-SCAPE to run. Must be 1 or 2.

Returns:

- bool – True if BiG-SCAPE ran successfully, False otherwise.

Raises:

- ValueError – If an unexpected BiG-SCAPE version number is specified.
- FileNotFoundError – If the antismash_path does not exist or if the BiG-SCAPE python script could not be found.
- RuntimeError – If BiG-SCAPE fails to run.

Examples:

```python
>>> from nplinker.genomics.bigscape import run_bigscape
>>> run_bigscape(antismash_path="./antismash", output_path="./output",
... extra_params="--help", version=1)
```
Source code in src/nplinker/genomics/bigscape/runbigscape.py
````python
def run_bigscape(
    antismash_path: str | PathLike,
    output_path: str | PathLike,
    extra_params: str,
    version: Literal[1, 2] = 1,
) -> bool:
    """Runs BiG-SCAPE to cluster BGCs.

    The behavior of this function is slightly different depending on the version of
    BiG-SCAPE that is set to run using the configuration file.
    Mostly this means a different set of parameters is used between the two versions.

    The AntiSMASH output directory should be a directory that contains GBK files.
    The directory can contain subdirectories, in which case BiG-SCAPE will search
    recursively for GBK files. E.g.:

    ```
    example_folder
    ├── organism_1
    │   ├── organism_1.region001.gbk
    │   ├── organism_1.region002.gbk
    │   ├── organism_1.region003.gbk
    │   ├── organism_1.final.gbk <- skipped!
    │   └── ...
    ├── organism_2
    │   ├── ...
    └── ...
    ```

    By default, only GBK files with "cluster" or "region" in the filename are
    accepted. GBK files with "final" in the filename are excluded.

    Args:
        antismash_path: Path to the antismash output directory.
        output_path: Path to the output directory where BiG-SCAPE will write its results.
        extra_params: Additional parameters to pass to BiG-SCAPE.
        version: The version of BiG-SCAPE to run. Must be 1 or 2.

    Returns:
        True if BiG-SCAPE ran successfully, False otherwise.

    Raises:
        ValueError: If an unexpected BiG-SCAPE version number is specified.
        FileNotFoundError: If the antismash_path does not exist or if the BiG-SCAPE python
            script could not be found.
        RuntimeError: If BiG-SCAPE fails to run.

    Examples:
        >>> from nplinker.genomics.bigscape import run_bigscape
        >>> run_bigscape(antismash_path="./antismash", output_path="./output",
        ... extra_params="--help", version=1)
    """
    # switch to correct version of BiG-SCAPE
    if version == 1:
        bigscape_py_path = "bigscape.py"
    elif version == 2:
        bigscape_py_path = "bigscape-v2.py"
    else:
        raise ValueError("Invalid BiG-SCAPE version number. Expected: 1 or 2.")

    try:
        subprocess.run([bigscape_py_path, "-h"], capture_output=True, check=True)
    except Exception as e:
        raise FileNotFoundError(
            f"Failed to find/run BiG-SCAPE executable program (path={bigscape_py_path}, err={e})"
        ) from e

    if not os.path.exists(antismash_path):
        raise FileNotFoundError(f'antismash_path "{antismash_path}" does not exist!')

    logger.info(f"Running BiG-SCAPE version {version}")
    logger.info(
        f'run_bigscape: input="{antismash_path}", output="{output_path}", '
        f"extra_params={extra_params}"
    )

    # assemble arguments. first argument is the python file
    args = [bigscape_py_path]

    # version 2 points to specific Pfam file, version 1 points to directory
    # version 2 also requires the cluster subcommand
    if version == 1:
        args.extend(["--pfam_dir", PFAM_PATH])
    elif version == 2:
        args.extend(["cluster", "--pfam_path", os.path.join(PFAM_PATH, "Pfam-A.hmm")])

    # add input and output paths; these are unchanged between the two versions
    args.extend(["-i", str(antismash_path), "-o", str(output_path)])

    # append the user supplied params, if any
    if len(extra_params) > 0:
        args.extend(extra_params.split(" "))

    logger.info(f"BiG-SCAPE command: {args}")
    result = subprocess.run(args, stdout=sys.stdout, stderr=sys.stderr)

    # return True on a zero (success) return code
    if result.returncode == 0:
        logger.info(f"BiG-SCAPE completed with return code {result.returncode}")
        return True

    # otherwise log details and raise a runtime error
    logger.error(f"BiG-SCAPE failed with return code {result.returncode}")
    logger.error(f"output: {str(result.stdout)}")
    logger.error(f"stderr: {str(result.stderr)}")

    raise RuntimeError(f"Failed to run BiG-SCAPE with error code {result.returncode}")
````
"},{"location":"api/genomics/","title":"Data Models","text":""},{"location":"api/genomics/#nplinker.genomics","title":"nplinker.genomics","text":""},{"location":"api/genomics/#nplinker.genomics.BGC","title":"BGC","text":"BGC(id: str, /, *product_prediction: str)\n
Class to model BGC (biosynthetic gene cluster) data.
BGC data include both annotations and sequence data. This class is mainly designed to model the annotations or metadata.
The raw BGC data is stored in GenBank format (.gbk). Additional GenBank features could be added to the GenBank file to annotate BGCs, e.g. antiSMASH has some self-defined features (like region
) in its output GenBank files.
The annotations of BGC can be stored in JSON format, which is defined and used by MIBiG.
Attributes:

- id – BGC identifier, e.g. MIBiG accession, GenBank accession.
- product_prediction – A tuple of (predicted) natural products or product classes of the BGC. For antiSMASH's GenBank data, the `/product` qualifier of the feature `region` gives product information. For MIBiG metadata, its biosynthetic class provides such info.
- mibig_bgc_class (tuple[str] | None) – A tuple of MIBiG biosynthetic classes to which the BGC belongs. Defaults to None, which means the class is unknown. MIBiG defines 6 major biosynthetic classes for natural products, including `NRP`, `Polyketide`, `RiPP`, `Terpene`, `Saccharide` and `Alkaloid`. Note that natural products created by other biosynthetic mechanisms fall under the category `Other`. For more details see the paper.
- description (str | None) – Brief description of the BGC. Defaults to None.
- smiles (tuple[str] | None) – A tuple of SMILES formulas of the BGC's products. Defaults to None.
- antismash_file (str | None) – The path to the antiSMASH GenBank file. Defaults to None.
- antismash_id (str | None) – Identifier of the antiSMASH BGC, referring to the feature `VERSION` of the GenBank file. Defaults to None.
- antismash_region (int | None) – AntiSMASH BGC region number, referring to the feature `region` of the GenBank file. Defaults to None.
- parents (set[GCF]) – The set of GCFs that contain the BGC.
- strain (Strain | None) – The strain of the BGC.

Parameters:

- id (str) – BGC identifier, e.g. MIBiG accession, GenBank accession.
- product_prediction (str, default: ()) – BGC's (predicted) natural products or product classes.

Examples:

```python
>>> bgc = BGC("Unique_BGC_ID", "Polyketide", "NRP")
>>> bgc.id
'Unique_BGC_ID'
>>> bgc.product_prediction
('Polyketide', 'NRP')
>>> bgc.is_mibig()
False
```
Source code in src/nplinker/genomics/bgc.py
```python
def __init__(self, id: str, /, *product_prediction: str):
    """Initialize the BGC object.

    Args:
        id: BGC identifier, e.g. MIBiG accession, GenBank accession.
        product_prediction: BGC's (predicted) natural products or product classes.

    Examples:
        >>> bgc = BGC("Unique_BGC_ID", "Polyketide", "NRP")
        >>> bgc.id
        'Unique_BGC_ID'
        >>> bgc.product_prediction
        ('Polyketide', 'NRP')
        >>> bgc.is_mibig()
        False
    """
    # BGC metadata
    self.id = id
    self.product_prediction = product_prediction

    self.mibig_bgc_class: tuple[str] | None = None
    self.description: str | None = None
    self.smiles: tuple[str] | None = None

    # antismash related attributes
    self.antismash_file: str | None = None
    self.antismash_id: str | None = None  # version in .gbk, id in SeqRecord
    self.antismash_region: int | None = None  # antismash region number

    # other attributes
    self.parents: set[GCF] = set()
    self._strain: Strain | None = None
```
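A short sketch of enriching a BGC with the optional metadata attributes after construction; the accession and values are hypothetical:

```python
from nplinker.genomics import BGC

# Hypothetical BGC; all attribute values below are illustrative only.
bgc = BGC("BGC0000001", "Polyketide")
bgc.description = "Example polyketide cluster"
bgc.smiles = ("CC(=O)OC1=CC=CC=C1C(=O)O",)  # a tuple, one SMILES string per product
bgc.antismash_region = 1
print(bgc.is_mibig())  # True: the id follows the "BGC..." naming pattern
```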
"},{"location":"api/genomics/#nplinker.genomics.BGC.id","title":"id instance-attribute
","text":"id = id\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.product_prediction","title":"product_prediction instance-attribute
","text":"product_prediction = product_prediction\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.mibig_bgc_class","title":"mibig_bgc_class instance-attribute
","text":"mibig_bgc_class: tuple[str] | None = None\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.description","title":"description instance-attribute
","text":"description: str | None = None\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.smiles","title":"smiles instance-attribute
","text":"smiles: tuple[str] | None = None\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.antismash_file","title":"antismash_file instance-attribute
","text":"antismash_file: str | None = None\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.antismash_id","title":"antismash_id instance-attribute
","text":"antismash_id: str | None = None\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.antismash_region","title":"antismash_region instance-attribute
","text":"antismash_region: int | None = None\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.parents","title":"parents instance-attribute
","text":"parents: set[GCF] = set()\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.strain","title":"strain property
writable
","text":"strain: Strain | None\n
Get the strain of the BGC.
"},{"location":"api/genomics/#nplinker.genomics.BGC.bigscape_classes","title":"bigscape_classesproperty
","text":"bigscape_classes: set[str | None]\n
Get BiG-SCAPE's BGC classes.
BiG-SCAPE's BGC classes are similar to those defined in MiBIG but have more categories (7 classes), including:
For BGC falls outside of these categories, the value is \"Others\".
Default is None, which means the class is unknown.
More details see: https://doi.org/10.1038%2Fs41589-019-0400-9.
"},{"location":"api/genomics/#nplinker.genomics.BGC.aa_predictions","title":"aa_predictionsproperty
","text":"aa_predictions: list\n
Amino acids as predicted monomers of product.
Returns:
list
\u2013 list of dicts with key as amino acid and value as prediction
list
\u2013 probability.
__repr__

```python
__repr__()
```

Source code in src/nplinker/genomics/bgc.py

```python
def __repr__(self):
    return str(self)
```
"},{"location":"api/genomics/#nplinker.genomics.BGC.__str__","title":"__str__","text":"__str__()\n
Source code in src/nplinker/genomics/bgc.py
def __str__(self):\n return \"{}(id={}, strain={}, asid={}, region={})\".format(\n self.__class__.__name__,\n self.id,\n self.strain,\n self.antismash_id,\n self.antismash_region,\n )\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.__eq__","title":"__eq__","text":"__eq__(other) -> bool\n
Source code in src/nplinker/genomics/bgc.py
def __eq__(self, other) -> bool:\n if isinstance(other, BGC):\n return self.id == other.id and self.product_prediction == other.product_prediction\n return NotImplemented\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.__hash__","title":"__hash__","text":"__hash__() -> int\n
Source code in src/nplinker/genomics/bgc.py
def __hash__(self) -> int:\n return hash((self.id, self.product_prediction))\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.__reduce__","title":"__reduce__","text":"__reduce__() -> tuple\n
Reduce function for pickling.
Source code insrc/nplinker/genomics/bgc.py
def __reduce__(self) -> tuple:\n \"\"\"Reduce function for pickling.\"\"\"\n return (self.__class__, (self.id, *self.product_prediction), self.__dict__)\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.add_parent","title":"add_parent","text":"add_parent(gcf: GCF) -> None\n
Add a parent GCF to the BGC.
Parameters:
gcf
(GCF
) \u2013 gene cluster family
src/nplinker/genomics/bgc.py
def add_parent(self, gcf: GCF) -> None:\n \"\"\"Add a parent GCF to the BGC.\n\n Args:\n gcf: gene cluster family\n \"\"\"\n gcf.add_bgc(self)\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.detach_parent","title":"detach_parent","text":"detach_parent(gcf: GCF) -> None\n
Remove a parent GCF.
Source code insrc/nplinker/genomics/bgc.py
def detach_parent(self, gcf: GCF) -> None:\n \"\"\"Remove a parent GCF.\"\"\"\n gcf.detach_bgc(self)\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.is_mibig","title":"is_mibig","text":"is_mibig() -> bool\n
Check if the BGC is a MIBiG reference BGC or not.
WarningThis method evaluates MIBiG BGC based on the pattern that MIBiG BGC names start with \"BGC\". It might give false positive result.
Returns:
bool
\u2013 True if it's MIBiG reference BGC
src/nplinker/genomics/bgc.py
def is_mibig(self) -> bool:\n \"\"\"Check if the BGC is a MIBiG reference BGC or not.\n\n Warning:\n This method evaluates MIBiG BGC based on the pattern that MIBiG\n BGC names start with \"BGC\". It might give false positive result.\n\n Returns:\n True if it's MIBiG reference BGC\n \"\"\"\n return self.id.startswith(\"BGC\")\n
"},{"location":"api/genomics/#nplinker.genomics.GCF","title":"GCF","text":"GCF(id: str)\n
Class to model gene cluster family (GCF).
GCF is a group of similar BGCs and generated by clustering BGCs with tools such as BiG-SCAPE and BiG-SLICE.
Attributes:

- id – id of the GCF object.
- bgc_ids (set[str]) – a set of BGC ids that belong to the GCF.
- bigscape_class (str | None) – BiG-SCAPE's BGC class. BiG-SCAPE's BGC classes are similar to those defined in MIBiG but have more categories (7 classes). For a BGC that falls outside of these categories, the value is "Others". Default is None, which means the class is unknown. For more details see: https://doi.org/10.1038%2Fs41589-019-0400-9.

Parameters:

- id (str) – id of the GCF object.

Examples:

```python
>>> gcf = GCF("Unique_GCF_ID")
>>> gcf.id
'Unique_GCF_ID'
```
Source code in src/nplinker/genomics/gcf.py
```python
def __init__(self, id: str, /) -> None:
    """Initialize the GCF object.

    Args:
        id: id of the GCF object.

    Examples:
        >>> gcf = GCF("Unique_GCF_ID")
        >>> gcf.id
        'Unique_GCF_ID'
    """
    self.id = id
    self.bgc_ids: set[str] = set()
    self.bigscape_class: str | None = None
    self._bgcs: set[BGC] = set()
    self._strains: StrainCollection = StrainCollection()
```
"},{"location":"api/genomics/#nplinker.genomics.GCF.id","title":"id instance-attribute
","text":"id = id\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.bgc_ids","title":"bgc_ids instance-attribute
","text":"bgc_ids: set[str] = set()\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.bigscape_class","title":"bigscape_class instance-attribute
","text":"bigscape_class: str | None = None\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.bgcs","title":"bgcs property
","text":"bgcs: set[BGC]\n
Get the BGC objects.
"},{"location":"api/genomics/#nplinker.genomics.GCF.strains","title":"strainsproperty
","text":"strains: StrainCollection\n
Get the strains in the GCF.
"},{"location":"api/genomics/#nplinker.genomics.GCF.__str__","title":"__str__","text":"__str__() -> str\n
Source code in src/nplinker/genomics/gcf.py
def __str__(self) -> str:\n return (\n f\"GCF(id={self.id}, #BGC_objects={len(self.bgcs)}, #bgc_ids={len(self.bgc_ids)},\"\n f\"#strains={len(self._strains)}).\"\n )\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.__repr__","title":"__repr__","text":"__repr__() -> str\n
Source code in src/nplinker/genomics/gcf.py
def __repr__(self) -> str:\n return str(self)\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.__eq__","title":"__eq__","text":"__eq__(other) -> bool\n
Source code in src/nplinker/genomics/gcf.py
def __eq__(self, other) -> bool:\n if isinstance(other, GCF):\n return self.id == other.id and self.bgcs == other.bgcs\n return NotImplemented\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.__hash__","title":"__hash__","text":"__hash__() -> int\n
Hash function for GCF.
Note that GCF class is a mutable container. We only hash the GCF id to avoid the hash value changes when self._bgcs
is updated.
src/nplinker/genomics/gcf.py
def __hash__(self) -> int:\n \"\"\"Hash function for GCF.\n\n Note that GCF class is a mutable container. We only hash the GCF id to\n avoid the hash value changes when `self._bgcs` is updated.\n \"\"\"\n return hash(self.id)\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.__reduce__","title":"__reduce__","text":"__reduce__() -> tuple\n
Reduce function for pickling.
Source code insrc/nplinker/genomics/gcf.py
def __reduce__(self) -> tuple:\n \"\"\"Reduce function for pickling.\"\"\"\n return (self.__class__, (self.id,), self.__dict__)\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.add_bgc","title":"add_bgc","text":"add_bgc(bgc: BGC) -> None\n
Add a BGC object to the GCF.
Source code insrc/nplinker/genomics/gcf.py
def add_bgc(self, bgc: BGC) -> None:\n \"\"\"Add a BGC object to the GCF.\"\"\"\n bgc.parents.add(self)\n self._bgcs.add(bgc)\n self.bgc_ids.add(bgc.id)\n if bgc.strain is not None:\n self._strains.add(bgc.strain)\n else:\n logger.warning(\"No strain specified for the BGC %s\", bgc.id)\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.detach_bgc","title":"detach_bgc","text":"detach_bgc(bgc: BGC) -> None\n
Remove a child BGC object.
Source code insrc/nplinker/genomics/gcf.py
def detach_bgc(self, bgc: BGC) -> None:\n \"\"\"Remove a child BGC object.\"\"\"\n bgc.parents.remove(self)\n self._bgcs.remove(bgc)\n self.bgc_ids.remove(bgc.id)\n if bgc.strain is not None:\n for other_bgc in self._bgcs:\n if other_bgc.strain == bgc.strain:\n return\n self._strains.remove(bgc.strain)\n
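A small sketch of linking a BGC to a GCF and inspecting the back-references; the ids are hypothetical, and since no strain is attached, `add_bgc` logs a "No strain specified" warning:

```python
from nplinker.genomics import BGC, GCF

gcf = GCF("GCF_0001")
bgc = BGC("GENOME_1.region001", "NRP")

gcf.add_bgc(bgc)           # also registers gcf in bgc.parents
assert gcf in bgc.parents
assert bgc.id in gcf.bgc_ids

gcf.detach_bgc(bgc)        # removes the BGC and its back-reference
assert gcf not in bgc.parents
```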
"},{"location":"api/genomics/#nplinker.genomics.GCF.has_strain","title":"has_strain","text":"has_strain(strain: Strain) -> bool\n
Check if the given strain exists.
Parameters:
strain
(Strain
) \u2013 Strain
object.
Returns:
bool
\u2013 True when the given strain exist.
src/nplinker/genomics/gcf.py
def has_strain(self, strain: Strain) -> bool:\n \"\"\"Check if the given strain exists.\n\n Args:\n strain: `Strain` object.\n\n Returns:\n True when the given strain exist.\n \"\"\"\n return strain in self._strains\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.has_mibig_only","title":"has_mibig_only","text":"has_mibig_only() -> bool\n
Check if the GCF's children are only MIBiG BGCs.
Returns:
bool
\u2013 True if GCF.bgc_ids
are only MIBiG BGC ids.
src/nplinker/genomics/gcf.py
def has_mibig_only(self) -> bool:\n \"\"\"Check if the GCF's children are only MIBiG BGCs.\n\n Returns:\n True if `GCF.bgc_ids` are only MIBiG BGC ids.\n \"\"\"\n return all(map(lambda id: id.startswith(\"BGC\"), self.bgc_ids))\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.is_singleton","title":"is_singleton","text":"is_singleton() -> bool\n
Check if the GCF contains only one BGC.
Returns:
bool
\u2013 True if GCF.bgc_ids
contains only one BGC id.
src/nplinker/genomics/gcf.py
def is_singleton(self) -> bool:\n \"\"\"Check if the GCF contains only one BGC.\n\n Returns:\n True if `GCF.bgc_ids` contains only one BGC id.\n \"\"\"\n return len(self.bgc_ids) == 1\n
"},{"location":"api/genomics_abc/","title":"Abstract Base Classes","text":""},{"location":"api/genomics_abc/#nplinker.genomics.abc","title":"nplinker.genomics.abc","text":""},{"location":"api/genomics_abc/#nplinker.genomics.abc.BGCLoaderBase","title":"BGCLoaderBase","text":"BGCLoaderBase(data_dir: str | PathLike)\n
Bases: ABC
Abstract base class for BGC loader.
Parameters:
data_dir
(str | PathLike
) \u2013 Path to directory that contains BGC metadata files (.json) or full data genbank files (.gbk).
src/nplinker/genomics/abc.py
def __init__(self, data_dir: str | PathLike) -> None:\n \"\"\"Initialize the BGC loader.\n\n Args:\n data_dir: Path to directory that contains BGC metadata files\n (.json) or full data genbank files (.gbk).\n \"\"\"\n self.data_dir = str(data_dir)\n
"},{"location":"api/genomics_abc/#nplinker.genomics.abc.BGCLoaderBase.data_dir","title":"data_dir instance-attribute
","text":"data_dir = str(data_dir)\n
"},{"location":"api/genomics_abc/#nplinker.genomics.abc.BGCLoaderBase.get_files","title":"get_files abstractmethod
","text":"get_files() -> dict[str, str]\n
Get path to BGC files.
Returns:
dict[str, str]
\u2013 The key is BGC name and value is path to BGC file
src/nplinker/genomics/abc.py
@abstractmethod\ndef get_files(self) -> dict[str, str]:\n \"\"\"Get path to BGC files.\n\n Returns:\n The key is BGC name and value is path to BGC file\n \"\"\"\n
"},{"location":"api/genomics_abc/#nplinker.genomics.abc.BGCLoaderBase.get_bgcs","title":"get_bgcs abstractmethod
","text":"get_bgcs() -> list[BGC]\n
Get BGC objects.
Returns:
list[BGC]
\u2013 A list of BGC objects
src/nplinker/genomics/abc.py
@abstractmethod\ndef get_bgcs(self) -> list[BGC]:\n \"\"\"Get BGC objects.\n\n Returns:\n A list of BGC objects\n \"\"\"\n
"},{"location":"api/genomics_abc/#nplinker.genomics.abc.GCFLoaderBase","title":"GCFLoaderBase","text":" Bases: ABC
Abstract base class for GCF loader.
"},{"location":"api/genomics_abc/#nplinker.genomics.abc.GCFLoaderBase.get_gcfs","title":"get_gcfsabstractmethod
","text":"get_gcfs(\n keep_mibig_only: bool, keep_singleton: bool\n) -> list[GCF]\n
Get GCF objects.
Parameters:
keep_mibig_only
(bool
) \u2013 True to keep GCFs that contain only MIBiG BGCs.
keep_singleton
(bool
) \u2013 True to keep singleton GCFs. A singleton GCF is a GCF that contains only one BGC.
Returns:
list[GCF]
\u2013 A list of GCF objects
src/nplinker/genomics/abc.py
@abstractmethod\ndef get_gcfs(self, keep_mibig_only: bool, keep_singleton: bool) -> list[GCF]:\n \"\"\"Get GCF objects.\n\n Args:\n keep_mibig_only: True to keep GCFs that contain only MIBiG\n BGCs.\n keep_singleton: True to keep singleton GCFs. A singleton GCF\n is a GCF that contains only one BGC.\n\n Returns:\n A list of GCF objects\n \"\"\"\n
"},{"location":"api/genomics_utils/","title":"Utilities","text":""},{"location":"api/genomics_utils/#nplinker.genomics.utils","title":"nplinker.genomics.utils","text":""},{"location":"api/genomics_utils/#nplinker.genomics.utils.generate_mappings_genome_id_bgc_id","title":"generate_mappings_genome_id_bgc_id","text":"generate_mappings_genome_id_bgc_id(\n bgc_dir: str | PathLike,\n output_file: str | PathLike | None = None,\n) -> None\n
Generate a file that maps genome id to BGC id.
The input bgc_dir
must follow the structure of the antismash
directory defined in Working Directory Structure, e.g.:
bgc_dir\n \u251c\u2500\u2500 genome_id_1\n \u2502\u00a0 \u251c\u2500\u2500 bgc_id_1.gbk\n \u2502\u00a0 \u2514\u2500\u2500 ...\n \u251c\u2500\u2500 genome_id_2\n \u2502\u00a0 \u251c\u2500\u2500 bgc_id_2.gbk\n \u2502\u00a0 \u2514\u2500\u2500 ...\n \u2514\u2500\u2500 ...\n
Parameters:

- bgc_dir (str | PathLike) – The directory has one layer of subfolders and each subfolder contains BGC files in `.gbk` format. It assumes that the subfolder name is the genome id (e.g. refseq) and the BGC file name is the BGC id.
- output_file (str | PathLike | None, default: None) – The path to the output file. The file will be overwritten if it already exists. Defaults to None, in which case the output file will be placed in the directory `bgc_dir` with the file name GENOME_BGC_MAPPINGS_FILENAME.

Source code in src/nplinker/genomics/utils.py

````python
def generate_mappings_genome_id_bgc_id(
    bgc_dir: str | PathLike, output_file: str | PathLike | None = None
) -> None:
    """Generate a file that maps genome id to BGC id.

    The input `bgc_dir` must follow the structure of the `antismash` directory defined in
    [Working Directory Structure][working-directory-structure], e.g.:
    ```shell
    bgc_dir
    ├── genome_id_1
    │   ├── bgc_id_1.gbk
    │   └── ...
    ├── genome_id_2
    │   ├── bgc_id_2.gbk
    │   └── ...
    └── ...
    ```

    Args:
        bgc_dir: The directory has one-layer of subfolders and each subfolder contains BGC files
            in `.gbk` format.

            It assumes that

            - the subfolder name is the genome id (e.g. refseq),
            - the BGC file name is the BGC id.
        output_file: The path to the output file.
            The file will be overwritten if it already exists.

            Defaults to None, in which case the output file will be placed in
            the directory `bgc_dir` with the file name
            [GENOME_BGC_MAPPINGS_FILENAME][nplinker.defaults.GENOME_BGC_MAPPINGS_FILENAME].
    """
    bgc_dir = Path(bgc_dir)
    genome_bgc_mappings = {}

    for subdir in list_dirs(bgc_dir):
        genome_id = Path(subdir).name
        bgc_files = list_files(subdir, suffix=(".gbk"), keep_parent=False)
        bgc_ids = [bgc_id for f in bgc_files if (bgc_id := Path(f).stem) != genome_id]
        if bgc_ids:
            genome_bgc_mappings[genome_id] = bgc_ids
        else:
            logger.warning("No BGC files found in %s", subdir)

    # sort mappings by genome_id and construct json data
    genome_bgc_mappings = dict(sorted(genome_bgc_mappings.items()))
    json_data_mappings = [{"genome_ID": k, "BGC_ID": v} for k, v in genome_bgc_mappings.items()]
    json_data = {"mappings": json_data_mappings, "version": "1.0"}

    # validate json data
    validate(instance=json_data, schema=GENOME_BGC_MAPPINGS_SCHEMA)

    if output_file is None:
        output_file = bgc_dir / GENOME_BGC_MAPPINGS_FILENAME
    with open(output_file, "w") as f:
        json.dump(json_data, f)
    logger.info("Generated genome-BGC mappings file: %s", output_file)
````
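A usage sketch; the `./antismash` directory is hypothetical and must already follow the genome-id/BGC-id layout shown above:

```python
from nplinker.genomics.utils import generate_mappings_genome_id_bgc_id

# Writes the mappings JSON (named by GENOME_BGC_MAPPINGS_FILENAME) into ./antismash.
generate_mappings_genome_id_bgc_id("./antismash")
```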
"},{"location":"api/genomics_utils/#nplinker.genomics.utils.add_strain_to_bgc","title":"add_strain_to_bgc","text":"add_strain_to_bgc(\n strains: StrainCollection, bgcs: Sequence[BGC]\n) -> tuple[list[BGC], list[BGC]]\n
Assign a Strain object to BGC.strain
for input BGCs.
BGC id is used to find the corresponding Strain object. It's possible that no Strain object is found for a BGC id.
Note
The input bgcs
will be changed in place.
Parameters:

- strains (StrainCollection) – A collection of all strain objects.
- bgcs (Sequence[BGC]) – A list of BGC objects.

Returns:

- tuple[list[BGC], list[BGC]] – A tuple of two lists of BGC objects: the first list contains BGC objects that are updated with a Strain object; the second list contains BGC objects that are not updated with a Strain object because no Strain object is found.

Raises:

- ValueError – Multiple strain objects found for a BGC id.

Source code in src/nplinker/genomics/utils.py

```python
def add_strain_to_bgc(
    strains: StrainCollection, bgcs: Sequence[BGC]
) -> tuple[list[BGC], list[BGC]]:
    """Assign a Strain object to `BGC.strain` for input BGCs.

    BGC id is used to find the corresponding Strain object. It's possible that
    no Strain object is found for a BGC id.

    !!! Note
        The input `bgcs` will be changed in place.

    Args:
        strains: A collection of all strain objects.
        bgcs: A list of BGC objects.

    Returns:
        A tuple of two lists of BGC objects,

        - the first list contains BGC objects that are updated with a Strain object;
        - the second list contains BGC objects that are not updated with
            a Strain object because no Strain object is found.

    Raises:
        ValueError: Multiple strain objects found for a BGC id.
    """
    bgc_with_strain = []
    bgc_without_strain = []
    for bgc in bgcs:
        try:
            strain_list = strains.lookup(bgc.id)
        except ValueError:
            bgc_without_strain.append(bgc)
            continue
        if len(strain_list) > 1:
            raise ValueError(
                f"Multiple strain objects found for BGC id '{bgc.id}'. "
                f"BGC object accepts only one strain."
            )
        bgc.strain = strain_list[0]
        bgc_with_strain.append(bgc)

    logger.info(
        f"{len(bgc_with_strain)} BGC objects updated with Strain object.\n"
        f"{len(bgc_without_strain)} BGC objects not updated with Strain object."
    )
    return bgc_with_strain, bgc_without_strain
```
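A sketch of the call; it assumes the `Strain`/`StrainCollection` API (a `Strain` constructed from a strain id, with an `add_alias` method) and the `nplinker.strain` import path, neither of which is documented on this page, and all ids are hypothetical:

```python
from nplinker.genomics import BGC
from nplinker.genomics.utils import add_strain_to_bgc
from nplinker.strain import Strain, StrainCollection  # assumed import path

strain = Strain("strain_1")
strain.add_alias("GENOME_1.region001")  # alias matching the first BGC id below
strains = StrainCollection()
strains.add(strain)

bgcs = [BGC("GENOME_1.region001", "NRP"), BGC("GENOME_2.region001", "Polyketide")]
with_strain, without_strain = add_strain_to_bgc(strains, bgcs)
# with_strain holds the first BGC; without_strain holds the unmatched second BGC
```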
"},{"location":"api/genomics_utils/#nplinker.genomics.utils.add_bgc_to_gcf","title":"add_bgc_to_gcf","text":"add_bgc_to_gcf(\n bgcs: Sequence[BGC], gcfs: Sequence[GCF]\n) -> tuple[list[GCF], list[GCF], dict[GCF, set[str]]]\n
Add BGC objects to GCF object based on GCF's BGC ids.
The attribute of GCF.bgc_ids
contains the ids of BGC objects. These ids are used to find BGC objects from the input bgcs
list. The found BGC objects are added to the bgcs
attribute of GCF object. It is possible that some BGC ids are not found in the input bgcs
list, and so their BGC objects are missing in the GCF object.
Note
This method changes the lists bgcs
and gcfs
in place.
Parameters:

- bgcs (Sequence[BGC]) – A list of BGC objects.
- gcfs (Sequence[GCF]) – A list of GCF objects.

Returns:

- tuple[list[GCF], list[GCF], dict[GCF, set[str]]] – A tuple of two lists and a dictionary: the first list contains GCF objects that are updated with BGC objects; the second list contains GCF objects that are not updated with BGC objects because no BGC objects are found; the dictionary contains GCF objects as keys and a set of ids of missing BGC objects as values.

Source code in src/nplinker/genomics/utils.py

```python
def add_bgc_to_gcf(
    bgcs: Sequence[BGC], gcfs: Sequence[GCF]
) -> tuple[list[GCF], list[GCF], dict[GCF, set[str]]]:
    """Add BGC objects to GCF object based on GCF's BGC ids.

    The attribute of `GCF.bgc_ids` contains the ids of BGC objects. These ids
    are used to find BGC objects from the input `bgcs` list. The found BGC
    objects are added to the `bgcs` attribute of GCF object. It is possible that
    some BGC ids are not found in the input `bgcs` list, and so their BGC
    objects are missing in the GCF object.

    !!! note
        This method changes the lists `bgcs` and `gcfs` in place.

    Args:
        bgcs: A list of BGC objects.
        gcfs: A list of GCF objects.

    Returns:
        A tuple of two lists and a dictionary,

        - The first list contains GCF objects that are updated with BGC objects;
        - The second list contains GCF objects that are not updated with BGC objects
            because no BGC objects are found;
        - The dictionary contains GCF objects as keys and a set of ids of missing
            BGC objects as values.
    """
    bgc_dict = {bgc.id: bgc for bgc in bgcs}
    gcf_with_bgc = []
    gcf_without_bgc = []
    gcf_missing_bgc: dict[GCF, set[str]] = {}
    for gcf in gcfs:
        for bgc_id in gcf.bgc_ids:
            try:
                bgc = bgc_dict[bgc_id]
            except KeyError:
                if gcf not in gcf_missing_bgc:
                    gcf_missing_bgc[gcf] = {bgc_id}
                else:
                    gcf_missing_bgc[gcf].add(bgc_id)
                continue
            gcf.add_bgc(bgc)

        if gcf.bgcs:
            gcf_with_bgc.append(gcf)
        else:
            gcf_without_bgc.append(gcf)

    logger.info(
        f"{len(gcf_with_bgc)} GCF objects updated with BGC objects.\n"
        f"{len(gcf_without_bgc)} GCF objects not updated with BGC objects.\n"
        f"{len(gcf_missing_bgc)} GCF objects have missing BGC objects."
    )
    return gcf_with_bgc, gcf_without_bgc, gcf_missing_bgc
```
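A sketch with one GCF that references two BGC ids, only one of which has a BGC object; the ids are hypothetical:

```python
from nplinker.genomics import BGC, GCF
from nplinker.genomics.utils import add_bgc_to_gcf

bgcs = [BGC("GENOME_1.region001", "NRP")]
gcf = GCF("GCF_0001")
gcf.bgc_ids = {"GENOME_1.region001", "GENOME_1.region002"}  # second id has no BGC object

gcf_with_bgc, gcf_without_bgc, gcf_missing_bgc = add_bgc_to_gcf(bgcs, [gcf])
# gcf_missing_bgc == {gcf: {"GENOME_1.region002"}}
```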
"},{"location":"api/genomics_utils/#nplinker.genomics.utils.get_mibig_from_gcf","title":"get_mibig_from_gcf","text":"get_mibig_from_gcf(\n gcfs: Sequence[GCF],\n) -> tuple[list[BGC], StrainCollection]\n
Get MIBiG BGCs and strains from GCF objects.
Parameters:
gcfs
(Sequence[GCF]
) \u2013 A list of GCF objects.
Returns:
tuple[list[BGC], StrainCollection]
\u2013 A tuple of two objects,
src/nplinker/genomics/utils.py
def get_mibig_from_gcf(gcfs: Sequence[GCF]) -> tuple[list[BGC], StrainCollection]:\n \"\"\"Get MIBiG BGCs and strains from GCF objects.\n\n Args:\n gcfs: A list of GCF objects.\n\n Returns:\n A tuple of two objects,\n\n - the first is a list of MIBiG BGC objects used in the GCFs;\n - the second is a StrainCollection object that contains all Strain objects used in the\n GCFs.\n \"\"\"\n mibig_bgcs_in_use = []\n mibig_strains_in_use = StrainCollection()\n for gcf in gcfs:\n for bgc in gcf.bgcs:\n if bgc.is_mibig():\n mibig_bgcs_in_use.append(bgc)\n if bgc.strain is not None:\n mibig_strains_in_use.add(bgc.strain)\n return mibig_bgcs_in_use, mibig_strains_in_use\n
"},{"location":"api/genomics_utils/#nplinker.genomics.utils.extract_mappings_strain_id_original_genome_id","title":"extract_mappings_strain_id_original_genome_id","text":"extract_mappings_strain_id_original_genome_id(\n podp_project_json_file: str | PathLike,\n) -> dict[str, set[str]]\n
Extract mappings \"strain_id <-> original_genome_id\".
Tip
The podp_project_json_file
is the JSON file downloaded from PODP platform.
For example, for PODP project MSV000079284, its JSON file is https://pairedomicsdata.bioinformatics.nl/api/projects/4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4.
Parameters:
podp_project_json_file
(str | PathLike
) \u2013 The path to the PODP project JSON file.
Returns:
dict[str, set[str]]
\u2013 Key is strain id and value is a set of original genome ids.
src/nplinker/genomics/utils.py
def extract_mappings_strain_id_original_genome_id(\n podp_project_json_file: str | PathLike,\n) -> dict[str, set[str]]:\n \"\"\"Extract mappings \"strain_id <-> original_genome_id\".\n\n !!! tip\n The `podp_project_json_file` is the JSON file downloaded from PODP platform.\n\n For example, for PODP project MSV000079284, its JSON file is\n https://pairedomicsdata.bioinformatics.nl/api/projects/4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4.\n\n Args:\n podp_project_json_file: The path to the PODP project\n JSON file.\n\n Returns:\n Key is strain id and value is a set of original genome ids.\n\n See Also:\n - [podp_generate_strain_mappings][nplinker.strain.utils.podp_generate_strain_mappings]:\n Generate strain mappings JSON file for PODP pipeline.\n \"\"\"\n mappings_dict: dict[str, set[str]] = {}\n with open(podp_project_json_file, \"r\") as f:\n json_data = json.load(f)\n\n validate_podp_json(json_data)\n\n for record in json_data[\"genomes\"]:\n strain_id = record[\"genome_label\"]\n genome_id = get_best_available_genome_id(record[\"genome_ID\"])\n if genome_id is None:\n logger.warning(\"Failed to extract genome ID from genome with label %s\", strain_id)\n continue\n if strain_id in mappings_dict:\n mappings_dict[strain_id].add(genome_id)\n else:\n mappings_dict[strain_id] = {genome_id}\n return mappings_dict\n
"},{"location":"api/genomics_utils/#nplinker.genomics.utils.extract_mappings_original_genome_id_resolved_genome_id","title":"extract_mappings_original_genome_id_resolved_genome_id","text":"extract_mappings_original_genome_id_resolved_genome_id(\n genome_status_json_file: str | PathLike,\n) -> dict[str, str]\n
Extract mappings \"original_genome_id <-> resolved_genome_id\".
Tip
The genome_status_json_file
is generated by the podp_download_and_extract_antismash_data function with a default file name GENOME_STATUS_FILENAME.
Parameters:
genome_status_json_file
(str | PathLike
) \u2013 The path to the genome status JSON file.
Returns:
dict[str, str]
\u2013 Key is original genome id and value is resolved genome id.
src/nplinker/genomics/utils.py
def extract_mappings_original_genome_id_resolved_genome_id(\n genome_status_json_file: str | PathLike,\n) -> dict[str, str]:\n \"\"\"Extract mappings \"original_genome_id <-> resolved_genome_id\".\n\n !!! tip\n The `genome_status_json_file` is generated by the [podp_download_and_extract_antismash_data]\n [nplinker.genomics.antismash.podp_antismash_downloader.podp_download_and_extract_antismash_data]\n function with a default file name [GENOME_STATUS_FILENAME][nplinker.defaults.GENOME_STATUS_FILENAME].\n\n Args:\n genome_status_json_file: The path to the genome status JSON file.\n\n\n Returns:\n Key is original genome id and value is resolved genome id.\n\n See Also:\n - [podp_generate_strain_mappings][nplinker.strain.utils.podp_generate_strain_mappings]:\n Generate strain mappings JSON file for PODP pipeline.\n \"\"\"\n gs_mappings_dict = GenomeStatus.read_json(genome_status_json_file)\n return {gs.original_id: gs.resolved_refseq_id for gs in gs_mappings_dict.values()}\n
"},{"location":"api/genomics_utils/#nplinker.genomics.utils.extract_mappings_resolved_genome_id_bgc_id","title":"extract_mappings_resolved_genome_id_bgc_id","text":"extract_mappings_resolved_genome_id_bgc_id(\n genome_bgc_mappings_file: str | PathLike,\n) -> dict[str, set[str]]\n
Extract mappings \"resolved_genome_id <-> bgc_id\".
Tip
The genome_bgc_mappings_file
is usually generated by the generate_mappings_genome_id_bgc_id function with a default file name GENOME_BGC_MAPPINGS_FILENAME.
Parameters:
genome_bgc_mappings_file
(str | PathLike
) \u2013 The path to the genome BGC mappings JSON file.
Returns:
dict[str, set[str]]
\u2013 Key is resolved genome id and value is a set of BGC ids.
src/nplinker/genomics/utils.py
def extract_mappings_resolved_genome_id_bgc_id(\n genome_bgc_mappings_file: str | PathLike,\n) -> dict[str, set[str]]:\n \"\"\"Extract mappings \"resolved_genome_id <-> bgc_id\".\n\n !!! tip\n The `genome_bgc_mappings_file` is usually generated by the\n [generate_mappings_genome_id_bgc_id][nplinker.genomics.utils.generate_mappings_genome_id_bgc_id]\n function with a default file name [GENOME_BGC_MAPPINGS_FILENAME][nplinker.defaults.GENOME_BGC_MAPPINGS_FILENAME].\n\n Args:\n genome_bgc_mappings_file: The path to the genome BGC\n mappings JSON file.\n\n Returns:\n Key is resolved genome id and value is a set of BGC ids.\n\n See Also:\n - [podp_generate_strain_mappings][nplinker.strain.utils.podp_generate_strain_mappings]:\n Generate strain mappings JSON file for PODP pipeline.\n \"\"\"\n with open(genome_bgc_mappings_file, \"r\") as f:\n json_data = json.load(f)\n\n # validate the JSON data\n validate(json_data, GENOME_BGC_MAPPINGS_SCHEMA)\n\n return {mapping[\"genome_ID\"]: set(mapping[\"BGC_ID\"]) for mapping in json_data[\"mappings\"]}\n
"},{"location":"api/genomics_utils/#nplinker.genomics.utils.get_mappings_strain_id_bgc_id","title":"get_mappings_strain_id_bgc_id","text":"get_mappings_strain_id_bgc_id(\n mappings_strain_id_original_genome_id: Mapping[\n str, set[str]\n ],\n mappings_original_genome_id_resolved_genome_id: Mapping[\n str, str\n ],\n mappings_resolved_genome_id_bgc_id: Mapping[\n str, set[str]\n ],\n) -> dict[str, set[str]]\n
Get mappings \"strain_id <-> bgc_id\".
Parameters:
mappings_strain_id_original_genome_id
(Mapping[str, set[str]]
) \u2013 Mappings \"strain_id <-> original_genome_id\".
mappings_original_genome_id_resolved_genome_id
(Mapping[str, str]
) \u2013 Mappings \"original_genome_id <-> resolved_genome_id\".
mappings_resolved_genome_id_bgc_id
(Mapping[str, set[str]]
) \u2013 Mappings \"resolved_genome_id <-> bgc_id\".
Returns:
dict[str, set[str]]
\u2013 Key is strain id and value is a set of BGC ids.
extract_mappings_strain_id_original_genome_id
: Extract mappings \"strain_id <-> original_genome_id\".extract_mappings_original_genome_id_resolved_genome_id
: Extract mappings \"original_genome_id <-> resolved_genome_id\".extract_mappings_resolved_genome_id_bgc_id
: Extract mappings \"resolved_genome_id <-> bgc_id\".src/nplinker/genomics/utils.py
def get_mappings_strain_id_bgc_id(\n mappings_strain_id_original_genome_id: Mapping[str, set[str]],\n mappings_original_genome_id_resolved_genome_id: Mapping[str, str],\n mappings_resolved_genome_id_bgc_id: Mapping[str, set[str]],\n) -> dict[str, set[str]]:\n \"\"\"Get mappings \"strain_id <-> bgc_id\".\n\n Args:\n mappings_strain_id_original_genome_id: Mappings \"strain_id <-> original_genome_id\".\n mappings_original_genome_id_resolved_genome_id: Mappings \"original_genome_id <-> resolved_genome_id\".\n mappings_resolved_genome_id_bgc_id: Mappings \"resolved_genome_id <-> bgc_id\".\n\n Returns:\n Key is strain id and value is a set of BGC ids.\n\n See Also:\n - `extract_mappings_strain_id_original_genome_id`: Extract mappings\n \"strain_id <-> original_genome_id\".\n - `extract_mappings_original_genome_id_resolved_genome_id`: Extract mappings\n \"original_genome_id <-> resolved_genome_id\".\n - `extract_mappings_resolved_genome_id_bgc_id`: Extract mappings\n \"resolved_genome_id <-> bgc_id\".\n - [podp_generate_strain_mappings][nplinker.strain.utils.podp_generate_strain_mappings]:\n Generate strain mappings JSON file for PODP pipeline.\n \"\"\"\n mappings_dict = {}\n for strain_id, original_genome_ids in mappings_strain_id_original_genome_id.items():\n bgc_ids = set()\n for original_genome_id in original_genome_ids:\n resolved_genome_id = mappings_original_genome_id_resolved_genome_id[original_genome_id]\n if (bgc_id := mappings_resolved_genome_id_bgc_id.get(resolved_genome_id)) is not None:\n bgc_ids.update(bgc_id)\n if bgc_ids:\n mappings_dict[strain_id] = bgc_ids\n return mappings_dict\n
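The three extract functions feed directly into this one; a sketch of the full chain with hypothetical file paths:

```python
from nplinker.genomics.utils import (
    extract_mappings_original_genome_id_resolved_genome_id,
    extract_mappings_resolved_genome_id_bgc_id,
    extract_mappings_strain_id_original_genome_id,
    get_mappings_strain_id_bgc_id,
)

strain_to_bgc = get_mappings_strain_id_bgc_id(
    extract_mappings_strain_id_original_genome_id("./podp_project.json"),
    extract_mappings_original_genome_id_resolved_genome_id("./genome_status.json"),
    extract_mappings_resolved_genome_id_bgc_id("./genome_bgc_mappings.json"),
)
```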
"},{"location":"api/gnps/","title":"GNPS","text":""},{"location":"api/gnps/#nplinker.metabolomics.gnps","title":"nplinker.metabolomics.gnps","text":""},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSFormat","title":"GNPSFormat","text":" Bases: Enum
Enum class for GNPS formats or workflows.
ConceptGNPS data
The name of the enum is a short name for the workflow, and the value of the enum is the workflow name used on the GNPS website.
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSFormat.SNETS","title":"SNETSclass-attribute
instance-attribute
","text":"SNETS = 'METABOLOMICS-SNETS'\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSFormat.SNETSV2","title":"SNETSV2 class-attribute
instance-attribute
","text":"SNETSV2 = 'METABOLOMICS-SNETS-V2'\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSFormat.FBMN","title":"FBMN class-attribute
instance-attribute
","text":"FBMN = 'FEATURE-BASED-MOLECULAR-NETWORKING'\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSFormat.Unknown","title":"Unknown class-attribute
instance-attribute
","text":"Unknown = 'Unknown-GNPS-Workflow'\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSDownloader","title":"GNPSDownloader","text":"GNPSDownloader(task_id: str, download_root: str | PathLike)\n
Download GNPS zip archive for the given task id.
ConceptGNPS data
Note that only GNPS workflows listed in the GNPSFormat enum are supported.
Attributes:
GNPS_DATA_DOWNLOAD_URL
(str
) \u2013 URL template for downloading GNPS data.
GNPS_DATA_DOWNLOAD_URL_FBMN
(str
) \u2013 URL template for downloading GNPS data for FBMN.
gnps_format
(GNPSFormat
) \u2013 GNPS workflow type.
Parameters:
task_id
(str
) \u2013 GNPS task id, identifying the data to be downloaded.
download_root
(str | PathLike
) \u2013 Path where to store the downloaded archive.
Raises:
ValueError
\u2013 If the given task id does not correspond to a supported GNPS workflow.
Examples:
>>> GNPSDownloader(\"c22f44b14a3d450eb836d607cb9521bb\", \"~/downloads\")\n
Source code in src/nplinker/metabolomics/gnps/gnps_downloader.py
def __init__(self, task_id: str, download_root: str | PathLike):\n \"\"\"Initialize the GNPSDownloader.\n\n Args:\n task_id: GNPS task id, identifying the data to be downloaded.\n download_root: Path where to store the downloaded archive.\n\n Raises:\n ValueError: If the given task id does not correspond to a supported\n GNPS workflow.\n\n Examples:\n >>> GNPSDownloader(\"c22f44b14a3d450eb836d607cb9521bb\", \"~/downloads\")\n \"\"\"\n gnps_format = gnps_format_from_task_id(task_id)\n if gnps_format == GNPSFormat.Unknown:\n raise ValueError(\n f\"Unknown workflow type for GNPS task '{task_id}'.\"\n f\"Supported GNPS workflows are described in the GNPSFormat enum, \"\n f\"including such as 'METABOLOMICS-SNETS', 'METABOLOMICS-SNETS-V2' \"\n f\"and 'FEATURE-BASED-MOLECULAR-NETWORKING'.\"\n )\n\n self._task_id = task_id\n self._download_root: Path = Path(download_root)\n self._gnps_format = gnps_format\n self._file_name = gnps_format.value + \"-\" + self._task_id + \".zip\"\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSDownloader.GNPS_DATA_DOWNLOAD_URL","title":"GNPS_DATA_DOWNLOAD_URL class-attribute
instance-attribute
","text":"GNPS_DATA_DOWNLOAD_URL: str = (\n \"https://gnps.ucsd.edu/ProteoSAFe/DownloadResult?task={}&view=download_clustered_spectra\"\n)\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSDownloader.GNPS_DATA_DOWNLOAD_URL_FBMN","title":"GNPS_DATA_DOWNLOAD_URL_FBMN class-attribute
instance-attribute
","text":"GNPS_DATA_DOWNLOAD_URL_FBMN: str = (\n \"https://gnps.ucsd.edu/ProteoSAFe/DownloadResult?task={}&view=download_cytoscape_data\"\n)\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSDownloader.gnps_format","title":"gnps_format property
","text":"gnps_format: GNPSFormat\n
Get the GNPS workflow type.
Returns:
GNPSFormat
\u2013 GNPS workflow type.
download() -> Self\n
Download GNPS data.
Note: GNPS data is downloaded using the POST method (empty payload is OK).
Source code insrc/nplinker/metabolomics/gnps/gnps_downloader.py
def download(self) -> Self:\n \"\"\"Download GNPS data.\n\n Note: GNPS data is downloaded using the POST method (empty payload is OK).\n \"\"\"\n download_url(\n self.get_url(), self._download_root, filename=self._file_name, http_method=\"POST\"\n )\n return self\n
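Since `download()` returns the downloader itself, calls can be chained; a sketch using the example task id from this documentation and a hypothetical download root:

```python
from nplinker.metabolomics.gnps import GNPSDownloader

downloader = GNPSDownloader("c22f44b14a3d450eb836d607cb9521bb", "./downloads")
zip_path = downloader.download().get_download_file()  # download() returns self
```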
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSDownloader.get_download_file","title":"get_download_file","text":"get_download_file() -> str\n
Get the path to the downloaded file.
Returns:
str
\u2013 Download path as string
src/nplinker/metabolomics/gnps/gnps_downloader.py
def get_download_file(self) -> str:\n \"\"\"Get the path to the downloaded file.\n\n Returns:\n Download path as string\n \"\"\"\n return str(Path(self._download_root) / self._file_name)\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSDownloader.get_task_id","title":"get_task_id","text":"get_task_id() -> str\n
Get the GNPS task id.
Returns:
str
\u2013 Task id as string.
src/nplinker/metabolomics/gnps/gnps_downloader.py
def get_task_id(self) -> str:\n \"\"\"Get the GNPS task id.\n\n Returns:\n Task id as string.\n \"\"\"\n return self._task_id\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSDownloader.get_url","title":"get_url","text":"get_url() -> str\n
Get the download URL.
Returns:
str
\u2013 URL pointing to the GNPS data to be downloaded.
src/nplinker/metabolomics/gnps/gnps_downloader.py
def get_url(self) -> str:\n \"\"\"Get the download URL.\n\n Returns:\n URL pointing to the GNPS data to be downloaded.\n \"\"\"\n if self.gnps_format == GNPSFormat.FBMN:\n return GNPSDownloader.GNPS_DATA_DOWNLOAD_URL_FBMN.format(self._task_id)\n return GNPSDownloader.GNPS_DATA_DOWNLOAD_URL.format(self._task_id)\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSExtractor","title":"GNPSExtractor","text":"GNPSExtractor(\n file: str | PathLike, extract_dir: str | PathLike\n)\n
Extract files from a GNPS molecular networking archive (.zip).
ConceptGNPS data
Four files are extracted and renamed to the following names:
The files to be extracted are selected based on the GNPS workflow type, as described below (in the order of the files above):
Attributes:
gnps_format
(GNPSFormat
) \u2013 The GNPS workflow type.
extract_dir
(str
) \u2013 The path where to extract the files to.
Parameters:
file
(str | PathLike
) \u2013 The path to the GNPS zip file.
extract_dir
(str | PathLike
) \u2013 path to the directory where to extract the files to.
Raises:
ValueError
\u2013 If the given file is an invalid GNPS archive.
Examples:
>>> gnps_extractor = GNPSExtractor(\"path/to/gnps_archive.zip\", \"path/to/extract_dir\")\n>>> gnps_extractor.gnps_format\n<GNPSFormat.SNETS: 'METABOLOMICS-SNETS'>\n>>> gnps_extractor.extract_dir\n'path/to/extract_dir'\n
Source code in src/nplinker/metabolomics/gnps/gnps_extractor.py
def __init__(self, file: str | PathLike, extract_dir: str | PathLike):\n \"\"\"Initialize the GNPSExtractor.\n\n Args:\n file: The path to the GNPS zip file.\n extract_dir: path to the directory where to extract the files to.\n\n Raises:\n ValueError: If the given file is an invalid GNPS archive.\n\n Examples:\n >>> gnps_extractor = GNPSExtractor(\"path/to/gnps_archive.zip\", \"path/to/extract_dir\")\n >>> gnps_extractor.gnps_format\n <GNPSFormat.SNETS: 'METABOLOMICS-SNETS'>\n >>> gnps_extractor.extract_dir\n 'path/to/extract_dir'\n \"\"\"\n gnps_format = gnps_format_from_archive(file)\n if gnps_format == GNPSFormat.Unknown:\n raise ValueError(\n f\"Unknown workflow type for GNPS archive '{file}'.\"\n f\"Supported GNPS workflows are described in the GNPSFormat enum, \"\n f\"including such as 'METABOLOMICS-SNETS', 'METABOLOMICS-SNETS-V2' \"\n f\"and 'FEATURE-BASED-MOLECULAR-NETWORKING'.\"\n )\n\n self._file = Path(file)\n self._extract_path = Path(extract_dir)\n self._gnps_format = gnps_format\n # the order of filenames matters\n self._target_files = [\n \"file_mappings\",\n \"spectra.mgf\",\n \"molecular_families.tsv\",\n \"annotations.tsv\",\n ]\n\n self._extract()\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSExtractor.gnps_format","title":"gnps_format property
","text":"gnps_format: GNPSFormat\n
Get the GNPS workflow type.
Returns:
GNPSFormat
\u2013 GNPS workflow type.
property
","text":"extract_dir: str\n
Get the path where to extract the files to.
Returns:
str
\u2013 Path where to extract files as string.
GNPSSpectrumLoader

```python
GNPSSpectrumLoader(file: str | PathLike)
```

Bases: SpectrumLoaderBase

Load mass spectra from the given GNPS MGF file.

Concept: GNPS data

The MGF file comes from the GNPS output archive; which file it is depends on the GNPS workflow type.

Parameters:

- file (str | PathLike) – Path to the MGF file.

Raises:

- ValueError – Raises ValueError if the file is not valid.

Examples:

```python
>>> loader = GNPSSpectrumLoader("gnps_spectra.mgf")
>>> print(loader.spectra[0])
```

Source code in src/nplinker/metabolomics/gnps/gnps_spectrum_loader.py

```python
def __init__(self, file: str | PathLike) -> None:
    """Initialize the GNPSSpectrumLoader.

    Args:
        file: path to the MGF file.

    Raises:
        ValueError: Raises ValueError if the file is not valid.

    Examples:
        >>> loader = GNPSSpectrumLoader("gnps_spectra.mgf")
        >>> print(loader.spectra[0])
    """
    self._file = str(file)
    self._spectra: list[Spectrum] = []

    self._validate()
    self._load()
```
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSSpectrumLoader.spectra","title":"spectra property
","text":"spectra: list[Spectrum]\n
Get the list of Spectrum objects.
Returns:
list[Spectrum]
\u2013 list[Spectrum]: the loaded spectra as a list of Spectrum
objects.
GNPSMolecularFamilyLoader(file: str | PathLike)\n
Bases: MolecularFamilyLoaderBase
Load molecular families from GNPS data.
ConceptGNPS data
The molecular family file is from GNPS output archive, as described below for each GNPS workflow type:
The ComponentIndex
column in the GNPS molecular family file is treated as family id.
But for molecular families that have only one member (i.e. spectrum), named singleton molecular families, their files have the same value of -1
in the ComponentIndex
column. To make the family id unique,the spectrum id plus a prefix singleton-
is used as the family id of singleton molecular families.
Parameters:
file
(str | PathLike
) \u2013 Path to the GNPS molecular family file.
Raises:
ValueError
\u2013 Raises ValueError if the file is not valid.
Examples:
>>> loader = GNPSMolecularFamilyLoader(\"gnps_molecular_families.tsv\")\n>>> print(loader.families)\n[<MolecularFamily 1>, <MolecularFamily 2>, ...]\n>>> print(loader.families[0].spectra_ids)\n{'1', '3', '7', ...}\n
Source code in src/nplinker/metabolomics/gnps/gnps_molecular_family_loader.py
def __init__(self, file: str | PathLike) -> None:\n \"\"\"Initialize the GNPSMolecularFamilyLoader.\n\n Args:\n file: Path to the GNPS molecular family file.\n\n Raises:\n ValueError: Raises ValueError if the file is not valid.\n\n Examples:\n >>> loader = GNPSMolecularFamilyLoader(\"gnps_molecular_families.tsv\")\n >>> print(loader.families)\n [<MolecularFamily 1>, <MolecularFamily 2>, ...]\n >>> print(loader.families[0].spectra_ids)\n {'1', '3', '7', ...}\n \"\"\"\n self._mfs: list[MolecularFamily] = []\n self._file = file\n\n self._validate()\n self._load()\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSMolecularFamilyLoader.get_mfs","title":"get_mfs","text":"get_mfs(\n keep_singleton: bool = False,\n) -> list[MolecularFamily]\n
Get MolecularFamily objects.
Parameters:
keep_singleton
(bool
, default: False
) \u2013 True to keep singleton molecular families. A singleton molecular family is a molecular family that contains only one spectrum.
Returns:
list[MolecularFamily]
\u2013 A list of MolecularFamily objects with their spectra ids.
src/nplinker/metabolomics/gnps/gnps_molecular_family_loader.py
def get_mfs(self, keep_singleton: bool = False) -> list[MolecularFamily]:\n \"\"\"Get MolecularFamily objects.\n\n Args:\n keep_singleton: True to keep singleton molecular families. A\n singleton molecular family is a molecular family that contains\n only one spectrum.\n\n Returns:\n A list of MolecularFamily objects with their spectra ids.\n \"\"\"\n mfs = self._mfs\n if not keep_singleton:\n mfs = [mf for mf in mfs if not mf.is_singleton()]\n return mfs\n
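A usage sketch; the file path is hypothetical and points at the `molecular_families.tsv` produced by GNPSExtractor:

```python
from nplinker.metabolomics.gnps import GNPSMolecularFamilyLoader

loader = GNPSMolecularFamilyLoader("./gnps/molecular_families.tsv")
mfs = loader.get_mfs()  # singleton molecular families are dropped by default
```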
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSAnnotationLoader","title":"GNPSAnnotationLoader","text":"GNPSAnnotationLoader(file: str | PathLike)\n
Bases: AnnotationLoaderBase
Load annotations from GNPS output file.
ConceptGNPS data
The annotation file is a .tsv
file from GNPS output archive, as described below for each GNPS workflow type:
Parameters:
file
(str | PathLike
) \u2013 The GNPS annotation file.
Examples:
>>> loader = GNPSAnnotationLoader(\"gnps_annotations.tsv\")\n>>> print(loader.annotations[\"100\"])\n{'#Scan#': '100',\n'Adduct': 'M+H',\n'CAS_Number': 'N/A',\n'Charge': '1',\n'Compound_Name': 'MLS002153841-01!Iobenguane sulfate',\n'Compound_Source': 'NIH Pharmacologically Active Library',\n'Data_Collector': 'VP/LMS',\n'ExactMass': '274.992',\n'INCHI': 'N/A',\n'INCHI_AUX': 'N/A',\n'Instrument': 'qTof',\n'IonMode': 'Positive',\n'Ion_Source': 'LC-ESI',\n'LibMZ': '276.003',\n'LibraryName': 'lib-00014.mgf',\n'LibraryQualityString': 'Gold',\n'Library_Class': '1',\n'MQScore': '0.704152',\n'MZErrorPPM': '405416',\n'MassDiff': '111.896',\n'Organism': 'GNPS-NIH-SMALLMOLECULEPHARMACOLOGICALLYACTIVE',\n'PI': 'Dorrestein',\n'Precursor_MZ': '276.003',\n'Pubmed_ID': 'N/A',\n'RT_Query': '795.979',\n'SharedPeaks': '7',\n'Smiles': 'NC(=N)NCc1cccc(I)c1.OS(=O)(=O)O',\n'SpecCharge': '1',\n'SpecMZ': '164.107',\n'SpectrumFile': 'spectra/specs_ms.pklbin',\n'SpectrumID': 'CCMSLIB00000086167',\n'TIC_Query': '986.997',\n'UpdateWorkflowName': 'UPDATE-SINGLE-ANNOTATED-GOLD',\n'tags': ' ',\n'png_url': 'https://metabolomics-usi.gnps2.org/png/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167',\n'json_url': 'https://metabolomics-usi.gnps2.org/json/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167',\n'svg_url': 'https://metabolomics-usi.gnps2.org/svg/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167',\n'spectrum_url': 'https://metabolomics-usi.gnps2.org/spectrum/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167'}\n
Source code in src/nplinker/metabolomics/gnps/gnps_annotation_loader.py
def __init__(self, file: str | PathLike) -> None:\n \"\"\"Initialize the GNPSAnnotationLoader.\n\n Args:\n file: The GNPS annotation file.\n\n Examples:\n >>> loader = GNPSAnnotationLoader(\"gnps_annotations.tsv\")\n >>> print(loader.annotations[\"100\"])\n {'#Scan#': '100',\n 'Adduct': 'M+H',\n 'CAS_Number': 'N/A',\n 'Charge': '1',\n 'Compound_Name': 'MLS002153841-01!Iobenguane sulfate',\n 'Compound_Source': 'NIH Pharmacologically Active Library',\n 'Data_Collector': 'VP/LMS',\n 'ExactMass': '274.992',\n 'INCHI': 'N/A',\n 'INCHI_AUX': 'N/A',\n 'Instrument': 'qTof',\n 'IonMode': 'Positive',\n 'Ion_Source': 'LC-ESI',\n 'LibMZ': '276.003',\n 'LibraryName': 'lib-00014.mgf',\n 'LibraryQualityString': 'Gold',\n 'Library_Class': '1',\n 'MQScore': '0.704152',\n 'MZErrorPPM': '405416',\n 'MassDiff': '111.896',\n 'Organism': 'GNPS-NIH-SMALLMOLECULEPHARMACOLOGICALLYACTIVE',\n 'PI': 'Dorrestein',\n 'Precursor_MZ': '276.003',\n 'Pubmed_ID': 'N/A',\n 'RT_Query': '795.979',\n 'SharedPeaks': '7',\n 'Smiles': 'NC(=N)NCc1cccc(I)c1.OS(=O)(=O)O',\n 'SpecCharge': '1',\n 'SpecMZ': '164.107',\n 'SpectrumFile': 'spectra/specs_ms.pklbin',\n 'SpectrumID': 'CCMSLIB00000086167',\n 'TIC_Query': '986.997',\n 'UpdateWorkflowName': 'UPDATE-SINGLE-ANNOTATED-GOLD',\n 'tags': ' ',\n 'png_url': 'https://metabolomics-usi.gnps2.org/png/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167',\n 'json_url': 'https://metabolomics-usi.gnps2.org/json/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167',\n 'svg_url': 'https://metabolomics-usi.gnps2.org/svg/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167',\n 'spectrum_url': 'https://metabolomics-usi.gnps2.org/spectrum/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167'}\n \"\"\"\n self._file = Path(file)\n self._annotations: dict[str, dict] = {}\n\n self._validate()\n self._load()\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSAnnotationLoader.annotations","title":"annotations property
","text":"annotations: dict[str, dict]\n
Get annotations.
Returns:
dict[str, dict]
\u2013 Keys are spectrum ids (\"#Scan#\" in the annotation file) and values are the annotation dicts for each spectrum.
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSFileMappingLoader","title":"GNPSFileMappingLoader","text":"GNPSFileMappingLoader(file: str | PathLike)\n
Bases: FileMappingLoaderBase
Class to load file mappings from GNPS output file.
Concept: GNPS data
File mappings refer to the mapping from a spectrum id to the files in which that spectrum occurs.
The file mappings file comes from the GNPS output archive; its location within the archive depends on the GNPS workflow type (see gnps_format_from_file_mapping below).
Parameters:
file
(str | PathLike
) \u2013 Path to the GNPS file mappings file.
Raises:
ValueError
\u2013 Raises ValueError if the file is not valid.
Examples:
>>> loader = GNPSFileMappingLoader(\"gnps_file_mappings.tsv\")\n>>> print(loader.mappings[\"1\"])\n['26c.mzXML']\n>>> print(loader.mapping_reversed[\"26c.mzXML\"])\n{'1', '3', '7', ...}\n
Source code in src/nplinker/metabolomics/gnps/gnps_file_mapping_loader.py
def __init__(self, file: str | PathLike) -> None:\n \"\"\"Initialize the GNPSFileMappingLoader.\n\n Args:\n file: Path to the GNPS file mappings file.\n\n Raises:\n ValueError: Raises ValueError if the file is not valid.\n\n Examples:\n >>> loader = GNPSFileMappingLoader(\"gnps_file_mappings.tsv\")\n >>> print(loader.mappings[\"1\"])\n ['26c.mzXML']\n >>> print(loader.mapping_reversed[\"26c.mzXML\"])\n {'1', '3', '7', ...}\n \"\"\"\n self._gnps_format = gnps_format_from_file_mapping(file)\n if self._gnps_format is GNPSFormat.Unknown:\n raise ValueError(\"Unknown workflow type for GNPS file mappings file \")\n\n self._file = Path(file)\n self._mapping: dict[str, list[str]] = {}\n\n self._validate()\n self._load()\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSFileMappingLoader.mappings","title":"mappings property
","text":"mappings: dict[str, list[str]]\n
Return mapping from spectrum id to files in which this spectrum occurs.
Returns:
dict[str, list[str]]
\u2013 Mapping from spectrum id to names of all files in which this spectrum occurs.
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSFileMappingLoader.mapping_reversed","title":"mapping_reversed property
","text":"mapping_reversed: dict[str, set[str]]\n
Return mapping from file name to all spectra that occur in this file.
Returns:
dict[str, set[str]]
\u2013 Mapping from file name to all spectra ids that occur in this file.
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.gnps_format_from_archive","title":"gnps_format_from_archive","text":"gnps_format_from_archive(\n zip_file: str | PathLike,\n) -> GNPSFormat\n
Detect GNPS format from GNPS zip archive.
The detection is based on the filename of the zip file and the names of the files contained in the zip file.
Parameters:
zip_file
(str | PathLike
) \u2013 Path to the GNPS zip file.
Returns:
GNPSFormat
\u2013 The format identified in the GNPS zip file.
Examples:
>>> gnps_format_from_archive(\"ProteoSAFe-METABOLOMICS-SNETS-c22f44b1-download_clustered_spectra.zip\")\n<GNPSFormat.SNETS: 'METABOLOMICS-SNETS'>\n>>> gnps_format_from_archive(\"ProteoSAFe-METABOLOMICS-SNETS-V2-189e8bf1-download_clustered_spectra.zip\")\n<GNPSFormat.SNETSV2: 'METABOLOMICS-SNETS-V2'>\n>>> gnps_format_from_archive(\"ProteoSAFe-FEATURE-BASED-MOLECULAR-NETWORKING-672d0a53-download_cytoscape_data.zip\")\n<GNPSFormat.FBMN: 'FEATURE-BASED-MOLECULAR-NETWORKING'>\n
Source code in src/nplinker/metabolomics/gnps/gnps_format.py
def gnps_format_from_archive(zip_file: str | PathLike) -> GNPSFormat:\n \"\"\"Detect GNPS format from GNPS zip archive.\n\n The detection is based on the filename of the zip file and the names of the\n files contained in the zip file.\n\n Args:\n zip_file: Path to the GNPS zip file.\n\n Returns:\n The format identified in the GNPS zip file.\n\n Examples:\n >>> gnps_format_from_archive(\"ProteoSAFe-METABOLOMICS-SNETS-c22f44b1-download_clustered_spectra.zip\")\n <GNPSFormat.SNETS: 'METABOLOMICS-SNETS'>\n >>> gnps_format_from_archive(\"ProteoSAFe-METABOLOMICS-SNETS-V2-189e8bf1-download_clustered_spectra.zip\")\n <GNPSFormat.SNETSV2: 'METABOLOMICS-SNETS-V2'>\n >>> gnps_format_from_archive(\"ProteoSAFe-FEATURE-BASED-MOLECULAR-NETWORKING-672d0a53-download_cytoscape_data.zip\")\n <GNPSFormat.FBMN: 'FEATURE-BASED-MOLECULAR-NETWORKING'>\n \"\"\"\n file = Path(zip_file)\n # Guess the format from the filename of the zip file\n if GNPSFormat.FBMN.value in file.name:\n return GNPSFormat.FBMN\n # the order of the if statements matters for the following two\n if GNPSFormat.SNETSV2.value in file.name:\n return GNPSFormat.SNETSV2\n if GNPSFormat.SNETS.value in file.name:\n return GNPSFormat.SNETS\n\n # Guess the format from the names of the files in the zip file\n with zipfile.ZipFile(file) as archive:\n filenames = archive.namelist()\n if any(GNPSFormat.FBMN.value in x for x in filenames):\n return GNPSFormat.FBMN\n # the order of the if statements matters for the following two\n if any(GNPSFormat.SNETSV2.value in x for x in filenames):\n return GNPSFormat.SNETSV2\n if any(GNPSFormat.SNETS.value in x for x in filenames):\n return GNPSFormat.SNETS\n\n return GNPSFormat.Unknown\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.gnps_format_from_file_mapping","title":"gnps_format_from_file_mapping","text":"gnps_format_from_file_mapping(\n file: str | PathLike,\n) -> GNPSFormat\n
Detect GNPS format from the given file mapping file.
The GNPS file mapping file is located in different folders depending on the GNPS workflow. Here are the locations in corresponding GNPS zip archives:
- METABOLOMICS-SNETS workflow: the .tsv file in the folder clusterinfosummarygroup_attributes_withIDs_withcomponentID
- METABOLOMICS-SNETS-V2 workflow: the .clustersummary file (tsv) in the folder clusterinfosummarygroup_attributes_withIDs_withcomponentID
- FEATURE-BASED-MOLECULAR-NETWORKING workflow: the .csv file in the folder quantification_table
Parameters:
file
(str | PathLike
) \u2013 Path to the file whose GNPS format should be detected.
Returns:
GNPSFormat
\u2013 GNPS format identified in the file.
src/nplinker/metabolomics/gnps/gnps_format.py
def gnps_format_from_file_mapping(file: str | PathLike) -> GNPSFormat:\n \"\"\"Detect GNPS format from the given file mapping file.\n\n The GNPS file mapping file is located in different folders depending on the\n GNPS workflow. Here are the locations in corresponding GNPS zip archives:\n\n - `METABOLOMICS-SNETS` workflow: the `.tsv` file in the folder\n `clusterinfosummarygroup_attributes_withIDs_withcomponentID`\n - `METABOLOMICS-SNETS-V2` workflow: the `.clustersummary` file (tsv) in the folder\n `clusterinfosummarygroup_attributes_withIDs_withcomponentID`\n - `FEATURE-BASED-MOLECULAR-NETWORKING` workflow: the `.csv` file in the folder\n `quantification_table`\n\n Args:\n file: Path to the file to peek the format for.\n\n Returns:\n GNPS format identified in the file.\n \"\"\"\n with open(file, \"r\") as f:\n header = f.readline().strip()\n\n if re.search(r\"\\bAllFiles\\b\", header):\n return GNPSFormat.SNETS\n if re.search(r\"\\bUniqueFileSources\\b\", header):\n return GNPSFormat.SNETSV2\n if re.search(r\"\\b{}\\b\".format(re.escape(\"row ID\")), header):\n return GNPSFormat.FBMN\n return GNPSFormat.Unknown\n
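A usage sketch (hypothetical file path; the detection relies only on the header line of the given file, so any .tsv whose header contains the column AllFiles is reported as SNETS):
>>> from nplinker.metabolomics.gnps import gnps_format_from_file_mapping
>>> gnps_format_from_file_mapping(\"clusterinfosummarygroup_attributes_withIDs_withcomponentID/file_mappings.tsv\")
<GNPSFormat.SNETS: 'METABOLOMICS-SNETS'>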
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.gnps_format_from_task_id","title":"gnps_format_from_task_id","text":"gnps_format_from_task_id(task_id: str) -> GNPSFormat\n
Detect GNPS format for the given task id.
Parameters:
task_id
(str
) \u2013 GNPS task id.
Returns:
GNPSFormat
\u2013 The format identified in the GNPS task.
Examples:
>>> gnps_format_from_task_id(\"c22f44b14a3d450eb836d607cb9521bb\")\n<GNPSFormat.SNETS: 'METABOLOMICS-SNETS'>\n>>> gnps_format_from_task_id(\"189e8bf16af145758b0a900f1c44ff4a\")\n<GNPSFormat.SNETSV2: 'METABOLOMICS-SNETS-V2'>\n>>> gnps_format_from_task_id(\"92036537c21b44c29e509291e53f6382\")\n<GNPSFormat.FBMN: 'FEATURE-BASED-MOLECULAR-NETWORKING'>\n>>> gnps_format_from_task_id(\"0ad6535e34d449788f297e712f43068a\")\n<GNPSFormat.Unknown: 'Unknown-GNPS-Workflow'>\n
Source code in src/nplinker/metabolomics/gnps/gnps_format.py
def gnps_format_from_task_id(task_id: str) -> GNPSFormat:\n \"\"\"Detect GNPS format for the given task id.\n\n Args:\n task_id: GNPS task id.\n\n Returns:\n The format identified in the GNPS task.\n\n Examples:\n >>> gnps_format_from_task_id(\"c22f44b14a3d450eb836d607cb9521bb\")\n <GNPSFormat.SNETS: 'METABOLOMICS-SNETS'>\n >>> gnps_format_from_task_id(\"189e8bf16af145758b0a900f1c44ff4a\")\n <GNPSFormat.SNETSV2: 'METABOLOMICS-SNETS-V2'>\n >>> gnps_format_from_task_id(\"92036537c21b44c29e509291e53f6382\")\n <GNPSFormat.FBMN: 'FEATURE-BASED-MOLECULAR-NETWORKING'>\n >>> gnps_format_from_task_id(\"0ad6535e34d449788f297e712f43068a\")\n <GNPSFormat.Unknown: 'Unknown-GNPS-Workflow'>\n \"\"\"\n task_html = httpx.get(GNPS_TASK_URL.format(task_id))\n soup = BeautifulSoup(task_html.text, features=\"html.parser\")\n try:\n # find the td tag that follows the th tag containing 'Workflow'\n workflow_tag = soup.find(\"th\", string=\"Workflow\").find_next_sibling(\"td\") # type: ignore\n workflow_format = workflow_tag.contents[0].strip() # type: ignore\n except AttributeError:\n return GNPSFormat.Unknown\n\n if workflow_format == GNPSFormat.FBMN.value:\n return GNPSFormat.FBMN\n if workflow_format == GNPSFormat.SNETSV2.value:\n return GNPSFormat.SNETSV2\n if workflow_format == GNPSFormat.SNETS.value:\n return GNPSFormat.SNETS\n return GNPSFormat.Unknown\n
"},{"location":"api/loader/","title":"Dataset Loader","text":""},{"location":"api/loader/#nplinker.loader","title":"nplinker.loader","text":""},{"location":"api/loader/#nplinker.loader.DatasetLoader","title":"DatasetLoader","text":"DatasetLoader(config: Dynaconf)\n
Load datasets from the working directory with the given configuration.
Concept and Diagram: Working Directory Structure, Dataset Loading Pipeline
Loaded data are stored in the data containers (attributes), e.g. self.bgcs, self.gcfs, etc.
Attributes:
config
\u2013 A Dynaconf object that contains the configuration settings.
bgcs
(list[BGC]
) \u2013 A list of BGC objects.
gcfs
(list[GCF]
) \u2013 A list of GCF objects.
spectra
(list[Spectrum]
) \u2013 A list of Spectrum objects.
mfs
(list[MolecularFamily]
) \u2013 A list of MolecularFamily objects.
mibig_bgcs
(list[BGC]
) \u2013 A list of MIBiG BGC objects.
mibig_strains_in_use
(StrainCollection
) \u2013 A StrainCollection object that contains the strains in use from MIBiG.
product_types
(list
) \u2013 A list of product types.
strains
(StrainCollection
) \u2013 A StrainCollection object that contains all strains.
class_matches
\u2013 A ClassMatches object that contains class match info.
chem_classes
\u2013 A ChemClassPredictions object that contains chemical class predictions.
Parameters:
config
(Dynaconf
) \u2013 A Dynaconf object that contains the configuration settings.
Examples:
>>> from nplinker.config import load_config\n>>> from nplinker.loader import DatasetLoader\n>>> config = load_config(\"nplinker.toml\")\n>>> loader = DatasetLoader(config)\n>>> loader.load()\n
See Also DatasetArranger: Download, generate and/or validate datasets to ensure they are ready for loading.
Source code in src/nplinker/loader.py
def __init__(self, config: Dynaconf) -> None:\n \"\"\"Initialize the DatasetLoader.\n\n Args:\n config: A Dynaconf object that contains the configuration settings.\n\n Examples:\n >>> from nplinker.config import load_config\n >>> from nplinker.loader import DatasetLoader\n >>> config = load_config(\"nplinker.toml\")\n >>> loader = DatasetLoader(config)\n >>> loader.load()\n\n See Also:\n [DatasetArranger][nplinker.arranger.DatasetArranger]: Download, generate and/or validate\n datasets to ensure they are ready for loading.\n \"\"\"\n self.config = config\n\n self.bgcs: list[BGC] = []\n self.gcfs: list[GCF] = []\n self.spectra: list[Spectrum] = []\n self.mfs: list[MolecularFamily] = []\n self.mibig_bgcs: list[BGC] = []\n self.mibig_strains_in_use: StrainCollection = StrainCollection()\n self.product_types: list = []\n self.strains: StrainCollection = StrainCollection()\n\n self.class_matches = None\n self.chem_classes = None\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.RUN_CANOPUS_DEFAULT","title":"RUN_CANOPUS_DEFAULT class-attribute
instance-attribute
","text":"RUN_CANOPUS_DEFAULT = False\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.EXTRA_CANOPUS_PARAMS_DEFAULT","title":"EXTRA_CANOPUS_PARAMS_DEFAULT class-attribute
instance-attribute
","text":"EXTRA_CANOPUS_PARAMS_DEFAULT = (\n \"--maxmz 600 formula zodiac structure canopus\"\n)\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.OR_CANOPUS","title":"OR_CANOPUS class-attribute
instance-attribute
","text":"OR_CANOPUS = 'canopus_dir'\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.OR_MOLNETENHANCER","title":"OR_MOLNETENHANCER class-attribute
instance-attribute
","text":"OR_MOLNETENHANCER = 'molnetenhancer_dir'\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.config","title":"config instance-attribute
","text":"config = config\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.bgcs","title":"bgcs instance-attribute
","text":"bgcs: list[BGC] = []\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.gcfs","title":"gcfs instance-attribute
","text":"gcfs: list[GCF] = []\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.spectra","title":"spectra instance-attribute
","text":"spectra: list[Spectrum] = []\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.mfs","title":"mfs instance-attribute
","text":"mfs: list[MolecularFamily] = []\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.mibig_bgcs","title":"mibig_bgcs instance-attribute
","text":"mibig_bgcs: list[BGC] = []\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.mibig_strains_in_use","title":"mibig_strains_in_use instance-attribute
","text":"mibig_strains_in_use: StrainCollection = StrainCollection()\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.product_types","title":"product_types instance-attribute
","text":"product_types: list = []\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.strains","title":"strains instance-attribute
","text":"strains: StrainCollection = StrainCollection()\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.class_matches","title":"class_matches instance-attribute
","text":"class_matches = None\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.chem_classes","title":"chem_classes instance-attribute
","text":"chem_classes = None\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.load","title":"load","text":"load() -> bool\n
Load all data from data files in the working directory.
See Dataset Loading Pipeline for the detailed steps.
Returns:
bool
\u2013 True if all data are loaded successfully.
src/nplinker/loader.py
def load(self) -> bool:\n \"\"\"Load all data from data files in the working directory.\n\n See [Dataset Loading Pipeline][dataset-loading-pipeline] for the detailed steps.\n\n Returns:\n True if all data are loaded successfully.\n \"\"\"\n if not self._load_strain_mappings():\n return False\n\n if not self._load_metabolomics():\n return False\n\n if not self._load_genomics():\n return False\n\n # set self.strains with all strains from input plus mibig strains in use\n self.strains = self.strains + self.mibig_strains_in_use\n\n if len(self.strains) == 0:\n raise Exception(\"Failed to find *ANY* strains.\")\n\n return True\n
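A minimal usage sketch, reusing the config object from the class example above:
>>> loader = DatasetLoader(config)
>>> if loader.load():
...     print(len(loader.bgcs), len(loader.gcfs), len(loader.spectra), len(loader.mfs))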
"},{"location":"api/metabolomics/","title":"Data Models","text":""},{"location":"api/metabolomics/#nplinker.metabolomics","title":"nplinker.metabolomics","text":""},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily","title":"MolecularFamily","text":"MolecularFamily(id: str)\n
Class to model a molecular family.
Attributes:
id
(str
) \u2013 Unique id for the molecular family.
spectra_ids
(set[str]
) \u2013 Set of spectrum ids in the molecular family.
spectra
(set[Spectrum]
) \u2013 Set of Spectrum objects in the molecular family.
strains
(StrainCollection
) \u2013 StrainCollection object that contains strains in the molecular family.
Parameters:
id
(str
) \u2013 Unique id for the molecular family.
src/nplinker/metabolomics/molecular_family.py
def __init__(self, id: str):\n \"\"\"Initialize the MolecularFamily.\n\n Args:\n id: Unique id for the molecular family.\n \"\"\"\n self.id: str = id\n self.spectra_ids: set[str] = set()\n self._spectra: set[Spectrum] = set()\n self._strains: StrainCollection = StrainCollection()\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.id","title":"id instance-attribute
","text":"id: str = id\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.spectra_ids","title":"spectra_ids instance-attribute
","text":"spectra_ids: set[str] = set()\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.spectra","title":"spectra property
","text":"spectra: set[Spectrum]\n
Get Spectrum objects in the molecular family.
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.strains","title":"strainsproperty
","text":"strains: StrainCollection\n
Get strains in the molecular family.
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.__str__","title":"__str__","text":"__str__() -> str\n
Source code in src/nplinker/metabolomics/molecular_family.py
def __str__(self) -> str:\n return (\n f\"MolecularFamily(id={self.id}, #Spectrum_objects={len(self._spectra)}, \"\n f\"#spectrum_ids={len(self.spectra_ids)}, #strains={len(self._strains)})\"\n )\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.__repr__","title":"__repr__","text":"__repr__() -> str\n
Source code in src/nplinker/metabolomics/molecular_family.py
def __repr__(self) -> str:\n return str(self)\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.__eq__","title":"__eq__","text":"__eq__(other) -> bool\n
Source code in src/nplinker/metabolomics/molecular_family.py
def __eq__(self, other) -> bool:\n if isinstance(other, MolecularFamily):\n return self.id == other.id\n return NotImplemented\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.__hash__","title":"__hash__","text":"__hash__() -> int\n
Source code in src/nplinker/metabolomics/molecular_family.py
def __hash__(self) -> int:\n return hash(self.id)\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.__reduce__","title":"__reduce__","text":"__reduce__() -> tuple\n
Reduce function for pickling.
Source code in src/nplinker/metabolomics/molecular_family.py
def __reduce__(self) -> tuple:\n \"\"\"Reduce function for pickling.\"\"\"\n return (self.__class__, (self.id,), self.__dict__)\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.add_spectrum","title":"add_spectrum","text":"add_spectrum(spectrum: Spectrum) -> None\n
Add a Spectrum object to the molecular family.
Parameters:
spectrum
(Spectrum
) \u2013 Spectrum
object to add to the molecular family.
src/nplinker/metabolomics/molecular_family.py
def add_spectrum(self, spectrum: Spectrum) -> None:\n \"\"\"Add a Spectrum object to the molecular family.\n\n Args:\n spectrum: `Spectrum` object to add to the molecular family.\n \"\"\"\n self._spectra.add(spectrum)\n self.spectra_ids.add(spectrum.id)\n self._strains = self._strains + spectrum.strains\n # add the molecular family to the spectrum\n spectrum.family = self\n
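A small sketch of the bookkeeping done by add_spectrum (hypothetical ids and peak values):
>>> from nplinker.metabolomics import MolecularFamily, Spectrum
>>> mf = MolecularFamily(\"1\")
>>> spec = Spectrum(id=\"100\", mz=[100.0], intensity=[1.0], precursor_mz=150.0)
>>> mf.add_spectrum(spec)
>>> spec.family is mf  # back-reference is set on the spectrum
True
>>> \"100\" in mf.spectra_ids
True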
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.detach_spectrum","title":"detach_spectrum","text":"detach_spectrum(spectrum: Spectrum) -> None\n
Remove a Spectrum object from the molecular family.
Parameters:
spectrum
(Spectrum
) \u2013 Spectrum
object to remove from the molecular family.
src/nplinker/metabolomics/molecular_family.py
def detach_spectrum(self, spectrum: Spectrum) -> None:\n \"\"\"Remove a Spectrum object from the molecular family.\n\n Args:\n spectrum: `Spectrum` object to remove from the molecular family.\n \"\"\"\n self._spectra.remove(spectrum)\n self.spectra_ids.remove(spectrum.id)\n self._strains = self._update_strains()\n # remove the molecular family from the spectrum\n spectrum.family = None\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.has_strain","title":"has_strain","text":"has_strain(strain: Strain) -> bool\n
Check if the given strain exists.
Parameters:
strain
(Strain
) \u2013 Strain
object.
Returns:
bool
\u2013 True when the given strain exists.
src/nplinker/metabolomics/molecular_family.py
def has_strain(self, strain: Strain) -> bool:\n \"\"\"Check if the given strain exists.\n\n Args:\n strain: `Strain` object.\n\n Returns:\n True when the given strain exists.\n \"\"\"\n return strain in self._strains\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.is_singleton","title":"is_singleton","text":"is_singleton() -> bool\n
Check if the molecular family contains only one spectrum.
Returns:
bool
\u2013 True when the molecular family has only one spectrum.
src/nplinker/metabolomics/molecular_family.py
def is_singleton(self) -> bool:\n \"\"\"Check if the molecular family contains only one spectrum.\n\n Returns:\n True when the molecular family has only one spectrum.\n \"\"\"\n return len(self.spectra_ids) == 1\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum","title":"Spectrum","text":"Spectrum(\n id: str,\n mz: list[float],\n intensity: list[float],\n precursor_mz: float,\n rt: float = 0,\n metadata: dict | None = None,\n)\n
Class to model MS/MS Spectrum.
Attributes:
id
\u2013 the spectrum ID.
mz
\u2013 the list of m/z values.
intensity
\u2013 the list of intensity values.
precursor_mz
\u2013 the m/z value of the precursor.
rt
\u2013 the retention time in seconds.
metadata
\u2013 the metadata of the spectrum, i.e. the header information in the MGF file.
gnps_annotations
(dict
) \u2013 the GNPS annotations of the spectrum.
gnps_id
(str | None
) \u2013 the GNPS ID of the spectrum.
strains
(StrainCollection
) \u2013 the strains that this spectrum belongs to.
family
(MolecularFamily | None
) \u2013 the molecular family that this spectrum belongs to.
peaks
(ndarray
) \u2013 2D array of peaks, each row is a peak of (m/z, intensity) values.
Parameters:
id
(str
) \u2013 the spectrum ID.
mz
(list[float]
) \u2013 the list of m/z values.
intensity
(list[float]
) \u2013 the list of intensity values.
precursor_mz
(float
) \u2013 the precursor m/z.
rt
(float
, default: 0
) \u2013 the retention time in seconds. Defaults to 0.
metadata
(dict | None
, default: None
) \u2013 the metadata of the spectrum, i.e. the header information in the MGF file.
src/nplinker/metabolomics/spectrum.py
def __init__(\n self,\n id: str,\n mz: list[float],\n intensity: list[float],\n precursor_mz: float,\n rt: float = 0,\n metadata: dict | None = None,\n) -> None:\n \"\"\"Initialize the Spectrum.\n\n Args:\n id: the spectrum ID.\n mz: the list of m/z values.\n intensity: the list of intensity values.\n precursor_mz: the precursor m/z.\n rt: the retention time in seconds. Defaults to 0.\n metadata: the metadata of the spectrum, i.e. the header information\n in the MGF file.\n \"\"\"\n self.id = id\n self.mz = mz\n self.intensity = intensity\n self.precursor_mz = precursor_mz\n self.rt = rt\n self.metadata = metadata or {}\n\n self.gnps_annotations: dict = {}\n self.gnps_id: str | None = None\n self.strains: StrainCollection = StrainCollection()\n self.family: MolecularFamily | None = None\n
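For illustration (hypothetical values; the peaks property below is derived from the mz and intensity lists):
>>> spec = Spectrum(id=\"1\", mz=[100.0, 200.0], intensity=[10.0, 20.0], precursor_mz=250.0, rt=5.0)
>>> spec.peaks.shape  # one (m/z, intensity) row per peak
(2, 2)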
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.id","title":"id instance-attribute
","text":"id = id\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.mz","title":"mz instance-attribute
","text":"mz = mz\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.intensity","title":"intensity instance-attribute
","text":"intensity = intensity\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.precursor_mz","title":"precursor_mz instance-attribute
","text":"precursor_mz = precursor_mz\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.rt","title":"rt instance-attribute
","text":"rt = rt\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.metadata","title":"metadata instance-attribute
","text":"metadata = metadata or {}\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.gnps_annotations","title":"gnps_annotations instance-attribute
","text":"gnps_annotations: dict = {}\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.gnps_id","title":"gnps_id instance-attribute
","text":"gnps_id: str | None = None\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.strains","title":"strains instance-attribute
","text":"strains: StrainCollection = StrainCollection()\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.family","title":"family instance-attribute
","text":"family: MolecularFamily | None = None\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.peaks","title":"peaks cached
property
","text":"peaks: ndarray\n
Get the peaks, a 2D array with each row containing the values of (m/z, intensity).
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.__str__","title":"__str__","text":"__str__() -> str\n
Source code in src/nplinker/metabolomics/spectrum.py
def __str__(self) -> str:\n return f\"Spectrum(id={self.id}, #strains={len(self.strains)})\"\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.__repr__","title":"__repr__","text":"__repr__() -> str\n
Source code in src/nplinker/metabolomics/spectrum.py
def __repr__(self) -> str:\n return str(self)\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.__eq__","title":"__eq__","text":"__eq__(other) -> bool\n
Source code in src/nplinker/metabolomics/spectrum.py
def __eq__(self, other) -> bool:\n if isinstance(other, Spectrum):\n return self.id == other.id and self.precursor_mz == other.precursor_mz\n return NotImplemented\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.__hash__","title":"__hash__","text":"__hash__() -> int\n
Source code in src/nplinker/metabolomics/spectrum.py
def __hash__(self) -> int:\n return hash((self.id, self.precursor_mz))\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.__reduce__","title":"__reduce__","text":"__reduce__() -> tuple\n
Reduce function for pickling.
Source code in src/nplinker/metabolomics/spectrum.py
def __reduce__(self) -> tuple:\n \"\"\"Reduce function for pickling.\"\"\"\n return (\n self.__class__,\n (self.id, self.mz, self.intensity, self.precursor_mz, self.rt, self.metadata),\n self.__dict__,\n )\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.has_strain","title":"has_strain","text":"has_strain(strain: Strain) -> bool\n
Check if the given strain exists in the spectrum.
Parameters:
strain
(Strain
) \u2013 Strain
object.
Returns:
bool
\u2013 True when the given strain exists in the spectrum.
src/nplinker/metabolomics/spectrum.py
def has_strain(self, strain: Strain) -> bool:\n \"\"\"Check if the given strain exists in the spectrum.\n\n Args:\n strain: `Strain` object.\n\n Returns:\n True when the given strain exists in the spectrum.\n \"\"\"\n return strain in self.strains\n
"},{"location":"api/metabolomics_abc/","title":"Abstract Base Classes","text":""},{"location":"api/metabolomics_abc/#nplinker.metabolomics.abc","title":"nplinker.metabolomics.abc","text":""},{"location":"api/metabolomics_abc/#nplinker.metabolomics.abc.SpectrumLoaderBase","title":"SpectrumLoaderBase","text":" Bases: ABC
Abstract base class for SpectrumLoader.
"},{"location":"api/metabolomics_abc/#nplinker.metabolomics.abc.SpectrumLoaderBase.spectra","title":"spectraabstractmethod
property
","text":"spectra: list[Spectrum]\n
Get Spectrum objects.
Returns:
list[Spectrum]
\u2013 A sequence of Spectrum objects.
"},{"location":"api/metabolomics_abc/#nplinker.metabolomics.abc.MolecularFamilyLoaderBase","title":"MolecularFamilyLoaderBase","text":" Bases: ABC
Abstract base class for MolecularFamilyLoader.
"},{"location":"api/metabolomics_abc/#nplinker.metabolomics.abc.MolecularFamilyLoaderBase.get_mfs","title":"get_mfsabstractmethod
","text":"get_mfs(keep_singleton: bool) -> list[MolecularFamily]\n
Get MolecularFamily objects.
Parameters:
keep_singleton
(bool
) \u2013 True to keep singleton molecular families. A singleton molecular family is a molecular family that contains only one spectrum.
Returns:
list[MolecularFamily]
\u2013 A sequence of MolecularFamily objects.
src/nplinker/metabolomics/abc.py
@abstractmethod\ndef get_mfs(self, keep_singleton: bool) -> list[MolecularFamily]:\n \"\"\"Get MolecularFamily objects.\n\n Args:\n keep_singleton: True to keep singleton molecular families. A\n singleton molecular family is a molecular family that contains\n only one spectrum.\n\n Returns:\n A sequence of MolecularFamily objects.\n \"\"\"\n
"},{"location":"api/metabolomics_abc/#nplinker.metabolomics.abc.FileMappingLoaderBase","title":"FileMappingLoaderBase","text":" Bases: ABC
Abstract base class for FileMappingLoader.
"},{"location":"api/metabolomics_abc/#nplinker.metabolomics.abc.FileMappingLoaderBase.mappings","title":"mappingsabstractmethod
property
","text":"mappings: dict[str, list[str]]\n
Get file mappings.
Returns:
dict[str, list[str]]
\u2013 A mapping from spectrum ID to the names of files where the spectrum occurs.
"},{"location":"api/metabolomics_abc/#nplinker.metabolomics.abc.AnnotationLoaderBase","title":"AnnotationLoaderBase","text":" Bases: ABC
Abstract base class for AnnotationLoader.
"},{"location":"api/metabolomics_abc/#nplinker.metabolomics.abc.AnnotationLoaderBase.annotations","title":"annotationsabstractmethod
property
","text":"annotations: dict[str, dict]\n
Get annotations.
Returns:
dict[str, dict]
\u2013 A mapping from spectrum ID to its annotations.
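These abstract base classes only fix the loader interface; the GNPS loaders above are the concrete implementations. A minimal sketch of a custom loader (hypothetical class, not part of NPLinker) only needs to implement the abstract members:
>>> from nplinker.metabolomics.abc import AnnotationLoaderBase
>>> class MyAnnotationLoader(AnnotationLoaderBase):
...     def __init__(self):
...         self._annotations = {\"100\": {\"Compound_Name\": \"example\"}}
...     @property
...     def annotations(self) -> dict[str, dict]:
...         return self._annotations
>>> MyAnnotationLoader().annotations[\"100\"][\"Compound_Name\"]
'example'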
"},{"location":"api/metabolomics_utils/#nplinker.metabolomics.utils.add_annotation_to_spectrum","title":"add_annotation_to_spectrum","text":"add_annotation_to_spectrum(\n annotations: Mapping[str, dict],\n spectra: Sequence[Spectrum],\n) -> None\n
Add annotations to the Spectrum.gnps_annotations attribute for input spectra.
It is possible that some spectra don't have annotations.
Note: The input spectra list is changed in place.
Parameters:
annotations
(Mapping[str, dict]
) \u2013 A dictionary of GNPS annotations, where the keys are spectrum ids and the values are GNPS annotations.
spectra
(Sequence[Spectrum]
) \u2013 A list of Spectrum objects.
src/nplinker/metabolomics/utils.py
def add_annotation_to_spectrum(\n annotations: Mapping[str, dict], spectra: Sequence[Spectrum]\n) -> None:\n \"\"\"Add annotations to the `Spectrum.gnps_annotations` attribute for input spectra.\n\n It is possible that some spectra don't have annotations.\n\n !!! note\n The input `spectra` list is changed in place.\n\n Args:\n annotations: A dictionary of GNPS annotations, where the keys are\n spectrum ids and the values are GNPS annotations.\n spectra: A list of Spectrum objects.\n \"\"\"\n for spec in spectra:\n if spec.id in annotations:\n spec.gnps_annotations = annotations[spec.id]\n
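A usage sketch (assuming spectra is a list of Spectrum objects and using the GNPSAnnotationLoader shown earlier):
>>> annotations = GNPSAnnotationLoader(\"gnps_annotations.tsv\").annotations
>>> add_annotation_to_spectrum(annotations, spectra)  # updates spectra in place
>>> spectra[0].gnps_annotations  # stays an empty dict if this spectrum has no annotation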
"},{"location":"api/metabolomics_utils/#nplinker.metabolomics.utils.add_strains_to_spectrum","title":"add_strains_to_spectrum","text":"add_strains_to_spectrum(\n strains: StrainCollection, spectra: Sequence[Spectrum]\n) -> tuple[list[Spectrum], list[Spectrum]]\n
Add Strain objects to the Spectrum.strains attribute for input spectra.
Note: The input spectra list is changed in place.
Parameters:
strains
(StrainCollection
) \u2013 A collection of strain objects.
spectra
(Sequence[Spectrum]
) \u2013 A list of Spectrum objects.
Returns:
tuple[list[Spectrum], list[Spectrum]]
\u2013 A tuple of two lists of Spectrum objects: the first list contains Spectrum objects that are updated with Strain objects; the second list contains Spectrum objects that are not updated because no Strain objects were found.
src/nplinker/metabolomics/utils.py
def add_strains_to_spectrum(\n strains: StrainCollection, spectra: Sequence[Spectrum]\n) -> tuple[list[Spectrum], list[Spectrum]]:\n \"\"\"Add `Strain` objects to the `Spectrum.strains` attribute for input spectra.\n\n !!! note\n The input `spectra` list is changed in place.\n\n Args:\n strains: A collection of strain objects.\n spectra: A list of Spectrum objects.\n\n Returns:\n A tuple of two lists of Spectrum objects,\n\n - the first list contains Spectrum objects that are updated with Strain objects;\n - the second list contains Spectrum objects that are not updated with Strain objects\n because no Strain objects are found.\n \"\"\"\n spectra_with_strains = []\n spectra_without_strains = []\n for spec in spectra:\n try:\n strain_list = strains.lookup(spec.id)\n except ValueError:\n spectra_without_strains.append(spec)\n continue\n\n for strain in strain_list:\n spec.strains.add(strain)\n spectra_with_strains.append(spec)\n\n logger.info(\n f\"{len(spectra_with_strains)} Spectrum objects updated with Strain objects.\\n\"\n f\"{len(spectra_without_strains)} Spectrum objects not updated with Strain objects.\"\n )\n\n return spectra_with_strains, spectra_without_strains\n
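A usage sketch (assuming strains is a StrainCollection loaded from the strain mappings file and spectra a list of Spectrum objects); the two returned lists partition the input:
>>> with_strains, without_strains = add_strains_to_spectrum(strains, spectra)
>>> len(with_strains) + len(without_strains) == len(spectra)
True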
"},{"location":"api/metabolomics_utils/#nplinker.metabolomics.utils.add_spectrum_to_mf","title":"add_spectrum_to_mf","text":"add_spectrum_to_mf(\n spectra: Sequence[Spectrum],\n mfs: Sequence[MolecularFamily],\n) -> tuple[\n list[MolecularFamily],\n list[MolecularFamily],\n dict[MolecularFamily, set[str]],\n]\n
Add Spectrum objects to MolecularFamily objects.
The attribute MolecularFamily.spectra_ids contains the ids of Spectrum objects. These ids are used to find Spectrum objects from the input spectra list. The found Spectrum objects are added to the MolecularFamily.spectra attribute.
It is possible that some spectrum ids are not found in the input spectra list, and so their Spectrum objects are missing in the MolecularFamily object.
Note: The input mfs list is changed in place.
Parameters:
spectra
(Sequence[Spectrum]
) \u2013 A list of Spectrum objects.
mfs
(Sequence[MolecularFamily]
) \u2013 A list of MolecularFamily objects.
Returns:
tuple[list[MolecularFamily], list[MolecularFamily], dict[MolecularFamily, set[str]]]
\u2013 A tuple of three elements:
- the first list contains MolecularFamily objects that are updated with Spectrum objects;
- the second list contains MolecularFamily objects that are not updated with Spectrum objects (all Spectrum objects are missing);
- the third is a dictionary with MolecularFamily objects as keys and a set of ids of missing Spectrum objects as values.
src/nplinker/metabolomics/utils.py
def add_spectrum_to_mf(\n spectra: Sequence[Spectrum], mfs: Sequence[MolecularFamily]\n) -> tuple[list[MolecularFamily], list[MolecularFamily], dict[MolecularFamily, set[str]]]:\n \"\"\"Add Spectrum objects to MolecularFamily objects.\n\n The attribute `MolecularFamily.spectra_ids` contains the ids of `Spectrum` objects.\n These ids are used to find `Spectrum` objects from the input `spectra` list. The found `Spectrum`\n objects are added to the `MolecularFamily.spectra` attribute.\n\n It is possible that some spectrum ids are not found in the input `spectra` list, and so their\n `Spectrum` objects are missing in the `MolecularFamily` object.\n\n\n !!! note\n The input `mfs` list is changed in place.\n\n Args:\n spectra: A list of Spectrum objects.\n mfs: A list of MolecularFamily objects.\n\n Returns:\n A tuple of three elements,\n\n - the first list contains `MolecularFamily` objects that are updated with `Spectrum` objects\n - the second list contains `MolecularFamily` objects that are not updated with `Spectrum`\n objects (all `Spectrum` objects are missing).\n - the third is a dictionary containing `MolecularFamily` objects as keys and a set of ids\n of missing `Spectrum` objects as values.\n \"\"\"\n spec_dict = {spec.id: spec for spec in spectra}\n mf_with_spec = []\n mf_without_spec = []\n mf_missing_spec: dict[MolecularFamily, set[str]] = {}\n for mf in mfs:\n for spec_id in mf.spectra_ids:\n try:\n spec = spec_dict[spec_id]\n except KeyError:\n if mf not in mf_missing_spec:\n mf_missing_spec[mf] = {spec_id}\n else:\n mf_missing_spec[mf].add(spec_id)\n continue\n mf.add_spectrum(spec)\n\n if mf.spectra:\n mf_with_spec.append(mf)\n else:\n mf_without_spec.append(mf)\n\n logger.info(\n f\"{len(mf_with_spec)} MolecularFamily objects updated with Spectrum objects.\\n\"\n f\"{len(mf_without_spec)} MolecularFamily objects not updated with Spectrum objects.\\n\"\n f\"{len(mf_missing_spec)} MolecularFamily objects have missing Spectrum objects.\"\n )\n return mf_with_spec, mf_without_spec, mf_missing_spec\n
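For example (hypothetical ids; the second family references a spectrum id that is absent from the input list):
>>> mf1, mf2 = MolecularFamily(\"1\"), MolecularFamily(\"2\")
>>> mf1.spectra_ids.add(\"100\")
>>> mf2.spectra_ids.add(\"999\")
>>> spectra = [Spectrum(id=\"100\", mz=[100.0], intensity=[1.0], precursor_mz=150.0)]
>>> with_spec, without_spec, missing = add_spectrum_to_mf(spectra, [mf1, mf2])
>>> missing[mf2]
{'999'}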
"},{"location":"api/metabolomics_utils/#nplinker.metabolomics.utils.extract_mappings_strain_id_ms_filename","title":"extract_mappings_strain_id_ms_filename","text":"extract_mappings_strain_id_ms_filename(\n podp_project_json_file: str | PathLike,\n) -> dict[str, set[str]]\n
Extract mappings \"strain_id <-> MS_filename\".
Parameters:
podp_project_json_file
(str | PathLike
) \u2013 The path to the PODP project JSON file.
Returns:
dict[str, set[str]]
\u2013 Key is strain id and value is a set of MS filenames.
The podp_project_json_file is the project JSON file downloaded from the PODP platform. For example, for project MSV000079284, its json file is https://pairedomicsdata.bioinformatics.nl/api/projects/4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4.
src/nplinker/metabolomics/utils.py
def extract_mappings_strain_id_ms_filename(\n podp_project_json_file: str | PathLike,\n) -> dict[str, set[str]]:\n \"\"\"Extract mappings \"strain_id <-> MS_filename\".\n\n Args:\n podp_project_json_file: The path to the PODP project JSON file.\n\n Returns:\n Key is strain id and value is a set of MS filenames.\n\n Notes:\n The `podp_project_json_file` is the project JSON file downloaded from\n PODP platform. For example, for project MSV000079284, its json file is\n https://pairedomicsdata.bioinformatics.nl/api/projects/4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4.\n\n See Also:\n - [podp_generate_strain_mappings][nplinker.strain.utils.podp_generate_strain_mappings]:\n Generate strain mappings JSON file for PODP pipeline.\n \"\"\"\n mappings_dict: dict[str, set[str]] = {}\n with open(podp_project_json_file, \"r\") as f:\n json_data = json.load(f)\n\n validate_podp_json(json_data)\n\n # Extract mappings strain id <-> metabolomics filename\n for record in json_data[\"genome_metabolome_links\"]:\n strain_id = record[\"genome_label\"]\n # get the actual filename of the mzXML URL\n filename = Path(record[\"metabolomics_file\"]).name\n if strain_id in mappings_dict:\n mappings_dict[strain_id].add(filename)\n else:\n mappings_dict[strain_id] = {filename}\n return mappings_dict\n
"},{"location":"api/metabolomics_utils/#nplinker.metabolomics.utils.extract_mappings_ms_filename_spectrum_id","title":"extract_mappings_ms_filename_spectrum_id","text":"extract_mappings_ms_filename_spectrum_id(\n gnps_file_mappings_file: str | PathLike,\n) -> dict[str, set[str]]\n
Extract mappings \"MS_filename <-> spectrum_id\".
Parameters:
gnps_file_mappings_file
(str | PathLike
) \u2013 The path to the GNPS file mappings file (csv or tsv).
Returns:
dict[str, set[str]]
\u2013 Key is MS filename and value is a set of spectrum ids.
The gnps_file_mappings_file is downloaded from the GNPS website and named GNPS_FILE_MAPPINGS_TSV or GNPS_FILE_MAPPINGS_CSV. For more details, see GNPS data.
src/nplinker/metabolomics/utils.py
def extract_mappings_ms_filename_spectrum_id(\n gnps_file_mappings_file: str | PathLike,\n) -> dict[str, set[str]]:\n \"\"\"Extract mappings \"MS_filename <-> spectrum_id\".\n\n Args:\n gnps_file_mappings_file: The path to the GNPS file mappings file (csv or tsv).\n\n Returns:\n Key is MS filename and value is a set of spectrum ids.\n\n Notes:\n The `gnps_file_mappings_file` is downloaded from GNPS website and named as\n [GNPS_FILE_MAPPINGS_TSV][nplinker.defaults.GNPS_FILE_MAPPINGS_TSV] or\n [GNPS_FILE_MAPPINGS_CSV][nplinker.defaults.GNPS_FILE_MAPPINGS_CSV].\n For more details, see [GNPS data][gnps-data].\n\n See Also:\n - [GNPSFileMappingLoader][nplinker.metabolomics.gnps.gnps_file_mapping_loader.GNPSFileMappingLoader]:\n Load GNPS file mappings file.\n - [podp_generate_strain_mappings][nplinker.strain.utils.podp_generate_strain_mappings]:\n Generate strain mappings JSON file for PODP pipeline.\n \"\"\"\n loader = GNPSFileMappingLoader(gnps_file_mappings_file)\n return loader.mapping_reversed\n
"},{"location":"api/metabolomics_utils/#nplinker.metabolomics.utils.get_mappings_strain_id_spectrum_id","title":"get_mappings_strain_id_spectrum_id","text":"get_mappings_strain_id_spectrum_id(\n mappings_strain_id_ms_filename: Mapping[str, set[str]],\n mappings_ms_filename_spectrum_id: Mapping[\n str, set[str]\n ],\n) -> dict[str, set[str]]\n
Get mappings \"strain_id <-> spectrum_id\".
Parameters:
mappings_strain_id_ms_filename
(Mapping[str, set[str]]
) \u2013 Mappings \"strain_id <-> MS_filename\".
mappings_ms_filename_spectrum_id
(Mapping[str, set[str]]
) \u2013 Mappings \"MS_filename <-> spectrum_id\".
Returns:
dict[str, set[str]]
\u2013 Key is strain id and value is a set of spectrum ids.
See Also:
- extract_mappings_strain_id_ms_filename: Extract mappings \"strain_id <-> MS_filename\".
- extract_mappings_ms_filename_spectrum_id: Extract mappings \"MS_filename <-> spectrum_id\".
- podp_generate_strain_mappings: Generate strain mappings JSON file for PODP pipeline.
def get_mappings_strain_id_spectrum_id(\n mappings_strain_id_ms_filename: Mapping[str, set[str]],\n mappings_ms_filename_spectrum_id: Mapping[str, set[str]],\n) -> dict[str, set[str]]:\n \"\"\"Get mappings \"strain_id <-> spectrum_id\".\n\n Args:\n mappings_strain_id_ms_filename: Mappings\n \"strain_id <-> MS_filename\".\n mappings_ms_filename_spectrum_id: Mappings\n \"MS_filename <-> spectrum_id\".\n\n Returns:\n Key is strain id and value is a set of spectrum ids.\n\n\n See Also:\n - `extract_mappings_strain_id_ms_filename`: Extract mappings \"strain_id <-> MS_filename\".\n - `extract_mappings_ms_filename_spectrum_id`: Extract mappings \"MS_filename <-> spectrum_id\".\n - [podp_generate_strain_mappings][nplinker.strain.utils.podp_generate_strain_mappings]:\n Generate strain mappings JSON file for PODP pipeline.\n \"\"\"\n mappings_dict = {}\n for strain_id, ms_filenames in mappings_strain_id_ms_filename.items():\n spectrum_ids = set()\n for ms_filename in ms_filenames:\n if (sid := mappings_ms_filename_spectrum_id.get(ms_filename)) is not None:\n spectrum_ids.update(sid)\n if spectrum_ids:\n mappings_dict[strain_id] = spectrum_ids\n return mappings_dict\n
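Chaining the three helpers (a sketch with hypothetical file paths) yields the strain-to-spectrum mappings used by the PODP pipeline:
>>> from nplinker.metabolomics.utils import (
...     extract_mappings_strain_id_ms_filename,
...     extract_mappings_ms_filename_spectrum_id,
...     get_mappings_strain_id_spectrum_id,
... )
>>> strain_to_file = extract_mappings_strain_id_ms_filename(\"podp_project.json\")
>>> file_to_spec = extract_mappings_ms_filename_spectrum_id(\"gnps_file_mappings.tsv\")
>>> strain_to_spec = get_mappings_strain_id_spectrum_id(strain_to_file, file_to_spec)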
"},{"location":"api/mibig/","title":"MiBIG","text":""},{"location":"api/mibig/#nplinker.genomics.mibig","title":"nplinker.genomics.mibig","text":""},{"location":"api/mibig/#nplinker.genomics.mibig.MibigLoader","title":"MibigLoader","text":"MibigLoader(data_dir: str | PathLike)\n
Bases: BGCLoaderBase
Parse MIBiG metadata files and return BGC objects.
A MIBiG metadata file (json) contains annotations/metadata for each BGC. See https://mibig.secondarymetabolites.org/download.
The MiBIG accession is used as the BGC id and strain name. The loaded BGC objects have a Strain object as their strain attribute (i.e. BGC.strain).
Parameters:
data_dir
(str | PathLike
) \u2013 Path to the directory of MIBiG metadata json files
Examples:
>>> loader = MibigLoader(\"path/to/mibig/data/dir\")\n>>> loader.data_dir\n'path/to/mibig/data/dir'\n>>> loader.get_bgcs()\n[BGC('BGC000001', 'NRP'), BGC('BGC000002', 'Polyketide')]\n
Source code in src/nplinker/genomics/mibig/mibig_loader.py
def __init__(self, data_dir: str | PathLike):\n \"\"\"Initialize the MIBiG metadata loader.\n\n Args:\n data_dir: Path to the directory of MIBiG metadata json files\n\n Examples:\n >>> loader = MibigLoader(\"path/to/mibig/data/dir\")\n >>> loader.data_dir\n 'path/to/mibig/data/dir'\n >>> loader.get_bgcs()\n [BGC('BGC000001', 'NRP'), BGC('BGC000002', 'Polyketide')]\n \"\"\"\n self.data_dir = str(data_dir)\n self._file_dict = self.parse_data_dir(self.data_dir)\n self._metadata_dict = self._parse_metadata()\n self._bgcs = self._parse_bgcs()\n
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigLoader.data_dir","title":"data_dir instance-attribute
","text":"data_dir = str(data_dir)\n
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigLoader.get_files","title":"get_files","text":"get_files() -> dict[str, str]\n
Get the path of all MIBiG metadata json files.
Returns:
dict[str, str]
\u2013 The key is the metadata file name (BGC accession), and the value is the path to the metadata json file.
src/nplinker/genomics/mibig/mibig_loader.py
def get_files(self) -> dict[str, str]:\n \"\"\"Get the path of all MIBiG metadata json files.\n\n Returns:\n The key is metadata file name (BGC accession), and the value is path to the metadata\n json file\n \"\"\"\n return self._file_dict\n
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigLoader.parse_data_dir","title":"parse_data_dir staticmethod
","text":"parse_data_dir(data_dir: str | PathLike) -> dict[str, str]\n
Parse metadata directory and return paths to all metadata json files.
Parameters:
data_dir
(str | PathLike
) \u2013 path to the directory of MIBiG metadata json files
Returns:
dict[str, str]
\u2013 The key is the metadata file name (BGC accession), and the value is the path to the metadata json file.
src/nplinker/genomics/mibig/mibig_loader.py
@staticmethod\ndef parse_data_dir(data_dir: str | PathLike) -> dict[str, str]:\n \"\"\"Parse metadata directory and return paths to all metadata json files.\n\n Args:\n data_dir: path to the directory of MIBiG metadata json files\n\n Returns:\n The key is metadata file name (BGC accession), and the value is path to the metadata\n json file\n \"\"\"\n file_dict = {}\n json_files = list_files(data_dir, prefix=\"BGC\", suffix=\".json\")\n for file in json_files:\n fname = Path(file).stem\n file_dict[fname] = file\n return file_dict\n
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigLoader.get_metadata","title":"get_metadata","text":"get_metadata() -> dict[str, MibigMetadata]\n
Get MibigMetadata objects.
Returns:
dict[str, MibigMetadata]
\u2013 The key is BGC accession (file name) and the value is MibigMetadata object
src/nplinker/genomics/mibig/mibig_loader.py
def get_metadata(self) -> dict[str, MibigMetadata]:\n \"\"\"Get MibigMetadata objects.\n\n Returns:\n The key is BGC accession (file name) and the value is MibigMetadata object\n \"\"\"\n return self._metadata_dict\n
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigLoader.get_bgcs","title":"get_bgcs","text":"get_bgcs() -> list[BGC]\n
Get BGC objects.
The BGC objects use the MiBIG accession as id and have a Strain object as their strain attribute (i.e. BGC.strain), where the name of the Strain object is also the MiBIG accession.
Returns:
list[BGC]
\u2013 A list of BGC objects
src/nplinker/genomics/mibig/mibig_loader.py
def get_bgcs(self) -> list[BGC]:\n \"\"\"Get BGC objects.\n\n The BGC objects use MiBIG accession as id and have Strain object as\n their strain attribute (i.e. `BGC.strain`), where the name of the Strain\n object is also MiBIG accession.\n\n Returns:\n A list of BGC objects\n \"\"\"\n return self._bgcs\n
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigMetadata","title":"MibigMetadata","text":"MibigMetadata(file: str | PathLike)\n
Class to model the BGC metadata/annotations defined in MIBiG.
MIBiG is a specification of BGC metadata and uses JSON schema to represent it. For more details see: https://mibig.secondarymetabolites.org/download.
Parameters:
file
(str | PathLike
) \u2013 Path to the json file of MIBiG BGC metadata
Examples:
>>> metadata = MibigMetadata(\"/data/BGC0000001.json\")\n
Source code in src/nplinker/genomics/mibig/mibig_metadata.py
def __init__(self, file: str | PathLike) -> None:\n \"\"\"Initialize the MIBiG metadata object.\n\n Args:\n file: Path to the json file of MIBiG BGC metadata\n\n Examples:\n >>> metadata = MibigMetadata(\"/data/BGC0000001.json\")\n \"\"\"\n self.file = str(file)\n with open(self.file, \"rb\") as f:\n self.metadata = json.load(f)\n\n self._mibig_accession: str\n self._biosyn_class: tuple[str]\n self._parse_metadata()\n
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigMetadata.file","title":"file instance-attribute
","text":"file = str(file)\n
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigMetadata.metadata","title":"metadata instance-attribute
","text":"metadata = load(f)\n
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigMetadata.mibig_accession","title":"mibig_accession property
","text":"mibig_accession: str\n
Get the value of metadata item 'mibig_accession'.
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigMetadata.biosyn_class","title":"biosyn_classproperty
","text":"biosyn_class: tuple[str]\n
Get the value of metadata item 'biosyn_class'.
The 'biosyn_class' is the biosynthetic class(es), namely the type of natural product or secondary metabolite.
MIBiG defines 6 major biosynthetic classes for natural products: NRP, Polyketide, RiPP, Terpene, Saccharide and Alkaloid. Natural products created by other biosynthetic mechanisms fall under the category Other. For more details see the paper.
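For illustration, the two properties can be read as follows (a sketch; the actual class tuple depends on the metadata file):
>>> metadata = MibigMetadata(\"/data/BGC0000001.json\")
>>> metadata.mibig_accession
'BGC0000001'
>>> metadata.biosyn_class  # e.g. ('NRP',); a tuple, since a BGC may have several classes
('NRP',)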
"},{"location":"api/mibig/#nplinker.genomics.mibig.download_and_extract_mibig_metadata","title":"download_and_extract_mibig_metadata","text":"download_and_extract_mibig_metadata(\n download_root: str | PathLike,\n extract_path: str | PathLike,\n version: str = \"3.1\",\n)\n
Download and extract MIBiG metadata json files.
Note that it does not matter whether the metadata json files are nested in folders inside the archive; all json files will be extracted to the same location, i.e. extract_path. Nested folders will be removed if they exist, so extract_path will contain only json files.
Parameters:
download_root
(str | PathLike
) \u2013 Path to the directory in which to place the downloaded archive.
extract_path
(str | PathLike
) \u2013 Path to an empty directory where the json files will be extracted. The directory must be empty if it exists. If it doesn't exist, the directory will be created.
version
(str
, default: '3.1'
) \u2013 The version of the MIBiG metadata to download. Defaults to \"3.1\".
Examples:
>>> download_and_extract_mibig_metadata(\"/data/download\", \"/data/mibig_metadata\")\n
Source code in src/nplinker/genomics/mibig/mibig_downloader.py
def download_and_extract_mibig_metadata(\n download_root: str | os.PathLike,\n extract_path: str | os.PathLike,\n version: str = \"3.1\",\n):\n \"\"\"Download and extract MIBiG metadata json files.\n\n Note that it does not matter whether the metadata json files are in nested folders or not in the archive,\n all json files will be extracted to the same location, i.e. `extract_path`. The nested\n folders will be removed if they exist. So the `extract_path` will have only json files.\n\n Args:\n download_root: Path to the directory in which to place the downloaded archive.\n extract_path: Path to an empty directory where the json files will be extracted.\n The directory must be empty if it exists. If it doesn't exist, the directory will be created.\n version: _description_. Defaults to \"3.1\".\n\n Examples:\n >>> download_and_extract_mibig_metadata(\"/data/download\", \"/data/mibig_metadata\")\n \"\"\"\n download_root = Path(download_root)\n extract_path = Path(extract_path)\n\n if download_root == extract_path:\n raise ValueError(\"Identical path of download directory and extract directory\")\n\n # check if extract_path is empty\n if not extract_path.exists():\n extract_path.mkdir(parents=True)\n else:\n if len(list(extract_path.iterdir())) != 0:\n raise ValueError(f'Nonempty directory: \"{extract_path}\"')\n\n # download and extract\n md5 = _MD5_MIBIG_METADATA[version]\n download_and_extract_archive(\n url=MIBIG_METADATA_URL.format(version=version),\n download_root=download_root,\n extract_root=extract_path,\n md5=md5,\n )\n\n # After extracting mibig archive, it's either one dir or many json files,\n # if it's a dir, then move all json files from it to extract_path\n subdirs = list_dirs(extract_path)\n if len(subdirs) > 1:\n raise ValueError(f\"Expected one extracted directory, got {len(subdirs)}\")\n\n if len(subdirs) == 1:\n subdir_path = subdirs[0]\n for fname in list_files(subdir_path, prefix=\"BGC\", suffix=\".json\", keep_parent=False):\n shutil.move(os.path.join(subdir_path, fname), os.path.join(extract_path, fname))\n # delete subdir\n if subdir_path != extract_path:\n shutil.rmtree(subdir_path)\n
"},{"location":"api/mibig/#nplinker.genomics.mibig.parse_bgc_metadata_json","title":"parse_bgc_metadata_json","text":"parse_bgc_metadata_json(file: str | PathLike) -> BGC\n
Parse MIBiG metadata file and return BGC object.
Note that the MiBIG accession is used as the BGC id and strain name. The BGC object has a Strain object as its strain attribute.
Parameters:
file
(str | PathLike
) \u2013 Path to the MIBiG metadata json file
Returns:
BGC
\u2013 BGC object
src/nplinker/genomics/mibig/mibig_loader.py
def parse_bgc_metadata_json(file: str | PathLike) -> BGC:\n \"\"\"Parse MIBiG metadata file and return BGC object.\n\n Note that the MiBIG accession is used as the BGC id and strain name. The BGC\n object has Strain object as its strain attribute.\n\n Args:\n file: Path to the MIBiG metadata json file\n\n Returns:\n BGC object\n \"\"\"\n metadata = MibigMetadata(str(file))\n mibig_bgc = BGC(metadata.mibig_accession, *metadata.biosyn_class)\n mibig_bgc.mibig_bgc_class = metadata.biosyn_class\n mibig_bgc.strain = Strain(metadata.mibig_accession)\n return mibig_bgc\n
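A usage sketch (hypothetical path):
>>> bgc = parse_bgc_metadata_json(\"/data/BGC0000001.json\")
>>> bgc.id  # the MiBIG accession doubles as the BGC id
'BGC0000001'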
"},{"location":"api/nplinker/","title":"NPLinker","text":""},{"location":"api/nplinker/#nplinker","title":"nplinker","text":""},{"location":"api/nplinker/#nplinker.NPLinker","title":"NPLinker","text":"NPLinker(config_file: str | PathLike)\n
The central class of the NPLinker application.
Attributes:
config
(Dynaconf
) \u2013 The configuration object for the current NPLinker application.
root_dir
(str
) \u2013 The path to the root directory of the current NPLinker application.
output_dir
(str
) \u2013 The path to the output directory of the current NPLinker application.
bgcs
(list[BGC]
) \u2013 A list of all BGC objects.
gcfs
(list[GCF]
) \u2013 A list of all GCF objects.
spectra
(list[Spectrum]
) \u2013 A list of all Spectrum objects.
mfs
(list[MolecularFamily]
) \u2013 A list of all MolecularFamily objects.
mibig_bgcs
(list[BGC]
) \u2013 A list of all MiBIG BGC objects.
strains
(StrainCollection
) \u2013 A StrainCollection object containing all Strain objects.
product_types
(list[str]
) \u2013 A list of all BiGSCAPE product types.
scoring_methods
(list[str]
) \u2013 A list of all valid scoring methods.
Parameters:
config_file
(str | PathLike
) \u2013 Path to the configuration file to use.
Examples:
Starting the NPLinker application:
>>> from nplinker import NPLinker\n>>> npl = NPLinker(\"path/to/config.toml\")\n
Loading data from files to python objects:
>>> npl.load_data()\n
Checking the number of GCF objects:
>>> len(npl.gcfs)\n
Getting the links for all GCF objects using the Metcalf scoring method, and the result is stored in a LinkGraph object:
>>> lg = npl.get_links(npl.gcfs, \"metcalf\")\n
Getting the link data between two objects:
>>> link_data = lg.get_link_data(npl.gcfs[0], npl.spectra[0])\n{\"metcalf\": Score(\"metcalf\", 1.0, {\"cutoff\": 0, \"standardised\": False})}\n
Saving the data to a pickle file:
>>> npl.save_data(\"path/to/output.pkl\", lg)\n
Source code in src/nplinker/nplinker.py
def __init__(self, config_file: str | PathLike):\n \"\"\"Initialise an NPLinker instance.\n\n Args:\n config_file: Path to the configuration file to use.\n\n\n Examples:\n Starting the NPLinker application:\n >>> from nplinker import NPLinker\n >>> npl = NPLinker(\"path/to/config.toml\")\n\n Loading data from files to python objects:\n >>> npl.load_data()\n\n Checking the number of GCF objects:\n >>> len(npl.gcfs)\n\n Getting the links for all GCF objects using the Metcalf scoring method, and the result\n is stored in a [LinkGraph][nplinker.scoring.LinkGraph] object:\n >>> lg = npl.get_links(npl.gcfs, \"metcalf\")\n\n Getting the link data between two objects:\n >>> link_data = lg.get_link_data(npl.gcfs[0], npl.spectra[0])\n {\"metcalf\": Score(\"metcalf\", 1.0, {\"cutoff\": 0, \"standardised\": False})}\n\n Saving the data to a pickle file:\n >>> npl.save_data(\"path/to/output.pkl\", lg)\n \"\"\"\n # Load the configuration file\n self.config: Dynaconf = load_config(config_file)\n\n # Setup logging for the application\n setup_logging(\n level=self.config.log.level,\n file=self.config.log.get(\"file\", \"\"),\n use_console=self.config.log.use_console,\n )\n logger.info(\n \"Configuration:\\n %s\", pformat(self.config.as_dict(), width=20, sort_dicts=False)\n )\n\n # Setup the output directory\n self._output_dir = self.config.root_dir / OUTPUT_DIRNAME\n self._output_dir.mkdir(exist_ok=True)\n\n # Initialise data containers that will be populated by the `load_data` method\n self._bgc_dict: dict[str, BGC] = {}\n self._gcf_dict: dict[str, GCF] = {}\n self._spec_dict: dict[str, Spectrum] = {}\n self._mf_dict: dict[str, MolecularFamily] = {}\n self._mibig_bgcs: list[BGC] = []\n self._strains: StrainCollection = StrainCollection()\n self._product_types: list = []\n self._chem_classes = None # TODO: to be refactored\n self._class_matches = None # TODO: to be refactored\n\n # Flags to keep track of whether the scoring methods have been set up\n self._scoring_methods_setup_done = {name: False for name in self._valid_scoring_methods}\n
"},{"location":"api/nplinker/#nplinker.NPLinker.config","title":"config instance-attribute
","text":"config: Dynaconf = load_config(config_file)\n
"},{"location":"api/nplinker/#nplinker.NPLinker.root_dir","title":"root_dir property
","text":"root_dir: str\n
Get the path to the root directory of the current NPLinker instance.
"},{"location":"api/nplinker/#nplinker.NPLinker.output_dir","title":"output_dirproperty
","text":"output_dir: str\n
Get the path to the output directory of the current NPLinker instance.
"},{"location":"api/nplinker/#nplinker.NPLinker.bgcs","title":"bgcsproperty
","text":"bgcs: list[BGC]\n
Get all BGC objects.
"},{"location":"api/nplinker/#nplinker.NPLinker.gcfs","title":"gcfsproperty
","text":"gcfs: list[GCF]\n
Get all GCF objects.
"},{"location":"api/nplinker/#nplinker.NPLinker.spectra","title":"spectraproperty
","text":"spectra: list[Spectrum]\n
Get all Spectrum objects.
"},{"location":"api/nplinker/#nplinker.NPLinker.mfs","title":"mfsproperty
","text":"mfs: list[MolecularFamily]\n
Get all MolecularFamily objects.
"},{"location":"api/nplinker/#nplinker.NPLinker.mibig_bgcs","title":"mibig_bgcsproperty
","text":"mibig_bgcs: list[BGC]\n
Get all MiBIG BGC objects.
"},{"location":"api/nplinker/#nplinker.NPLinker.strains","title":"strainsproperty
","text":"strains: StrainCollection\n
Get all Strain objects.
"},{"location":"api/nplinker/#nplinker.NPLinker.product_types","title":"product_typesproperty
","text":"product_types: list[str]\n
Get all BiGSCAPE product types.
"},{"location":"api/nplinker/#nplinker.NPLinker.chem_classes","title":"chem_classesproperty
","text":"chem_classes\n
Returns loaded ChemClassPredictions with the class predictions.
"},{"location":"api/nplinker/#nplinker.NPLinker.class_matches","title":"class_matchesproperty
","text":"class_matches\n
ClassMatches with the matched classes and scoring tables from MIBiG.
"},{"location":"api/nplinker/#nplinker.NPLinker.scoring_methods","title":"scoring_methodsproperty
","text":"scoring_methods: list[str]\n
Get names of all valid scoring methods.
"},{"location":"api/nplinker/#nplinker.NPLinker.load_data","title":"load_data","text":"load_data()\n
Load all data from files into memory.
This method is a convenience function that calls the DatasetArranger
class to arrange data files (download, generate and/or validate data) in the correct directory structure, and then calls the DatasetLoader
class to load all data from the files into memory.
The loaded data is stored in various data containers for easy access, e.g. self.bgcs
for all BGC objects, self.strains
for all Strain objects, etc.
src/nplinker/nplinker.py
def load_data(self):
    """Load all data from files into memory.

    This method is a convenience function that calls the
    [`DatasetArranger`][nplinker.arranger.DatasetArranger] class to arrange data files
    (download, generate and/or validate data) in the [correct directory structure][working-directory-structure],
    and then calls the [`DatasetLoader`][nplinker.loader.DatasetLoader] class to load all data
    from the files into memory.

    The loaded data is stored in various data containers for easy access, e.g.
    [`self.bgcs`][nplinker.NPLinker.bgcs] for all BGC objects,
    [`self.strains`][nplinker.NPLinker.strains] for all Strain objects, etc.
    """
    arranger = DatasetArranger(self.config)
    arranger.arrange()
    loader = DatasetLoader(self.config)
    loader.load()

    self._bgc_dict = {bgc.id: bgc for bgc in loader.bgcs}
    self._gcf_dict = {gcf.id: gcf for gcf in loader.gcfs}
    self._spec_dict = {spec.id: spec for spec in loader.spectra}
    self._mf_dict = {mf.id: mf for mf in loader.mfs}

    self._mibig_bgcs = loader.mibig_bgcs
    self._strains = loader.strains
    self._product_types = loader.product_types
    self._chem_classes = loader.chem_classes
    self._class_matches = loader.class_matches
"},{"location":"api/nplinker/#nplinker.NPLinker.get_links","title":"get_links","text":"get_links(\n objects: (\n Sequence[BGC]\n | Sequence[GCF]\n | Sequence[Spectrum]\n | Sequence[MolecularFamily]\n ),\n scoring_method: str,\n **scoring_params: Any\n) -> LinkGraph\n
Get links for the given objects using the specified scoring method and parameters.
Parameters:
objects (Sequence[BGC] | Sequence[GCF] | Sequence[Spectrum] | Sequence[MolecularFamily]) – A sequence of objects to get links for. The objects must be of the same type, i.e. BGC, GCF, Spectrum or MolecularFamily type.
Warning: for scoring method metcalf, the BGC objects are not supported.
scoring_method (str) – The scoring method to use. Must be one of the valid scoring methods self.scoring_methods, such as metcalf.
scoring_params (Any, default: {}) – Parameters to pass to the scoring method. If not given, the default parameters of the specified scoring method will be used. Check the get_links method of the scoring method class for the available parameters and their default values.

| Scoring Method | Scoring Parameters |
| -------------- | ------------------ |
| metcalf | cutoff, standardised |

Returns:
LinkGraph – A LinkGraph object containing the links for the given objects.
Raises:
ValueError – If input objects are empty or if the scoring method is invalid.
TypeError – If the input objects are not of the same type or if the object type is invalid.
Examples:
Using default scoring parameters:
>>> lg = npl.get_links(npl.gcfs, "metcalf")
Scoring parameters provided:
>>> lg = npl.get_links(npl.gcfs, "metcalf", cutoff=0.5, standardised=True)
Source code in src/nplinker/nplinker.py
def get_links(
    self,
    objects: Sequence[BGC] | Sequence[GCF] | Sequence[Spectrum] | Sequence[MolecularFamily],
    scoring_method: str,
    **scoring_params: Any,
) -> LinkGraph:
    """Get links for the given objects using the specified scoring method and parameters.

    Args:
        objects: A sequence of objects to get links for. The objects must be of the same
            type, i.e. `BGC`, `GCF`, `Spectrum` or `MolecularFamily` type.
            !!! Warning
                For scoring method `metcalf`, the `BGC` objects are not supported.
        scoring_method: The scoring method to use. Must be one of the valid scoring methods
            [`self.scoring_methods`][nplinker.NPLinker.scoring_methods], such as `metcalf`.
        scoring_params: Parameters to pass to the scoring method. If not given, the default
            parameters of the specified scoring method will be used.

            Check the `get_links` method of the scoring method class for the available
            parameters and their default values.

            | Scoring Method | Scoring Parameters |
            | -------------- | ------------------ |
            | `metcalf` | [`cutoff`, `standardised`][nplinker.scoring.MetcalfScoring.get_links] |

    Returns:
        A LinkGraph object containing the links for the given objects.

    Raises:
        ValueError: If input objects are empty or if the scoring method is invalid.
        TypeError: If the input objects are not of the same type or if the object type is invalid.

    Examples:
        Using default scoring parameters:
        >>> lg = npl.get_links(npl.gcfs, "metcalf")

        Scoring parameters provided:
        >>> lg = npl.get_links(npl.gcfs, "metcalf", cutoff=0.5, standardised=True)
    """
    # Validate objects
    if len(objects) == 0:
        raise ValueError("No objects provided to get links for")
    # check if all objects are of the same type
    types = {type(i) for i in objects}
    if len(types) > 1:
        raise TypeError("Input objects must be of the same type.")
    # check if the object type is valid
    obj_type = next(iter(types))
    if obj_type not in (BGC, GCF, Spectrum, MolecularFamily):
        raise TypeError(
            f"Invalid type {obj_type}. Input objects must be BGC, GCF, Spectrum or MolecularFamily objects."
        )

    # Validate scoring method
    if scoring_method not in self._valid_scoring_methods:
        raise ValueError(f"Invalid scoring method {scoring_method}.")

    # Check if the scoring method has been set up
    if not self._scoring_methods_setup_done[scoring_method]:
        self._valid_scoring_methods[scoring_method].setup(self)
        self._scoring_methods_setup_done[scoring_method] = True

    # Initialise the scoring method
    scoring = self._valid_scoring_methods[scoring_method]()

    return scoring.get_links(*objects, **scoring_params)
"},{"location":"api/nplinker/#nplinker.NPLinker.lookup_bgc","title":"lookup_bgc","text":"lookup_bgc(id: str) -> BGC | None\n
Get the BGC object with the given ID.
Parameters:
id (str) – the ID of the BGC to look up.
Returns:
BGC | None – The BGC object with the given ID, or None if no such object exists.
Examples:
>>> bgc = npl.lookup_bgc("BGC000001")
>>> bgc
BGC(id="BGC000001", ...)
Source code in src/nplinker/nplinker.py
def lookup_bgc(self, id: str) -> BGC | None:
    """Get the BGC object with the given ID.

    Args:
        id: the ID of the BGC to look up.

    Returns:
        The BGC object with the given ID, or None if no such object exists.

    Examples:
        >>> bgc = npl.lookup_bgc("BGC000001")
        >>> bgc
        BGC(id="BGC000001", ...)
    """
    return self._bgc_dict.get(id, None)
"},{"location":"api/nplinker/#nplinker.NPLinker.lookup_gcf","title":"lookup_gcf","text":"lookup_gcf(id: str) -> GCF | None\n
Get the GCF object with the given ID.
Parameters:
id (str) – the ID of the GCF to look up.
Returns:
GCF | None – The GCF object with the given ID, or None if no such object exists.
Source code in src/nplinker/nplinker.py
def lookup_gcf(self, id: str) -> GCF | None:
    """Get the GCF object with the given ID.

    Args:
        id: the ID of the GCF to look up.

    Returns:
        The GCF object with the given ID, or None if no such object exists.
    """
    return self._gcf_dict.get(id, None)
"},{"location":"api/nplinker/#nplinker.NPLinker.lookup_spectrum","title":"lookup_spectrum","text":"lookup_spectrum(id: str) -> Spectrum | None\n
Get the Spectrum object with the given ID.
Parameters:
id (str) – the ID of the Spectrum to look up.
Returns:
Spectrum | None – The Spectrum object with the given ID, or None if no such object exists.
Source code in src/nplinker/nplinker.py
def lookup_spectrum(self, id: str) -> Spectrum | None:
    """Get the Spectrum object with the given ID.

    Args:
        id: the ID of the Spectrum to look up.

    Returns:
        The Spectrum object with the given ID, or None if no such object exists.
    """
    return self._spec_dict.get(id, None)
"},{"location":"api/nplinker/#nplinker.NPLinker.lookup_mf","title":"lookup_mf","text":"lookup_mf(id: str) -> MolecularFamily | None\n
Get the MolecularFamily object with the given ID.
Parameters:
id (str) – the ID of the MolecularFamily to look up.
Returns:
MolecularFamily | None – The MolecularFamily object with the given ID, or None if no such object exists.
Source code in src/nplinker/nplinker.py
def lookup_mf(self, id: str) -> MolecularFamily | None:
    """Get the MolecularFamily object with the given ID.

    Args:
        id: the ID of the MolecularFamily to look up.

    Returns:
        The MolecularFamily object with the given ID, or None if no such object exists.
    """
    return self._mf_dict.get(id, None)
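For quick reference, a minimal sketch combining the lookup methods above; it assumes npl is an NPLinker instance on which load_data() has already been called, and the IDs shown are hypothetical:

# A minimal sketch, assuming `npl` is a loaded NPLinker instance.
# The IDs ("GCF_1", "spectrum_123", "MF_5") are hypothetical; real IDs depend on your dataset.
gcf = npl.lookup_gcf("GCF_1")
spectrum = npl.lookup_spectrum("spectrum_123")
mf = npl.lookup_mf("MF_5")

# Each lookup returns None for an unknown ID, so guard before use.
for obj in (gcf, spectrum, mf):
    if obj is not None:
        print(obj)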
"},{"location":"api/nplinker/#nplinker.NPLinker.save_data","title":"save_data","text":"save_data(\n file: str | PathLike, links: LinkGraph | None = None\n) -> None\n
Pickle data to a file.
The pickled data is a tuple of BGCs, GCFs, Spectra, MolecularFamilies, StrainCollection and links, i.e. (bgcs, gcfs, spectra, mfs, strains, links).
Parameters:
file (str | PathLike) – The path to the pickle file to save the data to.
links (LinkGraph | None, default: None) – The LinkGraph object to save.
Examples:
Saving the data to a pickle file, links data is None:
>>> npl.save_data("path/to/output.pkl")
Also saving the links data:
>>> lg = npl.get_links(npl.gcfs, "metcalf")
>>> npl.save_data("path/to/output.pkl", lg)
Source code in src/nplinker/nplinker.py
def save_data(
    self,
    file: str | PathLike,
    links: LinkGraph | None = None,
) -> None:
    """Pickle data to a file.

    The pickled data is a tuple of BGCs, GCFs, Spectra, MolecularFamilies, StrainCollection and
    links, i.e. `(bgcs, gcfs, spectra, mfs, strains, links)`.

    Args:
        file: The path to the pickle file to save the data to.
        links: The LinkGraph object to save.

    Examples:
        Saving the data to a pickle file, links data is `None`:
        >>> npl.save_data("path/to/output.pkl")

        Also saving the links data:
        >>> lg = npl.get_links(npl.gcfs, "metcalf")
        >>> npl.save_data("path/to/output.pkl", lg)
    """
    data = (self.bgcs, self.gcfs, self.spectra, self.mfs, self.strains, links)
    with open(file, "wb") as f:
        pickle.dump(data, f)
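Because save_data writes a plain pickle, the file can be read back with the standard library pickle module; a minimal sketch (the output path is hypothetical):

import pickle

# The pickled object is the tuple `(bgcs, gcfs, spectra, mfs, strains, links)`,
# in the order documented above. `links` is None if no LinkGraph was saved.
with open("path/to/output.pkl", "rb") as f:
    bgcs, gcfs, spectra, mfs, strains, links = pickle.load(f)

print(len(bgcs), len(gcfs), len(spectra), len(mfs))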
"},{"location":"api/nplinker/#nplinker.setup_logging","title":"setup_logging","text":"setup_logging(\n level: str = \"INFO\",\n file: str = \"\",\n use_console: bool = True,\n) -> None\n
Setup logging configuration for the ancestor logger \"nplinker\".
Usage Documentation: How to setup logging
Parameters:
level (str, default: 'INFO') – The log level, use the logging module's log level constants. Valid levels are: NOTSET, DEBUG, INFO, WARNING, ERROR, CRITICAL.
file (str, default: '') – The file to write the log to. If the file is an empty string (by default), the log will not be written to a file. If the file does not exist, it will be created. The log will be written to the file in append mode.
use_console (bool, default: True) – Whether to log to the console.
Source code in src/nplinker/logger.py
def setup_logging(level: str = "INFO", file: str = "", use_console: bool = True) -> None:
    """Setup logging configuration for the ancestor logger "nplinker".

    ??? info "Usage Documentation"
        [How to setup logging][how-to-setup-logging]

    Args:
        level: The log level, use the logging module's log level constants.
            Valid levels are: `NOTSET`, `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`.
        file: The file to write the log to.
            If the file is an empty string (by default), the log will not be written to a file.
            If the file does not exist, it will be created.
            The log will be written to the file in append mode.
        use_console: Whether to log to the console.
    """
    # Get the ancestor logger "nplinker"
    logger = logging.getLogger("nplinker")
    logger.setLevel(level)

    # File handler
    if file:
        logger.addHandler(
            RichHandler(
                console=Console(file=open(file, "a"), width=120),  # force the line width to 120
                omit_repeated_times=False,
                rich_tracebacks=True,
                tracebacks_show_locals=True,
                log_time_format="[%Y-%m-%d %X]",
            )
        )

    # Console handler
    if use_console:
        logger.addHandler(
            RichHandler(
                omit_repeated_times=False,
                rich_tracebacks=True,
                tracebacks_show_locals=True,
                log_time_format="[%Y-%m-%d %X]",
            )
        )
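A short usage sketch for library users, based on the parameters documented above (the log file name is arbitrary):

import logging

from nplinker import setup_logging

# Log DEBUG and above to the console and append to a file.
setup_logging(level="DEBUG", file="nplinker.log", use_console=True)

# All loggers under the "nplinker" ancestor logger inherit this configuration.
logging.getLogger("nplinker").info("Logging is configured.")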
"},{"location":"api/nplinker/#nplinker.defaults","title":"nplinker.defaults","text":""},{"location":"api/nplinker/#nplinker.defaults.NPLINKER_APP_DATA_DIR","title":"NPLINKER_APP_DATA_DIR module-attribute
","text":"NPLINKER_APP_DATA_DIR: Final = parent / 'data'\n
"},{"location":"api/nplinker/#nplinker.defaults.STRAIN_MAPPINGS_FILENAME","title":"STRAIN_MAPPINGS_FILENAME module-attribute
","text":"STRAIN_MAPPINGS_FILENAME: Final = 'strain_mappings.json'\n
"},{"location":"api/nplinker/#nplinker.defaults.GENOME_BGC_MAPPINGS_FILENAME","title":"GENOME_BGC_MAPPINGS_FILENAME module-attribute
","text":"GENOME_BGC_MAPPINGS_FILENAME: Final = (\n \"genome_bgc_mappings.json\"\n)\n
"},{"location":"api/nplinker/#nplinker.defaults.GENOME_STATUS_FILENAME","title":"GENOME_STATUS_FILENAME module-attribute
","text":"GENOME_STATUS_FILENAME: Final = 'genome_status.json'\n
"},{"location":"api/nplinker/#nplinker.defaults.GNPS_SPECTRA_FILENAME","title":"GNPS_SPECTRA_FILENAME module-attribute
","text":"GNPS_SPECTRA_FILENAME: Final = 'spectra.mgf'\n
"},{"location":"api/nplinker/#nplinker.defaults.GNPS_MOLECULAR_FAMILY_FILENAME","title":"GNPS_MOLECULAR_FAMILY_FILENAME module-attribute
","text":"GNPS_MOLECULAR_FAMILY_FILENAME: Final = (\n \"molecular_families.tsv\"\n)\n
"},{"location":"api/nplinker/#nplinker.defaults.GNPS_ANNOTATIONS_FILENAME","title":"GNPS_ANNOTATIONS_FILENAME module-attribute
","text":"GNPS_ANNOTATIONS_FILENAME: Final = 'annotations.tsv'\n
"},{"location":"api/nplinker/#nplinker.defaults.GNPS_FILE_MAPPINGS_TSV","title":"GNPS_FILE_MAPPINGS_TSV module-attribute
","text":"GNPS_FILE_MAPPINGS_TSV: Final = 'file_mappings.tsv'\n
"},{"location":"api/nplinker/#nplinker.defaults.GNPS_FILE_MAPPINGS_CSV","title":"GNPS_FILE_MAPPINGS_CSV module-attribute
","text":"GNPS_FILE_MAPPINGS_CSV: Final = 'file_mappings.csv'\n
"},{"location":"api/nplinker/#nplinker.defaults.STRAINS_SELECTED_FILENAME","title":"STRAINS_SELECTED_FILENAME module-attribute
","text":"STRAINS_SELECTED_FILENAME: Final = 'strains_selected.json'\n
"},{"location":"api/nplinker/#nplinker.defaults.DOWNLOADS_DIRNAME","title":"DOWNLOADS_DIRNAME module-attribute
","text":"DOWNLOADS_DIRNAME: Final = 'downloads'\n
"},{"location":"api/nplinker/#nplinker.defaults.MIBIG_DIRNAME","title":"MIBIG_DIRNAME module-attribute
","text":"MIBIG_DIRNAME: Final = 'mibig'\n
"},{"location":"api/nplinker/#nplinker.defaults.GNPS_DIRNAME","title":"GNPS_DIRNAME module-attribute
","text":"GNPS_DIRNAME: Final = 'gnps'\n
"},{"location":"api/nplinker/#nplinker.defaults.ANTISMASH_DIRNAME","title":"ANTISMASH_DIRNAME module-attribute
","text":"ANTISMASH_DIRNAME: Final = 'antismash'\n
"},{"location":"api/nplinker/#nplinker.defaults.BIGSCAPE_DIRNAME","title":"BIGSCAPE_DIRNAME module-attribute
","text":"BIGSCAPE_DIRNAME: Final = 'bigscape'\n
"},{"location":"api/nplinker/#nplinker.defaults.BIGSCAPE_RUNNING_OUTPUT_DIRNAME","title":"BIGSCAPE_RUNNING_OUTPUT_DIRNAME module-attribute
","text":"BIGSCAPE_RUNNING_OUTPUT_DIRNAME: Final = (\n \"bigscape_running_output\"\n)\n
"},{"location":"api/nplinker/#nplinker.defaults.OUTPUT_DIRNAME","title":"OUTPUT_DIRNAME module-attribute
","text":"OUTPUT_DIRNAME: Final = 'output'\n
"},{"location":"api/nplinker/#nplinker.config","title":"nplinker.config","text":""},{"location":"api/nplinker/#nplinker.config.CONFIG_VALIDATORS","title":"CONFIG_VALIDATORS module-attribute
","text":"CONFIG_VALIDATORS = [\n Validator(\n \"root_dir\",\n required=True,\n cast=transform_to_full_path,\n condition=lambda v: is_dir(),\n ),\n Validator(\n \"mode\",\n required=True,\n cast=lambda v: lower(),\n is_in=[\"local\", \"podp\"],\n ),\n Validator(\n \"podp_id\",\n required=True,\n when=Validator(\"mode\", eq=\"podp\"),\n ),\n Validator(\n \"podp_id\",\n required=False,\n when=Validator(\"mode\", eq=\"local\"),\n ),\n Validator(\n \"log.level\",\n is_type_of=str,\n cast=lambda v: upper(),\n is_in=[\n \"NOTSET\",\n \"DEBUG\",\n \"INFO\",\n \"WARNING\",\n \"ERROR\",\n \"CRITICAL\",\n ],\n ),\n Validator(\"log.file\", is_type_of=str),\n Validator(\"log.use_console\", is_type_of=bool),\n Validator(\n \"mibig.to_use\", required=True, is_type_of=bool\n ),\n Validator(\n \"mibig.version\",\n required=True,\n is_type_of=str,\n when=Validator(\"mibig.to_use\", eq=True),\n ),\n Validator(\n \"bigscape.parameters\", required=True, is_type_of=str\n ),\n Validator(\n \"bigscape.cutoff\", required=True, is_type_of=str\n ),\n Validator(\n \"bigscape.version\", required=True, is_type_of=int\n ),\n Validator(\n \"scoring.methods\",\n required=True,\n cast=lambda v: [lower() for i in v],\n is_type_of=list,\n len_min=1,\n condition=lambda v: issubset(\n {\"metcalf\", \"rosetta\"}\n ),\n ),\n]\n
"},{"location":"api/nplinker/#nplinker.config.load_config","title":"load_config","text":"load_config(config_file: str | PathLike) -> Dynaconf\n
Load and validate the configuration file.
Usage Documentation: Config Loader
Parameters:
config_file (str | PathLike) – Path to the configuration file.
Returns:
Dynaconf (Dynaconf) – A Dynaconf object containing the configuration settings.
Raises:
FileNotFoundError – If the configuration file does not exist.
Source code in src/nplinker/config.py
def load_config(config_file: str | PathLike) -> Dynaconf:
    """Load and validate the configuration file.

    ??? info "Usage Documentation"
        [Config Loader][config-loader]

    Args:
        config_file: Path to the configuration file.

    Returns:
        Dynaconf: A Dynaconf object containing the configuration settings.

    Raises:
        FileNotFoundError: If the configuration file does not exist.
    """
    config_file = transform_to_full_path(config_file)
    if not config_file.exists():
        raise FileNotFoundError(f"Config file '{config_file}' not found")

    # Locate the default config file
    default_config_file = Path(__file__).resolve().parent / "nplinker_default.toml"

    # Load config files
    config = Dynaconf(settings_files=[config_file], preload=[default_config_file])

    # Validate configs
    config.validators.register(*CONFIG_VALIDATORS)
    config.validators.validate()

    return config
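A minimal usage sketch; the config file path is hypothetical, and the accessed settings are those required by CONFIG_VALIDATORS above:

from nplinker.config import load_config

config = load_config("path/to/nplinker.toml")  # raises FileNotFoundError if missing
print(config.mode)      # "local" or "podp", validated by CONFIG_VALIDATORS
print(config.root_dir)  # cast to a full path and required to be a directory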
"},{"location":"api/schema/","title":"Schemas","text":""},{"location":"api/schema/#nplinker.schemas","title":"nplinker.schemas","text":""},{"location":"api/schema/#nplinker.schemas.GENOME_STATUS_SCHEMA","title":"GENOME_STATUS_SCHEMA module-attribute
","text":"GENOME_STATUS_SCHEMA = load(f)\n
Schema for the genome status JSON file.
Schema Content: genome_status_schema.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://raw.githubusercontent.com/NPLinker/nplinker/main/src/nplinker/schemas/genome_status_schema.json",
  "title": "Status of genomes",
  "description": "A list of genome status objects, each of which contains information about a single genome",
  "type": "object",
  "required": ["genome_status", "version"],
  "properties": {
    "genome_status": {
      "type": "array",
      "title": "Genome status",
      "description": "A list of genome status objects",
      "items": {
        "type": "object",
        "required": ["original_id", "resolved_refseq_id", "resolve_attempted", "bgc_path"],
        "properties": {
          "original_id": {
            "type": "string",
            "title": "Original ID",
            "description": "The original ID of the genome",
            "minLength": 1
          },
          "resolved_refseq_id": {
            "type": "string",
            "title": "Resolved RefSeq ID",
            "description": "The RefSeq ID that was resolved for this genome"
          },
          "resolve_attempted": {
            "type": "boolean",
            "title": "Resolve Attempted",
            "description": "Whether or not an attempt was made to resolve this genome"
          },
          "bgc_path": {
            "type": "string",
            "title": "BGC Path",
            "description": "The path to the downloaded BGC file for this genome"
          }
        }
      },
      "minItems": 1,
      "uniqueItems": true
    },
    "version": {"type": "string", "enum": ["1.0"]}
  },
  "additionalProperties": false
}
"},{"location":"api/schema/#nplinker.schemas.GENOME_BGC_MAPPINGS_SCHEMA","title":"GENOME_BGC_MAPPINGS_SCHEMA module-attribute
","text":"GENOME_BGC_MAPPINGS_SCHEMA = load(f)\n
Schema for genome BGC mappings JSON file.
Schema Content: genome_bgc_mappings_schema.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://raw.githubusercontent.com/NPLinker/nplinker/main/src/nplinker/schemas/genome_bgc_mappings_schema.json",
  "title": "Mappings from genome ID to BGC IDs",
  "description": "A list of mappings from genome ID to BGC (biosynthetic gene cluster) IDs",
  "type": "object",
  "required": ["mappings", "version"],
  "properties": {
    "mappings": {
      "type": "array",
      "title": "Mappings from genome ID to BGC IDs",
      "description": "A list of mappings from genome ID to BGC IDs",
      "items": {
        "type": "object",
        "required": ["genome_ID", "BGC_ID"],
        "properties": {
          "genome_ID": {
            "type": "string",
            "title": "Genome ID",
            "description": "The genome ID used in BGC database such as antiSMASH",
            "minLength": 1
          },
          "BGC_ID": {
            "type": "array",
            "title": "BGC ID",
            "description": "A list of BGC IDs",
            "items": {"type": "string", "minLength": 1},
            "minItems": 1,
            "uniqueItems": true
          }
        }
      },
      "minItems": 1,
      "uniqueItems": true
    },
    "version": {"type": "string", "enum": ["1.0"]}
  },
  "additionalProperties": false
}
"},{"location":"api/schema/#nplinker.schemas.STRAIN_MAPPINGS_SCHEMA","title":"STRAIN_MAPPINGS_SCHEMA module-attribute
","text":"STRAIN_MAPPINGS_SCHEMA = load(f)\n
Schema for strain mappings JSON file.
Schema Content: strain_mappings_schema.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://raw.githubusercontent.com/NPLinker/nplinker/main/src/nplinker/schemas/strain_mappings_schema.json",
  "title": "Strain mappings",
  "description": "A list of mappings from strain ID to strain aliases",
  "type": "object",
  "required": ["strain_mappings", "version"],
  "properties": {
    "strain_mappings": {
      "type": "array",
      "title": "Strain mappings",
      "description": "A list of strain mappings",
      "items": {
        "type": "object",
        "required": ["strain_id", "strain_alias"],
        "properties": {
          "strain_id": {
            "type": "string",
            "title": "Strain ID",
            "description": "Strain ID, which could be any strain name or accession number",
            "minLength": 1
          },
          "strain_alias": {
            "type": "array",
            "title": "Strain aliases",
            "description": "A list of strain aliases, which could be any names that refer to the same strain",
            "items": {"type": "string", "minLength": 1},
            "minItems": 1,
            "uniqueItems": true
          }
        }
      },
      "minItems": 1,
      "uniqueItems": true
    },
    "version": {"type": "string", "enum": ["1.0"]}
  },
  "additionalProperties": false
}
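For illustration, a minimal document that satisfies this schema can be checked with the jsonschema library; the strain ID and aliases below are made up:

from jsonschema import validate

from nplinker.schemas import STRAIN_MAPPINGS_SCHEMA

# A made-up but schema-conformant strain mappings document.
doc = {
    "strain_mappings": [
        {"strain_id": "strain_1", "strain_alias": ["alias_a", "alias_b"]}
    ],
    "version": "1.0",
}
validate(instance=doc, schema=STRAIN_MAPPINGS_SCHEMA)  # raises ValidationError if invalid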
"},{"location":"api/schema/#nplinker.schemas.USER_STRAINS_SCHEMA","title":"USER_STRAINS_SCHEMA module-attribute
","text":"USER_STRAINS_SCHEMA = load(f)\n
Schema for user strains JSON file.
Schema Content: user_strains.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://raw.githubusercontent.com/NPLinker/nplinker/main/src/nplinker/schemas/user_strains.json",
  "title": "User specified strains",
  "description": "A list of strain IDs specified by user",
  "type": "object",
  "required": ["strain_ids"],
  "properties": {
    "strain_ids": {
      "type": "array",
      "title": "Strain IDs",
      "description": "A list of strain IDs specified by user. The strain IDs must be the same as the ones in the strain mappings file.",
      "items": {"type": "string", "minLength": 1},
      "minItems": 1,
      "uniqueItems": true
    },
    "version": {"type": "string", "enum": ["1.0"]}
  },
  "additionalProperties": false
}
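Similarly, a minimal user strains document can be validated in the same way (the IDs are made up and must match those in the strain mappings file):

from jsonschema import validate

from nplinker.schemas import USER_STRAINS_SCHEMA

doc = {"strain_ids": ["strain_1", "strain_2"], "version": "1.0"}
validate(instance=doc, schema=USER_STRAINS_SCHEMA)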
"},{"location":"api/schema/#nplinker.schemas.PODP_ADAPTED_SCHEMA","title":"PODP_ADAPTED_SCHEMA module-attribute
","text":"PODP_ADAPTED_SCHEMA = load(f)\n
Schema for PODP JSON file.
The PODP JSON file is the project JSON file downloaded from the PODP platform. For example, for PODP project MSV000079284, its JSON file is https://pairedomicsdata.bioinformatics.nl/api/projects/4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4.
Schema Content: podp_adapted_schema.json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://raw.githubusercontent.com/NPLinker/nplinker/main/src/nplinker/schemas/podp_adapted_schema.json",
  "title": "Adapted Paired Omics Data Platform Schema for NPLinker",
  "description": "This schema is adapted from PODP schema (https://pairedomicsdata.bioinformatics.nl/schema.json) for NPLinker. It's used to validate the input data for NPLinker. Thus, only required fields for NPLinker are kept in this schema, and some fields are modified to fit NPLinker's requirements.",
  "type": "object",
  "required": ["version", "metabolomics", "genomes", "genome_metabolome_links"],
  "properties": {
    "version": {"type": "string", "readOnly": true, "default": "3", "enum": ["3"]},
    "metabolomics": {
      "type": "object",
      "title": "2. Metabolomics Information",
      "description": "Please provide basic information on the publicly available metabolomics project from which paired data is available. Currently, we allow for links to mass spectrometry data deposited in GNPS-MaSSIVE or MetaboLights.",
      "properties": {
        "project": {
          "type": "object",
          "required": ["molecular_network"],
          "title": "GNPS-MassIVE",
          "properties": {
            "GNPSMassIVE_ID": {
              "type": "string",
              "title": "GNPS-MassIVE identifier",
              "description": "Please provide the GNPS-MassIVE identifier of your metabolomics data set, e.g., MSV000078839.",
              "pattern": "^MSV[0-9]{9}$"
            },
            "MaSSIVE_URL": {
              "type": "string",
              "title": "Link to MassIVE upload",
              "description": "Please provide the link to the MassIVE upload, e.g., <a target=\"_blank\" rel=\"noopener noreferrer\" href=\"https://gnps.ucsd.edu/ProteoSAFe/result.jsp?task=a507232a787243a5afd69a6c6fa1e508&view=advanced_view\">https://gnps.ucsd.edu/ProteoSAFe/result.jsp?task=a507232a787243a5afd69a6c6fa1e508&view=advanced_view</a>. Warning, there cannot be spaces in the URI.",
              "format": "uri"
            },
            "molecular_network": {
              "type": "string",
              "pattern": "^[0-9a-z]{32}$",
              "title": "Molecular Network Task ID",
              "description": "If you have run a Molecular Network on GNPS, please provide the task ID of the Molecular Network job. It can be found in the URL of the Molecular Networking job, e.g., in <a target=\"_blank\" rel=\"noopener noreferrer\" href=\"https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=c36f90ba29fe44c18e96db802de0c6b9\">https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=c36f90ba29fe44c18e96db802de0c6b9</a> the task ID is c36f90ba29fe44c18e96db802de0c6b9."
            }
          }
        }
      },
      "required": ["project"],
      "additionalProperties": true
    },
    "genomes": {
      "type": "array",
      "title": "3. (Meta)genomics Information",
      "description": "Please add all genomes and/or metagenomes for which paired data is available as separate entries.",
      "items": {
        "type": "object",
        "required": ["genome_ID", "genome_label"],
        "properties": {
          "genome_ID": {
            "type": "object",
            "title": "Genome accession",
            "description": "At least one of the three identifiers is required.",
            "anyOf": [
              {"required": ["GenBank_accession"]},
              {"required": ["RefSeq_accession"]},
              {"required": ["JGI_Genome_ID"]}
            ],
            "properties": {
              "GenBank_accession": {
                "type": "string",
                "title": "GenBank accession number",
                "description": "If the publicly available genome got a GenBank accession number assigned, e.g., <a href=\"https://www.ncbi.nlm.nih.gov/nuccore/AL645882\" target=\"_blank\" rel=\"noopener noreferrer\">AL645882</a>, please provide it here. The genome sequence must be submitted to GenBank/ENA/DDBJ (and an accession number must be received) before this form can be filled out. In case of a whole genome sequence, please use master records. At least one identifier must be entered.",
                "minLength": 1
              },
              "RefSeq_accession": {
                "type": "string",
                "title": "RefSeq accession number",
                "description": "For example: <a target=\"_blank\" rel=\"noopener noreferrer\" href=\"https://www.ncbi.nlm.nih.gov/nuccore/NC_003888.3\">NC_003888.3</a>",
                "minLength": 1
              },
              "JGI_Genome_ID": {
                "type": "string",
                "title": "JGI IMG genome ID",
                "description": "For example: <a target=\"_blank\" rel=\"noopener noreferrer\" href=\"https://img.jgi.doe.gov/cgi-bin/m/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid=641228474\">641228474</a>",
                "minLength": 1
              }
            }
          },
          "genome_label": {
            "type": "string",
            "title": "Genome label",
            "description": "Please assign a unique Genome Label for this genome or metagenome to help you recall it during the linking step. For example 'Streptomyces sp. CNB091'",
            "minLength": 1
          }
        }
      },
      "minItems": 1
    },
    "genome_metabolome_links": {
      "type": "array",
      "title": "6. Genome - Proteome - Metabolome Links",
      "description": "Create a linked pair by selecting the Genome Label and optional Proteome label as provided earlier. Subsequently links to the metabolomics data file belonging to that genome/proteome with appropriate experimental methods.",
      "items": {
        "type": "object",
        "required": ["genome_label", "metabolomics_file"],
        "properties": {
          "genome_label": {
            "type": "string",
            "title": "Genome/Metagenome",
            "description": "Please select the Genome Label to be linked to a metabolomics data file."
          },
          "metabolomics_file": {
            "type": "string",
            "title": "Location of metabolomics data file",
            "description": "Please provide a direct link to the metabolomics data file location, e.g. <a href=\"ftp://massive.ucsd.edu/MSV000078839/spectrum/R5/CNB091_R5_M.mzXML\" target=\"_blank\" rel=\"noopener noreferrer\">ftp://massive.ucsd.edu/MSV000078839/spectrum/R5/CNB091_R5_M.mzXML</a> found in the FTP download of a MassIVE dataset or <a target=\"_blank\" rel=\"noopener noreferrer\" href=\"https://www.ebi.ac.uk/metabolights/MTBLS307/files/Urine_44_fullscan1_pos.mzXML\">https://www.ebi.ac.uk/metabolights/MTBLS307/files/Urine_44_fullscan1_pos.mzXML</a> found in the Files section of a MetaboLights study. Warning, there cannot be spaces in the URI.",
            "format": "uri"
          }
        },
        "additionalProperties": true
      },
      "minItems": 1
    }
  },
  "additionalProperties": true
}
"},{"location":"api/schema/#nplinker.schemas.validate_podp_json","title":"validate_podp_json","text":"validate_podp_json(json_data: dict) -> None\n
Validate JSON data against the PODP JSON schema.
All validation error messages are collected and raised as a single ValueError.
Parameters:
json_data (dict) – The JSON data to validate.
Raises:
ValueError – If the JSON data does not match the schema.
Examples:
Download the PODP JSON file for project MSV000079284 from https://pairedomicsdata.bioinformatics.nl/api/projects/4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4 and save it as podp_project.json.
Validate it:
>>> with open("podp_project.json", "r") as f:
...     json_data = json.load(f)
>>> validate_podp_json(json_data)
Source code in src/nplinker/schemas/__init__.py
def validate_podp_json(json_data: dict) -> None:
    """Validate JSON data against the PODP JSON schema.

    All validation error messages are collected and raised as a single
    ValueError.

    Args:
        json_data: The JSON data to validate.

    Raises:
        ValueError: If the JSON data does not match the schema.

    Examples:
        Download PODP JSON file for project MSV000079284 from
        https://pairedomicsdata.bioinformatics.nl/api/projects/4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4
        and save it as `podp_project.json`.

        Validate it:
        >>> with open("podp_project.json", "r") as f:
        ...     json_data = json.load(f)
        >>> validate_podp_json(json_data)
    """
    validator = Draft7Validator(PODP_ADAPTED_SCHEMA)
    errors = sorted(validator.iter_errors(json_data), key=lambda e: e.path)
    if errors:
        error_messages = [f"{e.json_path}: {e.message}" for e in errors]
        raise ValueError(
            "Not match PODP adapted schema, here are the detailed error:\n - "
            + "\n - ".join(error_messages)
        )
"},{"location":"api/scoring/","title":"Data Models","text":""},{"location":"api/scoring/#nplinker.scoring","title":"nplinker.scoring","text":""},{"location":"api/scoring/#nplinker.scoring.LinkGraph","title":"LinkGraph","text":"LinkGraph()\n
Class to represent the links between objects in NPLinker.
This class wraps the networkx.Graph class to provide a more user-friendly interface for working with the links.
The links between objects are stored as edges in a graph, while the objects themselves are stored as nodes.
The scoring data for each link (or link data) is stored as the key/value attributes of the edge.
Examples:
Create a LinkGraph object:
>>> lg = LinkGraph()
Add a link between a GCF and a Spectrum object:
>>> lg.add_link(gcf, spectrum, metcalf=Score("metcalf", 1.0, {"cutoff": 0.5}))
Get all links for a given object:
>>> lg[gcf]
{spectrum: {"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})}}
Get all links in the LinkGraph:
>>> lg.links
[(gcf, spectrum, {"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})})]
Check if there is a link between two objects:
>>> lg.has_link(gcf, spectrum)
True
Get the link data between two objects:
>>> lg.get_link_data(gcf, spectrum)
{"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})}
Source code in src/nplinker/scoring/link_graph.py
def __init__(self) -> None:
    """Initialize a LinkGraph object.

    Examples:
        Create a LinkGraph object:
        >>> lg = LinkGraph()

        Add a link between a GCF and a Spectrum object:
        >>> lg.add_link(gcf, spectrum, metcalf=Score("metcalf", 1.0, {"cutoff": 0.5}))

        Get all links for a given object:
        >>> lg[gcf]
        {spectrum: {"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})}}

        Get all links in the LinkGraph:
        >>> lg.links
        [(gcf, spectrum, {"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})})]

        Check if there is a link between two objects:
        >>> lg.has_link(gcf, spectrum)
        True

        Get the link data between two objects:
        >>> lg.get_link_data(gcf, spectrum)
        {"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})}
    """
    self._g: Graph = Graph()
"},{"location":"api/scoring/#nplinker.scoring.LinkGraph.links","title":"links property
","text":"links: list[LINK]\n
Get all links.
Returns:
list[LINK] – A list of tuples containing the links between objects.
Examples:
>>> lg.links
[(gcf, spectrum, {"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})})]
"},{"location":"api/scoring/#nplinker.scoring.LinkGraph.__str__","title":"__str__","text":"__str__() -> str\n
Get a short summary of the LinkGraph.
Source code in src/nplinker/scoring/link_graph.py
def __str__(self) -> str:
    """Get a short summary of the LinkGraph."""
    return f"{self.__class__.__name__}(#links={len(self.links)}, #objects={len(self)})"
"},{"location":"api/scoring/#nplinker.scoring.LinkGraph.__len__","title":"__len__","text":"__len__() -> int\n
Get the number of objects.
Source code in src/nplinker/scoring/link_graph.py
def __len__(self) -> int:
    """Get the number of objects."""
    return len(self._g)
"},{"location":"api/scoring/#nplinker.scoring.LinkGraph.__getitem__","title":"__getitem__","text":"__getitem__(u: Entity) -> dict[Entity, LINK_DATA]\n
Get all links for a given object.
Parameters:
u (Entity) – the given object
Returns:
dict[Entity, LINK_DATA] – A dictionary of links for the given object.
Raises:
KeyError – if the input object is not found in the link graph.
Source code in src/nplinker/scoring/link_graph.py
@validate_u
def __getitem__(self, u: Entity) -> dict[Entity, LINK_DATA]:
    """Get all links for a given object.

    Args:
        u: the given object

    Returns:
        A dictionary of links for the given object.

    Raises:
        KeyError: if the input object is not found in the link graph.
    """
    try:
        links = self._g[u]
    except KeyError:
        raise KeyError(f"{u} not found in the link graph.")

    return {**links}  # type: ignore
"},{"location":"api/scoring/#nplinker.scoring.LinkGraph.add_link","title":"add_link","text":"add_link(u: Entity, v: Entity, **data: Score) -> None\n
Add a link between two objects.
The objects u and v must be different types, i.e. one must be a GCF and the other must be a Spectrum or MolecularFamily.
Parameters:
u (Entity) – the first object, either a GCF, Spectrum, or MolecularFamily
v (Entity) – the second object, either a GCF, Spectrum, or MolecularFamily
data (Score, default: {}) – keyword arguments. At least one scoring method and its data must be provided. The key must be the name of the scoring method defined in ScoringMethod, and the value is a Score object, e.g. metcalf=Score("metcalf", 1.0, {"cutoff": 0.5}).
Examples:
>>> lg.add_link(gcf, spectrum, metcalf=Score("metcalf", 1.0, {"cutoff": 0.5}))
Source code in src/nplinker/scoring/link_graph.py
@validate_uv
def add_link(
    self,
    u: Entity,
    v: Entity,
    **data: Score,
) -> None:
    """Add a link between two objects.

    The objects `u` and `v` must be different types, i.e. one must be a GCF and the other must be
    a Spectrum or MolecularFamily.

    Args:
        u: the first object, either a GCF, Spectrum, or MolecularFamily
        v: the second object, either a GCF, Spectrum, or MolecularFamily
        data: keyword arguments. At least one scoring method and its data must be provided.
            The key must be the name of the scoring method defined in `ScoringMethod`, and the
            value is a `Score` object, e.g. `metcalf=Score("metcalf", 1.0, {"cutoff": 0.5})`.

    Examples:
        >>> lg.add_link(gcf, spectrum, metcalf=Score("metcalf", 1.0, {"cutoff": 0.5}))
    """
    # validate the data
    if not data:
        raise ValueError("At least one scoring method and its data must be provided.")
    for key, value in data.items():
        if not ScoringMethod.has_value(key):
            raise ValueError(
                f"{key} is not a valid name of scoring method. See `ScoringMethod` for valid names."
            )
        if not isinstance(value, Score):
            raise TypeError(f"{value} is not a Score object.")

    self._g.add_edge(u, v, **data)
"},{"location":"api/scoring/#nplinker.scoring.LinkGraph.has_link","title":"has_link","text":"has_link(u: Entity, v: Entity) -> bool\n
Check if there is a link between two objects.
Parameters:
u (Entity) – the first object, either a GCF, Spectrum, or MolecularFamily
v (Entity) – the second object, either a GCF, Spectrum, or MolecularFamily
Returns:
bool – True if there is a link between the two objects, False otherwise
Examples:
>>> lg.has_link(gcf, spectrum)
True
Source code in src/nplinker/scoring/link_graph.py
@validate_uv
def has_link(self, u: Entity, v: Entity) -> bool:
    """Check if there is a link between two objects.

    Args:
        u: the first object, either a GCF, Spectrum, or MolecularFamily
        v: the second object, either a GCF, Spectrum, or MolecularFamily

    Returns:
        True if there is a link between the two objects, False otherwise

    Examples:
        >>> lg.has_link(gcf, spectrum)
        True
    """
    return self._g.has_edge(u, v)
"},{"location":"api/scoring/#nplinker.scoring.LinkGraph.get_link_data","title":"get_link_data","text":"get_link_data(u: Entity, v: Entity) -> LINK_DATA | None\n
Get the data for a link between two objects.
Parameters:
u (Entity) – the first object, either a GCF, Spectrum, or MolecularFamily
v (Entity) – the second object, either a GCF, Spectrum, or MolecularFamily
Returns:
LINK_DATA | None – A dictionary of scoring methods and their data for the link between the two objects, or None if there is no link between the two objects.
Examples:
>>> lg.get_link_data(gcf, spectrum)
{"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})}
Source code in src/nplinker/scoring/link_graph.py
@validate_uv
def get_link_data(
    self,
    u: Entity,
    v: Entity,
) -> LINK_DATA | None:
    """Get the data for a link between two objects.

    Args:
        u: the first object, either a GCF, Spectrum, or MolecularFamily
        v: the second object, either a GCF, Spectrum, or MolecularFamily

    Returns:
        A dictionary of scoring methods and their data for the link between the two objects, or
        None if there is no link between the two objects.

    Examples:
        >>> lg.get_link_data(gcf, spectrum)
        {"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})}
    """
    return self._g.get_edge_data(u, v)  # type: ignore
"},{"location":"api/scoring/#nplinker.scoring.LinkGraph.filter","title":"filter","text":"filter(\n u_nodes: Sequence[Entity],\n v_nodes: Sequence[Entity] = [],\n) -> LinkGraph\n
Return a new LinkGraph object with the filtered links between the given objects.
The new LinkGraph object will only contain the links between u_nodes and v_nodes.
If u_nodes or v_nodes is empty, the new LinkGraph object will contain the links for the given objects in v_nodes or u_nodes, respectively. If both are empty, return an empty LinkGraph object.
Note that not all objects in u_nodes and v_nodes need to be present in the original LinkGraph.
Parameters:
u_nodes (Sequence[Entity]) – a sequence of objects used as the first object in the links
v_nodes (Sequence[Entity], default: []) – a sequence of objects used as the second object in the links
Returns:
LinkGraph – A new LinkGraph object with the filtered links between the given objects.
Examples:
Filter the links for gcf1 and gcf2:
>>> new_lg = lg.filter([gcf1, gcf2])
Filter the links for spectrum1 and spectrum2:
>>> new_lg = lg.filter([spectrum1, spectrum2])
Filter the links between two lists of objects:
>>> new_lg = lg.filter([gcf1, gcf2], [spectrum1, spectrum2])
Source code in src/nplinker/scoring/link_graph.py
def filter(self, u_nodes: Sequence[Entity], v_nodes: Sequence[Entity] = [], /) -> LinkGraph:
    """Return a new LinkGraph object with the filtered links between the given objects.

    The new LinkGraph object will only contain the links between `u_nodes` and `v_nodes`.

    If `u_nodes` or `v_nodes` is empty, the new LinkGraph object will contain the links for
    the given objects in `v_nodes` or `u_nodes`, respectively. If both are empty, return an
    empty LinkGraph object.

    Note that not all objects in `u_nodes` and `v_nodes` need to be present in the original
    LinkGraph.

    Args:
        u_nodes: a sequence of objects used as the first object in the links
        v_nodes: a sequence of objects used as the second object in the links

    Returns:
        A new LinkGraph object with the filtered links between the given objects.

    Examples:
        Filter the links for `gcf1` and `gcf2`:
        >>> new_lg = lg.filter([gcf1, gcf2])
        Filter the links for `spectrum1` and `spectrum2`:
        >>> new_lg = lg.filter([spectrum1, spectrum2])
        Filter the links between two lists of objects:
        >>> new_lg = lg.filter([gcf1, gcf2], [spectrum1, spectrum2])
    """
    lg = LinkGraph()

    # exchange u_nodes and v_nodes if u_nodes is empty but v_nodes not
    if len(u_nodes) == 0 and len(v_nodes) != 0:
        u_nodes = v_nodes
        v_nodes = []

    if len(v_nodes) == 0:
        for u in u_nodes:
            self._filter_one_node(u, lg)

    for u in u_nodes:
        for v in v_nodes:
            self._filter_two_nodes(u, v, lg)

    return lg
"},{"location":"api/scoring/#nplinker.scoring.Score","title":"Score dataclass
","text":"Score(name: str, value: float, parameter: dict)\n
A data class to represent score data.
Attributes:
name (str) – the name of the scoring method. See ScoringMethod for valid values.
value (float) – the score value.
parameter (dict) – the parameters used for the scoring method.
name instance-attribute
name: str
value instance-attribute
value: float
parameter instance-attribute
parameter: dict
"},{"location":"api/scoring/#nplinker.scoring.Score.__post_init__","title":"__post_init__","text":"__post_init__() -> None\n
Check if the value of name is valid.
Raises:
ValueError – if the value of name is not valid.
Source code in src/nplinker/scoring/score.py
def __post_init__(self) -> None:
    """Check if the value of `name` is valid.

    Raises:
        ValueError: if the value of `name` is not valid.
    """
    if ScoringMethod.has_value(self.name) is False:
        raise ValueError(
            f"{self.name} is not a valid value. Valid values are: {[e.value for e in ScoringMethod]}"
        )
"},{"location":"api/scoring/#nplinker.scoring.Score.__getitem__","title":"__getitem__","text":"__getitem__(key)\n
Source code in src/nplinker/scoring/score.py
def __getitem__(self, key):
    if key in {field.name for field in fields(self)}:
        return getattr(self, key)
    else:
        raise KeyError(f"{key} not found in {self.__class__.__name__}")
"},{"location":"api/scoring/#nplinker.scoring.Score.__setitem__","title":"__setitem__","text":"__setitem__(key, value)\n
Source code in src/nplinker/scoring/score.py
def __setitem__(self, key, value):
    # validate the value of `name`
    if key == "name" and ScoringMethod.has_value(value) is False:
        raise ValueError(
            f"{value} is not a valid value. Valid values are: {[e.value for e in ScoringMethod]}"
        )

    if key in {field.name for field in fields(self)}:
        setattr(self, key, value)
    else:
        raise KeyError(f"{key} not found in {self.__class__.__name__}")
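Taken together, Score behaves as both a dataclass and a mapping-like object; a brief sketch:

from nplinker.scoring import Score

score = Score("metcalf", 1.0, {"cutoff": 0, "standardised": False})
print(score.value)     # 1.0, regular attribute access
print(score["value"])  # 1.0, via __getitem__

score["value"] = 2.5   # allowed, via __setitem__
try:
    score["name"] = "bogus"  # rejected by the name validation above
except ValueError as err:
    print(err)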
"},{"location":"api/scoring_abc/","title":"Abstract Base Classes","text":""},{"location":"api/scoring_abc/#nplinker.scoring.abc","title":"nplinker.scoring.abc","text":""},{"location":"api/scoring_abc/#nplinker.scoring.abc.ScoringBase","title":"ScoringBase","text":" Bases: ABC
Abstract base class of scoring methods.
Attributes:
name (str) – The name of the scoring method.
npl (NPLinker | None) – The NPLinker object.
name class-attribute instance-attribute
name: str = 'ScoringBase'
npl class-attribute instance-attribute
npl: NPLinker | None = None
setup abstractmethod classmethod
setup(npl: NPLinker)
Setup class level attributes.
Source code in src/nplinker/scoring/abc.py
@classmethod
@abstractmethod
def setup(cls, npl: NPLinker):
    """Setup class level attributes."""
"},{"location":"api/scoring_abc/#nplinker.scoring.abc.ScoringBase.get_links","title":"get_links abstractmethod
","text":"get_links(*objects, **parameters) -> LinkGraph\n
Get links information for the given objects.
Parameters:
objects – A list of objects to get links for.
parameters – The parameters used for scoring.
Returns:
LinkGraph – The LinkGraph object.
Source code in src/nplinker/scoring/abc.py
@abstractmethod
def get_links(
    self,
    *objects,
    **parameters,
) -> LinkGraph:
    """Get links information for the given objects.

    Args:
        objects: A list of objects to get links for.
        parameters: The parameters used for scoring.

    Returns:
        The LinkGraph object.
    """
"},{"location":"api/scoring_abc/#nplinker.scoring.abc.ScoringBase.format_data","title":"format_data abstractmethod
","text":"format_data(data) -> str\n
Format the scoring data to a string.
Source code in src/nplinker/scoring/abc.py
@abstractmethod
def format_data(self, data) -> str:
    """Format the scoring data to a string."""
"},{"location":"api/scoring_abc/#nplinker.scoring.abc.ScoringBase.sort","title":"sort abstractmethod
","text":"sort(objects, reverse=True) -> list\n
Sort the given objects based on the scoring data.
Source code in src/nplinker/scoring/abc.py
@abstractmethod
def sort(self, objects, reverse=True) -> list:
    """Sort the given objects based on the scoring data."""
"},{"location":"api/scoring_methods/","title":"Scoring Methods","text":""},{"location":"api/scoring_methods/#nplinker.scoring","title":"nplinker.scoring","text":""},{"location":"api/scoring_methods/#nplinker.scoring.ScoringMethod","title":"ScoringMethod","text":" Bases: Enum
Enum class for scoring methods.
"},{"location":"api/scoring_methods/#nplinker.scoring.ScoringMethod.METCALF","title":"METCALFclass-attribute
instance-attribute
","text":"METCALF = 'metcalf'\n
"},{"location":"api/scoring_methods/#nplinker.scoring.ScoringMethod.ROSETTA","title":"ROSETTA class-attribute
instance-attribute
","text":"ROSETTA = 'rosetta'\n
"},{"location":"api/scoring_methods/#nplinker.scoring.ScoringMethod.NPLCLASS","title":"NPLCLASS class-attribute
instance-attribute
","text":"NPLCLASS = 'nplclass'\n
"},{"location":"api/scoring_methods/#nplinker.scoring.ScoringMethod.has_value","title":"has_value classmethod
","text":"has_value(value: str) -> bool\n
Check if the enum has a value.
Source code in src/nplinker/scoring/scoring_method.py
@classmethod
def has_value(cls, value: str) -> bool:
    """Check if the enum has a value."""
    return any(value == item.value for item in cls)
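A brief sketch of has_value, which checks a string against the enum member values rather than the member names:

from nplinker.scoring import ScoringMethod

print(ScoringMethod.has_value("metcalf"))  # True
print(ScoringMethod.has_value("unknown"))  # False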
"},{"location":"api/scoring_methods/#nplinker.scoring.MetcalfScoring","title":"MetcalfScoring","text":" Bases: ScoringBase
Metcalf scoring method.
Attributes:
name – The name of this scoring method, set to a fixed value metcalf.
npl (NPLinker | None) – The NPLinker object.
CACHE (str) – The name of the cache file to use for storing the MetcalfScoring.
presence_gcf_strain (DataFrame) – A DataFrame to store presence of gcfs with respect to strains. The index of the DataFrame are the GCF objects and the columns are Strain objects. The values are 1 where the gcf occurs in the strain, 0 otherwise.
presence_spec_strain (DataFrame) – A DataFrame to store presence of spectra with respect to strains. The index of the DataFrame are the Spectrum objects and the columns are Strain objects. The values are 1 where the spectrum occurs in the strain, 0 otherwise.
presence_mf_strain (DataFrame) – A DataFrame to store presence of molecular families with respect to strains. The index of the DataFrame are the MolecularFamily objects and the columns are Strain objects. The values are 1 where the molecular family occurs in the strain, 0 otherwise.
raw_score_spec_gcf (DataFrame) – A DataFrame to store the raw Metcalf scores for spectrum-gcf links. The columns are "spec", "gcf" and "score".
raw_score_mf_gcf (DataFrame) – A DataFrame to store the raw Metcalf scores for molecular family-gcf links. The columns are "mf", "gcf" and "score".
metcalf_mean (ndarray | None) – A numpy array to store the mean value used for standardising Metcalf scores. The array has shape (n_strains+1, n_strains+1), where n_strains is the number of strains.
metcalf_std (ndarray | None) – A numpy array to store the standard deviation value used for standardising Metcalf scores. The array has shape (n_strains+1, n_strains+1), where n_strains is the number of strains.
name class-attribute instance-attribute
name = METCALF.value
npl class-attribute instance-attribute
npl: NPLinker | None = None
CACHE class-attribute instance-attribute
CACHE: str = 'cache_metcalf_scoring.pckl'
metcalf_weights class-attribute instance-attribute
metcalf_weights: tuple[int, int, int, int] = (10, -10, 0, 1)
presence_gcf_strain class-attribute instance-attribute
presence_gcf_strain: DataFrame = DataFrame()
presence_spec_strain class-attribute instance-attribute
presence_spec_strain: DataFrame = DataFrame()
presence_mf_strain class-attribute instance-attribute
presence_mf_strain: DataFrame = DataFrame()
raw_score_spec_gcf class-attribute instance-attribute
raw_score_spec_gcf: DataFrame = DataFrame(columns=["spec", "gcf", "score"])
raw_score_mf_gcf class-attribute instance-attribute
raw_score_mf_gcf: DataFrame = DataFrame(columns=["mf", "gcf", "score"])
metcalf_mean class-attribute instance-attribute
metcalf_mean: ndarray | None = None
metcalf_std class-attribute instance-attribute
metcalf_std: ndarray | None = None
setup classmethod
setup(npl: NPLinker) -> None
Setup the MetcalfScoring object.
This method is only called once to setup the MetcalfScoring object.
Parameters:
npl (NPLinker) – The NPLinker object.
Source code in src/nplinker/scoring/metcalf_scoring.py
@classmethod
def setup(cls, npl: NPLinker) -> None:
    """Setup the MetcalfScoring object.

    This method is only called once to setup the MetcalfScoring object.

    Args:
        npl: The NPLinker object.
    """
    if cls.npl is not None:
        logger.info("MetcalfScoring.setup already called, skipping.")
        return

    logger.info(
        f"MetcalfScoring.setup starts: #bgcs={len(npl.bgcs)}, #gcfs={len(npl.gcfs)}, "
        f"#spectra={len(npl.spectra)}, #mfs={len(npl.mfs)}, #strains={npl.strains}"
    )
    cls.npl = npl

    # calculate presence of gcfs/spectra/mfs with respect to strains
    cls.presence_gcf_strain = get_presence_gcf_strain(npl.gcfs, npl.strains)
    cls.presence_spec_strain = get_presence_spec_strain(npl.spectra, npl.strains)
    cls.presence_mf_strain = get_presence_mf_strain(npl.mfs, npl.strains)

    # calculate raw Metcalf scores for spec-gcf links
    raw_score_spec_gcf = cls._calc_raw_score(
        cls.presence_spec_strain, cls.presence_gcf_strain, cls.metcalf_weights
    )
    cls.raw_score_spec_gcf = raw_score_spec_gcf.reset_index().melt(id_vars="index")
    cls.raw_score_spec_gcf.columns = ["spec", "gcf", "score"]  # type: ignore

    # calculate raw Metcalf scores for mf-gcf links
    raw_score_mf_gcf = cls._calc_raw_score(
        cls.presence_mf_strain, cls.presence_gcf_strain, cls.metcalf_weights
    )
    cls.raw_score_mf_gcf = raw_score_mf_gcf.reset_index().melt(id_vars="index")
    cls.raw_score_mf_gcf.columns = ["mf", "gcf", "score"]  # type: ignore

    # calculate mean and std for standardising Metcalf scores
    cls.metcalf_mean, cls.metcalf_std = cls._calc_mean_std(
        len(npl.strains), cls.metcalf_weights
    )

    logger.info("MetcalfScoring.setup completed")
"},{"location":"api/scoring_methods/#nplinker.scoring.MetcalfScoring.get_links","title":"get_links","text":"get_links(*objects, **parameters)\n
Get links for the given objects.
Parameters:
objects
\u2013 The objects to get links for. All objects must be of the same type, i.e. GCF
, Spectrum
or MolecularFamily
type. If no objects are provided, all detected objects (npl.gcfs
) will be used.
parameters
\u2013 The scoring parameters to use for the links. The parameters are:
cutoff
: The minimum score to consider a link (\u2265cutoff). Default is 0. standardised
: Whether to use standardised scores. Default is False. Returns:
The LinkGraph
object containing the links involving the input objects with the Metcalf scores.
Raises:
TypeError
\u2013 If the input objects are not of the same type or the object type is invalid.
src/nplinker/scoring/metcalf_scoring.py
def get_links(self, *objects, **parameters):\n \"\"\"Get links for the given objects.\n\n Args:\n objects: The objects to get links for. All objects must be of the same type, i.e. `GCF`,\n `Spectrum` or `MolecularFamily` type.\n If no objects are provided, all detected objects (`npl.gcfs`) will be used.\n parameters: The scoring parameters to use for the links.\n The parameters are:\n\n - `cutoff`: The minimum score to consider a link (\u2265cutoff). Default is 0.\n - `standardised`: Whether to use standardised scores. Default is False.\n\n Returns:\n The [`LinkGraph`][nplinker.scoring.LinkGraph] object containing the links involving the\n input objects with the Metcalf scores.\n\n Raises:\n TypeError: If the input objects are not of the same type or the object type is invalid.\n \"\"\"\n # validate input objects\n if len(objects) == 0:\n objects = self.npl.gcfs\n # check if all objects are of the same type\n types = {type(i) for i in objects}\n if len(types) > 1:\n raise TypeError(\"Input objects must be of the same type.\")\n # check if the object type is valid\n obj_type = next(iter(types))\n if obj_type not in (GCF, Spectrum, MolecularFamily):\n raise TypeError(\n f\"Invalid type {obj_type}. Input objects must be GCF, Spectrum or MolecularFamily objects.\"\n )\n\n # validate scoring parameters\n self._cutoff: float = parameters.get(\"cutoff\", 0)\n self._standardised: bool = parameters.get(\"standardised\", False)\n parameters.update({\"cutoff\": self._cutoff, \"standardised\": self._standardised})\n\n logger.info(\n f\"MetcalfScoring: #objects={len(objects)}, type={obj_type}, cutoff={self._cutoff}, \"\n f\"standardised={self._standardised}\"\n )\n if not self._standardised:\n scores_list = self._get_links(*objects, obj_type=obj_type, score_cutoff=self._cutoff)\n else:\n if self.metcalf_mean is None or self.metcalf_std is None:\n raise ValueError(\n \"MetcalfScoring.metcalf_mean and metcalf_std are not set. Run MetcalfScoring.setup first.\"\n )\n # use negative infinity as the score cutoff to ensure we get all links\n scores_list = self._get_links(*objects, obj_type=obj_type, score_cutoff=-np.inf)\n scores_list = self._calc_standardised_score(scores_list)\n\n links = LinkGraph()\n for score_df in scores_list:\n for row in score_df.itertuples(index=False): # row has attributes: spec/mf, gcf, score\n met = row.spec if score_df.name == LinkType.SPEC_GCF else row.mf\n links.add_link(\n row.gcf,\n met,\n metcalf=Score(self.name, row.score, parameters),\n )\n\n logger.info(f\"MetcalfScoring: completed! Found {len(links.links)} links in total.\")\n return links\n
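A minimal hedged sketch of running Metcalf scoring (it assumes npl is an already-loaded NPLinker object and that MetcalfScoring takes no constructor arguments):
from nplinker.scoring import MetcalfScoring\n\nMetcalfScoring.setup(npl)  # npl is assumed to be a loaded NPLinker object\nscoring = MetcalfScoring()  # assumed no-argument constructor\nlg = scoring.get_links(*npl.gcfs, cutoff=0, standardised=False)  # returns a LinkGraph\n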
"},{"location":"api/scoring_methods/#nplinker.scoring.MetcalfScoring.format_data","title":"format_data","text":"format_data(data)\n
Format the data for display.
Source code in src/nplinker/scoring/metcalf_scoring.py
def format_data(self, data):\n \"\"\"Format the data for display.\"\"\"\n # for metcalf the data will just be a floating point value (i.e. the score)\n return f\"{data:.4f}\"\n
"},{"location":"api/scoring_methods/#nplinker.scoring.MetcalfScoring.sort","title":"sort","text":"sort(objects, reverse=True)\n
Sort the objects based on the score.
Source code in src/nplinker/scoring/metcalf_scoring.py
def sort(self, objects, reverse=True):\n \"\"\"Sort the objects based on the score.\"\"\"\n # sort based on score\n return sorted(objects, key=lambda objlink: objlink[self], reverse=reverse)\n
"},{"location":"api/scoring_utils/","title":"Utilities","text":""},{"location":"api/scoring_utils/#nplinker.scoring.utils","title":"nplinker.scoring.utils","text":""},{"location":"api/scoring_utils/#nplinker.scoring.utils.get_presence_gcf_strain","title":"get_presence_gcf_strain","text":"get_presence_gcf_strain(\n gcfs: Sequence[GCF], strains: StrainCollection\n) -> DataFrame\n
Get the occurrence of strains in gcfs.
The occurrence is a DataFrame with GCF objects as index and Strain objects as columns, and the values are 1 if the gcf occurs in the strain, 0 otherwise.
Source code in src/nplinker/scoring/utils.py
def get_presence_gcf_strain(gcfs: Sequence[GCF], strains: StrainCollection) -> pd.DataFrame:\n \"\"\"Get the occurrence of strains in gcfs.\n\n The occurrence is a DataFrame with GCF objects as index and Strain objects as columns, and the\n values are 1 if the gcf occurs in the strain, 0 otherwise.\n \"\"\"\n df_gcf_strain = pd.DataFrame(\n 0,\n index=gcfs,\n columns=list(strains),\n dtype=int,\n ) # type: ignore\n for gcf in gcfs:\n for strain in strains:\n if gcf.has_strain(strain):\n df_gcf_strain.loc[gcf, strain] = 1\n return df_gcf_strain # type: ignore\n
"},{"location":"api/scoring_utils/#nplinker.scoring.utils.get_presence_spec_strain","title":"get_presence_spec_strain","text":"get_presence_spec_strain(\n spectra: Sequence[Spectrum], strains: StrainCollection\n) -> DataFrame\n
Get the occurrence of strains in spectra.
The occurrence is a DataFrame with Spectrum objects as index and Strain objects as columns, and the values are 1 if the spectrum occurs in the strain, 0 otherwise.
Source code in src/nplinker/scoring/utils.py
def get_presence_spec_strain(\n spectra: Sequence[Spectrum], strains: StrainCollection\n) -> pd.DataFrame:\n \"\"\"Get the occurrence of strains in spectra.\n\n The occurrence is a DataFrame with Spectrum objects as index and Strain objects as columns, and\n the values are 1 if the spectrum occurs in the strain, 0 otherwise.\n \"\"\"\n df_spec_strain = pd.DataFrame(\n 0,\n index=spectra,\n columns=list(strains),\n dtype=int,\n ) # type: ignore\n for spectrum in spectra:\n for strain in strains:\n if spectrum.has_strain(strain):\n df_spec_strain.loc[spectrum, strain] = 1\n return df_spec_strain # type: ignore\n
"},{"location":"api/scoring_utils/#nplinker.scoring.utils.get_presence_mf_strain","title":"get_presence_mf_strain","text":"get_presence_mf_strain(\n mfs: Sequence[MolecularFamily],\n strains: StrainCollection,\n) -> DataFrame\n
Get the occurrence of strains in molecular families.
The occurrence is a DataFrame with MolecularFamily objects as index and Strain objects as columns, and the values are 1 if the molecular family occurs in the strain, 0 otherwise.
Source code in src/nplinker/scoring/utils.py
def get_presence_mf_strain(\n mfs: Sequence[MolecularFamily], strains: StrainCollection\n) -> pd.DataFrame:\n \"\"\"Get the occurrence of strains in molecular families.\n\n The occurrence is a DataFrame with MolecularFamily objects as index and Strain objects as\n columns, and the values are 1 if the molecular family occurs in the strain, 0 otherwise.\n \"\"\"\n df_mf_strain = pd.DataFrame(\n 0,\n index=mfs,\n columns=list(strains),\n dtype=int,\n ) # type: ignore\n for mf in mfs:\n for strain in strains:\n if mf.has_strain(strain):\n df_mf_strain.loc[mf, strain] = 1\n return df_mf_strain # type: ignore\n
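A hedged sketch of how these helpers could be combined (it assumes npl is a loaded NPLinker object):
from nplinker.scoring.utils import get_presence_gcf_strain\n\npresence = get_presence_gcf_strain(npl.gcfs, npl.strains)  # 0/1 DataFrame: GCFs x strains\nprint(presence.sum(axis=1))  # number of strains each GCF occurs in\n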
"},{"location":"api/strain/","title":"Data Models","text":""},{"location":"api/strain/#nplinker.strain","title":"nplinker.strain","text":""},{"location":"api/strain/#nplinker.strain.Strain","title":"Strain","text":"Strain(id: str)\n
Class to model the mapping between strain id and its aliases.
It's recommended to use NCBI taxonomy strain id or name as the primary id.
Attributes:
id
(str
) \u2013 The representative id of the strain.
names
(set[str]
) \u2013 A set of names associated with the strain.
aliases
(set[str]
) \u2013 A set of aliases associated with the strain.
Parameters:
id
(str
) \u2013 the representative id of the strain.
src/nplinker/strain/strain.py
def __init__(self, id: str) -> None:\n \"\"\"To model the mapping between strain id and its aliases.\n\n Args:\n id: the representative id of the strain.\n \"\"\"\n self.id: str = id\n self._aliases: set[str] = set()\n
"},{"location":"api/strain/#nplinker.strain.Strain.id","title":"id instance-attribute
","text":"id: str = id\n
"},{"location":"api/strain/#nplinker.strain.Strain.names","title":"names property
","text":"names: set[str]\n
Get the set of strain names including id and aliases.
Returns:
set[str]
\u2013 A set of names associated with the strain.
property
","text":"aliases: set[str]\n
Get the set of known aliases.
Returns:
set[str]
\u2013 A set of aliases associated with the strain.
__repr__() -> str\n
Source code in src/nplinker/strain/strain.py
def __repr__(self) -> str:\n return str(self)\n
"},{"location":"api/strain/#nplinker.strain.Strain.__str__","title":"__str__","text":"__str__() -> str\n
Source code in src/nplinker/strain/strain.py
def __str__(self) -> str:\n return f\"Strain({self.id}) [{len(self._aliases)} aliases]\"\n
"},{"location":"api/strain/#nplinker.strain.Strain.__eq__","title":"__eq__","text":"__eq__(other) -> bool\n
Source code in src/nplinker/strain/strain.py
def __eq__(self, other) -> bool:\n if isinstance(other, Strain):\n return self.id == other.id\n return NotImplemented\n
"},{"location":"api/strain/#nplinker.strain.Strain.__hash__","title":"__hash__","text":"__hash__() -> int\n
Hash function for Strain.
Note that Strain is a mutable container, so we hash only on the id so that the hash value does not change when self._aliases
is updated.
src/nplinker/strain/strain.py
def __hash__(self) -> int:\n \"\"\"Hash function for Strain.\n\n Note that Strain is a mutable container, so here we hash on only the id\n so that the hash value does not change when `self._aliases` is updated.\n \"\"\"\n return hash(self.id)\n
"},{"location":"api/strain/#nplinker.strain.Strain.__contains__","title":"__contains__","text":"__contains__(alias: str) -> bool\n
Source code in src/nplinker/strain/strain.py
def __contains__(self, alias: str) -> bool:\n if not isinstance(alias, str):\n raise TypeError(f\"Expected str, got {type(alias)}\")\n return alias in self._aliases\n
"},{"location":"api/strain/#nplinker.strain.Strain.add_alias","title":"add_alias","text":"add_alias(alias: str) -> None\n
Add an alias for the strain.
Parameters:
alias
(str
) \u2013 The alias to add for the strain.
src/nplinker/strain/strain.py
def add_alias(self, alias: str) -> None:\n \"\"\"Add an alias for the strain.\n\n Args:\n alias: The alias to add for the strain.\n \"\"\"\n if not isinstance(alias, str):\n raise TypeError(f\"Expected str, got {type(alias)}\")\n if len(alias) == 0:\n logger.warning(\"Refusing to add an empty-string alias to strain {%s}\", self)\n else:\n self._aliases.add(alias)\n
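For example, a minimal sketch (the strain id and alias below are made up for illustration):
from nplinker.strain import Strain\n\nstrain = Strain(\"strain1\")  # hypothetical strain id\nstrain.add_alias(\"S. example A1\")  # hypothetical alias\nprint(\"S. example A1\" in strain)  # True\nprint(strain.names)  # {'strain1', 'S. example A1'}\n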
"},{"location":"api/strain/#nplinker.strain.StrainCollection","title":"StrainCollection","text":"StrainCollection()\n
A collection of Strain
objects.
src/nplinker/strain/strain_collection.py
def __init__(self) -> None:\n # the order of strains is needed for scoring part, so use a list\n self._strains: list[Strain] = []\n self._strain_dict_name: dict[str, list[Strain]] = {}\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.__repr__","title":"__repr__","text":"__repr__() -> str\n
Source code in src/nplinker/strain/strain_collection.py
def __repr__(self) -> str:\n return str(self)\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.__str__","title":"__str__","text":"__str__() -> str\n
Source code in src/nplinker/strain/strain_collection.py
def __str__(self) -> str:\n if len(self) > 20:\n return f\"StrainCollection(n={len(self)})\"\n\n return f\"StrainCollection(n={len(self)}) [\" + \",\".join(s.id for s in self._strains) + \"]\"\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.__len__","title":"__len__","text":"__len__() -> int\n
Source code in src/nplinker/strain/strain_collection.py
def __len__(self) -> int:\n return len(self._strains)\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.__eq__","title":"__eq__","text":"__eq__(other) -> bool\n
Source code in src/nplinker/strain/strain_collection.py
def __eq__(self, other) -> bool:\n if isinstance(other, StrainCollection):\n return (\n self._strains == other._strains\n and self._strain_dict_name == other._strain_dict_name\n )\n return NotImplemented\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.__add__","title":"__add__","text":"__add__(other) -> StrainCollection\n
Source code in src/nplinker/strain/strain_collection.py
def __add__(self, other) -> StrainCollection:\n if isinstance(other, StrainCollection):\n sc = StrainCollection()\n for strain in self._strains:\n sc.add(strain)\n for strain in other._strains:\n sc.add(strain)\n return sc\n return NotImplemented\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.__contains__","title":"__contains__","text":"__contains__(item: Strain) -> bool\n
Check if the strain collection contains the given Strain object.
Source code in src/nplinker/strain/strain_collection.py
def __contains__(self, item: Strain) -> bool:\n \"\"\"Check if the strain collection contains the given Strain object.\"\"\"\n if isinstance(item, Strain):\n return item.id in self._strain_dict_name\n raise TypeError(f\"Expected Strain, got {type(item)}\")\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.__iter__","title":"__iter__","text":"__iter__() -> Iterator[Strain]\n
Source code in src/nplinker/strain/strain_collection.py
def __iter__(self) -> Iterator[Strain]:\n return iter(self._strains)\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.add","title":"add","text":"add(strain: Strain) -> None\n
Add strain to the collection.
If the strain already exists, merge the aliases.
Parameters:
strain
(Strain
) \u2013 The strain to add.
src/nplinker/strain/strain_collection.py
def add(self, strain: Strain) -> None:\n \"\"\"Add strain to the collection.\n\n If the strain already exists, merge the aliases.\n\n Args:\n strain: The strain to add.\n \"\"\"\n if strain in self._strains:\n # only one strain object per id\n strain_ref = self._strain_dict_name[strain.id][0]\n new_aliases = [alias for alias in strain.aliases if alias not in strain_ref.aliases]\n for alias in new_aliases:\n strain_ref.add_alias(alias)\n if alias not in self._strain_dict_name:\n self._strain_dict_name[alias] = [strain_ref]\n else:\n self._strain_dict_name[alias].append(strain_ref)\n else:\n self._strains.append(strain)\n for name in strain.names:\n if name not in self._strain_dict_name:\n self._strain_dict_name[name] = [strain]\n else:\n self._strain_dict_name[name].append(strain)\n
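A small hedged sketch of the merge behaviour (ids and aliases are hypothetical):
from nplinker.strain import Strain, StrainCollection\n\nsc = StrainCollection()\nsc.add(Strain(\"strain1\"))\nduplicate = Strain(\"strain1\")\nduplicate.add_alias(\"alias1\")\nsc.add(duplicate)  # same id, so \"alias1\" is merged into the existing strain\nprint(len(sc))  # 1\n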
"},{"location":"api/strain/#nplinker.strain.StrainCollection.remove","title":"remove","text":"remove(strain: Strain) -> None\n
Remove a strain from the collection.
It removes the given strain object from the collection by strain id. If the strain id is not found, raise ValueError
.
Parameters:
strain
(Strain
) \u2013 The strain to remove.
Raises:
ValueError
\u2013 If the strain is not found in the collection.
src/nplinker/strain/strain_collection.py
def remove(self, strain: Strain) -> None:\n \"\"\"Remove a strain from the collection.\n\n It removes the given strain object from the collection by strain id.\n If the strain id is not found, raise `ValueError`.\n\n Args:\n strain: The strain to remove.\n\n Raises:\n ValueError: If the strain is not found in the collection.\n \"\"\"\n if strain in self._strains:\n self._strains.remove(strain)\n # only one strain object per id\n strain_ref = self._strain_dict_name[strain.id][0]\n for name in strain_ref.names:\n if name in self._strain_dict_name:\n new_strain_list = [s for s in self._strain_dict_name[name] if s.id != strain.id]\n if not new_strain_list:\n del self._strain_dict_name[name]\n else:\n self._strain_dict_name[name] = new_strain_list\n else:\n raise ValueError(f\"Strain {strain} not found in the strain collection.\")\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.filter","title":"filter","text":"filter(strain_set: set[Strain])\n
Remove all strains that are not in strain_set
from the strain collection.
Parameters:
strain_set
(set[Strain]
) \u2013 Set of strains to keep.
src/nplinker/strain/strain_collection.py
def filter(self, strain_set: set[Strain]):\n \"\"\"Remove all strains that are not in `strain_set` from the strain collection.\n\n Args:\n strain_set: Set of strains to keep.\n \"\"\"\n # note that we need to copy the list of strains, as we are modifying it\n for strain in self._strains.copy():\n if strain not in strain_set:\n self.remove(strain)\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.intersection","title":"intersection","text":"intersection(other: StrainCollection) -> StrainCollection\n
Get the intersection of two strain collections.
Parameters:
other
(StrainCollection
) \u2013 The other strain collection to compare.
Returns:
StrainCollection
\u2013 StrainCollection object containing the strains that are in both collections.
src/nplinker/strain/strain_collection.py
def intersection(self, other: StrainCollection) -> StrainCollection:\n \"\"\"Get the intersection of two strain collections.\n\n Args:\n other: The other strain collection to compare.\n\n Returns:\n StrainCollection object containing the strains that are in both collections.\n \"\"\"\n intersection = StrainCollection()\n for strain in self:\n if strain in other:\n intersection.add(strain)\n return intersection\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.has_name","title":"has_name","text":"has_name(name: str) -> bool\n
Check if the strain collection contains the given strain name (id or alias).
Parameters:
name
(str
) \u2013 Strain name (id or alias) to check.
Returns:
bool
\u2013 True if the strain name is in the collection, False otherwise.
src/nplinker/strain/strain_collection.py
def has_name(self, name: str) -> bool:\n \"\"\"Check if the strain collection contains the given strain name (id or alias).\n\n Args:\n name: Strain name (id or alias) to check.\n\n Returns:\n True if the strain name is in the collection, False otherwise.\n \"\"\"\n return name in self._strain_dict_name\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.lookup","title":"lookup","text":"lookup(name: str) -> list[Strain]\n
Look up a strain by name (id or alias).
Parameters:
name
(str
) \u2013 Strain name (id or alias) to lookup.
Returns:
list[Strain]
\u2013 List of Strain objects with the given name.
Raises:
ValueError
\u2013 If the strain name is not found.
src/nplinker/strain/strain_collection.py
def lookup(self, name: str) -> list[Strain]:\n \"\"\"Lookup a strain by name (id or alias).\n\n Args:\n name: Strain name (id or alias) to lookup.\n\n Returns:\n List of Strain objects with the given name.\n\n Raises:\n ValueError: If the strain name is not found.\n \"\"\"\n if name in self._strain_dict_name:\n return self._strain_dict_name[name]\n raise ValueError(f\"Strain {name} not found in the strain collection.\")\n
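For example, continuing the hypothetical collection sc from the add example above:
if sc.has_name(\"alias1\"):\n    strains = sc.lookup(\"alias1\")  # list of Strain objects registered under this name\n    print(strains[0].id)  # strain1\n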
"},{"location":"api/strain/#nplinker.strain.StrainCollection.read_json","title":"read_json staticmethod
","text":"read_json(file: str | PathLike) -> StrainCollection\n
Read a strain mappings JSON file and return a StrainCollection
object.
Parameters:
file
(str | PathLike
) \u2013 Path to the strain mappings JSON file.
Returns:
StrainCollection
\u2013 StrainCollection
object.
src/nplinker/strain/strain_collection.py
@staticmethod\ndef read_json(file: str | PathLike) -> StrainCollection:\n \"\"\"Read a strain mappings JSON file and return a `StrainCollection` object.\n\n Args:\n file: Path to the strain mappings JSON file.\n\n Returns:\n `StrainCollection` object.\n \"\"\"\n with open(file, \"r\") as f:\n json_data = json.load(f)\n\n # validate json data\n validate(instance=json_data, schema=STRAIN_MAPPINGS_SCHEMA)\n\n strain_collection = StrainCollection()\n for data in json_data[\"strain_mappings\"]:\n strain = Strain(data[\"strain_id\"])\n for alias in data[\"strain_alias\"]:\n strain.add_alias(alias)\n strain_collection.add(strain)\n return strain_collection\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.to_json","title":"to_json","text":"to_json(file: str | PathLike | None = None) -> str | None\n
Convert the StrainCollection
object to a JSON string.
Parameters:
file
(str | PathLike | None
, default: None
) \u2013 Path to output JSON file. If None, return the JSON string instead.
Returns:
str | None
\u2013 If input file
is None, return the JSON string. Otherwise, write the JSON string to the given file.
src/nplinker/strain/strain_collection.py
def to_json(self, file: str | PathLike | None = None) -> str | None:\n \"\"\"Convert the `StrainCollection` object to a JSON string.\n\n Args:\n file: Path to output JSON file. If None, return the JSON string instead.\n\n Returns:\n If input `file` is None, return the JSON string. Otherwise, write the JSON string to the given\n file.\n \"\"\"\n data_list = [\n {\"strain_id\": strain.id, \"strain_alias\": list(strain.aliases)} for strain in self\n ]\n json_data = {\"strain_mappings\": data_list, \"version\": \"1.0\"}\n\n # validate json data\n validate(instance=json_data, schema=STRAIN_MAPPINGS_SCHEMA)\n\n if file is not None:\n with open(file, \"w\") as f:\n json.dump(json_data, f)\n return None\n return json.dumps(json_data)\n
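A hedged round-trip sketch (the file path is hypothetical):
sc.to_json(\"strain_mappings.json\")  # writes the file and returns None\nsc2 = StrainCollection.read_json(\"strain_mappings.json\")\nprint(sc2.has_name(\"strain1\"))  # True\n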
"},{"location":"api/strain_utils/","title":"Utilities","text":""},{"location":"api/strain_utils/#nplinker.strain.utils","title":"nplinker.strain.utils","text":""},{"location":"api/strain_utils/#nplinker.strain.utils.load_user_strains","title":"load_user_strains","text":"load_user_strains(json_file: str | PathLike) -> set[Strain]\n
Load user specified strains from a JSON file.
The JSON file will be validated against the schema USER_STRAINS_SCHEMA.
The content of the JSON file could be, for example:
{\"strain_ids\": [\"strain1\", \"strain2\"]}\n
Parameters:
json_file
(str | PathLike
) \u2013 Path to the JSON file containing user specified strains.
Returns:
set[Strain]
\u2013 A set of user specified strains.
src/nplinker/strain/utils.py
def load_user_strains(json_file: str | PathLike) -> set[Strain]:\n \"\"\"Load user specified strains from a JSON file.\n\n The JSON file will be validated against the schema\n [USER_STRAINS_SCHEMA][nplinker.schemas.USER_STRAINS_SCHEMA]\n\n The content of the JSON file could be, for example:\n ```\n {\"strain_ids\": [\"strain1\", \"strain2\"]}\n ```\n\n Args:\n json_file: Path to the JSON file containing user specified strains.\n\n Returns:\n A set of user specified strains.\n \"\"\"\n with open(json_file, \"r\") as f:\n json_data = json.load(f)\n\n # validate json data\n validate(instance=json_data, schema=USER_STRAINS_SCHEMA)\n\n strains = set()\n for strain_id in json_data[\"strain_ids\"]:\n strains.add(Strain(strain_id))\n\n return strains\n
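For example, assuming a file strains_selected.json with the content shown above:
from nplinker.strain.utils import load_user_strains\n\nstrains = load_user_strains(\"strains_selected.json\")  # validated against USER_STRAINS_SCHEMA\nprint(len(strains))  # 2 for the example content above\n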
"},{"location":"api/strain_utils/#nplinker.strain.utils.podp_generate_strain_mappings","title":"podp_generate_strain_mappings","text":"podp_generate_strain_mappings(\n podp_project_json_file: str | PathLike,\n genome_status_json_file: str | PathLike,\n genome_bgc_mappings_file: str | PathLike,\n gnps_file_mappings_file: str | PathLike,\n output_json_file: str | PathLike,\n) -> StrainCollection\n
Generate strain mappings JSON file for PODP pipeline.
To get the strain mappings, we need to combine the following mappings:
strain_id <-> original_genome_id <-> resolved_genome_id <-> bgc_id
strain_id <-> MS_filename <-> spectrum_id
These mappings are extracted from the following files:
\"strain_id <-> original_genome_id\" is extracted from podp_project_json_file.
\"original_genome_id <-> resolved_genome_id\" is extracted from genome_status_json_file.
\"resolved_genome_id <-> bgc_id\" is extracted from genome_bgc_mappings_file.
\"strain_id <-> MS_filename\" is extracted from podp_project_json_file.
\"MS_filename <-> spectrum_id\" is extracted from gnps_file_mappings_file.
Parameters:
podp_project_json_file
(str | PathLike
) \u2013 The path to the PODP project JSON file.
genome_status_json_file
(str | PathLike
) \u2013 The path to the genome status JSON file.
genome_bgc_mappings_file
(str | PathLike
) \u2013 The path to the genome BGC mappings JSON file.
gnps_file_mappings_file
(str | PathLike
) \u2013 The path to the GNPS file mappings file (csv or tsv).
output_json_file
(str | PathLike
) \u2013 The path to the output JSON file.
Returns:
StrainCollection
\u2013 The strain mappings stored in a StrainCollection object.
See Also:
extract_mappings_strain_id_original_genome_id: Extract mappings \"strain_id <-> original_genome_id\".
extract_mappings_original_genome_id_resolved_genome_id: Extract mappings \"original_genome_id <-> resolved_genome_id\".
extract_mappings_resolved_genome_id_bgc_id: Extract mappings \"resolved_genome_id <-> bgc_id\".
get_mappings_strain_id_bgc_id: Get mappings \"strain_id <-> bgc_id\".
extract_mappings_strain_id_ms_filename: Extract mappings \"strain_id <-> MS_filename\".
extract_mappings_ms_filename_spectrum_id: Extract mappings \"MS_filename <-> spectrum_id\".
get_mappings_strain_id_spectrum_id: Get mappings \"strain_id <-> spectrum_id\".
src/nplinker/strain/utils.py
def podp_generate_strain_mappings(\n podp_project_json_file: str | PathLike,\n genome_status_json_file: str | PathLike,\n genome_bgc_mappings_file: str | PathLike,\n gnps_file_mappings_file: str | PathLike,\n output_json_file: str | PathLike,\n) -> StrainCollection:\n \"\"\"Generate strain mappings JSON file for PODP pipeline.\n\n To get the strain mappings, we need to combine the following mappings:\n\n - strain_id <-> original_genome_id <-> resolved_genome_id <-> bgc_id\n - strain_id <-> MS_filename <-> spectrum_id\n\n These mappings are extracted from the following files:\n\n - \"strain_id <-> original_genome_id\" is extracted from `podp_project_json_file`.\n - \"original_genome_id <-> resolved_genome_id\" is extracted from `genome_status_json_file`.\n - \"resolved_genome_id <-> bgc_id\" is extracted from `genome_bgc_mappings_file`.\n - \"strain_id <-> MS_filename\" is extracted from `podp_project_json_file`.\n - \"MS_filename <-> spectrum_id\" is extracted from `gnps_file_mappings_file`.\n\n Args:\n podp_project_json_file: The path to the PODP project\n JSON file.\n genome_status_json_file: The path to the genome status\n JSON file.\n genome_bgc_mappings_file: The path to the genome BGC\n mappings JSON file.\n gnps_file_mappings_file: The path to the GNPS file\n mappings file (csv or tsv).\n output_json_file: The path to the output JSON file.\n\n Returns:\n The strain mappings stored in a StrainCollection object.\n\n See Also:\n - `extract_mappings_strain_id_original_genome_id`: Extract mappings\n \"strain_id <-> original_genome_id\".\n - `extract_mappings_original_genome_id_resolved_genome_id`: Extract mappings\n \"original_genome_id <-> resolved_genome_id\".\n - `extract_mappings_resolved_genome_id_bgc_id`: Extract mappings\n \"resolved_genome_id <-> bgc_id\".\n - `get_mappings_strain_id_bgc_id`: Get mappings \"strain_id <-> bgc_id\".\n - `extract_mappings_strain_id_ms_filename`: Extract mappings\n \"strain_id <-> MS_filename\".\n - `extract_mappings_ms_filename_spectrum_id`: Extract mappings\n \"MS_filename <-> spectrum_id\".\n - `get_mappings_strain_id_spectrum_id`: Get mappings \"strain_id <-> spectrum_id\".\n \"\"\"\n # Get mappings strain_id <-> original_genome_id <-> resolved_genome_id <-> bgc_id\n mappings_strain_id_bgc_id = get_mappings_strain_id_bgc_id(\n extract_mappings_strain_id_original_genome_id(podp_project_json_file),\n extract_mappings_original_genome_id_resolved_genome_id(genome_status_json_file),\n extract_mappings_resolved_genome_id_bgc_id(genome_bgc_mappings_file),\n )\n\n # Get mappings strain_id <-> MS_filename <-> spectrum_id\n mappings_strain_id_spectrum_id = get_mappings_strain_id_spectrum_id(\n extract_mappings_strain_id_ms_filename(podp_project_json_file),\n extract_mappings_ms_filename_spectrum_id(gnps_file_mappings_file),\n )\n\n # Get mappings strain_id <-> bgc_id / spectrum_id\n mappings = mappings_strain_id_bgc_id.copy()\n for strain_id, spectrum_ids in mappings_strain_id_spectrum_id.items():\n if strain_id in mappings:\n mappings[strain_id].update(spectrum_ids)\n else:\n mappings[strain_id] = spectrum_ids.copy()\n\n # Create StrainCollection\n sc = StrainCollection()\n for strain_id, bgc_ids in mappings.items():\n if not sc.has_name(strain_id):\n strain = Strain(strain_id)\n for bgc_id in bgc_ids:\n strain.add_alias(bgc_id)\n sc.add(strain)\n else:\n # strain_list has only one element\n strain_list = sc.lookup(strain_id)\n for bgc_id in bgc_ids:\n strain_list[0].add_alias(bgc_id)\n\n # Write strain mappings JSON file\n sc.to_json(output_json_file)\n 
logger.info(\"Generated strain mappings JSON file: %s\", output_json_file)\n\n return sc\n
"},{"location":"api/utils/","title":"Utilities","text":""},{"location":"api/utils/#nplinker.utils","title":"nplinker.utils","text":""},{"location":"api/utils/#nplinker.utils.calculate_md5","title":"calculate_md5","text":"calculate_md5(\n fpath: str | PathLike, chunk_size: int = 1024 * 1024\n) -> str\n
Calculate the MD5 checksum of a file.
Parameters:
fpath
(str | PathLike
) \u2013 Path to the file.
chunk_size
(int
, default: 1024 * 1024
) \u2013 Chunk size for reading the file. Defaults to 1024*1024.
Returns:
str
\u2013 MD5 checksum of the file.
src/nplinker/utils.py
def calculate_md5(fpath: str | PathLike, chunk_size: int = 1024 * 1024) -> str:\n \"\"\"Calculate the MD5 checksum of a file.\n\n Args:\n fpath: Path to the file.\n chunk_size: Chunk size for reading the file. Defaults to 1024*1024.\n\n Returns:\n MD5 checksum of the file.\n \"\"\"\n if sys.version_info >= (3, 9):\n md5 = hashlib.md5(usedforsecurity=False)\n else:\n md5 = hashlib.md5()\n with open(fpath, \"rb\") as f:\n for chunk in iter(lambda: f.read(chunk_size), b\"\"):\n md5.update(chunk)\n return md5.hexdigest()\n
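For example (the file path is hypothetical):
from nplinker.utils import calculate_md5\n\nprint(calculate_md5(\"downloads/archive.zip\"))  # hex digest string\n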
"},{"location":"api/utils/#nplinker.utils.check_disk_space","title":"check_disk_space","text":"check_disk_space(func)\n
A decorator to check available disk space.
If the available disk space is less than 50GB, log and raise a warning.
Warns:
UserWarning
\u2013 If the available disk space is less than 50GB.
src/nplinker/utils.py
def check_disk_space(func):\n \"\"\"A decorator to check available disk space.\n\n If the available disk space is less than 50GB, log and raise a warning.\n\n Warnings:\n UserWarning: If the available disk space is less than 50GB.\n \"\"\"\n\n @functools.wraps(func)\n def wrapper_check_disk_space(*args, **kwargs):\n _, _, free = shutil.disk_usage(\"/\")\n free_gb = free // (2**30)\n if free_gb < 50:\n warning_message = f\"Available disk space is {free_gb}GB. Is it enough for your project?\"\n logger.warning(warning_message)\n warnings.warn(warning_message, UserWarning)\n return func(*args, **kwargs)\n\n return wrapper_check_disk_space\n
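Because it is a decorator, a hedged usage sketch looks like this (the decorated function is hypothetical):
from nplinker.utils import check_disk_space\n\n@check_disk_space\ndef prepare_data():  # hypothetical function; warns if free disk space < 50GB\n    ...\n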
"},{"location":"api/utils/#nplinker.utils.check_md5","title":"check_md5","text":"check_md5(fpath: str | PathLike, md5: str) -> bool\n
Verify the MD5 checksum of a file.
Parameters:
fpath
(str | PathLike
) \u2013 Path to the file.
md5
(str
) \u2013 MD5 checksum to verify.
Returns:
bool
\u2013 True if the MD5 checksum matches, False otherwise.
src/nplinker/utils.py
def check_md5(fpath: str | PathLike, md5: str) -> bool:\n \"\"\"Verify the MD5 checksum of a file.\n\n Args:\n fpath: Path to the file.\n md5: MD5 checksum to verify.\n\n Returns:\n True if the MD5 checksum matches, False otherwise.\n \"\"\"\n return md5 == calculate_md5(fpath)\n
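For example (path and checksum are hypothetical placeholders):
from nplinker.utils import check_md5\n\nok = check_md5(\"downloads/archive.zip\", \"0123456789abcdef0123456789abcdef\")\nprint(ok)  # True only if the file's MD5 matches\n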
"},{"location":"api/utils/#nplinker.utils.download_and_extract_archive","title":"download_and_extract_archive","text":"download_and_extract_archive(\n url: str,\n download_root: str | PathLike,\n extract_root: str | Path | None = None,\n filename: str | None = None,\n md5: str | None = None,\n remove_finished: bool = False,\n) -> None\n
Download an archive file and then extract it.
This method is a wrapper around the download_url
and extract_archive
functions.
Parameters:
url
(str
) \u2013 URL to download file from
download_root
(str | PathLike
) \u2013 Path to the directory to place downloaded file in. If it doesn't exist, it will be created.
extract_root
(str | Path | None
, default: None
) \u2013 Path to the directory the file will be extracted to. The given directory will be created if it does not exist. If omitted, the download_root
is used.
filename
(str | None
, default: None
) \u2013 Name to save the downloaded file under. If None, use the basename of the URL
md5
(str | None
, default: None
) \u2013 MD5 checksum of the download. If None, do not check
remove_finished
(bool
, default: False
) \u2013 If True
, remove the downloaded file after the extraction. Defaults to False.
src/nplinker/utils.py
def download_and_extract_archive(\n url: str,\n download_root: str | PathLike,\n extract_root: str | Path | None = None,\n filename: str | None = None,\n md5: str | None = None,\n remove_finished: bool = False,\n) -> None:\n \"\"\"Download an archive file and then extract it.\n\n This method is a wrapper of [`download_url`][nplinker.utils.download_url] and\n [`extract_archive`][nplinker.utils.extract_archive] functions.\n\n Args:\n url: URL to download file from\n download_root: Path to the directory to place downloaded\n file in. If it doesn't exist, it will be created.\n extract_root: Path to the directory the file\n will be extracted to. The given directory will be created if not exist.\n If omitted, the `download_root` is used.\n filename: Name to save the downloaded file under.\n If None, use the basename of the URL\n md5: MD5 checksum of the download. If None, do not check\n remove_finished: If `True`, remove the downloaded file\n after the extraction. Defaults to False.\n \"\"\"\n download_root = Path(download_root)\n if extract_root is None:\n extract_root = download_root\n else:\n extract_root = Path(extract_root)\n if not filename:\n filename = Path(url).name\n\n download_url(url, download_root, filename, md5)\n\n archive = download_root / filename\n extract_archive(archive, extract_root, remove_finished=remove_finished)\n
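A hedged usage sketch (the URL and directories are placeholders):
from nplinker.utils import download_and_extract_archive\n\ndownload_and_extract_archive(\n    url=\"https://example.org/data.zip\",  # placeholder URL\n    download_root=\"downloads\",\n    extract_root=\"gnps\",\n    remove_finished=True,\n)\n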
"},{"location":"api/utils/#nplinker.utils.download_url","title":"download_url","text":"download_url(\n url: str,\n root: str | PathLike,\n filename: str | None = None,\n md5: str | None = None,\n http_method: str = \"GET\",\n allow_http_redirect: bool = True,\n) -> None\n
Download a file from a url and place it in root.
Parameters:
url
(str
) \u2013 URL to download file from
root
(str | PathLike
) \u2013 Directory to place downloaded file in. If it doesn't exist, it will be created.
filename
(str | None
, default: None
) \u2013 Name to save the file under. If None, use the basename of the URL.
md5
(str | None
, default: None
) \u2013 MD5 checksum of the download. If None, do not check.
http_method
(str
, default: 'GET'
) \u2013 HTTP request method, e.g. \"GET\", \"POST\". Defaults to \"GET\".
allow_http_redirect
(bool
, default: True
) \u2013 If true, enable following redirects for all HTTP (\"http:\") methods.
src/nplinker/utils.py
@check_disk_space\ndef download_url(\n url: str,\n root: str | PathLike,\n filename: str | None = None,\n md5: str | None = None,\n http_method: str = \"GET\",\n allow_http_redirect: bool = True,\n) -> None:\n \"\"\"Download a file from a url and place it in root.\n\n Args:\n url: URL to download file from\n root: Directory to place downloaded file in. If it doesn't exist, it will be created.\n filename: Name to save the file under. If None, use the\n basename of the URL.\n md5: MD5 checksum of the download. If None, do not check.\n http_method: HTTP request method, e.g. \"GET\", \"POST\".\n Defaults to \"GET\".\n allow_http_redirect: If true, enable following redirects for all HTTP (\"http:\") methods.\n \"\"\"\n root = transform_to_full_path(root)\n # create the download directory if not exist\n root.mkdir(exist_ok=True)\n if not filename:\n filename = Path(url).name\n fpath = root / filename\n\n # check if file is already present locally\n if fpath.is_file() and md5 is not None and check_md5(fpath, md5):\n logger.info(\"Using downloaded and verified file: \" + str(fpath))\n return\n\n # download the file\n logger.info(f\"Downloading {filename} to {root}\")\n with open(fpath, \"wb\") as fh:\n with httpx.stream(http_method, url, follow_redirects=allow_http_redirect) as response:\n if not response.is_success:\n fpath.unlink(missing_ok=True)\n raise RuntimeError(\n f\"Failed to download url {url} with status code {response.status_code}\"\n )\n total = int(response.headers.get(\"Content-Length\", 0))\n\n with Progress(\n TextColumn(\"[progress.description]{task.description}\"),\n BarColumn(bar_width=None),\n \"[progress.percentage]{task.percentage:>3.1f}%\",\n \"\u2022\",\n DownloadColumn(),\n \"\u2022\",\n TransferSpeedColumn(),\n \"\u2022\",\n TimeRemainingColumn(),\n \"\u2022\",\n TimeElapsedColumn(),\n ) as progress:\n task = progress.add_task(f\"[hot_pink]Downloading {fpath.name}\", total=total)\n for chunk in response.iter_bytes():\n fh.write(chunk)\n progress.update(task, advance=len(chunk))\n\n # check integrity of downloaded file\n if md5 is not None and not check_md5(fpath, md5):\n raise RuntimeError(\"MD5 validation failed.\")\n
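For example (URL and checksum are placeholders):
from nplinker.utils import download_url\n\ndownload_url(\"https://example.org/data.zip\", \"downloads\", md5=\"0123456789abcdef0123456789abcdef\")  # placeholders\n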
"},{"location":"api/utils/#nplinker.utils.extract_archive","title":"extract_archive","text":"extract_archive(\n from_path: str | PathLike,\n extract_root: str | PathLike | None = None,\n members: list | None = None,\n remove_finished: bool = False,\n) -> str\n
Extract an archive.
The archive type and a possible compression are automatically detected from the file name.
If the file is compressed but not an archive, the call is dispatched to _decompress
function.
Parameters:
from_path
(str | PathLike
) \u2013 Path to the file to be extracted.
extract_root
(str | PathLike | None
, default: None
) \u2013 Path to the directory the file will be extracted to. The given directory will be created if it does not exist. If omitted, the directory of the archive file is used.
members
(list | None
, default: None
) \u2013 Optional selection of members to extract. If not specified, all members are extracted. Members must be a subset of the list returned by zipfile.ZipFile.namelist()
(or a list of strings) for a zip file, or by tarfile.TarFile.getmembers()
for a tar file.
remove_finished
(bool
, default: False
) \u2013 If True
, remove the file after the extraction.
Returns:
str
\u2013 Path to the directory the file was extracted to.
src/nplinker/utils.py
def extract_archive(\n from_path: str | PathLike,\n extract_root: str | PathLike | None = None,\n members: list | None = None,\n remove_finished: bool = False,\n) -> str:\n \"\"\"Extract an archive.\n\n The archive type and a possible compression is automatically detected from\n the file name.\n\n If the file is compressed but not an archive, the call is dispatched to `_decompress` function.\n\n Args:\n from_path: Path to the file to be extracted.\n extract_root: Path to the directory the file will be extracted to.\n The given directory will be created if not exist.\n If omitted, the directory of the archive file is used.\n members: Optional selection of members to extract. If not specified,\n all members are extracted.\n Members must be a subset of the list returned by\n - `zipfile.ZipFile.namelist()` or a list of strings for zip file\n - `tarfile.TarFile.getmembers()` for tar file\n remove_finished: If `True`, remove the file after the extraction.\n\n Returns:\n Path to the directory the file was extracted to.\n \"\"\"\n from_path = Path(from_path)\n\n if extract_root is None:\n extract_root = from_path.parent\n else:\n extract_root = Path(extract_root)\n\n # create the extract directory if not exist\n extract_root.mkdir(exist_ok=True)\n\n logger.info(f\"Extracting {from_path} to {extract_root}\")\n suffix, archive_type, compression = _detect_file_type(from_path)\n if not archive_type:\n return _decompress(\n from_path,\n extract_root / from_path.name.replace(suffix, \"\"),\n remove_finished=remove_finished,\n )\n\n extractor = _ARCHIVE_EXTRACTORS[archive_type]\n\n extractor(str(from_path), str(extract_root), members, compression)\n if remove_finished:\n from_path.unlink()\n\n return str(extract_root)\n
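For example (the archive path is hypothetical):
from nplinker.utils import extract_archive\n\nout_dir = extract_archive(\"downloads/data.zip\", extract_root=\"gnps\")  # returns the extraction directory as str\n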
"},{"location":"api/utils/#nplinker.utils.is_file_format","title":"is_file_format","text":"is_file_format(\n file: str | PathLike, format: str = \"tsv\"\n) -> bool\n
Check if the file is in the given format.
Parameters:
file
(str | PathLike
) \u2013 Path to the file to check.
format
(str
, default: 'tsv'
) \u2013 The format to check for, either \"tsv\" or \"csv\".
Returns:
bool
\u2013 True if the file is in the given format, False otherwise.
src/nplinker/utils.py
def is_file_format(file: str | PathLike, format: str = \"tsv\") -> bool:\n \"\"\"Check if the file is in the given format.\n\n Args:\n file: Path to the file to check.\n format: The format to check for, either \"tsv\" or \"csv\".\n\n Returns:\n True if the file is in the given format, False otherwise.\n \"\"\"\n try:\n with open(file, \"rt\") as f:\n if format == \"tsv\":\n reader = csv.reader(f, delimiter=\"\\t\")\n elif format == \"csv\":\n reader = csv.reader(f, delimiter=\",\")\n else:\n raise ValueError(f\"Unknown format '{format}'.\")\n for _ in reader:\n pass\n return True\n except csv.Error:\n return False\n
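For example (the file path is hypothetical):
from nplinker.utils import is_file_format\n\nprint(is_file_format(\"gnps/file_mappings.csv\", format=\"csv\"))  # True if the file parses as CSV\n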
"},{"location":"api/utils/#nplinker.utils.list_dirs","title":"list_dirs","text":"list_dirs(\n root: str | PathLike, keep_parent: bool = True\n) -> list[str]\n
List all directories at a given root.
Parameters:
root
(str | PathLike
) \u2013 Path to directory whose folders need to be listed
keep_parent
(bool
, default: True
) \u2013 If true, prepends the path to each result, otherwise only returns the name of the directories found
src/nplinker/utils.py
def list_dirs(root: str | PathLike, keep_parent: bool = True) -> list[str]:\n \"\"\"List all directories at a given root.\n\n Args:\n root: Path to directory whose folders need to be listed\n keep_parent: If true, prepends the path to each result, otherwise\n only returns the name of the directories found\n \"\"\"\n root = transform_to_full_path(root)\n directories = [str(p) for p in root.iterdir() if p.is_dir()]\n if not keep_parent:\n directories = [os.path.basename(d) for d in directories]\n return directories\n
"},{"location":"api/utils/#nplinker.utils.list_files","title":"list_files","text":"list_files(\n root: str | PathLike,\n prefix: str | tuple[str, ...] = \"\",\n suffix: str | tuple[str, ...] = \"\",\n keep_parent: bool = True,\n) -> list[str]\n
List all files at a given root.
Parameters:
root
(str | PathLike
) \u2013 Path to directory whose files need to be listed
prefix
(str | tuple[str, ...]
, default: ''
) \u2013 Prefix of the file names to match. Defaults to empty string '\"\"'.
suffix
(str | tuple[str, ...]
, default: ''
) \u2013 Suffix of the files to match, e.g. \".png\" or (\".jpg\", \".png\"). Defaults to empty string '\"\"'.
keep_parent
(bool
, default: True
) \u2013 If true, prepends the parent path to each result, otherwise only returns the name of the files found. Defaults to True.
src/nplinker/utils.py
def list_files(\n root: str | PathLike,\n prefix: str | tuple[str, ...] = \"\",\n suffix: str | tuple[str, ...] = \"\",\n keep_parent: bool = True,\n) -> list[str]:\n \"\"\"List all files at a given root.\n\n Args:\n root: Path to directory whose files need to be listed\n prefix: Prefix of the file names to match,\n Defaults to empty string '\"\"'.\n suffix: Suffix of the files to match, e.g. \".png\" or\n (\".jpg\", \".png\").\n Defaults to empty string '\"\"'.\n keep_parent: If true, prepends the parent path to each\n result, otherwise only returns the name of the files found.\n Defaults to True.\n \"\"\"\n root = Path(root)\n files = [\n str(p)\n for p in root.iterdir()\n if p.is_file() and p.name.startswith(prefix) and p.name.endswith(suffix)\n ]\n\n if not keep_parent:\n files = [os.path.basename(f) for f in files]\n\n return files\n
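A hedged sketch of both listing helpers (the directory layout is hypothetical):
from nplinker.utils import list_dirs, list_files\n\nbgc_dirs = list_dirs(\"antismash\", keep_parent=False)  # e.g. ['GCF_000514975.1', ...]\ngbk_files = list_files(\"antismash/GCF_000514975.1\", suffix=\".gbk\", keep_parent=False)\n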
"},{"location":"api/utils/#nplinker.utils.transform_to_full_path","title":"transform_to_full_path","text":"transform_to_full_path(p: str | PathLike) -> Path\n
Transform a path to a full path.
The path is expanded (i.e. the ~
will be replaced with actual path) and converted to an absolute path (i.e. .
or ..
will be replaced with actual path).
Parameters:
p
(str | PathLike
) \u2013 The path to transform.
Returns:
Path
\u2013 The transformed full path.
src/nplinker/utils.py
def transform_to_full_path(p: str | PathLike) -> Path:\n \"\"\"Transform a path to a full path.\n\n The path is expanded (i.e. the `~` will be replaced with actual path) and converted to an\n absolute path (i.e. `.` or `..` will be replaced with actual path).\n\n Args:\n p: The path to transform.\n\n Returns:\n The transformed full path.\n \"\"\"\n # Multiple calls to `Path` are used to ensure static typing compatibility.\n p = Path(p).expanduser()\n p = Path(p).resolve()\n return Path(p)\n
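For example:
from nplinker.utils import transform_to_full_path\n\nprint(transform_to_full_path(\"~/nplinker_project\"))  # e.g. /home/user/nplinker_project\n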
"},{"location":"concepts/bigscape/","title":"BigScape","text":"NPLinker can run BigScape automatically if the bigscape
directory does not exist in the working directory. Both version 1 and version 2 of BigScape are supported.
See the configuration template for how to set parameters for running BigScape.
See the default configurations for the default parameters used in NPLinker.
"},{"location":"concepts/config_file/","title":"Config File","text":""},{"location":"concepts/config_file/#configuration-template","title":"Configuration Template","text":"#############################\n# NPLinker configuration file\n#############################\n\n# The root directory of the NPLinker project. You need to create it first.\n# The value is required and must be a full path.\nroot_dir = \"<NPLinker root directory>\"\n# The mode for preparing dataset.\n# The available modes are \"podp\" and \"local\".\n# \"podp\" mode is for using the PODP platform (https://pairedomicsdata.bioinformatics.nl/) to prepare the dataset.\n# \"local\" mode is for preparing the dataset locally. So uers do not need to upload their data to the PODP platform.\n# The value is required.\nmode = \"podp\"\n# The PODP project identifier.\n# The value is required if the mode is \"podp\".\npodp_id = \"\"\n\n\n[log]\n# Log level. The available levels are same as the levels in python package `logging`:\n# \"DEBUG\", \"INFO\", \"WARNING\", \"ERROR\", \"CRITICAL\".\n# The default value is \"INFO\".\nlevel = \"INFO\"\n# The log file to append log messages.\n# The value is optional.\n# If not set or use empty string, log messages will not be written to a file.\n# The file will be created if it does not exist. Log messages will be appended to the file if it exists.\nfile = \"path/to/logfile\"\n# Whether to write log meesages to console.\n# The default value is true.\nuse_console = true\n\n\n[mibig]\n# Whether to use mibig metadta (json).\n# The default value is true.\nto_use = true\n# The version of mibig metadata.\n# Make sure using the same version of mibig in bigscape.\n# The default value is \"3.1\"\nversion = \"3.1\"\n\n\n[bigscape]\n# The parameters to use for running BiG-SCAPE.\n# Version of BiG-SCAPE to run. Make sure to change the parameters property below as well\n# when changing versions.\nversion = 1\n# Required BiG-SCAPE parameters.\n# --------------\n# For version 1:\n# -------------\n# Required parameters are: `--mix`, `--include_singletons` and `--cutoffs`. NPLinker needs them to run the analysis properly.\n# Do NOT set these parameters: `--inputdir`, `--outputdir`, `--pfam_dir`. NPLinker will automatically configure them.\n# If parameter `--mibig` is set, make sure to set the config `mibig.to_use` to true and `mibig.version` to the version of mibig in BiG-SCAPE.\n# The default value is \"--mibig --clans-off --mix --include_singletons --cutoffs 0.30\".\n# --------------\n# For version 2:\n# --------------\n# Note that BiG-SCAPE v2 has subcommands. NPLinker requires the `cluster` subcommand and its parameters.\n# Required parameters of `cluster` subcommand are: `--mibig_version`, `--include_singletons` and `--gcf_cutoffs`.\n# DO NOT set these parameters: `--pfam_path`, `--inputdir`, `--outputdir`. NPLinker will automatically configure them.\n# BiG-SCPAPE v2 also runs a `--mix` analysis by default, so you don't need to set this parameter here.\n# Example parameters for BiG-SCAPE v2: \"--mibig_version 3.1 --include_singletons --gcf_cutoffs 0.30\"\nparameters = \"--mibig --clans-off --mix --include_singletons --cutoffs 0.30\"\n# Which bigscape cutoff to use for NPLinker analysis.\n# There might be multiple cutoffs in bigscape output.\n# Note that this value must be a string.\n# The default value is \"0.30\".\ncutoff = \"0.30\"\n\n\n[scoring]\n# Scoring methods.\n# Valid values are \"metcalf\" and \"rosetta\".\n# The default value is \"metcalf\".\nmethods = [\"metcalf\"]\n
"},{"location":"concepts/config_file/#default-configurations","title":"Default Configurations","text":"The default configurations are automatically used by NPLinker if you don't set them in your config file.
# NPLinker default configurations\n\n[log]\nlevel = \"INFO\"\nuse_console = true\n\n[mibig]\nto_use = true\nversion = \"3.1\"\n\n[bigscape]\nversion = 1\nparameters = \"--mibig --clans-off --mix --include_singletons --cutoffs 0.30\"\ncutoff = \"0.30\"\n\n[scoring]\nmethods = [\"metcalf\"]\n
"},{"location":"concepts/config_file/#config-loader","title":"Config loader","text":"You can load the configuration file using the load_config function.
from nplinker.config import load_config\nconfig = load_config('path/to/nplinker.toml')\n
When you use NPLinker as an application, you can get access to the configuration object directly:
from nplinker import NPLinker\nnpl = NPLinker('path/to/nplinker.toml')\nprint(npl.config)\n
"},{"location":"concepts/gnps_data/","title":"GNPS data","text":"NPLinker requires GNPS molecular networking data as input. It currently accepts data from the following GNPS workflows:
METABOLOMICS-SNETS
(data should be downloaded from the option Download Clustered Spectra as MGF
)METABOLOMICS-SNETS-V2
(Download Clustered Spectra as MGF
)FEATURE-BASED-MOLECULAR-NETWORKING
(Download Cytoscape Data
)METABOLOMICS-SNETS
workflowMETABOLOMICS-SNETS-V2
FEATURE-BASED-MOLECULAR-NETWORKING
NPLinker input GNPS file in the archive of Download Clustered Spectra as MGF
spectra.mgf: METABOLOMICS-SNETS*.mgf; molecular_families.tsv: networkedges_selfloop/*.pairsinfo; annotations.tsv: result_specnets_DB/*.tsv; file_mappings.tsv: clusterinfosummarygroup_attributes_withIDs_withcomponentID/*.tsv. For example, the file METABOLOMICS-SNETS*.mgf
from the downloaded zip archive is used as the spectra.mgf
input file of NPLinker.
When manually preparing GNPS data for NPLinker, the METABOLOMICS-SNETS*.mgf
must be renamed to spectra.mgf
and placed in the gnps
sub-directory of the NPLinker working directory.
Download Clustered Spectra as MGF
spectra.mgf: METABOLOMICS-SNETS-V2*.mgf; molecular_families.tsv: networkedges_selfloop/*.selfloop; annotations.tsv: result_specnets_DB/*.tsv; file_mappings.tsv: clusterinfosummarygroup_attributes_withIDs_withcomponentID/*.clustersummary NPLinker input GNPS file in the archive of Download Cytoscape Data
spectra.mgf: spectra/*.mgf; molecular_families.tsv: networkedges_selfloop/*.selfloop; annotations.tsv: DB_result/*.tsv; file_mappings.csv: quantification_table/*.csv. Note that file_mappings.csv
is a CSV file, not a TSV file, unlike in the other workflows.
NPLinker requires a fixed structure of working directory with fixed names for the input and output data.
root_dir # (1)!\n \u2502\n \u251c\u2500\u2500 nplinker.toml # (2)!\n \u251c\u2500\u2500 strain_mappings.json [F] # (3)!\n \u251c\u2500\u2500 strains_selected.json [F][O] # (4)!\n \u2502\n \u251c\u2500\u2500 gnps [F] # (5)!\n \u2502 \u251c\u2500\u2500 spectra.mgf [F]\n \u2502 \u251c\u2500\u2500 molecular_families.tsv [F]\n \u2502 \u251c\u2500\u2500 annotations.tsv [F]\n \u2502 \u2514\u2500\u2500 file_mappings.tsv (.csv) [F] # (6)!\n \u2502\n \u251c\u2500\u2500 antismash [F] # (7)!\n \u2502 \u251c\u2500\u2500 GCF_000514975.1\n \u2502 \u2502 \u251c\u2500\u2500 xxx.region001.gbk\n \u2502 \u2502 \u2514\u2500\u2500 ...\n \u2502 \u251c\u2500\u2500 GCF_000016425.1\n \u2502 \u2502 \u251c\u2500\u2500 xxxx.region001.gbk\n \u2502 \u2502 \u2514\u2500\u2500 ...\n \u2502 \u2514\u2500\u2500 ...\n \u2502\n \u251c\u2500\u2500 bigscape [F][O] # (8)!\n \u2502 \u251c\u2500\u2500 mix_clustering_c0.30.tsv [F] # (9)!\n \u2502 \u2514\u2500\u2500 bigscape_running_output\n \u2502 \u2514\u2500\u2500 ...\n \u2502\n \u251c\u2500\u2500 downloads [F][A] # (10)!\n \u2502 \u251c\u2500\u2500 paired_datarecord_4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4.json # (11)!\n \u2502 \u251c\u2500\u2500 GCF_000016425.1.zip\n \u2502 \u251c\u2500\u2500 GCF_0000514975.1.zip\n \u2502 \u251c\u2500\u2500 c22f44b14a3d450eb836d607cb9521bb.zip\n \u2502 \u251c\u2500\u2500 genome_status.json\n \u2502 \u2514\u2500\u2500 mibig_json_3.1.tar.gz\n \u2502\n \u251c\u2500\u2500 mibig [F][A] # (12)!\n \u2502 \u251c\u2500\u2500 BGC0000001.json\n \u2502 \u251c\u2500\u2500 BGC0000002.json\n \u2502 \u2514\u2500\u2500 ...\n \u2502\n \u251c\u2500\u2500 output [F][A] # (13)!\n \u2502 \u2514\u2500\u2500 ...\n \u2502\n \u2514\u2500\u2500 ... # (14)!\n
root_dir
is the working directory you created, used as the root directory for NPLinker. nplinker.toml
is the configuration file (toml format) provided by the user for running NPLinker. strain_mappings.json
contains the mappings from strain to genomics and metabolomics data. It is generated by NPLinker for podp
mode; for local
mode, users need to create it manually. [F]
means the file name nplinker.toml
is a fixed name (including the extension) and must be named as shown. strains_selected.json
is an optional file containing the list of strains to be used in the analysis. If it is not provided, NPLinker will use all strains detected from the input data. [O]
means the file strains_selected.json
is optional for users to provide. gnps
directory contains the GNPS data. The files in this directory must be named as shown. See the GNPS Data page for more information about the GNPS data. The file_mappings file can be in .tsv
or .csv
format.antismash
directory contains a collection of AntiSMASH BGC data. The BGC data (*.region*.gbk
files) must be stored in subdirectories named after NCBI accession number (e.g. GCF_000514975.1
).bigscape
directory is optional and contains the output of BigScape. If the directory is not provided, NPLinker will run BigScape automatically to generate the data using the AntiSMASH BGC data.mix_clustering_c0.30.tsv
is an example output of BigScape. The file name must follow the pattern mix_clustering_c{cutoff}.tsv
, where {cutoff}
is the cutoff value used in the BigScape run.downloads
directory is automatically created and managed by NPLinker. It stores the downloaded data from the internet. Users can also use it to store their own downloaded data. [A]
means the directory is automatically created and/or managed by NPLinker.downloads
directory.mibig
directory contains the MIBiG metadata, which is automatically created and downloaded by NPLinker. Users should not interfere with this directory and its content.output
directory is automatically created by NPLinker. It stores the output data of NPLinker.Tip
[F]
means the file or directory name is fixed and must be named as shown. The names are defined in the defaults module.[O]
means the file or directory is optional for users to provide. It does not mean the file or directory is optional for NPLinker to use. If it's not provided by the user, NPLinker may generate it.[A]
means the directory is automatically created and/or managed by NPLinker.The DatasetArranger is implemented according to the following flowcharts.
"},{"location":"diagrams/arranger/#strain-mappings-file","title":"Strain mappings file","text":"flowchart TD\n StrainMappings[`strain_mappings.json`] --> SM{Is the mode PODP?}\n SM --> |No |SM0[Validate the file]\n SM --> |Yes|SM1[Generate the file] --> SM0
"},{"location":"diagrams/arranger/#strain-selection-file","title":"Strain selection file","text":"flowchart TD\n StrainsSelected[`strains_selected.json`] --> S{Does the file exist?}\n S --> |No | S0[Nothing to do]\n S --> |Yes| S1[Validate the file]
"},{"location":"diagrams/arranger/#podp-project-metadata-json-file","title":"PODP project metadata json file","text":"flowchart TD\n podp[PODP project metadata json file] --> A{Is the mode PODP?}\n A --> |No | A0[Nothing to do]\n A --> |Yes| P{Does the file exist?}\n P --> |No | P0[Download the file] --> P1\n P --> |Yes| P1[Validate the file]
"},{"location":"diagrams/arranger/#gnps-antismash-and-bigscape","title":"GNPS, AntiSMASH and BigScape","text":"flowchart TD\n ConfigError[Dynaconf config validation error]\n DataError[Data validation error]\n UseIt[Use the data]\n Download[First remove existing data if relevent, then download or generate data]\n\n A[GNPS, antiSMASH and BigSCape] --> B{Pass Dynaconf config validation?}\n B -->|No | ConfigError\n B -->|Yes| G{Is the mode PODP?}\n\n G -->|No, local mode| G1{Does data dir exist?}\n G1 -->|No | DataError\n G1 -->|Yes| H{Pass data validation?}\n H --> |No | DataError\n H --> |Yes| UseIt \n\n G -->|Yes, podp mode| G2{Does data dir exist?}\n G2 --> |No | Download\n G2 --> |Yes | J{Pass data validation?}\n J -->|No | Download --> |try max 2 times| J\n J -->|Yes| UseIt
"},{"location":"diagrams/arranger/#mibig-data","title":"MIBiG Data","text":"MIBiG data is always downloaded automatically. Users cannot provide their own MIBiG data.
flowchart TD\n Mibig[MIBiG] --> M0{Pass Dynaconf config validation?}\n M0 -->|No | M01[Dynaconf config validation error]\n M0 -->|Yes | MibigDownload[First remove existing data if relevant and then download data]
"},{"location":"diagrams/loader/","title":"Dataset Loading Pipeline","text":"The DatasetLoader is implemented according to the following pipeline.
"}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"NPLinker","text":"NPLinker is a python framework for data mining microbial natural products by integrating genomics and metabolomics data.
For a deep understanding of NPLinker, please refer to the original paper.
Under Development
NPLinker v2 is under active development (see its pre-releases). The documentation is not complete yet. If you have any questions, please contact us via GitHub Issues.
"},{"location":"install/","title":"Installation","text":"RequirementsNPLinker is a python package that has both pypi packages and non-pypi packages as dependencies. It requires ~4.5GB of disk space to install all the dependencies.
Install nplinker
package as follows:
# Check python version (\u22653.9)\npython --version\n\n# Create a new virtual environment\npython -m venv env # (1)!\nsource env/bin/activate # (2)! \n\n# install nplinker package (requiring ~300MB of disk space)\npip install --pre nplinker # (3)!\n\n# install nplinker non-pypi dependencies and databases (~4GB)\ninstall-nplinker-deps\n
You can also use conda
to create a new environment, but NPLinker is not available on conda yet. Use the pip
command and make sure it is provided by the activated virtual environment. Installing the pre-release requires the --pre
option. You can also install NPLinker from source code:
Install from latest source codepip install git+https://github.com/nplinker/nplinker@dev # (1)!\ninstall-nplinker-deps\n
@dev
is the branch name. You can replace it with another branch name, a commit, or a tag.NPLinker uses the standard library logging module for managing log messages and the python library rich to colorize the log messages. Depending on how you use NPLinker, you can set up logging in different ways.
"},{"location":"logging/#nplinker-as-an-application","title":"NPLinker as an application","text":"If you're using NPLinker as an application, you're running the whole workflow of NPLinker as described in the Quickstart. In this case, you can set up logging in the nplinker configuration file nplinker.toml
.
If you're using NPLinker as a library, you're using only some functions and classes of NPLinker in your script. By default, NPLinker will not log any messages. However, you can set up logging in your script to log messages.
Set up logging in 'your_script.py'# Set up logging configuration first\nfrom nplinker import setup_logging\n\nsetup_logging(level=\"DEBUG\", file=\"nplinker.log\", use_console=True) # (1)!\n\n# Your business code here\n# e.g. download and extract nplinker example data\nfrom nplinker.utils import download_and_extract_archive\n\ndownload_and_extract_archive(\n url=\"https://zenodo.org/records/10822604/files/nplinker_local_mode_example.zip\",\n download_root=\".\",\n)\n
setup_logging
function sets up the logging configuration. The level
argument sets the logging level. The file
argument sets the log file. The use_console
argument sets whether to log messages to the console.The log messages will be written to the log file nplinker.log
and displayed in the console with a format like this: [Date Time] Level Log-message Module:Line
.
# Run your script\n$ python your_script.py\nDownloading nplinker_local_mode_example.zip \u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501 100.0% \u2022 195.3/195.3 MB \u2022 2.6 MB/s \u2022 0:00:00 \u2022 0:01:02 # (1)!\n[2024-05-10 15:14:48] INFO Extracting nplinker_local_mode_example.zip to . utils.py:401\n\n# Check the log file\n$ cat nplinker.log\n[2024-05-10 15:14:48] INFO Extracting nplinker_local_mode_example.zip to . utils.py:401\n
NPLinker allows you to run in two modes:
local
modepodp
mode The local
mode assumes that the data required by NPLinker is available on your local machine.
The required input data includes:
METABOLOMICS-SNETS
,METABOLOMICS-SNETS-V2
FEATURE-BASED-MOLECULAR-NETWORKING
The podp
mode assumes that you use an identifier of the Paired Omics Data Platform (PODP) as the input for NPLinker. NPLinker will then download and prepare all necessary data based on the PODP id, which refers to the metadata of the dataset.
So, which mode will you use? The answer is important for the next steps.
"},{"location":"quickstart/#1-create-a-working-directory","title":"1. Create a working directory","text":"The working directory is used to store all input and output data for NPLinker. You can name this directory as you like, for example nplinker_quickstart
:
mkdir nplinker_quickstart\n
Important
Before going to the next step, make sure you get familiar with how NPLinker organizes data in the working directory, see Working Directory Structure page.
"},{"location":"quickstart/#2-prepare-input-data-local-mode-only","title":"2. Prepare input data (local
mode only)","text":"Details Skip this step if you choose to use the podp
mode.
If you choose to use the local
mode, meaning you have input data of NPLinker stored on your local machine, you need to move the input data to the working directory created in the previous step.
NPLinker accepts data from the output of the following GNPS workflows:
METABOLOMICS-SNETS
METABOLOMICS-SNETS-V2
FEATURE-BASED-MOLECULAR-NETWORKING
.NPLinker provides the tools GNPSDownloader
and GNPSExtractor
to download and extract the GNPS data with ease. All you need to provide is a valid GNPS task ID, referring to a task of one of the GNPS workflows supported by NPLinker.
Given an example of GNPS task at https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=c22f44b14a3d450eb836d607cb9521bb, the task id is the last part of this url, i.e. c22f44b14a3d450eb836d607cb9521bb
. Opening this link, you can find the workflow info in the \"Workflow\" row of the \"Job Status\" table; in this case, it is METABOLOMICS-SNETS
.
import os\nfrom nplinker.metabolomics.gnps import GNPSDownloader, GNPSExtractor\n\n# Go to the working directory\nos.chdir(\"nplinker_quickstart\")\n\n# Download GNPS data & get the path to the downloaded archive\ndownloader = GNPSDownloader(\"gnps_task_id\", \"downloads\") # (1)!\ndownloaded_archive = downloader.download().get_download_file()\n\n# Extract GNPS data to `gnps` directory\nextractor = GNPSExtractor(downloaded_archive, \"gnps\") # (2)!\n
downloaded_archive
with the actual path to your GNPS data archive if you skipped the download steps.The required data for NPLinker will be extracted to the gnps
subdirectory of the working directory.
Info
Not all GNPS data are required by NPLinker, and only the necessary data will be extracted. During the extraction, these data will be renamed to the standard names used by NPLinker. See the page GNPS Data for more information.
Prepare GNPS data manuallyIf you have GNPS data but it is not in the archive format as downloaded from GNPS, it's recommended to re-download the data from GNPS.
If (re-)downloading is not possible, you could manually prepare data for the gnps
directory. In this case, you must make sure that the data is organized as expected by NPLinker. See the page GNPS Data for examples of how to prepare the data.
NPLinker requires AntiSMASH BGC data as input, which are organized in the antismash
subdirectory of the working directory.
For each AntiSMASH run output, the BGC data must be stored in a subdirectory named after the NCBI accession number (e.g. GCF_000514975.1
). Only the *.region*.gbk
files are required by NPLinker.
When manually preparing AntiSMASH data for NPLinker, you must make sure that the data is organized as expected by NPLinker. See the page Working Directory Structure for more information.
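As an illustration, here is a minimal sketch of copying the BGC files of one genome into that layout (the source directory and the accession number are hypothetical examples):
from pathlib import Path\nimport shutil\n\n# Hypothetical paths: one AntiSMASH run output and the NPLinker working directory\nantismash_output = Path(\"my_antismash_run\")\ntarget = Path(\"nplinker_quickstart/antismash/GCF_000514975.1\")\ntarget.mkdir(parents=True, exist_ok=True)\n\n# Only the *.region*.gbk files are needed by NPLinker\nfor gbk in antismash_output.glob(\"*.region*.gbk\"):\n    shutil.copy(gbk, target / gbk.name)\n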
"},{"location":"quickstart/#bigscape-data-optional","title":"BigScape data (optional)","text":"It is optional to provide the output of BigScape to NPLinker. If the output of BigScape is not provided, NPLinker will run BigScape automatically to generate the data using the AntiSMASH BGC data.
If you have the output of BigScape, you can put its mix_clustering_c{cutoff}.tsv
file in the bigscape
subdirectory of the NPLinker working directory, where {cutoff}
is the cutoff value used in the BigScape run.
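As an illustration, a minimal sketch of copying an existing BigScape clustering file into the working directory (both paths are hypothetical examples; only the naming pattern is fixed):
from pathlib import Path\nimport shutil\n\n# Hypothetical paths: adjust to your own BigScape output and working directory\nbigscape_output = Path(\"my_bigscape_run/mix_clustering_c0.30.tsv\")\nbigscape_dir = Path(\"nplinker_quickstart/bigscape\")\nbigscape_dir.mkdir(parents=True, exist_ok=True)\n\n# The copied file must keep the `mix_clustering_c{cutoff}.tsv` naming pattern\nshutil.copy(bigscape_output, bigscape_dir / bigscape_output.name)\n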
The strain mappings file strain_mappings.json
is required by NPLinker to map the strain to genomics and metabolomics data.
{\n \"strain_mappings\": [\n {\n \"strain_id\": \"strain_id_1\", # (1)!\n \"strain_alias\": [\"bgc_id_1\", \"spectrum_id_1\", ...] # (2)!\n },\n {\n \"strain_id\": \"strain_id_2\",\n \"strain_alias\": [\"bgc_id_2\", \"spectrum_id_2\", ...]\n },\n ...\n ],\n \"version\": \"1.0\" # (3)!\n}\n
strain_id
is the unique identifier of the strain.strain_alias
is a list of aliases of the strain, which are the identifiers of the BGCs and spectra of the strain.version
is the schema version of this file. It is recommended to use the latest version of the schema. The current latest version is 1.0
. The BGC id is the same as the name of the BGC file in the antismash
directory, for example, given a BGC file xxxx.region001.gbk
, the BGC id is xxxx.region001
.
The spectrum id is the same as the scan number in the spectra.mgf
file in the gnps
directory, for example, given a spectrum in the mgf file with a scan SCANS=1
, the spectrum id is 1
.
If you labelled the mzXML files (input for GNPS) with the strain id, you may need the function extract_mappings_ms_filename_spectrum_id to extract the mappings from mzXML files to the spectrum ids.
For the local
mode, you need to create this file manually and put it in the working directory. It takes some effort to prepare this file manually, especially when you have a large number of strains.
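As a starting point, here is a minimal sketch of generating strain_mappings.json with the standard library, following the schema shown above (the strain ids and aliases are hypothetical placeholders):
import json\n\n# Hypothetical ids: replace with your own strain ids, BGC ids and spectrum ids\nmappings = {\n    \"strain_mappings\": [\n        {\"strain_id\": \"strain_1\", \"strain_alias\": [\"strain_1.region001\", \"1\"]},\n        {\"strain_id\": \"strain_2\", \"strain_alias\": [\"strain_2.region001\", \"2\"]},\n    ],\n    \"version\": \"1.0\",\n}\n\n# Write the file to the working directory\nwith open(\"nplinker_quickstart/strain_mappings.json\", \"w\") as f:\n    json.dump(mappings, f, indent=4)\n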
The configuration file nplinker.toml
is required by NPLinker to specify the working directory, mode, and other settings for the run of NPLinker. You can put the nplinker.toml
file in any place, but it is recommended to put it in the working directory created in step 2.
The details of all settings can be found at this page Config File.
To keep it simple, default settings will be used automatically by NPLinker if you don't set them in your nplinker.toml
config file.
What you need to do is to set the root_dir
and mode
in the nplinker.toml
file.
local
modepodp
mode nplinker.tomlroot_dir = \"absolute/path/to/working/directory\" # (1)!\nmode = \"local\"\n# and other settings you want to override the default settings \n
absolute/path/to/working/directory
with the absolute path to the working directory created in step 2.root_dir = \"absolute/path/to/working/directory\" # (1)!\nmode = \"podp\"\npodp_id = \"podp_id\" # (2)!\n# and other settings you want to override the default settings \n
absolute/path/to/working/directory
with the absolute path to the working directory created in step 2.podp_id
with the identifier of the dataset in the Paired Omics Data Platform (PODP).Before running NPLinker, make sure your working directory has the correct directory structure and names as described in the Working Directory Structure page.
Run NPLinker in your working directoryfrom nplinker import NPLinker\n\n# create an instance of NPLinker\nnpl = NPLinker(\"nplinker.toml\") # (1)!\n\n# load data\nnpl.load_data()\n\n# check loaded data\nprint(npl.bgcs)\nprint(npl.gcfs)\nprint(npl.spectra)\nprint(npl.mfs)\nprint(npl.strains)\n\n# compute the links for the first 3 GCFs using metcalf scoring method\nlink_graph = npl.get_links(npl.gcfs[:3], \"metcalf\") # (2)!\n\n# get links as a list of tuples\nlink_graph.links \n\n# get the link data between two objects or entities\nlink_graph.get_link_data(npl.gcfs[0], npl.spectra[0]) \n\n# Save data to a pickle file\nnpl.save_data(\"npl.pkl\", link_graph)\n
nplinker.toml
with the actual path to your configuration file.get_links
returns a LinkGraph object that represents the calculated links between the GCFs and other entities as a graph.For more info about the classes and methods, see the API Documentation.
"},{"location":"api/antismash/","title":"AntiSMASH","text":""},{"location":"api/antismash/#nplinker.genomics.antismash","title":"nplinker.genomics.antismash","text":""},{"location":"api/antismash/#nplinker.genomics.antismash.AntismashBGCLoader","title":"AntismashBGCLoader","text":"AntismashBGCLoader(data_dir: str | PathLike)\n
Bases: BGCLoaderBase
Data loader for AntiSMASH BGC genbank (.gbk) files.
Parameters:
data_dir
(str | PathLike
) \u2013 Path to AntiSMASH directory that contains a collection of AntiSMASH outputs.
The input data_dir
must follow the structure defined in the Working Directory Structure for AntiSMASH data, e.g.:
antismash\n \u251c\u2500\u2500 genome_id_1 # one AntiSMASH output, e.g. GCF_000514775.1\n \u2502\u00a0 \u251c\u2500\u2500 NZ_AZWO01000004.region001.gbk\n \u2502\u00a0 \u2514\u2500\u2500 ...\n \u251c\u2500\u2500 genome_id_2\n \u2502\u00a0 \u251c\u2500\u2500 ...\n \u2514\u2500\u2500 ...\n
Source code in src/nplinker/genomics/antismash/antismash_loader.py
def __init__(self, data_dir: str | PathLike) -> None:\n \"\"\"Initialize the AntiSMASH BGC loader.\n\n Args:\n data_dir: Path to AntiSMASH directory that contains a collection of AntiSMASH outputs.\n\n Notes:\n The input `data_dir` must follow the structure defined in the\n [Working Directory Structure][working-directory-structure] for AntiSMASH data, e.g.:\n ```shell\n antismash\n \u251c\u2500\u2500 genome_id_1 # one AntiSMASH output, e.g. GCF_000514775.1\n \u2502\u00a0 \u251c\u2500\u2500 NZ_AZWO01000004.region001.gbk\n \u2502\u00a0 \u2514\u2500\u2500 ...\n \u251c\u2500\u2500 genome_id_2\n \u2502\u00a0 \u251c\u2500\u2500 ...\n \u2514\u2500\u2500 ...\n ```\n \"\"\"\n self.data_dir = str(data_dir)\n self._file_dict = self._parse_data_dir(self.data_dir)\n self._bgcs = self._parse_bgcs(self._file_dict)\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.AntismashBGCLoader.data_dir","title":"data_dir instance-attribute
","text":"data_dir = str(data_dir)\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.AntismashBGCLoader.get_bgc_genome_mapping","title":"get_bgc_genome_mapping","text":"get_bgc_genome_mapping() -> dict[str, str]\n
Get the mapping from BGC to genome.
Info
The directory name of the gbk files is treated as genome id.
Returns:
dict[str, str]
\u2013 The key is BGC name (gbk file name) and value is genome id (the directory name of the
dict[str, str]
\u2013 gbk file).
src/nplinker/genomics/antismash/antismash_loader.py
def get_bgc_genome_mapping(self) -> dict[str, str]:\n \"\"\"Get the mapping from BGC to genome.\n\n !!! info\n The directory name of the gbk files is treated as genome id.\n\n Returns:\n The key is BGC name (gbk file name) and value is genome id (the directory name of the\n gbk file).\n \"\"\"\n return {\n bid: os.path.basename(os.path.dirname(bpath)) for bid, bpath in self._file_dict.items()\n }\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.AntismashBGCLoader.get_files","title":"get_files","text":"get_files() -> dict[str, str]\n
Get BGC gbk files.
Returns:
dict[str, str]
\u2013 The key is BGC name (gbk file name) and value is path to the gbk file.
src/nplinker/genomics/antismash/antismash_loader.py
def get_files(self) -> dict[str, str]:\n \"\"\"Get BGC gbk files.\n\n Returns:\n The key is BGC name (gbk file name) and value is path to the gbk file.\n \"\"\"\n return self._file_dict\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.AntismashBGCLoader.get_bgcs","title":"get_bgcs","text":"get_bgcs() -> list[BGC]\n
Get all BGC objects.
Returns:
list[BGC]
\u2013 A list of BGC objects
src/nplinker/genomics/antismash/antismash_loader.py
def get_bgcs(self) -> list[BGC]:\n \"\"\"Get all BGC objects.\n\n Returns:\n A list of BGC objects\n \"\"\"\n return self._bgcs\n
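For instance, a short usage sketch of this loader (the data directory path is a hypothetical example):
from nplinker.genomics.antismash import AntismashBGCLoader\n\n# Hypothetical path: an `antismash` directory following the structure above\nloader = AntismashBGCLoader(\"nplinker_quickstart/antismash\")\n\n# Map each BGC name (gbk file name) to its genome id (sub-directory name)\nbgc2genome = loader.get_bgc_genome_mapping()\n\n# Parsed BGC objects for all gbk files found in the directory\nbgcs = loader.get_bgcs()\nprint(len(bgcs))\n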
"},{"location":"api/antismash/#nplinker.genomics.antismash.GenomeStatus","title":"GenomeStatus","text":"GenomeStatus(\n original_id: str,\n resolved_refseq_id: str = \"\",\n resolve_attempted: bool = False,\n bgc_path: str = \"\",\n)\n
Class to represent the status of a single genome.
The status of genomes is tracked in the file GENOME_STATUS_FILENAME.
Parameters:
original_id
(str
) \u2013 The original ID of the genome.
resolved_refseq_id
(str
, default: ''
) \u2013 The resolved RefSeq ID of the genome. Defaults to \"\".
resolve_attempted
(bool
, default: False
) \u2013 A flag indicating whether an attempt to resolve the RefSeq ID has been made. Defaults to False.
bgc_path
(str
, default: ''
) \u2013 The path to the downloaded BGC file for the genome. Defaults to \"\".
src/nplinker/genomics/antismash/podp_antismash_downloader.py
def __init__(\n self,\n original_id: str,\n resolved_refseq_id: str = \"\",\n resolve_attempted: bool = False,\n bgc_path: str = \"\",\n):\n \"\"\"Initialize a GenomeStatus object for the given genome.\n\n Args:\n original_id: The original ID of the genome.\n resolved_refseq_id: The resolved RefSeq ID of the\n genome. Defaults to \"\".\n resolve_attempted: A flag indicating whether an\n attempt to resolve the RefSeq ID has been made. Defaults to False.\n bgc_path: The path to the downloaded BGC file for\n the genome. Defaults to \"\".\n \"\"\"\n self.original_id = original_id\n self.resolved_refseq_id = \"\" if resolved_refseq_id == \"None\" else resolved_refseq_id\n self.resolve_attempted = resolve_attempted\n self.bgc_path = bgc_path\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.GenomeStatus.original_id","title":"original_id instance-attribute
","text":"original_id = original_id\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.GenomeStatus.resolved_refseq_id","title":"resolved_refseq_id instance-attribute
","text":"resolved_refseq_id = (\n \"\"\n if resolved_refseq_id == \"None\"\n else resolved_refseq_id\n)\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.GenomeStatus.resolve_attempted","title":"resolve_attempted instance-attribute
","text":"resolve_attempted = resolve_attempted\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.GenomeStatus.bgc_path","title":"bgc_path instance-attribute
","text":"bgc_path = bgc_path\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.GenomeStatus.read_json","title":"read_json staticmethod
","text":"read_json(\n file: str | PathLike,\n) -> dict[str, \"GenomeStatus\"]\n
Get a dict of GenomeStatus objects by loading given genome status file.
Note that an empty dict is returned if the given file doesn't exist.
Parameters:
file
(str | PathLike
) \u2013 Path to genome status file.
Returns:
dict[str, 'GenomeStatus']
\u2013 Dict keys are genome original id and values are GenomeStatus objects. An empty dict is returned if the given file doesn't exist.
src/nplinker/genomics/antismash/podp_antismash_downloader.py
@staticmethod\ndef read_json(file: str | PathLike) -> dict[str, \"GenomeStatus\"]:\n \"\"\"Get a dict of GenomeStatus objects by loading given genome status file.\n\n Note that an empty dict is returned if the given file doesn't exist.\n\n Args:\n file: Path to genome status file.\n\n Returns:\n Dict keys are genome original id and values are GenomeStatus\n objects. An empty dict is returned if the given file doesn't exist.\n \"\"\"\n genome_status_dict = {}\n if Path(file).exists():\n with open(file, \"r\") as f:\n data = json.load(f)\n\n # validate json data before using it\n validate(data, schema=GENOME_STATUS_SCHEMA)\n\n genome_status_dict = {\n gs[\"original_id\"]: GenomeStatus(**gs) for gs in data[\"genome_status\"]\n }\n return genome_status_dict\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.GenomeStatus.to_json","title":"to_json staticmethod
","text":"to_json(\n genome_status_dict: Mapping[str, \"GenomeStatus\"],\n file: str | PathLike | None = None,\n) -> str | None\n
Convert the genome status dictionary to a JSON string.
If a file path is provided, the JSON string is written to the file. If the file already exists, it is overwritten.
Parameters:
genome_status_dict
(Mapping[str, 'GenomeStatus']
) \u2013 A dictionary of genome status objects. The keys are the original genome IDs and the values are GenomeStatus objects.
file
(str | PathLike | None
, default: None
) \u2013 The path to the output JSON file. If None, the JSON string is returned but not written to a file.
Returns:
str | None
\u2013 The JSON string if file
is None, otherwise None.
src/nplinker/genomics/antismash/podp_antismash_downloader.py
@staticmethod\ndef to_json(\n genome_status_dict: Mapping[str, \"GenomeStatus\"], file: str | PathLike | None = None\n) -> str | None:\n \"\"\"Convert the genome status dictionary to a JSON string.\n\n If a file path is provided, the JSON string is written to the file. If\n the file already exists, it is overwritten.\n\n Args:\n genome_status_dict: A dictionary of genome\n status objects. The keys are the original genome IDs and the values\n are GenomeStatus objects.\n file: The path to the output JSON file.\n If None, the JSON string is returned but not written to a file.\n\n Returns:\n The JSON string if `file` is None, otherwise None.\n \"\"\"\n gs_list = [gs._to_dict() for gs in genome_status_dict.values()]\n json_data = {\"genome_status\": gs_list, \"version\": \"1.0\"}\n\n # validate json object before dumping\n validate(json_data, schema=GENOME_STATUS_SCHEMA)\n\n if file is not None:\n with open(file, \"w\") as f:\n json.dump(json_data, f)\n return None\n return json.dumps(json_data)\n
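A small round-trip sketch of these two helpers (the file name is a hypothetical example):
from nplinker.genomics.antismash import GenomeStatus\n\n# Track the status of one genome and serialize it\nstatus = {\"GCF_004339725.1\": GenomeStatus(\"GCF_004339725.1\", resolve_attempted=True)}\nprint(GenomeStatus.to_json(status))  # no file given, so the JSON string is returned\n\n# Write to a file and load it back (hypothetical file name)\nGenomeStatus.to_json(status, \"genome_status.json\")\nloaded = GenomeStatus.read_json(\"genome_status.json\")\nprint(loaded[\"GCF_004339725.1\"].resolve_attempted)\n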
"},{"location":"api/antismash/#nplinker.genomics.antismash.download_and_extract_antismash_data","title":"download_and_extract_antismash_data","text":"download_and_extract_antismash_data(\n antismash_id: str,\n download_root: str | PathLike,\n extract_root: str | PathLike,\n) -> None\n
Download and extract antiSMASH BGC archive for a specified genome.
The antiSMASH database (https://antismash-db.secondarymetabolites.org/) is used to download the BGC archive, and antiSMASH uses the RefSeq assembly id of a genome as the id of the archive.
Parameters:
antismash_id
(str
) \u2013 The id used to download BGC archive from antiSMASH database. If the id is versioned (e.g., \"GCF_004339725.1\") please be sure to specify the version as well.
download_root
(str | PathLike
) \u2013 Path to the directory to place downloaded archive in.
extract_root
(str | PathLike
) \u2013 Path to the directory data files will be extracted to. Note that an antismash
directory will be created in the specified extract_root
if it doesn't exist. The files will be extracted to <extract_root>/antismash/<antismash_id>
directory.
Raises:
ValueError
\u2013 if <extract_root>/antismash/<refseq_assembly_id>
dir is not empty.
Examples:
>>> download_and_extract_antismash_data(\"GCF_004339725.1\", \"/data/download\", \"/data/extracted\")\n
Source code in src/nplinker/genomics/antismash/antismash_downloader.py
def download_and_extract_antismash_data(\n antismash_id: str, download_root: str | PathLike, extract_root: str | PathLike\n) -> None:\n \"\"\"Download and extract antiSMASH BGC archive for a specified genome.\n\n The antiSMASH database (https://antismash-db.secondarymetabolites.org/)\n is used to download the BGC archive. And antiSMASH use RefSeq assembly id\n of a genome as the id of the archive.\n\n Args:\n antismash_id: The id used to download BGC archive from antiSMASH database.\n If the id is versioned (e.g., \"GCF_004339725.1\") please be sure to\n specify the version as well.\n download_root: Path to the directory to place downloaded archive in.\n extract_root: Path to the directory data files will be extracted to.\n Note that an `antismash` directory will be created in the specified `extract_root` if\n it doesn't exist. The files will be extracted to `<extract_root>/antismash/<antismash_id>` directory.\n\n Raises:\n ValueError: if `<extract_root>/antismash/<refseq_assembly_id>` dir is not empty.\n\n Examples:\n >>> download_and_extract_antismash_metadata(\"GCF_004339725.1\", \"/data/download\", \"/data/extracted\")\n \"\"\"\n download_root = Path(download_root)\n extract_root = Path(extract_root)\n extract_path = extract_root / \"antismash\" / antismash_id\n\n try:\n if extract_path.exists():\n _check_extract_path(extract_path)\n else:\n extract_path.mkdir(parents=True, exist_ok=True)\n\n for base_url in [ANTISMASH_DB_DOWNLOAD_URL, ANTISMASH_DBV2_DOWNLOAD_URL]:\n url = base_url.format(antismash_id, antismash_id + \".zip\")\n download_and_extract_archive(url, download_root, extract_path, antismash_id + \".zip\")\n break\n\n # delete subdirs\n for subdir_path in list_dirs(extract_path):\n shutil.rmtree(subdir_path)\n\n # delete unnecessary files\n files_to_keep = list_files(extract_path, suffix=(\".json\", \".gbk\"))\n for file in list_files(extract_path):\n if file not in files_to_keep:\n os.remove(file)\n\n logger.info(\"antiSMASH BGC data of %s is downloaded and extracted.\", antismash_id)\n\n except Exception as e:\n shutil.rmtree(extract_path)\n logger.warning(e)\n raise e\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.parse_bgc_genbank","title":"parse_bgc_genbank","text":"parse_bgc_genbank(file: str | PathLike) -> BGC\n
Parse a single BGC gbk file to BGC object.
Parameters:
file
(str | PathLike
) \u2013 Path to BGC gbk file
Returns:
BGC
\u2013 BGC object
Examples:
>>> bgc = parse_bgc_genbank(\n... \"/data/antismash/GCF_000016425.1/NC_009380.1.region001.gbk\")\n
Source code in src/nplinker/genomics/antismash/antismash_loader.py
def parse_bgc_genbank(file: str | PathLike) -> BGC:\n \"\"\"Parse a single BGC gbk file to BGC object.\n\n Args:\n file: Path to BGC gbk file\n\n Returns:\n BGC object\n\n Examples:\n >>> bgc = AntismashBGCLoader.parse_bgc(\n ... \"/data/antismash/GCF_000016425.1/NC_009380.1.region001.gbk\")\n \"\"\"\n file = Path(file)\n fname = file.stem\n\n record = SeqIO.read(file, format=\"genbank\")\n description = record.description # \"DEFINITION\" in gbk file\n antismash_id = record.id # \"VERSION\" in gbk file\n features = _parse_antismash_genbank(record)\n product_prediction = features.get(\"product\")\n if product_prediction is None:\n raise ValueError(f\"Not found product prediction in antiSMASH Genbank file {file}\")\n\n # init BGC\n bgc = BGC(fname, *product_prediction)\n bgc.description = description\n bgc.antismash_id = antismash_id\n bgc.antismash_file = str(file)\n bgc.antismash_region = features.get(\"region_number\")\n bgc.smiles = features.get(\"smiles\")\n bgc.strain = Strain(fname)\n return bgc\n
"},{"location":"api/antismash/#nplinker.genomics.antismash.get_best_available_genome_id","title":"get_best_available_genome_id","text":"get_best_available_genome_id(\n genome_id_data: Mapping[str, str]\n) -> str | None\n
Get the best available ID from genome_id_data dict.
Parameters:
genome_id_data
(Mapping[str, str]
) \u2013 dictionary containing information for each genome record present.
Returns:
str | None
\u2013 ID for the genome, if present, otherwise None.
src/nplinker/genomics/antismash/podp_antismash_downloader.py
def get_best_available_genome_id(genome_id_data: Mapping[str, str]) -> str | None:\n \"\"\"Get the best available ID from genome_id_data dict.\n\n Args:\n genome_id_data: dictionary containing information for each genome record present.\n\n Returns:\n ID for the genome, if present, otherwise None.\n \"\"\"\n if \"RefSeq_accession\" in genome_id_data:\n best_id = genome_id_data[\"RefSeq_accession\"]\n elif \"GenBank_accession\" in genome_id_data:\n best_id = genome_id_data[\"GenBank_accession\"]\n elif \"JGI_Genome_ID\" in genome_id_data:\n best_id = genome_id_data[\"JGI_Genome_ID\"]\n else:\n best_id = None\n\n if best_id is None or len(best_id) == 0:\n logger.warning(f\"Failed to get valid genome ID in genome data: {genome_id_data}\")\n return None\n return best_id\n
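For example (the genome record data is hypothetical):
from nplinker.genomics.antismash import get_best_available_genome_id\n\n# The RefSeq accession takes priority over GenBank and JGI ids\ngenome_id_data = {\n    \"RefSeq_accession\": \"GCF_004339725.1\",\n    \"GenBank_accession\": \"GCA_004339725.1\",\n}\nprint(get_best_available_genome_id(genome_id_data))  # GCF_004339725.1\n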
"},{"location":"api/antismash/#nplinker.genomics.antismash.podp_download_and_extract_antismash_data","title":"podp_download_and_extract_antismash_data","text":"podp_download_and_extract_antismash_data(\n genome_records: Sequence[\n Mapping[str, Mapping[str, str]]\n ],\n project_download_root: str | PathLike,\n project_extract_root: str | PathLike,\n)\n
Download and extract antiSMASH BGC archive for the given genome records.
Parameters:
genome_records
(Sequence[Mapping[str, Mapping[str, str]]]
) \u2013 list of dicts representing genome records.
The dict of each genome record contains a key of genome ID with a value of another dict containing information about genome type, label and accession ids (RefSeq, GenBank, and/or JGI).
project_download_root
(str | PathLike
) \u2013 Path to the directory to place downloaded archive in.
project_extract_root
(str | PathLike
) \u2013 Path to the directory downloaded archive will be extracted to.
Note that an antismash
directory will be created in the specified extract_root
if it doesn't exist. The files will be extracted to <extract_root>/antismash/<antismash_id>
directory.
Warns:
UserWarning
\u2013 when no antiSMASH data is found for some genomes.
src/nplinker/genomics/antismash/podp_antismash_downloader.py
def podp_download_and_extract_antismash_data(\n genome_records: Sequence[Mapping[str, Mapping[str, str]]],\n project_download_root: str | PathLike,\n project_extract_root: str | PathLike,\n):\n \"\"\"Download and extract antiSMASH BGC archive for the given genome records.\n\n Args:\n genome_records: list of dicts representing genome records.\n\n The dict of each genome record contains a key of genome ID with a value\n of another dict containing information about genome type, label and\n accession ids (RefSeq, GenBank, and/or JGI).\n project_download_root: Path to the directory to place\n downloaded archive in.\n project_extract_root: Path to the directory downloaded archive will be extracted to.\n\n Note that an `antismash` directory will be created in the specified\n `extract_root` if it doesn't exist. The files will be extracted to\n `<extract_root>/antismash/<antismash_id>` directory.\n\n Warnings:\n UserWarning: when no antiSMASH data is found for some genomes.\n \"\"\"\n if not Path(project_download_root).exists():\n # otherwise in case of failed first download, the folder doesn't exist and\n # genome_status_file can't be written\n Path(project_download_root).mkdir(parents=True, exist_ok=True)\n\n gs_file = Path(project_download_root, GENOME_STATUS_FILENAME)\n gs_dict = GenomeStatus.read_json(gs_file)\n\n for i, genome_record in enumerate(genome_records):\n # get the best available ID from the dict\n genome_id_data = genome_record[\"genome_ID\"]\n raw_genome_id = get_best_available_genome_id(genome_id_data)\n if raw_genome_id is None or len(raw_genome_id) == 0:\n logger.warning(f'Invalid input genome record \"{genome_record}\"')\n continue\n\n # check if genome ID exist in the genome status file\n if raw_genome_id not in gs_dict:\n gs_dict[raw_genome_id] = GenomeStatus(raw_genome_id)\n\n gs_obj = gs_dict[raw_genome_id]\n\n logger.info(\n f\"Checking for antismash data {i + 1}/{len(genome_records)}, \"\n f\"current genome ID={raw_genome_id}\"\n )\n # first, check if BGC data is downloaded\n if gs_obj.bgc_path and Path(gs_obj.bgc_path).exists():\n logger.info(f\"Genome ID {raw_genome_id} already downloaded to {gs_obj.bgc_path}\")\n continue\n # second, check if lookup attempted previously\n if gs_obj.resolve_attempted:\n logger.info(f\"Genome ID {raw_genome_id} skipped due to previous failed attempt\")\n continue\n\n # if not downloaded or lookup attempted, then try to resolve the ID\n # and download\n logger.info(f\"Start lookup process for genome ID {raw_genome_id}\")\n gs_obj.resolved_refseq_id = _resolve_refseq_id(genome_id_data)\n gs_obj.resolve_attempted = True\n\n if gs_obj.resolved_refseq_id == \"\":\n # give up on this one\n logger.warning(f\"Failed lookup for genome ID {raw_genome_id}\")\n continue\n\n # if resolved id is valid, try to download and extract antismash data\n try:\n download_and_extract_antismash_data(\n gs_obj.resolved_refseq_id, project_download_root, project_extract_root\n )\n\n gs_obj.bgc_path = str(\n Path(project_download_root, gs_obj.resolved_refseq_id + \".zip\").absolute()\n )\n\n output_path = Path(project_extract_root, \"antismash\", gs_obj.resolved_refseq_id)\n if output_path.exists():\n Path.touch(output_path / \"completed\", exist_ok=True)\n\n except Exception:\n gs_obj.bgc_path = \"\"\n\n # raise and log warning for failed downloads\n failed_ids = [gs.original_id for gs in gs_dict.values() if not gs.bgc_path]\n if failed_ids:\n warning_message = (\n f\"Failed to download antiSMASH data for the following genome IDs: {failed_ids}\"\n )\n 
logger.warning(warning_message)\n warnings.warn(warning_message, UserWarning)\n\n # save updated genome status to json file\n GenomeStatus.to_json(gs_dict, gs_file)\n\n if len(failed_ids) == len(genome_records):\n raise ValueError(\"No antiSMASH data found for any genome\")\n
"},{"location":"api/arranger/","title":"Dataset Arranger","text":""},{"location":"api/arranger/#nplinker.arranger","title":"nplinker.arranger","text":""},{"location":"api/arranger/#nplinker.arranger.PODP_PROJECT_URL","title":"PODP_PROJECT_URL module-attribute
","text":"PODP_PROJECT_URL = \"https://pairedomicsdata.bioinformatics.nl/api/projects/{}\"\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger","title":"DatasetArranger","text":"DatasetArranger(config: Dynaconf)\n
Arrange datasets based on the fixed working directory structure with the given configuration.
Concept and DiagramWorking Directory Structure
Dataset Arranging Pipeline
\"Arrange datasets\" means:
local
mode (config.mode
is local
), the datasets provided by users are validated.podp
mode (config.mode
is podp
), the datasets are automatically downloaded or generated, then validated.The datasets include MIBiG, GNPS, antiSMASH, and BiG-SCAPE data.
Attributes:
config
\u2013 A Dynaconf object that contains the configuration settings.
root_dir
\u2013 The root directory of the datasets.
downloads_dir
\u2013 The directory to store downloaded files.
mibig_dir
\u2013 The directory to store MIBiG metadata.
gnps_dir
\u2013 The directory to store GNPS data.
antismash_dir
\u2013 The directory to store antiSMASH data.
bigscape_dir
\u2013 The directory to store BiG-SCAPE data.
bigscape_running_output_dir
\u2013 The directory to store the running output of BiG-SCAPE.
Parameters:
config
(Dynaconf
) \u2013 A Dynaconf object that contains the configuration settings.
Examples:
>>> from nplinker.config import load_config\n>>> from nplinker.arranger import DatasetArranger\n>>> config = load_config(\"nplinker.toml\")\n>>> arranger = DatasetArranger(config)\n>>> arranger.arrange()\n
See Also DatasetLoader: Load all data from files to memory.
Source code insrc/nplinker/arranger.py
def __init__(self, config: Dynaconf) -> None:\n \"\"\"Initialize the DatasetArranger.\n\n Args:\n config: A Dynaconf object that contains the configuration settings.\n\n\n Examples:\n >>> from nplinker.config import load_config\n >>> from nplinker.arranger import DatasetArranger\n >>> config = load_config(\"nplinker.toml\")\n >>> arranger = DatasetArranger(config)\n >>> arranger.arrange()\n\n See Also:\n [DatasetLoader][nplinker.loader.DatasetLoader]: Load all data from files to memory.\n \"\"\"\n self.config = config\n self.root_dir = config.root_dir\n self.downloads_dir = self.root_dir / defaults.DOWNLOADS_DIRNAME\n self.downloads_dir.mkdir(exist_ok=True)\n\n self.mibig_dir = self.root_dir / defaults.MIBIG_DIRNAME\n self.gnps_dir = self.root_dir / defaults.GNPS_DIRNAME\n self.antismash_dir = self.root_dir / defaults.ANTISMASH_DIRNAME\n self.bigscape_dir = self.root_dir / defaults.BIGSCAPE_DIRNAME\n self.bigscape_running_output_dir = (\n self.bigscape_dir / defaults.BIGSCAPE_RUNNING_OUTPUT_DIRNAME\n )\n\n self.arrange_podp_project_json()\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.config","title":"config instance-attribute
","text":"config = config\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.root_dir","title":"root_dir instance-attribute
","text":"root_dir = root_dir\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.downloads_dir","title":"downloads_dir instance-attribute
","text":"downloads_dir = root_dir / DOWNLOADS_DIRNAME\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.mibig_dir","title":"mibig_dir instance-attribute
","text":"mibig_dir = root_dir / MIBIG_DIRNAME\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.gnps_dir","title":"gnps_dir instance-attribute
","text":"gnps_dir = root_dir / GNPS_DIRNAME\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.antismash_dir","title":"antismash_dir instance-attribute
","text":"antismash_dir = root_dir / ANTISMASH_DIRNAME\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.bigscape_dir","title":"bigscape_dir instance-attribute
","text":"bigscape_dir = root_dir / BIGSCAPE_DIRNAME\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.bigscape_running_output_dir","title":"bigscape_running_output_dir instance-attribute
","text":"bigscape_running_output_dir = (\n bigscape_dir / BIGSCAPE_RUNNING_OUTPUT_DIRNAME\n)\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.arrange","title":"arrange","text":"arrange() -> None\n
Arrange all datasets according to the configuration.
The datasets include MIBiG, GNPS, antiSMASH, and BiG-SCAPE.
Source code insrc/nplinker/arranger.py
def arrange(self) -> None:\n \"\"\"Arrange all datasets according to the configuration.\n\n The datasets include MIBiG, GNPS, antiSMASH, and BiG-SCAPE.\n \"\"\"\n # The order of arranging the datasets matters, as some datasets depend on others\n self.arrange_mibig()\n self.arrange_gnps()\n self.arrange_antismash()\n self.arrange_bigscape()\n self.arrange_strain_mappings()\n self.arrange_strains_selected()\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.arrange_podp_project_json","title":"arrange_podp_project_json","text":"arrange_podp_project_json() -> None\n
Arrange the PODP project JSON file.
This method only works for the podp
mode. If the JSON file does not exist, download it first; then the downloaded or existing JSON file will be validated according to the PODP_ADAPTED_SCHEMA.
src/nplinker/arranger.py
def arrange_podp_project_json(self) -> None:\n \"\"\"Arrange the PODP project JSON file.\n\n This method only works for the `podp` mode. If the JSON file does not exist, download it\n first; then the downloaded or existing JSON file will be validated according to the\n [PODP_ADAPTED_SCHEMA][nplinker.schemas.PODP_ADAPTED_SCHEMA].\n \"\"\"\n if self.config.mode == \"podp\":\n file_name = f\"paired_datarecord_{self.config.podp_id}.json\"\n podp_file = self.downloads_dir / file_name\n if not podp_file.exists():\n download_url(\n PODP_PROJECT_URL.format(self.config.podp_id),\n self.downloads_dir,\n file_name,\n )\n\n with open(podp_file, \"r\") as f:\n json_data = json.load(f)\n validate_podp_json(json_data)\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.arrange_mibig","title":"arrange_mibig","text":"arrange_mibig() -> None\n
Arrange the MIBiG metadata.
If config.mibig.to_use
is True
, download and extract the MIBiG metadata and override the existing MIBiG metadata if it exists. This ensures that the MIBiG metadata is always up-to-date to the specified version in the configuration.
src/nplinker/arranger.py
def arrange_mibig(self) -> None:\n \"\"\"Arrange the MIBiG metadata.\n\n If `config.mibig.to_use` is `True`, download and extract the MIBiG metadata and override\n the existing MIBiG metadata if it exists. This ensures that the MIBiG metadata is always\n up-to-date to the specified version in the configuration.\n \"\"\"\n if self.config.mibig.to_use:\n if self.mibig_dir.exists():\n # remove existing mibig data\n shutil.rmtree(self.mibig_dir)\n download_and_extract_mibig_metadata(\n self.downloads_dir,\n self.mibig_dir,\n version=self.config.mibig.version,\n )\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.arrange_gnps","title":"arrange_gnps","text":"arrange_gnps() -> None\n
Arrange the GNPS data.
For local
mode, validate the GNPS data directory.
For podp
mode, if the GNPS data does not exist, download it; if it exists but not valid, remove the data and re-downloads it.
The validation process includes:
file_mappings.tsv
or file_mappings.csv
spectra.mgf
molecular_families.tsv
annotations.tsv
src/nplinker/arranger.py
def arrange_gnps(self) -> None:\n \"\"\"Arrange the GNPS data.\n\n For `local` mode, validate the GNPS data directory.\n\n For `podp` mode, if the GNPS data does not exist, download it; if it exists but not valid,\n remove the data and re-downloads it.\n\n The validation process includes:\n\n - Check if the GNPS data directory exists.\n - Check if the required files exist in the GNPS data directory, including:\n - `file_mappings.tsv` or `file_mappings.csv`\n - `spectra.mgf`\n - `molecular_families.tsv`\n - `annotations.tsv`\n \"\"\"\n pass_validation = False\n if self.config.mode == \"podp\":\n # retry downloading at most 3 times if downloaded data has problems\n for _ in range(3):\n try:\n validate_gnps(self.gnps_dir)\n pass_validation = True\n break\n except (FileNotFoundError, ValueError):\n # Don't need to remove downloaded archive, as it'll be overwritten\n shutil.rmtree(self.gnps_dir, ignore_errors=True)\n self._download_and_extract_gnps()\n\n if not pass_validation:\n validate_gnps(self.gnps_dir)\n\n # get the path to file_mappings file (csv or tsv)\n self.gnps_file_mappings_file = self._get_gnps_file_mappings_file()\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.arrange_antismash","title":"arrange_antismash","text":"arrange_antismash() -> None\n
Arrange the antiSMASH data.
For local
mode, validate the antiSMASH data.
For podp
mode, if the antiSMASH data does not exist, download it; if it exists but is not valid, remove the data and re-download it.
The validation process includes:
.region???.gbk
where ???
is a number).AntiSMASH BGC directory must follow the structure below:
antismash\n \u251c\u2500\u2500 genome_id_1 (one AntiSMASH output, e.g. GCF_000514775.1)\n \u2502\u00a0 \u251c\u2500\u2500 GCF_000514775.1.gbk\n \u2502\u00a0 \u251c\u2500\u2500 NZ_AZWO01000004.region001.gbk\n \u2502\u00a0 \u2514\u2500\u2500 ...\n \u251c\u2500\u2500 genome_id_2\n \u2502\u00a0 \u251c\u2500\u2500 ...\n \u2514\u2500\u2500 ...\n
Source code in src/nplinker/arranger.py
def arrange_antismash(self) -> None:\n \"\"\"Arrange the antiSMASH data.\n\n For `local` mode, validate the antiSMASH data.\n\n For `podp` mode, if the antiSMASH data does not exist, download it; if it exists but not\n valid, remove the data and re-download it.\n\n The validation process includes:\n\n - Check if the antiSMASH data directory exists.\n - Check if the antiSMASH data directory contains at least one sub-directory, and each\n sub-directory contains at least one BGC file (with the suffix `.region???.gbk` where\n `???` is a number).\n\n AntiSMASH BGC directory must follow the structure below:\n ```\n antismash\n \u251c\u2500\u2500 genome_id_1 (one AntiSMASH output, e.g. GCF_000514775.1)\n \u2502\u00a0 \u251c\u2500\u2500 GCF_000514775.1.gbk\n \u2502\u00a0 \u251c\u2500\u2500 NZ_AZWO01000004.region001.gbk\n \u2502\u00a0 \u2514\u2500\u2500 ...\n \u251c\u2500\u2500 genome_id_2\n \u2502\u00a0 \u251c\u2500\u2500 ...\n \u2514\u2500\u2500 ...\n ```\n \"\"\"\n pass_validation = False\n if self.config.mode == \"podp\":\n for _ in range(3):\n try:\n validate_antismash(self.antismash_dir)\n pass_validation = True\n break\n except FileNotFoundError:\n shutil.rmtree(self.antismash_dir, ignore_errors=True)\n self._download_and_extract_antismash()\n\n if not pass_validation:\n validate_antismash(self.antismash_dir)\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.arrange_bigscape","title":"arrange_bigscape","text":"arrange_bigscape() -> None\n
Arrange the BiG-SCAPE data.
For local
mode, validate the BiG-SCAPE data.
For podp
mode, if the BiG-SCAPE data does not exist, run BiG-SCAPE to generate the clustering file; if it exists but is not valid, remove the data and re-run BiG-SCAPE to generate the data.
The running output of BiG-SCAPE will be saved to the directory bigscape_running_output
in the default BiG-SCAPE directory, and the clustering file mix_clustering_c{self.config.bigscape.cutoff}.tsv
will be copied to the default BiG-SCAPE directory.
The validation process includes:
mix_clustering_c{self.config.bigscape.cutoff}.tsv
exists in the BiG-SCAPE data directory.data_sqlite.db
file exists in the BiG-SCAPE data directory.src/nplinker/arranger.py
def arrange_bigscape(self) -> None:\n \"\"\"Arrange the BiG-SCAPE data.\n\n For `local` mode, validate the BiG-SCAPE data.\n\n For `podp` mode, if the BiG-SCAPE data does not exist, run BiG-SCAPE to generate the\n clustering file; if it exists but not valid, remove the data and re-run BiG-SCAPE to generate\n the data.\n\n The running output of BiG-SCAPE will be saved to the directory `bigscape_running_output`\n in the default BiG-SCAPE directory, and the clustering file\n `mix_clustering_c{self.config.bigscape.cutoff}.tsv` will be copied to the default BiG-SCAPE\n directory.\n\n The validation process includes:\n\n - Check if the default BiG-SCAPE data directory exists.\n - Check if the clustering file `mix_clustering_c{self.config.bigscape.cutoff}.tsv` exists in the\n BiG-SCAPE data directory.\n - Check if the `data_sqlite.db` file exists in the BiG-SCAPE data directory.\n \"\"\"\n pass_validation = False\n if self.config.mode == \"podp\":\n for _ in range(3):\n try:\n validate_bigscape(self.bigscape_dir, self.config.bigscape.cutoff)\n pass_validation = True\n break\n except FileNotFoundError:\n shutil.rmtree(self.bigscape_dir, ignore_errors=True)\n self._run_bigscape()\n\n if not pass_validation:\n validate_bigscape(self.bigscape_dir, self.config.bigscape.cutoff)\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.arrange_strain_mappings","title":"arrange_strain_mappings","text":"arrange_strain_mappings() -> None\n
Arrange the strain mappings file.
For local
mode, validate the strain mappings file.
For podp
mode, always generate the new strain mappings file and validate it.
The validation checks if the strain mappings file exists and if it is a valid JSON file according to STRAIN_MAPPINGS_SCHEMA.
Source code insrc/nplinker/arranger.py
def arrange_strain_mappings(self) -> None:\n \"\"\"Arrange the strain mappings file.\n\n For `local` mode, validate the strain mappings file.\n\n For `podp` mode, always generate the new strain mappings file and validate it.\n\n The validation checks if the strain mappings file exists and if it is a valid JSON file\n according to [STRAIN_MAPPINGS_SCHEMA][nplinker.schemas.STRAIN_MAPPINGS_SCHEMA].\n \"\"\"\n if self.config.mode == \"podp\":\n self._generate_strain_mappings()\n\n self._validate_strain_mappings()\n
"},{"location":"api/arranger/#nplinker.arranger.DatasetArranger.arrange_strains_selected","title":"arrange_strains_selected","text":"arrange_strains_selected() -> None\n
Arrange the strains selected file.
If the file exists, validate it according to the schema defined in user_strains.json
.
src/nplinker/arranger.py
def arrange_strains_selected(self) -> None:\n \"\"\"Arrange the strains selected file.\n\n If the file exists, validate it according to the schema defined in `user_strains.json`.\n \"\"\"\n strains_selected_file = self.root_dir / defaults.STRAINS_SELECTED_FILENAME\n if strains_selected_file.exists():\n with open(strains_selected_file, \"r\") as f:\n json_data = json.load(f)\n validate(instance=json_data, schema=USER_STRAINS_SCHEMA)\n
"},{"location":"api/arranger/#nplinker.arranger.validate_gnps","title":"validate_gnps","text":"validate_gnps(gnps_dir: str | PathLike) -> None\n
Validate the GNPS data directory and its contents.
The GNPS data directory must contain the following files:
file_mappings.tsv
or file_mappings.csv
spectra.mgf
molecular_families.tsv
annotations.tsv
Parameters:
gnps_dir
(str | PathLike
) \u2013 Path to the GNPS data directory.
Raises:
FileNotFoundError
\u2013 If the GNPS data directory is not found or any of the required files is not found.
ValueError
\u2013 If both file_mappings.tsv and file_mappings.csv are found.
src/nplinker/arranger.py
def validate_gnps(gnps_dir: str | PathLike) -> None:\n    \"\"\"Validate the GNPS data directory and its contents.\n\n    The GNPS data directory must contain the following files:\n\n    - `file_mappings.tsv` or `file_mappings.csv`\n    - `spectra.mgf`\n    - `molecular_families.tsv`\n    - `annotations.tsv`\n\n    Args:\n        gnps_dir: Path to the GNPS data directory.\n\n    Raises:\n        FileNotFoundError: If the GNPS data directory is not found or any of the required files\n            is not found.\n        ValueError: If both file_mappings.tsv and file_mappings.csv are found.\n    \"\"\"\n    gnps_dir = Path(gnps_dir)\n    if not gnps_dir.exists():\n        raise FileNotFoundError(f\"GNPS data directory not found at {gnps_dir}\")\n\n    file_mappings_tsv = gnps_dir / defaults.GNPS_FILE_MAPPINGS_TSV\n    file_mappings_csv = gnps_dir / defaults.GNPS_FILE_MAPPINGS_CSV\n    if file_mappings_tsv.exists() and file_mappings_csv.exists():\n        raise ValueError(\n            f\"Both {file_mappings_tsv.name} and {file_mappings_csv.name} found in GNPS directory \"\n            f\"{gnps_dir}, only one is allowed.\"\n        )\n    elif not file_mappings_tsv.exists() and not file_mappings_csv.exists():\n        raise FileNotFoundError(\n            f\"Neither {file_mappings_tsv.name} nor {file_mappings_csv.name} found in GNPS directory\"\n            f\" {gnps_dir}\"\n        )\n\n    required_files = [\n        gnps_dir / defaults.GNPS_SPECTRA_FILENAME,\n        gnps_dir / defaults.GNPS_MOLECULAR_FAMILY_FILENAME,\n        gnps_dir / defaults.GNPS_ANNOTATIONS_FILENAME,\n    ]\n    list_not_found = [f.name for f in required_files if not f.exists()]\n    if list_not_found:\n        raise FileNotFoundError(\n            f\"Files not found in GNPS directory {gnps_dir}: {', '.join(list_not_found)}\"\n        )\n
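A short sketch of calling this validator directly on a manually prepared directory (the path is a hypothetical example):
from nplinker.arranger import validate_gnps\n\n# Check a manually prepared GNPS directory before running NPLinker\ntry:\n    validate_gnps(\"nplinker_quickstart/gnps\")\nexcept (FileNotFoundError, ValueError) as e:\n    print(f\"GNPS data is not valid: {e}\")\n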
"},{"location":"api/arranger/#nplinker.arranger.validate_antismash","title":"validate_antismash","text":"validate_antismash(antismash_dir: str | PathLike) -> None\n
Validate the antiSMASH data directory and its contents.
The validation only checks the structure of the antiSMASH data directory and file names. It does not check
podp
modeThe antiSMASH data directory must exist and contain at least one sub-directory. The name of the sub-directories must not contain any space. Each sub-directory must contain at least one BGC file (with the suffix .region???.gbk
where ???
is the region number).
Parameters:
antismash_dir
(str | PathLike
) \u2013 Path to the antiSMASH data directory.
Raises:
FileNotFoundError
\u2013 If the antiSMASH data directory is not found, or no sub-directories are found in the antiSMASH data directory, or no BGC files are found in any sub-directory.
ValueError
\u2013 If any sub-directory name contains a space.
src/nplinker/arranger.py
def validate_antismash(antismash_dir: str | PathLike) -> None:\n    \"\"\"Validate the antiSMASH data directory and its contents.\n\n    The validation only checks the structure of the antiSMASH data directory and file names.\n    It does not check\n\n    - the content of the BGC files\n    - the consistency between the antiSMASH data and the PODP project JSON file for the `podp` mode\n\n    The antiSMASH data directory must exist and contain at least one sub-directory. The name of the\n    sub-directories must not contain any space. Each sub-directory must contain at least one BGC\n    file (with the suffix `.region???.gbk` where `???` is the region number).\n\n    Args:\n        antismash_dir: Path to the antiSMASH data directory.\n\n    Raises:\n        FileNotFoundError: If the antiSMASH data directory is not found, or no sub-directories\n            are found in the antiSMASH data directory, or no BGC files are found in any\n            sub-directory.\n        ValueError: If any sub-directory name contains a space.\n    \"\"\"\n    antismash_dir = Path(antismash_dir)\n    if not antismash_dir.exists():\n        raise FileNotFoundError(f\"antiSMASH data directory not found at {antismash_dir}\")\n\n    sub_dirs = list_dirs(antismash_dir)\n    if not sub_dirs:\n        raise FileNotFoundError(\n            f\"No BGC directories found in antiSMASH data directory {antismash_dir}\"\n        )\n\n    for sub_dir in sub_dirs:\n        dir_name = Path(sub_dir).name\n        if \" \" in dir_name:\n            raise ValueError(\n                f\"antiSMASH sub-directory name {dir_name} contains space, which is not allowed\"\n            )\n\n        gbk_files = list_files(sub_dir, suffix=\".gbk\", keep_parent=False)\n        bgc_files = fnmatch.filter(gbk_files, \"*.region???.gbk\")\n        if not bgc_files:\n            raise FileNotFoundError(f\"No BGC files found in antiSMASH sub-directory {sub_dir}\")\n
"},{"location":"api/arranger/#nplinker.arranger.validate_bigscape","title":"validate_bigscape","text":"validate_bigscape(\n bigscape_dir: str | PathLike, cutoff: str\n) -> None\n
Validate the BiG-SCAPE data directory and its contents.
The BiG-SCAPE data directory must exist and contain the clustering file mix_clustering_c{self.config.bigscape.cutoff}.tsv
where {self.config.bigscape.cutoff}
is the bigscape cutoff value set in the config file.
Alternatively, the directory can contain the BiG-SCAPE database file generated by BiG-SCAPE v2. At the moment, all the family assignments in the database will be used, so this database should contain results from a single run with the desired cutoff.
Parameters:
bigscape_dir
(str | PathLike
) \u2013 Path to the BiG-SCAPE data directory.
cutoff
(str
) \u2013 The BiG-SCAPE cutoff value.
Raises:
FileNotFoundError
\u2013 If the BiG-SCAPE data directory or the clustering file is not found.
src/nplinker/arranger.py
def validate_bigscape(bigscape_dir: str | PathLike, cutoff: str) -> None:\n \"\"\"Validate the BiG-SCAPE data directory and its contents.\n\n The BiG-SCAPE data directory must exist and contain the clustering file\n `mix_clustering_c{self.config.bigscape.cutoff}.tsv` where `{self.config.bigscape.cutoff}` is the\n bigscape cutoff value set in the config file.\n\n Alternatively, the directory can contain the BiG-SCAPE database file generated by BiG-SCAPE v2.\n At the moment, all the family assignments in the database will be used, so this database should\n contain results from a single run with the desired cutoff.\n\n Args:\n bigscape_dir: Path to the BiG-SCAPE data directory.\n cutoff: The BiG-SCAPE cutoff value.\n\n Raises:\n FileNotFoundError: If the BiG-SCAPE data directory or the clustering file is not found.\n \"\"\"\n bigscape_dir = Path(bigscape_dir)\n if not bigscape_dir.exists():\n raise FileNotFoundError(f\"BiG-SCAPE data directory not found at {bigscape_dir}\")\n\n clustering_file = bigscape_dir / f\"mix_clustering_c{cutoff}.tsv\"\n database_file = bigscape_dir / \"data_sqlite.db\"\n if not clustering_file.exists() and not database_file.exists():\n raise FileNotFoundError(f\"BiG-SCAPE data not found in {clustering_file} or {database_file}\")\n
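Similarly, a sketch of validating a BiG-SCAPE directory directly (the path and cutoff are hypothetical examples):
from nplinker.arranger import validate_bigscape\n\n# The cutoff must match the one used in the BiG-SCAPE run, e.g. `0.30`\ntry:\n    validate_bigscape(\"nplinker_quickstart/bigscape\", \"0.30\")\nexcept FileNotFoundError as e:\n    print(f\"BiG-SCAPE data is not valid: {e}\")\n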
"},{"location":"api/bigscape/","title":"BigScape","text":""},{"location":"api/bigscape/#nplinker.genomics.bigscape","title":"nplinker.genomics.bigscape","text":""},{"location":"api/bigscape/#nplinker.genomics.bigscape.BigscapeGCFLoader","title":"BigscapeGCFLoader","text":"BigscapeGCFLoader(cluster_file: str | PathLike)\n
Bases: GCFLoaderBase
Data loader for BiG-SCAPE GCF cluster file.
Attributes:
cluster_file
(str
) \u2013 path to the BiG-SCAPE cluster file.
Parameters:
cluster_file
(str | PathLike
) \u2013 Path to the BiG-SCAPE cluster file, the filename has a pattern of <class>_clustering_c0.xx.tsv
.
src/nplinker/genomics/bigscape/bigscape_loader.py
def __init__(self, cluster_file: str | PathLike, /) -> None:\n \"\"\"Initialize the BiG-SCAPE GCF loader.\n\n Args:\n cluster_file: Path to the BiG-SCAPE cluster file,\n the filename has a pattern of `<class>_clustering_c0.xx.tsv`.\n \"\"\"\n self.cluster_file: str = str(cluster_file)\n self._gcf_list = self._parse_gcf(self.cluster_file)\n
"},{"location":"api/bigscape/#nplinker.genomics.bigscape.BigscapeGCFLoader.cluster_file","title":"cluster_file instance-attribute
","text":"cluster_file: str = str(cluster_file)\n
"},{"location":"api/bigscape/#nplinker.genomics.bigscape.BigscapeGCFLoader.get_gcfs","title":"get_gcfs","text":"get_gcfs(\n keep_mibig_only: bool = False,\n keep_singleton: bool = False,\n) -> list[GCF]\n
Get all GCF objects.

Parameters:
    keep_mibig_only (bool, default: False) – True to keep GCFs that contain only MIBiG BGCs.
    keep_singleton (bool, default: False) – True to keep singleton GCFs. A singleton GCF is a GCF that contains only one BGC.

Returns:
    list[GCF] – A list of GCF objects.

Source code in src/nplinker/genomics/bigscape/bigscape_loader.py

def get_gcfs(self, keep_mibig_only: bool = False, keep_singleton: bool = False) -> list[GCF]:
    """Get all GCF objects.

    Args:
        keep_mibig_only: True to keep GCFs that contain only MIBiG BGCs.
        keep_singleton: True to keep singleton GCFs. A singleton GCF
            is a GCF that contains only one BGC.

    Returns:
        A list of GCF objects.
    """
    gcf_list = self._gcf_list
    if not keep_mibig_only:
        gcf_list = [gcf for gcf in gcf_list if not gcf.has_mibig_only()]
    if not keep_singleton:
        gcf_list = [gcf for gcf in gcf_list if not gcf.is_singleton()]
    return gcf_list
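A minimal usage sketch; the cluster file path is illustrative and follows the <class>_clustering_c0.xx.tsv pattern described above. By default both MIBiG-only and singleton GCFs are filtered out:

>>> from nplinker.genomics.bigscape import BigscapeGCFLoader
>>> loader = BigscapeGCFLoader("bigscape/mix/mix_clustering_c0.30.tsv")
>>> gcfs = loader.get_gcfs()  # MIBiG-only and singleton GCFs dropped by default
>>> all(not gcf.has_mibig_only() and not gcf.is_singleton() for gcf in gcfs)
True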
"},{"location":"api/bigscape/#nplinker.genomics.bigscape.BigscapeV2GCFLoader","title":"BigscapeV2GCFLoader","text":"BigscapeV2GCFLoader(db_file: str | PathLike)\n
Bases: GCFLoaderBase
Data loader for BiG-SCAPE v2 database file.
Attributes:
db_file
\u2013 Path to the BiG-SCAPE database file.
Parameters:
db_file
(str | PathLike
) \u2013 Path to the BiG-SCAPE v2 database file
src/nplinker/genomics/bigscape/bigscape_loader.py
def __init__(self, db_file: str | PathLike, /) -> None:\n \"\"\"Initialize the BiG-SCAPE v2 GCF loader.\n\n Args:\n db_file: Path to the BiG-SCAPE v2 database file\n \"\"\"\n self.db_file = str(db_file)\n self._gcf_list = self._parse_gcf(self.db_file)\n
"},{"location":"api/bigscape/#nplinker.genomics.bigscape.BigscapeV2GCFLoader.db_file","title":"db_file instance-attribute
","text":"db_file = str(db_file)\n
"},{"location":"api/bigscape/#nplinker.genomics.bigscape.BigscapeV2GCFLoader.get_gcfs","title":"get_gcfs","text":"get_gcfs(\n keep_mibig_only: bool = False,\n keep_singleton: bool = False,\n) -> list[GCF]\n
Get all GCF objects.

Parameters:
    keep_mibig_only (bool, default: False) – True to keep GCFs that contain only MIBiG BGCs.
    keep_singleton (bool, default: False) – True to keep singleton GCFs. A singleton GCF is a GCF that contains only one BGC.

Returns:
    list[GCF] – A list of GCF objects.

Source code in src/nplinker/genomics/bigscape/bigscape_loader.py

def get_gcfs(self, keep_mibig_only: bool = False, keep_singleton: bool = False) -> list[GCF]:
    """Get all GCF objects.

    Args:
        keep_mibig_only: True to keep GCFs that contain only MIBiG BGCs.
        keep_singleton: True to keep singleton GCFs.
            A singleton GCF is a GCF that contains only one BGC.

    Returns:
        A list of GCF objects.
    """
    gcf_list = self._gcf_list
    if not keep_mibig_only:
        gcf_list = [gcf for gcf in gcf_list if not gcf.has_mibig_only()]
    if not keep_singleton:
        gcf_list = [gcf for gcf in gcf_list if not gcf.is_singleton()]
    return gcf_list
"},{"location":"api/bigscape/#nplinker.genomics.bigscape.run_bigscape","title":"run_bigscape","text":"run_bigscape(\n antismash_path: str | PathLike,\n output_path: str | PathLike,\n extra_params: str,\n version: Literal[1, 2] = 1,\n) -> bool\n
Runs BiG-SCAPE to cluster BGCs.
The behavior of this function is slightly different depending on the version of BiG-SCAPE that is set to run using the configuration file. Mostly this means a different set of parameters is used between the two versions.
The AntiSMASH output directory should be a directory that contains GBK files. The directory can contain subdirectories, in which case BiG-SCAPE will search recursively for GBK files. E.g.:
example_folder\n \u251c\u2500\u2500 organism_1\n \u2502\u00a0 \u251c\u2500\u2500 organism_1.region001.gbk\n \u2502\u00a0 \u251c\u2500\u2500 organism_1.region002.gbk\n \u2502\u00a0 \u251c\u2500\u2500 organism_1.region003.gbk\n \u2502\u00a0 \u251c\u2500\u2500 organism_1.final.gbk <- skipped!\n \u2502\u00a0 \u2514\u2500\u2500 ...\n \u251c\u2500\u2500 organism_2\n \u2502\u00a0 \u251c\u2500\u2500 ...\n \u2514\u2500\u2500 ...\n
By default, only GBK Files with \"cluster\" or \"region\" in the filename are accepted. GBK Files with \"final\" in the filename are excluded.
Parameters:
antismash_path
(str | PathLike
) \u2013 Path to the antismash output directory.
output_path
(str | PathLike
) \u2013 Path to the output directory where BiG-SCAPE will write its results.
extra_params
(str
) \u2013 Additional parameters to pass to BiG-SCAPE.
version
(Literal[1, 2]
, default: 1
) \u2013 The version of BiG-SCAPE to run. Must be 1 or 2.
Returns:
bool
\u2013 True if BiG-SCAPE ran successfully, False otherwise.
Raises:
ValueError
\u2013 If an unexpected BiG-SCAPE version number is specified.
FileNotFoundError
\u2013 If the antismash_path does not exist or if the BiG-SCAPE python script could not be found.
RuntimeError
\u2013 If BiG-SCAPE fails to run.
Examples:
>>> from nplinker.genomics.bigscape import run_bigscape\n>>> run_bigscape(antismash_path=\"./antismash\", output_path=\"./output\",\n... extra_params=\"--help\", version=1)\n
Source code in src/nplinker/genomics/bigscape/runbigscape.py
def run_bigscape(\n antismash_path: str | PathLike,\n output_path: str | PathLike,\n extra_params: str,\n version: Literal[1, 2] = 1,\n) -> bool:\n \"\"\"Runs BiG-SCAPE to cluster BGCs.\n\n The behavior of this function is slightly different depending on the version of\n BiG-SCAPE that is set to run using the configuration file.\n Mostly this means a different set of parameters is used between the two versions.\n\n The AntiSMASH output directory should be a directory that contains GBK files.\n The directory can contain subdirectories, in which case BiG-SCAPE will search\n recursively for GBK files. E.g.:\n\n ```\n example_folder\n \u251c\u2500\u2500 organism_1\n \u2502\u00a0 \u251c\u2500\u2500 organism_1.region001.gbk\n \u2502\u00a0 \u251c\u2500\u2500 organism_1.region002.gbk\n \u2502\u00a0 \u251c\u2500\u2500 organism_1.region003.gbk\n \u2502\u00a0 \u251c\u2500\u2500 organism_1.final.gbk <- skipped!\n \u2502\u00a0 \u2514\u2500\u2500 ...\n \u251c\u2500\u2500 organism_2\n \u2502\u00a0 \u251c\u2500\u2500 ...\n \u2514\u2500\u2500 ...\n ```\n\n By default, only GBK Files with \"cluster\" or \"region\" in the filename are\n accepted. GBK Files with \"final\" in the filename are excluded.\n\n Args:\n antismash_path: Path to the antismash output directory.\n output_path: Path to the output directory where BiG-SCAPE will write its results.\n extra_params: Additional parameters to pass to BiG-SCAPE.\n version: The version of BiG-SCAPE to run. Must be 1 or 2.\n\n Returns:\n True if BiG-SCAPE ran successfully, False otherwise.\n\n Raises:\n ValueError: If an unexpected BiG-SCAPE version number is specified.\n FileNotFoundError: If the antismash_path does not exist or if the BiG-SCAPE python\n script could not be found.\n RuntimeError: If BiG-SCAPE fails to run.\n\n Examples:\n >>> from nplinker.genomics.bigscape import run_bigscape\n >>> run_bigscape(antismash_path=\"./antismash\", output_path=\"./output\",\n ... extra_params=\"--help\", version=1)\n \"\"\"\n # switch to correct version of BiG-SCAPE\n if version == 1:\n bigscape_py_path = \"bigscape.py\"\n elif version == 2:\n bigscape_py_path = \"bigscape-v2.py\"\n else:\n raise ValueError(\"Invalid BiG-SCAPE version number. Expected: 1 or 2.\")\n\n try:\n subprocess.run([bigscape_py_path, \"-h\"], capture_output=True, check=True)\n except Exception as e:\n raise FileNotFoundError(\n f\"Failed to find/run BiG-SCAPE executable program (path={bigscape_py_path}, err={e})\"\n ) from e\n\n if not os.path.exists(antismash_path):\n raise FileNotFoundError(f'antismash_path \"{antismash_path}\" does not exist!')\n\n logger.info(f\"Running BiG-SCAPE version {version}\")\n logger.info(\n f'run_bigscape: input=\"{antismash_path}\", output=\"{output_path}\", extra_params={extra_params}\"'\n )\n\n # assemble arguments. first argument is the python file\n args = [bigscape_py_path]\n\n # version 2 points to specific Pfam file, version 1 points to directory\n # version 2 also requires the cluster subcommand\n if version == 1:\n args.extend([\"--pfam_dir\", PFAM_PATH])\n elif version == 2:\n args.extend([\"cluster\", \"--pfam_path\", os.path.join(PFAM_PATH, \"Pfam-A.hmm\")])\n\n # add input and output paths. 
these are unchanged\n args.extend([\"-i\", str(antismash_path), \"-o\", str(output_path)])\n\n # append the user supplied params, if any\n if len(extra_params) > 0:\n args.extend(extra_params.split(\" \"))\n\n logger.info(f\"BiG-SCAPE command: {args}\")\n result = subprocess.run(args, stdout=sys.stdout, stderr=sys.stderr)\n\n # return true on any non-error return code\n if result.returncode == 0:\n logger.info(f\"BiG-SCAPE completed with return code {result.returncode}\")\n return True\n\n # otherwise log details and raise a runtime error\n logger.error(f\"BiG-SCAPE failed with return code {result.returncode}\")\n logger.error(f\"output: {str(result.stdout)}\")\n logger.error(f\"stderr: {str(result.stderr)}\")\n\n raise RuntimeError(f\"Failed to run BiG-SCAPE with error code {result.returncode}\")\n
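Beyond the --help smoke test in the example above, a typical invocation passes BiG-SCAPE's own CLI flags through extra_params. A sketch, assuming BiG-SCAPE v1 flags such as --mix and --cutoffs (these flags are not part of NPLinker; verify them against your installed BiG-SCAPE's --help output):

>>> from nplinker.genomics.bigscape import run_bigscape
>>> ok = run_bigscape(
...     antismash_path="./antismash",
...     output_path="./bigscape_output",
...     extra_params="--mix --cutoffs 0.30",  # assumed BiG-SCAPE v1 flags, not NPLinker options
...     version=1,
... )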
"},{"location":"api/genomics/","title":"Data Models","text":""},{"location":"api/genomics/#nplinker.genomics","title":"nplinker.genomics","text":""},{"location":"api/genomics/#nplinker.genomics.BGC","title":"BGC","text":"BGC(id: str, /, *product_prediction: str)\n
Class to model BGC (biosynthetic gene cluster) data.
BGC data include both annotations and sequence data. This class is mainly designed to model the annotations or metadata.
The raw BGC data is stored in GenBank format (.gbk). Additional GenBank features could be added to the GenBank file to annotate BGCs, e.g. antiSMASH has some self-defined features (like region
) in its output GenBank files.
The annotations of BGC can be stored in JSON format, which is defined and used by MIBiG.
Attributes:
id
\u2013 BGC identifier, e.g. MIBiG accession, GenBank accession.
product_prediction
\u2013 A tuple of (predicted) natural products or product classes of the BGC. For antiSMASH's GenBank data, the feature region /product
gives product information. For MIBiG metadata, its biosynthetic class provides such info.
mibig_bgc_class
(tuple[str] | None
) \u2013 A tuple of MIBiG biosynthetic classes to which the BGC belongs. Defaults to None, which means the class is unknown.
MIBiG defines 6 major biosynthetic classes for natural products, including NRP
, Polyketide
, RiPP
, Terpene
, Saccharide
and Alkaloid
. Note that natural products created by the other biosynthetic mechanisms fall under the category Other
. For more details see the paper.
description
(str | None
) \u2013 Brief description of the BGC. Defaults to None.
smiles
(tuple[str] | None
) \u2013 A tuple of SMILES formulas of the BGC's products. Defaults to None.
antismash_file
(str | None
) \u2013 The path to the antiSMASH GenBank file. Defaults to None.
antismash_id
(str | None
) \u2013 Identifier of the antiSMASH BGC, referring to the feature VERSION
of GenBank file. Defaults to None.
antismash_region
(int | None
) \u2013 AntiSMASH BGC region number, referring to the feature region
of GenBank file. Defaults to None.
parents
(set[GCF]
) \u2013 The set of GCFs that contain the BGC.
strain
(Strain | None
) \u2013 The strain of the BGC.
Parameters:
id
(str
) \u2013 BGC identifier, e.g. MIBiG accession, GenBank accession.
product_prediction
(str
, default: ()
) \u2013 BGC's (predicted) natural products or product classes.
Examples:
>>> bgc = BGC(\"Unique_BGC_ID\", \"Polyketide\", \"NRP\")\n>>> bgc.id\n'Unique_BGC_ID'\n>>> bgc.product_prediction\n('Polyketide', 'NRP')\n>>> bgc.is_mibig()\nFalse\n
Source code in src/nplinker/genomics/bgc.py
def __init__(self, id: str, /, *product_prediction: str):\n \"\"\"Initialize the BGC object.\n\n Args:\n id: BGC identifier, e.g. MIBiG accession, GenBank accession.\n product_prediction: BGC's (predicted) natural products or product classes.\n\n Examples:\n >>> bgc = BGC(\"Unique_BGC_ID\", \"Polyketide\", \"NRP\")\n >>> bgc.id\n 'Unique_BGC_ID'\n >>> bgc.product_prediction\n ('Polyketide', 'NRP')\n >>> bgc.is_mibig()\n False\n \"\"\"\n # BGC metadata\n self.id = id\n self.product_prediction = product_prediction\n\n self.mibig_bgc_class: tuple[str] | None = None\n self.description: str | None = None\n self.smiles: tuple[str] | None = None\n\n # antismash related attributes\n self.antismash_file: str | None = None\n self.antismash_id: str | None = None # version in .gbk, id in SeqRecord\n self.antismash_region: int | None = None # antismash region number\n\n # other attributes\n self.parents: set[GCF] = set()\n self._strain: Strain | None = None\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.id","title":"id instance-attribute
","text":"id = id\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.product_prediction","title":"product_prediction instance-attribute
","text":"product_prediction = product_prediction\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.mibig_bgc_class","title":"mibig_bgc_class instance-attribute
","text":"mibig_bgc_class: tuple[str] | None = None\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.description","title":"description instance-attribute
","text":"description: str | None = None\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.smiles","title":"smiles instance-attribute
","text":"smiles: tuple[str] | None = None\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.antismash_file","title":"antismash_file instance-attribute
","text":"antismash_file: str | None = None\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.antismash_id","title":"antismash_id instance-attribute
","text":"antismash_id: str | None = None\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.antismash_region","title":"antismash_region instance-attribute
","text":"antismash_region: int | None = None\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.parents","title":"parents instance-attribute
","text":"parents: set[GCF] = set()\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.strain","title":"strain property
writable
","text":"strain: Strain | None\n
Get the strain of the BGC.
"},{"location":"api/genomics/#nplinker.genomics.BGC.bigscape_classes","title":"bigscape_classesproperty
","text":"bigscape_classes: set[str | None]\n
Get BiG-SCAPE's BGC classes.
BiG-SCAPE's BGC classes are similar to those defined in MiBIG but have more categories (7 classes), including:
For BGC falls outside of these categories, the value is \"Others\".
Default is None, which means the class is unknown.
More details see: https://doi.org/10.1038%2Fs41589-019-0400-9.
"},{"location":"api/genomics/#nplinker.genomics.BGC.aa_predictions","title":"aa_predictionsproperty
","text":"aa_predictions: list\n
Amino acids as predicted monomers of product.
Returns:
list
\u2013 list of dicts with key as amino acid and value as prediction
list
\u2013 probability.
__repr__()\n
Source code in src/nplinker/genomics/bgc.py
def __repr__(self):\n return str(self)\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.__str__","title":"__str__","text":"__str__()\n
Source code in src/nplinker/genomics/bgc.py
def __str__(self):\n return \"{}(id={}, strain={}, asid={}, region={})\".format(\n self.__class__.__name__,\n self.id,\n self.strain,\n self.antismash_id,\n self.antismash_region,\n )\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.__eq__","title":"__eq__","text":"__eq__(other) -> bool\n
Source code in src/nplinker/genomics/bgc.py
def __eq__(self, other) -> bool:\n if isinstance(other, BGC):\n return self.id == other.id and self.product_prediction == other.product_prediction\n return NotImplemented\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.__hash__","title":"__hash__","text":"__hash__() -> int\n
Source code in src/nplinker/genomics/bgc.py
def __hash__(self) -> int:\n return hash((self.id, self.product_prediction))\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.__reduce__","title":"__reduce__","text":"__reduce__() -> tuple\n
Reduce function for pickling.
Source code insrc/nplinker/genomics/bgc.py
def __reduce__(self) -> tuple:\n \"\"\"Reduce function for pickling.\"\"\"\n return (self.__class__, (self.id, *self.product_prediction), self.__dict__)\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.add_parent","title":"add_parent","text":"add_parent(gcf: GCF) -> None\n
Add a parent GCF to the BGC.
Parameters:
gcf
(GCF
) \u2013 gene cluster family
src/nplinker/genomics/bgc.py
def add_parent(self, gcf: GCF) -> None:\n \"\"\"Add a parent GCF to the BGC.\n\n Args:\n gcf: gene cluster family\n \"\"\"\n gcf.add_bgc(self)\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.detach_parent","title":"detach_parent","text":"detach_parent(gcf: GCF) -> None\n
Remove a parent GCF.
Source code insrc/nplinker/genomics/bgc.py
def detach_parent(self, gcf: GCF) -> None:\n \"\"\"Remove a parent GCF.\"\"\"\n gcf.detach_bgc(self)\n
"},{"location":"api/genomics/#nplinker.genomics.BGC.is_mibig","title":"is_mibig","text":"is_mibig() -> bool\n
Check if the BGC is a MIBiG reference BGC or not.
WarningThis method evaluates MIBiG BGC based on the pattern that MIBiG BGC names start with \"BGC\". It might give false positive result.
Returns:
bool
\u2013 True if it's MIBiG reference BGC
src/nplinker/genomics/bgc.py
def is_mibig(self) -> bool:\n \"\"\"Check if the BGC is a MIBiG reference BGC or not.\n\n Warning:\n This method evaluates MIBiG BGC based on the pattern that MIBiG\n BGC names start with \"BGC\". It might give false positive result.\n\n Returns:\n True if it's MIBiG reference BGC\n \"\"\"\n return self.id.startswith(\"BGC\")\n
"},{"location":"api/genomics/#nplinker.genomics.GCF","title":"GCF","text":"GCF(id: str)\n
Class to model gene cluster family (GCF).
GCF is a group of similar BGCs and generated by clustering BGCs with tools such as BiG-SCAPE and BiG-SLICE.
Attributes:
id
\u2013 id of the GCF object.
bgc_ids
(set[str]
) \u2013 a set of BGC ids that belongs to the GCF.
bigscape_class
(str | None
) \u2013 BiG-SCAPE's BGC class. BiG-SCAPE's BGC classes are similar to those defined in MiBIG but have more categories (7 classes), including:
For BGC falls outside of these categories, the value is \"Others\".
Default is None, which means the class is unknown.
More details see: https://doi.org/10.1038%2Fs41589-019-0400-9.
Parameters:
id
(str
) \u2013 id of the GCF object.
Examples:
>>> gcf = GCF(\"Unique_GCF_ID\")\n>>> gcf.id\n'Unique_GCF_ID'\n
Source code in src/nplinker/genomics/gcf.py
def __init__(self, id: str, /) -> None:\n \"\"\"Initialize the GCF object.\n\n Args:\n id: id of the GCF object.\n\n Examples:\n >>> gcf = GCF(\"Unique_GCF_ID\")\n >>> gcf.id\n 'Unique_GCF_ID'\n \"\"\"\n self.id = id\n self.bgc_ids: set[str] = set()\n self.bigscape_class: str | None = None\n self._bgcs: set[BGC] = set()\n self._strains: StrainCollection = StrainCollection()\n
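To see how the BGC and GCF models connect, here is a minimal sketch (the ids are illustrative; a BGC without a strain triggers a log warning, as shown in add_bgc below):

>>> from nplinker.genomics import BGC, GCF
>>> bgc = BGC("BGC0000001", "Polyketide")
>>> gcf = GCF("GCF_1")
>>> gcf.add_bgc(bgc)  # links both directions: GCF tracks the BGC, BGC tracks its parent GCF
>>> bgc.parents == {gcf}
True
>>> gcf.bgc_ids
{'BGC0000001'}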
"},{"location":"api/genomics/#nplinker.genomics.GCF.id","title":"id instance-attribute
","text":"id = id\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.bgc_ids","title":"bgc_ids instance-attribute
","text":"bgc_ids: set[str] = set()\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.bigscape_class","title":"bigscape_class instance-attribute
","text":"bigscape_class: str | None = None\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.bgcs","title":"bgcs property
","text":"bgcs: set[BGC]\n
Get the BGC objects.
"},{"location":"api/genomics/#nplinker.genomics.GCF.strains","title":"strainsproperty
","text":"strains: StrainCollection\n
Get the strains in the GCF.
"},{"location":"api/genomics/#nplinker.genomics.GCF.__str__","title":"__str__","text":"__str__() -> str\n
Source code in src/nplinker/genomics/gcf.py
def __str__(self) -> str:\n return (\n f\"GCF(id={self.id}, #BGC_objects={len(self.bgcs)}, #bgc_ids={len(self.bgc_ids)},\"\n f\"#strains={len(self._strains)}).\"\n )\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.__repr__","title":"__repr__","text":"__repr__() -> str\n
Source code in src/nplinker/genomics/gcf.py
def __repr__(self) -> str:\n return str(self)\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.__eq__","title":"__eq__","text":"__eq__(other) -> bool\n
Source code in src/nplinker/genomics/gcf.py
def __eq__(self, other) -> bool:\n if isinstance(other, GCF):\n return self.id == other.id and self.bgcs == other.bgcs\n return NotImplemented\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.__hash__","title":"__hash__","text":"__hash__() -> int\n
Hash function for GCF.
Note that GCF class is a mutable container. We only hash the GCF id to avoid the hash value changes when self._bgcs
is updated.
src/nplinker/genomics/gcf.py
def __hash__(self) -> int:\n \"\"\"Hash function for GCF.\n\n Note that GCF class is a mutable container. We only hash the GCF id to\n avoid the hash value changes when `self._bgcs` is updated.\n \"\"\"\n return hash(self.id)\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.__reduce__","title":"__reduce__","text":"__reduce__() -> tuple\n
Reduce function for pickling.
Source code insrc/nplinker/genomics/gcf.py
def __reduce__(self) -> tuple:\n \"\"\"Reduce function for pickling.\"\"\"\n return (self.__class__, (self.id,), self.__dict__)\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.add_bgc","title":"add_bgc","text":"add_bgc(bgc: BGC) -> None\n
Add a BGC object to the GCF.
Source code insrc/nplinker/genomics/gcf.py
def add_bgc(self, bgc: BGC) -> None:\n \"\"\"Add a BGC object to the GCF.\"\"\"\n bgc.parents.add(self)\n self._bgcs.add(bgc)\n self.bgc_ids.add(bgc.id)\n if bgc.strain is not None:\n self._strains.add(bgc.strain)\n else:\n logger.warning(\"No strain specified for the BGC %s\", bgc.id)\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.detach_bgc","title":"detach_bgc","text":"detach_bgc(bgc: BGC) -> None\n
Remove a child BGC object.
Source code insrc/nplinker/genomics/gcf.py
def detach_bgc(self, bgc: BGC) -> None:\n \"\"\"Remove a child BGC object.\"\"\"\n bgc.parents.remove(self)\n self._bgcs.remove(bgc)\n self.bgc_ids.remove(bgc.id)\n if bgc.strain is not None:\n for other_bgc in self._bgcs:\n if other_bgc.strain == bgc.strain:\n return\n self._strains.remove(bgc.strain)\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.has_strain","title":"has_strain","text":"has_strain(strain: Strain) -> bool\n
Check if the given strain exists.
Parameters:
strain
(Strain
) \u2013 Strain
object.
Returns:
bool
\u2013 True when the given strain exist.
src/nplinker/genomics/gcf.py
def has_strain(self, strain: Strain) -> bool:\n \"\"\"Check if the given strain exists.\n\n Args:\n strain: `Strain` object.\n\n Returns:\n True when the given strain exist.\n \"\"\"\n return strain in self._strains\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.has_mibig_only","title":"has_mibig_only","text":"has_mibig_only() -> bool\n
Check if the GCF's children are only MIBiG BGCs.
Returns:
bool
\u2013 True if GCF.bgc_ids
are only MIBiG BGC ids.
src/nplinker/genomics/gcf.py
def has_mibig_only(self) -> bool:\n \"\"\"Check if the GCF's children are only MIBiG BGCs.\n\n Returns:\n True if `GCF.bgc_ids` are only MIBiG BGC ids.\n \"\"\"\n return all(map(lambda id: id.startswith(\"BGC\"), self.bgc_ids))\n
"},{"location":"api/genomics/#nplinker.genomics.GCF.is_singleton","title":"is_singleton","text":"is_singleton() -> bool\n
Check if the GCF contains only one BGC.
Returns:
bool
\u2013 True if GCF.bgc_ids
contains only one BGC id.
src/nplinker/genomics/gcf.py
def is_singleton(self) -> bool:\n \"\"\"Check if the GCF contains only one BGC.\n\n Returns:\n True if `GCF.bgc_ids` contains only one BGC id.\n \"\"\"\n return len(self.bgc_ids) == 1\n
"},{"location":"api/genomics_abc/","title":"Abstract Base Classes","text":""},{"location":"api/genomics_abc/#nplinker.genomics.abc","title":"nplinker.genomics.abc","text":""},{"location":"api/genomics_abc/#nplinker.genomics.abc.BGCLoaderBase","title":"BGCLoaderBase","text":"BGCLoaderBase(data_dir: str | PathLike)\n
Bases: ABC
Abstract base class for BGC loader.
Parameters:
data_dir
(str | PathLike
) \u2013 Path to directory that contains BGC metadata files (.json) or full data genbank files (.gbk).
src/nplinker/genomics/abc.py
def __init__(self, data_dir: str | PathLike) -> None:\n \"\"\"Initialize the BGC loader.\n\n Args:\n data_dir: Path to directory that contains BGC metadata files\n (.json) or full data genbank files (.gbk).\n \"\"\"\n self.data_dir = str(data_dir)\n
"},{"location":"api/genomics_abc/#nplinker.genomics.abc.BGCLoaderBase.data_dir","title":"data_dir instance-attribute
","text":"data_dir = str(data_dir)\n
"},{"location":"api/genomics_abc/#nplinker.genomics.abc.BGCLoaderBase.get_files","title":"get_files abstractmethod
","text":"get_files() -> dict[str, str]\n
Get path to BGC files.
Returns:
dict[str, str]
\u2013 The key is BGC name and value is path to BGC file
src/nplinker/genomics/abc.py
@abstractmethod\ndef get_files(self) -> dict[str, str]:\n \"\"\"Get path to BGC files.\n\n Returns:\n The key is BGC name and value is path to BGC file\n \"\"\"\n
"},{"location":"api/genomics_abc/#nplinker.genomics.abc.BGCLoaderBase.get_bgcs","title":"get_bgcs abstractmethod
","text":"get_bgcs() -> list[BGC]\n
Get BGC objects.
Returns:
list[BGC]
\u2013 A list of BGC objects
src/nplinker/genomics/abc.py
@abstractmethod\ndef get_bgcs(self) -> list[BGC]:\n \"\"\"Get BGC objects.\n\n Returns:\n A list of BGC objects\n \"\"\"\n
"},{"location":"api/genomics_abc/#nplinker.genomics.abc.GCFLoaderBase","title":"GCFLoaderBase","text":" Bases: ABC
Abstract base class for GCF loader.
"},{"location":"api/genomics_abc/#nplinker.genomics.abc.GCFLoaderBase.get_gcfs","title":"get_gcfsabstractmethod
","text":"get_gcfs(\n keep_mibig_only: bool, keep_singleton: bool\n) -> list[GCF]\n
Get GCF objects.
Parameters:
keep_mibig_only
(bool
) \u2013 True to keep GCFs that contain only MIBiG BGCs.
keep_singleton
(bool
) \u2013 True to keep singleton GCFs. A singleton GCF is a GCF that contains only one BGC.
Returns:
list[GCF]
\u2013 A list of GCF objects
src/nplinker/genomics/abc.py
@abstractmethod\ndef get_gcfs(self, keep_mibig_only: bool, keep_singleton: bool) -> list[GCF]:\n \"\"\"Get GCF objects.\n\n Args:\n keep_mibig_only: True to keep GCFs that contain only MIBiG\n BGCs.\n keep_singleton: True to keep singleton GCFs. A singleton GCF\n is a GCF that contains only one BGC.\n\n Returns:\n A list of GCF objects\n \"\"\"\n
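Clustering results from other tools can be plugged in by subclassing GCFLoaderBase. A minimal sketch, assuming a simple two-column TSV of bgc_id and family_id as input; the file format, class name, and parsing logic are illustrative, not part of NPLinker:

import csv
from os import PathLike

from nplinker.genomics import GCF
from nplinker.genomics.abc import GCFLoaderBase


class TsvGCFLoader(GCFLoaderBase):
    """Hypothetical loader for a two-column TSV: bgc_id <tab> family_id."""

    def __init__(self, file: str | PathLike) -> None:
        families: dict[str, GCF] = {}
        with open(file) as f:
            for bgc_id, family_id in csv.reader(f, delimiter="\t"):
                # one GCF object per family id, collecting its member BGC ids
                gcf = families.setdefault(family_id, GCF(family_id))
                gcf.bgc_ids.add(bgc_id)
        self._gcf_list = list(families.values())

    def get_gcfs(self, keep_mibig_only: bool, keep_singleton: bool) -> list[GCF]:
        gcfs = self._gcf_list
        if not keep_mibig_only:
            gcfs = [g for g in gcfs if not g.has_mibig_only()]
        if not keep_singleton:
            gcfs = [g for g in gcfs if not g.is_singleton()]
        return gcfs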
"},{"location":"api/genomics_utils/","title":"Utilities","text":""},{"location":"api/genomics_utils/#nplinker.genomics.utils","title":"nplinker.genomics.utils","text":""},{"location":"api/genomics_utils/#nplinker.genomics.utils.generate_mappings_genome_id_bgc_id","title":"generate_mappings_genome_id_bgc_id","text":"generate_mappings_genome_id_bgc_id(\n bgc_dir: str | PathLike,\n output_file: str | PathLike | None = None,\n) -> None\n
Generate a file that maps genome id to BGC id.
The input bgc_dir
must follow the structure of the antismash
directory defined in Working Directory Structure, e.g.:
bgc_dir\n \u251c\u2500\u2500 genome_id_1\n \u2502\u00a0 \u251c\u2500\u2500 bgc_id_1.gbk\n \u2502\u00a0 \u2514\u2500\u2500 ...\n \u251c\u2500\u2500 genome_id_2\n \u2502\u00a0 \u251c\u2500\u2500 bgc_id_2.gbk\n \u2502\u00a0 \u2514\u2500\u2500 ...\n \u2514\u2500\u2500 ...\n
Parameters:
bgc_dir
(str | PathLike
) \u2013 The directory has one-layer of subfolders and each subfolder contains BGC files in .gbk
format.
It assumes that
output_file
(str | PathLike | None
, default: None
) \u2013 The path to the output file. The file will be overwritten if it already exists.
Defaults to None, in which case the output file will be placed in the directory bgc_dir
with the file name GENOME_BGC_MAPPINGS_FILENAME.
src/nplinker/genomics/utils.py
def generate_mappings_genome_id_bgc_id(\n bgc_dir: str | PathLike, output_file: str | PathLike | None = None\n) -> None:\n \"\"\"Generate a file that maps genome id to BGC id.\n\n The input `bgc_dir` must follow the structure of the `antismash` directory defined in\n [Working Directory Structure][working-directory-structure], e.g.:\n ```shell\n bgc_dir\n \u251c\u2500\u2500 genome_id_1\n \u2502\u00a0 \u251c\u2500\u2500 bgc_id_1.gbk\n \u2502\u00a0 \u2514\u2500\u2500 ...\n \u251c\u2500\u2500 genome_id_2\n \u2502\u00a0 \u251c\u2500\u2500 bgc_id_2.gbk\n \u2502\u00a0 \u2514\u2500\u2500 ...\n \u2514\u2500\u2500 ...\n ```\n\n Args:\n bgc_dir: The directory has one-layer of subfolders and each subfolder contains BGC files\n in `.gbk` format.\n\n It assumes that\n\n - the subfolder name is the genome id (e.g. refseq),\n - the BGC file name is the BGC id.\n output_file: The path to the output file.\n The file will be overwritten if it already exists.\n\n Defaults to None, in which case the output file will be placed in\n the directory `bgc_dir` with the file name\n [GENOME_BGC_MAPPINGS_FILENAME][nplinker.defaults.GENOME_BGC_MAPPINGS_FILENAME].\n \"\"\"\n bgc_dir = Path(bgc_dir)\n genome_bgc_mappings = {}\n\n for subdir in list_dirs(bgc_dir):\n genome_id = Path(subdir).name\n bgc_files = list_files(subdir, suffix=(\".gbk\"), keep_parent=False)\n bgc_ids = [bgc_id for f in bgc_files if (bgc_id := Path(f).stem) != genome_id]\n if bgc_ids:\n genome_bgc_mappings[genome_id] = bgc_ids\n else:\n logger.warning(\"No BGC files found in %s\", subdir)\n\n # sort mappings by genome_id and construct json data\n genome_bgc_mappings = dict(sorted(genome_bgc_mappings.items()))\n json_data_mappings = [{\"genome_ID\": k, \"BGC_ID\": v} for k, v in genome_bgc_mappings.items()]\n json_data = {\"mappings\": json_data_mappings, \"version\": \"1.0\"}\n\n # validate json data\n validate(instance=json_data, schema=GENOME_BGC_MAPPINGS_SCHEMA)\n\n if output_file is None:\n output_file = bgc_dir / GENOME_BGC_MAPPINGS_FILENAME\n with open(output_file, \"w\") as f:\n json.dump(json_data, f)\n logger.info(\"Generated genome-BGC mappings file: %s\", output_file)\n
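A quick usage sketch against the layout above (the path is illustrative); with output_file left as None, the mappings JSON is written into bgc_dir itself:

>>> from nplinker.genomics.utils import generate_mappings_genome_id_bgc_id
>>> generate_mappings_genome_id_bgc_id("./antismash")  # writes GENOME_BGC_MAPPINGS_FILENAME into ./antismash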
"},{"location":"api/genomics_utils/#nplinker.genomics.utils.add_strain_to_bgc","title":"add_strain_to_bgc","text":"add_strain_to_bgc(\n strains: StrainCollection, bgcs: Sequence[BGC]\n) -> tuple[list[BGC], list[BGC]]\n
Assign a Strain object to BGC.strain
for input BGCs.
BGC id is used to find the corresponding Strain object. It's possible that no Strain object is found for a BGC id.
Note
The input bgcs
will be changed in place.
Parameters:
strains
(StrainCollection
) \u2013 A collection of all strain objects.
bgcs
(Sequence[BGC]
) \u2013 A list of BGC objects.
Returns:
tuple[list[BGC], list[BGC]]
\u2013 A tuple of two lists of BGC objects,
Raises:
ValueError
\u2013 Multiple strain objects found for a BGC id.
src/nplinker/genomics/utils.py
def add_strain_to_bgc(\n strains: StrainCollection, bgcs: Sequence[BGC]\n) -> tuple[list[BGC], list[BGC]]:\n \"\"\"Assign a Strain object to `BGC.strain` for input BGCs.\n\n BGC id is used to find the corresponding Strain object. It's possible that\n no Strain object is found for a BGC id.\n\n !!! Note\n The input `bgcs` will be changed in place.\n\n Args:\n strains: A collection of all strain objects.\n bgcs: A list of BGC objects.\n\n Returns:\n A tuple of two lists of BGC objects,\n\n - the first list contains BGC objects that are updated with Strain object;\n - the second list contains BGC objects that are not updated with\n Strain object because no Strain object is found.\n\n Raises:\n ValueError: Multiple strain objects found for a BGC id.\n \"\"\"\n bgc_with_strain = []\n bgc_without_strain = []\n for bgc in bgcs:\n try:\n strain_list = strains.lookup(bgc.id)\n except ValueError:\n bgc_without_strain.append(bgc)\n continue\n if len(strain_list) > 1:\n raise ValueError(\n f\"Multiple strain objects found for BGC id '{bgc.id}'.\"\n f\"BGC object accept only one strain.\"\n )\n bgc.strain = strain_list[0]\n bgc_with_strain.append(bgc)\n\n logger.info(\n f\"{len(bgc_with_strain)} BGC objects updated with Strain object.\\n\"\n f\"{len(bgc_without_strain)} BGC objects not updated with Strain object.\"\n )\n return bgc_with_strain, bgc_without_strain\n
"},{"location":"api/genomics_utils/#nplinker.genomics.utils.add_bgc_to_gcf","title":"add_bgc_to_gcf","text":"add_bgc_to_gcf(\n bgcs: Sequence[BGC], gcfs: Sequence[GCF]\n) -> tuple[list[GCF], list[GCF], dict[GCF, set[str]]]\n
Add BGC objects to GCF object based on GCF's BGC ids.
The attribute of GCF.bgc_ids
contains the ids of BGC objects. These ids are used to find BGC objects from the input bgcs
list. The found BGC objects are added to the bgcs
attribute of GCF object. It is possible that some BGC ids are not found in the input bgcs
list, and so their BGC objects are missing in the GCF object.
Note
This method changes the lists bgcs
and gcfs
in place.
Parameters:
bgcs
(Sequence[BGC]
) \u2013 A list of BGC objects.
gcfs
(Sequence[GCF]
) \u2013 A list of GCF objects.
Returns:
tuple[list[GCF], list[GCF], dict[GCF, set[str]]]
\u2013 A tuple of two lists and a dictionary,
src/nplinker/genomics/utils.py
def add_bgc_to_gcf(\n bgcs: Sequence[BGC], gcfs: Sequence[GCF]\n) -> tuple[list[GCF], list[GCF], dict[GCF, set[str]]]:\n \"\"\"Add BGC objects to GCF object based on GCF's BGC ids.\n\n The attribute of `GCF.bgc_ids` contains the ids of BGC objects. These ids\n are used to find BGC objects from the input `bgcs` list. The found BGC\n objects are added to the `bgcs` attribute of GCF object. It is possible that\n some BGC ids are not found in the input `bgcs` list, and so their BGC\n objects are missing in the GCF object.\n\n !!! note\n This method changes the lists `bgcs` and `gcfs` in place.\n\n Args:\n bgcs: A list of BGC objects.\n gcfs: A list of GCF objects.\n\n Returns:\n A tuple of two lists and a dictionary,\n\n - The first list contains GCF objects that are updated with BGC objects;\n - The second list contains GCF objects that are not updated with BGC objects\n because no BGC objects are found;\n - The dictionary contains GCF objects as keys and a set of ids of missing\n BGC objects as values.\n \"\"\"\n bgc_dict = {bgc.id: bgc for bgc in bgcs}\n gcf_with_bgc = []\n gcf_without_bgc = []\n gcf_missing_bgc: dict[GCF, set[str]] = {}\n for gcf in gcfs:\n for bgc_id in gcf.bgc_ids:\n try:\n bgc = bgc_dict[bgc_id]\n except KeyError:\n if gcf not in gcf_missing_bgc:\n gcf_missing_bgc[gcf] = {bgc_id}\n else:\n gcf_missing_bgc[gcf].add(bgc_id)\n continue\n gcf.add_bgc(bgc)\n\n if gcf.bgcs:\n gcf_with_bgc.append(gcf)\n else:\n gcf_without_bgc.append(gcf)\n\n logger.info(\n f\"{len(gcf_with_bgc)} GCF objects updated with BGC objects.\\n\"\n f\"{len(gcf_without_bgc)} GCF objects not updated with BGC objects.\\n\"\n f\"{len(gcf_missing_bgc)} GCF objects have missing BGC objects.\"\n )\n return gcf_with_bgc, gcf_without_bgc, gcf_missing_bgc\n
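A minimal sketch of add_bgc_to_gcf with in-memory objects (ids are illustrative); in a real pipeline add_strain_to_bgc would be applied first so that the BGCs carry strains:

>>> from nplinker.genomics import BGC, GCF
>>> from nplinker.genomics.utils import add_bgc_to_gcf
>>> bgc = BGC("NC_000001.region001", "NRP")
>>> gcf = GCF("1")
>>> gcf.bgc_ids.add(bgc.id)  # GCF initially knows only the BGC id
>>> with_bgc, without_bgc, missing = add_bgc_to_gcf([bgc], [gcf])
>>> [g.id for g in with_bgc]
['1']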
"},{"location":"api/genomics_utils/#nplinker.genomics.utils.get_mibig_from_gcf","title":"get_mibig_from_gcf","text":"get_mibig_from_gcf(\n gcfs: Sequence[GCF],\n) -> tuple[list[BGC], StrainCollection]\n
Get MIBiG BGCs and strains from GCF objects.
Parameters:
gcfs
(Sequence[GCF]
) \u2013 A list of GCF objects.
Returns:
tuple[list[BGC], StrainCollection]
\u2013 A tuple of two objects,
src/nplinker/genomics/utils.py
def get_mibig_from_gcf(gcfs: Sequence[GCF]) -> tuple[list[BGC], StrainCollection]:\n \"\"\"Get MIBiG BGCs and strains from GCF objects.\n\n Args:\n gcfs: A list of GCF objects.\n\n Returns:\n A tuple of two objects,\n\n - the first is a list of MIBiG BGC objects used in the GCFs;\n - the second is a StrainCollection object that contains all Strain objects used in the\n GCFs.\n \"\"\"\n mibig_bgcs_in_use = []\n mibig_strains_in_use = StrainCollection()\n for gcf in gcfs:\n for bgc in gcf.bgcs:\n if bgc.is_mibig():\n mibig_bgcs_in_use.append(bgc)\n if bgc.strain is not None:\n mibig_strains_in_use.add(bgc.strain)\n return mibig_bgcs_in_use, mibig_strains_in_use\n
"},{"location":"api/genomics_utils/#nplinker.genomics.utils.extract_mappings_strain_id_original_genome_id","title":"extract_mappings_strain_id_original_genome_id","text":"extract_mappings_strain_id_original_genome_id(\n podp_project_json_file: str | PathLike,\n) -> dict[str, set[str]]\n
Extract mappings \"strain_id <-> original_genome_id\".
Tip
The podp_project_json_file
is the JSON file downloaded from PODP platform.
For example, for PODP project MSV000079284, its JSON file is https://pairedomicsdata.bioinformatics.nl/api/projects/4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4.
Parameters:
podp_project_json_file
(str | PathLike
) \u2013 The path to the PODP project JSON file.
Returns:
dict[str, set[str]]
\u2013 Key is strain id and value is a set of original genome ids.
src/nplinker/genomics/utils.py
def extract_mappings_strain_id_original_genome_id(\n podp_project_json_file: str | PathLike,\n) -> dict[str, set[str]]:\n \"\"\"Extract mappings \"strain_id <-> original_genome_id\".\n\n !!! tip\n The `podp_project_json_file` is the JSON file downloaded from PODP platform.\n\n For example, for PODP project MSV000079284, its JSON file is\n https://pairedomicsdata.bioinformatics.nl/api/projects/4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4.\n\n Args:\n podp_project_json_file: The path to the PODP project\n JSON file.\n\n Returns:\n Key is strain id and value is a set of original genome ids.\n\n See Also:\n - [podp_generate_strain_mappings][nplinker.strain.utils.podp_generate_strain_mappings]:\n Generate strain mappings JSON file for PODP pipeline.\n \"\"\"\n mappings_dict: dict[str, set[str]] = {}\n with open(podp_project_json_file, \"r\") as f:\n json_data = json.load(f)\n\n validate_podp_json(json_data)\n\n for record in json_data[\"genomes\"]:\n strain_id = record[\"genome_label\"]\n genome_id = get_best_available_genome_id(record[\"genome_ID\"])\n if genome_id is None:\n logger.warning(\"Failed to extract genome ID from genome with label %s\", strain_id)\n continue\n if strain_id in mappings_dict:\n mappings_dict[strain_id].add(genome_id)\n else:\n mappings_dict[strain_id] = {genome_id}\n return mappings_dict\n
"},{"location":"api/genomics_utils/#nplinker.genomics.utils.extract_mappings_original_genome_id_resolved_genome_id","title":"extract_mappings_original_genome_id_resolved_genome_id","text":"extract_mappings_original_genome_id_resolved_genome_id(\n genome_status_json_file: str | PathLike,\n) -> dict[str, str]\n
Extract mappings \"original_genome_id <-> resolved_genome_id\".
Tip
The genome_status_json_file
is generated by the podp_download_and_extract_antismash_data function with a default file name GENOME_STATUS_FILENAME.
Parameters:
genome_status_json_file
(str | PathLike
) \u2013 The path to the genome status JSON file.
Returns:
dict[str, str]
\u2013 Key is original genome id and value is resolved genome id.
src/nplinker/genomics/utils.py
def extract_mappings_original_genome_id_resolved_genome_id(\n genome_status_json_file: str | PathLike,\n) -> dict[str, str]:\n \"\"\"Extract mappings \"original_genome_id <-> resolved_genome_id\".\n\n !!! tip\n The `genome_status_json_file` is generated by the [podp_download_and_extract_antismash_data]\n [nplinker.genomics.antismash.podp_antismash_downloader.podp_download_and_extract_antismash_data]\n function with a default file name [GENOME_STATUS_FILENAME][nplinker.defaults.GENOME_STATUS_FILENAME].\n\n Args:\n genome_status_json_file: The path to the genome status JSON file.\n\n\n Returns:\n Key is original genome id and value is resolved genome id.\n\n See Also:\n - [podp_generate_strain_mappings][nplinker.strain.utils.podp_generate_strain_mappings]:\n Generate strain mappings JSON file for PODP pipeline.\n \"\"\"\n gs_mappings_dict = GenomeStatus.read_json(genome_status_json_file)\n return {gs.original_id: gs.resolved_refseq_id for gs in gs_mappings_dict.values()}\n
"},{"location":"api/genomics_utils/#nplinker.genomics.utils.extract_mappings_resolved_genome_id_bgc_id","title":"extract_mappings_resolved_genome_id_bgc_id","text":"extract_mappings_resolved_genome_id_bgc_id(\n genome_bgc_mappings_file: str | PathLike,\n) -> dict[str, set[str]]\n
Extract mappings \"resolved_genome_id <-> bgc_id\".
Tip
The genome_bgc_mappings_file
is usually generated by the generate_mappings_genome_id_bgc_id function with a default file name GENOME_BGC_MAPPINGS_FILENAME.
Parameters:
genome_bgc_mappings_file
(str | PathLike
) \u2013 The path to the genome BGC mappings JSON file.
Returns:
dict[str, set[str]]
\u2013 Key is resolved genome id and value is a set of BGC ids.
src/nplinker/genomics/utils.py
def extract_mappings_resolved_genome_id_bgc_id(\n genome_bgc_mappings_file: str | PathLike,\n) -> dict[str, set[str]]:\n \"\"\"Extract mappings \"resolved_genome_id <-> bgc_id\".\n\n !!! tip\n The `genome_bgc_mappings_file` is usually generated by the\n [generate_mappings_genome_id_bgc_id][nplinker.genomics.utils.generate_mappings_genome_id_bgc_id]\n function with a default file name [GENOME_BGC_MAPPINGS_FILENAME][nplinker.defaults.GENOME_BGC_MAPPINGS_FILENAME].\n\n Args:\n genome_bgc_mappings_file: The path to the genome BGC\n mappings JSON file.\n\n Returns:\n Key is resolved genome id and value is a set of BGC ids.\n\n See Also:\n - [podp_generate_strain_mappings][nplinker.strain.utils.podp_generate_strain_mappings]:\n Generate strain mappings JSON file for PODP pipeline.\n \"\"\"\n with open(genome_bgc_mappings_file, \"r\") as f:\n json_data = json.load(f)\n\n # validate the JSON data\n validate(json_data, GENOME_BGC_MAPPINGS_SCHEMA)\n\n return {mapping[\"genome_ID\"]: set(mapping[\"BGC_ID\"]) for mapping in json_data[\"mappings\"]}\n
"},{"location":"api/genomics_utils/#nplinker.genomics.utils.get_mappings_strain_id_bgc_id","title":"get_mappings_strain_id_bgc_id","text":"get_mappings_strain_id_bgc_id(\n mappings_strain_id_original_genome_id: Mapping[\n str, set[str]\n ],\n mappings_original_genome_id_resolved_genome_id: Mapping[\n str, str\n ],\n mappings_resolved_genome_id_bgc_id: Mapping[\n str, set[str]\n ],\n) -> dict[str, set[str]]\n
Get mappings \"strain_id <-> bgc_id\".
Parameters:
mappings_strain_id_original_genome_id
(Mapping[str, set[str]]
) \u2013 Mappings \"strain_id <-> original_genome_id\".
mappings_original_genome_id_resolved_genome_id
(Mapping[str, str]
) \u2013 Mappings \"original_genome_id <-> resolved_genome_id\".
mappings_resolved_genome_id_bgc_id
(Mapping[str, set[str]]
) \u2013 Mappings \"resolved_genome_id <-> bgc_id\".
Returns:
dict[str, set[str]]
\u2013 Key is strain id and value is a set of BGC ids.
extract_mappings_strain_id_original_genome_id
: Extract mappings \"strain_id <-> original_genome_id\".extract_mappings_original_genome_id_resolved_genome_id
: Extract mappings \"original_genome_id <-> resolved_genome_id\".extract_mappings_resolved_genome_id_bgc_id
: Extract mappings \"resolved_genome_id <-> bgc_id\".src/nplinker/genomics/utils.py
def get_mappings_strain_id_bgc_id(\n mappings_strain_id_original_genome_id: Mapping[str, set[str]],\n mappings_original_genome_id_resolved_genome_id: Mapping[str, str],\n mappings_resolved_genome_id_bgc_id: Mapping[str, set[str]],\n) -> dict[str, set[str]]:\n \"\"\"Get mappings \"strain_id <-> bgc_id\".\n\n Args:\n mappings_strain_id_original_genome_id: Mappings \"strain_id <-> original_genome_id\".\n mappings_original_genome_id_resolved_genome_id: Mappings \"original_genome_id <-> resolved_genome_id\".\n mappings_resolved_genome_id_bgc_id: Mappings \"resolved_genome_id <-> bgc_id\".\n\n Returns:\n Key is strain id and value is a set of BGC ids.\n\n See Also:\n - `extract_mappings_strain_id_original_genome_id`: Extract mappings\n \"strain_id <-> original_genome_id\".\n - `extract_mappings_original_genome_id_resolved_genome_id`: Extract mappings\n \"original_genome_id <-> resolved_genome_id\".\n - `extract_mappings_resolved_genome_id_bgc_id`: Extract mappings\n \"resolved_genome_id <-> bgc_id\".\n - [podp_generate_strain_mappings][nplinker.strain.utils.podp_generate_strain_mappings]:\n Generate strain mappings JSON file for PODP pipeline.\n \"\"\"\n mappings_dict = {}\n for strain_id, original_genome_ids in mappings_strain_id_original_genome_id.items():\n bgc_ids = set()\n for original_genome_id in original_genome_ids:\n resolved_genome_id = mappings_original_genome_id_resolved_genome_id[original_genome_id]\n if (bgc_id := mappings_resolved_genome_id_bgc_id.get(resolved_genome_id)) is not None:\n bgc_ids.update(bgc_id)\n if bgc_ids:\n mappings_dict[strain_id] = bgc_ids\n return mappings_dict\n
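The three extract_* helpers above feed directly into get_mappings_strain_id_bgc_id; this is essentially the chain used by the PODP strain-mapping pipeline. A sketch with illustrative file paths:

>>> from nplinker.genomics.utils import (
...     extract_mappings_strain_id_original_genome_id,
...     extract_mappings_original_genome_id_resolved_genome_id,
...     extract_mappings_resolved_genome_id_bgc_id,
...     get_mappings_strain_id_bgc_id,
... )
>>> mappings = get_mappings_strain_id_bgc_id(
...     extract_mappings_strain_id_original_genome_id("podp_project.json"),
...     extract_mappings_original_genome_id_resolved_genome_id("genome_status.json"),
...     extract_mappings_resolved_genome_id_bgc_id("genome_bgc_mappings.json"),
... )  # dict: strain id -> set of BGC ids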
"},{"location":"api/gnps/","title":"GNPS","text":""},{"location":"api/gnps/#nplinker.metabolomics.gnps","title":"nplinker.metabolomics.gnps","text":""},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSFormat","title":"GNPSFormat","text":" Bases: Enum
Enum class for GNPS formats or workflows.
ConceptGNPS data
The name of the enum is a short name for the workflow, and the value of the enum is the workflow name used on the GNPS website.
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSFormat.SNETS","title":"SNETSclass-attribute
instance-attribute
","text":"SNETS = 'METABOLOMICS-SNETS'\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSFormat.SNETSV2","title":"SNETSV2 class-attribute
instance-attribute
","text":"SNETSV2 = 'METABOLOMICS-SNETS-V2'\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSFormat.FBMN","title":"FBMN class-attribute
instance-attribute
","text":"FBMN = 'FEATURE-BASED-MOLECULAR-NETWORKING'\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSFormat.Unknown","title":"Unknown class-attribute
instance-attribute
","text":"Unknown = 'Unknown-GNPS-Workflow'\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSDownloader","title":"GNPSDownloader","text":"GNPSDownloader(task_id: str, download_root: str | PathLike)\n
Download GNPS zip archive for the given task id.
ConceptGNPS data
Note that only GNPS workflows listed in the GNPSFormat enum are supported.
Attributes:
GNPS_DATA_DOWNLOAD_URL
(str
) \u2013 URL template for downloading GNPS data.
GNPS_DATA_DOWNLOAD_URL_FBMN
(str
) \u2013 URL template for downloading GNPS data for FBMN.
gnps_format
(GNPSFormat
) \u2013 GNPS workflow type.
Parameters:
task_id
(str
) \u2013 GNPS task id, identifying the data to be downloaded.
download_root
(str | PathLike
) \u2013 Path where to store the downloaded archive.
Raises:
ValueError
\u2013 If the given task id does not correspond to a supported GNPS workflow.
Examples:
>>> GNPSDownloader(\"c22f44b14a3d450eb836d607cb9521bb\", \"~/downloads\")\n
Source code in src/nplinker/metabolomics/gnps/gnps_downloader.py
def __init__(self, task_id: str, download_root: str | PathLike):\n \"\"\"Initialize the GNPSDownloader.\n\n Args:\n task_id: GNPS task id, identifying the data to be downloaded.\n download_root: Path where to store the downloaded archive.\n\n Raises:\n ValueError: If the given task id does not correspond to a supported\n GNPS workflow.\n\n Examples:\n >>> GNPSDownloader(\"c22f44b14a3d450eb836d607cb9521bb\", \"~/downloads\")\n \"\"\"\n gnps_format = gnps_format_from_task_id(task_id)\n if gnps_format == GNPSFormat.Unknown:\n raise ValueError(\n f\"Unknown workflow type for GNPS task '{task_id}'.\"\n f\"Supported GNPS workflows are described in the GNPSFormat enum, \"\n f\"including such as 'METABOLOMICS-SNETS', 'METABOLOMICS-SNETS-V2' \"\n f\"and 'FEATURE-BASED-MOLECULAR-NETWORKING'.\"\n )\n\n self._task_id = task_id\n self._download_root: Path = Path(download_root)\n self._gnps_format = gnps_format\n self._file_name = gnps_format.value + \"-\" + self._task_id + \".zip\"\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSDownloader.GNPS_DATA_DOWNLOAD_URL","title":"GNPS_DATA_DOWNLOAD_URL class-attribute
instance-attribute
","text":"GNPS_DATA_DOWNLOAD_URL: str = (\n \"https://gnps.ucsd.edu/ProteoSAFe/DownloadResult?task={}&view=download_clustered_spectra\"\n)\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSDownloader.GNPS_DATA_DOWNLOAD_URL_FBMN","title":"GNPS_DATA_DOWNLOAD_URL_FBMN class-attribute
instance-attribute
","text":"GNPS_DATA_DOWNLOAD_URL_FBMN: str = (\n \"https://gnps.ucsd.edu/ProteoSAFe/DownloadResult?task={}&view=download_cytoscape_data\"\n)\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSDownloader.gnps_format","title":"gnps_format property
","text":"gnps_format: GNPSFormat\n
Get the GNPS workflow type.
Returns:
GNPSFormat
\u2013 GNPS workflow type.
download

download() -> Self

Download GNPS data.

Note: GNPS data is downloaded using the POST method (empty payload is OK).

Source code in src/nplinker/metabolomics/gnps/gnps_downloader.py

def download(self) -> Self:
    """Download GNPS data.

    Note: GNPS data is downloaded using the POST method (empty payload is OK).
    """
    download_url(
        self.get_url(), self._download_root, filename=self._file_name, http_method="POST"
    )
    return self
Get the path to the downloaded file.
Returns:
str
\u2013 Download path as string
src/nplinker/metabolomics/gnps/gnps_downloader.py
def get_download_file(self) -> str:\n \"\"\"Get the path to the downloaded file.\n\n Returns:\n Download path as string\n \"\"\"\n return str(Path(self._download_root) / self._file_name)\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSDownloader.get_task_id","title":"get_task_id","text":"get_task_id() -> str\n
Get the GNPS task id.
Returns:
str
\u2013 Task id as string.
src/nplinker/metabolomics/gnps/gnps_downloader.py
def get_task_id(self) -> str:\n \"\"\"Get the GNPS task id.\n\n Returns:\n Task id as string.\n \"\"\"\n return self._task_id\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSDownloader.get_url","title":"get_url","text":"get_url() -> str\n
Get the download URL.
Returns:
str
\u2013 URL pointing to the GNPS data to be downloaded.
src/nplinker/metabolomics/gnps/gnps_downloader.py
def get_url(self) -> str:\n \"\"\"Get the download URL.\n\n Returns:\n URL pointing to the GNPS data to be downloaded.\n \"\"\"\n if self.gnps_format == GNPSFormat.FBMN:\n return GNPSDownloader.GNPS_DATA_DOWNLOAD_URL_FBMN.format(self._task_id)\n return GNPSDownloader.GNPS_DATA_DOWNLOAD_URL.format(self._task_id)\n
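Since download() returns self, the calls can be chained; a sketch using the example task id from above (the download directory is illustrative):

>>> from nplinker.metabolomics.gnps import GNPSDownloader
>>> downloader = GNPSDownloader("c22f44b14a3d450eb836d607cb9521bb", "~/downloads")
>>> archive = downloader.download().get_download_file()  # path to the downloaded .zip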
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSExtractor","title":"GNPSExtractor","text":"GNPSExtractor(\n file: str | PathLike, extract_dir: str | PathLike\n)\n
Extract files from a GNPS molecular networking archive (.zip).
ConceptGNPS data
Four files are extracted and renamed to the following names:
The files to be extracted are selected based on the GNPS workflow type, as described below (in the order of the files above):
Attributes:
gnps_format
(GNPSFormat
) \u2013 The GNPS workflow type.
extract_dir
(str
) \u2013 The path where to extract the files to.
Parameters:
file
(str | PathLike
) \u2013 The path to the GNPS zip file.
extract_dir
(str | PathLike
) \u2013 path to the directory where to extract the files to.
Raises:
ValueError
\u2013 If the given file is an invalid GNPS archive.
Examples:
>>> gnps_extractor = GNPSExtractor(\"path/to/gnps_archive.zip\", \"path/to/extract_dir\")\n>>> gnps_extractor.gnps_format\n<GNPSFormat.SNETS: 'METABOLOMICS-SNETS'>\n>>> gnps_extractor.extract_dir\n'path/to/extract_dir'\n
Source code in src/nplinker/metabolomics/gnps/gnps_extractor.py
def __init__(self, file: str | PathLike, extract_dir: str | PathLike):\n \"\"\"Initialize the GNPSExtractor.\n\n Args:\n file: The path to the GNPS zip file.\n extract_dir: path to the directory where to extract the files to.\n\n Raises:\n ValueError: If the given file is an invalid GNPS archive.\n\n Examples:\n >>> gnps_extractor = GNPSExtractor(\"path/to/gnps_archive.zip\", \"path/to/extract_dir\")\n >>> gnps_extractor.gnps_format\n <GNPSFormat.SNETS: 'METABOLOMICS-SNETS'>\n >>> gnps_extractor.extract_dir\n 'path/to/extract_dir'\n \"\"\"\n gnps_format = gnps_format_from_archive(file)\n if gnps_format == GNPSFormat.Unknown:\n raise ValueError(\n f\"Unknown workflow type for GNPS archive '{file}'.\"\n f\"Supported GNPS workflows are described in the GNPSFormat enum, \"\n f\"including such as 'METABOLOMICS-SNETS', 'METABOLOMICS-SNETS-V2' \"\n f\"and 'FEATURE-BASED-MOLECULAR-NETWORKING'.\"\n )\n\n self._file = Path(file)\n self._extract_path = Path(extract_dir)\n self._gnps_format = gnps_format\n # the order of filenames matters\n self._target_files = [\n \"file_mappings\",\n \"spectra.mgf\",\n \"molecular_families.tsv\",\n \"annotations.tsv\",\n ]\n\n self._extract()\n
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSExtractor.gnps_format","title":"gnps_format property
","text":"gnps_format: GNPSFormat\n
Get the GNPS workflow type.
Returns:
GNPSFormat
\u2013 GNPS workflow type.
property
","text":"extract_dir: str\n
Get the path where to extract the files to.
Returns:
str
\u2013 Path where to extract files as string.
GNPSSpectrumLoader

GNPSSpectrumLoader(file: str | PathLike)

Bases: SpectrumLoaderBase

Load mass spectra from the given GNPS MGF file.

Concept: GNPS data

The MGF file comes from the GNPS output archive; which archive member it is depends on the GNPS workflow type.

Parameters:
    file (str | PathLike) – Path to the MGF file.

Raises:
    ValueError – Raises ValueError if the file is not valid.

Examples:
>>> loader = GNPSSpectrumLoader("gnps_spectra.mgf")
>>> print(loader.spectra[0])

Source code in src/nplinker/metabolomics/gnps/gnps_spectrum_loader.py

def __init__(self, file: str | PathLike) -> None:
    """Initialize the GNPSSpectrumLoader.

    Args:
        file: path to the MGF file.

    Raises:
        ValueError: Raises ValueError if the file is not valid.

    Examples:
        >>> loader = GNPSSpectrumLoader("gnps_spectra.mgf")
        >>> print(loader.spectra[0])
    """
    self._file = str(file)
    self._spectra: list[Spectrum] = []

    self._validate()
    self._load()
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSSpectrumLoader.spectra","title":"spectra property
","text":"spectra: list[Spectrum]\n
Get the list of Spectrum objects.
Returns:
list[Spectrum]
\u2013 list[Spectrum]: the loaded spectra as a list of Spectrum
objects.
GNPSMolecularFamilyLoader

GNPSMolecularFamilyLoader(file: str | PathLike)

Bases: MolecularFamilyLoaderBase

Load molecular families from GNPS data.

Concept: GNPS data

The molecular family file comes from the GNPS output archive; which archive member it is depends on the GNPS workflow type.

The ComponentIndex column in the GNPS molecular family file is treated as the family id. However, molecular families that have only one member (i.e. one spectrum), called singleton molecular families, all share the same value of -1 in the ComponentIndex column. To make the family id unique, the spectrum id plus a prefix singleton- is used as the family id of singleton molecular families.

Parameters:
    file (str | PathLike) – Path to the GNPS molecular family file.

Raises:
    ValueError – Raises ValueError if the file is not valid.

Examples:
>>> loader = GNPSMolecularFamilyLoader("gnps_molecular_families.tsv")
>>> print(loader.families)
[<MolecularFamily 1>, <MolecularFamily 2>, ...]
>>> print(loader.families[0].spectra_ids)
{'1', '3', '7', ...}

Source code in src/nplinker/metabolomics/gnps/gnps_molecular_family_loader.py

def __init__(self, file: str | PathLike) -> None:
    """Initialize the GNPSMolecularFamilyLoader.

    Args:
        file: Path to the GNPS molecular family file.

    Raises:
        ValueError: Raises ValueError if the file is not valid.

    Examples:
        >>> loader = GNPSMolecularFamilyLoader("gnps_molecular_families.tsv")
        >>> print(loader.families)
        [<MolecularFamily 1>, <MolecularFamily 2>, ...]
        >>> print(loader.families[0].spectra_ids)
        {'1', '3', '7', ...}
    """
    self._mfs: list[MolecularFamily] = []
    self._file = file

    self._validate()
    self._load()
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSMolecularFamilyLoader.get_mfs","title":"get_mfs","text":"get_mfs(\n keep_singleton: bool = False,\n) -> list[MolecularFamily]\n
Get MolecularFamily objects.
Parameters:
keep_singleton
(bool
, default: False
) \u2013 True to keep singleton molecular families. A singleton molecular family is a molecular family that contains only one spectrum.
Returns:
list[MolecularFamily]
\u2013 A list of MolecularFamily objects with their spectra ids.
src/nplinker/metabolomics/gnps/gnps_molecular_family_loader.py
def get_mfs(self, keep_singleton: bool = False) -> list[MolecularFamily]:\n \"\"\"Get MolecularFamily objects.\n\n Args:\n keep_singleton: True to keep singleton molecular families. A\n singleton molecular family is a molecular family that contains\n only one spectrum.\n\n Returns:\n A list of MolecularFamily objects with their spectra ids.\n \"\"\"\n mfs = self._mfs\n if not keep_singleton:\n mfs = [mf for mf in mfs if not mf.is_singleton()]\n return mfs\n
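A short usage sketch (the file path is illustrative); by default singleton molecular families are dropped, mirroring the get_gcfs filters described earlier:

>>> from nplinker.metabolomics.gnps import GNPSMolecularFamilyLoader
>>> loader = GNPSMolecularFamilyLoader("gnps_molecular_families.tsv")
>>> mfs = loader.get_mfs(keep_singleton=False)  # singleton families excluded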
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSAnnotationLoader","title":"GNPSAnnotationLoader","text":"GNPSAnnotationLoader(file: str | PathLike)\n
Bases: AnnotationLoaderBase
Load annotations from GNPS output file.
ConceptGNPS data
The annotation file is a .tsv
file from GNPS output archive, as described below for each GNPS workflow type:
Parameters:
file
(str | PathLike
) \u2013 The GNPS annotation file.
Examples:
>>> loader = GNPSAnnotationLoader(\"gnps_annotations.tsv\")\n>>> print(loader.annotations[\"100\"])\n{'#Scan#': '100',\n'Adduct': 'M+H',\n'CAS_Number': 'N/A',\n'Charge': '1',\n'Compound_Name': 'MLS002153841-01!Iobenguane sulfate',\n'Compound_Source': 'NIH Pharmacologically Active Library',\n'Data_Collector': 'VP/LMS',\n'ExactMass': '274.992',\n'INCHI': 'N/A',\n'INCHI_AUX': 'N/A',\n'Instrument': 'qTof',\n'IonMode': 'Positive',\n'Ion_Source': 'LC-ESI',\n'LibMZ': '276.003',\n'LibraryName': 'lib-00014.mgf',\n'LibraryQualityString': 'Gold',\n'Library_Class': '1',\n'MQScore': '0.704152',\n'MZErrorPPM': '405416',\n'MassDiff': '111.896',\n'Organism': 'GNPS-NIH-SMALLMOLECULEPHARMACOLOGICALLYACTIVE',\n'PI': 'Dorrestein',\n'Precursor_MZ': '276.003',\n'Pubmed_ID': 'N/A',\n'RT_Query': '795.979',\n'SharedPeaks': '7',\n'Smiles': 'NC(=N)NCc1cccc(I)c1.OS(=O)(=O)O',\n'SpecCharge': '1',\n'SpecMZ': '164.107',\n'SpectrumFile': 'spectra/specs_ms.pklbin',\n'SpectrumID': 'CCMSLIB00000086167',\n'TIC_Query': '986.997',\n'UpdateWorkflowName': 'UPDATE-SINGLE-ANNOTATED-GOLD',\n'tags': ' ',\n'png_url': 'https://metabolomics-usi.gnps2.org/png/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167',\n'json_url': 'https://metabolomics-usi.gnps2.org/json/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167',\n'svg_url': 'https://metabolomics-usi.gnps2.org/svg/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167',\n'spectrum_url': 'https://metabolomics-usi.gnps2.org/spectrum/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167'}\n
Source code in src/nplinker/metabolomics/gnps/gnps_annotation_loader.py
def __init__(self, file: str | PathLike) -> None:
    """Initialize the GNPSAnnotationLoader.

    Args:
        file: The GNPS annotation file.

    Examples:
        >>> loader = GNPSAnnotationLoader("gnps_annotations.tsv")
        >>> print(loader.annotations["100"])
        {'#Scan#': '100',
        'Adduct': 'M+H',
        'CAS_Number': 'N/A',
        'Charge': '1',
        'Compound_Name': 'MLS002153841-01!Iobenguane sulfate',
        'Compound_Source': 'NIH Pharmacologically Active Library',
        'Data_Collector': 'VP/LMS',
        'ExactMass': '274.992',
        'INCHI': 'N/A',
        'INCHI_AUX': 'N/A',
        'Instrument': 'qTof',
        'IonMode': 'Positive',
        'Ion_Source': 'LC-ESI',
        'LibMZ': '276.003',
        'LibraryName': 'lib-00014.mgf',
        'LibraryQualityString': 'Gold',
        'Library_Class': '1',
        'MQScore': '0.704152',
        'MZErrorPPM': '405416',
        'MassDiff': '111.896',
        'Organism': 'GNPS-NIH-SMALLMOLECULEPHARMACOLOGICALLYACTIVE',
        'PI': 'Dorrestein',
        'Precursor_MZ': '276.003',
        'Pubmed_ID': 'N/A',
        'RT_Query': '795.979',
        'SharedPeaks': '7',
        'Smiles': 'NC(=N)NCc1cccc(I)c1.OS(=O)(=O)O',
        'SpecCharge': '1',
        'SpecMZ': '164.107',
        'SpectrumFile': 'spectra/specs_ms.pklbin',
        'SpectrumID': 'CCMSLIB00000086167',
        'TIC_Query': '986.997',
        'UpdateWorkflowName': 'UPDATE-SINGLE-ANNOTATED-GOLD',
        'tags': ' ',
        'png_url': 'https://metabolomics-usi.gnps2.org/png/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167',
        'json_url': 'https://metabolomics-usi.gnps2.org/json/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167',
        'svg_url': 'https://metabolomics-usi.gnps2.org/svg/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167',
        'spectrum_url': 'https://metabolomics-usi.gnps2.org/spectrum/?usi1=mzspec:GNPS:GNPS-LIBRARY:accession:CCMSLIB00000086167'}
    """
    self._file = Path(file)
    self._annotations: dict[str, dict] = {}

    self._validate()
    self._load()
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSAnnotationLoader.annotations","title":"annotations property
","text":"annotations: dict[str, dict]\n
Get annotations.
Returns:

- dict[str, dict] – Keys are spectrum ids ("#Scan#" in the annotation file) and values are the annotations dict for each spectrum.

GNPSFileMappingLoader

GNPSFileMappingLoader(file: str | PathLike)
Bases: FileMappingLoaderBase
Class to load file mappings from GNPS output file.
Concept: GNPS data

File mappings refer to the mapping from a spectrum id to the files in which that spectrum occurs.

The file mappings file comes from the GNPS output archive; its location for each GNPS workflow type is described under gnps_format_from_file_mapping below.
Parameters:

- file (str | PathLike) – Path to the GNPS file mappings file.

Raises:

- ValueError – Raises ValueError if the file is not valid.
Examples:
>>> loader = GNPSFileMappingLoader("gnps_file_mappings.tsv")
>>> print(loader.mappings["1"])
['26c.mzXML']
>>> print(loader.mapping_reversed["26c.mzXML"])
{'1', '3', '7', ...}
Source code in src/nplinker/metabolomics/gnps/gnps_file_mapping_loader.py
def __init__(self, file: str | PathLike) -> None:
    """Initialize the GNPSFileMappingLoader.

    Args:
        file: Path to the GNPS file mappings file.

    Raises:
        ValueError: Raises ValueError if the file is not valid.

    Examples:
        >>> loader = GNPSFileMappingLoader("gnps_file_mappings.tsv")
        >>> print(loader.mappings["1"])
        ['26c.mzXML']
        >>> print(loader.mapping_reversed["26c.mzXML"])
        {'1', '3', '7', ...}
    """
    self._gnps_format = gnps_format_from_file_mapping(file)
    if self._gnps_format is GNPSFormat.Unknown:
        raise ValueError("Unknown workflow type for GNPS file mappings file ")

    self._file = Path(file)
    self._mapping: dict[str, list[str]] = {}

    self._validate()
    self._load()
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.GNPSFileMappingLoader.mappings","title":"mappings property
","text":"mappings: dict[str, list[str]]\n
Return mapping from spectrum id to files in which this spectrum occurs.
Returns:

- dict[str, list[str]] – Mapping from spectrum id to names of all files in which this spectrum occurs.

mapping_reversed property

mapping_reversed: dict[str, set[str]]

Return mapping from file name to all spectra that occur in this file.

Returns:

- dict[str, set[str]] – Mapping from file name to all spectra ids that occur in this file.
gnps_format_from_archive

gnps_format_from_archive(zip_file: str | PathLike) -> GNPSFormat
Detect GNPS format from GNPS zip archive.
The detection is based on the filename of the zip file and the names of the files contained in the zip file.
Parameters:

- zip_file (str | PathLike) – Path to the GNPS zip file.

Returns:

- GNPSFormat – The format identified in the GNPS zip file.
Examples:
>>> gnps_format_from_archive("ProteoSAFe-METABOLOMICS-SNETS-c22f44b1-download_clustered_spectra.zip")
<GNPSFormat.SNETS: 'METABOLOMICS-SNETS'>
>>> gnps_format_from_archive("ProteoSAFe-METABOLOMICS-SNETS-V2-189e8bf1-download_clustered_spectra.zip")
<GNPSFormat.SNETSV2: 'METABOLOMICS-SNETS-V2'>
>>> gnps_format_from_archive("ProteoSAFe-FEATURE-BASED-MOLECULAR-NETWORKING-672d0a53-download_cytoscape_data.zip")
<GNPSFormat.FBMN: 'FEATURE-BASED-MOLECULAR-NETWORKING'>
Source code in src/nplinker/metabolomics/gnps/gnps_format.py
def gnps_format_from_archive(zip_file: str | PathLike) -> GNPSFormat:
    """Detect GNPS format from GNPS zip archive.

    The detection is based on the filename of the zip file and the names of the
    files contained in the zip file.

    Args:
        zip_file: Path to the GNPS zip file.

    Returns:
        The format identified in the GNPS zip file.

    Examples:
        >>> gnps_format_from_archive("ProteoSAFe-METABOLOMICS-SNETS-c22f44b1-download_clustered_spectra.zip")
        <GNPSFormat.SNETS: 'METABOLOMICS-SNETS'>
        >>> gnps_format_from_archive("ProteoSAFe-METABOLOMICS-SNETS-V2-189e8bf1-download_clustered_spectra.zip")
        <GNPSFormat.SNETSV2: 'METABOLOMICS-SNETS-V2'>
        >>> gnps_format_from_archive("ProteoSAFe-FEATURE-BASED-MOLECULAR-NETWORKING-672d0a53-download_cytoscape_data.zip")
        <GNPSFormat.FBMN: 'FEATURE-BASED-MOLECULAR-NETWORKING'>
    """
    file = Path(zip_file)
    # Guess the format from the filename of the zip file
    if GNPSFormat.FBMN.value in file.name:
        return GNPSFormat.FBMN
    # the order of the if statements matters for the following two
    if GNPSFormat.SNETSV2.value in file.name:
        return GNPSFormat.SNETSV2
    if GNPSFormat.SNETS.value in file.name:
        return GNPSFormat.SNETS

    # Guess the format from the names of the files in the zip file
    with zipfile.ZipFile(file) as archive:
        filenames = archive.namelist()
    if any(GNPSFormat.FBMN.value in x for x in filenames):
        return GNPSFormat.FBMN
    # the order of the if statements matters for the following two
    if any(GNPSFormat.SNETSV2.value in x for x in filenames):
        return GNPSFormat.SNETSV2
    if any(GNPSFormat.SNETS.value in x for x in filenames):
        return GNPSFormat.SNETS

    return GNPSFormat.Unknown
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.gnps_format_from_file_mapping","title":"gnps_format_from_file_mapping","text":"gnps_format_from_file_mapping(\n file: str | PathLike,\n) -> GNPSFormat\n
Detect GNPS format from the given file mapping file.
The GNPS file mapping file is located in different folders depending on the GNPS workflow. Here are the locations in corresponding GNPS zip archives:
- METABOLOMICS-SNETS workflow: the .tsv file in the folder clusterinfosummarygroup_attributes_withIDs_withcomponentID
- METABOLOMICS-SNETS-V2 workflow: the .clustersummary file (tsv) in the folder clusterinfosummarygroup_attributes_withIDs_withcomponentID
- FEATURE-BASED-MOLECULAR-NETWORKING workflow: the .csv file in the folder quantification_table
Parameters:

- file (str | PathLike) – Path to the file whose format should be detected.

Returns:

- GNPSFormat – GNPS format identified in the file.

Source code in src/nplinker/metabolomics/gnps/gnps_format.py
def gnps_format_from_file_mapping(file: str | PathLike) -> GNPSFormat:
    """Detect GNPS format from the given file mapping file.

    The GNPS file mapping file is located in different folders depending on the
    GNPS workflow. Here are the locations in corresponding GNPS zip archives:

    - `METABOLOMICS-SNETS` workflow: the `.tsv` file in the folder
      `clusterinfosummarygroup_attributes_withIDs_withcomponentID`
    - `METABOLOMICS-SNETS-V2` workflow: the `.clustersummary` file (tsv) in the folder
      `clusterinfosummarygroup_attributes_withIDs_withcomponentID`
    - `FEATURE-BASED-MOLECULAR-NETWORKING` workflow: the `.csv` file in the folder
      `quantification_table`

    Args:
        file: Path to the file to peek the format for.

    Returns:
        GNPS format identified in the file.
    """
    with open(file, "r") as f:
        header = f.readline().strip()

    if re.search(r"\bAllFiles\b", header):
        return GNPSFormat.SNETS
    if re.search(r"\bUniqueFileSources\b", header):
        return GNPSFormat.SNETSV2
    if re.search(r"\b{}\b".format(re.escape("row ID")), header):
        return GNPSFormat.FBMN
    return GNPSFormat.Unknown
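A short usage sketch (the file path is hypothetical); the detection only peeks at the header line of the file mapping file:

from nplinker.metabolomics.gnps import gnps_format_from_file_mapping

fmt = gnps_format_from_file_mapping("gnps_file_mappings.tsv")  # hypothetical file
print(fmt)  # e.g. <GNPSFormat.SNETS: 'METABOLOMICS-SNETS'>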
"},{"location":"api/gnps/#nplinker.metabolomics.gnps.gnps_format_from_task_id","title":"gnps_format_from_task_id","text":"gnps_format_from_task_id(task_id: str) -> GNPSFormat\n
Detect GNPS format for the given task id.
Parameters:

- task_id (str) – GNPS task id.

Returns:

- GNPSFormat – The format identified in the GNPS task.
Examples:
>>> gnps_format_from_task_id("c22f44b14a3d450eb836d607cb9521bb")
<GNPSFormat.SNETS: 'METABOLOMICS-SNETS'>
>>> gnps_format_from_task_id("189e8bf16af145758b0a900f1c44ff4a")
<GNPSFormat.SNETSV2: 'METABOLOMICS-SNETS-V2'>
>>> gnps_format_from_task_id("92036537c21b44c29e509291e53f6382")
<GNPSFormat.FBMN: 'FEATURE-BASED-MOLECULAR-NETWORKING'>
>>> gnps_format_from_task_id("0ad6535e34d449788f297e712f43068a")
<GNPSFormat.Unknown: 'Unknown-GNPS-Workflow'>
Source code in src/nplinker/metabolomics/gnps/gnps_format.py
def gnps_format_from_task_id(task_id: str) -> GNPSFormat:
    """Detect GNPS format for the given task id.

    Args:
        task_id: GNPS task id.

    Returns:
        The format identified in the GNPS task.

    Examples:
        >>> gnps_format_from_task_id("c22f44b14a3d450eb836d607cb9521bb")
        <GNPSFormat.SNETS: 'METABOLOMICS-SNETS'>
        >>> gnps_format_from_task_id("189e8bf16af145758b0a900f1c44ff4a")
        <GNPSFormat.SNETSV2: 'METABOLOMICS-SNETS-V2'>
        >>> gnps_format_from_task_id("92036537c21b44c29e509291e53f6382")
        <GNPSFormat.FBMN: 'FEATURE-BASED-MOLECULAR-NETWORKING'>
        >>> gnps_format_from_task_id("0ad6535e34d449788f297e712f43068a")
        <GNPSFormat.Unknown: 'Unknown-GNPS-Workflow'>
    """
    task_html = httpx.get(GNPS_TASK_URL.format(task_id))
    soup = BeautifulSoup(task_html.text, features="html.parser")
    try:
        # find the td tag that follows the th tag containing 'Workflow'
        workflow_tag = soup.find("th", string="Workflow").find_next_sibling("td")  # type: ignore
        workflow_format = workflow_tag.contents[0].strip()  # type: ignore
    except AttributeError:
        return GNPSFormat.Unknown

    if workflow_format == GNPSFormat.FBMN.value:
        return GNPSFormat.FBMN
    if workflow_format == GNPSFormat.SNETSV2.value:
        return GNPSFormat.SNETSV2
    if workflow_format == GNPSFormat.SNETS.value:
        return GNPSFormat.SNETS
    return GNPSFormat.Unknown
"},{"location":"api/loader/","title":"Dataset Loader","text":""},{"location":"api/loader/#nplinker.loader","title":"nplinker.loader","text":""},{"location":"api/loader/#nplinker.loader.DatasetLoader","title":"DatasetLoader","text":"DatasetLoader(config: Dynaconf)\n
Load datasets from the working directory with the given configuration.
Concept and Diagram: Working Directory Structure, Dataset Loading Pipeline

Loaded data are stored in the data containers (attributes), e.g. self.bgcs, self.gcfs, etc.
Attributes:

- config – A Dynaconf object that contains the configuration settings.
- bgcs (list[BGC]) – A list of BGC objects.
- gcfs (list[GCF]) – A list of GCF objects.
- spectra (list[Spectrum]) – A list of Spectrum objects.
- mfs (list[MolecularFamily]) – A list of MolecularFamily objects.
- mibig_bgcs (list[BGC]) – A list of MIBiG BGC objects.
- mibig_strains_in_use (StrainCollection) – A StrainCollection object that contains the strains in use from MIBiG.
- product_types (list) – A list of product types.
- strains (StrainCollection) – A StrainCollection object that contains all strains.
- class_matches – A ClassMatches object that contains class match info.
- chem_classes – A ChemClassPredictions object that contains chemical class predictions.

Parameters:

- config (Dynaconf) – A Dynaconf object that contains the configuration settings.
Examples:
>>> from nplinker.config import load_config
>>> from nplinker.loader import DatasetLoader
>>> config = load_config("nplinker.toml")
>>> loader = DatasetLoader(config)
>>> loader.load()
See Also: DatasetArranger – Download, generate and/or validate datasets to ensure they are ready for loading.

Source code in src/nplinker/loader.py
def __init__(self, config: Dynaconf) -> None:
    """Initialize the DatasetLoader.

    Args:
        config: A Dynaconf object that contains the configuration settings.

    Examples:
        >>> from nplinker.config import load_config
        >>> from nplinker.loader import DatasetLoader
        >>> config = load_config("nplinker.toml")
        >>> loader = DatasetLoader(config)
        >>> loader.load()

    See Also:
        [DatasetArranger][nplinker.arranger.DatasetArranger]: Download, generate and/or validate
        datasets to ensure they are ready for loading.
    """
    self.config = config

    self.bgcs: list[BGC] = []
    self.gcfs: list[GCF] = []
    self.spectra: list[Spectrum] = []
    self.mfs: list[MolecularFamily] = []
    self.mibig_bgcs: list[BGC] = []
    self.mibig_strains_in_use: StrainCollection = StrainCollection()
    self.product_types: list = []
    self.strains: StrainCollection = StrainCollection()

    self.class_matches = None
    self.chem_classes = None
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.RUN_CANOPUS_DEFAULT","title":"RUN_CANOPUS_DEFAULT class-attribute
instance-attribute
","text":"RUN_CANOPUS_DEFAULT = False\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.EXTRA_CANOPUS_PARAMS_DEFAULT","title":"EXTRA_CANOPUS_PARAMS_DEFAULT class-attribute
instance-attribute
","text":"EXTRA_CANOPUS_PARAMS_DEFAULT = (\n \"--maxmz 600 formula zodiac structure canopus\"\n)\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.OR_CANOPUS","title":"OR_CANOPUS class-attribute
instance-attribute
","text":"OR_CANOPUS = 'canopus_dir'\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.OR_MOLNETENHANCER","title":"OR_MOLNETENHANCER class-attribute
instance-attribute
","text":"OR_MOLNETENHANCER = 'molnetenhancer_dir'\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.config","title":"config instance-attribute
","text":"config = config\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.bgcs","title":"bgcs instance-attribute
","text":"bgcs: list[BGC] = []\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.gcfs","title":"gcfs instance-attribute
","text":"gcfs: list[GCF] = []\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.spectra","title":"spectra instance-attribute
","text":"spectra: list[Spectrum] = []\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.mfs","title":"mfs instance-attribute
","text":"mfs: list[MolecularFamily] = []\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.mibig_bgcs","title":"mibig_bgcs instance-attribute
","text":"mibig_bgcs: list[BGC] = []\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.mibig_strains_in_use","title":"mibig_strains_in_use instance-attribute
","text":"mibig_strains_in_use: StrainCollection = StrainCollection()\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.product_types","title":"product_types instance-attribute
","text":"product_types: list = []\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.strains","title":"strains instance-attribute
","text":"strains: StrainCollection = StrainCollection()\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.class_matches","title":"class_matches instance-attribute
","text":"class_matches = None\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.chem_classes","title":"chem_classes instance-attribute
","text":"chem_classes = None\n
"},{"location":"api/loader/#nplinker.loader.DatasetLoader.load","title":"load","text":"load() -> bool\n
Load all data from data files in the working directory.
See Dataset Loading Pipeline for the detailed steps.
Returns:

- bool – True if all data are loaded successfully.

Source code in src/nplinker/loader.py
def load(self) -> bool:
    """Load all data from data files in the working directory.

    See [Dataset Loading Pipeline][dataset-loading-pipeline] for the detailed steps.

    Returns:
        True if all data are loaded successfully.
    """
    if not self._load_strain_mappings():
        return False

    if not self._load_metabolomics():
        return False

    if not self._load_genomics():
        return False

    # set self.strains with all strains from input plus mibig strains in use
    self.strains = self.strains + self.mibig_strains_in_use

    if len(self.strains) == 0:
        raise Exception("Failed to find *ANY* strains.")

    return True
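A minimal sketch of how load is typically called (the config path is hypothetical); since load returns a bool, the result should be checked before the data containers are used:

from nplinker.config import load_config
from nplinker.loader import DatasetLoader

config = load_config("nplinker.toml")  # hypothetical config file
loader = DatasetLoader(config)

# `load` returns False when strain mappings, metabolomics or genomics
# data fail to load, so check the result before using the containers.
if not loader.load():
    raise RuntimeError("Dataset loading failed; check the logs for details.")
print(f"Loaded {len(loader.bgcs)} BGCs and {len(loader.spectra)} spectra")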
"},{"location":"api/metabolomics/","title":"Data Models","text":""},{"location":"api/metabolomics/#nplinker.metabolomics","title":"nplinker.metabolomics","text":""},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily","title":"MolecularFamily","text":"MolecularFamily(id: str)\n
Class to model molecular family.
Attributes:

- id (str) – Unique id for the molecular family.
- spectra_ids (set[str]) – Set of spectrum ids in the molecular family.
- spectra (set[Spectrum]) – Set of Spectrum objects in the molecular family.
- strains (StrainCollection) – StrainCollection object that contains strains in the molecular family.

Parameters:

- id (str) – Unique id for the molecular family.

Source code in src/nplinker/metabolomics/molecular_family.py
def __init__(self, id: str):
    """Initialize the MolecularFamily.

    Args:
        id: Unique id for the molecular family.
    """
    self.id: str = id
    self.spectra_ids: set[str] = set()
    self._spectra: set[Spectrum] = set()
    self._strains: StrainCollection = StrainCollection()
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.id","title":"id instance-attribute
","text":"id: str = id\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.spectra_ids","title":"spectra_ids instance-attribute
","text":"spectra_ids: set[str] = set()\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.spectra","title":"spectra property
","text":"spectra: set[Spectrum]\n
Get Spectrum objects in the molecular family.
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.strains","title":"strainsproperty
","text":"strains: StrainCollection\n
Get strains in the molecular family.
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.__str__","title":"__str__","text":"__str__() -> str\n
Source code in src/nplinker/metabolomics/molecular_family.py
def __str__(self) -> str:\n return (\n f\"MolecularFamily(id={self.id}, #Spectrum_objects={len(self._spectra)}, \"\n f\"#spectrum_ids={len(self.spectra_ids)}, #strains={len(self._strains)})\"\n )\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.__repr__","title":"__repr__","text":"__repr__() -> str\n
Source code in src/nplinker/metabolomics/molecular_family.py
def __repr__(self) -> str:\n return str(self)\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.__eq__","title":"__eq__","text":"__eq__(other) -> bool\n
Source code in src/nplinker/metabolomics/molecular_family.py
def __eq__(self, other) -> bool:\n if isinstance(other, MolecularFamily):\n return self.id == other.id\n return NotImplemented\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.__hash__","title":"__hash__","text":"__hash__() -> int\n
Source code in src/nplinker/metabolomics/molecular_family.py
def __hash__(self) -> int:\n return hash(self.id)\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.__reduce__","title":"__reduce__","text":"__reduce__() -> tuple\n
Reduce function for pickling.
Source code insrc/nplinker/metabolomics/molecular_family.py
def __reduce__(self) -> tuple:\n \"\"\"Reduce function for pickling.\"\"\"\n return (self.__class__, (self.id,), self.__dict__)\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.add_spectrum","title":"add_spectrum","text":"add_spectrum(spectrum: Spectrum) -> None\n
Add a Spectrum object to the molecular family.
Parameters:

- spectrum (Spectrum) – Spectrum object to add to the molecular family.

Source code in src/nplinker/metabolomics/molecular_family.py
def add_spectrum(self, spectrum: Spectrum) -> None:
    """Add a Spectrum object to the molecular family.

    Args:
        spectrum: `Spectrum` object to add to the molecular family.
    """
    self._spectra.add(spectrum)
    self.spectra_ids.add(spectrum.id)
    self._strains = self._strains + spectrum.strains
    # add the molecular family to the spectrum
    spectrum.family = self
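A small sketch (all ids and values are made up) showing that add_spectrum wires both sides of the relationship:

from nplinker.metabolomics import MolecularFamily, Spectrum

# Hypothetical objects for illustration only.
mf = MolecularFamily("mf_1")
spec = Spectrum(id="1", mz=[100.0, 200.0], intensity=[1.0, 0.5], precursor_mz=250.0)

mf.add_spectrum(spec)
assert spec.family is mf          # back-reference set by add_spectrum
assert "1" in mf.spectra_ids      # id registered on the family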
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.detach_spectrum","title":"detach_spectrum","text":"detach_spectrum(spectrum: Spectrum) -> None\n
Remove a Spectrum object from the molecular family.
Parameters:

- spectrum (Spectrum) – Spectrum object to remove from the molecular family.

Source code in src/nplinker/metabolomics/molecular_family.py
def detach_spectrum(self, spectrum: Spectrum) -> None:
    """Remove a Spectrum object from the molecular family.

    Args:
        spectrum: `Spectrum` object to remove from the molecular family.
    """
    self._spectra.remove(spectrum)
    self.spectra_ids.remove(spectrum.id)
    self._strains = self._update_strains()
    # remove the molecular family from the spectrum
    spectrum.family = None
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.has_strain","title":"has_strain","text":"has_strain(strain: Strain) -> bool\n
Check if the given strain exists.
Parameters:

- strain (Strain) – Strain object.

Returns:

- bool – True when the given strain exists.

Source code in src/nplinker/metabolomics/molecular_family.py
def has_strain(self, strain: Strain) -> bool:
    """Check if the given strain exists.

    Args:
        strain: `Strain` object.

    Returns:
        True when the given strain exists.
    """
    return strain in self._strains
"},{"location":"api/metabolomics/#nplinker.metabolomics.MolecularFamily.is_singleton","title":"is_singleton","text":"is_singleton() -> bool\n
Check if the molecular family contains only one spectrum.
Returns:

- bool – True when the molecular family has only one spectrum.

Source code in src/nplinker/metabolomics/molecular_family.py
def is_singleton(self) -> bool:
    """Check if the molecular family contains only one spectrum.

    Returns:
        True when the molecular family has only one spectrum.
    """
    return len(self.spectra_ids) == 1
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum","title":"Spectrum","text":"Spectrum(\n id: str,\n mz: list[float],\n intensity: list[float],\n precursor_mz: float,\n rt: float = 0,\n metadata: dict | None = None,\n)\n
Class to model MS/MS Spectrum.
Attributes:

- id – the spectrum ID.
- mz – the list of m/z values.
- intensity – the list of intensity values.
- precursor_mz – the m/z value of the precursor.
- rt – the retention time in seconds.
- metadata – the metadata of the spectrum, i.e. the header information in the MGF file.
- gnps_annotations (dict) – the GNPS annotations of the spectrum.
- gnps_id (str | None) – the GNPS ID of the spectrum.
- strains (StrainCollection) – the strains that this spectrum belongs to.
- family (MolecularFamily | None) – the molecular family that this spectrum belongs to.
- peaks (ndarray) – 2D array of peaks, each row is a peak of (m/z, intensity) values.

Parameters:

- id (str) – the spectrum ID.
- mz (list[float]) – the list of m/z values.
- intensity (list[float]) – the list of intensity values.
- precursor_mz (float) – the precursor m/z.
- rt (float, default: 0) – the retention time in seconds. Defaults to 0.
- metadata (dict | None, default: None) – the metadata of the spectrum, i.e. the header information in the MGF file.

Source code in src/nplinker/metabolomics/spectrum.py
def __init__(
    self,
    id: str,
    mz: list[float],
    intensity: list[float],
    precursor_mz: float,
    rt: float = 0,
    metadata: dict | None = None,
) -> None:
    """Initialize the Spectrum.

    Args:
        id: the spectrum ID.
        mz: the list of m/z values.
        intensity: the list of intensity values.
        precursor_mz: the precursor m/z.
        rt: the retention time in seconds. Defaults to 0.
        metadata: the metadata of the spectrum, i.e. the header information
            in the MGF file.
    """
    self.id = id
    self.mz = mz
    self.intensity = intensity
    self.precursor_mz = precursor_mz
    self.rt = rt
    self.metadata = metadata or {}

    self.gnps_annotations: dict = {}
    self.gnps_id: str | None = None
    self.strains: StrainCollection = StrainCollection()
    self.family: MolecularFamily | None = None
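A minimal construction sketch (all values are made up), also showing the peaks view over mz and intensity documented below:

import numpy as np
from nplinker.metabolomics import Spectrum

spec = Spectrum(
    id="42",
    mz=[100.0, 150.5, 200.1],
    intensity=[5.0, 1.2, 3.3],
    precursor_mz=250.0,
    rt=12.5,
    metadata={"SCANS": "42"},  # hypothetical MGF header entry
)

# `peaks` pairs m/z with intensity row-wise, e.g. peaks[0] -> [100.0, 5.0]
assert isinstance(spec.peaks, np.ndarray)
print(spec.peaks.shape)  # (3, 2)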
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.id","title":"id instance-attribute
","text":"id = id\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.mz","title":"mz instance-attribute
","text":"mz = mz\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.intensity","title":"intensity instance-attribute
","text":"intensity = intensity\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.precursor_mz","title":"precursor_mz instance-attribute
","text":"precursor_mz = precursor_mz\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.rt","title":"rt instance-attribute
","text":"rt = rt\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.metadata","title":"metadata instance-attribute
","text":"metadata = metadata or {}\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.gnps_annotations","title":"gnps_annotations instance-attribute
","text":"gnps_annotations: dict = {}\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.gnps_id","title":"gnps_id instance-attribute
","text":"gnps_id: str | None = None\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.strains","title":"strains instance-attribute
","text":"strains: StrainCollection = StrainCollection()\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.family","title":"family instance-attribute
","text":"family: MolecularFamily | None = None\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.peaks","title":"peaks cached
property
","text":"peaks: ndarray\n
Get the peaks, a 2D array with each row containing the values of (m/z, intensity).
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.__str__","title":"__str__","text":"__str__() -> str\n
Source code in src/nplinker/metabolomics/spectrum.py
def __str__(self) -> str:\n return f\"Spectrum(id={self.id}, #strains={len(self.strains)})\"\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.__repr__","title":"__repr__","text":"__repr__() -> str\n
Source code in src/nplinker/metabolomics/spectrum.py
def __repr__(self) -> str:\n return str(self)\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.__eq__","title":"__eq__","text":"__eq__(other) -> bool\n
Source code in src/nplinker/metabolomics/spectrum.py
def __eq__(self, other) -> bool:\n if isinstance(other, Spectrum):\n return self.id == other.id and self.precursor_mz == other.precursor_mz\n return NotImplemented\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.__hash__","title":"__hash__","text":"__hash__() -> int\n
Source code in src/nplinker/metabolomics/spectrum.py
def __hash__(self) -> int:\n return hash((self.id, self.precursor_mz))\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.__reduce__","title":"__reduce__","text":"__reduce__() -> tuple\n
Reduce function for pickling.
Source code insrc/nplinker/metabolomics/spectrum.py
def __reduce__(self) -> tuple:\n \"\"\"Reduce function for pickling.\"\"\"\n return (\n self.__class__,\n (self.id, self.mz, self.intensity, self.precursor_mz, self.rt, self.metadata),\n self.__dict__,\n )\n
"},{"location":"api/metabolomics/#nplinker.metabolomics.Spectrum.has_strain","title":"has_strain","text":"has_strain(strain: Strain) -> bool\n
Check if the given strain exists in the spectrum.
Parameters:

- strain (Strain) – Strain object.

Returns:

- bool – True when the given strain exists in the spectrum.

Source code in src/nplinker/metabolomics/spectrum.py
def has_strain(self, strain: Strain) -> bool:
    """Check if the given strain exists in the spectrum.

    Args:
        strain: `Strain` object.

    Returns:
        True when the given strain exist in the spectrum.
    """
    return strain in self.strains
"},{"location":"api/metabolomics_abc/","title":"Abstract Base Classes","text":""},{"location":"api/metabolomics_abc/#nplinker.metabolomics.abc","title":"nplinker.metabolomics.abc","text":""},{"location":"api/metabolomics_abc/#nplinker.metabolomics.abc.SpectrumLoaderBase","title":"SpectrumLoaderBase","text":" Bases: ABC
Abstract base class for SpectrumLoader.
"},{"location":"api/metabolomics_abc/#nplinker.metabolomics.abc.SpectrumLoaderBase.spectra","title":"spectraabstractmethod
property
","text":"spectra: list[Spectrum]\n
Get Spectrum objects.
Returns:
list[Spectrum]
\u2013 A sequence of Spectrum objects.
Bases: ABC
Abstract base class for MolecularFamilyLoader.
"},{"location":"api/metabolomics_abc/#nplinker.metabolomics.abc.MolecularFamilyLoaderBase.get_mfs","title":"get_mfsabstractmethod
","text":"get_mfs(keep_singleton: bool) -> list[MolecularFamily]\n
Get MolecularFamily objects.
Parameters:
keep_singleton
(bool
) \u2013 True to keep singleton molecular families. A singleton molecular family is a molecular family that contains only one spectrum.
Returns:
list[MolecularFamily]
\u2013 A sequence of MolecularFamily objects.
src/nplinker/metabolomics/abc.py
@abstractmethod
def get_mfs(self, keep_singleton: bool) -> list[MolecularFamily]:
    """Get MolecularFamily objects.

    Args:
        keep_singleton: True to keep singleton molecular families. A
            singleton molecular family is a molecular family that contains
            only one spectrum.

    Returns:
        A sequence of MolecularFamily objects.
    """
"},{"location":"api/metabolomics_abc/#nplinker.metabolomics.abc.FileMappingLoaderBase","title":"FileMappingLoaderBase","text":" Bases: ABC
Abstract base class for FileMappingLoader.
"},{"location":"api/metabolomics_abc/#nplinker.metabolomics.abc.FileMappingLoaderBase.mappings","title":"mappingsabstractmethod
property
","text":"mappings: dict[str, list[str]]\n
Get file mappings.
Returns:
dict[str, list[str]]
\u2013 A mapping from spectrum ID to the names of files where the spectrum occurs.
Bases: ABC
Abstract base class for AnnotationLoader.
"},{"location":"api/metabolomics_abc/#nplinker.metabolomics.abc.AnnotationLoaderBase.annotations","title":"annotationsabstractmethod
property
","text":"annotations: dict[str, dict]\n
Get annotations.
Returns:
dict[str, dict]
\u2013 A mapping from spectrum ID to its annotations.
add_annotation_to_spectrum(\n annotations: Mapping[str, dict],\n spectra: Sequence[Spectrum],\n) -> None\n
Add annotations to the Spectrum.gnps_annotations attribute for input spectra.

It is possible that some spectra don't have annotations.

Note: The input spectra list is changed in place.
Parameters:

- annotations (Mapping[str, dict]) – A dictionary of GNPS annotations, where the keys are spectrum ids and the values are GNPS annotations.
- spectra (Sequence[Spectrum]) – A list of Spectrum objects.

Source code in src/nplinker/metabolomics/utils.py
def add_annotation_to_spectrum(
    annotations: Mapping[str, dict], spectra: Sequence[Spectrum]
) -> None:
    """Add annotations to the `Spectrum.gnps_annotations` attribute for input spectra.

    It is possible that some spectra don't have annotations.

    !!! note
        The input `spectra` list is changed in place.

    Args:
        annotations: A dictionary of GNPS annotations, where the keys are
            spectrum ids and the values are GNPS annotations.
        spectra: A list of Spectrum objects.
    """
    for spec in spectra:
        if spec.id in annotations:
            spec.gnps_annotations = annotations[spec.id]
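A small, self-contained sketch; the spectrum and annotation values are made up, borrowed from the annotation example above:

from nplinker.metabolomics import Spectrum
from nplinker.metabolomics.utils import add_annotation_to_spectrum

# Hypothetical spectra and annotations for illustration.
spectra = [Spectrum(id="100", mz=[100.0], intensity=[1.0], precursor_mz=276.0)]
annotations = {"100": {"Compound_Name": "Iobenguane sulfate", "MQScore": "0.704152"}}

add_annotation_to_spectrum(annotations, spectra)  # updates `spectra` in place
print(spectra[0].gnps_annotations["Compound_Name"])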
"},{"location":"api/metabolomics_utils/#nplinker.metabolomics.utils.add_strains_to_spectrum","title":"add_strains_to_spectrum","text":"add_strains_to_spectrum(\n strains: StrainCollection, spectra: Sequence[Spectrum]\n) -> tuple[list[Spectrum], list[Spectrum]]\n
Add Strain objects to the Spectrum.strains attribute for input spectra.

Note: The input spectra list is changed in place.

Parameters:

- strains (StrainCollection) – A collection of strain objects.
- spectra (Sequence[Spectrum]) – A list of Spectrum objects.

Returns:

- tuple[list[Spectrum], list[Spectrum]] – A tuple of two lists of Spectrum objects: the first contains spectra that were updated with Strain objects; the second contains spectra that were not updated because no Strain objects were found.

Source code in src/nplinker/metabolomics/utils.py
def add_strains_to_spectrum(
    strains: StrainCollection, spectra: Sequence[Spectrum]
) -> tuple[list[Spectrum], list[Spectrum]]:
    """Add `Strain` objects to the `Spectrum.strains` attribute for input spectra.

    !!! note
        The input `spectra` list is changed in place.

    Args:
        strains: A collection of strain objects.
        spectra: A list of Spectrum objects.

    Returns:
        A tuple of two lists of Spectrum objects,

        - the first list contains Spectrum objects that are updated with Strain objects;
        - the second list contains Spectrum objects that are not updated with Strain objects
          because no Strain objects are found.
    """
    spectra_with_strains = []
    spectra_without_strains = []
    for spec in spectra:
        try:
            strain_list = strains.lookup(spec.id)
        except ValueError:
            spectra_without_strains.append(spec)
            continue

        for strain in strain_list:
            spec.strains.add(strain)
        spectra_with_strains.append(spec)

    logger.info(
        f"{len(spectra_with_strains)} Spectrum objects updated with Strain objects.\n"
        f"{len(spectra_without_strains)} Spectrum objects not updated with Strain objects."
    )

    return spectra_with_strains, spectra_without_strains
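A hedged sketch of typical use; it assumes Strain and StrainCollection are importable from nplinker.strain and that a strain alias matching the spectrum id has been registered (this is what the strain mappings file provides):

from nplinker.metabolomics import Spectrum
from nplinker.metabolomics.utils import add_strains_to_spectrum
from nplinker.strain import Strain, StrainCollection  # assumed import path

# Hypothetical data: the strain alias must match the spectrum id for
# `strains.lookup(spec.id)` to succeed.
strain = Strain("Streptomyces sp. 26c")
strain.add_alias("1")  # assumed alias API
strains = StrainCollection()
strains.add(strain)

spectra = [Spectrum(id="1", mz=[100.0], intensity=[1.0], precursor_mz=250.0)]
with_strains, without_strains = add_strains_to_spectrum(strains, spectra)
print(len(with_strains), len(without_strains))  # 1 0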
"},{"location":"api/metabolomics_utils/#nplinker.metabolomics.utils.add_spectrum_to_mf","title":"add_spectrum_to_mf","text":"add_spectrum_to_mf(\n spectra: Sequence[Spectrum],\n mfs: Sequence[MolecularFamily],\n) -> tuple[\n list[MolecularFamily],\n list[MolecularFamily],\n dict[MolecularFamily, set[str]],\n]\n
Add Spectrum objects to MolecularFamily objects.
The attribute MolecularFamily.spectra_ids contains the ids of Spectrum objects. These ids are used to find Spectrum objects from the input spectra list. The found Spectrum objects are added to the MolecularFamily.spectra attribute.

It is possible that some spectrum ids are not found in the input spectra list, and so their Spectrum objects are missing in the MolecularFamily object.

Note: The input mfs list is changed in place.
Parameters:

- spectra (Sequence[Spectrum]) – A list of Spectrum objects.
- mfs (Sequence[MolecularFamily]) – A list of MolecularFamily objects.

Returns:

- tuple[list[MolecularFamily], list[MolecularFamily], dict[MolecularFamily, set[str]]] – A tuple of three elements:
  - the first list contains MolecularFamily objects that are updated with Spectrum objects;
  - the second list contains MolecularFamily objects that are not updated with Spectrum objects (all Spectrum objects are missing);
  - the third is a dictionary containing MolecularFamily objects as keys and a set of ids of missing Spectrum objects as values.

Source code in src/nplinker/metabolomics/utils.py
def add_spectrum_to_mf(
    spectra: Sequence[Spectrum], mfs: Sequence[MolecularFamily]
) -> tuple[list[MolecularFamily], list[MolecularFamily], dict[MolecularFamily, set[str]]]:
    """Add Spectrum objects to MolecularFamily objects.

    The attribute `MolecularFamily.spectra_ids` contains the ids of `Spectrum` objects.
    These ids are used to find `Spectrum` objects from the input `spectra` list. The found `Spectrum`
    objects are added to the `MolecularFamily.spectra` attribute.

    It is possible that some spectrum ids are not found in the input `spectra` list, and so their
    `Spectrum` objects are missing in the `MolecularFamily` object.

    !!! note
        The input `mfs` list is changed in place.

    Args:
        spectra: A list of Spectrum objects.
        mfs: A list of MolecularFamily objects.

    Returns:
        A tuple of three elements,

        - the first list contains `MolecularFamily` objects that are updated with `Spectrum` objects
        - the second list contains `MolecularFamily` objects that are not updated with `Spectrum`
          objects (all `Spectrum` objects are missing).
        - the third is a dictionary containing `MolecularFamily` objects as keys and a set of ids
          of missing `Spectrum` objects as values.
    """
    spec_dict = {spec.id: spec for spec in spectra}
    mf_with_spec = []
    mf_without_spec = []
    mf_missing_spec: dict[MolecularFamily, set[str]] = {}
    for mf in mfs:
        for spec_id in mf.spectra_ids:
            try:
                spec = spec_dict[spec_id]
            except KeyError:
                if mf not in mf_missing_spec:
                    mf_missing_spec[mf] = {spec_id}
                else:
                    mf_missing_spec[mf].add(spec_id)
                continue
            mf.add_spectrum(spec)

        if mf.spectra:
            mf_with_spec.append(mf)
        else:
            mf_without_spec.append(mf)

    logger.info(
        f"{len(mf_with_spec)} MolecularFamily objects updated with Spectrum objects.\n"
        f"{len(mf_without_spec)} MolecularFamily objects not updated with Spectrum objects.\n"
        f"{len(mf_missing_spec)} MolecularFamily objects have missing Spectrum objects."
    )
    return mf_with_spec, mf_without_spec, mf_missing_spec
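A small sketch (ids are made up) showing how missing spectrum ids are reported:

from nplinker.metabolomics import MolecularFamily, Spectrum
from nplinker.metabolomics.utils import add_spectrum_to_mf

# Hypothetical family referencing two spectrum ids, only one of which
# is present in the input spectra list.
mf = MolecularFamily("mf_1")
mf.spectra_ids = {"1", "2"}
spectra = [Spectrum(id="1", mz=[100.0], intensity=[1.0], precursor_mz=250.0)]

mf_with_spec, mf_without_spec, mf_missing_spec = add_spectrum_to_mf(spectra, [mf])
print(mf_missing_spec[mf])  # {'2'}: the spectrum id that could not be resolved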
"},{"location":"api/metabolomics_utils/#nplinker.metabolomics.utils.extract_mappings_strain_id_ms_filename","title":"extract_mappings_strain_id_ms_filename","text":"extract_mappings_strain_id_ms_filename(\n podp_project_json_file: str | PathLike,\n) -> dict[str, set[str]]\n
Extract mappings \"strain_id <-> MS_filename\".
Parameters:

- podp_project_json_file (str | PathLike) – The path to the PODP project JSON file.

Returns:

- dict[str, set[str]] – Key is strain id and value is a set of MS filenames.

Notes: The podp_project_json_file is the project JSON file downloaded from the PODP platform. For example, for project MSV000079284, its json file is https://pairedomicsdata.bioinformatics.nl/api/projects/4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4.

Source code in src/nplinker/metabolomics/utils.py
def extract_mappings_strain_id_ms_filename(
    podp_project_json_file: str | PathLike,
) -> dict[str, set[str]]:
    """Extract mappings "strain_id <-> MS_filename".

    Args:
        podp_project_json_file: The path to the PODP project JSON file.

    Returns:
        Key is strain id and value is a set of MS filenames.

    Notes:
        The `podp_project_json_file` is the project JSON file downloaded from
        PODP platform. For example, for project MSV000079284, its json file is
        https://pairedomicsdata.bioinformatics.nl/api/projects/4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4.

    See Also:
        - [podp_generate_strain_mappings][nplinker.strain.utils.podp_generate_strain_mappings]:
          Generate strain mappings JSON file for PODP pipeline.
    """
    mappings_dict: dict[str, set[str]] = {}
    with open(podp_project_json_file, "r") as f:
        json_data = json.load(f)

    validate_podp_json(json_data)

    # Extract mappings strain id <-> metabolomics filename
    for record in json_data["genome_metabolome_links"]:
        strain_id = record["genome_label"]
        # get the actual filename of the mzXML URL
        filename = Path(record["metabolomics_file"]).name
        if strain_id in mappings_dict:
            mappings_dict[strain_id].add(filename)
        else:
            mappings_dict[strain_id] = {filename}
    return mappings_dict
"},{"location":"api/metabolomics_utils/#nplinker.metabolomics.utils.extract_mappings_ms_filename_spectrum_id","title":"extract_mappings_ms_filename_spectrum_id","text":"extract_mappings_ms_filename_spectrum_id(\n gnps_file_mappings_file: str | PathLike,\n) -> dict[str, set[str]]\n
Extract mappings \"MS_filename <-> spectrum_id\".
Parameters:

- gnps_file_mappings_file (str | PathLike) – The path to the GNPS file mappings file (csv or tsv).

Returns:

- dict[str, set[str]] – Key is MS filename and value is a set of spectrum ids.

Notes: The gnps_file_mappings_file is downloaded from the GNPS website and named as GNPS_FILE_MAPPINGS_TSV or GNPS_FILE_MAPPINGS_CSV. For more details, see GNPS data.

Source code in src/nplinker/metabolomics/utils.py
def extract_mappings_ms_filename_spectrum_id(
    gnps_file_mappings_file: str | PathLike,
) -> dict[str, set[str]]:
    """Extract mappings "MS_filename <-> spectrum_id".

    Args:
        gnps_file_mappings_file: The path to the GNPS file mappings file (csv or tsv).

    Returns:
        Key is MS filename and value is a set of spectrum ids.

    Notes:
        The `gnps_file_mappings_file` is downloaded from GNPS website and named as
        [GNPS_FILE_MAPPINGS_TSV][nplinker.defaults.GNPS_FILE_MAPPINGS_TSV] or
        [GNPS_FILE_MAPPINGS_CSV][nplinker.defaults.GNPS_FILE_MAPPINGS_CSV].
        For more details, see [GNPS data][gnps-data].

    See Also:
        - [GNPSFileMappingLoader][nplinker.metabolomics.gnps.gnps_file_mapping_loader.GNPSFileMappingLoader]:
          Load GNPS file mappings file.
        - [podp_generate_strain_mappings][nplinker.strain.utils.podp_generate_strain_mappings]:
          Generate strain mappings JSON file for PODP pipeline.
    """
    loader = GNPSFileMappingLoader(gnps_file_mappings_file)
    return loader.mapping_reversed
"},{"location":"api/metabolomics_utils/#nplinker.metabolomics.utils.get_mappings_strain_id_spectrum_id","title":"get_mappings_strain_id_spectrum_id","text":"get_mappings_strain_id_spectrum_id(\n mappings_strain_id_ms_filename: Mapping[str, set[str]],\n mappings_ms_filename_spectrum_id: Mapping[\n str, set[str]\n ],\n) -> dict[str, set[str]]\n
Get mappings \"strain_id <-> spectrum_id\".
Parameters:

- mappings_strain_id_ms_filename (Mapping[str, set[str]]) – Mappings "strain_id <-> MS_filename".
- mappings_ms_filename_spectrum_id (Mapping[str, set[str]]) – Mappings "MS_filename <-> spectrum_id".

Returns:

- dict[str, set[str]] – Key is strain id and value is a set of spectrum ids.

See Also:

- extract_mappings_strain_id_ms_filename: Extract mappings "strain_id <-> MS_filename".
- extract_mappings_ms_filename_spectrum_id: Extract mappings "MS_filename <-> spectrum_id".

Source code in src/nplinker/metabolomics/utils.py
def get_mappings_strain_id_spectrum_id(
    mappings_strain_id_ms_filename: Mapping[str, set[str]],
    mappings_ms_filename_spectrum_id: Mapping[str, set[str]],
) -> dict[str, set[str]]:
    """Get mappings "strain_id <-> spectrum_id".

    Args:
        mappings_strain_id_ms_filename: Mappings
            "strain_id <-> MS_filename".
        mappings_ms_filename_spectrum_id: Mappings
            "MS_filename <-> spectrum_id".

    Returns:
        Key is strain id and value is a set of spectrum ids.

    See Also:
        - `extract_mappings_strain_id_ms_filename`: Extract mappings "strain_id <-> MS_filename".
        - `extract_mappings_ms_filename_spectrum_id`: Extract mappings "MS_filename <-> spectrum_id".
        - [podp_generate_strain_mappings][nplinker.strain.utils.podp_generate_strain_mappings]:
          Generate strain mappings JSON file for PODP pipeline.
    """
    mappings_dict = {}
    for strain_id, ms_filenames in mappings_strain_id_ms_filename.items():
        spectrum_ids = set()
        for ms_filename in ms_filenames:
            if (sid := mappings_ms_filename_spectrum_id.get(ms_filename)) is not None:
                spectrum_ids.update(sid)
        if spectrum_ids:
            mappings_dict[strain_id] = spectrum_ids
    return mappings_dict
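Taken together, the three helpers form a small pipeline from a PODP project JSON file and a GNPS file mappings file to "strain_id <-> spectrum_id" mappings; a sketch with hypothetical input paths:

from nplinker.metabolomics.utils import (
    extract_mappings_ms_filename_spectrum_id,
    extract_mappings_strain_id_ms_filename,
    get_mappings_strain_id_spectrum_id,
)

# Hypothetical input files.
strain_to_files = extract_mappings_strain_id_ms_filename("podp_project.json")
file_to_spectra = extract_mappings_ms_filename_spectrum_id("gnps_file_mappings.tsv")

strain_to_spectra = get_mappings_strain_id_spectrum_id(strain_to_files, file_to_spectra)
for strain_id, spectrum_ids in strain_to_spectra.items():
    print(strain_id, sorted(spectrum_ids)[:5])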
"},{"location":"api/mibig/","title":"MiBIG","text":""},{"location":"api/mibig/#nplinker.genomics.mibig","title":"nplinker.genomics.mibig","text":""},{"location":"api/mibig/#nplinker.genomics.mibig.MibigLoader","title":"MibigLoader","text":"MibigLoader(data_dir: str | PathLike)\n
Bases: BGCLoaderBase
Parse MIBiG metadata files and return BGC objects.
MIBiG metadata file (json) contains annotations/metadata information for each BGC. See https://mibig.secondarymetabolites.org/download.
The MiBIG accession is used as BGC id and strain name. The loaded BGC objects have a Strain object as their strain attribute (i.e. BGC.strain).

Parameters:

- data_dir (str | PathLike) – Path to the directory of MIBiG metadata json files.

Examples:

>>> loader = MibigLoader("path/to/mibig/data/dir")
>>> loader.data_dir
'path/to/mibig/data/dir'
>>> loader.get_bgcs()
[BGC('BGC000001', 'NRP'), BGC('BGC000002', 'Polyketide')]

Source code in src/nplinker/genomics/mibig/mibig_loader.py
def __init__(self, data_dir: str | PathLike):
    """Initialize the MIBiG metadata loader.

    Args:
        data_dir: Path to the directory of MIBiG metadata json files

    Examples:
        >>> loader = MibigLoader("path/to/mibig/data/dir")
        >>> loader.data_dir
        'path/to/mibig/data/dir'
        >>> loader.get_bgcs()
        [BGC('BGC000001', 'NRP'), BGC('BGC000002', 'Polyketide')]
    """
    self.data_dir = str(data_dir)
    self._file_dict = self.parse_data_dir(self.data_dir)
    self._metadata_dict = self._parse_metadata()
    self._bgcs = self._parse_bgcs()
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigLoader.data_dir","title":"data_dir instance-attribute
","text":"data_dir = str(data_dir)\n
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigLoader.get_files","title":"get_files","text":"get_files() -> dict[str, str]\n
Get the path of all MIBiG metadata json files.
Returns:

- dict[str, str] – The key is the metadata file name (BGC accession), and the value is the path to the metadata json file.

Source code in src/nplinker/genomics/mibig/mibig_loader.py
def get_files(self) -> dict[str, str]:
    """Get the path of all MIBiG metadata json files.

    Returns:
        The key is metadata file name (BGC accession), and the value is path to the metadata
        json file
    """
    return self._file_dict
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigLoader.parse_data_dir","title":"parse_data_dir staticmethod
","text":"parse_data_dir(data_dir: str | PathLike) -> dict[str, str]\n
Parse metadata directory and return paths to all metadata json files.
Parameters:

- data_dir (str | PathLike) – path to the directory of MIBiG metadata json files.

Returns:

- dict[str, str] – The key is the metadata file name (BGC accession), and the value is the path to the metadata json file.

Source code in src/nplinker/genomics/mibig/mibig_loader.py
@staticmethod
def parse_data_dir(data_dir: str | PathLike) -> dict[str, str]:
    """Parse metadata directory and return paths to all metadata json files.

    Args:
        data_dir: path to the directory of MIBiG metadata json files

    Returns:
        The key is metadata file name (BGC accession), and the value is path to the metadata
        json file
    """
    file_dict = {}
    json_files = list_files(data_dir, prefix="BGC", suffix=".json")
    for file in json_files:
        fname = Path(file).stem
        file_dict[fname] = file
    return file_dict
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigLoader.get_metadata","title":"get_metadata","text":"get_metadata() -> dict[str, MibigMetadata]\n
Get MibigMetadata objects.
Returns:

- dict[str, MibigMetadata] – The key is BGC accession (file name) and the value is MibigMetadata object.

Source code in src/nplinker/genomics/mibig/mibig_loader.py
def get_metadata(self) -> dict[str, MibigMetadata]:
    """Get MibigMetadata objects.

    Returns:
        The key is BGC accession (file name) and the value is MibigMetadata object
    """
    return self._metadata_dict
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigLoader.get_bgcs","title":"get_bgcs","text":"get_bgcs() -> list[BGC]\n
Get BGC objects.
The BGC objects use MiBIG accession as id and have a Strain object as their strain attribute (i.e. BGC.strain), where the name of the Strain object is also the MiBIG accession.

Returns:

- list[BGC] – A list of BGC objects.

Source code in src/nplinker/genomics/mibig/mibig_loader.py
def get_bgcs(self) -> list[BGC]:
    """Get BGC objects.

    The BGC objects use MiBIG accession as id and have Strain object as
    their strain attribute (i.e. `BGC.strain`), where the name of the Strain
    object is also MiBIG accession.

    Returns:
        A list of BGC objects
    """
    return self._bgcs
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigMetadata","title":"MibigMetadata","text":"MibigMetadata(file: str | PathLike)\n
Class to model the BGC metadata/annotations defined in MIBiG.
MIBiG is a specification of BGC metadata and uses a JSON schema to represent BGC metadata. For more details, see https://mibig.secondarymetabolites.org/download.

Parameters:

- file (str | PathLike) – Path to the json file of MIBiG BGC metadata.

Examples:

>>> metadata = MibigMetadata("/data/BGC0000001.json")
Source code in src/nplinker/genomics/mibig/mibig_metadata.py
def __init__(self, file: str | PathLike) -> None:
    """Initialize the MIBiG metadata object.

    Args:
        file: Path to the json file of MIBiG BGC metadata

    Examples:
        >>> metadata = MibigMetadata("/data/BGC0000001.json")
    """
    self.file = str(file)
    with open(self.file, "rb") as f:
        self.metadata = json.load(f)

    self._mibig_accession: str
    self._biosyn_class: tuple[str]
    self._parse_metadata()
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigMetadata.file","title":"file instance-attribute
","text":"file = str(file)\n
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigMetadata.metadata","title":"metadata instance-attribute
","text":"metadata = load(f)\n
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigMetadata.mibig_accession","title":"mibig_accession property
","text":"mibig_accession: str\n
Get the value of metadata item 'mibig_accession'.
"},{"location":"api/mibig/#nplinker.genomics.mibig.MibigMetadata.biosyn_class","title":"biosyn_classproperty
","text":"biosyn_class: tuple[str]\n
Get the value of metadata item 'biosyn_class'.
The 'biosyn_class' is the biosynthetic class(es), namely the type of natural product or secondary metabolite.

MIBiG defines 6 major biosynthetic classes for natural products: NRP, Polyketide, RiPP, Terpene, Saccharide and Alkaloid. Note that natural products created by other biosynthetic mechanisms fall under the category Other. For more details see the paper.
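A short usage sketch (the file path is hypothetical):

from nplinker.genomics.mibig import MibigMetadata

metadata = MibigMetadata("/data/BGC0000001.json")  # hypothetical path
print(metadata.mibig_accession)  # e.g. 'BGC0000001'
print(metadata.biosyn_class)     # e.g. ('NRP',)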
download_and_extract_mibig_metadata

download_and_extract_mibig_metadata(
    download_root: str | PathLike,
    extract_path: str | PathLike,
    version: str = "3.1",
)
Download and extract MIBiG metadata json files.
Note that it does not matter whether the metadata json files are in nested folders or not in the archive; all json files will be extracted to the same location, i.e. extract_path. The nested folders will be removed if they exist, so the extract_path will contain only json files.
Parameters:

- download_root (str | PathLike) – Path to the directory in which to place the downloaded archive.
- extract_path (str | PathLike) – Path to an empty directory where the json files will be extracted. The directory must be empty if it exists. If it doesn't exist, the directory will be created.
- version (str, default: '3.1') – The version of the MIBiG metadata to download. Defaults to "3.1".
Examples:
>>> download_and_extract_mibig_metadata("/data/download", "/data/mibig_metadata")
Source code in src/nplinker/genomics/mibig/mibig_downloader.py
def download_and_extract_mibig_metadata(
    download_root: str | os.PathLike,
    extract_path: str | os.PathLike,
    version: str = "3.1",
):
    """Download and extract MIBiG metadata json files.

    Note that it does not matter whether the metadata json files are in nested folders or not in the archive,
    all json files will be extracted to the same location, i.e. `extract_path`. The nested
    folders will be removed if they exist. So the `extract_path` will have only json files.

    Args:
        download_root: Path to the directory in which to place the downloaded archive.
        extract_path: Path to an empty directory where the json files will be extracted.
            The directory must be empty if it exists. If it doesn't exist, the directory will be created.
        version: _description_. Defaults to "3.1".

    Examples:
        >>> download_and_extract_mibig_metadata("/data/download", "/data/mibig_metadata")
    """
    download_root = Path(download_root)
    extract_path = Path(extract_path)

    if download_root == extract_path:
        raise ValueError("Identical path of download directory and extract directory")

    # check if extract_path is empty
    if not extract_path.exists():
        extract_path.mkdir(parents=True)
    else:
        if len(list(extract_path.iterdir())) != 0:
            raise ValueError(f'Nonempty directory: "{extract_path}"')

    # download and extract
    md5 = _MD5_MIBIG_METADATA[version]
    download_and_extract_archive(
        url=MIBIG_METADATA_URL.format(version=version),
        download_root=download_root,
        extract_root=extract_path,
        md5=md5,
    )

    # After extracting mibig archive, it's either one dir or many json files,
    # if it's a dir, then move all json files from it to extract_path
    subdirs = list_dirs(extract_path)
    if len(subdirs) > 1:
        raise ValueError(f"Expected one extracted directory, got {len(subdirs)}")

    if len(subdirs) == 1:
        subdir_path = subdirs[0]
        for fname in list_files(subdir_path, prefix="BGC", suffix=".json", keep_parent=False):
            shutil.move(os.path.join(subdir_path, fname), os.path.join(extract_path, fname))
        # delete subdir
        if subdir_path != extract_path:
            shutil.rmtree(subdir_path)
"},{"location":"api/mibig/#nplinker.genomics.mibig.parse_bgc_metadata_json","title":"parse_bgc_metadata_json","text":"parse_bgc_metadata_json(file: str | PathLike) -> BGC\n
Parse MIBiG metadata file and return BGC object.
Note that the MiBIG accession is used as the BGC id and strain name. The BGC object has a Strain object as its strain attribute.
Parameters:

- file (str | PathLike) – Path to the MIBiG metadata json file.

Returns:

- BGC – BGC object.

Source code in src/nplinker/genomics/mibig/mibig_loader.py
def parse_bgc_metadata_json(file: str | PathLike) -> BGC:
    """Parse MIBiG metadata file and return BGC object.

    Note that the MiBIG accession is used as the BGC id and strain name. The BGC
    object has Strain object as its strain attribute.

    Args:
        file: Path to the MIBiG metadata json file

    Returns:
        BGC object
    """
    metadata = MibigMetadata(str(file))
    mibig_bgc = BGC(metadata.mibig_accession, *metadata.biosyn_class)
    mibig_bgc.mibig_bgc_class = metadata.biosyn_class
    mibig_bgc.strain = Strain(metadata.mibig_accession)
    return mibig_bgc
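A short usage sketch (the file path is hypothetical):

from nplinker.genomics.mibig import parse_bgc_metadata_json

bgc = parse_bgc_metadata_json("/data/BGC0000001.json")  # hypothetical path
print(bgc.id, bgc.mibig_bgc_class)  # accession and biosynthetic class(es)
print(bgc.strain)                   # Strain named after the accession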
"},{"location":"api/nplinker/","title":"NPLinker","text":""},{"location":"api/nplinker/#nplinker","title":"nplinker","text":""},{"location":"api/nplinker/#nplinker.NPLinker","title":"NPLinker","text":"NPLinker(config_file: str | PathLike)\n
The central class of the NPLinker application.

Attributes:

- config (Dynaconf) – The configuration object for the current NPLinker application.
- root_dir (str) – The path to the root directory of the current NPLinker application.
- output_dir (str) – The path to the output directory of the current NPLinker application.
- bgcs (list[BGC]) – A list of all BGC objects.
- gcfs (list[GCF]) – A list of all GCF objects.
- spectra (list[Spectrum]) – A list of all Spectrum objects.
- mfs (list[MolecularFamily]) – A list of all MolecularFamily objects.
- mibig_bgcs (list[BGC]) – A list of all MiBIG BGC objects.
- strains (StrainCollection) – A StrainCollection object containing all Strain objects.
- product_types (list[str]) – A list of all BiGSCAPE product types.
- scoring_methods (list[str]) – A list of all valid scoring methods.

Parameters:

- config_file (str | PathLike) – Path to the configuration file to use.
Examples:
Starting the NPLinker application:
>>> from nplinker import NPLinker
>>> npl = NPLinker("path/to/config.toml")

Loading data from files to python objects:

>>> npl.load_data()

Checking the number of GCF objects:

>>> len(npl.gcfs)

Getting the links for all GCF objects using the Metcalf scoring method, and the result is stored in a LinkGraph object:

>>> lg = npl.get_links(npl.gcfs, "metcalf")

Getting the link data between two objects:

>>> link_data = lg.get_link_data(npl.gcfs[0], npl.spectra[0])
{"metcalf": Score("metcalf", 1.0, {"cutoff": 0, "standardised": False})}

Saving the data to a pickle file:

>>> npl.save_data("path/to/output.pkl", lg)
Source code in src/nplinker/nplinker.py
def __init__(self, config_file: str | PathLike):
    """Initialise an NPLinker instance.

    Args:
        config_file: Path to the configuration file to use.

    Examples:
        Starting the NPLinker application:
        >>> from nplinker import NPLinker
        >>> npl = NPLinker("path/to/config.toml")

        Loading data from files to python objects:
        >>> npl.load_data()

        Checking the number of GCF objects:
        >>> len(npl.gcfs)

        Getting the links for all GCF objects using the Metcalf scoring method, and the result
        is stored in a [LinkGraph][nplinker.scoring.LinkGraph] object:
        >>> lg = npl.get_links(npl.gcfs, "metcalf")

        Getting the link data between two objects:
        >>> link_data = lg.get_link_data(npl.gcfs[0], npl.spectra[0])
        {"metcalf": Score("metcalf", 1.0, {"cutoff": 0, "standardised": False})}

        Saving the data to a pickle file:
        >>> npl.save_data("path/to/output.pkl", lg)
    """
    # Load the configuration file
    self.config: Dynaconf = load_config(config_file)

    # Setup logging for the application
    setup_logging(
        level=self.config.log.level,
        file=self.config.log.get("file", ""),
        use_console=self.config.log.use_console,
    )
    logger.info(
        "Configuration:\n %s", pformat(self.config.as_dict(), width=20, sort_dicts=False)
    )

    # Setup the output directory
    self._output_dir = self.config.root_dir / OUTPUT_DIRNAME
    self._output_dir.mkdir(exist_ok=True)

    # Initialise data containers that will be populated by the `load_data` method
    self._bgc_dict: dict[str, BGC] = {}
    self._gcf_dict: dict[str, GCF] = {}
    self._spec_dict: dict[str, Spectrum] = {}
    self._mf_dict: dict[str, MolecularFamily] = {}
    self._mibig_bgcs: list[BGC] = []
    self._strains: StrainCollection = StrainCollection()
    self._product_types: list = []
    self._chem_classes = None  # TODO: to be refactored
    self._class_matches = None  # TODO: to be refactored

    # Flags to keep track of whether the scoring methods have been set up
    self._scoring_methods_setup_done = {name: False for name in self._valid_scoring_methods}
"},{"location":"api/nplinker/#nplinker.NPLinker.config","title":"config instance-attribute
","text":"config: Dynaconf = load_config(config_file)\n
"},{"location":"api/nplinker/#nplinker.NPLinker.root_dir","title":"root_dir property
","text":"root_dir: str\n
Get the path to the root directory of the current NPLinker instance.
"},{"location":"api/nplinker/#nplinker.NPLinker.output_dir","title":"output_dirproperty
","text":"output_dir: str\n
Get the path to the output directory of the current NPLinker instance.
"},{"location":"api/nplinker/#nplinker.NPLinker.bgcs","title":"bgcsproperty
","text":"bgcs: list[BGC]\n
Get all BGC objects.
"},{"location":"api/nplinker/#nplinker.NPLinker.gcfs","title":"gcfsproperty
","text":"gcfs: list[GCF]\n
Get all GCF objects.
"},{"location":"api/nplinker/#nplinker.NPLinker.spectra","title":"spectraproperty
","text":"spectra: list[Spectrum]\n
Get all Spectrum objects.
"},{"location":"api/nplinker/#nplinker.NPLinker.mfs","title":"mfsproperty
","text":"mfs: list[MolecularFamily]\n
Get all MolecularFamily objects.
"},{"location":"api/nplinker/#nplinker.NPLinker.mibig_bgcs","title":"mibig_bgcsproperty
","text":"mibig_bgcs: list[BGC]\n
Get all MiBIG BGC objects.
"},{"location":"api/nplinker/#nplinker.NPLinker.strains","title":"strainsproperty
","text":"strains: StrainCollection\n
Get all Strain objects.
"},{"location":"api/nplinker/#nplinker.NPLinker.product_types","title":"product_typesproperty
","text":"product_types: list[str]\n
Get all BiGSCAPE product types.
"},{"location":"api/nplinker/#nplinker.NPLinker.chem_classes","title":"chem_classesproperty
","text":"chem_classes\n
Returns loaded ChemClassPredictions with the class predictions.
"},{"location":"api/nplinker/#nplinker.NPLinker.class_matches","title":"class_matchesproperty
","text":"class_matches\n
ClassMatches with the matched classes and scoring tables from MIBiG.
"},{"location":"api/nplinker/#nplinker.NPLinker.scoring_methods","title":"scoring_methodsproperty
","text":"scoring_methods: list[str]\n
Get names of all valid scoring methods.
"},{"location":"api/nplinker/#nplinker.NPLinker.load_data","title":"load_data","text":"load_data()\n
Load all data from files into memory.
This method is a convenience function that calls the DatasetArranger class to arrange data files (download, generate and/or validate data) in the correct directory structure, and then calls the DatasetLoader class to load all data from the files into memory.
The loaded data is stored in various data containers for easy access, e.g. self.bgcs for all BGC objects, self.strains for all Strain objects, etc.
Source code in src/nplinker/nplinker.py
def load_data(self):
    """Load all data from files into memory."""
    arranger = DatasetArranger(self.config)
    arranger.arrange()
    loader = DatasetLoader(self.config)
    loader.load()

    self._bgc_dict = {bgc.id: bgc for bgc in loader.bgcs}
    self._gcf_dict = {gcf.id: gcf for gcf in loader.gcfs}
    self._spec_dict = {spec.id: spec for spec in loader.spectra}
    self._mf_dict = {mf.id: mf for mf in loader.mfs}

    self._mibig_bgcs = loader.mibig_bgcs
    self._strains = loader.strains
    self._product_types = loader.product_types
    self._chem_classes = loader.chem_classes
    self._class_matches = loader.class_matches
"},{"location":"api/nplinker/#nplinker.NPLinker.get_links","title":"get_links","text":"get_links(\n objects: (\n Sequence[BGC]\n | Sequence[GCF]\n | Sequence[Spectrum]\n | Sequence[MolecularFamily]\n ),\n scoring_method: str,\n **scoring_params: Any\n) -> LinkGraph\n
Get links for the given objects using the specified scoring method and parameters.
Parameters:
objects (Sequence[BGC] | Sequence[GCF] | Sequence[Spectrum] | Sequence[MolecularFamily]) – A sequence of objects to get links for. The objects must be of the same type, i.e. BGC, GCF, Spectrum or MolecularFamily type.
Warning: For scoring method metcalf, the BGC objects are not supported.
scoring_method (str) – The scoring method to use. Must be one of the valid scoring methods self.scoring_methods, such as metcalf.
scoring_params (Any, default: {}) – Parameters to pass to the scoring method. If not given, the default parameters of the specified scoring method will be used. Check the get_links method of the scoring method class for the available parameters and their default values.

| Scoring Method | Scoring Parameters   |
| -------------- | -------------------- |
| metcalf        | cutoff, standardised |
Returns:
LinkGraph – A LinkGraph object containing the links for the given objects.
Raises:
ValueError – If input objects are empty or if the scoring method is invalid.
TypeError – If the input objects are not of the same type or if the object type is invalid.
Examples:
Using default scoring parameters:
>>> lg = npl.get_links(npl.gcfs, "metcalf")
Scoring parameters provided:
>>> lg = npl.get_links(npl.gcfs, "metcalf", cutoff=0.5, standardised=True)
Source code in src/nplinker/nplinker.py
def get_links(
    self,
    objects: Sequence[BGC] | Sequence[GCF] | Sequence[Spectrum] | Sequence[MolecularFamily],
    scoring_method: str,
    **scoring_params: Any,
) -> LinkGraph:
    """Get links for the given objects using the specified scoring method and parameters."""
    # Validate objects
    if len(objects) == 0:
        raise ValueError("No objects provided to get links for")
    # check if all objects are of the same type
    types = {type(i) for i in objects}
    if len(types) > 1:
        raise TypeError("Input objects must be of the same type.")
    # check if the object type is valid
    obj_type = next(iter(types))
    if obj_type not in (BGC, GCF, Spectrum, MolecularFamily):
        raise TypeError(
            f"Invalid type {obj_type}. Input objects must be BGC, GCF, Spectrum or MolecularFamily objects."
        )

    # Validate scoring method
    if scoring_method not in self._valid_scoring_methods:
        raise ValueError(f"Invalid scoring method {scoring_method}.")

    # Check if the scoring method has been set up
    if not self._scoring_methods_setup_done[scoring_method]:
        self._valid_scoring_methods[scoring_method].setup(self)
        self._scoring_methods_setup_done[scoring_method] = True

    # Initialise the scoring method
    scoring = self._valid_scoring_methods[scoring_method]()

    return scoring.get_links(*objects, **scoring_params)
"},{"location":"api/nplinker/#nplinker.NPLinker.lookup_bgc","title":"lookup_bgc","text":"lookup_bgc(id: str) -> BGC | None\n
Get the BGC object with the given ID.
Parameters:
id (str) – the ID of the BGC to look up.
Returns:
BGC | None – The BGC object with the given ID, or None if no such object exists.
Examples:
>>> bgc = npl.lookup_bgc("BGC000001")
>>> bgc
BGC(id="BGC000001", ...)
Source code in src/nplinker/nplinker.py
def lookup_bgc(self, id: str) -> BGC | None:
    """Get the BGC object with the given ID."""
    return self._bgc_dict.get(id, None)
"},{"location":"api/nplinker/#nplinker.NPLinker.lookup_gcf","title":"lookup_gcf","text":"lookup_gcf(id: str) -> GCF | None\n
Get the GCF object with the given ID.
Parameters:
id (str) – the ID of the GCF to look up.
Returns:
GCF | None – The GCF object with the given ID, or None if no such object exists.
Source code in src/nplinker/nplinker.py
def lookup_gcf(self, id: str) -> GCF | None:
    """Get the GCF object with the given ID."""
    return self._gcf_dict.get(id, None)
"},{"location":"api/nplinker/#nplinker.NPLinker.lookup_spectrum","title":"lookup_spectrum","text":"lookup_spectrum(id: str) -> Spectrum | None\n
Get the Spectrum object with the given ID.
Parameters:
id (str) – the ID of the Spectrum to look up.
Returns:
Spectrum | None – The Spectrum object with the given ID, or None if no such object exists.
Source code in src/nplinker/nplinker.py
def lookup_spectrum(self, id: str) -> Spectrum | None:
    """Get the Spectrum object with the given ID."""
    return self._spec_dict.get(id, None)
"},{"location":"api/nplinker/#nplinker.NPLinker.lookup_mf","title":"lookup_mf","text":"lookup_mf(id: str) -> MolecularFamily | None\n
Get the MolecularFamily object with the given ID.
Parameters:
id (str) – the ID of the MolecularFamily to look up.
Returns:
MolecularFamily | None – The MolecularFamily object with the given ID, or None if no such object exists.
Source code in src/nplinker/nplinker.py
def lookup_mf(self, id: str) -> MolecularFamily | None:
    """Get the MolecularFamily object with the given ID."""
    return self._mf_dict.get(id, None)
"},{"location":"api/nplinker/#nplinker.NPLinker.save_data","title":"save_data","text":"save_data(\n file: str | PathLike, links: LinkGraph | None = None\n) -> None\n
Pickle data to a file.
The pickled data is a tuple of BGCs, GCFs, Spectra, MolecularFamilies, StrainCollection and links, i.e. (bgcs, gcfs, spectra, mfs, strains, links).
Parameters:
file (str | PathLike) – The path to the pickle file to save the data to.
links (LinkGraph | None, default: None) – The LinkGraph object to save.
Examples:
Saving the data to a pickle file, links data is None:
>>> npl.save_data("path/to/output.pkl")
Also saving the links data:
>>> lg = npl.get_links(npl.gcfs, "metcalf")
>>> npl.save_data("path/to/output.pkl", lg)
Source code in src/nplinker/nplinker.py
def save_data(
    self,
    file: str | PathLike,
    links: LinkGraph | None = None,
) -> None:
    """Pickle data to a file."""
    data = (self.bgcs, self.gcfs, self.spectra, self.mfs, self.strains, links)
    with open(file, "wb") as f:
        pickle.dump(data, f)
"},{"location":"api/nplinker/#nplinker.setup_logging","title":"setup_logging","text":"setup_logging(\n level: str = \"INFO\",\n file: str = \"\",\n use_console: bool = True,\n) -> None\n
Setup logging configuration for the ancestor logger \"nplinker\".
Usage documentation: How to setup logging
Parameters:
level (str, default: 'INFO') – The log level, use the logging module's log level constants. Valid levels are: NOTSET, DEBUG, INFO, WARNING, ERROR, CRITICAL.
file (str, default: '') – The file to write the log to. If the file is an empty string (the default), the log will not be written to a file. If the file does not exist, it will be created. The log will be written to the file in append mode.
use_console (bool, default: True) – Whether to log to the console.
Source code in src/nplinker/logger.py
def setup_logging(level: str = "INFO", file: str = "", use_console: bool = True) -> None:
    """Setup logging configuration for the ancestor logger "nplinker"."""
    # Get the ancestor logger "nplinker"
    logger = logging.getLogger("nplinker")
    logger.setLevel(level)

    # File handler
    if file:
        logger.addHandler(
            RichHandler(
                console=Console(file=open(file, "a"), width=120),  # force the line width to 120
                omit_repeated_times=False,
                rich_tracebacks=True,
                tracebacks_show_locals=True,
                log_time_format="[%Y-%m-%d %X]",
            )
        )

    # Console handler
    if use_console:
        logger.addHandler(
            RichHandler(
                omit_repeated_times=False,
                rich_tracebacks=True,
                tracebacks_show_locals=True,
                log_time_format="[%Y-%m-%d %X]",
            )
        )
"},{"location":"api/nplinker/#nplinker.defaults","title":"nplinker.defaults","text":""},{"location":"api/nplinker/#nplinker.defaults.NPLINKER_APP_DATA_DIR","title":"NPLINKER_APP_DATA_DIR module-attribute
","text":"NPLINKER_APP_DATA_DIR: Final = parent / 'data'\n
"},{"location":"api/nplinker/#nplinker.defaults.STRAIN_MAPPINGS_FILENAME","title":"STRAIN_MAPPINGS_FILENAME module-attribute
","text":"STRAIN_MAPPINGS_FILENAME: Final = 'strain_mappings.json'\n
"},{"location":"api/nplinker/#nplinker.defaults.GENOME_BGC_MAPPINGS_FILENAME","title":"GENOME_BGC_MAPPINGS_FILENAME module-attribute
","text":"GENOME_BGC_MAPPINGS_FILENAME: Final = (\n \"genome_bgc_mappings.json\"\n)\n
"},{"location":"api/nplinker/#nplinker.defaults.GENOME_STATUS_FILENAME","title":"GENOME_STATUS_FILENAME module-attribute
","text":"GENOME_STATUS_FILENAME: Final = 'genome_status.json'\n
"},{"location":"api/nplinker/#nplinker.defaults.GNPS_SPECTRA_FILENAME","title":"GNPS_SPECTRA_FILENAME module-attribute
","text":"GNPS_SPECTRA_FILENAME: Final = 'spectra.mgf'\n
"},{"location":"api/nplinker/#nplinker.defaults.GNPS_MOLECULAR_FAMILY_FILENAME","title":"GNPS_MOLECULAR_FAMILY_FILENAME module-attribute
","text":"GNPS_MOLECULAR_FAMILY_FILENAME: Final = (\n \"molecular_families.tsv\"\n)\n
"},{"location":"api/nplinker/#nplinker.defaults.GNPS_ANNOTATIONS_FILENAME","title":"GNPS_ANNOTATIONS_FILENAME module-attribute
","text":"GNPS_ANNOTATIONS_FILENAME: Final = 'annotations.tsv'\n
"},{"location":"api/nplinker/#nplinker.defaults.GNPS_FILE_MAPPINGS_TSV","title":"GNPS_FILE_MAPPINGS_TSV module-attribute
","text":"GNPS_FILE_MAPPINGS_TSV: Final = 'file_mappings.tsv'\n
"},{"location":"api/nplinker/#nplinker.defaults.GNPS_FILE_MAPPINGS_CSV","title":"GNPS_FILE_MAPPINGS_CSV module-attribute
","text":"GNPS_FILE_MAPPINGS_CSV: Final = 'file_mappings.csv'\n
"},{"location":"api/nplinker/#nplinker.defaults.STRAINS_SELECTED_FILENAME","title":"STRAINS_SELECTED_FILENAME module-attribute
","text":"STRAINS_SELECTED_FILENAME: Final = 'strains_selected.json'\n
"},{"location":"api/nplinker/#nplinker.defaults.DOWNLOADS_DIRNAME","title":"DOWNLOADS_DIRNAME module-attribute
","text":"DOWNLOADS_DIRNAME: Final = 'downloads'\n
"},{"location":"api/nplinker/#nplinker.defaults.MIBIG_DIRNAME","title":"MIBIG_DIRNAME module-attribute
","text":"MIBIG_DIRNAME: Final = 'mibig'\n
"},{"location":"api/nplinker/#nplinker.defaults.GNPS_DIRNAME","title":"GNPS_DIRNAME module-attribute
","text":"GNPS_DIRNAME: Final = 'gnps'\n
"},{"location":"api/nplinker/#nplinker.defaults.ANTISMASH_DIRNAME","title":"ANTISMASH_DIRNAME module-attribute
","text":"ANTISMASH_DIRNAME: Final = 'antismash'\n
"},{"location":"api/nplinker/#nplinker.defaults.BIGSCAPE_DIRNAME","title":"BIGSCAPE_DIRNAME module-attribute
","text":"BIGSCAPE_DIRNAME: Final = 'bigscape'\n
"},{"location":"api/nplinker/#nplinker.defaults.BIGSCAPE_RUNNING_OUTPUT_DIRNAME","title":"BIGSCAPE_RUNNING_OUTPUT_DIRNAME module-attribute
","text":"BIGSCAPE_RUNNING_OUTPUT_DIRNAME: Final = (\n \"bigscape_running_output\"\n)\n
"},{"location":"api/nplinker/#nplinker.defaults.OUTPUT_DIRNAME","title":"OUTPUT_DIRNAME module-attribute
","text":"OUTPUT_DIRNAME: Final = 'output'\n
"},{"location":"api/nplinker/#nplinker.config","title":"nplinker.config","text":""},{"location":"api/nplinker/#nplinker.config.CONFIG_VALIDATORS","title":"CONFIG_VALIDATORS module-attribute
","text":"CONFIG_VALIDATORS = [\n Validator(\n \"root_dir\",\n required=True,\n cast=transform_to_full_path,\n condition=lambda v: is_dir(),\n ),\n Validator(\n \"mode\",\n required=True,\n cast=lambda v: lower(),\n is_in=[\"local\", \"podp\"],\n ),\n Validator(\n \"podp_id\",\n required=True,\n when=Validator(\"mode\", eq=\"podp\"),\n ),\n Validator(\n \"podp_id\",\n required=False,\n when=Validator(\"mode\", eq=\"local\"),\n ),\n Validator(\n \"log.level\",\n is_type_of=str,\n cast=lambda v: upper(),\n is_in=[\n \"NOTSET\",\n \"DEBUG\",\n \"INFO\",\n \"WARNING\",\n \"ERROR\",\n \"CRITICAL\",\n ],\n ),\n Validator(\"log.file\", is_type_of=str),\n Validator(\"log.use_console\", is_type_of=bool),\n Validator(\n \"mibig.to_use\", required=True, is_type_of=bool\n ),\n Validator(\n \"mibig.version\",\n required=True,\n is_type_of=str,\n when=Validator(\"mibig.to_use\", eq=True),\n ),\n Validator(\n \"bigscape.parameters\", required=True, is_type_of=str\n ),\n Validator(\n \"bigscape.cutoff\", required=True, is_type_of=str\n ),\n Validator(\n \"bigscape.version\", required=True, is_type_of=int\n ),\n Validator(\n \"scoring.methods\",\n required=True,\n cast=lambda v: [lower() for i in v],\n is_type_of=list,\n len_min=1,\n condition=lambda v: issubset(\n {\"metcalf\", \"rosetta\"}\n ),\n ),\n]\n
"},{"location":"api/nplinker/#nplinker.config.load_config","title":"load_config","text":"load_config(config_file: str | PathLike) -> Dynaconf\n
Load and validate the configuration file.
Usage documentation: Config Loader
Parameters:
config_file (str | PathLike) – Path to the configuration file.
Returns:
Dynaconf – A Dynaconf object containing the configuration settings.
Raises:
FileNotFoundError – If the configuration file does not exist.
Source code in src/nplinker/config.py
def load_config(config_file: str | PathLike) -> Dynaconf:
    """Load and validate the configuration file."""
    config_file = transform_to_full_path(config_file)
    if not config_file.exists():
        raise FileNotFoundError(f"Config file '{config_file}' not found")

    # Locate the default config file
    default_config_file = Path(__file__).resolve().parent / "nplinker_default.toml"

    # Load config files
    config = Dynaconf(settings_files=[config_file], preload=[default_config_file])

    # Validate configs
    config.validators.register(*CONFIG_VALIDATORS)
    config.validators.validate()

    return config
"},{"location":"api/schema/","title":"Schemas","text":""},{"location":"api/schema/#nplinker.schemas","title":"nplinker.schemas","text":""},{"location":"api/schema/#nplinker.schemas.GENOME_STATUS_SCHEMA","title":"GENOME_STATUS_SCHEMA module-attribute
","text":"GENOME_STATUS_SCHEMA = load(f)\n
Schema for the genome status JSON file.
Schema content (genome_status_schema.json):
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://raw.githubusercontent.com/NPLinker/nplinker/main/src/nplinker/schemas/genome_status_schema.json",
  "title": "Status of genomes",
  "description": "A list of genome status objects, each of which contains information about a single genome",
  "type": "object",
  "required": ["genome_status", "version"],
  "properties": {
    "genome_status": {
      "type": "array",
      "title": "Genome status",
      "description": "A list of genome status objects",
      "items": {
        "type": "object",
        "required": ["original_id", "resolved_refseq_id", "resolve_attempted", "bgc_path"],
        "properties": {
          "original_id": {
            "type": "string",
            "title": "Original ID",
            "description": "The original ID of the genome",
            "minLength": 1
          },
          "resolved_refseq_id": {
            "type": "string",
            "title": "Resolved RefSeq ID",
            "description": "The RefSeq ID that was resolved for this genome"
          },
          "resolve_attempted": {
            "type": "boolean",
            "title": "Resolve Attempted",
            "description": "Whether or not an attempt was made to resolve this genome"
          },
          "bgc_path": {
            "type": "string",
            "title": "BGC Path",
            "description": "The path to the downloaded BGC file for this genome"
          }
        }
      },
      "minItems": 1,
      "uniqueItems": true
    },
    "version": {"type": "string", "enum": ["1.0"]}
  },
  "additionalProperties": false
}
"},{"location":"api/schema/#nplinker.schemas.GENOME_BGC_MAPPINGS_SCHEMA","title":"GENOME_BGC_MAPPINGS_SCHEMA module-attribute
","text":"GENOME_BGC_MAPPINGS_SCHEMA = load(f)\n
Schema for genome BGC mappings JSON file.
Schema content (genome_bgc_mappings_schema.json):
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://raw.githubusercontent.com/NPLinker/nplinker/main/src/nplinker/schemas/genome_bgc_mappings_schema.json",
  "title": "Mappings from genome ID to BGC IDs",
  "description": "A list of mappings from genome ID to BGC (biosynthetic gene cluster) IDs",
  "type": "object",
  "required": ["mappings", "version"],
  "properties": {
    "mappings": {
      "type": "array",
      "title": "Mappings from genome ID to BGC IDs",
      "description": "A list of mappings from genome ID to BGC IDs",
      "items": {
        "type": "object",
        "required": ["genome_ID", "BGC_ID"],
        "properties": {
          "genome_ID": {
            "type": "string",
            "title": "Genome ID",
            "description": "The genome ID used in BGC database such as antiSMASH",
            "minLength": 1
          },
          "BGC_ID": {
            "type": "array",
            "title": "BGC ID",
            "description": "A list of BGC IDs",
            "items": {"type": "string", "minLength": 1},
            "minItems": 1,
            "uniqueItems": true
          }
        }
      },
      "minItems": 1,
      "uniqueItems": true
    },
    "version": {"type": "string", "enum": ["1.0"]}
  },
  "additionalProperties": false
}
"},{"location":"api/schema/#nplinker.schemas.STRAIN_MAPPINGS_SCHEMA","title":"STRAIN_MAPPINGS_SCHEMA module-attribute
","text":"STRAIN_MAPPINGS_SCHEMA = load(f)\n
Schema for strain mappings JSON file.
Schema content (strain_mappings_schema.json):
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://raw.githubusercontent.com/NPLinker/nplinker/main/src/nplinker/schemas/strain_mappings_schema.json",
  "title": "Strain mappings",
  "description": "A list of mappings from strain ID to strain aliases",
  "type": "object",
  "required": ["strain_mappings", "version"],
  "properties": {
    "strain_mappings": {
      "type": "array",
      "title": "Strain mappings",
      "description": "A list of strain mappings",
      "items": {
        "type": "object",
        "required": ["strain_id", "strain_alias"],
        "properties": {
          "strain_id": {
            "type": "string",
            "title": "Strain ID",
            "description": "Strain ID, which could be any strain name or accession number",
            "minLength": 1
          },
          "strain_alias": {
            "type": "array",
            "title": "Strain aliases",
            "description": "A list of strain aliases, which could be any names that refer to the same strain",
            "items": {"type": "string", "minLength": 1},
            "minItems": 1,
            "uniqueItems": true
          }
        }
      },
      "minItems": 1,
      "uniqueItems": true
    },
    "version": {"type": "string", "enum": ["1.0"]}
  },
  "additionalProperties": false
}
"},{"location":"api/schema/#nplinker.schemas.USER_STRAINS_SCHEMA","title":"USER_STRAINS_SCHEMA module-attribute
","text":"USER_STRAINS_SCHEMA = load(f)\n
Schema for user strains JSON file.
Schema content (user_strains.json):
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://raw.githubusercontent.com/NPLinker/nplinker/main/src/nplinker/schemas/user_strains.json",
  "title": "User specified strains",
  "description": "A list of strain IDs specified by user",
  "type": "object",
  "required": ["strain_ids"],
  "properties": {
    "strain_ids": {
      "type": "array",
      "title": "Strain IDs",
      "description": "A list of strain IDs specified by user. The strain IDs must be the same as the ones in the strain mappings file.",
      "items": {"type": "string", "minLength": 1},
      "minItems": 1,
      "uniqueItems": true
    },
    "version": {"type": "string", "enum": ["1.0"]}
  },
  "additionalProperties": false
}
"},{"location":"api/schema/#nplinker.schemas.PODP_ADAPTED_SCHEMA","title":"PODP_ADAPTED_SCHEMA module-attribute
","text":"PODP_ADAPTED_SCHEMA = load(f)\n
Schema for PODP JSON file.
The PODP JSON file is the project JSON file downloaded from the PODP platform. For example, for PODP project MSV000079284, the JSON file is https://pairedomicsdata.bioinformatics.nl/api/projects/4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4.
Schema content (podp_adapted_schema.json; HTML markup in the original descriptions has been reduced to plain text and URLs):
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$id": "https://raw.githubusercontent.com/NPLinker/nplinker/main/src/nplinker/schemas/podp_adapted_schema.json",
  "title": "Adapted Paired Omics Data Platform Schema for NPLinker",
  "description": "This schema is adapted from PODP schema (https://pairedomicsdata.bioinformatics.nl/schema.json) for NPLinker. It's used to validate the input data for NPLinker. Thus, only required fields for NPLinker are kept in this schema, and some fields are modified to fit NPLinker's requirements.",
  "type": "object",
  "required": ["version", "metabolomics", "genomes", "genome_metabolome_links"],
  "properties": {
    "version": {
      "type": "string",
      "readOnly": true,
      "default": "3",
      "enum": ["3"]
    },
    "metabolomics": {
      "type": "object",
      "title": "2. Metabolomics Information",
      "description": "Please provide basic information on the publicly available metabolomics project from which paired data is available. Currently, we allow for links to mass spectrometry data deposited in GNPS-MaSSIVE or MetaboLights.",
      "properties": {
        "project": {
          "type": "object",
          "required": ["molecular_network"],
          "title": "GNPS-MassIVE",
          "properties": {
            "GNPSMassIVE_ID": {
              "type": "string",
              "title": "GNPS-MassIVE identifier",
              "description": "Please provide the GNPS-MassIVE identifier of your metabolomics data set, e.g., MSV000078839.",
              "pattern": "^MSV[0-9]{9}$"
            },
            "MaSSIVE_URL": {
              "type": "string",
              "title": "Link to MassIVE upload",
              "description": "Please provide the link to the MassIVE upload, e.g., https://gnps.ucsd.edu/ProteoSAFe/result.jsp?task=a507232a787243a5afd69a6c6fa1e508&view=advanced_view. Warning, there cannot be spaces in the URI.",
              "format": "uri"
            },
            "molecular_network": {
              "type": "string",
              "pattern": "^[0-9a-z]{32}$",
              "title": "Molecular Network Task ID",
              "description": "If you have run a Molecular Network on GNPS, please provide the task ID of the Molecular Network job. It can be found in the URL of the Molecular Networking job, e.g., in https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=c36f90ba29fe44c18e96db802de0c6b9 the task ID is c36f90ba29fe44c18e96db802de0c6b9."
            }
          }
        }
      },
      "required": ["project"],
      "additionalProperties": true
    },
    "genomes": {
      "type": "array",
      "title": "3. (Meta)genomics Information",
      "description": "Please add all genomes and/or metagenomes for which paired data is available as separate entries.",
      "items": {
        "type": "object",
        "required": ["genome_ID", "genome_label"],
        "properties": {
          "genome_ID": {
            "type": "object",
            "title": "Genome accession",
            "description": "At least one of the three identifiers is required.",
            "anyOf": [
              {"required": ["GenBank_accession"]},
              {"required": ["RefSeq_accession"]},
              {"required": ["JGI_Genome_ID"]}
            ],
            "properties": {
              "GenBank_accession": {
                "type": "string",
                "title": "GenBank accession number",
                "description": "If the publicly available genome got a GenBank accession number assigned, e.g., AL645882, please provide it here. The genome sequence must be submitted to GenBank/ENA/DDBJ (and an accession number must be received) before this form can be filled out. In case of a whole genome sequence, please use master records. At least one identifier must be entered.",
                "minLength": 1
              },
              "RefSeq_accession": {
                "type": "string",
                "title": "RefSeq accession number",
                "description": "For example: NC_003888.3",
                "minLength": 1
              },
              "JGI_Genome_ID": {
                "type": "string",
                "title": "JGI IMG genome ID",
                "description": "For example: 641228474",
                "minLength": 1
              }
            }
          },
          "genome_label": {
            "type": "string",
            "title": "Genome label",
            "description": "Please assign a unique Genome Label for this genome or metagenome to help you recall it during the linking step. For example 'Streptomyces sp. CNB091'",
            "minLength": 1
          }
        }
      },
      "minItems": 1
    },
    "genome_metabolome_links": {
      "type": "array",
      "title": "6. Genome - Proteome - Metabolome Links",
      "description": "Create a linked pair by selecting the Genome Label and optional Proteome label as provided earlier. Subsequently links to the metabolomics data file belonging to that genome/proteome with appropriate experimental methods.",
      "items": {
        "type": "object",
        "required": ["genome_label", "metabolomics_file"],
        "properties": {
          "genome_label": {
            "type": "string",
            "title": "Genome/Metagenome",
            "description": "Please select the Genome Label to be linked to a metabolomics data file."
          },
          "metabolomics_file": {
            "type": "string",
            "title": "Location of metabolomics data file",
            "description": "Please provide a direct link to the metabolomics data file location, e.g. ftp://massive.ucsd.edu/MSV000078839/spectrum/R5/CNB091_R5_M.mzXML found in the FTP download of a MassIVE dataset or https://www.ebi.ac.uk/metabolights/MTBLS307/files/Urine_44_fullscan1_pos.mzXML found in the Files section of a MetaboLights study. Warning, there cannot be spaces in the URI.",
            "format": "uri"
          }
        },
        "additionalProperties": true
      },
      "minItems": 1
    }
  },
  "additionalProperties": true
}
"},{"location":"api/schema/#nplinker.schemas.validate_podp_json","title":"validate_podp_json","text":"validate_podp_json(json_data: dict) -> None\n
Validate JSON data against the PODP JSON schema.
All validation error messages are collected and raised as a single ValueError.
Parameters:
json_data (dict) – The JSON data to validate.
Raises:
ValueError – If the JSON data does not match the schema.
Examples:
Download the PODP JSON file for project MSV000079284 from https://pairedomicsdata.bioinformatics.nl/api/projects/4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4 and save it as podp_project.json.
Validate it:
>>> with open("podp_project.json", "r") as f:
...     json_data = json.load(f)
>>> validate_podp_json(json_data)
Source code in src/nplinker/schemas/__init__.py
def validate_podp_json(json_data: dict) -> None:
    """Validate JSON data against the PODP JSON schema."""
    validator = Draft7Validator(PODP_ADAPTED_SCHEMA)
    errors = sorted(validator.iter_errors(json_data), key=lambda e: e.path)
    if errors:
        error_messages = [f"{e.json_path}: {e.message}" for e in errors]
        raise ValueError(
            "Not match PODP adapted schema, here are the detailed error:\n - "
            + "\n - ".join(error_messages)
        )
"},{"location":"api/scoring/","title":"Data Models","text":""},{"location":"api/scoring/#nplinker.scoring","title":"nplinker.scoring","text":""},{"location":"api/scoring/#nplinker.scoring.LinkGraph","title":"LinkGraph","text":"LinkGraph()\n
Class to represent the links between objects in NPLinker.
This class wraps the networkx.Graph class to provide a more user-friendly interface for working with the links. The links between objects are stored as edges in a graph, while the objects themselves are stored as nodes. The scoring data for each link (or link data) is stored as the key/value attributes of the edge.
Examples:
Create a LinkGraph object:
>>> lg = LinkGraph()
Add a link between a GCF and a Spectrum object:
>>> lg.add_link(gcf, spectrum, metcalf=Score("metcalf", 1.0, {"cutoff": 0.5}))
Get all links for a given object:
>>> lg[gcf]
{spectrum: {"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})}}
Get all links in the LinkGraph:
>>> lg.links
[(gcf, spectrum, {"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})})]
Check if there is a link between two objects:
>>> lg.has_link(gcf, spectrum)
True
Get the link data between two objects:
>>> lg.get_link_data(gcf, spectrum)
{"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})}
Source code in src/nplinker/scoring/link_graph.py
def __init__(self) -> None:
    """Initialize a LinkGraph object."""
    self._g: Graph = Graph()
"},{"location":"api/scoring/#nplinker.scoring.LinkGraph.links","title":"links property
","text":"links: list[LINK]\n
Get all links.
Returns:
list[LINK] – A list of tuples containing the links between objects.
Examples:
>>> lg.links
[(gcf, spectrum, {"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})})]
"},{"location":"api/scoring/#nplinker.scoring.LinkGraph.__str__","title":"__str__","text":"__str__() -> str\n
Get a short summary of the LinkGraph.
Source code in src/nplinker/scoring/link_graph.py
def __str__(self) -> str:
    """Get a short summary of the LinkGraph."""
    return f"{self.__class__.__name__}(#links={len(self.links)}, #objects={len(self)})"
"},{"location":"api/scoring/#nplinker.scoring.LinkGraph.__len__","title":"__len__","text":"__len__() -> int\n
Get the number of objects.
Source code in src/nplinker/scoring/link_graph.py
def __len__(self) -> int:
    """Get the number of objects."""
    return len(self._g)
"},{"location":"api/scoring/#nplinker.scoring.LinkGraph.__getitem__","title":"__getitem__","text":"__getitem__(u: Entity) -> dict[Entity, LINK_DATA]\n
Get all links for a given object.
Parameters:
u (Entity) – the given object
Returns:
dict[Entity, LINK_DATA] – A dictionary of links for the given object.
Raises:
KeyError – if the input object is not found in the link graph.
Source code in src/nplinker/scoring/link_graph.py
@validate_u
def __getitem__(self, u: Entity) -> dict[Entity, LINK_DATA]:
    """Get all links for a given object."""
    try:
        links = self._g[u]
    except KeyError:
        raise KeyError(f"{u} not found in the link graph.")

    return {**links}  # type: ignore
"},{"location":"api/scoring/#nplinker.scoring.LinkGraph.add_link","title":"add_link","text":"add_link(u: Entity, v: Entity, **data: Score) -> None\n
Add a link between two objects.
The objects u and v must be of different types, i.e. one must be a GCF and the other must be a Spectrum or MolecularFamily.
Parameters:
u (Entity) – the first object, either a GCF, Spectrum, or MolecularFamily
v (Entity) – the second object, either a GCF, Spectrum, or MolecularFamily
data (Score, default: {}) – keyword arguments. At least one scoring method and its data must be provided. The key must be the name of the scoring method defined in ScoringMethod, and the value is a Score object, e.g. metcalf=Score("metcalf", 1.0, {"cutoff": 0.5}).
Examples:
>>> lg.add_link(gcf, spectrum, metcalf=Score("metcalf", 1.0, {"cutoff": 0.5}))
Source code in src/nplinker/scoring/link_graph.py
@validate_uv
def add_link(
    self,
    u: Entity,
    v: Entity,
    **data: Score,
) -> None:
    """Add a link between two objects."""
    # validate the data
    if not data:
        raise ValueError("At least one scoring method and its data must be provided.")
    for key, value in data.items():
        if not ScoringMethod.has_value(key):
            raise ValueError(
                f"{key} is not a valid name of scoring method. See `ScoringMethod` for valid names."
            )
        if not isinstance(value, Score):
            raise TypeError(f"{value} is not a Score object.")

    self._g.add_edge(u, v, **data)
"},{"location":"api/scoring/#nplinker.scoring.LinkGraph.has_link","title":"has_link","text":"has_link(u: Entity, v: Entity) -> bool\n
Check if there is a link between two objects.
Parameters:
u (Entity) – the first object, either a GCF, Spectrum, or MolecularFamily
v (Entity) – the second object, either a GCF, Spectrum, or MolecularFamily
Returns:
bool – True if there is a link between the two objects, False otherwise
Examples:
>>> lg.has_link(gcf, spectrum)
True
Source code in src/nplinker/scoring/link_graph.py
@validate_uv
def has_link(self, u: Entity, v: Entity) -> bool:
    """Check if there is a link between two objects."""
    return self._g.has_edge(u, v)
"},{"location":"api/scoring/#nplinker.scoring.LinkGraph.get_link_data","title":"get_link_data","text":"get_link_data(u: Entity, v: Entity) -> LINK_DATA | None\n
Get the data for a link between two objects.
Parameters:
u (Entity) – the first object, either a GCF, Spectrum, or MolecularFamily
v (Entity) – the second object, either a GCF, Spectrum, or MolecularFamily
Returns:
LINK_DATA | None – A dictionary of scoring methods and their data for the link between the two objects, or None if there is no link between the two objects.
Examples:
>>> lg.get_link_data(gcf, spectrum)
{"metcalf": Score("metcalf", 1.0, {"cutoff": 0.5})}
Source code in src/nplinker/scoring/link_graph.py
@validate_uv
def get_link_data(
    self,
    u: Entity,
    v: Entity,
) -> LINK_DATA | None:
    """Get the data for a link between two objects."""
    return self._g.get_edge_data(u, v)  # type: ignore
"},{"location":"api/scoring/#nplinker.scoring.LinkGraph.filter","title":"filter","text":"filter(\n u_nodes: Sequence[Entity],\n v_nodes: Sequence[Entity] = [],\n) -> LinkGraph\n
Return a new LinkGraph object with the filtered links between the given objects.
The new LinkGraph object will only contain the links between u_nodes and v_nodes.
If u_nodes or v_nodes is empty, the new LinkGraph object will contain the links for the given objects in v_nodes or u_nodes, respectively. If both are empty, an empty LinkGraph object is returned.
Note that not all objects in u_nodes and v_nodes need to be present in the original LinkGraph.
Parameters:
u_nodes (Sequence[Entity]) – a sequence of objects used as the first object in the links
v_nodes (Sequence[Entity], default: []) – a sequence of objects used as the second object in the links
Returns:
LinkGraph – A new LinkGraph object with the filtered links between the given objects.
Examples:
Filter the links for gcf1 and gcf2:
>>> new_lg = lg.filter([gcf1, gcf2])
Filter the links for spectrum1 and spectrum2:
>>> new_lg = lg.filter([spectrum1, spectrum2])
Filter the links between two lists of objects:
>>> new_lg = lg.filter([gcf1, gcf2], [spectrum1, spectrum2])
Source code in src/nplinker/scoring/link_graph.py
def filter(self, u_nodes: Sequence[Entity], v_nodes: Sequence[Entity] = [], /) -> LinkGraph:
    """Return a new LinkGraph object with the filtered links between the given objects."""
    lg = LinkGraph()

    # exchange u_nodes and v_nodes if u_nodes is empty but v_nodes not
    if len(u_nodes) == 0 and len(v_nodes) != 0:
        u_nodes = v_nodes
        v_nodes = []

    if len(v_nodes) == 0:
        for u in u_nodes:
            self._filter_one_node(u, lg)

    for u in u_nodes:
        for v in v_nodes:
            self._filter_two_nodes(u, v, lg)

    return lg
"},{"location":"api/scoring/#nplinker.scoring.Score","title":"Score dataclass
","text":"Score(name: str, value: float, parameter: dict)\n
A data class to represent score data.
Attributes:
name (str) – the name of the scoring method. See ScoringMethod for valid values.
value (float) – the score value.
parameter (dict) – the parameters used for the scoring method.
name (instance attribute)
name: str

value (instance attribute)
value: float

parameter (instance attribute)
parameter: dict
"},{"location":"api/scoring/#nplinker.scoring.Score.__post_init__","title":"__post_init__","text":"__post_init__() -> None\n
Check if the value of name is valid.
Raises:
ValueError – if the value of name is not valid.
Source code in src/nplinker/scoring/score.py
def __post_init__(self) -> None:
    """Check if the value of `name` is valid."""
    if ScoringMethod.has_value(self.name) is False:
        raise ValueError(
            f"{self.name} is not a valid value. Valid values are: {[e.value for e in ScoringMethod]}"
        )
"},{"location":"api/scoring/#nplinker.scoring.Score.__getitem__","title":"__getitem__","text":"__getitem__(key)\n
Source code in src/nplinker/scoring/score.py
def __getitem__(self, key):\n if key in {field.name for field in fields(self)}:\n return getattr(self, key)\n else:\n raise KeyError(f\"{key} not found in {self.__class__.__name__}\")\n
"},{"location":"api/scoring/#nplinker.scoring.Score.__setitem__","title":"__setitem__","text":"__setitem__(key, value)\n
Source code in src/nplinker/scoring/score.py
def __setitem__(self, key, value):\n # validate the value of `name`\n if key == \"name\" and ScoringMethod.has_value(value) is False:\n raise ValueError(\n f\"{value} is not a valid value. Valid values are: {[e.value for e in ScoringMethod]}\"\n )\n\n if key in {field.name for field in fields(self)}:\n setattr(self, key, value)\n else:\n raise KeyError(f\"{key} not found in {self.__class__.__name__}\")\n
"},{"location":"api/scoring_abc/","title":"Abstract Base Classes","text":""},{"location":"api/scoring_abc/#nplinker.scoring.abc","title":"nplinker.scoring.abc","text":""},{"location":"api/scoring_abc/#nplinker.scoring.abc.ScoringBase","title":"ScoringBase","text":" Bases: ABC
Abstract base class of scoring methods.
Attributes:
name (str) – The name of the scoring method.
npl (NPLinker | None) – The NPLinker object.
name (class attribute)
name: str = 'ScoringBase'

npl (class attribute)
npl: NPLinker | None = None
"},{"location":"api/scoring_abc/#nplinker.scoring.abc.ScoringBase.setup","title":"setup abstractmethod
classmethod
","text":"setup(npl: NPLinker)\n
Setup class level attributes.
Source code in src/nplinker/scoring/abc.py
@classmethod
@abstractmethod
def setup(cls, npl: NPLinker):
    """Setup class level attributes."""
"},{"location":"api/scoring_abc/#nplinker.scoring.abc.ScoringBase.get_links","title":"get_links abstractmethod
","text":"get_links(*objects, **parameters) -> LinkGraph\n
Get links information for the given objects.
Parameters:
objects – A list of objects to get links for.
parameters – The parameters used for scoring.
Returns:
LinkGraph – The LinkGraph object.
Source code in src/nplinker/scoring/abc.py
@abstractmethod
def get_links(
    self,
    *objects,
    **parameters,
) -> LinkGraph:
    """Get links information for the given objects."""
"},{"location":"api/scoring_abc/#nplinker.scoring.abc.ScoringBase.format_data","title":"format_data abstractmethod
","text":"format_data(data) -> str\n
Format the scoring data to a string.
Source code in src/nplinker/scoring/abc.py
@abstractmethod
def format_data(self, data) -> str:
    """Format the scoring data to a string."""
"},{"location":"api/scoring_abc/#nplinker.scoring.abc.ScoringBase.sort","title":"sort abstractmethod
","text":"sort(objects, reverse=True) -> list\n
Sort the given objects based on the scoring data.
Source code in src/nplinker/scoring/abc.py
@abstractmethod
def sort(self, objects, reverse=True) -> list:
    """Sort the given objects based on the scoring data."""
"},{"location":"api/scoring_methods/","title":"Scoring Methods","text":""},{"location":"api/scoring_methods/#nplinker.scoring","title":"nplinker.scoring","text":""},{"location":"api/scoring_methods/#nplinker.scoring.ScoringMethod","title":"ScoringMethod","text":" Bases: Enum
Enum class for scoring methods.
"},{"location":"api/scoring_methods/#nplinker.scoring.ScoringMethod.METCALF","title":"METCALFclass-attribute
instance-attribute
","text":"METCALF = 'metcalf'\n
"},{"location":"api/scoring_methods/#nplinker.scoring.ScoringMethod.ROSETTA","title":"ROSETTA class-attribute
instance-attribute
","text":"ROSETTA = 'rosetta'\n
"},{"location":"api/scoring_methods/#nplinker.scoring.ScoringMethod.NPLCLASS","title":"NPLCLASS class-attribute
instance-attribute
","text":"NPLCLASS = 'nplclass'\n
"},{"location":"api/scoring_methods/#nplinker.scoring.ScoringMethod.has_value","title":"has_value classmethod
","text":"has_value(value: str) -> bool\n
Check if the enum has a value.
Source code in src/nplinker/scoring/scoring_method.py
@classmethod
def has_value(cls, value: str) -> bool:
    """Check if the enum has a value."""
    return any(value == item.value for item in cls)
"},{"location":"api/scoring_methods/#nplinker.scoring.MetcalfScoring","title":"MetcalfScoring","text":" Bases: ScoringBase
Metcalf scoring method.
Attributes:
name – The name of this scoring method, set to a fixed value metcalf.
npl (NPLinker | None) – The NPLinker object.
CACHE (str) – The name of the cache file to use for storing the MetcalfScoring.
presence_gcf_strain (DataFrame) – A DataFrame to store presence of gcfs with respect to strains. The index of the DataFrame are the GCF objects and the columns are Strain objects. The values are 1 where the gcf occurs in the strain, 0 otherwise.
presence_spec_strain (DataFrame) – A DataFrame to store presence of spectra with respect to strains. The index of the DataFrame are the Spectrum objects and the columns are Strain objects. The values are 1 where the spectrum occurs in the strain, 0 otherwise.
presence_mf_strain (DataFrame) – A DataFrame to store presence of molecular families with respect to strains. The index of the DataFrame are the MolecularFamily objects and the columns are Strain objects. The values are 1 where the molecular family occurs in the strain, 0 otherwise.
raw_score_spec_gcf (DataFrame) – A DataFrame to store the raw Metcalf scores for spectrum-gcf links. The columns are "spec", "gcf" and "score".
raw_score_mf_gcf (DataFrame) – A DataFrame to store the raw Metcalf scores for molecular family-gcf links. The columns are "mf", "gcf" and "score".
metcalf_mean (ndarray | None) – A numpy array to store the mean value used for standardising Metcalf scores. The array has shape (n_strains+1, n_strains+1), where n_strains is the number of strains.
metcalf_std (ndarray | None) – A numpy array to store the standard deviation value used for standardising Metcalf scores. The array has shape (n_strains+1, n_strains+1), where n_strains is the number of strains.
name (class attribute)
name = METCALF.value

npl (class attribute)
npl: NPLinker | None = None

CACHE (class attribute)
CACHE: str = 'cache_metcalf_scoring.pckl'

metcalf_weights (class attribute)
metcalf_weights: tuple[int, int, int, int] = (10, -10, 0, 1)

presence_gcf_strain (class attribute)
presence_gcf_strain: DataFrame = DataFrame()

presence_spec_strain (class attribute)
presence_spec_strain: DataFrame = DataFrame()

presence_mf_strain (class attribute)
presence_mf_strain: DataFrame = DataFrame()

raw_score_spec_gcf (class attribute)
raw_score_spec_gcf: DataFrame = DataFrame(columns=["spec", "gcf", "score"])

raw_score_mf_gcf (class attribute)
raw_score_mf_gcf: DataFrame = DataFrame(columns=["mf", "gcf", "score"])

metcalf_mean (class attribute)
metcalf_mean: ndarray | None = None

metcalf_std (class attribute)
metcalf_std: ndarray | None = None
"},{"location":"api/scoring_methods/#nplinker.scoring.MetcalfScoring.setup","title":"setup classmethod
","text":"setup(npl: NPLinker) -> None\n
Setup the MetcalfScoring object.
This method is only called once to setup the MetcalfScoring object.
Parameters:
npl (NPLinker) – The NPLinker object.
Source code in src/nplinker/scoring/metcalf_scoring.py
@classmethod
def setup(cls, npl: NPLinker) -> None:
    """Setup the MetcalfScoring object."""
    if cls.npl is not None:
        logger.info("MetcalfScoring.setup already called, skipping.")
        return

    logger.info(
        f"MetcalfScoring.setup starts: #bgcs={len(npl.bgcs)}, #gcfs={len(npl.gcfs)}, "
        f"#spectra={len(npl.spectra)}, #mfs={len(npl.mfs)}, #strains={npl.strains}"
    )
    cls.npl = npl

    # calculate presence of gcfs/spectra/mfs with respect to strains
    cls.presence_gcf_strain = get_presence_gcf_strain(npl.gcfs, npl.strains)
    cls.presence_spec_strain = get_presence_spec_strain(npl.spectra, npl.strains)
    cls.presence_mf_strain = get_presence_mf_strain(npl.mfs, npl.strains)

    # calculate raw Metcalf scores for spec-gcf links
    raw_score_spec_gcf = cls._calc_raw_score(
        cls.presence_spec_strain, cls.presence_gcf_strain, cls.metcalf_weights
    )
    cls.raw_score_spec_gcf = raw_score_spec_gcf.reset_index().melt(id_vars="index")
    cls.raw_score_spec_gcf.columns = ["spec", "gcf", "score"]  # type: ignore

    # calculate raw Metcalf scores for mf-gcf links
    raw_score_mf_gcf = cls._calc_raw_score(
        cls.presence_mf_strain, cls.presence_gcf_strain, cls.metcalf_weights
    )
    cls.raw_score_mf_gcf = raw_score_mf_gcf.reset_index().melt(id_vars="index")
    cls.raw_score_mf_gcf.columns = ["mf", "gcf", "score"]  # type: ignore

    # calculate mean and std for standardising Metcalf scores
    cls.metcalf_mean, cls.metcalf_std = cls._calc_mean_std(
        len(npl.strains), cls.metcalf_weights
    )

    logger.info("MetcalfScoring.setup completed")
"},{"location":"api/scoring_methods/#nplinker.scoring.MetcalfScoring.get_links","title":"get_links","text":"get_links(*objects, **parameters)\n
Get links for the given objects.
Parameters:
objects
\u2013 The objects to get links for. All objects must be of the same type, i.e. GCF
, Spectrum
or MolecularFamily
type. If no objects are provided, all detected objects (npl.gcfs
) will be used.
parameters
\u2013 The scoring parameters to use for the links. The parameters are:
cutoff
: The minimum score to consider a link (\u2265cutoff). Default is 0.
standardised
: Whether to use standardised scores. Default is False.
Returns:
The LinkGraph
object containing the links involving the input objects with the Metcalf scores.
Raises:
TypeError
\u2013 If the input objects are not of the same type or the object type is invalid.
src/nplinker/scoring/metcalf_scoring.py
def get_links(self, *objects, **parameters):\n \"\"\"Get links for the given objects.\n\n Args:\n objects: The objects to get links for. All objects must be of the same type, i.e. `GCF`,\n `Spectrum` or `MolecularFamily` type.\n If no objects are provided, all detected objects (`npl.gcfs`) will be used.\n parameters: The scoring parameters to use for the links.\n The parameters are:\n\n - `cutoff`: The minimum score to consider a link (\u2265cutoff). Default is 0.\n - `standardised`: Whether to use standardised scores. Default is False.\n\n Returns:\n The [`LinkGraph`][nplinker.scoring.LinkGraph] object containing the links involving the\n input objects with the Metcalf scores.\n\n Raises:\n TypeError: If the input objects are not of the same type or the object type is invalid.\n \"\"\"\n # validate input objects\n if len(objects) == 0:\n objects = self.npl.gcfs\n # check if all objects are of the same type\n types = {type(i) for i in objects}\n if len(types) > 1:\n raise TypeError(\"Input objects must be of the same type.\")\n # check if the object type is valid\n obj_type = next(iter(types))\n if obj_type not in (GCF, Spectrum, MolecularFamily):\n raise TypeError(\n f\"Invalid type {obj_type}. Input objects must be GCF, Spectrum or MolecularFamily objects.\"\n )\n\n # validate scoring parameters\n self._cutoff: float = parameters.get(\"cutoff\", 0)\n self._standardised: bool = parameters.get(\"standardised\", False)\n parameters.update({\"cutoff\": self._cutoff, \"standardised\": self._standardised})\n\n logger.info(\n f\"MetcalfScoring: #objects={len(objects)}, type={obj_type}, cutoff={self._cutoff}, \"\n f\"standardised={self._standardised}\"\n )\n if not self._standardised:\n scores_list = self._get_links(*objects, obj_type=obj_type, score_cutoff=self._cutoff)\n else:\n if self.metcalf_mean is None or self.metcalf_std is None:\n raise ValueError(\n \"MetcalfScoring.metcalf_mean and metcalf_std are not set. Run MetcalfScoring.setup first.\"\n )\n # use negative infinity as the score cutoff to ensure we get all links\n scores_list = self._get_links(*objects, obj_type=obj_type, score_cutoff=-np.inf)\n scores_list = self._calc_standardised_score(scores_list)\n\n links = LinkGraph()\n for score_df in scores_list:\n for row in score_df.itertuples(index=False): # row has attributes: spec/mf, gcf, score\n met = row.spec if score_df.name == LinkType.SPEC_GCF else row.mf\n links.add_link(\n row.gcf,\n met,\n metcalf=Score(self.name, row.score, parameters),\n )\n\n logger.info(f\"MetcalfScoring: completed! Found {len(links.links)} links in total.\")\n return links\n
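For illustration, a minimal sketch of driving the two methods above directly. It assumes an NPLinker object whose dataset has already been loaded (see the Quickstart); the config path and the no-argument construction of MetcalfScoring are assumptions, not documented API guarantees.
from nplinker import NPLinker\nfrom nplinker.scoring import MetcalfScoring\n\nnpl = NPLinker(\"path/to/nplinker.toml\")  # hypothetical config path\n# ... load the dataset here, as described in the Quickstart ...\n\nMetcalfScoring.setup(npl)  # one-time, class-level setup\nscoring = MetcalfScoring()  # assumption: no-argument construction\nlinks = scoring.get_links(*npl.gcfs, cutoff=0, standardised=False)\nprint(f\"Found {len(links.links)} links\")\n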
"},{"location":"api/scoring_methods/#nplinker.scoring.MetcalfScoring.format_data","title":"format_data","text":"format_data(data)\n
Format the data for display.
Source code in src/nplinker/scoring/metcalf_scoring.py
def format_data(self, data):\n \"\"\"Format the data for display.\"\"\"\n # for metcalf the data will just be a floating point value (i.e. the score)\n return f\"{data:.4f}\"\n
"},{"location":"api/scoring_methods/#nplinker.scoring.MetcalfScoring.sort","title":"sort","text":"sort(objects, reverse=True)\n
Sort the objects based on the score.
Source code in src/nplinker/scoring/metcalf_scoring.py
def sort(self, objects, reverse=True):\n \"\"\"Sort the objects based on the score.\"\"\"\n # sort based on score\n return sorted(objects, key=lambda objlink: objlink[self], reverse=reverse)\n
"},{"location":"api/scoring_utils/","title":"Utilities","text":""},{"location":"api/scoring_utils/#nplinker.scoring.utils","title":"nplinker.scoring.utils","text":""},{"location":"api/scoring_utils/#nplinker.scoring.utils.get_presence_gcf_strain","title":"get_presence_gcf_strain","text":"get_presence_gcf_strain(\n gcfs: Sequence[GCF], strains: StrainCollection\n) -> DataFrame\n
Get the occurrence of strains in gcfs.
The occurrence is a DataFrame with GCF objects as index and Strain objects as columns, and the values are 1 if the gcf occurs in the strain, 0 otherwise.
Source code in src/nplinker/scoring/utils.py
def get_presence_gcf_strain(gcfs: Sequence[GCF], strains: StrainCollection) -> pd.DataFrame:\n \"\"\"Get the occurrence of strains in gcfs.\n\n The occurrence is a DataFrame with GCF objects as index and Strain objects as columns, and the\n values are 1 if the gcf occurs in the strain, 0 otherwise.\n \"\"\"\n df_gcf_strain = pd.DataFrame(\n 0,\n index=gcfs,\n columns=list(strains),\n dtype=int,\n ) # type: ignore\n for gcf in gcfs:\n for strain in strains:\n if gcf.has_strain(strain):\n df_gcf_strain.loc[gcf, strain] = 1\n return df_gcf_strain # type: ignore\n
"},{"location":"api/scoring_utils/#nplinker.scoring.utils.get_presence_spec_strain","title":"get_presence_spec_strain","text":"get_presence_spec_strain(\n spectra: Sequence[Spectrum], strains: StrainCollection\n) -> DataFrame\n
Get the occurrence of strains in spectra.
The occurrence is a DataFrame with Spectrum objects as index and Strain objects as columns, and the values are 1 if the spectrum occurs in the strain, 0 otherwise.
Source code in src/nplinker/scoring/utils.py
def get_presence_spec_strain(\n spectra: Sequence[Spectrum], strains: StrainCollection\n) -> pd.DataFrame:\n \"\"\"Get the occurrence of strains in spectra.\n\n The occurrence is a DataFrame with Spectrum objects as index and Strain objects as columns, and\n the values are 1 if the spectrum occurs in the strain, 0 otherwise.\n \"\"\"\n df_spec_strain = pd.DataFrame(\n 0,\n index=spectra,\n columns=list(strains),\n dtype=int,\n ) # type: ignore\n for spectrum in spectra:\n for strain in strains:\n if spectrum.has_strain(strain):\n df_spec_strain.loc[spectrum, strain] = 1\n return df_spec_strain # type: ignore\n
"},{"location":"api/scoring_utils/#nplinker.scoring.utils.get_presence_mf_strain","title":"get_presence_mf_strain","text":"get_presence_mf_strain(\n mfs: Sequence[MolecularFamily],\n strains: StrainCollection,\n) -> DataFrame\n
Get the occurrence of strains in molecular families.
The occurrence is a DataFrame with MolecularFamily objects as index and Strain objects as columns, and the values are 1 if the molecular family occurs in the strain, 0 otherwise.
Source code in src/nplinker/scoring/utils.py
def get_presence_mf_strain(\n mfs: Sequence[MolecularFamily], strains: StrainCollection\n) -> pd.DataFrame:\n \"\"\"Get the occurrence of strains in molecular families.\n\n The occurrence is a DataFrame with MolecularFamily objects as index and Strain objects as\n columns, and the values are 1 if the molecular family occurs in the strain, 0 otherwise.\n \"\"\"\n df_mf_strain = pd.DataFrame(\n 0,\n index=mfs,\n columns=list(strains),\n dtype=int,\n ) # type: ignore\n for mf in mfs:\n for strain in strains:\n if mf.has_strain(strain):\n df_mf_strain.loc[mf, strain] = 1\n return df_mf_strain # type: ignore\n
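A short sketch of how the three helpers above fit together; npl is assumed to be a fully loaded NPLinker object, as in the scoring docs.
from nplinker.scoring.utils import (\n    get_presence_gcf_strain,\n    get_presence_mf_strain,\n    get_presence_spec_strain,\n)\n\n# npl is assumed to be a loaded NPLinker object\npresence_gcf = get_presence_gcf_strain(npl.gcfs, npl.strains)\npresence_spec = get_presence_spec_strain(npl.spectra, npl.strains)\npresence_mf = get_presence_mf_strain(npl.mfs, npl.strains)\nprint(presence_gcf.shape)  # (#GCFs, #strains); values are 0 or 1\n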
"},{"location":"api/strain/","title":"Data Models","text":""},{"location":"api/strain/#nplinker.strain","title":"nplinker.strain","text":""},{"location":"api/strain/#nplinker.strain.Strain","title":"Strain","text":"Strain(id: str)\n
Class to model the mapping between strain id and its aliases.
It's recommended to use NCBI taxonomy strain id or name as the primary id.
Attributes:
id
(str
) \u2013 The representative id of the strain.
names
(set[str]
) \u2013 A set of names associated with the strain.
aliases
(set[str]
) \u2013 A set of aliases associated with the strain.
Parameters:
id
(str
) \u2013 the representative id of the strain.
src/nplinker/strain/strain.py
def __init__(self, id: str) -> None:\n \"\"\"To model the mapping between strain id and its aliases.\n\n Args:\n id: the representative id of the strain.\n \"\"\"\n self.id: str = id\n self._aliases: set[str] = set()\n
"},{"location":"api/strain/#nplinker.strain.Strain.id","title":"id instance-attribute
","text":"id: str = id\n
"},{"location":"api/strain/#nplinker.strain.Strain.names","title":"names property
","text":"names: set[str]\n
Get the set of strain names including id and aliases.
Returns:
set[str]
\u2013 A set of names associated with the strain.
"},{"location":"api/strain/#nplinker.strain.Strain.aliases","title":"aliases property
","text":"aliases: set[str]\n
Get the set of known aliases.
Returns:
set[str]
\u2013 A set of aliases associated with the strain.
"},{"location":"api/strain/#nplinker.strain.Strain.__repr__","title":"__repr__","text":"__repr__() -> str\n
Source code in src/nplinker/strain/strain.py
def __repr__(self) -> str:\n return str(self)\n
"},{"location":"api/strain/#nplinker.strain.Strain.__str__","title":"__str__","text":"__str__() -> str\n
Source code in src/nplinker/strain/strain.py
def __str__(self) -> str:\n return f\"Strain({self.id}) [{len(self._aliases)} aliases]\"\n
"},{"location":"api/strain/#nplinker.strain.Strain.__eq__","title":"__eq__","text":"__eq__(other) -> bool\n
Source code in src/nplinker/strain/strain.py
def __eq__(self, other) -> bool:\n if isinstance(other, Strain):\n return self.id == other.id\n return NotImplemented\n
"},{"location":"api/strain/#nplinker.strain.Strain.__hash__","title":"__hash__","text":"__hash__() -> int\n
Hash function for Strain.
Note that Strain is a mutable container, so we hash only on the id so that the hash value does not change when self._aliases
is updated.
src/nplinker/strain/strain.py
def __hash__(self) -> int:\n    \"\"\"Hash function for Strain.\n\n    Note that Strain is a mutable container, so here we hash on only the id\n    so that the hash value does not change when `self._aliases` is updated.\n    \"\"\"\n    return hash(self.id)\n
"},{"location":"api/strain/#nplinker.strain.Strain.__contains__","title":"__contains__","text":"__contains__(alias: str) -> bool\n
Source code in src/nplinker/strain/strain.py
def __contains__(self, alias: str) -> bool:\n if not isinstance(alias, str):\n raise TypeError(f\"Expected str, got {type(alias)}\")\n return alias in self._aliases\n
"},{"location":"api/strain/#nplinker.strain.Strain.add_alias","title":"add_alias","text":"add_alias(alias: str) -> None\n
Add an alias for the strain.
Parameters:
alias
(str
) \u2013 The alias to add for the strain.
src/nplinker/strain/strain.py
def add_alias(self, alias: str) -> None:\n \"\"\"Add an alias for the strain.\n\n Args:\n alias: The alias to add for the strain.\n \"\"\"\n if not isinstance(alias, str):\n raise TypeError(f\"Expected str, got {type(alias)}\")\n if len(alias) == 0:\n logger.warning(\"Refusing to add an empty-string alias to strain {%s}\", self)\n else:\n self._aliases.add(alias)\n
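A minimal sketch of the Strain API documented above; the ids are hypothetical.
from nplinker.strain import Strain\n\nstrain = Strain(\"NCBI_12345\")  # hypothetical representative id\nstrain.add_alias(\"my_isolate_1\")\nprint(\"my_isolate_1\" in strain)  # True: __contains__ checks the aliases\nprint(strain.names)  # the id plus all aliases\nprint(strain)  # Strain(NCBI_12345) [1 aliases]\n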
"},{"location":"api/strain/#nplinker.strain.StrainCollection","title":"StrainCollection","text":"StrainCollection()\n
A collection of Strain
objects.
src/nplinker/strain/strain_collection.py
def __init__(self) -> None:\n # the order of strains is needed for scoring part, so use a list\n self._strains: list[Strain] = []\n self._strain_dict_name: dict[str, list[Strain]] = {}\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.__repr__","title":"__repr__","text":"__repr__() -> str\n
Source code in src/nplinker/strain/strain_collection.py
def __repr__(self) -> str:\n return str(self)\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.__str__","title":"__str__","text":"__str__() -> str\n
Source code in src/nplinker/strain/strain_collection.py
def __str__(self) -> str:\n if len(self) > 20:\n return f\"StrainCollection(n={len(self)})\"\n\n return f\"StrainCollection(n={len(self)}) [\" + \",\".join(s.id for s in self._strains) + \"]\"\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.__len__","title":"__len__","text":"__len__() -> int\n
Source code in src/nplinker/strain/strain_collection.py
def __len__(self) -> int:\n return len(self._strains)\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.__eq__","title":"__eq__","text":"__eq__(other) -> bool\n
Source code in src/nplinker/strain/strain_collection.py
def __eq__(self, other) -> bool:\n if isinstance(other, StrainCollection):\n return (\n self._strains == other._strains\n and self._strain_dict_name == other._strain_dict_name\n )\n return NotImplemented\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.__add__","title":"__add__","text":"__add__(other) -> StrainCollection\n
Source code in src/nplinker/strain/strain_collection.py
def __add__(self, other) -> StrainCollection:\n if isinstance(other, StrainCollection):\n sc = StrainCollection()\n for strain in self._strains:\n sc.add(strain)\n for strain in other._strains:\n sc.add(strain)\n return sc\n return NotImplemented\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.__contains__","title":"__contains__","text":"__contains__(item: Strain) -> bool\n
Check if the strain collection contains the given Strain object.
Source code in src/nplinker/strain/strain_collection.py
def __contains__(self, item: Strain) -> bool:\n \"\"\"Check if the strain collection contains the given Strain object.\"\"\"\n if isinstance(item, Strain):\n return item.id in self._strain_dict_name\n raise TypeError(f\"Expected Strain, got {type(item)}\")\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.__iter__","title":"__iter__","text":"__iter__() -> Iterator[Strain]\n
Source code in src/nplinker/strain/strain_collection.py
def __iter__(self) -> Iterator[Strain]:\n return iter(self._strains)\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.add","title":"add","text":"add(strain: Strain) -> None\n
Add strain to the collection.
If the strain already exists, merge the aliases.
Parameters:
strain
(Strain
) \u2013 The strain to add.
src/nplinker/strain/strain_collection.py
def add(self, strain: Strain) -> None:\n \"\"\"Add strain to the collection.\n\n If the strain already exists, merge the aliases.\n\n Args:\n strain: The strain to add.\n \"\"\"\n if strain in self._strains:\n # only one strain object per id\n strain_ref = self._strain_dict_name[strain.id][0]\n new_aliases = [alias for alias in strain.aliases if alias not in strain_ref.aliases]\n for alias in new_aliases:\n strain_ref.add_alias(alias)\n if alias not in self._strain_dict_name:\n self._strain_dict_name[alias] = [strain_ref]\n else:\n self._strain_dict_name[alias].append(strain_ref)\n else:\n self._strains.append(strain)\n for name in strain.names:\n if name not in self._strain_dict_name:\n self._strain_dict_name[name] = [strain]\n else:\n self._strain_dict_name[name].append(strain)\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.remove","title":"remove","text":"remove(strain: Strain) -> None\n
Remove a strain from the collection.
It removes the given strain object from the collection by strain id. If the strain id is not found, raise ValueError
.
Parameters:
strain
(Strain
) \u2013 The strain to remove.
Raises:
ValueError
\u2013 If the strain is not found in the collection.
src/nplinker/strain/strain_collection.py
def remove(self, strain: Strain) -> None:\n \"\"\"Remove a strain from the collection.\n\n It removes the given strain object from the collection by strain id.\n If the strain id is not found, raise `ValueError`.\n\n Args:\n strain: The strain to remove.\n\n Raises:\n ValueError: If the strain is not found in the collection.\n \"\"\"\n if strain in self._strains:\n self._strains.remove(strain)\n # only one strain object per id\n strain_ref = self._strain_dict_name[strain.id][0]\n for name in strain_ref.names:\n if name in self._strain_dict_name:\n new_strain_list = [s for s in self._strain_dict_name[name] if s.id != strain.id]\n if not new_strain_list:\n del self._strain_dict_name[name]\n else:\n self._strain_dict_name[name] = new_strain_list\n else:\n raise ValueError(f\"Strain {strain} not found in the strain collection.\")\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.filter","title":"filter","text":"filter(strain_set: set[Strain])\n
Remove all strains that are not in strain_set
from the strain collection.
Parameters:
strain_set
(set[Strain]
) \u2013 Set of strains to keep.
src/nplinker/strain/strain_collection.py
def filter(self, strain_set: set[Strain]):\n \"\"\"Remove all strains that are not in `strain_set` from the strain collection.\n\n Args:\n strain_set: Set of strains to keep.\n \"\"\"\n # note that we need to copy the list of strains, as we are modifying it\n for strain in self._strains.copy():\n if strain not in strain_set:\n self.remove(strain)\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.intersection","title":"intersection","text":"intersection(other: StrainCollection) -> StrainCollection\n
Get the intersection of two strain collections.
Parameters:
other
(StrainCollection
) \u2013 The other strain collection to compare.
Returns:
StrainCollection
\u2013 StrainCollection object containing the strains that are in both collections.
src/nplinker/strain/strain_collection.py
def intersection(self, other: StrainCollection) -> StrainCollection:\n \"\"\"Get the intersection of two strain collections.\n\n Args:\n other: The other strain collection to compare.\n\n Returns:\n StrainCollection object containing the strains that are in both collections.\n \"\"\"\n intersection = StrainCollection()\n for strain in self:\n if strain in other:\n intersection.add(strain)\n return intersection\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.has_name","title":"has_name","text":"has_name(name: str) -> bool\n
Check if the strain collection contains the given strain name (id or alias).
Parameters:
name
(str
) \u2013 Strain name (id or alias) to check.
Returns:
bool
\u2013 True if the strain name is in the collection, False otherwise.
src/nplinker/strain/strain_collection.py
def has_name(self, name: str) -> bool:\n \"\"\"Check if the strain collection contains the given strain name (id or alias).\n\n Args:\n name: Strain name (id or alias) to check.\n\n Returns:\n True if the strain name is in the collection, False otherwise.\n \"\"\"\n return name in self._strain_dict_name\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.lookup","title":"lookup","text":"lookup(name: str) -> list[Strain]\n
Lookup a strain by name (id or alias).
Parameters:
name
(str
) \u2013 Strain name (id or alias) to lookup.
Returns:
list[Strain]
\u2013 List of Strain objects with the given name.
Raises:
ValueError
\u2013 If the strain name is not found.
src/nplinker/strain/strain_collection.py
def lookup(self, name: str) -> list[Strain]:\n \"\"\"Lookup a strain by name (id or alias).\n\n Args:\n name: Strain name (id or alias) to lookup.\n\n Returns:\n List of Strain objects with the given name.\n\n Raises:\n ValueError: If the strain name is not found.\n \"\"\"\n if name in self._strain_dict_name:\n return self._strain_dict_name[name]\n raise ValueError(f\"Strain {name} not found in the strain collection.\")\n
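A minimal sketch of the collection API documented above; the ids are hypothetical.
from nplinker.strain import Strain, StrainCollection\n\nsc = StrainCollection()\nstrain = Strain(\"strain1\")\nstrain.add_alias(\"s1\")\nsc.add(strain)\n\nprint(len(sc))  # 1\nprint(sc.has_name(\"s1\"))  # True: aliases are indexed as names too\nprint(sc.lookup(\"s1\"))  # [Strain(strain1) [1 aliases]]\n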
"},{"location":"api/strain/#nplinker.strain.StrainCollection.read_json","title":"read_json staticmethod
","text":"read_json(file: str | PathLike) -> StrainCollection\n
Read a strain mappings JSON file and return a StrainCollection
object.
Parameters:
file
(str | PathLike
) \u2013 Path to the strain mappings JSON file.
Returns:
StrainCollection
\u2013 StrainCollection
object.
src/nplinker/strain/strain_collection.py
@staticmethod\ndef read_json(file: str | PathLike) -> StrainCollection:\n \"\"\"Read a strain mappings JSON file and return a `StrainCollection` object.\n\n Args:\n file: Path to the strain mappings JSON file.\n\n Returns:\n `StrainCollection` object.\n \"\"\"\n with open(file, \"r\") as f:\n json_data = json.load(f)\n\n # validate json data\n validate(instance=json_data, schema=STRAIN_MAPPINGS_SCHEMA)\n\n strain_collection = StrainCollection()\n for data in json_data[\"strain_mappings\"]:\n strain = Strain(data[\"strain_id\"])\n for alias in data[\"strain_alias\"]:\n strain.add_alias(alias)\n strain_collection.add(strain)\n return strain_collection\n
"},{"location":"api/strain/#nplinker.strain.StrainCollection.to_json","title":"to_json","text":"to_json(file: str | PathLike | None = None) -> str | None\n
Convert the StrainCollection
object to a JSON string.
Parameters:
file
(str | PathLike | None
, default: None
) \u2013 Path to output JSON file. If None, return the JSON string instead.
Returns:
str | None
\u2013 If input file
is None, return the JSON string. Otherwise, write the JSON string to the given file.
src/nplinker/strain/strain_collection.py
def to_json(self, file: str | PathLike | None = None) -> str | None:\n \"\"\"Convert the `StrainCollection` object to a JSON string.\n\n Args:\n file: Path to output JSON file. If None, return the JSON string instead.\n\n Returns:\n If input `file` is None, return the JSON string. Otherwise, write the JSON string to the given\n file.\n \"\"\"\n data_list = [\n {\"strain_id\": strain.id, \"strain_alias\": list(strain.aliases)} for strain in self\n ]\n json_data = {\"strain_mappings\": data_list, \"version\": \"1.0\"}\n\n # validate json data\n validate(instance=json_data, schema=STRAIN_MAPPINGS_SCHEMA)\n\n if file is not None:\n with open(file, \"w\") as f:\n json.dump(json_data, f)\n return None\n return json.dumps(json_data)\n
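A sketch of a JSON round trip with the two methods above; the file name follows the working directory conventions.
from nplinker.strain import Strain, StrainCollection\n\nsc = StrainCollection()\nsc.add(Strain(\"strain1\"))\nsc.to_json(\"strain_mappings.json\")  # writes the file and returns None\nsc2 = StrainCollection.read_json(\"strain_mappings.json\")\nassert sc == sc2  # the round trip is expected to preserve the collection\n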
"},{"location":"api/strain_utils/","title":"Utilities","text":""},{"location":"api/strain_utils/#nplinker.strain.utils","title":"nplinker.strain.utils","text":""},{"location":"api/strain_utils/#nplinker.strain.utils.load_user_strains","title":"load_user_strains","text":"load_user_strains(json_file: str | PathLike) -> set[Strain]\n
Load user specified strains from a JSON file.
The JSON file will be validated against the schema USER_STRAINS_SCHEMA.
The content of the JSON file could be, for example:
{\"strain_ids\": [\"strain1\", \"strain2\"]}\n
Parameters:
json_file
(str | PathLike
) \u2013 Path to the JSON file containing user specified strains.
Returns:
set[Strain]
\u2013 A set of user specified strains.
src/nplinker/strain/utils.py
def load_user_strains(json_file: str | PathLike) -> set[Strain]:\n \"\"\"Load user specified strains from a JSON file.\n\n The JSON file will be validated against the schema\n [USER_STRAINS_SCHEMA][nplinker.schemas.USER_STRAINS_SCHEMA]\n\n The content of the JSON file could be, for example:\n ```\n {\"strain_ids\": [\"strain1\", \"strain2\"]}\n ```\n\n Args:\n json_file: Path to the JSON file containing user specified strains.\n\n Returns:\n A set of user specified strains.\n \"\"\"\n with open(json_file, \"r\") as f:\n json_data = json.load(f)\n\n # validate json data\n validate(instance=json_data, schema=USER_STRAINS_SCHEMA)\n\n strains = set()\n for strain_id in json_data[\"strain_ids\"]:\n strains.add(Strain(strain_id))\n\n return strains\n
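A sketch of loading a strain selection file named as in the working directory docs (strains_selected.json).
from nplinker.strain.utils import load_user_strains\n\nstrains = load_user_strains(\"strains_selected.json\")\nprint(len(strains))  # a set of Strain objects\n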
"},{"location":"api/strain_utils/#nplinker.strain.utils.podp_generate_strain_mappings","title":"podp_generate_strain_mappings","text":"podp_generate_strain_mappings(\n podp_project_json_file: str | PathLike,\n genome_status_json_file: str | PathLike,\n genome_bgc_mappings_file: str | PathLike,\n gnps_file_mappings_file: str | PathLike,\n output_json_file: str | PathLike,\n) -> StrainCollection\n
Generate strain mappings JSON file for PODP pipeline.
To get the strain mappings, we need to combine the following mappings:
strain_id <-> original_genome_id <-> resolved_genome_id <-> bgc_id
strain_id <-> MS_filename <-> spectrum_id
These mappings are extracted from the following files:
\"strain_id <-> original_genome_id\" is extracted from podp_project_json_file.
\"original_genome_id <-> resolved_genome_id\" is extracted from genome_status_json_file.
\"resolved_genome_id <-> bgc_id\" is extracted from genome_bgc_mappings_file.
\"strain_id <-> MS_filename\" is extracted from podp_project_json_file.
\"MS_filename <-> spectrum_id\" is extracted from gnps_file_mappings_file.
Parameters:
podp_project_json_file
(str | PathLike
) \u2013 The path to the PODP project JSON file.
genome_status_json_file
(str | PathLike
) \u2013 The path to the genome status JSON file.
genome_bgc_mappings_file
(str | PathLike
) \u2013 The path to the genome BGC mappings JSON file.
gnps_file_mappings_file
(str | PathLike
) \u2013 The path to the GNPS file mappings file (csv or tsv).
output_json_file
(str | PathLike
) \u2013 The path to the output JSON file.
Returns:
StrainCollection
\u2013 The strain mappings stored in a StrainCollection object.
See Also:
extract_mappings_strain_id_original_genome_id: Extract mappings \"strain_id <-> original_genome_id\".
extract_mappings_original_genome_id_resolved_genome_id: Extract mappings \"original_genome_id <-> resolved_genome_id\".
extract_mappings_resolved_genome_id_bgc_id: Extract mappings \"resolved_genome_id <-> bgc_id\".
get_mappings_strain_id_bgc_id: Get mappings \"strain_id <-> bgc_id\".
extract_mappings_strain_id_ms_filename: Extract mappings \"strain_id <-> MS_filename\".
extract_mappings_ms_filename_spectrum_id: Extract mappings \"MS_filename <-> spectrum_id\".
get_mappings_strain_id_spectrum_id: Get mappings \"strain_id <-> spectrum_id\".
src/nplinker/strain/utils.py
def podp_generate_strain_mappings(\n podp_project_json_file: str | PathLike,\n genome_status_json_file: str | PathLike,\n genome_bgc_mappings_file: str | PathLike,\n gnps_file_mappings_file: str | PathLike,\n output_json_file: str | PathLike,\n) -> StrainCollection:\n \"\"\"Generate strain mappings JSON file for PODP pipeline.\n\n To get the strain mappings, we need to combine the following mappings:\n\n - strain_id <-> original_genome_id <-> resolved_genome_id <-> bgc_id\n - strain_id <-> MS_filename <-> spectrum_id\n\n These mappings are extracted from the following files:\n\n - \"strain_id <-> original_genome_id\" is extracted from `podp_project_json_file`.\n - \"original_genome_id <-> resolved_genome_id\" is extracted from `genome_status_json_file`.\n - \"resolved_genome_id <-> bgc_id\" is extracted from `genome_bgc_mappings_file`.\n - \"strain_id <-> MS_filename\" is extracted from `podp_project_json_file`.\n - \"MS_filename <-> spectrum_id\" is extracted from `gnps_file_mappings_file`.\n\n Args:\n podp_project_json_file: The path to the PODP project\n JSON file.\n genome_status_json_file: The path to the genome status\n JSON file.\n genome_bgc_mappings_file: The path to the genome BGC\n mappings JSON file.\n gnps_file_mappings_file: The path to the GNPS file\n mappings file (csv or tsv).\n output_json_file: The path to the output JSON file.\n\n Returns:\n The strain mappings stored in a StrainCollection object.\n\n See Also:\n - `extract_mappings_strain_id_original_genome_id`: Extract mappings\n \"strain_id <-> original_genome_id\".\n - `extract_mappings_original_genome_id_resolved_genome_id`: Extract mappings\n \"original_genome_id <-> resolved_genome_id\".\n - `extract_mappings_resolved_genome_id_bgc_id`: Extract mappings\n \"resolved_genome_id <-> bgc_id\".\n - `get_mappings_strain_id_bgc_id`: Get mappings \"strain_id <-> bgc_id\".\n - `extract_mappings_strain_id_ms_filename`: Extract mappings\n \"strain_id <-> MS_filename\".\n - `extract_mappings_ms_filename_spectrum_id`: Extract mappings\n \"MS_filename <-> spectrum_id\".\n - `get_mappings_strain_id_spectrum_id`: Get mappings \"strain_id <-> spectrum_id\".\n \"\"\"\n # Get mappings strain_id <-> original_genome_id <-> resolved_genome_id <-> bgc_id\n mappings_strain_id_bgc_id = get_mappings_strain_id_bgc_id(\n extract_mappings_strain_id_original_genome_id(podp_project_json_file),\n extract_mappings_original_genome_id_resolved_genome_id(genome_status_json_file),\n extract_mappings_resolved_genome_id_bgc_id(genome_bgc_mappings_file),\n )\n\n # Get mappings strain_id <-> MS_filename <-> spectrum_id\n mappings_strain_id_spectrum_id = get_mappings_strain_id_spectrum_id(\n extract_mappings_strain_id_ms_filename(podp_project_json_file),\n extract_mappings_ms_filename_spectrum_id(gnps_file_mappings_file),\n )\n\n # Get mappings strain_id <-> bgc_id / spectrum_id\n mappings = mappings_strain_id_bgc_id.copy()\n for strain_id, spectrum_ids in mappings_strain_id_spectrum_id.items():\n if strain_id in mappings:\n mappings[strain_id].update(spectrum_ids)\n else:\n mappings[strain_id] = spectrum_ids.copy()\n\n # Create StrainCollection\n sc = StrainCollection()\n for strain_id, bgc_ids in mappings.items():\n if not sc.has_name(strain_id):\n strain = Strain(strain_id)\n for bgc_id in bgc_ids:\n strain.add_alias(bgc_id)\n sc.add(strain)\n else:\n # strain_list has only one element\n strain_list = sc.lookup(strain_id)\n for bgc_id in bgc_ids:\n strain_list[0].add_alias(bgc_id)\n\n # Write strain mappings JSON file\n sc.to_json(output_json_file)\n 
logger.info(\"Generated strain mappings JSON file: %s\", output_json_file)\n\n return sc\n
"},{"location":"api/utils/","title":"Utilities","text":""},{"location":"api/utils/#nplinker.utils","title":"nplinker.utils","text":""},{"location":"api/utils/#nplinker.utils.calculate_md5","title":"calculate_md5","text":"calculate_md5(\n fpath: str | PathLike, chunk_size: int = 1024 * 1024\n) -> str\n
Calculate the MD5 checksum of a file.
Parameters:
fpath
(str | PathLike
) \u2013 Path to the file.
chunk_size
(int
, default: 1024 * 1024
) \u2013 Chunk size for reading the file. Defaults to 1024*1024.
Returns:
str
\u2013 MD5 checksum of the file.
src/nplinker/utils.py
def calculate_md5(fpath: str | PathLike, chunk_size: int = 1024 * 1024) -> str:\n \"\"\"Calculate the MD5 checksum of a file.\n\n Args:\n fpath: Path to the file.\n chunk_size: Chunk size for reading the file. Defaults to 1024*1024.\n\n Returns:\n MD5 checksum of the file.\n \"\"\"\n if sys.version_info >= (3, 9):\n md5 = hashlib.md5(usedforsecurity=False)\n else:\n md5 = hashlib.md5()\n with open(fpath, \"rb\") as f:\n for chunk in iter(lambda: f.read(chunk_size), b\"\"):\n md5.update(chunk)\n return md5.hexdigest()\n
"},{"location":"api/utils/#nplinker.utils.check_disk_space","title":"check_disk_space","text":"check_disk_space(func)\n
A decorator to check available disk space.
If the available disk space is less than 50GB, a warning is logged and raised.
Warns:
UserWarning
\u2013 If the available disk space is less than 50GB.
src/nplinker/utils.py
def check_disk_space(func):\n    \"\"\"A decorator to check available disk space.\n\n    If the available disk space is less than 50GB, log and raise a warning.\n\n    Warnings:\n        UserWarning: If the available disk space is less than 50GB.\n    \"\"\"\n\n    @functools.wraps(func)\n    def wrapper_check_disk_space(*args, **kwargs):\n        _, _, free = shutil.disk_usage(\"/\")\n        free_gb = free // (2**30)\n        if free_gb < 50:\n            warning_message = f\"Available disk space is {free_gb}GB. Is it enough for your project?\"\n            logger.warning(warning_message)\n            warnings.warn(warning_message, UserWarning)\n        return func(*args, **kwargs)\n\n    return wrapper_check_disk_space\n
"},{"location":"api/utils/#nplinker.utils.check_md5","title":"check_md5","text":"check_md5(fpath: str | PathLike, md5: str) -> bool\n
Verify the MD5 checksum of a file.
Parameters:
fpath
(str | PathLike
) \u2013 Path to the file.
md5
(str
) \u2013 MD5 checksum to verify.
Returns:
bool
\u2013 True if the MD5 checksum matches, False otherwise.
src/nplinker/utils.py
def check_md5(fpath: str | PathLike, md5: str) -> bool:\n \"\"\"Verify the MD5 checksum of a file.\n\n Args:\n fpath: Path to the file.\n md5: MD5 checksum to verify.\n\n Returns:\n True if the MD5 checksum matches, False otherwise.\n \"\"\"\n return md5 == calculate_md5(fpath)\n
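A sketch pairing the two helpers above; the archive name is borrowed from the working directory example and is otherwise hypothetical.
from nplinker.utils import calculate_md5, check_md5\n\nfpath = \"downloads/GCF_000016425.1.zip\"  # hypothetical local file\nmd5 = calculate_md5(fpath)\nassert check_md5(fpath, md5)\n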
"},{"location":"api/utils/#nplinker.utils.download_and_extract_archive","title":"download_and_extract_archive","text":"download_and_extract_archive(\n url: str,\n download_root: str | PathLike,\n extract_root: str | Path | None = None,\n filename: str | None = None,\n md5: str | None = None,\n remove_finished: bool = False,\n) -> None\n
Download an archive file and then extract it.
This method is a wrapper of download_url
and extract_archive
functions.
Parameters:
url
(str
) \u2013 URL to download file from
download_root
(str | PathLike
) \u2013 Path to the directory to place downloaded file in. If it doesn't exist, it will be created.
extract_root
(str | Path | None
, default: None
) \u2013 Path to the directory the file will be extracted to. The given directory will be created if not exist. If omitted, the download_root
is used.
filename
(str | None
, default: None
) \u2013 Name to save the downloaded file under. If None, use the basename of the URL
md5
(str | None
, default: None
) \u2013 MD5 checksum of the download. If None, do not check
remove_finished
(bool
, default: False
) \u2013 If True
, remove the downloaded file after the extraction. Defaults to False.
src/nplinker/utils.py
def download_and_extract_archive(\n url: str,\n download_root: str | PathLike,\n extract_root: str | Path | None = None,\n filename: str | None = None,\n md5: str | None = None,\n remove_finished: bool = False,\n) -> None:\n \"\"\"Download an archive file and then extract it.\n\n This method is a wrapper of [`download_url`][nplinker.utils.download_url] and\n [`extract_archive`][nplinker.utils.extract_archive] functions.\n\n Args:\n url: URL to download file from\n download_root: Path to the directory to place downloaded\n file in. If it doesn't exist, it will be created.\n extract_root: Path to the directory the file\n will be extracted to. The given directory will be created if not exist.\n If omitted, the `download_root` is used.\n filename: Name to save the downloaded file under.\n If None, use the basename of the URL\n md5: MD5 checksum of the download. If None, do not check\n remove_finished: If `True`, remove the downloaded file\n after the extraction. Defaults to False.\n \"\"\"\n download_root = Path(download_root)\n if extract_root is None:\n extract_root = download_root\n else:\n extract_root = Path(extract_root)\n if not filename:\n filename = Path(url).name\n\n download_url(url, download_root, filename, md5)\n\n archive = download_root / filename\n extract_archive(archive, extract_root, remove_finished=remove_finished)\n
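A sketch with a hypothetical URL; the directory names follow the working directory conventions used elsewhere in these docs.
from nplinker.utils import download_and_extract_archive\n\ndownload_and_extract_archive(\n    url=\"https://example.com/GCF_000016425.1.zip\",  # hypothetical URL\n    download_root=\"downloads\",\n    extract_root=\"antismash\",\n    md5=None,  # skip checksum verification\n    remove_finished=False,  # keep the downloaded archive\n)\n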
"},{"location":"api/utils/#nplinker.utils.download_url","title":"download_url","text":"download_url(\n url: str,\n root: str | PathLike,\n filename: str | None = None,\n md5: str | None = None,\n http_method: str = \"GET\",\n allow_http_redirect: bool = True,\n) -> None\n
Download a file from a url and place it in root.
Parameters:
url
(str
) \u2013 URL to download file from
root
(str | PathLike
) \u2013 Directory to place downloaded file in. If it doesn't exist, it will be created.
filename
(str | None
, default: None
) \u2013 Name to save the file under. If None, use the basename of the URL.
md5
(str | None
, default: None
) \u2013 MD5 checksum of the download. If None, do not check.
http_method
(str
, default: 'GET'
) \u2013 HTTP request method, e.g. \"GET\", \"POST\". Defaults to \"GET\".
allow_http_redirect
(bool
, default: True
) \u2013 If true, enable following redirects for all HTTP (\"http:\") methods.
src/nplinker/utils.py
@check_disk_space\ndef download_url(\n url: str,\n root: str | PathLike,\n filename: str | None = None,\n md5: str | None = None,\n http_method: str = \"GET\",\n allow_http_redirect: bool = True,\n) -> None:\n \"\"\"Download a file from a url and place it in root.\n\n Args:\n url: URL to download file from\n root: Directory to place downloaded file in. If it doesn't exist, it will be created.\n filename: Name to save the file under. If None, use the\n basename of the URL.\n md5: MD5 checksum of the download. If None, do not check.\n http_method: HTTP request method, e.g. \"GET\", \"POST\".\n Defaults to \"GET\".\n allow_http_redirect: If true, enable following redirects for all HTTP (\"http:\") methods.\n \"\"\"\n root = transform_to_full_path(root)\n # create the download directory if not exist\n root.mkdir(exist_ok=True)\n if not filename:\n filename = Path(url).name\n fpath = root / filename\n\n # check if file is already present locally\n if fpath.is_file() and md5 is not None and check_md5(fpath, md5):\n logger.info(\"Using downloaded and verified file: \" + str(fpath))\n return\n\n # download the file\n logger.info(f\"Downloading {filename} to {root}\")\n with open(fpath, \"wb\") as fh:\n with httpx.stream(http_method, url, follow_redirects=allow_http_redirect) as response:\n if not response.is_success:\n fpath.unlink(missing_ok=True)\n raise RuntimeError(\n f\"Failed to download url {url} with status code {response.status_code}\"\n )\n total = int(response.headers.get(\"Content-Length\", 0))\n\n with Progress(\n TextColumn(\"[progress.description]{task.description}\"),\n BarColumn(bar_width=None),\n \"[progress.percentage]{task.percentage:>3.1f}%\",\n \"\u2022\",\n DownloadColumn(),\n \"\u2022\",\n TransferSpeedColumn(),\n \"\u2022\",\n TimeRemainingColumn(),\n \"\u2022\",\n TimeElapsedColumn(),\n ) as progress:\n task = progress.add_task(f\"[hot_pink]Downloading {fpath.name}\", total=total)\n for chunk in response.iter_bytes():\n fh.write(chunk)\n progress.update(task, advance=len(chunk))\n\n # check integrity of downloaded file\n if md5 is not None and not check_md5(fpath, md5):\n raise RuntimeError(\"MD5 validation failed.\")\n
"},{"location":"api/utils/#nplinker.utils.extract_archive","title":"extract_archive","text":"extract_archive(\n from_path: str | PathLike,\n extract_root: str | PathLike | None = None,\n members: list | None = None,\n remove_finished: bool = False,\n) -> str\n
Extract an archive.
The archive type and a possible compression is automatically detected from the file name.
If the file is compressed but not an archive, the call is dispatched to _decompress
function.
Parameters:
from_path
(str | PathLike
) \u2013 Path to the file to be extracted.
extract_root
(str | PathLike | None
, default: None
) \u2013 Path to the directory the file will be extracted to. The given directory will be created if not exist. If omitted, the directory of the archive file is used.
members
(list | None
, default: None
) \u2013 Optional selection of members to extract. If not specified, all members are extracted. Members must be a subset of the list returned by zipfile.ZipFile.namelist()
(or a list of strings) for a zip file, or by tarfile.TarFile.getmembers()
for a tar file.
remove_finished
(bool
, default: False
) \u2013 If True
, remove the file after the extraction.
Returns:
str
\u2013 Path to the directory the file was extracted to.
src/nplinker/utils.py
def extract_archive(\n from_path: str | PathLike,\n extract_root: str | PathLike | None = None,\n members: list | None = None,\n remove_finished: bool = False,\n) -> str:\n \"\"\"Extract an archive.\n\n The archive type and a possible compression is automatically detected from\n the file name.\n\n If the file is compressed but not an archive, the call is dispatched to `_decompress` function.\n\n Args:\n from_path: Path to the file to be extracted.\n extract_root: Path to the directory the file will be extracted to.\n The given directory will be created if not exist.\n If omitted, the directory of the archive file is used.\n members: Optional selection of members to extract. If not specified,\n all members are extracted.\n Members must be a subset of the list returned by\n - `zipfile.ZipFile.namelist()` or a list of strings for zip file\n - `tarfile.TarFile.getmembers()` for tar file\n remove_finished: If `True`, remove the file after the extraction.\n\n Returns:\n Path to the directory the file was extracted to.\n \"\"\"\n from_path = Path(from_path)\n\n if extract_root is None:\n extract_root = from_path.parent\n else:\n extract_root = Path(extract_root)\n\n # create the extract directory if not exist\n extract_root.mkdir(exist_ok=True)\n\n logger.info(f\"Extracting {from_path} to {extract_root}\")\n suffix, archive_type, compression = _detect_file_type(from_path)\n if not archive_type:\n return _decompress(\n from_path,\n extract_root / from_path.name.replace(suffix, \"\"),\n remove_finished=remove_finished,\n )\n\n extractor = _ARCHIVE_EXTRACTORS[archive_type]\n\n extractor(str(from_path), str(extract_root), members, compression)\n if remove_finished:\n from_path.unlink()\n\n return str(extract_root)\n
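A sketch extracting a previously downloaded archive; the paths follow the working directory conventions and are otherwise hypothetical.
from nplinker.utils import extract_archive\n\nextracted_dir = extract_archive(\n    \"downloads/GCF_000016425.1.zip\",\n    extract_root=\"antismash/GCF_000016425.1\",\n    remove_finished=False,\n)\nprint(extracted_dir)  # the directory the archive was extracted to\n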
"},{"location":"api/utils/#nplinker.utils.is_file_format","title":"is_file_format","text":"is_file_format(\n file: str | PathLike, format: str = \"tsv\"\n) -> bool\n
Check if the file is in the given format.
Parameters:
file
(str | PathLike
) \u2013 Path to the file to check.
format
(str
, default: 'tsv'
) \u2013 The format to check for, either \"tsv\" or \"csv\".
Returns:
bool
\u2013 True if the file is in the given format, False otherwise.
src/nplinker/utils.py
def is_file_format(file: str | PathLike, format: str = \"tsv\") -> bool:\n \"\"\"Check if the file is in the given format.\n\n Args:\n file: Path to the file to check.\n format: The format to check for, either \"tsv\" or \"csv\".\n\n Returns:\n True if the file is in the given format, False otherwise.\n \"\"\"\n try:\n with open(file, \"rt\") as f:\n if format == \"tsv\":\n reader = csv.reader(f, delimiter=\"\\t\")\n elif format == \"csv\":\n reader = csv.reader(f, delimiter=\",\")\n else:\n raise ValueError(f\"Unknown format '{format}'.\")\n for _ in reader:\n pass\n return True\n except csv.Error:\n return False\n
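A sketch distinguishing the CSV file_mappings file produced by the FEATURE-BASED-MOLECULAR-NETWORKING workflow from the TSV one used by the other workflows; the path is hypothetical.
from nplinker.utils import is_file_format\n\nif is_file_format(\"gnps/file_mappings.csv\", format=\"csv\"):\n    print(\"comma-separated file mappings\")\n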
"},{"location":"api/utils/#nplinker.utils.list_dirs","title":"list_dirs","text":"list_dirs(\n root: str | PathLike, keep_parent: bool = True\n) -> list[str]\n
List all directories at a given root.
Parameters:
root
(str | PathLike
) \u2013 Path to directory whose folders need to be listed
keep_parent
(bool
, default: True
) \u2013 If true, prepends the path to each result, otherwise only returns the name of the directories found
src/nplinker/utils.py
def list_dirs(root: str | PathLike, keep_parent: bool = True) -> list[str]:\n \"\"\"List all directories at a given root.\n\n Args:\n root: Path to directory whose folders need to be listed\n keep_parent: If true, prepends the path to each result, otherwise\n only returns the name of the directories found\n \"\"\"\n root = transform_to_full_path(root)\n directories = [str(p) for p in root.iterdir() if p.is_dir()]\n if not keep_parent:\n directories = [os.path.basename(d) for d in directories]\n return directories\n
"},{"location":"api/utils/#nplinker.utils.list_files","title":"list_files","text":"list_files(\n root: str | PathLike,\n prefix: str | tuple[str, ...] = \"\",\n suffix: str | tuple[str, ...] = \"\",\n keep_parent: bool = True,\n) -> list[str]\n
List all files at a given root.
Parameters:
root
(str | PathLike
) \u2013 Path to directory whose files need to be listed
prefix
(str | tuple[str, ...]
, default: ''
) \u2013 Prefix of the file names to match. Defaults to the empty string \"\".
suffix
(str | tuple[str, ...]
, default: ''
) \u2013 Suffix of the files to match, e.g. \".png\" or (\".jpg\", \".png\"). Defaults to the empty string \"\".
keep_parent
(bool
, default: True
) \u2013 If true, prepends the parent path to each result, otherwise only returns the name of the files found. Defaults to True.
src/nplinker/utils.py
def list_files(\n    root: str | PathLike,\n    prefix: str | tuple[str, ...] = \"\",\n    suffix: str | tuple[str, ...] = \"\",\n    keep_parent: bool = True,\n) -> list[str]:\n    \"\"\"List all files at a given root.\n\n    Args:\n        root: Path to directory whose files need to be listed\n        prefix: Prefix of the file names to match,\n            Defaults to empty string '\"\"'.\n        suffix: Suffix of the files to match, e.g. \".png\" or\n            (\".jpg\", \".png\").\n            Defaults to empty string '\"\"'.\n        keep_parent: If true, prepends the parent path to each\n            result, otherwise only returns the name of the files found.\n            Defaults to True.\n    \"\"\"\n    root = Path(root)\n    files = [\n        str(p)\n        for p in root.iterdir()\n        if p.is_file() and p.name.startswith(prefix) and p.name.endswith(suffix)\n    ]\n\n    if not keep_parent:\n        files = [os.path.basename(f) for f in files]\n\n    return files\n
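A sketch combining list_dirs and list_files to walk AntiSMASH output; the directory layout follows the working directory docs.
from nplinker.utils import list_dirs, list_files\n\nfor genome_dir in list_dirs(\"antismash\", keep_parent=True):\n    gbk_files = list_files(genome_dir, suffix=\".gbk\", keep_parent=True)\n    print(genome_dir, len(gbk_files))\n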
"},{"location":"api/utils/#nplinker.utils.transform_to_full_path","title":"transform_to_full_path","text":"transform_to_full_path(p: str | PathLike) -> Path\n
Transform a path to a full path.
The path is expanded (i.e. the ~
will be replaced with actual path) and converted to an absolute path (i.e. .
or ..
will be replaced with actual path).
Parameters:
p
(str | PathLike
) \u2013 The path to transform.
Returns:
Path
\u2013 The transformed full path.
src/nplinker/utils.py
def transform_to_full_path(p: str | PathLike) -> Path:\n \"\"\"Transform a path to a full path.\n\n The path is expanded (i.e. the `~` will be replaced with actual path) and converted to an\n absolute path (i.e. `.` or `..` will be replaced with actual path).\n\n Args:\n p: The path to transform.\n\n Returns:\n The transformed full path.\n \"\"\"\n # Multiple calls to `Path` are used to ensure static typing compatibility.\n p = Path(p).expanduser()\n p = Path(p).resolve()\n return Path(p)\n
"},{"location":"concepts/bigscape/","title":"BigScape","text":"NPLinker can run BigScape automatically if the bigscape
directory does not exist in the working directory. Both version 1 and version 2 of BigScape are supported.
See the configuration template for how to set parameters for running BigScape.
See the default configurations for the default parameters used in NPLinker.
"},{"location":"concepts/config_file/","title":"Config File","text":""},{"location":"concepts/config_file/#configuration-template","title":"Configuration Template","text":"#############################\n# NPLinker configuration file\n#############################\n\n# The root directory of the NPLinker project. You need to create it first.\n# The value is required and must be a full path.\nroot_dir = \"<NPLinker root directory>\"\n# The mode for preparing dataset.\n# The available modes are \"podp\" and \"local\".\n# \"podp\" mode is for using the PODP platform (https://pairedomicsdata.bioinformatics.nl/) to prepare the dataset.\n# \"local\" mode is for preparing the dataset locally. So uers do not need to upload their data to the PODP platform.\n# The value is required.\nmode = \"podp\"\n# The PODP project identifier.\n# The value is required if the mode is \"podp\".\npodp_id = \"\"\n\n\n[log]\n# Log level. The available levels are same as the levels in python package `logging`:\n# \"DEBUG\", \"INFO\", \"WARNING\", \"ERROR\", \"CRITICAL\".\n# The default value is \"INFO\".\nlevel = \"INFO\"\n# The log file to append log messages.\n# The value is optional.\n# If not set or use empty string, log messages will not be written to a file.\n# The file will be created if it does not exist. Log messages will be appended to the file if it exists.\nfile = \"path/to/logfile\"\n# Whether to write log meesages to console.\n# The default value is true.\nuse_console = true\n\n\n[mibig]\n# Whether to use mibig metadta (json).\n# The default value is true.\nto_use = true\n# The version of mibig metadata.\n# Make sure using the same version of mibig in bigscape.\n# The default value is \"3.1\"\nversion = \"3.1\"\n\n\n[bigscape]\n# The parameters to use for running BiG-SCAPE.\n# Version of BiG-SCAPE to run. Make sure to change the parameters property below as well\n# when changing versions.\nversion = 1\n# Required BiG-SCAPE parameters.\n# --------------\n# For version 1:\n# -------------\n# Required parameters are: `--mix`, `--include_singletons` and `--cutoffs`. NPLinker needs them to run the analysis properly.\n# Do NOT set these parameters: `--inputdir`, `--outputdir`, `--pfam_dir`. NPLinker will automatically configure them.\n# If parameter `--mibig` is set, make sure to set the config `mibig.to_use` to true and `mibig.version` to the version of mibig in BiG-SCAPE.\n# The default value is \"--mibig --clans-off --mix --include_singletons --cutoffs 0.30\".\n# --------------\n# For version 2:\n# --------------\n# Note that BiG-SCAPE v2 has subcommands. NPLinker requires the `cluster` subcommand and its parameters.\n# Required parameters of `cluster` subcommand are: `--mibig_version`, `--include_singletons` and `--gcf_cutoffs`.\n# DO NOT set these parameters: `--pfam_path`, `--inputdir`, `--outputdir`. NPLinker will automatically configure them.\n# BiG-SCPAPE v2 also runs a `--mix` analysis by default, so you don't need to set this parameter here.\n# Example parameters for BiG-SCAPE v2: \"--mibig_version 3.1 --include_singletons --gcf_cutoffs 0.30\"\nparameters = \"--mibig --clans-off --mix --include_singletons --cutoffs 0.30\"\n# Which bigscape cutoff to use for NPLinker analysis.\n# There might be multiple cutoffs in bigscape output.\n# Note that this value must be a string.\n# The default value is \"0.30\".\ncutoff = \"0.30\"\n\n\n[scoring]\n# Scoring methods.\n# Valid values are \"metcalf\" and \"rosetta\".\n# The default value is \"metcalf\".\nmethods = [\"metcalf\"]\n
"},{"location":"concepts/config_file/#default-configurations","title":"Default Configurations","text":"The default configurations are automatically used by NPLinker if you don't set them in your config file.
# NPLinker default configurations\n\n[log]\nlevel = \"INFO\"\nuse_console = true\n\n[mibig]\nto_use = true\nversion = \"3.1\"\n\n[bigscape]\nversion = 1\nparameters = \"--mibig --clans-off --mix --include_singletons --cutoffs 0.30\"\ncutoff = \"0.30\"\n\n[scoring]\nmethods = [\"metcalf\"]\n
"},{"location":"concepts/config_file/#config-loader","title":"Config loader","text":"You can load the configuration file using the load_config function.
from nplinker.config import load_config\nconfig = load_config('path/to/nplinker.toml')\n
When you use NPLinker as an application, you can get access to the configuration object directly:
from nplinker import NPLinker\nnpl = NPLinker('path/to/nplinker.toml')\nprint(npl.config)\n
"},{"location":"concepts/gnps_data/","title":"GNPS data","text":"NPLinker requires GNPS molecular networking data as input. It currently accepts data from the following GNPS workflows:
METABOLOMICS-SNETS (data should be downloaded from the option Download Clustered Spectra as MGF)
METABOLOMICS-SNETS-V2 (Download Clustered Spectra as MGF)
FEATURE-BASED-MOLECULAR-NETWORKING (Download Cytoscape Data)
METABOLOMICS-SNETS workflow
METABOLOMICS-SNETS-V2
FEATURE-BASED-MOLECULAR-NETWORKING
| NPLinker input | GNPS file in the archive of Download Clustered Spectra as MGF |
| --- | --- |
| spectra.mgf | METABOLOMICS-SNETS*.mgf |
| molecular_families.tsv | networkedges_selfloop/*.pairsinfo |
| annotations.tsv | result_specnets_DB/*.tsv |
| file_mappings.tsv | clusterinfosummarygroup_attributes_withIDs_withcomponentID/*.tsv |
For example, the file METABOLOMICS-SNETS*.mgf
from the downloaded zip archive is used as the spectra.mgf
input file of NPLinker.
When manually preparing GNPS data for NPLinker, the METABOLOMICS-SNETS*.mgf
must be renamed to spectra.mgf
and placed in the gnps
sub-directory of the NPLinker working directory.
Download Clustered Spectra as MGF
spectra.mgf METABOLOMICS-SNETS-V2*.mgf molecular_families.tsv networkedges_selfloop/*.selfloop annotations.tsv result_specnets_DB/*.tsv file_mappings.tsv clusterinfosummarygroup_attributes_withIDs_withcomponentID/*.clustersummary NPLinker input GNPS file in the archive of Download Cytoscape Data
spectra.mgf spectra/*.mgf molecular_families.tsv networkedges_selfloop/*.selfloop annotations.tsv DB_result/*.tsv file_mappings.csv quantification_table/*.csv Note that file_mappings.csv
is a CSV file, not a TSV file, unlike in the other workflows.
NPLinker requires a fixed structure of working directory with fixed names for the input and output data.
root_dir # (1)!\n \u2502\n \u251c\u2500\u2500 nplinker.toml # (2)!\n \u251c\u2500\u2500 strain_mappings.json [F] # (3)!\n \u251c\u2500\u2500 strains_selected.json [F][O] # (4)!\n \u2502\n \u251c\u2500\u2500 gnps [F] # (5)!\n \u2502 \u251c\u2500\u2500 spectra.mgf [F]\n \u2502 \u251c\u2500\u2500 molecular_families.tsv [F]\n \u2502 \u251c\u2500\u2500 annotations.tsv [F]\n \u2502 \u2514\u2500\u2500 file_mappings.tsv (.csv) [F] # (6)!\n \u2502\n \u251c\u2500\u2500 antismash [F] # (7)!\n \u2502 \u251c\u2500\u2500 GCF_000514975.1\n \u2502 \u2502 \u251c\u2500\u2500 xxx.region001.gbk\n \u2502 \u2502 \u2514\u2500\u2500 ...\n \u2502 \u251c\u2500\u2500 GCF_000016425.1\n \u2502 \u2502 \u251c\u2500\u2500 xxxx.region001.gbk\n \u2502 \u2502 \u2514\u2500\u2500 ...\n \u2502 \u2514\u2500\u2500 ...\n \u2502\n \u251c\u2500\u2500 bigscape [F][O] # (8)!\n \u2502 \u251c\u2500\u2500 mix_clustering_c0.30.tsv [F] # (9)!\n \u2502 \u2514\u2500\u2500 bigscape_running_output\n \u2502 \u2514\u2500\u2500 ...\n \u2502\n \u251c\u2500\u2500 downloads [F][A] # (10)!\n \u2502 \u251c\u2500\u2500 paired_datarecord_4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4.json # (11)!\n \u2502 \u251c\u2500\u2500 GCF_000016425.1.zip\n \u2502 \u251c\u2500\u2500 GCF_0000514975.1.zip\n \u2502 \u251c\u2500\u2500 c22f44b14a3d450eb836d607cb9521bb.zip\n \u2502 \u251c\u2500\u2500 genome_status.json\n \u2502 \u2514\u2500\u2500 mibig_json_3.1.tar.gz\n \u2502\n \u251c\u2500\u2500 mibig [F][A] # (12)!\n \u2502 \u251c\u2500\u2500 BGC0000001.json\n \u2502 \u251c\u2500\u2500 BGC0000002.json\n \u2502 \u2514\u2500\u2500 ...\n \u2502\n \u251c\u2500\u2500 output [F][A] # (13)!\n \u2502 \u2514\u2500\u2500 ...\n \u2502\n \u2514\u2500\u2500 ... # (14)!\n
root_dir
is the working directory you created, used as the root directory for NPLinker.nplinker.toml
is the configuration file (toml format) provided by the user for running NPLinker. strain_mappings.json
contains the mappings from strain to genomics and metabolomics data. It is generated by NPLinker for podp
mode; for local
mode, users need to create it manually. [F]
means the file name nplinker.toml
is a fixed name (including the extension) and must be named as shown.strains_selected.json
is an optional file containing the list of strains to be used in the analysis. If it is not provided, NPLinker will use all strains detected from the input data. [O]
means the file strains_selected.json
is optional for users to provide.gnps
directory contains the GNPS data. The files in this directory must be named as shown. See XXX for more information about the GNPS data..tsv
or .csv
format.antismash
directory contains a collection of AntiSMASH BGC data. The BGC data (*.region*.gbk
files) must be stored in subdirectories named after the NCBI accession number (e.g. GCF_000514975.1
).bigscape
directory is optional and contains the output of BigScape. If the directory is not provided, NPLinker will run BigScape automatically to generate the data using the AntiSMASH BGC data.mix_clustering_c0.30.tsv
is an example output of BigScape. The file name must follow the pattern mix_clustering_c{cutoff}.tsv
, where {cutoff}
is the cutoff value used in the BigScape run.downloads
directory is automatically created and managed by NPLinker. It stores the downloaded data from the internet. Users can also use it to store their own downloaded data. [A]
means the directory is automatically created and/or managed by NPLinker.
downloads
directory.mibig
directory contains the MIBiG metadata, which is automatically created and downloaded by NPLinker. Users should not interfere with this directory and its content.output
directory is automatically created by NPLinker. It stores the output data of NPLinker.Tip
[F]
means the file or directory name is fixed and must be named as shown. The names are defined in the defaults module.[O]
means the file or directory is optional for users to provide. It does not mean the file or directory is optional for NPLinker to use. If it's not provided by the user, NPLinker may generate it.[A]
means the directory is automatically created and/or managed by NPLinker.The DatasetArranger is implemented according to the following flowcharts.
"},{"location":"diagrams/arranger/#strain-mappings-file","title":"Strain mappings file","text":"flowchart TD\n StrainMappings[`strain_mappings.json`] --> SM{Is the mode PODP?}\n SM --> |No |SM0[Validate the file]\n SM --> |Yes|SM1[Generate the file] --> SM0
"},{"location":"diagrams/arranger/#strain-selection-file","title":"Strain selection file","text":"flowchart TD\n StrainsSelected[`strains_selected.json`] --> S{Does the file exist?}\n S --> |No | S0[Nothing to do]\n S --> |Yes| S1[Validate the file]
"},{"location":"diagrams/arranger/#podp-project-metadata-json-file","title":"PODP project metadata json file","text":"flowchart TD\n podp[PODP project metadata json file] --> A{Is the mode PODP?}\n A --> |No | A0[Nothing to do]\n A --> |Yes| P{Does the file exist?}\n P --> |No | P0[Download the file] --> P1\n P --> |Yes| P1[Validate the file]
"},{"location":"diagrams/arranger/#gnps-antismash-and-bigscape","title":"GNPS, AntiSMASH and BigScape","text":"flowchart TD\n ConfigError[Dynaconf config validation error]\n DataError[Data validation error]\n UseIt[Use the data]\n Download[First remove existing data if relevent, then download or generate data]\n\n A[GNPS, antiSMASH and BigSCape] --> B{Pass Dynaconf config validation?}\n B -->|No | ConfigError\n B -->|Yes| G{Is the mode PODP?}\n\n G -->|No, local mode| G1{Does data dir exist?}\n G1 -->|No | DataError\n G1 -->|Yes| H{Pass data validation?}\n H --> |No | DataError\n H --> |Yes| UseIt \n\n G -->|Yes, podp mode| G2{Does data dir exist?}\n G2 --> |No | Download\n G2 --> |Yes | J{Pass data validation?}\n J -->|No | Download --> |try max 2 times| J\n J -->|Yes| UseIt
"},{"location":"diagrams/arranger/#mibig-data","title":"MIBiG Data","text":"MIBiG data is always downloaded automatically. Users cannot provide their own MIBiG data.
flowchart TD\n Mibig[MIBiG] --> M0{Pass Dynaconf config validation?}\n M0 -->|No | M01[Dynaconf config validation error]\n M0 -->|Yes | MibigDownload[First remove existing data if relevant and then download data]
"},{"location":"diagrams/loader/","title":"Dataset Loading Pipeline","text":"The DatasetLoader is implemented according to the following pipeline.
"}]} \ No newline at end of file