Added Cami Opal - A tool for evaluating taxonomic metagenome profilers #6096
Open: Albert-Ber wants to merge 32 commits into galaxyproject:main from Albert-Ber:feature-cami_opal
Commits (32):
33a53eb Added the profile2cami tool, a component of the TaxonKit suite. (Albert-Ber)
461a8a7 Renamed shed.yml -> .shed.yml (Albert-Ber)
3db4745 Shrinked delnodes.dmp 684kb -> 150kb (Albert-Ber)
d15df70 Renamed test.loc -> ncbi_taxonomy.loc (Albert-Ber)
a09b364 Added the cami OPAl tool. (Albert-Ber)
fbd0a54 Removed large files from test-data (Albert-Ber)
5a195bb Cleaned up test-data, modified test in cami_opal.xml. (Albert-Ber)
d462694 renamed shed.yml -> .shed.yml (Albert-Ber)
0f22502 fixed linting errors (Albert-Ber)
15dea3a Merge branch 'galaxyproject:main' into feature-cami_opal (Albert-Ber)
047becc Adjusted, issues from PR for opal (Albert-Ber)
a7d5b4b Removed taxonkit, from the cami-opal PR (Albert-Ber)
060f3bf Exchanged .shed.yml with the right one (Albert-Ber)
fde7a42 Fixed the tests, worked on issues, cleaned up the code (Albert-Ber)
f01b119 Some cleaning up... (Albert-Ber)
be596e9 Merge branch 'galaxyproject:main' into feature-cami_opal (Albert-Ber)
75cb130 Worked on issues regarding opal.xml (Albert-Ber)
d984487 collection output (paulzierep)
f40b015 Merge pull request #2 from paulzierep/feature-cami_opal (Albert-Ber)
76a72b1 Removed gzip option, renamed campi-opal to cami_opal to match cami_amber (Albert-Ber)
ae15a9a Worked on implementing collection output, integrated new tests, label… (Albert-Ber)
d21db03 Removed unnecessary files from cami_opal (Albert-Ber)
edd2a57 Adjusted shed.yml (Albert-Ber)
21ef9db Adjusted the help label, mentioned profile2cami to make it easier for… (Albert-Ber)
080c7a3 Worked on Opal issues (Albert-Ber)
e708085 Created a test to check normalization, adjusted the filter option (Albert-Ber)
4361fe2 Reset Version to 1.0.12 (Albert-Ber)
db4cc5c Merge branch 'galaxyproject:main' into feature-cami_opal (Albert-Ber)
c7a4d49 Added biotools, removed extra info from help section (Albert-Ber)
7e05dd0 Added right biotools ref (Albert-Ber)
7146070 Changed Discription (Albert-Ber)
8cffbbb Fixed issues mentioned by bgruening (Albert-Ber)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
name: cami_opal | ||
owner: iuc | ||
description: Evaluation package for metagenome taxonomic assignments | ||
homepage_url: https://github.com/CAMI-challenge/OPAL | ||
long_description: | | ||
OPAL is an evaluation package designed for assessing metagenome taxonomic assignments. | ||
It provides performance metrics, results rankings, and comparative visualizations | ||
for evaluating multiple programs or parameter effects on metagenome taxonomic assignments. | ||
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/master/tools/opal/ | ||
type: unrestricted | ||
categories: | ||
- Metagenomics |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,293 @@ | ||
<tool id="cami_opal" name="CAMI OPAL" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@"> | ||
<description>Evaluation tool for multiple read-based metagenomic taxonomic profilers</description> | ||
<macros> | ||
<import>macros.xml</import> | ||
</macros> | ||
<expand macro="biotools"/> | ||
<expand macro="requirements" /> | ||
<command detect_errors="exit_code"> | ||
<![CDATA[ | ||
## Set environment variable to ignore specific Python warnings | ||
export PYTHONWARNINGS="ignore::FutureWarning" && | ||
|
||
#import re | ||
|
||
## Define the path for the input files and create directories | ||
mkdir -p inputs && | ||
#set $labels = [] | ||
|
||
## Create symbolic links for input files in the 'inputs' directory | ||
#for $i, $file in enumerate($input_files): | ||
#set safe_identifier = re.sub('[^\w\-\.]', '_', $file.element_identifier) | ||
ln -s '$file' 'inputs/${safe_identifier}' && | ||
#silent $labels.append($file.element_identifier) | |
#end for | ||
|
||
opal.py | ||
-g '${gold_standard_file}' | ||
|
||
#for $i, $file in enumerate($input_files): | ||
#set safe_identifier = re.sub('[^\w\-\.]', '_', $file.element_identifier) | ||
'inputs/${safe_identifier}' | ||
#end for | ||
|
||
-l '${','.join($labels)}' | ||
|
||
$normalize | ||
|
||
#if $filter: | ||
-f '${filter}' | ||
#end if | ||
|
||
$plot_abundances | ||
|
||
#if $desc: | ||
-d '${desc}' | ||
#end if | ||
#if $ranks: | ||
-r '${ranks}' | ||
#end if | ||
#if $metrics_plot_rel: | ||
--metrics_plot_rel '${metrics_plot_rel}' | ||
#end if | ||
#if $metrics_plot_abs: | ||
--metrics_plot_abs '${metrics_plot_abs}' | ||
#end if | ||
#if $branch_length_function: | ||
-b '${branch_length_function}' | ||
#end if | ||
|
||
$normalized_unifrac | ||
|
||
-o output | ||
|
||
#if $html_output | ||
## Copy the results to the specified output folder | ||
&& mkdir '$htmlreport.extra_files_path' | ||
&& cp output/results.html '$htmlreport' | |
&& cp -r output/* '$htmlreport.extra_files_path' | ||
#end if | ||
]]> | ||
</command> | ||
<inputs> | ||
<param name="gold_standard_file" type="data" format="txt" label="Gold standard file" | ||
help="Input the gold standard file here. Format: CAMI Profiling Bioboxes." /> | ||
<param name="input_files" type="data" format="txt" multiple="true" label="Input files" | ||
help="Enter multiple input files. Format: CAMI Profiling Bioboxes. If your files are not in this format, you can use the 'profile2cami' tool to convert them to the CAMI Profiling format." /> | ||
<param name="html_output" type="boolean" label="Output in HTML format" | ||
help="Select this option to generate an HTML file that contains the analysis results." | ||
truevalue="--html_output" falsevalue="" checked="true" /> | ||
<param name="output_collections" type="boolean" label="Generate tool and rank output collections" | ||
help="Select this option to generate collections of tool-specific and rank-specific tables." | ||
truevalue="true" falsevalue="false" checked="true" /> | ||
<param argument="-n" name="normalize" type="boolean" optional="true" | ||
label="Normalize samples" | ||
help="Normalize the samples to compare them on the same scale." | ||
truevalue="-n" falsevalue="" /> | ||
<param argument="--filter" type="float" value="0" optional="true" | ||
label="Filter out predictions with the smallest relative abundances summing up to this percentage within a rank" | ||
help="This parameter allows you to filter out the predictions with the smallest relative abundances, such that their cumulative sum is equal to the specified percentage within a taxonomic rank. The value should be between 0 and 100." | ||
min="0" max="100" /> | ||
<param name="plot_abundances" type="boolean" optional="true" | ||
label="Plot abundances in the gold standard" | ||
help="Plot abundances in the gold standard (can take some minutes)" | ||
truevalue="-p" falsevalue="" /> | ||
<param argument="--desc" type="text" value="" | ||
label="HTML description" | ||
help="Enter the HTML page description here" /> | ||
<param argument="--ranks" type="select" multiple="true" label="Taxonomic ranks" | ||
help="Choose the highest and lowest taxonomic ranks to consider in performance rankings."> | ||
<option value="superkingdom">Superkingdom</option> | ||
<option value="phylum">Phylum</option> | ||
<option value="class">Class</option> | ||
<option value="order">Order</option> | ||
<option value="family">Family</option> | ||
<option value="genus">Genus</option> | ||
<option value="species">Species</option> | ||
<option value="strain">Strain</option> | ||
</param> | ||
<param argument="--metrics_plot_rel" type="select" multiple="true" label="Metrics for relative performance plot" | ||
help="Select metrics to include in the spider plot of relative performances."> | ||
<option value="w">Weighted Unifrac</option> | ||
<option value="l">L1 Norm</option> | ||
<option value="c">Completeness</option> | ||
<option value="p">Purity</option> | ||
<option value="f">False Positives</option> | ||
<option value="t">True Positives</option> | ||
</param> | ||
<param argument="--metrics_plot_abs" type="select" multiple="true" optional="true" | ||
label="Metrics for spider plot of absolute performances" | ||
help="Select valid metrics for the spider plot of absolute performances."> | ||
<option value="c">Completeness</option> | ||
<option value="p">Purity</option> | ||
<option value="b">Bray-Curtis</option> | ||
</param> | ||
<param argument="--branch_length_function" type="text" value="" optional="true" | ||
label="UniFrac tree branch length function" | ||
help="Default: 'lambda x: 1/x', where x=tree depth" /> | ||
<param name="normalized_unifrac" type="boolean" optional="true" | ||
label="Compute normalized version of weighted UniFrac" | ||
help="Compute normalized version of weighted UniFrac by dividing by the theoretical max unweighted UniFrac" | ||
truevalue="--normalized_unifrac" falsevalue="" /> | ||
</inputs> | ||
<outputs> | ||
<data format="html" name="htmlreport" label="${tool.name} on ${on_string}: HTML report" > | ||
<filter>html_output</filter> | ||
</data> | ||
<data name="result" format="tabular" from_work_dir="output/results.tsv" label="${tool.name} on ${on_string}: Results" /> | ||
<collection name="rank_output" type="list" label="${tool.name}: Rank tables" > | ||
<filter>output_collections</filter> | ||
<discover_datasets pattern="(?P&lt;designation&gt;.+)\.tsv" directory="output/by_rank" format="tabular"/> | |
</collection> | ||
<collection name="tool_output" type="list" label="${tool.name}: Tool tables" > | ||
<filter>output_collections</filter> | ||
<discover_datasets pattern="(?P&lt;designation&gt;.+)\.tsv" directory="output/by_tool/" format="tabular"/> | |
</collection> | ||
</outputs> | ||
<tests> | ||
<!-- Test basic functionality with two input files and default parameters --> | |
<test expect_num_outputs="1"> | ||
<param name="gold_standard_file" value="gs_test.profile" /> | ||
<param name="input_files" value="motus_test.profile,metaphlan2_test.profile" /> | ||
<param name="html_output" value="false" /> | ||
<param name="output_collections" value="false"/> | ||
<param name="normalize" value="false"/> | ||
<output name="result" ftype="tabular"> | ||
<assert_contents> | ||
<has_text text="Gold standard" /> | ||
<has_text text="metaphlan2_test.profile"/> | ||
<has_text text="motus_test.profile"/> | ||
</assert_contents> | ||
</output> | ||
</test> | ||
|
||
<!-- Test with HTML output enabled --> | ||
<test expect_num_outputs="2"> | ||
<param name="gold_standard_file" value="gs_test.profile" /> | ||
<param name="input_files" value="motus_test.profile,metaphlan2_test.profile" /> | ||
<param name="desc" value="Test description for OPAL"/> | ||
<param name="html_output" value="true"/> | ||
<param name="output_collections" value="false"/> | ||
<output name="htmlreport" ftype="html"> | ||
<assert_contents> | ||
<has_text text="Test description for OPAL" /> | ||
</assert_contents> | ||
</output> | ||
</test> | ||
|
||
<!-- Test with all parameters enabled --> | ||
<test expect_num_outputs="4"> | ||
<param name="gold_standard_file" value="gs_test.profile" /> | ||
<param name="input_files" value="motus_test.profile,metaphlan2_test.profile,metaphyler_test.profile" /> | ||
<param name="normalize" value="true"/> | ||
<param name="filter" value="5"/> | ||
<param name="plot_abundances" value="true"/> | ||
<param name="desc" value="Test description for OPAL"/> | ||
<param name="ranks" value="superkingdom,species"/> | ||
<param name="metrics_plot_rel" value="w,l,c,p,f,t"/> | ||
<param name="metrics_plot_abs" value="c,p,b"/> | ||
<param name="branch_length_function" value="lambda x: 1/x"/> | ||
<param name="normalized_unifrac" value="true"/> | ||
<param name="html_output" value="true"/> | ||
<param name="output_collections" value="true"/> | ||
<output name="htmlreport" ftype="html"> | ||
<assert_contents> | ||
<has_text text="Test description for OPAL" /> | ||
</assert_contents> | ||
</output> | ||
<output name="result" ftype="tabular"> | ||
<assert_contents> | ||
<has_text text="Gold standard" /> | ||
<has_text text="metaphlan2_test.profile"/> | ||
<has_text text="motus_test.profile"/> | ||
<has_text text="metaphyler_test.profile"/> | ||
</assert_contents> | ||
</output> | ||
</test> | ||
|
||
<!-- Test with multiple profiles from test-data and collection output disabled --> | |
<test expect_num_outputs="2"> | ||
<param name="gold_standard_file" value="gs_test.profile" /> | ||
<param name="input_files" value="motus_test.profile,metaphlan2_test.profile,metaphyler_test.profile" /> | ||
<param name="normalize" value="true"/> | ||
<param name="filter" value="5"/> | ||
<param name="plot_abundances" value="true"/> | ||
<param name="desc" value="Test description for OPAL"/> | ||
<param name="ranks" value="superkingdom,species"/> | ||
<param name="metrics_plot_rel" value="w,l,c,p,f,t"/> | ||
<param name="metrics_plot_abs" value="c,p,b"/> | ||
<param name="branch_length_function" value="lambda x: 1/x"/> | ||
<param name="normalized_unifrac" value="true"/> | ||
<param name="html_output" value="true"/> | ||
<param name="output_collections" value="false"/> | ||
<output name="htmlreport" ftype="html"> | ||
<assert_contents> | ||
<has_text text="Test description for OPAL" /> | ||
</assert_contents> | ||
</output> | ||
<output name="result" ftype="tabular"> | ||
<assert_contents> | ||
<has_text text="Gold standard" /> | ||
<has_text text="metaphlan2_test.profile"/> | ||
<has_text text="motus_test.profile"/> | ||
<has_text text="metaphyler_test.profile"/> | ||
</assert_contents> | ||
</output> | ||
</test> | ||
<!-- Test with normalization enabled --> | ||
<test expect_num_outputs="1"> | ||
<param name="gold_standard_file" value="gs_test.profile" /> | ||
<param name="input_files" value="kraken_test.profile" /> | ||
<param name="normalize" value="true"/> | ||
<param name="html_output" value="false"/> | ||
<param name="output_collections" value="false"/> | ||
<output name="result" ftype="tabular" file="normalized_k.tsv" lines_diff="30" /> | ||
</test> | ||
|
||
<!-- Test with normalization disabled --> | ||
<test expect_num_outputs="1"> | ||
<param name="gold_standard_file" value="gs_test.profile" /> | ||
<param name="input_files" value="kraken_test.profile" /> | ||
<param name="normalize" value="false"/> | ||
<param name="html_output" value="false"/> | ||
<param name="output_collections" value="false"/> | ||
<output name="result" ftype="tabular" file="not_normalized_k.tsv" lines_diff="30" /> | ||
</test> | ||
</tests> | ||
<help> | ||
<![CDATA[ | ||
.. class:: infomark | ||
|
||
**What is OPAL** | ||
|
||
OPAL is an evaluation package for the comparative assessment of metagenome benchmark datasets. It calculates multiple metrics per dataset and provides results rankings and visualizations for assessing multiple programs or parameter effects. | ||
|
||
**What it does** | ||
|
||
OPAL performs the following key tasks: | ||
- Evaluates profiles using a gold standard file. | ||
- Generates multiple metrics for each profile. | ||
- Provides comparative visualizations and performance rankings. | ||
|
||
For more information, please visit `OPAL on GitHub <https://github.com/CAMI-challenge/OPAL>`_. | ||
|
||
**Input** | ||
|
||
OPAL requires the following inputs: | ||
|
||
1. **Gold Standard File** | ||
- This file is essential for the evaluation and should be in CAMI Profiling Bioboxes format. | |
|
||
2. **Profile Files** | |
- Multiple profile files are required for evaluation. If your files are not in the required format, you can use the `profile2cami` tool to convert them to the CAMI Profiling format. | ||
|
||
**Outputs** | ||
|
||
OPAL generates the following outputs: | ||
|
||
1. **HTML Report**: An HTML file containing visualizations and a summary of the evaluation. | |
2. **Results File**: A TSV file with detailed evaluation metrics for each profile. | ||
]]> | ||
</help> | ||
<expand macro="citations" /> | ||
</tool> |
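The command block above links each input under a sanitized name and passes the original element identifiers to `-l` as a comma-separated list. A minimal Python sketch of that same substitution, run outside Galaxy, may help when checking how unusual identifiers behave. The helper name `safe_name` and the sample identifiers are made up for illustration; only the regular expression is taken from the wrapper, and nothing here reproduces Cheetah or Galaxy behavior beyond it.

```python
import re

# Same character class as in the <command> block: anything that is not a
# word character, hyphen, or dot is replaced by an underscore.
UNSAFE = re.compile(r"[^\w\-\.]")


def safe_name(identifier):
    """Hypothetical helper mirroring the re.sub call in the tool wrapper."""
    return UNSAFE.sub("_", identifier)


# Made-up element identifiers, not taken from the PR's test data.
identifiers = ["motus test.profile", "motus/test.profile", "kraken,v2.profile"]

safe = [safe_name(i) for i in identifiers]
labels = ",".join(identifiers)

print(safe)
print(labels)

# Two different identifiers can map to the same link name.
collisions = len(safe) != len(set(safe))
# A comma inside an identifier adds an extra field to the -l value.
label_fields = labels.split(",")
```

Under these assumptions, the first two identifiers end up with the same link name, and the comma in the third one yields four label fields for three inputs. Whether that matters in practice depends on which identifiers users actually supply.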
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
<macros> | ||
<xml name="requirements"> | ||
<requirements> | ||
<requirement type="package" version="@TOOL_VERSION@">cami-opal</requirement> | ||
<yield/> | ||
</requirements> | ||
</xml> | ||
<token name="@TOOL_VERSION@">1.0.12</token> | ||
<token name="@VERSION_SUFFIX@">0</token> | ||
<token name="@PROFILE@">21.05</token> | ||
<xml name="biotools"> | ||
<xrefs> | ||
<xref type="bio.tools">Open-community_Profiling_Assessment_tooL</xref> | ||
</xrefs> | ||
</xml> | ||
<xml name="citations"> | ||
<citations> | ||
<citation type="doi">10.1038/s41592-022-01431-4</citation> | ||
<yield/> | ||
</citations> | ||
</xml> | ||
</macros> |
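The tool's `version` and `profile` attributes are assembled from the tokens defined in `macros.xml`. As a rough illustration only, plain string replacement shows the values they resolve to. Galaxy performs this expansion itself when it loads the tool; the `expand_tokens` function below is a hypothetical stand-in, not Galaxy's implementation.

```python
# Token values copied from macros.xml in this PR.
TOKENS = {
    "@TOOL_VERSION@": "1.0.12",
    "@VERSION_SUFFIX@": "0",
    "@PROFILE@": "21.05",
}


def expand_tokens(text, tokens):
    """Hypothetical stand-in for macro token expansion: plain replacement."""
    for name, value in tokens.items():
        text = text.replace(name, value)
    return text


version = expand_tokens("@TOOL_VERSION@+galaxy@VERSION_SUFFIX@", TOKENS)
profile = expand_tokens("@PROFILE@", TOKENS)
print(version)  # 1.0.12+galaxy0
print(profile)  # 21.05
```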
Do all output files need to be in this folder, or only the handful that are part of the HTML?
Most files should remain, as I haven't tested which ones are essential and which can be removed. So, I would recommend that you keep all of them in the folder for now.
How large is such a folder?
I checked the detailed examples from: Here - they are approximately 20 MB.
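One way to approach the question of which files the HTML report actually needs would be to list the relative paths that `results.html` references and sum their sizes. The sketch below uses only the Python standard library. The function names `referenced_files` and `total_size` and the directory layout are assumptions for illustration, and it has not been run against real OPAL output; reports that embed their plots inline or load assets from scripts would not be captured this way.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlparse


class _RefCollector(HTMLParser):
    """Collect src/href attribute values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.refs = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.refs.add(value)


def referenced_files(html_text, base_dir):
    """Return the referenced relative paths that exist under base_dir."""
    collector = _RefCollector()
    collector.feed(html_text)
    found = set()
    for ref in collector.refs:
        parsed = urlparse(ref)
        # Skip absolute URLs, pure anchors, and absolute filesystem paths.
        if parsed.scheme or parsed.netloc:
            continue
        if not parsed.path or parsed.path.startswith("/"):
            continue
        if (Path(base_dir) / parsed.path).is_file():
            found.add(parsed.path)
    return sorted(found)


def total_size(paths, base_dir):
    """Sum the sizes in bytes of the given relative paths."""
    return sum((Path(base_dir) / p).stat().st_size for p in paths)
```

A call such as `referenced_files(Path("output/results.html").read_text(), "output")` would then give a first approximation of the files worth copying into the extra files path, which could be compared against the full directory listing.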