-
Notifications
You must be signed in to change notification settings - Fork 422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added the profile2cami tool, a component of the TaxonKit suite. #6085
Added the profile2cami tool, a component of the TaxonKit suite. #6085
Conversation
<param name="show_rank" type="text" value="superkingdom,phylum,class,order,family,genus,species,strain" label="Show Ranks" help="Specify the ranks to show in the result file." /> | ||
</inputs> | ||
<outputs> | ||
<data name="cami_output" format="tsv" label="Converted CAMI Output" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a proper CAMI format to Galaxy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taxonkit serves as an intermediary step in my project to prepare data for the upcoming CAMI Opal Tool, which requires the CAMI Profiling Bioboxes format as input. Within the scope of my project, implementing the CAMI format isn't necessary, but it could be beneficial for broader applications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is well-defined and could become a standard (https://github.com/CAMI-challenge/contest_information/blob/master/file_formats/CAMI_TP_specification.mkd) probably a good start for me to learn how to add a new format. But as table it works just fine ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we call this file .loc.test
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tested renaming the file; unfortunately, I think Galaxy needs a .loc
file to find the content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to shrink this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally! Reduced the size a bit; I hope that is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome quality! Just a few tiny questions. Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. In generally, I am just a bit confused about the recomputation of the abundance. Basic: 99.999999999999986
; with percentage: 1.000000000000000
; that could just be NumPy rounding issues, but why does recompute not consider the deleted taxon ?
<param name="show_rank" type="text" value="superkingdom,phylum,class,order,family,genus,species,strain" label="Show Ranks" help="Specify the ranks to show in the result file." /> | ||
</inputs> | ||
<outputs> | ||
<data name="cami_output" format="tsv" label="Converted CAMI Output" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is well-defined and could become a standard (https://github.com/CAMI-challenge/contest_information/blob/master/file_formats/CAMI_TP_specification.mkd) probably a good start for me to learn how to add a new format. But as table it works just fine ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think with these small changes it is ready. But when I tested it, I realised, that the results seem not to comply with the CAMI satetment: The percentages given for all taxa from the same rank should sum up to <= 100%
https://github.com/CAMI-challenge/contest_information/blob/master/file_formats/CAMI_TP_specification.mkd
However, the tools seems to sum up percentages when lower ranks are reported, this is not correct. I need to inverstigate, but if the tools is indeed wrong, I think we should not merge it ...
tools/taxonkit/.shed.yml
Outdated
@@ -0,0 +1,18 @@ | |||
name: taxonkit | |||
owner: iuc | |||
description: Tools for metagenomic profiling and analysis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not used for profiling and specific to NCBI TaxonKit - A Practical and Efficient NCBI Taxonomy Toolkit
something like that I guess
@@ -0,0 +1,17 @@ | |||
<macros> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add bio.tool xml: https://bio.tools/taxonkit
--data-dir '${taxonomy.fields.path}' | ||
--abundance-field '${abundance_field}' | ||
--taxid-field '${taxid_field}' | ||
#if $percentage: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See @bernt-matthias comment https://github.com/galaxyproject/tools-iuc/pull/6096/files/0f22502b798ad0481fd372e66c1cb26e31734ee8#r1647385458
Can you please use truevalue and remove the if for all such cases ?
]]> | ||
</command> | ||
<inputs> | ||
<param name="input_file" type="data" format="txt" label="Input Profile File" help="Input the tab-delimited profile file with TaxId and abundance columns." /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<param name="input_file" type="data" format="txt" label="Input Profile File" help="Input the tab-delimited profile file with TaxId and abundance columns." /> | |
<param name="input_file" type="data" format="txt" label="Input Profile File" help="A tab-delimited profile file with TaxId and abundance columns." /> |
</command> | ||
<inputs> | ||
<param name="input_file" type="data" format="txt" label="Input Profile File" help="Input the tab-delimited profile file with TaxId and abundance columns." /> | ||
<param argument="--taxonomy" type="select" label="NCBI taxonomy" help="To have actual human-readable taxon names in the standardised output"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<param argument="--taxonomy" type="select" label="NCBI taxonomy" help="To have actual human-readable taxon names in the standardised output"> | |
<param argument="--taxonomy" type="select" label="NCBI taxonomy" help="This NCBI database is used to map human-readable taxon names to TaxId's."> |
|
||
Profile2CAMI is a tool for converting metagenomic profile tables to CAMI format. | ||
|
||
**Inputs** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All info that is given in the input parameters can be removed from the help
<param name="keep_zero" type="boolean" value="false" label="Keep Zero Abundances" help="Check to keep taxons with abundance of zero." /> | ||
<param name="sample_id" type="text" value="" label="Sample ID" help="Optional sample ID to include in the result file." /> | ||
<param name="taxonomy_id" type="text" value="" label="Taxonomy ID" help="Optional taxonomy ID to include in the result file." /> | ||
<param name="show_rank" type="text" value="superkingdom,phylum,class,order,family,genus,species,strain" label="Show Ranks" help="Specify the ranks to show in the result file." /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use a type="select" and multiple="true" param for this ´, like this one:
<param name="show_rank" type="select" multiple="true" label="Show Ranks">
<option value="superkingdom">Superkingdom</option>
<option value="phylum">Phylum</option>
<option value="class">Class</option>
<option value="order">Order</option>
<option value="family">Family</option>
<option value="genus">Genus</option>
<option value="species">Species</option>
<option value="strain">Strain</option>
</param>
Make all options
When investigating further, I think this tool does currently not work as expected: shenwei356/taxonkit#99 |
Hi guys, a new TaxonKit version is ready for this: bioconda/bioconda-recipes#48917 |
<!-- Locations of taxonomy data downloaded from NCBI --> | ||
<table name="ncbi_taxonomy" comment_char="#"> | ||
<columns>value, name, path</columns> | ||
<file path="${__HERE__}/test-data/ncbi_taxonomy.loc" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<file path="${__HERE__}/test-data/ncbi_taxonomy.loc" /> | |
<file path="${__HERE__}/test-data/ncbi_taxonomy.loc.test" /> |
@@ -0,0 +1,2 @@ | |||
#value name path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this file should be renamed to .loc.test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bgruening thank you for your review. I hope the changes now align with your suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with this changes (only cosmetics) its good for merge from my side
<!-- Locations of taxonomy data downloaded from NCBI --> | ||
<table name="ncbi_taxonomy" comment_char="#"> | ||
<columns>value, name, path</columns> | ||
<file path="${__HERE__}/test-data/ncbi_taxonomy.loc.sample" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<file path="${__HERE__}/test-data/ncbi_taxonomy.loc.sample" /> | |
<file path="tool-data/ncbi_taxonomy.loc" /> |
afaik, the ${__HERE__}
variable is only used in tests and the real data will be stored in the tool-data folder, @bgruening can you confirm that ?
Also the path should point to the .loc file not the sample.
@@ -0,0 +1,2 @@ | |||
#value name path | |||
test-db-tox "Test Database" ${__HERE__}/test-db |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test-db-tox "Test Database" ${__HERE__}/test-db | |
test-db-tox "Test Database" tool-deta/test-db |
Same as the other comment, although the real path does not really matter, since the DM sets it
@@ -0,0 +1,2 @@ | |||
#value name path | |||
test-db-tox "Test Database" ${__HERE__}/test-db |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test-db-tox "Test Database" ${__HERE__}/test-db |
Line should be removed. The .sample
files and their content would be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed or commented ?, in most other tools there is a comment showing how it looks like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commented is also fine .. an example is a good thing to have
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small change required
#if $taxonomy_id: | ||
-t '${taxonomy_id}' | ||
#end if | ||
--show-rank '${ranks}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put this in an if condition, so that if nothing is selected, all ranks are used as is the default by the tool ? also, please adjust the test for it, that's more convenient for the user
Good to merge from my side @bgruening @bernt-matthias ? |
…xyproject#6085) * Added the profile2cami tool, a component of the TaxonKit suite. * Renamed shed.yml -> .shed.yml * Shrinked delnodes.dmp 684kb -> 150kb * Renamed test.loc -> ncbi_taxonomy.loc * Adjusted taxonkit, removed folder profile2cami, added suite in .shed.yml * Updated Taxonkit to version 0.17.0 - worked on issues. * Renamed taxonomy.loc -> taxonomy.loc.test * Adjusted test, adjusted loc.sample * Try to fix linting error * Resolving linting error * Adjusted the output label.
Tool details for profile2cami
Checklist for submission
FOR CONTRIBUTOR:
Thank you for your consideration.