Add name2taxid function from taxonkit #6146
Changes from 35 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,134 @@ | ||
<tool id="name2taxid" name="Name2taxid" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@"> | ||
<description>Convert taxon names to NCBI Taxids</description> | ||
<macros> | ||
<import>macros.xml</import> | ||
</macros> | ||
<expand macro="biotools"/> | ||
<expand macro="requirements"/> | ||
<command detect_errors="exit_code"> | ||
<![CDATA[ | ||
|
||
mkdir -p ../home/.taxonkit && | ||
|
||
#if $data.is_select == 'his': | ||
ln -s '$taxdump' 'taxdump.tar.gz' && | ||
tar -xf 'taxdump.tar.gz' -C '.' && | ||
#else: | ||
ln -s '$ncbi.fields.path/names.dmp' 'names.dmp' && | ||
ln -s '$ncbi.fields.path/merged.dmp' 'merged.dmp' && | ||
ln -s '$ncbi.fields.path/nodes.dmp' 'nodes.dmp' && | ||
ln -s '$ncbi.fields.path/delnodes.dmp' 'delnodes.dmp' && | ||
#end if | ||
|
||
taxonkit name2taxid | ||
--data-dir '.' | ||
--name-field '$name_field' | ||
$sci_name | ||
$show_rank | ||
'$input' | ||
> '$output' | ||
]]> | ||
</command> | ||
<inputs> | ||
<param name="input" type="data" format="tabular" label="Input file" help="Tabular (TSV) file containing the NCBI taxon names. A plain .txt file also works, provided it has exactly one name per row."/> | ||
<param argument="--name-field" type="data_column" data_ref="input" label="Select column with the names" help="Select the column where the name is written"/> | ||
<param argument="--sci-name" type="boolean" falsevalue="" truevalue="--sci-name" checked="false" label="Only search scientific names" help="With this option only scientific names are searched, so a non-scientific name will not yield a taxid. NOTE: non-scientific names still appear in the output, just without a taxid."/> | ||
<param argument="--show-rank" type="boolean" falsevalue="" truevalue="--show-rank" checked="false" label="Show rank" help="Add the taxonomic rank of each matched name to the output. See the help section for an example."/> | ||
<conditional name="data"> | ||
<param name="is_select" type="select" label="Use either a cached NCBI database or provide a downloaded version."> | ||
<option value="dm">Cached database</option> | ||
Review comment: That is really just nitpicking. But since those params are exposed to users in workflows etc ... can we rename the and his to history? |
||
<option value="his">History</option> | ||
</param> | ||
<when value="dm"> | ||
<param name="ncbi" type="select" label="NCBI database" help="Choose NCBI database version"> | ||
<options from_data_table="ncbi_taxonomy"> | ||
<validator message="No NCBI database is available" type="no_options"/> | ||
</options> | ||
</param> | ||
</when> | ||
<when value="his"> | ||
<param name="taxdump" type="data" format="tgz" label="Input the taxdump.tar.gz file" | ||
help="You can find the taxdump.tar.gz at ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz"/> | ||
</when> | ||
</conditional> | ||
</inputs> | ||
<outputs> | ||
<data name="output" format="tabular" label="Names2taxID"/> | ||
</outputs> | ||
<tests> | ||
<test> | ||
Review comment: Can you add a test for the rank option?
Reply: I can't add this, because I would need specific lines from the original database and I don't know how they work together to produce the rank. I did add an example in the help section to show how the output should look if you use a complete database. |
||
<param name="input" value="name2taxid_test1.tsv" ftype="tabular"/> | ||
<param name="name_field" value="1"/> | ||
<param name="sci_name" value="True"/> | ||
<conditional name="data"> | ||
<param name="is_select" value="dm"/> | ||
<param name="ncbi" value="test-db-tox"/> | ||
</conditional> | ||
<output name="output" file="name2taxid_result1.tsv"/> | ||
</test> | ||
<test> | ||
<param name="input" value="name2taxid_test2.tsv" ftype="tabular"/> | ||
<param name="show_rank" value="True"/> | ||
<conditional name="data"> | ||
<param name="is_select" value="dm"/> | ||
<param name="ncbi" value="test-db-tox"/> | ||
</conditional> | ||
<param name="name_field" value="2"/> | ||
<output name="output" file="name2taxid_result2.tsv"/> | ||
</test> | ||
<test> | ||
<param name="input" value="name2taxid_test3.txt" ftype="tabular"/> | ||
<param name="name_field" value="1"/> | ||
<conditional name="data"> | ||
<param name="is_select" value="his"/> | ||
<param name="taxdump" ftype="tgz" value="test.tar.gz"/> | ||
</conditional> | ||
<output name="output" file="name2taxid_result3.tsv"/> | ||
</test> | ||
</tests> | ||
<help> | ||
<![CDATA[ | ||
|
||
This tool converts NCBI taxon names to their corresponding taxids. Provide a tsv or txt file and select the column that contains the names. | ||
|
||
.. class:: infomark | ||
|
||
Example | ||
|
||
:: | ||
|
||
Homo sapiens | ||
Akkermansia muciniphila ATCC BAA-835 | ||
Akkermansia muciniphila | ||
Mouse Intracisternal A-particle | ||
|
||
**sci_name option** | ||
|
||
.. class:: infomark | ||
|
||
For example, "Enterococcus coli" is not a scientific name, so with this option enabled it is excluded from the search and gets no taxid, although it still appears in the output. In contrast, "Drosophila" is a scientific name, so it is searched whether the option is on or off. | ||
|
||
**show_rank option** | ||
|
||
.. class:: infomark | ||
|
||
Here is an example of the output if you use the option: | ||
|
||
:: | ||
|
||
Homo sapiens 9606 species | ||
Akkermansia muciniphila ATCC BAA-835 349741 strain | ||
Akkermansia muciniphila 239935 species | ||
Mouse Intracisternal A-particle 11932 species | ||
|
||
Without this option, the output will be: | ||
|
||
:: | ||
|
||
Homo sapiens 9606 | ||
Akkermansia muciniphila ATCC BAA-835 349741 | ||
Akkermansia muciniphila 239935 | ||
Mouse Intracisternal A-particle 11932 | ||
|
||
]]> | ||
</help> | ||
<expand macro="citations"/> | ||
</tool> |
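Note on the `--sci-name` behaviour described in the help text: the following is a minimal Python sketch of a name lookup over the `names.dmp` layout (`tax_id | name_txt | unique name | name class |`). It is not TaxonKit's implementation. The function names `build_index` and `name2taxid` are hypothetical, and the name classes in the toy data are illustrative assumptions rather than values copied from NCBI.

```python
from collections import defaultdict

# Toy excerpt in the names.dmp layout: tax_id | name_txt | unique name | name class |
# The name classes here are assumptions made for illustration only.
TOY_NAMES_DMP = (
    "562\t|\tEscherichia coli\t|\t\t|\tscientific name\t|\n"
    "562\t|\tEnterococcus coli\t|\t\t|\tsynonym\t|\n"
    "7215\t|\tDrosophila\t|\t\t|\tscientific name\t|\n"
    "32281\t|\tDrosophila\t|\t\t|\tscientific name\t|\n"
)


def build_index(names_dmp_text, sci_name_only=False):
    """Map each name to the sorted list of taxids that carry it."""
    index = defaultdict(set)
    for line in names_dmp_text.splitlines():
        if not line.strip():
            continue
        # Split on the column separator and trim the surrounding tabs.
        fields = [field.strip() for field in line.split("|")]
        tax_id, name, _unique_name, name_class = fields[:4]
        # Mimic the effect of --sci-name: ignore every other name class.
        if sci_name_only and name_class != "scientific name":
            continue
        index[name].add(int(tax_id))
    return {name: sorted(ids) for name, ids in index.items()}


def name2taxid(names, index):
    """Return (name, taxid) rows; unmatched names get an empty taxid."""
    rows = []
    for name in names:
        taxids = index.get(name)
        if not taxids:
            rows.append((name, ""))  # the name stays in the output
        else:
            rows.extend((name, str(taxid)) for taxid in taxids)
    return rows


if __name__ == "__main__":
    queries = ["Drosophila", "Enterococcus coli"]
    for sci_only in (False, True):
        index = build_index(TOY_NAMES_DMP, sci_name_only=sci_only)
        print("sci_name_only =", sci_only)
        for name, taxid in name2taxid(queries, index):
            print(f"{name}\t{taxid}")
```

The sketch reproduces the two effects visible in the test data of this PR: a name shared by several taxa yields one row per taxid, and a name that fails the filter is kept in the output with an empty taxid column.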
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
Homo sapiens 9606 | ||
Akkermansia muciniphila ATCC BAA-835 349741 | ||
Akkermansia muciniphila 239935 | ||
Mouse Intracisternal A-particle 11932 | ||
Enterococcus coli |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
test Homo sapiens 9606 | ||
test Akkermansia muciniphila ATCC BAA-835 349741 | ||
test Akkermansia muciniphila 239935 | ||
test Mouse Intracisternal A-particle 11932 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
Drosophila 7215 | ||
Drosophila 32281 | ||
Drosophila 2081351 | ||
Enterococcus coli 562 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
Homo sapiens | ||
Akkermansia muciniphila ATCC BAA-835 | ||
Akkermansia muciniphila | ||
Mouse Intracisternal A-particle | ||
Enterococcus coli |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
test Homo sapiens | ||
test Akkermansia muciniphila ATCC BAA-835 | ||
test Akkermansia muciniphila | ||
test Mouse Intracisternal A-particle |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
Drosophila | ||
Enterococcus coli |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,2 @@ | ||
#value name path | ||
test-db-tox Test Database ${__HERE__}/test-db |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,4 @@ | ||
|
||
2923441 | | ||
2923440 | | ||
2923439 | | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,2 @@ | ||
#value name path | ||
# test-db-tox "Test Database" tool-data/test-db | ||
test-db-tox "Test Database" ${__HERE__}/test-db |
Review comment: Do we still need delnodes.dmp and names.dmp in the test-db if we only use the test.tar.gz?
Reply: No, I will remove it.
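When deciding which test files can be dropped, it may help to check what a given taxdump archive actually contains. Below is a minimal Python sketch, assuming the archive should provide the four `.dmp` files that the wrapper links in its cached-database branch. Whether `taxonkit name2taxid` reads all four is not confirmed here, and the helper name `missing_dmp_files` is hypothetical.

```python
import tarfile

# The four files the wrapper symlinks from the cached database.
REQUIRED = ("names.dmp", "nodes.dmp", "merged.dmp", "delnodes.dmp")


def missing_dmp_files(archive, required=REQUIRED):
    """Return the required .dmp files that are absent from a taxdump archive.

    `archive` may be a filesystem path or a seekable binary file object.
    """
    if isinstance(archive, (str, bytes)):
        tar = tarfile.open(name=archive, mode="r:*")
    else:
        tar = tarfile.open(fileobj=archive, mode="r:*")
    with tar:
        # Compare on base names so archives with a leading directory also work.
        present = {member.rsplit("/", 1)[-1] for member in tar.getnames()}
    return [name for name in required if name not in present]
```

Calling `missing_dmp_files("test.tar.gz")` on the test archive would list any of the four files that are not packed into it.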