Getting GTDB taxID for newly generated MAGs #7

Open
PeterCx opened this issue Jan 20, 2023 · 6 comments

@PeterCx

PeterCx commented Jan 20, 2023

Hi there,

I have MAGs generated from my own samples and annotated using GTDB-Tk. They have been dereplicated, leaving me with a unique set that is not genetically close to any reference genome in GTDB R207. Can I use this tool to get TaxIDs for my MAGs? I want to build a custom database with my MAGs and GTDB, and for this I need TaxIDs.

It's not clear to me whether this is possible with this tool. Your help is greatly appreciated.

Kind regards,

P

@shenwei356
Owner

Sure. This tutorial could be a reference.

Steps:

  1. Export the lineages of all taxa at species rank from GTDB-taxdump in tabular format.

     taxonkit list --data-dir gtdb-taxdump/R207/ --ids 1 --indent "" \
         | taxonkit filter --data-dir gtdb-taxdump/R207/ --equal-to species \
         | taxonkit reformat --data-dir gtdb-taxdump/R207/ --taxid-field 1 \
             --format "{k}\t{p}\t{c}\t{o}\t{f}\t{g}\t{s}" \
             -o gtdb.tsv
    
  2. For the new MAGs, you need to prepare the full lineages in 7-column tabular format. You may create new species names.

     # custom.tsv: one tab-separated lineage per MAG, superkingdom to species
     $ cat custom.tsv 
     A       B       C       D       E       F       G
    
  3. Create a taxdump from the lineages above.

     (cut -f 2- gtdb.tsv; cat custom.tsv) \
         | taxonkit create-taxdump \
             -R "superkingdom,phylum,class,order,family,genus,species" \
             -O taxdump
    
  4. Test the new taxdump.

     $ echo G | taxonkit name2taxid --data-dir taxdump/
     G       1630414510
     
     $ echo 1630414510 | taxonkit lineage --data-dir taxdump/ -r
     1630414510      A;B;C;D;E;F;G   species
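
Step 2 can be partly scripted. The sketch below is a minimal example that assumes the GTDB-Tk summary file (e.g. `gtdbtk.bac120.summary.tsv`) has a header line, the genome ID in column 1 and a `d__...;p__...;...;s__...` classification in column 2, and that names in `gtdb.tsv` carry no `d__`-style prefixes (check with `head gtdb.tsv`). The helper names `gtdbtk_to_custom` and `new_species_clashes`, and the `<genus> sp. <genome ID>` scheme for unnamed species, are hypothetical choices, not part of TaxonKit or GTDB-Tk. The second helper reads the species from column 8 of `gtdb.tsv`, since step 1 appends the 7 ranks after the TaxId (which is why step 3 uses `cut -f 2-`).

```shell
# Hypothetical helpers for step 2; names and naming scheme are illustrative.

# gtdbtk_to_custom SUMMARY_TSV > custom.tsv
# Assumes: header line, column 1 = genome ID,
# column 2 = "d__...;p__...;c__...;o__...;f__...;g__...;s__..." lineage.
gtdbtk_to_custom() {
    awk -F '\t' -v OFS='\t' 'NR > 1 {
        n = split($2, r, ";")
        if (n != 7) {                       # e.g. "Unclassified Bacteria"
            print "skip " $1 ": not a 7-rank lineage" > "/dev/stderr"
            next
        }
        for (i = 1; i <= 7; i++) sub(/^[dpcofgs]__/, "", r[i])   # drop rank prefixes
        for (i = 1; i <= 6; i++) if (r[i] == "") {               # only species may be new
            print "skip " $1 ": unnamed rank above species" > "/dev/stderr"
            next
        }
        if (r[7] == "") r[7] = r[6] " sp. " $1                    # create a species name
        print r[1], r[2], r[3], r[4], r[5], r[6], r[7]
    }' "$1"
}

# new_species_clashes CUSTOM_TSV GTDB_TSV
# Prints species in custom.tsv (column 7) that already exist in gtdb.tsv (column 8).
new_species_clashes() {
    awk -F '\t' 'FNR == NR { gtdb[$8] = 1; next } ($7 in gtdb) { print $7 }' "$2" "$1"
}
```

Usage, under the same assumptions: `gtdbtk_to_custom gtdbtk.bac120.summary.tsv > custom.tsv`, then `new_species_clashes custom.tsv gtdb.tsv`, which should print nothing if every new species name is distinct from the GTDB ones. Skipped genomes are reported on stderr so they can be given lineages by hand.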
    

@PeterCx
Author

PeterCx commented Jan 27, 2023

Thank you very much. This worked, in that I was able to generate TaxIDs for my MAGs.

However, I think something is wrong. I was using a previously generated custom taxdump and assumed it would be updated to include my MAGs. Instead, a different taxdump is generated that is much smaller, although it does contain my MAGs. See the files below. I'm not sure what the problem is.

Old_TaxDump_Names.dmp.txt
New_TaxDump_Names.dmp.txt

Your help is appreciated.

Kind regards,

P
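
One way to see what differs between two taxdumps like these is to compare the TaxId assigned to each scientific name. A minimal sketch, assuming both `names.dmp` files use the NCBI layout (`taxid<TAB>|<TAB>name<TAB>|<TAB>unique name<TAB>|<TAB>class<TAB>|`) that TaxonKit writes; `compare_names` is a hypothetical helper name. It prints each name whose TaxId changed, followed by the old and new TaxId, and reports on stderr how many names occur in only one file.

```shell
# compare_names OLD_names.dmp NEW_names.dmp  (hypothetical helper)
# Fields are separated by "\t|\t" and each line ends with "\t|".
compare_names() {
    awk 'BEGIN { FS = "\t[|]\t?"; OFS = "\t" }
        $4 != "scientific name" { next }          # ignore synonyms etc.
        FNR == NR { old[$2] = $1; next }          # first file: name -> old TaxId
        {
            seen[$2] = 1
            if (!($2 in old)) { only_new++ } else if (old[$2] != $1) { print $2, old[$2], $1; changed++ }
        }
        END {
            for (n in old) if (!(n in seen)) only_old++
            printf "changed=%d only_old=%d only_new=%d\n", changed, only_old, only_new > "/dev/stderr"
        }' "$1" "$2"
}
```

If a scientific name occurs more than once in a file (for example at two ranks), only the last TaxId read is kept, so treat the counts as approximate.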

@shenwei356
Owner

So how was the old one generated?

@PeterCx
Author

PeterCx commented Jan 27, 2023

It was downloaded from here:

http://ftp.tue.mpg.de/ebio/projects/struo2/GTDB_release207/taxdump/

I am using another tool called Struo2 to generate a Kraken2 GTDB database and then update it with my custom MAGs. The creator of this tool says he produced this specific taxdump using your tool.

@shenwei356
Owner

I see. The taxdump files in Struo2 were probably generated by @nick-youngblut with an older version of TaxonKit, which might produce different TaxId values for the same lineage.

I'm not sure whether @nick-youngblut performed other transformations, because since v0.14.0 (Nov 28, 2022) TaxonKit saves TaxIds as int32 instead of uint32, as BLAST and DIAMOND do.

Alternatively, you can try v0.12.0, which is likely the version he used, to regenerate the taxdump files.
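
Given the switch from uint32 to int32 TaxIds in v0.14.0, a rough heuristic for telling the two kinds of taxdump apart is to check whether any TaxId exceeds 2,147,483,647, the int32 maximum. A minimal sketch, assuming `nodes.dmp` has the TaxId in its first tab-separated field; `check_int32` is a hypothetical helper name. A dump whose TaxIds all fit in int32 could still come from an older version, so only a failure is informative.

```shell
# check_int32 NODES_DMP  (hypothetical helper)
# Reports the largest TaxId and whether it fits in a signed 32-bit integer.
# Exits with status 1 if any TaxId exceeds 2147483647.
check_int32() {
    awk -F '\t' '
        $1 + 0 > max { max = $1 + 0 }            # track the largest TaxId
        END {
            printf "max taxid: %.0f\n", max        # %.0f avoids exponent notation
            if (max > 2147483647) { print "exceeds int32 range"; exit 1 }
            print "fits in int32"
        }' "$1"
}
```

For example, `check_int32 taxdump/nodes.dmp` can be run on both the Struo2 taxdump and the one created in step 3.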

@nick-youngblut

> I'm not sure whether @nick-youngblut performed other transforms cause TaxonKit began to save taxIds in int32 instead of uint32 since v0.14.0 (Nov 28, 2022), as BLAST and DIAMOND do since v0.14.0 (Nov 28, 2022).

I did not conduct any transformations. I used the taxdump as-is for GTDB-r207. The taxdump files were downloaded in June 2022.

3 participants