metamlst index
MetaMLST-index builds and manages the internal MetaMLST SQLite database. Specifically:
- creates a new database
- updates a database with additional MLST-loci sequences and MLST Sequence Types
- creates Bowtie2 indexes from a database
usage: metamlst-index.py [-h] [-t TYPINGS] [-s SEQUENCES] [-q DUMP_DB]
                         [-i BUILDINDEX] [-b BUILDBLAST] [-d DB PATH] [--list]
                         [--filter FILTER] [--version]
                         [--bowtie2_threads BOWTIE2_THREADS]
                         [--bowtie2_build BOWTIE2_BUILD]

Builds and manages the MetaMLST SQLite Databases

optional arguments:
  -h, --help            show this help message and exit
  -t TYPINGS, --typings TYPINGS
                        Typings in TAB separated file (Build New Database)
                        (default: None)
  -s SEQUENCES, --sequences SEQUENCES
                        Sequences in FASTA format (comma separated list of
                        files) (default: None)
  -q DUMP_DB, --dump_db DUMP_DB
                        Dump the entire database to file in fasta format)
                        (default: None)
  -i BUILDINDEX, --buildindex BUILDINDEX
                        Build a Bowtie2 Index from the DB (default: None)
  -b BUILDBLAST, --buildblast BUILDBLAST
                        Build a BLAST Index from the DB (default: None)
  -d DB PATH, --database DB PATH
                        MetaMLST Database File (if unset, use the default
                        database. If a file name is given, MetaMLST will
                        create a new DB or update an existing one) (default:
                        [METAMLST_INSTALL_FOLDER]/metamlst_databases/metamlstDB_2018.db)
  --list                Lists all the MLST keys present in the database and
                        exit (default: False)
  --filter FILTER       filters the db for a specific bacterium (default:
                        None)
  --version             Prints version informations (default: False)
  --bowtie2_threads BOWTIE2_THREADS
                        Number of Threads to use with bowtie2-build (default:
                        4)
  --bowtie2_build BOWTIE2_BUILD
                        Full path to the bowtie2-build command to use, deafult
                        assumes that 'bowtie2-build is present in the system
                        path (default: bowtie2-build)
MetaMLST organizes publicly available MLST data in an internal SQLite database. A pre-built version of the database ships with MetaMLST, but you can also build your own from your MLST data.
To create a database use:
metamlst-index.py -s MLST_SEQS.fasta -t MLST_TYPES.txt -d NEW_DATABASE.db
- NEW_DATABASE.db is the path where the database will be created
- -s specifies the path to the sequences file (see below)
- -t specifies the path to the typing file (see below)
Note: You can run metamlst-index.py with both -s and -t in a single call (the sequences are added first, then the types), or run the -s and -t phases separately. If you run them separately, the -s step must be performed before the -t step.
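The two-phase run described above can be sketched in Python. This is a minimal illustration, not part of MetaMLST: the helper name build_index_commands is hypothetical, and it only assembles argument lists from the -s, -t and -d options documented in the help text, with the sequences step placed first.

```python
from typing import List, Sequence


def build_index_commands(
    sequence_files: Sequence[str],
    typing_file: str,
    database: str,
    script: str = "metamlst-index.py",
) -> List[List[str]]:
    """Return the argument lists for a split -s / -t run, in execution order.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    if not sequence_files:
        raise ValueError("at least one FASTA file is required")
    # -s accepts a comma-separated list of FASTA files.
    add_sequences = [script, "-s", ",".join(sequence_files), "-d", database]
    # The typing step goes second, against the same database file.
    add_typings = [script, "-t", typing_file, "-d", database]
    return [add_sequences, add_typings]


for command in build_index_commands(
    ["MLST_SEQS.fasta"], "MLST_TYPES.txt", "NEW_DATABASE.db"
):
    print(" ".join(command))
```

Each returned list could then be passed to subprocess.run(command, check=True), so that a failure in the sequences step stops the run before the typing step.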
To add types and sequences to an existing database, use:
metamlst-index.py -s MLST_SEQS.fasta -t MLST_TYPES.txt -d MY_DATABASE.db
Note that:
- If you provide a sequence file for a species already in the database, only the new sequences are added (existing sequences are left unchanged)
- If you provide a typing file for a species already in the database, the old typing data will be deleted
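The difference between the two update rules can be pictured with a toy in-memory model. This is only an illustration of the behaviour described above, not how MetaMLST stores data: the function names are hypothetical, plain dictionaries stand in for the SQLite tables, and matching sequences by their ID is an assumption.

```python
from typing import Dict


def merge_sequences(existing: Dict[str, str], incoming: Dict[str, str]) -> Dict[str, str]:
    """Add only sequences whose ID is not already present (toy model)."""
    merged = dict(existing)
    for seq_id, sequence in incoming.items():
        # Existing entries are kept as they are; only new IDs are added.
        merged.setdefault(seq_id, sequence)
    return merged


def replace_typings(
    existing: Dict[str, Dict[int, Dict[str, int]]],
    species: str,
    new_profiles: Dict[int, Dict[str, int]],
) -> Dict[str, Dict[int, Dict[str, int]]]:
    """Discard the old profiles of one species and store the new ones (toy model)."""
    updated = dict(existing)
    updated[species] = dict(new_profiles)
    return updated
```

In this model, re-submitting a sequence file never alters stored sequences, while re-submitting a typing file for a species drops any ST that is absent from the new file.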
The sequences file (-s) contains the MLST sequences in FASTA format. If you have multiple sequence files (e.g. one per species) you can either:
- provide all the sequences in a single file; or
- provide a comma-separated list of FASTA files; or
- run metamlst-index.py on the same database file repeatedly, once for each FASTA file.
The file must be formatted in the following way:
- Sequence IDs: species_locus_alleleID
- The character "_" is allowed only to separate species, locus and alleleID
- All sequences should be identical in length, with no gaps (best practice)
You can find an example file here
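Before building a database, the formatting rules above can be checked with a short script. The following is a rough sketch, not a MetaMLST utility: check_mlst_fasta is a hypothetical name, "-" is assumed to be the gap character, and sequence lengths are compared within each species_locus group, which is one interpretation of the length recommendation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def check_mlst_fasta(lines: Iterable[str]) -> List[str]:
    """Return a list of formatting problems found in FASTA lines (empty if none)."""
    records: List[List[str]] = []  # each item: [sequence ID, sequence]
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            records.append([fields[0] if fields else "", ""])
        elif line and records:
            records[-1][1] += line

    problems: List[str] = []
    lengths: Dict[str, Set[int]] = defaultdict(set)  # "species_locus" -> lengths seen
    for seq_id, sequence in records:
        parts = seq_id.split("_")
        # "_" may only separate species, locus and alleleID: exactly three parts.
        if len(parts) != 3 or not all(parts):
            problems.append(f"{seq_id}: expected species_locus_alleleID")
            continue
        if "-" in sequence:
            problems.append(f"{seq_id}: sequence contains gap characters")
        lengths[f"{parts[0]}_{parts[1]}"].add(len(sequence))

    # Best practice only: report loci whose alleles differ in length.
    for locus, seen in sorted(lengths.items()):
        if len(seen) > 1:
            problems.append(f"{locus}: sequences differ in length {sorted(seen)}")
    return problems
```

The function accepts any iterable of lines, so an open file handle can be passed directly, for example check_mlst_fasta(open("MLST_SEQS.fasta")).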
The typing file (-t) contains the MLST profiles in tab-separated format. If you have multiple typing files (e.g. one per species) you can either:
- provide a comma-separated list of typing files; or
- run metamlst-index.py on the same database file repeatedly, once for each typing file.
Typing files from the publicly available repository (PubMLST) can be used, provided that you add on the first line:
#species|Species Extended Name
where species is the MLST key for the species. It must be the same key used in the sequence IDs of the FASTA file (see above): species_locus_alleleID
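Adding that first line to a downloaded PubMLST profile table is a one-step text operation. The sketch below is illustrative only: add_metamlst_header is a hypothetical helper, and rejecting "_" in the key follows from the sequence ID rule that "_" may only separate species, locus and alleleID.

```python
def add_metamlst_header(profile_text: str, species_key: str, extended_name: str) -> str:
    """Prepend the '#species|Species Extended Name' line to a profile table."""
    if not species_key or "_" in species_key or "|" in species_key:
        raise ValueError("species key must be non-empty and contain no '_' or '|'")
    return f"#{species_key}|{extended_name}\n{profile_text}"


# Example with a made-up two-locus table (tab-separated).
table = "ST\tadk\tfumC\n1\t10\t11\n"
print(add_metamlst_header(table, "ecoli", "Escherichia coli"))
```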
Generally, the typing file must be formatted in the following way:
- The first line must contain a "#", followed by the MLST key, a "|" and the species extended name (as shown above)
- The second line must be a table header with 'ST' and the names of each MLST locus
- The remaining lines contain the profiles as numeric IDs
- Columns should not contain any other information (e.g. clonal complexes, ST-complex, etc.). The script does, however, ignore the following columns: clonal_complex, species, mlst_clade.
You can find an example file here
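To make the layout concrete, here is a small parser for a typing file laid out as described above. It is a sketch under stated assumptions rather than the parser MetaMLST uses: parse_typing_file is a hypothetical name, blank lines are skipped, and every ST and allele value is assumed to be an integer.

```python
from typing import Dict, List, Tuple

# Columns the documentation says the script ignores.
IGNORED_COLUMNS = {"clonal_complex", "species", "mlst_clade"}


def parse_typing_file(text: str) -> Tuple[str, str, List[str], Dict[int, Dict[str, int]]]:
    """Return (species key, extended name, loci, {ST: {locus: allele ID}})."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 2 or not lines[0].startswith("#") or "|" not in lines[0]:
        raise ValueError("first line must look like '#species|Species Extended Name'")
    key, name = lines[0][1:].split("|", 1)

    header = lines[1].split("\t")
    if "ST" not in header:
        raise ValueError("second line must be a header containing 'ST'")
    loci = [col for col in header if col != "ST" and col not in IGNORED_COLUMNS]

    profiles: Dict[int, Dict[str, int]] = {}
    for row in lines[2:]:
        record = dict(zip(header, row.split("\t")))
        missing = [col for col in ["ST", *loci] if not record.get(col)]
        if missing:
            raise ValueError(f"row {row!r} lacks values for {missing}")
        # int() raises ValueError on non-numeric IDs, which flags malformed rows.
        profiles[int(record["ST"])] = {locus: int(record[locus]) for locus in loci}
    return key.strip(), name.strip(), loci, profiles
```

Running it on a file before passing the file to -t gives an early check that the header lines are in place and that every profile row is complete.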
MetaMLST is a project of the Computational Metagenomics Lab at CIBIO, University of Trento, Italy.
M. Zolfo, A. Tett, O. Jousson, C. Donati and N. Segata - MetaMLST: multi-locus strain-level bacterial typing from metagenomic samples - Nucleic Acids Research, 2016 DOI: 10.1093/nar/gkw837