Kaiju can be downloaded and compiled from source, or easily installed via the bioconda channel:
conda install -c bioconda kaiju
Kaiju requires an index file created from a reference database of protein sequences.
You can either create such an index locally or download a pre-built index.
For example, to use the pre-built Kaiju index for the NCBI BLAST nr database, download the index file with
wget https://kaiju.binf.ku.dk/database/kaiju_db_nr_2023-05-10.tgz
and unpack the tar archive with:
tar xzf kaiju_db_nr_2023-05-10.tgz
which yields these three files:
kaiju_db_nr.fmi
nodes.dmp
names.dmp
The Kaiju index itself is in the file kaiju_db_nr.fmi, which contains the Burrows-Wheeler transform and the FM-index of the protein sequences, whereas nodes.dmp and names.dmp contain the taxonomic tree and the taxon names from the NCBI taxonomy.
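As an illustration of what the two taxonomy files hold, the following minimal Python sketch reads them and resolves a taxon identifier to its lineage. It assumes the standard NCBI taxonomy dump layout, in which fields are separated by a tab, a pipe and a tab, and each line ends with a tab and a pipe. The function names (parse_dmp_line, load_nodes, load_scientific_names, lineage) are hypothetical helpers and not part of Kaiju, and the sample records are hand-written and trimmed to the first few fields.

```python
import io


def parse_dmp_line(line):
    """Split one taxonomy dump line on its tab-pipe-tab field separator."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the line terminator
    return line.split("\t|\t")


def load_nodes(handle):
    """Map taxon ID -> (parent taxon ID, rank) from nodes.dmp."""
    nodes = {}
    for line in handle:
        fields = parse_dmp_line(line)
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def load_scientific_names(handle):
    """Map taxon ID -> scientific name from names.dmp."""
    names = {}
    for line in handle:
        fields = parse_dmp_line(line)
        if fields[3] == "scientific name":
            names[int(fields[0])] = fields[1]
    return names


def lineage(taxon_id, nodes, names):
    """Return the names from the root down to the given taxon."""
    path = []
    while True:
        path.append(names.get(taxon_id, str(taxon_id)))
        parent = nodes[taxon_id][0]
        if parent == taxon_id:  # the root node is its own parent
            break
        taxon_id = parent
    return path[::-1]


# Hand-written sample records (trimmed); real files have more columns and rows.
sample_nodes = io.StringIO(
    "1\t|\t1\t|\tno rank\t|\n"
    "131567\t|\t1\t|\tno rank\t|\n"
    "2\t|\t131567\t|\tsuperkingdom\t|\n"
)
sample_names = io.StringIO(
    "1\t|\tall\t|\t\t|\tsynonym\t|\n"
    "1\t|\troot\t|\t\t|\tscientific name\t|\n"
    "131567\t|\tcellular organisms\t|\t\t|\tscientific name\t|\n"
    "2\t|\tBacteria\t|\tBacteria <bacteria>\t|\tscientific name\t|\n"
)

nodes = load_nodes(sample_nodes)
names = load_scientific_names(sample_names)
print(lineage(2, nodes, names))
```

For the real files, replace the in-memory samples with open("nodes.dmp") and open("names.dmp"); note that the full NCBI taxonomy is large, so loading it this way takes noticeable time and memory.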
To run Kaiju with the downloaded and unpacked files, use:
kaiju -t nodes.dmp -f kaiju_db_nr.fmi -i sequencing_reads.fastq.gz
For paired-end reads use:
kaiju -t nodes.dmp -f kaiju_db_nr.fmi -i sequencing_reads_R1.fastq.gz -j sequencing_reads_R2.fastq.gz
Note: The reads must be in the same order in both files!
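Because the read order matters, it can be worth checking the two files before a long run. The Python sketch below compares read names record by record and reports the first position at which they differ. It assumes well-formed four-line FASTQ records and that mates share a name once an optional /1 or /2 suffix and any comment after whitespace are removed; the function names are hypothetical and not part of Kaiju.

```python
import gzip
import io
from itertools import zip_longest


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file, chosen by the .gz suffix."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fastq_read_names(handle):
    """Yield the read name of every four-line FASTQ record."""
    for i, line in enumerate(handle):
        if i % 4 == 0:  # header line of a record
            name = line[1:].split()[0]  # drop '@' and any trailing comment
            if name.endswith(("/1", "/2")):
                name = name[:-2]  # drop the mate suffix
            yield name


def first_order_mismatch(handle1, handle2):
    """Return the 1-based index of the first differing record, or None."""
    pairs = zip_longest(fastq_read_names(handle1), fastq_read_names(handle2))
    for n, (name1, name2) in enumerate(pairs, start=1):
        if name1 != name2:  # also triggers when one file is shorter
            return n
    return None


# Made-up records for illustration only.
r1 = "@read1/1\nACGT\n+\nIIII\n@read2/1\nGGCC\n+\nIIII\n"
r2_same_order = "@read1/2\nTTTT\n+\nIIII\n@read2/2\nAAAA\n+\nIIII\n"
r2_swapped = "@read2/2\nAAAA\n+\nIIII\n@read1/2\nTTTT\n+\nIIII\n"

print(first_order_mismatch(io.StringIO(r1), io.StringIO(r2_same_order)))
print(first_order_mismatch(io.StringIO(r1), io.StringIO(r2_swapped)))
```

On real data, pass open_maybe_gzip("sequencing_reads_R1.fastq.gz") and open_maybe_gzip("sequencing_reads_R2.fastq.gz") as the two handles.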
Kaiju can read input files in FASTQ or FASTA format, which may be gzip-compressed.
By default, Kaiju will print the output to the terminal (STDOUT).
The output can also be written to a file using the -o option:
kaiju -t nodes.dmp -f kaiju_db_nr.fmi -i sequencing_reads.fastq.gz -o kaiju.out
Kaiju can use multiple parallel threads, which can be specified with the -z option, e.g. for using 25 parallel threads:
kaiju -z 25 -t nodes.dmp -f kaiju_db_nr.fmi -i sequencing_reads.fastq.gz -o kaiju.out
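When Kaiju is called from a script, the options shown so far can be assembled programmatically. The following Python sketch builds the argument list from the -t, -f, -i, -j, -o and -z options described above; build_kaiju_command is a hypothetical helper, not something shipped with Kaiju, and it only constructs the command without running it.

```python
import shlex


def build_kaiju_command(nodes, index, reads, mate_reads=None, output=None, threads=None):
    """Assemble a kaiju argument list from the options described above."""
    cmd = ["kaiju", "-t", nodes, "-f", index, "-i", reads]
    if mate_reads is not None:
        cmd += ["-j", mate_reads]  # second file for paired-end reads
    if output is not None:
        cmd += ["-o", output]  # write to a file instead of STDOUT
    if threads is not None:
        cmd += ["-z", str(threads)]  # number of parallel threads
    return cmd


cmd = build_kaiju_command(
    "nodes.dmp",
    "kaiju_db_nr.fmi",
    "sequencing_reads_R1.fastq.gz",
    mate_reads="sequencing_reads_R2.fastq.gz",
    output="kaiju.out",
    threads=25,
)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True), assuming the kaiju executable is on the PATH.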
Multiple samples can be processed at once using kaiju-multi.
Kaiju has two run modes and several command-line parameters that influence the classification accuracy; see the original paper and the README for details.
Kaiju will print one line for each read or read pair. The default output format contains three columns separated by tabs:
- either C or U, indicating whether the read is classified or unclassified.
- name of the read
- NCBI taxon identifier of the assigned taxon
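To make the three-column layout concrete, here is a minimal Python sketch that reads such output and tallies classified and unclassified reads together with the number of reads per taxon. The helper names are hypothetical, and the sample lines are made up for illustration; the sample assumes that unclassified reads carry a placeholder taxon identifier of 0, which the code does not rely on.

```python
import io
from collections import Counter


def parse_kaiju_line(line):
    """Return (status, read name, taxon ID) from the first three columns."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], fields[1], fields[2]


def summarize(handle):
    """Count classified and unclassified reads and reads per taxon ID."""
    status_counts = Counter()
    taxon_counts = Counter()
    for line in handle:
        if not line.strip():
            continue  # tolerate blank lines
        status, _read_name, taxon_id = parse_kaiju_line(line)
        status_counts[status] += 1
        if status == "C":  # only classified reads have an assigned taxon
            taxon_counts[int(taxon_id)] += 1
    return status_counts, taxon_counts


# Made-up output lines for illustration only.
sample_output = io.StringIO(
    "C\tread1\t562\n"
    "U\tread2\t0\n"
    "C\tread3\t562\n"
    "C\tread4\t1280\n"
)

status_counts, taxon_counts = summarize(sample_output)
print(status_counts["C"], status_counts["U"])
print(taxon_counts.most_common())
```

Only the first three columns are read, so the same sketch should also work on output that has additional columns appended.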
Using the option -v enables the verbose output, which will print additional columns.
The included program kaiju2table converts Kaiju's output file(s) into a summary table for a given taxonomic rank, and kaiju2krona creates a file for making a Krona visualisation.