Assembly Tips

Choosing k

MEGAHIT uses a multiple k-mer strategy. The minimum k, the maximum k, and the step between iterations are set with the options --k-min, --k-max, and --k-step, respectively. Every k must be odd, and the step must be even.

For ultra-complex metagenomics data such as soil, a larger --k-min, say 27, is recommended to reduce the complexity of the de Bruijn graph. Quality trimming is also recommended.
For high-depth generic data, a large --k-min (25 to 31) is recommended.
A smaller --k-step, say 10, is friendlier to low-coverage datasets.
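
The odd/even rule above can be checked before launching a run. The following is a minimal Python sketch, not part of MEGAHIT: the helper names k_list and k_args are hypothetical, and it assumes k simply advances from the minimum by the step without exceeding the maximum. Only the parity rules stated above are checked; any other limits MEGAHIT places on these options are not.

```python
def k_list(k_min, k_max, k_step):
    """Return the k values visited from k_min up to k_max (hypothetical helper)."""
    if k_min % 2 == 0 or k_max % 2 == 0:
        raise ValueError("k_min and k_max must be odd")
    if k_step <= 0 or k_step % 2 != 0:
        raise ValueError("k_step must be a positive even number")
    if k_min > k_max:
        raise ValueError("k_min must not exceed k_max")
    # odd + even is always odd, so every k in the list stays odd
    return list(range(k_min, k_max + 1, k_step))


def k_args(k_min, k_max, k_step):
    """Build the command-line arguments for the three k options."""
    k_list(k_min, k_max, k_step)  # validate first; raises on a bad combination
    return ["--k-min", str(k_min), "--k-max", str(k_max), "--k-step", str(k_step)]


# Example: the soil-style setting from the tips above (k_max chosen arbitrarily)
print(k_list(27, 47, 10))
print(k_args(27, 47, 10))
```

The returned list can be appended to a megahit command line, for example one passed to subprocess.run together with the input and output options.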

Filtering (k_min+1)-mer

(k_min+1)-mers with multiplicity lower than d (default 2, set with the --min-count option) are discarded. Be cautious about setting d below 2, which leads to a much larger and noisier graph. We recommend the default value of 2 for metagenomics assembly. If you use MEGAHIT for generic assemblies, change this value according to the sequencing depth (we recommend --min-count 3 for >40x).
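
To illustrate what the multiplicity threshold does, here is a toy Python sketch. It is not MEGAHIT's implementation: it counts plain substrings, ignores reverse complements, and the function name solid_kmers and the sample reads are made up for the example.

```python
from collections import Counter


def solid_kmers(reads, k, min_count=2):
    """Count all k-mers in the reads and keep those seen at least min_count times."""
    counts = Counter(
        read[i:i + k]
        for read in reads
        for i in range(len(read) - k + 1)
    )
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Three toy reads; the third differs by one base, as a sequencing error might.
reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
k_min = 3

kept_default = solid_kmers(reads, k_min + 1, min_count=2)
kept_all = solid_kmers(reads, k_min + 1, min_count=1)
print(sorted(kept_default))  # k-mers supported by at least two occurrences
print(len(kept_all) - len(kept_default), "k-mers dropped by the filter")
```

With the threshold at 2, the two 4-mers that appear only in the third read are dropped. With the threshold at 1 they stay, which is how a threshold below 2 enlarges the graph.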

Mercy k-mer

This feature is designed specifically for metagenomics assembly, to recover low-coverage sequences. For generic datasets at >= 30x, MEGAHIT may generate better results with the --no-mercy option.
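
For generic (non-metagenomic) datasets, the depth-based recommendations on this page can be gathered into a small helper. This is only a sketch: generic_flags is a hypothetical name, the depth estimate has to come from the user, and the thresholds are the rules of thumb quoted on this page rather than guarantees of a better assembly.

```python
def generic_flags(depth):
    """Suggest extra MEGAHIT options for a generic dataset of the given depth (x)."""
    flags = []
    if depth > 40:
        # Tip from this page: --min-count 3 for >40x
        flags += ["--min-count", "3"]
    if depth >= 30:
        # Tip from this page: --no-mercy may help for generic datasets >= 30x
        flags.append("--no-mercy")
    return flags


for depth in (20, 35, 50):
    print(depth, generic_flags(depth))
```

Below 30x the helper returns no extra options, leaving MEGAHIT's defaults in place.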

k-min 1pass mode

This mode is activated with the --kmin-1pass option. It is more memory-efficient for ultra-low-depth datasets, such as soil metagenomics data.
