Genome read alignments visualizer
ra-visualizer is a Python script for visualizing genome reads and/or contigs alignments to reference.
It can generate the following pictures (in .svg format), with arbitrary zoom and offset, relative to the start
of the reference:
Image with reads coloring:
Only contig mapping output:
- python3
- bwa
- TeX Live
(sudo apt install texlive
)
Python packages:
- pyx
- pysam
NB. Python package pysam
is working only in Linux OS, but for Windows users there are several ways
how to make everything work. One of them is described in the next section.
Install Linux (for example, Ubuntu 22.04.3 LTS) with WSL for Windows (you can use any instruction on Web
how to do this, for example, see first steps in this article).
After that you can install python3
and all dependencies for it (don't forget to install texlive).
After all, you can run python3 src/visualize.py
in folder /mnt/<PATH_TO_PROJECT_DIR>
.
Suppose you have:
- Reference in
.fasta
format - Some assembly (contigs) in
.fasta
format - And long reads in
.fastq
format...
What should you do to visualize it all together? ☺
Run these steps:
bwa index reference.fasta
bwa mem reference.fasta contigs.fasta > contigs.sam
bwa mem -x ont2d -A1 -B2 -O2 -E1 -t <number_of_CPUs_to_use> reference.fasta reads.fastq > reads.sam
NB. This command is used for Oxford Nanopore long reads. For other sequencing technologies you should use different command to align reads to reference.- (temporary) Generate
.stats
file fromcontigs.sam
file via command:
python3 src/gen_stats_table.py -i contigs.sam -o contigs.stats
- (temporary) Generate
.stats
file fromreads.sam
file via command:
python3 src/gen_stats_table.py -i reads.sam -o reads.stats
- Run Visualizer:
python3 src/visualize.py --ref-size <reference_size> -c contigs.stats -r reads.stats
- See the resulting
output.svg
image and/or change the output options in the previous step
NB. If you don't have assembly or reads, you can run Visualizer without it. Just skip the commands 2,4 or 3,5 (depending on what data is missing), and don't pass the corresponding parameter to Visualizer.
Let's assume you have:
- Reference of E.coli bacterium in file
ecoli.fasta
- Some assembly (contigs) in file
contigs.fasta
- A long nanopore reads in file
nanopore-reads.fastq
All files are placed in example1 folder.
So, to view it all together, run steps from 'Run steps' section. Or you can use ready-made files
contigs.stats
and nanopore-reads.stats
in the example folder to view the resulting image. For that, run the command from the project folder:
python3 src/visualize.py --ref-size 4641652 -c example1/contigs.stats -r example1/nanopore-reads.stats
After that you can view the resulting image output.svg
in the project folder.
You can also change the output options (you can see all available options by running python3 src/visualize.py -h
).
Let's assume you:
- Don't have any references...
- But you have an assembly in file
contigs.fasta
- And long reads in file
nanopore-reads.fastq
What should you do to visualize it together?
Run these steps:
python3 src/concat_contigs.py -i contigs.fasta -o scaffold.fasta
bwa index scaffold.fasta
bwa mem -x ont2d -A1 -B2 -O2 -E1 -t <number_of_CPUs_to_use> scaffold.fasta nanopore-reads.fastq > nanopore-reads.sam
NB. This command is used for Oxford Nanopore long reads. For other sequencing technologies you should use different command to align reads to reference.- (temporary) Generate
.stats
file fromnanopore-reads.sam
file via command:
python3 src/gen_stats_table.py -i reads.sam -o reads.stats
... TBA
- Add support for working directly with .sam files.
If you have any ideas how to improve this project, you can write me directly to svkazakov.me at gmail.com.
Please report any problem through the GitHub issue tracker.
For other questions write me directly to svkazakov.me at gmail.com.
The MIT License (MIT)