Roko

A deep learning based tool for consensus polishing.

Description

Roko is a consensus polisher that takes a draft assembly (FASTA) and reads aligned to it (BAM) and outputs a set of polished contigs in FASTA format. It uses a deep learning model to produce a high-quality consensus. Features are represented as reads sampled within a window, and labels are mapped to the draft assembly in Medaka-style fashion.
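
The windowed feature representation can be illustrated with a small sketch. This is not Roko's actual feature code: the base encoding, the number of sampled rows, and the helper name sample_window are assumptions made purely for illustration. It only shows the general idea of turning a variable number of reads covering a window into a fixed-size matrix by sampling reads.

```python
import random

# Hypothetical integer encoding: gap, the four bases, and "unknown".
ENCODING = {"-": 0, "A": 1, "C": 2, "G": 3, "T": 4, "N": 5}


def sample_window(reads, n_rows, seed=0):
    """Build an n_rows x window_width matrix by sampling reads with replacement.

    `reads` is a list of equal-length strings, one per read overlapping the
    window, already padded with '-' to the window's columns (an assumption).
    """
    if not reads:
        raise ValueError("window has no covering reads")
    width = len(reads[0])
    if any(len(r) != width for r in reads):
        raise ValueError("all reads must span the same number of columns")
    rng = random.Random(seed)
    # Sampling with replacement gives a fixed row count at any coverage.
    rows = [rng.choice(reads) for _ in range(n_rows)]
    return [[ENCODING.get(base, ENCODING["N"]) for base in row] for row in rows]


window = sample_window(["ACG-T", "ACGTT", "AC--T"], n_rows=4)
for row in window:
    print(row)
```

Sampling with replacement keeps the matrix shape constant regardless of coverage, which is what a fixed-input neural network needs; the real tool may sample differently.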

Dependencies

Check HTSlib dependencies.
gcc 5.0+ and g++
python 3.6 or 3.7 (python3-dev and venv)

Installation

GPU

git clone https://github.com/lbcb-sci/roko.git roko
cd roko
make gpu

CPU

git clone https://github.com/lbcb-sci/roko.git roko
cd roko
make cpu

Usage

To activate the virtual environment:

. $PROJECT_DIR/roko/bin/activate

To generate features for model training or inference:

    python features.py [options ...] <ref> <X> <o>
        <ref>
            Draft sequence in FASTA format
        <X>
            Reads aligned to <ref> in BAM format
        <o>
            Output name (e.g. output.hdf5) 
        
        options:
            --Y
                Truth genome aligned to <ref> in BAM format (training only)
            --t 
                default: 1
                Number of worker processes
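
When feature generation is scripted, the command line above can be assembled programmatically. The following is a minimal sketch, assuming features.py is run from the repository root with a python executable on the PATH; the helper name features_command and the file names are hypothetical. Only the arguments documented above (--Y, --t, and the three positionals) are used.

```python
import shlex


def features_command(ref, reads_bam, out, truth_bam=None, workers=1):
    """Return the features.py invocation as an argv list.

    truth_bam is passed as --Y and is only needed when generating
    training data; leave it as None for inference.
    """
    cmd = ["python", "features.py"]
    if truth_bam is not None:
        cmd += ["--Y", truth_bam]
    cmd += ["--t", str(workers), ref, reads_bam, out]
    return cmd


cmd = features_command("draft.fasta", "reads.bam", "train.hdf5",
                       truth_bam="truth.bam", workers=4)
# Print a shell-safe version; subprocess.run(cmd, check=True) would execute it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the command as a list rather than a single string avoids quoting problems when paths contain spaces.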

To generate the BAM files for feature generation, the mini_align script from pomoxis is recommended.

To train a model:

    python train.py [options ...] <train> <out>
        <train>
            Directory containing generated .hdf5 files used for training (or one .hdf5 file)
        <out>
            Directory for saving trained model
            
        options:
            --val
                Directory containing generated .hdf5 files used for validation (or one .hdf5 file)
            --b
                default: 128
                Batch size used for train and validation
            --memory
                default: False
                If the flag is present, training and validation data are stored in RAM
            --t
                default: 0
                Number of workers for each data loader (the train loader and the validation loader each get --t workers)
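
Both <train> and --val accept either a directory of .hdf5 files or a single .hdf5 file. How train.py resolves this internally is not documented here; the sketch below shows one plausible way to do it, which can also be useful for checking inputs before starting a long training run. The function name resolve_hdf5_inputs is hypothetical.

```python
import tempfile
from pathlib import Path


def resolve_hdf5_inputs(path):
    """Return a sorted list of .hdf5 files for a directory or a single file."""
    p = Path(path)
    if p.is_dir():
        files = sorted(f for f in p.iterdir()
                       if f.is_file() and f.suffix == ".hdf5")
        if not files:
            raise FileNotFoundError("no .hdf5 files found in {}".format(p))
        return files
    if p.is_file() and p.suffix == ".hdf5":
        return [p]
    raise FileNotFoundError(
        "{} is neither a directory nor an .hdf5 file".format(p))


# Demo on a throwaway directory so the sketch runs without real feature files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.hdf5", "a.hdf5", "notes.txt"):
        (Path(tmp) / name).touch()
    from_dir = [f.name for f in resolve_hdf5_inputs(tmp)]
    from_file = [f.name for f in resolve_hdf5_inputs(Path(tmp) / "a.hdf5")]

print(from_dir)   # ['a.hdf5', 'b.hdf5']
print(from_file)  # ['a.hdf5']
```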

To run inference:

    python inference.py [options ...] <data> <model> <out>
        <data>
            Path to the generated features in .hdf5
        <model>
            Path to the saved model in .pth format
        <out>
            Path to the output file (FASTA format)
            
        options:
            --t
                default: 0
                Number of workers for inference
            --b
                default: 128
                Inference batch size
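
Polishing a draft therefore takes two steps: feature generation without --Y, followed by inference with a trained model. The sketch below chains the two documented commands. It assumes both scripts are run from the repository root; the helper name polishing_commands, the intermediate file name features.hdf5, and the example paths are hypothetical.

```python
import shlex


def polishing_commands(draft, reads_bam, model, out_fasta,
                       features="features.hdf5", workers=1, batch_size=128):
    """Return the two argv lists: feature generation, then inference."""
    generate = ["python", "features.py", "--t", str(workers),
                draft, reads_bam, features]
    infer = ["python", "inference.py",
             "--t", str(workers), "--b", str(batch_size),
             features, model, out_fasta]
    return [generate, infer]


commands = polishing_commands("draft.fasta", "reads.bam",
                              "model.pth", "polished.fasta", workers=4)
for argv in commands:
    # Dry run: print each step. To execute a step instead, use
    # subprocess.run(argv, check=True) so a failure stops the pipeline.
    print(" ".join(shlex.quote(arg) for arg in argv))
```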

Comparison

The model was trained and tested on FASTQ basecalls from Zymo R10 Native “3 Peaks”. Data was binned using Loman's script. Draft assemblies were generated using raven. The BAM files used for feature generation and for labeling were generated with the mini_align script from pomoxis.

The organisms used for training are B. subtilis, E. faecalis, E. coli, L. monocytogenes and S. enterica. P. aeruginosa was used for validation. The models were tested on S. aureus. Results were evaluated using the pomoxis assess_assembly script.

The (mean) results are given in the following table:

Model	Total error	Mismatch	Deletion	Insertion	Qscore
Raven	0.160%	0.040%	0.059%	0.061%	27.97
Medaka	0.037%	0.012%	0.007%	0.017%	34.30
HELEN	0.066%	0.019%	0.031%	0.016%	31.78
Roko	0.035%	0.013%	0.008%	0.013%	34.55

Total error may not exactly equal the sum of the mismatch, deletion and insertion errors because of rounding.
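
The relationship between the Total error and Qscore columns can be checked with a few lines of code. The sketch below assumes Qscore is the Phred-scaled total error rate, Q = -10 * log10(error), which is the usual convention but is not stated explicitly in this README. Small differences from the table are expected because the error percentages are rounded and the reported values are means.

```python
import math


def qscore(total_error_percent):
    """Phred-scaled quality for an error rate given in percent (assumed formula)."""
    return -10.0 * math.log10(total_error_percent / 100.0)


# (total error in %, reported Qscore), copied from the table above.
table = {
    "Raven": (0.160, 27.97),
    "Medaka": (0.037, 34.30),
    "HELEN": (0.066, 31.78),
    "Roko": (0.035, 34.55),
}

for name, (error, reported) in table.items():
    print("{}: computed {:.2f}, reported {:.2f}".format(
        name, qscore(error), reported))
```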

Download

The model described in the Comparison section (R10, Guppy 2.3.8) can be downloaded here.

Contact information

This tool is still in an early development stage. All bugs and questions can be reported to: dominik.stanojevic@fer.hr, mile.sikic@fer.hr or mile_sikic@gis.a-star.edu.sg.
