CAPYBARA

Capybara: a Core-SNP Assignment PYthon tool for Acinetobacter baumannii

Capybara identifies hierarchical populations within the epidemic super-lineage (ESL) of Acinetobacter baumannii using a set of core-genome SNPs. For background on the ESL, or to cite Capybara, see DOI: 10.21203/rs.3.rs-4129268/v1.

Installation:

Capybara was developed and tested with Python 3.9.0 and requires the following external tools:

minimap2
mash
samtools
bcftools

You can install these packages with conda:

conda install -c bioconda samtools bcftools minimap2 mash

Then clone the Capybara repository with git:

git clone git@github.com:Zhou-lab-SUDA/CAPYBARA.git
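Before the first run, it can help to confirm that the four external tools listed above are on your PATH. The following is a minimal sketch that is not part of Capybara; `missing_tools` is a hypothetical helper name, and the check relies only on the standard-library `shutil.which`.

```python
import shutil

# External command-line tools listed as Capybara dependencies in this README.
REQUIRED_TOOLS = ["minimap2", "mash", "samtools", "bcftools"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All dependencies found on PATH.")
```

If anything is reported missing, activate the conda environment where the packages were installed and run the check again.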

Quick Start (with examples)

$ cd /path/to/Capybara/

$ capy.py -i examples/2.5.6.fna

This generates a report on the population assignment of examples/2.5.6.fna.

A single run on an assembled genome finishes in under 3 minutes on a 4-CPU laptop; runs on short reads take longer (over 10 minutes).

Usage

Usage: capy.py [OPTIONS]

Options:

  -i, --query TEXT  [Required] Input data; either an assembled genome or short reads are accepted.

  -p, --prefix TEXT [Optional] Prefix for the output file. Default: Capy.

  -t, --threads INTEGER [Optional] Number of processes to use. Default: 8.

  -l, --list TEXT   [Optional] A file containing a list of query files, one per line.

  --help    Show this message and exit.
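To process several genomes one at a time with the documented -i, -p and -t options, the commands can be generated from Python. This is a sketch under stated assumptions: `build_capy_commands` is a hypothetical helper, deriving the prefix from the file name is an illustrative convention rather than Capybara behaviour, and it assumes capy.py is run from the Capybara directory with the current Python interpreter.

```python
import sys
from pathlib import Path


def build_capy_commands(queries, threads=8):
    """Build one capy.py command per query using only -i, -p and -t.

    Each prefix is taken from the query file name (e.g. "2.5.6" for
    "examples/2.5.6.fna"), a hypothetical naming convention chosen so that
    reports from different queries do not overwrite each other.
    """
    commands = []
    for query in queries:
        prefix = Path(query).stem
        commands.append([sys.executable, "capy.py", "-i", str(query),
                         "-p", prefix, "-t", str(threads)])
    return commands


if __name__ == "__main__":
    for cmd in build_capy_commands(["examples/2.5.6.fna"], threads=4):
        # Print without the interpreter path for readability.
        print(" ".join(cmd[1:]))
```

Each generated command can then be executed with `subprocess.run(cmd, check=True)`. For many inputs, the -l option is the built-in alternative.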

Capybara writes a report file in the following format:

query	ESL	Lineage	Variant
2.5.6.fna	True	2.5	2.5.6
IC7.fna	False	-	-
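The example report above is tab-delimited, with "-" marking the lineage and variant of non-ESL queries. Assuming that layout, a report can be loaded for downstream analysis with the standard `csv` module. This is a sketch: `parse_capy_report` is a hypothetical helper, and because the exact output file name is not documented here, it takes the report text rather than a path.

```python
import csv
import io


def parse_capy_report(text):
    """Parse a Capybara report (tab-delimited; header: query, ESL, Lineage, Variant).

    "True"/"False" in the ESL column become booleans, and "-" (used for
    non-ESL queries) becomes None.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter="\t"):
        rows.append({
            "query": record["query"],
            "esl": record["ESL"] == "True",
            "lineage": None if record["Lineage"] == "-" else record["Lineage"],
            "variant": None if record["Variant"] == "-" else record["Variant"],
        })
    return rows


# The example report shown in this README.
EXAMPLE = "query\tESL\tLineage\tVariant\n2.5.6.fna\tTrue\t2.5\t2.5.6\nIC7.fna\tFalse\t-\t-\n"

if __name__ == "__main__":
    for row in parse_capy_report(EXAMPLE):
        print(row)
```

To read a report from disk, pass the contents of the file, e.g. `parse_capy_report(Path(report_path).read_text())`.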

Workflow and Reproduction Instructions

Workflow

A basic Capybara run proceeds as follows:

ESL identification:
- All 5,824 representative genomes were pre-sketched with Mash. The genetic distance between the query data and the pre-sketched genomes is computed to find the closest genomes.
- If the query data contain no sequence information related to ESL genomes, the query is classified as non-ESL. Otherwise, it is analyzed in the steps below.
Sequence alignment:
- The query data are aligned to the ESL reference genome (MDR-TJ: GCF_000187205.2) to produce a BAM file.
SNP calling:
- SNPs are called from the BAM file and written to a VCF file.
Population assignment:
- A pre-built SNP scheme is used to assign the query to its hierarchical population.
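The population-assignment step can be pictured as walking down a hierarchy of dot-separated labels (as in the report example: lineage 2.5, variant 2.5.6), descending into a child population only when all of its defining SNPs are present in the query. The sketch below illustrates that idea only; it is not Capybara's actual scheme or code, and `assign_population`, the label hierarchy and the SNP positions in `TOY_SCHEME` are hypothetical toy values.

```python
from typing import Dict, Optional, Set, Tuple

Snp = Tuple[int, str]  # (position on the reference, alternative allele)


def assign_population(query_snps: Set[Snp], scheme: Dict[str, Set[Snp]]) -> Optional[str]:
    """Return the deepest population whose defining SNPs are all present.

    Labels are dot-separated ("2" -> "2.5" -> "2.5.6"), and a child is only
    considered once its parent has been assigned. Stops when no child, or
    more than one child, matches.
    """
    assigned: Optional[str] = None
    while True:
        depth = 0 if assigned is None else assigned.count(".") + 1
        children = [
            label for label in scheme
            if label.count(".") == depth
            and (assigned is None or label.startswith(assigned + "."))
        ]
        matches = [c for c in children if scheme[c] <= query_snps]
        if len(matches) != 1:
            return assigned
        assigned = matches[0]


# Hypothetical toy scheme: each label maps to the SNPs that define it.
TOY_SCHEME = {
    "1": {(100, "A")},
    "2": {(200, "G")},
    "2.4": {(410, "A")},
    "2.5": {(350, "T")},
    "2.5.6": {(512, "C")},
}

if __name__ == "__main__":
    print(assign_population({(200, "G"), (350, "T"), (512, "C")}, TOY_SCHEME))
```

Stopping at the last unambiguous level means a query with partial evidence still receives a coarser (lineage-level) assignment rather than none.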

Workflow chart: see workflow.png in the repository root.

Reproduction Instructions

All data required to reproduce the analysis are distributed in this repository under CAPYBARA/capydb/, which includes:

esl/esl.fna

Reference genome for the ESL.

msh/*.msh

5,824 genome sketches pre-computed with mash sketch.
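These sketches support the ESL identification step, in which the closest pre-sketched genomes to the query are found. As an illustration, the sketch below picks the closest reference from `mash dist` output, assuming its standard tab-separated columns (reference-ID, query-ID, distance, p-value, shared-hashes), e.g. from running `mash dist` with one of the capydb/msh/*.msh files as reference. How Capybara itself invokes Mash and decides ESL versus non-ESL is not documented here, so no threshold is applied; `closest_reference` and the sample IDs are hypothetical.

```python
from typing import Iterable, Tuple


def closest_reference(mash_dist_lines: Iterable[str]) -> Tuple[str, float]:
    """Return (reference_id, distance) for the smallest Mash distance.

    Each line is assumed to follow `mash dist` tab-separated output:
    reference-ID, query-ID, distance, p-value, shared-hashes.
    """
    best_ref, best_dist = None, float("inf")
    for line in mash_dist_lines:
        if not line.strip():
            continue  # skip blank lines
        ref_id, _query_id, dist, _p_value, _shared = line.rstrip("\n").split("\t")
        if float(dist) < best_dist:
            best_ref, best_dist = ref_id, float(dist)
    if best_ref is None:
        raise ValueError("no mash dist records found")
    return best_ref, best_dist


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    sample = [
        "refA.fna\tquery.fna\t0.0123\t0\t850/1000",
        "refB.fna\tquery.fna\t0.0040\t0\t950/1000",
    ]
    print(closest_reference(sample))
```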
