A genotyping pipeline for Acinetobacter baumannii, presented proudly by a capybara

Zhou-lab-SUDA/CAPYBARA


CAPYBARA

Capybara is a Core-snp Assignment PYthon tool for Acinetobacter baumannii. It screens either raw reads or assemblies to:

  • Identify whether a query belongs to the epidemic super-lineage (ESL), a superset of the two predominant international clones, IC1 and IC2.
  • Assign the query strain to one of the lineages, clusters, and clades within the ESL, based on a pre-curated set of SNPs.

Citation

Shengkai Li, Heng Li, Guilai Jiang, Shengke Wang, Min Wang, Yilei Wu, Xiao Liu, Ling Zhong, Shichang Xie, Yi Ren, Yongliang Lou, Jimei Du, Zhemin Zhou, 2024, Emergence and Global Spread of a Dominant Multidrug-Resistant Variant in Acinetobacter baumannii, https://doi.org/10.21203/rs.3.rs-4224555/v1


Installation:

Capybara was developed and tested with Python 3.9.0 and requires several external tools:

minimap2
mash
samtools
bcftools

You can install these tools with conda using the command below:

conda install -c bioconda samtools bcftools minimap2 mash
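
After installation, it can be useful to confirm that the four tools are actually visible on your PATH before running Capybara. The snippet below is a minimal sketch of such a check; the helper name `missing_tools` is hypothetical and is not part of Capybara itself.

```python
import shutil

# External tools named in the installation section above.
REQUIRED_TOOLS = ("minimap2", "mash", "samtools", "bcftools")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

If any tool is reported as missing, activate the conda environment in which you ran the install command and try again.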

Then clone Capybara to your machine with git:

git clone git@github.com:Zhou-lab-SUDA/CAPYBARA.git

Quick Start (with examples)

$ cd /path/to/Capybara/

$ python capy.py -i examples/2.5.6.fa

This generates a report file describing the population assignment of examples/2.5.6.fa.

A single run on an assembled genome finishes in under 3 minutes on a 4-CPU laptop; a run on short reads takes more than 10 minutes.

Usage

Usage: capy.py [OPTIONS]

Options:

  -i, --query TEXT  [Required] Input data, both assembled genome or short reads are acceptable.

  -p, --prefix TEXT [Optional] Prefix for output file. Default as Capy.

  -t, --threads INTEGER [Optional] Number of process to use. default: 8

  -l, --list TEXT   [Optional] A file containing list of query files, one per line.

  --help    Show this message and exit.
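
The `-l` option expects a plain-text file with one query path per line. As a rough sketch, such a file could be built from a directory of inputs as shown below. The function name `write_query_list` and the set of file extensions are assumptions made for illustration; adjust them to your own data.

```python
from pathlib import Path

# Assumed file extensions for assemblies and reads; not defined by Capybara.
QUERY_SUFFIXES = (".fa", ".fna", ".fasta", ".fq", ".fastq", ".fq.gz", ".fastq.gz")


def write_query_list(data_dir, list_path, suffixes=QUERY_SUFFIXES):
    """Write one query file path per line to `list_path` and return the paths."""
    queries = sorted(
        str(path)
        for path in Path(data_dir).iterdir()
        if path.is_file() and path.name.endswith(suffixes)
    )
    Path(list_path).write_text("".join(query + "\n" for query in queries))
    return queries
```

The resulting file can then be passed to capy.py through `-l`, optionally together with `-p` and `-t`.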

Capybara generates a report file in the format below:

query ESL Clades Coverage
2.5.6.fa True 2.5.6 30.4183
Non-esl.fa False - -
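
For downstream analysis, the report can be read into Python. The sketch below assumes the columns are separated by whitespace (spaces or tabs) and that query names contain no spaces; neither is confirmed here, so check against a real report. `parse_report` is a hypothetical helper, not part of Capybara.

```python
import io


def parse_report(handle):
    """Parse a Capybara report into a list of dicts, one per query."""
    header = handle.readline().split()
    records = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        row["ESL"] = row["ESL"] == "True"
        # Non-ESL queries carry "-" placeholders for clade and coverage.
        row["Clades"] = None if row["Clades"] == "-" else row["Clades"]
        row["Coverage"] = None if row["Coverage"] == "-" else float(row["Coverage"])
        records.append(row)
    return records


# The example report shown above, used here as in-memory input.
example = io.StringIO(
    "query ESL Clades Coverage\n"
    "2.5.6.fa True 2.5.6 30.4183\n"
    "Non-esl.fa False - -\n"
)
records = parse_report(example)
esl_queries = [row["query"] for row in records if row["ESL"]]
```

Converting the placeholders to `None` makes it straightforward to filter ESL queries or compute coverage statistics without special-casing the "-" string.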

Workflow and Reproduction Instructions

Workflow

A basic Capybara run proceeds as follows:

  • ESL identification:
    • All 5,824 representative genomes have been pre-sketched. The genetic distance between the query and each pre-sketched genome is estimated to find the closest genomes.
    • If the query contains no sequence information related to ESL genomes, it is classified as non-ESL. Otherwise, it proceeds through the steps below.
  • Sequence alignment:
    • The query is aligned to the ESL reference genome (MDR-TJ: GCF_000187205.2) to generate a BAM file.
  • SNP calling:
    • SNPs are called from the BAM file and written to a VCF file.
  • Population assignment:
    • A pre-built SNP scheme is used to assign the query to its hierarchical population.
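
To make the first three steps concrete, the sketch below assembles one plausible command line per step using the four required tools. It only builds the command strings and does not run anything. The exact commands and parameters used inside capy.py are not documented here, so the flags, the minimap2 presets, the output file names, and the function name `build_commands` are all assumptions for illustration.

```python
import shlex


def build_commands(query, reference, sketch, prefix="Capy", threads=8, reads=False):
    """Return illustrative shell commands for the workflow steps (nothing is executed)."""
    q, ref, msh = (shlex.quote(str(p)) for p in (query, reference, sketch))
    bam = shlex.quote(f"{prefix}.bam")
    vcf = shlex.quote(f"{prefix}.vcf.gz")
    # Assumed minimap2 presets: "sr" for short reads, "asm5" for assemblies.
    preset = "sr" if reads else "asm5"
    return {
        # Step 1: estimate distances between the query and a pre-built sketch.
        "esl_identification": f"mash dist -p {threads} {msh} {q}",
        # Step 2: align to the ESL reference and write a sorted BAM file.
        "alignment": (
            f"minimap2 -a -x {preset} -t {threads} {ref} {q} "
            f"| samtools sort -@ {threads} -o {bam} -"
        ),
        # Step 3: call variants from the BAM file into a compressed VCF file.
        "snp_calling": (
            f"bcftools mpileup -f {ref} {bam} "
            f"| bcftools call -mv -Oz -o {vcf}"
        ),
        # Step 4 (population assignment) depends on Capybara's own SNP scheme,
        # whose format is not described here, so it is left out of this sketch.
    }


commands = build_commands("examples/2.5.6.fa", "capydb/esl/esl.fna", "capydb/msh/example.msh")
```

The sketch path `capydb/msh/example.msh` is a placeholder; in practice, run capy.py itself, which wraps the real versions of these steps.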

Workflow chart:

Reproduction Instructions

All data required to reproduce the analysis are distributed in this repository under CAPYBARA/capydb/, which includes:

  • esl/esl.fna
Reference genome for ESL.
  • msh/*.msh
5,824 pre-sketched files generated with Mash sketch.
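
Before attempting a reproduction, you may want to confirm that the database directory is complete. The following is a minimal sketch that checks for the reference genome and counts the `.msh` files; `inspect_capydb` is a hypothetical helper, and it assumes the layout listed above with one sketch per `.msh` file.

```python
from pathlib import Path


def inspect_capydb(capydb_dir):
    """Report whether esl/esl.fna exists and how many msh/*.msh files are present."""
    root = Path(capydb_dir)
    msh_dir = root / "msh"
    sketch_count = len(list(msh_dir.glob("*.msh"))) if msh_dir.is_dir() else 0
    return {
        "reference_present": (root / "esl" / "esl.fna").is_file(),
        "sketch_files": sketch_count,
    }
```

For a complete clone, calling `inspect_capydb("CAPYBARA/capydb")` should report the reference as present; compare the sketch count against the number stated above.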

Our published release

You may also be interested in KleTy, a pipeline for the analysis of Klebsiella that can also genotype plasmids from short-read files.
