CACY: Core genes Alignment-free phylogeny and Capture of taxonomY relationship

Attention

This repository is currently under active development. New features and documentation are coming soon.

Summary

CACY is a command-line tool for the phylogenetic and taxonomic analysis of closely related organisms.

For phylogenetic analysis, the tool uses alignment-free methods to construct phylogenetic trees from the amino acid sequences of core genes. Given a set of proteomes in FASTA format from various species, it clusters all the proteins and selects those that belong to the core genome. The pipeline then feeds them into alignment-free methods to generate the phylogenetic tree (or splits). For taxonomic analysis, the tool calculates pairwise Average Nucleotide Identity (ANI) or Percentage Of Conserved Proteins (POCP) values and then reports strict Operational Taxonomic Units (OTUs) using a graph-based algorithm.
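For orientation, POCP between two proteomes is commonly defined as (C1 + C2) / (T1 + T2) × 100, where C1 and C2 are the numbers of conserved proteins each proteome has when searched against the other, and T1 and T2 are the total protein counts. The sketch below only illustrates that arithmetic; the function name and its arguments are hypothetical and are not taken from CACY's code.

```python
def pocp(conserved_1: int, conserved_2: int, total_1: int, total_2: int) -> float:
    """Percentage of conserved proteins between two proteomes.

    conserved_1 / conserved_2: conserved protein counts for each proteome
    total_1 / total_2: total protein counts for each proteome
    """
    if total_1 + total_2 == 0:
        raise ValueError("both proteomes are empty")
    return 100.0 * (conserved_1 + conserved_2) / (total_1 + total_2)


# Illustrative numbers only, not real data.
value = pocp(conserved_1=1800, conserved_2=1750, total_1=2000, total_2=2100)
print(f"POCP = {value:.2f}%")
```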

You can find a more detailed explanation of the tool on ReadTheDocs.

Installation

CACY can be installed by cloning the repository and creating its conda environment:

git clone https://github.com/garrison-chen/CACY.git && cd CACY
conda env create --file=environments.yaml
conda activate cacy

Next, run the following commands to build the additional dependency, SANS:

git clone https://github.com/gi-bielefeld/sans.git
cd sans
make

Usage

CACY runs several workflows, each of which chains different modules; the modules can also be run individually. Each workflow takes a list of closely related strains (proteomes or genomes) as input:

Workflow: easy-core-phylo
Modules: cluster > distribute > extract > phylo
Run this workflow if you want to construct a phylogenetic tree using the core genes. This workflow clusters all the pooled proteins and selects those that belong to the core genome. It then feeds them into the alignment-free methods to generate the phylogenetic trees efficiently.

python CACY.py easy-core-phylo -i input_directory -o output_directory -c clustering_option -f threshold
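The core-gene selection step can be pictured roughly as follows. This is a minimal sketch, not CACY's implementation: it assumes that clusters are available as a mapping from cluster ID to (proteome, protein) pairs, and that the threshold is the minimum fraction of proteomes in which a cluster must occur. Both the data layout and that reading of the threshold are assumptions, and all names are hypothetical.

```python
from collections import Counter


def gene_frequency(clusters):
    """Count how many clusters occur in exactly k proteomes (the U-shape data)."""
    return Counter(
        len({proteome for proteome, _ in members}) for members in clusters.values()
    )


def select_core_clusters(clusters, n_proteomes, min_fraction=1.0):
    """Return IDs of clusters present in at least min_fraction of the proteomes."""
    core = []
    for cluster_id, members in clusters.items():
        proteomes = {proteome for proteome, _ in members}
        if len(proteomes) / n_proteomes >= min_fraction:
            core.append(cluster_id)
    return core


# Toy input: three strains, three protein clusters (hypothetical IDs).
clusters = {
    "c1": [("strainA", "p1"), ("strainB", "p7"), ("strainC", "p3")],
    "c2": [("strainA", "p2"), ("strainB", "p9")],
    "c3": [("strainC", "p5")],
}
core_ids = select_core_clusters(clusters, n_proteomes=3, min_fraction=0.95)
print(core_ids, dict(gene_frequency(clusters)))
```

Only the proteins in the selected clusters would then be passed on to the tree-building step.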

Workflow: easy-compare-sotu
Modules: compare > sotu
Run this workflow if you want to calculate the pairwise ANI or POCP values and report strict OTUs. This workflow uses fastANI and POCP to calculate pairwise ANI and POCP values and stores the results in a PHYLIP-formatted lower-triangle matrix. This matrix is then converted to an adjacency matrix according to a user-defined cutoff. Next, the workflow turns the adjacency matrix into an undirected graph and applies the Bron-Kerbosch algorithm, using the solver from NetworkX, to find all the maximal cliques, which are reported as the strict OTU groups.

python CACY.py easy-compare-sotu -i input_directory -o output_directory -c clustering_option
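The matrix-to-clique steps described above can be sketched as follows. This is an illustration under stated assumptions rather than CACY's code: the lower-triangle layout is assumed to omit the diagonal, the cutoff of 95 is only an example value, and the plain-Python Bron-Kerbosch routine stands in for the NetworkX solver (`networkx.find_cliques` performs the same enumeration) so that the example has no dependencies. All function names are hypothetical.

```python
def parse_lower_triangle(text):
    """Parse a PHYLIP lower-triangle matrix (no diagonal) into names and pair values."""
    rows = [line.split() for line in text.strip().splitlines()]
    n = int(rows[0][0])
    names, sim = [], {}
    for i, fields in enumerate(rows[1:n + 1]):
        names.append(fields[0])
        for j, value in enumerate(fields[1:i + 1]):
            sim[frozenset((names[i], names[j]))] = float(value)
    return names, sim


def build_adjacency(names, sim, cutoff):
    """Connect two strains when their similarity reaches the cutoff."""
    adj = {name: set() for name in names}
    for pair, value in sim.items():
        if value >= cutoff:
            a, b = tuple(pair)
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; yields each maximal clique as a frozenset."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


# Toy similarity matrix (illustrative values, not real ANI results).
matrix_text = """4
A
B 97.0
C 96.5 95.2
D 80.0 81.0 79.5
"""
names, sim = parse_lower_triangle(matrix_text)
cliques = list(maximal_cliques(build_adjacency(names, sim, cutoff=95.0)))
print(sorted(sorted(c) for c in cliques))
```

Because maximal cliques can overlap, a strain may appear in more than one group when the graph is not a union of disjoint cliques.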

Workflow: easy-compare-phylo
Modules: compare > phylo
Run this workflow if you want to construct phylogenetic trees using the pairwise ANI or POCP values. Like the previous workflow, this one uses fastANI and POCP to calculate pairwise ANI and POCP values. The workflow then applies the neighbour-joining algorithm to construct the phylogenetic trees.

python CACY.py easy-compare-phylo -i input_directory -o output_directory -m similarity_metric
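To make the tree-building step concrete, here is a compact neighbour-joining sketch in plain Python. It is not CACY's implementation. It assumes that percentage similarities are turned into distances as 100 minus the similarity, which is a simplification chosen for this example, and it places the root arbitrarily in the middle of the last remaining branch when writing Newick. The function name and data layout are hypothetical.

```python
def neighbour_joining(names, dist):
    """Neighbour joining on a dict mapping frozenset({a, b}) -> distance.

    Returns a Newick string; requires at least two names.
    """
    d = dict(dist)
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Sum of distances from each node to all the others.
        total = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimises the Q criterion.
        a, b = min(
            ((a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]),
            key=lambda ab: (n - 2) * d[frozenset(ab)] - total[ab[0]] - total[ab[1]],
        )
        dab = d[frozenset((a, b))]
        # Branch lengths from the new internal node to a and to b.
        la = 0.5 * dab + (total[a] - total[b]) / (2 * (n - 2))
        lb = dab - la
        new = f"({a}:{la:.4f},{b}:{lb:.4f})"
        rest = [c for c in nodes if c not in (a, b)]
        # Distances from the new node to every remaining node.
        for c in rest:
            d[frozenset((new, c))] = 0.5 * (
                d[frozenset((a, c))] + d[frozenset((b, c))] - dab
            )
        nodes = rest + [new]
    a, b = nodes
    half = d[frozenset((a, b))] / 2  # arbitrary root on the final branch
    return f"({a}:{half:.4f},{b}:{half:.4f});"


# Toy similarities in percent (illustrative values only).
sim = {
    frozenset("AB"): 97.0, frozenset("AC"): 95.0, frozenset("AD"): 95.0,
    frozenset("BC"): 94.0, frozenset("BD"): 94.0, frozenset("CD"): 98.0,
}
dist = {pair: 100.0 - s for pair, s in sim.items()}  # assumed conversion
tree = neighbour_joining(["A", "B", "C", "D"], dist)
print(tree)
```

The resulting Newick string can be opened in any standard tree viewer.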

Run the following workflow if you want to identify the cutoffs for separating a specific taxon.

python CACY.py easy-todo

The full usage is shown below:

CACY (Core genes Alignment-free phylogeny and Capture of taxonomY relationship), V1.0.0, Mar 2025

WORKFLOW:
[easy-core-phylo]              [cluster] > [distribute] > [extract] > [phylo]
[easy-compare-sotu]            [compare] > [sotu]

COMMANDS (core modules):
[cluster]                      Perform clustering on the input amino acid sequences
[distribute]                   Create the universal gene frequency distribution U-shape plot
[extract]                      Select and extract the core-genes amino acid sequences from each proteome
[compare]                      Calculate the pairwise similarities among the given strains using POCP or ANI
[sotu]                         Report the strict OTU (sOTU) groups
[phylo]                        Construct the Phylogenetic tree
[hgt]                          Detect the horizontal gene transfer

COMMANDS (auxiliary modules):
[taxon-search]                 search for the NCBI taxon id/name
[download]                     download the NCBI RefSeq data
[annotate]                     perform genome annotation

Usage: python CACY.py COMMANDS/WORKFLOW [OPTIONS]
Possible [OPTIONS] for COMMANDS/WORKFLOW can be seen with syntax: python CACY.py COMMANDS/WORKFLOW --help

Description of workflows and modules

Step numbers give each module's position within a workflow; "-" means the module is not part of that workflow.

Step in easy-core-phylo	Step in easy-compare-phylo	Step in easy-compare-sotu	Module	Description	Input	Output
1	-	-	cluster	Perform clustering on the input amino acid sequences	amino acid sequences	protein clusters
2	-	-	distribute	Create the universal gene frequency distribution U-shape plot	protein clusters	gene frequency distribution plot
3	-	-	extract	Select and extract the core-gene amino acid sequences from each proteome	amino acid sequences, protein clusters	selected amino acid sequences from core genes
-	1	1	compare	Calculate the pairwise similarities among the given strains using POCP or ANI	proteomes/genomes	pairwise similarity matrix
-	-	2	sotu	Report the strict OTU (sOTU) groups	pairwise similarity matrix	strict OTU groups
4	2	-	phylo	Construct the phylogenetic tree	amino acid sequences	phylogenetic tree/splits
-	-	-	hgt	Detect horizontal gene transfer	amino acid sequences	HGT donors
-	-	-	taxon-search	Search for the NCBI taxon ID/name	organism's name/NCBI taxon ID	organism's name/NCBI taxon ID
-	-	-	download	Download the NCBI RefSeq data	organism's name/NCBI taxon ID	NCBI genomes (with proteomes)
-	-	-	annotate	Perform genome annotation	genome/nucleotide sequences	annotated sequences
