This pipeline is a work in progress. You might find bugs: some are known, while others remain undiscovered. Before getting desperate, please check the Issues that have already been opened and discussed. We encourage the community to contribute by reporting any issues they encounter on GitHub. Feel free to reach out to me via email (maria.schreiber@uni-jena.de) or open an issue directly. Please note that I cannot be held responsible for any results obtained using SweetSynteny or any conclusions drawn from them.
Microsynteny, the conservation of gene order and orientation within small genomic regions across different species, provides crucial insights into evolutionary relationships and functional conservation.
Key features of SweetSynteny:
- Flexible input:
  - different numbers of organisms (from bacteria to eukaryotes)
  - different searches (`cmsearch` for sRNA, `blast` for protein, or directly from the gff files with `from_gff`)
- Data filtering (E-value, hit length)
- Sequence-driven clustering and color-pattern microsynteny clustering
  - on sequence / structure level: `mmseqs easy-linclust` or `cmscan` or `hmmscan` (see table)
  - PCA for dimension reduction
  - on microsynteny level (global): microsynteny clustering by Ward or DBSCAN
- Comprehensive results:
  - phylogenetic trees using a `dendrogram` built by scipy.cluster.hierarchy, or a scatterplot
  - statistical summaries of adjacent genes and genome location
  - microsynteny plots
  - statistics on the similarity of the microsynteny locations, e.g. cosine similarity
  - Optional: get the gene of interest sequence and its promoter sequence (default: 100 nt upstream or up to the next adjacent gene)
- Implementation: Nextflow
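
The optional promoter extraction listed above (100 nt upstream by default, or up to the next adjacent gene) can be pictured with a small sketch. This is only an illustration and not SweetSynteny's own code: the function name `promoter_window`, the 1-based inclusive coordinates, and the strand label `antisense` are assumptions.

```python
def promoter_window(gene_start, gene_end, strand, neighbour_edge=None, max_len=100):
    """Return (start, end) of the promoter region, 1-based inclusive, or None.

    neighbour_edge is the coordinate of the adjacent gene that lies closest
    to the upstream side of the gene of interest (hypothetical argument):
    the end of the previous gene for "sense", the start of the next gene
    for any other strand value. None means there is no such neighbour.
    """
    if strand == "sense":
        # Upstream is to the left of the gene start.
        end = gene_start - 1
        start = max(1, end - max_len + 1)
        if neighbour_edge is not None:
            start = max(start, neighbour_edge + 1)
    else:
        # Upstream is to the right of the gene end.
        start = gene_end + 1
        end = start + max_len - 1
        if neighbour_edge is not None:
            end = min(end, neighbour_edge - 1)
    # An adjacent or overlapping neighbour leaves no promoter region.
    return (start, end) if start <= end else None
```

With the default of 100 nt, a sense-strand gene starting at position 215167 gets the window 215067 to 215166; a neighbour ending at 215120 shortens it to 215121 to 215166.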
| Contig:Counter | Gene Name | Start | Stop | Strand | Bio_type | Color |
|---|---|---|---|---|---|---|
| NZ_CP013002.1:0 | gene-AQ619_RS00960 | 215167 | 216307 | sense | protein_coding | #FFFFFF |
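
A row like the one above can be read into a typed record for downstream scripting. The sketch below assumes the row is available as tab-separated text with exactly these seven columns; the class name `GeneRecord` and the function `parse_row` are hypothetical and not part of SweetSynteny.

```python
from dataclasses import dataclass


@dataclass
class GeneRecord:
    contig: str
    counter: int
    gene_name: str
    start: int
    stop: int
    strand: str
    bio_type: str
    color: str


def parse_row(line, sep="\t"):
    """Parse one row; the separator is an assumption about the file format."""
    fields = [field.strip() for field in line.rstrip("\n").split(sep)]
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}")
    # The first column combines contig and counter as "contig:counter".
    contig, counter = fields[0].rsplit(":", 1)
    return GeneRecord(contig, int(counter), fields[1], int(fields[2]),
                      int(fields[3]), fields[4], fields[5], fields[6])
```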
So, as you can see, with SweetSynteny, your Microsynteny analysis will be, well... sweet!
The pipeline is written in Nextflow. To run SweetSynteny, I recommend creating a conda environment dedicated to Nextflow.
- Install miniconda or conda
- Create a conda environment and install Nextflow and all other dependencies within this environment.

  ```bash
  mamba create -n env_name
  conda activate env_name
  mamba install -c conda-forge -c bioconda nextflow openjdk \
      infernal blast mmseqs2 hmmer \
      biopython matplotlib pandas platformdirs pytest requests seaborn numpy scipy scikit-learn
  ```

- Install sugar

  ```bash
  pip install rnajena-sugar
  ```

- Clone the GitHub repository for the latest version of `SweetSynteny`

  ```bash
  nextflow pull rnajena/SweetSynteny
  ```

- Get the databases

  Download: https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz

  Download: https://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/Rfam.cm.gz

  Decompress both files (e.g. with `gunzip`), then index them:

  ```bash
  hmmpress /path/to/Pfam-A.hmm
  cmpress /path/to/Rfam.cm
  ```

- Done!
Let us briefly go over the most important parameters and options.
search_types infernal|blastn|blastp|tblastn
- For protein(s) we recommend a (multi-)FASTA file of amino acid sequences and tblastn
- For sRNA(s) we recommend a corresponding CM from Rfam or a self-built one
- You have the choice
bio_type ncRNA|protein
genomes_dir FOLDER
- Please choose 2 or more genomes you want to search and save them here.
- Use the following structure:

```text
└── genomes_dir
    ├── genome1_dir
    │   ├── db.gff
    │   └── db.fna
    ├── genome2_dir
    .   ├── db.gff
    .   └── db.fna
    ...
```
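
Before starting a run it can help to check that the genome folder follows this layout. The helper below is a sketch, not part of the pipeline; the function name `check_genomes_dir` is hypothetical, and it only checks what is stated above (at least two genome folders, each with `db.gff` and `db.fna`).

```python
from pathlib import Path


def check_genomes_dir(genomes_dir):
    """Return a list of problems; an empty list means the layout looks right."""
    root = Path(genomes_dir)
    problems = []
    genome_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    if len(genome_dirs) < 2:
        problems.append(
            f"expected at least 2 genome folders, found {len(genome_dirs)}"
        )
    for genome_dir in genome_dirs:
        # Each genome folder needs an annotation and a sequence file.
        for name in ("db.gff", "db.fna"):
            if not (genome_dir / name).is_file():
                problems.append(f"{genome_dir.name}: missing {name}")
    return problems
```

Calling `check_genomes_dir("/path/to/genomes_dir")` and printing the returned list gives a quick overview of missing files.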
annotation_type .gff | other_types_for_future
query .cm | .fna
- Path to CM or FASTA of the gene of interest
output_dir FOLDER
- Path to output folder
gene_of_interest string
- Name of the gene of interest
adjacent_gene_clustering hmmscan,cmscan | mmseq,cmscan | hmmscan,mmseq | mmseq,mmseq
- Choose the clustering methods for adjacent genes
neighbours x:y | x-y
- Set the number of neighbouring genes (`:`) or the number of nucleotides (`-`)
- x and y must be integers
- For riboswitches, for example, it is also possible to write 0,4 and focus only on the downstream genes.
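
The two documented forms of `neighbours` can be told apart with a small parser. This is a sketch under assumptions: the function name `parse_neighbours` is hypothetical, x is taken to be the upstream value and y the downstream value, and only the `x:y` and `x-y` forms are handled.

```python
import re


def parse_neighbours(value):
    """Return (mode, x, y) for the forms 'x:y' (genes) and 'x-y' (nucleotides)."""
    match = re.fullmatch(r"(\d+)([:-])(\d+)", value.strip())
    if match is None:
        raise ValueError(f"expected 'x:y' or 'x-y' with integers, got {value!r}")
    x, separator, y = match.groups()
    # ':' counts neighbouring genes, '-' counts nucleotides.
    mode = "genes" if separator == ":" else "nucleotides"
    return mode, int(x), int(y)
```

For example, `parse_neighbours("2:3")` yields a gene-based window and `parse_neighbours("500-1000")` a nucleotide-based one.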
scale yes | no
- Choose whether the microsynteny plots should be scaled and aligned
cluster >2
- Choose the minimal cluster size for `DBSCAN` clustering
threshold 0-1
- Select a similarity threshold for clustering
cut_height_args float
- Cut-height threshold for Ward clustering
pfam_db : /path/to/result/folder/Pfam-A.hmm
- Please download the Pfam database and call ...
rfam_db : /path/to/result/folder/Rfam.cm
- Please download the Rfam database and call ...
name_file : ""
- Path to genome name file
- It should look like this:
| strain | contig | organism_name |
|---|---|---|
| GCF_000731315.1 | NZ_HG938354.1 | Neorhizobium galegae bv. orientalis str. HAMBI 540 |
| GCF_000731315.1 | NZ_HG938353.1 | Neorhizobium galegae bv. orientalis str. HAMBI 540 |
| GCF_042657465.1 | NZ_JBHSLC010000080.1 | Azospirillum himalayense |
| GCF_042657465.1 | NZ_JBHSLC010000008.1 | Azospirillum himalayense |
| GCF_042657465.1 | NZ_JBHSLC010000094.1 | Azospirillum himalayense |
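
A genome name file with these three columns can be loaded into a lookup from contig to strain and organism name. The sketch assumes a tab-separated file with the header row `strain`, `contig`, `organism_name`; the delimiter and the function name `load_contig_names` are assumptions, not taken from the pipeline.

```python
import csv


def load_contig_names(handle, delimiter="\t"):
    """Map each contig ID to a (strain, organism_name) tuple."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {
        row["contig"]: (row["strain"], row["organism_name"])
        for row in reader
    }
```

Typical use would be `with open(name_file, newline="") as handle: names = load_contig_names(handle)`.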
ignore_overlaps : "True"|"False"
- Set this when you know that your search hit overlaps with another annotation, but by no more than 75%
substring_search : "True"|"False"
- Only applies with `from_gff`
- Use this if you want to search for all SRPs but your gff file contains SRP, bacterial_SRP, etc.
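
The effect of `substring_search` can be illustrated as the difference between exact and substring matching of gene names. This is a conceptual sketch only: the function name `matches_gene` is hypothetical, and the case-sensitive comparison is an assumption about how names are matched.

```python
def matches_gene(query, annotation_name, substring_search):
    """Decide whether a gff annotation name counts as a hit for the query."""
    if substring_search:
        # "SRP" matches "SRP", "bacterial_SRP", and so on.
        return query in annotation_name
    # Without substring search only the exact name matches.
    return query == annotation_name
```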
cpu : int
See example para.json
nextflow run SweetSynteny.nf -params-file /SweetSynteny/para.json -c nextflow.config
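
A parameter file for `-params-file` can also be generated from a script. The sketch below uses the parameter names from the list above, but every value is a placeholder, and the value types (for instance whether numbers are quoted) are guesses; the example para.json in the repository remains the authoritative template. The function name `write_params` is hypothetical.

```python
import json

# Parameter names follow the list above; all values are placeholders.
params = {
    "search_types": "infernal",
    "bio_type": "ncRNA",
    "genomes_dir": "/path/to/genomes_dir",
    "annotation_type": ".gff",
    "query": "/path/to/query.cm",
    "output_dir": "/path/to/output_dir",
    "gene_of_interest": "my_sRNA",
    "adjacent_gene_clustering": "hmmscan,cmscan",
    "neighbours": "2:2",
    "scale": "yes",
    "cluster": 3,
    "threshold": 0.5,
    "cut_height_args": 0.5,
    "pfam_db": "/path/to/Pfam-A.hmm",
    "rfam_db": "/path/to/Rfam.cm",
    "name_file": "",
    "ignore_overlaps": "False",
    "substring_search": "False",
    "cpu": 4,
}


def write_params(path, values=params):
    """Write the parameters as JSON so Nextflow can read them."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(values, handle, indent=2)
        handle.write("\n")
```

After `write_params("para.json")`, the resulting file can be passed to the `nextflow run` command shown above.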
- TODO
All citations:
- SUGAR: Eulenfeld, Tom. "Sugar: A Python framework for bioinformatics." Journal of Open Source Software 10.111 (2025): 8122.
- BLAST: Korf, Ian, Mark Yandell, and Joseph Bedell. BLAST. O'Reilly Media, Inc., 2003.
- INFERNAL: Nawrocki, Eric P., Diana L. Kolbe, and Sean R. Eddy. "Infernal 1.0: inference of RNA alignments." Bioinformatics 25.10 (2009): 1335-1337.
- MMseqs2: Steinegger, Martin, and Johannes Söding. "MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets." Nature Biotechnology 35 (2017): 1026–1028.
- ETE3: Huerta-Cepas, Jaime, François Serra, and Peer Bork. "ETE 3: reconstruction, analysis, and visualization of phylogenomic data." Molecular Biology and Evolution 33.6 (2016): 1635-1638.
- DNA Features Viewer: Edinburgh Genome Foundry by Zulko. https://github.com/Edinburgh-Genome-Foundry/DnaFeaturesViewer
If you use SweetSynteny for your analysis, please cite our GitHub repository.
@software{Maria_Schreiber_SweetSynteny,
author = {Maria Schreiber and Emanuel Barth and Manja Marz},
license = {MIT},
title = {{SweetSynteny}},
url = {https://github.com/rnajena/SweetSynteny}
}