WORK IN PROGRESS

This pipeline is work-in-progress, you might fing bugs, some are known, while others remain undiscovered. Before getting desperate, please check out the Issues that are already opened and discussed. We encourage the community to contribute by reporting any issues they encounter on GitHub. Feel free to reach out to me via email (maria.schreiber@uni-jena.de) or open an issue directly. It's important to note that I cannot be held responsible for any results obtained using SweetSynteny or any conclusions drawn from them.

SweetSynteny - Unraveling Microsynteny Patterns

Microsynteny, the conservation of gene order and orientation within small genomic regions across different species, provides crucial insights into evolutionary relationships and functional conservation.

Key features of SweetSynteny:

Flexible input:
1. different number of organisms (from bacteria to eukaryotes)
2. different searches (cmsearch for sRNA or blast for protein or with the gff files from_gff)
Data filtering (E-value, hit length)
Sequence-driven clustering and color-pattern Microsynteny clustering
1. on sequence / structur level: mmseq easy lineclust or cmscan or hmmscan (see table)
2. pca for dimension reduction
3. on microsynteny level: on global level: microsynteny cluster by ward or dbscan
Comprehensive results:
1. phylogenetic trees using dendrogram build by scipy.cluster.hierarchy or scatterplot
2. statistical summaries of adjacent genes and genome location
3. microsynteny plots
4. statistics on the similarity of the microsynteny locations, e.g. cosinus similarity
5. Optional: get gene of interest sequence and its promoter sequence (default: 100 nt upstream or up to the next adjacent gene)
Implementation: Nextflow

Conitig:Counter	Gene Name	Start	Stop	Strand	Bio_type	Color
NZ_CP013002.1:0	gene-AQ619_RS00960	215167	216307	sense	protein_coding	#FFFFFF

So, as you can see, with SweetSynteny, your Microsynteny analysis will be, well... sweet!

Graphical Workflow

Dependencies and installation

The pipeline is written in Nextflow. In order to run SweetSynteny, I recommend creating a conda environment dedicated for NextFlow.

Install miniconda or conda

Create a conda environment and install NextFlow within this environment and install everything else.

mamba create -n env_name
conda activate env_name
mamba install -c conda-forge -c bioconda   nextflow openjdk   \
    infernal blast mmseqs2 hmmer   \
    biopython matplotlib pandas platformdirs pytest requests seaborn numpy scipy scikit-learn

sugar
```
pip install rnajena-sugar
```
Clone the github repository for the latest version of SweetSynteny
```
nextflow pull rnajena/SweetSynteny
```
Get DB Download: https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz Download: https://ftp.ebi.ac.uk/pub/databases/Rfam/CURRENT/Rfam.cm.gz
```
hmmpress /path/to/Pfam-A.hmm
cmpress /path/to/Rfam.cm
```
Done!

Usage

Let us briefly go over the most important parameters and options.

search_types infernal|blastn|blastp|tblastn

For protein(s) we recommended a (m)fasta of amino acid sequences and tblastn
For sRNA(s) we recommend a corresponding CM from RFAM or self-built\
You have the choice

bio_type ncRNA|protein

genomes_dir FOLDER

Please choose 2 or more genomes you want to search and save them here.

And use following structure:

└── genomes_dir

  ├── genome1_dir

  │    ├── db.gff

  │    └── db.fna

  ├── genome2_dir

  .    ├── db.gff

  .    └── db.fna

...

annotation_type .gff | other_types_for_futur

query .cm | .fna

Path to CM or FASTA of the gene of interest

output_dir FOLDER

Path to output folder

gene_of_interest string

Name of the gene of interest

adjacent_gene_clustering hmmscan,cmscan | mmseq,cmscan | hmmscan,mmseq | mmseq,mmseq

Chose clustering for adjacent genes

neighbours x:y | x-y

Set numbers of neighbours (:) or number of nucleotides (-)
x and y should be Integer numbers
It is also possible for e.g. ribsowitches to write 0,4 and only focus on the downstream genes.

scale yes | no

Chose if you want to scaled and aligned the microsynteny plots

cluster >2

Chose minimal cluster size for DBscan clustering

threshold 0-1

Select a similarity threshold for clustering

cut_height_args float

Cutting threshold for ward clustering

pfam_db : /path/to/result/folder/Pfam-A.hmm

pls, download the pfam db and call ...

rfam_db : /path/to/result/folder/Rfam.cm

pls, download the rfam db and call ...

name_file : ""

Path to genome name file
It should look like this:

strain	contig	organism_name
GCF_000731315.1	NZ_HG938354.1	Neorhizobium galegae bv. orientalis str. HAMBI 540
GCF_000731315.1	NZ_HG938353.1	Neorhizobium galegae bv. orientalis str. HAMBI 540
GCF_042657465.1	NZ_JBHSLC010000080.1	Azospirillum himalayense
GCF_042657465.1	NZ_JBHSLC010000008.1	Azospirillum himalayense
GCF_042657465.1	NZ_JBHSLC010000094.1	Azospirillum himalayense

ignore_overlaps : "True"|"False"

When you know you search hit overlaps with another annotation, but not more than 75%

substring_search : "True"|"False"

Only when from_gff
If you want to search for all SRPs but in you gff file you find SRP, bacterial_SRP, etc

cpu : int

Use a config file.

See example para.json

Running the pipeline

nextflow run SweetSynteny.nf -params-file /SweetSynteny/para.json -c nextflow.config

Output interpretation

TODO

Other tools

Click here for all citations

SUGAR:
- Eulenfeld, Tom. "Sugar: A Python framework for bioinformatics." Journal of Open Source Software 10.111 (2025): 8122.
BLAST:
- Korf, Ian, Mark Yandell, and Joseph Bedell. Blast. " O'Reilly Media, Inc.", 2003.
INFERNAL:
- Nawrocki, Eric P., Diana L. Kolbe, and Sean R. Eddy. "Infernal 1.0: inference of RNA alignments." Bioinformatics 25.10 (2009): 1335-1337.
MMSeqs2:
- Steinegger, M., Söding, J. "MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets". Nat Biotechnol 35, 1026–1028 (2017)
ETE3:
- Huerta-Cepas, Jaime, François Serra, and Peer Bork. "ETE 3: reconstruction, analysis, and visualization of phylogenomic data." Molecular biology and evolution 33.6 (2016): 1635-1638.
DNA Features Viewer
- Edinburgh Genome Foundry by Zulko. https://github.com/Edinburgh-Genome-Foundry/DnaFeaturesViewer

Cite us

If you use SweetSynteny for your analysis, please cite our github repository.

@software{Maria_Schreiber_SweetSynteny,
author = {Maria Schreiber, Emanuel Barth, Manja Marz},
license = {MIT},
title = {{SweetSynteny}},
url = {https://github.com/rnajena/SweetSynteny}
}

Name		Name	Last commit message	Last commit date
Latest commit History 99 Commits
bin		bin
fig		fig
.gitignore		.gitignore
LICENSE		LICENSE
SweetSynteny.nf		SweetSynteny.nf
nextflow.config		nextflow.config
para.json		para.json
readme.md		readme.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

WORK IN PROGRESS

SweetSynteny - Unraveling Microsynteny Patterns

Graphical Workflow

Dependencies and installation

Usage

Use a config file.

Running the pipeline

Output interpretation

Other tools

Cite us

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

License

rnajena/SweetSynteny

Folders and files

Latest commit

History

Repository files navigation

WORK IN PROGRESS

SweetSynteny - Unraveling Microsynteny Patterns

Graphical Workflow

Dependencies and installation

Usage

Use a config file.

Running the pipeline

Output interpretation

Other tools

Cite us

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages