1 Department of Earth and Environmental Sciences, Paleontology and Geobiology, Ludwig-Maximilians-Universität München, Munich, Germany.
2 GeoBio-Center, Ludwig-Maximilians-Universität München, Munich, Germany
3 Staatliche Naturwissenschaftliche Sammlungen Bayerns (SNSB)–Bayerische Staatssammlung für Paläontologie und Geologie, Munich, Germany
*corresponding author
Synteny, the shared arrangement of genes on chromosomes between related species, is a marker of shared ancestry, and synteny-breaking events can result in genomic incompatibilities between populations and ultimately lead to speciation events. Despite their pivotal role as drivers of speciation, synteny breaks are poorly studied because chromosome-level genome assemblies are lacking for a taxonomically broad sample of organisms. Here, using 22 congeneric animal genome pairs, we find a link between protein identity, microsynteny, and macrosynteny, but no evidence for a universal path of genomic change during speciation. We observed varied trajectories of synteny conservation relative to protein identity in non-model organisms, with many species pairs showing no karyotypic changes and others displaying large genomic rearrangements. This contrasts with previous studies on model organisms and indicates that the genomic changes preceding or resulting from speciation are likely highly context-dependent and differ between clades.
- summary_data - the main data table and other data tables used for the figures
- figures_for_paper - outputs that were used for the 3 main paper figures
- supplements_for_paper - all supplemental figures generated by R
- logfiles - log files from the running of the pipeline for each species pair
- 02-processing_scripts - Python and shell scripts used for the processing of the genomes
- 03-graphic_scripts - R scripts, used to make all figures and all supplemental figures
- 04-macrosynteny_plots - data and plots like Figure 2, but for all species pairs
- 05-microsynteny_plots - stats about which genes are found on microsynteny blocks, for all species pairs
- 06-prot_id_tables - stats for each protein pair for each species pair, calculating identities, gaps, and differences
- 07-cluster_stats - stats about the protein ortholog clusters for each species pair
- 08-off_main_tables - stats for each protein pair about those that are on the "major" macrosyntenic chromosome
For each pair of genomes (congeneric species), microsynteny and macrosynteny are both analysed.
The pipeline driver, run_synteny_analysis.py, is written in Python and is run as:
run_synteny_analysis.py -i species_pair_list.tab
For each species pair (the tuna pair is used as the example here), the pipeline starts from the scaffolds, proteins, and GFF annotation downloaded from NCBI:
GCF_910596095.1_fThuMac1.1_genomic.fna.gz
GCF_910596095.1_fThuMac1.1_genomic.gff.gz
GCF_910596095.1_fThuMac1.1_protein.faa.gz
GCF_914725855.1_fThuAlb1.1_genomic.fna.gz
GCF_914725855.1_fThuAlb1.1_genomic.gff.gz
GCF_914725855.1_fThuAlb1.1_protein.faa.gz
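The six input names above appear to follow one pattern: accession, assembly name, content type, extension. The following is a minimal sketch, not part of the pipeline, that splits such names and groups the three files belonging to each assembly. The regular expression, the function names, and the rule for deriving the short tag (dropping the trailing version number, so that fThuMac1.1 becomes fThuMac1) are assumptions inferred only from the example file names in this README.

```python
import re

# Assumed layout: <accession>_<assembly>_<genomic|protein>.<fna|gff|faa>.gz
# This pattern is inferred from the example names only, not from NCBI docs.
NCBI_NAME = re.compile(
    r"^(?P<accession>GC[AF]_\d+\.\d+)_(?P<assembly>.+)"
    r"_(?P<kind>genomic|protein)\.(?P<ext>fna|gff|faa)\.gz$"
)


def parse_ncbi_filename(filename):
    """Split one input file name into its named parts (hypothetical helper)."""
    match = NCBI_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    parts = match.groupdict()
    # Assumption: the short tag is the assembly name without its version suffix.
    parts["short_tag"] = re.sub(r"\.\d+$", "", parts["assembly"])
    return parts


def group_by_assembly(filenames):
    """Map (accession, assembly) to a dict of extension -> file name."""
    groups = {}
    for name in filenames:
        parts = parse_ncbi_filename(name)
        key = (parts["accession"], parts["assembly"])
        groups.setdefault(key, {})[parts["ext"]] = name
    return groups


if __name__ == "__main__":
    example = "GCF_910596095.1_fThuMac1.1_genomic.fna.gz"
    print(parse_ncbi_filename(example))
```

Grouping the files this way makes it easy to confirm that each species in a pair has all three inputs (fna, gff, faa) before a run is started.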
and this generates the following files for each species:
- get_genbank_longest_isoforms.py filtered proteins with isoforms removed (.x.faa), like: GCF_910596095.1_fThuMac1.1_protein.x.faa and GCF_914725855.1_fThuAlb1.1_protein.x.faa
- get_genbank_longest_isoforms.py filtered GFFs corresponding to the proteins (.x.gff), like: GCF_910596095.1_fThuMac1.1_genomic.x.gff and GCF_914725855.1_fThuAlb1.1_genomic.x.gff
- DIAMOND results: fThuAlb1_vs_fThuMac1.blastp.tab and fThuAlb1_vs_fThuMac1.renamed.blastp.tab
- scaffold_synteny.py results: fThuAlb1_vs_fThuMac1.scaffold_synteny.tab and fThuAlb1_vs_fThuMac1.scaffold_synteny.pdf
- microsynteny.py results: fThuAlb1_vs_fThuMac1.microsynteny.tab and fThuAlb1_vs_fThuMac1.microsynteny.pdf
- fastarenamer.py renamed versions of proteins for clustering (.x.n.faa), like: GCF_910596095.1_fThuMac1.1_protein.x.n.faa and GCF_914725855.1_fThuAlb1.1_protein.x.n.faa
- makehomologs.py clustering outputs: fasta_clusters.H.thunnus_clusters_v1.tab, clusters_thunnus_clusters_v1.tar.gz, and the log thunnus_clusters_v1.2023-08-02-010624.mh.log
- alignment_conserved_site_to_dots.py accumulated tabular output: fThuAlb1_vs_fThuMac1.homologs_identity.tab
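The pair-level outputs listed above share a common prefix of the form A_vs_B. As a rough sketch, assuming that this naming holds for every species pair, the following builds the expected file names for a pair and reports which are absent from a directory. The function names and the argument names first_tag and second_tag are hypothetical. The makehomologs.py outputs are left out because their names carry a run-specific label and timestamp.

```python
from pathlib import Path

# Suffixes copied from the pair-level files listed in this README.
PAIR_SUFFIXES = [
    ".blastp.tab",
    ".renamed.blastp.tab",
    ".scaffold_synteny.tab",
    ".scaffold_synteny.pdf",
    ".microsynteny.tab",
    ".microsynteny.pdf",
    ".homologs_identity.tab",
]


def expected_pair_outputs(first_tag, second_tag):
    """Return the expected pair-level file names (hypothetical helper)."""
    prefix = f"{first_tag}_vs_{second_tag}"
    return [prefix + suffix for suffix in PAIR_SUFFIXES]


def missing_outputs(directory, first_tag, second_tag):
    """List the expected files that are not present in the given directory."""
    base = Path(directory)
    return [
        name
        for name in expected_pair_outputs(first_tag, second_tag)
        if not (base / name).is_file()
    ]


if __name__ == "__main__":
    # Check the current directory for the tuna example pair.
    for name in missing_outputs(".", "fThuAlb1", "fThuMac1"):
        print("missing:", name)
```

A check like this can be run after the pipeline finishes, for instance alongside the log files, to spot pairs where a step did not produce its output.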
Subsequent processing occurs using several R scripts, for analysis and plotting.
Francis, Warren R., Sergio Vargas, and Gert Wörheide. 2024. “Genomic Changes Are Varied across Congeneric Species Pairs.” bioRxiv. https://doi.org/10.1101/2024.09.05.611358.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.