Releases: biocorecrg/ExOrthist

v2.0.0

05 Dec 15:02
83185dc
  • Migration to Nextflow DSL2 syntax
  • Creation of different local modules for each process
  • Placing of the two pipelines as two workflows
  • Split the main workflow into different subworkflows
  • Addition of schema validation
  • Simpler execution (the test dataset no longer needs a params file)
  • General execution test for the main pipeline with nf-test
  • Inclusion of nf-core profiles for different institutions into the configuration
  • Fix in the computation of the sequence similarity used to set the search window for intron correspondences

v.1.0.2

21 Jul 14:56

Enable connection with Zenodo.

v1.0.1

21 Jul 13:55

Minor changes:

  • Remove Singularity as the default running mode: the user must now specify the -with-singularity or -with-docker option to run either ExOrthist main.nf or exint_plotter.nf [Files modified: nextflow.config, exint_plotter/nextflow.config].
  • Fix version for AWS runs [Script modified: A0].
  • Small adaptations for Docker runs [Files modified: nextflow.config, main.nf].
  • Change colors in compare_exons_sets.pl graphical output [Script modified: plot_compare_exon_sets.R].
  • Updated documentation (including requirement for singularity version >= 3.2.1).

Extra files:

  • Upload Nextflow reports for the ExOrthist main.nf runs mentioned in the ExOrthist paper (Marquez et al, Genome Biology, 2021).

v1.0.0

16 Jun 22:42
5418141

Main module

  • Changes in intron phase handling. In previous versions, ExOrthist used the offset from the CDS lines (8th column) of the GTF file as representative of the intron phase. From this release onwards, we use the actual definition of intron phase (i.e. the nucleotide of the codon after which the intron is located). Moreover, phase 0 introns are now placed in the IPA before the amino acid residue, and phase 1 and 2 introns after the residue, to better reflect the coding meaning. IMPORTANT NOTE: these changes do not have a major impact on exon homology calling, but IPA alignments generated with previous versions are not valid from v1.0.0 onwards (i.e. they cannot be used with the --prevaln option). [Script modified: A1].
  • Improvements in the addition of non-annotated exons (--extraexons option): the insertion of the exon in transcripts between coding C1 and C2 exons is prioritized over the insertion between non-coding exons. [Script modified: A1].
  • Non-annotated exons (--extraexons option) can now be added only for a subset of species (previously: either all or none). [Script modified: main.nf].
  • Introduction of stricter cutoffs when deciding not to realign a pair of matching exons in process parse_IPA_prot_aln. To skip the realignment of a query exon and >= 2 target exons, there must now be a single best exon pair from another isoform with less than 30% gaps, more than 40% exon protein sequence similarity and an exon length ratio (shortest/longest) of at least 0.6. [Script modified: B1].
  • The file with the best matches (at the level of the target gene) for each overlapping group of exons is now saved in the output folder as filtered_best_scored_EX_matches_by_targetgene-NoOverlap.tab. [Script modified: main.nf].
  • Redundancy removal: if two variants of the same exon (overlap exon group) have two different exons from the same target gene as valid homologs, only one is selected. Priority is given to the exon associated with a bona fide exon variant (if provided) or, otherwise, to the representative variant of the query exon overlap group. [Scripts modified: C3 and C5].
  • Addition of time and version information in the run_info.log file. [Script modified: A0].
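
The stricter realignment cutoffs described above can be illustrated with a minimal Python sketch. It is not the code of script B1: the ExonPair record, its field names and the can_skip_realignment helper are hypothetical, and reading "a single best exon pair" as "exactly one candidate pair" is an assumption. Only the three thresholds come from the release note.

```python
from dataclasses import dataclass

# Thresholds stated in the release note.
MAX_GAP_FRACTION = 0.30   # less than 30% gaps
MIN_SIMILARITY = 0.40     # more than 40% exon protein sequence similarity
MIN_LENGTH_RATIO = 0.60   # shortest/longest exon length of at least 0.6


@dataclass
class ExonPair:
    """Hypothetical record for a query/target exon pair from another isoform."""
    gap_fraction: float   # fraction of gapped alignment positions (0-1)
    similarity: float     # exon protein sequence similarity (0-1)
    query_len: int        # query exon length
    target_len: int       # target exon length


def length_ratio(pair):
    """Return shortest/longest exon length (0.0 if both lengths are zero)."""
    shortest = min(pair.query_len, pair.target_len)
    longest = max(pair.query_len, pair.target_len)
    return shortest / longest if longest else 0.0


def passes_cutoffs(pair):
    """Check the three cutoffs from the release note on one exon pair."""
    return (pair.gap_fraction < MAX_GAP_FRACTION
            and pair.similarity > MIN_SIMILARITY
            and length_ratio(pair) >= MIN_LENGTH_RATIO)


def can_skip_realignment(best_pairs):
    """Assumed reading: skip only if there is exactly one best pair and it passes."""
    return len(best_pairs) == 1 and passes_cutoffs(best_pairs[0])
```

In this sketch, any ambiguity (no best pair, or more than one) falls back to performing the realignment, which is the conservative choice.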

Exint plotter module

  • Addition of a test set for the exint_plotter.nf module.
  • Introduction of additional information in the legend of the exint plots.

Compare exon sets module

  • Changes in the call of not-orthologous exon regulation: it assigns the non-conserved label (instead of best_hit) to the query exon if the best hit of the target exon is in an orthogroup with a different query exon from the same query gene.
  • Changes in the statistics provided as text output: the pairwise comparisons between regulated exons in orthologous genes are now separately reported for each query species and based on the total number of query regulated exons, not pairwise comparisons.
  • Introduction of a graphical output when the module is run for two exon sets (see README).

Others

  • Updated documentation.
  • Uploading of pre-computed IPA protein alignments generated for all the species pairs in a human (hg38), mouse (mm10), zebrafish (danRer11) and fruitfly (dm6) genome-wide ExOrthist main.nf run. These pairwise alignments can be used for a new main module run with the --prevaln option, so that the alignment step is skipped for all specified species pairs. The pre-computed IPA alignments can be downloaded from the --prevaln section in the README, and more will be added to the GitHub repository in the near future.
  • Introduction of the retrieve_IPA_aln.pl script, to more easily isolate and visualize the best protein alignment between a pair of (query-target) exons.
  • Addition of a maximum length ratio filter to select liftOver hits in get_liftovers.pl.

v0.1.0

03 Mar 18:48
b2bac89

New features

  • compare_exon_sets module is now available.
  • get_cluster_stats.pl script is now available.
  • params.config (main.nf): addition of an "orthogroupnum" parameter specifying the number of orthogroups to be jointly evaluated in a single instance of the cluster_EXs process, reducing the number of required jobs.
  • Process check_input (main.nf): addition of a check for geneIDs in input files (i.e. warnings are raised for non-coding genes included in the gene orthogroups).
  • GetLiftOverFile.pl: addition of an option to filter by SS dinucleotide.
  • Addition of a test set for a limited number of selected gene orthogroups in 3 mammalian species (human, mouse and cow).
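
As a rough illustration of what the "orthogroupnum" parameter controls, the Python sketch below groups orthogroup IDs into batches so that each batch could be handled by one job. The batch_orthogroups function and the example IDs are hypothetical; the actual batching is done inside the Nextflow pipeline and may differ.

```python
def batch_orthogroups(orthogroup_ids, orthogroupnum):
    """Split a list of orthogroup IDs into consecutive batches.

    orthogroupnum mirrors the parameter named in the release note: the
    number of orthogroups evaluated together in a single job.
    """
    if orthogroupnum < 1:
        raise ValueError("orthogroupnum must be a positive integer")
    return [
        orthogroup_ids[start:start + orthogroupnum]
        for start in range(0, len(orthogroup_ids), orthogroupnum)
    ]


# Hypothetical IDs: five orthogroups in batches of two give three jobs.
example_batches = batch_orthogroups(["GF1", "GF2", "GF3", "GF4", "GF5"], 2)
```

A larger value means fewer, longer jobs; a smaller value means more, shorter ones.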

Changes to the main.nf algorithm

  • Process parse_IPA_prot_aln: addition of an initial filter to avoid comparing identical protein isoform pairs.
  • Process parse_IPA_prot_aln: modification of the sliding window to search for intron conservation depending on the number of gaps in the region surrounding the intron.
  • Process parse_IPA_prot_aln: addition of gap length correction on left and right side of the alignment surrounding an intron when evaluating intron conservation.
  • Process parse_IPA_prot_aln: addition of a filter to ensure that I1 and I2 (the two introns surrounding the evaluated exon) are indeed consecutive.
  • Process parse_IPA_prot_aln: addition of a requirement for valid single exon matches to have >50% non-gapped alignments.
  • Process parse_IPA_prot_aln: correction of cases in which aligned exons with 0% similarity were considered not aligned.
  • Process parse_IPA_prot_aln: hits of internal (query) exons against first/last exons in the target isoform are no longer considered valid.
  • Process score_EX_matches: adjustment of the scoring when evaluating homology of N- and C-terminal exons.
  • Process score_EX_matches: the score of the evaluated exon is now set to -1 when no exon alignment is detected in the corresponding target isoform.
  • Process filter_and_select_best_EX_matches_by_targetgene: microexons (<=3 amino acids) in both species automatically pass the sequence similarity filter.
  • Process filter_and_select_best_EX_matches_by_targetgene: inversion of the logic used in the selection of the best target-gene hit. First, we filter based on the scores of the single features, then we select the best hit per gene (prioritizing the filtered ones).
  • Process cluster_EXs: addition of an extra output including the exons excluded from the clustering algorithm.
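
The "filter first, then select the best hit per gene" logic and the microexon exception described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline's implementation: the Hit record, its fields, the MIN_SIMILARITY value and the single other_features_ok flag (standing in for all other per-feature checks) are hypothetical. Only the <=3 amino acid microexon definition and the order of operations come from the release notes.

```python
from dataclasses import dataclass

MICROEXON_MAX_AA = 3   # from the release note: microexons are <= 3 amino acids
MIN_SIMILARITY = 0.2   # hypothetical cutoff, for illustration only


@dataclass
class Hit:
    """Hypothetical record for one query-exon hit in a target gene."""
    target_gene: str
    target_exon: str
    score: float             # overall match score
    similarity: float        # exon sequence similarity (0-1)
    query_len_aa: int
    target_len_aa: int
    other_features_ok: bool = True   # stand-in for the other feature filters


def passes_filters(hit):
    """Apply the per-feature filters; microexons in both species skip similarity."""
    both_micro = (hit.query_len_aa <= MICROEXON_MAX_AA
                  and hit.target_len_aa <= MICROEXON_MAX_AA)
    similarity_ok = both_micro or hit.similarity >= MIN_SIMILARITY
    return similarity_ok and hit.other_features_ok


def select_best_per_gene(hits):
    """Pick one hit per target gene, preferring filtered hits, then higher score."""
    best = {}
    for hit in hits:
        key = (passes_filters(hit), hit.score)   # True sorts above False
        current = best.get(hit.target_gene)
        if current is None or key > current[0]:
            best[hit.target_gene] = (key, hit)
    return {gene: hit for gene, (_, hit) in best.items()}
```

Ranking on the pair (passes filters, score) means a lower-scoring hit that passes every filter is preferred over a higher-scoring one that fails, while a gene with no passing hit still reports its best-scoring candidate.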

Others

  • Updated documentation.
  • params.config (main.nf): renaming of the "liftover" variable to "bonafide_pairs".
  • params.config (exint_plotter): change of the "isoformID" parameter from transcript ID to protein ID.
  • Various bug corrections and fixes.

v0.0.1.beta

05 Nov 14:57
d4288fd

First public release.

A few secondary modules are not yet available/ready:

  • compare_exons (CompareExonSets.pl).
  • get cluster statistics (GetStatsExonsClusters.pl).