See overview and documentation:
MIND: ab initio gene predictions by MAKER combined with gene predictions INferred Directly from alignment of RNA-Seq evidence to the genome.

BIND: ab initio gene predictions by BRAKER combined with gene predictions INferred Directly from alignment of RNA-Seq evidence to the genome.
1. Find an Orphan-Enriched RNA-Seq dataset from NCBI-SRA (See details here):
- Search NCBI for RNA-Seq datasets for your organism, and filter the Runs (SRR) for Illumina, paired-end data from HiSeq 2500 or newer instruments.
- Download the Runs from NCBI with the SRA Toolkit.
- If an existing annotation is available, quantify the expression of every gene in every SRR with Kallisto.
- Run phylostratr on the current gene models to infer the phylostratum of each gene model.
- Rank the SRRs by the number of expressed orphans, and select the top-ranked runs, keeping the total amount of data feasible to work with.
Note: If NCBI-SRA has no samples for your organism, and you are relying solely on RNA-Seq that you generate yourself, best practice is to maximize representation of all genes by including conditions like reproductive tissues and stresses in which orphan gene expression is high.
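The ranking step above can be sketched as follows. This is a minimal illustration, not the pipeline's own script: it assumes the Kallisto TPM values have already been collected into a nested dict keyed by run, and that the orphan gene IDs come from the phylostratr results. The function name `rank_runs_by_orphans`, the TPM cutoff of 1.0, and all run and gene identifiers are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def rank_runs_by_orphans(
    tpm_by_run: Dict[str, Dict[str, float]],
    orphan_ids: Set[str],
    min_tpm: float = 1.0,  # assumed expression cutoff; tune for your data
) -> List[Tuple[str, int]]:
    """Return (run, expressed_orphan_count) pairs, highest count first."""
    counts = []
    for run, tpm in tpm_by_run.items():
        # An orphan counts as expressed if its TPM reaches the cutoff.
        n_expressed = sum(
            1 for gene in orphan_ids if tpm.get(gene, 0.0) >= min_tpm
        )
        counts.append((run, n_expressed))
    # Sort by count (descending), then by run ID to break ties predictably.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


# Toy data with hypothetical run and gene identifiers.
example_tpm = {
    "SRR0000001": {"geneA": 5.2, "geneB": 0.0, "geneC": 12.0},
    "SRR0000002": {"geneA": 0.3, "geneB": 2.1, "geneC": 0.0},
    "SRR0000003": {"geneA": 1.5, "geneB": 4.0, "geneC": 0.9},
}
example_orphans = {"geneA", "geneB"}

ranking = rank_runs_by_orphans(example_tpm, example_orphans)
for run, n_orphans in ranking:
    print(run, n_orphans)
```

Taking the first few entries of `ranking` gives the runs to carry forward; how many to keep depends on the compute and storage you have available.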
2. Pick one of the two ab initio predictions below:
- Run BRAKER (See details here):
  - Align RNA-Seq with a splice-aware aligner (STAR or HiSat2 preferred, HiSat2 used here)
  - Generate a BAM file for each SRA-SRR id, then merge them into a single sorted BAM file
  - Run BRAKER
- Run MAKER (See details here):
  - Align RNA-Seq with a splice-aware aligner (STAR or HiSat2 preferred, HiSat2 used here)
  - Generate a BAM file for each SRA-SRR id, then merge them into a single sorted BAM file
  - Run Trinity to generate a transcriptome assembly from the BAM file
  - Run TransDecoder on the Trinity transcripts to predict ORFs and translate them to protein
  - Run MAKER in homology-only mode with transcripts (Trinity) and proteins (TransDecoder and SwissProt)
  - Use the MAKER predictions to train SNAP and AUGUSTUS, and self-train GeneMark
  - Run a second round of MAKER with the above ab initio predictions (SNAP, AUGUSTUS, and GeneMark) plus the results from the previous MAKER round.
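The align, sort, and merge steps shared by both options can be sketched as a small command builder. This is an illustration under stated assumptions rather than the pipeline's own script: the HISAT2 index name, the `<run>_1.fastq` / `<run>_2.fastq` read naming, the output file names, and the function name `alignment_commands` are all hypothetical. The commands are only printed here, not executed.

```python
import shlex
from typing import List


def alignment_commands(
    index: str, runs: List[str], merged_bam: str, threads: int = 4
) -> List[List[str]]:
    """Build HISAT2 and samtools commands for a list of paired-end runs."""
    commands = []
    sorted_bams = []
    for run in runs:
        sam = f"{run}.sam"
        bam = f"{run}.sorted.bam"
        # Align one paired-end run against the genome index.
        commands.append([
            "hisat2", "-p", str(threads), "-x", index,
            "-1", f"{run}_1.fastq", "-2", f"{run}_2.fastq", "-S", sam,
        ])
        # Coordinate-sort the alignments into a per-run BAM file.
        commands.append(["samtools", "sort", "-@", str(threads), "-o", bam, sam])
        sorted_bams.append(bam)
    # Merge the sorted per-run BAM files into a single BAM file.
    commands.append(
        ["samtools", "merge", "-@", str(threads), merged_bam] + sorted_bams
    )
    return commands


commands = alignment_commands(
    "genome_index", ["SRR0000001", "SRR0000002"], "merged.bam"
)
for cmd in commands:
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the paths match your data. Building argument lists instead of shell strings avoids quoting problems with unusual file names.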
3. Direct Inference evidence-based predictions (See details here):
We provide an automated pipeline for evidence-based predictions (See details here)
- Align RNA-Seq with a splice-aware aligner (STAR or HiSat2 preferred, HiSat2 used here)
- Generate a BAM file for each SRA-SRR id
- For each BAM file, use multiple transcript assemblers for genome-guided transcript assembly:
  - CLASS2
  - StringTie
  - Cufflinks
- Run Portcullis to remove invalid splice junctions
- Consolidate the transcripts and generate a non-redundant set of transcripts using Mikado.
- Predict ORFs on these consolidated transcripts using TransDecoder
- Pick the best transcripts using all the above information with Mikado pick.
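Because every BAM file is assembled by three tools, it helps to lay out the jobs and their output files before running anything. The sketch below only plans file names and builds no assembler commands. The output directory, the `<sample>.<assembler>.gtf` naming scheme, and the function names are assumptions for illustration, not conventions of the pipeline or of Mikado.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Assembler labels taken from the list above; lowercase is an arbitrary choice.
ASSEMBLERS = ("class2", "stringtie", "cufflinks")


def plan_assembly_jobs(
    bam_files: List[str], outdir: str = "assemblies"
) -> List[Dict[str, str]]:
    """Return one job record per (BAM file, assembler) pair."""
    jobs = []
    for bam in bam_files:
        sample = PurePosixPath(bam).name
        if sample.endswith(".bam"):
            sample = sample[: -len(".bam")]
        for assembler in ASSEMBLERS:
            jobs.append({
                "bam": bam,
                "assembler": assembler,
                "gtf": f"{outdir}/{sample}.{assembler}.gtf",
            })
    return jobs


def gtfs_by_assembler(jobs: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group the planned GTF paths by assembler for the consolidation step."""
    grouped: Dict[str, List[str]] = {name: [] for name in ASSEMBLERS}
    for job in jobs:
        grouped[job["assembler"]].append(job["gtf"])
    return grouped


jobs = plan_assembly_jobs(["bam/SRR0000001.bam", "bam/SRR0000002.bam"])
grouped = gtfs_by_assembler(jobs)
for assembler, gtfs in grouped.items():
    print(assembler, gtfs)
```

With two BAM files this yields six jobs. The grouped paths are the set of assemblies that would then be handed to Mikado for consolidation.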
If you ran BRAKER in step 2, run 4.1:
- 4.1 Merge BRAKER with Direct Inference (BIND) (See details here):
  - Use Mikado to combine BRAKER-generated predictions with Direct Inference evidence-based predictions.
If you ran MAKER in step 2, run 4.2:
- 4.2 Merge MAKER with Direct Inference (MIND) (See details here):
  - Use Mikado to combine MAKER-generated predictions with Direct Inference evidence-based predictions.
5. Evaluate your predictions (See details here):
- Run BUSCO to see how well the conserved genes are represented in your final predictions
- Run OrthoFinder to find and annotate orthologs present in your predictions
- Run phylostratr to find orphan genes in your predictions
- Add functional annotation to your genes using homology and InterProScan
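When comparing several prediction sets, it is convenient to read the BUSCO scores programmatically. The sketch below parses the one-line score string that BUSCO writes in its short summary, assuming the common `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` layout. The exact layout can differ between BUSCO versions, so check it against your own output. The function name `parse_busco_summary` and the example numbers are made up for illustration.

```python
import re
from typing import Dict, Optional

# Matches the assumed layout, e.g. C:89.0%[S:85.8%,D:3.2%],F:6.9%,M:4.1%,n:3023
BUSCO_LINE = re.compile(
    r"C:(?P<complete>[\d.]+)%"
    r"\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,"
    r"M:(?P<missing>[\d.]+)%,"
    r"n:(?P<total>\d+)"
)


def parse_busco_summary(text: str) -> Optional[Dict[str, float]]:
    """Extract BUSCO percentages and the gene count, or None if not found."""
    match = BUSCO_LINE.search(text)
    if match is None:
        return None
    scores = {key: float(value) for key, value in match.groupdict().items()}
    scores["total"] = int(match.group("total"))  # n is a count, not a percent
    return scores


# Made-up example line, not a result from this pipeline.
example_line = "\tC:89.0%[S:85.8%,D:3.2%],F:6.9%,M:4.1%,n:3023\t"
scores = parse_busco_summary(example_line)
print(scores)
```

Collecting these dicts for the BIND and MIND outputs makes it straightforward to tabulate completeness side by side.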
Tool | Purpose |
---|---|
SRA Tools (v. 2.9.6) | SRA access |
Hisat2 (v. 2.2.0) | Alignment |
STAR (v. 2.7.7a) | Alignment |
Kallisto (v. 0.46.2) | Quantification |
Samtools (v. 1.10) | SAM/BAM processing |
CLASS2 (v. 2.1.7) | Transcript Assembly |
Stringtie (v. 1.3.3) | Transcript Assembly |
Cufflinks (v. 2.2.1) | Transcript Assembly |
Trinity (v. 2.6.6) | Transcript Assembly |
Portcullis (v. 1.2.2) | Splice junction filtering |
Transdecoder (v. 3.0.1) | CDS prediction |
Mikado (v. 2.0) | Direct Inference prediction |
Phylostratr (v. 0.2.0) | Phylostratigraphy |
BLAST (v. 3.11.0) | Sequence similarity search |
Braker (v. 2.1.2) | Ab initio prediction |
Maker (v. 2.31.10) | Ab initio prediction |
GMAP-GSNAP (v. 2019-05-12) | Alignment |
GeneMark (v. 4.83) | Ab initio prediction |