Enhanced prediction of orphan genes in assembled genomes

See overview and documentation:

Enhanced prediction of orphan genes in assembled genomes

Gene prediction and optimization using BIND and MIND workflows:

MIND: ab initio gene predictions by MAKER combined with gene predictions INferred Directly from alignment of RNA-Seq evidence to the genome.
BIND: ab initio gene predictions by BRAKER combined with gene predictions INferred Directly from alignment of RNA-Seq evidence to the genome.

1. Find an Orphan-Enriched RNA-Seq dataset from NCBI-SRA (See details here):

Search NCBI for RNA-Seq datasets from your organism and filter the Runs (SRR) for Illumina, paired-end data from HiSeq 2500 or newer instruments.
Download the Runs from NCBI with the SRA Toolkit.
If an existing annotation is available, quantify the expression of every gene in every SRR with Kallisto.
Run phylostratr on the current gene models to infer the phylostratum of each gene model.
Rank the SRRs by the number of expressed orphan genes and select as many of the top-ranked runs as is feasible to process.
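
The ranking step can be sketched in a few lines of Python. This is a minimal illustration, not the repository's own script: it assumes the per-run TPM values (for example from Kallisto) and the orphan gene IDs (for example from phylostratr) have already been loaded into plain dictionaries and sets. The function names, the 1.0 TPM cutoff, and the toy values are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def count_expressed_orphans(tpm_by_gene: Dict[str, float],
                            orphan_ids: Set[str],
                            min_tpm: float = 1.0) -> int:
    """Count orphan genes whose TPM in one run reaches min_tpm."""
    return sum(1 for gene, tpm in tpm_by_gene.items()
               if gene in orphan_ids and tpm >= min_tpm)


def rank_runs(tpm_by_run: Dict[str, Dict[str, float]],
              orphan_ids: Set[str],
              min_tpm: float = 1.0) -> List[Tuple[str, int]]:
    """Return (run, expressed-orphan count) pairs, highest count first."""
    counts = [(run, count_expressed_orphans(tpm, orphan_ids, min_tpm))
              for run, tpm in tpm_by_run.items()]
    # Sort by descending count; break ties by run name for stable output.
    return sorted(counts, key=lambda item: (-item[1], item[0]))


# Toy, made-up values purely for illustration (hypothetical run and gene IDs).
orphans = {"g2", "g3", "g4"}
runs = {
    "SRR_A": {"g1": 50.0, "g2": 0.2, "g3": 3.1, "g4": 0.0},
    "SRR_B": {"g1": 12.0, "g2": 4.0, "g3": 1.5, "g4": 2.2},
    "SRR_C": {"g1": 0.0, "g2": 0.0, "g3": 0.9, "g4": 7.0},
}
ranking = rank_runs(runs, orphans)
print(ranking)
```

The expression cutoff is a judgment call; a stricter threshold reduces the chance of counting background noise as orphan expression but may miss lowly expressed orphans.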

Note: If NCBI-SRA has no samples for your organism, and you are relying solely on RNA-Seq that you generate yourself, best practice is to maximize representation of all genes by including conditions like reproductive tissues and stresses in which orphan gene expression is high.

2. Ab initio gene prediction:

Pick one of the two ab initio prediction methods below:

Run BRAKER (See details here):
- Align RNA-Seq with a splice-aware aligner (STAR or HISAT2 preferred, HISAT2 used here)
- Generate a BAM file for each SRA-SRR id, then merge them into a single sorted BAM file
- Run BRAKER
Run MAKER (See details here):
- Align RNA-Seq with a splice-aware aligner (STAR or HISAT2 preferred, HISAT2 used here)
- Generate a BAM file for each SRA-SRR id, then merge them into a single sorted BAM file
- Run Trinity to generate a transcriptome assembly from the BAM file
- Run TransDecoder on the Trinity transcripts to predict ORFs and translate them to protein
- Run MAKER in homology-only mode with transcripts (Trinity) and proteins (TransDecoder and SwissProt)
- Use the MAKER predictions to train SNAP and AUGUSTUS. Self-train GeneMark
- Run a second round of MAKER with the above ab initio predictors (SNAP, AUGUSTUS, and GeneMark) plus the results from the previous MAKER round.
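
As a rough sketch of the BRAKER route, the snippet below only builds and prints the command lines for alignment, sorting, merging, and BRAKER; it does not run them. The file names, index name, species label, and thread count are hypothetical placeholders, and the options shown are a minimal subset that may need adjusting for your installed versions.

```python
import shlex
from typing import List


def hisat2_cmd(index: str, r1: str, r2: str, sam: str, threads: int) -> List[str]:
    """HISAT2 paired-end alignment writing SAM output."""
    return ["hisat2", "-p", str(threads), "-x", index,
            "-1", r1, "-2", r2, "-S", sam]


def sort_cmd(sam: str, bam: str, threads: int) -> List[str]:
    """Coordinate-sort a SAM file into BAM with samtools."""
    return ["samtools", "sort", "-@", str(threads), "-o", bam, sam]


def merge_cmd(merged: str, bams: List[str], threads: int) -> List[str]:
    """Merge the per-run sorted BAM files into one BAM."""
    return ["samtools", "merge", "-@", str(threads), merged, *bams]


def braker_cmd(genome: str, bam: str, species: str, threads: int) -> List[str]:
    """BRAKER run driven by the merged RNA-Seq alignments."""
    return ["braker.pl", f"--genome={genome}", f"--bam={bam}",
            f"--species={species}", f"--cores={threads}"]


def braker_plan(run_ids: List[str], index: str, genome: str,
                species: str, threads: int = 4) -> List[List[str]]:
    """Ordered list of commands: align and sort each run, merge, run BRAKER."""
    plan, bams = [], []
    for run in run_ids:
        sam, bam = f"{run}.sam", f"{run}.sorted.bam"
        plan.append(hisat2_cmd(index, f"{run}_1.fastq", f"{run}_2.fastq", sam, threads))
        plan.append(sort_cmd(sam, bam, threads))
        bams.append(bam)
    plan.append(merge_cmd("merged.bam", bams, threads))
    plan.append(braker_cmd(genome, "merged.bam", species, threads))
    return plan


# Hypothetical run IDs and file names, for illustration only.
plan = braker_plan(["SRR_A", "SRR_B"], "genome_index", "genome.fa", "my_species")
for cmd in plan:
    print(shlex.join(cmd))
```

Building the commands as lists keeps quoting safe; each list could be passed to `subprocess.run(cmd, check=True)` once the paths are real.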

3. Direct Inference evidence-based predictions (See details here):

We provide an automated pipeline for evidence-based predictions (See details here)

Align RNA-Seq with a splice-aware aligner (STAR or HISAT2 preferred, HISAT2 used here)
Generate a BAM file for each SRA-SRR id
For each BAM file, use multiple transcript assemblers for genome-guided transcript assembly:
- CLASS2
- StringTie
- Cufflinks
Run Portcullis to remove invalid splice junctions
Consolidate transcripts and generate a non-redundant set of transcripts using Mikado.
Predict ORFs on these consolidated transcripts using TransDecoder
Pick the best transcripts using all of the above information with Mikado pick.
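
The per-BAM assembly step could look roughly like the sketch below, which only constructs the three assembler command lines for one BAM file. The output naming scheme is hypothetical, and the CLASS2 invocation (`run_class.pl -a ... -o ... -p ...`) is written from memory of its README, so treat it as an assumption and check it against your installed version.

```python
from pathlib import Path
from typing import Dict, List


def assembler_commands(bam: str, outdir: str, threads: int = 4) -> Dict[str, List[str]]:
    """Build genome-guided assembly commands for one sorted BAM file."""
    sample = Path(bam).stem          # e.g. "SRR_A" from "SRR_A.bam"
    out = Path(outdir)
    return {
        # StringTie writes a single GTF named with -o.
        "stringtie": ["stringtie", bam,
                      "-o", (out / f"{sample}.stringtie.gtf").as_posix(),
                      "-p", str(threads)],
        # Cufflinks writes transcripts.gtf inside the directory given by -o.
        "cufflinks": ["cufflinks",
                      "-o", (out / f"{sample}.cufflinks").as_posix(),
                      "-p", str(threads), bam],
        # Assumed CLASS2 wrapper invocation; verify flags locally.
        "class2": ["run_class.pl", "-a", bam,
                   "-o", (out / f"{sample}.class2.gtf").as_posix(),
                   "-p", str(threads)],
    }


# Hypothetical input, for illustration only.
commands = assembler_commands("SRR_A.bam", "assemblies")
for name, cmd in commands.items():
    print(name, " ".join(cmd))
```

Running several assemblers on the same alignments gives Mikado alternative models for each locus to choose from in the consolidation step.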

4. Combine ab initio and Direct Inference evidence-based predictions:

If you ran BRAKER in step 2, run 4.1

4.1 Merge BRAKER with Direct Inference (BIND) (See details here):

Use Mikado to combine BRAKER-generated predictions with Direct Inference evidence-based predictions.

If you ran MAKER in step 2, run 4.2

4.2 Merge MAKER with Direct Inference (MIND) (See details here):

Use Mikado to combine MAKER-generated predictions with Direct Inference evidence-based predictions.
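
Mikado takes its inputs as a tab-separated list file (path, label, strandedness) passed to `mikado configure --list`. The sketch below formats such rows for a BIND-style merge. The file paths and labels are hypothetical, and marking both inputs as stranded is an assumption based on gene models carrying strand information.

```python
from typing import Iterable, List, Tuple


def mikado_list_lines(inputs: Iterable[Tuple[str, str, bool]]) -> List[str]:
    """Format (path, label, stranded) triples as tab-separated list-file rows."""
    lines: List[str] = []
    seen_labels = set()
    for path, label, stranded in inputs:
        # Each input needs its own label so its transcripts stay distinguishable.
        if label in seen_labels:
            raise ValueError(f"duplicate label: {label}")
        seen_labels.add(label)
        lines.append(f"{path}\t{label}\t{stranded}")
    return lines


# Hypothetical file names: ab initio models plus Direct Inference models.
bind_inputs = [
    ("braker_predictions.gff3", "braker", True),
    ("direct_inference.gff3", "di", True),
]
bind_lines = mikado_list_lines(bind_inputs)
print("\n".join(bind_lines))
```

For MIND, the same structure applies with the MAKER GFF3 in place of the BRAKER one.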

5. Evaluate your predictions (See details here):

Run BUSCO to see how well the conserved genes are represented in your final predictions
Run OrthoFinder to find and annotate orthologs present in your predictions
Run phylostratr to find orphan genes in your predictions
Add functional annotation to your genes using homology and InterProScan
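
To compare prediction sets, it can help to pull the BUSCO percentages into a structured form. The parser below is a small sketch that assumes the one-line summary format BUSCO prints in its short summary (`C:..%[S:..%,D:..%],F:..%,M:..%,n:..`). The sample line is made up for illustration and is not a result from this project.

```python
import re
from typing import Dict, Union

BUSCO_LINE = re.compile(
    r"C:(?P<complete>[\d.]+)%"
    r"\[S:(?P<single>[\d.]+)%,D:(?P<duplicated>[\d.]+)%\],"
    r"F:(?P<fragmented>[\d.]+)%,"
    r"M:(?P<missing>[\d.]+)%,"
    r"n:(?P<total>\d+)"
)


def parse_busco_summary(text: str) -> Dict[str, Union[float, int]]:
    """Extract BUSCO percentages and the total BUSCO count from summary text."""
    match = BUSCO_LINE.search(text)
    if match is None:
        raise ValueError("no BUSCO summary line found")
    fields = match.groupdict()
    result: Dict[str, Union[float, int]] = {
        key: float(value) for key, value in fields.items() if key != "total"
    }
    result["total"] = int(fields["total"])
    return result


# Made-up example line, not a real result.
sample = "\tC:95.0%[S:90.0%,D:5.0%],F:2.0%,M:3.0%,n:1440"
summary = parse_busco_summary(sample)
print(summary)
```

A high duplicated fraction after merging can indicate redundant gene models, so it is worth tracking alongside completeness.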

Prediction tools include:

Tool	Purpose
SRA Tools (v. 2.9.6)	SRA access
HISAT2 (v. 2.2.0)	Alignment
STAR (v. 2.7.7a)	Alignment
Kallisto (v. 0.46.2)	Quantification
Samtools (v. 1.10)	Tools
CLASS2 (v. 2.1.7)	Transcript Assembly
StringTie (v. 1.3.3)	Transcript Assembly
Cufflinks (v. 2.2.1)	Transcript Assembly
Trinity (v. 2.6.6)	Transcript Assembly
Portcullis (v. 1.2.2)	Tools
TransDecoder (v. 3.0.1)	CDS prediction
Mikado (v. 2.0)	Direct Inference prediction
phylostratr (v. 0.2.0)	Phylostratigraphy
BLAST (v. 3.11.0)	Tools
BRAKER (v. 2.1.2)	Ab initio prediction
MAKER (v. 2.31.10)	Ab initio prediction
GMAP-GSNAP (v. 2019-05-12)	Alignment
GeneMark (v. 4.83)	Ab initio prediction

eswlab/orphan-prediction