
See overview and documentation:

MIND Prediction

Merge the gene predictions of MAKER (maker-final.gff3) with the gene predictions INferred Directly (DI-final.gff3).

Note: For details on generating these two predictions, see MAKER and DirectInf.

Consolidate all transcripts from maker-final.gff3 and DI-final.gff3, and predict potential protein-coding sequences with Mikado:
1. Create a configuration file and prepare the transcripts
  
  Prepare a list_MIND.txt like the one below, giving the annotation file path (1st column), a short label for that file (2nd column), and whether it is strand-specific (3rd column):
```
maker-final.gff3    mk    False
DI-final.gff3 DI     False
```
  Then run the script as below:
```
./01_runMikado_round1.sh TAIR10_chr_all.fas junctions.bed list_MIND.txt MIND
```
  This generates MIND_prepared.fasta, which is used for ORF prediction in the next step.
  
  Note: junctions.bed is the same file generated in the DirectInf step.
2. Predict potential CDS from transcripts:
```
./02_runTransDecoder.sh MIND_prepared.fasta
```
  MIND_prepared.fasta.transdecoder.bed is used in the next step.
  
  Note: Only complete CDSs are kept for the next step. Edit 02_runTransDecoder.sh if you need to use both incomplete and complete CDSs.
3. Pick the best transcripts for each locus and annotate them as genes:
```
./03_runMikado_round2.sh MIND_prepared.fasta.transdecoder.bed MIND
```
  This will generate:
```
mikado.metrics.tsv
mikado.scores.tsv
MIND.loci.gff3
```
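
For step 1, the list file is the only input written by hand, so a quick format check can save a failed run. The sketch below assumes the columns should be separated by single tabs. `make_list` and `check_list` are hypothetical helpers, not part of this repository.

```shell
#!/usr/bin/env bash
# Sketch only: write list_MIND.txt with tab-separated columns, then check its shape.
# make_list and check_list are hypothetical helper names used for illustration.

make_list() {
    # $1 = output path; printf reuses the format for each group of three arguments
    printf '%s\t%s\t%s\n' \
        maker-final.gff3 mk False \
        DI-final.gff3    DI False > "$1"
}

check_list() {
    # Succeeds only if every non-empty line has exactly three tab-separated fields
    awk -F'\t' 'NF && NF != 3 { bad = 1 } END { exit bad }' "$1"
}

make_list list_MIND.txt
check_list list_MIND.txt && echo "list_MIND.txt looks well-formed"
```

If `check_list` fails, look for columns separated by spaces rather than tabs.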

Optional: Filter out transcripts with redundant CDS:

```
./04_rm_redundance.sh MIND.loci.gff3 TAIR10_chr_all.fas
```
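
To illustrate what "redundant CDS" can mean, the sketch below keeps only the first record for each distinct sequence in a toy CDS FASTA. It is an assumption about the general idea, not a description of what 04_rm_redundance.sh actually does. The file names, IDs, sequences, and the `dedup_cds` helper are all hypothetical.

```shell
#!/usr/bin/env bash
# Toy CDS FASTA (hypothetical IDs and sequences): t2 carries the same sequence
# as t1, only wrapped across two lines.
cat > toy.cds.fa <<'EOF'
>t1
ATGAAATAG
>t2
ATGAAA
TAG
>t3
ATGCCCTAG
EOF

# Hypothetical helper: join wrapped sequence lines, then print only the first
# record seen for each distinct sequence.
dedup_cds() {
    awk '
        function emit() {
            if (!(seq in seen)) { seen[seq] = 1; print id; print seq }
        }
        /^>/ { if (id != "") emit(); id = $0; seq = ""; next }
             { seq = seq $0 }
        END  { if (id != "") emit() }
    ' "$1"
}

dedup_cds toy.cds.fa > toy.cds.nr.fa
grep '^>' toy.cds.nr.fa   # lists the records that were kept
```

The comparison is on exact sequence identity, so transcripts differing only in their UTRs collapse to one record, while a CDS that is merely a fragment of another is kept.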

Optional: Filter out transcripts whose predicted proteins map to transposable elements:
```
./05_TEsorter.sh filter.pep.fa MIND.loci.gff3
```
Note: filter.pep.fa is an output of the previous step, which removes redundant CDSs. If you do not want to remove redundant CDSs, you can use all protein sequences instead.
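
The filtering idea itself can be sketched on a toy GFF3. The sketch assumes you already have a file of transcript IDs flagged as TE-related, here a hypothetical te_ids.txt with one ID per line; deriving it from the TEsorter output is not shown. It is not the implementation in 05_TEsorter.sh, and `drop_transcripts` and all IDs are made up for illustration.

```shell
#!/usr/bin/env bash
# Toy inputs with hypothetical IDs; real IDs come from your own MIND.loci.gff3.
printf '%s\n' 'tx2' > te_ids.txt
{
    printf '##gff-version 3\n'
    printf 'Chr1\tMIND\tmRNA\t1\t900\t.\t+\t.\tID=tx1;Parent=gene1\n'
    printf 'Chr1\tMIND\texon\t1\t900\t.\t+\t.\tID=tx1.exon1;Parent=tx1\n'
    printf 'Chr1\tMIND\tmRNA\t2000\t2900\t.\t+\t.\tID=tx2;Parent=gene2\n'
    printf 'Chr1\tMIND\texon\t2000\t2900\t.\t+\t.\tParent=tx2\n'
} > toy.loci.gff3

# Hypothetical helper: $1 = non-empty file of transcript IDs to drop, $2 = GFF3.
drop_transcripts() {
    awk -F'\t' '
        NR == FNR { drop[$1] = 1; next }   # first file: IDs to remove
        /^#/      { print; next }          # keep header and comment lines
        {
            id = ""; parent = ""
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^ID=/)     id = substr(attrs[i], 4)
                if (attrs[i] ~ /^Parent=/) parent = substr(attrs[i], 8)
            }
            # Drop the transcript itself and any feature whose parent is dropped
            if (!(id in drop) && !(parent in drop)) print
        }
    ' "$1" "$2"
}

drop_transcripts te_ids.txt toy.loci.gff3 > toy.noTE.gff3
cat toy.noTE.gff3
```

This sketch removes the flagged transcripts and their direct children only. A gene feature whose transcripts were all removed would stay in the output and would need a separate clean-up pass.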