Hello,
I would like some recommendations for using this wonderful tool. My goal is to annotate the protein-coding genes and then combine the MetaEuk gff3 annotation file with the Braker2 gff3 file using a combiner such as EvidenceModeler.
I understand that to use MetaEuk I need the assembly in fasta format. In my case it is 100 scaffolds from an avian species with a genome size of around 1 Gb.
Based on my humble understanding of the homology-based approach, I searched NCBI to build a protein sequence database. I used all available Galliformes (taxid 8976), which I consider closely related to my species. This is the command I used to create the target protein database: esearch -db protein -query "Galliformes [ORGN] AND refseq [filter]" | efetch -format fasta > Galliformes_proteins2.Refseq-425067.faa.
Then I ran MetaEuk (version 6.a5d39d9, installed via conda) in easy-predict mode: metaeuk easy-predict $genome $proteinDB $prefixprediction_name $tempFolder
To assess annotation completeness, I ran BUSCO in protein mode on MetaEuk.fas against aves_odb10. BUSCO completeness was 97% for the genome and 90% for the MetaEuk annotation, which is much better than the Braker2 annotation.
First, I'm not sure whether this is a good approach to take.
Second, regarding the MetaEuk output files:
Protein file output
The predicted protein sequences contain some lower-case characters. Does that affect the BUSCO evaluation? I assume MetaEuk.fas looks like this because my genome is soft-masked, but I don't know whether it causes any issue for downstream analysis.
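If the lower-case residues turn out to matter for a downstream tool, one possible workaround is to normalise the case of the protein file before using it. The following is only a sketch: the function name uppercase_fasta and the output file name are made up for illustration, and whether BUSCO actually needs this is not something this thread establishes.

```shell
# Uppercase the sequence lines of a FASTA stream read from stdin.
# Header lines (starting with ">") are passed through unchanged.
uppercase_fasta() {
  awk '/^>/ { print; next } { print toupper($0) }'
}

# Example call (hypothetical output name):
# uppercase_fasta < MetaEuk.fas > MetaEuk.upper.fas
```

Only the residue lines are changed, so the headers that MetaEuk writes stay intact.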
gff file output
I know that the gff file only contains coding regions, but is it compatible with the gff3 input that EvidenceModeler expects?
Third: for this genome, which OrthoDB dataset would you recommend to use with MetaEuk? The protein DB I used might not be enough for a good annotation of the protein-coding genes. I was thinking of using the whole vertebrata_odb10.fasta, but I couldn't find a link to download it in fasta format, and I'm still trying to figure out the correct way.
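Before passing the file to a combiner, it can help to see which feature types it actually contains. This is a minimal sketch that counts the values in column 3 of a tab-separated GFF file. The function name gff_feature_counts and the input file name are hypothetical, and the sketch does not check the file against EvidenceModeler's own requirements.

```shell
# Count feature types (column 3) in a GFF stream read from stdin.
# Comment lines and lines with fewer than 9 tab-separated fields are skipped.
gff_feature_counts() {
  awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ } END { for (t in n) print t "\t" n[t] }' \
    | LC_ALL=C sort
}

# Example call (hypothetical file name):
# gff_feature_counts < MetaEuk.gff
```

Comparing this summary with the feature types in the Braker2 gff3 shows quickly whether the two files describe gene models at the same level of detail.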
I would appreciate your feedback.
Thank you.
ben
We're very happy you find the tool useful. It will take me some time to address your questions so I apologize in advance. In the meantime, it might be useful for you to read this section, which details an easy way to download reference databases and filter them according to taxonomy.
For example, you could download UniRef90 and then filter it to contain only Vertebrata (taxid 7742).
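The download-and-filter suggestion could look roughly like the sketch below. It assumes the MMseqs2 command line (the databases and filtertaxseqdb modules) is installed and on PATH. The function name and the database names uniref90, uniref90_vertebrata and tmp are placeholders, and the section linked above may describe a different route.

```shell
# Sketch only: assumes MMseqs2 is installed; all names are placeholders.
build_vertebrate_db() {
  command -v mmseqs >/dev/null 2>&1 || { echo "mmseqs not found" >&2; return 1; }
  # Download UniRef90 together with its taxonomy information.
  mmseqs databases UniRef90 uniref90 tmp || return 1
  # Keep only sequences assigned to Vertebrata (taxid 7742) or its descendants.
  mmseqs filtertaxseqdb uniref90 uniref90_vertebrata --taxon-list 7742
}

# Example call (downloads a large database):
# build_vertebrate_db
```

The filtered database would then take the place of $proteinDB in the easy-predict call, assuming MetaEuk accepts a pre-built sequence database as the target.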