Minimum length used by all 10 marker-gene profiles? #7
Hi @Jigyasa3, We do not use a sequence length cutoff (at least for mOTUs2; I was not involved in mOTUs1, but I think it was not used there either). We select genes based on HMM scores calculated with HMMER3.
There are two steps: 1. extract genes from genomes, and 2. map metagenomic reads to the extracted genes.
Hope it makes sense.
Hey @AlessioMilanese Thank you so much for the quick response! Sorry about not considering the marker-gene counts with length normalization for this analysis! Thank you for pointing that out.
For the first question, I am still a bit confused (sorry :( ). You mention that mOTU2 (and maybe mOTU1) extracts genes based on an HMM search against the marker-genes. I understand that. But we find that in our metagenomes there is a large difference in how many sequences are extracted for each marker-gene (raw counts). Could we possibly explain this by saying that
I observe this in one of the single-cell genomes available on NCBI too. Candidatus Endomicrobium trichonymphae has only 8 marker-genes, and this bacterium has been shown to undergo genome reduction and pseudogenization in the gut environment.
We are interested in explaining the differences between marker-genes in our gut metagenomes dataset and want to tease apart what might be causing them. It looks like the biology of the microbes is the key. Is there any other way to suggest that the software used is not the reason...
Hey @AlessioMilanese Thank you for replying! And attaching the figures to explain the process. I have one more question, sorry if I am slow. Sorry if my question is rudimentary, hope I am explaining my question properly.
No problem. The whole process is a little complicated.
mOTUs 1 uses reads (http://www.bork.embl.de/software/mOTUs1/):
So if the input is reads, you use mOTUs to identify which species are present and their relative abundance (this process is called taxonomic profiling):
This is when you sequence a human gut sample. If you grow a bacterium on a petri dish and you sequence it, then you have a genome.
Extracting marker genes is based on thresholds on the HMM score. The thresholds are in the second column of this file.
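The threshold-based selection described above can be sketched roughly as follows. This is only an illustration, not the fetchMGs or mOTUs code: the function name `select_marker_genes`, the gene IDs, and the cutoff values are all made up, and the hits are assumed to have already been reduced to (gene, marker gene, bit score) records. The point is that a gene is kept or dropped on its HMM score alone, with no sequence length cutoff.

```python
from collections import defaultdict

# Hypothetical per-marker-gene bit-score cutoffs (illustrative values only,
# not the ones shipped with fetchMGs or mOTUs).
CUTOFFS = {"COG0012": 210.0, "COG0016": 240.0}


def select_marker_genes(hits, cutoffs):
    """Keep hits whose HMM bit score reaches the cutoff of their marker gene.

    hits: iterable of (gene_id, marker_gene, bit_score) tuples.
    Returns a dict mapping marker_gene -> list of accepted gene_ids.
    Sequence length is never consulted.
    """
    selected = defaultdict(list)
    for gene_id, marker, score in hits:
        cutoff = cutoffs.get(marker)
        # Hits to markers without a known cutoff are ignored.
        if cutoff is not None and score >= cutoff:
            selected[marker].append(gene_id)
    return dict(selected)


# Made-up hits for demonstration.
hits = [
    ("gene_1", "COG0012", 350.2),  # above cutoff, kept
    ("gene_2", "COG0012", 95.0),   # below cutoff, dropped
    ("gene_3", "COG0016", 240.0),  # exactly at cutoff, kept
    ("gene_4", "COG9999", 500.0),  # no cutoff defined, dropped
]
selected = select_marker_genes(hits, CUTOFFS)
print(selected)
```

Because each marker gene has its own cutoff, the number of genes passing can differ between marker genes even when the same genomes are searched.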
hey @AlessioMilanese
I wanted to confirm if the minimum length used by all 10 marker-genes in mOTU2 is the same as in the mOTU1 paper?
Based on the supplementary table 4, a marker gene-specific sequence length cutoff is used in the mOTU1 paper.
When I extract marker-genes from my metagenomes using either mOTU2 or mOTU1, I find a correlation between no. of sequences per marker-gene (i.e. count) and marker-gene length. As there is no option to change the length cutoff in fetchMGs.pl and classify-genomes, I was wondering if that would bias the relative abundance calculations. What do you think about it?
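A rough way to check the observation above is to compare the count-versus-length correlation before and after dividing each count by the gene length. The sketch below is an assumption-laden illustration, not what mOTUs does internally: the marker-gene names, lengths, and counts are invented, the counts are assumed to be per-marker-gene totals, and `pearson` and `per_kb` are hypothetical helper names.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def per_kb(counts, lengths):
    """Divide each marker-gene count by its gene length in kilobases."""
    return {gene: counts[gene] / (lengths[gene] / 1000.0) for gene in counts}


# Invented example data: gene lengths in bp and raw counts per marker gene.
lengths = {"MG_A": 900, "MG_B": 1200, "MG_C": 1800, "MG_D": 2700}
counts = {"MG_A": 95, "MG_B": 118, "MG_C": 185, "MG_D": 265}

genes = sorted(counts)
raw_r = pearson([lengths[g] for g in genes], [counts[g] for g in genes])
normalized = per_kb(counts, lengths)
norm_r = pearson([lengths[g] for g in genes], [normalized[g] for g in genes])

print(f"raw counts vs length:        r = {raw_r:.3f}")
print(f"per-kb counts vs length:     r = {norm_r:.3f}")
```

If the correlation is strong for raw counts but largely disappears after length normalization, gene length alone would account for much of the difference between marker genes in this toy setting.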