AliSim, codon models and dN/dS values #295
Replies: 5 comments
-
Hi Fabricia,

If I've understood correctly what you are doing (of which I'm not sure...) then I think what you are asking is impossible. Here's why.

If you simulate data under a pre-specified dN/dS ratio, then you will get (within the limits of a stochastic process operating on a finite number of sites and branches) an alignment which matches that dN/dS ratio quite closely. If you simulate a large number of such alignments, and each has many sites and many species, then you should get a collection of dN/dS ratios which closely match the value you used to simulate the data. This matches up with your first comment: when you simulate an alignment with a pre-specified dN/dS ratio, and then estimate that ratio from the alignment in R, you get a close match.

Now if you take that alignment and add non-synonymous mutations, you are systematically increasing the numerator (dN) but not the denominator (dS). That means the dN/dS ratio has to go up. Even if you add just a single non-synonymous mutation, the dN/dS ratio will go up, such that over many simulations you'd see a shift in the distribution of dN/dS ratios even on large alignments.

So, there is no way to simulate an alignment with a pre-specified dN/dS ratio and add non-synonymous mutations without pushing the dN/dS ratio of the resulting edited alignment upwards.

This makes me wonder if what you are asking is to pre-specify 15 specific non-synonymous mutations to be included in the simulation. That's a trickier task, and not something that AliSim can do. However, you could potentially do that post-hoc. For example, if you simulate an alignment with many more than 15 non-synonymous mutations, you could infer the location of the non-synonymous mutations on the tree, then edit out 15 of these at random and switch them out for the 15 you are interested in. That would incur some unavoidable biases (e.g. you are biasing your 15 mutations towards mutations you could infer on the tree), but it might be a start. Doing this would be a bit complex, but you'd start with ancestral state reconstruction.

I hope some of that helps,
Rob |
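The argument in the reply above can be illustrated with a back-of-the-envelope calculation. The sketch below assumes a crude counting estimate, dN/dS ≈ (nonsynonymous changes per nonsynonymous site) / (synonymous changes per synonymous site), with no correction for multiple hits. The function name, site counts and substitution counts are all hypothetical and are not AliSim output.

```python
def dn_ds(nonsyn_subs, syn_subs, nonsyn_sites, syn_sites):
    """Crude counting estimate of dN/dS: per-site nonsynonymous rate
    divided by per-site synonymous rate (no multiple-hit correction)."""
    dn = nonsyn_subs / nonsyn_sites
    ds = syn_subs / syn_sites
    return dn / ds

# Hypothetical alignment: 750 nonsynonymous sites and 250 synonymous sites.
N_SITES, S_SITES = 750.0, 250.0

# 60 nonsynonymous and 40 synonymous changes in the simulated alignment.
before = dn_ds(60, 40, N_SITES, S_SITES)

# The same alignment after 15 extra nonsynonymous changes are edited in:
# only the numerator grows, so the ratio must increase.
after = dn_ds(60 + 15, 40, N_SITES, S_SITES)

print(f"before: {before:.3f}, after: {after:.3f}")
```

Under these made-up numbers the ratio rises as soon as any nonsynonymous change is added while the synonymous count stays fixed, which is the shift described above.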
-
Hi,
Thanks, |
-
Hi Rob and Fabricia,
Would it be possible to also add a few synonymous mutations along
with the non-synonymous mutations to maintain the pre-specified dN/dS ratio? If
we can somehow generate a number of synonymous mutations (I'm not
really sure whether that is possible in terms of biology) and then specify
them together with the 15 non-synonymous mutations, it would help to
keep the dN/dS ratio at the expected value.
Cheers,
Nhan
(In reply to roblanf's comment of Tue, Aug 13, 2024, 9:40 AM, which appears in full above.)
|
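A rough way to size Nhan's suggestion: if the simulated alignment already sits at the target ratio, the number of synonymous changes needed to offset k added nonsynonymous changes follows from keeping (dN/dS) fixed. The sketch below uses the same crude counting estimate (no multiple-hit correction) as an assumption; `synonymous_needed` and the site counts are hypothetical, and whether the resulting synonymous changes are biologically sensible is a separate question.

```python
def synonymous_needed(k_nonsyn, omega, nonsyn_sites, syn_sites):
    """Synonymous changes to add alongside k_nonsyn nonsynonymous changes
    so that a counting estimate of dN/dS stays at omega.

    Derivation: ((Nd + k) / N) / ((Sd + s) / S) = omega with
    (Nd / N) / (Sd / S) = omega already holding gives s = k * S / (omega * N).
    """
    if omega <= 0:
        raise ValueError("omega must be positive")
    return k_nonsyn * syn_sites / (omega * nonsyn_sites)

# Hypothetical numbers: 15 forced nonsynonymous changes, target dN/dS of 0.5,
# 750 nonsynonymous sites and 250 synonymous sites.
extra_syn = synonymous_needed(15, 0.5, 750.0, 250.0)
print(f"synonymous changes to add: {extra_syn:.1f}")
```

The result is generally not an integer, and it grows quickly as the target ratio gets smaller, so a low dN/dS target would require many accompanying synonymous changes.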
-
Nice idea @trongnhanuit. That makes me think that the other option would be to simulate the sequences with a lower dN/dS ratio than the target (lower by the equivalent of 15 dN mutations) and then add the dN mutations in after the fact.

One worry with such an approach is that one may induce unwanted variation in dN/dS among branches.

I think @thednainus' second suggestion is a neat one, but it would be tricky to implement, because specifying a dN/dS ratio, an alignment length and a tree length places strong constraints on how many of each substitution type there can be on each branch of the tree. This could make it near impossible to come up with a satisfactory implementation which maintains the constraints of the intended model.

An option in this case would be to get the user to specify either a tree length OR an alignment length, such that the constraints of the dN/dS model could be respected (e.g. if you want 15 dN on a single short branch, you need a really long alignment to satisfy a low dN/dS ratio on that branch).

Rob |
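As a rough sketch of the "simulate lower, then add" idea in the reply above, one can ask what dN/dS value to simulate with so that adding k nonsynonymous changes brings a counting estimate back to the target. This assumes the crude estimate (Nd/N)/(Sd/S) with no multiple-hit correction, and it ignores any effect that changing dN/dS has on the overall rate in a codon model. The function name and all numbers are hypothetical.

```python
def omega_to_simulate(omega_target, k_nonsyn, syn_subs, nonsyn_sites, syn_sites):
    """dN/dS to simulate under so that adding k_nonsyn nonsynonymous
    changes afterwards restores omega_target (counting estimate)."""
    ds = syn_subs / syn_sites
    # Nonsynonymous changes the alignment should hold at the target ratio.
    nonsyn_at_target = omega_target * ds * nonsyn_sites
    if k_nonsyn >= nonsyn_at_target:
        # The forced changes alone already exceed what the target allows:
        # a longer alignment (or a longer tree) would be needed.
        raise ValueError("too many forced nonsynonymous changes for this target")
    return ((nonsyn_at_target - k_nonsyn) / nonsyn_sites) / ds

# Hypothetical case: target 0.5, 40 synonymous changes, 750/250 sites.
omega_sim = omega_to_simulate(0.5, 15, 40, 750.0, 250.0)
print(f"simulate with dN/dS = {omega_sim:.3f}")

# A short branch with only 4 synonymous changes cannot absorb 15 forced
# nonsynonymous changes at this target, echoing the constraint noted above.
try:
    omega_to_simulate(0.5, 15, 4, 750.0, 250.0)
except ValueError as err:
    print(f"short branch: {err}")
```

With ten times as many sites and the same per-site synonymous rate, the short-branch case becomes feasible again, which is the trade-off between tree length and alignment length mentioned above.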
-
Hi Rob and Fabricia,
*That makes me think that the other option would be to simulate the sequences with a lower dN/dS ratio than the target (by the count of 15 fewer dN) then add the dN mutations in after the fact.*
*One worry with such an approach is that one may induce unwanted variation in dN/dS among branches.*
=> If the worry is that the dN/dS ratio on the branches where the 15 dN mutations are added would be higher than on the other branches, then I think we can address that with the current implementation of AliSim.
Actually, the AliSim implementation doesn't follow the first option mentioned by Fabricia; it is closer to the second one. However, we don't consider the type of the pre-specified mutations when processing them, so we fail to maintain the dN/dS ratio. To be more specific:
On each branch, we first compute the number of mutations that occur along that branch (according to the branch length). Suppose 10 mutations occur.
+ *Case 1.* If users don't specify any mutations, AliSim randomly generates 10 mutations following the substitution model (where the dN/dS is defined). So, without pre-specified mutations, the dN/dS ratio is maintained.
+ *Case 2.* If users specify, e.g., 4 mutations, AliSim first enforces those 4, then randomly generates the remaining 6 (i.e., 10 - 4) following the substitution model. We don't use the type of the pre-specified mutations to adjust the remaining mutations so as to maintain the dN/dS ratio (that is too difficult to implement, as Rob mentioned above, especially since the implementation has to generalize to the hundreds of models in IQ-TREE). Therefore, if all the pre-specified mutations are dN mutations, we end up with an increase in the dN/dS ratio.
=> If we specify both dN and dS mutations as pre-specified mutations, in proportions that follow the expected dN/dS ratio, then we can maintain that ratio on that branch. The other branches (which fall under Case 1) should already maintain it.
Please feel free to let me know if anything doesn't make sense or if I have misunderstood something. Thank you!
Cheers,
Nhan
(In reply to roblanf's comment of Fri, Aug 16, 2024, 11:05 AM, which appears in full above.)
|
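The per-branch behaviour that Nhan describes (Case 1 and Case 2) can be mimicked with a toy model. This is only an illustrative reconstruction of the description, not AliSim's code: each mutation is reduced to a label, "N" (nonsynonymous) or "S" (synonymous), and the substitution model is replaced by a single assumed probability `p_nonsyn` that a randomly drawn mutation is nonsynonymous. All names are hypothetical, and the handling of more pre-specified mutations than the branch would otherwise receive is an assumption.

```python
import random

def simulate_branch(n_mutations, prespecified, p_nonsyn, rng):
    """Enforce the pre-specified mutations first, then draw the remainder
    at random, without looking at the types already enforced."""
    n_random = max(0, n_mutations - len(prespecified))  # assumption: never negative
    drawn = ["N" if rng.random() < p_nonsyn else "S" for _ in range(n_random)]
    return list(prespecified) + drawn

def expected_nonsyn_fraction(n_mutations, prespecified, p_nonsyn):
    """Expected share of nonsynonymous mutations on the branch."""
    n_random = max(0, n_mutations - len(prespecified))
    total = len(prespecified) + n_random
    if total == 0:
        return 0.0
    forced_n = sum(1 for m in prespecified if m == "N")
    return (forced_n + p_nonsyn * n_random) / total

rng = random.Random(1)
case1 = simulate_branch(10, [], 0.5, rng)          # Case 1: nothing pre-specified
case2 = simulate_branch(10, ["N"] * 4, 0.5, rng)   # Case 2: four forced dN mutations
print(case1, case2)

print(expected_nonsyn_fraction(10, [], 0.5))                    # Case 1
print(expected_nonsyn_fraction(10, ["N"] * 4, 0.5))             # Case 2, all dN
print(expected_nonsyn_fraction(10, ["N", "N", "S", "S"], 0.5))  # balanced dN and dS
```

In this toy setting, forcing four nonsynonymous mutations raises the expected nonsynonymous share on that branch, while pre-specifying a mix in the same proportion as `p_nonsyn` leaves it unchanged, which matches the reasoning in the reply.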
-
Hello,
I am trying to simulate a codon sequence alignment with specific values of dN/dS and ts/tv. However, I also add 15 different mutations to some sequences, all of which are non-synonymous.
I realised that if I don't add the mutations, then when I re-estimate dN/dS from the simulated alignment (using R, for example), I get the value I used to simulate the alignment.
However, when I add the 15 mutations (all non-synonymous), the estimated dN/dS is much higher than the value I used to simulate the sequences.
Is there a way of adding these 15 non-synonymous mutations and still recovering the dN/dS used in the simulation when I re-estimate it in R?
Thanks,
Fabricia.