AliSim, codon models and dN/dS values #295

thednainus · 2024-08-07T13:18:19Z

thednainus
Aug 7, 2024

Hello,

I am trying to simulate a codon sequence alignment with a specific value of dN/dS and ts/tv. However, I also add 15 different mutations to some sequences, all of which are non-synonymous.

I realised that if I don't add the mutations, I get the simulated alignment and, when I re-estimate the dN/dS value (using R, for example), I recover the value used to simulate the alignment.

However, when I add the 15 mutations (all non-synonymous), the estimated dN/dS is much higher than the value I used to simulate the sequences.
Is there a way of adding these 15 non-synonymous mutations and still recovering the dN/dS used in the simulation when I re-estimate it in R?

Thanks,
Fabricia.

roblanf · 2024-08-12T23:39:49Z

roblanf
Aug 12, 2024
Maintainer

Hi Fabricia,

If I've understood correctly what you are doing (of which I'm not sure...) then I think what you are asking seems impossible.

Here's why. If you simulate data under a pre-specified dN/dS ratio, then you will get (within the limits of a stochastic process operating on a finite number of sites and branches) an alignment which matches that dN/dS ratio quite closely. If you simulate a large number of such alignments, and each has many sites and many species, then you should get a collection of dN/dS ratios which closely match the value you used to simulate the data. This matches up with your first comment: when you simulate an alignment with a pre-specified dN/dS ratio, and then estimate that ratio from the alignment in R, you get a close match.

Now if you take that alignment and add non-synonymous mutations, you are systematically increasing the numerator (dN) but not the denominator (dS). That means the dN/dS ratio has to go up. Even if you add just a single non-synonymous mutation, the dN/dS ratio will go up, such that over many simulations you'd see a shift in the distribution of dN/dS ratios even on large alignments.
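A small numerical sketch of this argument, using a simplified counting view of dN/dS (substitutions per site in each class). All counts and site numbers below are invented for illustration, and real estimators (such as those available in R) also correct for multiple hits, so treat this only as a picture of why the ratio must rise.

```python
def dn_ds(nonsyn_subs, syn_subs, nonsyn_sites, syn_sites):
    """Crude counting estimate of dN/dS: (Nd / N) / (Sd / S)."""
    dn = nonsyn_subs / nonsyn_sites
    ds = syn_subs / syn_sites
    return dn / ds


# Hypothetical alignment: 300 codons = 900 nucleotide sites, split into
# roughly 3/4 non-synonymous and 1/4 synonymous sites (an assumption).
N_SITES, S_SITES = 675.0, 225.0

before = dn_ds(nonsyn_subs=30, syn_subs=50,
               nonsyn_sites=N_SITES, syn_sites=S_SITES)
# Add 15 non-synonymous changes: only the numerator grows.
after = dn_ds(nonsyn_subs=30 + 15, syn_subs=50,
              nonsyn_sites=N_SITES, syn_sites=S_SITES)

print(f"before: {before:.3f}  after: {after:.3f}")
```

With these made-up numbers the ratio moves from 0.2 to 0.3, because dS is unchanged while dN grows by half.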

So, there is no way to simulate an alignment with a pre-specified dN/dS ratio and add non-synonymous mutations without pushing the dN/dS ratio of the resulting edited alignment upwards.

This makes me wonder if what you are asking is to pre-specify 15 specific non-synonymous mutations to be included in the simulation. That's a trickier task, and not something that AliSim can do. However, you could potentially do that post-hoc. For example, if you simulate an alignment with >>15 non-synonymous mutations, you could infer the location of the non-synonymous mutations on the tree, then edit out 15 of these at random and switch them out for the 15 you are interested in. That would incur some unavoidable biases (e.g. you are biasing your 15 mutations towards mutations you could infer on the tree), but it might be a start. To do this would be a bit complex, but you'd start with ancestral state reconstruction.

I hope some of that helps,

Rob


thednainus · 2024-08-14T12:40:07Z

thednainus
Aug 14, 2024
Author

Hi,
Thanks! You understood my question correctly. Thanks for replying.
I am using AliSim's --mutation flag to add the mutations to my alignment. However, I could not find out how the mutations are added to the alignment once we predefine a dN/dS and ts/tv ratio.
So it could be:

1. Simulate the alignment first and then add the mutations to the simulated alignment afterwards, or
2. Simulate the alignment while taking into account the mutations to be added, in order to keep the predefined dN/dS and ts/tv ratio.

After your explanation, I think AliSim works with the first option.

Thanks,
Fabricia.


trongnhanuit · 2024-08-16T00:54:14Z

trongnhanuit
Aug 16, 2024
Collaborator

Hi Rob and Fabricia,

Would it be possible to also add a few synonymous mutations along with the non-synonymous mutations to maintain the pre-specified dN/dS ratio? If we can somehow generate a number of synonymous mutations (I'm not really sure whether that is possible in terms of biology) and then specify them together with the 15 non-synonymous mutations, it would help to maintain the dN/dS ratio as expected.

Cheers,
Nhan
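To get a feel for how many synonymous mutations this suggestion would require, here is a rough sketch under the same simplified counting view of dN/dS, (Nd / N) / (Sd / S). The function name, the site counts and the value of omega are hypothetical, chosen only for illustration.

```python
def synonymous_needed(n_nonsyn, omega, nonsyn_sites, syn_sites):
    """Number of synonymous changes s so that
    (n_nonsyn / nonsyn_sites) / (s / syn_sites) == omega."""
    if omega <= 0:
        raise ValueError("omega must be positive")
    return n_nonsyn * syn_sites / (omega * nonsyn_sites)


# Hypothetical values: 15 pre-specified non-synonymous changes, a target
# dN/dS of 0.2, and a 3:1 split of non-synonymous to synonymous sites.
s = synonymous_needed(n_nonsyn=15, omega=0.2,
                      nonsyn_sites=675.0, syn_sites=225.0)
print(f"synonymous changes needed: {s:.1f}")
```

Under these assumptions the pre-specified set would need about 25 synonymous changes alongside the 15 non-synonymous ones, so for a low target dN/dS "a few" may not be enough.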


roblanf · 2024-08-16T01:05:09Z

roblanf
Aug 16, 2024
Maintainer

Nice idea @trongnhanuit. That makes me think that the other option would be to simulate the sequences with a dN/dS ratio lower than the target (lower by the equivalent of 15 non-synonymous substitutions), then add the dN mutations in after the fact.
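One rough way to pick the lower ratio, assuming that the number of non-synonymous substitutions scales linearly with dN/dS and that the expected number under the target ratio is known (both are simplifying assumptions, and the function below is a hypothetical helper, not part of AliSim):

```python
def adjusted_omega(target_omega, expected_nonsyn, n_added):
    """dN/dS to simulate with, so that adding n_added non-synonymous
    changes afterwards brings the total back to expected_nonsyn."""
    if n_added >= expected_nonsyn:
        raise ValueError("cannot add more changes than the target total")
    return target_omega * (expected_nonsyn - n_added) / expected_nonsyn


# Hypothetical numbers: target dN/dS of 0.3 with about 45 non-synonymous
# substitutions expected across the tree, 15 of which are added later.
omega_sim = adjusted_omega(target_omega=0.3, expected_nonsyn=45, n_added=15)
print(f"simulate with dN/dS = {omega_sim:.3f}")
```

This only balances the tree-wide total. The branches that receive the added mutations would still show a higher ratio than the rest.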

One worry with such an approach is that one may induce unwanted variation in dN/dS among branches.

I think @thednainus' second suggestion is a neat one, but would be tricky to implement because specifying a dN/dS ratio and an alignment length and tree length places strong constraints on how many of each substitution type there can be on each branch of the tree. This could make it near impossible to come up with a satisfactory implementation which maintains the constraints of the intended model.

An option in this case would be to get the user to specify either a tree length OR an alignment length, such that the constraints of the dN/dS model could be respected (e.g. if you want 15 dN on a single short branch, you need a really long alignment to satisfy a low dN/dS ratio on that branch).
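A back-of-the-envelope sketch of that constraint. It assumes branch lengths are in expected substitutions per codon site, that 3/4 of sites are non-synonymous, and that the share of substitutions that are non-synonymous is omega * N / (omega * N + S). These are all simplifications, and the numbers are invented.

```python
def codons_needed(n_nonsyn, branch_length, omega, nonsyn_site_frac=0.75):
    """Alignment length (in codons) at which a branch of the given length
    is expected to carry n_nonsyn non-synonymous substitutions."""
    syn_site_frac = 1.0 - nonsyn_site_frac
    # Expected share of substitutions on the branch that are non-synonymous.
    p_nonsyn = (omega * nonsyn_site_frac
                / (omega * nonsyn_site_frac + syn_site_frac))
    return n_nonsyn / (branch_length * p_nonsyn)


# Hypothetical case: 15 dN on a short branch (0.01) with dN/dS = 0.2.
codons = codons_needed(n_nonsyn=15, branch_length=0.01, omega=0.2)
print(f"about {codons:.0f} codons")
```

With these assumed values the alignment would need to be on the order of 4000 codons for 15 non-synonymous changes on that branch to be consistent with the model.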

Rob


trongnhanuit · 2024-08-16T02:12:00Z

trongnhanuit
Aug 16, 2024
Collaborator

Hi Rob and Fabricia,

"That makes me think that the other option would be to simulate the sequences with a lower dN/dS ratio than the target (by the count of 15 fewer dN) then add the dN mutations in after the fact."

"One worry with such an approach is that one may induce unwanted variation in dN/dS among branches."

=> If you're worried that the dN/dS ratio on the branches where the 15 dN mutations are added will be higher than on the other branches, then I think we can address that problem with the current implementation of AliSim.

Actually, the AliSim implementation doesn't follow the first option mentioned by Fabricia; it's more similar to the second one. However, we don't consider the type of the pre-specified mutations when processing them, and therefore we fail to maintain the dN/dS ratio.

To be more specific: at a branch, we first compute the number of mutations that would occur along that branch (according to the branch length). Assume that 10 mutations occur.

+ Case 1. If users don't specify any pre-specified mutations, AliSim randomly generates 10 mutations following the substitution model (where the dN/dS is defined). Therefore, without pre-specified mutations, we can maintain the dN/dS ratio.
+ Case 2. If users specify, e.g., 4 pre-specified mutations, AliSim first enforces those 4 mutations, then randomly generates the 6 remaining (i.e., 10 - 4) mutations following the substitution model. We don't consider the type of the pre-specified mutations to adjust the remaining mutations and maintain the dN/dS ratio (that is too difficult to implement, as Rob mentioned above, especially given that the implementation has to generalize to the hundreds of models supported in IQ-TREE). Therefore, if all the pre-specified mutations are dN mutations, we will end up with an increase in the dN/dS ratio.

=> If we can specify both dN and dS mutations as pre-specified mutations following the expected dN/dS ratio, then we can maintain that ratio for that branch, while the other branches (which belong to Case 1) should already maintain that ratio.

Please feel free to let me know if anything doesn't make sense or if I've misunderstood something. Thank you!

Cheers,
Nhan
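The two cases above can be mimicked with a toy simulation. This is not AliSim's code: it reduces each mutation to a label ("N" for non-synonymous, "S" for synonymous), assumes the model yields a non-synonymous mutation with a fixed probability (0.375 here, an invented value), and truncates the pre-specified list if it is longer than the branch's mutation count, which is a simplification of my own.

```python
import random


def simulate_branch(n_mutations, p_nonsyn, prespecified=()):
    """Return (nonsyn, syn) counts for one branch.

    Pre-specified mutations are enforced first; the remaining
    n_mutations - len(prespecified) are drawn from the model.
    """
    forced = list(prespecified)[:n_mutations]
    n_random = n_mutations - len(forced)
    drawn = ["N" if random.random() < p_nonsyn else "S"
             for _ in range(n_random)]
    events = forced + drawn
    return events.count("N"), events.count("S")


def mean_nonsyn_fraction(prespecified, n_mutations=10, p_nonsyn=0.375,
                         reps=20000):
    """Average share of non-synonymous mutations over many replicates."""
    total_nonsyn = 0
    for _ in range(reps):
        nonsyn, _syn = simulate_branch(n_mutations, p_nonsyn, prespecified)
        total_nonsyn += nonsyn
    return total_nonsyn / (reps * n_mutations)


random.seed(1)
case1 = mean_nonsyn_fraction(prespecified=())                    # no pre-specified
case2 = mean_nonsyn_fraction(prespecified=("N",) * 4)            # 4 dN forced
balanced = mean_nonsyn_fraction(prespecified=("N",) * 3 + ("S",) * 5)

print(f"case 1: {case1:.3f}  case 2: {case2:.3f}  balanced: {balanced:.3f}")
```

In this sketch the non-synonymous share stays near the model value in Case 1, rises in Case 2, and returns to the model value when the pre-specified set mixes dN and dS in the model's own proportion (3 of 8 here).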
