Increased heterogeneity is only described on first PC #172

vali301s · 2024-08-13T09:15:59Z

Hi,

thank you for developing Splatter; it's been very exciting exploring your package so far.

I have been using Splatter to simulate data for a single group with varying levels of heterogeneity. To set the heterogeneity level I adjust the BCV (biological coefficient of variation) parameters: for higher heterogeneity I increase bcv.common and decrease bcv.df.
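For reference, here is a minimal sketch of how these two parameters enter the BCV step of the Splat model, re-expressed in Python/NumPy rather than R. It assumes the formulation described in the Splatter paper: each gene's BCV is (bcv.common + 1/sqrt(mean)) scaled by a per-gene factor sqrt(bcv.df / chi-squared(bcv.df)), and cell-level means are gamma-distributed around the base means with that coefficient of variation. The function name and arguments are hypothetical, and the sketch leaves out Splatter's expression outliers and dropout.

```python
import numpy as np

def simulate_bcv_counts(gene_means, lib_sizes, bcv_common=0.1, bcv_df=60.0, seed=0):
    """Simplified sketch of the Splat BCV step (hypothetical helper, not Splatter's code).

    gene_means: strictly positive mean expression per gene (length n_genes)
    lib_sizes:  strictly positive expected library size per cell (length n_cells)
    Returns a genes x cells matrix of Poisson counts.
    """
    rng = np.random.default_rng(seed)
    gene_means = np.asarray(gene_means, dtype=float)
    lib_sizes = np.asarray(lib_sizes, dtype=float)
    if np.any(gene_means <= 0) or np.any(lib_sizes <= 0):
        raise ValueError("gene_means and lib_sizes must be strictly positive")

    # Base mean for each gene in each cell: gene proportion x cell library size
    base = np.outer(gene_means / gene_means.sum(), lib_sizes)

    # BCV = common term + mean-dependent 1/sqrt(mean) term ...
    bcv = bcv_common + 1.0 / np.sqrt(base)
    # ... inflated by one scaled inverse chi-squared draw per gene (shared by all cells);
    # an infinite bcv_df switches this extra gene-wise variability off
    if np.isfinite(bcv_df):
        bcv = bcv * np.sqrt(bcv_df / rng.chisquare(bcv_df, size=(base.shape[0], 1)))

    # Gamma draws with mean `base` and coefficient of variation `bcv`,
    # drawn independently for every gene and cell
    cell_means = rng.gamma(shape=1.0 / bcv**2, scale=base * bcv**2)

    # Observed counts: Poisson sampling around the cell-level means
    return rng.poisson(cell_means)
```

In this sketch the gamma draws are independent across genes and cells, so the only structure shared across genes within a cell comes from that cell's library size.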
In the following picture, you can see that in the PCA plots the cells are more dispersed with higher heterogeneity (as expected). However, when I plotted the Elbow plots (below each PCA plot), I noticed that the increase in heterogeneity comes mainly from the first PC.
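An elbow plot shows how much variance each principal component carries. Below is a minimal NumPy sketch of that calculation. It assumes a genes x cells count matrix with no empty cells, log-normalization to counts per 10,000, and PCA via SVD with no highly-variable-gene selection or scaling. The function name is hypothetical, and tools such as Seurat or scanpy use their own defaults, some of which plot the standard deviation of each PC rather than a variance share, so the numbers will not match them exactly.

```python
import numpy as np

def pc_variance_fractions(counts, n_pcs=20):
    """Share of total variance on each of the first n_pcs principal components.

    counts: genes x cells matrix of raw counts (no empty cells assumed).
    """
    counts = np.asarray(counts, dtype=float)
    # Normalize each cell to counts per 10,000, then log-transform
    lib = counts.sum(axis=0, keepdims=True)
    logn = np.log1p(counts / lib * 1e4)
    # Cells become rows; centering every gene makes the SVD a PCA
    x = (logn - logn.mean(axis=1, keepdims=True)).T
    # Singular values come back in descending order; their squares
    # are proportional to the variance along each PC
    s = np.linalg.svd(x, compute_uv=False)
    var = s ** 2
    return var[:n_pcs] / var.sum()
```

Plotting the returned fractions against the PC index gives the elbow plot. A first value far above all the others corresponds to the pattern described here.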

This looks super unnatural to me and I have never seen this in real scRNA-seq data. Do you know why this is happening? Also, despite this, do you think I can still use the datasets I have created, i.e. is it a problem that the Elbow plot looks like this?

PS: Apart from only changing the BCV parameters, I also estimated parameters from real data, using immune cells with low heterogeneity (naive T/B cells) and with high heterogeneity (macrophages/monocytes), and simulated new scRNA-seq datasets from those estimates. Once again, the additional heterogeneity of the macrophages and monocytes is described mostly by PC1. Since the Elbow plot of the macrophage/monocyte simulation (with parameters estimated from real data) also looks like this, it really seems to be a feature of Splatter that the heterogeneity is described on the first PC only...

Thank you very much in advance.

lazappi · 2024-08-14T06:01:45Z

Hi @vali301s

Thanks for giving {splatter} a go. Modifying the variation in a single population is something that hasn't come up very often and is maybe something that the splat simulation doesn't do very well. As you have seen the bcv parameters have some effect but maybe not what you would like and introducing enough different kinds of variation is something many simulations struggle with. I would be curious to see what this looks like in real data though. If you subset to only similar cells in a real population do you see a similar effect on the PCA?

An alternative approach that has been used previously is to simulate a single path rather than one homogeneous group. This gives you access to more parameters which you can manipulate to give you something closer to what you want, for example by reducing the amount of differential expression along the path so that it gives your cells some variation but not enough to create two separate populations.
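As a toy illustration of the single-path idea, here is a Python/NumPy sketch. It is a hypothetical simplification, not {splatter}'s path implementation: cells sit at random positions along one path, a fraction of genes (de_prob) receive a log-normal fold change (de_fac_loc, de_fac_scale), some of which are inverted to become down-regulation (down_prob), and each cell's log-mean is interpolated linearly between the start and the end of the path. The parameter names are borrowed from the splat de.prob, de.facLoc, de.facScale and de.downProb parameters purely for readability.

```python
import numpy as np

def simulate_path(n_genes=500, n_cells=300, de_prob=0.1, de_fac_loc=0.1,
                  de_fac_scale=0.4, down_prob=0.5, mean_count=10.0, seed=0):
    """Toy single-path simulation (hypothetical, simplified; not splatter's code).

    Returns (counts, t): a genes x cells count matrix and each cell's
    position along the path in [0, 1).
    """
    rng = np.random.default_rng(seed)
    # Baseline expression for each gene at the start of the path
    base = rng.gamma(shape=2.0, scale=mean_count / 2.0, size=n_genes)

    # Choose DE genes and draw a log-normal fold change for each of them
    is_de = rng.random(n_genes) < de_prob
    fac = np.ones(n_genes)
    fac[is_de] = rng.lognormal(mean=de_fac_loc, sigma=de_fac_scale, size=is_de.sum())
    # Invert some fold changes so those genes go down along the path
    down = is_de & (rng.random(n_genes) < down_prob)
    fac[down] = 1.0 / fac[down]

    # Place each cell at a random position along the path (0 = start, 1 = end)
    t = rng.random(n_cells)
    # Log-linear interpolation: cell j sees fold change fac ** t[j]
    means = base[:, None] * fac[:, None] ** t[None, :]
    return rng.poisson(means), t
```

The intent is that shrinking de_fac_loc and de_fac_scale weakens the trend along the path, so cells vary along one shared axis without splitting into two separate populations.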

lazappi · 2024-10-04T11:25:13Z

@vali301s Do you have any further questions on this?

lazappi · 2024-11-21T13:50:43Z

I'm going to close this now, please comment if you want to discuss further.

lazappi added the question label Aug 14, 2024

lazappi closed this as completed Nov 21, 2024
