Significance Testing in Cell-Level #91
Comments
Hey Çağatay, Super interesting idea. So you have generated random counts and gene sets? I think the method used to generate the original data will be the main contributor to the trends you observe. Is each set of boxplots a different gene set, with the individual boxes being samples or clusters? GSEA and ssGSEA rely on a walk across the ranked genes and report the point of maximal value (check out more here), so I would not expect a mean value of 0, because the enrichment value is that maximal value. For real count data, I think you can analyze using the quantile system you have set up, but I would do that at the level of individual gene sets and not across all gene sets.
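To illustrate the point about the walk, here is a minimal sketch of a simplified, unweighted GSEA-style running sum that reports the signed maximum deviation. It is not escape's implementation: ssGSEA differs in its details (for example, rank weighting), so the scale is not comparable to the values of around 2000 mentioned in this issue. The function name `running_sum_es` and the toy gene names are assumptions made for illustration.

```python
import random

def running_sum_es(ranked_genes, gene_set):
    """Signed maximum deviation of an unweighted GSEA-style running sum.

    Assumes ranked_genes contains no duplicates.
    """
    hits = set(gene_set) & set(ranked_genes)
    n_hit = len(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranking without covering it")
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        # Step up on a hit, down on a miss; the walk returns to zero at the end.
        running += 1.0 / n_hit if gene in hits else -1.0 / n_miss
        # Keep the point of the walk that lies furthest from zero.
        if abs(running) > abs(best):
            best = running
    return best

rng = random.Random(0)
genes = [f"gene{i}" for i in range(2000)]  # hypothetical ranked gene list
null_scores = [running_sum_es(genes, rng.sample(genes, 100)) for _ in range(200)]
mean_abs = sum(abs(s) for s in null_scores) / len(null_scores)
print(f"smallest |ES| = {min(abs(s) for s in null_scores):.3f}, mean |ES| = {mean_abs:.3f}")
```

Because the reported value is the extreme of the walk, its magnitude is strictly positive even for purely random gene sets, which is one reason individual scores need not sit at zero.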
Hello again Nick, Thank you for your reply!
Data Generation Method: The original data used in this analysis were obtained from the 10x platform, processed with Seurat, and normalized. Each set of boxplots is a different cluster, and the individual boxes are samples. We wanted to see whether the enrichment scores of the cells in the individual clusters are significant.
Expectations from Random Gene Sets: As I understand it, because the algorithms report the maximal enrichment value, the mean value for a random gene set is not necessarily expected to be zero; it is driven by that maximal value. I hope I have that right. In our case, to enhance robustness, we generated multiple "negative control" gene sets: 500 random gene sets of 100 random genes each. In two-sample GSEA, when there is no real enrichment, the running statistic fluctuates randomly around zero as it moves through the ranked gene list. That is why we expected to see zero.
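The negative-control construction described above (500 random gene sets of 100 genes each) could be sketched as follows. This is only an illustration in Python, not the code used in the issue; the helper `make_random_gene_sets`, the set names, and the placeholder gene universe are all hypothetical.

```python
import random

def make_random_gene_sets(universe, n_sets=500, set_size=100, seed=42):
    """Draw n_sets gene sets, each sampled without replacement from universe."""
    rng = random.Random(seed)  # fixed seed so the controls are reproducible
    return {
        f"random_set_{i + 1}": rng.sample(universe, set_size)
        for i in range(n_sets)
    }

# Placeholder universe; in practice this would be the genes in the expression matrix.
universe = [f"gene{i}" for i in range(20000)]
random_sets = make_random_gene_sets(universe)
print(len(random_sets), len(random_sets["random_set_1"]))
```

Sampling without replacement within each set keeps every set at exactly 100 distinct genes, while different sets are still allowed to overlap.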
Very interesting work and thanks for following up. There are a couple of thoughts I had:
Nick
Thank you Nick!
Çağatay
Hey Çağatay, I think using the 97.5 and 2.5 thresholds makes sense. The GSEA in escape is actually single-sample GSEA (ssGSEA) from Barbie et al. The major underlying difference is that the enrichment score is calculated per sample, instead of by phenotype label. Nick
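The thresholding idea discussed in this thread can be sketched as an empirical two-sided cutoff: take the 2.5% and 97.5% quantiles of the scores obtained from random gene sets and flag observed scores that fall outside them. The helper names `quantile` and `classify`, and the simulated null scores, are assumptions for illustration only; they are not part of escape.

```python
import random

def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def classify(score, null_scores, lower_q=0.025, upper_q=0.975):
    """Compare one observed score with the empirical null distribution."""
    low = quantile(null_scores, lower_q)
    high = quantile(null_scores, upper_q)
    if score > high:
        return "positively enriched"
    if score < low:
        return "negatively enriched"
    return "not significant"

# Simulated stand-in for the scores of 500 random gene sets (hypothetical values).
rng = random.Random(1)
null_scores = [rng.gauss(2000, 150) for _ in range(500)]
for observed in (1500.0, 2000.0, 2600.0):
    print(observed, classify(observed, null_scores))
```

Setting `lower_q=0.05` and `upper_q=0.95` reproduces the looser cutoffs from the original post. Following the advice earlier in the thread, the null scores would ideally come from the same cells or cluster as the observed score.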
Hi Nick,
I hope you're doing well. Firstly, I want to commend you on the excellent work with the package - it's been incredibly useful!
I'm reaching out with a question regarding significance testing within gene sets at the cell level, rather than between different samples as getSignificance currently operates. To explore this, I conducted a test where I generated random gene sets and applied enrichIt to them. My expectation was to observe a mean value around 0 for the score, but instead, it averaged around 2000. Do you have any idea what the reason could be?
Additionally, I've plotted the 0.95, 0.5 and 0.05 quantile values on a graph to visually represent the data. I'm considering interpreting values above the 0.95 quantile as significantly positively enriched and those below the 0.05 quantile as negatively enriched. The quantile values are shown as the three blue lines. Would you say this approach is reasonable?
Here's the graph I've generated:

The x axis shows the clusters, while the y axis shows the samples and enrichment scores. Red dots mark the ones with an enrichment score lower than the 0.05 quantile score of the random gene sets.
Best regards,
Çağatay