GSVA calculation takes extremely long #18

Open
jonrot1906 opened this issue Oct 25, 2023 · 7 comments
Comments

@jonrot1906

Dear @guokai8,

Thanks for your great package. I am currently struggling to use it on my dataset because the GSVA calculation takes an extremely long time.
I am using a custom gene set in this structure:

GeneID | Annot
PTGS2 | Ferroptosis

And I am running these commands:

gene_set <- read.csv("gene_set.csv")
res <- scgsva(nft_ad, annot = gene_set, method = "gsva", useTerm = FALSE)

This produces the following console messages (which look fine in my opinion):

Setting parallel calculations through a MulticoreParam back-end
with workers=4 and tasks=100.
Estimating GSVA scores for 1 gene sets.
Estimating ECDFs with Poisson kernels
Estimating ECDFs in parallel on 4 cores

About 21 iterations (I assume cells) took around 12 hours. I am running this on an M1 Pro MacBook with 32 GB RAM. Do you think it will be faster once I switch to a computer with better specifications? I want to run the GSVA analysis on around 100,000 cells, which at this rate would take ages.

I am keen to get your recommendations!
Thanks and best regards,
Jonas
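
Since the console output reports only one gene set, it can help to confirm how the two-column table is grouped and how many of its genes are actually present in the expression data before starting a long run. The following is a minimal base R sketch, not part of scGSVA: the column names GeneID and Annot follow the post, while the extra rows and the `expr_genes` vector are made up for illustration.

```r
# Toy stand-in for gene_set.csv. Only the PTGS2 row comes from the post;
# the other rows are illustrative assumptions.
gene_set <- data.frame(
  GeneID = c("PTGS2", "GPX4", "ACSL4", "TP53"),
  Annot  = c("Ferroptosis", "Ferroptosis", "Ferroptosis", "Apoptosis"),
  stringsAsFactors = FALSE
)

# One character vector of genes per annotation term
gene_list <- split(gene_set$GeneID, gene_set$Annot)

# Hypothetical gene names found in the expression matrix (assumption)
expr_genes <- c("PTGS2", "GPX4", "TP53", "ACTB")

# Number of genes per set that are present in the data
n_found <- vapply(gene_list, function(g) sum(g %in% expr_genes), integer(1))
print(n_found)
```

A set with very few matching genes is worth checking before committing hours of compute to it.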

@guokai8
Owner

guokai8 commented Nov 17, 2023

Hi @jonrot1906 ,
I am working on the new version now and will fix this issue soon. Thanks!
K

@guokai8
Owner

guokai8 commented Nov 22, 2023

Hi @jonrot1906 ,
I am now testing two approaches: (1) batch methods and (2) sampling methods. I may release the new version in a few days.
Best,
K

@guokai8
Owner

guokai8 commented Nov 28, 2023

Hi @jonrot1906 ,
The batch method is available now. You can also calculate UCell scores by setting method="UCell". I am now working on the sampling methods.
K
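
To illustrate the general idea of processing cells in batches, here is a minimal base R sketch. It is not scGSVA's implementation: `score_cells` is a hypothetical per-cell score (the mean within-cell rank of the set's genes, similar in spirit to rank-based methods but not UCell's actual formula), and the expression matrix is made-up data. Because such a score uses only one cell at a time, scoring in batches and scoring everything at once agree.

```r
set.seed(1)
# Toy expression matrix: 50 genes x 12 cells (made-up data)
expr <- matrix(rpois(50 * 12, lambda = 5), nrow = 50,
               dimnames = list(paste0("g", 1:50), paste0("cell", 1:12)))
genes_in_set <- c("g1", "g2", "g3")

# Hypothetical per-cell score: mean within-cell rank of the set's genes.
# Each column is scored on its own, so other cells cannot influence it.
score_cells <- function(m, genes) {
  idx <- match(genes, rownames(m))
  out <- vapply(seq_len(ncol(m)),
                function(j) mean(rank(m[, j])[idx]),
                numeric(1))
  names(out) <- colnames(m)
  out
}

# Split the cell indices into batches of at most 5 cells
cell_idx <- seq_len(ncol(expr))
batches <- split(cell_idx, ceiling(cell_idx / 5))

# Score each batch separately, then stitch the results back together
batched <- unlist(
  lapply(batches, function(idx) score_cells(expr[, idx, drop = FALSE], genes_in_set)),
  use.names = FALSE
)
names(batched) <- colnames(expr)

# Score all cells in one go and compare
full <- score_cells(expr, genes_in_set)
print(all.equal(batched, full))
```

Batching in this way bounds memory use per step. Whether results match the unbatched run depends on the scoring method: it holds for per-cell scores like this one, but not automatically for methods that compare each cell against the other cells.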

@sjasws

sjasws commented Jul 23, 2024

Dear @guokai8,
I faced the same problem when I calculated GSVA scores for 80,000 cells * 30,000 genes. Thank you for providing the "batch method" to address this issue; I am going to try it.
Could you please explain how the "batch method" works? I found that GSVA gives different values depending on the number of samples (rcastelo/GSVA#101). This means that if the whole dataset is split into parts, the result will differ from the result of calculating GSVA scores on the whole dataset directly.
Thank you for your help!
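
The concern about sample dependence can be illustrated with a minimal base R sketch. This is a simplified stand-in, not GSVA's algorithm: it uses the plain empirical CDF of one gene across the supplied cells, whereas GSVA uses a kernel-based estimate, and the expression values are made up.

```r
# Made-up expression values of one gene in 10 cells
x <- c(0, 1, 1, 2, 3, 5, 6, 8, 9, 12)
cells <- paste0("cell", 1:10)

# Where does each cell fall in the distribution of this gene
# across the cells that were supplied together?
cdf_value <- function(v) ecdf(v)(v)

whole  <- cdf_value(x)        # all 10 cells at once
part_a <- cdf_value(x[1:5])   # first half on its own
part_b <- cdf_value(x[6:10])  # second half on its own
split_result <- c(part_a, part_b)

comparison <- cbind(whole = whole, split = split_result)
rownames(comparison) <- cells
print(comparison)
```

In this toy case the first cell sits at 0.1 when all cells are used and at 0.2 when only its half is used, even though its expression value is unchanged. Any step that is estimated across samples will show this kind of dependence on which samples are processed together.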

@guokai8
Owner

guokai8 commented Nov 4, 2024

Hi @sjasws,
As for the batch issues, they are caused by whether normalization is done within each batch or over the whole dataset. This will be fixed soon. In the meantime, you can just go with the batch number; it won't cause issues for the final results.
K
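
The difference between normalizing within each batch and normalizing overall can be sketched in base R as follows. This is an assumption-laden illustration: the thread does not say which normalization scGSVA applies, so `min_max` is a hypothetical choice and the raw scores are made up.

```r
# Made-up raw scores for 6 cells, processed in two batches of 3
raw <- c(cell1 = -0.4, cell2 = 0.1, cell3 = 0.3,
         cell4 = 0.2, cell5 = 0.6, cell6 = 0.9)
batch <- c(1, 1, 1, 2, 2, 2)

# Hypothetical normalization step: rescale to [0, 1]
min_max <- function(v) (v - min(v)) / (max(v) - min(v))

# Normalize once over all cells
overall <- min_max(raw)

# Normalize inside each batch separately
per_batch <- ave(raw, batch, FUN = min_max)

print(round(cbind(overall, per_batch), 2))

# Collecting the raw per-batch values first and normalizing once afterwards
# reproduces the overall result
combined <- unlist(unname(split(raw, batch)))
print(all.equal(min_max(combined), overall))
```

With per-batch normalization, cell3 becomes the maximum of its batch and is scaled to 1, while the overall normalization places it near the middle of the range. Deferring normalization until all batches are combined removes that discrepancy in this sketch.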

@guokai8
Owner

guokai8 commented Nov 8, 2024

Hi @jonrot1906 @sjasws @egeulgen ,
The new version will give exactly the same result in batch mode and in non-batch mode.
K

@XiaoyuZhan520

Thanks for the update. Could you please add a description of 'batch' to the R documentation? Thanks!
