Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update performNormalization.R #111

Merged
merged 10 commits into from
Aug 22, 2024

Conversation

LinearParadox
Copy link
Contributor

added parallelization to normalize function. This might be memory intensive, but still would offer a speed benefit over a strictly single threaded approach. Under the hood, bpvec splits the vector into the number of workers and then executes the function on them, before joining it back together.

I tested it against the equivalent single thread method and got an identical result.

LinearParadox and others added 3 commits August 13, 2024 14:00
added parallelization to normalize function. This might be memory intensive, but still would offer a speed benefit over a strictly single threaded approach. Under the hood, bpvec splits the vector into the number of workers and then executes the function on them, before joining it back together. 

I tested it against the equivalent single thread method and got an identical result.
improved efficiency in the case of a provided scale factor, fixed for cases where scale factor is not provided
Adding bpvec to importfrom
@ncborcherding
Copy link
Member

Thanks for making a pull request - agreed that the speed for performNormalization() could be improved substantially. I am working through the testing here (see above) and will get to it this weekend

@LinearParadox
Copy link
Contributor Author

If it passes checks, can we wait a bit to merge? I think the alist suffers on larger datasets with my internal testing, and I have some ideas on how to do it more efficiently after I slept on it. I'll try to push a few commits over the next 2-3 days if that's alright with you? Thank you so much for promptly responding to the push and providing the correction for the important!!

significantly speed up calculating scale factors
@ncborcherding
Copy link
Member

Absolutely. It might make things easier if you run devtools::check() locally with the updated commit. Also if interested please feel free to add your name to the function (as @author) or the package DESCRIPTION.

Nick

@LinearParadox
Copy link
Contributor Author

The tests seemed to have run locally! I ended up not parallelizing with bpparallel, mostly because it already runs fairly fast since most of it is now vectorized. If someone really wants to, it's fairly easy to just copy the function and change the mapply call to a bpmapply.

@ncborcherding
Copy link
Member

Hey it looks like the R-CMD-check failed on this one due to mismatch in the new documentation. Would you mind updating the .Rd files on your end with devtools::document()? Then I think we will be golden.

Thanks so much for your help,
Nick

added documentation for groups. added ucell split data matrix since it is an internal function and can't be imported. minor code cleanup.
removed redundant split matrix function (already present). added matrix dependency to imports.
@LinearParadox
Copy link
Contributor Author

Got it, should be done now. I benchmarked it on my dataset, and it seems like for around 50 gene sets on 9000 cells, runtime is down from around 10 mins to 3 seconds!

rolled back function renaming
@ncborcherding ncborcherding merged commit b5307d0 into BorchLab:master Aug 22, 2024
2 checks passed
@ncborcherding
Copy link
Member

Looks good to me - thanks for all the help!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants