Update performNormalization.R #111
Added parallelization to the normalize function. This might be memory intensive, but it should still be faster than a strictly single-threaded approach. Under the hood, bpvec splits the vector into as many chunks as there are workers, executes the function on each chunk, and then joins the results back together. I tested it against the equivalent single-threaded method and got an identical result.
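The split, apply, and join pattern described above can be imitated in base R to check that chunking does not change the result. This is only a sketch: `log_scale` and `chunked_apply` are hypothetical names, the transform is a toy stand-in rather than the package's normalization, and the chunks here run serially instead of on separate workers.

```r
# Toy element-wise transform; a stand-in, not the package's normalization.
log_scale <- function(v) log1p(v * 10)

# Base-R imitation of the split / apply / join pattern (hypothetical helper).
chunked_apply <- function(x, fun, n_chunks = 2L) {
  # Assign each element to one of n_chunks contiguous chunks.
  chunk_id <- cut(seq_along(x), breaks = n_chunks, labels = FALSE)
  pieces <- split(x, chunk_id)
  # Apply the function per chunk, then concatenate in the original order.
  unlist(lapply(pieces, fun), use.names = FALSE)
}

x <- c(0, 1, 2, 5, 10, 20, 50)
serial  <- log_scale(x)
chunked <- chunked_apply(x, log_scale, n_chunks = 3L)

# The chunked result should match the single-pass result exactly.
stopifnot(identical(serial, chunked))

# With BiocParallel installed, the parallel counterpart would look like:
# BiocParallel::bpvec(x, log_scale,
#                     BPPARAM = BiocParallel::MulticoreParam(workers = 2))
```

The equality check only holds for element-wise functions like this one. A function that depends on the whole vector (for example, one that divides by `sum(x)`) would give different results per chunk, which is why a comparison against the single-threaded output is worth running.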
improved efficiency in the case of a provided scale factor, fixed for cases where scale factor is not provided
Adding bpvec to importFrom
Thanks for making a pull request - agreed that the speed for
If it passes checks, can we wait a bit to merge? In my internal testing, the alist suffers on larger datasets, and after sleeping on it I have some ideas on how to do it more efficiently. I'll try to push a few commits over the next 2-3 days, if that's alright with you? Thank you so much for promptly responding to the push and providing the correction for the important!!
significantly speed up calculating scale factors
Absolutely. It might make things easier if you run
Nick
The tests seem to have run locally! I ended up not parallelizing with BiocParallel, mostly because it already runs fairly fast now that most of it is vectorized. If someone really wants to, it's fairly easy to just copy the function and change the mapply call to a bpmapply call.
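As a rough illustration of the swap mentioned above, the sketch below scores two made-up gene sets with `mapply`. The matrix, the gene sets, the weights, and `score_set` are all hypothetical and are not taken from the pull request; only the shape of the `mapply` call is the point.

```r
# Toy expression matrix: genes in rows, cells in columns (made-up values).
expr <- matrix(c(1, 0, 3,
                 2, 2, 0,
                 0, 4, 1,
                 5, 1, 1),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("g", 1:4), paste0("c", 1:3)))

# Hypothetical gene sets and per-set weights.
gene_sets <- list(setA = c("g1", "g2"), setB = c("g3", "g4"))
weights   <- c(setA = 1, setB = 0.5)

# Per-set work is vectorized: one colMeans call over the selected rows.
score_set <- function(genes, w, m) w * colMeans(m[genes, , drop = FALSE])

# mapply walks over gene_sets and weights in parallel positions;
# the shared matrix is passed once through MoreArgs.
scores <- mapply(score_set, gene_sets, weights, MoreArgs = list(m = expr))
print(scores)

# With BiocParallel installed, the same call shape should carry over:
# scores <- BiocParallel::bpmapply(score_set, gene_sets, weights,
#                                  MoreArgs = list(m = expr))
```

Because each set's work is a single vectorized call, the loop over sets is cheap, which is consistent with the comment that a parallel backend adds little here.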
Hey, it looks like the R-CMD-check failed on this one due to a mismatch in the new documentation. Would you mind updating the .Rd files on your end with
Thanks so much for your help,
added documentation for groups. added ucell split data matrix since it is an internal function and can't be imported. minor code cleanup.
removed redundant split matrix function (already present). added matrix dependency to imports.
Got it, should be done now. I benchmarked it on my dataset, and it seems like for around 50 gene sets on 9000 cells, runtime is down from around 10 minutes to 3 seconds!
rolled back function renaming
Looks good to me - thanks for all the help!!