perf: more efficient _sparse_nanmean #3570
|
@flying-sheep Could you assign a reviewer, please? The failing test looks like a bug in the test setup rather than in my change. |
|
I could also rewrite the logic with numba so that it needs no copying at all, but that might be slower than the currently implemented methods. |
|
I know the "No milestone" check is failing, but I don't have permission to set a milestone, so I probably need a maintainer's help to set it. |
|
@Reovirus I restarted the CI as there should hopefully be no flaky tests at the moment. Don't worry about the milestone, please. |
|
The failure comes from scipy.sparse.csr_matrix.count_nonzero, which has only had an axis argument since scipy 1.15.0. I'll rewrite it with numba. |
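For illustration, a per-axis nonzero count can also be computed without the `axis` argument, which keeps older SciPy versions working. This is only a sketch of one possible workaround, not the code in this PR; `count_nonzero_per_axis` is a hypothetical helper name.

```python
import numpy as np
from scipy import sparse


def count_nonzero_per_axis(mat: sparse.csr_matrix, axis: int) -> np.ndarray:
    """Count nonzero values per column (axis=0) or per row (axis=1).

    Hypothetical helper that avoids `count_nonzero(axis=...)`.
    """
    n_rows, n_cols = mat.shape
    # Explicitly stored zeros must not be counted.
    keep = mat.data != 0
    if axis == 0:
        return np.bincount(mat.indices[keep], minlength=n_cols)
    # Expand indptr into one row index per stored value.
    rows = np.repeat(np.arange(n_rows), np.diff(mat.indptr))
    return np.bincount(rows[keep], minlength=n_rows)


x = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0], [0.0, 0.0, 3.0]]))
per_col = count_nonzero_per_axis(x, axis=0)
per_row = count_nonzero_per_axis(x, axis=1)
```

Note that NaN entries compare unequal to zero, so they are counted as nonzero here.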
|
@Zethson could you please restart CI? I hit a very strange HTTP error that is not caused by my code. |
for more information, see https://pre-commit.ci
|
CI passed. Could somebody review this when they have time, please? |
68e1e9c to e089438
|
I made the benchmarks run, let’s see how well this works! Could you please add a release note fragment? |
|
Yes, I'll add a note, thanks! |
Benchmark changes
Comparison: https://github.com/scverse/scanpy/compare/9bc2c1ed1bd1cf6f6c06cec71cce99916d048163..24f042ecdcc124a1ebcc526606373b9b3a940024
More details: https://github.com/scverse/scanpy/pull/3570/checks?check_run_id=70256269762 |
|
According to the benchmarks, this is actually slower than before. The benchmarks are a bit flaky, so this could be wrong, but a factor of 2.5 like the one here is usually real. |
|
Understood. I'll try to change it and speed it up. |
|
@flying-sheep |
|
I think these should be parallel numba kernels with mask arguments for the major and minor axis. I have a working implementation in rapids-singlecell that doesn't need to copy at all. I think starting from _get_mean_var is the best way forward.
Intron7 left a comment
Refactor kernels to be zero-copy
|
@Intron7 I rewrote the logic. My local benchmarking gives a different result, though (I just measure peak memory by ), and I'm trying to fix that. I also don't understand some metrics, such as preprocessing_counts.FastSuite.time_log1p('pbmc3k', 'counts-off-axis'): the reported change flips sign randomly. Is that normal? |
|
The kernels already look good. However, the main point I was trying to make is that you can compute the means without subsetting. That means you wouldn't need the get-subset function at all; you can just use masks within your kernel. |
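A minimal sketch of what "masks within the kernel" could look like, assuming CSR input and a mean along the major (row) axis. This is not the rapids-singlecell or scanpy implementation; `masked_major_nanmean`, `row_mask`, and `col_mask` are hypothetical names, and numba is treated as optional so the same code also runs as plain Python.

```python
import numpy as np

try:  # numba is optional in this sketch
    from numba import njit, prange
except ImportError:  # plain-Python fallback with the same semantics
    prange = range

    def njit(*args, **kwargs):
        return lambda func: func


@njit(parallel=True)
def masked_major_nanmean(data, indices, indptr, row_mask, col_mask):
    """NaN-ignoring mean per selected row over selected columns, no subsetting."""
    n_rows = indptr.shape[0] - 1
    n_selected_cols = col_mask.sum()
    out = np.full(n_rows, np.nan)  # rows that are masked out stay NaN
    for i in prange(n_rows):
        if row_mask[i]:
            total = 0.0
            n_nan = 0
            for j in range(indptr[i], indptr[i + 1]):
                if col_mask[indices[j]]:
                    v = data[j]
                    if np.isnan(v):
                        n_nan += 1
                    else:
                        total += v
            # Implicit zeros count as valid values; only NaNs are excluded.
            n_valid = n_selected_cols - n_nan
            if n_valid > 0:
                out[i] = total / n_valid
    return out


# Hand-built CSR arrays for [[1, nan, 0, 3], [0, 2, 4, 0], [nan, nan, 0, 0]]
data = np.array([1.0, np.nan, 3.0, 2.0, 4.0, np.nan, np.nan])
indices = np.array([0, 1, 3, 1, 2, 0, 1], dtype=np.int64)
indptr = np.array([0, 3, 5, 7], dtype=np.int64)
row_mask = np.array([True, False, True])
col_mask = np.array([True, True, False, True])

means = masked_major_nanmean(data, indices, indptr, row_mask, col_mask)
```

Each row is handled by exactly one loop iteration, so the parallel loop needs no synchronization. A mean along the minor axis would need per-thread accumulators or atomics instead.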
Zethson left a comment
Just typos I think. These should be renamed.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3570      +/-   ##
==========================================
- Coverage   78.51%   78.49%   -0.02%
==========================================
  Files         117      117
  Lines       12753    12751       -2
==========================================
- Hits        10013    10009       -4
- Misses       2740     2742       +2
Flags with carried forward coverage won't be shown.
|
|
I merged upstream changes. Do you still plan to work on the changes @Intron7 requested? |
Fixes #1894. Reduced redundant data copying; the original matrix is now copied once instead of twice. One copy remains necessary to replace NaNs with zeros without modifying the original matrix.
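The single-copy idea described above can be sketched roughly as follows. This is an illustration under assumptions, not the code merged in this PR: `nanmean_single_copy` is a hypothetical name, and the NaN counts are obtained from a helper matrix that reuses the original index arrays.

```python
import numpy as np
from scipy import sparse


def nanmean_single_copy(mat: sparse.csr_matrix, axis: int) -> np.ndarray:
    """NaN-ignoring mean along `axis`, copying the matrix only once."""
    nan_mask = np.isnan(mat.data)
    # The one copy: zero out NaNs here so the caller's matrix stays untouched.
    zeroed = mat.copy()
    zeroed.data[nan_mask] = 0.0
    sums = np.asarray(zeroed.sum(axis=axis)).ravel()
    # Count NaNs per row/column, reusing the original indices and indptr.
    nan_counts = sparse.csr_matrix(
        (nan_mask.astype(np.int64), mat.indices, mat.indptr), shape=mat.shape
    )
    n_nan = np.asarray(nan_counts.sum(axis=axis)).ravel()
    # Implicit zeros are valid values, so only NaNs shrink the denominator.
    return sums / (mat.shape[axis] - n_nan)


dense = np.array([[1.0, np.nan, 0.0], [3.0, 2.0, np.nan]])
x = sparse.csr_matrix(dense)
col_means = nanmean_single_copy(x, axis=0)
row_means = nanmean_single_copy(x, axis=1)
```

On this small input the results agree with `np.nanmean(dense, axis=...)`, and `x` still contains its NaNs afterwards.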