feat: add illicofor rank_genes_groups#4038
feat: add illicofor rank_genes_groups#4038ilan-gold wants to merge 8 commits intoig/exp_post_aggfrom
illicofor rank_genes_groups#4038Conversation
❌ 8 Tests Failed:
View the top 3 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
|
Hey, good catch for the OVO column ordering. I have no idea where the bug comes from, it seems to be related to group labels for PBMC that |
|
So it seems to be due to np.argsort and np.sort not sorting the weird characters (" ", "+", "/", "#", etc) of the bulk_labels the same way. I will ship a fix sometime today or this weekend. NB: I spent a bit of time looking into the PBMC dataset and it seems rather unusual: a lot of values are negative leading to non-defined logfoldchanges, on top of which there seems to be very little value diversity in the dataset. As a result, a lot of genes end up with identical ranksum, but because illico and scanpy do not compute z-score the exact same way (mathematically equivalent but programmatically different), gene ordering is impacted. The gene ordering impacts the position of NaN values in the results, but non-NaN values do match because they are very similar. I am not sure as if this dataset is a good test case. NB2: details of the programmatic differences: U = ranksum - n_tgt * (n_tgt+ 1)/2
z = U - n_ref * n_tgt / 2scanpy does: z =
ranksum - n_tgt * (n_tgt + 1 + n_ref) / 2Those are mathematically equivalent but result in differences of the order of 1.e-9 approximately, changing gene orders when ranksums are equal. |
TODOs:
See: https://github.com/scverse/scanpy/actions/runs/24088645078/job/70268566419?pr=4038
ovo"column" in scores (cc @remydubois)illico(see thefilterwarningson the new test)ovrresults are so far offexp_post_aggfeature feat: allow exponentiation post agg for log-fold-change inrank_genes_groups#4037) toillico, we will add theillicobackend (or all changes have been documented)illico#4012