
Clarification on high rejection rates with large single-cell dataset (~200k cells) #84

@martina811


Hi,
I’m working on a large single-cell RNA-seq dataset consisting of approximately 200,000 cells and four distinct batches. I’ve been using kBET to evaluate batch correction after integration, but I consistently observe very high rejection rates (close to 1) and p-values near zero, regardless of whether I use cell type annotations or integration-derived clusters as the label vector.

Given the size of my dataset, I suspect this is related to kBET's sensitivity to large sample sizes, as noted in Issue #80.

I’d really appreciate your guidance on the following:

  1. Are there recommended parameter settings (e.g., adjusting k0, or other options) for running kBET on very large datasets?
  2. Would you recommend downsampling the dataset to a smaller subset of cells (e.g., 10k or 20k), and if so, is there an optimal sample size to maintain statistical power without inflating rejection?
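To make question 2 concrete, here is a minimal sketch of one way the downsampling could be done: drawing a subset of cells while keeping each batch's share of the total unchanged. It is written in Python purely for illustration. The function name `stratified_subsample`, the batch names, and the batch sizes are hypothetical, and the selected indices would still need to be used to subset the data before calling kBET in R.

```python
import random
from collections import defaultdict


def stratified_subsample(batch_labels, n_total, seed=0):
    """Return sorted cell indices, sampling each batch in proportion to its size.

    batch_labels: one batch label per cell (hypothetical input).
    n_total: approximate number of cells to keep overall.
    """
    rng = random.Random(seed)

    # Group cell indices by batch label.
    by_batch = defaultdict(list)
    for idx, label in enumerate(batch_labels):
        by_batch[label].append(idx)

    n_cells = len(batch_labels)
    if n_total >= n_cells:
        return list(range(n_cells))

    chosen = []
    for label, idxs in by_batch.items():
        # Proportional allocation, keeping at least one cell per batch.
        n_batch = max(1, round(n_total * len(idxs) / n_cells))
        chosen.extend(rng.sample(idxs, min(n_batch, len(idxs))))
    return sorted(chosen)


# Toy example: four hypothetical batches of unequal size, 200,000 cells in total.
batches = ["b1"] * 80_000 + ["b2"] * 60_000 + ["b3"] * 40_000 + ["b4"] * 20_000
keep = stratified_subsample(batches, n_total=10_000)
```

Repeating the draw with several seeds and comparing the resulting rejection rates would show how much the outcome depends on the particular subset, though whether this is the right approach for kBET is exactly what I am unsure about.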

Thanks in advance for any help or clarification you can provide!
