Differential Expression

Most of these differential expression (DE) discussion points come from Squair et al. (2021), who broadly argue:

Performing DE between all cells in each group using a Wilcoxon test or t-test commits the statistical error of pseudoreplication (treating individual cells as independent replicates and ignoring the true biological replicates) and biases DE results towards highly expressed genes.
Poisson or negative binomial generalised linear mixed models can model samples at the per-cell level, but scale poorly to large numbers of cells and are computationally demanding.
Aggregating counts to the sample level per cluster and then applying classical bulk RNA-seq testing approaches (edgeR, DESeq2, limma), known as pseudobulk analysis, can generate biologically meaningful DE results rapidly.
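
The aggregation step behind pseudobulk analysis can be sketched as below. This is a minimal illustration in plain Python, not the procedure used by Squair et al. or by any particular package: the function name `pseudobulk`, the dictionary-based inputs, and the toy gene, sample and cluster labels are all assumptions made for the example. It only sums raw counts over cells that share a cluster and a sample; the resulting per-sample profiles would then be passed to a bulk tool such as edgeR, DESeq2 or limma.

```python
from collections import defaultdict


def pseudobulk(counts, sample_of, cluster_of):
    """Sum raw counts over all cells sharing a (cluster, sample) pair.

    counts:     dict mapping cell id -> dict of gene -> raw count
    sample_of:  dict mapping cell id -> sample label (biological replicate)
    cluster_of: dict mapping cell id -> cluster label
    Returns a dict mapping (cluster, sample) -> dict of gene -> summed count.
    """
    agg = defaultdict(lambda: defaultdict(int))
    for cell, gene_counts in counts.items():
        key = (cluster_of[cell], sample_of[cell])
        for gene, n in gene_counts.items():
            agg[key][gene] += n
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(genes) for key, genes in agg.items()}


# Toy data (hypothetical): four cells, two samples, two clusters.
counts = {
    "c1": {"GeneA": 3, "GeneB": 0},
    "c2": {"GeneA": 1, "GeneB": 2},
    "c3": {"GeneA": 0, "GeneB": 5},
    "c4": {"GeneA": 4, "GeneB": 1},
}
sample_of = {"c1": "S1", "c2": "S1", "c3": "S2", "c4": "S2"}
cluster_of = {"c1": "T", "c2": "T", "c3": "T", "c4": "B"}

pb = pseudobulk(counts, sample_of, cluster_of)
for key in sorted(pb):
    print(key, pb[key])
```

After aggregation, each cluster contributes one count profile per sample, so the unit of replication in the downstream test is the sample rather than the cell.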

However, there are some exceptions:

For testing between cells within a single sample, or for performing relatively coarse marker gene testing, Wilcoxon/t-test approaches perform satisfactorily.
Conceivably, if one wanted to probe the level of expression (as opposed to the frequency of expression in a population) of sufficiently highly expressed genes across samples, a linear mixed model could be preferable to a pseudobulk approach.

A subset of highly expressed, sometimes stress-associated, genes frequently appears when performing DE. These genes can be differentially expressed in a subset of samples because of subtle differences in sample quality or sequencing depth. Some of the poorer-performing DE approaches discussed above are biased towards returning these genes as DE results.
For example, the genes discussed in Ascensión et al. (2022), seen here.

Kane Foster, 30-05-2022