alevin: Question related to noisy reads, whitelisting, and emptyDrops #506
-
First of all, thank you for providing Salmon/Alevin and the related tools. Upon running Alevin (defaults for
One side note: I further filter the CBs for empty-droplets (using At this stage, here are my options: Somewhere in here there is also [1]
Replies: 3 comments
-
Hi @imranfanaswala,
Thanks for raising this important question. The short answer is that the defaults are designed with a generic case in mind. While they may work reasonably well for most cases, it's hard to come up with parameters that work well for all cases, and we might have to tweak a few things.
The question, I think, is why we see such variation in the number of quantified cells (too high or too low), and the answer is that there can be multiple reasons. The first thing that can be off is the initial whitelisting. Alevin performs knee-based thresholding on the cumulative frequency distribution of the observed cellular barcodes. In a relatively clean dataset it's easy to find the cutoff, but it's a heuristic, which can sometimes overshoot or undershoot. The first thing to monitor is The
Now the options depend on the user's choice and the data being quantified. If you want a completely automated pipeline with no user intervention at all, then I think option c (in your text) will solve both your issues. However, if you can monitor some of the stats, then I propose:
Hope it helps.
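To make the idea of a knee in the cumulative barcode frequency distribution concrete, here is a minimal sketch in Python. It is not Alevin's actual implementation: it uses a simple "largest gap above the diagonal" rule as a stand-in heuristic, and the function name `knee_index` and the toy counts are made up for illustration.

```python
def knee_index(counts):
    """Return how many top-ranked barcodes lie before the knee.

    Illustrative heuristic only (an assumption, not Alevin's method):
    sort barcode frequencies in decreasing order, build the normalized
    cumulative curve, and pick the rank where that curve is farthest
    above the diagonal y = x.
    """
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    n = len(ordered)
    if n < 2 or total == 0:
        return n  # nothing to threshold

    best_rank, best_gap = 1, float("-inf")
    running = 0
    for rank, count in enumerate(ordered, start=1):
        running += count
        # Gap between the cumulative fraction of reads and the
        # fraction of barcodes seen so far.
        gap = running / total - rank / n
        if gap > best_gap:
            best_rank, best_gap = rank, gap
    return best_rank


# Toy data: 5 high-frequency barcodes and 95 near-empty ones.
toy_counts = [1000] * 5 + [1] * 95
print(knee_index(toy_counts))  # 5
```

On clean data like the toy example the knee is unambiguous. When the drop between real cells and background is gradual, a rule like this can land too early or too late, which is the overshoot/undershoot behaviour described above.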
-
I have a few questions about running alevin on 10X V3: should I still be using ISR for the lib type, and when I set forceCells, does it take the top cells by UMI count?
-
Hi @cnk113, forceCells is applied to the number of observed reads for each cellular barcode.
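As a rough illustration of what ranking by reads rather than by UMI count means, the sketch below keeps the N barcodes with the most observed reads. It only mirrors the behaviour described in the reply and is not Alevin's code; the function name `top_barcodes_by_reads` and the toy barcodes are hypothetical.

```python
from collections import Counter


def top_barcodes_by_reads(barcode_per_read, n_cells):
    """Keep the n_cells barcodes with the most observed reads.

    barcode_per_read holds one cellular barcode per sequenced read, so
    the counts are raw read counts, not deduplicated UMI counts.
    """
    read_counts = Counter(barcode_per_read)
    return [barcode for barcode, _ in read_counts.most_common(n_cells)]


# Toy input: one barcode string per read.
reads = ["AAA", "CCC", "AAA", "GGG", "AAA", "CCC"]
print(top_barcodes_by_reads(reads, 2))  # ['AAA', 'CCC']
```

A barcode with many reads but few distinct UMIs (for example, heavy PCR duplication) would still rank highly under this rule, which is one way the selection can differ from a UMI-based ranking.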