How to make alevin-fry --expect-cells better at guessing the number of cells? #102
Comments
Hi @ljudevitluka, Thanks for the question! What is your intended workflow? I think there may be a slight disconnect between the intended usage of In general there are several ways to generate a permit list with alevin-fry:
As you suggest, --force-cells forces a specific number of cells to be quantified (the top N if you pass the argument N, as long as there are at least N distinct barcodes in the input).

The purpose of --expect-cells is to provide a liberal quantification for a number of cells around the number you specify. The way it achieves this is motivated by what Cell Ranger does: it takes the barcode frequency distribution up to the number you specify, looks at the 99th percentile, and then includes any barcode whose frequency is at least 1/10th of that value. As you suggest, the idea is that it is better to quantify more cells, under the assumption that they can be filtered out later, during analysis, if they turn out to be of low or dubious quality.

Now, I'd expect the knee method to be most in line with what you seem to suggest from the plot. The purpose of the knee method is to find the cutoff at the knee of the rank plot. This method doesn't take an argument; it builds the plot and attempts to find the knee itself. It usually works well, but it tends to be quite conservative in the number of cells it calls, i.e. it errs on the side of excluding cells from quantification rather than including them.

However, since you are using 10x Chromium technology, what I'd actually recommend as a default pipeline is to use

Let me know if the above makes sense, or if you have any other questions. Also, looping in @DongzeHE as he may have thoughts as well.

Best,
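To make the difference between the two options concrete, here is a minimal Python sketch of the selection rules as described above. It is not alevin-fry's actual implementation: the function names `force_cells` and `expect_cells`, the toy barcode counts, and the exact way the 99th percentile is indexed are all assumptions made for illustration.

```python
import math


def force_cells(counts, n):
    """Keep exactly the n most frequent barcodes (hypothetical helper)."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [barcode for barcode, _ in ranked[:n]]


def expect_cells(counts, n_expected):
    """Keep every barcode with at least 1/10th of a robust maximum frequency.

    The robust maximum is taken as the 99th percentile of the top
    n_expected frequencies. The index convention used here is an
    assumption, not necessarily what alevin-fry or Cell Ranger use.
    """
    top = sorted(counts.values(), reverse=True)[:n_expected]
    # Value sitting about 1% of the way down from the most frequent barcode.
    idx = max(0, math.ceil(len(top) * 0.01) - 1)
    threshold = top[idx] / 10
    return [barcode for barcode, c in counts.items() if c >= threshold]


# Toy data: ten "real" cells plus three background barcodes.
counts = {f"cell{i}": 1000 - 100 * i for i in range(10)}
counts.update({"bg0": 120, "bg1": 95, "bg2": 50})

forced = force_cells(counts, 10)      # always exactly 10 barcodes
expected = expect_cells(counts, 10)   # everything at or above the threshold
print(len(forced), len(expected))
```

In this toy input the threshold-based rule keeps more barcodes than the number passed in, which is the "liberal" behaviour described above: the argument sets the scale of the cutoff rather than a hard cap on the number of cells.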
Hi all, I totally agree with what @rob-p said. One thing to add: if you have an expected number of cells in mind, you can also try to run the Best,
Thank you very much for your help and explanations! Based on your reply, a good approach would be to use --unfiltered-pl with the 10x v3 "whitelist" settings and then proceed with DropletUtils to generate a list of high-quality cells. We are comparing 4 states: healthy (control) vs. infected vs. vaccinated+infected vs. "placebo"-vaccinated+infected, with all conditions in duplicate. All of these analyses are very new to me, so yes, I think my understanding was probably wrong. I was thinking that, after mapping, the list of cells generated by alevin-fry is almost complete, and that afterward only mtDNA content and rRNA content are used for filtering. Best, Luka
Discussed in #101
Originally posted by ljudevitluka January 31, 2023
Hi all,
I am working on an scRNA-seq dataset generated with the 10x Genomics platform. We aimed to collect 10,000 cells per experiment, and we mapped the reads to the transcriptome (since no genomic reference is available at the moment). Everything seems to be working more or less OK with alevin-fry, but in my samples I always get an overestimation of the cell number (as per the graph below). I am aware we could use --force-cells, but I would be more comfortable if the knee were detected by the --expect-cells flag. I tried calling --expect-cells with 5000, 8000, and 10000 cells, but the results were always overestimated.
My question is: Is there a way to make --expect-cells more sensitive? If I use --force-cells, how will this affect my downstream analysis if some of the samples are carrying noise? If there is no automated way, is it better to accept a bit of noise and hope it will get filtered out by further cell QC (mitochondrial gene content, ribosomal content, etc.), or to be on the safe side and take slightly fewer cells, at the risk of losing cells that have a lower number of genes expressed?
Thank you very much for your help and opinions :) Hopefully, this question is not out of the scope of discussion.
All the best, Luka