10X 5' single-cell data #127

Open
HenriettaHolze opened this issue Jun 7, 2024 · 4 comments
Labels: documentation (Improvements or additions to documentation)

Comments

@HenriettaHolze

Hi @hxj5, I have a question regarding 10X 5' scRNA-seq data.
For 5' sequencing, the read containing the cell barcode and UMI also contains part of the transcript (see https://kb.10xgenomics.com/hc/en-us/articles/360000939852-What-is-the-difference-between-Single-Cell-3-and-5-Gene-Expression-libraries).
The 10X CellRanger pipeline therefore includes both the forward and reverse read in the BAM file.
You mention in #121 that only one read per cell barcode + UMI combination is used to extract the allele. In that case, only one of the forward and reverse reads would be considered, and almost half the data would be discarded. Is this correct?

Cheers, Henrietta

@hxj5
Collaborator

hxj5 commented Jun 7, 2024

Hi, (1) cellsnp-lite does not check the strand of a read (whether it is forward or reverse); (2) SNPs are processed in parallel (i.e., the reads in a CB+UMI group may be iterated over multiple times, once per SNP). For each SNP, it uses the first fetched/pileup read covering that SNP within a CB+UMI group, and different SNPs may use different first reads. A read "discarded" for one SNP may be used (as the first read) by another SNP. Therefore, little information is lost for allele counting, although many reads are "discarded" for each specific SNP.
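
To make the selection rule above concrete, here is a minimal Python sketch of "first read per CB+UMI group, decided separately for each SNP". It is only a toy model under stated assumptions, not cellsnp-lite's actual code: the read tuples, the read names, the SNP positions and the function `count_alleles` are all hypothetical, and reads are simply taken in list order to stand in for fetch/pileup order.

```python
from collections import defaultdict

# Hypothetical toy reads: (cell barcode, UMI, read name, {SNP position: base}).
# A real BAM record carries much more; this keeps only what the rule needs.
reads = [
    ("CB1", "UMI1", "r1_fwd", {100: "A"}),
    ("CB1", "UMI1", "r1_rev", {100: "A", 250: "G"}),
    ("CB1", "UMI2", "r2_fwd", {250: "T"}),
]


def count_alleles(reads, snp_positions):
    """Count one allele per CB+UMI group at each SNP, using the first covering read.

    Strand is never inspected, and each SNP is handled independently.
    """
    counts = {}  # SNP position -> {(cell barcode, base): count}
    used = {}    # SNP position -> names of the reads that were counted
    for pos in snp_positions:
        seen_groups = set()  # CB+UMI groups already counted at this SNP
        counts[pos] = defaultdict(int)
        used[pos] = []
        for cb, umi, name, bases in reads:
            if pos not in bases:
                continue  # read does not cover this SNP
            if (cb, umi) in seen_groups:
                continue  # a previous read of this CB+UMI was already used here
            seen_groups.add((cb, umi))
            counts[pos][(cb, bases[pos])] += 1
            used[pos].append(name)
    return counts, used


counts, used = count_alleles(reads, [100, 250])
for pos in (100, 250):
    print(pos, dict(counts[pos]), used[pos])
```

In this toy input, `r1_rev` is skipped at position 100 because `r1_fwd` from the same CB+UMI group was seen first, but it is the first read of that group covering position 250, so it is used there.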

@hxj5 added the documentation label on Jun 7, 2024
@HenriettaHolze
Author

Thank you, that makes sense.

@wJDKnight

> Hi, (1) cellsnp-lite does not check the strand of a read (whether it is forward or reverse); (2) SNPs are processed in parallel (i.e., the reads in a CB+UMI group may be iterated over multiple times, once per SNP). For each SNP, it uses the first fetched/pileup read covering that SNP within a CB+UMI group, and different SNPs may use different first reads. A read "discarded" for one SNP may be used (as the first read) by another SNP. Therefore, little information is lost for allele counting, although many reads are "discarded" for each specific SNP.

Does that mean one UMI can be counted in multiple SNPs?

@hxj5
Collaborator

hxj5 commented Aug 2, 2024

> Does that mean one UMI can be counted in multiple SNPs?

Yes, if the reads of that UMI cover multiple SNPs.
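
As a rough illustration of this answer, the sketch below tallies how many UMIs contribute to each SNP when a UMI is counted once at every SNP that any of its reads covers. The data and the helper `umi_counts_per_snp` are hypothetical and only model the counting rule described in this thread; they are not taken from cellsnp-lite.

```python
# Hypothetical toy data: for each (cell barcode, UMI) group, the set of SNP
# positions covered by each of its reads.
umi_coverage = {
    ("CB1", "UMI1"): [{100}, {100, 250}],  # two reads, e.g. forward and reverse
    ("CB1", "UMI2"): [{250}],
}


def umi_counts_per_snp(umi_coverage):
    """Count each UMI once at every SNP covered by at least one of its reads."""
    counts = {}
    for read_positions in umi_coverage.values():
        covered = set().union(*read_positions)  # SNPs reached by this UMI
        for pos in covered:
            counts[pos] = counts.get(pos, 0) + 1
    return counts


per_snp = umi_counts_per_snp(umi_coverage)
print(per_snp)
```

Here UMI1 is counted at both position 100 and position 250, so the per-SNP counts sum to 3 even though there are only 2 UMIs. Within any single SNP, each UMI still contributes at most once.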

3 participants