Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use cellSNP on a big BAM file #111

Open
RosaDeSa opened this issue Nov 22, 2023 · 1 comment
Open

Use cellSNP on a big BAM file #111

RosaDeSa opened this issue Nov 22, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@RosaDeSa
Copy link

RosaDeSa commented Nov 22, 2023

Hi! I would be grateful if you can answer my question.
I am trying to use the cellSNP line tool on a large bam file (1.2 T) generated by SMARTseq.
I aim to count all the possible variants in a specific gene.
Do you think I should use the mode 2? Can the analysis be limited to a specific gene?
I'm trying this command:

cellsnp-lite -s myfile.bam -b barcodes.txt -O ./results --chrom 3 --minMAF 0.1 --minCOUNT 100 -p 20 -f chr3_genome.fa

But the vcf in output is empty.

here the header of my bam file:

A00560:334:HLKGKDSX7:1:2272:3631:22216	163	3	10001	1	2S55M93S	=	10284	326	CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTCACCCATTGCGCAATCTGTCTCTTATACACATCTGACGCTGCCGACGACGGCACTGAAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAGGGGGG	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFF:FF,FF:,F:F:,,,:FFFF	NH:i:4	HI:i:1	AS:i:96	nM:i:0	BX:Z:TGATCTGCACCGGCACTGAA	BC:Z:TGATCTGCACCGGCACTGAA	QB:Z:FFFFFFFFFFFFFFFFFFFF	QU:Z:FFFFFFFFFF	ES:Z:Unassigned_NoFeatures	IS:Z:Unassigned_NoFeatures	UX:Z:GGTGAGGGTT	UB:Z:GGTGAGGGTT
A00560:334:HLKGKDSX7:2:2674:26458:12994	163	3	10001	0	26S54M70S	=	10389	490	CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTCACCATTGCGCAATCTGTCTCTTATACACATCTGACGCTGCCGACGACTCGTTAACGGTGTAGATCTCGGTGGT	:FFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFF:FFFFF,:FFFFF:FFFFF:FFFFFFFFFF:FFF:::FFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFF,FFFFFFFFFFF,	NH:i:40	HI:i:1	AS:i:116	nM:i:1	BX:Z:ACAATGGACACTCGTTAACG	BC:Z:ACAATGGACACTCGTTAACG	QB:Z:FFFFFFFFFFFFFFFFFFFF	QU:Z:FFFFFFFFFF	ES:Z:Unassigned_NoFeatures	IS:Z:Unassigned_NoFeatures	UX:Z:GTGAGGGTTA	UB:Z:GTGAGGGTTA
A00560:334:HLKGKDSX7:3:1412:5348:2206	163	3	10001	3	8S54M88S	=	10284	331	CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTCACCATTGCGCAATCTGTCTCTTATACACATCTGACGCTGCCGACGACTAGTAGTGAGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAGG	FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:FF:FFFFFFF::FFF:FFFFFFFFFFFFF,F,FFFFFFFFFF::F,FFFFFFFFFFF:F:F:FF:FFFF,,,	NH:i:2	HI:i:1	AS:i:100	nM:i:0	BX:Z:CGCAGTTGTTCTAGTAGTGA	BC:Z:CGCAGTTGTTCTAGTAGTGA	QB:Z:FFFFFFFFFFFFFFFFFFFF	QU:Z:FFFFFFFFFF	ES:Z:Unassigned_NoFeatures	IS:Z:Unassigned_NoFeatures	UX:Z:GTGAGGGTTA	UB:Z:GTGAGGGTTA

I really appreciate any help you can provide.
Best

@hxj5
Copy link
Collaborator

hxj5 commented Nov 24, 2023

Hi, thanks for the question.

Cellsnp-lite does not have an option for genotyping a specific region (gene) now. As your BAM is indeed large, mode 2a could be very slow and you may try mode 2b + 1a instead, i.e., calling first in a bulk manner by mode 2b followed by genotyping in mode 1a.

Alternatively, you may also run mode 1a directly if you can prepare a VCF listing every position of that gene, i.e., specify the CHROM, POS, REF as it is and set the ALT as "." in the VCF file (or set both REF and ALT as ".", and then use -f option to extract the reference allele from the FASTA file). Cellsnp-lite would assign the allele with the highest UMI/read counts except REF as the ALT.

Generally, the mode 2b + 1a is recommended. Please check the example of each mode in the manual.

Is your merged BAM file from SMART-seq3? As far as I know, SMART-seq2 will produce one BAM file per cell. The empty output could be due to the unmatched cell tag. The defualt cell tag of cellsnp-lite is "CB", which is missing in the three BAM records you shared. You may set proper cell tag with --cellTAG option, based on your experimental settings.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants