bowtie2 subprocess using more CPU cores than allowed by -j option #365

Closed
leowill01 opened this issue Jan 24, 2024 · 4 comments

leowill01 commented Jan 24, 2024

I'm running multiple breseq calls with GNU parallel, and I limit each breseq call to 1 CPU core with -j 1.
However, my process monitor shows that whenever breseq runs a bowtie2 subprocess step, it uses more than 1 CPU core:

[Image: Screenshot 2024-01-24 at 4 56 39 PM]

I've logged bowtie2 using 200% CPU (i.e., 2 cores) with -j 1 and 300% CPU (3 cores) with -j 2.
Interestingly, the breseq output shows that every bowtie2 call is made with -p 1, so I'm not sure why it would try to use more than 1 core.

This makes it hard to schedule cores per job efficiently with parallel in my scripts. I assume that 1 core = 1 job, so when breseq/bowtie2 uses more than 1 core, it causes CPU overhead and clogs up the threads.

Has anyone come across this before?
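
One way to put a number on the observation above is to compare the CPU time a command consumes with the wall-clock time it takes: a ratio near 2 corresponds to the "200% CPU" reading. The following is a minimal sketch, assuming a Unix-like system (it relies on Python's `resource` module). The function names `effective_cores`, `measure`, and `jobs_that_fit` are hypothetical helpers written for illustration; they are not part of breseq, bowtie2, or GNU parallel.

```python
import math
import resource
import subprocess
import time


def effective_cores(cpu_seconds, wall_seconds):
    """Average number of cores kept busy: CPU time divided by elapsed time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


def measure(cmd):
    """Run cmd (a list of arguments) and return (cpu_seconds, wall_seconds).

    CPU time is user + system time of the child process and any of its
    descendants that were waited for, taken from RUSAGE_CHILDREN.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return cpu, wall


def jobs_that_fit(total_cores, cores_per_job):
    """Number of simultaneous jobs to allow, given the measured cores per job.

    Always returns at least 1 so that a single job can still run.
    """
    if cores_per_job <= 0:
        raise ValueError("cores_per_job must be positive")
    return max(1, math.floor(total_cores / cores_per_job))
```

If a job that is nominally limited to one core turns out to average about 2, then on an 8-core machine `jobs_that_fit(8, 2.0)` suggests running 4 jobs at once rather than 8. This only budgets around the behavior; it does not explain why the extra core is used.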

@jeffreybarrick
Contributor

I haven't noticed this, but I also haven't paid close attention.

It seems like there might be some discussion of something related over on the bowtie2 issues...

BenLangmead/bowtie2#62

@leowill01
Author

Thanks for the find! It seems that issue has been open for quite a while. Have there ever been any plans to incorporate a choice of aligner (e.g., opting to use bwa-mem2 instead of bowtie2)?
I'll keep testing to see if it's a problem stemming from elsewhere, such as within parallel.

@jeffreybarrick
Contributor

It would be very difficult to substitute another aligner and get full breseq functionality.

In particular, the junction prediction steps require finding split-read matches, and breseq tracks all equivalent locations to which a read aligns. Not all aligners are good at doing these things. Most are optimized for finding the best match and/or randomly assigning a read to one of several equivalent locations.

There is an option to use your own aligned SAM files of reads as input to breseq (--aligned-sam), in which case it will skip the alignment steps. But it can't call JC evidence with this option, so you might as well use any other SNP / small indel calling program. So, I wouldn't recommend going down that road.

$ breseq -h
...
 --aligned-sam                     Input files are aligned SAM files, rather than FASTQ
                                   files. Junction prediction steps will be skipped. Be
                                   aware that breseq assumes: (1) Your SAM file is
                                   sorted such that all alignments for a given read are
                                   on consecutive lines. You can use 'samtools sort -n'
                                   if you are not sure that this is true for the output
                                   of your alignment program. (2) You EITHER have
                                   alignment scores as additional SAM fields with the
                                   form 'AS:i:n', where n is a positive integer and
                                   higher values indicate a better alignment OR it
                                   defaults to calculating an alignment score that is
                                   equal to the number of bases in the read minus the
                                   number of inserted bases, deleted bases, and soft
                                   clipped bases in the alignment to the reference. The
                                   default highly penalizes split-read matches (with
                                   CIGAR strings such as M35D303M65).
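
To make the fallback score in point (2) concrete, here is a small sketch that follows the wording of the help text: read length minus inserted, deleted, and soft-clipped bases. It is a reading of that description, not breseq's actual implementation, and the function name `default_alignment_score` is hypothetical. It also assumes CIGAR strings in the standard SAM order (length, then operation), so a split-read match with 35 matched bases, a 303-base deletion, and 65 matched bases is written `35M303D65M`.

```python
import re

# One CIGAR element in standard SAM order: a length followed by an operation.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_WHOLE = re.compile(r"(?:\d+[MIDNSHP=X])+")

# Operations that consume bases of the read sequence (per the SAM spec).
READ_CONSUMING = frozenset("MIS=X")


def default_alignment_score(cigar):
    """Read length minus inserted, deleted, and soft-clipped bases."""
    if not CIGAR_WHOLE.fullmatch(cigar):
        raise ValueError(f"not a valid CIGAR string: {cigar!r}")
    read_length = inserted = deleted = soft_clipped = 0
    for length, op in CIGAR_ELEMENT.findall(cigar):
        n = int(length)
        if op in READ_CONSUMING:
            read_length += n
        if op == "I":
            inserted += n
        elif op == "D":
            deleted += n
        elif op == "S":
            soft_clipped += n
    return read_length - inserted - deleted - soft_clipped


print(default_alignment_score("100M"))        # 100
print(default_alignment_score("35M303D65M"))  # -203
```

Under this reading, a 100-base read that aligns end to end scores 100, while the same read aligned across a 303-base deletion scores 100 - 303 = -203, which illustrates why the help text says the default highly penalizes split-read matches.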

I would have thought that disk read/write would be more limiting if you launch many breseq runs that hit the bowtie2 alignment step at the same time.

@leowill01
Author

opened issue for bowtie2 here
