Status of bambu single-cell/spatial #3
Hi Luuk,

You are welcome to download the repo and try it based on the information in the documentation already provided. As it is at a developmental stage it is still a bit unstable, however, and we have had reports of issues on HPCs. If you do have any issues, or if parts of the documentation are unclear, please let us know here.

Kind Regards,
Hi Andre Sim,

Thanks for your response! I've given it a try, and indeed I suspect there is an issue with the parallelization and the HPC system, since I'm getting:
Have you seen this error before? If not, I'm happy to dig a bit deeper into this. Is it possible for you to share the updated bambu package? It would be a lot easier for me if I could clone it locally, make changes, and then install it from source.

EDIT: The error above might actually stem from running out of memory. I'll try to troubleshoot, but either way it would be easier with local access to the bambu version you're currently using.

Best,
It was indeed a memory issue. When trying to see if running with
Again, I'm happy to troubleshoot myself and submit pull requests if/when I find fixes, both here and in the R package GitHub repo.
Hi Luuk,

I will have to get back to you regarding the R package, as I need to check with a few people first. In the meantime, could you please share the command line you used to run the nextflow pipeline for the second example? Can I ask how large your input file is (from fastq or bam?) and how much memory you are providing? In our attempts when not in lowMemory mode, our largest sample used 62.54GB for 105,064,936 reads (91GB BAM file, 146GB FASTQ file). This would greatly increase when running multiple samples if --lowMemory is not on.

Regarding your latest issue error message, this seems to be similar to another issue a user has reported, where the bambu dependencies are not being correctly loaded into the R instance. The rowRanges method is meant to be imported from SummarizedExperiment. Something you could try (and I would love to hear whether it works/doesn't work for you) is adding either of these to the nextflow.config:

process.containerOptions = "--no-home"

env {

Both of these are meant to stop local R package lists from being used.

I hope this helps, and I appreciate you reporting these errors and your troubleshooting attempts. Always happy to receive more so that we can get this ready for the full release.

Kind Regards,
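For reference, the two suggestions above would sit in nextflow.config roughly like this. Note that the contents of the env { } block were cut off in the thread, so the variables shown are my own guess at typical settings for isolating the container's R startup files, not Andre's exact lines:

```groovy
// Sketch of the suggested nextflow.config additions. The env { } entries
// are assumptions (the original message was truncated); they are a common
// way to stop host-local R package lists and profiles from being used.
process.containerOptions = "--no-home"

env {
    R_PROFILE_USER = "/.Rprofile"   // assumed: point away from $HOME/.Rprofile
    R_ENVIRON_USER = "/.Renviron"   // assumed: point away from $HOME/.Renviron
}
```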
Hi Andre Sim,

Thanks for getting back to me, and thanks for checking if it's possible to get access to the R package. Of course I totally understand that you need to check first if it's alright. The first error message was due to the

Regarding the second error: I just tried to include your recommended settings in the nextflow.config. However, this gave the same error. It didn't matter if I only included the containerOptions, the R env settings, or both.

Regarding the size of my input BAM file, it's relatively small (25GB). However, the memory issues stem from the fact that this comes from a custom spatial pipeline and we have millions of spatial barcodes (most with just a few reads). Therefore, most tools I've tried simply run out of memory, even with 2TB of memory available.

Many thanks again for helping.

Best,
Hi Luuk,

Since the first attempt (without --lowMemory) ran through bambu_discovery, did you get some of the early output in the output directory? Namely, I am interested in readClassFile.rds and quantData.rds. How big are these files, and would you be willing to send them to me so that I could try running the quantification step? (I would also need extendedAnnotations.rds.) Can you share with me the command used to run the pipeline and how many cores you are using?

I have never run bambu_quant with > 100,000 barcodes; it is, however, a theoretically memory-light step, as each core runs 1 barcode. We did, however, run into issues where R would duplicate the whole environment per core, including the whole quantData object, which I imagine with millions of barcodes would be much larger than in my tests. A temporary fix could be to reduce the number of cores, even to 1. This will be slow but hopefully should run.

Regarding the second error (with --lowMemory), I am sorry adding those lines did not solve it. In the work directory should be the singularity image
This will help me determine if it's the image or a local environment issue.

Thanks,
Hi Andre,

For your first point: it did indeed run through bambu_discovery without any issues. The readClassFile.rds and quantData.rds are ~800MB and 1GB, respectively. I've tried running the pipeline with only 12 cores (hoping that would take less memory), and that in the end used 1.67TB of memory and finally didn't finish, because I requested 4 days of running time on the HPC and it timed out. You can download the files here (let me know if there are any permission issues):

The exact command for the 'non-lowmem' mode was:
The command with low-mem was the same but with

Finally, regarding the rowRanges() issue. Following your instructions, I get the following error message:

So it does indeed look like it's potentially an issue on my end (?).

Edit: Although it doesn't recognize other packages that I've installed, and it is executed in the container. Going into the container, binding the required files, and then trying to run the

Many thanks again,
Hi Luuk,

Thanks, I have received the files. I will try to find time to run these soon to figure out why the memory usage is ballooning so much.

Regarding the lowMemory error: this is very strange. Just to clarify, when you used the instructions I sent, you received this message?
I am wondering if it has something to do with sending processes to different nodes on specific HPC configurations, where they do not have access to the singularity environment. You can set samples to something like samples[1:24] so that it does not take long. If this doesn't error out, that at least gives me a big clue. However, if it still returns the same original error but

Thanks for your continued patience while we resolve this!

Kind Regards,
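A quick way to check whether the R session inside the image is picking up host libraries (rather than the container's own) might look like the following. This is a generic diagnostic sketch, not a command from the bambu docs, and the way you start R in the image will depend on your setup:

```r
# Run inside the container's R (e.g. via `singularity exec <image> R`).
# If host libraries are leaking in, .libPaths() will list a path under
# the user's home directory in addition to the image's site library.
print(.libPaths())

# rowRanges() is a generic exported by SummarizedExperiment; if this
# attaches cleanly from within the image, the dependency itself is fine
# and the problem is environmental.
suppressPackageStartupMessages(library(SummarizedExperiment))
print(exists("rowRanges"))  # TRUE once the package is attached
```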
Hi Andre,

Sorry for the continuous spam, and I really appreciate you helping! I think I figured out the issue with
Now it ran completely through with lowmem mode, until the quantification step, where it is still either a memory or time issue.

Many thanks again. Cheers,
Ah great, so it was a simple fix! Great that you found that. I was working on a reduction in memory usage today and hope to have a branch you can test soon. Essentially it will split the quantData and run those in chunks, so as not to overload the memory all at once, at the cost of some IO and some runtime.

Kind Regards,
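The chunking idea described above could be sketched roughly as follows. This is not bambu's actual implementation; `quantifyChunk()`, `quantData`, and `out_dir` are hypothetical stand-ins for illustration:

```r
# Hedged sketch: quantify barcodes in fixed-size chunks so that only one
# chunk's worth of quantData columns is held in memory at a time, writing
# each result to disk (extra IO in exchange for a bounded memory footprint).
chunk_size <- 1000
barcode_chunks <- split(colnames(quantData),
                        ceiling(seq_along(colnames(quantData)) / chunk_size))
for (i in seq_along(barcode_chunks)) {
  quant_chunk <- quantData[, barcode_chunks[[i]], drop = FALSE]
  res <- quantifyChunk(quant_chunk)  # hypothetical per-chunk quantification
  saveRDS(res, file.path(out_dir, sprintf("quant_chunk_%04d.rds", i)))
}
```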
Hi Andre,

Perfect, that would be great to test. Regarding the read counts per barcode: I think in my specific case I cannot filter out these reads, because at a later step I would bin multiple barcodes together. Perhaps I could do the binning before, but I'd need to look into this. Let me know when the branch is ready for testing!

Cheers,
Hi Luuk,

If you would like to test out the

If you have the option to, I would recommend combining the barcodes together where appropriate (maybe you know which spots belong together via spatial coordinates?). This is beneficial for many reasons, reducing memory and speed problems being part of that. Ultimately I want to add user-defined barcode combining as an input parameter, but that will come later.

Hope this helps,
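Combining barcodes by spatial proximity, as suggested above, could be done before quantification with something like the following sketch. The `spots` data frame (columns `barcode`, `x`, `y`) and the bin size are assumptions for illustration, not part of the pipeline:

```r
# Hedged sketch: collapse spot barcodes into coarse spatial bins so the
# quantification step sees thousands of bins instead of millions of spots.
bin_size <- 4  # assumed: number of array units per bin edge
spots$bin <- paste(spots$x %/% bin_size, spots$y %/% bin_size, sep = "_")

# barcode -> bin lookup, which could then be used to sum the count-matrix
# columns belonging to the same bin before running quantification.
bin_map <- setNames(spots$bin, spots$barcode)
```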
Hi Andre,

Thanks a lot! I'm definitely trying it out this week. Just a quick question to be sure: is the following line correct?
I would assume it should also get the subset of the incompatibleCountMatrix and not the full one?
Thanks again!
Yes, that is correct; I overlooked that. Thanks for spotting it!
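For readers following the exchange, the fix being agreed on amounts to subsetting both matrices with the same chunk of barcode columns. The variable names here are illustrative, not bambu's actual code:

```r
# Hedged sketch: within each chunk, the counts matrix and the
# incompatible-counts matrix must be subset with the same barcodes;
# otherwise the incompatible counts for the full barcode set are reused.
chunk_barcodes <- barcode_chunks[[i]]  # illustrative chunk of column names
counts_chunk       <- countMatrix[, chunk_barcodes, drop = FALSE]
incompatible_chunk <- incompatibleCountMatrix[, chunk_barcodes, drop = FALSE]
```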
Hi Luuk, just wanted to check how you were going with this?
Hi Andre,

Sorry for not updating! Everything seemed to work fine; the chunking and processing of the chunks did not have any issues. However, the processing time of the large number of chunks is very long. My compute cluster has a limit of 7 days, and it would easily go over. I can resume the workflow without an issue, but in practice this would not be feasible for continuous processing of samples. My apologies that I totally forgot to update you on this. However, if you have other suggestions or any (upcoming) updates, I am more than happy to test them out on my end and report back.

Best,
Hi Luuk,

Great that it works, but I am not happy with the run time. Let me know if you have any questions and how it goes.

Kind Regards,

Here is an example on test data
Hi!
Amazing effort to also make a single-cell/spatial variant of Bambu, really excited about this and looking forward to testing it.
I was wondering what the status of this is, is it already mature enough for some testing. Won't be for anything publication level (yet) just some initial tests and checking quality of our sequencing.
Many thanks,
Luuk