
Memory usage with huge datasets #87

Open
maximilian-heeg opened this issue Aug 8, 2023 · 6 comments


@maximilian-heeg
Contributor

Hi,

Thank you so much for providing Baysor. I recently updated my installation to version 0.6.2, and it is running great with Julia 1.9.

In our lab, we have recently generated new (huge) spatial datasets with up to 250 million transcripts (using a 500-gene panel), and we were planning to use Baysor for cell segmentation. I expected this to require a lot of memory, so I did some benchmarking with smaller FOVs of the dataset (see below).

[Plot: memory usage vs. number of transcripts]

It seems that memory use scales linearly with the number of transcripts. Extrapolating this, I would estimate that our dataset with 250 million transcripts requires approximately 5-6 TB of memory (which, unfortunately, I don't even have on our HPC).
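For illustration, a linear fit over hypothetical benchmark points (illustrative values only; substitute your own measurements) gives an estimate in the same ballpark:

```python
import numpy as np

# Hypothetical (transcripts, peak memory in GB) benchmark points from small FOVs;
# these numbers are illustrative only -- replace them with your own measurements.
n_transcripts = np.array([1e6, 5e6, 10e6, 20e6])
peak_mem_gb = np.array([25.0, 120.0, 240.0, 470.0])

# Fit memory ~= slope * n_transcripts + intercept
slope, intercept = np.polyfit(n_transcripts, peak_mem_gb, 1)

# Extrapolate to the full dataset (~250 million transcripts)
full_n = 250e6
est_tb = (slope * full_n + intercept) / 1024
print(f"{slope * 1e6:.1f} GB per million transcripts -> ~{est_tb:.1f} TB estimated")
```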

Are there any solutions to this? Is there an easy way of creating smaller tiles and stitching them back together? I think that, with the increasing panel sizes and imaging areas of commercial solutions, this might soon become an important limitation for many users.

Any help/ideas/suggestions are greatly appreciated.

Max

@sebgoti
Copy link

sebgoti commented Aug 15, 2023

A somewhat unrelated question, but may I ask @maximilian-heeg how your lab is running Baysor on the HPC? I am trying to run it with Singularity to avoid installing anything at the HPC level, but so far without luck (even though the Docker container works). Thanks, and sorry for any spam in your issue!

@VPetukhov
Collaborator

@maximilian-heeg , thank you for this test! It's indeed a problem. We're working on memory optimizations for v0.7.0, and if it works as expected, it should drastically reduce memory usage (roughly 10-fold).

As for tiling, we also plan to add the graph-cut idea, but it's not there yet. So the only thing you can do at the moment is manually split the data by FOVs.
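Until built-in tiling exists, a rough workaround could look like the sketch below: cut the molecule table into overlapping spatial tiles and segment each one separately. The column names ("x", "y"), tile size, and overlap are assumptions, not part of Baysor itself, and cells crossing tile borders still have to be reconciled when stitching the per-tile results back together.

```python
import pandas as pd
import numpy as np

# Workaround sketch: split the molecule table into overlapping spatial tiles.
# Column names and tile/overlap sizes are assumptions -- adjust to your data.
df = pd.read_csv("transcripts.csv.gz")

tile_size = 2000.0   # hypothetical tile width/height, in the same units as x/y
overlap = 50.0       # margin so border cells appear in both neighboring tiles

x0, y0 = df["x"].min(), df["y"].min()
nx = int(np.ceil((df["x"].max() - x0) / tile_size))
ny = int(np.ceil((df["y"].max() - y0) / tile_size))

for i in range(nx):
    for j in range(ny):
        xmin, xmax = x0 + i * tile_size, x0 + (i + 1) * tile_size
        ymin, ymax = y0 + j * tile_size, y0 + (j + 1) * tile_size
        tile = df[
            df["x"].between(xmin - overlap, xmax + overlap)
            & df["y"].between(ymin - overlap, ymax + overlap)
        ]
        if len(tile) > 0:
            tile.to_csv(f"tile_{i}_{j}.csv.gz", index=False)
```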

@VPetukhov
Collaborator

@sebgoti , a short answer: I haven't tried Baysor with Singularity. We have our own lab servers, which are just large single machines, so no clusters. If you need input on your situation, I'd be happy to continue the discussion in a separate issue.

@maximilian-heeg
Contributor Author

@sebgoti I have tried to run the Docker container using Singularity, but that did not work for me on the HPC. I ended up installing juliaup in a conda environment and then building Baysor as described in the README. Good luck!

@VPetukhov Thank you so much for the answer and your work on this. For us, getting a good segmentation is currently the bottleneck in processing spatial data. I will try splitting the data into multiple FOVs.

@cbiagii

cbiagii commented Aug 28, 2023

@VPetukhov, do you mean splitting the data by FOVs using the fov_name column of the transcripts.csv.gz file?
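If the export does contain such an fov_name column, a minimal pandas sketch of that split could be (column and file names are taken from the comment above and may differ in other exports):

```python
import pandas as pd

# Split the transcript table into one file per FOV so each can be segmented
# separately; "fov_name" and "transcripts.csv.gz" follow the comment above.
df = pd.read_csv("transcripts.csv.gz")

for fov, sub in df.groupby("fov_name"):
    sub.to_csv(f"transcripts_fov_{fov}.csv.gz", index=False)
```

Note that cells lying on FOV borders will be cut in two; adding a coordinate margin, as in the tiling sketch earlier in the thread, mitigates this.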

@mjleone

mjleone commented Jan 31, 2024

@VPetukhov Hello, members of my lab and I are also very curious whether the new release is still in progress, and what the expected release date is, if you know. We are working with datasets of 10-25 million transcripts, and it is hard to use the current version with our resources.
