Structuremsa consumes RAM until killed by OS #3

hughhigin · 2024-02-15T15:23:17Z

Expected Behavior

Running on a large local database, either with default parameters or with --filter-msa 0, should complete without exhausting memory. The problem occurs above ~2000 sequences when no filtering is applied.

Current Behavior

During the merging step of the alignment, all available memory is consumed (even on a workstation with over 100 GB of RAM) until the OS kills the process and the output "Killed" appears.

Steps to Reproduce (for bugs)

Run on any sufficiently large database of sequences with the default --max-seq-len parameter.

Foldseek Output (for bugs)

During the merging step, the run terminates like this (example):
0 0 A0A7G8BFY0.pdb A0A7I8DMP5.pdb 689 (TM-align)
0 0 A0A554IY00.pdb Q2S4W8.pdb 824 (TM-align)
0 0 A0A1Q7IQS0.pdb A0A7K0WNT6.pdb 733
Killed

Context

My current workaround is to lower the --max-seq-len parameter from the default of 65K to something on the order of several thousand, which keeps the memory blowup within my hardware limits for the number of sequences I'm using. Most of my sequences are a few hundred amino acids long, so the total alignment length is under 2000 columns.
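
For a rough sense of why lowering --max-seq-len helps, here is a back-of-envelope sketch. It rests on a hypothetical assumption, not on structuremsa's actual data layout: that each sequence is stored as a fixed-width row of --max-seq-len one-byte columns. Under that assumption, memory scales linearly with --max-seq-len no matter how short the real sequences are. The helper name buffer_bytes and the value 4000 are illustrative only.

```python
# Back-of-envelope estimate. ASSUMPTION (hypothetical layout, not taken from
# structuremsa): every sequence is stored as a fixed-width row of
# max_seq_len columns, one byte per column.

DEFAULT_MAX_SEQ_LEN = 65535  # default --max-seq-len in MMseqs2-based tools ("65K")
LOWERED_MAX_SEQ_LEN = 4000   # hypothetical "several thousand" workaround value


def buffer_bytes(n_seqs: int, cols_per_seq: int, bytes_per_col: int = 1) -> int:
    """Bytes for one copy of n_seqs fixed-width rows."""
    return n_seqs * cols_per_seq * bytes_per_col


if __name__ == "__main__":
    n = 23000  # size of the full protein set mentioned in this thread
    default = buffer_bytes(n, DEFAULT_MAX_SEQ_LEN)
    lowered = buffer_bytes(n, LOWERED_MAX_SEQ_LEN)
    print(f"default: {default / 2**30:.1f} GiB per copy")
    print(f"lowered: {lowered / 2**30:.2f} GiB per copy")
    print(f"reduction: {DEFAULT_MAX_SEQ_LEN / LOWERED_MAX_SEQ_LEN:.1f}x")
```

Any extra copies made during merging (intermediate profiles, score buffers and so on) would multiply these figures, but under this assumption the ratio between the two settings stays the same.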

I wonder if structuremsa could be changed so that it stops keeping everything in RAM while merging. A simpler improvement might be to trim the length of the stored sequences based on the input, rather than always using --max-seq-len.
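
The second suggestion, sizing storage from the input, could look roughly like the sketch below. It assumes the merge step pairwise-aligns two existing sub-alignments. In that case the merged alignment can never be wider than the two inputs combined, so rows can be allocated at that size (still capped by --max-seq-len) instead of always at --max-seq-len. The helper names are hypothetical and not part of structuremsa.

```python
# Sketch: size a merged alignment from its inputs instead of a fixed cap.
# ASSUMPTION: merging is a pairwise alignment of two sub-alignments (profiles);
# merged_width_bound and allocate_merged_rows are hypothetical helpers.


def merged_width_bound(width_a: int, width_b: int, max_seq_len: int) -> int:
    """Upper bound on the width of the merged alignment.

    Each merged column consumes at least one column of A or of B
    (match, gap in A, or gap in B), so the result has at most
    width_a + width_b columns. max_seq_len remains a hard ceiling.
    """
    return min(width_a + width_b, max_seq_len)


def allocate_merged_rows(rows_a: list[bytes], rows_b: list[bytes],
                         max_seq_len: int) -> list[bytearray]:
    """Allocate one gap-filled row per sequence, sized from the actual inputs."""
    width = merged_width_bound(len(rows_a[0]), len(rows_b[0]), max_seq_len)
    return [bytearray(b"-" * width) for _ in range(len(rows_a) + len(rows_b))]


if __name__ == "__main__":
    a = [b"MKV-L", b"MKVAL"]  # toy sub-alignment of width 5
    b = [b"MRVL"]             # toy sub-alignment of width 4
    rows = allocate_merged_rows(a, b, max_seq_len=65535)
    print(len(rows), len(rows[0]))  # prints "3 9"
```

Once the traceback is known, rows could be trimmed to the actual merged width; for inputs like the ones described above, that keeps each row in the low thousands of columns rather than 65535.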

Your Environment

I've primarily been running on a personal computer with 32 GB of RAM under WSL 2, but the bug occurs in the same way on a workstation with 128 GB of RAM running Ubuntu.

gamcil · 2024-02-21T07:55:08Z

Hi @hughhigin, I just pushed some changes which should significantly improve memory usage. Could you try again with the latest version and see if it works?

hughhigin · 2024-02-21T18:30:43Z

@gamcil yes, it looks like the changes allowed structuremsa to complete even on the full set of 23,000 proteins!

I did get an error running msa2lddt afterwards (part of easy-msa), so I've copied the output here in case it's useful. For my purposes it isn't an issue at the moment, since I'm focused on analyzing the 3Di alignment, but let me know if you'd like me to dig into this part more.

msa2lddt bigtmp/18050383238977581431/structures smPGTs_3di_align_all.fa --lddt-html smPGTs_3di_align_all.html --guide-tree smPGTs_3di_align_all.nw --pair-threshold 0 --threads 20 -v 3 --report-command '--match-ratio 0.51 --filter-msa 0 --gap-open aa:10,nucl:10 --gap-extend aa:1,nucl:1 --output-mode 1 '

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted
Error: msa2lddt died

Regardless, thanks for the quick work addressing the issue! I really appreciate it.

Best,
Hugh

gamcil · 2024-02-22T01:42:09Z

Great! Thanks for testing that. I'm not sure about the msa2lddt issue; I'll have to look into it.
