Expected Behavior
Running on a large local database, with default parameters or with --filter-msa 0, should complete. Currently the failure occurs above ~2000 sequences without filtering.
Current Behavior
During the merging step of the alignment, all available memory is consumed (even on a workstation with 100 GB of RAM) until the OS kills the process and the output "Killed" appears.
Steps to Reproduce (for bugs)
Running on any sufficiently large database of sequences with the default --max-seq-len parameter
Foldseek Output (for bugs)
During the merging step, this is how it terminates (example):
0 0 A0A7G8BFY0.pdb A0A7I8DMP5.pdb 689 (TM-align)
0 0 A0A554IY00.pdb Q2S4W8.pdb 824 (TM-align)
0 0 A0A1Q7IQS0.pdb A0A7K0WNT6.pdb 733
Killed
Context
My current workaround is to set the --max-seq-len parameter to something on the order of several thousand instead of the default 65K, which keeps the memory blowup within my hardware limits for the number of sequences I'm using. Most of my sequences are a few hundred amino acids long, so the total alignment length is under 2000.
I wonder if there is a way to fix it so structuremsa stops storing everything in RAM while merging, or a simpler improvement might be to trim the length of stored sequences based on the input.
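The workaround described above (choosing a smaller --max-seq-len based on the input) could be sketched roughly as follows. This is only an illustration, not part of Foldseek: the helper names `count_ca_atoms` and `suggest_max_seq_len`, the headroom factor of 4, and the rounding to the next thousand are all assumptions made for the example. It approximates each structure's length by counting CA atoms in its ATOM records, then leaves room for gap columns in the final alignment.

```python
import math


def count_ca_atoms(pdb_text: str) -> int:
    """Approximate residue count: ATOM records whose atom name (columns 13-16) is CA.

    Assumption: a single model without alternate locations; otherwise this over-counts.
    """
    return sum(
        1
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    )


def suggest_max_seq_len(lengths, headroom=4.0, round_to=1000):
    """Hypothetical helper: longest input length times a headroom factor,
    rounded up to a multiple of `round_to`.

    The headroom leaves space for gap columns; 4.0 is an arbitrary choice.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    target = max(lengths) * headroom
    return int(math.ceil(target / round_to) * round_to)


if __name__ == "__main__":
    # Made-up lengths of a few hundred residues each, as in the report.
    print(suggest_max_seq_len([310, 452, 280]))
```

The resulting number would then be passed as the --max-seq-len value. If the final alignment turns out longer than the chosen value, the headroom factor would need to be raised.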
Your Environment
I've primarily been running on a personal computer with 32 GB of RAM on WSL 2, but the bug occurs similarly on a workstation with 128 GB of RAM running Ubuntu.
Hi @hughhigin, I just pushed some changes which should significantly improve memory usage. Could you try again with the latest version and see if it works?
@gamcil yes it looks like the changes allowed structuremsa to complete even on the full set of 23000 proteins!
I did get an error running msa2lddt afterwards (part of easy-msa) that might be useful to know about, so I copied the output here. For my purposes it's not an issue at the moment, since I'm focused on analyzing the 3Di alignment, but let me know if you'd like me to dig into this part more.