Integron_Finder bogs down server's I/O when run on metagenomes with ~million contigs #90

Open
JuntaoZhong opened this issue Jul 24, 2021 · 3 comments

@JuntaoZhong

JuntaoZhong commented Jul 24, 2021

Version of Integron_Finder:

version 2

OS

Linux

Problem 1: I/O overload on metagenomes with hundreds of thousands of contigs.

Usually, only a small subset of the contigs contain integrons. However, integron_finder creates a separate tmp file for each contig, regardless of whether it finds an integron in it.

That means hundreds of thousands of files end up in a single folder by the end of the run. When the script then tries to merge these files, the I/O load becomes so heavy that other users on the server cannot run basic commands like ls/cd/rm.

My temporary solution (included in the script attached):

I looked into /integron_finder/scripts/finder.py and modified the script so that it does not create tmp files for a contig unless it finds an integron in it.
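To illustrate the idea for readers who cannot use the attached script, here is a minimal sketch of "write a per-contig file only when something was found". It is not Integron_Finder's actual code: `find_integrons`, `process_contigs`, the `.integrons` suffix, and the toy matching rule are all hypothetical names chosen for the example.

```python
import os
import tempfile


def find_integrons(contig_id, sequence):
    """Hypothetical stand-in for the real per-contig search; returns a list of hits."""
    # Toy rule for illustration only: a "hit" is any contig containing this marker.
    return [contig_id] if "GTTRRRY" in sequence else []


def process_contigs(contigs, tmp_dir):
    """Write one result file per contig, but only for contigs that have hits."""
    written = []
    for contig_id, sequence in contigs:
        hits = find_integrons(contig_id, sequence)
        if not hits:
            continue  # nothing found: create no file at all for this contig
        path = os.path.join(tmp_dir, f"{contig_id}.integrons")
        with open(path, "w") as handle:
            handle.write("\n".join(hits) + "\n")
        written.append(path)
    return written
```

With this pattern the number of files scales with the number of contigs that contain hits rather than with the total number of contigs.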

Problem 2: The program fails to remove tmp folders because files in them are still open at the time of deletion.

My tmp solution:

Execute the remove command at the very end of the for loop, when I am sure that the program has closed all files in the tmp folder. [lines 604-615 in my script]
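The same ordering (close every file first, delete the folder last) can be sketched with a `with` block and a `finally` clause. This is an assumption-laden example, not the code at the lines cited above: `process_replicon`, `scratch.txt`, and the returned string are hypothetical.

```python
import os
import shutil
import tempfile


def process_replicon(replicon_id, work_root):
    """Create a per-replicon tmp folder, use it, and remove it once all files are closed."""
    tmp_dir = tempfile.mkdtemp(prefix=f"{replicon_id}_", dir=work_root)
    try:
        # The `with` block guarantees the handle is closed before we leave it.
        with open(os.path.join(tmp_dir, "scratch.txt"), "w") as handle:
            handle.write("intermediate data\n")
        result = f"{replicon_id}: done"
    finally:
        # Runs last, after every handle opened above has been closed.
        shutil.rmtree(tmp_dir, ignore_errors=True)
    return result
```

Putting the removal in `finally` also cleans up the folder when the processing step raises an exception.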

Feel free to laugh at/despise my rudimentary coding skills!

finder_jimmy.py.txt

@bgruening

@bneron we see this as well with large Galaxy input files. Millions of files in one directory are hard for any filesystem to handle.
Is there any way we could do less I/O in the first place? For example, cache the results in memory and only write them out if there are any, or once there are enough of them?
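One way the caching idea could look, as a rough sketch only: collect result rows in memory and append them to a single file in batches, touching the filesystem only when there is something to write. `BufferedResults` and its `flush_every` parameter are hypothetical and not part of Integron_Finder.

```python
import os
import tempfile


class BufferedResults:
    """Collect result rows in memory and append them to one file in batches."""

    def __init__(self, path, flush_every=1000):
        self.path = path
        self.flush_every = flush_every
        self.rows = []
        self.total = 0  # number of rows written to disk so far

    def add(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.flush_every:
            self.flush()

    def flush(self):
        if not self.rows:
            return  # nothing buffered: do not touch the filesystem
        with open(self.path, "a") as handle:
            handle.writelines(line + "\n" for line in self.rows)
        self.total += len(self.rows)
        self.rows.clear()
```

Calling `flush()` once more at the end of the run writes any remaining rows; if no contig produced a result, no file is created at all.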

@Matt-BF

Matt-BF commented Nov 9, 2023

Hi! I am having similar problems with metagenomic contigs. Currently Integron_Finder generates summary and output files even when no integrons are found, and it is unable to remove the tmp directories, resulting in thousands of files and directories. I tried JuntaoZhong's solution, but I assume the script was written for an older version of Integron_Finder, and I wasn't able to make it work for my use case.

@jeanrjc
Contributor

jeanrjc commented Nov 27, 2023

Hello, we'll look into that once we have some time. Meanwhile, you can submit pull requests if you can. Thanks for reporting.
