Problem 1: I/O overload for metagenomes with hundreds of thousands of contigs.
Usually, only a small subset of the contigs contain an integron. However, integron_finder creates a separate tmp file for each contig, regardless of whether it finds an integron in it.
That means hundreds of thousands of files all end up in one folder by the end of the run, and when the script tries to coalesce these files, the I/O overload prevents other users from running basic commands like ls/cd/rm.
My temporary solution (included in the attached script):
I looked into /integron_finder/scripts/finder.py and modified the script so that it does not create a tmp file for a contig unless it finds an integron in it.
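For illustration, here is a minimal Python sketch of that change; the function and variable names are hypothetical, not the actual finder.py API:

```python
import os

def process_contig(contig_id, sequence, tmp_dir, find_integrons):
    # `find_integrons` stands in for the real search routine; all names
    # here are illustrative, not the actual finder.py code.
    integrons = find_integrons(sequence)
    if not integrons:
        return None  # no hit: skip the filesystem entirely, no tmp file
    path = os.path.join(tmp_dir, f"{contig_id}.integrons")
    with open(path, "w") as handle:
        for record in integrons:
            handle.write(f"{record}\n")
    return path
```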
Problem 2: the program fails to remove tmp folders because the files are still open at the time of deletion.
My temporary solution:
Execute the remove command at the very end of the for loop, once I am sure the program has closed all files in the tmp folder. [lines 604-615 in my script]
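Again, a rough sketch of the pattern with hypothetical names: the `with` block guarantees every file handle is closed before the directory is removed, so the deletion no longer races against open files.

```python
import os
import shutil
import tempfile

def handle_contig(contig_id, hits):
    # Per-contig tmp directory; all writes happen inside a `with` block,
    # so the file is guaranteed closed before shutil.rmtree runs.
    tmp_dir = tempfile.mkdtemp(prefix=f"{contig_id}_")
    try:
        with open(os.path.join(tmp_dir, "hits.txt"), "w") as handle:
            handle.write("\n".join(hits))
        # ... coalesce the per-contig file into the final output here ...
    finally:
        shutil.rmtree(tmp_dir, ignore_errors=True)  # nothing is still open
```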
Feel free to laugh at my rudimentary coding skill!
Attachment: finder_jimmy.py.txt
@bneron we see this as well with large Galaxy input files. Millions of files in one directory are hard to handle for any filesystem.
Is there any way we could do less I/O in the first place? Could the results be cached in memory and only written out when there are any, or once enough have accumulated to make a write worthwhile?
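Something along these lines, as a rough sketch; the batching threshold and all names are illustrative:

```python
BATCH_SIZE = 1000  # illustrative flush threshold

def write_results(hits_per_contig, out_path, batch_size=BATCH_SIZE):
    # Accumulate result lines in memory and flush them to a single
    # output file in batches, instead of one tmp file per contig.
    buffer = []
    with open(out_path, "w") as out:
        for contig_id, hits in hits_per_contig:  # only contigs with hits
            buffer.extend(f"{contig_id}\t{hit}\n" for hit in hits)
            if len(buffer) >= batch_size:
                out.writelines(buffer)
                buffer.clear()
        out.writelines(buffer)  # flush the remainder
```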
Hi! I am having similar problems with metagenomic contigs. Currently Integron_Finder generates summary and output files even when no integrons are found, and it is unable to remove the tmp directories, resulting in thousands of files and directories. I tried JuntaoZhong's solution, but I assume the script was written for an older version of Integron_Finder, and I wasn't able to make it work for my use case.
Version of Integron_Finder: 2
OS: Linux