gzip: user_assembly.prodigal/mets_full/diamond/*.out: No such file or directory #302

Closed
luciazifcakova opened this issue Oct 30, 2024 · 6 comments
Labels
bug Something isn't working

Comments

luciazifcakova commented Oct 30, 2024

Description of the bug

Even though I run the pipeline from a writable location on an HPC cluster, it still complains about problems installing the EUKulele dependencies. I have checked, and DIAMOND is in the container depot.galaxyproject.org-singularity-eukulele-2.0.5--pyh723bec7_0.img (see attached picture).

How can I set the MPLCONFIGDIR environment variable to a writable directory, as the Matplotlib message recommends? Will this solve the issue?

Command used and terminal output

nextflow run nf-core/metatdenovo -r dev -c /flash/MillerU/Vibrio_first_paper_data/nextflow.config -resume '[soggy_northcutt]' -profile oist --save_trimmed true --assembly /flash/MillerU/Vibrio_first_paper_data/results/spades/transcripts.fasta --skip_eggnog --eukulele_db gtdb --outdir /flash/MillerU/Vibrio_first_paper_data/results/ --input /flash/MillerU/Vibrio_first_paper_data/samples_for_first_vibrio_paper.csv

Error executing process > 'NFCORE_METATDENOVO:METATDENOVO:SUB_EUKULELE:EUKULELE_SEARCH (user_assembly.prodigal)'

Caused by:
Process NFCORE_METATDENOVO:METATDENOVO:SUB_EUKULELE:EUKULELE_SEARCH (user_assembly.prodigal) terminated with an error exit status (1)

Command executed:

rc=0
mkdir contigs
gunzip -c user_assembly.prodigal.faa.gz > ./contigs/proteins.faa
EUKulele \
    -m mets \
    --database gtdb \
    --protein_extension .faa \
    --reference_dir eukulele \
    -o user_assembly.prodigal \
    --CPUs 12 \
    -s \
    contigs || rc=$?

gzip user_assembly.prodigal/mets_full/diamond/*.out
gzip user_assembly.prodigal/taxonomy_counts/*.csv
gzip user_assembly.prodigal/taxonomy_estimation/*.out

cat <<-END_VERSIONS > versions.yml
"NFCORE_METATDENOVO:METATDENOVO:SUB_EUKULELE:EUKULELE_SEARCH":
eukulele: $(echo $(EUKulele --version 2>&1) | sed -n 's/.* ([0-9]+.[0-9]+.[0-9]+).*/\1/p')
END_VERSIONS

if [ $rc -le 1 ]; then
exit 0
else
exit $rc;
fi

Command exit status:
1

Command output:
All reference files for GTDB downloaded to eukulele/gtdb
Running EUKulele with command line arguments, as no valid configuration file was provided.
Setting things up...
Could not successfully install all external dependent software.
Check DIAMOND, BLAST, BUSCO, and TransDecoder installation.
['proteins']
Specified reference directory, reference FASTA, and protein map/taxonomy table not found. Using database in location: eukulele/gtdb.
Automatically downloading database gtdb . If you intended to use an existing database folder, be sure a reference FASTA, protein map, and taxonomy table are provided. Check the documentation for details.

Command error:
5900K .......... .......... .......... .......... .......... 72% 24.5M 0s
5950K .......... .......... .......... .......... .......... 73% 34.4M 0s
6000K .......... .......... .......... .......... .......... 74% 18.1M 0s
6050K .......... .......... .......... .......... .......... 74% 35.1M 0s
6100K .......... .......... .......... .......... .......... 75% 772K 0s
6150K .......... .......... .......... .......... .......... 76% 29.2M 0s
6200K .......... .......... .......... .......... .......... 76% 43.1M 0s
6250K .......... .......... .......... .......... .......... 77% 60.4M 0s
6300K .......... .......... .......... .......... .......... 77% 31.3M 0s
6350K .......... .......... .......... .......... .......... 78% 41.1M 0s
6400K .......... .......... .......... .......... .......... 79% 30.4M 0s
6450K .......... .......... .......... .......... .......... 79% 35.7M 0s
6500K .......... .......... .......... .......... .......... 80% 39.3M 0s
6550K .......... .......... .......... .......... .......... 80% 80.5M 0s
6600K .......... .......... .......... .......... .......... 81% 31.4M 0s
6650K .......... .......... .......... .......... .......... 82% 35.4M 0s
6700K .......... .......... .......... .......... .......... 82% 51.5M 0s
6750K .......... .......... .......... .......... .......... 83% 39.6M 0s
6800K .......... .......... .......... .......... .......... 84% 33.2M 0s
6850K .......... .......... .......... .......... .......... 84% 58.2M 0s
6900K .......... .......... .......... .......... .......... 85% 33.3M 0s
6950K .......... .......... .......... .......... .......... 85% 51.1M 0s
7000K .......... .......... .......... .......... .......... 86% 66.1M 0s
7050K .......... .......... .......... .......... .......... 87% 34.5M 0s
7100K .......... .......... .......... .......... .......... 87% 34.8M 0s
7150K .......... .......... .......... .......... .......... 88% 42.5M 0s
7200K .......... .......... .......... .......... .......... 88% 50.6M 0s
7250K .......... .......... .......... .......... .......... 89% 64.2M 0s
7300K .......... .......... .......... .......... .......... 90% 35.7M 0s
7350K .......... .......... .......... .......... .......... 90% 37.8M 0s
7400K .......... .......... .......... .......... .......... 91% 40.2M 0s
7450K .......... .......... .......... .......... .......... 92% 43.3M 0s
7500K .......... .......... .......... .......... .......... 92% 93.6M 0s
7550K .......... .......... .......... .......... .......... 93% 35.3M 0s
7600K .......... .......... .......... .......... .......... 93% 36.4M 0s
7650K .......... .......... .......... .......... .......... 94% 42.4M 0s
7700K .......... .......... .......... .......... .......... 95% 47.2M 0s
7750K .......... .......... .......... .......... .......... 95% 60.5M 0s
7800K .......... .......... .......... .......... .......... 96% 56.4M 0s
7850K .......... .......... .......... .......... .......... 96% 36.5M 0s
7900K .......... .......... .......... .......... .......... 97% 43.9M 0s
7950K .......... .......... .......... .......... .......... 98% 50.7M 0s
8000K .......... .......... .......... .......... .......... 98% 68.0M 0s
8050K .......... .......... .......... .......... .......... 99% 44.1M 0s
8100K .......... .......... .......... .......... .......... 99% 24.5M 0s
8150K 100% 1.76T=1.4s

2024-10-30 16:16:54 (5.74 MB/s) - ‘eukulele/gtdb/taxonomy-table.txt’ saved [8346567/8346567]

gzip: user_assembly.prodigal/mets_full/diamond/*.out: No such file or directory

Work dir:
/flash/MillerU/Vibrio_first_paper_data/work/dd/fdbeda7dbd16acff65d2300e339a5d

Container:
/flash/MillerU/Vibrio_first_paper_data/work/singularity/depot.galaxyproject.org-singularity-eukulele-2.0.5--pyh723bec7_0.img

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

Relevant files

.command.err file:

Matplotlib created a temporary config/cache directory at /scratch/matplotlib-m8atu3f8 because the default path (/home/l/lucia-zifcakova/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
--2024-10-30 15:50:07-- https://www.dropbox.com/s/dh839ah2hu0m2r4/reference.pep.fa.gz?dl=1
Resolving www.dropbox.com (www.dropbox.com)... 162.125.80.18, 2620:100:6035:18::a27d:5512
Connecting to www.dropbox.com (www.dropbox.com)|162.125.80.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.dropbox.com/scl/fi/qg9klsgyas9q7goc816n7/reference.pep.fa.gz?rlkey=v1l0emh7u68afz5apd0yw74wj&dl=1 [following]
--2024-10-30 15:50:07-- https://www.dropbox.com/scl/fi/qg9klsgyas9q7goc816n7/reference.pep.fa.gz?rlkey=v1l0emh7u68afz5apd0yw74wj&dl=1
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://ucf741995026801e6bf2d677002c.dl.dropboxusercontent.com/cd/0/inline/CdYlIGQAsYZIuMKiX3Rxd5O7vj6jDHu4WeUhTcXg9_wiemd1IogI53uFWRV6RAzldGm9Ro2OaNz_EAb9O9MX4i6mq2Qw1C0MyxbhsuGpNNmySpy3tYsgHi7lm6vG-bSoreHRyIY_6ve8AzvHHdvPNDU7/file?dl=1# [following]
--2024-10-30 15:50:08-- https://ucf741995026801e6bf2d677002c.dl.dropboxusercontent.com/cd/0/inline/CdYlIGQAsYZIuMKiX3Rxd5O7vj6jDHu4WeUhTcXg9_wiemd1IogI53uFWRV6RAzldGm9Ro2OaNz_EAb9O9MX4i6mq2Qw1C0MyxbhsuGpNNmySpy3tYsgHi7lm6vG-bSoreHRyIY_6ve8AzvHHdvPNDU7/file?dl=1
Resolving ucf741995026801e6bf2d677002c.dl.dropboxusercontent.com (ucf741995026801e6bf2d677002c.dl.dropboxusercontent.com)... 162.125.80.15, 2620:100:6035:15::a27d:550f
Connecting to ucf741995026801e6bf2d677002c.dl.dropboxusercontent.com (ucf741995026801e6bf2d677002c.dl.dropboxusercontent.com)|162.125.80.15|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /cd/0/inline2/CdY63h-sUe-n6-gdA3xNrO0_dNRqWPRdGKaMqbW6x9Og961Z9mtFv2dgSCt-d2LLiYw1TRwhzK69k5l4cuYZV4eckCkpxSHt5g5z-M29wOP64Liqr9CIqkbzbL4vnGFn7OksrLkmksY_FX815U8lw-FiljbhSRy6qQVsv5EOaC8XZR6oPzl8_9ggdsafA4vsR7QXNs6PCsUvjl7_8_T7aock4dQyIS8HAw92bW2yplcU5Yu88gDxBYUjvroWa_B2r-5eZ6MF60OF2S5awuxj7tpgt6bruv8zIFzXyl6VSpLjPqZuSpKbKvTSRyPsDMkGpVPC-qbDKlPXas2sI7fiE8RADrAuRNnKcevjQREPk9DqKGh4wawEV8T6TJWHg_Xk7Zo/file?dl=1 [following]
--2024-10-30 15:50:10-- https://ucf741995026801e6bf2d677002c.dl.dropboxusercontent.com/cd/0/inline2/CdY63h-sUe-n6-gdA3xNrO0_dNRqWPRdGKaMqbW6x9Og961Z9mtFv2dgSCt-d2LLiYw1TRwhzK69k5l4cuYZV4eckCkpxSHt5g5z-M29wOP64Liqr9CIqkbzbL4vnGFn7OksrLkmksY_FX815U8lw-FiljbhSRy6qQVsv5EOaC8XZR6oPzl8_9ggdsafA4vsR7QXNs6PCsUvjl7_8_T7aock4dQyIS8HAw92bW2yplcU5Yu88gDxBYUjvroWa_B2r-5eZ6MF60OF2S5awuxj7tpgt6bruv8zIFzXyl6VSpLjPqZuSpKbKvTSRyPsDMkGpVPC-qbDKlPXas2sI7fiE8RADrAuRNnKcevjQREPk9DqKGh4wawEV8T6TJWHg_Xk7Zo/file?dl=1
Reusing existing connection to ucf741995026801e6bf2d677002c.dl.dropboxusercontent.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 40167558722 (37G) [application/binary]
Saving to: ‘eukulele/gtdb/reference.pep.fa’

System information

Image

N E X T F L O W ~ version 24.10.0
HPC
slurm
Singularity
CentOS Linux
metatdenovo dev

luciazifcakova added the bug label Oct 30, 2024
erikrikarddaniel (Member) commented

Description of the bug

Even though I run the pipeline from a writable location on an HPC cluster, it still complains about problems installing the EUKulele dependencies. I have checked, and DIAMOND is in the container depot.galaxyproject.org-singularity-eukulele-2.0.5--pyh723bec7_0.img (see attached picture).

I think those are warnings that we also see, and they can be ignored. (I think they refer to tools that EUKulele looks for but simply does not use when they are not found.)

How can I set the MPLCONFIGDIR environment variable to a writable directory, as the Matplotlib message recommends? Will this solve the issue?
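
On the MPLCONFIGDIR question: the Matplotlib message only asks for a directory that the job user can write to. A minimal sketch, assuming a POSIX shell and that the variable is exported in the shell that launches the job. The directory name is illustrative, and whether the value reaches the Singularity task depends on how the site passes environment variables into containers. The Matplotlib text reads as a recommendation rather than an error, so this would probably only silence the message without changing the failing gzip step.

```shell
# Assumption: the directory name is illustrative; any writable path works.
MPLCONFIGDIR="${TMPDIR:-/tmp}/matplotlib-config-$(id -u)"
mkdir -p "$MPLCONFIGDIR"
export MPLCONFIGDIR

# Confirm the directory is writable before launching the pipeline.
if [ -w "$MPLCONFIGDIR" ]; then
    echo "MPLCONFIGDIR is writable: $MPLCONFIGDIR"
fi
```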

Command used and terminal output

nextflow run nf-core/metatdenovo -r dev -c /flash/MillerU/Vibrio_first_paper_data/nextflow.config -resume '[soggy_northcutt]' -profile oist --save_trimmed true --assembly /flash/MillerU/Vibrio_first_paper_data/results/spades/transcripts.fasta --skip_eggnog --eukulele_db gtdb --outdir /flash/MillerU/Vibrio_first_paper_data/results/ --input /flash/MillerU/Vibrio_first_paper_data/samples_for_first_vibrio_paper.csv

Error executing process > 'NFCORE_METATDENOVO:METATDENOVO:SUB_EUKULELE:EUKULELE_SEARCH (user_assembly.prodigal)'

Caused by: Process NFCORE_METATDENOVO:METATDENOVO:SUB_EUKULELE:EUKULELE_SEARCH (user_assembly.prodigal) terminated with an error exit status (1)

Command executed:

rc=0
mkdir contigs
gunzip -c user_assembly.prodigal.faa.gz > ./contigs/proteins.faa
EUKulele \
    -m mets \
    --database gtdb \
    --protein_extension .faa \
    --reference_dir eukulele \
    -o user_assembly.prodigal \
    --CPUs 12 \
    -s \
    contigs || rc=$?

gzip user_assembly.prodigal/mets_full/diamond/*.out
gzip user_assembly.prodigal/taxonomy_counts/*.csv
gzip user_assembly.prodigal/taxonomy_estimation/*.out

My guess is that the download or the subsequent creation of the gtdb database failed. The default for the pipeline is to download to a directory called eukulele in the directory from which you executed the pipeline. Do you have that? Does it contain a subdirectory called gtdb? Does that look something like this:

drwxrwsr-x 2 danil snic2020-16-76        4096  9 jan  2024 diamond
-rw-rw-r-- 1 danil snic2020-16-76  1484155587  9 jan  2024 prot-map.json
-rw-rw-r-- 1 danil snic2020-16-76 21228445697  9 jan  2024 reference.pep.fa
-rw-rw-r-- 1 danil snic2020-16-76     8346567  9 jan  2024 taxonomy-table.txt
-rw-rw-r-- 1 danil snic2020-16-76     8729080  9 jan  2024 tax-table.txt
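
The comparison against the listing above can be scripted. This is a minimal sketch, assuming a POSIX shell and that the pipeline was launched from the current directory; the function name is hypothetical, and the expected names are simply those shown in the listing, not an official manifest.

```shell
# Hypothetical helper: report which of the entries from the listing above
# are missing (or empty) in a EUKulele GTDB directory.
# Returns the number of missing entries, so 0 means "looks complete".
check_gtdb_dir() {
    dir="$1"
    missing=0
    for f in prot-map.json reference.pep.fa taxonomy-table.txt tax-table.txt; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f"
            missing=$((missing + 1))
        fi
    done
    if [ ! -d "$dir/diamond" ]; then
        echo "missing directory: $dir/diamond"
        missing=$((missing + 1))
    fi
    return "$missing"
}

# Default download location described above, relative to the launch directory.
check_gtdb_dir eukulele/gtdb || echo "eukulele/gtdb looks incomplete"
```

The check only looks at presence and non-zero size, so a truncated download would still pass it.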

[...]

luciazifcakova commented Oct 31, 2024

Image
Hi Erik, thank you for the reply. I created a eukulele directory in the directory from which I run the pipeline. There was a gtdb directory inside it, but it contained only two files (see attached figure).

Can you please provide links to download the diamond files and prot-map.json manually?

luciazifcakova commented Oct 31, 2024

I tried to download tax-table.txt manually and it worked. The resulting file is 8 MB.

[lucia-zifcakova@deigo-login2 eukulele]$ wget https://www.dropbox.com/s/3vwo1r7cbm3tn35/tax-table.txt?dl=1Time finished was 2024-10-28 15:22:00.853667
--2024-10-31 11:35:42-- https://www.dropbox.com/s/3vwo1r7cbm3tn35/tax-table.txt?dl=1Time
Resolving www.dropbox.com (www.dropbox.com)... 162.125.80.18, 2620:100:6035:18::a27d:5512
Connecting to www.dropbox.com (www.dropbox.com)|162.125.80.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.dropbox.com/scl/fi/vcagwiqt2e3ccg4ywoho5/tax-table.txt?rlkey=q3eocfwnn37zg5sxsijld5yu3&dl=1Time [following]
--2024-10-31 11:35:42-- https://www.dropbox.com/scl/fi/vcagwiqt2e3ccg4ywoho5/tax-table.txt?rlkey=q3eocfwnn37zg5sxsijld5yu3&dl=1Time
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc65e568ad59b11089f1ecf5698b.dl.dropboxusercontent.com/cd/0/inline/CdcMrFyWwncjycL-9FPaWTqZ3zSWgxdDQhu2IWr_DnK-UVxenSjhOcViK9dM5qiRpwq0ULYZM9jNgrgGvB9EuhqL1mx5GMEptUcJVTSdGmdAFvXckyVqZu7Tg_llTvGjbPAAqbyKWfVFXb8Jew9GeBtm/file# [following]
--2024-10-31 11:35:43-- https://uc65e568ad59b11089f1ecf5698b.dl.dropboxusercontent.com/cd/0/inline/CdcMrFyWwncjycL-9FPaWTqZ3zSWgxdDQhu2IWr_DnK-UVxenSjhOcViK9dM5qiRpwq0ULYZM9jNgrgGvB9EuhqL1mx5GMEptUcJVTSdGmdAFvXckyVqZu7Tg_llTvGjbPAAqbyKWfVFXb8Jew9GeBtm/file
Resolving uc65e568ad59b11089f1ecf5698b.dl.dropboxusercontent.com (uc65e568ad59b11089f1ecf5698b.dl.dropboxusercontent.com)... 162.125.80.15, 2620:100:6035:15::a27d:550f
Connecting to uc65e568ad59b11089f1ecf5698b.dl.dropboxusercontent.com (uc65e568ad59b11089f1ecf5698b.dl.dropboxusercontent.com)|162.125.80.15|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8346567 (8.0M) [text/plain]
Saving to: ‘tax-table.txt?dl=1Time.1’

tax-table.txt?dl=1Time.1 100%[============================================================================>] 7.96M 28.3MB/s in 0.3s

2024-10-31 11:35:45 (28.3 MB/s) - ‘tax-table.txt?dl=1Time.1’ saved [8346567/8346567]

--2024-10-31 11:35:45-- http://finished/
Resolving finished (finished)... failed: Name or service not known.
wget: unable to resolve host address ‘finished’
--2024-10-31 11:35:45-- http://was/
Resolving was (was)... failed: Name or service not known.
wget: unable to resolve host address ‘was’
--2024-10-31 11:35:45-- http://2024-10-28/
Resolving 2024-10-28 (2024-10-28)... failed: Name or service not known.
wget: unable to resolve host address ‘2024-10-28’
--2024-10-31 11:35:45-- ftp://15/22:00.853667
=> ‘22:00.853667’
Resolving 15 (15)... 0.0.0.15
Connecting to 15 (15)|0.0.0.15|:21... failed: Invalid argument.
FINISHED --2024-10-31 11:35:45--
Total wall clock time: 3.0s
Downloaded: 1 files, 8.0M in 0.3s (28.3 MB/s)
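
The "unable to resolve host" errors in the output above come from stray text that ended up on the same command line as the URL: wget treated `finished`, `was`, and the timestamp pieces as further URLs, and the saved file name picked up the `?dl=1Time` suffix. A minimal sketch of a cleaner invocation, assuming a POSIX shell: quote the URL and pass an explicit output name with wget's `-O` option. The snippet only prints the command as a dry run, and the variable names are illustrative.

```shell
url='https://www.dropbox.com/s/3vwo1r7cbm3tn35/tax-table.txt?dl=1'

# Derive the output name from the URL: keep the last path component
# and drop the query string.
name=${url##*/}
name=${name%%\?*}

# Dry run: print the command instead of executing it.
printf 'wget -O %s "%s"\n' "$name" "$url"
```

Running the printed command would save the file as tax-table.txt instead of a name that includes the query string.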

luciazifcakova (Author) commented

I found a link for the reference.pep.fa.gz file in the work directory where the EUKulele temporary files were written. I used the first of these links:
https://www.dropbox.com/s/dh839ah2hu0m2r4/reference.pep.fa.gz?dl=1
https://www.dropbox.com/scl/fi/qg9klsgyas9q7goc816n7/reference.pep.fa.gz?rlkey=v1l0emh7u68afz5apd0yw74wj&dl=1

The resulting .fa.gz file was 38 GB, and 97 GB once I decompressed it. The file downloaded and "unzipped" by the pipeline was still 38 GB. It seems that the decompression is not working: only the file name is changed, and the file itself is not decompressed.
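
One way to test the suspicion that the file was renamed but not decompressed is to look at its first two bytes: gzip streams start with the magic bytes 1f 8b, whereas a FASTA file starts with `>`. A minimal sketch, assuming a POSIX shell with `head -c` and `od` available; the function name is hypothetical.

```shell
# Hypothetical helper: print "gzip" if the file starts with the gzip
# magic bytes (1f 8b), otherwise "not gzip". Only two bytes are read,
# so it is cheap even for very large files.
compression_of() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    if [ "$magic" = "1f8b" ]; then
        echo "gzip"
    else
        echo "not gzip"
    fi
}

# Self-contained demonstration on small temporary files.
demo=$(mktemp -d)
printf '>seq1\nMKV\n' > "$demo/plain.fa"
gzip -c "$demo/plain.fa" > "$demo/renamed.fa"   # compressed data, misleading name
compression_of "$demo/plain.fa"
compression_of "$demo/renamed.fa"
```

Run against eukulele/gtdb/reference.pep.fa, a result of "gzip" would support the idea that the pipeline's copy is still compressed.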

I still have no idea where to get prot-map.json and the diamond files from. Can you advise me on that?
I have found bugs in the EUKulele script that downloads DIAMOND and BLAST (https://github.com/AlexanderLabWHOI/EUKulele/blob/master/scripts/install_dependencies.sh), so maybe that is the issue?

luciazifcakova (Author) commented

I ran EUKulele within the Singularity container provided by the pipeline, and it seems that URI::Escape is missing from its Perl installation, which causes problems with TransDecoder.

This is how I ran EUKulele from inside the container:
srun -p short -t 0-2 -c 20 --mem=100G --pty bash
bash-4.4$ singularity shell /flash/MillerU/Vibrio_first_paper_data/work/singularity/depot.galaxyproject.org-singularity-eukulele-2.0.5--pyh723bec7_0.img

EUKulele \
    -m mets \
    --database gtdb \
    --protein_extension .faa \
    --reference_dir eukulele \
    -o user_assembly.prodigal \
    --CPUs 20 \
    -s /flash/MillerU/Vibrio_first_paper_data/work/dd/fdbeda7dbd16acff65d2300e339a5d/contigs

and this is the error message I received:

2024-11-01 09:04:08 (20.9 MB/s) - ‘TransDecoder-v5.5.0.tar.gz’ saved [15748671/15748671]

Can't locate URI/Escape.pm in @INC (you may need to install the URI::Escape module) (@INC contains: /flash/MillerU/Vibrio_first_paper_data/references_bins/TransDecoder/PerlLib /usr/lib64/perl5/lib /usr/local/lib/site_perl/5.26.2/x86_64-linux-thread-multi /usr/local/lib/site_perl/5.26.2 /usr/local/lib/5.26.2/x86_64-linux-thread-multi /usr/local/lib/5.26.2 .) at /flash/MillerU/Vibrio_first_paper_data/references_bins/TransDecoder/PerlLib/Gene_obj.pm line 15.
BEGIN failed--compilation aborted at /flash/MillerU/Vibrio_first_paper_data/references_bins/TransDecoder/PerlLib/Gene_obj.pm line 15.
Compilation failed in require at references_bins/TransDecoder/TransDecoder.Predict line 17.
BEGIN failed--compilation aborted at references_bins/TransDecoder/TransDecoder.Predict line 17.

and this is how I checked for perl URI::Escape:

perl -MURI::Escape -e 'print "URI::Escape is installed\n"'

Can't locate URI/Escape.pm in @INC (you may need to install the URI::Escape module) (@INC contains: /usr/lib64/perl5/lib /usr/local/lib/site_perl/5.26.2/x86_64-linux-thread-multi /usr/local/lib/site_perl/5.26.2 /usr/local/lib/5.26.2/x86_64-linux-thread-multi /usr/local/lib/5.26.2 .).
BEGIN failed--compilation aborted.

erikrikarddaniel (Member) commented

See my reply in the other issue you opened. I'm closing this as both regard the same problem.
