
Deal properly with download of data #168

Closed
maxulysse opened this issue Aug 24, 2023 · 9 comments
@maxulysse
Member

Description of the bug

I think it's great to have the possibility to automatically download data within the pipeline, but I would make that optional and add the possibility to provide already downloaded data.
We already talked about this on Slack, and I'd be happy to help find a smart solution to deal with it.

Command used and terminal output

No response

Relevant files

No response

System information

No response

@maxulysse maxulysse added the bug Something isn't working label Aug 24, 2023
@erikrikarddaniel erikrikarddaniel added this to the nf-core candidate milestone Aug 24, 2023
@erikrikarddaniel
Member

👍 @maxulysse

@danilodileo
Collaborator

Hello,
to address the issue, here is a recap of how the pipeline currently handles downloading the databases.

In metatdenovo there are three main programs that need a database: EUKulele, EggNOG and KOFAMSCAN. EUKulele and EggNOG have their own built-in tools for downloading it, while KOFAMSCAN relies on a wget module. For each module it is possible to skip the download step if you already have the database, by pointing to the directory with the stored files. This method works quite well, but we are still experiencing issues in the download step (when the user doesn't have the databases and the pipeline tries to fetch them on its own):

  • Nextflow Tower;
  • on some clusters wget does not seem to work;

There might be other cases but I don't recall them now.

I am sure something is missing in our code: there might be some kind of conflict, and we should avoid being too conservative and save the databases in the work directory and, if necessary, in the output directory.
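For illustration, a minimal sketch (with hypothetical parameter, module and channel names, not the actual metatdenovo code) of how a subworkflow can skip the download when the user points at a pre-downloaded database directory:

```nextflow
// Hypothetical subworkflow: skip the download when the user supplies a
// database directory, otherwise fall back to the download module.
// Parameter, module and channel names are illustrative only.

include { KOFAMSCAN_DOWNLOAD } from '../../modules/local/kofamscan_download'

workflow KOFAMSCAN_DB {
    main:
    if ( params.kofamscan_db ) {
        // The user already has the database: just stage the existing directory.
        ch_db = Channel.fromPath( params.kofamscan_db, checkIfExists: true )
    } else {
        // No database given: let the pipeline download it (currently via wget).
        KOFAMSCAN_DOWNLOAD()
        ch_db = KOFAMSCAN_DOWNLOAD.out.db
    }

    emit:
    db = ch_db
}
```

With something like this, running with e.g. `--kofamscan_db /path/to/existing/db` (a hypothetical flag) would bypass the download entirely, and the download module would only be the fallback.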

@erikrikarddaniel
Member

One thought: EUKulele downloads things automatically in search mode too, right? Would it be better to rely on that and scrap the download module? Or would that make it more difficult to make sure files are available after a run?

@tfalkarkea
Collaborator

I will skip the download modules if the user specifies an available database, and ensure the files are staged properly for their downstream modules too.

@erikrikarddaniel
Member

Just to be clear: when you check whether the download module should be called or not, check for the existence of at least one file (see the second if clause in subworkflows/local/eggnog.nf).

For EUKulele this file could e.g. be eukulele/$db/reference.pep.fa (where $db is the name of the database); for KOFAMSCAN, kofamscan/db/ko_list.

We might want to make larger changes later (see above discussion) but this should get us started.
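A minimal sketch of that check (hypothetical parameter and module names, modelled on the pattern described above rather than copied from eggnog.nf):

```nextflow
// Hypothetical check: only call the download module if a marker file from the
// database is not already present, as suggested above.

include { EUKULELE_DOWNLOAD } from '../../modules/local/eukulele_download'

workflow EUKULELE_DB {
    main:
    def db     = params.eukulele_db
    def marker = file( "${params.eukulele_dbpath}/${db}/reference.pep.fa" )

    if ( marker.exists() ) {
        // At least one expected file is present: reuse the existing database.
        ch_db = Channel.fromPath( "${params.eukulele_dbpath}/${db}", type: 'dir' )
    } else {
        // Otherwise let EUKulele's own tool fetch it.
        EUKULELE_DOWNLOAD( db )
        ch_db = EUKULELE_DOWNLOAD.out.db
    }

    emit:
    db = ch_db
}
```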

@tfalkarkea
Collaborator

I'll have to look into doing this outside of the module's context. I think the core of the problem is that staging the files in the projectDir isn't helpful in the Tower context, since files are moved around and symlinked automatically. I'll definitely try to test for input file completeness, but I'll have to think of some solutions here.
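One possible direction (a hedged sketch, assuming the database directory can be passed to the process as a path input so Nextflow, and hence Tower, stages it into the task work directory instead of the process reading from projectDir):

```nextflow
// Hypothetical process: take the database as a `path` input so that Nextflow
// (and Tower) stages it into the task work directory, instead of the process
// reaching into projectDir directly.
process KOFAMSCAN_RUN {
    input:
    path proteins
    path ko_db    // staged by Nextflow regardless of where it actually lives

    output:
    path "kofamscan.tsv", emit: results

    script:
    """
    exec_annotation -p $ko_db/profiles -k $ko_db/ko_list -o kofamscan.tsv $proteins
    """
}
```

Whether the directory comes from a user parameter or from a download module, the process then only ever sees the staged copy, which should behave the same locally and on Tower.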

@erikrikarddaniel
Member

I think the subworkflows are the best place, just like in the EggNOG case; see above.

@danilodileo
Collaborator

For the EUKulele download we are addressing the issue in PR #190.

@danilodileo
Collaborator

Today we agreed on closing this issue.
