[Bug] Fix tmp-dirs in some leaky rules #1446

Open
mathiasbio opened this issue Jun 12, 2024 · 1 comment
Labels
Bug Something isn't working

Comments

@mathiasbio
Collaborator

Description

At the moment, tmp dirs are handled inconsistently across the rules in our Snakemake workflows. Some rules have this in params:
tmpdir = tempfile.mkdtemp(prefix=tmp_dir)
And this in the command:

mkdir -p {params.tmpdir};
export TMPDIR={params.tmpdir};
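For reference, a minimal Python sketch of how `tempfile.mkdtemp` treats `prefix` compared with `dir`. Because `prefix` is only prepended to the random name, passing a directory path without a trailing separator creates a sibling of that directory rather than a subdirectory. The `base` directory below is a stand-in for `tmp_dir`, not the workflow's real value.

```python
import os
import shutil
import tempfile

# Stand-in for the workflow's tmp_dir (hypothetical location, for illustration only).
base = tempfile.mkdtemp()

# prefix is prepended to the random name, so without a trailing separator
# the new directory is created NEXT TO base, not inside it.
sibling = tempfile.mkdtemp(prefix=base)

# dir= states the intent explicitly: always create the directory inside base.
inside = tempfile.mkdtemp(dir=base)

print(os.path.dirname(sibling) == os.path.dirname(base))  # True
print(os.path.dirname(inside) == base)                    # True

# mkdtemp never cleans up after itself; the caller has to remove the directories.
for path in (sibling, base):
    shutil.rmtree(path, ignore_errors=True)
```

Whether `tmp_dir` ends with a separator in our config decides which of the two cases applies, so this is worth checking before settling on a pattern.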

Other rules don't have anything regarding tmpdirs, such as:

rule cadd_annotate_somaticINDEL_research:
  input:
    vcf_indel_research = vcf_dir + "SNV.somatic.{case_name}.{var_caller}.indel.research.vcf.gz",
  output:
    cadd_indel_research = vep_dir + "SNV.somatic.{case_name}.{var_caller}.cadd_indel.research.tsv.gz",
  benchmark:
    Path(benchmark_dir, "vep_somatic_research_snv.{case_name}.{var_caller}.tsv").as_posix()
  singularity:
    Path(singularity_image, config["bioinfo_tools"].get("cadd") + ".sif").as_posix()
  params:
    message_text = "SNV.somatic.{case_name}.{var_caller}.research.vcf.gz",
  threads:
    get_threads(cluster_config, "cadd_annotate_somaticINDEL_research")
  message:
    "Running cadd annotation for INDELs on {params.message_text}"
  shell:
        """
CADD.sh -g GRCh37 -o {output.cadd_indel_research} {input.vcf_indel_research}
        """

This rule failed in production because tmp was too full on the worker node compute-0-27 (it was close to full because a leaky rule in my development branch had saved a bunch of in-progress BAM file chunks there).

We should find the correct way to assign tmp dirs and make it consistent across all of our rules.
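One possible shape for a consistent approach, sketched in plain Python rather than as a Snakemake rule: create a per-job directory under a shared base, export it as `TMPDIR` for the command only, and remove it when the command finishes, even on failure. The helper name `run_with_tmpdir` and the `base_dir` argument are hypothetical and not part of BALSAMIC.

```python
import os
import subprocess
import sys
import tempfile


def run_with_tmpdir(cmd, base_dir):
    """Run cmd with TMPDIR pointing at a fresh directory under base_dir.

    The directory is deleted when the command finishes, whether it
    succeeds or fails, so nothing is left behind on the node.
    """
    os.makedirs(base_dir, exist_ok=True)
    with tempfile.TemporaryDirectory(dir=base_dir) as job_tmp:
        # Only the child process sees the modified environment.
        env = dict(os.environ, TMPDIR=job_tmp)
        result = subprocess.run(
            cmd, env=env, check=True, capture_output=True, text=True
        )
    return job_tmp, result.stdout


# Usage: the child process reports which TMPDIR it was given.
base = tempfile.mkdtemp()  # stand-in for a scratch area (assumption)
child = [sys.executable, "-c", "import os; print(os.environ['TMPDIR'])"]
job_tmp, seen = run_with_tmpdir(child, base)
print(seen.strip() == job_tmp)  # True
print(os.path.exists(job_tmp))  # False
os.rmdir(base)  # base is empty again once the job directory is gone
```

The point of the context manager is that cleanup does not depend on the tool exiting cleanly, which is the failure mode that filled the node here.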

How to reproduce

No response

Expected behaviour

No response

Anything else?

No response

Pipeline version

15.0.0

@mathiasbio mathiasbio added the Bug Something isn't working label Jun 12, 2024
@github-project-automation github-project-automation bot moved this to Todo in BALSAMIC Jun 12, 2024
@mathiasbio
Collaborator Author

mathiasbio commented Jun 12, 2024

From Eva 2024-06-14:
Another leaky rule seems to be picard_umiaware:

OpenJDK 64-Bit Server VM warning: Insufficient space for shared memory file:
   75256
Try using the -Djava.io.tmpdir= option to select an alternate temp location.
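The option named in the warning can also be passed through the environment, since the JVM reads extra options from the `JAVA_TOOL_OPTIONS` variable. A minimal sketch, assuming the job's environment can be set before Picard starts; `java_tmp_env` is a hypothetical helper, `/scratch/job_tmp` is a made-up path, and whether this resolves the warning on the affected node is untested.

```python
import os


def java_tmp_env(tmpdir, base_env=None):
    """Return a copy of the environment that points the JVM at tmpdir."""
    env = dict(os.environ if base_env is None else base_env)
    option = f"-Djava.io.tmpdir={tmpdir}"
    # Keep any JVM options that were already set instead of overwriting them.
    existing = env.get("JAVA_TOOL_OPTIONS", "")
    env["JAVA_TOOL_OPTIONS"] = f"{existing} {option}".strip()
    # Also export TMPDIR so non-Java tools in the same job agree on the location.
    env["TMPDIR"] = tmpdir
    return env


env = java_tmp_env("/scratch/job_tmp", base_env={"JAVA_TOOL_OPTIONS": "-Xmx4g"})
print(env["JAVA_TOOL_OPTIONS"])  # -Xmx4g -Djava.io.tmpdir=/scratch/job_tmp
```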

Alright, last one. Most failed jobs are due to either picard_umiaware or cadd_annotate, but there is also one failed job (also on comp 27) with similar errors: BALSAMIC.bettercrappie.cnvkit_segment_CNV_research.142.sh_6710424.err
So this might be another rule to look at:

RuntimeError: Subprocess command failed:
$ Rscript --no-restore --no-environ /var/tmp/tmpdqpd9oxy

b"Fatal error: cannot create 'R_TempDir'\n" 
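Since these failures all look like a full temp area, a pre-flight check could make jobs fail early with a clearer message than the tool-specific errors above. A minimal sketch using `shutil.disk_usage`; the function name `check_tmp_space` and the 1 GiB default threshold are illustrative assumptions, not existing BALSAMIC code.

```python
import shutil
import tempfile


def check_tmp_space(path=None, min_free_bytes=1024**3):
    """Raise early if the temp location has less than min_free_bytes available."""
    # Fall back to the directory Python itself would use (honours TMPDIR).
    path = path or tempfile.gettempdir()
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        raise RuntimeError(
            f"Only {free} bytes free in {path}; need at least {min_free_bytes}"
        )
    return free


# Usage: a threshold of 0 always passes and returns the free space in bytes.
free = check_tmp_space(min_free_bytes=0)
print(free >= 0)  # True
```

A check like this does not fix a leaky rule, but it would point at the node's temp area directly instead of surfacing as an R or JVM error.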
