Jobs for microsalt analyses not tracked correctly #2896

Closed
seallard opened this issue Feb 5, 2024 · 4 comments · Fixed by #2897

seallard commented Feb 5, 2024

Description

For some microsalt analyses, the slurm jobs are not tracked. When checking the path to the job ids file for those analyses, it does not exist, which explains why no jobs show up.

It turns out that for the analyses where jobs were being reported (over the past month), the jobs displayed were actually from the previous analysis of the case.

The underlying issue is that microsalt outputs a directory with a timestamp, and it is only created once the analysis is completed. So the pending analysis in trailblazer cannot be provided with the correct path.

Suggested solution

After digging in the microsalt codebase, it was discovered that it attempts to write a job ids file to /microbial/results/reports/trailblazer/<project_id>_slurm_ids.yaml. The trailblazer directory does not exist, so it fails. This is the file we need to use.

  • Create the missing slurm job ids directory on Hasta: /microbial/results/reports/trailblazer
  • Update the logic that adds the pending analysis in trailblazer so that it uses /microbial/results/reports/trailblazer/<project_id>_slurm_ids.yaml
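
A minimal sketch of how that path could be built, assuming the reports directory is available from configuration. The helper name `get_job_ids_path` and its parameters are hypothetical and not taken from cg or microsalt; the project id in the usage note is a placeholder.

```python
from pathlib import Path


def get_job_ids_path(reports_dir: Path, project_id: str) -> Path:
    """Build the path where microsalt attempts to store the slurm job ids file.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return Path(reports_dir, "trailblazer", f"{project_id}_slurm_ids.yaml")
```

Calling `get_job_ids_path(Path("/microbial/results/reports"), "ACC0000")` yields a path under the `trailblazer` directory, which does not depend on the timestamped output directory.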

seallard commented Feb 5, 2024

For example, for one case a job ids file path like this is stored:
/home/proj/production/microbial/results/ACC<id>_slurm_ids.yaml
but the correct path is:
/home/proj/production/microbial/results/ACC<id>_2024.2.5_2.22.16/ACC<id>_slurm_ids.yaml
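
The mismatch can be reproduced in a temporary directory. This is only an illustration, not the chosen fix: the function `find_job_ids_files` is hypothetical, and the case name `ACC0000` is a placeholder. It assumes the timestamped directory name starts with the case name followed by an underscore, as in the example above.

```python
import tempfile
from pathlib import Path


def find_job_ids_files(results_dir: Path, case_name: str) -> list[Path]:
    """List job ids files inside timestamped result directories for one case."""
    pattern = f"{case_name}_*/{case_name}_slurm_ids.yaml"
    return sorted(results_dir.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    results_dir = Path(tmp)
    # Mimic the timestamped directory that microsalt creates for a run.
    run_dir = results_dir / "ACC0000_2024.2.5_2.22.16"
    run_dir.mkdir()
    (run_dir / "ACC0000_slurm_ids.yaml").write_text("jobs: []\n")

    # The path without the timestamped directory does not exist.
    stored_path_exists = (results_dir / "ACC0000_slurm_ids.yaml").exists()
    found = [
        path.relative_to(results_dir).as_posix()
        for path in find_job_ids_files(results_dir, "ACC0000")
    ]
```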


seallard commented Feb 5, 2024

Decided fix

Changing the output dir in microsalt is not viable given the lack of tests and the fact that the entire pipeline is being replaced. The least error-prone path is to revert to the old logic and just create the missing directory on Hasta for the slurm job id files.

This is the relevant logic creating the slurm job ids file:

        try:
            #Generates file with all slurm ids
            slurmname = "{}_slurm_ids.yaml".format(self.name)
            slurmreport_storedir = Path(self.config["folders"]["reports"],
                "trailblazer", slurmname)
            slurmreport_workdir = Path(self.finishdir, slurmname)
            yaml.safe_dump(
                data={"jobs": [str(job) for job in joblist]},
                      stream=open(slurmreport_workdir, "w"))
            shutil.copyfile(slurmreport_workdir, slurmreport_storedir)
            self.logger.info(
                "Saved Trailblazer slurm report file to %s and %s",
                slurmreport_storedir,
                slurmreport_workdir,
            )
        except Exception as e:
            self.logger.info("Unable to generate Trailblazer slurm report file")
  1. Create directory on Hasta: /home/proj/production/microbial/results/reports/trailblazer and /home/proj/stage/microbial/results/reports/trailblazer
  2. Update the logic in cg that adds the pending microsalt analysis to trailblazer so that it passes paths like /home/proj/production/microbial/results/reports/trailblazer/<ticket_id>_slurm_ids.yaml. The out directory should be /home/proj/production/microbial/results/reports/deliverables (?).
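
The sketch below shows why step 1 matters for the quoted logic: `shutil.copyfile` raises `FileNotFoundError` when the destination directory is missing, and the broad `except` in microsalt only logs it. The function `copy_job_ids_file`, the `create_dir` flag, and the file names are hypothetical; the real fix is to create the directory on Hasta, not to change microsalt.

```python
import shutil
import tempfile
from pathlib import Path


def copy_job_ids_file(work_file: Path, store_file: Path, create_dir: bool) -> bool:
    """Copy the job ids file; return False if the target directory is missing."""
    if create_dir:
        # Stands in for creating the trailblazer directory manually.
        store_file.parent.mkdir(parents=True, exist_ok=True)
    try:
        shutil.copyfile(work_file, store_file)
    except FileNotFoundError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    work_file = Path(tmp, "ACC0000_slurm_ids.yaml")
    work_file.write_text("jobs:\n- '123'\n")
    store_file = Path(tmp, "reports", "trailblazer", work_file.name)

    # Without the directory the copy fails; once it exists the copy succeeds.
    missing_dir_result = copy_job_ids_file(work_file, store_file, create_dir=False)
    created_dir_result = copy_job_ids_file(work_file, store_file, create_dir=True)
```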

@seallard seallard added the Bug label Feb 5, 2024

seallard commented Feb 5, 2024

If a microsalt analysis is re-run, will the old slurm ids be overwritten in the trailblazer directory?
Yes. The file is opened in write mode, and shutil.copyfile replaces an existing destination file, so any previous file contents will be overwritten.
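
A small sketch of that overwrite behaviour using only the standard library. The function `write_and_copy` and the job ids are made up for illustration, and plain text is written instead of YAML to keep the example dependency-free.

```python
import shutil
import tempfile
from pathlib import Path


def write_and_copy(job_ids: list[str], work_file: Path, store_file: Path) -> str:
    """Write job ids in write mode, copy the file, and return the stored contents."""
    with open(work_file, "w") as stream:
        stream.write("\n".join(job_ids))
    # copyfile replaces the destination if it already exists.
    shutil.copyfile(work_file, store_file)
    return store_file.read_text()


with tempfile.TemporaryDirectory() as tmp:
    work_file = Path(tmp, "work_slurm_ids.yaml")
    store_file = Path(tmp, "store_slurm_ids.yaml")
    first_run = write_and_copy(["111", "112"], work_file, store_file)
    second_run = write_and_copy(["222"], work_file, store_file)
```

After the second call the stored file holds only the ids from the re-run.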


seallard commented Feb 5, 2024

The name used for the job ids file seems to differ depending on the number of samples in the case 🤢 😭

        if isinstance(self.sampleinfo, list) and len(self.sampleinfo) > 1:
            self.name = self.sampleinfo[0].get("CG_ID_project")
            self.sample = self.sampleinfo[0]
            for entry in self.sampleinfo:
                if entry.get("CG_ID_sample") == self.name:
                    raise Exception(
                        "Mixed projects in samples_info file. Do not know how to proceed"
                    )
        else:
            if isinstance(self.sampleinfo, list):
                self.sampleinfo = self.sampleinfo[0]
            self.name = self.sampleinfo.get("CG_ID_sample")
            self.sample = self.sampleinfo

I'm going to disregard this since cases with one sample are rare in microsalt. And why would you even use different paths? 🤦 Added to backlog in microsalt Clinical-Genomics/microSALT#170
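
To make the difference concrete, here is a standalone restatement of the branch quoted above, without the mixed-project check. The function name `resolve_job_ids_name` and the ids are hypothetical; only the keys `CG_ID_project` and `CG_ID_sample` come from the microsalt snippet.

```python
def resolve_job_ids_name(sample_info) -> str:
    """Return the name microsalt would use as prefix for the job ids file."""
    if isinstance(sample_info, list) and len(sample_info) > 1:
        # Several samples: the project id of the first sample is used.
        return sample_info[0].get("CG_ID_project")
    if isinstance(sample_info, list):
        sample_info = sample_info[0]
    # A single sample: the sample id is used instead.
    return sample_info.get("CG_ID_sample")


multi = [
    {"CG_ID_project": "ACC0000", "CG_ID_sample": "ACC0000A1"},
    {"CG_ID_project": "ACC0000", "CG_ID_sample": "ACC0000A2"},
]
single = [{"CG_ID_project": "ACC0000", "CG_ID_sample": "ACC0000A1"}]

multi_name = resolve_job_ids_name(multi)
single_name = resolve_job_ids_name(single)
```

With these placeholder inputs, a multi-sample case gets a file prefixed with the project id, while a single-sample case gets one prefixed with the sample id, so a path built from the project id would miss it.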

@seallard seallard changed the title Jobs for some microsalt analyses are not tracked Jobs for microsalt analyses not tracked correctly Feb 5, 2024