
how to set up LSF specific conf file for caper #93

Open · ls233 opened this issue Sep 22, 2020 · 20 comments

@ls233 commented Sep 22, 2020

Hi Jin,

I'm looking for the right value of the platform parameter to specify when initializing Caper on my HPC (Mount Sinai). My HPC uses the LSF system. I'm referring to section 2.3 of this manual: https://github.com/MoTrPAC/motrpac-atac-seq-pipeline.

Thanks,

German Nudelman, Ph.D.
Sr. Bioinformatics Developer/Analyst
Icahn School of Medicine at Mount Sinai

@leepc12 (Contributor) commented Sep 22, 2020

That link doesn't work. Does your LSF cluster have a wiki page?

@ls233 (Author) commented Sep 22, 2020 via email

@leepc12 (Contributor) commented Sep 22, 2020

Caper doesn't currently support LSF. If I can get some detailed info about the bsub submission and job-monitoring commands, then I can add LSF support to Caper later.
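
For reference, the LSF equivalents of the PBS commands Caper wraps look roughly like this (a sketch of standard LSF user commands; exact flags vary by site):

    # submit a job script read from stdin; bsub prints the job ID on stdout,
    # e.g. "Job <12345> is submitted to default queue <normal>."
    bsub -J my_job -o stdout.log -e stderr.log -n 4 -W 24:00 < run.sh

    # check whether a job is still alive (PBS: qstat)
    bjobs 12345

    # kill a job (PBS: qdel)
    bkill 12345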

@leepc12 (Contributor) commented Sep 22, 2020

You may need to run Caper with the local backend, which means that Caper will not bsub tasks; it will run all tasks in the current shell.

Log in to a compute node and then run

caper run ATAC_WDL -i INPUT_JSON --singularity --max-concurrent-tasks 2

Use screen or nohup to keep the session alive, or bsub the Caper command line itself with a very large resource request (sketched below).

If you want to save resources on the compute node, serialize all tasks by using --max-concurrent-tasks 1.
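
A minimal sketch of that last option; the WDL/JSON file names and resource values here are hypothetical, so check your site's bsub limits:

    # reserve a long-running allocation and run Caper's leader process inside it
    bsub -J caper_leader -o caper.log -e caper.err -n 8 -W 48:00 \
        "caper run atac.wdl -i input.json --singularity --max-concurrent-tasks 2"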

@ls233 (Author) commented Jan 26, 2021 via email

@leepc12 (Contributor) commented Feb 9, 2021

Sorry for the late reply; currently we don't have a plan to add an LSF backend. If you are familiar with Python, you can start by modifying the PBS backend of Caper:

https://github.com/ENCODE-DCC/caper/blob/master/caper/cromwell_backend.py#L710

You need to modify the bash command lines under the keys submit, kill, check-alive, and job-id-regex. For example, replace qsub with bsub:

        'submit': dedent(
            """\
            if [ -z \\"$SINGULARITY_BINDPATH\\" ]; then export SINGULARITY_BINDPATH=${singularity_bindpath}; fi; \\
            if [ -z \\"$SINGULARITY_CACHEDIR\\" ]; then export SINGULARITY_CACHEDIR=${singularity_cachedir}; fi;
            echo "${if !defined(singularity) then '/bin/bash ' + script
                    else
                      'singularity exec --cleanenv ' +
                      '--home ' + cwd + ' ' +
                      (if defined(gpu) then '--nv ' else '') +
                      singularity + ' /bin/bash ' + script}" | \\
            qsub \\
                -N ${job_name} \\
                -o ${out} \\
                -e ${err} \\
                ${true="-lnodes=1:ppn=" false="" defined(cpu)}${cpu}${true=":mem=" false="" defined(memory_mb)}${memory_mb}${true="mb" false="" defined(memory_mb)} \\
                ${'-lwalltime=' + time + ':0:0'} \\
                ${'-lngpus=' + gpu} \\
                ${'-q ' + pbs_queue} \\
                ${pbs_extra_param} \\
                -V
        """
        ),
        'exit-code-timeout-seconds': 180,
        'kill': 'qdel ${job_id}',
        'check-alive': 'qstat ${job_id}',
        'job-id-regex': '(\\d+)',

@HenryCWong commented

I'll probably be working on getting this onto LSF soon. In the meantime, this might help; it's old, but the basic commands typically don't change that much: https://modelingguru.nasa.gov/docs/DOC-1040

@HenryCWong commented Jun 1, 2021

It should look something like this:

# A sketch adapted from Caper's PBS backend. Assumes BACKEND_LSF = 'lsf' is
# defined alongside the other backend-name constants in cromwell_backend.py,
# and that textwrap.dedent is already imported there.
class CromwellBackendLSF(CromwellBackendLocal):
    TEMPLATE_BACKEND = {
        'config': {
            'default-runtime-attributes': {'time': 24},
            'script-epilogue': 'sleep 5',
            'runtime-attributes': dedent(
                """\
                String? docker
                String? docker_user
                Int cpu = 1
                Int? gpu
                Int? time
                Int? memory_mb
                String? lsf_queue
                String? lsf_extra_param
                String? singularity
                String? singularity_bindpath
                String? singularity_cachedir
            """
            ),
            # note: unlike qsub, bsub has no -V flag; LSF propagates the
            # submission environment to the job by default
            'submit': dedent(
                """\
                if [ -z \\"$SINGULARITY_BINDPATH\\" ]; then export SINGULARITY_BINDPATH=${singularity_bindpath}; fi; \\
                if [ -z \\"$SINGULARITY_CACHEDIR\\" ]; then export SINGULARITY_CACHEDIR=${singularity_cachedir}; fi;
                echo "${if !defined(singularity) then '/bin/bash ' + script
                        else
                          'singularity exec --cleanenv ' +
                          '--home ' + cwd + ' ' +
                          singularity + ' /bin/bash ' + script}" | \\
                bsub \\
                    -J ${job_name} \\
                    -o ${out} \\
                    -e ${err} \\
                    ${true="-n " false="" defined(cpu)}${cpu} \\
                    ${true="-R 'rusage[mem=" false="" defined(memory_mb)}${memory_mb}${true="]'" false="" defined(memory_mb)} \\
                    ${'-W ' + time + ':00'} \\
                    ${'-q ' + lsf_queue} \\
                    ${lsf_extra_param}
            """
            ),
            'exit-code-timeout-seconds': 180,
            'kill': 'bkill ${job_id}',
            'check-alive': 'bjobs ${job_id}',
            'job-id-regex': '(\\d+)',
        }
    }

    def __init__(
        self,
        local_out_dir,
        max_concurrent_tasks=CromwellBackendBase.DEFAULT_CONCURRENT_JOB_LIMIT,
        soft_glob_output=False,
        local_hash_strat=CromwellBackendLocal.DEFAULT_LOCAL_HASH_STRAT,
        lsf_queue=None,
        lsf_extra_param=None,
    ):
        super().__init__(
            local_out_dir=local_out_dir,
            backend_name=BACKEND_LSF,
            max_concurrent_tasks=max_concurrent_tasks,
            soft_glob_output=soft_glob_output,
            local_hash_strat=local_hash_strat,
        )
        self.merge_backend(CromwellBackendLSF.TEMPLATE_BACKEND)
        self.backend_config.pop('submit-docker')

        if lsf_queue:
            self.default_runtime_attributes['lsf_queue'] = lsf_queue
        if lsf_extra_param:
            self.default_runtime_attributes['lsf_extra_param'] = lsf_extra_param

Note: I have not tested this.

I got rid of GPU because GPU use depends on the LSF implementation.

However, @leepc12, how is the job-id-regex value grabbed in PBS? I'm not completely familiar with how job IDs are grabbed from PBS, so any insight on this would be much appreciated. It shouldn't be too difficult to construct a regex.

@HenryCWong commented

Never mind, I went through the Cromwell docs and found this:

LSF {
  actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
  config {
    submit = "bsub -J ${job_name} -cwd ${cwd} -o ${out} -e ${err} /usr/bin/env bash ${script}"
    kill = "bkill ${job_id}"
    check-alive = "bjobs ${job_id}"
    job-id-regex = "Job <(\\d+)>.*"
  }
}

So everything is fairly similar.
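
(For anyone else wondering about the question above: Cromwell runs the submit command, captures its stdout, and applies job-id-regex to that output to extract the job ID. A quick shell check against bsub's standard confirmation message, with a made-up job ID:)

    # bsub's confirmation message on stdout:
    msg='Job <12345> is submitted to default queue <normal>.'

    # "Job <(\d+)>.*" captures the numeric ID; the sed below mimics that match
    echo "$msg" | sed -E 's/.*Job <([0-9]+)>.*/\1/'    # prints 12345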

@HenryCWong commented

I am modifying the PBS backend at https://github.com/ENCODE-DCC/caper/blob/master/caper/cromwell_backend.py, but when I run Caper it still errors out trying to use qsub.
Any help would be appreciated.

@HenryCWong commented Jun 11, 2021

For future reference, this is the LSF backend file that I ran and that worked. Your cluster may not have a -G or -q flag, so adjust as needed. I also had to set up specific paths (the PATH="" and LSF_DOCKER_VOLUMES="") for my LSF call; if your compute cluster or system is different, you'll want to take those out too. When you run Caper, just pass --backend-file name_of_backendfile.conf. Thanks to @leepc12 for helping me set this up.

backend {
  providers {
    pbs {
      config {
        submit = """if [ -z \"$SINGULARITY_BINDPATH\" ]; then export SINGULARITY_BINDPATH=${singularity_bindpath}; fi; \
if [ -z \"$SINGULARITY_CACHEDIR\" ]; then export SINGULARITY_CACHEDIR=${singularity_cachedir}; fi;

echo "${if !defined(singularity) then '/bin/bash ' + script
        else
          'singularity exec --cleanenv ' +
          '--home ' + cwd + ' ' +
          (if defined(gpu) then '--nv ' else '') +
          singularity + ' /bin/bash ' + script}" | \

PATH="/opt/juicer/CPU/common:/opt/hic-pipeline/hic_pipeline:$PATH" LSF_DOCKER_VOLUMES="/storage1/fs1/dspencer/Active:/storage1/fs1/dspencer/Active" \
bsub \
    -J ${job_name} \
    -o ${out} \
    -e ${err} \
    ${true="-n " false="" defined(cpu)}${cpu} \
    ${true="-M " false="" defined(memory_mb)}${memory_mb}${true="MB" false="" defined(memory_mb)} \
    ${'-W' + time + ':0:0'} \
    ${'-q ' + pbs_queue} \
    -G compute-group \
    ${pbs_extra_param} \
"""
        kill = "bkill ${job_id}"
        check-alive = "bjobs ${job_id}"
        job-id-regex = "(\\d+)"
      }
    }
  }
}
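
To use it, run something like the following (hypothetical file names; since the file above overrides the stock pbs provider, you may also need to select that backend explicitly):

    caper run atac.wdl -i input.json --backend pbs --backend-file lsf_backend.conf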

@ernstki commented Sep 2, 2021

Hi everyone. I'm charged with standing up the ENCODE ATAC-seq pipeline in our environment, which is LSF, and I'm willing to carry the baton across the finish line with a GitHub pull request to see LSF supported out of the box for all users of Caper.

@HenryCWong, you have done most of the legwork already. If I can test your changes locally and everything works for the two of us at our two different sites, is there a way I can walk you through doing a PR on GitHub, or are you clear on how to do that? Do you have the time?

It would be a shame for you not to get credit, if it gets merged into the codebase.

@leepc12 (Contributor) commented Sep 7, 2021

@ernstki: Please let me make a dev PR for you, and you can pull it (you may need to git pull the test branch and add the git directory to PYTHONPATH so that the pip-installed one is ignored) and test it on your clusters.

All I need is a working custom backend file (--backend-file) that works on most LSF clusters. Then you will be able to use caper init lsf and just define the required parameters in the conf file (~/.caper/default.conf).

If that works for you two @ernstki and @HenryCWong then I can merge it to master.
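
If that lands, the LSF section of ~/.caper/default.conf would presumably mirror the existing pbs-*/slurm-* options. A sketch with assumed key names (lsf-queue and lsf-extra-param are guesses by analogy and not confirmed here):

    # ~/.caper/default.conf (sketch; key names assumed by analogy with pbs-*)
    backend=lsf
    lsf-queue=normal
    # site-specific extras passed through to bsub, e.g. a compute group:
    # lsf-extra-param=-G compute-group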

@HenryCWong commented

Hi, sorry for the late response, y'all. So do you still want me to do the PR since @leepc12 is making a dev PR?

I can get you the custom LSF backend file tomorrow. The one above should work, but I also haven't been in here in two months, so I'll double-check things.

@ernstki commented Sep 7, 2021

@HenryCWong If you're willing to just

  • make a fork
  • drop a single commit in your fork with the work that you've done (the custom backend)
    • you could even just upload this file using the GitHub website, no command-line Git required
  • and let us know when that's done

…I'm willing to cherry-pick that commit from your fork and do any remaining work to get it in a state that meets @leepc12's requirements.

This way you'll get credit for the work you've done in the Git commit history for Caper, and you will be Internet Famous. ;) If that kind of fame has no great appeal for you, I can just copy-paste what you have above instead; I will credit you in the relevant commit message, and you can forget about the forking and all that.

I think we can discuss whether @leepc12 wants to put custom backends in a contrib subdirectory and other details like that in the PR.

@HenryCWong commented

Thanks for the info. I forked the repo and made a commit here: https://github.com/HenryCWong/caper.

@lauragails commented

I opened my password manager to log in specifically to thank you for doing this.

(I am another bioinformatician at Mount Sinai, on the same computing environment, who needed this fix.)

@HenryCWong commented

It seems IBM has been customizing specific LSF features for customers, so if it doesn't work for you and you need to run Caper with a custom backend, I can try to help out.

@lauragails commented Sep 21, 2021 via email

@ernstki commented Oct 28, 2021

It looks like #148 (release 2.0.0) implements LSF support, so thanks @leepc12!

Not sure if that's based on what @HenryCWong shared here or not, but it looks like this issue could be closed if v2.0.0 meets @ls233's requirements.

The project I needed this for is nowhere near the stage where it's ready to submit jobs to a cluster anyway, so I wouldn't have been able to work on this for several weeks at least.
