caper submits leader job but no child jobs on SLURM HPC #101

Open
alexyfyf opened this issue Nov 10, 2020 · 4 comments
@alexyfyf

Hi team,

I'm using the ENCODE chip-seq-pipeline2 and installed the conda environment for it.
I also edited ~/.caper/default.conf as follows:

backend=slurm

# define one of the following (or both) according to your
# cluster's SLURM configuration.
slurm-partition=genomics
slurm-account=ls25

# Hashing strategy for call-caching (3 choices)
# This parameter is for local (local/slurm/sge/pbs) backend only.
# This is important for call-caching,
# which means re-using outputs from previous/failed workflows.
# Cache will miss if different strategy is used.
# "file" method has been default for all old versions of Caper<1.0.
# "path+modtime" is a new default for Caper>=1.0,
#   file: use md5sum hash (slow).
#   path: use path.
#   path+modtime: use path and modification time.
local-hash-strat=path+modtime

# Local directory for localized files and Cromwell's intermediate files
# If not defined, Caper will make .caper_tmp/ on local-out-dir or CWD.
# /tmp is not recommended here since Caper stores all localized data files
# in this directory (e.g. input FASTQs defined as URLs in input JSON).
local-loc-dir=/home/fyan0011/ls25_scratch/feng.yan/caperfiles/

cromwell=/home/fyan0011/.caper/cromwell_jar/cromwell-52.jar
womtool=/home/fyan0011/.caper/womtool_jar/womtool-52.jar
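
If child jobs also need extra sbatch flags such as a QOS, recent Caper versions accept an extra-parameter key in this same file. This is a hedged fragment, not verified against this user's Caper version; check caper run --help before relying on it:

```
# Hypothetical addition: forward extra sbatch flags (e.g. a QOS) to child jobs.
# Only valid if your Caper version supports this key.
slurm-extra-param=--qos=genomics
```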

Then I activated the conda environment and ran this command as per your manual:
sbatch -A ls25 -p genomics --qos=genomics -J chip-seq --export=ALL --mem 4G -t 4:00:00 --wrap 'caper run /home/fyan0011/ls25_scratch/feng.yan/software/chip-seq-pipeline2/chip.wdl -i template.json'
I noticed the --qos flag does not seem to be used according to the logs. Anyway, the leader job was submitted, but no child jobs were seen.

The SLURM out file showed:

2020-11-10 15:05:32,956|caper.caper_base|INFO| Creating a timestamped temporary directory. /home/fyan0011/ls25_scratch/feng.yan/caperfiles/chip/20201110_150532_953560
2020-11-10 15:05:32,957|caper.caper_runner|INFO| Localizing files on work_dir. /home/fyan0011/ls25_scratch/feng.yan/caperfiles/chip/20201110_150532_953560
2020-11-10 15:05:34,243|caper.cromwell|INFO| Validating WDL/inputs/imports with Womtool...
2020-11-10 15:05:43,850|caper.cromwell|INFO| Womtool validation passed.
2020-11-10 15:05:43,851|caper.caper_runner|INFO| launching run: wdl=/home/fyan0011/ls25_scratch/feng.yan/software/chip-seq-pipeline2/chip.wdl, inputs=/fs03/ls25/feng.yan/Lmo2_ChIP/test/caper/template.json, backend_conf=/home/fyan0011/ls25_scratch/feng.yan/caperfiles/chip/20201110_150532_953560/backend.conf
2020-11-10 15:06:07,320|caper.cromwell_workflow_monitor|INFO| Workflow: id=abadfc26-bf72-4d87-b5cf-a36e8a2cbeb8, status=Submitted
2020-11-10 15:06:07,545|caper.cromwell_workflow_monitor|INFO| Workflow: id=abadfc26-bf72-4d87-b5cf-a36e8a2cbeb8, status=Running
2020-11-10 15:06:25,686|caper.cromwell_workflow_monitor|INFO| Task: id=abadfc26-bf72-4d87-b5cf-a36e8a2cbeb8, task=chip.read_genome_tsv:-1, retry=0, status=Started, job_id=35516
2020-11-10 15:06:25,697|caper.cromwell_workflow_monitor|INFO| Task: id=abadfc26-bf72-4d87-b5cf-a36e8a2cbeb8, task=chip.read_genome_tsv:-1, retry=0, status=WaitingForReturnCode
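
When debugging a run like this, the workflow id from these log lines is what you grep for in cromwell.out to find the sbatch commands issued per task. A small sketch of extracting it (the log line is copied from the output above; the sed pattern is an assumption about the log format, not a Caper utility):

```shell
# Pull the workflow id out of a Caper monitor log line so it can be used
# to grep cromwell.out for the per-task sbatch submissions.
LOG_LINE='2020-11-10 15:06:07,320|caper.cromwell_workflow_monitor|INFO| Workflow: id=abadfc26-bf72-4d87-b5cf-a36e8a2cbeb8, status=Submitted'
WF_ID=$(printf '%s\n' "$LOG_LINE" | sed -n 's/.*id=\([0-9a-f-]*\),.*/\1/p')
echo "$WF_ID"   # prints abadfc26-bf72-4d87-b5cf-a36e8a2cbeb8
```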

Could you help with this?
Thank you!

@leepc12
Contributor

leepc12 commented Nov 12, 2020

Please post slurm*.out and cromwell.out.

@kaji331

kaji331 commented Nov 1, 2021

Please post slurm*.out and cromwell.out.

Hi, I hit a similar error with Caper 2.0 and chip-seq-pipeline2 v2.0. I installed chip-seq-pipeline2 using conda and installed Caper into the encode-chip-seq-pipeline environment using pip.

cromwell.out.txt
slurm-6342687.out.txt

@leepc12
Contributor

leepc12 commented Nov 1, 2021

Here is the actual sbatch command line used for submitting a job (in cromwell.out):

for ITER in 1 2 3; do
    sbatch --export=ALL -J cromwell_035dea6f_read_genome_tsv -D /storage/hpc/yangling/Projects/Singularity/chip/chip-data/chip/035dea6f-f3c2-47ce-82b0-e19174e47a3b/call-read_genome_tsv -o /storage/hpc/yangling/Projects/Singularity/chip/chip-data/chip/035dea6f-f3c2-47ce-82b0-e19174e47a3b/call-read_genome_tsv/execution/stdout -e /storage/hpc/yangling/Projects/Singularity/chip/chip-data/chip/035dea6f-f3c2-47ce-82b0-e19174e47a3b/call-read_genome_tsv/execution/stderr \
        -p intel-e5,amd-ep2 --account yangling \
        -n 1 --ntasks-per-node=1 --cpus-per-task=1 --mem=2048M --time=240  \
         \
        /storage/hpc/yangling/Projects/Singularity/chip/chip-data/chip/035dea6f-f3c2-47ce-82b0-e19174e47a3b/call-read_genome_tsv/execution/script.caper && break
    sleep 30
done
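
The loop above is Cromwell's submit-with-retry pattern: try sbatch up to three times, stop at the first success, and sleep 30 s between attempts. A minimal stand-in showing just the control flow (SUBMIT replaces the real sbatch call and is made to fail once, then succeed; the sleep is shortened):

```shell
# Stand-in for Cromwell's retry loop; SUBMIT replaces the real sbatch call.
submit_attempts=0
SUBMIT() { submit_attempts=$((submit_attempts + 1)); [ "$submit_attempts" -ge 2 ]; }  # fails once, then succeeds
for ITER in 1 2 3; do
    SUBMIT && break   # real loop: sbatch ... script.caper && break
    sleep 1           # real loop sleeps 30 s between attempts
done
echo "attempts=$submit_attempts"   # prints attempts=2: one failure, one success
```

If sbatch rejects the resource parameters on every attempt, the loop exhausts its three tries and no child job ever appears in squeue, which matches the symptom reported here.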

Please check if these resource parameters work on your cluster:

        -p intel-e5,amd-ep2 --account yangling \
        -n 1 --ntasks-per-node=1 --cpus-per-task=1 --mem=2048M --time=240  \

Also, do not activate the Conda environment. If you want to use conda, then use caper run ... --conda. Caper internally runs conda run -n ENV_NAME JOB_SCRIPT. You can also use --singularity if you have Singularity installed on your cluster. I recommend Singularity.
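
Following that advice, the leader command from the original post would drop the conda activation and pass --conda instead. A hedged sketch that only builds and echoes the command rather than submitting it (the WDL path and input file mirror the original post):

```shell
# Hypothetical leader-job command per the advice above: no `conda activate`;
# Caper itself invokes `conda run` for each task when --conda is given.
CAPER_CMD="caper run chip.wdl -i template.json --conda"
echo "$CAPER_CMD"   # prints caper run chip.wdl -i template.json --conda
```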

@kaji331

kaji331 commented Nov 2, 2021


Thank you very much! Actually, I want to install a standalone version of caper and chip-seq-pipeline2, so I tried to install caper in another conda environment or a singularity image...
