caper submits leader job but no child jobs on SLURM HPC #101

Open
alexyfyf opened this issue Nov 10, 2020 · 4 comments
@alexyfyf

Hi team,

I'm using the ENCODE chip-seq-pipeline2 and installed the conda environment for it.
I also edited ~/.caper/default.conf as follows:

backend=slurm

# define one of the following (or both) according to your
# cluster's SLURM configuration.
slurm-partition=genomics
slurm-account=ls25

# Hashing strategy for call-caching (3 choices)
# This parameter is for local (local/slurm/sge/pbs) backend only.
# This is important for call-caching,
# which means re-using outputs from previous/failed workflows.
# Cache will miss if different strategy is used.
# "file" method has been default for all old versions of Caper<1.0.
# "path+modtime" is a new default for Caper>=1.0,
#   file: use md5sum hash (slow).
#   path: use path.
#   path+modtime: use path and modification time.
local-hash-strat=path+modtime

# Local directory for localized files and Cromwell's intermediate files
# If not defined, Caper will make .caper_tmp/ on local-out-dir or CWD.
# /tmp is not recommended here since Caper stores all localized data files
# in this directory (e.g. input FASTQs defined as URLs in input JSON).
local-loc-dir=/home/fyan0011/ls25_scratch/feng.yan/caperfiles/

cromwell=/home/fyan0011/.caper/cromwell_jar/cromwell-52.jar
womtool=/home/fyan0011/.caper/womtool_jar/womtool-52.jar
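
If child jobs also need extra sbatch flags such as a QOS, recent Caper versions accept an extra-parameter key in this same file. This is a hedged fragment, not verified against this user's Caper version; check caper run --help before relying on it:

```
# Hypothetical addition: forward extra sbatch flags (e.g. a QOS) to child jobs.
# Only valid if your Caper version supports this key.
slurm-extra-param=--qos=genomics
```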

Then I activated the conda environment and ran this command as per your manual:
sbatch -A ls25 -p genomics --qos=genomics -J chip-seq --export=ALL --mem 4G -t 4:00:00 --wrap 'caper run /home/fyan0011/ls25_scratch/feng.yan/software/chip-seq-pipeline2/chip.wdl -i template.json'
I noticed the --qos flag does not seem to be used according to the logs. Anyway, the leader job was submitted, but no child jobs were seen.

The SLURM out file showed:

2020-11-10 15:05:32,956|caper.caper_base|INFO| Creating a timestamped temporary directory. /home/fyan0011/ls25_scratch/feng.yan/caperfiles/chip/20201110_150532_953560
2020-11-10 15:05:32,957|caper.caper_runner|INFO| Localizing files on work_dir. /home/fyan0011/ls25_scratch/feng.yan/caperfiles/chip/20201110_150532_953560
2020-11-10 15:05:34,243|caper.cromwell|INFO| Validating WDL/inputs/imports with Womtool...
2020-11-10 15:05:43,850|caper.cromwell|INFO| Womtool validation passed.
2020-11-10 15:05:43,851|caper.caper_runner|INFO| launching run: wdl=/home/fyan0011/ls25_scratch/feng.yan/software/chip-seq-pipeline2/chip.wdl, inputs=/fs03/ls25/feng.yan/Lmo2_ChIP/test/caper/template.json, backend_conf=/home/fyan0011/ls25_scratch/feng.yan/caperfiles/chip/20201110_150532_953560/backend.conf
2020-11-10 15:06:07,320|caper.cromwell_workflow_monitor|INFO| Workflow: id=abadfc26-bf72-4d87-b5cf-a36e8a2cbeb8, status=Submitted
2020-11-10 15:06:07,545|caper.cromwell_workflow_monitor|INFO| Workflow: id=abadfc26-bf72-4d87-b5cf-a36e8a2cbeb8, status=Running
2020-11-10 15:06:25,686|caper.cromwell_workflow_monitor|INFO| Task: id=abadfc26-bf72-4d87-b5cf-a36e8a2cbeb8, task=chip.read_genome_tsv:-1, retry=0, status=Started, job_id=35516
2020-11-10 15:06:25,697|caper.cromwell_workflow_monitor|INFO| Task: id=abadfc26-bf72-4d87-b5cf-a36e8a2cbeb8, task=chip.read_genome_tsv:-1, retry=0, status=WaitingForReturnCode
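
When debugging a run like this, the workflow id from these log lines is what you grep for in cromwell.out to find the sbatch commands issued per task. A small sketch of extracting it (the log line is copied from the output above; the sed pattern is an assumption about the log format, not a Caper utility):

```shell
# Pull the workflow id out of a Caper monitor log line so it can be used
# to grep cromwell.out for the per-task sbatch submissions.
LOG_LINE='2020-11-10 15:06:07,320|caper.cromwell_workflow_monitor|INFO| Workflow: id=abadfc26-bf72-4d87-b5cf-a36e8a2cbeb8, status=Submitted'
WF_ID=$(printf '%s\n' "$LOG_LINE" | sed -n 's/.*id=\([0-9a-f-]*\),.*/\1/p')
echo "$WF_ID"   # prints abadfc26-bf72-4d87-b5cf-a36e8a2cbeb8
```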

Could you help with this?
Thank you!

@leepc12
Contributor

leepc12 commented Nov 12, 2020

Please post slurm*.out and cromwell.out.

@kaji331

kaji331 commented Nov 1, 2021

Please post slurm*.out and cromwell.out.

Hi, I hit a similar error with Caper 2.0 and chip-seq-pipeline2 v2.0. I installed chip-seq-pipeline2 using conda and installed Caper into the encode-chip-seq-pipeline environment using pip.

cromwell.out.txt
slurm-6342687.out.txt

@leepc12
Contributor

leepc12 commented Nov 1, 2021

Here is the actual sbatch command line used for submitting a job (in cromwell.out):

for ITER in 1 2 3; do
    sbatch --export=ALL -J cromwell_035dea6f_read_genome_tsv -D /storage/hpc/yangling/Projects/Singularity/chip/chip-data/chip/035dea6f-f3c2-47ce-82b0-e19174e47a3b/call-read_genome_tsv -o /storage/hpc/yangling/Projects/Singularity/chip/chip-data/chip/035dea6f-f3c2-47ce-82b0-e19174e47a3b/call-read_genome_tsv/execution/stdout -e /storage/hpc/yangling/Projects/Singularity/chip/chip-data/chip/035dea6f-f3c2-47ce-82b0-e19174e47a3b/call-read_genome_tsv/execution/stderr \
        -p intel-e5,amd-ep2 --account yangling \
        -n 1 --ntasks-per-node=1 --cpus-per-task=1 --mem=2048M --time=240  \
         \
        /storage/hpc/yangling/Projects/Singularity/chip/chip-data/chip/035dea6f-f3c2-47ce-82b0-e19174e47a3b/call-read_genome_tsv/execution/script.caper && break
    sleep 30
done
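
The loop above is Cromwell's submit-with-retry pattern: try sbatch up to three times, stop at the first success, and sleep 30 s between attempts. A minimal stand-in showing just the control flow (SUBMIT replaces the real sbatch call and is made to fail once, then succeed; the sleep is shortened):

```shell
# Stand-in for Cromwell's retry loop; SUBMIT replaces the real sbatch call.
submit_attempts=0
SUBMIT() { submit_attempts=$((submit_attempts + 1)); [ "$submit_attempts" -ge 2 ]; }  # fails once, then succeeds
for ITER in 1 2 3; do
    SUBMIT && break   # real loop: sbatch ... script.caper && break
    sleep 1           # real loop sleeps 30 s between attempts
done
echo "attempts=$submit_attempts"   # prints attempts=2: one failure, one success
```

If sbatch rejects the resource parameters on every attempt, the loop exhausts its three tries and no child job ever appears in squeue, which matches the symptom reported here.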

Please check if these resource parameters work on your cluster:

        -p intel-e5,amd-ep2 --account yangling \
        -n 1 --ntasks-per-node=1 --cpus-per-task=1 --mem=2048M --time=240  \

Also, do not activate the Conda environment. If you want to use conda, then use caper run ... --conda. Caper internally runs conda run -n ENV_NAME JOB_SCRIPT. You can also use --singularity if you have Singularity installed on your cluster. I recommend Singularity.
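
Following that advice, the leader command from the original post would drop the conda activation and pass --conda instead. A hedged sketch that only builds and echoes the command rather than submitting it (the WDL path and input file mirror the original post):

```shell
# Hypothetical leader-job command per the advice above: no `conda activate`;
# Caper itself invokes `conda run` for each task when --conda is given.
CAPER_CMD="caper run chip.wdl -i template.json --conda"
echo "$CAPER_CMD"   # prints caper run chip.wdl -i template.json --conda
```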

@kaji331

kaji331 commented Nov 2, 2021


Thank you very much! Actually, I want to install a standalone version of caper and chip-seq-pipeline2, so I tried to install caper in another conda environment or a singularity image...
