From e1cbbe4394319067fd00a241a47dee54a11e2d49 Mon Sep 17 00:00:00 2001
From: Jin wook Lee
Date: Tue, 7 Jun 2022 16:18:17 -0700
Subject: [PATCH 1/7] change conda env name and update docs a bit

---
 README.md                      | 69 +++++++++++++++++-----------------
 chip.wdl                       |  6 +--
 docs/build_genome_database.md  |  8 +---
 scripts/install_conda_env.sh   |  8 ++--
 scripts/uninstall_conda_env.sh |  6 +--
 scripts/update_conda_env.sh    |  6 +--
 6 files changed, 49 insertions(+), 54 deletions(-)

diff --git a/README.md b/README.md
index 7db7cc8f..ef98befa 100644
--- a/README.md
+++ b/README.md
@@ -3,36 +3,22 @@
 [![CircleCI](https://circleci.com/gh/ENCODE-DCC/chip-seq-pipeline2/tree/master.svg?style=svg)](https://circleci.com/gh/ENCODE-DCC/chip-seq-pipeline2/tree/master)
 
-## Download new Caper>=2.1
+## Conda environment name change (since v2.2.0 or 6/13/2022)
 
-New Caper is out. You need to update your Caper to work with the latest ENCODE ChIP-seq pipeline.
-```bash
-$ pip install caper --upgrade
+The pipeline's Conda environment names have been shortened to work around the following error:
 ```
-
-## Local/HPC users and new Caper>=2.1
-
-There are tons of changes for local/HPC backends: `local`, `slurm`, `sge`, `pbs` and `lsf`(added). Make a backup of your current Caper configuration file `~/.caper/default.conf` and run `caper init`. Local/HPC users need to reset/initialize Caper's configuration file according to your chosen backend. Edit the configuration file and follow instructions in there.
-```bash
-$ cd ~/.caper
-$ cp default.conf default.conf.bak
-$ caper init [YOUR_BACKEND]
+PaddingError: Placeholder of length '80' too short in package /XXXXXXXXXXX/miniconda3/envs/
 ```
-In order to run a pipeline, you need to add one of the following flags to specify the environment to run each task within. i.e. `--conda`, `--singularity` and `--docker`. These flags are not required for cloud backend users (`aws` and `gcp`).
+You need to reinstall the pipeline's Conda environments. 
It's recommended to do this for every version update.
 ```bash
-# for example
-$ caper run ... --singularity
+$ bash scripts/uninstall_conda_env.sh
+$ bash scripts/install_conda_env.sh
 ```
-For Conda users, **RE-INSTALL PIPELINE'S CONDA ENVIRONMENT AND DO NOT ACTIVATE CONDA ENVIRONMENT BEFORE RUNNING PIPELINES**. Caper will internally call `conda run -n ENV_NAME CROMWELL_JOB_SCRIPT`. Just make sure that pipeline's new Conda environments are correctly installed.
-```bash
-$ scripts/uninstall_conda_env.sh
-$ scripts/install_conda_env.sh
-```
+## Introduction
 
-## Introduction
 This ChIP-Seq pipeline is based off the ENCODE (phase-3) transcription factor and histone ChIP-seq pipeline specifications (by Anshul Kundaje) in [this google doc](https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit#).
 
 ### Features
 
@@ -45,30 +31,42 @@ This ChIP-Seq pipeline is based off the ENCODE (phase-3) transcription factor an
 
 1) Make sure that you have Python>=3.6. Caper does not work with Python2. Install Caper and check its version >=2.0.
     ```bash
-    $ python --version
     $ pip install caper
+
+    # use caper version >= 2.3.0 for a new HPC feature (caper hpc submit/list/abort).
+    $ caper -v
     ```
-2) Make a backup of your Caper configuration file `~/.caper/default.conf` if you are upgrading from old Caper(<2.0.0). Reset/initialize Caper's configuration file. Read Caper's [README](https://github.com/ENCODE-DCC/caper/blob/master/README.md) carefully to choose a backend for your system. Follow the instruction in the configuration file.
+2) Read Caper's [README](https://github.com/ENCODE-DCC/caper/blob/master/README.md) carefully to choose a backend for your system. Follow the instructions in the configuration file.
     ```bash
-    # make a backup of ~/.caper/default.conf if you already have it
+    # this will overwrite the existing conf file ~/.caper/default.conf
+    # make a backup of it first if needed
     $ caper init [YOUR_BACKEND]
 
-    # then edit ~/.caper/default.conf
+    # edit the conf file
     $ vi ~/.caper/default.conf
     ```
 
 3) Git clone this pipeline.
-    > **IMPORTANT**: use `~/chip-seq-pipeline2/chip.wdl` as `[WDL]` in Caper's documentation.
     ```bash
     $ cd
     $ git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2
     ```
 
-4) (Optional for Conda) Install pipeline's Conda environments if you don't have Singularity or Docker installed on your system. We recommend to use Singularity instead of Conda. If you don't have Conda on your system, install [Miniconda3](https://docs.conda.io/en/latest/miniconda.html).
+4) (Optional for Conda) **DO NOT USE A SHARED CONDA. INSTALL YOUR OWN [MINICONDA3](https://docs.conda.io/en/latest/miniconda.html) AND USE IT.** Install the pipeline's Conda environments if you don't have Singularity or Docker installed on your system. We recommend using Singularity instead of Conda.
     ```bash
+    # check if you have Singularity on your system, if so then it's not recommended to use Conda
+    $ singularity --version
+
+    # check if you are not using a shared conda, if so then delete it or remove it from your PATH
+    $ which conda
+
+    # change directory to pipeline's git repo
     $ cd chip-seq-pipeline2
-    # uninstall old environments (<2.0.0)
+
+    # uninstall old environments
     $ bash scripts/uninstall_conda_env.sh
+
+    # install new envs, you need to run this for every pipeline version update
     $ bash scripts/install_conda_env.sh
     ```
 
@@ -90,18 +88,19 @@ According to your chosen platform of Caper, run Caper or submit Caper command li
 
 The followings are just examples. Please read [Caper's README](https://github.com/ENCODE-DCC/caper) very carefully to find an actual working command line for your chosen platform.
 ```bash
-# Run it locally with Conda (You don't need to activate it, make sure to install Conda envs first)
+# Run it locally with Conda (DO NOT ACTIVATE PIPELINE'S CONDA ENVIRONMENT)
 $ caper run chip.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only.json --conda
 
-# Or submit it as a leader job (with long/enough resources) to SLURM (Stanford Sherlock) with Singularity
-# It will fail if you directly run the leader job on login nodes
-$ sbatch -p [SLURM_PARTITON] -J [WORKFLOW_NAME] --export=ALL --mem 4G -t 4-0 --wrap "caper chip chip.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only.json --singularity"
+# On HPC, submit it as a leader job to SLURM with Singularity
+$ caper hpc submit chip.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only.json --singularity --leader-job-name ANY_GOOD_LEADER_JOB_NAME
 
-# Check status of your leader job
-$ squeue -u $USER | grep [WORKFLOW_NAME]
+# Check job ID and status of your leader jobs
+$ caper hpc list
 
 # Cancel the leader node to close all of its child jobs
-$ scancel -j [JOB_ID]
+# If you directly use a cluster command like scancel or qdel then
+# child jobs will not be terminated
+$ caper hpc abort [JOB_ID]
 ```

diff --git a/chip.wdl b/chip.wdl
index 9dead742..616e474a 100644
--- a/chip.wdl
+++ b/chip.wdl
@@ -73,9 +73,9 @@ workflow chip {
 
     # group: runtime_environment
     String docker = 'encodedcc/chip-seq-pipeline:v2.1.6'
     String singularity = 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/chip-seq-pipeline_v2.1.6.sif'
-    String conda = 'encode-chip-seq-pipeline'
-    String conda_macs2 = 'encode-chip-seq-pipeline-macs2'
-    String conda_spp = 'encode-chip-seq-pipeline-spp'
+    String conda = 'encd-chip'
+    String conda_macs2 = 'encd-chip-macs2'
+    String conda_spp = 'encd-chip-spp'
 
     # group: pipeline_metadata
     String title = 'Untitled'

diff --git a/docs/build_genome_database.md b/docs/build_genome_database.md
index ca019e0f..83f869b9 100644
--- a/docs/build_genome_database.md
+++ b/docs/build_genome_database.md
@@ -8,11 +8,7 @@
 
 # How to build genome database
 
-1. [Install Conda](https://conda.io/miniconda.html). Skip this if you already have equivalent Conda alternatives (Anaconda Python). Download and run the [installer](https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh). Agree to the license term by typing `yes`. It will ask you about the installation location. On Stanford clusters (Sherlock and SCG4), we recommend to install it outside of your `$HOME` directory since its filesystem is slow and has very limited space. At the end of the installation, choose `yes` to add Miniconda's binary to `$PATH` in your BASH startup script.
-    ```bash
-    $ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
-    $ bash Miniconda3-latest-Linux-x86_64.sh
-    ```
+1. [Install Conda](https://conda.io/miniconda.html).
 
 2. Install pipeline's Conda environment.
     ```bash
@@ -22,7 +18,7 @@
 
 2. Choose `GENOME` from `hg19`, `hg38`, `mm9` and `mm10` and specify a destination directory. This will take several hours. We recommend not to run this installer on a login node of your cluster. It will take >8GB memory and >2h time.
     ```bash
-    $ conda activate encode-chip-seq-pipeline
+    $ conda activate encd-chip
     $ bash scripts/build_genome_data.sh [GENOME] [DESTINATION_DIR]
     ```

diff --git a/scripts/install_conda_env.sh b/scripts/install_conda_env.sh
index 4ee88fa6..5d2cefb0 100755
--- a/scripts/install_conda_env.sh
+++ b/scripts/install_conda_env.sh
@@ -5,20 +5,20 @@ SH_SCRIPT_DIR=$(cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd)
 
 echo "$(date): Installing pipeline's Conda environments..."
 
-conda create -n encode-chip-seq-pipeline --file ${SH_SCRIPT_DIR}/requirements.txt \
+conda create -n encd-chip --file ${SH_SCRIPT_DIR}/requirements.txt \
   --override-channels -c bioconda -c defaults -y
-conda create -n encode-chip-seq-pipeline-macs2 --file ${SH_SCRIPT_DIR}/requirements.macs2.txt \
+conda create -n encd-chip-macs2 --file ${SH_SCRIPT_DIR}/requirements.macs2.txt \
   --override-channels -c bioconda -c defaults -y
-conda create -n encode-chip-seq-pipeline-spp --file ${SH_SCRIPT_DIR}/requirements.spp.txt \
+conda create -n encd-chip-spp --file ${SH_SCRIPT_DIR}/requirements.spp.txt \
   --override-channels -c r -c bioconda -c defaults -y
 
 # adhoc fix for the following issues:
 #   - https://github.com/ENCODE-DCC/chip-seq-pipeline2/issues/259
 #   - https://github.com/ENCODE-DCC/chip-seq-pipeline2/issues/265
 # force-install readline 6.2, ncurses 5.9 from conda-forge (ignoring dependencies)
-conda install -n encode-chip-seq-pipeline-spp --no-deps --no-update-deps -y \
+conda install -n encd-chip-spp --no-deps --no-update-deps -y \
   readline==6.2 ncurses==5.9 -c conda-forge
 
 echo "$(date): Done successfully."
 
diff --git a/scripts/uninstall_conda_env.sh b/scripts/uninstall_conda_env.sh
index 544da45a..56a85c8c 100755
--- a/scripts/uninstall_conda_env.sh
+++ b/scripts/uninstall_conda_env.sh
@@ -1,9 +1,9 @@
 #!/bin/bash
 
 PIPELINE_CONDA_ENVS=(
-  encode-chip-seq-pipeline
-  encode-chip-seq-pipeline-macs2
-  encode-chip-seq-pipeline-spp
+  encd-chip
+  encd-chip-macs2
+  encd-chip-spp
 )
 
 for PIPELINE_CONDA_ENV in "${PIPELINE_CONDA_ENVS[@]}"
 do
diff --git a/scripts/update_conda_env.sh b/scripts/update_conda_env.sh
index e294da3a..50fbdc6f 100755
--- a/scripts/update_conda_env.sh
+++ b/scripts/update_conda_env.sh
@@ -5,9 +5,9 @@ SH_SCRIPT_DIR=$(cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd)
 SRC_DIR=${SH_SCRIPT_DIR}/../src
 
 PIPELINE_CONDA_ENVS=(
-  encode-chip-seq-pipeline
-  encode-chip-seq-pipeline-macs2
-  encode-chip-seq-pipeline-spp
+  encd-chip
+  encd-chip-macs2
+  encd-chip-spp
 )
 
 chmod u+rx ${SRC_DIR}/*.py

From 20f667fa2f16c543731f0442c5179aa18ccce1e6 Mon Sep 17 00:00:00 2001
From: Jin wook Lee
Date: Tue, 7 Jun 2022 16:19:02 -0700
Subject: [PATCH 2/7] bump ver

---
 chip.wdl | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/chip.wdl b/chip.wdl
index 616e474a..4efe59c5 100644
--- a/chip.wdl
+++ b/chip.wdl
@@ -7,10 +7,10 @@ struct RuntimeEnvironment {
 }
 
 workflow chip {
-    String pipeline_ver = 'v2.1.6'
+    String pipeline_ver = 'v2.2.0'
 
     meta {
-        version: 'v2.1.6'
+        version: 'v2.2.0'
 
         author: 'Jin wook Lee'
         email: 'leepc12@gmail.com'
@@ -19,8 +19,8 @@
 
         specification_document: 'https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit?usp=sharing'
 
-        default_docker: 'encodedcc/chip-seq-pipeline:v2.1.6'
-        default_singularity: 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/chip-seq-pipeline_v2.1.6.sif'
+        default_docker: 'encodedcc/chip-seq-pipeline:v2.2.0'
+        default_singularity: 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/chip-seq-pipeline_v2.2.0.sif'
 
         croo_out_def: 
            'https://storage.googleapis.com/encode-pipeline-output-definition/chip.croo.v5.json'
 
         parameter_group: {
@@ -71,8 +71,8 @@ workflow chip {
     }
     input {
         # group: runtime_environment
-        String docker = 'encodedcc/chip-seq-pipeline:v2.1.6'
-        String singularity = 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/chip-seq-pipeline_v2.1.6.sif'
+        String docker = 'encodedcc/chip-seq-pipeline:v2.2.0'
+        String singularity = 'https://encode-pipeline-singularity-image.s3.us-west-2.amazonaws.com/chip-seq-pipeline_v2.2.0.sif'
         String conda = 'encd-chip'
         String conda_macs2 = 'encd-chip-macs2'
         String conda_spp = 'encd-chip-spp'

From e0ea27d8e6ed607e3d07b574bc97dbb4019687a2 Mon Sep 17 00:00:00 2001
From: Jin wook Lee
Date: Tue, 7 Jun 2022 16:24:27 -0700
Subject: [PATCH 3/7] delete redundant doc

---
 docs/install_conda.md | 53 -------------------------------------------
 1 file changed, 53 deletions(-)
 delete mode 100644 docs/install_conda.md

diff --git a/docs/install_conda.md b/docs/install_conda.md
deleted file mode 100644
index 523f30d8..00000000
--- a/docs/install_conda.md
+++ /dev/null
@@ -1,53 +0,0 @@
-# How to install pipeline's Conda environment
-
-If you do not have miniconda (or anaconda) installed, follow the instructions below in steps 1 - 4 to install miniconda.
-
-**IF YOU ALREADY HAVE ANACONDA OR MINICONDA INSTALLED, SKIP TO STEP 5**
-
-1) Download [Miniconda installer](https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh). Use default answers to all questions except for the first and last.
-    ```bash
-    $ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
-    $ bash Miniconda3-4.6.14-Linux-x86_64.sh
-    ```
-
-    Type `yes` to the first question.
-    ```bash
-    Do you accept the license terms? [yes|no]
-    [no] >>> yes
-    ```
-
-    Type `yes` to the last question.
-    ```bash
-    Do you wish the installer to initialize Miniconda3
-    by running conda init? 
[yes|no]
-    [no] >>> yes
-    ```
-
-2) **IMPORTANT**: Close your session and re-login. If you skip this step then pipeline's Conda environment will be messed up with base Conda environment.
-
-3) **IMPORTANT**: Disable auto activation of base Conda enrivonment.
-    ```bash
-    $ conda config --set auto_activate_base false
-    ```
-
-4) **IMPORTANT**: Close your session and re-login.
-
-5) Install pipeline's Conda environment. Add `mamba` to the install command line to resolve conflicts much faster.
-
-    ```bash
-    $ bash scripts/uninstall_conda_env.sh  # uninstall it for clean-install
-    $ bash scripts/install_conda_env.sh mamba  # remove mamba if it does not work
-    ```
-
-> **WARNING**: DO NOT PROCEED TO RUN PIPELINES UNTIL YOU SEE THE FOLLOWING SUCCESS MESSAGE OR PIPELINE WILL NOT WORK.
-    ```bash
-    === All done successfully ===
-    ```
-
-6) Activate pipeline's Conda environment before running a pipeline.
-    ```bash
-    $ conda activate encode-chip-seq-pipeline
-
-    $ caper run ...
-    $ caper server ...
-    ```

From 8f022a2038d72e3412bef0b982acd60706c13a74 Mon Sep 17 00:00:00 2001
From: Jin wook Lee
Date: Thu, 9 Jun 2022 14:25:03 -0700
Subject: [PATCH 4/7] increase mem factor for subsample_ctl

---
 chip.wdl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/chip.wdl b/chip.wdl
index 4efe59c5..344ce35d 100644
--- a/chip.wdl
+++ b/chip.wdl
@@ -228,7 +228,7 @@ workflow chip {
     Int xcor_time_hr = 24
     Float xcor_disk_factor = 4.5
 
-    Float subsample_ctl_mem_factor = 14.0
+    Float subsample_ctl_mem_factor = 22.0
     Float subsample_ctl_disk_factor = 15.0
 
     Float macs2_signal_track_mem_factor = 12.0

From 6d7df45dd1152c6ea4d4cb1c9d62092a7578a9fc Mon Sep 17 00:00:00 2001
From: Jin wook Lee
Date: Thu, 9 Jun 2022 14:25:20 -0700
Subject: [PATCH 5/7] increase mem factor for subsample_ctl

---
 docs/input.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/input.md b/docs/input.md
index 9c0a5184..d527fb76 100644
--- a/docs/input.md
+++ b/docs/input.md
@@ -302,7 +302,7 @@ 
Parameter|Default|Description
 ---------|-------|-----------
-`chip.subsample_ctl_mem_factor` | 14.0 | Multiplied to size of TAG-ALIGN BED to determine required memory
+`chip.subsample_ctl_mem_factor` | 22.0 | Multiplied to size of TAG-ALIGN BED to determine required memory
 `chip.macs2_signal_track_time_hr` | 24 | Walltime (HPCs only)
 `chip.subsample_ctl_disk_factor` | 15.0 | Multiplied to size of TAG-ALIGN BED to determine required disk

From 80503e14f792e186fc1f89d977f0a853b48220fb Mon Sep 17 00:00:00 2001
From: Jin wook Lee
Date: Sun, 12 Jun 2022 21:41:07 -0700
Subject: [PATCH 6/7] update readme

---
 README.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index ef98befa..bfa19f4f 100644
--- a/README.md
+++ b/README.md
@@ -66,7 +66,9 @@ This ChIP-Seq pipeline is based off the ENCODE (phase-3) transcription factor an
 
     # uninstall old environments
     $ bash scripts/uninstall_conda_env.sh
 
-    # install new envs, you need to run this for every pipeline version update
+    # install new envs, you need to run this for every pipeline version update.
+    # it may be killed if you run this command line on a login node.
+    # it's recommended to request an interactive node and run it there.
     $ bash scripts/install_conda_env.sh
     ```
 
@@ -86,7 +88,8 @@ You can use URIs(`s3://`, `gs://` and `http(s)://`) in Caper's command lines and
 
 According to your chosen platform of Caper, run Caper or submit Caper command line to the cluster. You can choose other environments like `--singularity` or `--docker` instead of `--conda`. But you must define one of the environments.
 
-The followings are just examples. Please read [Caper's README](https://github.com/ENCODE-DCC/caper) very carefully to find an actual working command line for your chosen platform.
+PLEASE READ [CAPER'S README](https://github.com/ENCODE-DCC/caper) VERY CAREFULLY BEFORE RUNNING ANY PIPELINES. YOU WILL NEED TO CORRECTLY CONFIGURE CAPER FIRST. 
These are just example command lines.
+
 ```bash
 # Run it locally with Conda (DO NOT ACTIVATE PIPELINE'S CONDA ENVIRONMENT)
 $ caper run chip.wdl -i https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI_subsampled_chr19_only.json --conda

From d2d77536466b57013afc57fc4e63c43ca1b905ec Mon Sep 17 00:00:00 2001
From: Jin wook Lee
Date: Mon, 13 Jun 2022 10:55:49 -0700
Subject: [PATCH 7/7] fix conda env name in help

---
 chip.wdl | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/chip.wdl b/chip.wdl
index 344ce35d..e747b629 100644
--- a/chip.wdl
+++ b/chip.wdl
@@ -261,17 +261,17 @@ workflow chip {
         conda: {
             description: 'Default Conda environment name to run WDL tasks. For Conda users only.',
             group: 'runtime_environment',
-            example: 'encode-atac-seq-pipeline'
+            example: 'encd-chip'
         }
         conda_macs2: {
             description: 'Conda environment name for task macs2. For Conda users only.',
             group: 'runtime_environment',
-            example: 'encode-atac-seq-pipeline-macs2'
+            example: 'encd-chip-macs2'
         }
         conda_spp: {
             description: 'Conda environment name for tasks spp/xcor. For Conda users only.',
             group: 'runtime_environment',
-            example: 'encode-atac-seq-pipeline-spp'
+            example: 'encd-chip-spp'
         }
         title: {
             description: 'Experiment title.',
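For reference, the environment-removal loop shared by `scripts/uninstall_conda_env.sh` and `scripts/update_conda_env.sh` boils down to the sketch below, using the shortened names introduced in PATCH 1/7. The `conda env remove` call is only echoed here (an assumption for illustration) so the sketch can be dry-run on a machine without Conda installed.

```shell
#!/bin/bash
# Shortened Conda environment names introduced by this patch series.
PIPELINE_CONDA_ENVS=(
  encd-chip
  encd-chip-macs2
  encd-chip-spp
)

for PIPELINE_CONDA_ENV in "${PIPELINE_CONDA_ENVS[@]}"
do
  # Dry run: print the removal command instead of invoking conda.
  echo "conda env remove -n ${PIPELINE_CONDA_ENV} -y"
done
```

Dropping the `echo` turns the dry run into the real uninstall loop used by the scripts.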