Merge pull request #69 from ENCODE-DCC/v1.1.8
V1.1.8
leepc12 authored May 24, 2019
2 parents 2f567e6 + 1a50594 commit 22d5b48
Showing 97 changed files with 1,822 additions and 616 deletions.
2 changes: 1 addition & 1 deletion .circleci/config.yml
@@ -51,7 +51,7 @@ jobs:
name: build image
command: |
source ${BASH_ENV}
-export DOCKER_CACHE_TAG=v1.1.6
+export DOCKER_CACHE_TAG=test-v1.1.8
echo "pulling ${DOCKER_CACHE_TAG}!"
docker pull quay.io/encode-dcc/chip-seq-pipeline:${DOCKER_CACHE_TAG}
docker login -u=${QUAY_ROBOT_USER} -p=${QUAY_ROBOT_USER_TOKEN} quay.io
78 changes: 50 additions & 28 deletions README.md
@@ -7,41 +7,67 @@ This ChIP-Seq pipeline is based off the ENCODE (phase-3) transcription factor an

### Features

* **Flexibility**: Support for `docker`, `singularity` and `Conda`.
* **Portability**: Support for many cloud platforms (Google/DNAnexus) and cluster engines (SLURM/SGE/PBS).
* **Resumability**: [Resume](utils/qc_jsons_to_tsv/README.md) a failed workflow from where it left off.
* **User-friendly HTML report**: tabulated quality metrics including alignment/peak statistics and FRiP along with many useful plots (IDR/cross-correlation measures).
- Examples: [HTML](https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/example_output/qc.html), [JSON](docs/example_output/v1.1.5/qc.json)
* **Genomes**: Pre-built database for GRCh38, hg19, mm10, mm9 and additional support for custom genomes.

## Installation and tutorial
## Installation

This pipeline supports many cloud platforms and cluster engines. It also supports `docker`, `singularity` and `Conda` to resolve complicated software dependencies. Tutorial-style instructions for each platform will help you understand how to run the pipeline. There are special instructions for two major Stanford HPC servers (SCG4 and Sherlock).
1) Install [Caper](https://github.com/ENCODE-DCC/caper#installation). Caper is a Python wrapper for [Cromwell](https://github.com/broadinstitute/cromwell). Make sure that you have Python 3 (> 3.4.1) installed on your system.

* Cloud platforms
* Web interface
* [DNAnexus Platform](docs/tutorial_dx_web.md)
* CLI (command line interface)
* [Google Cloud Platform](docs/tutorial_google.md)
* [DNAnexus Platform](docs/tutorial_dx_cli.md)
* Stanford HPC servers (CLI)
* [Stanford SCG4](docs/tutorial_scg.md)
* [Stanford Sherlock 2.0](docs/tutorial_sherlock.md)
* Cluster engines (CLI)
* [SLURM](docs/tutorial_slurm.md)
* [Sun GridEngine (SGE/PBS)](docs/tutorial_sge.md)
* Local computers (CLI)
* [Local system with `singularity`](docs/tutorial_local_singularity.md)
* [Local system with `docker`](docs/tutorial_local_docker.md)
* [Local system with `Conda`](docs/tutorial_local_conda.md)
```bash
$ pip install caper
```

## Input JSON file
2) Read through [Caper's README](https://github.com/ENCODE-DCC/caper) carefully and configure it for your platform (a sample configuration sketch is shown after this list).

[Input JSON file specification](docs/input.md)
3) Run a pipeline with Caper.
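
Caper reads its settings from a configuration file. The following is a minimal sketch of what a local-backend configuration might look like; the `~/.caper/default.conf` location, the `backend`/`out-dir`/`tmp-dir` keys and the paths are assumptions based on Caper's README and may differ between Caper versions.

```bash
# Hypothetical Caper configuration for a local run; the conf file location,
# key names and paths are assumptions -- check Caper's README for your platform.
$ mkdir -p ~/.caper
$ cat > ~/.caper/default.conf << 'EOF'
backend=local
out-dir=/path/to/caper/out
tmp-dir=/path/to/caper/tmp
EOF
```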

## Output directories
## Conda

[Output directory specification](docs/output.md)
We don't recommend using Conda to resolve software dependencies; use Docker or Singularity instead. We will not accept issues related to Conda. You can install Singularity locally without super-user privileges and use it with our pipeline through Caper (with `--use-singularity`).

1) Install [Conda](https://docs.conda.io/en/latest/miniconda.html).

2) Install the Conda environment for the pipeline.

```bash
$ conda/install_dependencies.sh
```

## Tutorial

Make sure that you have configured Caper correctly.

```bash
$ caper run chip.wdl -i examples/caper/ENCSR936XTK_subsampled_chr19_only.json --deepcopy --use-singularity
```

If you use Conda or Docker (on cloud platforms), remove `--use-singularity` from the command line. For Conda, also activate the pipeline's environment before running a pipeline:
```bash
$ conda activate encode-chip-seq-pipeline
```
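
For example, a Conda-based run of the same tutorial input could look like the following sketch, reusing the example input JSON shown above:

```bash
# Activate the pipeline's Conda environment, then run without --use-singularity.
$ conda activate encode-chip-seq-pipeline
$ caper run chip.wdl -i examples/caper/ENCSR936XTK_subsampled_chr19_only.json --deepcopy
```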

## How to organize outputs

Install [Croo](https://github.com/ENCODE-DCC/croo#installation). Make sure that you have Python 3 (> 3.4.1) installed on your system.

```bash
$ pip install croo
```

Find the `metadata.json` file in Caper's output directory and pass it to Croo.

```bash
$ croo [METADATA_JSON_FILE]
```
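
For example, assuming Caper wrote its outputs under a local directory (the paths below are placeholders), you could locate the metadata file with `find` and pass it to Croo:

```bash
# The output directory and workflow ID are hypothetical; use your own Caper output path.
$ find /path/to/caper/out -name metadata.json
$ croo /path/to/caper/out/chip/[WORKFLOW_ID]/metadata.json
```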

## How to build/download genome database

You need to specify a genome data TSV file in your input JSON. Such a TSV file can be generated or downloaded along with the actual genome database files.

Use the genome database [downloader](genome/download_genome_data.sh) for supported genomes, or the [builder](docs/build_genome_database.md) for your own genome.
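
As a sketch, assuming the downloader takes a genome name and a destination directory (check the script's usage message for the actual arguments), downloading a pre-built database might look like this:

```bash
# Hypothetical invocation; arguments are assumptions -- see genome/download_genome_data.sh.
$ bash genome/download_genome_data.sh hg38 /path/to/genome/data
```

The input JSON would then reference the resulting TSV through the genome TSV parameter, e.g. `"chip.genome_tsv": "/path/to/genome/data/hg38/hg38.tsv"` (the key name and exact path are assumptions; see [the input JSON specification](docs/input.md)).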

## Useful tools

@@ -51,10 +77,6 @@ There are some useful tools to post-process outputs of the pipeline.

[This tool](utils/qc_jsons_to_tsv/README.md) recursively finds and parses all `qc.json` files (the pipeline's [final output](docs/example_output/v1.1.5/qc.json)) under a specified root directory. It generates a TSV file with all quality metrics tabulated in rows for each experiment and replicate. This tool also estimates the overall quality of a sample based on [a criteria definition JSON file](utils/qc_jsons_to_tsv/criteria.default.json), which can be a good guideline for QC'ing experiments.
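
A hypothetical invocation is sketched below; the script path and flag names are assumptions based on the tool's README, so check [that README](utils/qc_jsons_to_tsv/README.md) for the actual interface:

```bash
# Script name and flags are assumptions; consult utils/qc_jsons_to_tsv/README.md.
$ python utils/qc_jsons_to_tsv/qc_jsons_to_tsv.py --search-dir /path/to/pipeline/outputs --out-file qcs.tsv
```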

### resumer

[This tool](utils/resumer/README.md) parses a metadata JSON file from a previous failed workflow and generates a new input JSON file to start a pipeline from where it left off.

### ENCODE downloader

[This tool](https://github.com/kundajelab/ENCODE_downloader) downloads any type of data (FASTQ, BAM, PEAK, ...) from the ENCODE portal. It also generates a metadata JSON file per experiment, which is very useful for making an input JSON file for the pipeline.