This repository contains the code used for the quality control (QC) of human exome cohorts.
The code is written by members of the Wellcome Sanger Institute Human Genetics Informatics (HGI) group (https://www.sanger.ac.uk/group/human-genetics-informatics-hgi/) and is based on the gnomAD QC pipeline by the Broad Institute (https://github.com/broadinstitute/gnomad_qc/tree/main).
The current codebase has several branches - one for each dataset. Unification of the branches and refactoring of the code are in progress.
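For example, to work on a particular dataset, check out its branch (the branch name below is illustrative - list the remote branches to see what is available):

```bash
git fetch --all
git branch -r                # list the available dataset branches
git switch <dataset-branch>  # switch to the branch for your dataset
```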
The how-to guide for the code is here: https://hgi-projects.pages.internal.sanger.ac.uk/documentation/docs/how-to-guides/wes-qc-hail/
The code is designed to run on a Spark cluster with the Hail library installed. The manual for setting up the cluster is here.
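As a quick sanity check that Hail is available on the cluster (an illustrative command, not part of the pipeline - run it on the Spark master node):

```bash
python -c "import hail as hl; print(hl.version())"
```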
To run the code on a Spark cluster, two scripts are used:

- `hlrun_local` - runs the Python script via `spark-submit`. You need to run it on the Spark master node of your cluster.
- `hlrun_remote` - runs the code on the Spark cluster from your local machine (see the usage sketch after this list). It performs a series of operations:
  - Syncs the codebase to the remote cluster
  - Creates a tmux session on the remote cluster
  - Runs the Python script via `hlrun_local`
  - Attaches to the tmux session to monitor the progress
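For illustration, an invocation might look like the following; the exact arguments depend on your setup, and the script path here is hypothetical:

```bash
# On the Spark master node:
hlrun_local path/to/qc_step.py

# From your local machine: syncs the code, starts a tmux session,
# and runs the step on the cluster via hlrun_local
hlrun_remote path/to/qc_step.py
```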
**Warning**

`hlrun_remote` is designed to work with only one tmux session. To start a new task via `hlrun_remote`, first end the existing tmux session, if it exists.
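Assuming a standard tmux setup on the cluster, you can list and end sessions as follows (the session name is whatever `hlrun_remote` created):

```bash
tmux ls                          # list the sessions on the remote cluster
tmux kill-session -t <session>   # end the existing session
```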
The tests currently require running on a Spark cluster; there are plans to make them runnable locally. They can be run via the commands defined in the `Makefile`.
To run all the tests:

```bash
make test
```
Or you can specify the type of test to run:

```bash
make unit-test
make integration-test
```
To run the tests with coverage:

```bash
make unit-test-coverage
make integration-test-coverage
```
- Install pre-commit:

  ```bash
  pip install pre-commit
  ```

- `pre-commit` will automatically run on every commit.
- To run pre-commit manually on specific files:

  ```bash
  pre-commit run --files <file1> <file2>
  ```
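Note that pre-commit only triggers on `git commit` after its git hook has been installed in your clone; if the hooks do not fire, install them first (standard pre-commit behaviour, not specific to this repository):

```bash
pre-commit install
```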
`mypy` is configured to run manually because it currently produces too many errors. To run it:

```bash
pre-commit run --hook-stage manual
```