A motif discovery tool to detect the occurrences of known motifs

shao-lab/MotifScan

MotifScan

Introduction

Scan input genomic regions with known DNA motifs

Given a set of input genomic regions, MotifScan scans the sequences to detect the occurrences of known motifs. It can also apply a statistical test to each motif to check whether the motif is significantly over- or under-represented (enriched or depleted) in the input genomic regions compared to another set of control regions.
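To make the two ideas above concrete, here is a minimal sketch of motif scanning and an enrichment test. It is not MotifScan's implementation: the pseudocount, the uniform background, the fixed score threshold, the forward-strand-only scan, and the one-sided hypergeometric test are all assumptions chosen for illustration, and the function names are hypothetical. The example matrix is the JASPAR entry MA0004.1 (Arnt).

```python
import math

BASES = "ACGT"

# Position frequency matrix (PFM) for JASPAR MA0004.1 (Arnt), used for illustration.
ARNT_PFM = {
    "A": [4, 19, 0, 0, 0, 0],
    "C": [16, 0, 20, 0, 0, 0],
    "G": [0, 1, 0, 20, 0, 20],
    "T": [0, 0, 0, 0, 20, 0],
}


def log_odds_matrix(pfm, pseudocount=0.5, background=0.25):
    """Convert a PFM into per-position log2-odds scores against a uniform background."""
    width = len(pfm["A"])
    matrix = []
    for i in range(width):
        total = sum(pfm[base][i] for base in BASES) + 4 * pseudocount
        matrix.append({
            base: math.log2(((pfm[base][i] + pseudocount) / total) / background)
            for base in BASES
        })
    return matrix


def scan(sequence, matrix, threshold):
    """Return (start, score) for each forward-strand window scoring >= threshold."""
    width = len(matrix)
    seq = sequence.upper()
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguous bases
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


def enrichment_p_value(input_hits, input_total, control_hits, control_total):
    """One-sided hypergeometric tail: P(X >= input_hits) if hits were spread at random."""
    total = input_total + control_total
    hit_total = input_hits + control_hits
    upper = min(input_total, hit_total)
    tail = sum(
        math.comb(hit_total, k) * math.comb(total - hit_total, input_total - k)
        for k in range(input_hits, upper + 1)
    )
    return tail / math.comb(total, input_total)


matrix = log_odds_matrix(ARNT_PFM)
print(scan("TTCACGTGAA", matrix, threshold=5.0))  # threshold is an arbitrary example value
print(enrichment_p_value(8, 10, 2, 10))  # 8 of 10 input regions hit vs 2 of 10 controls
```

A small p-value from the last call would suggest the motif is over-represented in the input regions; swapping the input and control counts gives the corresponding test for depletion.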

Citation

Sun, H., Wang, J., Gong, Z. et al. Quantitative integration of epigenomic variation and transcription factor binding using MAmotif toolkit identifies an important role of IRF2 as transcription activator at gene promoters. Cell Discov 4, 38 (2018).

Documentation

To see the full documentation of MotifScan, please refer to: https://motifscan.readthedocs.io

Installation

The latest release of MotifScan is available on PyPI:

$ pip install motifscan

Or you can install MotifScan via conda:

$ conda install -c bioconda motifscan

Usage

Install genome assemblies

Install from a remote database

You can download genome assemblies from the UCSC database.

First, display all available genome assemblies:

$ motifscan genome --list-remote

Then, install a genome assembly (e.g. hg19):

$ motifscan genome --install -n hg19 -r hg19

Install with local files

To install a genome assembly from local files, prepare a FASTA file containing the genome sequences and a genome annotation file (refGene.txt).

$ motifscan genome --install -n hg19 -i <hg19.fa> -a <refGene.txt>

Install and build motif sets

Install from a remote database

You can install sets of motif PFMs (position frequency matrices) from the JASPAR 2020 database.

First, display all available motif PFM sets in JASPAR 2020:

$ motifscan motif --list-remote

Then, install a JASPAR motif PFM set (e.g. vertebrates_non-redundant):

$ motifscan motif --install -n <motif_set> -r vertebrates_non-redundant -g hg19

Install with local files

Install a motif set from a local PFM file:

$ motifscan motif --install -n <motif_set> -i <pfms.jaspar> -g hg19
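Before installing a local file, it can help to check that it parses as expected. The sketch below reads the bracketed JASPAR matrix layout and verifies that every motif has four rows of equal length. It assumes the `<pfms.jaspar>` file uses that layout; the formats MotifScan actually accepts should be confirmed in its documentation. The function name `parse_jaspar` is hypothetical and not part of MotifScan.

```python
import re


def parse_jaspar(text):
    """Parse JASPAR-formatted text into {header: {base: [counts]}}."""
    motifs = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line, e.g. ">MA0004.1 Arnt"
            current = line[1:].strip()
            motifs[current] = {}
        elif current is not None:
            # Matrix row, e.g. "A  [ 4 19  0  0  0  0 ]"
            base = line[0].upper()
            counts = [float(x) for x in re.findall(r"\d+(?:\.\d+)?", line[1:])]
            motifs[current][base] = counts
    # Basic sanity check: four rows (A, C, G, T) of equal length per motif.
    for name, rows in motifs.items():
        if sorted(rows) != ["A", "C", "G", "T"]:
            raise ValueError(f"{name}: expected rows A, C, G, T")
        if len({len(counts) for counts in rows.values()}) != 1:
            raise ValueError(f"{name}: rows differ in length")
    return motifs


EXAMPLE = """\
>MA0004.1 Arnt
A  [ 4 19  0  0  0  0 ]
C  [16  0 20  0  0  0 ]
G  [ 0  1  0 20  0 20 ]
T  [ 0  0  0  0 20  0 ]
"""

for name, rows in parse_jaspar(EXAMPLE).items():
    print(name, "width:", len(rows["A"]))
```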

Build PFMs for an additional genome

Build the motif set for another installed genome assembly (e.g. hg38):

$ motifscan motif --build <motif_set> -g hg38

Scan motifs

After the data preparation steps, you can now scan a set of genomic regions to detect the occurrences of known motifs.

$ motifscan scan -i regions.bed -g hg19 -m <motif_set> -o <output_dir>

Note: Use -h/--help for details of all arguments.
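For scripted runs, the input file and the command line can be prepared from Python. The sketch below writes a three-column BED file (chromosome, 0-based start, exclusive end) and assembles the scan command using only the flags shown above. The coordinates, the motif set name `my_motif_set`, and both helper functions are hypothetical. Whether a plain three-column BED is sufficient for your use case should be checked with `motifscan scan --help`.

```python
import os
import shlex
import tempfile


def write_bed(path, regions):
    """Write (chrom, start, end) tuples as tab-separated BED lines."""
    with open(path, "w") as handle:
        for chrom, start, end in regions:
            if start < 0 or end <= start:
                raise ValueError(f"invalid region: {chrom}:{start}-{end}")
            handle.write(f"{chrom}\t{start}\t{end}\n")


def build_scan_command(bed_path, genome, motif_set, output_dir):
    """Assemble the argument list for `motifscan scan` without running it."""
    return [
        "motifscan", "scan",
        "-i", bed_path,
        "-g", genome,
        "-m", motif_set,
        "-o", output_dir,
    ]


# Hypothetical regions, for illustration only.
regions = [("chr1", 1000, 1500), ("chr2", 2000, 2600)]

with tempfile.TemporaryDirectory() as workdir:
    bed_path = os.path.join(workdir, "regions.bed")
    write_bed(bed_path, regions)
    command = build_scan_command(
        bed_path, "hg19", "my_motif_set", os.path.join(workdir, "scan_out")
    )
    # Print the command rather than running it; pass `command` to
    # subprocess.run(command, check=True) once MotifScan and its data are installed.
    print(shlex.join(command))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.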

License

BSD 3-Clause License