Scan input genomic regions with known DNA motifs
Given a set of input genomic regions, MotifScan scans the sequences to detect occurrences of known motifs. It can also apply a statistical test to each motif to check whether the motif is significantly over- or under-represented (enriched or depleted) in the input genomic regions compared with a set of control regions.
To see the full documentation of MotifScan, please refer to: https://motifscan.readthedocs.io
The latest release of MotifScan is available on PyPI:
$ pip install motifscan
Or you can install MotifScan via conda:
$ conda install -c bioconda motifscan
You can download genome assemblies from the UCSC database.
First, display all available genome assemblies:
$ motifscan genome --list-remote
Then, install a genome assembly (e.g. hg19):
$ motifscan genome --install -n hg19 -r hg19
To install a genome assembly from local files, prepare a FASTA file containing the genome sequences and a genome annotation file (refGene.txt):
$ motifscan genome --install -n hg19 -i <hg19.fa> -a <refGene.txt>
You can install motif PFM sets from the JASPAR 2020 database.
First, display all available motif PFM sets in JASPAR 2020:
$ motifscan motif --list-remote
Then, install a JASPAR motif PFM set (e.g. vertebrates_non-redundant):
$ motifscan motif --install -n <motif_set> -r vertebrates_non-redundant -g hg19
To install a motif set from a local PFM file:
$ motifscan motif --install -n <motif_set> -i <pfms.jaspar> -g hg19
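For reference, here is a minimal sketch of what a local PFM file might look like. It assumes the expected input is the JASPAR matrix text format, which the pfms.jaspar placeholder suggests but which this README does not state outright. The motif ID, name, and counts are made up for illustration. The awk check is only a sanity test for a single-motif file and is not part of MotifScan.

```shell
# Write a toy single-motif PFM file in JASPAR text format.
# The motif ID (TOY0001.1), name (toyA), and counts are illustrative only.
cat > pfms.jaspar <<'EOF'
>TOY0001.1 toyA
A  [ 4 19  0  0  0  0 ]
C  [16  0 20  0  0  0 ]
G  [ 0  1  0 20  0 20 ]
T  [ 0  0  0  0 20  0 ]
EOF

# Sanity check: every motif position (column) should have the same total count.
awk '
  /^>/ { next }                       # skip the header line
  {
    gsub(/\[|\]/, " ")                # drop the brackets; fields are re-split
    for (i = 2; i <= NF; i++) sum[i] += $i
    if (NF > n) n = NF
  }
  END {
    for (i = 3; i <= n; i++)
      if (sum[i] != sum[2]) { print "column totals differ"; exit 1 }
    print "positions:", n - 1, "total per position:", sum[2]
  }
' pfms.jaspar
```

The resulting file can then be passed to the -i option of the install command above.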
Build the motif PFM set for another installed genome assembly (e.g. hg38):
$ motifscan motif --build <motif_set> -g hg38
After the data preparation steps, you can now scan a set of genomic regions to detect the occurrences of known motifs.
$ motifscan scan -i regions.bed -g hg19 -m <motif_set> -o <output_dir>
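If you do not yet have an input file, the sketch below writes a small regions.bed to try the command with. It assumes plain three-column, tab-separated BED (chromosome, 0-based start, end). The coordinates are arbitrary placeholders rather than real peaks, and the awk check is just a convenience, not something MotifScan requires.

```shell
# Write three example regions (chromosome, start, end), tab-separated.
# The coordinates are arbitrary placeholders, not real peaks.
printf 'chr1\t10000\t10500\n' >  regions.bed
printf 'chr1\t20000\t20400\n' >> regions.bed
printf 'chr2\t5000\t5600\n'   >> regions.bed

# Sanity check: at least three columns, and start < end on every line.
awk -F '\t' '
  NF < 3 || $2 + 0 >= $3 + 0 { print "bad line " NR ": " $0; bad = 1 }
  END { exit bad + 0 }
' regions.bed && echo "regions.bed looks OK"
```

The chromosome names should match those used in the installed genome assembly (hg19 in the example above).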
Note: Use -h/--help to see the details of all arguments.
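The data preparation and scanning commands in this README can be chained in one script. The following is a minimal sketch, not an official workflow: the motif set name my_motifs, the output directory scan_out, the DRY_RUN variable, and the run helper are all hypothetical names introduced here. It only uses motifscan options that appear in this README, and by default it prints the commands without running them.

```shell
#!/usr/bin/env bash
set -euo pipefail

GENOME="hg19"             # genome assembly name used in this README
MOTIF_SET="my_motifs"     # hypothetical name for the installed motif set
REGIONS="regions.bed"     # input regions file
OUT_DIR="scan_out"        # hypothetical output directory
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default), 0 = run them

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# 1. Install the genome assembly from the remote database.
run motifscan genome --install -n "$GENOME" -r "$GENOME"

# 2. Install a JASPAR motif set for that genome.
run motifscan motif --install -n "$MOTIF_SET" -r vertebrates_non-redundant -g "$GENOME"

# 3. Scan the input regions with the installed motif set.
run motifscan scan -i "$REGIONS" -g "$GENOME" -m "$MOTIF_SET" -o "$OUT_DIR"
```

Saved as a script, it can be run with DRY_RUN=0 to execute the steps for real. Whether the install steps can be safely repeated when the data is already installed is not covered here, so check the full documentation first.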