Commit

update README
simonvh committed Nov 27, 2020
2 parents 91a2026 + 2567178 commit a9210e7
Showing 11 changed files with 535 additions and 20,258 deletions.
35 changes: 27 additions & 8 deletions README.md
@@ -4,12 +4,28 @@

SCEPIA predicts transcription factor motif activity from single cell RNA-seq data. It uses computationally inferred epigenomes of single cells to identify transcription factors that determine cellular states. The regulatory inference is based on a two-step process:

-1) Single cells are matched to a combination of (bulk) reference H3K27ac profiles.
-2) Using the H3K27ac signal in enhancers associated with hypervariable genes the TF motif activity is inferred.
+1) Single cells are matched to a combination of (bulk) reference H3K27ac ChIP-seq or ATAC-seq profiles.
+2) The TF motif activity is inferred from the H3K27ac ChIP-seq or ATAC-seq signal in enhancers associated with hypervariable genes.

-The current reference is based on H3K27ac profiles from ENCODE.
+Currently five different references are available, three for human and two for mouse. Different
+references may give different results, depending on a) the type of data (H3K27ac
+ChIP-seq or ATAC-seq) and b) the cell types that are represented. While
+SCEPIA does not require exactly matching cell types to give good results, it does
+work best when the reference contains cell types that are relatively similar to those in your data.

-So sorry, but only human is supported for now. However, if you have mouse data you *can* try it. Make sure you use upper-case gene names as identifier, and `scepia` will run fine. In our (very limited) experience this *can* yield good results, but there are a lot of assumptions on conservation of regulatory interactions.
+The following references can be used:
+
+* `ENCODE.H3K27ac.human` - All H3K27ac experiments from ENCODE. Includes cell
+lines and tissues.
+* `BLUEPRINT.H3K27ac.human` - All H3K27ac cell types from BLUEPRINT (mostly
+hematopoietic cell types).
+* `Domcke.ATAC.fetal.human` - Fetal single cell-based ATAC-seq clusters from
+15 different organs ([Domcke et al., 2020](http://dx.doi.org/10.1126/science.aba7612)).
+* `Cusanovich.ATAC.adult.mouse` - ATAC-seq data of single cell-based clusters from 13
+adult mouse tissues ([Cusanovich et al., 2018](http://dx.doi.org/doi:10.1016/j.cell.2018.06.052)).
+* `ENCODE.H3K27ac.mouse` - All H3K27ac experiments from mouse ENCODE.
+
+So sorry, but only human and mouse are supported for now. However, if you have data from another species, you can still try it, provided its gene names largely match human or mouse gene names. Make sure you use gene names as identifiers, and `scepia` will run fine. In our (very limited) experience this *can* yield good results, but it relies on many assumptions about the conservation of regulatory interactions. If you have a large collection of ATAC-seq or ChIP-seq reference experiments available, you can also create your own reference with `ScepiaDataset.create()`. This is not well documented at the moment; let us know if you need help.

## Requirements and installation

@@ -26,7 +42,7 @@ $ conda config --add channels conda-forge
Now you can create an environment for scepia:

```
-conda create -n scepia scepia=0.4
+conda create -n scepia "scepia>=0.5.0"
# Note: if you want to use scepia in a Jupyter notebook, you also have to install the following packages: `ipywidgets nb_conda`.
conda activate scepia
```
@@ -35,14 +51,17 @@ conda activate scepia

### Before using SCEPIA

-First install the hg38 genome through [genomepy](https://github.com/vanheeringen-lab/genomepy):
+You have to install the genomes that scepia uses through [genomepy](https://github.com/vanheeringen-lab/genomepy). Depending on the reference, these include `hg38`, `hg19`, `mm10` and `mm9`. For example, to install `hg38`:

```
$ conda activate scepia
$ genomepy install hg38
```

-You only need to do this once.
+You only need to do this once for each genome.
+
+**Note: this is independent of which genome or annotation you used for your
+single cell RNA-seq!**

### Command line

@@ -77,7 +96,7 @@ from scepia.sc import infer_motifs
# load and preprocess single-cell data using scanpy
-adata = infer_motifs(adata, dataset="ENCODE")
+adata = infer_motifs(adata, dataset="ENCODE.H3K27ac.human")
```

The resulting `AnnData` object can be saved to an `h5ad` file with the `.write()` method. However, because of difficulties with storing the motif annotation in the correct format, the file cannot be loaded with the reading functions of `scanpy`. Instead, use the `read()` function from the scepia package:
6 changes: 5 additions & 1 deletion data/data_directory.txt
@@ -1,3 +1,7 @@
name version url
ENCODE 0.1 http://mbdata.science.ru.nl/share/heeringen/area27/area27.ENCODE.data.tgz
BLUEPRINT 1.0 http://mbdata.science.ru.nl/share/heeringen/area27/scepia.BLUEPRINT.data.v1.0.tgz
BLUEPRINT.H3K27ac.human 1.0.0 https://zenodo.org/record/4290591/files/BLUEPRINT.H3K27ac.human.tgz
Domcke.ATAC.fetal.human 0.1.0 https://zenodo.org/record/4290593/files/Domcke.ATAC.fetal.human.tgz
Cusanovich.ATAC.adult.mouse 0.1.0 https://zenodo.org/record/4290595/files/Cusanovich.ATAC.adult.mouse.tgz
ENCODE.H3K27ac.mouse 0.1.0 https://zenodo.org/record/4290597/files/ENCODE.H3K27ac.mouse.tgz
ENCODE.H3K27ac.human 0.1.0 https://zenodo.org/record/4290601/files/ENCODE.H3K27ac.human.tgz
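The `data_directory.txt` listing above is a simple table with one reference per line: a name, a version and a download URL. As an illustration only, the Python sketch below shows how such a listing could be parsed and looked up by reference name. The helper names (`DatasetEntry`, `parse_data_directory`) are hypothetical and not part of scepia, and the sketch assumes the fields are separated by whitespace (tabs or spaces), which the rendered diff does not reveal.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DatasetEntry:
    """One reference listed in data_directory.txt (hypothetical helper type)."""

    name: str
    version: str
    url: str


def parse_data_directory(text: str) -> dict[str, DatasetEntry]:
    """Parse 'name version url' rows into a dict keyed by reference name.

    Assumes the first non-empty line is the header and that fields are
    separated by whitespace; a later row with the same name overwrites
    an earlier one.
    """
    # Split every non-empty line on arbitrary whitespace (tabs or spaces).
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows or rows[0] != ["name", "version", "url"]:
        raise ValueError("expected a 'name version url' header line")
    entries: dict[str, DatasetEntry] = {}
    for fields in rows[1:]:
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got {len(fields)}: {fields}")
        name, version, url = fields
        entries[name] = DatasetEntry(name=name, version=version, url=url)
    return entries


# Two rows copied from the listing above.
SAMPLE = """name version url
ENCODE.H3K27ac.human 0.1.0 https://zenodo.org/record/4290601/files/ENCODE.H3K27ac.human.tgz
Cusanovich.ATAC.adult.mouse 0.1.0 https://zenodo.org/record/4290595/files/Cusanovich.ATAC.adult.mouse.tgz
"""

refs = parse_data_directory(SAMPLE)
print(sorted(refs))  # ['Cusanovich.ATAC.adult.mouse', 'ENCODE.H3K27ac.human']
print(refs["ENCODE.H3K27ac.human"].url)
```

Keying the entries by name matches how the README selects a reference, by passing its name as `dataset=` to `infer_motifs()`, so a misspelled name shows up here as a `KeyError`.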