TADGATE is a computational tool to identify TADs in Hi-C contact maps with a graph attention autoencoder.

TADGATE

Topologically associating domains (TADs) have emerged as basic structural and functional units of genome organization. However, accurately identifying TADs from sparse chromatin contact maps remains challenging. Here, we developed TADGATE to identify TADs in Hi-C contact maps with a graph attention autoencoder. It imputes and smooths sparse chromatin contact maps while preserving or enhancing their topological domains. TADGATE can output imputed Hi-C contact maps with clear topological structures. Additionally, it can provide embeddings for each chromatin bin, and the learned attention patterns effectively depict the positions of TAD boundaries.

Overview

TADGATE can provide good embeddings to represent bins within each TAD.

TADGATE can impute the sparse chromatin contact maps with enhanced topological domains.

TADGATE can impute the single-cell chromatin contact maps and identify TAD-like domains.

Getting started

Installation

The TADGATE package is built on the Python libraries Scanpy, PyTorch and PyG (PyTorch Geometric), and can be run on a GPU (recommended) or a CPU.

First clone the repository.

git clone https://github.com/zhanglabtools/TADGATE.git
cd TADGATE

It's recommended to create a separate conda environment for running TADGATE:

#create an environment
conda create -n TADGATE python=3.8
#activate your environment
conda activate TADGATE

TADGATE can be installed in either of two ways:

  1. Install TADGATE from PyPI
pip install TADGATE
  2. Or install from the source code
pip install .
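
Since TADGATE can run on a GPU or a CPU, it may help to check which device PyTorch can see after installation. The snippet below is a minimal sketch: `pick_device` is a hypothetical helper, not part of TADGATE, and how TADGATE itself accepts a device setting is not covered here.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu".

    Hypothetical helper for illustration; not part of the TADGATE package.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"PyTorch device available: {device}")
```

If this reports `cpu` on a machine with a GPU, the installed PyTorch build probably lacks CUDA support.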

The mclust algorithm requires the rpy2 package (Python) and the mclust package (R). See https://pypi.org/project/rpy2/ and https://cran.r-project.org/web/packages/mclust/index.html for details. If you can't use mclust, you can use K-means instead.
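
One way to decide between mclust and K-means is to test whether rpy2 and the R package mclust are both usable in the current environment. The sketch below assumes nothing about TADGATE's own options: `mclust_available` is a hypothetical helper, and the strings `"mclust"` and `"kmeans"` are illustrative labels rather than documented TADGATE parameter values.

```python
def mclust_available():
    """Return True if rpy2 imports cleanly and R reports mclust as installed.

    Hypothetical helper for illustration; not part of the TADGATE package.
    """
    try:
        # Importing rpy2.robjects fails if rpy2 or R itself is missing.
        from rpy2.robjects.packages import isinstalled
        return bool(isinstalled("mclust"))
    except Exception:
        return False


# Illustrative labels only; check the tutorial notebook for the real option names.
cluster_method = "mclust" if mclust_available() else "kmeans"
print(f"Clustering backend to use: {cluster_method}")
```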

Tutorial

TADGATE supports four input formats for Hi-C contact maps: a dense contact matrix, a sparse contact matrix, a .hic file produced by Juicer tools, or a .mcool file produced by cooler.
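
To illustrate how the dense and sparse matrix formats relate, the sketch below converts a sparse contact list into a dense symmetric matrix with NumPy. It assumes a three-column layout of (bin index, bin index, contact count) with zero-based indices; this is a common convention but an assumption here, so check the tutorial notebook for the layout TADGATE expects. `sparse_to_dense` is a hypothetical helper, not a TADGATE function.

```python
import numpy as np


def sparse_to_dense(triplets, n_bins):
    """Build a symmetric dense contact matrix from (bin_i, bin_j, count) rows.

    Assumes zero-based bin indices; hypothetical helper for illustration.
    """
    dense = np.zeros((n_bins, n_bins), dtype=float)
    for i, j, count in triplets:
        i, j = int(i), int(j)
        dense[i, j] = count
        dense[j, i] = count  # mirror the entry: contact maps are symmetric
    return dense


# Toy example: three bins, three recorded contacts.
triplets = [(0, 0, 5.0), (0, 1, 2.0), (1, 2, 3.0)]
dense = sparse_to_dense(triplets, n_bins=3)
print(dense)
```

Bin pairs missing from the sparse list stay at zero in the dense matrix.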

More detailed information can be seen in TADGATE usage.ipynb.

The data used in the tutorial can be downloaded here.

For command-line interface (CLI) users

The parameter file for TADGATE needs to be prepared according to the file in the example.

cd TADGATE
python TADGATE_CLI.py [path/to/your/parameters.txt]

Support

If you have any issues, please let us know. We have a mailing list located at:

Citation

If TADGATE is used in your research, please cite our paper:

Uncovering topologically associating domains from three-dimensional genome maps with TADGATE. Dachang Dang, Shao-Wu Zhang, Kangning Dong, Ran Duan, Shihua Zhang
