---
layout: introduction_slides
logo: "GTN"
video: true

title: "ATAC-Seq data analysis"
type: "introduction"
questions:
- What is ATAC-Seq?
- What are the quality parameters to check for each dataset?
- How to analyse ATAC-Seq data?
objectives:
- Understand ATAC-Seq
- Quality Parameters for ATAC-Seq
- Understand Peak calling for ATAC-Seq
requirements:
time_estimation: "30min"
key_points:
- Run quality control on every sequencing dataset before any other analyses
- Choose QC parameters carefully
- Re-run FastQC to check the impact of the quality control
contributors:
- heylf

---

# What is ATAC sequencing?

---

### Where does my data come from?

![](../images/atac-seq/atac.jpg)

<small>
[*Buenrostro et al. 2013 Nat Methods*](https://doi.org/10.1002/0471142727.mb2129s109)
</small>

- A hyperactive Tn5 transposase inserts sequencing adapters into open chromatin regions.
- The transposase cuts the DNA and attaches the adapters in the same step (tagmentation), so no separate shearing step is needed.

???

Some reads come from nucleosome-free (histone-free) DNA, i.e. open chromatin regions, and others come from fragments that contain nucleosomes.
Fragments that are too long (longer than 800 bp) are not amplified correctly by PCR and/or not sequenced efficiently on Illumina sequencers.
That is why it is so important to build coverage around the insertion sites rather than over the full span between the two mates of a pair:
if the mates are more than 170 bp apart, you cannot tell whether there was a nucleosome between them.

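A minimal sketch of building coverage around insertion sites rather than across whole fragments. It assumes each read pair is available as a hypothetical `(chrom, start, end)` fragment tuple in 0-based, half-open coordinates; the 25 bp window on each side of a cut site is an arbitrary illustrative value, not a recommendation.

```python
from collections import Counter


def cut_sites(fragments):
    """Yield (chrom, position) for the two Tn5 insertion sites of each fragment.

    Fragments are hypothetical (chrom, start, end) tuples in 0-based,
    half-open coordinates, so the fragment ends are `start` and `end - 1`.
    """
    for chrom, start, end in fragments:
        yield chrom, start
        yield chrom, end - 1


def insertion_coverage(fragments, half_window=25):
    """Per-base coverage from small windows centred on each cut site."""
    coverage = Counter()
    for chrom, pos in cut_sites(fragments):
        for p in range(max(0, pos - half_window), pos + half_window + 1):
            coverage[(chrom, p)] += 1
    return coverage


def fragment_coverage(fragments):
    """Naive per-base coverage over the full span between the two mates."""
    coverage = Counter()
    for chrom, start, end in fragments:
        for p in range(start, end):
            coverage[(chrom, p)] += 1
    return coverage


fragments = [("chr1", 1000, 1600)]  # one 600 bp fragment
print(fragment_coverage(fragments)[("chr1", 1300)])   # → 1 (middle is covered)
print(insertion_coverage(fragments)[("chr1", 1300)])  # → 0 (only the ends are)
```

For a long fragment like this one, the naive approach also counts the middle, where nucleosomes may sit, whereas the insertion-based coverage only marks the two places where Tn5 actually cut.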
---

### Characteristics of ATAC-Seq

- Typically you have at least two biological replicates.
- You may also have a control, for example purified genomic DNA (which no longer carries nucleosomes) treated with Tn5 and sequenced along with the ATAC sample.
- ATAC-Seq is usually sequenced paired-end. Reading both ends of each fragment (one per adapter) makes it easier to identify a true open chromatin region.

---

## How to analyze ATAC-Seq data?

---

### Check the Insert Size

![](../images/atac-seq/Screenshot_sizeDistribution_Good.png)

- Typical insert size of 150-200 bp.
- The first peak, at around 50 bp, corresponds to nucleosome-free regions.
- The second peak, at a bit less than 200 bp, corresponds to fragments spanning a single nucleosome.

???

The third peak (around 400 bp) comes from fragments where Tn5 inserted on either side of two adjacent nucleosomes, and the fourth peak (around 600 bp) from fragments spanning three adjacent nucleosomes.

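A minimal sketch of summarising insert sizes into the peaks described above. It assumes the sizes come from the SAM `TLEN` field, where the two mates of a pair usually carry the same value with opposite signs, so only positive values are counted. The bin boundaries are hypothetical and only roughly bracket the ~50, ~200, ~400 and ~600 bp peaks; they are not standard cut-offs.

```python
from collections import Counter

# Hypothetical bin boundaries in bp (lower bound inclusive, upper exclusive).
BINS = [
    (0, 150, "nucleosome-free"),
    (150, 300, "mono-nucleosome"),
    (300, 500, "di-nucleosome"),
    (500, 700, "tri-nucleosome"),
]


def classify_insert(size):
    """Return the fragment class for a single (positive) insert size."""
    for low, high, label in BINS:
        if low <= size < high:
            return label
    return "other"


def insert_size_summary(tlens):
    """Count fragments per class, keeping only positive TLEN (one per pair)."""
    return Counter(classify_insert(t) for t in tlens if t > 0)


def insert_size_histogram(tlens, bin_width=10):
    """Counts per `bin_width` bp bin: the data behind an insert size plot."""
    return Counter((t // bin_width) * bin_width for t in tlens if t > 0)


tlens = [50, -50, 190, -190, 410, -410, 600, -600]
print(insert_size_summary(tlens))
```

A good library shows most fragments in the nucleosome-free and mono-nucleosome classes, with progressively fewer in the larger ones.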
---

## Do not worry about a nucleotide bias

![](../images/atac-seq/per_base_sequence_content.png)

- Your reads might show a nucleotide bias in the per-base sequence content, because Tn5 has a sequence preference for where it inserts.

???

([Brian Green et al. 2012](https://doi.org/10.1186/1759-8753-3-3))

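To see what a per-base sequence content plot measures, here is a simplified sketch that computes, for each read position, the fraction of A, C, G and T across a set of reads. It is not FastQC's implementation: reads are assumed to be plain strings, every position is reported separately, and bases other than A, C, G and T (such as `N`) are ignored.

```python
from collections import Counter


def per_base_content(reads):
    """Fraction of A, C, G and T at each read position.

    Simplified: every position is reported separately and bases other than
    A/C/G/T (e.g. N) are ignored. This is not FastQC's implementation.
    """
    length = max(len(read) for read in reads)
    table = []
    for i in range(length):
        counts = Counter(
            read[i] for read in reads if i < len(read) and read[i] in "ACGT"
        )
        total = sum(counts.values())
        table.append({base: (counts[base] / total if total else 0.0) for base in "ACGT"})
    return table


for position, fractions in enumerate(per_base_content(["ACGT", "AGGT", "ACTT"]), start=1):
    print(position, fractions)
```

An unbiased library gives roughly constant fractions along the read; an insertion preference like that of Tn5 shows up as uneven fractions at the affected positions.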
---

## Filtering Reads

- Filter for uniquely mapped reads with end-to-end alignment.
- Remove reads mapping to mitochondrial DNA.
- Remove PCR duplicates.

???

ATAC-seq datasets usually contain a large percentage of reads derived from mitochondrial DNA.
Since the mitochondrial genome contains no ATAC-seq peaks of interest, you can discard those reads.
Mitochondrial DNA is not packaged into nucleosomes, which makes this part of the genome very accessible to Tn5.

End-to-end alignment is probably useful because you are interested in the exact positions of open chromatin regions: soft-clipping would move the aligned read end away from the actual Tn5 insertion site.

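The three filters on this slide can be sketched as a single predicate over alignment records. This is a minimal sketch under several assumptions: `Alignment` is a hypothetical record holding a few SAM fields, "uniquely mapped" is approximated by a MAPQ threshold (30 here is an arbitrary choice and MAPQ meaning depends on the aligner), mitochondrial contigs are assumed to be named `chrM` or `MT`, and duplicates are assumed to be already flagged, for example by Picard MarkDuplicates.

```python
from dataclasses import dataclass

# SAM FLAG bits (SAM format specification)
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800


@dataclass
class Alignment:
    """Hypothetical minimal record with a few SAM fields."""
    rname: str  # reference sequence name, e.g. "chr1" or "chrM"
    flag: int
    mapq: int
    cigar: str


def keep_alignment(aln, min_mapq=30, mito_names=("chrM", "MT")):
    """Return True if an alignment passes the three filters on the slide."""
    # Unmapped, secondary and supplementary records are not unique primary hits.
    if aln.flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
        return False
    # MAPQ as a proxy for "uniquely mapped" (threshold is an assumption).
    if aln.mapq < min_mapq:
        return False
    # Soft (S) or hard (H) clipping means the read was not aligned end-to-end.
    if "S" in aln.cigar or "H" in aln.cigar:
        return False
    # Reads on the mitochondrial genome.
    if aln.rname in mito_names:
        return False
    # PCR duplicates, assuming they were already marked.
    if aln.flag & FLAG_DUPLICATE:
        return False
    return True


print(keep_alignment(Alignment("chr1", 99, 42, "50M")))  # → True
print(keep_alignment(Alignment("chrM", 99, 42, "50M")))  # → False
```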
---

## Peak Calling

![](../images/atac-seq/schemeWithLegend.jpg)

- When Tn5 cuts an accessible chromatin locus, it inserts the two adapters 9 bp apart.
- Prefer a peak caller that takes this 9 bp separation into account.
- It is worth testing both Genrich and MACS2: they might produce different results depending on the read coverage.

???

Because the adapters are inserted 9 bp apart ([Kia et al. 2017](https://doi.org/10.1186/s12896-016-0326-1)), the read start sites must be shifted to reflect the centre of where Tn5 bound: reads on the positive strand should be shifted 4 bp to the right and reads on the negative strand 5 bp to the left, as in [Buenrostro et al. 2013](https://doi.org/10.1002/0471142727.mb2129s109).
Genrich can apply these shifts when ATAC-seq mode is selected.

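A minimal sketch of the +4/-5 shift for a single read's 5' end. The function name and the coordinate convention (0-based start, exclusive end, as in BED) are assumptions for illustration; tools may define coordinates differently, so check the conventions of the tool you actually use before relying on exact positions.

```python
def shifted_5prime(strand, start, end):
    """Return the Tn5-shifted 5' position of a read (0-based).

    `start` is the leftmost aligned position (0-based) and `end` is exclusive,
    so the 5' end is `start` on the + strand and `end - 1` on the - strand.
    """
    if strand == "+":
        return start + 4        # shift 4 bp to the right
    if strand == "-":
        return (end - 1) - 5    # shift 5 bp to the left
    raise ValueError(f"unknown strand: {strand!r}")


print(shifted_5prime("+", 100, 150))  # → 104
print(shifted_5prime("-", 100, 150))  # → 144
```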
---

## Overview

![](../images/atac-seq/ATACWF.svg)

- This is an overview of the ATAC-Seq data analysis workflow.