Merge pull request #2433 from heylf/master
Adding ATAC-Seq slides
lldelisle authored Mar 30, 2021
2 parents 69279d2 + 6b6ff68 commit a7bb923
Showing 4 changed files with 128 additions and 0 deletions.
3 changes: 3 additions & 0 deletions bin/ari-map.yml
@@ -20,6 +20,9 @@ SQLAlchemy: SQL alchemy
fastqc: fast QC
ntpdate: NTP date
RedHat: Red Hat
ATAC-Seq: ATAC Seq
Tn5: T N five
paired-end: paired end
# For some reason moz-tts doesn't pronounce right in a sentence unless it's
# lowercase. This gives parity with AWS
RAM: ram
Binary file added topics/epigenetics/images/atac-seq/atac.jpg
125 changes: 125 additions & 0 deletions topics/epigenetics/slides/introduction_atac.html
@@ -0,0 +1,125 @@
---
layout: introduction_slides
logo: "GTN"
video: true

title: "ATAC-Seq data analysis"
type: "introduction"
questions:
- What is ATAC-Seq?
- What are the quality parameters to check for each dataset?
- How to analyse ATAC-Seq data?
objectives:
- Understand ATAC-Seq
- Quality Parameters for ATAC-Seq
- Understand Peak calling for ATAC-Seq
requirements:
time_estimation: "30min"
key_points:
- Run quality control on every sequencing dataset before any other analyses
- Choose QC parameters carefully
- Re-run FastQC to check the impact of the quality control
contributors:
- heylf

---

# What is ATAC sequencing?

---

### Where does my data come from?

![](../images/atac-seq/atac.jpg)

<small>
[*Buenrostro et al. 2013 Nat Methods*](https://doi.org/10.1002/0471142727.mb2129s109)
</small>

- A hyperactive Tn5 transposase inserts sequencing adapters into open chromatin regions.
- The transposase itself cuts the DNA as it attaches the adapters.

???

Some read pairs come from DNA without nucleosomes (histones), that is, from open chromatin regions, and others span one or more nucleosomes.
Fragments longer than about 800 bp are not correctly amplified by PCR and/or not efficiently sequenced by Illumina sequencers.
That is why it is important to build coverage around the insertion sites rather than across the whole span between the two mates of a pair.
If the mates are more than 170 bp apart, you cannot tell whether there was a nucleosome between them.

---

### Characteristics of ATAC-Seq

- Typically you have at least two biological replicates.
- You may also have a control. A control could be purified DNA, which no longer carries nucleosomes, treated with Tn5. It is sequenced along with the ATAC sample.
- ATAC-Seq is usually sequenced paired-end. This makes it easier to identify a true open chromatin region, which is why you need both adapters.

---

## How to analyze ATAC-Seq data?

---

### Check the Insert Size

![](../images/atac-seq/Screenshot_sizeDistribution_Good.png)


- Typical insert size of 150-200 bp.
- The first peak, at about 50 base pairs, corresponds to nucleosome-free regions.
- The second peak, at a bit less than 200 base pairs, corresponds to a single nucleosome.

???

The third one (around 400bp) is where Tn5 inserted around two adjacent nucleosomes and the fourth one (around 600bp) is where Tn5 inserted around three adjacent nucleosomes.
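
As a rough illustration of how the peaks described above relate to fragment lengths, the sketch below bins fragment lengths into coarse classes. The cut-offs (125, 300, 500 and 700 bp) are assumptions placed midway between the peak positions mentioned on this slide, and the function names are hypothetical; this is not part of any tool used in the tutorial.

```python
from collections import Counter

# Hypothetical cut-offs (bp), placed midway between the peak positions
# named on the slide (~50, ~200, ~400, ~600 bp). Adjust to your own data.
BOUNDARIES = [
    (125, "nucleosome-free"),
    (300, "mono-nucleosome"),
    (500, "di-nucleosome"),
    (700, "tri-nucleosome"),
]


def classify_fragment(length_bp):
    """Return a coarse class label for one fragment length in bp."""
    if length_bp <= 0:
        raise ValueError("fragment length must be positive")
    for upper, label in BOUNDARIES:
        if length_bp < upper:
            return label
    return "longer"


def summarise(lengths):
    """Count how many fragments fall into each class."""
    return Counter(classify_fragment(n) for n in lengths)


counts = summarise([48, 52, 190, 410, 605, 900])
print(counts)
```

A good library would be expected to show most fragments in the first two classes, with progressively fewer in the later ones.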

---

## Do not worry about a nucleotide bias

![](../images/atac-seq/per_base_sequence_content.png)

- Your experiment might have a nucleotide bias because of the transposase treatment.

???

([Green et al. 2012](https://doi.org/10.1186/1759-8753-3-3))

---

## Filtering Reads

- Filter for uniquely mapped reads with end-to-end alignment.
- Remove reads mapping to mitochondrial DNA.
- Remove PCR duplicates.

???

ATAC-seq datasets usually contain a large percentage of reads that are derived from mitochondrial DNA.
This is because mitochondrial DNA is not packaged into nucleosomes, which makes it very accessible to Tn5.
Since there are no ATAC-seq peaks of interest in the mitochondrial genome, you can discard those reads.

End-to-end alignment is probably useful because you are interested in the exact open chromatin regions.
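
The three filters listed on this slide can be sketched in plain Python as follows. This is a minimal illustration on hand-made records, not the tutorial's actual workflow. The `Read` class, the MAPQ threshold of 30 as a stand-in for "uniquely mapped", and the mitochondrial contig names are all assumptions. The duplicate check relies on the SAM flag bit `0x400`, which is only set if a duplicate-marking tool has already been run. End-to-end alignment is an aligner setting and is not covered here.

```python
from dataclasses import dataclass

DUPLICATE_FLAG = 0x400       # SAM flag bit: PCR or optical duplicate
MIN_MAPQ = 30                # assumed proxy for "uniquely mapped"
MITO_NAMES = {"chrM", "MT"}  # assumed names; depends on the reference genome


@dataclass
class Read:
    """Hypothetical minimal read record (a subset of the SAM fields)."""
    name: str
    chrom: str
    flag: int
    mapq: int


def keep_read(read):
    """Return True if the read passes all three filters."""
    if read.chrom in MITO_NAMES:
        return False  # mitochondrial read
    if read.flag & DUPLICATE_FLAG:
        return False  # marked as a duplicate
    return read.mapq >= MIN_MAPQ  # low MAPQ suggests multi-mapping


reads = [
    Read("r1", "chr1", 99, 42),
    Read("r2", "chrM", 99, 42),
    Read("r3", "chr2", 99 | DUPLICATE_FLAG, 42),
    Read("r4", "chr3", 99, 3),
]
kept = [r.name for r in reads if keep_read(r)]
print(kept)
```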

---

## Peak Calling

![](../images/atac-seq/schemeWithLegend.jpg)

- Prefer a peak caller that takes into account that the adapters are separated by 9 base pairs.
- When Tn5 cuts an accessible chromatin locus, it inserts adapters separated by 9 base pairs.
- It is best to test both Genrich and MACS2, as they may produce different results depending on the read coverage.

???

[Kia et al. 2017](https://doi.org/10.1186/s12896-016-0326-1)
This means that, for the read start site to reflect the centre of where Tn5 bound, reads on the positive strand should be shifted 4 bp to the right and reads on the negative strand should be shifted 5 bp to the left, as in [Buenrostro et al. 2013](https://doi.org/10.1002/0471142727.mb2129s109). Genrich can apply these shifts when ATAC-seq mode is selected.
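
The shift described above can be written as a small function. This is only a sketch to make the arithmetic concrete: the function name is hypothetical, and it assumes the position passed in is the 5' end of the read in genome coordinates (the leftmost base for a positive-strand read, the rightmost base for a negative-strand read). It does not reproduce how Genrich implements the shift internally.

```python
def shift_to_tn5_centre(five_prime_pos, strand):
    """Shift a read's 5' end towards the centre of the Tn5 binding site.

    Positive-strand reads move 4 bp to the right, negative-strand reads
    move 5 bp to the left, as described in the notes above.
    """
    if strand == "+":
        return five_prime_pos + 4
    if strand == "-":
        return five_prime_pos - 5
    raise ValueError("strand must be '+' or '-'")


# Two reads flanking one insertion event: their 5' ends are 9 bp apart
# before shifting and land on the same position afterwards.
plus_site = shift_to_tn5_centre(1000, "+")
minus_site = shift_to_tn5_centre(1009, "-")
print(plus_site, minus_site)
```

The example uses two 5' ends that are 9 bp apart to show that both shifts bring them onto a single position.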

---

## Overview

![](../images/atac-seq/ATACWF.svg)

- This is an overview of ATAC-Seq data analysis.
