Merge pull request #2433 from heylf/master

Adding ATAC-Seq slides
galaxyproject · Mar 30, 2021 · a7bb923
2 parents 69279d2 + 6b6ff68
commit a7bb923
Showing 4 changed files with 128 additions and 0 deletions.
diff --git a/bin/ari-map.yml b/bin/ari-map.yml
@@ -20,6 +20,9 @@ SQLAlchemy: SQL alchemy
 fastqc: fast QC
 ntpdate: NTP date
 RedHat: Red Hat
+ATAC-Seq: ATAC Seq
+Tn5: T N five
+paired-end: paired end
 # For some reason moz-tts doesn't pronounce right in a sentence unless it's
 # lowercase. This gives parity with AWS
 RAM: ram

diff --git a/topics/epigenetics/images/atac-seq/atac.jpg b/topics/epigenetics/images/atac-seq/atac.jpg
diff --git a/topics/epigenetics/images/atac-seq/per_base_sequence_content.png b/topics/epigenetics/images/atac-seq/per_base_sequence_content.png
diff --git a/topics/epigenetics/slides/introduction_atac.html b/topics/epigenetics/slides/introduction_atac.html
@@ -0,0 +1,125 @@
+---
+layout: introduction_slides
+logo: "GTN"
+video: true
+
+title: "ATAC-Seq data analysis"
+type: "introduction"
+questions:
+  - What is ATAC-Seq?
+  - What are the quality parameters to check for each dataset?
+  - How to analyse ATAC-Seq data?
+objectives:
+  - Understand ATAC-Seq
+  - Quality Parameters for ATAC-Seq
+  - Understand Peak calling for ATAC-Seq
+requirements:
+time_estimation: "30min"
+key_points:
+  - Run quality control on every sequencing dataset before any other analyses
+  - Choose QC parameters carefully
+  - Re-run FastQC to check the impact of the quality control
+contributors:
+  - heylf
+
+---
+
+# What is ATAC sequencing?
+
+---
+
+### Where does my data come from?
+
+![](../images/atac-seq/atac.jpg)
+
+<small>
+[*Buenrostro et al. 2013 Nat Methods*](https://doi.org/10.1002/0471142727.mb2129s109)
+</small>
+
+- A hyperactive Tn5 transposase inserts sequencing adapters into open chromatin regions.
+- The transposase itself cuts the DNA as it attaches the adapters, so no separate shearing step is needed.
+
+???
+
+Some fragments come from nucleosome-free DNA (open chromatin regions), while others span one or more nucleosomes (histone complexes).
+If a fragment is too long (more than 800 bp), it is not correctly amplified by PCR and/or not efficiently sequenced by Illumina sequencers.
+That is why it is important to build coverage around the insertion sites rather than across the whole span between the two mates of a pair.
+If the mates are more than 170 bp apart, you cannot tell whether there was a nucleosome between them.
+
+---
+
+### Characteristics of ATAC-Seq
+
+- Typically you have at least two biological replicates.
+- You may also have a control. A control could be purified DNA, which no longer carries nucleosomes, treated with Tn5. It is sequenced along with the ATAC sample.
+- ATAC-Seq is usually sequenced paired-end, which makes it easier to identify a true open chromatin region. That is why you need both adapters.
+
+---
+
+## How to analyze ATAC-Seq data?
+
+---
+
+### Check the Insert Size
+
+![](../images/atac-seq/Screenshot_sizeDistribution_Good.png)
+
+
+- Typical insert size of 150-200 bp.
+- The first peak, at around 50 bp, corresponds to nucleosome-free regions.
+- The second peak, at a bit less than 200 bp, corresponds to a single nucleosome.
+
+???
+
+The third peak (around 400 bp) is where Tn5 inserted on either side of two adjacent nucleosomes, and the fourth peak (around 600 bp) is where Tn5 inserted on either side of three adjacent nucleosomes.
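
The peak positions above can be turned into a rough fragment classifier. This is a minimal sketch, not part of the slides: the names `PEAKS`, `classify_fragment`, and `summarise` are hypothetical, and assigning each fragment to the nearest peak is an assumed simplification that uses only the approximate values quoted on this slide.

```python
# Approximate peak positions in bp, taken from the slide text.
# Real libraries vary, so treat these values as assumptions.
PEAKS = {
    "nucleosome-free": 50,
    "mono-nucleosome": 200,
    "di-nucleosome": 400,
    "tri-nucleosome": 600,
}


def classify_fragment(length_bp: int) -> str:
    """Return the name of the peak closest to the given fragment length."""
    if length_bp <= 0:
        raise ValueError("fragment length must be positive")
    return min(PEAKS, key=lambda name: abs(PEAKS[name] - length_bp))


def summarise(lengths) -> dict:
    """Count how many fragments fall closest to each expected peak."""
    counts = {name: 0 for name in PEAKS}
    for length in lengths:
        counts[classify_fragment(length)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical fragment lengths, for illustration only.
    print(summarise([45, 60, 190, 210, 390, 610]))
```

A library whose counts are dominated by the nucleosome-free and mono-nucleosome classes would match the size distribution described on the slide.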
+
+---
+
+## Do not worry about a nucleotide bias
+
+![](../images/atac-seq/per_base_sequence_content.png)
+
+- Your experiment might have a nucleotide bias because of the transposase treatment.
+
+???
+
+([Green et al. 2012](https://doi.org/10.1186/1759-8753-3-3))
+
+---
+
+## Filtering Reads
+
+- Filter for uniquely mapped reads with end-to-end alignment.
+- Remove reads mapping to mitochondrial DNA.
+- Remove PCR duplicates.
+
+???
+
+ATAC-seq datasets usually contain a large percentage of reads that are derived from mitochondrial DNA.
+Since there are no ATAC-seq peaks of interest in the mitochondrial genome, you can discard those reads.
+Mitochondrial DNA carries no nucleosomes, which makes this part of the genome very accessible to Tn5.
+
+End-to-end alignment is probably useful because you are interested in the exact positions of the open chromatin regions.
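
The three filters on this slide can be sketched on simplified read records. This is an illustration under stated assumptions, not the workflow's actual tool: the `Read` class and function names are hypothetical, the mitochondrial contig names (`chrM`, `MT`) depend on the reference genome, and a MAPQ cutoff of 30 is only an assumed proxy for "uniquely mapped". The flag bits `0x4` (unmapped) and `0x400` (duplicate) follow the SAM format.

```python
from dataclasses import dataclass

FLAG_UNMAPPED = 0x4      # SAM flag bit: read is unmapped
FLAG_DUPLICATE = 0x400   # SAM flag bit: PCR or optical duplicate
MITO_NAMES = {"chrM", "MT"}  # assumption: depends on the reference genome
MIN_MAPQ = 30                # assumption: proxy for uniquely mapped reads


@dataclass(frozen=True)
class Read:
    """Minimal stand-in for an aligned read (hypothetical structure)."""
    name: str
    chrom: str
    mapq: int
    flag: int


def keep_read(read: Read, min_mapq: int = MIN_MAPQ) -> bool:
    """Return True if the read passes all three filters."""
    if read.flag & FLAG_UNMAPPED:
        return False
    if read.flag & FLAG_DUPLICATE:
        return False
    if read.chrom in MITO_NAMES:
        return False
    return read.mapq >= min_mapq


def filter_reads(reads, min_mapq: int = MIN_MAPQ):
    """Keep only reads that pass keep_read, preserving input order."""
    return [read for read in reads if keep_read(read, min_mapq)]
```

The duplicate check relies on duplicates having already been marked by an upstream tool; the sketch does not detect them itself.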
+
+---
+
+## Peak Calling
+
+![](../images/atac-seq/schemeWithLegend.jpg)
+
+- When Tn5 cuts an accessible chromatin locus, it inserts adapters separated by 9 bp.
+- Prefer a peak caller that takes this 9 bp separation into account.
+- It is worth testing both Genrich and MACS2, since they might produce different results depending on the read coverage.
+
+???
+
+Tn5 inserts adapters separated by 9 bp ([Kia et al. 2017](https://doi.org/10.1186/s12896-016-0326-1)).
+This means that, for the read start site to reflect the centre of where Tn5 bound, reads on the positive strand should be shifted 4 bp to the right and reads on the negative strand should be shifted 5 bp to the left, as in [Buenrostro et al. 2013](https://doi.org/10.1002/0471142727.mb2129s109). Genrich can apply these shifts when ATAC-seq mode is selected.
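
The strand-specific shift described above can be written out as a small helper. This is a minimal sketch: the function name `shift_read_start` is hypothetical, and it assumes the input is the 5' end coordinate of the read on the reference. It does not reproduce how Genrich implements its ATAC-seq mode.

```python
PLUS_STRAND_SHIFT = 4    # positive strand: shift 4 bp to the right
MINUS_STRAND_SHIFT = -5  # negative strand: shift 5 bp to the left


def shift_read_start(five_prime_pos: int, strand: str) -> int:
    """Shift a read's 5' end so it approximates the centre of the Tn5 site."""
    if strand == "+":
        return five_prime_pos + PLUS_STRAND_SHIFT
    if strand == "-":
        return five_prime_pos + MINUS_STRAND_SHIFT
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    for pos, strand in [(1000, "+"), (1200, "-")]:
        print(pos, strand, shift_read_start(pos, strand))
```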
+
+---
+
+## Overview
+
+![](../images/atac-seq/ATACWF.svg)
+
+- This is an overview of ATAC-Seq data analysis.