-
Notifications
You must be signed in to change notification settings - Fork 0
/
SMBE2019-poster.tex
230 lines (196 loc) · 11.9 KB
/
SMBE2019-poster.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
\documentclass[25pt, a0paper, portrait, margin=0mm, innermargin=15mm,
blockverticalspace=15mm, colspace=15mm, subcolspace=8mm]{tikzposter} %Default values for poster format options.
\usepackage{tabularx,booktabs,adjustbox} % For tables
\usetikzlibrary{calc,fit,arrows,decorations.pathmorphing,backgrounds,fit,positioning}
\usetikzlibrary{shapes.symbols}
\usepackage{color,colortbl}
\definecolor{Cblack}{rgb}{0,0,0}
\definecolor{Corange}{rgb}{0.9,0.6,0}
\definecolor{Cskyblue}{rgb}{0.35,0.7,0.9}
\definecolor{Cbluegreen}{rgb}{0,0.6,0.5}
\definecolor{Cyellow}{rgb}{0.95,0.9,0.25}
\definecolor{Cblue}{rgb}{0,0.45,0.7}
\definecolor{Cvermillion}{rgb}{0.8,0.4,0}
\definecolor{Cpurple}{rgb}{0.8,0.6,0.7}
\newcommand{\Cblack}[1]{\textcolor{Cblack}{#1}}
\newcommand{\Corange}[1]{\textcolor{Corange}{#1}}
\newcommand{\Cskyblue}[1]{\textcolor{Cskyblue}{#1}}
\newcommand{\Cbluegreen}[1]{\textcolor{Cbluegreen}{#1}}
\newcommand{\Cyellow}[1]{\textcolor{Cyellow}{#1}}
\newcommand{\Cblue}[1]{\textcolor{Cblue}{#1}}
\newcommand{\Cvermillion}[1]{\textcolor{Cvermillion}{#1}}
\newcommand{\Cpurple}[1]{\textcolor{Cpurple}{#1}}
\newcommand{\magenta}[1]{\textcolor{magenta}{#1}}
\newcommand{\teal}[1]{\textcolor{teal}{#1}}
\newcommand{\gray}[1]{\textcolor{gray}{#1}}
% tikz colour settings
\tikzset{pop1/.style={blue!40},pop2/.style={red!40}}
\definecolorpalette{myColorPalette} {
\definecolor{colorOne}{named}{teal}
\definecolor{colorTwo}{named}{white}
\definecolor{colorThree}{named}{cyan}
}
\tikzposterlatexaffectionproofon %shows small comment on how the poster was made at bottom of poster
% Commands
\newcommand{\bs}{\textbackslash} % backslash
\newcommand{\cmd}[1]{{\bf \color{red}#1}} % highlights command
% \newcommand{\crossmark}{\ding{55}}
% Title, Author, Institute
\title{Efficient simulation of introgression, admixture and local ancestry}
\author{{\bf Georgia Tsambos}}
\institute{{\bf University of Melbourne, Australia}}
% -- PREDEFINED THEMES ---------------------- %
% Choose LAYOUT: Default, Basic, Rays, Simple, Envelope, Wave, Board, Autumn, Desert,
\usetheme{Autumn}
\usecolorstyle[colorPalette=myColorPalette]{Denmark}
\begin{document}
\maketitle
\block[bodyverticalshift=-2cm]{}{
{\Large
Georgia Tsambos (1, 2), Peter Ralph (3), Jerome Kelleher (4), Stephen Leslie (1, 2, 5), Damjan Vukcevic (1, 2).
(1) School of Mathematics and Statistics, University of Melbourne, Australia (2) Melbourne Integrative Genomics, University of Melbourne, Australia, (3) Department of Mathematics, University of Oregon, United States, (4) Big Data Institute, University of Oxford, United Kingdom, (5) School of Biosciences, University of Melbourne, Australia.\\[7mm]
Presenting author: gtsambos (at) student.unimelb.edu.au
}
}
%%% BLOCK 0
\block[titleoffsety=1cm,bodyverticalshift=-20cm]{0. Introduction}{
}
\begin{columns}
\begin{subcolumns}
\subcolumn{23} \block[bodyoffsetx=1.5cm,bodyverticalshift=-5cm]{}{
To assess the performance of methods in population genetics, we often wish to simulate realistic genetic datasets while retaining detailed information about the history of the simulated genomes.
This poster briefly describes how we can efficiently simulate genetic information with full information about the ancestral populations that particular genomic segments have been inherited from, often called the \magenta{local ancestry} of the sample. In all pictures here, we represent ancestries with colours.\\[3mm]
\teal{Section 1} introduces the \magenta{tree sequence} [1], a data structure that is capable of encoding a complete genealogy for a sample of chromosomes at each chromosomal location.
\teal{Section 2} shows how local ancestry information can be stored and extracted from tree sequences.
\teal{Section 3} outlines a method for simulating such information using recent advances implemented in the tree sequence simulation software msprime [1] and SLiM [2].
\teal{Section 4} demonstrates the performance of this method.
\teal{Section 5} provides further information for the interested viewer.
%Many existing methods can infer the ancestral origin of chromosomal segments.
%However, it is difficult to simulate chromosomes for which the true origin of those segments is known; existing approaches are approximate and ad-hoc. Recent advances implemented in the software msprime (Kelleher et al. 2016) and SLiM (Haller et al. 2018) allow us to efficiently record genetic information using a succinct tree sequence data structure, which provides unprecedented detail about the genealogy of the sample
}
\subcolumn{3} \block[bodyoffsetx=-2cm,bodyverticalshift=-6cm]{}{ \input{pics/admixed-family-tree.tex}}
\end{subcolumns}
\end{columns}
%%% BLOCK 1
\begin{columns}%blocks will be placed into columns
\column{.5}
\block[roundedcorners=40,titleoffsety=-8cm,bodyoffsety=-8cm,bodyverticalshift=-1cm]{1. Tree sequences}{
Genetic sequence data is BIG and REPETITIVE:\\[2mm]
\begin{center}
{\tt
%\ldots GTAACGCGATAAGA\Cvermillion{G}ATTAGCCCAAAAACACAGACATGG\Cvermillion{A}AATAGCGTA\ldots \\
%\ldots GTAACGCGATAAGA\Cvermillion{G}ATTAGCCCAAAAACACAGACATGG\Cvermillion{A}AATAGCGTA\ldots \\
\ldots GTAACGCGATAAGA\Cvermillion{G}ATTAGCCCAAAAACACAGACATGG\Cvermillion{A}AATAGCGTA\ldots \\
\ldots GTAACGCGATAAGA\Cvermillion{G}ATTAGCCCAAAAACACAGACATGG\Cvermillion{A}AATAGCGTA\ldots \\
\ldots GTAACGCGATAAGA\Cskyblue{T}ATTAGCCCAAAAACACAGACATGG\Cvermillion{A}AATAGCGTA\ldots \\
\ldots GTAACGCGATAAGA\Cskyblue{T}ATTAGCCCAAAAACACAGACATGG\Cvermillion{A}AATAGCGTA\ldots \\
\ldots GTAACGCGATAAGA\Cskyblue{T}ATTAGCCCAAAAACACAGACATGG\Cvermillion{A}AATAGCGTA\ldots \\
\ldots GTAACGCGATAAGA\Cskyblue{T}ATTAGCCCAAAAACACAGACATGG\Cskyblue{T}AATAGCGTA\ldots \\
%\ldots GTAACGCGATAAGA\Cskyblue{T}ATTAGCCCAAAAACACAGACATGG\Cskyblue{T}AATAGCGTA\ldots \\
%\ldots GTAACGCGATAAGA\Cskyblue{T}ATTAGCCCAAAAACACAGACATGG\Cskyblue{T}AATAGCGTA\ldots \\
\ldots GTAACGCGATAAGA\Cskyblue{T}ATTAGCCCAAAAACACAGACATGG\Cskyblue{T}AATAGCGTA\ldots \\[1mm]
}
\gray{\small $\leftarrow 5\times 10^7$ bases for small human chromosome $\rightarrow$}\\[4mm]
\end{center}
%Because of this, you are probably used to storing your data in a compressed format, and decompressing it only when you need to perform analyses or query the data. Doing this can be time-consuming and computationally expensive, however.
%Storing $n = 1\times 10^5$ chromosomes in a compressed VCF requires $\approx$ 50 GB.
However, common haplotypes in a sample are often simply a consequence of some common history. So if we know this history (as we always do in simulations!), storing it directly is often more convenient and efficient than storing the raw haplotypes. This is the key idea behind the \magenta{tree sequence} data structure [1], which encodes a complete genealogy for a sample of chromosomes at each chromosomal location. Tree sequences offer a few benefits to population geneticists compared with traditional sequence-based file formats:
\begin{itemize}
\item They can store large simulated datasets extremely compactly.
\item As they hold rich detail about the history of the sample, many important processes can be observed directly from the tree structure.
\item They can be queried and modified extremely quickly.
\end{itemize}
}
%%%% BLOCK 3
\block[titleoffsety=1cm,bodyoffsety=4cm]{3. Method outline}{
}
\begin{subcolumns}
\subcolumn{.48}
\block{}{
\input{pics/new-method/simulation-method-0.tex}
\input{pics/new-method/simulation-method-2.tex}
\input{pics/new-method/simulation-method-4.tex}
}
\subcolumn{.48}
\block{}{
\input{pics/new-method/simulation-method-1.tex}
\input{pics/new-method/simulation-method-3.tex}
\input{pics/new-method/simulation-method-6.tex}
}
\end{subcolumns}
\column{.5}
%%% BLOCK 2
\block[titleoffsety=-8cm,bodyoffsety=-3.5cm]{2. Local ancestry in tree sequences}{}
% \begin{subcolumns}
% \subcolumn{.45}
\block{}{
\begin{center}
\input{pics/local-ancestry-extraction.tex}
\end{center}
By assigning population labels to the nodes that correspond to ancestors of the sample, tree sequences can store the sample's local ancestry.
The branch joining a sample node to an ancestral node shows its ancestry.
Extracting this information efficiently is challenging due to correlations in genealogical structure between samples, and across chromosomes;
in an upcoming paper, we will describe an algorithm that enables us to do this.
}
\vspace{20cm}
\block[titleoffsety=-2.5cm,bodyoffsety=-2.5cm]{4. Method performance}{
\begin{center}
%\large
\centering
\setlength{\aboverulesep}{5pt}
\setlength{\belowrulesep}{5pt}
%\newcolumntype{R}{>{\raggedleft\arraybackslash}X}%
\begin{tabularx}{.45\textwidth}{p{8cm}p{7cm}p{7cm}p{7cm}X}
\toprule
%& \multicolumn{3}{c}{{\bf Global ancestry}} \\[1mm]
& Missing data & Run time (sec) & File size (Mb) & Selection \\
\midrule
{\bf \texttt{msprime}} & $4.0$\% & 6 & 9 & No \\ \addlinespace
{\bf \texttt{msprime}} +~ARG & $0.0$\% & 53 & 1700 & No\\ \addlinespace
%{\bf + migration records} & $0.0$\% & & & \crossmark \\
%\midrule
%{\bf \texttt{SLiM}} & $4.0$\% & 10 min & 16 Mb & \checkmark \\ \addlinespace
{\bf \texttt{SLiM}}& {$0.0$\%} & $>3600$ & 41 & {Yes} \\ \addlinespace
%\midrule
{\bf\magenta{\texttt{slime}}} & \magenta{$0.0$\%} & \magenta{86} & \magenta{39} & \magenta{Yes} \\ \addlinespace
\bottomrule\\
\end{tabularx}
\end{center}
To illustrate the power of our method, \texttt{slime} [3], we simulated a toy demographic scenario inspired by the history of Neanderthal introgression into the Eurasian population.
We simulated 200 chromosomes of length $50\thinspace$Mb from 100 present-day Eurasian individuals, assuming a 2\% introgression of Neanderthals into Eurasians 2500 generations ago.
For simplicity, we assumed a constant effective population sizes of 5000 individuals, a uniform recombination rate of $1 \times 10^{-8}$ bp per generation, a uniform mutation rate of $1 \times 10^{-8}$ bp per generation and neutral variation.
However, note that all of these methods can deal with more complexity than this.
In particular, both \texttt{slime} and \texttt{SLiM} are capable of simulating under selection.\\
Although still under development, \texttt{slime} appears to outperform existing tree sequence simulation softwares on various metrics by orders of magnitude.
As a principled fusion of \texttt{SLiM} and \texttt{msprime}, \texttt{slime} will allow users to track local ancestry in large simulations under realistically complex demographic scenarios, and with minimal computational overhead.
}
\end{columns}
%%% BLOCK 5
\block[titleoffsety=-3cm,bodyoffsety=-3cm]{5. Acknowledgements, references and further information}{
}
\begin{columns}
\begin{subcolumns}
\subcolumn{25} \block[bodyoffsetx=1.5cm,bodyverticalshift=-5cm]{}{
GT is funded by the Helen Freeman scholarship, the Maurice Belz Fund and the Australian Government's Research Training Scheme. \\[1mm]
[1] Kelleher, J., et al. (2016). Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes. PLOS Computational Biology, 12(5).\\[1mm]
[2] Galloway, J., et al. (2018). Tree-sequence recording in SLiM opens new horizons for forward-time simulation of whole genomes. Molecular Ecology Resources, (November 2018), 552–566.\\[1mm]
[3] Under development: \texttt{https://github.com/gtsambos/slime}
}
\subcolumn{3} \block[bodyoffsetx=-3cm,bodyverticalshift=-7.5cm]{}{
\begin{center}
\includegraphics[scale=.6]{pics/2019-07-SMBE-QRcode.png}\\[-7mm]
More info
\end{center}
}
\subcolumn{3} \block[bodyoffsetx=-5cm,bodyverticalshift=-7cm]{}{
\begin{center}
\includegraphics[scale=.3]{pics/Georgia_headshot.png}\\
Come say hi!
\end{center}
}
\end{subcolumns}
\end{columns}
\end{document}
\endinput
%%
%% End of file `tikzposter-example.tex'.