Are SLiM Substitutions saved to the tree sequence? #245

elissasoroj · 2022-02-22T14:17:39Z

Hi!
A few questions about substitution objects:

Are Substitution objects from SLiM saved in the tree sequence? The SLiM manual says:

"Note that fixed mutations (i.e., Substitution objects in SLiM) will be added on to the end of this derived state list, even in cases such as nucleotide-based models where mutations do not normally stack, because of the way substitutions are reconciled internally between SLiM and the tree sequence."

I've gone looking for these fixed mutations at the end of the derived state list but I think I'm getting confused about all the terminology. Each mutation in the tree sequence has a derived state, but I'm unsure how this tells me whether it fixed or not - and I'm having a little trouble wrapping my head around how there can be more than one derived state for one mutation. I keep thinking the derived state list should be a characteristic of a site rather than a mutation.

Generally, I am trying to access metadata on substitutions from a SLiM simulation, e.g. mutation type, selection coeff, etc. When I know where to look (e.g. the mutation type and time it arose) I can find the fixed mutation, but I don't know how to tell that it is a substitution rather than a regular mutation. E.g.:

#this fixed mutation:
Mutation(id=28821, site=28023, node=12028, derived_state='779132671', parent=-1, metadata={'mutation_list': 
[{'mutation_type': 4, 'selection_coeff': 0.5, 'subpopulation': 0, 'slim_time': 13000, 'nucleotide': -1}]}, time=4001.0)

#looks like any other:
Mutation(id=3, site=3, node=10120, derived_state='811440967', parent=-1, metadata={'mutation_list': 
[{'mutation_type': 1, 'selection_coeff': 0.0, 'subpopulation': 1, 'slim_time': 17001, 'nucleotide': -1}]}, time=0.0)

If I can't identify substitutions by looking at the tree sequence, is there a way I can tag them so that I can find them easily once they are output from SLiM? I'm thinking I could just add a list of their ids to the user metadata, but I'm curious whether there's a more elegant solution.

petrelharp · 2022-02-24T06:18:05Z

petrelharp
Feb 24, 2022
Maintainer

Hello, @elissasoroj! Let's see - there is nothing stored in the tree sequence to indicate that a given mutation is a substitution, other than the frequency of that mutation.

Reminder about how the mutation model in SLiM works - by default, mutations "stack", i.e., we can think of them as sticky notes stuck onto the DNA sequence, and thus subsequent mutations don't erase previous ones at the same spot. That's what led to the confusing data model in SLiM tree sequences: suppose that some chromosome got mutation A; then that chromosome's descendants would inherit the sticky note labeled A. If someone with A then also got a B mutation, all their descendants would get both A and B; so in the tree sequence world, the result of that mutation was the combination (or, list, or stack of sticky notes) A and B.

So, when we talk about mutations we have to distinguish whether we mean a "SLiM mutation" or a "tskit mutation", since each tskit mutation carries around with it a list of SLiM mutations. (In the example above, there are two mutations recorded in the tree sequence: one with derived state A, and the other with derived state A,B.)
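To make the "tskit mutation carries a list of SLiM mutations" idea concrete, here is a minimal sketch in plain Python. It assumes that the derived state is a comma-separated string of SLiM mutation IDs and that the entries of `mutation_list` in the metadata are in the same order as those IDs. The function name `slim_mutations` and the `slim_id` key are made up for illustration; they are not part of tskit or pyslim.

```python
def slim_mutations(derived_state, metadata):
    """Unpack one tskit mutation into its stack of SLiM mutations.

    Assumes `derived_state` is a comma-separated string of SLiM mutation IDs
    and `metadata["mutation_list"]` has one entry per ID, in the same order.
    """
    ids = [int(x) for x in derived_state.split(",")] if derived_state else []
    entries = metadata["mutation_list"]
    if len(ids) != len(entries):
        raise ValueError("derived state and mutation_list have different lengths")
    # Attach each SLiM mutation ID to its own metadata entry.
    return [{"slim_id": slim_id, **entry} for slim_id, entry in zip(ids, entries)]


# The fixed mutation from the question: a stack containing one SLiM mutation.
stack = slim_mutations(
    "779132671",
    {"mutation_list": [{"mutation_type": 4, "selection_coeff": 0.5,
                        "subpopulation": 0, "slim_time": 13000,
                        "nucleotide": -1}]},
)
for m in stack:
    print(m["slim_id"], m["mutation_type"], m["selection_coeff"])
```

With a real tree sequence, the two arguments would come from `mut.derived_state` and `mut.metadata` for each `mut` in `ts.mutations()`. A tskit mutation with derived state A,B would unpack into two entries, one per SLiM mutation.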

Now, a substitution is just a mutation that everyone at the end has inherited. To figure out which (tskit) mutation is inherited by everyone, assuming that you haven't Remembered any individuals along the way, you can compute the allele frequencies, e.g., with the code here: tskit-dev/tskit#504. All the SLiM mutations that are listed in these fixed tskit mutations should be substitutions back in SLiM.
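As a rough sketch of that idea (this is not the code from the linked issue), the following counts, site by site, how many samples carry each SLiM mutation ID and keeps the IDs carried by all of them. It works on plain `(alleles, genotypes)` pairs so that it runs without any libraries. The function name `fixed_slim_mutation_ids` is hypothetical, and it assumes the same comma-separated derived-state format as above, with the empty string meaning "no SLiM mutations here".

```python
from collections import Counter


def fixed_slim_mutation_ids(variants, num_samples):
    """Return the set of SLiM mutation IDs carried by every sample.

    `variants` yields (alleles, genotypes) pairs, one per site: `alleles` is a
    sequence of derived-state strings and `genotypes` gives, for each sample,
    the index of the allele it carries.
    """
    fixed = set()
    for alleles, genotypes in variants:
        # Number of samples carrying each allele index at this site.
        allele_counts = Counter(int(g) for g in genotypes)
        id_counts = Counter()
        for index, allele in enumerate(alleles):
            if not allele:  # '' (no SLiM mutations) or None (missing data)
                continue
            # A stacked allele like "10,11" contributes to both IDs.
            for slim_id in set(allele.split(",")):
                id_counts[int(slim_id)] += allele_counts[index]
        fixed.update(i for i, count in id_counts.items() if count == num_samples)
    return fixed


# Toy site: two samples carry A (ID 10), two carry the stack A,B (IDs 10, 11).
toy = [(("", "10", "10,11"), [1, 1, 2, 2])]
print(fixed_slim_mutation_ids(toy, num_samples=4))  # {10}
```

With tskit, the pairs could be built as `((v.alleles, v.genotypes) for v in ts.variants())` with `num_samples=ts.num_samples`. Counting SLiM IDs rather than whole derived states matters for stacking: in the toy site no single allele is at frequency 1, yet mutation A is carried by everyone. As noted above, this only makes sense if the samples are all from the final generation.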

How's that? I'm happy to look at some code to do this if you want to post it here.
