Are SLiM Substitutions saved to the tree sequence? #245
-
Hi! Are Substitution objects from SLiM saved in the tree sequence? The SLiM manual says:
I've gone looking for these fixed mutations at the end of the derived state list, but I think I'm getting confused by the terminology. Each mutation in the tree sequence has a derived state, but I'm unsure how this tells me whether it has fixed, and I'm having a little trouble wrapping my head around how there can be more than one derived state for one mutation. I keep thinking the derived state list should be a characteristic of a site rather than a mutation. Generally, I am trying to access metadata on substitutions from a SLiM simulation, e.g. mutation type, selection coefficient, etc. When I know where to look (e.g. the mutation type and the time it arose) I can find the fixed mutation, but I don't know how to tell that it is a substitution rather than a regular mutation. E.g.:
If I can't identify substitutions by looking at the tree sequence, is there a way I can tag them so that when they are output from SLiM I can find them easily? I'm thinking I could just add a list of their IDs to the user metadata, but I'm curious if there's a more elegant solution.
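In a SLiM tree sequence, each tskit mutation's derived state is a comma-separated list of SLiM mutation IDs, one per stacked SLiM mutation. The sketch below splits such a string and pairs each ID with a per-SLiM-mutation metadata record. It assumes the mutation metadata holds a `mutation_list` with one record per ID, in the same order; that key, the record fields, and the helper names `slim_ids` and `slim_mutations` are assumptions for illustration, and the records are made-up toy data, so check the real layout against `ts.tables.mutations.metadata_schema`.

```python
from typing import Dict, List


def slim_ids(derived_state: str) -> List[int]:
    """Split a SLiM-style derived state such as "12,45" into SLiM mutation IDs."""
    if derived_state == "":
        # An empty string lists no SLiM mutations at all.
        return []
    return [int(x) for x in derived_state.split(",")]


def slim_mutations(derived_state: str, metadata: dict) -> Dict[int, dict]:
    """Pair each SLiM mutation ID with its record in metadata["mutation_list"].

    Assumption: records appear in the same order as the IDs in the derived state.
    """
    ids = slim_ids(derived_state)
    records = metadata["mutation_list"]  # assumed key name
    if len(ids) != len(records):
        raise ValueError("derived state and mutation_list have different lengths")
    return dict(zip(ids, records))


# Toy records (hypothetical field names): SLiM mutation 0 ("A"), then 1 ("B") on top.
a_record = {"mutation_type": 1, "selection_coeff": 0.0, "slim_time": 10}
b_record = {"mutation_type": 2, "selection_coeff": 0.05, "slim_time": 25}

first = slim_mutations("0", {"mutation_list": [a_record]})
second = slim_mutations("0,1", {"mutation_list": [a_record, b_record]})
print(second[1]["selection_coeff"])  # 0.05
```

With a real tree sequence you would call `slim_mutations(mut.derived_state, mut.metadata)` for each `mut` in `ts.mutations()`. Note that the second tskit mutation carries two SLiM mutations, which is why one tskit mutation can appear to have "more than one derived state".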
-
Hello, @elissasoroj! Let's see: there is nothing stored in the tree sequence to indicate that a given mutation is a substitution, other than the frequency of that mutation.

A reminder about how the mutation model in SLiM works: by default, mutations "stack", i.e., we can think of them as sticky notes stuck onto the DNA sequence, so subsequent mutations don't erase previous ones at the same spot. That's what led to the confusing data model in SLiM tree sequences. Suppose that some chromosome got mutation A; then that chromosome's descendants would inherit the sticky note labeled A. If someone with A then also got a B mutation, all their descendants would get both A and B; so in the tree sequence world, the result of that mutation was the combination (or list, or stack of sticky notes) A and B. So, when we talk about mutations we have to distinguish whether we mean a "SLiM mutation" or a "tskit mutation", since each tskit mutation carries around with it a list of SLiM mutations. (In the example above, there are two mutations recorded in the tree sequence: one with derived state A, and the other with derived state A,B.)

Now, a substitution is just a mutation that everyone at the end has inherited. To figure out which (tskit) mutations are inherited by everyone, assuming that you haven't Remembered any individuals along the way, you can compute the allele frequencies, e.g., with the code here: tskit-dev/tskit#504. All the SLiM mutations that are listed in these fixed tskit mutations should be substitutions back in SLiM.

How's that? I'm happy to look at some code to do this if you want to post it here.
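The "inherited by everyone" check can be sketched on a toy tree: a tskit mutation is fixed if the number of samples below its node equals the total number of samples. This is a minimal illustration, not the code from tskit-dev/tskit#504; the tree, node numbers, and mutation dicts are made up, and `samples_below` and `fixed_mutations` are hypothetical helpers. On a real tree sequence the same count should be available as `tree.num_samples(mut.node)` for `tree = ts.at(site.position)`, compared with `ts.num_samples`. As noted above, this assumes no Remembered individuals, since their nodes would also count as samples.

```python
from typing import Dict, List, Set


def samples_below(node: int, children: Dict[int, List[int]], samples: Set[int]) -> int:
    """Count sample nodes in the subtree rooted at `node` (iterative depth-first search)."""
    count = 0
    stack = [node]
    while stack:
        u = stack.pop()
        if u in samples:
            count += 1
        stack.extend(children.get(u, []))
    return count


def fixed_mutations(mutations: List[dict], children: Dict[int, List[int]], samples: Set[int]) -> List[dict]:
    """Return the tskit-style mutations whose node is ancestral to every sample."""
    n = len(samples)
    return [m for m in mutations if samples_below(m["node"], children, samples) == n]


# Toy tree: root 6 has children 4 and 5; node 4 -> (0, 1); node 5 -> (2, 3).
children = {6: [4, 5], 4: [0, 1], 5: [2, 3]}
samples = {0, 1, 2, 3}
mutations = [
    {"node": 6, "derived_state": "0"},    # SLiM mutation A, inherited by all samples
    {"node": 4, "derived_state": "0,1"},  # B stacked on A, only in samples 0 and 1
]

fixed = fixed_mutations(mutations, children, samples)
# SLiM mutation IDs listed in fixed tskit mutations: candidate substitutions.
substituted = {int(x) for m in fixed for x in m["derived_state"].split(",") if x}
print(substituted)  # {0}
```

SLiM mutation 0 also appears in the unfixed tskit mutation with derived state "0,1"; that is expected, since it is the stacked copy of A carried along with B.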