add mus mus domesticus demographic model #1485

Merged
merged 13 commits into from
Oct 17, 2023

Conversation

peterdfields
Contributor

Hi @igronau, it's taken me a bit of time to get back to this, but you mentioned I should ping you when I submit the demographic models and DFE for mouse. Here is a draft demographic model for Mus musculus domesticus from Fujiwara et al. 2022. I have models for the other two mouse subspecies from the same publication as well; I figured I'd start with this one to make sure the format was okay. I haven't heard anything back about the genetic map. Should I try pinging anyone about that? Anyway, if this looks okay I can add the two other models and then start on the DFE.

Contributor

@igronau igronau left a comment

Sorry it took me so much time to get to this. I reviewed the Domesticus demographic model and left some comments. Only one has to do with actual content. The other two are more about format and documentation. Let me know if you have any questions.


return stdpopsim.DemographicModel(
id="M_musculus_domesticus_Europe",
description="M. musculus domesticus piecewise constant size",
Contributor

We have certain conventions for the demographic model ids. You can find them described here. According to these conventions, I would suggest something like DomesticusEurope_1F22

Contributor Author

Done. I also added similar names for the other demographic models.

Contributor

Looks good.

This model comes from MSMC using four randomly sampled
individuals (DEU01,DEU03,DEU04,DEU06) from a German population.
The model is estimated with 57 time periods.
""",
Contributor

I would also explicitly mention Fig 3 of Fujiwara et al 2022 and say that the population sizes and time changes were supplied by the authors (maybe even mention a specific author?).

Contributor Author

I have added these in.

Contributor

Great

This model comes from MSMC using four randomly sampled
individuals (KOR01,KOR02,KOR03,KOR05) from a Korean population.
The model is estimated with 57 time periods.
""",
Contributor

See comments above about the model id and long description

Contributor Author

Added.

2040,
3844,
90428,
145603,
Contributor

Check that there is no offset here. From examining Fig. 3, it looks like Ne was ~150K between 300 and 400 generations ago. The way the arrays are set up here, I think that Ne=145,603 is associated with the time range 420-570.

Contributor Author

When you say "offset" here, do you mean something being added at the plotting stage?

Contributor

There appears to be an inconsistency between the plot in Fig. 3 of Fujiwara et al. 2022 and your implementation. I don't know if the problem is in Fig. 3 or in your implementation. If I understand the implementation correctly, each Ne in your table is associated with the wrong time range (shifted one range back in time). The 5th Ne (145603) is associated in your implementation with the 5th time interval (420-570 generations ago). However, in Fig. 3 it appears to be associated with a time range somewhere between 300 and 400 generations ago, which fits your 4th time interval. If you remove the first element in the sizes array, this should fix it IMO.
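
The suspected shift can be illustrated with a small lookup sketch. It assumes the convention that sizes[i] applies from times[i] to times[i+1] generations ago. `size_at` is a hypothetical helper (not part of stdpopsim), and every boundary other than 420 and 570, as well as the first and last sizes, is a made-up placeholder.

```python
from bisect import bisect_right


def size_at(t, times, sizes):
    """Return the size in effect t >= 0 generations ago.

    Assumed convention: sizes[i] applies on [times[i], times[i+1]),
    and the last size extends indefinitely into the past.
    """
    i = bisect_right(times, t) - 1
    return sizes[min(i, len(sizes) - 1)]


# Only 420 and 570 come from the review; the other boundaries are placeholders.
times = [0, 60, 150, 280, 420, 570]
# 2040..145603 come from the diff; the first and last values are placeholders.
sizes = [1000, 2040, 3844, 90428, 145603, 100000]

print(size_at(350, times, sizes))      # lookup with the original array
print(size_at(350, times, sizes[1:]))  # same query after dropping sizes[0]
```

With these placeholder boundaries, dropping the first size moves 145603 from the 420-570 interval to the interval that contains 300-400 generations ago, which is the correction suggested in the comment.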

@petrelharp
Contributor

Thanks a lot! Also note from the automatic checks that we're missing a table of parameters:

Skipping model parameters for M_musculus_domesticus_Europe due to missing table

@codecov

codecov bot commented Jun 3, 2023

Codecov Report

All modified lines are covered by tests ✅

Comparison is base (b185585) 99.85% compared to head (51386ba) 99.85%.

❗ Current head 51386ba differs from pull request most recent head 6720d15. Consider uploading reports for the commit 6720d15 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1485      +/-   ##
==========================================
- Coverage   99.85%   99.85%   -0.01%     
==========================================
  Files         125      122       -3     
  Lines        4217     4210       -7     
  Branches      588      591       +3     
==========================================
- Hits         4211     4204       -7     
  Misses          3        3              
  Partials        3        3              
Files Coverage Δ
stdpopsim/catalog/MusMus/__init__.py 100.00% <100.00%> (ø)
stdpopsim/catalog/MusMus/demographic_models.py 100.00% <100.00%> (ø)

... and 5 files with indirect coverage changes


@peterdfields
Contributor Author

Thanks a lot! Also note from the automatic checks that we're missing a table of parameters:

Skipping model parameters for M_musculus_domesticus_Europe due to missing table

I have added these parameter tables, and the automatic checks seem to be passing now.


return stdpopsim.DemographicModel(
id="M_musculus_domesticus_Europe",
description="M. musculus domesticus piecewise constant size",
Contributor

Looks good.

Time (yrs.),291,Beginning of 54th time interval
Time (yrs.),180,Beginning of 55th time interval
Time (yrs.),83,Beginning of 56th time interval
Time (yrs.),0,Beginning of 57th time interval
Contributor

Parameter table looks okay. I think that the standard is to have one more entry in the list of Nes than in the list of times. This extra entry should correspond to the Ne before the first time interval (the ancestral Ne). The other Nes should map to time intervals. This also relates to my comment on the "shift" in the demographic model.
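
One way to read this convention is as a length check on the table. The sketch below is illustrative only: `check_table` is a hypothetical helper, and the toy values are not taken from the CSV files in this PR.

```python
def check_table(times, sizes):
    """Pair each interval start time with its Ne under the assumed convention.

    sizes[0] is taken to be the ancestral Ne (before the first interval);
    sizes[1:] map one-to-one onto the interval start times.
    """
    if len(sizes) != len(times) + 1:
        raise ValueError(
            f"expected {len(times) + 1} Ne entries for {len(times)} times, "
            f"got {len(sizes)}"
        )
    ancestral, per_interval = sizes[0], sizes[1:]
    return ancestral, list(zip(times, per_interval))


# Toy values for illustration, ordered from oldest to most recent.
ancestral, intervals = check_table([300, 200, 100, 0], [50, 40, 30, 20, 10])
print(ancestral, intervals)
```

A table with too few Ne entries raises a ValueError instead of silently pairing sizes with the wrong intervals.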

Contributor

@igronau igronau left a comment

Looks good. See my comment about the number of Ne entries in the Domesticus model

@peterdfields
Contributor Author

Looks good. See my comment about the number of Ne entries in the Domesticus model

@igronau I removed the first time interval from the demographic history file and tables, please let me know if any additional changes are needed.

@igronau
Contributor

igronau commented Jul 6, 2023

@peterdfields, sorry for the delayed response. I just came back from a long vacation. I think that your changes didn't fix the problem. You just removed the last entry in the times and sizes arrays, but this shouldn't really change how the more recent time intervals are implemented in the demographic model. What I think needs to be done is to remove the first entry in the sizes array only (keeping the times array as is). However, since this is a major source of confusion, maybe we should set a time to discuss this (via Zoom or some other interactive medium)? It might be the simplest way to make sure that we get it right. What do you say?

@peterdfields
Contributor Author

@igronau No worries. I'm happy to discuss over zoom. What sort of windows of time work best for you?

@igronau
Contributor

igronau commented Jul 6, 2023

I'm now in UCLA (Pacific time) and quite flexible in my schedule until next Wednesday. What time works best for you?

@peterdfields
Contributor Author

I messaged you over slack. Let me know if you would prefer to continue here.

@petrelharp
Contributor

ping @igronau @peterdfields - what's going on with this?

@peterdfields
Contributor Author

@petrelharp @igronau I made some small changes to the configuration and table files, but it's not obvious to me how these changes caused the (runner?) errors. I just saw the update on Slack about getting the demography for mouse figured out, and I remembered I needed to implement @igronau's suggestions.

@petrelharp
Contributor

Those errors are not your fault - looks like random CI failure. I'd say (a) implement the changes you need to make (and push); (b) run tests locally to identify errors; then (c) we'll get CI sorted if there's still an issue.

@peterdfields
Contributor Author

@petrelharp the updates pushed yesterday seem to be working fine on the functional side. @igronau do the implemented changes fix the problems you identified before?

@igronau
Contributor

igronau commented Oct 11, 2023

I verified the model for Domesticus and it seems fine. I created a figure below to validate that the pop size between times[i] and times[i+1] is indeed sizes[i]. The tables appear to require some tweaking. For example, according to the DomesticusEurope_1F22.csv, pop sizes are defined for 55 intervals and not 56, and sizes[1] (2040) is associated with the 55th interval and not the 56th interval. This can be fixed by adding another 133912 value to the table. There are 3 intervals associated with this Ne in the model. This is in addition to the ancestral Ne (also 133912), which is associated with the time range before year 1915544.

If my associations are correct,
[Figure: plot checking that the population size between times[i] and times[i+1] is sizes[i]]

@igronau
Contributor

igronau commented Oct 11, 2023

I now also validated the demographic model for Mus Mus (red line in Fig. 3). In particular, I checked the first peak in Ne, which according to the plot occurred around 900-750 generations ago. The first local max in sizes is sizes[11] (68111), and times[11]=740 and times[12]=891, which is consistent with the figure. Here, too, the parameter table in MusculusKorea_1F22.csv requires tweaking. Again, sizes[1] (179912) is associated with the 55th time interval and not the 56th interval. This can be fixed by adding another 152757 value to the table. There are 3 intervals associated with this Ne in the model. This is in addition to the ancestral Ne (also 152757), which is associated with the time range before year 807711.
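
The peak check described here can be sketched as a scan for the first local maximum. `first_local_max` is a hypothetical helper, and the short arrays are toy examples rather than the model's actual sizes and times.

```python
def first_local_max(sizes):
    """Return the index of the first entry strictly larger than both
    neighbours, or None if there is no such entry."""
    for i in range(1, len(sizes) - 1):
        if sizes[i - 1] < sizes[i] > sizes[i + 1]:
            return i
    return None


# Toy arrays for illustration only.
toy_sizes = [10, 20, 35, 30, 40, 25]
toy_times = [0, 100, 200, 300, 400, 500]

i = first_local_max(toy_sizes)
# Under the convention that sizes[i] applies between times[i] and times[i+1],
# the peak spans toy_times[i] to toy_times[i + 1] generations ago.
print(i, toy_times[i], toy_times[i + 1])
```

The resulting time range can then be compared by eye against the corresponding peak in the published figure.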

@igronau
Contributor

igronau commented Oct 11, 2023

I now also validated the demographic model for Mus Cas (green line in Fig. 3). I checked the first four intervals and they seem consistent with the figure (although the Ne of 938111 between 1886 and 3011 generations ago is not clear in the figure because it is out of bounds). Here, too, the parameter table in CastaneusIndia_1F22.csv requires tweaking. Again, sizes[1] (64853) is associated with the 55th time interval and not the 56th interval. This can be fixed by adding another 344802 value to the table.

@peterdfields
Contributor Author

@igronau I've updated the table files with the additional values/intervals you described (correctly, I hope!). Please let me know if I should make any other modifications.

@igronau
Contributor

igronau commented Oct 16, 2023

Looks good. I think that this PR can finally be merged, and then I can formally QC it

@igronau
Contributor

igronau commented Oct 16, 2023

@petrelharp - can you merge this?

@peterdfields
Contributor Author

Looks good. I think that this PR can finally be merged, and then I can formally QC it

yay! On to the DFE...I think this one will probably be the most tricky.

@petrelharp
Contributor

I've run the tests locally, and they've passed. We'll deal with the CI failures elsewhere (e.g., maybe bumping the cache number, like I've just done in #1526, will do it). So, I'll merge this. Can one of you open the QC issue, please?

@petrelharp petrelharp merged commit 496a879 into popsim-consortium:main Oct 17, 2023
3 of 9 checks passed
@peterdfields peterdfields deleted the mus_mus_dem_history branch October 18, 2023 15:39