Adding genetic map for honeybee #1399
Codecov Report
Base: 99.94% // Head: 99.81% // Decreases project coverage by
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1399      +/-   ##
==========================================
- Coverage   99.94%   99.81%   -0.14%
==========================================
  Files         109      110       +1
  Lines        3789     3812      +23
  Branches      515      522       +7
==========================================
+ Hits         3787     3805      +18
- Misses          1        6       +5
  Partials        1        1
|
Hm, okay - now we need the python code? can you get that going? |
@janaobsteter - do you think you can finalize this within the next couple of days? If so, we can update fig 1 in the "adding species" manuscript to show the genetic map for ApiMel |
@igronau , hi! Probably yes - I just have another question: should the genetic map provide genetic positions for every base in the reference genome? Thanks! |
According to what I see in the examples in the docs, you don't have to specify a rate for every position. |
LGTM. @petrelharp - anything else we need to check here before this is merged? |
@igronau I just have to finish the description! Will do that today! |
Ok, I've updated the description! However, these maps are a bit redundant: they were made from data on crossover events, and the recombination rate was assumed constant between crossovers. Hence, the maps really only need to include the sites where the recombination rate changes. Currently, they are expanded to fit a certain VCF, but I can strip them down (if you agree). |
I'm not sure because I haven't added genetic maps to stdpopsim. Maybe we should consult with @petrelharp. |
I had no idea honeybee recombination rates were so high, that's neat. @janaobsteter By expanded to fit a certain VCF, do you mean that there are successive positions in the files that have the same recombination rate only so each site in that VCF was given a row? If that's the case, I would think that it could be stripped back down to just the sites where the rate changes (to fit the msprime specification from the link Ilan gave, "The value in the rate column in a given line gives the constant rate between the physical position in that line (inclusive) and the physical position on the next line (exclusive).") And it might be a good idea for the sake of file size. But @petrelharp (or @jeromekelleher ?) would be able to say more definitely. (Also, I think it's expecting a header line for each rate file?) |
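A minimal sketch of the stripping-down step described above, assuming each row is a `(position, rate)` pair sorted by position and that a row's rate holds from its position up to the next row's position. The function name `collapse_constant_rates` and the example rows are hypothetical, not taken from the ApiMel files. The last row is always kept so the end of the map is not lost.

```python
def collapse_constant_rates(rows):
    """Keep only the rows where the rate changes, plus the final row.

    rows: iterable of (position_bp, rate) pairs, sorted by position.
    Rates are compared exactly, which is fine when repeated rows were
    written out from the same value.
    """
    rows = list(rows)
    collapsed = []
    for index, (position, rate) in enumerate(rows):
        is_last = index == len(rows) - 1
        # Skip a row that repeats the rate of the previously kept row,
        # unless it is the final row (which marks the end of the map).
        if collapsed and collapsed[-1][1] == rate and not is_last:
            continue
        collapsed.append((position, rate))
    return collapsed


# Hypothetical rows: one per VCF site, with runs of identical rates.
expanded = [(100, 1.5), (200, 1.5), (300, 1.5), (400, 2.0), (500, 2.0), (600, 0.0)]
reduced = collapse_constant_rates(expanded)
```

Because the rate is piecewise constant, dropping the repeated rows should leave the map itself unchanged while shrinking the file.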
@lauterbur , yes, that's exactly what has been done (several consecutive lines have the same rate) - that's why I want to strip it down. Will do this! I will also add the column names. |
@igronau , everything has been added! I've also added the new reduced genetic maps (although I need to remove them and send them separately - atm they are in the ApiMel folder). |
@nspope and @petrelharp I'm not sure what failing codecov/patch and codecov/project means. Do you think this is ready to be merged? |
I think you need to import the contents of the new |
And I think @andrewkern has to upload the maps to AWS for everything to work correctly? |
@nspope , I've edited the |
Yeah @janaobsteter, I think it's easiest to remove the tarball from this PR and send it to @andrewkern. Once the maps are on AWS, I can rerun the tests to make sure it's loading OK. |
@nspope, I've removed the tarball and sent it to Andrew. |
Thanks @janaobsteter! |
@janaobsteter @nspope -- i've gone ahead and uploaded the ApiMel map to AWS. It can be found at the following URL: https://stdpopsim.s3.us-west-2.amazonaws.com/genetic_maps/ApiMel/Liu2015_litfover_maps.tar.gz Once the URL and the checksum have been updated in the code, tests should pass. |
Thanks @andrewkern! @janaobsteter could you please update the sha256sum and url? |
@nspope , sure - just to check - how do I obtain the sha? |
you would use the sha256sum command:
% sha256sum ~/Desktop/Liu2015_litfover_maps.tar.gz
551a1819fb007573b6fa00a088493964dacd089a5d5863cf03b7b73eac0bd405 /Users/adk/Desktop/Liu2015_litfover_maps.tar.gz |
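Where the `sha256sum` command is not available, the same digest can be computed with Python's standard library. This is only a sketch; the helper name `file_sha256` is hypothetical, and the path you pass in is whatever local copy of the tarball you have.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can then be compared with the checksum recorded in the code for the uploaded tarball.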
@andrewkern , ok! This is the sha that I've put in the file - however, the tests are still failing ... |
Hi @janaobsteter, it looks like the directory structure in that tarball doesn't match
e.g. this'll unpack into Probably would be good to clean it up and just have a single directory 'Liu2015_litfover_maps' without the hidden files, then resend to Andy; rather than add a really long path to |
Whoops @janaobsteter looks like this file path screw-up happened on Andy's end -- so nevermind, there's nothing you need to do; @andrewkern or I will fix it and re-upload. Sorry for the confusion. This stuff is a bit fiddly, unfortunately. |
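A quick way to catch this kind of layout problem before uploading is to list the tarball members and check them against the expected directory. The sketch below assumes the archive should contain a single top-level directory and no hidden files; the function name `tarball_layout_problems` is hypothetical.

```python
import tarfile


def tarball_layout_problems(tar_path, expected_dir):
    """Return a list of messages about members that break the expected layout."""
    problems = []
    with tarfile.open(tar_path, "r:gz") as archive:
        for name in archive.getnames():
            # Ignore empty components and a leading "./".
            parts = [part for part in name.split("/") if part not in ("", ".")]
            if not parts:
                continue
            if parts[0] != expected_dir:
                problems.append(f"outside {expected_dir}/: {name}")
            if any(part.startswith(".") for part in parts):
                problems.append(f"hidden file or directory: {name}")
    return problems
```

An empty list means every member sits under the expected directory and none of them is hidden.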
Darn, looks like there are some formatting issues with these maps. I think it's because (1) there's no header line (so the first line is being silently skipped); (2) there's no last line (with a 0 rate). The format should match the example in the docs for msprime.RateMap.read_hapmap |
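Both problems can be checked on the raw text before handing a file to msprime. The sketch below assumes a whitespace-separated, four-column layout with the rate in the third column; that layout, the function names, and the sample lines are assumptions for illustration, not the contents of the ApiMel files.

```python
def _rate_or_none(fields):
    """Return the third column as a float, or None if it is missing or non-numeric."""
    try:
        return float(fields[2])
    except (IndexError, ValueError):
        return None


def hapmap_format_problems(lines):
    """Check for a header line and a final zero-rate line."""
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return ["file is empty"]
    problems = []
    # A header has column names, so its rate column should not parse as a number.
    if _rate_or_none(rows[0]) is not None:
        problems.append("first line looks like data, not a header")
    last_rate = _rate_or_none(rows[-1])
    if last_rate is None:
        problems.append("last line has no numeric rate column")
    elif last_rate != 0.0:
        problems.append("last line has a nonzero rate")
    return problems


# Hypothetical file contents with neither a header nor a zero-rate last line.
sample = ["chr1 100 1.5 0.0", "chr1 200 2.0 0.00015"]
issues = hapmap_format_problems(sample)
```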
Hmm, I think there are issues with the genetic map positions as well -- for example take these two lines from
The genetic map position (4th column) for the second row is 44.050139, but shouldn't it be,
For instance, the example from the msprime.RateMap.read_hapmap docstring includes the lines,
Using the equation above, the map position for the second row should be:
which matches what was given. So, the last line should always have rate 0 (which isn't the case for these ApiMel maps). (I think I'm doing this right?? Hapmap format always confuses me) |
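The consistency check described in this comment can be sketched as follows, assuming the usual convention that the rate on a line is in cM/Mb and applies from that line's position up to the next line's position, so each map position is the previous one plus rate times distance. The function name and the numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def cumulative_map_cm(positions_bp, rates_cm_per_mb, start_cm=0.0):
    """Return the cumulative map position (cM) expected on each row."""
    if len(positions_bp) != len(rates_cm_per_mb):
        raise ValueError("positions and rates must have the same length")
    if not positions_bp:
        return []
    map_cm = [start_cm]
    for i in range(1, len(positions_bp)):
        # Distance covered at the previous row's rate, in megabases.
        span_mb = (positions_bp[i] - positions_bp[i - 1]) / 1e6
        map_cm.append(map_cm[-1] + rates_cm_per_mb[i - 1] * span_mb)
    return map_cm


# Hypothetical three-row map: 2 cM/Mb over 1 Mb, then 0.5 cM/Mb over 2 Mb.
expected = cumulative_map_cm([0, 1_000_000, 3_000_000], [2.0, 0.5, 0.0])
```

Comparing these values with the fourth column of a file shows whether its rates and map positions agree. Note that the rate on the last row never contributes to any map position, which fits the convention that it is written as 0.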
Do folks (@nspope @petrelharp ?) think it's worth waiting for this to be resolved before submitting the adding species ms to bioRxiv? If it will be ready in the next few days, especially if it won't make it into a release until the selection paper release otherwise, we can wait. If it will take longer we'd like to go ahead and put it up on bioRxiv and do the version release. |
To keep others in the loop-- @janaobsteter sent me the crossovers and the script used to compute rates from these. I'll try to figure out what's going wrong, but I'm not sure how long that'll take. @igronau and @lauterbur, it looks like you wanted to add this map to the "adding species" preprint? Should we just go ahead with the preprint and put this map in a later release? |
Adding it to the adding species preprint was the hope, but we're pretty
much ready to put it up so don't want to wait much longer. And you were
going to do the version release alongside the preprint, right? It would
just be a shame to see this map miss the release by a day or two since it's
so close!
|
@lauterbur My suggestion is to go ahead with the preprint. I don't know how long it'll take to work out this map (it'll be next week before I can really take a look). |
yes I agree @nspope. @lauterbur we need the preprint to go out before the release so that the documentation might point to the preprint DOI |
That's the plan, and it sounds like the timing doesn't work out to include the honeybee map in the preprint. |
Hi, folks - it'd be great to get this map in here! But, looking at what's going on - our policy has been that we want to include "published" things, so that (a) it's well documented what's going into stdpopsim, and (b) the burden isn't on us of figuring out what to include and what's useful. It sounds to me like you're basically inferring the map, @janaobsteter? And, for us to make sure that the map looks reasonable, we'd really want to have a short writeup of what you did, with some diagnostic plots? So, here's my suggestion: do a short (2pg?) writeup of what you're doing, with diagnostic plots, and post it to bioRxiv, and then we can cite that (and maybe you can publish it somewhere, saying "and also this is in stdpopsim") - what do you think? |
I've added the tar.gz version for now, since that is what the instructions say to do.
The genetic map was created from a map on microsatellites as described here (this is David Wragg's bitbucket): https://bitbucket.org/scriptBee/hapmap-pilot/src/main/GeneticMaps/