Yes, I have been working a lot on tweaks for this. What I ended up doing is using a while loop to split up only the chromosomes whose sums are > 2^31-1, so that I only need to do it for those I know will fail.
I then split that chromosome/bin set in two and check whether either subset is still > 2^31-1, repeating if so (I cap it at 20 split rounds to avoid an infinite loop, e.g. if some crazy number is used).
This works for me, but the optimal solution would be to catch this internally and allow numeric types in the coverage subsetting function directly, if that is possible?
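For anyone wanting to try the same workaround, here is a minimal base R sketch of the split-and-recheck idea. It is not my actual code: `split_until_safe` and `max_rounds` are made-up names, a plain integer vector stands in for one chromosome's bins, and a bounded for loop replaces the while loop.

```r
# 2^31 - 1, the largest value an R integer can hold.
int_max <- .Machine$integer.max

# Hypothetical helper: halve an integer vector until every piece sums safely.
split_until_safe <- function(x, max_rounds = 20L) {
  pieces <- list(x)
  for (round in seq_len(max_rounds)) {
    # Sum as double so the check itself cannot overflow.
    too_big <- vapply(pieces, function(p) sum(as.numeric(p)) > int_max, logical(1))
    if (!any(too_big)) break
    # Split only the failing pieces in two; keep the others untouched.
    pieces <- unlist(lapply(seq_along(pieces), function(i) {
      p <- pieces[[i]]
      if (!too_big[i] || length(p) < 2L) return(list(p))
      mid <- length(p) %/% 2L
      list(p[seq_len(mid)], p[(mid + 1L):length(p)])
    }), recursive = FALSE)
  }
  pieces
}

# Four bins of 1.5e9 each: the total (6e9) does not fit in an integer.
pieces <- split_until_safe(rep(1500000000L, 4))
piece_sums <- vapply(pieces, function(p) sum(as.numeric(p)), numeric(1))
```

Pieces that already sum below the limit are never split, so the extra work is confined to the chromosomes that would otherwise fail.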
Btw, the same problem occurs when calculating the coverage() of a GRanges object using weight = "score": if the score column is integer, it fails whenever the summed score at a position is > 2^31-1. This happens when the same coordinate repeats over multiple ranges whose scores are each < 2^31-1 but whose sum is bigger than 2^31-1. (If they are merged together before calling coverage(), GRanges handles this internally and converts "score" to numeric.) I use a method similar to what I described above: I first run coverage(), and if any sum(runValue(cov)) is NA, I rerun that subset of the GRanges chromosomes with score as numeric.
If I set the entire "score" column to numeric, my output object is 1 GB; if I set it to numeric only for the failing chromosomes, my object is 580 MB, and it is faster too. So fixing that one is a no-brainer, at least for me on big data for the human genome.
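The "integer first, numeric only where it fails" pattern can be illustrated in base R without any Bioconductor calls. This is only a sketch of the idea: the named list below is a made-up stand-in for per-chromosome scores, plain sum() stands in for coverage(), and `sum_with_fallback` is a hypothetical name.

```r
# Hypothetical stand-in: integer scores stacked on the same coordinate, per chromosome.
scores_by_chr <- list(
  chr1 = c(10L, 20L, 30L),
  chr2 = c(2000000000L, 2000000000L)  # each < 2^31-1, but the sum is not
)

sum_with_fallback <- function(scores_by_chr) {
  # First pass: integer sums; an overflow gives NA (with a warning).
  totals <- lapply(scores_by_chr, function(s) suppressWarnings(sum(s)))
  failed <- names(totals)[vapply(totals, is.na, logical(1))]
  # Second pass: redo only the failing chromosomes as double.
  totals[failed] <- lapply(scores_by_chr[failed], function(s) sum(as.numeric(s)))
  totals
}

totals <- sum_with_fallback(scores_by_chr)
```

Here chr1 stays integer (4 bytes per value) and only chr2 is promoted to double (8 bytes per value), which is the same reason the mixed object ends up smaller than converting everything to numeric.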
This is a spin-off from issue #43:
sessionInfo():