Incorrect mean values with na.rm = T #23

kjedrzejewski · 2017-03-27T00:01:26Z

> tmp = c(1,1,1,1,NA,NA,NA,NA,1,1)
> roll_mean(tmp, 4, c(1,3,3,1), na.rm=T)
[1] 1.000000 1.166667 1.000000 0.500000      NaN 0.500000 1.000000

1.166667 and 0.5 are unexpected values. Every non-NA input is 1, so the result should contain only 1s, plus one NaN for the window that is entirely NA.

This seems to work correctly when a default weighting is used:

> roll_mean(tmp, 4, na.rm=T)
[1]   1   1   1   1 NaN   1   1


kevinushey · 2017-05-03T18:20:04Z

Thanks for the bug report! zoo::rollapply() gets it right here, so I should see what's going on:

> tmp = c(1, 1, 1, 1, NA, NA, NA, NA, 1, 1)
> zoo::rollapply(tmp, c(1, 3, 3, 1), mean)
[1]  1  1  1  1 NA NA NA NA  1

kevinushey · 2018-06-04T15:01:28Z

Finally looking into this a little deeper now. The way rollapply() constructs the windows for vectors of weights is a bit different than what I'm doing in RcppRoll:

Those are the offsets for each window; that is, rollapply() uses a variable window size (e.g. first run is width 1, second is 3, third is 3, fourth is 1, repeat...). In other words, comparing directly to rollapply() as I did before isn't quite accurate.

In the RcppRoll case, in this call, we're always using a window size of 4, with weights c(1, 3, 3, 1). Normalized so that they sum to the window size, these become c(0.5, 1.5, 1.5, 0.5). The computation done for the second iteration, on the window c(1, 1, 1, NA), is effectively:

mean(c(1, 1, 1, NA) * c(0.5, 1.5, 1.5, 0.5))

However, because we remove NAs, we instead end up with:

> mean(c(1, 1, 1) * c(0.5, 1.5, 1.5))
[1] 1.166667

This is primarily because the weights are not re-normalized after removing the NA value, which they likely should be in this case.

In other words, we should be checking + re-normalizing the weights after accounting for NAs.
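
As a rough illustration of that idea, here is a minimal base-R sketch, not the actual RcppRoll implementation. The name roll_mean_renorm is hypothetical, and the approach (dividing each window's weighted sum by the sum of the weights that survive NA removal) is an assumption about what "re-normalizing" would look like:

```r
# Hypothetical helper, not part of RcppRoll: a weighted rolling mean that
# re-normalizes the weights over the non-NA entries of each window.
roll_mean_renorm <- function(x, weights, na.rm = FALSE) {
  n <- length(weights)
  # Number of complete windows; zero if x is shorter than the window.
  n_out <- max(length(x) - n + 1L, 0L)
  out <- numeric(n_out)
  for (i in seq_len(n_out)) {
    window <- x[i:(i + n - 1L)]
    # Keep only non-NA positions when na.rm is requested.
    keep <- if (na.rm) !is.na(window) else rep(TRUE, n)
    # Dividing by the sum of the surviving weights is the re-normalization
    # step; an all-NA window gives 0 / 0, i.e. NaN.
    out[i] <- sum(window[keep] * weights[keep]) / sum(weights[keep])
  }
  out
}

tmp <- c(1, 1, 1, 1, NA, NA, NA, NA, 1, 1)
res <- roll_mean_renorm(tmp, c(1, 3, 3, 1), na.rm = TRUE)
print(res)
```

For tmp above, every window that contains at least one non-NA value evaluates to 1, and the single all-NA window evaluates to NaN, which is what the original report expected. Base R's weighted.mean(x, w, na.rm = TRUE) applies the same idea to a single window.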
