Incorrect mean values with na.rm = T #23

Open
kjedrzejewski opened this issue Mar 27, 2017 · 2 comments
Comments

@kjedrzejewski

> tmp = c(1,1,1,1,NA,NA,NA,NA,1,1)
> roll_mean(tmp, 4, c(1,3,3,1), na.rm=T)
[1] 1.000000 1.166667 1.000000 0.500000      NaN 0.500000 1.000000

1.166667 and 0.5 are unexpected values. There should be only 1s, and one NaN.

This seems to work correctly when a default weighting is used:

> roll_mean(tmp, 4, na.rm=T)
[1]   1   1   1   1 NaN   1   1
@kevinushey
Owner

Thanks for the bug report! zoo::rollapply() gets it right here, so I should see what's going on:

> tmp = c(1, 1, 1, 1, NA, NA, NA, NA, 1, 1)
> zoo::rollapply(tmp, c(1, 3, 3, 1), mean)
[1]  1  1  1  1 NA NA NA NA  1

@kevinushey
Owner

Finally looking into this a little deeper now. The way rollapply() constructs the windows for a vector of widths is a bit different from what I'm doing in RcppRoll:

[Screenshot: the offsets rollapply() uses for each window]

Those are the offsets for each window; that is, rollapply() uses a variable window size (e.g. the first run is width 1, the second is 3, the third is 3, the fourth is 1, then repeat...). In other words, comparing directly to rollapply() as I did before isn't quite accurate.

In the RcppRoll case, this call always uses a window size of 4, with weights c(1, 3, 3, 1). Normalized so that they sum to the window size, these become c(0.5, 1.5, 1.5, 0.5). The computation done for the second iteration is effectively:

mean(c(1, 1, 1, NA) * c(0.5, 1.5, 1.5, 0.5))

However, because we remove NAs, we instead end up with:

> mean(c(1, 1, 1) * c(0.5, 1.5, 1.5))
[1] 1.166667
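
To check that this explanation accounts for every value in the original report, here is a rough base R sketch of the behaviour described above. It is not RcppRoll's actual source; the variable names are made up for illustration, and it assumes the weights are scaled once to sum to the window size and then left untouched when NAs are dropped.

```r
# Illustrative sketch only: mimics the behaviour described in this comment,
# not the real RcppRoll implementation.
x <- c(1, 1, 1, 1, NA, NA, NA, NA, 1, 1)
w <- c(1, 3, 3, 1)
n <- length(w)

# Scale the weights once so that they sum to the window size.
w_norm <- w * n / sum(w)  # c(0.5, 1.5, 1.5, 0.5)

not_renormalized <- sapply(seq_len(length(x) - n + 1), function(i) {
  window <- x[i:(i + n - 1)]
  keep <- !is.na(window)
  # NAs are dropped, but the surviving weights are used as-is,
  # so they no longer sum to the number of values being averaged.
  mean(window[keep] * w_norm[keep])
})

print(not_renormalized)
```

Under those assumptions the sketch gives 1, 1.166667, 1, 0.5, NaN, 0.5, 1, the same seven values as in the report.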

This is primarily because the weights are not re-normalized after the NA value is removed, which they likely should be in this case.

In other words, we should be checking and re-normalizing the weights after accounting for NAs.
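
A minimal base R sketch of what that re-normalization could look like is below. The function name roll_mean_renorm is hypothetical and this is not the eventual RcppRoll fix; it assumes a fixed window equal to the length of the weights, and that a window with no non-NA values should yield NaN.

```r
# Hypothetical helper for illustration; not part of RcppRoll.
roll_mean_renorm <- function(x, weights) {
  n <- length(weights)
  stopifnot(length(x) >= n)
  sapply(seq_len(length(x) - n + 1), function(i) {
    window <- x[i:(i + n - 1)]
    keep <- !is.na(window)
    # Assumption: an all-NA window produces NaN, as with the default weights.
    if (!any(keep)) return(NaN)
    # Dividing by the sum of the surviving weights re-normalizes them
    # for exactly the observations that remain in this window.
    sum(window[keep] * weights[keep]) / sum(weights[keep])
  })
}

tmp <- c(1, 1, 1, 1, NA, NA, NA, NA, 1, 1)
renormalized <- roll_mean_renorm(tmp, c(1, 3, 3, 1))
print(renormalized)
```

On the vector from the report this gives 1s everywhere except a single NaN for the all-NA window, which is what the original report expected. The per-window expression is the same thing stats::weighted.mean() computes on the non-NA values.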
