SentinelArrays.jl uses undef constructors to initialize the array to "missing" values. I propose to do the following instead:
Use missing constructors to initialize arrays to "missing" (sentinel) values.
Use undef constructors to skip initialization (or, more precisely, to do whatever the undef constructor does for the underlying array).
Background:
undef constructors of Base arrays have been (ab)used to initialize arrays with missing values. This relies on undocumented behavior: the undef constructor does zero-initialization of the memory for union arrays, to avoid invalid element types. And due to implementation details, arrays of Union{Missing,...} usually end up with all elements initialized to missing.
Relying on undocumented behavior is not great, so an attempt was made to document it: JuliaLang/julia#31091. It turned out that this use of undef works almost always, but can fail with unions of Missing and other singleton types.
The issue was solved by adding missing constructors for Base arrays: JuliaLang/julia#25054.
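As a minimal sketch of the difference for Base arrays (the variable names here are illustrative and not taken from either package), the missing constructor states the intent directly, while an undef array is best treated as holding unspecified contents until it has been written to:

```julia
MaybeFloat = Union{Missing, Float64}

# Explicit initialization: every element is guaranteed to be `missing`.
a = Vector{MaybeFloat}(missing, 3)

# No initialization requested: the contents are unspecified, so set them
# before reading instead of assuming they are `missing`.
b = Vector{MaybeFloat}(undef, 3)
fill!(b, missing)

# The `missing` constructor also covers unions with other singleton types,
# the case where relying on `undef` is reported to fail.
c = Vector{Union{Missing, Nothing}}(missing, 2)
```

The second form costs an extra fill! call only when all-missing contents are actually wanted; code that overwrites every element anyway can skip it.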
Arguments:
Using undef to initialize to a particular value makes no sense from a semantics point of view. It doesn't feel like a good API when "define to xxx" is spelled as "leave undefined"!
Please note that the problem is not in the behavior of the Base constructors: a "leave undefined" constructor can return anything, so why not all-missing values? But users shouldn't rely on it (at least not when there is a better way).
Using undef for this purpose in SentinelArrays will encourage people to do the same with Base arrays, where it can introduce subtle bugs (since it works almost always, but not always).
It leaves performance on the table. In Base, zero-initialization is necessary to guarantee valid union types. This constraint doesn't apply to SentinelArrays! And the whole point of SentinelArrays is to give better performance for particular use cases, so I think SentinelArrays should define undef constructors that do what it says on the tin: leave values uninitialized.
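To make the proposed split concrete, here is a toy sketch. ToySentinelVector, its sentinel keyword, and the typemin default are all hypothetical and chosen for illustration; this is not the SentinelArrays.jl implementation. The undef constructor only allocates, while the missing constructor pays for filling the storage with the sentinel:

```julia
# Hypothetical toy type: stores plain `T` values and reports the sentinel as `missing`.
struct ToySentinelVector{T} <: AbstractVector{Union{T, Missing}}
    data::Vector{T}
    sentinel::T
end

# `undef`: allocate only; the contents are whatever the underlying array holds.
function ToySentinelVector{T}(::UndefInitializer, n::Integer;
                              sentinel::T = typemin(T)) where {T}
    return ToySentinelVector{T}(Vector{T}(undef, n), sentinel)
end

# `missing`: allocate and fill with the sentinel, so every element reads as `missing`.
function ToySentinelVector{T}(::Missing, n::Integer;
                              sentinel::T = typemin(T)) where {T}
    return ToySentinelVector{T}(fill(sentinel, n), sentinel)
end

Base.size(v::ToySentinelVector) = size(v.data)
Base.IndexStyle(::Type{<:ToySentinelVector}) = IndexLinear()

function Base.getindex(v::ToySentinelVector, i::Int)
    x = v.data[i]
    return x === v.sentinel ? missing : x
end

function Base.setindex!(v::ToySentinelVector{T}, x, i::Int) where {T}
    v.data[i] = x === missing ? v.sentinel : convert(T, x)
    return v
end

v = ToySentinelVector{Int}(missing, 3)  # all elements are `missing`
w = ToySentinelVector{Int}(undef, 3)    # contents unspecified until written
w[1] = 10
w[2] = missing
w[3] = 30
```

With this split, a caller that is about to overwrite every element (as with w above) avoids the fill pass entirely, which is the performance argument made here.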
Relation to Base:
A downside of this proposal is that SentinelVector{Float64}(undef, 3) would no longer behave the same as Array{Union{Missing,Float64}}(undef, 3). But the latter is undefined behavior. Is agreeing on this undefined behavior more important than having an API that makes sense and offering the best performance?
Another way to look at it:
SentinelArrays should promote patterns that also work with Base arrays, so it should not promote using undef to get missing.
Base does the semantically correct thing: undef for uninitialized (where possible), and missing for initialized-to-missing. In particular, Base does not document the values returned by undef constructors. I think SentinelArrays should do the same.
I apologize for such a slow response here; I agree with everything written up here and I agree that we should properly define and encourage the missing constructors. @knuesel, do you have interest/ability/time to make a PR? Otherwise, I can get to it in the next little while.