-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Docs of construction undef arrays with missing #31091
Conversation
This PR explains the behavior I observe, but it is not documented, so maybe it is not guaranteed (therefore before merging it should be confirmed that Base wants to guarantee this contract). The contract is that if `T` is a bits type then creation of uninitialized array of type `Union{Missing, T}` initializes it to hold `missing` in all entries. CC @nalimilan
doc/src/manual/missing.md
Outdated
@@ -254,6 +254,23 @@ julia> Array{Union{Missing, String}}(missing, 2, 3) | |||
missing missing missing | |||
``` | |||
|
|||
For `T` that is bits type (`isbitstype(T)` returns `true`), | |||
`Array{Union{Missing, T}}(undef, dims)` creates an array filled with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to recommend Array{Union{Missing, T}}(undef, dims)
(which should work even for non-isbits
T
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the problem. If T
is non isbitstype
then the array is truly uninitialized (#undef
). If I understand the reason correctly this is for performance reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean Array{Union{Missing, T}}(missing, dims)
@fredrikekre?
I'm not sure whether it's a good idea to develop this here given that we want people to use Array{Union{Missing, T}}(missing, dims)
rather than Array{Union{Missing, T}}(undef, dims)
, and that it makes things more complex with the introduction of the concept of isbits
type. Maybe make this a shorter !!note
without showing the REPL output? It would also be useful to mention for non-isbits
type the entries will be #undef
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean...
Yea...
I also was not sure what to do. One thing that was clear to me is that we should document it, especially the behavior I am OK to change the message (I just wanted to put something on the table), but first let us confirm that this is the intended behavior that will be supported (and this result is a consequence of conscious design decision), i.e.:
(making this clear was the main reason why I have opened this PR) |
The latter is intentional. I didn’t realize the former was a guarantee... probably more of an interaction between ordering of unions and how the isbits union array constructor was defined - ie more of a coincidence. Generic code should never read from an |
Yes AFAICT both behaviors are guaranteed (since changing them would potentially break a lot of code), but the former it's not really "intended": that's just because the type tag is filled with zeros (unlike the data part, we can't leave it filled with arbitrary values). |
This was my assumption. I have rewritten the message to explain the behavior (because when more people start using Julia they will notice this so questions may arise) but discourage relying on it. |
doc/src/manual/missing.md
Outdated
`Array{Union{Missing, T}}(undef, dims)` creates an array filled with | ||
missing values. Also calling `similar` to create an unitialized array that | ||
has element type of the form `Union{Missing, T}` creates an array filled with | ||
`missing` if `T` is bits type. This behavior is an implementation detail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Implementation detail that may change in the future" might be a bit strong, as it makes it sound like the public API cannot be relied on. Maybe just recommend using Array{Union{Missing, T}}(missing, dims)
, noting that it works for all types? Maybe also remove the leading "currently".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it is strong, but from the discussion I understood that we want to be discouraging. The question is if in e.g. Julia 1.7 we will guarantee this behavior (I know we rely on it internally, and the probability that this will change is minimal, but theoretically it is possible). In general - I understand that this is only the current behavior but it is not a part of a public API.
Actually, I have written this PR is exactly to clarify: do we want to make it a public API or just describe the current behavior and warn users that this is not a part of public API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, that's part of the public API (anyway, any behavior that is stable like this will de facto end up as being part of the API since lots of code will rely on it). We can mark this for triage to make sure that's the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - can you please mark it?
any behavior that is stable like this will de facto end up as being part of the API since lots of code will rely on it
Yes- but I would prefer an explicit decision 😄.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have Union{SingletonType, NonSingletonType}
you are guaranteed to get the singleton type (e.g. Missing
), and that won't change. For other combinations it's less predictable.
This isn't true. julia> module A
struct NotMissing end
export NotMissing
end
using .A
julia> Array{Union{NotMissing, Missing}}(undef, 5)
5-element Array{Union{NotMissing, Missing},1}:
NotMissing()
NotMissing()
NotMissing()
NotMissing()
NotMissing() In short, it's sorted by singleton-ness and module/type name. In most cases, though, folks are going to be interested in combinations of I say we document a very simplistic version of this behavior — but getting the wording right there is gonna be a pain. |
My conclusion would be just to stress that the result is undefined (as documented elsewhere, but I think it is good to be explicit here). The reason is, as I think @nalimilan noted earlier, that if you want to fill an @nalimilan - where do we rely on the behavior on |
Ah, good catch @mbauman. So I guess the note should be a warning saying that the result is undefined or at least varies depending on the type
I don't think I said we rely on it in a particular place. It's just that inevitably some code somewhere will depend on it. As long as that code isn't exercized on a singleton type which breaks the apparent rule, it won't be caught -- which might be OK if e.g. |
I have updated the description following the discussion we had. Would it be acceptable now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Just add an empty line after the note (with [ci skip]
in the commit message).
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
CI errors seem unrelated |
Shouldn't we altogether discourage people from relying on this behavior? I see only downsides here. We now have the semantically correct
Otherwise we just further the confusion about the semantics of the If we mention this behavior in the documentation, I think it should only be to note that it's an accident of the implementation and that the semantics of |
I think that there are two purposes of the documentation:
|
I'm also a bit confused by this update of the documentation... On the one hand, it helps to not rely on a behavior which is observed in particular cases (non isbits
Out of curiosity, why does the type tag has to be filled with zeros? |
So for me it is the opposite. It tells me not to rely on this when writing generic code (as in such cases if something is broken only rarely it is the same as if it were broken 50% of times). |
Yes but sometimes it's best to leave a particular behavior unspecified. If we specify this, then the developers of new array types have to choose between performance and "compatibility" with Base types. This is a self-inflicted wound: it makes perfect sense to leave undefined the values returned by As you know, this is already a problem for the developers of SentinelArrays.jl. The SentinelArray So I'm afraid we're missing an opportunity here. I think the Julia documentation should say something like this:
Then it would be clear that SentinelArray is free to make a real |
OK, makes sense. I am not Julia Base maintainer, so maybe please open a PR with this change? |
Sure, PR filed at #38260 . |
This PR explains the behavior I observe, but it is not documented, so maybe it is not guaranteed (therefore before merging it should be confirmed that Base wants to guarantee this contract).
The contract is that if
T
is a bits type then creation of uninitialized array of typeUnion{Missing, T}
initializes it to holdmissing
in all entries.CC @nalimilan