-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integer overflow on mean #861
Comments
You order of ops does look like what is happening. But we don't actually do any of the math here. So this is the difference between a Base specialisation on May be worth a base Julia issue. The sum reduction needs to have |
I followed it into Statistics https://github.com/JuliaStats/Statistics.jl/blob/d49c2bf4f81e1efb4980a35fe39c815ef8396297/src/Statistics.jl#L179-L198 So I think the issue comes out of a difference the results of the type conversion. I've removed the identity, sum and convert: julia> M= fill(UInt16(32000), 2);
julia> D = DimArray(M, X(1:2));
julia> promote_type(typeof(M/1), typeof(M))
Vector{Float64} (alias for Array{Float64, 1})
julia> promote_type(typeof(D/1), typeof(D))
DimVector{T, Tuple{X{Sampled{Int64, UnitRange{Int64}, ForwardOrdered, Regular{Int64}, Points, NoMetadata}}}, Tuple{}, A, Na, NoMetadata} where {T, A<:AbstractVector{T}, Na} (alias for DimArray{T, 1, Tuple{X{DimensionalData.Dimensions.Lookups.Sampled{Int64, UnitRange{Int64}, DimensionalData.Dimensions.Lookups.ForwardOrdered, DimensionalData.Dimensions.Lookups.Regular{Int64}, DimensionalData.Dimensions.Lookups.Points, DimensionalData.Dimensions.Lookups.NoMetadata}}}, Tuple{}, A, Na,
DimensionalData.Dimensions.Lookups.NoMetadata} where {T, A<:AbstractArray{T, 1}, Na})
# Note:
# M isa Array{UInt16,1}
# M/1 isa Array{Float64,1}
# D isa DimArray{UInt16,1}
# D/1 isa DimArray{Float64,1}
I interpret this as promote_type(Vector{Float64}, Vector{Int64}) => Vector{Float64}
promote_type(DimVector{Float64}, DimVector{Int64}) => DimVector{Any} instead of |
Ok interesting, yes We could add one that just forwards all the type parameters to |
Taking the elementwise mean of a vector of Integer DimArrays overflows integer bounds
Base:
Whether the sum is greater or less than the integer limits, the answer is correct
Equivalent math with DimArrays:
A different answer for mean of two 32000s and three 32000s where the maxval on the int is 65535.
Guessing at this, but is the order of operations:
Base: convert to float |> sum |> divide by N
DimensionalData: sum |> convert to float |> divide by N
taking the mean of a DimArray by itself does not have any overflow issues.
The text was updated successfully, but these errors were encountered: