
Using AbstractFloat instead of Float64 in warning check for slow conv #475

Open
wants to merge 1 commit into base: master

Conversation

@gabrielpreviato (Contributor) commented Feb 18, 2023

PR Checklist

  • Tests are added
  • Documentation, if applicable

#383 silenced some warnings that are not necessarily useful (as discussed in that PR), but kept the warning for Float64 mixed with other types.

But with the support for half-precision floats, some other combinations can happen and go unnoticed, such as a Float16 weight on a Float32 matrix, or a Float32 weight on an Int matrix ("oh no I forgot to convert my Integers to Floats").

This PR changes the type check from Float64 to AbstractFloat, so a warning is issued for these unintentional and odd combinations, while still avoiding warnings for ForwardDiff.Dual, which I think was the main purpose of the previous PR.
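
For reference, here is a minimal sketch of the kind of check being discussed; the function name and exact condition are illustrative only, not NNlib's actual conv.jl code.

# Illustrative sketch only, not NNlib's conv.jl.
# yT is the output eltype, T1 and T2 are the input and weight eltypes.
function maybe_warn_slow_conv(yT::Type, T1::Type, T2::Type)
    # Old behaviour: warn only when Float64 was part of the mismatch.
    # Proposed: warn whenever any AbstractFloat eltype is mixed with a different
    # eltype, catching Float16/Float32 and Float/Int mixes, while ForwardDiff.Dual
    # (a Real, but not an AbstractFloat) still avoids the warning.
    if any(T -> T <: AbstractFloat, (yT, T1, T2)) && !(yT == T1 == T2)
        @warn "Slow fallback implementation invoked for conv!  You probably don't want this; check your datatypes." yT T1 T2 maxlog=1
    end
end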

Some examples

Current NNlib (only shows a warning when there is a Float64):

julia> x = rand(Float16, 5, 5, 1, 1)
5×5×1×1 Array{Float16, 4}:
[:, :, 1, 1] =
 0.3955  0.76    0.8823  0.4844  0.593
 0.2158  0.51    0.9277  0.2725  0.2163
 0.547   0.8364  0.958   0.939   0.3027
 0.1377  0.7285  0.4229  0.943   0.579
 0.7437  0.5874  0.805   0.146   0.269

julia> w = rand(Int8, 3, 3, 1, 1)
3×3×1×1 Array{Int8, 4}:
[:, :, 1, 1] =
 -68  120  -77
 -75   90   11
   3   92  -25

julia> conv(x, w)
3×3×1×1 Array{Float16, 4}:
[:, :, 1, 1] =
  34.44  119.1    61.03
 101.75   29.06  116.0
  60.0    87.0    46.62

julia> w = rand(Float32, 3, 3, 1, 1)
3×3×1×1 Array{Float32, 4}:
[:, :, 1, 1] =
 0.476644  0.732119  0.917347
 0.350704  0.753561  0.0978633
 0.633826  0.753227  0.0496203

julia> conv(x, w)
3×3×1×1 Array{Float32, 4}:
[:, :, 1, 1] =
 3.45243  3.77007  2.86686
 2.86372  3.45715  2.65007
 3.47114  3.27677  2.87541

julia> w = rand(Float64, 3, 3, 1, 1)
3×3×1×1 Array{Float64, 4}:
[:, :, 1, 1] =
 0.220391  0.259412   0.027865
 0.855732  0.353315   0.893624
 0.531092  0.0226233  0.315492

julia> conv(x, w)
┌ Warning: Slow fallback implementation invoked for conv!  You probably don't want this; check your datatypes.
│   yT = Float64
│   T1 = Float16
│   T2 = Float64
└ @ NNlib ~/.julia/packages/NNlib/TZPiH/src/conv.jl:192
3×3×1×1 Array{Float64, 4}:
[:, :, 1, 1] =
 2.22078  2.01215  2.05155
 2.46237  2.55374  2.24465
 1.79309  2.64892  1.81043

This PR (shows a warning for any mixture of different Floats):

PS: I ran these tests in a fresh session each time, since the warning has maxlog = 1.

julia> x = rand(Float16, 5, 5, 1, 1)
5×5×1×1 Array{Float16, 4}:
[:, :, 1, 1] =
 0.474   0.5884  0.7637   0.6035  0.6396
 0.0762  0.645   0.0952   0.7197  0.818
 0.3394  0.1221  0.543    0.7017  0.767
 0.7173  0.961   0.08936  0.2783  0.3022
 0.678   0.5005  0.7104   0.965   0.4219

julia> w = rand(Int8, 3, 3, 1, 1)
3×3×1×1 Array{Int8, 4}:
[:, :, 1, 1] =
  81  -48  -76
 -15  -41   58
 -23   72  -56

julia> conv(x, w)
┌ Warning: Slow fallback implementation invoked for conv!  You probably don't want this; check your datatypes.
│   yT = Float16
│   T1 = Float16
│   T2 = Int8
└ @ NNlib ~/NNlib.jl/src/conv.jl:192
3×3×1×1 Array{Float16, 4}:
[:, :, 1, 1] =
 -12.91    52.38  -63.06
 -46.8   -126.2    23.25
 -39.9     70.0   -74.44

julia> w = rand(Float32, 3, 3, 1, 1)
3×3×1×1 Array{Float32, 4}:
[:, :, 1, 1] =
 0.885747  0.745852  0.161355
 0.837498  0.757606  0.457403
 0.86357   0.687177  0.457069

julia> conv(x, w)
┌ Warning: Slow fallback implementation invoked for conv!  You probably don't want this; check your datatypes.
│   yT = Float32
│   T1 = Float16
│   T2 = Float32
└ @ NNlib ~/NNlib.jl/src/conv.jl:192
3×3×1×1 Array{Float32, 4}:
[:, :, 1, 1] =
 2.20351  3.15128  3.26581
 2.8175   3.43277  3.7439
 2.4886   3.27672  3.43359

julia> w = rand(Float64, 3, 3, 1, 1)
3×3×1×1 Array{Float64, 4}:
[:, :, 1, 1] =
 0.191498  0.043293  0.34513
 0.417291  0.227745  0.510484
 0.627288  0.680126  0.0511033

julia> conv(x, w)
┌ Warning: Slow fallback implementation invoked for conv!  You probably don't want this; check your datatypes.
│   yT = Float64
│   T1 = Float16
│   T2 = Float64
└ @ NNlib ~/NNlib.jl/src/conv.jl:192
3×3×1×1 Array{Float64, 4}:
[:, :, 1, 1] =
 1.83448  1.69441  1.53605
 1.48041  2.10666  2.06308
 1.76039  1.72283  2.31743

julia> f = x -> sum(conv(x, w))
#5 (generic function with 1 method)

julia> ForwardDiff.gradient(f, x)
5×5×1×1 Array{Float64, 4}:
[:, :, 1, 1] =
 0.998969  1.54049  2.26595  1.26698  0.725466
 1.57489   2.57254  4.01191  2.43702  1.43937
 2.10452   3.83771  6.0199   3.91539  2.18219
 1.10555   2.29723  3.75395  2.64841  1.45672
 0.529626  1.26517  2.008    1.47837  0.742825

@ToucheSir (Member) commented:
I like the idea, but do we support half precision in the fast path? I didn't think we did, since it relies on BLAS and that's usually 32/64-bit only.

@gabrielpreviato (Contributor, Author) commented:
Hmm, I think I've read somewhere something like "half-precision support", but indeed, BLAS only supports 32/64-bit.

But still, if you mix Float32 and Float16 (which is what happened to me and led me to open this PR), there is no warning. Maybe for now, instead of checking only for Float64, we check for both Float64 and Float32?

@ToucheSir (Member) commented:
There should be no warning, because we don't have another implementation of convolutions for those types! Or are you saying that we should be dissuading people from mixing FP16+FP32 altogether?

@gabrielpreviato (Contributor, Author) commented:
I don't think we should be dissuading people from mixing types or doing weird combinations, but I think the warning is valid and useful, since you can actually make this mix by mistake. Flux also currently issues a warning for a possible mix of Float16 and Float32.

https://github.com/FluxML/Flux.jl/blob/cebc0d931a3678afdcd04040858f5541bf5ff23b/src/layers/stateless.jl#L56-L60
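
(As a generic illustration of the kind of eltype-mismatch warning described above, not Flux's actual implementation:)

# Generic sketch, illustrative only; see the linked stateless.jl for Flux's real check.
function warn_if_mixed_eltypes(x::AbstractArray, w::AbstractArray)
    if eltype(x) != eltype(w)
        @warn "Mixed eltypes $(eltype(x)) and $(eltype(w)); this may be unintentional." maxlog=1
    end
end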

And if you have scalar indexing disabled, you get a scalar indexing error because of how the direct convolution is implemented. If you are new to the Julia ecosystem, it may not be straightforward to figure out that the problem is type mixing, IMO.

julia> x = CUDA.rand(Float32, 5, 5, 1, 1)
5×5×1×1 CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}:
[:, :, 1, 1] =
 0.601529  0.142086  0.612069  0.094501  0.699099
 0.821472  0.202396  0.484461  0.337047  0.106021
 0.586893  0.62307   0.168788  0.329471  0.8758
 0.213742  0.770866  0.132256  0.489098  0.570035
 0.392795  0.27233   0.656506  0.795186  0.618649

julia> w = CUDA.rand(Float16, 3, 3, 1, 1)
3×3×1×1 CuArray{Float16, 4, CUDA.Mem.DeviceBuffer}:
[:, :, 1, 1] =
 0.4766  0.009766  0.8623
 0.6084  0.0       0.001953
 0.8877  0.2754    0.6045

julia> conv(x, w)
ERROR: TaskFailedException

    nested task error: Scalar indexing is disallowed.
    Invocation of getindex resulted in scalar indexing of a GPU array.
    This is typically caused by calling an iterating implementation of a method.
    Such implementations *do not* execute on the GPU, but very slowly on the CPU,
    and therefore are only permitted from the REPL for prototyping purposes.
    If you did intend to index this array, annotate the caller with @allowscalar.
    Stacktrace:
     [1] error(s::String)
       @ Base ./error.jl:35
     [2] assertscalar(op::String)
       @ GPUArraysCore ~/.julia/packages/GPUArraysCore/B3xv7/src/GPUArraysCore.jl:100
     [3] getindex(::CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, ::Int64, ::Int64, ::Int64, ::Int64, ::Vararg{Int64})
       @ GPUArrays ~/.julia/packages/GPUArrays/5wTN2/src/host/indexing.jl:9
     [4] getindex
       @ ./subarray.jl:282 [inlined]
     [5] conv_direct!(y::SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, x::SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, w::CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{3, 3, 3, 6, 3}, ::Val{(3, 3, 1)}, ::Val{1}, ::Val{(0, 0, 0, 0, 0, 0)}, ::Val{(1, 1, 1)}, ::Val{(1, 1, 1)}, fk::Val{false}; alpha::Float32, beta::Bool)
       @ NNlib ~/.julia/packages/NNlib/TZPiH/src/impl/conv_direct.jl:104
     [6] conv_direct!(y::SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, x::SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, w::CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{3, 3, 3, 6, 3}; alpha::Float32, beta::Bool)
       @ NNlib ~/.julia/packages/NNlib/TZPiH/src/impl/conv_direct.jl:50
     [7] conv_direct!
       @ ~/.julia/packages/NNlib/TZPiH/src/impl/conv_direct.jl:47 [inlined]
     [8] (::NNlib.var"#308#312"{Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, DenseConvDims{3, 3, 3, 6, 3}, SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}, CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, SubArray{Float32, 5, CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, Tuple{Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}, Base.Slice{Base.OneTo{Int64}}}, false}})()
       @ NNlib ./threadingconstructs.jl:258
Stacktrace:
  [1] sync_end(c::Channel{Any})
    @ Base ./task.jl:436
  [2] macro expansion
    @ ./task.jl:455 [inlined]
  [3] conv!(out::CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, in1::CuArray{Float32, 5, CUDA.Mem.DeviceBuffer}, in2::CuArray{Float16, 5, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{3, 3, 3, 6, 3}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ NNlib ~/.julia/packages/NNlib/TZPiH/src/conv.jl:205
  [4] conv!
    @ ~/.julia/packages/NNlib/TZPiH/src/conv.jl:185 [inlined]
  [5] #conv!#258
    @ ~/.julia/packages/NNlib/TZPiH/src/conv.jl:145 [inlined]
  [6] conv!
    @ ~/.julia/packages/NNlib/TZPiH/src/conv.jl:140 [inlined]
  [7] conv(x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, w::CuArray{Float16, 4, CUDA.Mem.DeviceBuffer}, cdims::DenseConvDims{2, 2, 2, 4, 2}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ NNlib ~/.julia/packages/NNlib/TZPiH/src/conv.jl:88
  [8] conv
    @ ~/.julia/packages/NNlib/TZPiH/src/conv.jl:83 [inlined]
  [9] #conv#231
    @ ~/.julia/packages/NNlib/TZPiH/src/conv.jl:56 [inlined]
 [10] conv(x::CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, w::CuArray{Float16, 4, CUDA.Mem.DeviceBuffer})
    @ NNlib ~/.julia/packages/NNlib/TZPiH/src/conv.jl:50
 [11] top-level scope
    @ REPL[3]:1

@ToucheSir (Member) commented Feb 19, 2023

The reason <64-bit and 64-bit mixing is so insidious is that it's so easy to write Julia code which promotes to the latter. That's less of a concern with 16/32, because one has to explicitly request arrays of those eltypes. It's also not as if users have a different option for 16-bit conv operations, which is the case for both 32- and 64-bit. IMO the scalar indexing problem is a separate one and should be addressed by us adding a check for GPU arrays.

Edit: I might support having the fallback warning as a way to tell users that FP16 does not have a fast path on CPU, but again that's a different discussion than trying to extend the behaviour of the current warning (which, as far as I know, was meant as a "hey, this isn't hitting the BLAS path" warning) to different eltype combinations.
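
To illustrate the promotion point above (plain Julia, nothing NNlib-specific): an untyped literal is a Float64 and silently promotes Float32 data, whereas 16-bit arrays only appear when explicitly requested.

julia> w32 = rand(Float32, 3, 3);

julia> eltype(0.5 .* w32)     # untyped literal is Float64 -> accidental promotion
Float64

julia> eltype(0.5f0 .* w32)   # typed Float32 literal keeps the eltype
Float32

julia> eltype(rand(Float16, 3, 3))   # Float16 only because it was explicitly requested
Float16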
