
How to set fast math for CUDA #491

Closed · Zentrik opened this issue Aug 11, 2023 · 6 comments · Fixed by JuliaGPU/CUDA.jl#2030

Comments

@Zentrik (Contributor) commented Aug 11, 2023

It seems to me that fast math is set in this line, and since --math-mode has been disabled (JuliaLang/julia#41638), there is currently no way to enable fast math.

fast_math = Base.JLOptions().fast_math == 1

Perhaps fast math should be settable by the user, similarly to max_regs.
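
For illustration, something along these lines (a sketch only: kernel!, y, and x are placeholder names, maxregs is the existing per-launch register cap accepted by @cuda, and the fastmath keyword is hypothetical here; the PR that later closed this issue, JuliaGPU/CUDA.jl#2030, added an option of this kind):

@cuda threads=256 maxregs=32 kernel!(y, x)        # existing: per-launch register cap
@cuda threads=256 fastmath=true kernel!(y, x)     # proposed: per-launch fast-math toggle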

@vchuravy (Member):

You can use the @fastmath macro locally.
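
For example (a minimal sketch; the kernel and argument names are illustrative), the macro can be scoped to just the expression that should use the fast variants:

using CUDA

function kernel!(y, x)
    i = threadIdx().x
    # only this expression is rewritten to its Base.FastMath counterparts
    @inbounds y[i] = @fastmath exp(x[i]) / (1 + x[i])
    return nothing
end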

@maleadt (Member) commented Aug 11, 2023

Since @fastmath is a syntactic transformation (JuliaLang/julia#26828), we cannot implement this as an option to @cuda without reimplementing it all using overlay tables. Not saying that shouldn't happen, and I'd be in favor, it's just that this shouldn't happen in CUDA.jl.
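
For reference, the rewrite happens at macro-expansion time, which is why it cannot be applied after the fact as a compiler option; a REPL sketch:

julia> @macroexpand @fastmath sqrt(x)
:(Base.FastMath.sqrt_fast(x))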

@maleadt closed this as not planned on Aug 11, 2023.
@Zentrik (Contributor, Author) commented Aug 11, 2023

Does @fastmath enable flushing denormals and the other things controlled here?

GPUCompiler.jl/src/ptx.jl, lines 427 to 441 at 15f0077:

fast_math = Base.JLOptions().fast_math == 1
# NOTE: we follow nvcc's --use_fast_math
reflect_val = if reflect_arg == "__CUDA_FTZ"
    # single-precision denormals support
    ConstantInt(reflect_typ, fast_math ? 1 : 0)
elseif reflect_arg == "__CUDA_PREC_DIV"
    # single-precision floating-point division and reciprocals
    ConstantInt(reflect_typ, fast_math ? 0 : 1)
elseif reflect_arg == "__CUDA_PREC_SQRT"
    # single-precision square root
    ConstantInt(reflect_typ, fast_math ? 0 : 1)
elseif reflect_arg == "__CUDA_FMAD"
    # contraction of floating-point multiplies and adds/subtracts into
    # floating-point multiply-add operations (FMAD, FFMA, or DFMA)
    ConstantInt(reflect_typ, fast_math ? 1 : 0)

That's what I cared about; I'm already using @fastmath to get the faster versions of functions.

It doesn't seem to, though. For example:

using CUDA

@fastmath function kernel!(y, x)
    i = threadIdx().x
    @inbounds y[i] = sqrt(x[i])
    return nothing
end

x = CuArray(Float32[])
@device_code_ptx @cuda launch=false always_inline=true kernel!(x, x)

Looking at the PTX of this, I see sqrt.rn.f32 %f2, %f1; whereas if I run a fork of GPUCompiler.jl with fast_math set to true, I get sqrt.approx.ftz.f32 %f2, %f1; and the SASS code looks significantly better.
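
Concretely, that just amounts to hard-coding the flag in the snippet quoted above, e.g.:

fast_math = true  # instead of fast_math = Base.JLOptions().fast_math == 1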

@Zentrik (Contributor, Author) commented Aug 11, 2023

Both @fastmath sqrt and plain sqrt compile to @llvm.sqrt.f32 in LLVM, as CUDA doesn't define a fast-math sqrt, so it's perhaps not the best example, but I think my point still stands. Doing @fastmath 1 / x[i], with fast_math = false I see div.approx in the PTX, and with fast_math = true I see div.approx.ftz.
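
For concreteness, the reciprocal variant of the kernel from the previous comment (illustrative name, same setup otherwise):

@fastmath function recip_kernel!(y, x)
    i = threadIdx().x
    @inbounds y[i] = 1 / x[i]   # div.approx in the PTX; the .ftz suffix only appears with fast_math set
    return nothing
end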

@maleadt (Member) commented Aug 11, 2023

> Does @fastmath enable flushing denormals and the other things controlled here?

Ah yes, that kind of stuff we should be able to control.

> Doing @fastmath 1 / x[i], with fast_math = false I see div.approx in the PTX, and with fast_math = true I see div.approx.ftz.

That won't be affected by the proposed fast_math flag though, which only affects the code from libdevice. IIUC Julia itself should change the emission of the fdiv LLVM IR instruction, adding appropriate fast-math flags (not saying that's implemented, but it's the level where this should be happening).

I guess we could have a GPUCompiler pass that adds fast-math stuff everywhere, but that feels like a hack.
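
As a sketch of where to check, dumping the device-side LLVM IR shows whether the division instruction carries fast-math flags (reusing kernel! and x from the earlier comment):

@device_code_llvm @cuda launch=false always_inline=true kernel!(x, x)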

@Zentrik (Contributor, Author) commented Aug 19, 2023

Thanks
