
Benchmarking #183

Closed
ngphuoc opened this issue Dec 5, 2017 · 5 comments
ngphuoc commented Dec 5, 2017

I am not sure if this is the right place to ask, but I'd like to check whether my benchmarking is correct. I am trying to measure the speed of AFArray and compare it with KnetArray from Knet.jl. Since AFArray is asynchronous, I am not sure whether BenchmarkTools.jl's @btime handles it correctly. Please comment:

using ArrayFire
using Knet
using GZip,BenchmarkTools
atype = Array{Float32}
gtype = KnetArray  # comment out this line and uncomment next line to use AFArray 
#gtype = AFArray

Base.zero(a::gtype) = (b=similar(a); b[:]=0;b)

function gzload(file; path=Knet.dir("data",file), url="http://yann.lecun.com/exdb/mnist/$file")
    isfile(path) || download(url, path)
    f = gzopen(path)
    a = read(f)
    close(f)
    return a
end

xs = gzload("train-images-idx3-ubyte.gz")[17:end];
ys = gzload("train-labels-idx1-ubyte.gz")[9:end];

function minibatch(x, y, batchsize; gtype=Array{Float32}, xrows=784, yrows=10, xscale=255)
  function xbatch(a)
    gtype(atype( reshape(a./xscale, xrows, div(length(a),xrows))))
  end
  function ybatch(a)
    a[a.==0] = 10
    a = convert(Vector{Int},a)
    a = sparse(a, 1:length(a), one(eltype(a)), yrows, length(a))
  end

  xcols = div(length(x),xrows)
  xcols == length(y) || throw(DimensionMismatch())
  data = Any[]
  for i=1:batchsize:xcols-batchsize+1
    j=i+batchsize-1
    push!(data, (xbatch(x[1+(i-1)*xrows:j*xrows]), ybatch(y[i:j])))
  end
  return data
end

data = minibatch(xs,ys,10)
x,y = gtype.(atype.(first(data)))

function predict(w,x)
  for i=1:2:length(w)
    x = w[i]*x .+ w[i+1]
  end
  x
end

function weights(h...; gtype=Array{Float32}, winit=0.1f0)
  w = Any[]
  x = 28*28
  for y in [h..., 10]
    push!(w, winit*gtype(rand(Float32,y,x)))
    push!(w, gtype(zeros(Float32,y, 1)))
    x = y
  end
  return w
end

H = (300,200,100,50)
batchsize = 30
winit = 0.1f0
epochs = 10
lr = 0.1f0

w = weights(H...; gtype=gtype, winit=winit)

@btime predict(w, x)
# ArrayFire 193.778 μs (30 allocations: 480 bytes)
# KnetArray 46.188 μs (104 allocations: 4.13 KiB)

function loss(w,x,ygold)
    ypred = predict(w,x)
    ynorm = logp(ypred,1) # ypred .- log(sum(exp(ypred),1))
    -sum(ygold .* ynorm) / size(ygold,2)
end

@btime loss(w,x,y)
# AFArray 469.991 μs (63 allocations: 1008 bytes)
# KnetArray 91.161 μs (158 allocations: 5.92 KiB)

lossgradient = grad(loss)

@btime lossgradient(w,x,y)
# AFArray 1.086 ms (889 allocations: 31.50 KiB)
# KnetArray 348.304 μs (784 allocations: 35.08 KiB)


ghost commented Dec 5, 2017

@btime is fine; just add @afgc to predict and loss, and benchmark with @btime sync(predict(w,x)). For small array sizes KnetArray and CuArray should be faster; try increasing the batch size and see if that changes the picture. Also try .= or swap! in predict.
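To make the suggestion concrete, here is a minimal sketch of the synchronized benchmark, assuming ArrayFire.jl's `sync` and the `@afgc` macro behave as described in this thread:

```julia
using ArrayFire, BenchmarkTools

# @afgc lets ArrayFire reclaim the temporaries created inside the
# function, as suggested above.
@afgc function predict(w, x)
    for i = 1:2:length(w)
        x = w[i] * x .+ w[i+1]
    end
    return x
end

# sync blocks until the asynchronous ArrayFire computation has actually
# finished, so @btime measures real execution time rather than just the
# time to enqueue the kernels. Interpolating w and x with $ avoids
# benchmarking global-variable access.
@btime sync(predict($w, $x))
```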


ngphuoc commented Dec 5, 2017

I reran the script, restarting Julia for each run and varying the batchsize. I got the results below:

batchsize = 100
AFArray: 116.966 μs (30 allocations: 480 bytes) 1.3GB
KnetArray: 47.557 μs (110 allocations: 4.22 KiB) 2.7GB

batchsize = 1000
AFArray: 126.905 - 388.120 μs (30 allocations: 480 bytes) 0.8 - 0.3 GB
KnetArray: 52.093 μs (115 allocations: 4.30 KiB) 1GB

batchsize = 5000
AFArray: 47.622 - 117.923 μs (115 allocations: 4.30 KiB)  4GB
KnetArray: 46.344 - 122.819 μs (30 allocations: 480 bytes) 10GB

In general, for batchsize 100-1000 Knet is about twice as fast, but at batchsize 5000 both perform similarly. I normally use a small batchsize, and my model has many layers. Is ArrayFire optimized for large matrices rather than for many small ones with basic matrix operations?


ghost commented Dec 6, 2017

In my experience ArrayFire works best with large matrices and complicated kernels; only then does the JIT / async design pay off big time.
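A rough illustration of why (a hedged sketch; the exact fusion behavior depends on the ArrayFire version): elementwise operations on an AFArray are recorded lazily by ArrayFire's JIT and can be fused into a single kernel that only runs when the result is needed. On large arrays the saved kernel launches and memory traffic outweigh the JIT overhead; on many small arrays the per-launch overhead dominates.

```julia
using ArrayFire

a = rand(AFArray{Float32}, 4096, 4096)
b = rand(AFArray{Float32}, 4096, 4096)

# Several elementwise operations in one expression: ArrayFire's JIT
# can record these lazily and fuse them into a single kernel instead
# of launching one kernel per operation.
c = a .* b .+ a

sync(c)  # force evaluation so the fused work actually happens here
```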


ngphuoc commented Dec 6, 2017

Thanks a lot. Just a side question: is there a pooling function (max/average pooling)? It is needed for convolutional neural networks.

ngphuoc closed this as completed Dec 6, 2017

ghost commented Dec 6, 2017

Convolution is there; pooling is coming in arrayfire/arrayfire-ml#17.
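Until native pooling lands, a plain-Julia fallback can stand in; the sketch below is a naive 2x2 max pooling written by hand (the function name is my own, and this is CPU reference code, not an ArrayFire kernel):

```julia
# Naive 2x2 max pooling with stride 2 over a matrix, as a plain-Julia
# reference. Each output element is the maximum of one non-overlapping
# 2x2 block of the input.
function maxpool2x2(x::AbstractMatrix)
    h, w = size(x)
    out = similar(x, h ÷ 2, w ÷ 2)
    for j = 1:w÷2, i = 1:h÷2
        out[i, j] = max(x[2i-1, 2j-1], x[2i, 2j-1],
                        x[2i-1, 2j], x[2i, 2j])
    end
    return out
end

maxpool2x2(reshape(1:16, 4, 4))  # → [6 14; 8 16] (column-major blocks)
```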
