
Benchmarking #183

Closed
ngphuoc opened this issue Dec 5, 2017 · 5 comments
ngphuoc commented Dec 5, 2017

I am not sure if this is the right place to ask, but I'd like to check whether my benchmarking is correct. I am trying to measure the speed of AFArray and compare it with KnetArray from Knet.jl. Since AFArray is asynchronous, I am not sure whether BenchmarkTools.jl's @btime handles it correctly. Please comment:

using ArrayFire
using Knet
using GZip,BenchmarkTools
atype = Array{Float32}
gtype = KnetArray  # comment out this line and uncomment next line to use AFArray 
#gtype = AFArray

Base.zero(a::gtype) = (b=similar(a); b[:]=0;b)

function gzload(file; path=Knet.dir("data",file), url="http://yann.lecun.com/exdb/mnist/$file")
    isfile(path) || download(url, path)
    f = gzopen(path)
    a = read(f)
    close(f)
    return a
end

xs = gzload("train-images-idx3-ubyte.gz")[17:end];
ys = gzload("train-labels-idx1-ubyte.gz")[9:end];

function minibatch(x, y, batchsize; gtype=Array{Float32}, xrows=784, yrows=10, xscale=255)
  function xbatch(a)
    gtype(atype( reshape(a./xscale, xrows, div(length(a),xrows))))
  end
  function ybatch(a)
    a[a.==0] = 10
    a = convert(Vector{Int},a)
    a = sparse(a, 1:length(a), one(eltype(a)), yrows, length(a))
  end

  xcols = div(length(x),xrows)
  xcols == length(y) || throw(DimensionMismatch())
  data = Any[]
  for i=1:batchsize:xcols-batchsize+1
    j=i+batchsize-1
    push!(data, (xbatch(x[1+(i-1)*xrows:j*xrows]), ybatch(y[i:j])))
  end
  return data
end

data = minibatch(xs,ys,10)
x,y = gtype.(atype.(first(data)))

function predict(w,x)
  for i=1:2:length(w)
    x = w[i]*x .+ w[i+1]
  end
  x
end

function weights(h...; gtype=Array{Float32}, winit=0.1f0)
  w = Any[]
  x = 28*28
  for y in [h..., 10]
    push!(w, winit*gtype(rand(Float32,y,x)))
    push!(w, gtype(zeros(Float32,y, 1)))
    x = y
  end
  return w
end

H = (300,200,100,50)
batchsize = 30
winit = 0.1f0
epochs = 10
lr = 0.1f0

w = weights(H...; gtype=gtype, winit=winit)

@btime predict(w, x)
# ArrayFire 193.778 μs (30 allocations: 480 bytes)
# KnetArray 46.188 μs (104 allocations: 4.13 KiB)

function loss(w,x,ygold)
    ypred = predict(w,x)
    ynorm = logp(ypred,1) # ypred .- log(sum(exp(ypred),1))
    -sum(ygold .* ynorm) / size(ygold,2)
end

@btime loss(w,x,y)
# AFArray 469.991 μs (63 allocations: 1008 bytes)
# KnetArray 91.161 μs (158 allocations: 5.92 KiB)

lossgradient = grad(loss)

@btime lossgradient(w,x,y)
# AFArray 1.086 ms (889 allocations: 31.50 KiB)
# KnetArray 348.304 μs (784 allocations: 35.08 KiB)


ghost commented Dec 5, 2017

@btime is fine; just add @afgc to predict and loss, and benchmark with @btime sync(predict(w,x)). For small array sizes KnetArray and CuArray should be faster; try increasing the batch size and see if that changes the picture. Also try .= or swap! in predict.
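To make the suggestion concrete, here is a minimal sketch of the synchronized benchmark, assuming ArrayFire.jl's `sync` and the `@afgc` macro behave as described in this thread:

```julia
using ArrayFire, BenchmarkTools

# @afgc lets ArrayFire reclaim the temporaries created inside the
# function, as suggested above.
@afgc function predict(w, x)
    for i = 1:2:length(w)
        x = w[i] * x .+ w[i+1]
    end
    return x
end

# sync blocks until the asynchronous ArrayFire computation has actually
# finished, so @btime measures real execution time rather than just the
# time to enqueue the kernels. Interpolating w and x with $ avoids
# benchmarking global-variable access.
@btime sync(predict($w, $x))
```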


ngphuoc commented Dec 5, 2017

I reran the script, restarting Julia for each run and varying the batchsize. I got the results below:

batchsize = 100
AFArray: 116.966 μs (30 allocations: 480 bytes) 1.3GB
KnetArray: 47.557 μs (110 allocations: 4.22 KiB) 2.7GB

batchsize = 1000
AFArray: 126.905 - 388.120 μs (30 allocations: 480 bytes) 0.8 - 0.3 GB
KnetArray: 52.093 μs (115 allocations: 4.30 KiB) 1GB

batchsize = 5000
AFArray: 47.622 - 117.923 μs (115 allocations: 4.30 KiB)  4GB
KnetArray: 46.344 - 122.819 μs (30 allocations: 480 bytes) 10GB

In general, for batchsize 100-1000 Knet is about twice as fast, but at batchsize 5000 both perform similarly. I normally use a small batchsize, and my model has many layers. Is ArrayFire optimized for large matrices rather than for many small ones with basic matrix operations?


ghost commented Dec 6, 2017

In my experience ArrayFire works best with large matrices and complicated kernels; only then does the JIT / async design pay off big time.
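A rough illustration of why (a hedged sketch; the exact fusion behavior depends on the ArrayFire version): elementwise operations on an AFArray are recorded lazily by ArrayFire's JIT and can be fused into a single kernel that only runs when the result is needed. On large arrays the saved kernel launches and memory traffic outweigh the JIT overhead; on many small arrays the per-launch overhead dominates.

```julia
using ArrayFire

a = rand(AFArray{Float32}, 4096, 4096)
b = rand(AFArray{Float32}, 4096, 4096)

# Several elementwise operations in one expression: ArrayFire's JIT
# can record these lazily and fuse them into a single kernel instead
# of launching one kernel per operation.
c = a .* b .+ a

sync(c)  # force evaluation so the fused work actually happens here
```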


ngphuoc commented Dec 6, 2017

Thanks a lot. Just a side question: is there a pooling function (max/average pooling)? It is needed for convolutional neural networks.

ngphuoc closed this as completed Dec 6, 2017

ghost commented Dec 6, 2017

Convolution is there; pooling is coming in arrayfire/arrayfire-ml#17.
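Until native pooling lands, a plain-Julia fallback can stand in; the sketch below is a naive 2x2 max pooling written by hand (the function name is my own, and this is CPU reference code, not an ArrayFire kernel):

```julia
# Naive 2x2 max pooling with stride 2 over a matrix, as a plain-Julia
# reference. Each output element is the maximum of one non-overlapping
# 2x2 block of the input.
function maxpool2x2(x::AbstractMatrix)
    h, w = size(x)
    out = similar(x, h ÷ 2, w ÷ 2)
    for j = 1:w÷2, i = 1:h÷2
        out[i, j] = max(x[2i-1, 2j-1], x[2i, 2j-1],
                        x[2i-1, 2j], x[2i, 2j])
    end
    return out
end

maxpool2x2(reshape(1:16, 4, 4))  # → [6 14; 8 16] (column-major blocks)
```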
