To be clear, the method which copies to the CPU should only apply when inds are arrays, which is where you have to worry about races. For simpler things like A[1,:] or B[3:end-3] it should not do this.
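To make the race concern concrete, here is a small CPU-only sketch. It is not code from the package: the variable names are made up for illustration. It shows that an index array with a repeated entry loses a contribution under a plain broadcasted update, while a slice cannot repeat an index.

```julia
inds = [1, 1, 3]            # index 1 appears twice
dy   = [1.0, 2.0, 3.0]

# Naive broadcasted update: the two writes to dx[1] overwrite each other,
# so one contribution is lost. On a GPU the same pattern becomes a data race.
dx_naive = zeros(4)
dx_naive[inds] .+= dy

# Sequential accumulation: each contribution is added in turn.
dx_loop = zeros(4)
for (k, i) in enumerate(inds)
    dx_loop[i] += dy[k]
end

# A range cannot repeat an index, so the direct update is already safe.
dx_slice = zeros(4)
dx_slice[2:3] .+= [10.0, 20.0]
```

Only the first case needs special handling, which is why the CPU fallback (or any atomic kernel) should be restricted to array-valued indices.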
I think this method was added as the simplest way to solve the problem. But having a faster kernel in a package extension would be fine. I believe it's a lot like NNlib.scatter.
Can we use a custom kernel with atomics for
∇getindex!(dx::AbstractGPUArray, dy, inds...)
instead of copying everything to the CPU? This way we'd be able to avoid synchronizations, and we could add such a kernel via an extension.
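As a rough sketch of what such a kernel could look like, assuming KernelAbstractions.jl and Atomix.jl as dependencies (neither is named in this thread), and covering only a single vector of integer indices: the names scatter_add_kernel! and scatter_add! are hypothetical and are not the package's actual method.

```julia
using KernelAbstractions   # assumed dependency, not named in the thread
using Atomix               # assumed dependency, provides atomics on array elements

# Hypothetical kernel: one work-item per entry of `inds`.
@kernel function scatter_add_kernel!(dx, @Const(dy), @Const(inds))
    k = @index(Global, Linear)
    # Atomic add, so repeated indices accumulate instead of racing.
    Atomix.@atomic dx[inds[k]] += dy[k]
end

# Hypothetical wrapper for the single-index-vector case only.
# It launches on whatever backend `dx` lives on and does not copy to the host.
function scatter_add!(dx::AbstractVector, dy::AbstractVector,
                      inds::AbstractVector{<:Integer})
    backend = get_backend(dx)
    scatter_add_kernel!(backend)(dx, dy, inds; ndrange = length(inds))
    return dx
end

# Usage on the CPU backend; a GPU array type would select its own backend.
dx = zeros(Float32, 4)
scatter_add!(dx, Float32[1, 2, 3], [1, 1, 3])
# Synchronize only because the result is inspected on the host right away.
KernelAbstractions.synchronize(get_backend(dx))
```

A real ∇getindex! method would also have to handle multi-dimensional and mixed indices (for example an array combined with a colon), which this sketch deliberately leaves out.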