batch execution #6
Batch execution will help parallelize over multiple input states, which can give acceleration for things like MPS circuits. But this requires PastaQ to support a contiguous memory layout along the batch dimension.

cc: @mtfishman

Comments
What do you mean by a contiguous memory layout? Do you mean that in the case of an array (for full state simulation), the multiple states are combined into one array with an extra dimension? This is not so simple with MPS, since combining multiple MPS into a single MPS with an extra index can in general increase the MPS bond dimension a lot (roughly adding up the bond dimensions of the MPS being combined).
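For intuition on that last point, here is a minimal sketch (mine, not from this thread) of the direct-sum construction that combines two MPS site tensors, assuming tensors shaped (left bond, right bond, physical); boundary tensors would need separate handling:

```julia
# Hypothetical helper: direct sum of two MPS site tensors shaped
# (left bond, right bond, physical). The result is block-diagonal in the
# bond indices, so the bond dimensions of the inputs add up.
function direct_sum_site(A::Array{T,3}, B::Array{T,3}) where {T}
    dlA, drA, d = size(A)
    dlB, drB, _ = size(B)
    C = zeros(T, dlA + dlB, drA + drB, d)
    C[1:dlA, 1:drA, :] .= A                 # block for the first MPS
    C[dlA+1:end, drA+1:end, :] .= B         # block for the second MPS
    return C
end
```

Generically nothing cancels between the two blocks, so truncation cannot undo the growth unless the states are closely related.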
It's not adding bond dimension, it's adding an extra batch dimension. Here is an example taken from a private MPS implementation of mine (unfortunately I can't open source it at the moment); you will also need my batched GEMM implementation, as well as the batched GEMM intrinsic in CUDA, to get the best acceleration.

```julia
# An MPS whose per-site matrices carry a trailing batch dimension:
#   L  - number of sites
#   PN - physical dimension (one matrix per value of the physical index)
#   T  - element type
#   VT - rank-3 array type, shaped (left bond, right bond, batch)
#   B  - batch size, identical for every tensor
struct MPS{B, PN, L, T, VT <: AbstractArray{T, 3}}
    tensors::NTuple{L, NTuple{PN, VT}}
    function MPS(tensors::NTuple{L, NTuple{PN, VT}}) where {T, L, PN, VT <: AbstractArray{T, 3}}
        # Read the batch size off the first tensor and check it is consistent.
        B = size(first(first(tensors)), 3)
        for (i, each_tensor) in enumerate(tensors), (k, tx) in enumerate(each_tensor)
            B == size(tx, 3) || error("batch size mismatch for the $i-th tensor's $k-th physical leg's matrix, expect $B, got $(size(tx, 3))")
        end
        new{B, PN, L, T, VT}(tensors)
    end
end
```

You don't want to add multiple MPS into the bond dimension; that still destroys the memory layout. But again, it's not a huge deal, just a suggestion. We could also just make a simple loop outside `apply` and enable multithreading when there are batches.
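As a rough sketch of that fallback (hypothetical names; `apply` stands in for PastaQ/ITensors-style gate application):

```julia
using Base.Threads

# Hypothetical fallback: no special memory layout, just an outer loop over
# independent input states, parallelized with threads.
function apply_over_batch(circuit, states::Vector)
    outputs = similar(states)
    @threads for b in eachindex(states)
        outputs[b] = apply(circuit, states[b])  # each state evolves independently
    end
    return outputs
end
```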
I'm not quite sure what all of the type parameters are. Is this essentially an MPO?
The memory layout is like an MPO, but it is not an MPO.
By MPO I mean in the general sense that there are 4 indices per tensor, where 2 indices are shared between neighbors. Is this the form of the "batched" MPS you are imagining, where one of the indices of each tensor is the "batched" index?
Yes, but since this is still an MPS, you will be able to use the batch intrinsic I wrote before to accelerate it. You won't be able to do that for an MPO.
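To make the payoff concrete, here is a sketch (my own, assuming NNlib's `batched_mul`, which batches a matrix multiply over the third array dimension and on GPU arrays lowers to strided-batched GEMM):

```julia
using NNlib  # batched_mul: batched matrix multiply over the third dimension

# Two neighboring site matrices (fixed physical indices), each carrying the
# trailing batch dimension from the struct above.
Dl, Dm, Dr, B = 8, 16, 8, 64
A = randn(Float32, Dl, Dm, B)
C = randn(Float32, Dm, Dr, B)

# One batched GEMM contracts the shared bond for all B states at once,
# instead of B separate small matrix multiplies. An MPO's extra index would
# enter the contraction itself, so it cannot be treated as a pure batch index.
AC = batched_mul(A, C)   # size (Dl, Dr, B)
```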