
batch execution #6

Open
Roger-luo opened this issue May 26, 2021 · 6 comments

Comments

@Roger-luo

Roger-luo commented May 26, 2021

Batch execution will help parallelize over multiple input states, which can give acceleration for things like MPS circuits. But this requires PastaQ to support a contiguous memory layout along the batch dimension.
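For concreteness, here is what I mean for the full state-vector case (a minimal sketch, not PastaQ's API; the names and sizes are made up for illustration): store the B states contiguously in one array with the batch as the last dimension, so applying a gate becomes a single large GEMM over the whole batch.

    # B state vectors for n qubits, stored contiguously with the batch last
    n, B = 10, 64
    states = randn(ComplexF64, 2^n, B)   # column j is the j-th input state

    # Apply a single-qubit gate U to the first qubit of every state at once.
    # Julia is column-major, so the reshape keeps the data contiguous and
    # the whole batch is handled by one matrix multiply.
    U = ComplexF64[1 1; 1 -1] / sqrt(2)  # Hadamard, as an example gate
    psi = reshape(states, 2, :)          # 2 × (2^(n-1) * B)
    states = reshape(U * psi, 2^n, B)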

cc: @mtfishman

@mtfishman

What do you mean by a contiguous memory layout? Do you mean that, in the case of an array (for full state simulation), the multiple states are combined into one array with an extra dimension?

This is not so simple with MPS, since combining multiple MPS into a single MPS with an extra index can in general increase the MPS bond dimension a lot (in general, roughly adding up the bond dimensions of the MPS being combined).
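To illustrate the blow-up (a made-up example, not code from this repo): stacking two MPS into a single MPS with a shared bond space requires a block-diagonal (direct-sum) structure, so the bond dimensions add.

    # Two site tensors (left bond, physical, right bond) from different MPS:
    A1 = randn(3, 2, 3)   # bond dimension 3
    A2 = randn(5, 2, 5)   # bond dimension 5
    # A single tensor holding both branches needs bond dimension 3 + 5 = 8,
    # with each MPS living in its own diagonal block:
    A = zeros(8, 2, 8)
    A[1:3, :, 1:3] .= A1
    A[4:8, :, 4:8] .= A2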

@Roger-luo

Roger-luo commented May 26, 2021

This is not so simple with MPS, since combining multiple MPS into a single MPS with an extra index can in general increase the MPS bond dimension a lot (in general, roughly adding up the bond dimensions of the MPS being combined).

It's not adding to the bond dimension; it's adding an extra batch dimension. Here is an example taken from a private MPS implementation of mine (unfortunately I can't open-source it at the moment). You would also need my batched GEMM implementation, as well as the batched GEMM intrinsics in CUDA, to get the best acceleration.

struct MPS{B, PN, L, T, VT <: AbstractArray{T, 3}}
    # L sites, PN matrices per site (one per physical index), each a
    # rank-3 array whose third axis is the batch dimension of size B
    tensors::NTuple{L, NTuple{PN, VT}}

    function MPS(tensors::NTuple{L, NTuple{PN, VT}}) where {T, L, PN, VT <: AbstractArray{T, 3}}
        # infer the batch size from the first tensor, then check consistency
        B = size(first(first(tensors)), 3)
        for (i, each_tensor) in enumerate(tensors), (k, tx) in enumerate(each_tensor)
            B == size(tx, 3) || error("batch size mismatch for the $i-th tensor's $k-th physical leg's matrix, expect $B, got $(size(tx, 3))")
        end

        new{B, PN, L, T, VT}(tensors)
    end
end
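For example, constructing one of these (sizes made up for illustration):

    # L = 3 sites, PN = 2 physical states per site, bond dimension 5, batch B = 4
    tensors = ntuple(3) do site
        ntuple(2) do leg
            randn(5, 5, 4)    # (left bond, right bond, batch)
        end
    end
    psi = MPS(tensors)        # MPS{4, 2, 3, Float64, Array{Float64, 3}}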

You don't want to merge multiple MPS into the bond dimension; that still destroys the memory layout. But again, it's not a huge deal, just a suggestion. We could also just put a simple loop outside apply and enable multithreading when there are batches, e.g. the sketch below.
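Something like this (a sketch; apply_circuit is a placeholder for whatever applies a circuit to a single MPS, e.g. PastaQ's runcircuit, not a real signature from this package):

    using Base.Threads

    # keep each MPS separate and parallelize over the batch with threads
    function batched_apply(apply_circuit, circuit, states::Vector)
        out = similar(states)
        @threads for b in eachindex(states)
            out[b] = apply_circuit(circuit, states[b])
        end
        return out
    end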

@mtfishman

I'm not quite sure what all of the type parameters are. Is L the system size? If so, are there PN tensors on each site? Is the idea that there is a batch index on every tensor of the MPS, so it is basically an MPO?

@Roger-luo

so it is basically an MPO?

The memory layout is like an MPO's, but it is not an MPO.

@mtfishman

By MPO I mean in the general sense that there are 4 indices per tensor, where 2 indices are shared between neighbors. Is this the form of the "batched" MPS you are imagining, where one of the indices of each tensor is the "batched" index?

@Roger-luo

By MPO I mean in the general sense that there are 4 indices per tensor, where 2 indices are shared between neighbors. Is this the form of the "batched" MPS you are imagining, where one of the indices of each tensor is the "batched" index?

Yes, but since this is still an MPS, you will be able to use the batched GEMM intrinsics I mentioned above to accelerate it; you won't be able to do that for an MPO. A sketch of the idea is below.
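To sketch what the batched intrinsic buys you (a plain-Julia stand-in, with shapes assumed for illustration): contracting two neighboring batched site tensors along their shared bond is just B independent matrix products, which on a GPU maps onto one strided batched GEMM call.

    function batched_contract(A::AbstractArray{T, 3}, C::AbstractArray{T, 3}) where {T}
        Dl, Dm, B = size(A)       # (left bond, shared bond, batch)
        Dm2, Dr, B2 = size(C)     # (shared bond, right bond, batch)
        @assert Dm == Dm2 && B == B2
        out = similar(A, Dl, Dr, B)
        for b in 1:B              # on a GPU this loop is a single batched GEMM
            out[:, :, b] = @view(A[:, :, b]) * @view(C[:, :, b])
        end
        return out
    end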
