
batch execution #6

Open
Roger-luo opened this issue May 26, 2021 · 6 comments

Comments

@Roger-luo

Roger-luo commented May 26, 2021

Batch execution will help parallelize over multiple input states, which can give acceleration for things like MPS circuits. But this requires PastaQ to support a contiguous memory layout along the batch dimension.
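For concreteness, here is what I mean for the full state-vector case (a minimal sketch, not PastaQ's API; the names and sizes are made up for illustration): store the B states contiguously in one array with the batch as the last dimension, so applying a gate becomes a single large GEMM over the whole batch.

    # B state vectors for n qubits, stored contiguously with the batch last
    n, B = 10, 64
    states = randn(ComplexF64, 2^n, B)   # column j is the j-th input state

    # Apply a single-qubit gate U to the first qubit of every state at once.
    # Julia is column-major, so the reshape keeps the data contiguous and
    # the whole batch is handled by one matrix multiply.
    U = ComplexF64[1 1; 1 -1] / sqrt(2)  # Hadamard, as an example gate
    psi = reshape(states, 2, :)          # 2 × (2^(n-1) * B)
    states = reshape(U * psi, 2^n, B)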

cc: @mtfishman

@mtfishman

What do you mean by a contiguous memory layout? Do you mean that, in the case of an array (for full state simulation), the multiple states are combined into one array with an extra dimension?

This is not so simple with MPS, since combining multiple MPS into a single MPS with an extra index can in general increase the MPS bond dimension a lot (in general, roughly adding up the bond dimensions of the MPS being combined).
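To illustrate the blow-up (a made-up example, not code from this repo): stacking two MPS into a single MPS with a shared bond space requires a block-diagonal (direct-sum) structure, so the bond dimensions add.

    # Two site tensors (left bond, physical, right bond) from different MPS:
    A1 = randn(3, 2, 3)   # bond dimension 3
    A2 = randn(5, 2, 5)   # bond dimension 5
    # A single tensor holding both branches needs bond dimension 3 + 5 = 8,
    # with each MPS living in its own diagonal block:
    A = zeros(8, 2, 8)
    A[1:3, :, 1:3] .= A1
    A[4:8, :, 4:8] .= A2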

@Roger-luo

Roger-luo commented May 26, 2021

This is not so simple with MPS, since combining multiple MPS into a single MPS with an extra index can in general increase the MPS bond dimension a lot (in general, roughly adding up the bond dimensions of the MPS being combined).

It's not adding to the bond dimension; it's adding an extra batch dimension. Here is an example taken from a private MPS implementation of mine (unfortunately I can't open-source it at the moment). You would also need my batched GEMM implementation, as well as the batched GEMM intrinsics in CUDA, to get the best acceleration.

struct MPS{B, PN, L, T, VT <: AbstractArray{T, 3}}
    # L sites, PN matrices per site (one per physical index), each a
    # rank-3 array whose third axis is the batch dimension of size B
    tensors::NTuple{L, NTuple{PN, VT}}

    function MPS(tensors::NTuple{L, NTuple{PN, VT}}) where {T, L, PN, VT <: AbstractArray{T, 3}}
        # infer the batch size from the first tensor, then check consistency
        B = size(first(first(tensors)), 3)
        for (i, each_tensor) in enumerate(tensors), (k, tx) in enumerate(each_tensor)
            B == size(tx, 3) || error("batch size mismatch for the $i-th tensor's $k-th physical leg's matrix, expect $B, got $(size(tx, 3))")
        end

        new{B, PN, L, T, VT}(tensors)
    end
end
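For example, constructing one of these (sizes made up for illustration):

    # L = 3 sites, PN = 2 physical states per site, bond dimension 5, batch B = 4
    tensors = ntuple(3) do site
        ntuple(2) do leg
            randn(5, 5, 4)    # (left bond, right bond, batch)
        end
    end
    psi = MPS(tensors)        # MPS{4, 2, 3, Float64, Array{Float64, 3}}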

You don't want to merge multiple MPS into the bond dimension; that still destroys the memory layout. But again, it's not a huge deal, just a suggestion. We could also just put a simple loop outside apply and enable multithreading when there are batches, e.g. the sketch below.
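Something like this (a sketch; apply_circuit is a placeholder for whatever applies a circuit to a single MPS, e.g. PastaQ's runcircuit, not a real signature from this package):

    using Base.Threads

    # keep each MPS separate and parallelize over the batch with threads
    function batched_apply(apply_circuit, circuit, states::Vector)
        out = similar(states)
        @threads for b in eachindex(states)
            out[b] = apply_circuit(circuit, states[b])
        end
        return out
    end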

@mtfishman

I'm not quite sure what all of the type parameters are. Is L the system size? If so, are there PN tensors on each site? Is the idea that there is a batch index on every tensor of the MPS, so it is basically an MPO?

@Roger-luo

so it is basically an MPO?

The memory layout is like an MPO's, but it is not an MPO.

@mtfishman

By MPO I mean in the general sense that there are 4 indices per tensor, where 2 indices are shared between neighbors. Is this the form of the "batched" MPS you are imagining, where one of the indices of each tensor is the "batched" index?

@Roger-luo

By MPO I mean in the general sense that there are 4 indices per tensor, where 2 indices are shared between neighbors. Is this the form of the "batched" MPS you are imagining, where one of the indices of each tensor is the "batched" index?

Yes, but since this is still an MPS, you will be able to use the batched GEMM intrinsics I mentioned above to accelerate it; you won't be able to do that for an MPO. A sketch of the idea is below.
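To sketch what the batched intrinsic buys you (a plain-Julia stand-in, with shapes assumed for illustration): contracting two neighboring batched site tensors along their shared bond is just B independent matrix products, which on a GPU maps onto one strided batched GEMM call.

    function batched_contract(A::AbstractArray{T, 3}, C::AbstractArray{T, 3}) where {T}
        Dl, Dm, B = size(A)       # (left bond, shared bond, batch)
        Dm2, Dr, B2 = size(C)     # (shared bond, right bond, batch)
        @assert Dm == Dm2 && B == B2
        out = similar(A, Dl, Dr, B)
        for b in 1:B              # on a GPU this loop is a single batched GEMM
            out[:, :, b] = @view(A[:, :, b]) * @view(C[:, :, b])
        end
        return out
    end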
