Confusion related to the Causal Cross Attention #5

arnavc1712 · 2024-09-15T01:42:39Z

Hi, so in the Causal Cross Attention, I see we are registering a causal mask being the lower triangular matrix.
However when we are trying to learn the latent parameters C of seq_len m such that m < block_size. But the way it picks the mask seems weird, because it take the first m rows and columns based on the input sequence length.

For example lets say the mask for a block size of 5 is

False True True True True
False False True True True
False False False True True
False False False False True
False False False False False

now lets say the C parameter we want to learn is of seq_len 2

Now given the line att = att.masked_fill(self.mask[:,:,:Tq,:Tk] == 0, -1e10)
where Tq=2 in our case and Tk=5

we get the mask as (first two rows and 5 columns)
False True True True True
False False True True True

This mean we only take into account the first value vector to learn the first latent element and the first 2 value vectors to learn the second latent element. And disregard all the other value vectors.

Unless im understanding this incorrectly?

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Confusion related to the Causal Cross Attention #5

Confusion related to the Causal Cross Attention #5

arnavc1712 commented Sep 15, 2024

Confusion related to the Causal Cross Attention #5

Confusion related to the Causal Cross Attention #5

Comments

arnavc1712 commented Sep 15, 2024