issue with MobileNetV4HybridLarge for non square images #12

ilyassmoummad · 2024-07-22T12:40:24Z

Hi, thanks for the cool torch reproduction!
I'm trying to use the code for rectangular grayscale images (I modified conv0 input channel to 1).

When running

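import torch

x = torch.rand(1, 1, 128, 251).cuda()  # batch of one 128x251 grayscale image
model = MobileNetV4("MobileNetV4HybridLarge").cuda()
y = model(x)  # the forward pass raises the error below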

I have an error with this line of code:

context = context.view(batch_size, self.num_heads * self.key_dim, px, px)

causing this error: RuntimeError: shape '[1, 384, 8, 8]' is invalid for input of size 49152

I checked the original implementation: https://github.com/tensorflow/models/blob/master/official/vision/modeling/layers/nn_blocks.py#L1489. It does divide by the height and width strides, but both have the value 1. Any idea how to make this work for non-square images? Thanks a lot in advance!
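
For reference, the numbers in the error message suggest the feature map at that point is 8 x 16 rather than 8 x 8: 49152 elements over 1 x 384 channels leaves 128 spatial positions, while `px * px` only accounts for 64. Below is a minimal sketch of that arithmetic in plain Python. The helper name `spatial_view_shape` is hypothetical and not part of this repository, and the height of 8 is an assumption taken from the `px` value shown in the error.

```python
def spatial_view_shape(numel, batch_size, channels, height):
    """Return (batch, channels, height, width) for a flat buffer of `numel` elements.

    Hypothetical helper for illustration only: it infers the width instead of
    assuming the feature map is square.
    """
    per_position = batch_size * channels
    if numel % per_position != 0:
        raise ValueError("numel is not divisible by batch_size * channels")
    positions = numel // per_position  # number of spatial locations (h * w)
    if positions % height != 0:
        raise ValueError("spatial positions are not divisible by height")
    return (batch_size, channels, height, positions // height)


# Values from the RuntimeError: 49152 elements, batch 1, 384 channels.
print(1 * 384 * 8 * 8)                        # 24576, half of 49152, so the square view fails
print(spatial_view_shape(49152, 1, 384, 8))   # (1, 384, 8, 16)
```

If that reading is right, the reshape would need separate height and width values taken from the feature map entering the attention block, for example `context.view(batch_size, self.num_heads * self.key_dim, h, w)`, where `h` and `w` are assumed names rather than variables confirmed to exist in the code.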
