@Aure20 There is no interpolation or normalization of image inputs in the model; that is done by the preprocessing pipeline, and there are metadata attributes for each pretrained model that indicate which normalization mean/std each set of weights needs.

Fully convolutional networks work with any image size, although you run into issues if you go below the network's total reduction (stride), which is usually 32. These models do not need a specific image size set at creation and have no argument for it. However, if you deviate too far from the original training size, performance can drop significantly.

ViT and ViT-hybrid models can have constraints on the image size; the position embeddings in particular are initialized for a specific size. To work at a different size, the model needs to be passed the image size (`img_size`) at creation.
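For concreteness, a minimal sketch (assuming a recent timm version; `resolve_data_config`/`create_transform` and the `img_size` argument are the usual timm API for this) of how the pretrained metadata drives preprocessing, and how a ViT can be created for a non-default size:

```python
import timm
import torch
from timm.data import resolve_data_config, create_transform

# Convnet: no size argument needed at creation; the expected preprocessing
# (input size, interpolation, normalization mean/std) lives in the weight metadata.
model = timm.create_model('tf_efficientnetv2_b0.in1k', pretrained=True)
config = resolve_data_config({}, model=model)
transform = create_transform(**config)  # PIL image -> normalized float tensor
print(config['input_size'], config['mean'], config['std'])

# ViT: position embeddings are initialized for a specific size, so pass the
# desired image size at creation and the embeddings are resized to match.
vit = timm.create_model('vit_base_patch16_224.augreg_in21k_ft_in1k',
                        pretrained=True, img_size=384)
out = vit(torch.randn(1, 3, 384, 384))
```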
-
I am currently trying different types of backbones for image classification (ECA-NFNet-L0, tf_efficientnetv2_b0.in1k, efficientvit_b1.r288_in1k). I have been passing images whose shapes differ from the ones the models expect, and the models accept them without any issue, so I assume there is some interpolation going on under the hood. Is this correct? (And if the image already has the correct shape, can I assume the interpolation layer is skipped?) Moreover, how do I know, for a specific backbone, whether I need to manually normalize the inputs according to ImageNet statistics, or pass values in the [0, 1] range, or in the [0, 255] range? Bonus question: if there is an interpolation/normalization layer and I convert the model to ONNX for faster inference, are those layers exported as well?
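For the normalization part, a small sketch (assuming timm >= 0.6, where each model carries its weight metadata in `model.pretrained_cfg`) of checking what a given backbone expects, and of an ONNX export; since resizing/normalization live in the preprocessing pipeline rather than in the model, they are not part of the exported graph:

```python
import timm
import torch

model = timm.create_model('efficientvit_b1.r288_in1k', pretrained=True).eval()

# Inputs are expected as float tensors scaled to [0, 1] and then normalized
# with these mean/std values; the model itself contains no normalization layer.
cfg = model.pretrained_cfg
print(cfg['input_size'], cfg['mean'], cfg['std'])

# ONNX export traces only the model, so the preprocessing (resize + normalize)
# must be reproduced on the inference side.
dummy = torch.randn(1, *cfg['input_size'])  # e.g. (1, 3, 288, 288)
torch.onnx.export(model, dummy, 'backbone.onnx', opset_version=13)
```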