The order of connection between word vectors and image vectors in prompt
Thank you for your work. I have a few questions regarding the concatenation order of the image vectors and text vectors in the input prompt during the training and evaluation stages of MiniGPT4.
1. During the evaluation stage (as seen in the demo), the input prompt appears to place the word vectors first and the image feature vectors after them. Could you please explain the reason for this order? It seems different from the order used during the second stage of training.
2. In the first training stage, the image embeddings are fed directly into the large language model without any prompt word vectors as context. How does this approach still achieve a preliminary "interaction between images and text" effect?
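To make the first question concrete, here is a minimal sketch of the two concatenation orders I am asking about. It uses toy Python lists instead of real tensors, and `concat_embeddings`, `text_emb`, `image_emb`, and the sizes are hypothetical names and values chosen for illustration; none of this is taken from the MiniGPT-4 code.

```python
# Hypothetical illustration of the two prompt orders; not MiniGPT-4's actual code.

def concat_embeddings(segments):
    """Concatenate embedding sequences along the sequence (token) axis."""
    out = []
    for seg in segments:
        out.extend(seg)
    return out

EMBED_DIM = 4  # arbitrary toy embedding size (assumption)

# Placeholder vectors: 3 "word" embeddings and 2 "image" embeddings.
text_emb = [[0.1] * EMBED_DIM for _ in range(3)]
image_emb = [[0.9] * EMBED_DIM for _ in range(2)]

# Order A: word vectors first, image vectors after.
text_first = concat_embeddings([text_emb, image_emb])

# Order B: image vectors first, word vectors after.
image_first = concat_embeddings([image_emb, text_emb])

# Both sequences have the same length but a different token order.
print(len(text_first), len(image_first))
print(text_first == image_first)
```

Since a decoder-only language model attends causally (each position only sees earlier positions), these two orders are not interchangeable, which is why I would like to understand which one is used at each stage.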