[CLIP] 'clip-ViT-B-32' can we not change the max_seq_lenght? #1269
Comments
CLIP can only encode text up to 77 word pieces. A further increase is not possible, as the model does not support it. Encoding only text with CLIP does not make much sense anyway; there are much better text encoders available, like the all-* models.
Yes, thanks. I wanted to clarify whether it was a bug or the model itself. It makes sense that it only encodes up to 77 tokens, as it was trained on captions. Thanks for the advice! These were just some speed tests with ANN similarity.
Just to clarify: does 77 word pieces mean 77 characters including spaces, or 77 words separated by spaces, no matter how many characters each?
It uses a fixed-size vocabulary of common words and character n-grams. Short common words are a single word piece, while longer and less common words are broken down into multiple character chunks.
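To illustrate the idea, here is a toy sketch of greedy word-piece splitting with a made-up vocabulary (CLIP's real tokenizer uses byte-pair encoding over a learned ~49k-entry vocabulary, so this is only a conceptual illustration):

```python
def wordpiece_tokenize(word, vocab):
    """Greedy longest-match-first split of a word into known pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Find the longest substring starting at `start` that is in the vocab.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:  # no piece matches: fall back to a single character
            end = start + 1
        pieces.append(word[start:end])
        start = end
    return pieces

# Hypothetical vocabulary for demonstration only.
vocab = {"the", "token", "ization", "iza", "tion", "un", "common"}

print(wordpiece_tokenize("the", vocab))           # short common word -> 1 piece
print(wordpiece_tokenize("tokenization", vocab))  # longer word -> 2 pieces
```

So a sentence's word-piece count can be much larger than its word count, which is what the 77 limit applies to.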
To fix this error, I used the CLIP tokenizer to truncate my input sentences:

tokenizer = model._first_module().processor.tokenizer

def truncate_sentence(sentence, tokenizer):
    """
    Truncate a sentence to fit the CLIP max token limit (77 tokens, including
    the starting and ending tokens).

    Args:
        sentence (string): The sentence to truncate.
        tokenizer (CLIPTokenizer): Pretrained CLIP tokenizer.
    """
    cur_sentence = sentence
    tokens = tokenizer.encode(cur_sentence)

    if len(tokens) > 77:
        # Skip the starting token and keep only 75 tokens, leaving room for
        # the start and end tokens.
        truncated_tokens = tokens[1:76]
        cur_sentence = tokenizer.decode(truncated_tokens)
        # Recursive call, because encode(decode()) can give a different result.
        return truncate_sentence(cur_sentence, tokenizer)
    else:
        return cur_sentence
Hi,
I am playing with the CLIP model. Can we not change the maximum sequence length of the CLIP model? I was trying to encode quora_duplicate_questions (which has some "long" sentences > 77 tokens). I borrowed the code from here
The code to reproduce the issue:
The issue I have is that even though I set
model.max_seq_length = 512
it seems that the dimension of the position_embeddings remains 77.
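This behavior is expected: the text encoder's position-embedding table is a learned matrix with exactly 77 rows, so raising max_seq_length cannot create embeddings for positions that were never trained. A minimal sketch (toy sizes, not CLIP's real weights) of why indexing past the table fails:

```python
# Toy sketch: a learned position-embedding table with 77 rows, like CLIP's
# text encoder (the embedding dimension 8 here is made up for illustration).
MAX_POSITIONS, DIM = 77, 8
position_embeddings = [[0.0] * DIM for _ in range(MAX_POSITIONS)]

def embed_positions(seq_len):
    # One learned vector per position; there is no row for position 77+.
    return [position_embeddings[i] for i in range(seq_len)]

print(len(embed_positions(77)))  # 77 positions -> fine
try:
    embed_positions(512)         # no rows past index 76
except IndexError:
    print("IndexError: the table only has 77 positions")
```

Setting model.max_seq_length = 512 only changes a config value; the underlying weight matrix keeps its trained shape.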
Thank you!