Embeddings: batch size vs context length #963
Batch size is the maximum number of tokens that can be processed at once; it's separate from the context size. For text generation you can feed the model multiple batches before generating a response. For embedding, I think the embedder currently requires that all of your text is sent in one batch, so you'll need a larger batch size for embeddings.
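To make that constraint concrete, here is a minimal sketch of a pre-flight check, written against the same fields used in the snippet further down this thread (`_textTokenizer`, `_embedder`). The `_batchSize` field and the `GetEmbeddings` return type are assumptions, not verified API details:

```csharp
// Sketch only: `_textTokenizer` and `_embedder` mirror the snippet below;
// `_batchSize` is a hypothetical field holding whatever batch size the model
// was loaded with. Embedding inputs cannot span batches, so check up front.
public async Task<float[]> GenerateEmbeddingCheckedAsync(string text)
{
    int tokenCount = this._textTokenizer.CountTokens(text);

    if (tokenCount > this._batchSize)
    {
        throw new ArgumentException(
            $"Input is {tokenCount} tokens but the batch size is {this._batchSize}; " +
            "increase BatchSize when loading the model or shorten the input.");
    }

    var embeddings = await this._embedder.GetEmbeddings(text);
    return embeddings[0]; // assumes GetEmbeddings returns one vector per input
}
```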
You're already loading
Here's our code, where the exception is thrown:

```csharp
public async Task<Embedding> GenerateEmbeddingAsync(string text)
{
    if (this._log.IsEnabled(LogLevel.Trace))
    {
        this._log.LogTrace("Generating embedding, input token size: {0}", this._textTokenizer.CountTokens(text));
    }

    // Throws `System.ArgumentException`
    var embeddings = await this._embedder.GetEmbeddings(text);
    return new Embedding(embeddings[0]);
}
```
The string is 979 tokens, and I would expect that to work. Is there something to change in the method above?
Sounds right. Since you must process everything for embeddings in one batch, your batch size must be set to 979 or greater.
Looking at the examples, there's no code that sets the batch size - how is it set? e.g. https://github.com/SciSharp/LLamaSharp/blob/master/LLama.Examples/Examples/GetEmbeddings.cs
Trying to run https://github.com/SciSharp/LLamaSharp/blob/master/LLama.Examples/Examples/GetEmbeddings.cs throws the same exception:
Batch size is set in the
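For reference, a minimal sketch of how that configuration might look when loading the embedding model. The property names (`ContextSize`, `BatchSize`, `Embeddings`) reflect my reading of LLamaSharp's `ModelParams` and may differ between versions, so treat them as assumptions to verify:

```csharp
using LLama;
using LLama.Common;

// Sketch: raise the batch size when loading the embedding model so that long
// inputs fit in a single batch. Property names assumed from LLamaSharp's
// ModelParams; verify against the version you are running.
var parameters = new ModelParams("nomic-embed-text-v1.5.Q8_0.gguf")
{
    ContextSize = 8192, // the context length the model documentation advertises
    BatchSize = 8192,   // must be >= the longest text you will ever embed (default is 512)
    Embeddings = true   // may be named EmbeddingMode in older releases
};

using var weights = LLamaWeights.LoadFromFile(parameters);
var embedder = new LLamaEmbedder(weights, parameters);

string longText = "..."; // your ~979-token input
var embeddings = await embedder.GetEmbeddings(longText); // now fits in one batch
```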
Wouldn't it be easier if the batch size was automatically set to match the maximum number of tokens? Is there any benefit to having a lower default? For instance, if a model supports up to 8192 tokens per embedding, automatically setting the batch size to 8192 would replicate the behavior seen in HF, OpenAI, etc.
A large batch size is costly (it takes extra memory). It's generally not worth making it very large, since for text generation you'll be submitting just one token at a time after the initial prompt. Embedding is different: you must make the batch size as large as the largest amount of text you'll ever need an embedding for, since it can't (currently) be split across multiple batches.
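Applied to the two models in this issue, that tradeoff suggests leaving the generation model at the default batch size and only enlarging it for the embedder. A short sketch, under the same `ModelParams` assumptions as above:

```csharp
using LLama.Common;

// Chat model: BatchSize left at the default. The prompt is processed in chunks,
// and after that generation submits one token at a time, so a big batch buys little.
var chatParams = new ModelParams("openchat_3.5.Q5_K_M.gguf");

// Embedding model: batch sized to the longest input it will ever embed,
// since an embedding input cannot (currently) be split across batches.
var embedParams = new ModelParams("nomic-embed-text-v1.5.Q8_0.gguf")
{
    ContextSize = 8192,
    BatchSize = 8192,   // costs extra memory, but avoids the ArgumentException above
    Embeddings = true   // may be named EmbeddingMode in older releases
};
```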
Description
I'm using two models: openchat_3.5.Q5_K_M.gguf to generate text and nomic-embed-text-v1.5.Q8_0.gguf to calculate text embeddings. When I input text that exceeds 512 tokens (in my case, 979 tokens), embedding generation throws this exception:
However, the model documentation specifies a context length of 8192 tokens.
Questions: