Thanks for creating and maintaining this great library!
What would you like to be added:
Today, ModelToEncoder calls ModelToEncoding which statically initializes a dictionary of 7 encodings. 6/7 are duplicates.
When each encoding is constructed, the constructor eagerly loads a bunch of data from manifest resources. As far as I can tell, this data gets loaded separately for each instance.
I would like to be able to call ModelToEncoder and only have it lazily load the encoding I care about. Furthermore, I'd like to see it share Encoding instances among models which map to the same encoding.
An example implementation might look like this:
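    public static Encoding? TryFor(string modelName)
    {
        switch (modelName)
        {
            case "gpt-4o":
                return O200KCache.Instance;
            case "gpt-4":
            // ...
            case "text-embedding-3-large":
                return Cl100KCache.Instance;
            default:
                return null;
        }
    }

    // Holder classes: the runtime initializes each Instance no later than the
    // first access to it, so an encoding that is never requested is typically
    // never constructed, and every model mapped to a holder shares one instance.
    private static class O200KCache
    {
        public static readonly O200KBase Instance = new();
    }

    private static class Cl100KCache
    {
        public static readonly Cl100KBase Instance = new();
    }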
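An alternative to the nested holder classes, sketched here only as an illustration, is to wrap each encoding in a `Lazy<T>`. This makes the "construct at most once, on first use" behavior explicit instead of relying on type-initialization timing. The `Encoding`, `O200KBase`, and `Cl100KBase` types below are hypothetical stand-ins so that the snippet compiles on its own (the `Constructed` counters exist only to make the behavior observable), and `EncodingLookup` is a made-up class name; none of this is the library's actual code.

```csharp
using System;
using System.Threading;

// Hypothetical stand-ins for the library's encoding types. The real
// constructors load data from manifest resources; these only count calls.
public abstract class Encoding { }

public sealed class O200KBase : Encoding
{
    public static int Constructed;
    public O200KBase() => Interlocked.Increment(ref Constructed);
}

public sealed class Cl100KBase : Encoding
{
    public static int Constructed;
    public Cl100KBase() => Interlocked.Increment(ref Constructed);
}

// Hypothetical lookup class, not part of the library.
public static class EncodingLookup
{
    // Lazy<T> with a factory defaults to
    // LazyThreadSafetyMode.ExecutionAndPublication, so each factory
    // runs at most once even under concurrent access.
    private static readonly Lazy<Encoding> O200K =
        new Lazy<Encoding>(() => new O200KBase());
    private static readonly Lazy<Encoding> Cl100K =
        new Lazy<Encoding>(() => new Cl100KBase());

    public static Encoding? TryFor(string modelName) => modelName switch
    {
        "gpt-4o" => O200K.Value,
        // Models that map to the same encoding share one instance.
        "gpt-4" or "text-embedding-3-large" => Cl100K.Value,
        _ => null,
    };
}
```

With these stand-ins, looking up "gpt-4" and then "text-embedding-3-large" returns the same object and constructs `Cl100KBase` once, while `O200KBase` is never constructed unless "gpt-4o" is requested. The holder-class version avoids the small `Lazy<T>` wrapper allocation; the `Lazy<T>` version trades that for more explicit semantics.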
Why is this needed:
Reduce memory footprint and startup time, especially as more models are added.
Anything else we need to know?
I'd be happy to file a PR for this if you're interested!