Feat/corcel+llama+gemini #252
Conversation
llm = Llama.from_pretrained(
    repo_id=AVAILABLE_MODELS[model]["repo_id"],
    filename=AVAILABLE_MODELS[model]["filename"],
    verbose=False,
)
@0xArdi this tool runs small LLMs on the CPU. When this command is run for the first time, it downloads the model (~4GB) to a local directory, which means the first execution could take 2-3 minutes depending on the connection. Would this be a problem for the mech?
Also, inference can take up to 1 minute depending on the prompts, and it uses a lot of CPU and RAM.
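One way to surface the first-run cost up front is to check whether the GGUF file is already on disk before calling `Llama.from_pretrained`. A minimal sketch, assuming a registry shaped like the `AVAILABLE_MODELS` mapping quoted above (the `repo_id`/`filename` values and the flat cache layout here are hypothetical, not the tool's actual entries):

```python
import os

# Hypothetical registry mirroring the AVAILABLE_MODELS mapping in the PR;
# the entries below are illustrative placeholders, not the tool's real values.
AVAILABLE_MODELS = {
    "example-7b": {
        "repo_id": "example-org/example-7b-GGUF",
        "filename": "example-7b.Q4_K_M.gguf",
    },
}


def is_cached(model: str, cache_dir: str) -> bool:
    """Return True if the GGUF file for `model` already exists locally.

    Assumes a flat cache directory; the real huggingface_hub cache is
    nested, so this is only a sketch of the check, not a drop-in helper.
    """
    entry = AVAILABLE_MODELS[model]
    return os.path.isfile(os.path.join(cache_dir, entry["filename"]))
```

A caller could then warn (or pre-download in a setup step) when `is_cached(...)` is false, instead of paying the ~4GB download inside the first request.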
We have had issues before with models held in memory, as they can lead to spikes in resource usage that result in the container being killed by k8s. So unless this tool is required to be deployed, I'd say let's not deploy it.
That being said, with the introduction of the mech-marketplace, what I said above no longer holds, since anyone can choose to run it.
Ok, let's not deploy it for now then, but I guess we can merge it. What's that mech marketplace thing?
Not sure what is happening with the Windows lock check. It seems to stall while installing tomte.
@0xArdi I'm getting a linter error failure on an already existing component:
|
Proposed changes
Adds corcel, llama and gemini request tools that support both prediction and completion.
Fixes
N/A
Types of changes
What types of changes does your code introduce? (A breaking change is a fix or feature that would cause existing functionality and APIs to not work as expected.)
Put an x in the box that applies.

Checklist
Put an x in the boxes that apply.
… the main branch (left side). Also you should start your branch off our main.

Further comments
N/A