Run with VLM #1792
Comments
The `openai/` prefix is just to let LiteLLM know that this is an OpenAI-compatible endpoint, so it knows how to call it. As an `api_key` you can give it anything, and it will be accepted if there is no authentication set up on your endpoint. For example, because I managed my own deployment, my …
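To make that concrete, here is a minimal sketch of what such a setup could look like; the endpoint URL, port, model name, and placeholder key below are assumptions for illustration, not taken from the comment above:

```python
import dspy

# The "openai/" prefix only tells LiteLLM that the endpoint speaks the
# OpenAI-compatible API; it does not require an OpenAI account or key.
# Assumed: a vLLM or SGLang server running locally at this address.
lm = dspy.LM(
    "openai/Qwen/Qwen2-VL-7B-Instruct",
    api_base="http://localhost:8000/v1",
    api_key="local",  # any non-empty string works if the endpoint has no auth
)
```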
I tried as mentioned, …
Try setting …
@Samjith888 Sorry if this is obvious, but have you launched Qwen with vLLM or SGLang first? See dspy.ai for instructions on launching LMs. It's now on the landing page (new).
Hi, I am also interested in this. Do I understand correctly that to use local LLMs we need to create a server with SGLang so that we can use it in the `dspy.LM` module? What about using only vLLM? I am looking at the documentation on "Local LMs on a GPU server", and I am trying to use `EleutherAI/gpt-j-6B` for prompt optimization. I was loading the model using `HFModel` but ran into some problems. You can take a look at what I was trying in this notebook, and at using only vLLM in this script. Thanks for the support!
@okhat The Qwen model works with vLLM. I tested it.
@Samjith888 Did you use SGLang, or did you launch the vLLM server using `HFClientVLLM`?
No, @danilotpnta, I didn't try.
@Samjith888 @danilotpnta Yes, you need to launch SGLang or vLLM (or something similar like TGI). That will resolve the issue. Is there a reason you wouldn't want to do this? (Separately, @danilotpnta, EleutherAI/gpt-j-6B is an extremely undertrained and weak model. I don't think you can get it to do much. Why not use a Llama-3 base or instruct model of the same size?)
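For reference, a rough end-to-end sketch of the flow being suggested here; it assumes an OpenAI-compatible server (vLLM, SGLang, or TGI) has already been launched separately, and the model name and URL are placeholders rather than values from this thread:

```python
import dspy

# Assumes a local OpenAI-compatible server (e.g. vLLM or SGLang) is already
# running and serving the model below; adjust the URL and model to your setup.
lm = dspy.LM(
    "openai/meta-llama/Meta-Llama-3-8B-Instruct",
    api_base="http://localhost:8000/v1",
    api_key="EMPTY",
)
dspy.configure(lm=lm)

# A trivial signature, just to confirm the connection works end to end.
qa = dspy.Predict("question -> answer")
print(qa(question="What is the capital of France?").answer)
```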
@okhat Thanks for the reply! Indeed, I have opened a client using vLLM by running …, and I am using this script to compare the outputs when using …
(Attached: log.txt)
It could be some routing issue with LiteLLM, but I can't seem to figure out how to obtain the same behaviour. That said, the reason to use this model is a reproducibility study: we are trying to improve on the query-generation part of a toolkit called InPars, and we believe DSPy can certainly improve upon its static prompting. Btw, we recently talked to the folks from Zeta-Alpha (Jakub and the authors of InPars), and I saw an interview they had with you about DSPy. Cool stuff!
Thanks for the very nicely presented summary, @danilotpnta! Some comments below.
It's a plug-and-play code change, but the behavior is very different under the hood. Can you show me how you're setting up the client? Here's how I'd set it up if you really think … What you might need to do is look into how DSPy's Adapters work. These are the components that translate a signature into a prompt, before (or rather, irrespective of) prompt optimization. DSPy 2.5 uses a more chat-like adapter ("ChatAdapter") by default, but for a base model, the older approach may perhaps be a better fit.
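As a rough sketch of where the Adapter fits in (the specific classes and arguments below follow DSPy 2.5's public API as I understand it, so treat them as assumptions rather than the exact code from this thread):

```python
import dspy

# Assumed: a local OpenAI-compatible endpoint serving a *base* (non-chat) model.
lm = dspy.LM(
    "openai/EleutherAI/gpt-j-6B",
    api_base="http://localhost:8000/v1",
    api_key="EMPTY",
)

# DSPy formats every signature through an Adapter before any prompt
# optimization happens; ChatAdapter is the default in 2.5. Swapping the
# adapter is the lever for changing how the raw prompt is rendered,
# which matters most for base (non-instruct) models.
dspy.configure(lm=lm, adapter=dspy.ChatAdapter())
```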
Thanks for adding support for VLMs. I was using this notebook. I tried with `Qwen2-VL-7B-Instruct` and `Llama-3.2-11B-Vision-Instruct`, but in the script the model names are prefixed with `openai/meta-llama/` and `openai/Qwen/`, so it asks for OpenAI's API key too. Is there any other way to use these models without using OpenAI?