
Run with VLM #1792

Open
Samjith888 opened this issue Nov 13, 2024 · 11 comments

@Samjith888

Thanks for adding VLM support.

I was using this notebook. I tried Qwen2-VL-7B-Instruct and Llama-3.2-11B-Vision-Instruct, but the script prefixes the models as openai/meta-llama/ and openai/Qwen/, so it also asks for an OpenAI API key. Is there any other way to use these models without going through OpenAI?


@MohammedAlhajji
Contributor

The openai/ prefix is just to let LiteLLM know that this is an OpenAI-compatible endpoint, so it knows how to call it. As the api_key you can give it anything, and it will be accepted as long as there is no authentication set up on your endpoint. For example, because I manage my own deployment, my api_key="fake-key". You just have to put something; it doesn't need to be an actual API key if there's no authentication on the endpoint.
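For example, a minimal configuration along those lines (assuming an unauthenticated OpenAI-compatible endpoint on localhost:8000) would be:

import dspy

# "openai/" tells LiteLLM to speak the OpenAI-compatible protocol; the key is just a placeholder
lm = dspy.LM(model="openai/Qwen/Qwen2-VL-7B-Instruct", api_base="http://localhost:8000/v1", api_key="fake-key")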

@Samjith888
Author

I tried it as mentioned:

import dspy
from dspy.datasets import DataLoader
from dspy.evaluate.metrics import answer_exact_match
from typing import List
from dspy.evaluate import Evaluate

import dotenv
import litellm

litellm.suppress_debug_info = True

dotenv.load_dotenv()

def debug_exact_match(example, pred, trace=None, frac=1.0):
    print(example.inputs())
    print(example.answer)
    print(pred)
    return answer_exact_match(example, pred, trace, frac)




qwen_lm = dspy.LM(model="openai/Qwen/Qwen2-VL-7B-Instruct", api_base="http://localhost:8000/v1", api_key="fake-key", max_tokens=5000)

dspy.settings.configure(lm=qwen_lm)



class DogPictureSignature(dspy.Signature):
    """Answer the question based on the image."""
    image: dspy.Image = dspy.InputField()
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

class DogPicture(dspy.Module):
    def __init__(self) -> None:
        super().__init__()  # needed so dspy.Module can register the sub-module
        self.predictor = dspy.ChainOfThought(DogPictureSignature)

    def __call__(self, **kwargs):
        return self.predictor(**kwargs)

dog_picture = DogPicture()

example = dspy.Example(image=dspy.Image.from_url("https://i.pinimg.com/564x/78/f9/6d/78f96d0314d39a1b8a849005123e166d.jpg"), question="What is the breed of the dog in the image?").with_inputs("image", "question")
print(dog_picture(**example.inputs()))

Getting this error:
[error screenshot]

@MohammedAlhajji
Contributor

Try setting litellm.set_verbose=True and look at the curl command it outputs. Run that command; if it doesn't work, then something is wrong in the configuration, e.g. the api_base may not be correct. Also try a smaller prompt, something like qwen_lm("hi"), just to get a clean curl command you can play with.
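For example, a quick debugging sketch along those lines (reusing the same local endpoint settings as above) might be:

import dspy
import litellm

litellm.set_verbose = True  # log the underlying request LiteLLM sends

qwen_lm = dspy.LM(model="openai/Qwen/Qwen2-VL-7B-Instruct", api_base="http://localhost:8000/v1", api_key="fake-key")
print(qwen_lm("hi"))  # minimal call; if it fails, inspect the logged request/curl command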

@okhat
Collaborator

okhat commented Nov 13, 2024

@Samjith888 Sorry if this is obvious, but have you launched Qwen with vLLM or SGLang first?

See dspy.ai for instructions on launching LMs; they're now on the (new) landing page.
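For example, with vLLM the usual first step is to start an OpenAI-compatible server (the port and model name here are just illustrative):

python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2-VL-7B-Instruct --port 8000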

@danilotpnta

danilotpnta commented Nov 15, 2024

Hi, I am also interested in this. Do I understand correctly that to use local LLMs we need to start a server with SGLang so that we can use it in the dspy.LM module? What about using only vLLM? I am looking at the documentation on "Local LMs on a GPU server" and trying to use "EleutherAI/gpt-j-6B" for prompt optimization. I was loading the model using HFModel but ran into some problems. You can take a look at what I was trying in this notebook, and at using only vLLM in this script.

Thanks for the support!

@Samjith888
Author

@okhat Qwen model works with vLLM. I tested it.

@danilotpnta

danilotpnta commented Nov 15, 2024

@Samjith888 did you use SGLang, or did you launch a vLLM server and use HFClientVLLM?

python -m vllm.entrypoints.openai.api_server --model mosaicml/mpt-7b --port 8000

@Samjith888
Author

No @danilotpnta, I didn't try that.

@okhat
Collaborator

okhat commented Nov 15, 2024

@Samjith888 @danilotpnta Yes, you need to launch SGLang or vLLM (or similar things like TGI).

That's going to resolve the issue. Is there a reason you wouldn't want to do this?

(separately, @danilotpnta , EleutherAI/gpt-j-6B is an extremely undertrained and weak model. I don't think you can get it to do much. Why not use Llama-3 base or instruct, of the same size?)

@danilotpnta

@okhat thanks for the reply!

Indeed, I have started a vLLM server by running:
python -m vllm.entrypoints.openai.api_server --model EleutherAI/gpt-j-6B --port 8000

and I am using this script to compare the outputs of dspy.LM vs dspy.HFClientVLLM. Conceptually, they should give me the same output. However, I am puzzled to find that:

  1. Using dspy.LM, I get more than one query-response generation. I am unsure why this happens, since I initially thought the migration changes were plug and play. You can see it below, where I configure DSPy to use the different LM instances.
log.txt:
-- Questions using new dspy.LM -- 

** New response **
Paris

** New response **
Paris.

** New response **
The 21-year-old made seven appearances for the Hammers and netted his only goal for them in a Europa League qualification round match against Andorran side FC Lustrains last season. Lee had two loan spells in League One last term, with Blackpool and then Colchester United. He scored twice for the U's but was unable to save them from relegation. The length of Lee's contract with the promoted Tykes has not been revealed. Find all the latest football transfers on our dedicated page.
-- Questions using HFClientVLLM -- 
WARNING:root:   *** In DSPy 2.5, all LM clients except `dspy.LM` are deprecated, underperform, and are about to be deleted. ***
              You are using the client HFClientVLLM, which will be removed in DSPy 2.6.
              Changing the client is straightforward and will let you use new features (Adapters) that improve the consistency of LM outputs, especially when using chat LMs. 

              Learn more about the changes and how to migrate at
              https://github.com/stanfordnlp/dspy/blob/main/examples/migration.ipynb

** New response **
Paris

---

Question: What is the capital of France?
Response: Paris

---

Question: What is the capital of France?
Response: Paris

---

Question: What is the capital of France?
Response: Paris

---

Question: What is the capital of France?
Response: Paris

---

Question: What is the capital of France?
Response: Paris

---

Question: What is the capital of France?
Response: Paris

---

Question: What is the capital of France?
Response: Paris

---

Question: What is the capital of France?
Response: Paris

---

** New response **
Paris

---

Question: What is the capital of France?
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: Paris

---

Question: What is the capital of France?
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: Paris

---

Question: What is the capital of France?
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: Paris

---

Question: What is the capital of France?
Reasoning: Let's think

** New response **
Lee is a 21-year-old striker who has scored twice for Colchester United. He has two loan spells in League One last term, with Blackpool and then Colchester United. He scored twice for the U's but was unable to save them from relegation. The length of Lee's contract with the promoted Tykes has not been revealed.

---

Document: The 21-year-old made seven appearances for the Hammers and netted his only goal for them in a Europa League qualification round match against Andorran side FC Lustrains last season. Lee had two loan spells in League One last term, with Blackpool and then Colchester United. He scored twice
  2. From the log.txt output of the summary module, it looks like dspy.HFClientVLLM is actually generating a somewhat coherent answer, while dspy.LM seems to echo the same query back.

It could be some routing issue with LiteLLM, but I can't figure out how to obtain the same behaviour.
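For reference, the comparison in the script is roughly of this shape (a sketch only; the exact HFClientVLLM arguments are assumptions based on the old client's interface):

import dspy

lm_new = dspy.LM(model="openai/EleutherAI/gpt-j-6B", api_base="http://localhost:8000/v1", api_key="fake-key")
lm_old = dspy.HFClientVLLM(model="EleutherAI/gpt-j-6B", port=8000, url="http://localhost")

for lm in (lm_new, lm_old):
    dspy.settings.configure(lm=lm)
    qa = dspy.Predict("question -> answer")  # same simple QA program against each client
    print(qa(question="What is the capital of France?"))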

That said, the reason for using this model is a reproducibility study. Basically, we are trying to improve on the query-generation part of a toolkit called InPars, and we believe DSPy can certainly improve upon its static prompting.

Btw, we recently talked to the folks from Zeta-Alpha (Jakub and the authors of InPars), and I saw an interview they had with you about DSPy. Cool stuff!

@okhat
Collaborator

okhat commented Nov 15, 2024

Thanks for the very nicely presented summary, @danilotpnta! Some comments below.

> initially though the migration changes was plug and play

It's a plug-and-play code change, but the behavior is very different under the hood. Can you show me how you're setting up the client? Here's how I'd set it up if you really think EleutherAI/gpt-j-6B is the right choice, but keep in mind that using DSPy to optimize prompts for a base LM like this (one that wasn't instruction-tuned) is not a very common use case.
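For reference, a setup along those lines might look like this (a sketch only; the model_type="text" choice for a completion-style base model and the local port are assumptions):

gpt_j = dspy.LM(
    model="openai/EleutherAI/gpt-j-6B",
    api_base="http://localhost:8000/v1",
    api_key="fake-key",
    model_type="text",  # assumption: base (non-chat) model, so use a completion-style endpoint
)
dspy.settings.configure(lm=gpt_j)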

What you might need to do is look into how DSPy's Adapters work. These are the components that translate a signature into a prompt, before (or rather, irrespective of) prompt optimization. DSPy 2.5 uses a more chat-like adapter, ChatAdapter, by default, but for a base model the older approach may perhaps be a better fit.
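If you go down that route, the adapter can be set explicitly alongside the LM (ChatAdapter is shown here because it is the 2.5 default; whether a completion-oriented adapter is available depends on your DSPy version):

dspy.settings.configure(lm=gpt_j, adapter=dspy.ChatAdapter())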
