Llama 3.2-Vision Implementation #1160

Open
mktexan wants to merge 9 commits into main

Conversation

@mktexan mktexan commented Nov 9, 2024

This is a work in progress; I simply got it working. The code will need to be refactored back to best practices once the Llama model functions properly and handles all requests made of it. Here are some things I have noted so far:

  1. Llama is successfully called through the current LLM structure in the app.
  2. The Llama 3.2 Vision model does not work with most of the current prompts that work with more capable models like GPT, so most of them will need to be rewritten.
  3. Llama3.2-Vision 90b testing will come after I am done with 11b.
  4. The models do not have a large enough context window for these prompts: the limit is 2048 tokens, but some prompts are over 3 million. (One way to raise Ollama's window per request is sketched just below this list.)
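
One note on item 4: Ollama defaults to a 2048-token context window unless the request overrides it, which matches the limit described above. Below is a minimal sketch of raising the window per request through the `options.num_ctx` field of Ollama's `/api/chat` endpoint; the 16384-token window and the model tag are illustrative values, not settings from this PR.

```python
import requests

# Sketch: request a chat completion from a local Ollama server with a larger
# context window. "num_ctx" overrides Ollama's 2048-token default for this call.
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2-vision",  # illustrative model tag
        "messages": [{"role": "user", "content": "Describe the attached page."}],
        "stream": False,
        "options": {"num_ctx": 16384},  # raise the context window
    },
    timeout=300,
)
print(response.json()["message"]["content"])
```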

Important

Integrate Llama 3.2-Vision model by adding handlers, configurations, and environment setups while disabling other providers.

  • Llama Integration:
    • Added llama_handler in llama_handler.py to handle Llama 3.2-Vision API requests (a rough sketch of such a handler appears after this description).
    • Updated api_handler_factory.py to register llama_handler if Llama is enabled.
    • Modified config_registry.py to register Llama configuration with LLMConfigRegistry.
  • Configuration:
    • Updated config.py and settings_manager.py to include Llama-specific settings and disable other providers.
    • Added Llama environment variables to .env.example.
  • Docker and Setup:
    • Modified Dockerfile and docker-compose.yml to support Llama setup.
    • Updated setup.sh to configure Llama during setup process.
  • Utilities:
    • Enhanced llm_messages_builder in utils.py to handle Llama-specific message formatting.
    • Updated parse_api_response in utils.py to parse Llama responses correctly.

This description was created by Ellipsis for 6c646b5. It will automatically update as commits are pushed.
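
For orientation, a minimal Ollama-backed handler along the lines described above might look like the sketch below. This is not the PR's actual code: httpx, the function signature, and the hardcoded endpoint (which the review below suggests making configurable) are assumptions for illustration.

```python
import httpx  # assumption: an async HTTP client; the PR may use a different one

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # hardcoded, as the review notes


async def llama_handler(messages: list[dict], model: str = "llama3.2-vision") -> str:
    """Send chat-formatted messages to a local Ollama server and return the reply text.

    Each message follows Ollama's schema: {"role": ..., "content": ..., "images": [...]}.
    """
    payload = {"model": model, "messages": messages, "stream": False}
    async with httpx.AsyncClient(timeout=600) as client:
        response = await client.post(OLLAMA_CHAT_URL, json=payload)
        response.raise_for_status()
        return response.json()["message"]["content"]
```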

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


❌ Changes requested. Reviewed everything up to 6c646b5 in 1 minute and 7 seconds

More details
  • Looked at 1058 lines of code in 11 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 drafted comments based on config settings.
1. skyvern/forge/sdk/api/llm/utils.py:17
  • Draft comment:
    The add_assistant_prefix parameter is not used for Llama models in this function. Consider removing it to avoid confusion.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The llm_messages_builder function in utils.py has a redundant parameter add_assistant_prefix for Llama models, as it is not used in the Llama-specific logic. This can be misleading and should be removed for clarity.
2. skyvern/forge/sdk/api/llm/utils.py:96
  • Draft comment:
    The add_assistant_prefix parameter is not used for Llama models in this function. Consider removing it to avoid confusion.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The llm_messages_builder function in utils.py has a redundant parameter add_assistant_prefix for Llama models, as it is not used in the Llama-specific logic. This can be misleading and should be removed for clarity.
3. skyvern/forge/sdk/api/llm/api_handler_factory.py:101
  • Draft comment:
    The add_assistant_prefix parameter is not used for Llama models in llm_messages_builder. Consider removing it to avoid confusion.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The llm_messages_builder function in api_handler_factory.py is called with add_assistant_prefix for Llama models, but this parameter is not used in the Llama-specific logic. This can be misleading and should be removed for clarity.
4. skyvern/forge/sdk/api/llm/api_handler_factory.py:159
  • Draft comment:
    The add_assistant_prefix parameter is not used for Llama models in llm_messages_builder. Consider removing it to avoid confusion.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The llm_messages_builder function in api_handler_factory.py is called with add_assistant_prefix for Llama models, but this parameter is not used in the Llama-specific logic. This can be misleading and should be removed for clarity.
5. skyvern/forge/sdk/api/llm/llama_handler.py:25
  • Draft comment:
    Consider making the API endpoint URL configurable instead of hardcoding it to http://localhost:11434/api/chat.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The llama_handler function in llama_handler.py uses a hardcoded URL for the API endpoint. This should be configurable to allow flexibility in different environments.
6. skyvern/__init__.py:23
  • Draft comment:
    The llama_handler function is not implemented. Ensure this is completed to provide the necessary functionality.
  • Reason this comment was not posted:
    Comment did not seem useful.

Workflow ID: wflow_NRvLU0214y5uG6Ns


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.
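
On draft comment 5 (the hardcoded endpoint), one conventional fix is to read the base URL from the environment with a localhost fallback. The variable name LLAMA_API_BASE is hypothetical; the PR's .env.example may use a different key.

```python
import os

# Hypothetical setting: fall back to the local Ollama default when the variable is unset.
LLAMA_API_BASE = os.getenv("LLAMA_API_BASE", "http://localhost:11434")
LLAMA_CHAT_ENDPOINT = f"{LLAMA_API_BASE}/api/chat"
```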

@@ -11,3 +13,12 @@
},
)
setup_logger()

async def llama_handler(
Contributor


The new llama_handler function appears to duplicate existing functionality in skyvern/forge/sdk/api/llm/llama_handler.py. Consider reusing or extending the existing function instead of adding a new one.

@arisylafeta

I've encountered similar problems with context size, otherwise the ollama integration seems to be working solely with some additions to config.py, .env, and config_registry.py. Is there a way to limit the context size sent by the application? (I increased Ollama context limit to 16k but context sometimes explodes to 450k tokens)

@mktexan
Author

mktexan commented Nov 20, 2024

I've encountered similar problems with context size, otherwise the ollama integration seems to be working solely with some additions to config.py, .env, and config_registry.py. Is there a way to limit the context size sent by the application? (I increased Ollama context limit to 16k but context sometimes explodes to 450k tokens)

I have yet to find a way to address that. The prompt length required for some jobs gets quite large using Llama, and I do not know why at this point. Llama also struggles in particular with the extract-action prompt and doesn't like to output the required JSON. Sometimes it will also provide fake element IDs. I'm not sure this model is capable of doing the job, but I am also new to using these LLMs. Any input here would be appreciated.
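
As a generic illustration of "limiting the context size sent by the application" (not something this PR implements), the outgoing messages could be capped to a rough character budget before they are handed to Ollama; characters here are a crude stand-in for tokens.

```python
def cap_message_budget(messages: list[dict], max_chars: int = 48_000) -> list[dict]:
    """Roughly cap total prompt size before sending it to a small-context model.

    Generic sketch only: walks newest-first so recent context survives, truncating
    older content once the character budget (a crude proxy for tokens) runs out.
    """
    capped: list[dict] = []
    remaining = max_chars
    for message in reversed(messages):
        content = message.get("content", "")
        if len(content) > remaining:
            content = content[-remaining:] if remaining > 0 else ""
        capped.append({**message, "content": content})
        remaining -= len(content)
    return list(reversed(capped))
```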

@DominicTWHV

Ollama does provide other vision models, have you tried llava or the 90B version of llama3.2 vision?

@mktexan
Author

mktexan commented Nov 21, 2024

Ollama does provide other vision models, have you tried llava or the 90B version of llama3.2 vision?

Yes, 90B is just as clueless as 11b sadly. I had the same results.

@DominicTWHV

Yes, 90B is just as clueless as 11b sadly. I had the same results.

That's unfortunate. Could it be a quantization issue? Ollama models usually come in Q4; perhaps that's doing weird things?

@mktexan
Author

mktexan commented Nov 21, 2024

Yes, 90B is just as clueless as 11b sadly. I had the same results.

That's unfortunate, could it be a quantization issue? Ollama models usually comes in Q4, perhaps that's doing weird things?

Honestly, I do not know. I am new to the LLM space. I will look into it. Thanks!

@DominicTWHV

Best of luck!

@rcmorano

what about qwen-vl?

@arisylafeta

@rcmorano Qwen2-VL does not work with Ollama or llama.cpp yet afaik.

@DominicTWHV

@rcmorano Qwen2-VL does not work with Ollama or llama.cpp yet afaik.

Ollama supports inference of custom models through the ‘ollama create’ endpoint. More info on their docs.

@mktexan
Author

mktexan commented Nov 23, 2024

Using Ollama, I keep getting errors like this: level=WARN source=runner.go:126 msg="truncating input prompt" limit=128000 prompt=781154 numKeep=5. That prompt length is insane. Does anyone have any idea what's going on here?

@DominicTWHV

DominicTWHV commented Nov 23, 2024

You exceeded the context length of the model in question.

It's trying to get message history for the last 5 messages, and the total of those exceeded the 128k context length.

@mktexan
Author

mktexan commented Nov 23, 2024

You exceeded the context length of the model in question.

It's trying to get message history for the last 5 messages, and the total of those exceeded the 128k context length.

Right. I understand that. I just don't understand how to fix this or why this happens using any model within Ollama. These prompt lengths are outrageous.

@DominicTWHV

Not sure what Skyvern is doing; can you give an example of what the prompt is like?

@arisylafeta

@rcmorano Qwen2-VL does not work with Ollama or llama.cpp yet afaik.

Ollama supports inference of custom models through the ‘ollama create’ endpoint More info on their docs

that doesn't work for vision models, yet. Each VLM has a different enough architecture to be a pain in the ass. Might need to fork Ollama and implement it myself.

@DominicTWHV

@rcmorano Qwen2-VL does not work with Ollama or llama.cpp yet afaik.

Ollama supports inference of custom models through the ‘ollama create’ endpoint More info on their docs

that doesn't work for vision models, yet. Each VLM has a different enough architecture to be a pain in the ass. Might need to fork Ollama and implement it myself.

Pity, although Ollama is fast evolving. Who knows? Perhaps in a few months.

@mktexan
Author

mktexan commented Nov 26, 2024

I'm thinking of closing this branch down. The goal here should simply be to plug Ollama in. Any rabbit holes in terms of Ollama's capabilities should not be hard-coded here.

@DominicTWHV

DominicTWHV commented Nov 26, 2024

Def drop a new link in here if you open a new PR! Can’t wait to see Ollama powering Skyvern.

4 participants