Llama 3.2-Vision Implementation #1160

Open
mktexan wants to merge 9 commits into main

Conversation

@mktexan mktexan commented Nov 9, 2024

This is a work in progress; I simply got it working. The code will need to be refactored back to best practices once the Llama model functions properly and handles all requests made of it. Here are some things I have noted so far:

  1. Llama is successfully called through the current LLM structure in the app.
  2. The Llama 3.2 Vision model does not work with most of the current prompts that work with more capable models like GPT, so most of them will need to be rewritten.
  3. Llama3.2-Vision 90b testing will come after I am done with 11b.
  4. The models do not have a large enough context window for these prompts: the limit is 2048 tokens, but some prompts are over 3 million. (One way to raise Ollama's window per request is sketched just below this list.)
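
One note on item 4: Ollama defaults to a 2048-token context window unless the request overrides it, which matches the limit described above. Below is a minimal sketch of raising the window per request through the `options.num_ctx` field of Ollama's `/api/chat` endpoint; the 16384-token window and the model tag are illustrative values, not settings from this PR.

```python
import requests

# Sketch: request a chat completion from a local Ollama server with a larger
# context window. "num_ctx" overrides Ollama's 2048-token default for this call.
response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2-vision",  # illustrative model tag
        "messages": [{"role": "user", "content": "Describe the attached page."}],
        "stream": False,
        "options": {"num_ctx": 16384},  # raise the context window
    },
    timeout=300,
)
print(response.json()["message"]["content"])
```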

Important

Integrate Llama 3.2-Vision model by adding handlers, configurations, and environment setups while disabling other providers.

  • Llama Integration:
    • Added llama_handler in llama_handler.py to handle Llama 3.2-Vision API requests (a rough sketch of such a handler appears after this description).
    • Updated api_handler_factory.py to register llama_handler if Llama is enabled.
    • Modified config_registry.py to register Llama configuration with LLMConfigRegistry.
  • Configuration:
    • Updated config.py and settings_manager.py to include Llama-specific settings and disable other providers.
    • Added Llama environment variables to .env.example.
  • Docker and Setup:
    • Modified Dockerfile and docker-compose.yml to support Llama setup.
    • Updated setup.sh to configure Llama during setup process.
  • Utilities:
    • Enhanced llm_messages_builder in utils.py to handle Llama-specific message formatting.
    • Updated parse_api_response in utils.py to parse Llama responses correctly.

This description was created by Ellipsis for 6c646b5. It will automatically update as commits are pushed.
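
For orientation, a minimal Ollama-backed handler along the lines described above might look like the sketch below. This is not the PR's actual code: httpx, the function signature, and the hardcoded endpoint (which the review below suggests making configurable) are assumptions for illustration.

```python
import httpx  # assumption: an async HTTP client; the PR may use a different one

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # hardcoded, as the review notes


async def llama_handler(messages: list[dict], model: str = "llama3.2-vision") -> str:
    """Send chat-formatted messages to a local Ollama server and return the reply text.

    Each message follows Ollama's schema: {"role": ..., "content": ..., "images": [...]}.
    """
    payload = {"model": model, "messages": messages, "stream": False}
    async with httpx.AsyncClient(timeout=600) as client:
        response = await client.post(OLLAMA_CHAT_URL, json=payload)
        response.raise_for_status()
        return response.json()["message"]["content"]
```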

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


❌ Changes requested. Reviewed everything up to 6c646b5 in 1 minute and 7 seconds

More details
  • Looked at 1058 lines of code in 11 files
  • Skipped 0 files when reviewing.
  • Skipped posting 6 drafted comments based on config settings.
1. skyvern/forge/sdk/api/llm/utils.py:17
  • Draft comment:
    The add_assistant_prefix parameter is not used for Llama models in this function. Consider removing it to avoid confusion.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The llm_messages_builder function in utils.py has a redundant parameter add_assistant_prefix for Llama models, as it is not used in the Llama-specific logic. This can be misleading and should be removed for clarity.
2. skyvern/forge/sdk/api/llm/utils.py:96
  • Draft comment:
    The add_assistant_prefix parameter is not used for Llama models in this function. Consider removing it to avoid confusion.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The llm_messages_builder function in utils.py has a redundant parameter add_assistant_prefix for Llama models, as it is not used in the Llama-specific logic. This can be misleading and should be removed for clarity.
3. skyvern/forge/sdk/api/llm/api_handler_factory.py:101
  • Draft comment:
    The add_assistant_prefix parameter is not used for Llama models in llm_messages_builder. Consider removing it to avoid confusion.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The llm_messages_builder function in api_handler_factory.py is called with add_assistant_prefix for Llama models, but this parameter is not used in the Llama-specific logic. This can be misleading and should be removed for clarity.
4. skyvern/forge/sdk/api/llm/api_handler_factory.py:159
  • Draft comment:
    The add_assistant_prefix parameter is not used for Llama models in llm_messages_builder. Consider removing it to avoid confusion.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The llm_messages_builder function in api_handler_factory.py is called with add_assistant_prefix for Llama models, but this parameter is not used in the Llama-specific logic. This can be misleading and should be removed for clarity.
5. skyvern/forge/sdk/api/llm/llama_handler.py:25
  • Draft comment:
    Consider making the API endpoint URL configurable instead of hardcoding it to http://localhost:11434/api/chat.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The llama_handler function in llama_handler.py uses a hardcoded URL for the API endpoint. This should be configurable to allow flexibility in different environments.
6. skyvern/__init__.py:23
  • Draft comment:
    The llama_handler function is not implemented. Ensure this is completed to provide the necessary functionality.
  • Reason this comment was not posted:
    Comment did not seem useful.

Workflow ID: wflow_NRvLU0214y5uG6Ns


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.
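
On draft comment 5 (the hardcoded endpoint), one conventional fix is to read the base URL from the environment with a localhost fallback. The variable name LLAMA_API_BASE is hypothetical; the PR's .env.example may use a different key.

```python
import os

# Hypothetical setting: fall back to the local Ollama default when the variable is unset.
LLAMA_API_BASE = os.getenv("LLAMA_API_BASE", "http://localhost:11434")
LLAMA_CHAT_ENDPOINT = f"{LLAMA_API_BASE}/api/chat"
```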

@@ -11,3 +13,12 @@
},
)
setup_logger()

async def llama_handler(
Contributor


The new llama_handler function appears to duplicate existing functionality in skyvern/forge/sdk/api/llm/llama_handler.py. Consider reusing or extending the existing function instead of adding a new one.

@arisylafeta

I've encountered similar problems with context size, otherwise the ollama integration seems to be working solely with some additions to config.py, .env, and config_registry.py. Is there a way to limit the context size sent by the application? (I increased Ollama context limit to 16k but context sometimes explodes to 450k tokens)

@mktexan
Author

mktexan commented Nov 20, 2024

I've encountered similar problems with context size, otherwise the ollama integration seems to be working solely with some additions to config.py, .env, and config_registry.py. Is there a way to limit the context size sent by the application? (I increased Ollama context limit to 16k but context sometimes explodes to 450k tokens)

I have yet to find a way to address that. The prompt length required for some jobs gets quite large using Llama, and I do not know why at this point. Llama also struggles in particular with the extract-action prompt and doesn't like to output the required JSON. Sometimes it will also provide fake element IDs. I'm not sure this model is capable of doing the job, but I am also new to using these LLMs. Any input here would be appreciated.
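
As a generic illustration of "limiting the context size sent by the application" (not something this PR implements), the outgoing messages could be capped to a rough character budget before they are handed to Ollama; characters here are a crude stand-in for tokens.

```python
def cap_message_budget(messages: list[dict], max_chars: int = 48_000) -> list[dict]:
    """Roughly cap total prompt size before sending it to a small-context model.

    Generic sketch only: walks newest-first so recent context survives, truncating
    older content once the character budget (a crude proxy for tokens) runs out.
    """
    capped: list[dict] = []
    remaining = max_chars
    for message in reversed(messages):
        content = message.get("content", "")
        if len(content) > remaining:
            content = content[-remaining:] if remaining > 0 else ""
        capped.append({**message, "content": content})
        remaining -= len(content)
    return list(reversed(capped))
```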

@DominicTWHV

Ollama does provide other vision models, have you tried llava or the 90B version of llama3.2 vision?

@mktexan
Author

mktexan commented Nov 21, 2024

Ollama does provide other vision models, have you tried llava or the 90B version of llama3.2 vision?

Yes, 90B is just as clueless as 11b sadly. I had the same results.

@DominicTWHV

Yes, 90B is just as clueless as 11b sadly. I had the same results.

That's unfortunate. Could it be a quantization issue? Ollama models usually come in Q4; perhaps that's doing weird things?

@mktexan
Author

mktexan commented Nov 21, 2024

Yes, 90B is just as clueless as 11b sadly. I had the same results.

That's unfortunate, could it be a quantization issue? Ollama models usually comes in Q4, perhaps that's doing weird things?

Honestly, I do not know. I am new to the LLM space. I will look into it. Thanks!

@DominicTWHV

Best of luck!

@rcmorano

what about qwen-vl?

@arisylafeta

@rcmorano Qwen2-VL does not work with Ollama or llama.cpp yet afaik.

@DominicTWHV

@rcmorano Qwen2-VL does not work with Ollama or llama.cpp yet afaik.

Ollama supports inference of custom models through the ‘ollama create’ endpoint. More info on their docs.

@mktexan
Author

mktexan commented Nov 23, 2024

Using Ollama, I keep getting errors like this: level=WARN source=runner.go:126 msg="truncating input prompt" limit=128000 prompt=781154 numKeep=5. That prompt length is insane. Does anyone have any idea what's going on here?

@DominicTWHV

DominicTWHV commented Nov 23, 2024

You exceeded the context length of the model in question.

It's trying to get message history for the last 5 messages, and the total of those exceeded the 128k context length.

@mktexan
Author

mktexan commented Nov 23, 2024

You exceeded the context length of the model in question.

It's trying to get message history for the last 5 messages, and the total of those exceeded the 128k context length.

Right. I understand that. I just don't understand how to fix this or why this happens using any model within Ollama. These prompt lengths are outrageous.

@DominicTWHV

Not sure what Skyvern is doing; can you give an example of what the prompt is like?

@arisylafeta

@rcmorano Qwen2-VL does not work with Ollama or llama.cpp yet afaik.

Ollama supports inference of custom models through the ‘ollama create’ endpoint More info on their docs

that doesn't work for vision models, yet. Each VLM has a different enough architecture to be a pain in the ass. Might need to fork Ollama and implement it myself.

@DominicTWHV

@rcmorano Qwen2-VL does not work with Ollama or llama.cpp yet afaik.

Ollama supports inference of custom models through the ‘ollama create’ endpoint More info on their docs

that doesn't work for vision models, yet. Each VLM has a different enough architecture to be a pain in the ass. Might need to fork Ollama and implement it myself.

Pity, although Ollama is fast evolving. Who knows? Perhaps in a few months.

@mktexan
Author

mktexan commented Nov 26, 2024

I'm thinking of closing this branch down. The goal here should simply be to plug Ollama in. Any rabbit holes in terms of Ollama's capabilities should not be hard-coded here.

@DominicTWHV

DominicTWHV commented Nov 26, 2024

Def drop a new link in here if you open a new PR! Can’t wait to see Ollama powering Skyvern.

4 participants