
LLamaIndex Integration #12

Open · wants to merge 6 commits into main

Conversation

@gallegi gallegi commented Nov 22, 2024

  • LlamaIndex integration using workflows (see the sketch after this list)
  • Continue conversations (ask follow-up questions)
  • RAG pipeline: allows dropping in multiple PDF, DOC, and TXT files and asking related questions
  • Fix text cut-off in the chat box
  • Display Markdown in the chat box
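
For reference, a minimal sketch of the llama_index workflow pattern this PR builds on (llama-index-core 0.12.x style; ChatWorkflow and SetupEvent are hypothetical names here, the PR defines its own events and steps):

from llama_index.core.workflow import (
    Context,
    Event,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)

class SetupEvent(Event):
    pass

class ChatWorkflow(Workflow):
    @step
    async def setup(self, ctx: Context, ev: StartEvent) -> SetupEvent:
        # Store frequently used variables on the shared context.
        await ctx.set("query", ev.get("query", ""))
        return SetupEvent()

    @step
    async def answer(self, ctx: Context, ev: SetupEvent) -> StopEvent:
        query = await ctx.get("query")
        # Retrieval and the LLM call would go here in the real pipeline.
        return StopEvent(result=f"Answer for: {query}")

# Usage: result = await ChatWorkflow(timeout=60).run(query="hello")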

self.node_processor = SimilarityPostprocessor(similarity_cutoff=0.3)
self.llm = llm

def udpate_index(self, files: Optional[Set[str]] = set()):
Member

Should it be update_index?
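
If so, a possible corrected signature, which also avoids the mutable default (a sketch, not the PR's code):

from typing import Optional, Set

def update_index(self, files: Optional[Set[str]] = None):
    # Normalize None to an empty set instead of sharing a mutable default.
    files = files if files is not None else set()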


@step
async def setup(self, ctx: Context, ev: StartEvent) -> SetupEvent:
# set frequetly used variables to context
Member

frequently?

messages=[{"role": "user", "content": message}], stream=stream
)

import asyncio
Member

Move import to top?

@@ -9,21 +10,31 @@ class ProcessingThread(QThread):
update_signal = pyqtSignal(str)
finished_signal = pyqtSignal()

-def __init__(self, model, prompt, image=None):
+def __init__(self, model, prompt, lookup_files=set(), image=None):
Member

Using set() as a default parameter value can be dangerous because it creates a mutable default argument, which is a common Python pitfall. The same set object will be shared across all instances of the class. Instead, use None and create the set inside the method:

def __init__(self, model, prompt, lookup_files=None, image=None):
    self.lookup_files = set() if lookup_files is None else lookup_files
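
To illustrate the pitfall with a hypothetical helper (not from this PR): the default set is created once, at function definition time, so state leaks across calls:

def track(event, seen=set()):
    seen.add(event)
    return seen

track("a")  # {'a'}
track("b")  # {'a', 'b'} <- 'a' leaked in from the previous call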

document_icon = "llama_assistant/resources/document_icon.png"

# for RAG pipeline
embed_model_name = "BAAI/bge-base-en-v1.5"
Member

Add a TODO: Make it configurable next time.
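
As a sketch of what "configurable" could look like (assuming the project keeps using llama_index's HuggingFaceEmbedding; the settings dict is hypothetical):

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# TODO: Make the embedding model configurable.
DEFAULT_EMBED_MODEL = "BAAI/bge-base-en-v1.5"

def load_embed_model(settings: dict) -> HuggingFaceEmbedding:
    # Fall back to the current hard-coded default when no setting is given.
    model_name = settings.get("embed_model_name", DEFAULT_EMBED_MODEL)
    return HuggingFaceEmbedding(model_name=model_name)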

@@ -7,4 +8,8 @@ huggingface_hub==0.25.1
openwakeword==0.6.0
pyinstaller==6.10.0
ffmpeg-python==0.2.0
llama-index-core==0.12.0
Member

Add the new requirements to pyproject.toml so the package is installable from PyPI.

@vietanhdev
Member

Error due to missing package:

 File "/opt/homebrew/Caskroom/miniforge/base/envs/la/lib/python3.11/site-packages/llama_index/core/readers/file/base.py", line 67, in _try_loading_included_file_formats
    raise ImportError("`llama-index-readers-file` package not found")
ImportError: `llama-index-readers-file` package not found
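
(Likely resolved by installing llama-index-readers-file, the package named in the error, and adding it to the requirements.)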

@vietanhdev
Member

vietanhdev commented Nov 23, 2024

Error while inference:

 File "/opt/homebrew/Caskroom/miniforge/base/envs/la/lib/python3.11/site-packages/llama_cpp/llama_chat_format.py", line 289, in _convert_text_completion_chunks_to_chat
    for i, chunk in enumerate(chunks):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/la/lib/python3.11/site-packages/llama_cpp/llama.py", line 1269, in _create_completion
    raise ValueError(
ValueError: Requested tokens (2387) exceed context window of 2048

File:
sockets.txt

Question:

Is socket supported in AnyLearning?

Fix:

This works after updating all context lengths (2048) to 4096. The answer was based on the content of the text file. Good job!

TODO: Make it configurable.
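
For reference, a sketch of how the larger context length would be passed with llama-cpp-python (n_ctx is that library's parameter; the model path is a placeholder):

from llama_cpp import Llama

# TODO: Make the context length configurable instead of hard-coding it.
llm = Llama(model_path="path/to/model.gguf", n_ctx=4096)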

@vietanhdev
Member

Implementing #13.

@vietanhdev vietanhdev changed the title from "Update features" to "LLamaIndex Integration" on Nov 23, 2024