Added "Auto index on upload" option
szczyglis-dev committed Nov 25, 2024
1 parent 1f4b22b commit a65b87c
Showing 20 changed files with 396 additions and 25 deletions.
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,13 @@
# CHANGELOG

## 2.4.31 (2024-11-25)

- Added an `Auto-index on upload` checkbox to the `Attachments` tab:

**Tip:** To use the `Query only` mode, the file must be indexed in the vector database. This happens automatically at upload time if the `Auto-index on upload` option in the `Attachments` tab is enabled. Indexing large files can take a while, so if you use the `Full context` option, which does not use the index, you can disable `Auto-index` to speed up attachment uploads. In that case, the attachment is indexed only when the `Query only` option is first used; until then, it is available through the `Full context` and `Summary` options.

- Added context menu options to the `Uploaded attachments` tab: `Open`, `Open Source directory`, and `Open Storage directory`.

## 2.4.30 (2024-11-25)

- Added an instruction to the model about the mapped data directory in both the legacy and IPython code interpreters.
14 changes: 13 additions & 1 deletion README.md
@@ -356,7 +356,7 @@ This mode in **PyGPT** mirrors `ChatGPT`, allowing you to chat with models such

The main part of the interface is a chat window where conversations appear. Right below that is where you type your messages. On the right side of the screen, there's a section to set up or change your system prompts. You can also save these setups as presets to quickly switch between different models or tasks.

Above where you type your messages, the interface shows the number of tokens your message will use as you type it, which helps you keep track of usage. There's also a feature to upload files in this area. Go to the `Files` tab to manage your uploads or add attachments to send to the OpenAI API (this only takes effect in `Assistant` and `Vision` modes).
Above where you type your messages, the interface shows the number of tokens your message will use as you type it, which helps you keep track of usage. There's also a feature to upload files in this area. Go to the `Attachments` tab to manage your uploads or add attachments to send to the OpenAI API (this only takes effect in `Assistant` and `Vision` modes).

![v2_mode_chat](https://github.com/szczyglis-dev/py-gpt/assets/61396542/f573ee22-8539-4259-b180-f97e54bc0d94)

@@ -1097,6 +1097,10 @@ The content from the uploaded attachments will be used in the current conversati

**Note:** Only text files from the list above are included in the additional context. Images will not be included in the context but will be used by the vision model in real-time. Adding image files to the context will be available in future versions.

**Uploading larger files and auto-index:**

To use the `Query only` mode, the file must be indexed in the vector database. This happens automatically at upload time if the `Auto-index on upload` option in the `Attachments` tab is enabled. Indexing large files can take a while, so if you use the `Full context` option, which does not use the index, you can disable `Auto-index` to speed up attachment uploads. In that case, the attachment is indexed only when the `Query only` option is first used; until then, it is available through the `Full context` and `Summary` options.
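
The eager-versus-lazy indexing behavior described above can be modelled in a few lines. The sketch below is a simplified, hypothetical illustration rather than PyGPT's implementation: `Attachment`, `AttachmentStore`, and the `index_fn` callback are made-up names, the "index" just returns fake document IDs, and `query_only()` uses a substring match in place of a real vector search.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Attachment:
    # Hypothetical record loosely mirroring the "indexed" / "doc_ids" fields.
    name: str
    content: str
    indexed: bool = False
    doc_ids: List[str] = field(default_factory=list)


class AttachmentStore:
    """Illustrative store: eager indexing on upload, lazy indexing on first query."""

    def __init__(self, index_fn: Callable[[str, str], List[str]], auto_index: bool = False):
        self.index_fn = index_fn      # hypothetical: (name, content) -> document IDs
        self.auto_index = auto_index  # plays the role of the "Auto-index on upload" checkbox
        self.items: Dict[str, Attachment] = {}

    def upload(self, name: str, content: str) -> Attachment:
        item = Attachment(name, content)
        if self.auto_index:
            self._index(item)  # indexing cost is paid at upload time
        self.items[name] = item
        return item

    def full_context(self) -> str:
        # "Full context" needs no index: it just concatenates the raw content.
        return "\n".join(item.content for item in self.items.values())

    def query_only(self, query: str) -> List[str]:
        # "Query only" needs the index, so index anything still pending first.
        for item in self.items.values():
            if not item.indexed:
                self._index(item)
        # Stand-in for a vector search: case-insensitive substring match.
        return [i.name for i in self.items.values() if query.lower() in i.content.lower()]

    def _index(self, item: Attachment) -> None:
        item.doc_ids = self.index_fn(item.name, item.content)
        item.indexed = True


calls: List[str] = []


def fake_index(name: str, content: str) -> List[str]:
    calls.append(name)    # record which files were indexed
    return [name + "#0"]  # pretend document ID


store = AttachmentStore(fake_index, auto_index=False)
store.upload("notes.txt", "Vector databases store embeddings.")
print(store.full_context(), calls)            # content available, nothing indexed yet
print(store.query_only("embeddings"), calls)  # first query triggers indexing
```

With `auto_index=False` the upload returns immediately and the indexing cost moves to the first `query_only()` call; later queries skip files that are already marked as indexed.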

## Files (download, code generation)

**PyGPT** automatically downloads and saves files created by the model. This happens in the background, and the files are saved to a `data` folder in the user's working directory. To view or manage these files, go to the `Files` tab, which provides a file browser for this directory and lets users handle all files sent by the AI.
@@ -3652,6 +3656,14 @@ may consume additional tokens that are not displayed in the main window.

## Recent changes:

**2.4.31 (2024-11-25)**

- Added an `Auto-index on upload` checkbox to the `Attachments` tab:

**Tip:** To use the `Query only` mode, the file must be indexed in the vector database. This happens automatically at upload time if the `Auto-index on upload` option in the `Attachments` tab is enabled. Indexing large files can take a while, so if you use the `Full context` option, which does not use the index, you can disable `Auto-index` to speed up attachment uploads. In that case, the attachment is indexed only when the `Query only` option is first used; until then, it is available through the `Full context` and `Summary` options.

- Added context menu options to the `Uploaded attachments` tab: `Open`, `Open Source directory`, and `Open Storage directory`.

**2.4.30 (2024-11-25)**

- Added an instruction to the model about the mapped data directory in both the legacy and IPython code interpreters.
4 changes: 4 additions & 0 deletions docs/source/attachments.rst
@@ -48,6 +48,10 @@ The content from the uploaded attachments will be used in the current conversati

**Note:** Only text files from the list above are included in the additional context. Images will not be included in the context but will be used by the vision model in real-time. Adding image files to the context will be available in future versions.

**Uploading larger files and auto-index:**

To use the ``Query only`` mode, the file must be indexed in the vector database. This happens automatically at upload time if the ``Auto-index on upload`` option in the ``Attachments`` tab is enabled. Indexing large files can take a while, so if you use the ``Full context`` option, which does not use the index, you can disable ``Auto-index`` to speed up attachment uploads. In that case, the attachment is indexed only when the ``Query only`` option is first used; until then, it is available through the ``Full context`` and ``Summary`` options.


Files (download, code generation)
---------------------------------
8 changes: 8 additions & 0 deletions src/pygpt_net/CHANGELOG.txt
@@ -1,3 +1,11 @@
2.4.31 (2024-11-25)

- Added an `Auto-index on upload` checkbox to the `Attachments` tab:

Tip: To use the `Query only` mode, the file must be indexed in the vector database. This happens automatically at upload time if the `Auto-index on upload` option in the `Attachments` tab is enabled. Indexing large files can take a while, so if you use the `Full context` option, which does not use the index, you can disable `Auto-index` to speed up attachment uploads. In that case, the attachment is indexed only when the `Query only` option is first used; until then, it is available through the `Full context` and `Summary` options.

- Added context menu options to the `Uploaded attachments` tab: `Open`, `Open Source directory`, and `Open Storage directory`.

2.4.30 (2024-11-25)

- Added an instruction to the model about the mapped data directory in both the legacy and IPython code interpreters.
15 changes: 15 additions & 0 deletions src/pygpt_net/controller/attachment.py
@@ -51,6 +51,13 @@ def setup(self):
        else:
            self.window.ui.nodes['attachments.capture_clear'].setChecked(False)

        # auto-index
        if self.window.core.config.has('attachments_auto_index') \
                and self.window.core.config.get('attachments_auto_index'):
            self.window.ui.nodes['attachments.auto_index'].setChecked(True)
        else:
            self.window.ui.nodes['attachments.auto_index'].setChecked(False)

        self.window.core.attachments.load()
        self.update()

@@ -411,6 +418,14 @@ def toggle_capture_clear(self, value: bool):
        """
        self.window.core.config.set('attachments_capture_clear', value)

    def toggle_auto_index(self, value: bool):
        """
        Toggle auto index
        :param value: value of the checkbox
        """
        self.window.core.config.set('attachments_auto_index', value)

    def is_capture_clear(self) -> bool:
        """
        Return True if capture clear is enabled
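
The `setup()` guard (`has(...)` and then `get(...)`) and the `get("attachments_auto_index", False)` call used later in `upload()` both treat a missing key as "disabled". The sketch below shows that equivalence and the checkbox round-trip with a minimal, hypothetical dict-backed config; `SimpleConfig`, `checkbox_state`, and `toggle` are illustrative names, and the `has`/`get`/`set` signatures are assumptions modelled on the calls in this diff, not PyGPT's actual config class.

```python
from typing import Any, Dict, Optional


class SimpleConfig:
    """Hypothetical dict-backed config with has/get/set, modelled on the calls above."""

    def __init__(self, data: Optional[Dict[str, Any]] = None):
        self.data: Dict[str, Any] = dict(data or {})

    def has(self, key: str) -> bool:
        return key in self.data

    def get(self, key: str, default: Any = None) -> Any:
        return self.data.get(key, default)

    def set(self, key: str, value: Any) -> None:
        self.data[key] = value


def checkbox_state(config: SimpleConfig, key: str) -> bool:
    # Same logic as setup(): checked only if the key exists and is truthy.
    return bool(config.has(key) and config.get(key))


def toggle(config: SimpleConfig, key: str, value: bool) -> None:
    # Same effect as toggle_auto_index(): store the new checkbox value.
    config.set(key, value)


KEY = "attachments_auto_index"
old_config = SimpleConfig({})  # e.g. a config that does not contain the key yet
print(checkbox_state(old_config, KEY), old_config.get(KEY, False))  # False False
toggle(old_config, KEY, True)
print(checkbox_state(old_config, KEY), old_config.get(KEY, False))  # True True
```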
141 changes: 139 additions & 2 deletions src/pygpt_net/controller/chat/attachment.py
@@ -114,9 +114,12 @@ def upload(self, meta: CtxMeta, mode: str, prompt: str) -> bool:
        :return: True if uploaded
        """
        self.uploaded = False
        auto_index = self.window.core.config.get("attachments_auto_index", False)
        attachments = self.window.core.attachments.get_all(mode, only_files=True)

        if self.is_verbose() and len(attachments) > 0:
            print("\nUploading attachments...\nWork Mode: {}".format(mode))

        for uuid in attachments:
            attachment = attachments[uuid]
            if not self.is_allowed(attachment.path):
@@ -137,7 +140,13 @@ def upload(self, meta: CtxMeta, mode: str, prompt: str) -> bool:
if self.is_allowed(str(path)):
if self.is_verbose():
print("Uploading unpacked from archive: {}".format(path_relative))
item = self.window.core.attachments.context.upload(meta, sub_attachment, prompt)
item = self.window.core.attachments.context.upload(
meta=meta,
attachment=sub_attachment,
prompt=prompt,
real_path=attachment.path,
auto_index=auto_index,
)
if item:
item["path"] = os.path.basename(attachment.path) + "/" + path_relative
item["size"] = os.path.getsize(path)
@@ -149,7 +158,13 @@ def upload(self, meta: CtxMeta, mode: str, prompt: str) -> bool:
attachment.consumed = True
self.window.core.filesystem.packer.remove_tmp(tmp_path) # clean
else:
item = self.window.core.attachments.context.upload(meta, attachment, prompt)
item = self.window.core.attachments.context.upload(
meta=meta,
attachment=attachment,
prompt=prompt,
real_path=attachment.path,
auto_index=auto_index,
)
if item:
if meta.additional_ctx is None:
meta.additional_ctx = []
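
For archives, each unpacked member is uploaded separately: its display `path` becomes `<archive name>/<relative member path>`, while `real_path` is set to the archive itself, so the `Open` and `Open Source directory` actions for such an entry resolve to the archive file and its folder. The hypothetical sketch below reproduces that naming with the standard `zipfile` module; `describe_archive_members` is an illustrative name, and PyGPT's own unpacker (`filesystem.packer`) may handle other formats and details differently.

```python
import os
import tempfile
import zipfile
from typing import Dict, List


def describe_archive_members(archive_path: str) -> List[Dict[str, object]]:
    """Hypothetical helper: unpack a zip and build one record per file member."""
    records: List[Dict[str, object]] = []
    with tempfile.TemporaryDirectory() as tmp_path:
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(tmp_path)
        for root, _dirs, files in os.walk(tmp_path):
            for file in files:
                path = os.path.join(root, file)
                path_relative = os.path.relpath(path, tmp_path)
                records.append({
                    # Display path, built the same way as item["path"] in the diff
                    "path": os.path.basename(archive_path) + "/" + path_relative,
                    "size": os.path.getsize(path),
                    # Points at the archive itself, like real_path=attachment.path
                    "real_path": archive_path,
                })
    # The temporary directory is removed here, mirroring remove_tmp(tmp_path).
    return sorted(records, key=lambda r: str(r["path"]))


with tempfile.TemporaryDirectory() as work:
    archive = os.path.join(work, "docs.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("readme.txt", "hello")
        zf.writestr("sub/notes.txt", "abc")
    for rec in describe_archive_members(archive):
        print(rec["path"], rec["size"])
```

On Windows, `os.path.relpath` uses backslashes, so the member part of the display path would mix separators, just as the string concatenation in the diff would.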
@@ -362,6 +377,128 @@ def clear(self, force: bool = False, remove_local=False, auto: bool = False):
self.window.core.attachments.context.clear(meta, delete_files=remove_local)
self.update_list(meta)

    def select(self, idx: int):
        """
        Select uploaded file
        :param idx: index of file
        """
        pass

    def open_by_idx(self, idx: int):
        """
        Open attachment by index
        :param idx: Index on list
        """
        meta = self.window.core.ctx.get_current_meta()
        if meta is None or meta.additional_ctx is None:
            return
        items = self.window.core.attachments.context.get_all(meta)
        if idx < len(items):
            item = items[idx]
            path = item["path"]
            if "real_path" in item:
                path = item["real_path"]
            if os.path.exists(path) and os.path.isfile(path):
                print("Opening attachment: {}".format(path))
                self.window.controller.files.open(path)

    def open_dir_src_by_idx(self, idx: int):
        """
        Open source directory by index
        :param idx: Index on list
        """
        meta = self.window.core.ctx.get_current_meta()
        if meta is None or meta.additional_ctx is None:
            return
        items = self.window.core.attachments.context.get_all(meta)
        if idx < len(items):
            item = items[idx]
            path = item["path"]
            if "real_path" in item:
                path = item["real_path"]
            dir = os.path.dirname(path)
            if os.path.exists(dir) and os.path.isdir(dir):
                print("Opening source directory: {}".format(dir))
                self.window.controller.files.open(dir)

    def open_dir_dest_by_idx(self, idx: int):
        """
        Open destination directory by index
        :param idx: Index on list
        """
        meta = self.window.core.ctx.get_current_meta()
        if meta is None or meta.additional_ctx is None:
            return
        items = self.window.core.attachments.context.get_all(meta)
        if idx < len(items):
            item = items[idx]
            root_dir = self.window.core.attachments.context.get_dir(meta)
            dir = os.path.join(root_dir, item["uuid"])
            if os.path.exists(dir) and os.path.isdir(dir):
                self.window.controller.files.open(dir)
                print("Opening destination directory: {}".format(dir))

    def has_file_by_idx(self, idx: int) -> bool:
        """
        Check if has file by index
        :param idx: Index on list
        :return: True if has file
        """
        meta = self.window.core.ctx.get_current_meta()
        if meta is None or meta.additional_ctx is None:
            return False
        items = self.window.core.attachments.context.get_all(meta)
        if idx < len(items):
            item = items[idx]
            path = item["path"]
            if "real_path" in item:
                path = item["real_path"]
            return os.path.exists(path) and os.path.isfile(path)
        return False

    def has_src_by_idx(self, idx: int) -> bool:
        """
        Check if has source directory by index
        :param idx: Index on list
        :return: True if has source directory
        """
        meta = self.window.core.ctx.get_current_meta()
        if meta is None or meta.additional_ctx is None:
            return False
        items = self.window.core.attachments.context.get_all(meta)
        if idx < len(items):
            item = items[idx]
            path = item["path"]
            if "real_path" in item:
                path = item["real_path"]
            dir = os.path.dirname(path)
            return os.path.exists(dir) and os.path.isdir(dir)
        return False

    def has_dest_by_idx(self, idx: int) -> bool:
        """
        Check if has destination directory by index
        :param idx: Index on list
        :return: True if has destination directory
        """
        meta = self.window.core.ctx.get_current_meta()
        if meta is None or meta.additional_ctx is None:
            return False
        items = self.window.core.attachments.context.get_all(meta)
        if idx < len(items):
            item = items[idx]
            root_dir = self.window.core.attachments.context.get_dir(meta)
            dir = os.path.join(root_dir, item["uuid"])
            return os.path.exists(dir) and os.path.isdir(dir)
        return False

    @Slot(object)
    def handle_upload_error(self, error: Exception):
        """
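
The six new index-based methods repeat the same lookup: prefer `real_path` over `path` for the file and its source directory, and use `<context dir>/<uuid>` for the stored copy. A hypothetical helper that centralizes this is sketched below; `resolve_attachment_paths` and its signature are illustrative, not part of PyGPT. It also rejects negative indexes: assuming `get_all()` returns a list, a negative `idx` would pass the `idx < len(items)` checks above and select an item counted from the end.

```python
import os
from typing import Dict, List, Optional, Tuple


def resolve_attachment_paths(
    items: List[Dict[str, str]], idx: int, root_dir: str
) -> Optional[Tuple[str, str, str]]:
    """Hypothetical helper: return (file_path, source_dir, storage_dir) or None."""
    if not 0 <= idx < len(items):
        return None  # out of range, including negative indexes
    item = items[idx]
    # Prefer the original on-disk location when it was recorded at upload time.
    file_path = item.get("real_path", item["path"])
    source_dir = os.path.dirname(file_path)
    # The stored copy lives in a per-attachment folder named after its UUID.
    storage_dir = os.path.join(root_dir, item["uuid"])
    return file_path, source_dir, storage_dir


items = [{"path": "notes.txt", "real_path": "/home/user/docs/notes.txt", "uuid": "1234"}]
print(resolve_attachment_paths(items, 0, "/data/ctx"))
print(resolve_attachment_paths(items, -1, "/data/ctx"))  # None
```

The `has_*_by_idx` and `open_*_by_idx` methods could then differ only in the existence check (`os.path.isfile` or `os.path.isdir`) and whether they call `self.window.controller.files.open()`.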
55 changes: 46 additions & 9 deletions src/pygpt_net/core/attachments/context.py
@@ -111,6 +111,28 @@ def query_context(self, meta: CtxMeta, query: str) -> str:
        if not os.path.exists(meta_path) or not os.path.isdir(meta_path):
            return ""
        idx_path = os.path.join(self.get_dir(meta), self.dir_index)

        indexed = False
        # index files if not indexed by auto_index
        for i, file in enumerate(meta.additional_ctx):
            if "indexed" not in file or not file["indexed"]:
                file_id = file["uuid"]
                file_idx_path = os.path.join(meta_path, file_id)
                file_path = os.path.join(file_idx_path, file["name"])
                model = None
                doc_ids = self.window.core.idx.indexing.index_attachment(file_path, idx_path, model)
                if self.is_verbose():
                    print("Attachments: indexed. Doc IDs: {}".format(doc_ids))
                file["indexed"] = True
                file["doc_ids"] = doc_ids
                #meta.additional_ctx[i] = file # update meta
                indexed = True

        if indexed:
            # update ctx in DB
            self.window.core.ctx.replace(meta)
            self.window.core.ctx.save(meta.id)

        model = None
        result = self.window.core.idx.chat.query_attachment(query, idx_path, model)

@@ -162,13 +184,22 @@ def summary_context(self, ctx: CtxItem, query: str) -> str:
            print("Attachments: summary received: {}".format(response))
        return response

    def upload(self, meta: CtxMeta, attachment: AttachmentItem, prompt: str) -> dict:
    def upload(
        self,
        meta: CtxMeta,
        attachment: AttachmentItem,
        prompt: str,
        auto_index: bool = False,
        real_path: str = None
    ) -> dict:
        """
        Upload attachment for context
        :param meta: CtxMeta instance
        :param attachment: AttachmentItem instance
        :param prompt: user input prompt
        :param auto_index: auto index
        :param real_path: real path
        :return: Dict with attachment data
        """
        if self.is_verbose():
@@ -185,7 +216,8 @@ def upload(self, meta: CtxMeta, attachment: AttachmentItem, prompt: str) -> dict

        if self.is_verbose():
            print("Attachments: created path: {}".format(meta_path))
            print("Attachments: vector index path: {}".format(index_path))
            if auto_index:
                print("Attachments: vector index path: {}".format(index_path))

        # copy raw file
        raw_path = os.path.join(file_idx_path, name)
@@ -214,24 +246,29 @@ def upload(self, meta: CtxMeta, attachment: AttachmentItem, prompt: str) -> dict
        tokens = self.window.core.tokens.from_str(text)

        # index file to ctx index
        model = None
        doc_ids = self.window.core.idx.indexing.index_attachment(attachment.path, index_path, model)

        if self.is_verbose():
            print("Attachments: indexed. Doc IDs: {}".format(doc_ids))
        doc_ids = []
        if auto_index:
            model = None
            doc_ids = self.window.core.idx.indexing.index_attachment(attachment.path, index_path, model)
            if self.is_verbose():
                print("Attachments: indexed. Doc IDs: {}".format(doc_ids))

        result = {
            "name": name,
            "path": attachment.path,
            "type": "local_file",
            "uuid": str(file_id),
            "doc_ids": doc_ids,
            "indexed": True,
            "content_type": "text",
            "size": os.path.getsize(attachment.path),
            "length": len(text),
            "tokens": tokens,
            "indexed": False,
        }
        if auto_index:
            result["indexed"] = True
            result["doc_ids"] = doc_ids
        if real_path:
            result["real_path"] = real_path

        if self.is_verbose():
            print("Attachments: uploaded: {}".format(result))
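
Taken together, the `upload()` and `query_context()` changes define a small protocol on each `additional_ctx` entry: `upload()` records `"indexed": False`, or `True` plus `doc_ids` when auto-indexing is on, and `query_context()` indexes every pending entry before running the query, persisting the context only if something changed. A hypothetical, dependency-free sketch of that pending-index pass follows; `index_pending` and `index_fn` are illustrative names, whereas the real code calls `index_attachment()` and saves through `ctx.replace()` and `ctx.save()`. Because each entry is a dict mutated in place, reassigning the list element (the commented-out `meta.additional_ctx[i] = file` line) is not needed.

```python
import os
from typing import Callable, Dict, List


def index_pending(
    entries: List[Dict[str, object]],
    ctx_dir: str,
    index_fn: Callable[[str], List[str]],
) -> bool:
    """Hypothetical: index every entry not yet marked as indexed; True if any changed."""
    changed = False
    for entry in entries:
        if entry.get("indexed"):
            continue  # already indexed at upload time or by an earlier query
        # Stored copy: <ctx_dir>/<uuid>/<name>, as built in query_context()
        file_path = os.path.join(ctx_dir, str(entry["uuid"]), str(entry["name"]))
        entry["doc_ids"] = index_fn(file_path)
        entry["indexed"] = True  # dicts are shared with the list, so this updates it
        changed = True
    return changed  # the caller persists the context only when this is True


entries: List[Dict[str, object]] = [
    {"uuid": "u1", "name": "a.txt", "indexed": True, "doc_ids": ["d0"]},
    {"uuid": "u2", "name": "b.txt", "indexed": False},
]
if index_pending(entries, "/ctx", lambda p: ["doc-for-" + os.path.basename(p)]):
    print("save context")
print(entries[1]["doc_ids"])  # ['doc-for-b.txt']
```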
1 change: 1 addition & 0 deletions src/pygpt_net/data/config/config.json
@@ -65,6 +65,7 @@
"assistant": "",
"assistant_thread": "",
"assistant.store.hide_threads": true,
"attachments_auto_index": false,
"attachments_send_clear": true,
"attachments_capture_clear": true,
"audio.transcribe.convert_video": true,
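
The new key ships in the bundled `config.json` with a default of `false`. One common way to make a newly added default visible to an existing user config is to overlay the user's saved values on the bundled defaults, so missing keys fall back to the shipped value. The sketch below assumes that approach purely for illustration; `load_with_defaults` is a hypothetical name, and this is not necessarily how PyGPT migrates user configs.

```python
import json
from typing import Any, Dict


def load_with_defaults(defaults_json: str, user_json: str) -> Dict[str, Any]:
    """Hypothetical: shallow-merge saved user settings over bundled defaults."""
    merged: Dict[str, Any] = json.loads(defaults_json)
    merged.update(json.loads(user_json))  # user values win; missing keys keep defaults
    return merged


BUNDLED = '{"attachments_auto_index": false, "attachments_send_clear": true}'
OLD_USER = '{"attachments_send_clear": false}'  # saved before the new key existed

config = load_with_defaults(BUNDLED, OLD_USER)
print(config["attachments_auto_index"], config["attachments_send_clear"])  # False False
```

A shallow `dict.update()` is enough for flat keys like these; nested sections would need a recursive merge.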