Signed-off-by: hmoazzem <moazzem@edgeflare.io>
Showing 10 changed files with 243 additions and 91 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
# Retrieval-Augmented Generation (RAG) on PostgreSQL with PGVector | ||
|
||
Retrieval-Augmented Generation (RAG) is a technique that combines the strengths of large language models (LLMs) with up-to-date, private, internal information to provide more accurate, contextually relevant responses. It works by: | ||
|
||
1. Retrieving relevant information from a knowledge base in response to a query. | ||
2. Augmenting the original query with this retrieved information (context). | ||
3. Using a language model to generate a response based on both the query and the retrieved context. | ||
|
||
<img src="./rag.mmd.svg" alt="Retrieval-Augmented Generation (RAG)" style="max-height: 30rem; width: 100%;"> | ||
|
||
## Key Concepts | ||
|
||
1. **Embeddings**: Fixed-length numeric vectors that represent text or other data, so that semantically similar items lie close together. | ||
2. **Vector Database**: A database optimized for storing and querying vector embeddings (e.g., PostgreSQL with the pgvector extension). | ||
3. **Similarity Search**: Finding the stored vectors closest to a given query vector under a chosen distance metric. | ||
4. **RAG (Retrieval-Augmented Generation)**: A technique that enhances language models by retrieving relevant information from a knowledge base. | ||
|
||
## PGVector Basics | ||
|
||
PGVector is a PostgreSQL extension that adds support for vector operations and similarity search, enabling efficient storage and querying of embeddings. | ||
|
||
### Data Type | ||
- `vector(n)`: Represents an n-dimensional vector. | ||
|
||
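### Similarity Functions | ||
- `<->`: Euclidean (L2) distance | ||
- `<#>`: Negative inner product (negated so that, like the other operators, smaller values mean more similar) | ||
- `<=>`: Cosine distance (1 minus cosine similarity) | ||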
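
To make the three operators concrete, here is a minimal Go sketch of the arithmetic each one corresponds to. The function names are hypothetical (they are not part of pgvector or of this repository), and the code assumes both slices have the same length and, for cosine distance, that neither vector is all zeros.

```go
package main

import "math"

// l2Distance mirrors what `<->` computes: the Euclidean distance between a and b.
// Assumes len(a) == len(b).
func l2Distance(a, b []float64) float64 {
	var sum float64
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// negInnerProduct mirrors `<#>`: the inner product multiplied by -1.
func negInnerProduct(a, b []float64) float64 {
	var dot float64
	for i := range a {
		dot += a[i] * b[i]
	}
	return -dot
}

// cosineDistance mirrors `<=>`: 1 minus the cosine similarity.
// Assumes neither vector has zero length (norm).
func cosineDistance(a, b []float64) float64 {
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	return 1 - dot/(math.Sqrt(normA)*math.Sqrt(normB))
}
```

In all three cases a smaller result means a closer match, which is why queries order by the operator in ascending order.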
|
||
### Creating a Table with Vector Column | ||
|
||
```sql | ||
CREATE TABLE IF NOT EXISTS embeddings ( | ||
id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, | ||
content TEXT, | ||
embedding vector(1536) | ||
); | ||
``` | ||
|
||
### Indexing | ||
|
||
```sql | ||
-- IVFFlat index (faster build and less memory, but a weaker speed-recall tradeoff) | ||
CREATE INDEX ON embeddings USING ivfflat (embedding vector_l2_ops) WITH (lists = 100); | ||
-- HNSW index (slower build and more memory, but better query performance) | ||
CREATE INDEX ON embeddings USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64); | ||
``` | ||
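
The `lists` value for IVFFlat is usually sized to the table. The pgvector documentation suggests starting at rows / 1000 for up to 1M rows and sqrt(rows) beyond that, with `ivfflat.probes` starting near sqrt(lists) at query time. The Go helpers below are a hypothetical sketch of that starting-point heuristic, not a tuned recommendation.

```go
package main

import "math"

// ivfflatLists suggests a starting `lists` value from the table's row count:
// rows/1000 up to one million rows, sqrt(rows) above that, and never below 1.
func ivfflatLists(rows int) int {
	var lists int
	if rows <= 1000000 {
		lists = rows / 1000
	} else {
		lists = int(math.Sqrt(float64(rows)))
	}
	if lists < 1 {
		lists = 1
	}
	return lists
}

// ivfflatProbes suggests a starting `ivfflat.probes` value, sqrt(lists).
// Higher values favor recall, lower values favor speed.
func ivfflatProbes(lists int) int {
	probes := int(math.Sqrt(float64(lists)))
	if probes < 1 {
		probes = 1
	}
	return probes
}
```

For example, a table of 100,000 rows yields `lists = 100`, the value used in the IVFFlat statement above.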
|
||
## RAG Implementation Steps | ||
|
||
1. **Prepare Knowledge Base**: | ||
- Collect and preprocess documents | ||
- Generate embeddings for documents | ||
- Store documents and embeddings in the vector database | ||
|
||
2. **Query Processing**: | ||
- Generate embedding for the user query | ||
- Perform similarity search to retrieve relevant documents | ||
|
||
3. **Context Augmentation**: | ||
- Combine retrieved documents with the original query | ||
|
||
4. **Generation**: | ||
- Feed the augmented context to the language model | ||
- Generate the response | ||
|
||
## Best Practices | ||
|
||
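1. Match the `vector(n)` dimension to your embedding model's output size (commonly 768, 1536, or 3072). Note that pgvector's IVFFlat and HNSW indexes support `vector` columns of up to 2,000 dimensions. | ||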
2. Experiment with different indexing methods for optimal performance. | ||
3. Use batching for efficient embedding generation and database operations. | ||
4. Implement caching mechanisms to reduce redundant computations. | ||
5. Regularly update and maintain your knowledge base for accurate retrievals. | ||
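
As a small illustration of the batching advice, the Go helper below splits a list of texts into fixed-size groups so that each group can go out in one embedding request or one multi-row insert. The name `batches` is hypothetical, and a suitable batch size depends on the limits of your embedding provider and database.

```go
package main

// batches splits items into consecutive slices of at most size elements.
// A size below 1 is treated as 1. The returned slices share memory with items.
func batches(items []string, size int) [][]string {
	if size < 1 {
		size = 1
	}
	var out [][]string
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		out = append(out, items[start:end])
	}
	return out
}
```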
|
||
## Useful PGVector Functions | ||
|
||
- `vector_dims(vector)`: Returns the dimension of a vector | ||
- `vector_norm(vector)`: Calculates the Euclidean norm of a vector | ||
- `vector_add(vector, vector)`: Adds two vectors element-wise (equivalent to the `+` operator) | ||
- `vector_sub(vector, vector)`: Subtracts one vector from another element-wise (equivalent to the `-` operator) | ||
|
||
See the implementation in [examples/rag/main.go](../examples/rag/main.go). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,24 +1,26 @@ | ||
graph TD | ||
A[Document_Collection] | ||
B[Document_Chunks] | ||
C[Vector_Database] | ||
subgraph Knowledge_Base | ||
D[Document_Collection] | ||
C[Document_Chunks] | ||
D --> |1a. Chunk & Process| C | ||
end | ||
|
||
A --> |1a. Chunk & Process| B | ||
B -->|1b. Generate Embeddings| C | ||
DB[Vector_Database] | ||
Knowledge_Base -->|1b. Generate Embeddings| DB | ||
|
||
D[User_Query] | ||
Q[User_Query] | ||
E[Query_Embedding] | ||
D -->|2a. Generate Embedding| E | ||
E -->|2b. Vector Similarity Search| C | ||
Q -->|2a. Generate_Embedding| E | ||
E -->|2b. Vector Similarity Search| DB | ||
|
||
F[Retrieved_Context] | ||
C -->|3a. Retrieve Relevant Chunks| F | ||
CTX[Retrieved_Context] | ||
DB -->|3 Retrieve Relevant Chunks| CTX | ||
|
||
G[Augmented_Prompt] | ||
D -->|"4a. Combine"| G | ||
F -->|"4b. Combine"| G | ||
P[Augmented_Prompt] | ||
Q -->|"4a. Combine"| P | ||
CTX -->|"4b. Combine"| P | ||
|
||
H[Language_Model] | ||
I[Final_Response] | ||
G -->|5a. Send to LLM| H | ||
H -->|6a. Generate Response| I | ||
LLM[Language_Model] | ||
R[Final_Response] | ||
P -->|5 Send to LLM| LLM | ||
LLM -->|6 Generate Response| R |