Skip to content

RAPTOR introduces a novel approach to retrieval-augmented language models by constructing a recursive tree structure from documents. This allows for more efficient and context-aware information retrieval across large texts, addressing common limitations in traditional language models.

Notifications You must be signed in to change notification settings

BlueBash/RAG-Raptor-demo

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Interact with your complex PDF that includes images, tables, and graphs using Raptor RAG.

Table of Contents

Introduction

The RAG-RAPTOR-DEMO project simplifies the process of extracting and querying information from Complex PDF documents, including complex content such as tables, graphs, and images. Leveraging state-of-the-art natural language processing models and Unstructured.io for document parsing, as well as integrating RAPTOR, which introduces a novel approach to retrieval-augmented language models by constructing a recursive tree structure from documents for more efficient and context-aware information retrieval across large texts, and Raptor Rag for retrieve semantic chunk, the chatbot provides a user-friendly interface to interact with and retrieve detailed information from these documents.

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

Shows an illustrated sun in light color mode and a moon with stars in dark color mode.

RAPTOR introduces a novel approach to retrieval-augmented language models by constructing a recursive tree structure from documents. This allows for more efficient and context-aware information retrieval across large texts, addressing common limitations in traditional language models.

For detailed methodologies and implementations, refer to the original paper:

Paper page PWC

Features

  • Table Extraction: Identify and parse tables to retrieve structured data, making it easier to answer data-specific questions.
  • Text Extraction: Efficiently extract and process text from PDFs, enabling accurate and comprehensive information retrieval.
  • Image Analysis: Extract and interpret images within the PDFs to provide contextually relevant information.

Technologies Used

  • LangChain: Framework for building applications with language models.
  • RAG (Retrieval-Augmented Generation): Combines retrieval and generation for more accurate answers.
  • RAPTOR: Constructs a recursive tree structure from documents for efficient, context-aware information retrieval.
  • Streamlit: Framework for creating interactive web applications with Python.
  • Unstructured.io: Tool for parsing and extracting complex content from PDFs, such as tables, graphs, and images.
  • Poetry: Dependency management and packaging tool for Python.

Setup Instructions

Follow these steps to set up the project on your local machine:

1. Clone the Repository:

  • Begin by cloning the repository to your local machine:
https://github.com/langchain-tech/Rag-raptor-demo.git
cd Rag-raptor-demo

2. Install project dependencies:

  • Use Poetry to install the dependencies defined in your pyproject.toml file. This command will also respect the versions pinned in your poetry.lock file:
poetry install

This will create a virtual environment (if one does not already exist) and install the dependencies into it.

3. Activate the virtual environment (optional):

  • If you want to manually activate the virtual environment created by Poetry, you can do so with:
poetry shell

This step is optional because Poetry automatically manages the virtual environment for you when you run commands through it.

4. Set Up Environment Variables: Create a .env file in the root directory of your project and add the required environment variables. For example:

OPENAI_API_KEY=Your_OPENAI_API_KEY
POSTGRES_URL_EMBEDDINDS=YOUR_POSTGRES_URL,  like:-postgresql+psycopg://{db_user}:{db_password}@{db_host}:{db_port}/{db_name}
POSTGRES_URL=YOUR_POSTGRES_URL ,  like:- postgresql://{db_user}:{db_password}@{db_host}:{db_port}/{db_name}

5. Run Data ingestion file

  • This command will insert data into your postgres database
python3 ingest/app.py

6. Start the Application:

Run the application using Streamlit:

streamlit run app.py

Examples

My test image My test image

About

RAPTOR introduces a novel approach to retrieval-augmented language models by constructing a recursive tree structure from documents. This allows for more efficient and context-aware information retrieval across large texts, addressing common limitations in traditional language models.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages