
Chat with SQL data, preprocessed CSV and XLSX data, and CSV and XLSX files uploaded during the interaction with the user interface. RAG with tabular datasets.


parthivshah33/Q-A-and-RAG-with-SQL-CSV-XLSX-Chatbot


Q&A-and-RAG-with-SQL-and-TabularData

Q&A-and-RAG-with-SQL-and-TabularData is a chatbot project built on Google's Gemini generative AI model, LangChain, SQLite, and ChromaDB. It lets users interact (perform Q&A and RAG) with SQL databases, CSV files, and XLSX files using natural language.

Key NOTE: Do NOT connect the chatbot to a SQL database with WRITE privileges. Grant READ access only and limit its scope. Otherwise a user could manipulate the data (e.g., ask the chain to delete data).

Features:

  • Chat with SQL data.
  • Chat with preprocessed CSV and XLSX data.
  • Chat with CSV and XLSX files uploaded during the interaction with the user interface.
  • RAG with Tabular datasets.

Main underlying techniques used in this chatbot:

  • LLM chains and agents
  • Gemini Language Model for Chat
  • Retrieval-Augmented Generation (RAG)

Models used in this chatbot:

Note:

instructor
sentence-transformers == 2.2.2
  • I used HuggingfaceInstructorEmbeddings, one of the best free embedding models available. To use it, install the packages listed above into your virtual environment with pip. Alternatively, you can use an OpenAI embedding model if you have paid API access. Be aware that these modules sometimes cause dependency errors.
  • Make sure you install the exact version of each module listed in the requirements.txt file.

Requirements:

  • Operating System: Windows (used by me); Linux can also be used.
  • Get your Gemini API key from here.

Installation:

  • Ensure you have Python and conda installed, then create the environment, clone the repository, and install the required dependencies:
conda create -p rag-venv python==3.9.19
conda activate ./rag-venv
pip3 install --upgrade pip
git clone <the repository>
cd <the cloned repository folder>
pip install -r requirements.txt

Execution:

  1. To prepare the SQL DB from a .sql file, copy the file into the data/sql directory. The steps below use the sqlite3 command-line shell, which must be available on your PATH. (Python's own `sqlite3` module ships with the standard library and cannot be installed with pip.) In the terminal, from the project folder, create a database called `sqldb`:
sqlite3 data/sqldb.db
.read data/sql/<name of your sql database>.sql

Ex:

.read data/sql/Chinook_Sqlite.sql
This command will create a SQL database named `sqldb.db` in the `data` directory. To verify that the database was created, run a query against one of its tables:
SELECT * FROM <any Table name in your sql database> LIMIT 10;

Ex:

SELECT * FROM Artist LIMIT 10;
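
The same check can also be run from Python with the standard-library `sqlite3` module. This is a hedged sketch, not part of the project: `list_tables` and `preview` are hypothetical helper names, and the in-memory database with three sample rows stands in for `data/sqldb.db`.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list:
    """Return the names of all tables recorded in the database schema."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def preview(conn: sqlite3.Connection, table: str, limit: int = 10) -> list:
    """Return up to `limit` rows from `table`, rejecting unknown table names."""
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table}")
    # The table name was checked against the schema, so quoting it is safe here.
    return conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,)).fetchall()


# Stand-in database; with the real file use sqlite3.connect("data/sqldb.db").
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Artist (ArtistId INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany(
    "INSERT INTO Artist (Name) VALUES (?)",
    [("AC/DC",), ("Accept",), ("Aerosmith",)],
)

tables = list_tables(conn)
first_two = preview(conn, "Artist", limit=2)
print(tables, first_two)
```

An empty result from `list_tables` usually means the `.read` step did not run against the file you opened.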
  2. To prepare a SQL DB from your CSV and XLSX files, copy your files into data/csv_xlsx and in the terminal, from the project folder, execute:
python src/prepare_csv_xlsx_db.py
This command will create a SQL database named `csv_xlsx_sqldb.db` in the `data` directory.
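
This README does not show how `src/prepare_csv_xlsx_db.py` works internally. As a rough illustration of the general idea (one table per file, one column per header field), here is a standard-library sketch that loads a CSV into SQLite. The function name `load_csv_into_table` and the toy data are assumptions, and XLSX handling is left out because it needs a third-party reader.

```python
import csv
import io
import sqlite3


def load_csv_into_table(conn: sqlite3.Connection, table_name: str, csv_file) -> int:
    """Create `table_name` from the CSV header and insert every data row.

    Returns the number of rows inserted. Columns are created without a
    declared type, so values are stored as the text read from the file.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Toy, made-up data standing in for a file under data/csv_xlsx.
toy_csv = io.StringIO("name,score\nAda,91\nGrace,88\n")
conn = sqlite3.connect(":memory:")
inserted = load_csv_into_table(conn, "toy", toy_csv)
stored = conn.execute('SELECT name, score FROM "toy" ORDER BY name').fetchall()
print(inserted, stored)
```

A real loader would also infer numeric column types so that questions involving sums or comparisons behave as expected.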
  3. To prepare a vectorDB from your CSV and XLSX files, copy your files into data/for_upload and in the terminal, from the project folder, execute:
python src/prepare_csv_xlsx_vectordb.py

This command will create a VectorDB in the data/chroma directory.
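
The README does not describe how the script turns tabular rows into vector-store entries. A common approach for RAG over tables is to serialize each row as a short text document, then embed and store those documents. The sketch below covers only the serialization step and is an assumption, not the project's method: `rows_to_documents`, the document layout, and the toy data are all hypothetical, and the embedding and ChromaDB storage steps are omitted.

```python
import csv
import io


def rows_to_documents(csv_file, source_name: str) -> list:
    """Turn each CSV row into a dict holding an id and a 'column: value' text."""
    reader = csv.DictReader(csv_file)
    documents = []
    for row_number, row in enumerate(reader, start=1):
        # One document per row keeps each retrieved chunk self-describing.
        text = ", ".join(f"{column}: {value}" for column, value in row.items())
        documents.append({"id": f"{source_name}-{row_number}", "text": text})
    return documents


# Toy, made-up data standing in for a file under data/for_upload.
toy_csv = io.StringIO("name,score\nAda,91\nGrace,88\n")
docs = rows_to_documents(toy_csv, "toy")
for doc in docs:
    print(doc["id"], "->", doc["text"])
```

Repeating the column names in every document lets a retrieved row be understood on its own, without the header.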


How to use Chatbot (Important)

Important

  • To upload your datasets and chat with them during the interaction with the user interface, change the chat functionality to `Process files`.
  • Upload your files and wait for the message indicating that the database is ready.
  • Switch the chat functionality back to `Chat`.
  • Change the `RAG with` dropdown to `Uploaded files`.
  • Start chatting.


Chatbot User Interface

ChatBot UI

Databases:

  • Diabetes dataset: Link
  • Cancer dataset: Link
  • Chinook database: Link

Key frameworks/libraries used in this chatbot:
