This repository contains a practical example about how to build a GPT-4 Q&A app capable of answering questions related to your private documents in just a couple of hours.
The app uses the following technologies:
The repository contains the following applications.
- A
Jupyter Notebook
reads your private documents (for this example I'm using the dotnet microservices book) and stores the content in Pinecone. - A
Streamlit
app allow us to query the data stored in Pinecone using a GPT-4 LLM model.
- Azure OpenAI
- Pinecone
You MUST have the following services running before trying to execute the app.
- An
Azure OpenAI
instance with the following models deployed:text-embedding-ada-002
.gpt-4
orgpt-4-32k
.
The models can be called whatever you like.
- A
Pinecone
database with an index with1536
dimensions andcosine
metric.
The index can be called whatever you like.
Before trying to run the app, read the Prerequisites section.
The repository contains a Jupyter Notebook
that reads a PDF file from the docs
folder, splits the content into multiple chunks and stores them into PineCone.
You must set the following enviroment variables, before executing the Jupyter Notebook
:
PINECONE_API_KEY
: Pinecone ApiKey.PINECONE_ENVIRONMENT
: Pinecone index environment.PINECONE_INDEX_NAME
: Pinecone index name.AZURE_OPENAI_APIKEY
: Azure OpenAI ApiKey.AZURE_OPENAI_BASE_URI
: Azure OpenAI URI.AZURE_OPENAI_EMBEDDINGS_MODEL_NAME
: Thetext-embedding-ada-002
model deployment name.AZURE_OPENAI_GPT4_MODEL_NAME
: Thegpt-4
model deployment name.
What's the model deployment name?
- When you deploy a model on an
Azure OpenAI
instance you must give it a name.
For this example to run properly you need to deploy at least atext-embedding-ada-002
model and agpt-4
model.
The app.py
is a Streamlit
app that does the following steps:
- Converts your query into a vector.
- Retrieves the information that is semantically related to our query from Pinecone.
- Feeds the retrieved information into a LLM model which builds a response.
Run the app locally :
Restore dependencies:
pip install -r requirements.txt
When you install Streamlit
, a command-line (CLI) tool gets installed as well. The purpose of this tool is to run Streamlit
apps.
streamlit run app.py
You MUST set the following environment variables on your local machine before executing the app:
PINECONE_API_KEY
: Pinecone ApiKey.PINECONE_ENVIRONMENT
: Pinecone index environment.PINECONE_INDEX_NAME
: Pinecone index name.AZURE_OPENAI_APIKEY
: Azure OpenAI ApiKey.AZURE_OPENAI_BASE_URI
: Azure OpenAI URI.AZURE_OPENAI_EMBEDDINGS_MODEL_NAME
: Thetext-embedding-ada-002
model deployment name.AZURE_OPENAI_GPT4_MODEL_NAME
: Thegpt-4
model deployment name.
Run the app in a container:
This repository has a Dockerfile
in case you prefer to execute the app on a container.
Build the image:
docker build -t qa-app .
Run it:
docker run -p 5050:5050 \
-e AZURE_OPENAI_APIKEY="<azure-openai-api-key>" \
-e AZURE_OPENAI_BASE_URI="<azure-openai-api-uri>" \
-e AZURE_OPENAI_EMBEDDINGS_MODEL_NAME="<azure-openai-embeddings-deployment-model-name>" \
-e AZURE_OPENAI_GPT4_MODEL_NAME="<azure-openai-gpt4-deployment-model-name>" \
-e PINECONE_INDEX="<pinecone-index-name>" \
-e PINECONE_ENVIRONMENT="<pinecone-environment-name>" \
-e PINECONE_API_KEY="<pinecone-api-key>" \
qa-app