Web Article QA Tool is a Streamlit app for question answering over web articles. Users enter article URLs, then ask questions and receive answers drawn from the content of those URLs.
- Load URLs or upload text files containing URLs to fetch article content.
- Process article content through LangChain's UnstructuredURLLoader.
- Generate embedding vectors with an open-source embedding model and index them with FAISS, a similarity search library, for fast retrieval of relevant passages.
- Query the LLM (Google Gemini) and receive answers along with their source URLs.
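The retrieval idea behind the features above can be illustrated with a small, dependency-free sketch. It does not use LangChain, FAISS, or Gemini: word counts stand in for learned embeddings, and a linear scan stands in for the FAISS index. The function names (`embed`, `cosine`, `top_k`) and the example URLs are hypothetical and are not taken from main.py.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, chunks, k=2):
    """Return the k (score, source_url, text) tuples most similar to the query.

    `chunks` is a list of (source_url, text) pairs. A linear scan stands in
    for the FAISS index here.
    """
    q = embed(query)
    scored = [(cosine(q, embed(text)), url, text) for url, text in chunks]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Hypothetical chunks, as if extracted from two article URLs.
chunks = [
    ("https://example.com/a", "The central bank raised interest rates by half a point."),
    ("https://example.com/b", "The new electric car has a range of 500 kilometres."),
]

best = top_k("What did the central bank do to interest rates?", chunks, k=1)
print(best[0][1])  # source URL of the best-matching chunk
```

In the real app, the best-matching chunks and their source URLs would be passed to the LLM as context for the answer. Keeping the URL next to each chunk is what makes it possible to cite sources.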
1. Clone this repository to your local machine using:
git clone https://github.com/parthivshah33/Web-article-QA-Tool.git
2. Navigate to the project directory:
cd Web-article-QA-Tool
3. Install the required dependencies using pip:
pip install -r requirements.txt
4. Set up your Gemini API key by creating a .env file in the project root and adding your API key:
GEMINI_API_KEY = "your api key"
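As a minimal sketch of how the key could be read at startup, the snippet below parses `KEY = "value"` lines from a `.env` file using only the standard library. This README does not show how main.py actually loads the key (python-dotenv is a common choice for this), so `read_env_file` and `get_gemini_key` are hypothetical helpers.

```python
import os


def read_env_file(path=".env"):
    """Parse KEY = "value" lines into a dict; blank lines and # comments are skipped."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and quotes from the value.
            values[key.strip()] = value.strip().strip("\"'")
    return values


def get_gemini_key(path=".env"):
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY")
    if not key and os.path.exists(path):
        key = read_env_file(path).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set")
    return key
```

Failing early with a clear error when the key is missing tends to be easier to debug than an authentication error raised later by the LLM client.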
1. Run the Streamlit app by executing:
streamlit run main.py
2. The web app will open in your browser.
- On the sidebar, enter the article URLs.
- Click "Process URLs" to start loading and processing the data.
- The app splits the text, generates embedding vectors, and indexes them with FAISS.
- Storing the embeddings in a FAISS index speeds up retrieval.
- The FAISS index is saved to a local file in pickle format for future use.
- You can then ask a question and get an answer based on those articles.
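The splitting and persistence steps can be sketched without LangChain or FAISS. In this sketch, fixed-size character windows with overlap stand in for the app's text splitter, and a plain dict stands in for the FAISS index. `split_text`, `save_index`, `load_index`, and the chunk sizes are illustrative assumptions; only the `vectorStoreDB.pkl` file name comes from this README.

```python
import os
import pickle
import tempfile


def split_text(text, chunk_size=200, overlap=50):
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def save_index(index, path):
    """Serialize the index object to disk with pickle."""
    with open(path, "wb") as handle:
        pickle.dump(index, handle)


def load_index(path):
    """Load a previously saved index.

    Only unpickle files you created yourself: pickle can execute code on load.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)


article = "FAISS indexes dense vectors for fast similarity search. " * 5
chunks = split_text(article, chunk_size=100, overlap=20)
index = {"chunks": chunks, "source": "https://example.com/article"}  # hypothetical URL

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "vectorStoreDB.pkl")
    save_index(index, path)
    restored = load_index(path)

print(len(restored["chunks"]), "chunks restored")
```

Overlapping windows reduce the chance that a sentence relevant to a question is cut in half at a chunk boundary. Reloading the saved file on later runs avoids re-fetching and re-embedding the same articles.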
- main.py: The main Streamlit application script.
- requirements.txt: A list of required Python packages for the project.
- vectorStoreDB.pkl: A pickle file to store the FAISS index.
- .env: Configuration file for storing your Gemini API key.