Welcome to the BorusanAuto-EmbedStorage repository, which holds the PDF processing and data embedding work developed during the Borusan AutoHackathon.
This project automates extracting text from PDFs, generating embeddings with the Azure OpenAI text-embedding-ada-002 model, and storing those embeddings in a Qdrant database for search and retrieval.
Figure: BorusanOto Embeddings Workflow Diagram
Figure: Qdrant Collection Snapshot
- PDF Text Extraction: Convert PDF documents into manageable text chunks.
- Embedding Generation: Generate embeddings with Azure OpenAI models (text-embedding-ada-002).
- Qdrant Integration: Store and manage embeddings in Qdrant collections.
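
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names `chunk_text` and `build_points`, the chunk size, the overlap, and the payload fields are all assumptions chosen for the example. The embedding call is passed in as a function so the sketch runs without credentials; in practice it would wrap a request to the Azure OpenAI text-embedding-ada-002 deployment, which returns 1536-dimensional vectors. The resulting dictionaries follow the `id` / `vector` / `payload` shape that Qdrant expects for points.

```python
import uuid
from typing import Callable, Dict, List

EMBEDDING_DIM = 1536  # vector size produced by text-embedding-ada-002


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into character chunks of `chunk_size` that overlap by `overlap`.

    Hypothetical helper; the sizes are illustrative defaults, not project settings.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip whitespace-only chunks
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # the last chunk already reached the end of the text
    return chunks


def build_points(
    chunks: List[str],
    embed: Callable[[str], List[float]],
    source: str,
) -> List[Dict]:
    """Pair each chunk with its embedding in Qdrant's point format.

    `embed` stands in for the real embedding request (an assumption here).
    IDs are deterministic UUIDs, so re-processing a file overwrites its points.
    """
    points: List[Dict] = []
    for index, chunk in enumerate(chunks):
        vector = embed(chunk)
        if len(vector) != EMBEDDING_DIM:
            raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(vector)}")
        points.append({
            "id": str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source}#{index}")),
            "vector": vector,
            "payload": {"source": source, "chunk_index": index, "text": chunk},
        })
    return points


if __name__ == "__main__":
    # Placeholder embedder for a dry run; replace with a real embedding call.
    def fake_embed(text: str) -> List[float]:
        return [0.0] * EMBEDDING_DIM

    sample = "Extracted PDF text would go here. " * 100
    points = build_points(chunk_text(sample), fake_embed, source="example.pdf")
    print(f"built {len(points)} points")
```

The list returned by `build_points` can be sent as the `points` field of a Qdrant upsert request against a collection created with a vector size of 1536. Overlapping chunks are one common way to avoid cutting a sentence's context at a chunk boundary; the notebook may use a different splitting strategy.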
To get started, clone the repository and follow the setup and cell-execution instructions in the notebook.
Contributions to enhance the project are welcome. Please read the contribution guidelines for more information.
This project is licensed under the MIT License - see the LICENSE file for details.
A big thank you to Borusan for hosting the AutoHackathon and providing an opportunity to innovate in the automotive and AI space.
For more information on the Borusan AutoHackathon, visit Borusan AutoHackathon Details.