A production-ready Retrieval-Augmented Generation (RAG) system built with FastAPI, LangChain, LangServe, LangSmith, Hugging Face, and Qdrant for document processing and intelligent querying.
- PDF Document Processing with automatic chunking and metadata enrichment
- Vector Search using Qdrant with the sentence-transformers/all-MiniLM-L6-v2 embedding model for efficient document retrieval
- Integration with google/flan-t5-base for question answering
- RESTful API with streaming support
- Docker containerization with multi-service architecture
- FastAPI + Uvicorn
- LangChain + LangServe
- Qdrant Vector Database
- Hugging Face Models (google/flan-t5-base, sentence-transformers/all-MiniLM-L6-v2)
- LangSmith for monitoring and debugging
- PyPDF for document processing
- Clone the repository:

```bash
git clone https://github.com/hoduy511/rag-langchain.git
cd rag-langchain
```

- Create and configure your .env file with the required variables:

```bash
cp .env-dev .env
```

- Start the services:

```bash
make up
```
- API Documentation: http://localhost:8000/docs
- Interactive Playground: http://localhost:8000/rag/playground
- `GET /health` - Health check endpoint
- `POST /api/v1/upload` - Upload and process PDF documents
- `POST /api/v1/query` - Query the knowledge base
- `POST /api/v1/search` - Perform similarity search
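As a quick illustration, the endpoints above can be exercised with Python's requests library; the payload field names (`file`, `question`, `query`, `k`) are assumptions about the request schemas, not confirmed by this README:

```python
import requests

BASE = "http://localhost:8000"

# Health check
print(requests.get(f"{BASE}/health").json())

# Upload and process a PDF (multipart field name "file" is assumed)
with open("example.pdf", "rb") as f:
    print(requests.post(f"{BASE}/api/v1/upload", files={"file": f}).json())

# Query the knowledge base (JSON shape is assumed)
print(requests.post(f"{BASE}/api/v1/query",
                    json={"question": "What is the document about?"}).json())

# Similarity search (JSON shape is assumed)
print(requests.post(f"{BASE}/api/v1/search",
                    json={"query": "vector databases", "k": 4}).json())
```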
- FastAPI application handling HTTP requests
- Route definitions for document upload, querying, and search
- Input validation and response formatting
- CORS and middleware configuration
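A minimal sketch of that wiring, using illustrative route names rather than the project's actual identifiers:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI(title="RAG LangChain API")

# CORS middleware; in the real app the allowed origins would come from settings
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.get("/health")
async def health():
    # Simple liveness probe backing the /health endpoint
    return {"status": "ok"}
```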
- Text chunking with configurable size and overlap
- Metadata enrichment
- UTF-8 encoding handling
- Content cleaning and normalization
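A sketch of the chunking step using LangChain's RecursiveCharacterTextSplitter; the size and overlap values are assumed, since the README only says they are configurable:

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # assumed value; configurable in the project
    chunk_overlap=200,  # assumed value; configurable in the project
)

raw_text = "...cleaned, UTF-8 normalized text extracted from a PDF..."
# create_documents attaches the given metadata to every resulting chunk
docs = splitter.create_documents(
    [raw_text],
    metadatas=[{"source": "example.pdf"}],
)
```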
- google/flan-t5-base model integration
- Text generation pipeline configuration
- Token length management
- Model parameter optimization
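One plausible way to wire google/flan-t5-base into LangChain, via a transformers pipeline; the generation parameters below are illustrative, not the project's tuned values:

```python
from transformers import pipeline
from langchain_community.llms import HuggingFacePipeline

# flan-t5 is a seq2seq model, hence the text2text-generation task
generator = pipeline(
    "text2text-generation",
    model="google/flan-t5-base",
    max_length=256,   # token length cap (assumed)
    do_sample=False,  # deterministic output (assumed)
)
llm = HuggingFacePipeline(pipeline=generator)
print(llm.invoke("Answer briefly: what is retrieval augmented generation?"))
```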
- PDF document processing
- Text extraction and cleaning
- Temporary file management
- Chunk generation and storage
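A sketch of the extraction step with pypdf, mirroring how an upload handler might spill bytes to a temporary file before reading; all names here are illustrative:

```python
import tempfile
from pypdf import PdfReader

def extract_text(pdf_bytes: bytes) -> str:
    # Persist the uploaded bytes to a temp file, then pull text per page
    with tempfile.NamedTemporaryFile(suffix=".pdf") as tmp:
        tmp.write(pdf_bytes)
        tmp.flush()
        reader = PdfReader(tmp.name)
        return "\n".join(page.extract_text() or "" for page in reader.pages)
```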
- Qdrant vector database integration
- Document embedding using sentence-transformers
- Similarity search functionality
- Collection management and indexing
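Illustrative Qdrant usage through LangChain; the endpoint URL and collection name are assumptions:

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Qdrant
from langchain_core.documents import Document

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
docs = [Document(page_content="Qdrant stores embedding vectors.",
                 metadata={"source": "demo"})]

# from_documents creates the collection (if needed) and indexes the chunks
store = Qdrant.from_documents(
    docs,
    embeddings,
    url="http://localhost:6333",  # assumed Qdrant endpoint
    collection_name="documents",  # assumed collection name
)
hits = store.similarity_search("Where are vectors stored?", k=1)
```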
- LangChain implementation for question answering
- Integration with google/flan-t5-base model
- Prompt management and chain composition
- Context retrieval and response generation
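A sketch of such a chain in LCEL style, reusing the `store` and `llm` objects from the sketches above; the prompt wording is invented:

```python
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

prompt = PromptTemplate.from_template(
    "Answer using only this context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    # Concatenate retrieved chunks into a single context string
    return "\n\n".join(d.page_content for d in docs)

rag_chain = (
    {"context": store.as_retriever() | format_docs,
     "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
answer = rag_chain.invoke("What does the document say about storage?")
```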
- Pydantic schemas for request/response validation
- Data transfer object definitions
- Type hints and validation rules
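For instance, a request/response pair for the query route might look like this; the field names are assumptions, not the project's actual DTOs:

```python
from pydantic import BaseModel, Field

class QueryRequest(BaseModel):
    question: str = Field(..., min_length=1)  # reject empty questions

class QueryResponse(BaseModel):
    answer: str
    sources: list[str] = []  # provenance of the retrieved chunks
```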
Available make commands for development:

- `up`: Start all services with docker-compose
- `down`: Stop all services and remove containers
- `logs`: View container logs in follow mode
- `shell`: Open an interactive shell in the app container
- `clean`: Remove all containers and volumes, and prune the system
- `test`: Run the pytest test suite
- `format`: Format Python code with autopep8 and isort
- `lint`: Run flake8 linter checks
The project includes comprehensive tests covering the API endpoints, the RAG chain implementation, and the supporting services.
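A minimal example of such a test against the health endpoint, assuming the FastAPI app is importable as `app.main:app` (a hypothetical path):

```python
from fastapi.testclient import TestClient
from app.main import app  # hypothetical module path

client = TestClient(app)

def test_health():
    resp = client.get("/health")
    assert resp.status_code == 200
```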
- API settings
- Qdrant vector database configuration
- Model settings (google/flan-t5-base and sentence-transformers/all-MiniLM-L6-v2)
- LangChain integration parameters
- LangSmith API keys and project settings
- Hugging Face API tokens and model configurations
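These could plausibly be grouped with pydantic-settings; the variable names below are assumptions mirroring the list above, not the project's actual settings module:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    # Assumed variable names mirroring the configuration areas above
    qdrant_url: str = "http://localhost:6333"
    llm_model: str = "google/flan-t5-base"
    embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2"
    langsmith_api_key: str = ""
    huggingfacehub_api_token: str = ""

settings = Settings()
```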