---
title: PDF Chat Assistant
emoji: 📄
colorFrom: blue
colorTo: indigo
sdk: streamlit
app_file: src/app.py
pinned: false
---

# PDF Chat Assistant

Interact with your PDF using Retrieval-Augmented Generation (RAG) and Gemini. Upload a PDF; it is chunked and embedded, and you can then ask questions and receive contextual, streamed answers.

## Features

- PDF upload & inline preview
- Automatic text extraction, cleaning, and chunking
- Embedding storage (pickle vector store; see the sketch after this list)
- Similarity-based context retrieval
- Gemini response generation (streaming)
- Scrollable chat UI
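The store format itself isn't documented in this README; as a rough illustration of what a pickle vector store can look like, the sketch below persists chunks alongside their embeddings. The function names and the `{"chunks", "vectors"}` schema are assumptions, not the app's actual layout.

```python
import pickle
from pathlib import Path

def save_store(path: str, chunks: list[str], vectors: list[list[float]]) -> None:
    """Persist chunks and their embeddings in a single pickle file."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump({"chunks": chunks, "vectors": vectors}, f)

def load_store(path: str) -> dict:
    """Load a previously saved vector store."""
    with open(path, "rb") as f:
        return pickle.load(f)
```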
## Conda Setup

```bash
git clone https://github.com/Seif-aber/pdf_chat_assistant
cd pdf_chat_assistant

# Create environment
conda create -n pdfchat python=3.12 -y
conda activate pdfchat

# Install dependencies
pip install -r requirements.txt
```

## Environment Variables

Create a `.env` file in the project root:

```
GEMINI_API_KEY=your_key_here
GEMINI_MODEL=gemini-2.5-flash
EMBEDDING_MODEL=models/embedding-001
STREAMLIT_PORT=8501
MAX_PDF_SIZE_MB=10
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
UPLOAD_FOLDER=data/uploads
EMBEDDINGS_FOLDER=data/embeddings
```

Then:

```bash
streamlit run src/app.py --server.port $STREAMLIT_PORT
```

Note that `.env` is read by the app at runtime, not by your shell, so `$STREAMLIT_PORT` only resolves if you export it first (e.g. `export STREAMLIT_PORT=8501`) or pass the port literally.
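A Python app typically loads these values with python-dotenv; a minimal sketch, assuming the app follows that pattern (this README doesn't confirm it):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # read .env from the project root into the process environment

api_key = os.environ["GEMINI_API_KEY"]            # fail fast if the key is missing
chunk_size = int(os.getenv("CHUNK_SIZE", "1000"))
chunk_overlap = int(os.getenv("CHUNK_OVERLAP", "200"))
```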
## How It Works

1. Upload a PDF → it is saved to a temp file (see the upload sketch below).
2. Text is extracted (PyPDF2, with a pypdf fallback) and split into overlapping chunks (see the extraction and chunking sketches below).
3. Each chunk is embedded via the Gemini Embeddings API (see the embedding sketch below).
4. On each question: the query is embedded → cosine similarity ranks the stored chunks → the top chunks form the context (see the retrieval sketch below).
5. The Gemini model generates an answer constrained to that context, streamed into the chat UI (see the generation sketch below).
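For step 1, a common Streamlit pattern is to write the uploaded bytes to a temporary file so the PDF libraries can read from disk. A minimal sketch; the app's actual widget label and save location (e.g. `UPLOAD_FOLDER`) may differ:

```python
import tempfile

import streamlit as st

uploaded = st.file_uploader("Upload a PDF", type=["pdf"])
if uploaded is not None:
    # Persist the in-memory upload so PyPDF2/pypdf can open it from disk
    with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
        tmp.write(uploaded.getbuffer())
        pdf_path = tmp.name
```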
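For step 2's extraction, the PyPDF2/pypdf fallback can be as simple as trying one import and then the other, since both libraries expose the same `PdfReader` interface (`extract_text` here is a hypothetical wrapper name):

```python
def extract_text(pdf_path: str) -> str:
    """Extract text page by page, preferring PyPDF2 and falling back to pypdf."""
    try:
        from PyPDF2 import PdfReader
    except ImportError:
        from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # extract_text() can return None for image-only pages
    return "\n".join((page.extract_text() or "") for page in reader.pages)
```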
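The chunker itself isn't shown in this README; a character-window splitter matching the `CHUNK_SIZE`/`CHUNK_OVERLAP` defaults above could look like this (the function name `chunk_text` is hypothetical):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunk_size-character windows that overlap by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start : start + chunk_size]
        if chunk.strip():  # skip whitespace-only windows
            chunks.append(chunk)
    return chunks
```

The overlap ensures that a sentence straddling a chunk boundary remains intact in at least one chunk, so it can still be retrieved.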
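For step 3, the `google-generativeai` SDK exposes `genai.embed_content`; a minimal sketch using the `EMBEDDING_MODEL` from `.env` (the app's batching and error handling may differ):

```python
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

def embed_chunks(chunks: list[str]) -> list[list[float]]:
    """Embed each chunk for retrieval with the Gemini Embeddings API."""
    model = os.getenv("EMBEDDING_MODEL", "models/embedding-001")
    vectors = []
    for chunk in chunks:
        resp = genai.embed_content(
            model=model,
            content=chunk,
            task_type="retrieval_document",  # documents and queries use different task types
        )
        vectors.append(resp["embedding"])
    return vectors
```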
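For step 4, cosine similarity over the stored vectors reduces to a normalized dot product. A NumPy sketch; the top-`k` value is an assumption, and the query itself would be embedded with `task_type="retrieval_query"`:

```python
import numpy as np

def top_k_chunks(query_vec, vectors, chunks, k: int = 4) -> list[str]:
    """Return the k chunks whose embeddings are most cosine-similar to the query."""
    q = np.asarray(query_vec, dtype=np.float32)
    m = np.asarray(vectors, dtype=np.float32)
    # cosine similarity = dot product / product of norms; epsilon guards zero vectors
    sims = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q) + 1e-10)
    best = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in best]
```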
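For step 5, the SDK supports streaming via `generate_content(..., stream=True)`; a sketch with an illustrative prompt (the app's actual prompt wording is not shown in this README):

```python
import os

import google.generativeai as genai

def stream_answer(question: str, context_chunks: list[str]):
    """Yield answer fragments from Gemini, constrained to the retrieved context."""
    model = genai.GenerativeModel(os.getenv("GEMINI_MODEL", "gemini-2.5-flash"))
    prompt = (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        "Context:\n" + "\n\n".join(context_chunks)
        + f"\n\nQuestion: {question}"
    )
    for event in model.generate_content(prompt, stream=True):
        yield event.text
```

In Streamlit, a generator like this can be passed directly to `st.write_stream` to render the answer incrementally in the chat UI.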