---
title: Bottttt
emoji: πŸ“‰
colorFrom: gray
colorTo: indigo
sdk: gradio
sdk_version: 5.36.2
app_file: app.py
pinned: false
license: apache-2.0
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Great β€” let’s prepare your **RAG app** for **deployment on Hugging Face Spaces** with:

* βœ… Gradio as UI
* βœ… LLaMA3-Instruct via Groq API
* βœ… Sentence Transformers
* βœ… ChromaDB with persistence
* βœ… PDF upload + student Q&A

---

## βœ… STEP 1: Project Structure

Create this directory structure for your Hugging Face Space:

```
rag-student-assistant/
β”œβ”€β”€ app.py
β”œβ”€β”€ requirements.txt
└── .env (optional β€” never upload it publicly)
```

---

## βœ… STEP 2: `app.py` (Full Code)

```python
import os

import gradio as gr
import fitz  # PyMuPDF
from sentence_transformers import SentenceTransformer
import chromadb
from openai import OpenAI

# Groq exposes an OpenAI-compatible endpoint, so the standard OpenAI
# client works once it is pointed at Groq's base URL.
client = OpenAI(
    api_key=os.getenv("GROQ_API_KEY"),
    base_url="https://api.groq.com/openai/v1",
)

# Load embedding model
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Set up ChromaDB with on-disk persistence
persist_path = "./chroma_db"
db = chromadb.PersistentClient(path=persist_path)
collection = db.get_or_create_collection("papers")

# Extract text from an uploaded PDF
# (gr.File(type="binary") hands the callback raw bytes)
def extract_text_from_pdf(file_bytes):
    doc = fitz.open(stream=file_bytes, filetype="pdf")
    return "".join(page.get_text() for page in doc)

# Chunk the text and store it in the vector DB
def chunk_and_store(text):
    chunks = [text[i:i + 500] for i in range(0, len(text), 500)]
    embeddings = embedder.encode(chunks).tolist()
    offset = collection.count()  # compute once so IDs stay unique across uploads
    collection.add(
        documents=chunks,
        embeddings=embeddings,
        ids=[f"id_{offset + i}" for i in range(len(chunks))],
    )

# Retrieve relevant chunks and send them to LLaMA3 via Groq
def retrieve_and_ask(query):
    if collection.count() == 0:
        return "Please upload a paper first."
    query_embedding = embedder.encode([query]).tolist()[0]
    results = collection.query(query_embeddings=[query_embedding], n_results=3)
    context = "\n".join(results["documents"][0])

    system_prompt = "You are an academic assistant helping students understand research papers."
    user_prompt = f"Based on the following context:\n{context}\n\nAnswer the question:\n{query}"

    try:
        response = client.chat.completions.create(
            model="llama3-70b-8192",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error: {e}"

# Gradio UI callbacks
def handle_upload(file):
    if file is None:
        return "Upload a valid PDF file."
    text = extract_text_from_pdf(file)
    chunk_and_store(text)
    return "βœ… Paper uploaded and processed."

def handle_query(query):
    return retrieve_and_ask(query)

with gr.Blocks() as demo:
    gr.Markdown("### πŸ“˜ RAG Academic Assistant\nUpload a paper and ask questions.")
    with gr.Row():
        file = gr.File(label="Upload PDF", type="binary")
        upload_btn = gr.Button("Process")
        upload_output = gr.Textbox()
    with gr.Row():
        query = gr.Textbox(label="Ask a question")
        response = gr.Textbox(label="Answer")
        ask_btn = gr.Button("Ask")

    upload_btn.click(handle_upload, inputs=[file], outputs=[upload_output])
    ask_btn.click(handle_query, inputs=[query], outputs=[response])

demo.launch()
```

---

## βœ… STEP 3: `requirements.txt`

```txt
gradio
chromadb
sentence-transformers
PyMuPDF
openai
```

> Hugging Face Spaces will auto-install these on build.

---

## βœ… STEP 4: GROQ API Key

### πŸ” Option 1: Use Hugging Face "Secrets"

* Go to your Space β†’ **Settings > Secrets**
* Add a new secret:
  * **Name:** `GROQ_API_KEY`
  * **Value:** `your-api-key-here`

No need to change the code β€” it already reads `os.getenv("GROQ_API_KEY")`.

---

## βœ… STEP 5: Deploy on Hugging Face

1. Go to [Hugging Face Spaces](https://huggingface.co/spaces)
2. Click **Create New Space**
3.
   Choose:
   * **Gradio**
   * **Public or Private**
4. Upload:
   * `app.py`
   * `requirements.txt`
5. Add GROQ API key under **Settings > Secrets**

---

## βœ… You’re Done!

After deployment:

* Students can upload PDF papers
* Ask natural language questions
* Get Groq/LLaMA3-generated answers from your vector database

---

Would you like me to:

* 🎁 Zip the files for direct upload?
* πŸ§ͺ Add test examples?
* πŸŽ“ Add UI branding for universities or students?

Let me know what extras you want!
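In the spirit of the πŸ§ͺ test-examples option: the fixed-size chunking inside `chunk_and_store` can be exercised locally with no Gradio, ChromaDB, or API key. Below is a minimal standalone sketch of that logic; the `overlap` parameter is an assumption added for illustration (the app uses non-overlapping 500-character windows), but overlapping windows help when a sentence is cut at a chunk boundary.

```python
# Standalone sketch of the chunking strategy from chunk_and_store.
# `overlap` is illustrative only -- the app above effectively uses overlap=0.
def chunk_text(text, size=500, overlap=0):
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

sample = "x" * 1200
print(len(chunk_text(sample)))               # 3 chunks: 500 + 500 + 200 chars
print(len(chunk_text(sample, overlap=100)))  # 3 chunks, consecutive ones sharing 100 chars
```

Running a quick check like this before deploying confirms the chunk boundaries behave as expected on real paper text.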