---
title: Docubot-PDF Analyzer
emoji: 🔥
colorFrom: purple
colorTo: purple
sdk: streamlit
sdk_version: 1.45.0
app_file: app.py
pinned: false
license: mit
short_description: A prototype for a final project in NLP
---

# 📄 DocuBot: PDF Analyzer

A lightweight Streamlit app that lets you analyze academic PDFs or lecture slides, no LLMs needed!

---

## 🚀 What This App Does

- 🧠 Named Entity Recognition (NER): Extracts people, places, and organizations.
- 🔍 Document Search: Finds the passages most relevant to your custom questions using TF-IDF scores.
- 📝 Extractive Summarization: Highlights the most important sentences using TextRank.
- 📥 Summary Download: Export your summary as .txt or .pdf.
- 🌗 Light/Dark UI toggle (Streamlit theme).

---

## 🧪 How It Works

- Text is extracted using pdfplumber.
- Entities are recognized using spaCy's transformer model (en_core_web_trf).
- Document search uses TF-IDF with cosine similarity.
- Summarization is done via sumy's TextRank.
- Everything runs inside the Streamlit app, with the interface rendered in your browser.

---

## 📂 File Types Supported

- Standard PDFs (.pdf)
- Lecture slides saved as PDF (.pptx.pdf)

---

## 🧑‍💻 How to Use (on Hugging Face Spaces)

1. Navigate to the "📂 Demo" tab.
2. Upload a PDF or use the provided sample.
3. Optionally ask a question like "What is the main topic?"
4. View the entities, relevant chunks, and summary.
5. Download results and rate your experience.

---

## 🛠 Dependencies

Add these to requirements.txt if you're running locally:

```
streamlit
pdfplumber
spacy
en_core_web_trf
scikit-learn
sumy
fpdf
```

en_core_web_trf is a spaCy model package rather than a regular library. If pip cannot resolve it by name, install it with `python -m spacy download en_core_web_trf`.

---

## 🙌 Credits

Built with 💙 using open-source NLP libraries. Project created for learning and experimentation purposes.

---

Have fun analyzing! 🤖

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
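
To illustrate the document search step ("TF-IDF with cosine similarity"), here is a minimal, dependency-free sketch of the idea. It is not the app's actual code: the app presumably relies on scikit-learn, given the dependency list, and the function names `tokenize`, `cosine`, and `search`, the smoothed IDF formula, and the sample chunks are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(chunks, question, top_k=3):
    """Return up to top_k (score, chunk) pairs, best match first."""
    tokenized = [tokenize(chunk) for chunk in chunks]
    n_chunks = len(tokenized)

    # Document frequency: in how many chunks does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    # Smoothed IDF (an assumed variant), so every known term gets a positive weight.
    idf = {t: math.log((1 + n_chunks) / (1 + df)) + 1 for t, df in doc_freq.items()}

    def vectorize(tokens):
        # Terms that never occur in the chunks are ignored.
        return {t: count * idf[t] for t, count in Counter(tokens).items() if t in idf}

    query_vec = vectorize(tokenize(question))
    scored = [
        (cosine(vectorize(tokens), query_vec), chunk)
        for tokens, chunk in zip(tokenized, chunks)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    sample_chunks = [
        "TextRank ranks sentences by graph centrality.",
        "TF-IDF weighs terms by how rare they are across documents.",
        "Streamlit renders the user interface.",
    ]
    for score, chunk in search(sample_chunks, "How does TF-IDF weigh rare terms?"):
        print(f"{score:.3f}  {chunk}")
```

The IDF weights are computed from the document chunks only, and the question is then projected into that vocabulary, which mirrors the usual fit-then-transform pattern.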
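
The extractive summarization step can be sketched in a similar way. The example below is a simplified, standard-library-only take on TextRank and does not call sumy, which is what the app uses. The naive sentence splitter, the word-overlap similarity, the damping factor of 0.85, and the names `split_sentences`, `similarity`, and `textrank_summary` are assumptions chosen for illustration.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(words_a, words_b):
    """Word overlap normalized by log sentence sizes (TextRank-style)."""
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0
    overlap = len(words_a & words_b)
    return overlap / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_summary(text, n_sentences=2, damping=0.85, iterations=50):
    """Pick the n_sentences highest-ranked sentences, in document order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= n_sentences:
        return sentences

    word_sets = [set(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]

    # Weighted, undirected sentence graph stored as an adjacency matrix.
    weights = [
        [similarity(word_sets[i], word_sets[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]

    # Power iteration of the weighted PageRank update.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    sample = (
        "TextRank builds a graph of sentences. "
        "Each sentence in the graph is scored by its links to other sentences. "
        "Bananas are yellow! "
        "The highest scored sentences form the summary."
    )
    for sentence in textrank_summary(sample):
        print(sentence)
```

Sentences that share no words with the rest of the text receive only the baseline score, so they are the first to drop out of the summary.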