---
title: Docubot-PDF Analyzer
emoji: 🔥
colorFrom: purple
colorTo: purple
sdk: streamlit
sdk_version: 1.45.0
app_file: app.py
pinned: false
license: mit
short_description: A prototype for a final project in NLP
---

# 📄 DocuBot: PDF Analyzer

A lightweight Streamlit app for analyzing academic PDFs and lecture slides. No LLMs needed!

---

## 🚀 What This App Does
- 🧠 Named Entity Recognition (NER): Extracts people, places, and organizations.
- 🔍 Document Search: Finds the passages most relevant to your question, ranked by TF-IDF similarity.
- 📝 Extractive Summarization: Highlights the most important sentences using TextRank.
- 📥 Summary Download: Export your summary as .txt or .pdf.
- 🌗 Light/Dark UI toggle (Streamlit theme).
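
To give a feel for the extractive summarization feature, here is a minimal, dependency-free sketch of the TextRank idea: sentences become graph nodes, word overlap sets the edge weights, and a PageRank-style iteration scores each sentence. This is not the app's code. The function names (`split_sentences`, `similarity`, `textrank_summary`), the naive sentence splitter, and the default parameters are all assumptions made for illustration; a library implementation will differ in tokenization and details.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(words_a, words_b):
    """Word overlap normalized by log sentence sizes, in the style of TextRank."""
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # avoids a zero denominator, since log(1) == 0
    overlap = len(words_a & words_b)
    return overlap / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_summary(text, n_sentences=2, damping=0.85, iterations=50):
    """Return the n_sentences highest-scoring sentences, in document order."""
    sentences = split_sentences(text)
    words = [set(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    n = len(sentences)
    if n == 0:
        return []

    # Symmetric weight matrix; a sentence has no edge to itself.
    weights = [
        [similarity(words[i], words[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # PageRank-style update: each sentence collects score from its neighbours,
    # proportional to the share of the neighbour's total edge weight.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:n_sentences]
    return [sentences[i] for i in sorted(top)]
```

Calling `textrank_summary(text, n_sentences=3)` on extracted PDF text would return the three sentences that share the most vocabulary with the rest of the document. Sentences with no words in common with any other sentence end up with the minimum score.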

---

## 🧪 How It Works
- Text is extracted using pdfplumber.
- Entities are recognized using spaCy's transformer model (en_core_web_trf).
- Document search uses TF-IDF with cosine similarity.
- Summarization is done via sumy's TextRank.
- All processing runs inside the Streamlit app's Python process; the browser only displays the interface.
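
The TF-IDF search step can be sketched without any dependencies to show what the scoring does. The example below is a simplified illustration, not the app's implementation: the helper names (`tokenize`, `tfidf_vectors`, `cosine`, `search`), the regex tokenizer, and the choice to build the IDF table from the chunks plus the question are all assumptions. scikit-learn's `TfidfVectorizer` together with `cosine_similarity` would be the usual way to do this in practice, and its weights will not match this sketch exactly.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (term -> weight) per token list."""
    n_docs = len(token_lists)
    # Document frequency: in how many texts does each term occur?
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF, so a term that appears everywhere still gets a positive weight.
    idf = {
        term: math.log((1 + n_docs) / (1 + df)) + 1
        for term, df in doc_freq.items()
    }
    return [
        {term: count * idf[term] for term, count in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(chunks, question, top_k=3):
    """Return (score, chunk) pairs for the top_k chunks most similar to the question."""
    # Simplification: the question is vectorized with the chunks so all share one IDF table.
    vectors = tfidf_vectors([tokenize(c) for c in chunks] + [tokenize(question)])
    query_vec = vectors[-1]
    scored = [(cosine(vec, query_vec), chunk) for vec, chunk in zip(vectors, chunks)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]
```

With the document split into chunks (for example, one per page or paragraph), `search(chunks, "What is the main topic?")` returns the best-matching chunks with their scores. A score of 0 means the chunk shares no terms with the question, which is the main limitation of purely lexical matching.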

---

## 📂 File Types Supported
- Standard PDFs (.pdf)
- Lecture slides saved as PDF (.pptx.pdf)

---

## 🧑‍💻 How to Use (on Hugging Face Spaces)
1. Navigate to the "📂 Demo" tab.
2. Upload a PDF or use the provided sample.
3. Optionally ask a question like "What is the main topic?"
4. View the entities, relevant chunks, and summary.
5. Download results and rate your experience.

---

## 🛠 Dependencies
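Add these to requirements.txt if you're running locally:

```
streamlit
pdfplumber
spacy
scikit-learn
sumy
fpdf
```

The spaCy model `en_core_web_trf` is normally installed through spaCy's own model releases rather than by package name, so a bare `en_core_web_trf` line in requirements.txt may fail to resolve. Install it separately with `python -m spacy download en_core_web_trf`, or list the URL of the model's release wheel in requirements.txt instead.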

---

## 🙌 Credits
Built with 💙 using open-source NLP libraries. 
Project created for learning and experimentation purposes.

---

Have fun analyzing! 🤖


Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference