# Hugging Face Implementation Plan
## Overview
This document outlines the plan to rebuild the RAG system using Hugging Face's models and capabilities instead of Google Cloud services, while preserving the original cloud implementation as a separate option.
## Repository Links
- GitHub: https://github.com/Daanworg/cloud-rag-webhook
- Hugging Face Space: https://huggingface.co/spaces/Ultronprime/cloud-rag-webhook
## Migration Strategy
The key difference in our approach is to **replace all Google Cloud dependencies with Hugging Face models and tools**:
1. **Replace Google's Document AI** → Use Hugging Face document-understanding/OCR models (like `microsoft/layoutlm-base-uncased`)
2. **Replace Vertex AI** → Use Hugging Face embedding models (like `sentence-transformers/all-MiniLM-L6-v2`)
3. **Replace BigQuery** → Use a FAISS/Chroma vector store with local storage or Hugging Face Datasets (see the sketch after this list)
4. **Replace Cloud Storage** → Use Hugging Face's persistent storage
5. **Replace Cloud Run** → Use Hugging Face Spaces continuous execution
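As a rough illustration of item 3, the snippet below sketches how chunk embeddings could live in a Hugging Face `Dataset` with a FAISS index attached instead of BigQuery tables. The column names, example texts, and query are illustrative assumptions, not decisions from this plan.
```python
# Minimal sketch: Hugging Face Datasets + FAISS as a BigQuery replacement.
# Column names, example texts, and the query are assumptions for illustration.
from datasets import Dataset
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

chunks = ["First chunk of a document...", "Second chunk of a document..."]
dataset = Dataset.from_dict({"text": chunks})

# Store one embedding vector per chunk, then index that column with FAISS.
dataset = dataset.map(
    lambda batch: {"embeddings": model.encode(batch["text"])},
    batched=True,
)
dataset.add_faiss_index(column="embeddings")

# Nearest-neighbour lookup replaces a BigQuery similarity query.
query_vec = model.encode("example query")
scores, results = dataset.get_nearest_examples("embeddings", query_vec, k=2)
print(results["text"])
```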
## Implementation Steps
1. **Set Up New Architecture**:
   - Create a revised Dockerfile for Hugging Face
   - Set up persistent storage (20GB purchased)
   - Configure the A100 GPU using `accelerate` (available to Pro users)
2. **Replace Text Processing Pipeline**:
   - Create a new OCR module using Transformers document models
   - Implement a chunking system in pure Python (sketched after this list)
   - Add text cleaning and processing without Document AI
3. **Replace Vector Database**:
   - Implement FAISS/Chroma for vector storage
   - Use Hugging Face Datasets for persistent indexed storage
   - Create a migration utility to move data out of BigQuery
4. **Replace Embedding System**:
   - Use `sentence-transformers` models for embeddings
   - Implement similarity search using FAISS/Chroma
   - Create a compatible API to replace the Vertex AI functions
5. **Update Application Layer**:
   - Modify the Flask app to run on Hugging Face
   - Update file handling to use local storage
   - Create model caching for better performance
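A minimal sketch of the pure-Python chunking mentioned in step 2. The function name `chunk_text`, chunk size, and overlap are assumptions for illustration; the text-processing example in Key Components refers back to this helper.
```python
# Minimal pure-Python chunker (no Google Cloud or external dependencies).
# chunk_size/overlap defaults are illustrative assumptions.
def chunk_text(text_content, chunk_size=1000, overlap=100):
    """Split text into overlapping character-based chunks."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text_content):
        chunks.append(text_content[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```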
## Key Components
1. **Text Processing**:
```python
# New approach using Hugging Face models
from transformers import AutoTokenizer
from datasets import Dataset

def process_text(text_content):
    """Process text with Hugging Face tooling and store chunks on disk."""
    # Tokenizer is loaded so chunking can later be made token-aware;
    # chunk_text is the pure-Python chunker sketched under Implementation Steps.
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

    # Process and chunk the text
    chunks = chunk_text(text_content)

    # Store in a persistent dataset
    dataset = Dataset.from_dict({"text": chunks})
    dataset.save_to_disk("./data/chunks")
    return dataset
```
2. **Vector Storage**:
```python
# New approach using FAISS for similarity search
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

class FAISSVectorStore:
    def __init__(self):
        # Embedding model and a flat L2 index sized to its output dimension
        self.model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
        self.dimension = self.model.get_sentence_embedding_dimension()
        self.index = faiss.IndexFlatL2(self.dimension)
        self.texts = []

    def add_texts(self, texts):
        # Embed the texts and append them to the index and the text store
        embeddings = self.model.encode(texts)
        self.index.add(np.array(embeddings, dtype=np.float32))
        self.texts.extend(texts)

    def search(self, query, k=5):
        # Embed the query and return the k nearest stored texts
        query_embedding = self.model.encode([query])[0]
        distances, indices = self.index.search(
            np.array([query_embedding], dtype=np.float32), k
        )
        return [self.texts[i] for i in indices[0]]
```
3. **Hugging Face Space Configuration**:
```yaml
title: RAG Document Processing
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
models:
  - sentence-transformers/all-MiniLM-L6-v2
  - facebook/bart-large-cnn
license: apache-2.0
```
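A possible usage of the `FAISSVectorStore` class defined in item 2 above; the example texts and query are placeholders, not project data.
```python
# Example usage of the FAISSVectorStore sketched above (placeholder data).
store = FAISSVectorStore()
store.add_texts([
    "The invoice total for March is 4,200 EUR.",
    "Project kickoff is scheduled for the first week of June.",
])
print(store.search("When does the project start?", k=1))
```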
## Automation Plan
1. **Background Processing**:
   - Implement a file watcher for the persistent storage directory (sketched after this list)
   - Process files automatically when they are added to the upload directory
   - Use Gradio/Streamlit for the UI with a background task system
2. **Scheduled Tasks**:
   - Use GitHub Actions scheduled workflows to trigger maintenance on the Space
   - Run index maintenance tasks periodically
   - Implement a file processing queue for batch operations
3. **GitHub Integration**:
   - Push processed data to the GitHub repository as a backup
   - Use GitHub to store model configuration
   - Implement version control for processed data
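A minimal sketch of the background file watcher from item 1, using a plain polling loop so no extra dependency is needed. The directory path, file pattern, and polling interval are assumptions for illustration; `process_text` comes from the module planned in Implementation Files.
```python
# Minimal polling file watcher (path, pattern, and interval are illustrative assumptions).
import time
from pathlib import Path

from hf_process_text import process_text  # module planned in Implementation Files

UPLOAD_DIR = Path("/data/uploads")  # assumed upload directory on persistent storage
PROCESSED = set()                   # names of files already handled

def watch_uploads(poll_seconds=10):
    """Poll the upload directory and process any new text file that appears."""
    UPLOAD_DIR.mkdir(parents=True, exist_ok=True)
    while True:
        for path in UPLOAD_DIR.glob("*.txt"):
            if path.name not in PROCESSED:
                process_text(path.read_text(encoding="utf-8"))
                PROCESSED.add(path.name)
        time.sleep(poll_seconds)
```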
## Required Libraries
```
transformers==4.40.0
datasets==2.17.1
sentence-transformers==2.3.1
faiss-cpu==1.7.4  # or faiss-gpu for CUDA support
gradio==4.19.2
streamlit==1.32.0
langchain==0.1.5
torch==2.1.2
accelerate==0.28.0
```
## Hardware Requirements
- Use Hugging Face Pro's included A100 tier (ZeroGPU)
- Configure model inference for optimal performance on GPU
- Set up model caching to reduce memory usage (see the sketch below)
- Utilize Hugging Face's persistent storage (20GB)
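A minimal sketch of the model-caching idea, combining a cached loader with GPU placement when CUDA is available. The function name and default model are assumptions for illustration.
```python
# Cached, device-aware model loading (function name and default model are assumptions).
from functools import lru_cache

import torch
from sentence_transformers import SentenceTransformer

@lru_cache(maxsize=2)
def get_embedding_model(name="sentence-transformers/all-MiniLM-L6-v2"):
    """Load each embedding model once and keep it on the best available device."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    return SentenceTransformer(name, device=device)
```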
## Project Goals
Create a fully self-contained RAG system on Hugging Face:
1. Process text files automatically
2. Generate embeddings with Hugging Face models
3. Store vectors in FAISS/Chroma on persistent storage
4. Query the data with a simple API (see the sketch below)
5. Run continuously "under the hood"
6. Utilize Hugging Face Pro benefits (A100 GPU, persistent storage)
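A minimal sketch of goal 4, exposing the retriever through a small Gradio interface on the port declared in the Space configuration. The interface layout and the wiring to `FAISSVectorStore` are assumptions for illustration.
```python
# Minimal Gradio query front end (labels and wiring are illustrative assumptions).
import gradio as gr

from hf_vector_store import FAISSVectorStore  # module planned in Implementation Files

store = FAISSVectorStore()

def answer(query):
    """Return the top matching chunks for a query."""
    return "\n\n".join(store.search(query, k=5))

demo = gr.Interface(fn=answer, inputs="text", outputs="text", title="RAG Query")
demo.launch(server_name="0.0.0.0", server_port=7860)
```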
## Implementation Files
We'll create the following new files to implement the Hugging Face version:
1. `hf_process_text.py` - Text processing with HF models
2. `hf_embeddings.py` - Embedding generation with sentence-transformers
3. `hf_vector_store.py` - FAISS/Chroma implementation
4. `hf_app.py` - Gradio/Streamlit interface
5. `hf_rag_query.py` - Query interface for HF models (sketched below)
6. `requirements_hf.txt` - HF-specific dependencies
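A possible shape for `hf_rag_query.py`, combining FAISS retrieval with the `facebook/bart-large-cnn` model listed in the Space configuration to condense the retrieved context. Treat this as a sketch under those assumptions, not the final module.
```python
# Sketch of a retrieve-then-condense query helper (assumed module layout).
from transformers import pipeline

from hf_vector_store import FAISSVectorStore  # module planned above

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

def rag_query(store: FAISSVectorStore, question: str, k: int = 5) -> str:
    """Retrieve the top-k chunks for a question and condense them into one answer."""
    context = " ".join(store.search(question, k=k))
    summary = summarizer(context, max_length=130, min_length=30, do_sample=False)
    return summary[0]["summary_text"]
```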
This will allow us to maintain both implementations in parallel. |