pixeltext-ai - FIXED VERSION β
π FIXED: Hub loading now works properly!
A high-performance OCR model based on PaliGemma-3B, now with proper Hugging Face Hub support.
β What's Fixed
- Hub Loading:
AutoModel.from_pretrained()
now works correctly - from_pretrained Method: Proper implementation added
- Configuration: Fixed model configuration for Hub compatibility
- Error Handling: Improved error handling and fallbacks
π Quick Start (NOW WORKS!)
from transformers import AutoModel
from PIL import Image
# Load model from Hub (FIXED!)
model = AutoModel.from_pretrained("BabaK07/pixeltext-ai", trust_remote_code=True)
# Load image
image = Image.open("your_image.jpg")
# Extract text
result = model.generate_ocr_text(image)
print(f"Text: {result['text']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"Success: {result['success']}")
π Performance
- β‘ Speed: ~3 seconds per image
- π― Accuracy: Up to 95% confidence
- π Languages: 100+ supported
- π» Device: CPU and GPU support
- π Batch: Multiple image processing
π οΈ Features
- β
Hub Loading: Works with
AutoModel.from_pretrained()
- β Fast Inference: Optimized for speed
- β High Accuracy: Based on PaliGemma-3B
- β Multi-language: Supports 100+ languages
- β Batch Processing: Handle multiple images
- β Custom Prompts: Tailor extraction for specific needs
- β Production Ready: Error handling included
π Usage Examples
Basic Usage
from transformers import AutoModel
from PIL import Image
model = AutoModel.from_pretrained("BabaK07/pixeltext-ai", trust_remote_code=True)
image = Image.open("document.jpg")
result = model.generate_ocr_text(image)
Custom Prompts
result = model.generate_ocr_text(
image,
prompt="<image>Extract all invoice details including amounts:"
)
Batch Processing
images = [Image.open(f"doc_{i}.jpg") for i in range(5)]
results = model.batch_ocr(images)
File Path Input
result = model.generate_ocr_text("path/to/your/image.jpg")
π§ Installation
pip install torch transformers pillow
π Model Details
- Base Model: google/paligemma-3b-pt-224
- Model Size: ~3B parameters
- Architecture: Vision-Language Transformer
- Optimization: OCR-specific enhancements
- Training: Custom OCR pipeline
π Comparison
Feature | Before (Broken) | After (FIXED) |
---|---|---|
Hub Loading | β AttributeError | β Works perfectly |
from_pretrained | β Missing | β Implemented |
AutoModel | β Failed | β Compatible |
Configuration | β Invalid | β Proper config |
π― Use Cases
- Document Digitization: Convert scanned documents
- Invoice Processing: Extract invoice data
- Form Processing: Digitize forms
- Receipt OCR: Extract receipt information
- Multi-language Documents: Handle international text
- Batch Processing: Process document collections
π Related Models
- textract-ai: https://huggingface.co/BabaK07/textract-ai (Qwen-based, higher accuracy)
- Base Model: https://huggingface.co/google/paligemma-3b-pt-224
π Support
For issues or questions, please check the model repository or contact the author.
Status: β FIXED and ready for production use!
- Downloads last month
- 13
Model tree for BabaK07/pixeltext-ai
Base model
google/paligemma-3b-pt-224