textract-ai - FIXED VERSION
FIXED: Hub loading now works properly!
A high-accuracy OCR model based on Qwen2-VL-2B-Instruct, now with proper Hugging Face Hub support.
What's Fixed
- Hub Loading: AutoModel.from_pretrained() now works correctly
- from_pretrained Method: Proper implementation added
- Configuration: Fixed model configuration for Hub compatibility
- Error Handling: Improved error handling and fallbacks
Quick Start (NOW WORKS!)
from transformers import AutoModel
from PIL import Image
# Load model from Hub (FIXED!)
model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)
# Load image
image = Image.open("your_image.jpg")
# Extract text
result = model.generate_ocr_text(image, use_native=True)
print(f"Text: {result['text']}")
print(f"Confidence: {result['confidence']:.1%}")
print(f"Success: {result['success']}")
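Before passing the text downstream, it can help to check the result dictionary first. The following is a minimal sketch, assuming the result always carries the `text`, `confidence`, and `success` keys printed above. The helper name `accept_result` and the 0.8 threshold are illustrative choices, not part of the model's API.

```python
from typing import Optional


def accept_result(result: dict, min_confidence: float = 0.8) -> Optional[str]:
    """Return the extracted text, or None if the result should be rejected.

    Hypothetical helper: assumes `result` has the keys shown in the
    Quick Start ('text', 'confidence', 'success').
    """
    # Reject outright if the model reported a failure.
    if not result.get("success", False):
        return None
    # Reject low-confidence output; the threshold is an arbitrary example value.
    if result.get("confidence", 0.0) < min_confidence:
        return None
    return result.get("text", "").strip()
```

A call such as `accept_result(model.generate_ocr_text(image, use_native=True))` would then yield either clean text or `None`.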
Performance
- Accuracy: High accuracy OCR (up to 95% confidence)
- Speed: ~13 seconds per image (high quality)
- Languages: Multi-language support
- Device: CPU and GPU support
- Documents: Excellent for complex documents
Features
- Hub Loading: Works with AutoModel.from_pretrained()
- High Accuracy: Based on Qwen2-VL-2B-Instruct
- Multi-language: Supports many languages
- Document OCR: Excellent for invoices, forms, documents
- Robust Processing: Multiple extraction methods
- Production Ready: Error handling included
Usage Examples
Basic Usage
from transformers import AutoModel
from PIL import Image
model = AutoModel.from_pretrained("BabaK07/textract-ai", trust_remote_code=True)
image = Image.open("document.jpg")
result = model.generate_ocr_text(image, use_native=True)
High Accuracy Mode
result = model.generate_ocr_text(image, use_native=True) # Best accuracy
Fast Mode
result = model.generate_ocr_text(image, use_native=False) # Faster processing
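One way to combine the two modes is to try the faster path first and rerun in native mode only when the result looks weak. This is a sketch under assumptions: it relies on the `use_native` flag and the `success` and `confidence` keys shown in this card, while the function name `ocr_with_fallback` and the 0.8 threshold are hypothetical.

```python
def ocr_with_fallback(model, image, min_confidence: float = 0.8) -> dict:
    """Try the faster mode first; rerun in native mode if the result is weak.

    Hypothetical helper: `model` is any object exposing
    generate_ocr_text(image, use_native=...) as described in this card.
    """
    fast = model.generate_ocr_text(image, use_native=False)
    if fast.get("success") and fast.get("confidence", 0.0) >= min_confidence:
        return fast
    # Fall back to the slower mode that the card describes as most accurate.
    return model.generate_ocr_text(image, use_native=True)
```

The trade-off is that a weak fast result costs two passes over the image, so this only pays off when most inputs clear the threshold on the first attempt.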
File Path Input
result = model.generate_ocr_text("path/to/your/image.jpg")
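Since a file path is accepted directly, a folder of scans can be processed in a loop. The sketch below is illustrative only: `ocr_directory`, the suffix set, and the `error` key added on failure are assumptions, not part of the model's API. The OCR call is passed in as a callable so the loop itself does not depend on the model being loaded.

```python
from pathlib import Path
from typing import Callable, Dict

# Assumed set of image extensions; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def ocr_directory(folder: str, ocr: Callable[[str], dict]) -> Dict[str, dict]:
    """Run `ocr` on every image file in `folder`, keyed by file name."""
    results: Dict[str, dict] = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        try:
            results[path.name] = ocr(str(path))
        except Exception as exc:
            # Record the failure and keep going; the 'error' key is our own addition.
            results[path.name] = {
                "text": "",
                "confidence": 0.0,
                "success": False,
                "error": str(exc),
            }
    return results
```

With a loaded model, this could be called as `ocr_directory("scans", model.generate_ocr_text)`, where `"scans"` is a placeholder folder name.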
Installation
pip install torch transformers pillow
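To confirm the install before loading the model, a quick import check may be useful. This is a small sketch using only the standard library; the helper name `missing_modules` is hypothetical. Note that the `pillow` package is imported under the module name `PIL`.

```python
import importlib.util


def missing_modules(names=("torch", "transformers", "PIL")):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```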
Model Details
- Base Model: Qwen/Qwen2-VL-2B-Instruct
- Model Size: ~2.5B parameters
- Architecture: Vision-Language Transformer
- Optimization: OCR-specific processing
- Training: Custom OCR pipeline
Comparison
Feature | Before (Broken) | After (FIXED) |
---|---|---|
Hub Loading | ValueError | Works perfectly |
from_pretrained | Missing | Implemented |
AutoModel | Failed | Compatible |
Configuration | Invalid | Proper config |
Use Cases
- High-Accuracy OCR: When accuracy is most important
- Document Processing: Complex invoices, forms, contracts
- Multi-language Text: International documents
- Professional OCR: Business and enterprise use
- Research Applications: Academic and research projects
Related Models
- pixeltext-ai: https://huggingface.co/BabaK07/pixeltext-ai (PaliGemma-based, faster)
- Base Model: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct
Support
For issues or questions, please check the model repository or contact the author.
Status: FIXED and ready for production use!