πŸš€ MonkeyOCR-MLX: Apple Silicon Optimized OCR

A high-performance OCR application optimized for Apple Silicon with MLX-VLM acceleration, featuring advanced document layout analysis and intelligent text extraction.

πŸ”₯ Key Features

  • ⚑ MLX-VLM Optimization: Native Apple Silicon acceleration using MLX framework
  • πŸš€ 3x Faster Processing: Compared to standard PyTorch on M-series chips
  • 🧠 Advanced AI: Powered by Qwen2.5-VL model with specialized layout analysis
  • πŸ“„ Multi-format Support: PDF, PNG, JPG, JPEG with intelligent structure detection
  • 🌐 Modern Web Interface: Beautiful Gradio interface for easy document processing
  • πŸ”„ Batch Processing: Efficient handling of multiple documents
  • 🎯 High Accuracy: Specialized for complex financial documents and tables
  • πŸ”’ 100% Private: All processing happens locally on your Mac

πŸ“Š Performance Benchmarks

Test: Complex Financial Document (Tax Form)

  • MLX-VLM: ~15-18 seconds ⚑
  • Standard PyTorch: ~25-30 seconds
  • CPU Only: ~60-90 seconds

MacBook M4 Pro Performance:

  • Model loading: ~1.7s
  • Text extraction: ~15s
  • Table structure: ~18s
  • Memory usage: ~13GB peak

πŸ›  Installation

Prerequisites

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • Python 3.11+
  • 16GB+ RAM (32GB+ recommended for large documents)

Quick Setup

  1. Clone the repository:

    git clone https://huggingface.co/Jimmi42/MonkeyOCR-Apple-Silicon
    cd MonkeyOCR-Apple-Silicon
    
  2. Run the automated setup script:

    chmod +x setup.sh
    ./setup.sh
    

    This script will automatically:

    • Download MonkeyOCR from the official GitHub repository
    • Apply MLX-VLM optimization patches for Apple Silicon
    • Enable smart backend auto-selection (MLX/LMDeploy/transformers)
    • Install UV package manager if needed
    • Set up virtual environment with Python 3.11
    • Install all dependencies including MLX-VLM
    • Download required model weights
    • Configure optimal backend for your hardware
  3. Alternative manual installation:

    # Install UV if not already installed
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
    # Download MonkeyOCR
    git clone https://github.com/Yuliang-Liu/MonkeyOCR.git MonkeyOCR
    
    # Install dependencies (includes mlx-vlm)
    uv sync
    
    # Download models
    cd MonkeyOCR && python tools/download_model.py && cd ..
    

πŸƒβ€β™‚οΈ Usage

Web Interface (Recommended)

# Activate virtual environment
source .venv/bin/activate  # or `uv shell`

# Start the web app
python app.py

Access the interface at http://localhost:7861

Command Line

python main.py path/to/document.pdf
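
For batch processing, the same CLI can simply be looped over a folder of documents. A minimal sketch (the documents/ folder name is an assumption):

# Batch-processing sketch: invoke main.py once per PDF (illustrative only)
import subprocess
import sys
from pathlib import Path

for pdf in sorted(Path("documents").glob("*.pdf")):
    print(f"Processing {pdf} ...")
    subprocess.run([sys.executable, "main.py", str(pdf)], check=True)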

βš™οΈ Configuration

Smart Backend Selection (Default)

The app automatically detects your hardware and selects the optimal backend:

# model_configs_mps.yaml
device: mps
chat_config:
  backend: auto  # Smart auto-selection
  batch_size: 1
  max_new_tokens: 256
  temperature: 0.0

Auto-Selection Logic (see the sketch after this list):

  • 🍎 Apple Silicon (MPS) β†’ MLX-VLM (3x faster)
  • πŸ–₯️ CUDA GPU β†’ LMDeploy (optimized for NVIDIA)
  • πŸ’» CPU/Fallback β†’ Transformers (universal compatibility)

Performance Backends

Backend      | Speed    | Memory | Best For                  | Auto-Selected
------------ | -------- | ------ | ------------------------- | -----------------
auto         | ⚑       | 🧠     | All systems (Recommended) | βœ… Default
mlx          | πŸš€πŸš€πŸš€ | 🟒     | Apple Silicon             | 🍎 Auto for MPS
lmdeploy     | πŸš€πŸš€     | 🟑     | CUDA systems              | πŸ–₯️ Auto for CUDA
transformers | πŸš€       | 🟒     | Universal fallback        | πŸ’» Auto for CPU

🧠 Model Architecture

Core Components

  • Layout Detection: DocLayout-YOLO for document structure analysis
  • Vision-Language Model: Qwen2.5-VL with MLX optimization
  • Layout Reading: LayoutReader for reading order optimization
  • MLX Framework: Native Apple Silicon acceleration

Apple Silicon Optimizations

  • Metal Performance Shaders: Direct GPU acceleration
  • Unified Memory: Optimized memory access patterns
  • Neural Engine: Utilizes Apple's dedicated AI hardware
  • Float16 Precision: Optimal speed/accuracy balance

🎯 Perfect For

Document Types:

  • πŸ“Š Financial Documents: Tax forms, invoices, statements
  • πŸ“‹ Legal Documents: Contracts, forms, certificates
  • πŸ“„ Academic Papers: Research papers, articles
  • 🏒 Business Documents: Reports, presentations, spreadsheets

Advanced Features:

  • βœ… Complex table extraction with highlighted cells
  • βœ… Multi-column layouts and mixed content
  • βœ… Mathematical formulas and equations
  • βœ… Structured data output (Markdown, JSON)
  • βœ… Batch processing for multiple files

🚨 Troubleshooting

MLX-VLM Issues

# Test MLX-VLM availability
python -c "import mlx_vlm; print('βœ… MLX-VLM available')"

# Check if auto backend selection is working
python -c "
from MonkeyOCR.magic_pdf.model.custom_model import MonkeyOCR
model = MonkeyOCR('model_configs_mps.yaml')
print(f'Selected backend: {type(model.chat_model).__name__}')
"

Performance Issues

# Check MPS availability
python -c "import torch; print(f'MPS available: {torch.backends.mps.is_available()}')"

# Monitor memory usage during processing
top -pid $(pgrep -f "python app.py")

Common Solutions

  1. Patches Not Applied:

    • Re-run ./setup.sh to reapply patches
    • Check that MonkeyOCR directory exists and has our modifications
    • Verify MonkeyChat_MLX class exists in MonkeyOCR/magic_pdf/model/custom_model.py
  2. Wrong Backend Selected:

    • Check hardware detection with python -c "import torch; print(torch.backends.mps.is_available())"
    • Verify MLX-VLM is installed: pip install mlx-vlm
    • Use backend: mlx in config to force MLX backend
  3. Slow Performance:

    • Ensure auto-selection chose MLX backend on Apple Silicon
    • Check Activity Monitor for MPS GPU usage
    • Verify backend: auto in model_configs_mps.yaml
  4. Memory Issues:

    • Reduce image resolution before processing
    • Close other memory-intensive applications
    • Reduce batch_size to 1 in config
  5. Port Already in Use:

    GRADIO_SERVER_PORT=7862 python app.py
    

πŸ“ Project Structure

MonkeyOCR-MLX/
β”œβ”€β”€ 🌐 app.py                    # Gradio web interface
β”œβ”€β”€ πŸ–₯️ main.py                   # CLI interface  
β”œβ”€β”€ βš™οΈ model_configs_mps.yaml    # MLX-optimized config
β”œβ”€β”€ πŸ“¦ requirements.txt          # Dependencies (includes mlx-vlm)
β”œβ”€β”€ πŸ› οΈ torch_patch.py           # Compatibility patches
β”œβ”€β”€ 🧠 MonkeyOCR/               # Core AI models
β”‚   └── 🎯 magic_pdf/           # Processing engine
β”œβ”€β”€ πŸ“„ .gitignore               # Git ignore rules
└── πŸ“š README.md                # This file

πŸ”₯ What's New in MLX Version

  • ✨ Smart Patching System: Automatically applies MLX-VLM optimizations to official MonkeyOCR
  • 🧠 Intelligent Backend Selection: Auto-detects hardware and selects optimal backend
  • πŸš€ 3x Faster Processing: MLX-VLM acceleration on Apple Silicon
  • πŸ’Ύ Better Memory Efficiency: Optimized for unified memory architecture
  • 🎯 Improved Accuracy: Enhanced table and structure detection
  • πŸ”§ Zero Configuration: Works out-of-the-box with smart defaults
  • πŸ“Š Performance Monitoring: Built-in timing and metrics
  • πŸ› οΈ Latest Fix (June 2025): Resolved MLX-VLM prompt formatting for optimal OCR output
  • πŸ”„ Always Up-to-Date: Uses official MonkeyOCR repository with our patches applied

πŸ”¬ Technical Implementation

Smart Patching System

  • Dynamic Code Injection: Automatically adds MLX-VLM class to official MonkeyOCR
  • Backend Selection Logic: Patches smart hardware detection into initialization
  • Zero Maintenance: Always uses latest official MonkeyOCR with our optimizations
  • Seamless Integration: Patches are applied transparently during setup

MLX-VLM Backend (MonkeyChat_MLX)

  • Direct MLX framework integration
  • Optimized for Apple's Metal Performance Shaders
  • Native unified memory management
  • Specialized prompt processing for OCR tasks
  • Fixed prompt formatting for optimal output quality

Intelligent Fallback System

  • Hardware Detection: MPS β†’ MLX, CUDA β†’ LMDeploy, CPU β†’ Transformers
  • Graceful Degradation: Falls back to compatible backends if preferred unavailable
  • Cross-Platform: Maintains compatibility across all systems
  • Error Recovery: Automatic fallback on initialization failures

🀝 Contributing

We welcome contributions! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Apple MLX Team: For the incredible MLX framework
  • MonkeyOCR Team: For the foundational OCR model
  • Qwen Team: For the excellent Qwen2.5-VL model
  • Gradio Team: For the beautiful web interface
  • MLX-VLM Contributors: For the MLX vision-language integration

πŸ“ž Support

  • πŸ› Bug Reports: Create an issue
  • πŸ’¬ Discussions: Hugging Face Discussions
  • πŸ“– Documentation: Check the troubleshooting section above
  • ⭐ Star the repository if you find it useful!

πŸš€ Supercharged for Apple Silicon β€’ Made with ❀️ for the MLX Community

Experience the future of OCR with native Apple Silicon optimization
