PaddleOCR ONNX Models
Summary
This repository provides ONNX-format implementations of PaddleOCR models, offering comprehensive optical character recognition capabilities for multilingual text detection and recognition. The models are exported from the original PaddleOCR framework and optimized for efficient inference across various deployment scenarios.
The repository contains two main types of models: text detection models that identify text regions in images, and text recognition models that convert detected regions into text content. The detection models use segmentation-based approaches with post-processing steps including contour detection, polygon approximation, and score thresholding to localize text regions precisely.
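As an illustration of that post-processing flow, the sketch below turns a detection probability map into scored boxes with OpenCV. It is a simplified stand-in, not the repository's actual implementation: the function name boxes_from_prob_map is made up for this example, and the unclip step is approximated by a plain bounding-box expansion rather than true polygon unclipping.

import cv2
import numpy as np

def boxes_from_prob_map(prob_map, heat_threshold=0.3, box_threshold=0.7,
                        max_candidates=1000, unclip_ratio=2.0):
    """Turn a (H, W) text probability map into scored boxes (illustrative only)."""
    # Score thresholding: binarize the heat map.
    binary = (prob_map > heat_threshold).astype(np.uint8)

    # Contour detection on the binary mask.
    contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)

    boxes = []
    for contour in contours[:max_candidates]:
        # Polygon approximation of the contour outline.
        epsilon = 0.01 * cv2.arcLength(contour, True)
        polygon = cv2.approxPolyDP(contour, epsilon, True)

        # Keep only regions whose mean heat-map score clears the box threshold.
        mask = np.zeros_like(binary)
        cv2.fillPoly(mask, [polygon], 1)
        if not mask.any():
            continue
        score = float(prob_map[mask == 1].mean())
        if score < box_threshold:
            continue

        # Expand the region slightly; real DB-style post-processing unclips the polygon.
        x, y, w, h = cv2.boundingRect(polygon)
        pad = int(round(unclip_ratio))
        boxes.append(((max(x - pad, 0), max(y - pad, 0), x + w + pad, y + h + pad), score))
    return boxes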
For text recognition, the models support multiple languages including Chinese, English, Japanese, Korean, Arabic, Cyrillic, Devanagari, and several other scripts. Each recognition model comes with its own character dictionary (dict.txt) tailored to the specific language or script family. The recognition pipeline handles text normalization, feature extraction, and sequence decoding to produce accurate text transcriptions with confidence scores.
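The sequence-decoding step can be pictured as CTC-style greedy decoding over the model's per-timestep class probabilities. The sketch below is a minimal illustration, assuming class index 0 is the blank token and that the character set comes from the model's dict.txt; ctc_greedy_decode is a hypothetical helper, not part of the repository.

import numpy as np

def ctc_greedy_decode(probs, charset, remove_duplicate=True):
    """Decode a (T, num_classes) probability matrix into (text, confidence).

    Assumes class index 0 is the CTC blank and charset[i] maps to class i + 1.
    """
    best = probs.argmax(axis=1)       # most likely class at each timestep
    best_p = probs.max(axis=1)        # its probability

    chars, scores = [], []
    prev = -1
    for idx, p in zip(best, best_p):
        # Skip blanks; optionally collapse repeated characters.
        if idx != 0 and not (remove_duplicate and idx == prev):
            chars.append(charset[idx - 1])
            scores.append(float(p))
        prev = idx
    confidence = float(np.mean(scores)) if scores else 0.0
    return ''.join(chars), confidence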
The models are designed with practical deployment in mind, with versions optimized for different use cases, from lightweight mobile applications to high-accuracy server deployments. Key features include support for rotated text detection, duplicate character removal, and configurable confidence thresholds for balancing precision and recall in real-world applications.
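For example, rotated (vertical) text handling can be approximated by an aspect-ratio check on each detected crop; the helper below is a hypothetical sketch of that idea, using the rotation_threshold parameter described in the configuration section further down.

import numpy as np

def maybe_rotate_crop(crop, rotation_threshold=1.5):
    """Rotate a tall, narrow crop 90 degrees so it reads horizontally (illustrative)."""
    height, width = crop.shape[:2]
    if width > 0 and height / width >= rotation_threshold:
        crop = np.rot90(crop)   # likely vertical text; make it horizontal
    return crop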
Usage
The models can be easily used through the dghs-imgutils library:
# Installation
pip install dghs-imgutils
# Basic OCR usage
from imgutils.ocr import ocr, list_det_models, list_rec_models
# List available models
print("Detection models:", list_det_models())
print("Recognition models:", list_rec_models())
# Perform OCR on an image
results = ocr('your_image.jpg')
for bbox, text, confidence in results:
print(f"Text: {text}, Confidence: {confidence:.4f}, BBox: {bbox}")
# Custom model selection
results = ocr('your_image.jpg',
              detect_model='ch_PP-OCRv4_det',
              recognize_model='japan_PP-OCRv3_rec')
# Text detection only
from imgutils.ocr import detect_text_with_ocr
# Detect text regions without recognition
detections = detect_text_with_ocr('your_image.jpg')
for bbox, label, confidence in detections:
print(f"BBox: {bbox}, Confidence: {confidence:.4f}")
Available Models
Text Detection Models
- ch_PP-OCRv2_det - Chinese text detection v2
- ch_PP-OCRv3_det - Chinese text detection v3
- ch_PP-OCRv4_det - Chinese text detection v4
- ch_PP-OCRv4_server_det - Server-optimized Chinese detection v4
- ch_ppocr_mobile_slim_v2.0_det - Lightweight mobile detection
- ch_ppocr_mobile_v2.0_det - Mobile-optimized detection
- ch_ppocr_server_v2.0_det - Server-optimized detection
- en_PP-OCRv3_det - English text detection
Text Recognition Models
- arabic_PP-OCRv3_rec - Arabic text recognition
- ch_PP-OCRv2_rec - Chinese text recognition v2
- ch_PP-OCRv3_rec - Chinese text recognition v3
- ch_PP-OCRv4_rec - Chinese text recognition v4
- ch_PP-OCRv4_server_rec - Server-optimized Chinese recognition v4
- ch_ppocr_mobile_v2.0_rec - Mobile-optimized Chinese recognition
- ch_ppocr_server_v2.0_rec - Server-optimized Chinese recognition
- chinese_cht_PP-OCRv3_rec - Traditional Chinese recognition
- cyrillic_PP-OCRv3_rec - Cyrillic script recognition
- devanagari_PP-OCRv3_rec - Devanagari script recognition
- en_PP-OCRv3_rec - English text recognition v3
- en_PP-OCRv4_rec - English text recognition v4
- en_number_mobile_v2.0_rec - Mobile-optimized number recognition
- japan_PP-OCRv3_rec - Japanese text recognition
- ka_PP-OCRv3_rec - Kannada text recognition
- korean_PP-OCRv3_rec - Korean text recognition
- latin_PP-OCRv3_rec - Latin script recognition
- ta_PP-OCRv3_rec - Tamil text recognition
- te_PP-OCRv3_rec - Telugu text recognition
Model Configuration
The OCR pipeline supports several configurable parameters:
- heat_threshold: Heat map threshold for text detection (default: 0.3)
- box_threshold: Box confidence threshold (default: 0.7)
- max_candidates: Maximum number of text candidates (default: 1000)
- unclip_ratio: Expansion ratio for detected boxes (default: 2.0)
- rotation_threshold: Aspect ratio threshold for rotation detection (default: 1.5)
- is_remove_duplicate: Whether to remove duplicate characters (default: False)
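As an example, these parameters might be passed straight to the ocr() call from dghs-imgutils; the exact keyword support should be checked against the library's documentation, and the values below are arbitrary illustrations rather than recommended settings.

from imgutils.ocr import ocr

# Stricter detection: keep only high-confidence boxes, expand them less,
# and collapse repeated characters during decoding.
results = ocr(
    'your_image.jpg',
    heat_threshold=0.4,
    box_threshold=0.8,
    max_candidates=500,
    unclip_ratio=1.8,
    rotation_threshold=1.5,
    is_remove_duplicate=True,
)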
Performance Notes
- The default detection model ch_PP-OCRv4_det provides an excellent balance of accuracy and speed
- The default recognition model ch_PP-OCRv4_rec supports both Chinese and English with high accuracy
- For specific languages, choose the corresponding recognition model for optimal results
- Server versions generally offer higher accuracy at the cost of increased computational requirements
- Mobile versions are optimized for speed and resource efficiency
Original Content
Onnx version of PaddleOCR.
Citation
@misc{paddleocr_onnx,
title = {{PaddleOCR ONNX Models}},
author = {PaddlePaddle and Repository Contributors},
howpublished = {\url{https://huggingface.co/deepghs/paddleocr}},
year = {2023},
note = {ONNX-format implementations of PaddleOCR models for multilingual text detection and recognition},
abstract = {This repository provides ONNX-format implementations of PaddleOCR models, offering comprehensive optical character recognition capabilities for multilingual text detection and recognition. The models are exported from the original PaddleOCR framework and optimized for efficient inference across various deployment scenarios. The repository contains text detection models that identify text regions in images and text recognition models that convert detected text regions into actual text content, supporting multiple languages including Chinese, English, Japanese, Korean, Arabic, Cyrillic, Devanagari, and several other scripts.},
keywords = {OCR, text-detection, text-recognition, multilingual, ONNX}
}