metadata
language:
  - multilingual
tags:
  - deepseek
  - vision-language
  - ocr
  - document-parse
base_model:
  - deepseek-ai/DeepSeek-OCR

DeepSeek OCR

Note: currently, only NexaSDK supports this model's GGUF.

Quickstart

  1. Install NexaSDK
  2. Run the model locally with one line of code:
nexa infer NexaAI/DeepSeek-OCR-GGUF
  3. Then drag your image into the terminal or type the image path, followed by a prompt:

Case 1: extract text

<your-image-path> Free OCR.

Case 2: extract bounding boxes

<your-image-path> <|grounding|>Convert the document to markdown. 
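The two cases above differ only in the instruction that follows the image path. A minimal sketch of composing that input line in Python is shown here; `build_prompt` is a hypothetical helper, not part of NexaSDK, and it assumes the path is passed exactly as you would type it (no quoting or escaping is applied).

```python
# Instruction strings copied from the two quickstart cases.
FREE_OCR = "Free OCR."
GROUNDED_MARKDOWN = "<|grounding|>Convert the document to markdown."


def build_prompt(image_path: str, grounded: bool = False) -> str:
    """Return the line to enter at the model prompt: image path, then instruction.

    Hypothetical helper for illustration; it only concatenates strings.
    """
    instruction = GROUNDED_MARKDOWN if grounded else FREE_OCR
    return f"{image_path} {instruction}"


if __name__ == "__main__":
    print(build_prompt("receipt.png"))                 # plain text extraction
    print(build_prompt("receipt.png", grounded=True))  # markdown with grounding
```

The printed lines can be pasted into the interactive session started by `nexa infer NexaAI/DeepSeek-OCR-GGUF`.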

Model Description

DeepSeek OCR is a high-accuracy optical character recognition model built for extracting text from complex visual inputs such as documents, screenshots, receipts, and natural scenes.
It combines vision-language modeling with efficient visual encoders to recognize multilingual, multi-layout text accurately while remaining lightweight enough for edge or on-device deployment.

Features

  • Multilingual OCR — recognizes printed and handwritten text across major global languages.
  • Document Layout Understanding — preserves structure such as tables, paragraphs, and titles.
  • Scene Text Recognition — robust against lighting, distortion, and low-quality captures.
  • Lightweight & Fast — optimized for CPU and GPU acceleration.
  • End-to-End Pipeline — supports image-to-text and structured JSON output.

Use Cases

  • Digitizing scanned documents or PDFs
  • Extracting text from mobile camera inputs or screenshots
  • Invoice and receipt parsing
  • OCR-based search and indexing systems
  • Visual question answering or document agents

Inputs and Outputs

Input:

  • Image file (JPEG, PNG, or tensor array)
  • Optional parameters for language hints or layout detection
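Before passing a file to the model, it can help to confirm that it really is a JPEG or PNG. The sketch below checks the file's leading bytes against the standard JPEG and PNG signatures; the function names are hypothetical and this check is not something the model or NexaSDK is known to require.

```python
from typing import Optional

# Standard file signatures (magic bytes) for the two listed formats.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def sniff_image_format(header: bytes) -> Optional[str]:
    """Return "png" or "jpeg" based on the leading bytes, else None."""
    if header.startswith(PNG_MAGIC):
        return "png"
    if header.startswith(JPEG_MAGIC):
        return "jpeg"
    return None


def is_supported_image(path: str) -> bool:
    """Read the first 8 bytes of the file and check them against known signatures."""
    with open(path, "rb") as f:
        return sniff_image_format(f.read(8)) is not None
```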

Output:

  • Extracted text (plain text or structured format with bounding boxes)
  • Confidence scores per word or region
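When the `<|grounding|>` prompt is used, the bounding boxes arrive as tags embedded in the text output. The sketch below turns that text into structured records. It assumes each region is written as `<|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>` with integer coordinates; that layout is an assumption here, so check it against your actual output before relying on it. `Region` and `parse_regions` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    label: str
    box: Tuple[int, int, int, int]  # x1, y1, x2, y2, kept exactly as emitted


# Assumed tag layout: <|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|>
_REGION = re.compile(
    r"<\|ref\|>(?P<label>[^<]*)<\|/ref\|>\s*"
    r"<\|det\|>\[\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]\]<\|/det\|>"
)


def parse_regions(raw: str) -> List[Region]:
    """Collect every single-box region found in the raw model output.

    Regions that do not match the assumed layout (for example, several
    boxes inside one <|det|> tag) are skipped rather than guessed at.
    """
    regions = []
    for match in _REGION.finditer(raw):
        x1, y1, x2, y2 = (int(v) for v in match.groups()[1:])
        regions.append(Region(match.group("label").strip(), (x1, y1, x2, y2)))
    return regions
```

The coordinates are returned untouched; any scaling to pixel space depends on the coordinate convention of the output and is left to the caller.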

Integration

DeepSeek OCR can be integrated through:

  • Python API (pip install deepseek-ocr)
  • REST or gRPC endpoints for server deployment

License

This model is released under the Apache 2.0 License, allowing commercial use, modification, and redistribution with attribution.