ArabicOCR-Qwen2.5-VL-7B-Vision

This repository contains a float16 merged build of a Vision-Language Model (VLM), fine-tuned by loay to perform Optical Character Recognition (OCR) on Arabic text in images.

The model was created by fine-tuning unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit with LoRA adapters, using the Unsloth library for efficient training; the adapters were then merged back into the base model for easy deployment.
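The exact training script is not published here, but a typical Unsloth merge step looks like the sketch below. It assumes Unsloth's FastVisionModel interface, and the adapter directory and output name are placeholders rather than the author's actual paths.

```python
# Hedged sketch of the LoRA merge step, assuming Unsloth's FastVisionModel API.
# "path/to/lora-adapters" is a placeholder, not the author's actual path.
from unsloth import FastVisionModel

# Load the base model together with the trained LoRA adapters.
model, tokenizer = FastVisionModel.from_pretrained(
    "path/to/lora-adapters",
    load_in_4bit=True,
)

# Merge the adapters into the base weights and save a float16 checkpoint.
model.save_pretrained_merged(
    "ArabicOCR-Qwen2.5-VL-7B-Vision",
    tokenizer,
    save_method="merged_16bit",
)
```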

Model Details

  • Fine-tuned by: loay
  • Base Model: unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit
  • Fine-tuning Task: Arabic Optical Character Recognition (OCR)
  • Training Data: The model was trained on a curated dataset of images containing Arabic text and their corresponding transcriptions.
  • Output Format: This is a float16 precision model intended for GPU inference; it requires more than 14 GB of VRAM.
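
Usage

A minimal inference sketch, assuming the merged weights load with the standard transformers Qwen2.5-VL classes. The image path and the prompt wording are illustrative; the card does not state the prompt used during training.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "loay/ArabicOCR-Qwen2.5-VL-7B-Vision"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Placeholder image; any file containing Arabic text works.
image = Image.open("page.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Extract the Arabic text in this image."},
        ],
    }
]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
# Drop the prompt tokens so only the generated transcription is decoded.
generated = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```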