---
license: apache-2.0
language:
  - en
  - ar
library_name: transformers
tags:
  - unsloth
  - qwen
  - qwen2.5-vl
  - arabic
  - ocr
  - vision
  - text-extraction
  - merged
  - lora
pipeline_tag: image-to-text
base_model: unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit
---

# ArabicOCR-Qwen2.5-VL-7B-Vision

This repository contains the float16 merged version of a Vision-Language Model (VLM) fine-tuned by loay to perform Optical Character Recognition (OCR) on Arabic text in images.

The model was created by fine-tuning unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit with LoRA adapters. Training was accelerated with the Unsloth library, and the adapters were then merged back into the base model for easier deployment.
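
For reference, merging LoRA adapters into standalone weights can be done with Unsloth's save_pretrained_merged helper. The following is a hedged sketch, not the author's exact script: it assumes Unsloth's FastVisionModel API, and the checkpoint path and output directory are illustrative.

```python
from unsloth import FastVisionModel

# Reload the trained LoRA checkpoint on top of the 4-bit base model.
model, tokenizer = FastVisionModel.from_pretrained(
    "lora_checkpoint_dir",  # hypothetical local adapter checkpoint
    load_in_4bit=True,
)

# Merge the adapter weights into the base model and save as float16.
model.save_pretrained_merged(
    "ArabicOCR-Qwen2.5-VL-7B-Vision",
    tokenizer,
    save_method="merged_16bit",
)
```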

## Model Details

- Fine-tuned by: loay
- Base Model: unsloth/Qwen2.5-VL-7B-Instruct-bnb-4bit
- Fine-tuning Task: Arabic Optical Character Recognition (OCR)
- Training Data: A curated dataset of images containing Arabic text paired with their transcriptions.
- Output Format: float16 precision weights, intended for inference on GPUs with sufficient VRAM (more than 14 GB); see the usage sketch below.
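
## Usage

A minimal inference sketch with transformers. The repository id (loay/ArabicOCR-Qwen2.5-VL-7B-Vision) and the image path are assumptions for illustration; substitute the actual repo id and your own image.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "loay/ArabicOCR-Qwen2.5-VL-7B-Vision"  # assumed repo id

# Load the merged float16 weights onto available GPUs.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

# Build a chat-style prompt containing one image and an OCR instruction.
image = Image.open("arabic_sample.png")  # hypothetical input image
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Extract all Arabic text from this image."},
        ],
    }
]
prompt = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

# Generate, then decode only the newly produced tokens.
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512)
trimmed = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```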