> **Dots.OCR-Latest-BF16** is an optimized and updated vision-language OCR model variant of the original [Dots.OCR](https://huggingface.co/rednote-hilab/dots.ocr). This open-source model is designed to extract text from images and scanned documents, including handwritten and printed content. It can output results as plain text or Markdown, preserving document layout elements such as headings, tables, and lists. The model uses a powerful multimodal backbone (**3B VLM**) to enhance reading comprehension and layout understanding, handling cursive handwriting and complex document structures effectively.

The **BF16 variant** has been tested and updated to run smoothly with the latest `transformers` release without compatibility issues. It was verified with the following environment:
```
transformers: 4.57.1
torch: 2.6.0+cu124
cuda: 12.4
device: NVIDIA H200 MIG 3g.71gb
attn_implementation = "flash_attention_2"
```
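To check that a local setup matches the tested stack, a minimal sketch using standard `torch` and `transformers` attributes:

```py
# Optional sanity check: print the locally installed versions and the visible GPU.
import torch
import transformers

print("transformers:", transformers.__version__)    # tested with 4.57.1
print("torch:", torch.__version__)                   # tested with 2.6.0+cu124
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))  # tested on NVIDIA H200 MIG 3g.71gb
```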

## Quick Start with Transformers 🤗

### Install the required packages

```
gradio
numpy
torch
torchvision
transformers==4.57.1
accelerate
matplotlib
flash-attn @ https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu12torch2.6cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
```
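The list above can be saved to a `requirements.txt` file (assumed filename) and installed with `pip install -r requirements.txt`. Note that the pinned `flash-attn` wheel targets CUDA 12, torch 2.6, and Python 3.10 on Linux x86_64; other environments need a matching wheel or a source build.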

### Run Demo
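A minimal inference sketch, assuming the checkpoint loads via `trust_remote_code` with the standard `AutoModelForCausalLM`/`AutoProcessor` interface and accepts the usual chat-template image+text message format; the model id, image path, and prompt below are placeholders to adapt:

```py
# Minimal inference sketch (assumptions: the checkpoint loads via trust_remote_code
# with AutoModelForCausalLM/AutoProcessor and accepts the standard chat-template
# image+text message format; the model id, image path, and prompt are placeholders).
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "path/to/Dots.OCR-Latest-BF16"  # replace with the actual repo id or local path

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("sample_document.png").convert("RGB")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Extract all text from this document as Markdown."},
        ],
    }
]

# Build the prompt from the chat template, then tokenize it together with the image.
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=1024)

# Strip the prompt tokens and decode only the newly generated text.
generated = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```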