Donut KTP OCR Model
This model is fine-tuned from Donut for extracting information from Indonesian ID Cards (KTP).
Model Description
- Model type: Donut (Document Understanding Transformer)
- Language: Indonesian
- Task: OCR and Information Extraction from KTP
- Accuracy: TED-based accuracy 90.97%, F1 84.28%
- Fields extracted: 15
Extracted Fields
The model can extract the following fields from KTP:
- NIK (ID Number)
- Nama (Name)
- Tempat Lahir (Place of Birth)
- Tanggal Lahir (Date of Birth)
- Jenis Kelamin (Gender)
- Golongan Darah (Blood Type)
- Alamat (Address)
- RT/RW
- Kelurahan (Village)
- Kecamatan (District)
- Agama (Religion)
- Status Perkawinan (Marital Status)
- Pekerjaan (Occupation)
- Kewarganegaraan (Citizenship)
- Berlaku Hingga (Valid Until)
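Since the model emits free-form text for each field, a light sanity check on the output can catch obvious misreads before the data is used downstream. The following is a minimal sketch, not part of the model itself. The key names in `EXPECTED_FIELDS` are hypothetical (the real keys depend on the labels used during fine-tuning), and `check_ktp_fields` is an illustrative helper. The only rule assumed about the card itself is that a NIK consists of 16 digits.

```python
import re

# Hypothetical key names: adjust them to the keys the model actually returns.
EXPECTED_FIELDS = [
    "nik", "nama", "tempat_lahir", "tanggal_lahir", "jenis_kelamin",
    "golongan_darah", "alamat", "rt_rw", "kelurahan", "kecamatan",
    "agama", "status_perkawinan", "pekerjaan", "kewarganegaraan",
    "berlaku_hingga",
]

# A NIK is a 16-digit number.
NIK_PATTERN = re.compile(r"\d{16}")


def check_ktp_fields(ktp_data):
    """Return a list of human-readable problems found in one prediction dict."""
    problems = []
    # Flag every expected field that is absent, non-string, or blank.
    for field in EXPECTED_FIELDS:
        value = ktp_data.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    # Check the NIK format only when a non-empty value was extracted.
    nik = ktp_data.get("nik")
    if isinstance(nik, str) and nik.strip():
        if not NIK_PATTERN.fullmatch(nik.strip()):
            problems.append("nik is not a 16-digit number")
    return problems
```

An empty list means the prediction passed these basic checks. It does not mean the extracted values are correct.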
Usage
import torch
from PIL import Image
from donut import DonutModel

# Load model
model = DonutModel.from_pretrained("ahmadarif019/donut-ktp-extractor")
model.eval()

# Use GPU if available (half precision reduces memory use)
if torch.cuda.is_available():
    model.half()
    model.to("cuda")

# Process image
image = Image.open("ktp.jpg").convert("RGB")
result = model.inference(image=image, prompt="<s_dataset_ktp>")

# The first prediction holds the extracted fields
ktp_data = result["predictions"][0]
print(ktp_data)
Training Data
- Dataset: Indonesian KTP images
- Size: [Add your dataset size]
- Split: [Add train/val/test split info]
Performance
- TED-based accuracy (normalized Tree Edit Distance): 90.97%
- F1 Score: 84.28%
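To make the F1 figure easier to interpret, here is a minimal sketch of one common way to compute a field-level F1 score over flat prediction dictionaries. It is an illustration under stated assumptions and not necessarily the evaluation script used for the numbers above: `field_f1` is a hypothetical helper, all values are assumed to be strings, and a field counts as correct only when both key and stripped value match exactly.

```python
def field_f1(predictions, ground_truths):
    """Micro-averaged F1 over (key, value) pairs of paired prediction/label dicts."""
    tp = fp = fn = 0
    for pred, gold in zip(predictions, ground_truths):
        # Blank values are treated as "field not extracted".
        pred_items = {(k, v.strip()) for k, v in pred.items() if v.strip()}
        gold_items = {(k, v.strip()) for k, v in gold.items() if v.strip()}
        tp += len(pred_items & gold_items)   # exact key and value match
        fp += len(pred_items - gold_items)   # predicted but wrong or spurious
        fn += len(gold_items - pred_items)   # labelled but missed or wrong
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Under this definition a wrong value is penalized twice, once as a false positive and once as a false negative, so a single misread character in a long field such as the address lowers the score as much as a completely missing field.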
Limitations
- Works best with clear, well-lit KTP images
- May have reduced accuracy with damaged or obscured cards
- Optimized for standard KTP format
Citation
@article{kim2021donut,
title={OCR-free Document Understanding Transformer},
author={Kim, Geewook and Hong, Teakgyu and Yim, Moonbin and Nam, JeongYeon and Park, Jinyoung and Yim, Jinyeong and Hwang, Wonseok and Yun, Sangdoo and Han, Dongyoon and Park, Seunghyun},
journal={arXiv preprint arXiv:2111.15664},
year={2021}
}
License
MIT License - For research and internal use only.