Sinhala Handwritten Notes OCR - TrOCR Fine-tuned Model

Model Description

This model is a fine-tuned TrOCR-based OCR model for recognizing Sinhala handwritten text from image inputs. It was developed to support Sinhala handwritten educational note recognition as part of a Sinhala educational assistant pipeline.

The model was fine-tuned from eshangj/TrOCR-Sinhala-finetuned using a custom Sinhala handwritten notes dataset.

The main goal of this model is to improve Sinhala handwritten text recognition, especially for cropped handwritten word images or short handwritten text regions. The extracted text can be used for downstream tasks such as document digitization, search, summarization, and question answering.

Intended Use

This model is intended for:

  • Sinhala handwritten text recognition
  • OCR for Sinhala educational notes
  • Word-level or short-line handwritten text extraction
  • Sinhala OCR research experiments
  • Sinhala educational document processing pipelines

Not Intended For

This model is not currently optimized for:

  • Long paragraph-level handwritten OCR
  • Complex full-page layout understanding
  • Printed Sinhala OCR
  • Very noisy or low-resolution images
  • Production use without further validation

For best results, the input should be a clearly cropped handwritten word or short handwritten text segment.

Training Configuration

Parameter                Value
Base model               eshangj/TrOCR-Sinhala-finetuned
Epochs                   20
Train batch size         8
Evaluation batch size    8
Learning rate            2e-5
FP16                     True
Save strategy            Epoch
Evaluation strategy      Epoch
Logging steps            20
Generation enabled       True
Max sequence length      64
Evaluation split         10%
Random state             42
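
As a rough sketch, the settings above could map onto Hugging Face `Seq2SeqTrainingArguments` keyword names as shown below. The mapping is an assumption, since the original training script is not shown: "Train batch size" is read as a per-device value, "Max sequence length" may refer to label tokenization rather than `generation_max_length`, and the hold-out split uses a hypothetical helper (`split_train_eval`) built on Python's `random` module rather than whatever utility was actually used.

```python
import random

# Assumed mapping of the table onto Seq2SeqTrainingArguments keyword names.
# Older transformers releases name `eval_strategy` as `evaluation_strategy`.
training_kwargs = {
    "num_train_epochs": 20,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "learning_rate": 2e-5,
    "fp16": True,                   # needs hardware with half-precision support
    "save_strategy": "epoch",
    "eval_strategy": "epoch",
    "logging_steps": 20,
    "predict_with_generate": True,  # assumed meaning of "Generation enabled"
    "generation_max_length": 64,    # assumed meaning of "Max sequence length"
}


def split_train_eval(samples, eval_fraction=0.10, seed=42):
    """Shuffle deterministically and hold out `eval_fraction` for evaluation."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_eval = max(1, round(len(samples) * eval_fraction))
    eval_idx = set(indices[:n_eval])
    train = [s for i, s in enumerate(samples) if i not in eval_idx]
    held_out = [s for i, s in enumerate(samples) if i in eval_idx]
    return train, held_out


# Hypothetical file names, for illustration only.
samples = [f"word_{i:03d}.png" for i in range(50)]
train_set, eval_set = split_train_eval(samples)
print(len(train_set), len(eval_set))  # 45 5
```

Fixing the seed makes the 10% evaluation split reproducible, so CER and WER from different runs are measured on the same held-out samples.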

Evaluation Results

Two fine-tuning runs were recorded during experimentation.

Run 1

Metric                     Value
Best epoch                 6
Best evaluation loss       1.4947
Best evaluation CER        0.3741
Best evaluation WER        0.5283
Final CER                  0.2878
Final WER                  0.4906
First training loss        3.4820
Last training loss         0.0008
Minimum training loss      0.0008

Run 2

Metric                     Value
Best epoch                 7
Best evaluation loss       1.4877
Best evaluation CER        0.3453
Best evaluation WER        0.4717
Final CER                  0.3129
Final WER                  0.4528
First training loss        3.4819
Last training loss         0.0009
Minimum training loss      0.0009

Result Interpretation

The model shows clear learning during training, as the training loss decreased from approximately 3.48 to below 0.001.

However, the best evaluation loss was still about 1.49 (1.4947 in Run 1 and 1.4877 in Run 2), while the training loss fell almost to zero. This gap indicates that the model is probably overfitting to the training dataset.

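The lowest evaluation loss was reached at epoch 6 (Run 1) and epoch 7 (Run 2), not at the final epoch. In both runs, however, the final-epoch CER and WER are lower than the values reported for the best epoch (for example, a CER of 0.2878 versus 0.3741 in Run 1). This suggests that the "best" CER and WER were recorded at the epoch with the lowest evaluation loss, and that the loss and the error rates do not reach their minimum at the same epoch. The best checkpoint should therefore be selected on validation CER and WER, not on evaluation loss or final training loss alone.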

Across the two runs, the lowest values recorded at the best epoch were (all from Run 2):

Metric              Best Observed Value
Evaluation CER      0.3453
Evaluation WER      0.4717
Evaluation loss     1.4877

The lowest final-epoch values observed were:

Metric       Value
Final CER    0.2878 (Run 1)
Final WER    0.4528 (Run 2)
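
The checkpoint-selection advice above can be sketched as follows. This is a minimal illustration, assuming a per-epoch evaluation log is available as a list of dictionaries; the `history` values and the `select_best_epoch` helper are hypothetical and are not taken from the runs reported here.

```python
def select_best_epoch(history, primary="eval_cer", secondary="eval_wer"):
    """Return the log entry with the lowest primary metric; ties go to the secondary."""
    return min(history, key=lambda row: (row[primary], row[secondary]))


# Hypothetical per-epoch evaluation log, NOT values from this model's runs.
history = [
    {"epoch": 1, "eval_loss": 2.10, "eval_cer": 0.60, "eval_wer": 0.80},
    {"epoch": 2, "eval_loss": 1.60, "eval_cer": 0.45, "eval_wer": 0.65},
    {"epoch": 3, "eval_loss": 1.50, "eval_cer": 0.40, "eval_wer": 0.58},
    {"epoch": 4, "eval_loss": 1.55, "eval_cer": 0.35, "eval_wer": 0.52},
    {"epoch": 5, "eval_loss": 1.70, "eval_cer": 0.37, "eval_wer": 0.55},
]

by_loss = min(history, key=lambda row: row["eval_loss"])
by_cer = select_best_epoch(history)

print("lowest evaluation loss at epoch", by_loss["epoch"])  # 3
print("lowest evaluation CER at epoch", by_cer["epoch"])    # 4
```

In this illustrative log the two criteria pick different epochs, which is the situation the selection rule is meant to handle. When training with the Hugging Face `Trainer`, the `metric_for_best_model` and `load_best_model_at_end` training arguments offer a comparable mechanism.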

Metrics Explanation

Character Error Rate - CER

Character Error Rate measures character-level mistakes between the predicted Sinhala text and the ground truth text.

Lower CER means better character-level recognition.

Word Error Rate - WER

Word Error Rate measures word-level mistakes between the predicted Sinhala text and the ground truth text.

Lower WER means better word-level recognition.

Since Sinhala handwritten OCR is challenging due to Sinhala character shapes, ligatures, spacing, and handwriting variations, both CER and WER are useful for evaluating this model.
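
Both metrics can be computed from the Levenshtein edit distance, as in the sketch below. It is an illustrative implementation, not necessarily the one used to produce the numbers in this card, and the function names are hypothetical. It counts Unicode code points as characters, so a Sinhala consonant and its vowel sign are counted separately.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference item
                curr[j - 1] + 1,     # insert a predicted item
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def cer(reference, prediction):
    """Character edits divided by the number of reference characters."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(prediction)) / len(reference)


def wer(reference, prediction):
    """Word edits divided by the number of whitespace-separated reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, prediction.split()) / len(ref_words)


print(cer("kitten", "sitting"))          # 0.5 (3 edits / 6 characters)
print(wer("the cat sat", "the cat sit"))  # 1 edit / 3 words
print(cer("අම්මා", "අම්ම"))                # one missing vowel sign
```

Because a single wrong character makes the whole word count as an error, WER is usually higher than CER on the same predictions, which matches the results reported above.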

Usage

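from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import torch

model_name = "hasindu-k/sinhala-handwritten-notes-v3"

# Load the image processor/tokenizer and the fine-tuned encoder-decoder weights.
processor = TrOCRProcessor.from_pretrained(model_name)
model = VisionEncoderDecoderModel.from_pretrained(model_name)

# Load a cropped word or short-line image and convert it to 3-channel RGB.
image = Image.open("sample_image.png").convert("RGB")

# Resize and normalize the image into a batch of one.
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Generate token IDs without tracking gradients.
with torch.no_grad():
    generated_ids = model.generate(pixel_values)

# Decode the token IDs to text, dropping special tokens.
predicted_text = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True
)[0]

print(predicted_text)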

Recommended Input Format

For better recognition accuracy, use:

  • Cropped word images
  • Cropped short-line images
  • Clear Sinhala handwritten text
  • High-contrast images
  • Minimal background noise
  • Preprocessed images where ruled lines, shadows, or unnecessary background areas are removed
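
One of the steps above, removing unnecessary background around the handwriting, can be sketched as a crop to the bounding box of the dark pixels. To stay self-contained, the example works on a grayscale image represented as a plain list of rows; a real pipeline would more likely use an image library. The `crop_to_ink` helper, the threshold of 128 and the margin of 2 pixels are illustrative assumptions, not part of this model's preprocessing.

```python
def crop_to_ink(gray, threshold=128, margin=2):
    """Crop a grayscale image (rows of 0..255 values) to its dark pixels plus a margin."""
    height, width = len(gray), len(gray[0])
    rows = [y for y in range(height) if any(p < threshold for p in gray[y])]
    cols = [x for x in range(width) if any(row[x] < threshold for row in gray)]
    if not rows or not cols:
        return gray  # no ink found, so leave the image unchanged
    top = max(rows[0] - margin, 0)
    bottom = min(rows[-1] + margin, height - 1)
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + margin, width - 1)
    return [row[left:right + 1] for row in gray[top:bottom + 1]]


# Synthetic 12 x 20 white page with a dark 3 x 5 "stroke" in the middle.
page = [[255] * 20 for _ in range(12)]
for y in range(4, 7):
    for x in range(8, 13):
        page[y][x] = 0

cropped = crop_to_ink(page)
print(len(cropped), len(cropped[0]))  # 7 9
```

A fixed threshold only works on reasonably clean, high-contrast scans; images with shadows or ruled lines would need additional cleanup before cropping.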

Limitations

This model still has several limitations:

  • It may produce incorrect characters for visually similar Sinhala letters.
  • It may struggle with long handwritten sentences.
  • It may perform poorly on unseen handwriting styles.
  • It may be sensitive to image quality, skew, blur, shadows, and background noise.
  • It may confuse word boundaries if the input image contains multiple words.
  • The current WER values (about 0.45 to 0.53) show that word-level accuracy still needs further improvement.

Future Improvements

Future versions of this model can be improved by:

  • Increasing the Sinhala handwritten training dataset size
  • Adding more diverse handwriting styles
  • Training with better word-level cropped images
  • Applying data augmentation
  • Using early stopping based on validation CER and WER
  • Evaluating using a separate unseen test dataset
  • Improving preprocessing for noisy handwritten documents
  • Comparing performance with other Sinhala OCR models
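
Early stopping on a validation metric, one of the items above, can be sketched with a simple patience counter. The `EarlyStopper` class and the CER values below are hypothetical and only illustrate the idea; they are not taken from this model's training runs.

```python
class EarlyStopper:
    """Signal a stop once the metric has not improved for `patience` evaluations."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = None
        self.bad_evals = 0

    def update(self, epoch, value):
        """Record a new metric value; return True if training should stop."""
        if value < self.best - self.min_delta:
            self.best, self.best_epoch, self.bad_evals = value, epoch, 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


# Hypothetical validation CER per epoch (lower is better).
cer_per_epoch = [0.60, 0.45, 0.40, 0.35, 0.37, 0.36, 0.38, 0.39]

stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, value in enumerate(cer_per_epoch, start=1):
    if stopper.update(epoch, value):
        stopped_at = epoch
        break

print("stopped after epoch", stopped_at)     # 7
print("best CER epoch", stopper.best_epoch)  # 4
```

The `transformers` library provides an `EarlyStoppingCallback` with an `early_stopping_patience` argument that follows the same principle; whether it fits this training setup would need to be checked against the actual training script.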

Ethical Considerations

This model is intended for educational and research use. It should not be used as the only source of truth for high-stakes document interpretation. OCR outputs should be reviewed by a human when accuracy is important.

Citation

If you use this model in research or academic work, please cite the model repository and mention that it is based on a fine-tuned TrOCR architecture for Sinhala handwritten OCR.

Model Version

Version: v3
Repository: hasindu-k/sinhala-handwritten-notes-v3
Fine-tuned from: eshangj/TrOCR-Sinhala-finetuned
Training timestamp: 2026-05-03
Weights format: Safetensors
Model size: 0.3B params
Tensor type: F32