Sinhala Handwritten Notes OCR - TrOCR Fine-tuned Model

Model Description

This is a fine-tuned TrOCR model for recognizing Sinhala handwritten text in images. It was developed to recognize handwritten Sinhala educational notes as part of a Sinhala educational assistant pipeline.

The model was fine-tuned from eshangj/TrOCR-Sinhala-finetuned using a custom Sinhala handwritten notes dataset.

The main goal of this model is to improve Sinhala handwritten text recognition, especially for cropped handwritten word images or short handwritten text regions. The extracted text can be used for downstream tasks such as document digitization, search, summarization, and question answering.

Intended Use

This model is intended for:

Sinhala handwritten text recognition
OCR for Sinhala educational notes
Word-level or short-line handwritten text extraction
Sinhala OCR research experiments
Sinhala educational document processing pipelines

Not Intended For

This model is not currently optimized for:

Long paragraph-level handwritten OCR
Complex full-page layout understanding
Printed Sinhala OCR
Very noisy or low-resolution images
Production use without further validation

For best results, the input should be a clearly cropped handwritten word or short handwritten text segment.

Training Configuration

Parameter	Value
Base model	`eshangj/TrOCR-Sinhala-finetuned`
Epochs	20
Train batch size	8
Evaluation batch size	8
Learning rate	2e-5
FP16	True
Save strategy	Epoch
Evaluation strategy	Epoch
Logging steps	20
Generation enabled	True
Max sequence length	64
Evaluation split	10%
Random state	42
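
The values in the table map fairly directly onto keyword arguments of the Hugging Face `Seq2SeqTrainingArguments` class. The sketch below collects them in a plain dictionary and shows one way to make a seeded 10% evaluation split. The training script itself is not part of this card, so the argument mapping (in particular treating "Max sequence length" as `generation_max_length`), the `output_dir` path, and the `split_samples` helper are assumptions made for illustration.

```python
import random

# Keyword names follow transformers.Seq2SeqTrainingArguments.
# Older transformers releases spell "eval_strategy" as "evaluation_strategy".
TRAINING_KWARGS = {
    "output_dir": "./sinhala-trocr-checkpoints",  # hypothetical path
    "num_train_epochs": 20,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "learning_rate": 2e-5,
    "fp16": True,  # mixed precision needs a supported accelerator such as a CUDA GPU
    "save_strategy": "epoch",
    "eval_strategy": "epoch",
    "logging_steps": 20,
    "predict_with_generate": True,
    "generation_max_length": 64,  # assumption: "Max sequence length" from the table
}


def split_samples(samples, eval_fraction=0.1, seed=42):
    """Shuffle with a fixed seed and hold out `eval_fraction` for evaluation.

    Hypothetical helper: the original split may have been produced differently.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    eval_size = max(1, round(len(shuffled) * eval_fraction))
    # Returns (train_samples, eval_samples).
    return shuffled[eval_size:], shuffled[:eval_size]
```

With transformers installed, the dictionary can be unpacked as `Seq2SeqTrainingArguments(**TRAINING_KWARGS)`.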

Evaluation Results

Two fine-tuning runs were recorded during experimentation.

Run 1

Metric	Value
Best epoch	6
Best evaluation loss	1.4947
Best evaluation CER	0.3741
Best evaluation WER	0.5283
Final CER	0.2878
Final WER	0.4906
First training loss	3.4820
Last training loss	0.0008
Minimum training loss	0.0008

Run 2

Metric	Value
Best epoch	7
Best evaluation loss	1.4877
Best evaluation CER	0.3453
Best evaluation WER	0.4717
Final CER	0.3129
Final WER	0.4528
First training loss	3.4819
Last training loss	0.0009
Minimum training loss	0.0009

Result Interpretation

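The model clearly learns during training: the training loss decreased from approximately 3.48 to below 0.001 in both runs.

However, the lowest evaluation loss was about 1.49 (1.4947 in Run 1 and 1.4877 in Run 2), while the training loss became almost zero. This gap indicates that the model may be overfitting to the training dataset.

The lowest evaluation loss was reached at epoch 6-7 rather than at the final epoch. The values labeled "best" in the tables above appear to be taken at that lowest-loss epoch, because the final-epoch CER and WER are lower in both runs (for example, CER 0.2878 versus 0.3741 in Run 1). Evaluation loss and CER/WER therefore do not point to the same checkpoint, and the checkpoint should be selected based on validation CER and WER rather than on evaluation loss or final training loss alone.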

The best metrics recorded at a lowest-loss epoch all come from Run 2 (epoch 7):

Metric	Value at Best Epoch
Evaluation CER	0.3453
Evaluation WER	0.4717
Evaluation loss	1.4877

The lowest final-epoch values were:

Metric	Value
Final CER	0.2878 (Run 1)
Final WER	0.4528 (Run 2)
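
Selecting a checkpoint by a chosen validation metric can be sketched as follows. This is a minimal illustration, not the procedure used for this model: `select_checkpoint` is a hypothetical helper, the history entries use only the Run 1 values reported above, and treating epoch 20 as the final epoch assumes that all 20 configured epochs were run.

```python
def select_checkpoint(history, metric="eval_cer"):
    """Return the history entry with the lowest value of `metric`.

    Entries that do not report the metric are ignored; ties resolve to the
    earliest epoch.
    """
    candidates = [entry for entry in history if metric in entry]
    if not candidates:
        raise ValueError(f"no entry reports {metric!r}")
    return min(candidates, key=lambda entry: (entry[metric], entry["epoch"]))


# Run 1 values from the tables above. The final evaluation loss is not reported.
run1 = [
    {"epoch": 6, "eval_loss": 1.4947, "eval_cer": 0.3741, "eval_wer": 0.5283},
    {"epoch": 20, "eval_cer": 0.2878, "eval_wer": 0.4906},
]

print(select_checkpoint(run1, "eval_cer")["epoch"])   # 20
print(select_checkpoint(run1, "eval_loss")["epoch"])  # 6
```

When training with the Hugging Face `Trainer`, a similar effect is usually obtained with `load_best_model_at_end=True`, `greater_is_better=False`, and `metric_for_best_model` set to whatever key the metrics function returns for CER.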

Metrics Explanation

Character Error Rate - CER

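Character Error Rate is the character-level edit distance (substitutions, deletions, and insertions) between the predicted Sinhala text and the ground truth text, divided by the number of characters in the ground truth.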

Lower CER means better character-level recognition.

Word Error Rate - WER

Word Error Rate is the same edit distance computed over whitespace-separated words instead of characters, divided by the number of words in the ground truth text.

Lower WER means better word-level recognition.

Since Sinhala handwritten OCR is challenging due to Sinhala character shapes, ligatures, spacing, and handwriting variations, both CER and WER are useful for evaluating this model.
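
Both metrics can be computed with a standard Levenshtein edit distance. The sketch below is a plain-Python illustration and not necessarily the implementation used to produce the numbers in this card. It assumes that characters are counted as Unicode code points, so a Sinhala vowel sign or hal kirima counts as its own character, and that words are separated by whitespace.

```python
def edit_distance(reference, prediction):
    """Levenshtein distance between two sequences (strings or lists)."""
    previous = list(range(len(prediction) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, pred_item in enumerate(prediction, start=1):
            cost = 0 if ref_item == pred_item else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def cer(reference, prediction):
    """Character Error Rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, prediction) / len(reference)


def wer(reference, prediction):
    """Word Error Rate: word edits divided by reference word count."""
    reference_words = reference.split()
    if not reference_words:
        raise ValueError("reference text must contain at least one word")
    return edit_distance(reference_words, prediction.split()) / len(reference_words)


# Example: a prediction that drops the final vowel sign of a Sinhala word.
print(cer("අම්මා", "අම්ම"))
```

Because both rates divide by the reference length, a prediction with many inserted characters or words can score above 1.0.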

Usage

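from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
import torch

model_name = "hasindu-k/sinhala-handwritten-notes-v3"

# Load the image processor and tokenizer together with the fine-tuned weights.
processor = TrOCRProcessor.from_pretrained(model_name)
model = VisionEncoderDecoderModel.from_pretrained(model_name)

# The model expects a 3-channel image, so convert grayscale or RGBA inputs to RGB.
image = Image.open("sample_image.png").convert("RGB")

# Resize and normalize the image into a batch containing one tensor.
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# Inference only, so gradient tracking is disabled.
with torch.no_grad():
    generated_ids = model.generate(pixel_values)

# Decode the generated token IDs back into Sinhala text.
predicted_text = processor.batch_decode(
    generated_ids,
    skip_special_tokens=True
)[0]

print(predicted_text)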

Recommended Input Format

For better recognition accuracy, use:

Cropped word images
Cropped short-line images
Clear Sinhala handwritten text
High-contrast images
Minimal background noise
Preprocessed images where ruled lines, shadows, or unnecessary background areas are removed
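
Part of this preparation can be automated before the image reaches the processor. The sketch below uses Pillow to stretch the contrast and crop the image to the region that contains ink. It is an illustration under simple assumptions: `prepare_crop`, its `margin`, and its `ink_threshold` are hypothetical and not part of this model's pipeline, the handwriting is assumed to be dark on a light background, and ruled-line or shadow removal is not covered.

```python
from PIL import Image, ImageOps


def prepare_crop(image, margin=8, ink_threshold=128):
    """Stretch contrast and crop to the bounding box of dark (ink) pixels."""
    gray = ImageOps.autocontrast(ImageOps.grayscale(image))

    # Mark ink pixels as 255 and background as 0 so getbbox() finds the ink.
    ink_mask = gray.point(lambda value: 255 if value < ink_threshold else 0)
    bbox = ink_mask.getbbox()
    if bbox is None:
        # No ink detected: return the input unchanged apart from the mode.
        return image.convert("RGB")

    left, top, right, bottom = bbox
    box = (
        max(left - margin, 0),
        max(top - margin, 0),
        min(right + margin, gray.width),
        min(bottom + margin, gray.height),
    )
    # The TrOCR processor expects a 3-channel image.
    return gray.crop(box).convert("RGB")
```

The returned image can be passed to the processor in place of the image loaded in the Usage section.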

Limitations

This model still has several limitations:

It may produce incorrect characters for visually similar Sinhala letters.
It may struggle with long handwritten sentences.
It may perform poorly on unseen handwriting styles.
It may be sensitive to image quality, skew, blur, shadows, and background noise.
It may confuse word boundaries if the input image contains multiple words.
Current WER values (0.45 to 0.53 across both runs) show that word-level accuracy still needs further improvement.

Future Improvements

Future versions of this model can be improved by:

Increasing the Sinhala handwritten training dataset size
Adding more diverse handwriting styles
Training with better word-level cropped images
Applying data augmentation
Using early stopping based on validation CER and WER
Evaluating using a separate unseen test dataset
Improving preprocessing for noisy handwritten documents
Comparing performance with other Sinhala OCR models
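
As an illustration of the early-stopping item, the sketch below tracks validation CER and signals a stop once it has not improved for a number of consecutive epochs. `CerEarlyStopper` and its `patience` and `min_delta` defaults are hypothetical choices for this example, not settings used for this model.

```python
class CerEarlyStopper:
    """Patience-based early stopping on validation CER (lower is better)."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum CER decrease that counts
        self.best_cer = float("inf")
        self.best_epoch = None
        self.epochs_without_improvement = 0

    def update(self, epoch, cer):
        """Record one epoch and return True when training should stop."""
        if cer < self.best_cer - self.min_delta:
            self.best_cer = cer
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience
```

In a training loop, `update` would be called after each evaluation, and the checkpoint from `best_epoch` kept once it returns `True`. The transformers library also ships an `EarlyStoppingCallback` with an `early_stopping_patience` argument that plays the same role when used with the `Trainer`.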

Ethical Considerations

This model is intended for educational and research use. It should not be used as the only source of truth for high-stakes document interpretation. OCR outputs should be reviewed by a human when accuracy is important.

Citation

If you use this model in research or academic work, please cite the model repository and mention that it is based on a fine-tuned TrOCR architecture for Sinhala handwritten OCR.

Model Version

Version: v3
Repository: hasindu-k/sinhala-handwritten-notes-v3
Fine-tuned from: eshangj/TrOCR-Sinhala-finetuned
Training timestamp: 2026-05-03

Weights format: Safetensors
Model size: 0.3B params
Tensor type: F32

Model tree for hasindu-k/sinhala-handwritten-notes-v3

Base model: Ransaka/sinhala-ocr-model
Fine-tuned: Ransaka/TrOCR-Sinhala
Fine-tuned: eshangj/TrOCR-Sinhala-finetuned
Fine-tuned: this model