Gemma3-Singlish-Codemix

Fine-tuned Gemma 3 model for converting code-mixed Singlish/English text into proper Sinhala script.

Model Details

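| Property | Value |
|---|---|
| Base model | `savinugunarathna/Gemma3-Singlish-Sinhala-Merged` |
| Fine-tuning method | QLoRA (4-bit quantized base, LoRA rank r=16) |
| Upload type | `merged` (LoRA weights merged into the base model) |
| Task | Code-mixed → Sinhala transliteration |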
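For intuition on what "r=16" means: LoRA keeps each adapted weight matrix W (shape d_out × d_in) frozen and learns a low-rank update B·A, with B of shape d_out × r and A of shape r × d_in. Each adapted layer therefore adds r·(d_in + d_out) trainable parameters. The sketch below computes this. The 640 × 640 layer size is a hypothetical example, and this card does not state which modules were targeted during fine-tuning.

```python
def lora_params(d_in: int, d_out: int, r: int = 16) -> int:
    """Trainable parameters LoRA adds to one linear layer: A (r x d_in) + B (d_out x r)."""
    return r * d_in + d_out * r


def full_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen weight matrix W (d_out x d_in), bias ignored."""
    return d_in * d_out


# Hypothetical 640 x 640 projection; real layer shapes depend on the base model
d_in, d_out = 640, 640
added = lora_params(d_in, d_out, r=16)
frozen = full_params(d_in, d_out)
print(added, frozen, f"{added / frozen:.1%}")  # 20480 409600 5.0%
```

Because this checkpoint is uploaded as `merged`, the learned update has been folded back into the base weights (W + (α/r)·B·A, where the scaling hyperparameter α is not reported here), so inference needs no separate adapter and the weight shapes match the base model.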

Training Data

- Phonetic dataset: ~1M Singlish → Sinhala pairs (only a sampled subset was used for training)
- Code-mixed dataset: ~22K Singlish/English → Sinhala pairs
- Curriculum: 3-phase training (phonetic → mixed → code-mix focused)
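A 3-phase curriculum of this kind could be organised roughly as follows. This is a hypothetical sketch: the helper name `build_curriculum`, the sample size, and the small phonetic "replay" fraction in phase 3 are assumptions for illustration, not the recipe actually used to train this model.

```python
import random


def build_curriculum(phonetic, codemix, phonetic_sample=1000, replay_frac=0.1, seed=0):
    """Return three training phases as lists of (source, target) pairs.

    Hypothetical schedule:
      phase 1: sampled phonetic pairs only
      phase 2: sampled phonetic and code-mixed pairs shuffled together
      phase 3: code-mixed pairs plus a small replay of phonetic pairs
    """
    rng = random.Random(seed)
    # Only a sampled subset of the large phonetic set is used
    phon = rng.sample(phonetic, min(phonetic_sample, len(phonetic)))

    phase1 = list(phon)

    phase2 = phon + list(codemix)
    rng.shuffle(phase2)

    n_replay = int(len(phon) * replay_frac)
    phase3 = list(codemix) + phon[:n_replay]
    rng.shuffle(phase3)

    return [phase1, phase2, phase3]


# Toy data standing in for the real datasets
phonetic = [(f"p{i}", f"P{i}") for i in range(50)]
codemix = [(f"c{i}", f"C{i}") for i in range(10)]
phases = build_curriculum(phonetic, codemix, phonetic_sample=20, replay_frac=0.1)
print([len(p) for p in phases])  # [20, 30, 12]
```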

Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "Pudamya/Gemma3-Singlish-Codemix"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# device_map="auto" requires the `accelerate` package
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

def translate(text):
    # Instruction-style prompt template for this model
    prompt = (
        "### Instruction:\n"
        "Convert the following code-mixed Singlish-English sentence into proper Sinhala script.\n\n"
        f"### Input:\n{text}\n\n"
        "### Response:\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=150,
            do_sample=False,  # greedy decoding for deterministic output
            repetition_penalty=1.1,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

print(translate("mama api ekka movie eke gihin fun hari thibba"))
```
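To check the prompt formatting without loading the model (for example, in unit tests), the template from the `translate` function can be factored into two plain functions. `build_prompt` and `extract_response` are hypothetical helper names, and the Sinhala string below is a made-up stand-in for model output, not a real generation.

```python
PROMPT_TEMPLATE = (
    "### Instruction:\n"
    "Convert the following code-mixed Singlish-English sentence into proper Sinhala script.\n\n"
    "### Input:\n{text}\n\n"
    "### Response:\n"
)


def build_prompt(text: str) -> str:
    """Format one input sentence with the instruction template."""
    return PROMPT_TEMPLATE.format(text=text)


def extract_response(decoded: str) -> str:
    """Return the text after the last '### Response:' marker, stripped."""
    return decoded.split("### Response:")[-1].strip()


prompt = build_prompt("mama gedara yanawa")
print(prompt)
# Simulate a decoded generation: the prompt echoed back plus some output text
print(extract_response(prompt + "මම ගෙදර යනවා"))
```

Keeping the template in a single constant avoids it drifting between scripts that call the model.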

Languages

- Input: Romanized Sinhala (Singlish), optionally code-mixed with English
- Output: Sinhala script (Unicode)
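Since the output should be Unicode Sinhala script, a quick sanity check on generated text is to measure how many of its letters fall in the Sinhala block (U+0D80 to U+0DFF). The helper below is hypothetical, not part of the model or of `transformers`. It counts only letters and combining marks (Unicode categories L* and M*), so whitespace, digits, punctuation, and the zero-width joiner (category Cf, used in some Sinhala conjuncts) are ignored.

```python
import unicodedata

SINHALA_START, SINHALA_END = 0x0D80, 0x0DFF


def sinhala_ratio(text: str) -> float:
    """Fraction of letter/mark characters that lie in the Sinhala Unicode block."""
    relevant = [ch for ch in text if unicodedata.category(ch)[0] in ("L", "M")]
    if not relevant:
        return 0.0
    in_block = sum(SINHALA_START <= ord(ch) <= SINHALA_END for ch in relevant)
    return in_block / len(relevant)


print(sinhala_ratio("මම ගෙදර යනවා"))        # 1.0
print(sinhala_ratio("mama gedara yanawa"))  # 0.0
```

A ratio rather than a yes/no check tolerates English words that may remain in Latin script in code-mixed output.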


Weights: Safetensors, 0.3B params, F16 tensors
