Gemma3-Singlish-Codemix
Fine-tuned Gemma 3 model for converting code-mixed Singlish/English text into proper Sinhala script.
Model Details
| Property | Value |
|---|---|
| Base Model | savinugunarathna/Gemma3-Singlish-Sinhala-Merged |
| Fine-tuning Method | QLoRA (4-bit, r=16) |
| Upload Type | merged |
| Task | Code-mixed → Sinhala transliteration |
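As a rough illustration of what "QLoRA (4-bit, r=16)" could look like in code, here is a minimal sketch using `transformers` and `peft`. Only the 4-bit loading and the rank of 16 come from the table above. The alpha, dropout, target modules, quantization type, and compute dtype are assumptions chosen for illustration, not the settings used to train this model.

```python
# Values stated in the model card.
LORA_RANK = 16
LOAD_IN_4BIT = True

# ASSUMPTIONS for illustration only; the model card does not state these.
ASSUMED = {
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "bnb_4bit_quant_type": "nf4",
}


def build_qlora_configs():
    """Return (BitsAndBytesConfig, LoraConfig).

    Imports are deferred so the constants above can be inspected
    without installing torch, transformers, or peft.
    """
    import torch
    from peft import LoraConfig
    from transformers import BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=LOAD_IN_4BIT,
        bnb_4bit_quant_type=ASSUMED["bnb_4bit_quant_type"],
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumed compute dtype
    )
    lora_config = LoraConfig(
        r=LORA_RANK,
        lora_alpha=ASSUMED["lora_alpha"],
        lora_dropout=ASSUMED["lora_dropout"],
        target_modules=ASSUMED["target_modules"],
        task_type="CAUSAL_LM",
    )
    return quant_config, lora_config
```

The quantization config would typically be passed as `quantization_config=` to `from_pretrained`, and the LoRA config to `peft.get_peft_model`.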
Training Data
- Phonetic dataset: ~1M Singlish → Sinhala pairs (sampled subset used)
- Code-mixed dataset: ~22K Singlish/English → Sinhala pairs
- Curriculum: 3-phase training (phonetic → mixed → code-mix focused)
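The three-phase curriculum above can be pictured as a schedule that shifts each batch from phonetic pairs toward code-mixed pairs. The sketch below is one possible way to express that. The mixing fractions, the `PHASES` table, and the `sample_batch` helper are hypothetical, since the card states only the phase order.

```python
import random

# HYPOTHETICAL schedule: fraction of each batch drawn from the code-mixed set.
# The model card gives only the phase order, not these numbers.
PHASES = [
    ("phonetic", 0.0),
    ("mixed", 0.5),
    ("code-mix focused", 0.9),
]


def sample_batch(phonetic_pairs, codemix_pairs, codemix_fraction, batch_size, rng):
    """Draw one batch with the requested share of code-mixed examples."""
    n_codemix = round(batch_size * codemix_fraction)
    n_phonetic = batch_size - n_codemix
    # Sample with replacement so the small code-mixed set can be oversampled.
    batch = rng.choices(codemix_pairs, k=n_codemix) + rng.choices(phonetic_pairs, k=n_phonetic)
    rng.shuffle(batch)
    return batch
```

Sampling with replacement is a deliberate choice here: with roughly 22K code-mixed pairs against a much larger phonetic set, later phases would otherwise exhaust the smaller dataset quickly.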
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

MODEL_ID = "Pudamya/Gemma3-Singlish-Codemix"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.float16, device_map='auto')


def translate(text):
    # Build the instruction-style prompt the model was fine-tuned on.
    prompt = (
        "### Instruction:\n"
        "Convert the following code-mixed Singlish-English sentence into proper Sinhala script.\n\n"
        f"### Input:\n{text}\n\n"
        "### Response:\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Greedy decoding with a mild repetition penalty; no gradients needed.
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=150,
            do_sample=False,
            repetition_penalty=1.1,
            pad_token_id=tokenizer.eos_token_id,
        )
    decoded = tokenizer.decode(out[0], skip_special_tokens=True)
    # The decoded text includes the prompt; keep only what follows the response marker.
    return decoded.split("### Response:")[-1].strip()


print(translate("mama api ekka movie eke gihin fun hari thibba"))
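The prompt template and the response parsing used in `translate` can also be factored into pure functions, which makes them easy to check without loading the model. This is a minimal sketch; the names `build_prompt` and `extract_response` are hypothetical helpers, not part of the published model or of `transformers`.

```python
RESPONSE_MARKER = "### Response:"


def build_prompt(text):
    """Return the instruction prompt in the same layout as the usage example."""
    return (
        "### Instruction:\n"
        "Convert the following code-mixed Singlish-English sentence into proper Sinhala script.\n\n"
        f"### Input:\n{text}\n\n"
        f"{RESPONSE_MARKER}\n"
    )


def extract_response(decoded):
    """Keep only the text after the last response marker, trimmed of whitespace."""
    return decoded.split(RESPONSE_MARKER)[-1].strip()
```

Splitting on the last marker means the helper still returns the model's answer if the input text itself happens to contain the marker string.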
Languages
- Input: Romanized Sinhala / Singlish / Code-mixed Sinhala-English
- Output: Sinhala script (Unicode)
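Because the output is expected to be Sinhala script in Unicode, a quick sanity check is to measure how many letters fall in the Sinhala block (U+0D80 to U+0DFF). The sketch below is one way to do that; `sinhala_ratio` is a hypothetical helper, and any threshold applied to its result is a judgment call, since outputs may legitimately keep some Latin-script tokens such as names or digits.

```python
SINHALA_START = 0x0D80
SINHALA_END = 0x0DFF


def sinhala_ratio(text):
    """Fraction of alphabetic characters that lie in the Sinhala Unicode block.

    Combining vowel signs and joiners are not alphabetic under str.isalpha(),
    so only base letters are counted. Returns 0.0 when there are no letters.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    sinhala = [ch for ch in letters if SINHALA_START <= ord(ch) <= SINHALA_END]
    return len(sinhala) / len(letters)
```

A low ratio on a model output can flag cases where the input was echoed back in Latin script instead of being converted.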