Gemma4-E2B-IT-Nepali

This repository contains a LoRA adapter for Google Gemma 4 E2B IT, produced by supervised fine-tuning on Nepali data. The adapter was trained on himalaya-ai/nepali-sft-dataset to improve Nepali instruction following and conversational response generation.

Model Details

Model Description

This model is a PEFT/LoRA adapter trained on top of google/gemma-4-E2B-it. It is designed for Nepali instruction-following tasks, Nepali question answering, Nepali text generation, and simple Nepali chatbot-style interaction.

Because this repository contains a LoRA adapter rather than full model weights, load the base model first and then attach the adapter with the peft library.

  • Developed by: Yuv Raj Pant and Himalaya AI Labs
  • Shared by: Himalaya AI Labs
  • Model type: PEFT LoRA adapter for causal language modeling
  • Base model: google/gemma-4-E2B-it
  • Dataset: himalaya-ai/nepali-sft-dataset
  • Language(s): Nepali and English
  • License: Apache 2.0
  • Fine-tuning method: Supervised Fine-Tuning (SFT) with LoRA / QLoRA-style training

Intended Use

This model is intended for research, experimentation, and community demonstrations involving Nepali language AI.

Potential use cases include:

  • Nepali instruction-following
  • Nepali chatbot applications
  • Nepali question answering
  • Nepali text generation
  • Nepali-English bilingual assistant workflows
  • Educational AI demos for Nepali users
  • Low-resource language research

Out-of-Scope Use

This model should not be used as the only source of truth in high-stakes settings such as medical, legal, financial, emergency, or safety-critical decision-making.

The model may generate incorrect, biased, incomplete, or hallucinated outputs. Human review is recommended for public-facing or production use.

Training Dataset

The model was fine-tuned on:

  • himalaya-ai/nepali-sft-dataset

The dataset was used for supervised instruction fine-tuning. Because the dataset provides only a training split, a small evaluation split (a fraction of 0.005, drawn with seed 42) was held out from the training data during preprocessing.
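
The held-out split can be illustrated with a minimal, library-free sketch. It uses the evaluation fraction (0.005) and seed (42) listed under Training Configuration; the function name `split_train_eval` and the shuffle-based procedure are illustrative assumptions, not the authors' preprocessing code. With the Hugging Face datasets library, `Dataset.train_test_split(test_size=0.005, seed=42)` would play a similar role.

```python
import random


def split_train_eval(examples, eval_fraction=0.005, seed=42):
    """Shuffle indices deterministically, then hold out a small evaluation slice."""
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    # Keep at least one evaluation example when the dataset is non-empty.
    n_eval = max(1, round(len(examples) * eval_fraction)) if examples else 0
    eval_indices = set(indices[:n_eval])
    train = [ex for i, ex in enumerate(examples) if i not in eval_indices]
    held_out = [examples[i] for i in indices[:n_eval]]
    return train, held_out


# Hypothetical toy records standing in for the real dataset rows.
rows = [{"id": i} for i in range(1000)]
train_rows, eval_rows = split_train_eval(rows)
print(len(train_rows), len(eval_rows))  # 995 5
```

Fixing the seed makes the split reproducible, so evaluation loss from different runs is computed on the same held-out examples.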

Training Configuration

Setting                       Value
----------------------------  ------------------------------
Base model                    google/gemma-4-E2B-it
Dataset                       himalaya-ai/nepali-sft-dataset
Number of epochs              1
Max sequence length           2048
Per-device train batch size   4
Per-device eval batch size    4
Gradient accumulation steps   4
Effective batch size          16
Learning rate                 2e-4
LR scheduler                  Cosine
Warmup ratio                  0.03
Weight decay                  0.0
Max grad norm                 0.3
LoRA rank                     32
LoRA alpha                    64
LoRA dropout                  0.05
Evaluation fraction           0.005
Split seed                    42
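
A few of these settings are linked by simple arithmetic, sketched below. The sketch assumes a single training device (which is what makes 4 x 4 equal the stated effective batch size of 16), the standard LoRA scaling of alpha / rank, and a linear warmup followed by cosine decay to zero. The step count of 1000 is hypothetical, and the function names are illustrative rather than taken from the training script.

```python
import math

PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
LORA_RANK = 32
LORA_ALPHA = 64
PEAK_LR = 2e-4
WARMUP_RATIO = 0.03


def effective_batch_size(num_devices=1):
    """Examples consumed per optimizer step."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * num_devices


def lora_scaling():
    """Standard LoRA multiplies the low-rank update by alpha / rank."""
    return LORA_ALPHA / LORA_RANK


def lr_at(step, total_steps):
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed schedule shape)."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical number of optimizer steps
print(effective_batch_size())  # 16
print(lora_scaling())          # 2.0
for step in (0, 15, 500, total_steps):
    print(step, lr_at(step, total_steps))
```

With a warmup ratio of 0.03, only about the first 3% of optimizer steps are spent ramping up; the rest of the single epoch follows the cosine decay.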

How to Use

Install the required packages:

pip install -U transformers peft accelerate bitsandbytes torch

Then load the base model and attach the LoRA adapter:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model_id = "google/gemma-4-E2B-it"
adapter_id = "himalaya-ai/gemma4-e2b-it-nepali"

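# 4-bit NF4 quantization reduces the memory needed for the base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the quantized base model first.
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
    dtype=torch.bfloat16,
)

# Attach the LoRA adapter to the base model and switch to evaluation mode.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

# The tokenizer is loaded from the adapter repository.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)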
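
As an alternative to 4-bit loading, the adapter can in principle be folded into an unquantized copy of the base model. The following is a sketch under assumptions, not a tested recipe for this repository: the helper name `load_merged_model` is hypothetical, it assumes enough memory to hold the base model in bfloat16, and it relies on peft's `merge_and_unload()` to merge the LoRA weights into the base weights.

```python
def load_merged_model(base_model_id, adapter_id):
    """Load the base model in bfloat16, attach the adapter, and merge it in."""
    # Imports are kept inside the function so that defining it does not
    # require the heavy dependencies to be installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        device_map="auto",
        dtype=torch.bfloat16,
    )
    peft_model = PeftModel.from_pretrained(base, adapter_id)
    # merge_and_unload() returns a plain transformers model with the LoRA
    # update folded into the base weights.
    return peft_model.merge_and_unload()


# Example call (downloads both repositories; not executed here):
# merged = load_merged_model("google/gemma-4-E2B-it", "himalaya-ai/gemma4-e2b-it-nepali")
```

A merged model no longer needs peft at inference time, at the cost of the higher memory footprint of unquantized weights.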

Example Inference

import torch

@torch.inference_mode()
def chat(model, tokenizer, user_text, system=None):
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_text})

    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id,
    )

    # Decode only the newly generated tokens, not the echoed prompt.
    input_length = inputs["input_ids"].shape[-1]
    new_tokens = outputs[0, input_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

system_prompt = "You are a helpful AI assistant that answers in Nepali."
prompt = "नेपालको राजधानी कहाँ हो?"
response = chat(model, tokenizer, prompt, system=system_prompt)
print(response)

Example Prompts

  • नेपालको राजधानी कहाँ हो?
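
To try a list of prompts in one pass, a small wrapper around the chat function from Example Inference may be convenient. The helper `run_prompts` and the stand-in `echo_chat` below are hypothetical names introduced for illustration; the stand-in only exists so the sketch runs without loading the model.

```python
def run_prompts(chat_fn, prompts, system=None):
    """Call chat_fn(prompt, system=system) for each prompt and collect the replies."""
    return [(prompt, chat_fn(prompt, system=system)) for prompt in prompts]


def echo_chat(user_text, system=None):
    # Stand-in for the real model call; it simply echoes the prompt.
    return f"[reply to] {user_text}"


prompts = ["नेपालको राजधानी कहाँ हो?"]
results = run_prompts(echo_chat, prompts, system="You are a helpful AI assistant that answers in Nepali.")
for prompt, reply in results:
    print(prompt, "->", reply)
```

With the real model loaded, `functools.partial(chat, model, tokenizer)` can be passed as `chat_fn` in place of `echo_chat`.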

Limitations

This model has not been fully benchmarked across all Nepali NLP tasks. It may produce hallucinated or factually incorrect answers, especially for questions requiring current information or specialized domain knowledge.

The model may also reflect biases present in the base model or fine-tuning dataset. Users should evaluate the model carefully for their specific use case.

Ethical Considerations

When deploying this model in public-facing applications, developers should consider adding safety filters, human review, and domain-specific evaluation. The model should not be used to produce harmful, deceptive, or high-risk advice.

Contributors

  • Yuv Raj Pant
  • Himalaya AI Labs

Acknowledgements

This model is based on Google DeepMind's Gemma 4 E2B IT model and was fine-tuned using the Himalaya AI Nepali SFT dataset.
