Instructions for using himalaya-ai/gemma4-e2b-it-nepali with Python libraries and local serving apps.

Libraries

How to use himalaya-ai/gemma4-e2b-it-nepali with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("google/gemma-4-E2B-it")
model = PeftModel.from_pretrained(base_model, "himalaya-ai/gemma4-e2b-it-nepali")

Transformers

How to use himalaya-ai/gemma4-e2b-it-nepali with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="himalaya-ai/gemma4-e2b-it-nepali")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
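# With peft installed, Transformers reads the adapter config, loads the base model, and attaches the adapter
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("himalaya-ai/gemma4-e2b-it-nepali", dtype="auto")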

Local Apps

vLLM

How to use himalaya-ai/gemma4-e2b-it-nepali with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
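# This repository contains only a LoRA adapter, so start the vLLM server on the
# base model and register the adapter under a name ("nepali" here is arbitrary):
vllm serve "google/gemma-4-E2B-it" \
	--enable-lora \
	--max-lora-rank 32 \
	--lora-modules nepali=himalaya-ai/gemma4-e2b-it-nepali
# Call the server using curl (OpenAI-compatible API), selecting the adapter by its registered name:
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nepali",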
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/himalaya-ai/gemma4-e2b-it-nepali

SGLang

How to use himalaya-ai/gemma4-e2b-it-nepali with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "himalaya-ai/gemma4-e2b-it-nepali" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "himalaya-ai/gemma4-e2b-it-nepali",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "himalaya-ai/gemma4-e2b-it-nepali" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "himalaya-ai/gemma4-e2b-it-nepali",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Gemma4-E2B-IT-Nepali

This repository contains a LoRA adapter for Google Gemma 4 E2B IT, trained with supervised fine-tuning on the himalaya-ai/nepali-sft-dataset dataset to improve Nepali instruction following and conversational response generation.

Model Details

Model Description

This model is a PEFT/LoRA adapter trained on top of google/gemma-4-E2B-it. It is designed for Nepali instruction-following tasks, Nepali question answering, Nepali text generation, and simple Nepali chatbot-style interaction.

Because this repository contains only a LoRA adapter rather than full model weights, the base model must be loaded first and the adapter then attached with the peft library.
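To see why the base weights are still required, consider what a LoRA adapter stores: for each adapted linear layer, a pair of small matrices A (r × d_in) and B (d_out × r) whose scaled product is added to the frozen base weight W, giving y = W x + (alpha / r) · B A x. With this card's settings (rank 32, alpha 64) the scale factor alpha / r is 2. The stdlib-only sketch below illustrates that standard LoRA forward pass on toy matrices; the shapes and numbers are made up, and the card does not say which layers this adapter targets.

```python
def matvec(M, v):
    """Multiply a matrix M (a list of rows) by a vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(W, A, B, x, alpha, r):
    """LoRA forward pass for one linear layer: W x + (alpha / r) * B (A x).

    W: d_out x d_in frozen base weight (comes from the base model)
    A: r x d_in and B: d_out x r, the small matrices stored in the adapter
    """
    base = matvec(W, x)                  # frozen base projection
    update = matvec(B, matvec(A, x))     # low-rank adapter path
    scale = alpha / r                    # 64 / 32 = 2 with this card's settings
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_count(d_in, d_out, r):
    """Extra trainable parameters LoRA adds to one d_out x d_in layer."""
    return r * (d_in + d_out)


# Toy shapes for illustration only: d_in = 3, d_out = 2, r = 1
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]   # r x d_in
B = [[0.5], [0.0]]      # d_out x r
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B, x, alpha=2.0, r=1))  # [7.0, 2.0]
print(lora_param_count(2048, 2048, 32))          # 131072 (hypothetical 2048 x 2048 layer)
```

The same formula explains the parameter savings: a hypothetical 2048 × 2048 projection gains only 32 × (2048 + 2048) = 131,072 adapter parameters at rank 32, compared with about 4.2 million frozen base parameters in W.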

Developed by: Yuv Raj Pant and Himalaya AI Labs
Shared by: Himalaya AI Labs
Model type: PEFT LoRA adapter for causal language modeling
Base model: google/gemma-4-E2B-it
Dataset: himalaya-ai/nepali-sft-dataset
Language(s): Nepali and English
License: Apache 2.0
Fine-tuning method: Supervised Fine-Tuning (SFT) with LoRA / QLoRA-style training

Intended Use

This model is intended for research, experimentation, and community demonstrations involving Nepali language AI.

Potential use cases include:

Nepali instruction-following
Nepali chatbot applications
Nepali question answering
Nepali text generation
Nepali-English bilingual assistant workflows
Educational AI demos for Nepali users
Low-resource language research

Out-of-Scope Use

This model should not be used as the only source of truth in high-stakes settings such as medical, legal, financial, emergency, or safety-critical decision-making.

The model may generate incorrect, biased, incomplete, or hallucinated outputs. Human review is recommended for public-facing or production use.

Training Dataset

The model was fine-tuned on:

himalaya-ai/nepali-sft-dataset

The dataset was used for supervised instruction fine-tuning. Because it provides only a training split, a small evaluation split (evaluation fraction 0.005, split seed 42) was held out from the training data during preprocessing.
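The card does not say exactly how the held-out examples were chosen. A minimal sketch of a reproducible hold-out, assuming a seeded shuffle of example indices with the evaluation fraction (0.005) and split seed (42) from the training configuration; rounding the evaluation size up so it is never empty is also an assumption.

```python
import math
import random


def train_eval_split(examples, eval_fraction=0.005, seed=42):
    """Hold out a small, reproducible evaluation split from a train-only dataset.

    Assumptions (not confirmed by the card): indices are shuffled with a
    seeded RNG and the eval size is rounded up so it is never empty.
    """
    n_eval = max(1, math.ceil(len(examples) * eval_fraction))
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    eval_idx = set(indices[:n_eval])
    train = [ex for i, ex in enumerate(examples) if i not in eval_idx]
    evaluation = [ex for i, ex in enumerate(examples) if i in eval_idx]
    return train, evaluation


# Hypothetical stand-in for the SFT records
examples = [{"id": i} for i in range(10_000)]
train, evaluation = train_eval_split(examples)
print(len(train), len(evaluation))  # 9950 50
```

With the Hugging Face datasets library, a comparable call is `dataset.train_test_split(test_size=0.005, seed=42)`, although its rounding and ordering need not match this sketch.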

Training Configuration

Setting	Value
Base model	google/gemma-4-E2B-it
Dataset	himalaya-ai/nepali-sft-dataset
Number of epochs	1
Max sequence length (tokens)	2048
Per-device train batch size	4
Per-device eval batch size	4
Gradient accumulation steps	4
Effective batch size	16
Learning rate	2e-4
LR scheduler	Cosine
Warmup ratio	0.03
Weight decay	0.0
Max grad norm	0.3
LoRA rank	32
LoRA alpha	64
LoRA dropout	0.05
Evaluation fraction (of training data)	0.005
Split seed	42
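The batch and schedule settings above can be checked with a little arithmetic. The sketch below computes the effective batch size (4 per device × 4 accumulation steps = 16, which is consistent with training on a single device) and the learning rate over training under a linear-warmup cosine schedule peaking at 2e-4 with a 0.03 warmup ratio. The total step count is hypothetical, since it depends on the dataset size, which the card does not give, and rounding the warmup length up is an assumption.

```python
import math


def effective_batch_size(per_device, grad_accum, num_devices=1):
    """Number of sequences that contribute to one optimizer step."""
    return per_device * grad_accum * num_devices


def cosine_lr_with_warmup(step, total_steps, peak_lr=2e-4, warmup_ratio=0.03):
    """Linear warmup from 0 to peak_lr, then cosine decay from peak_lr to 0.

    Assumption: warmup length is warmup_ratio * total_steps, rounded up.
    """
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size(4, 4))  # 16 with one device

total_steps = 1000  # hypothetical; the real value depends on the dataset size
for step in (0, 15, 30, 515, 1000):
    print(step, cosine_lr_with_warmup(step, total_steps))
```

With 1,000 hypothetical steps, warmup covers the first 30 steps, the rate reaches 2e-4 at step 30, falls to half of that midway through the decay, and approaches 0 at the end of the single epoch.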

How to Use

Install the required packages:

pip install -U transformers peft accelerate bitsandbytes torch

Then load the base model and attach the LoRA adapter:

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model_id = "google/gemma-4-E2B-it"
adapter_id = "himalaya-ai/gemma4-e2b-it-nepali"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
    dtype=torch.bfloat16,
)

model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(adapter_id)

Example Inference

import torch

@torch.inference_mode()
def chat(model, tokenizer, user_text, system=None):
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_text})

    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,
    ).to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        repetition_penalty=1.05,
        pad_token_id=tokenizer.eos_token_id,
    )

    input_length = inputs["input_ids"].shape[-1]
    new_tokens = outputs[0, input_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

system_prompt = "You are a helpful AI assistant that answers in Nepali."
prompt = "नेपालको राजधानी कहाँ हो?"  # "Where is the capital of Nepal?"
response = chat(model, tokenizer, prompt, system=system_prompt)
print(response)

Example Prompts

नेपालको राजधानी कहाँ हो? ("Where is the capital of Nepal?")

Limitations

This model has not been fully benchmarked across all Nepali NLP tasks. It may produce hallucinated or factually incorrect answers, especially for questions requiring current information or specialized domain knowledge.

The model may also reflect biases present in the base model or fine-tuning dataset. Users should evaluate the model carefully for their specific use case.
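One lightweight way to start such an evaluation is a small set of prompts with known answers, scored by whether the expected string appears in the response. The sketch below is a hypothetical harness, not part of this repository: `contains_accuracy` takes any `generate(prompt) -> str` callable (for example, a wrapper around the `chat` function from the inference example), and NFC normalization is applied so that differently encoded Devanagari strings compare equal. A stub generator stands in for the model so the sketch runs offline.

```python
import unicodedata
from typing import Callable, Iterable


def normalize(text: str) -> str:
    """NFC-normalize and strip so equivalent Devanagari encodings compare equal."""
    return unicodedata.normalize("NFC", text).strip()


def contains_accuracy(generate: Callable[[str], str],
                      cases: Iterable[tuple[str, str]]) -> float:
    """Fraction of prompts whose response contains the expected answer string."""
    cases = list(cases)
    if not cases:
        return 0.0
    hits = sum(normalize(expected) in normalize(generate(prompt))
               for prompt, expected in cases)
    return hits / len(cases)


# With the model loaded, `generate` could be:
#   lambda p: chat(model, tokenizer, p, system=system_prompt)
# A stub is used here so the sketch runs without downloading the model.
def fake_generate(prompt: str) -> str:
    return "नेपालको राजधानी काठमाडौं हो।"


cases = [("नेपालको राजधानी कहाँ हो?", "काठमाडौं")]  # expected answer: Kathmandu
print(contains_accuracy(fake_generate, cases))  # 1.0
```

Substring matching is deliberately crude; free-form Nepali answers may need manual review or a task-specific metric.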

Ethical Considerations

When deploying this model in public-facing applications, developers should consider adding safety filters, human review, and domain-specific evaluation. The model should not be used to produce harmful, deceptive, or high-risk advice.

Contributors

Yuv Raj Pant
Himalaya AI Labs

Acknowledgements

This model is based on Google DeepMind's Gemma 4 E2B IT model and was fine-tuned using the Himalaya AI Nepali SFT dataset.

Model tree for himalaya-ai/gemma4-e2b-it-nepali

Base model: google/gemma-4-E2B
Fine-tuned: google/gemma-4-E2B-it
Adapter: himalaya-ai/gemma4-e2b-it-nepali (this model)