Instructions for using Bharatdeep-H/qwen2.5-14b-desi with libraries and local apps.

Libraries

How to use Bharatdeep-H/qwen2.5-14b-desi with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Bharatdeep-H/qwen2.5-14b-desi")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Bharatdeep-H/qwen2.5-14b-desi")
model = AutoModelForCausalLM.from_pretrained("Bharatdeep-H/qwen2.5-14b-desi")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Bharatdeep-H/qwen2.5-14b-desi with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Bharatdeep-H/qwen2.5-14b-desi"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Bharatdeep-H/qwen2.5-14b-desi",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
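
The same request can also be issued from Python. The following is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on localhost:8000. The helper names `build_chat_request` and `send` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "Bharatdeep-H/qwen2.5-14b-desi"


def build_chat_request(prompt, base_url="http://localhost:8000/v1"):
    """Build (but do not send) a POST request for the chat completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `send(build_chat_request("What is the capital of France?"))` returns the reply text. Pass `base_url="http://localhost:30000/v1"` to target a server on a different port.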

Use Docker

docker model run hf.co/Bharatdeep-H/qwen2.5-14b-desi

SGLang

How to use Bharatdeep-H/qwen2.5-14b-desi with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Bharatdeep-H/qwen2.5-14b-desi" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Bharatdeep-H/qwen2.5-14b-desi",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Bharatdeep-H/qwen2.5-14b-desi" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Bharatdeep-H/qwen2.5-14b-desi",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Unsloth Studio

How to use Bharatdeep-H/qwen2.5-14b-desi with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Bharatdeep-H/qwen2.5-14b-desi to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Bharatdeep-H/qwen2.5-14b-desi to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Bharatdeep-H/qwen2.5-14b-desi to start chatting

Load model with FastModel

# Install Unsloth first (run in a shell): pip install unsloth
from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="Bharatdeep-H/qwen2.5-14b-desi",
    max_seq_length=2048,
)

Docker Model Runner
How to use Bharatdeep-H/qwen2.5-14b-desi with Docker Model Runner:
```
docker model run hf.co/Bharatdeep-H/qwen2.5-14b-desi
```

Qwen-Desi: Multilingual Indian Language Model

Model Description

Qwen-Desi is a multilingual language model fine-tuned from Sarvam-AI's Sarvam M model, specifically designed to support Indian languages and their transliterated variants. This model excels at understanding and generating text in English, Hindi, Kannada, and their English-transliterated forms (Hinglish and Kannadish).

Developed by: Anshuman Suresh and Bharatdeep Hazarika
Model type: Causal Language Model
Base model: unsloth/Qwen2.5-14B-Instruct-unsloth-bnb-4bit
Parent model: sarvamai/sarvam-m (24B Mistral)
Language(s): English, Hindi (Devanagari), Hinglish, Kannada, Kannadish
License: Apache-2.0

Supported Languages

English: Standard English language
Hindi: Written in Devanagari script (हिंदी)
Hinglish: Hindi words written in English script (transliterated)
Kannada: Written in Kannada script (ಕನ್ನಡ)
Kannadish: Kannada words written in English script (transliterated)
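
To illustrate the distinction between native scripts and transliterated text, here is a small sketch that guesses the script of an input from Unicode block ranges (Devanagari is U+0900 to U+097F, Kannada is U+0C80 to U+0CFF). The function `detect_script` is a hypothetical helper for pre-processing or routing; it is not part of the model or its tokenizer.

```python
def detect_script(text):
    """Return 'devanagari', 'kannada', 'latin', or 'other' by character majority."""
    counts = {"devanagari": 0, "kannada": 0, "latin": 0}
    for ch in text:
        code = ord(ch)
        if 0x0900 <= code <= 0x097F:      # Devanagari Unicode block
            counts["devanagari"] += 1
        elif 0x0C80 <= code <= 0x0CFF:    # Kannada Unicode block
            counts["kannada"] += 1
        elif ch.isascii() and ch.isalpha():
            counts["latin"] += 1
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "other"
```

Script alone cannot separate English, Hinglish, and Kannadish, since all three use the Latin alphabet. Telling those apart requires looking at the words themselves, which is left to the model.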

Training Details

This Qwen2 model was efficiently fine-tuned using:

Training Framework: Unsloth (2x faster training)
Library: Hugging Face TRL (Transformer Reinforcement Learning)
Base model: unsloth/Qwen2.5-14B-Instruct-unsloth-bnb-4bit
Parent model: sarvamai/sarvam-m
Language(s): English, Hindi (Devanagari), Hinglish, Kannada, Kannadish
Architecture: Qwen2-based transformer

Usage

Direct Usage with OpenAI Compatible Package

from openai import OpenAI

client = OpenAI(
    base_url="http://<URL>/v1/",
    api_key="your-api-key"
)

messages = [
    {
        "role": "system", 
        "content": """
You are a helpful assistant. You support five languages: English, Hindi, Hinglish, Kannada and Kannadish. 
English is the standard English language. Hindi is written in Devanagari script. 
Hinglish refers to Hindi words written in English script (Hindi transliterated to English). 
Kannada is written in Kannada script. Kannadish refers to Kannada words written in English script 
(Kannada transliterated to English). Infer user's query and answer in Kannadish (English script).
"""
    },
    {
        "role": "user",
        "content": "Mujhe Lebron James ke baare mai info do"
    },
]

response = client.chat.completions.create(
    model="Bharatdeep-H/qwen2.5-14b-desi",
    messages=messages,
    temperature=0.3,
    max_tokens=4096,
    frequency_penalty=0,
    presence_penalty=1.05,
    top_p=0.2,
    seed=42,
    stream=True,
    stream_options={"include_usage": True},
)

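for chunk in response:
    # With include_usage=True, the final chunk carries usage stats and an empty `choices` list,
    # so check that `choices` is non-empty before indexing into it.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end='', flush=True)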
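
The system prompt above hard-codes the reply language in its last sentence. A minimal sketch of parameterizing it follows, assuming the same prompt wording works for every supported language. `build_messages` and `SCRIPT_HINTS` are illustrative names, not part of any library.

```python
BASE_PROMPT = (
    "You are a helpful assistant. You support five languages: English, Hindi, "
    "Hinglish, Kannada and Kannadish. English is the standard English language. "
    "Hindi is written in Devanagari script. Hinglish refers to Hindi words written "
    "in English script (Hindi transliterated to English). Kannada is written in "
    "Kannada script. Kannadish refers to Kannada words written in English script "
    "(Kannada transliterated to English)."
)

# Script hint appended after the requested reply language.
SCRIPT_HINTS = {
    "English": "English script",
    "Hindi": "Devanagari script",
    "Hinglish": "English script",
    "Kannada": "Kannada script",
    "Kannadish": "English script",
}


def build_messages(user_text, reply_language):
    """Return a chat message list that asks for a reply in `reply_language`."""
    if reply_language not in SCRIPT_HINTS:
        raise ValueError(f"Unsupported language: {reply_language}")
    system = (
        f"{BASE_PROMPT} Infer user's query and answer in "
        f"{reply_language} ({SCRIPT_HINTS[reply_language]})."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_text},
    ]
```

The returned list can be passed as `messages=` to `client.chat.completions.create`. For example, `build_messages("Mujhe Lebron James ke baare mai info do", "Hinglish")` asks for a reply in the same language as the query.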

Model Capabilities

Code-mixed conversations: Seamlessly handles conversations mixing English with Hindi/Kannada
Script flexibility: Understands both native scripts and transliterated text
Multi-turn dialogue: Maintains context across conversation turns
Language detection: Automatically infers the preferred response language
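
Multi-turn context is supplied by the caller: each request must resend the full message history. The sketch below shows one way to keep that history, assuming an OpenAI-compatible endpoint. `Conversation` and `make_openai_completer` are hypothetical helpers written for this example.

```python
class Conversation:
    """Keeps the running message history for a multi-turn chat."""

    def __init__(self, system_prompt, complete):
        # `complete` is any callable mapping a message list to a reply string.
        self.messages = [{"role": "system", "content": system_prompt}]
        self._complete = complete

    def ask(self, user_text):
        """Append the user turn, get a reply, and record it in the history."""
        self.messages.append({"role": "user", "content": user_text})
        reply = self._complete(list(self.messages))
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def make_openai_completer(client, model="Bharatdeep-H/qwen2.5-14b-desi"):
    """Wrap an `openai.OpenAI` client as a non-streaming completion callable."""
    def complete(messages):
        response = client.chat.completions.create(model=model, messages=messages)
        return response.choices[0].message.content
    return complete
```

With a configured client, `Conversation("You are a helpful assistant.", make_openai_completer(client))` gives an object whose `ask` method carries earlier turns into each new request.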

Recommended Parameters

# Recommended inference parameters
temperature = 0.3         # Low temperature for more focused responses
max_tokens = 4096         # Adjust based on your needs
top_p = 0.2               # Narrow nucleus sampling for controlled generation
frequency_penalty = 0     # 0 applies no frequency penalty
presence_penalty = 1.05   # Encourages diverse responses
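
These values can be kept in one place and merged into each request. The following is a small sketch; `RECOMMENDED_PARAMS` and `request_kwargs` are illustrative names, and the values are simply the ones listed above.

```python
RECOMMENDED_PARAMS = {
    "temperature": 0.3,
    "max_tokens": 4096,
    "top_p": 0.2,
    "frequency_penalty": 0,
    "presence_penalty": 1.05,
}


def request_kwargs(messages, model="Bharatdeep-H/qwen2.5-14b-desi", **overrides):
    """Merge the recommended defaults with per-call overrides."""
    return {"model": model, "messages": messages, **RECOMMENDED_PARAMS, **overrides}
```

With an OpenAI-compatible client, a call then looks like `client.chat.completions.create(**request_kwargs(messages, max_tokens=512))`. Keyword overrides take precedence over the defaults.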