Instructions for using orai-nlp/Llama-3.1-8B-Instruct-Magpie_mix with libraries and local apps.

Libraries

How to use orai-nlp/Llama-3.1-8B-Instruct-Magpie_mix with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="orai-nlp/Llama-3.1-8B-Instruct-Magpie_mix")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("orai-nlp/Llama-3.1-8B-Instruct-Magpie_mix")
model = AutoModelForCausalLM.from_pretrained("orai-nlp/Llama-3.1-8B-Instruct-Magpie_mix")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
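
The slice in the final `print` call is there because, for decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones. A minimal sketch of that indexing with plain Python lists follows; the token IDs are made up for illustration and `strip_prompt` is a hypothetical helper, not part of Transformers.

```python
# Illustrative only: the token IDs below are made up, not real tokenizer output.
def strip_prompt(output_ids, prompt_length):
    """Return only the tokens that were generated after the prompt."""
    return output_ids[prompt_length:]


prompt_ids = [101, 102, 103]          # stands in for inputs["input_ids"][0]
output_ids = prompt_ids + [7, 8, 9]   # generate() returns prompt + new tokens
new_tokens = strip_prompt(output_ids, len(prompt_ids))
print(new_tokens)
```

Decoding only `new_tokens` keeps the chat template and the user message out of the printed reply.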

Local Apps

vLLM

How to use orai-nlp/Llama-3.1-8B-Instruct-Magpie_mix with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "orai-nlp/Llama-3.1-8B-Instruct-Magpie_mix"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "orai-nlp/Llama-3.1-8B-Instruct-Magpie_mix",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
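
The same request can be made from Python instead of curl. The sketch below uses only the standard library and assumes the server is listening on `localhost:8000` as in the command above; `build_chat_request` and `send` are hypothetical helper names introduced here for illustration.

```python
import json
import urllib.request

MODEL_ID = "orai-nlp/Llama-3.1-8B-Instruct-Magpie_mix"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build the same POST request as the curl command, without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response as a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_chat_request("What is the capital of France?")
print(request.full_url)
# Once the server is running, send the request with:
# print(send(request))
```

Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.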

Use Docker

```
docker model run hf.co/orai-nlp/Llama-3.1-8B-Instruct-Magpie_mix
```

SGLang

How to use orai-nlp/Llama-3.1-8B-Instruct-Magpie_mix with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "orai-nlp/Llama-3.1-8B-Instruct-Magpie_mix" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "orai-nlp/Llama-3.1-8B-Instruct-Magpie_mix",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "orai-nlp/Llama-3.1-8B-Instruct-Magpie_mix" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "orai-nlp/Llama-3.1-8B-Instruct-Magpie_mix",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
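
An OpenAI-compatible chat completions endpoint returns the assistant's text under `choices[0].message.content`. The sketch below shows one way to pull it out of the JSON body; `extract_reply` is a hypothetical helper, and the sample body is a hand-written, trimmed illustration of the response shape, not actual output from this model.

```python
import json


def extract_reply(response_body):
    """Return the assistant text from a chat-completions JSON response string."""
    data = json.loads(response_body)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written sample holding only the fields used here;
# a real response carries additional fields.
sample = json.dumps({
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "<model reply>"}}
    ]
})
print(extract_reply(sample))
```

Raising on an empty `choices` list surfaces a failed request early rather than producing an `IndexError` further down.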

Docker Model Runner
How to use orai-nlp/Llama-3.1-8B-Instruct-Magpie_mix with Docker Model Runner:
```
docker model run hf.co/orai-nlp/Llama-3.1-8B-Instruct-Magpie_mix
```

Llama-3.1-8B-Instruct-Magpie_mix [BASELINE]

A fine-tuned version of Llama-3.1-8B-Instruct, created by instruction-tuning the base model on a mix of MagpieEU Basque instructions and Magpie-Llama-3.1-Pro-300K-Filtered English instructions.

📕 Paper: DIPLomA: Efficient Adaptation of Instructed LLMs to Low-Resource Languages via Post-Training Delta Merging

NOTE: This model is a baseline used in the paper. See Orai NLP's HuggingFace homepage for up-to-date instructed models.

License

This model inherits the Llama 3.1 Community License from its base model. Please review the license terms before use or redistribution.

Citation

If you use this model, please cite the following reference:

@inproceedings{sarasua-etal-2025-diploma,
    title = "{DIPL}om{A}: Efficient Adaptation of Instructed {LLM}s to Low-Resource Languages via Post-Training Delta Merging",
    author = "Sarasua, Ixak  and
      Corral, Ander  and
      Saralegi, Xabier",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-emnlp.1355/",
    pages = "24898--24912",
    ISBN = "979-8-89176-335-7",
    abstract = "This paper investigates how open-weight instruction-tuned large language models (LLMs) can be efficiently adapted to low-resource languages without requiring costly large-scale post-training. We introduce DIPLomA (Decoupled Instruction-Preserving Language Adaptation), a lightweight delta-based transfer strategy that provides a practical and effective solution for this scenario. DIPLomA decouples language adaptation from post-training alignment by first continually pretraining a foundational LLM on a modest amount of monolingual target-language data while anchoring on English replay, and then injecting instruction-following capabilities via delta-based weight merging from the instructed counterpart of the base LLM. We evaluate DIPLomA on Basque and validate its generality on Welsh and Swahili, demonstrating consistent and substantial gains in instruction-following, linguistic proficiency, and safety. Compared to strong baselines, our method achieves average relative improvements of 50 points in Basque, 63 in Welsh, and 51 in Swahili, while preserving the original model{'}s multilingual performance. These results highlight DIPLomA as an effective, resource-efficient strategy for bringing high-quality instruction alignment to underrepresented languages at scale."
}