Palmyra-local-1.7B-Instruct
Introduction
Palmyra-local is part of the Palmyra series of domain-specialized language models, designed for high performance on enterprise and task-specific use cases. This release features a 1.7 billion parameter instruction-tuned variant of Palmyra-local, built for local deployment and optimized for enterprise-grade language understanding and generation.
Compared to earlier versions, Palmyra-local brings the following enhancements:
- Stronger domain reasoning in code and math, powered by targeted expert tuning and curated domain datasets.
- Improved instruction-following, generation of long-form outputs (8K+ tokens), accurate handling of structured data (e.g., tables), and consistent structured output generation (especially JSON).
- Robust prompt handling, enabling nuanced role-play, dynamic agent behavior, and complex prompt chaining in enterprise workflows.
- Extended context support, with a maximum context window of 128K tokens and generation support for up to 8K tokens.
- Multilingual capabilities, supporting over 29 languages including English, Spanish, French, German, Chinese, Arabic, Japanese, and more.
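The context and generation limits listed above imply a simple token budget. The sketch below is a minimal illustration, not part of the model's API. It assumes that "128K" means 131,072 tokens, that "8K" means 8,192 tokens, and that prompt and generated tokens share the same window; the model card confirms none of these details. `clamp_new_tokens` is a hypothetical helper name.

```python
# Assumptions (not stated in the model card): binary "K", shared window.
MAX_CONTEXT_TOKENS = 128 * 1024    # 131,072
MAX_GENERATION_TOKENS = 8 * 1024   # 8,192


def clamp_new_tokens(prompt_tokens: int, requested: int) -> int:
    """Limit a requested generation length so prompt + output fits the window."""
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    if prompt_tokens >= MAX_CONTEXT_TOKENS:
        raise ValueError("prompt alone fills or exceeds the context window")
    remaining = MAX_CONTEXT_TOKENS - prompt_tokens
    return min(requested, MAX_GENERATION_TOKENS, remaining)
```

The returned value could be passed as `max_new_tokens` when generating, with `prompt_tokens` taken from the length of the tokenized prompt.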
This repository includes the instruction-tuned Palmyra-local 1.7B model, with the following architecture details:
- Type: Causal Language Model
- Training Stages: Pretraining + Instruction Tuning
- Architecture: Transformer with RoPE positional encoding
- Total Parameters: 1.7B
- Number of Layers: 28
- Attention: Grouped-Query Attention (GQA)
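For local deployment, the parameter count gives a rough lower bound on the memory needed to hold the weights. The sketch below is back-of-the-envelope arithmetic only. It assumes exactly 1.7 billion parameters (the card gives a rounded figure) and counts weights alone, ignoring activations, the KV cache, and framework overhead. `weight_memory_gib` is a hypothetical helper.

```python
# Bytes used to store one parameter in common dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory for model weights only, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        # 1.7e9 is the rounded parameter count from the model card.
        print(f"{dtype}: {weight_memory_gib(1.7e9, dtype):.2f} GiB")
```

Under these assumptions, half-precision weights need a little over 3 GiB, and actual usage during generation will be higher.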
Training Details
- Architecture: Palmyra
- Training Method: From scratch
- Attention Mechanism: GQA
- Training Data: ~1T packed dataset
Benchmark Results
| Benchmark | Palmyra-local-1.7B | Qwen2.5-1.5B-Instruct | GPT-4 mini | Llama-3.2-1B-Instruct | Llama-3.2-3B-Instruct | 
|---|---|---|---|---|---|
| HumanEval | 74.10 | 61.60 | N/A | N/A | N/A | 
| MBPP | 66.86 | 63.20 | N/A | N/A | N/A | 
| GSM8K | 81.0 | 73.20 | 88.6 | N/A | 75.6 | 
| MATH | 60.94 | 55.20 | 64.0 | N/A | 46.7 | 
| MMLU | 59.82 | 58.37 | 67.3 | 32.2 | 58.0 | 
| MMLU Pro | 34.10 | 32.40 | 52.8 | N/A | N/A | 
| Average | 62.8 | 57.33 | N/A | N/A | N/A | 
Notes:
- HumanEval and MBPP: Scores on these benchmarks were not available for GPT-4 mini, Llama-3.2-1B-Instruct, and Llama-3.2-3B-Instruct in the sources published by the model creators.
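The Average row is consistent with an unweighted mean of the six benchmark scores, which is presumably why it is only reported for the two models with a score on every benchmark. A quick check, using the values copied from the table:

```python
# Scores copied from the benchmark table, in row order:
# HumanEval, MBPP, GSM8K, MATH, MMLU, MMLU Pro.
scores = {
    "Palmyra-local-1.7B": [74.10, 66.86, 81.0, 60.94, 59.82, 34.10],
    "Qwen2.5-1.5B-Instruct": [61.60, 63.20, 73.20, 55.20, 58.37, 32.40],
}


def unweighted_mean(values):
    """Arithmetic mean with equal weight per benchmark."""
    return sum(values) / len(values)


averages = {name: round(unweighted_mean(vals), 2) for name, vals in scores.items()}

if __name__ == "__main__":
    for name, avg in averages.items():
        print(f"{name}: {avg}")
```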
Usage
Install dependencies
requirements.txt
transformers==4.51.0
torch==2.6.0
tokenizers==0.21.1
accelerate==1.6.0
pip install -r requirements.txt
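After installing, it can be useful to confirm that the environment matches the pinned versions. The following is a minimal sketch using only the standard library. `parse_pins` and `check_pins` are hypothetical helpers, and the comparison ignores local build suffixes such as `+cu124` that some PyTorch wheels append to the version string.

```python
from importlib import metadata

# Pins copied from requirements.txt above.
PINS = """\
transformers==4.51.0
torch==2.6.0
tokenizers==0.21.1
accelerate==1.6.0
"""


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {name: (wanted, found, ok)}; found is None if not installed."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Strip any local build suffix (text after '+') before comparing.
        ok = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, ok)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in check_pins(parse_pins(PINS)).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: wanted {wanted}, found {found or 'not installed'} [{status}]")
```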
Inference
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "Writer/Palmyra-local-1_7B"
auth_token = "xxx"  # placeholder: replace with your Hugging Face access token
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True, token=auth_token)
# Load model in half precision (float16) to reduce memory usage
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    token=auth_token,
)
# Prepare input
messages = [
    {"role": "user", "content": "Write a blog post about strangelets"},
]
# Check if apply_chat_template is available, fallback if not
if hasattr(tokenizer, "apply_chat_template"):
    input_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
    )
else:
    input_text = messages[0]["content"]
    input_ids = tokenizer(input_text, return_tensors="pt").input_ids
# Ensure input_ids is on the same device as the model
input_ids = input_ids.to(model.device)
# Generation config
gen_conf = {
    "max_new_tokens": 256,
    "eos_token_id": tokenizer.eos_token_id,
    "do_sample": True,  # required for temperature and top_p to take effect
    "temperature": 0.7,
    "top_p": 0.9,
}
# Generate output
with torch.inference_mode():
    output_id = model.generate(input_ids, **gen_conf)
# Decode output
output_text = tokenizer.decode(output_id[0][input_ids.shape[1]:], skip_special_tokens=True)
print(output_text)
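When the prompt asks for structured output such as JSON, the decoded text may still contain surrounding prose, so it is worth validating before use. The sketch below is one possible approach, not a Writer-provided utility. `extract_json` is a hypothetical helper that scans a string (for example `output_text` from the snippet above) for the first span that parses as a JSON object or array.

```python
import json


def extract_json(text: str):
    """Return the first JSON object or array found in text, or None."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at index `start`
            # and ignores any trailing text after it.
            value, _end = decoder.raw_decode(text, start)
            return value
        except json.JSONDecodeError:
            continue  # not valid JSON from here; try the next candidate
    return None


if __name__ == "__main__":
    sample = 'Here is the result: {"title": "Strangelets", "words": 250} Hope it helps.'
    print(extract_json(sample))
```

Returning `None` instead of raising lets the caller decide whether to retry generation or fall back to plain text.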
Citation and Related Information
To cite this model:
@misc{Palmyra-Local-1.7B,
  author = {Writer Engineering team},
  title = {{Palmyra-Local-1.7B: A powerful LLM designed for On device run}},
  howpublished = {\url{https://dev.writer.com}},
  year = 2025,
  month = March 
}
Contact [email protected]
