Qwen2.5-72B-Instruct (with CJK Filter)
This is a mirror of Qwen/Qwen2.5-72B-Instruct, deployed with a custom server-side logits processor that filters out CJK (Chinese, Japanese, Korean) characters during generation.
The deployment exposes a vLLM-powered, OpenAI-compatible API and is optimized for Turkish and English outputs by blocking tokens that contain CJK characters.
Features
- Language: Turkish, English, Multilingual
- Model: Qwen2.5-72B-Instruct (bfloat16)
- Max sequence length: 32,768 tokens
- Logits Processor: Filters CJK characters to prioritize Latin script
- Optimized for OpenAI-compatible deployment using vLLM
- Tensor Parallelism: 2
- License: qwen
Server Deployment (Docker Compose with vLLM)
services:
  qwen-lm:
    image: vllm/vllm-openai:latest
    runtime: nvidia
    environment:
      - HUGGING_FACE_HUB_TOKEN=HF_TOKEN
      - PYTHON_VERSION=3.12
      - VLLM_DISABLE_COMPILE_CACHE=1
      - HF_HOME=/mnt/model-cache
      - VLLM_USE_V1=0
      - PYTHONPATH=/app
    volumes:
      -
    ports:
      - "8010:8090"
    shm_size: "220g"
    command: >
      --model newmindai/Qwen2.5-72b-Instruct
      --tensor-parallel-size 2
      --max-model-len 16384
      --gpu-memory-utilization 0.95
      --trust-remote-code
      --host 0.0.0.0
      --port 8090
      --dtype bfloat16
      --enable-chunked-prefill
      --scheduling-policy priority
      --served-model-name newmindai/Qwen2.5-72b-Instruct
      --api-key <API_KEY>
      --logits-processor-pattern <CJKFilter_Pattern>
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0", "1"]
              capabilities: [gpu]
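Once the container is running, the deployment can be sanity-checked by listing the served models through the OpenAI-compatible endpoint. The snippet below is a minimal sketch, assuming the host port mapping (8010:8090) and the API key configured in the compose file above:

from openai import OpenAI

# Point the client at the host port mapped in the compose file (8010 -> 8090).
client = OpenAI(base_url="http://localhost:8010/v1", api_key="<API_KEY>")

# The served model name configured above should appear in the list.
for model in client.models.list():
    print(model.id)  # expected: newmindai/Qwen2.5-72b-Instruct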
Logits Processor: CJKCharacterFilterLogitsProcessor
This custom logits processor prevents generation of any token containing CJK (Chinese, Japanese, Korean) characters. This helps maintain Turkish/English-focused outputs.
import torch


class CJKCharacterFilterLogitsProcessor:
    def __init__(self, tokenizer, device):
        self.tokenizer = tokenizer
        self.device = device
        self.mask = None  # lazily built boolean mask over the vocabulary

    def __call__(self, token_ids, logits):
        # Build the mask once: decode every vocabulary id and flag tokens
        # that contain at least one CJK character.
        if self.mask is None:
            token_ids_range = torch.arange(logits.size(-1), device=self.device)
            decoded_tokens = self.tokenizer.batch_decode(
                token_ids_range.unsqueeze(1), skip_special_tokens=True
            )
            self.mask = torch.tensor([
                any(
                    0x4E00 <= ord(c) <= 0x9FFF or   # CJK Unified Ideographs (Chinese)
                    0x3400 <= ord(c) <= 0x4DBF or   # CJK Extension A
                    0xF900 <= ord(c) <= 0xFAFF or   # CJK Compatibility Ideographs
                    0x3040 <= ord(c) <= 0x30FF or   # Japanese Hiragana/Katakana
                    0xAC00 <= ord(c) <= 0xD7AF      # Korean Hangul syllables
                    for c in token
                ) for token in decoded_tokens
            ], device=self.device)
        # Suppress the flagged tokens by setting their logits to -inf.
        logits_processed = logits.clone()
        logits_processed[self.mask] = -float("inf")
        return logits_processed
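The filter can also be exercised outside the server. The sketch below is a local sanity check, not part of the deployment: it builds the processor with the model's tokenizer, runs it over a zero logits vector, and confirms that an id decoding to a CJK character is pushed to -inf (assuming "中" and "merhaba" each begin with a single-token encoding in the Qwen vocabulary).

# Local sanity check for the processor defined above.
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-72B-Instruct")
processor = CJKCharacterFilterLogitsProcessor(tokenizer, device="cpu")

logits = torch.zeros(len(tokenizer))            # dummy logits over the vocabulary
filtered = processor(token_ids=[], logits=logits)

cjk_id = tokenizer.encode("中", add_special_tokens=False)[0]
latin_id = tokenizer.encode("merhaba", add_special_tokens=False)[0]
print(filtered[cjk_id].item())    # expected: -inf
print(filtered[latin_id].item())  # expected: 0.0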
OpenAI-Compatible Client Usage
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8010/v1",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="newmindai/Qwen2.5-72b-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        # "Explain the civil law system in Turkey."
        {"role": "user", "content": "Türkiye'deki medeni hukuk sistemini açıkla."}
    ],
    extra_body={
        "logits_processors": [
            "logits_processor_zoo.vllm.custom_logits_processor.cjk_filter"
        ]
    },
    temperature=0.6,
    top_p=0.95,
    frequency_penalty=1.005,
    stream=True,
    max_tokens=256
)

for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Example Comparison: With vs. Without CJK Filter
Prompt: "Lütfen bana Çince bir paragraf yazın"
With logits_processors
Tabii ki, size Çince bir paragraf yazabilirim. İşte:
Beijing, Çin'in başkenti ve en büyük şehirlerinden biridir. Tarihi ve kültürel zenginlikleriyle ünlüdür. Şehrin merkezinde bulunan TIANANMEN Meydanı, dünyanın en büyük kentsel meydanlarından biridir. Ayrıca, Forbidden City (Zijin Cheng), 15. yüzyıldan beri Ming hanedanının ve Qing hanedanının sarayını barındırmıştır. Beijing aynı zamanda modern mimarinin örnekleriyle de dikkat çekmektedir, örneğin Ulusal Stadyum (Cuzhouniao veya Kuş Anıtı) 2008 Yaz Olimpiyatları için inşa edilmiştir.
Without logits_processors
Elbette, size Çince bir paragraf yazabilirim. İşte:
中国的文化悠久而丰富多彩。从古代的四大发明到现代的科技发展,中国一直在不断地进步和创新。在艺术方面,中国画、书法和陶瓷艺术都是世界著名的。此外,中国的饮食文化也是其独特魅力的一部分,各地的特色菜肴让人回味无穷。无论是在历史、文化还是自然景观上,中国都有许多值得探索的地方.
With the logits processor enabled, no CJK characters can be generated, so the output stays in Latin script (Turkish/English) even when the prompt explicitly requests multilingual content.
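The same Unicode ranges used by the server-side processor can be reused on the client side to verify a response. The helper below is a hypothetical illustration, not part of the deployment:

# Hypothetical client-side check: flag any CJK character in generated text,
# using the same Unicode ranges as the server-side logits processor.
def contains_cjk(text: str) -> bool:
    return any(
        0x4E00 <= ord(c) <= 0x9FFF or   # CJK Unified Ideographs
        0x3400 <= ord(c) <= 0x4DBF or   # CJK Extension A
        0xF900 <= ord(c) <= 0xFAFF or   # CJK Compatibility Ideographs
        0x3040 <= ord(c) <= 0x30FF or   # Hiragana/Katakana
        0xAC00 <= ord(c) <= 0xD7AF      # Hangul syllables
        for c in text
    )

print(contains_cjk("Beijing, Çin'in başkenti"))  # False
print(contains_cjk("中国的文化悠久"))              # True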
Evaluation
Mezura Benchmarking
Final performance was benchmarked with Mezura, a standardized evaluation suite developed by NewmindAI for structured Turkish NLP tasks.
License
This model inherits the license of Qwen2.5-72B-Instruct, which is released under the Qwen license. You are free to use, adapt, and distribute the model under the terms specified in that license.
Contact
For support, questions, or feature requests, please contact newmindai on Hugging Face or open an issue in the associated model repository.