Qwen2.5-72B-Instruct (with CJK Filter)

This is a mirror of Qwen/Qwen2.5-72B-Instruct, deployed with a custom server-side logits processor that filters out CJK (Chinese, Japanese, Korean) characters during generation.

The deployment exposes a vLLM-powered, OpenAI-compatible API and is tuned for Turkish and English output by blocking tokens that contain CJK characters.
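
A quick way to confirm the endpoint is up is to list the served models through the OpenAI client (illustrative; host, port, and model name follow the compose file and client example below):

from openai import OpenAI

# Connectivity check against the deployed OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:8010/v1", api_key="YOUR_API_KEY")
print([m.id for m in client.models.list().data])
# expected to include "newmindai/Qwen2.5-72b-Instruct"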


Features

  • Language: Turkish, English, Multilingual
  • Model: Qwen2.5-72B-Instruct (bfloat16)
  • Max sequence length: 32,768 tokens (served here with --max-model-len 16384; see the deployment config below)
  • Logits Processor: Filters CJK characters to prioritize Latin script
  • Optimized for OpenAI-compatible deployment using vLLM
  • Tensor Parallelism: 2
  • License: qwen

Server Deployment (Docker Compose with vLLM)

services:
  qwen-lm:
    image: vllm/vllm-openai:latest
    runtime: nvidia
    environment:
      - HUGGING_FACE_HUB_TOKEN=<HF_TOKEN>
      - PYTHON_VERSION=3.12
      - VLLM_DISABLE_COMPILE_CACHE=1
      - HF_HOME=/mnt/model-cache
      - VLLM_USE_V1=0
      - PYTHONPATH=/app
    volumes:
      # host paths are placeholders: a model cache mounted at HF_HOME and the
      # directory holding the custom logits processor code mounted at PYTHONPATH
      - <MODEL_CACHE_DIR>:/mnt/model-cache
      - <LOGITS_PROCESSOR_DIR>:/app
    ports:
      - "8010:8090"
    shm_size: "220g"
    command: >
      --model newmindai/Qwen2.5-72b-Instruct
      --tensor-parallel-size 2
      --max-model-len 16384
      --gpu-memory-utilization 0.95
      --trust-remote-code
      --host 0.0.0.0
      --port 8090
      --dtype bfloat16
      --enable-chunked-prefill
      --scheduling-policy priority
      --served-model-name newmindai/Qwen2.5-72b-Instruct
      --api-key <API_KEY>
      --logits-processor-pattern <CJKFilter_Pattern>
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0", "1"]
              capabilities: [gpu]
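
The value passed to --logits-processor-pattern is a regular expression that whitelists which logits-processor qualified names a client may request through the logits_processors field of extra_body (see the client example below). A minimal sketch of the idea, with a hypothetical pattern standing in for <CJKFilter_Pattern>:

import re

# Hypothetical value for <CJKFilter_Pattern>; it only needs to match the dotted
# name that clients send in "logits_processors".
pattern = r"logits_processor_zoo\.vllm\..*"
qualname = "logits_processor_zoo.vllm.custom_logits_processor.cjk_filter"

print(bool(re.fullmatch(pattern, qualname)))  # True -> this processor may be requested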

Logits Processor: CJKCharacterFilterLogitsProcessor

This custom logits processor prevents generation of any token containing CJK (Chinese, Japanese, Korean) characters. This helps maintain Turkish/English-focused outputs.

import torch


class CJKCharacterFilterLogitsProcessor:
    """Masks every vocabulary entry whose decoded text contains a CJK character."""

    def __init__(self, tokenizer, device):
        self.tokenizer = tokenizer
        self.device = device
        self.mask = None  # built lazily on the first call and then reused

    def __call__(self, token_ids, logits):
        if self.mask is None:
            # Decode every vocabulary id once and flag tokens containing CJK characters.
            token_ids_range = torch.arange(logits.size(-1), device=self.device)
            decoded_tokens = self.tokenizer.batch_decode(
                token_ids_range.unsqueeze(1), skip_special_tokens=True
            )
            self.mask = torch.tensor([
                any(
                    0x4E00 <= ord(c) <= 0x9FFF or  # CJK Unified Ideographs
                    0x3400 <= ord(c) <= 0x4DBF or  # CJK Extension A
                    0xF900 <= ord(c) <= 0xFAFF or  # CJK Compatibility Ideographs
                    0x3040 <= ord(c) <= 0x30FF or  # Hiragana and Katakana
                    0xAC00 <= ord(c) <= 0xD7AF     # Hangul Syllables
                    for c in token
                ) for token in decoded_tokens
            ], device=self.device)

        # Set the logits of all masked tokens to -inf so they can never be sampled.
        logits_processed = logits.clone()
        logits_processed[self.mask] = -float("inf")
        return logits_processed
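
As a quick offline sanity check (a minimal sketch; it only assumes the base tokenizer can be downloaded with transformers and that the class above is in scope), the processor can be instantiated on CPU and applied to a dummy logits vector to see how many vocabulary entries get blocked. Decoding the full vocabulary takes a moment, but the mask is built once and cached:

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-72B-Instruct")
processor = CJKCharacterFilterLogitsProcessor(tokenizer, device="cpu")

vocab_size = len(tokenizer)
logits = torch.zeros(vocab_size)                 # dummy logits, one entry per vocab id
filtered = processor(token_ids=[], logits=logits)

blocked = int(processor.mask.sum())
print(f"{blocked} of {vocab_size} vocabulary entries contain CJK characters")
print(int(torch.isinf(filtered).sum()))          # same count: those logits are now -inf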

OpenAI-Compatible Client Usage

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8010/v1",
    api_key="YOUR_API_KEY"
)

response = client.chat.completions.create(
    model="newmindai/Qwen2.5-72b-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Türkiye'deki medeni hukuk sistemini açıkla."}
    ],
    extra_body={
        "logits_processors": [
            "logits_processor_zoo.vllm.custom_logits_processor.cjk_filter"
        ]
    },
    temperature=0.6,
    top_p=0.95,
    frequency_penalty=1.005,
    stream=True,
    max_tokens=256
)

for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
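
To double-check on the client side that a response contains none of the filtered characters, a small helper (illustrative; collect the streamed chunks into a single string first) can scan for the same Unicode ranges used by the server-side processor:

# Illustrative helper mirroring the server-side filter ranges.
CJK_RANGES = [
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0x3400, 0x4DBF),  # CJK Extension A
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0xAC00, 0xD7AF),  # Hangul Syllables
]

def contains_cjk(text: str) -> bool:
    return any(lo <= ord(c) <= hi for c in text for (lo, hi) in CJK_RANGES)

# e.g. accumulate delta.content into full_text while streaming, then:
# assert not contains_cjk(full_text)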

Example Comparison: With vs. Without CJK Filter

Prompt: "Lütfen bana Çince bir paragraf yazın" ("Please write me a paragraph in Chinese")

With logits_processors

Tabii ki, size Çince bir paragraf yazabilirim. İşte:

Beijing, Çin'in başkenti ve en büyük şehirlerinden biridir. Tarihi ve kültürel zenginlikleriyle ünlüdür. Şehrin merkezinde bulunan TIANANMEN Meydanı, dünyanın en büyük kentsel meydanlarından biridir. Ayrıca, Forbidden City (Zijin Cheng), 15. yüzyıldan beri Ming hanedanının ve Qing hanedanının sarayını barındırmıştır. Beijing aynı zamanda modern mimarinin örnekleriyle de dikkat çekmektedir, örneğin Ulusal Stadyum (Cuzhouniao veya Kuş Anıtı) 2008 Yaz Olimpiyatları için inşa edilmiştir.

Without logits_processors

Elbette, size Çince bir paragraf yazabilirim. İşte:

中国的文化悠久而丰富多彩。从古代的四大发明到现代的科技发展,中国一直在不断地进步和创新。在艺术方面,中国画、书法和陶瓷艺术都是世界著名的。此外,中国的饮食文化也是其独特魅力的一部分,各地的特色菜肴让人回味无穷。无论是在历史、文化还是自然景观上,中国都有许多值得探索的地方.

With the logits processor enabled, generation stays in Latin-script Turkish/English even when the prompt explicitly requests CJK content.


Evaluation

Mezura Benchmarking
Final performance was benchmarked with Mezura, a standardized evaluation suite developed by NewmindAI for structured Turkish NLP tasks.

License

This model inherits the license of Qwen2.5-72B-Instruct, which is released under the Qwen license. You are free to use, adapt, and distribute the model under the terms specified in that license.


Contact

For support, questions, or feature requests, please contact newmindai on Hugging Face or open an issue in the associated model repository.
