Zen VL 8B Instruct

Zen VL 8B Instruct - Vision-language model with Zen identity (nominally 8B; about 9B parameters in total, counting the vision encoder)

Model Details

Architecture: Qwen3-VL
Parameters: 8B
Context Window: 256K tokens (expandable to 1M)
License: Apache 2.0
Training: Fine-tuned from Qwen3-VL for Zen identity and instruction following

Capabilities

🎨 Visual Understanding: Image analysis, video comprehension, spatial reasoning
📝 OCR: Text extraction in 32 languages
🧠 Multimodal Reasoning: STEM, math, code generation

Usage

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from PIL import Image

# Load model
model = Qwen3VLForConditionalGeneration.from_pretrained(
    "zenlm/zen-vl-8b-instruct",
    device_map="auto"
)
processor = AutoProcessor.from_pretrained("zenlm/zen-vl-8b-instruct")

# Process image
image = Image.open("example.jpg")
prompt = "What's in this image?"

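# The user turn needs an image placeholder so the chat template emits the image tokens
messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": prompt}]}]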
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, images=image, return_tensors="pt").to(model.device)

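# Generate, then decode only the new tokens (the raw output also contains the prompt)
outputs = model.generate(**inputs, max_new_tokens=256)
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
response = processor.decode(new_tokens, skip_special_tokens=True)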
print(response)
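
The chat template expects one image placeholder per image in the user turn, followed by the text prompt. As a minimal sketch of that message layout, the helper below builds the list for any number of images. The name build_messages and the OCR prompt are illustrative assumptions, not part of transformers or the Zen repository.

```python
from typing import Any


def build_messages(prompt: str, num_images: int = 1) -> list[dict[str, Any]]:
    """Build a single-turn message list with one image placeholder per image.

    Hypothetical helper for illustration; not part of any library.
    """
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    # Image placeholders come first, then the text prompt.
    content: list[dict[str, Any]] = [{"type": "image"} for _ in range(num_images)]
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]


# Example: an OCR-style request over a single image (prompt wording is an assumption).
ocr_messages = build_messages("Extract all text from this image.")
print(ocr_messages)
```

The resulting list can be passed to processor.apply_chat_template in place of a hand-written messages value, with the matching number of images supplied to the processor call.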

Citation

@misc{zenvl2025,
  title={Zen VL: Vision-Language Models with Integrated Function Calling},
  author={Hanzo AI Team},
  year={2025},
  publisher={Zen Language Models},
  url={https://github.com/zenlm/zen-vl}
}

License

Apache 2.0

Created by Hanzo AI for the Zen model family.

Model tree for zenlm/zen-vl-8b-instruct

Base model: Qwen/Qwen3-VL-8B-Instruct (this model is a fine-tune of it)