# UI-Venus-1.5-8B 6bit
This is a 6-bit quantized MLX conversion of inclusionAI/UI-Venus-1.5-8B, optimized for Apple Silicon.
UI-Venus-1.5 is a unified end-to-end GUI agent family built for grounding, web navigation, and mobile navigation. The 1.5 family spans dense 2B and 8B variants plus a 30B-A3B MoE variant, and is framed upstream around a shared GUI semantics stage, online RL for long-horizon navigation, and model merging across grounding, web, and mobile domains.
This artifact was derived from the validated local MLX bf16 reference conversion and then quantized with mlx-vlm. It was validated locally with both mlx_vlm prompt-packet checks and vllm-mlx OpenAI-compatible serve checks.
## Conversion Details
| Field | Value |
|---|---|
| Upstream model | inclusionAI/UI-Venus-1.5-8B |
| Artifact type | 6bit quantized MLX conversion |
| Source artifact | local validated bf16 MLX artifact |
| Conversion tool | mlx_vlm.convert via mlx-vlm 0.3.12 |
| Python | 3.11.14 |
| MLX | 0.31.0 |
| Transformers | 5.2.0 |
| Validation backend | vllm-mlx (phase/p1 @ 8a5d41b) |
| Quantization | 6bit |
| Group size | 64 |
| Quantization mode | affine |
| Converter dtype note | float16 |
| Reported effective bits per weight | 7.125 |
| Artifact size | 7.28G |
| Template repair | tokenizer_config.json["chat_template"] was re-injected after quantization |
Additional notes:

- This MLX artifact preserves the dual-template contract across `chat_template.json`, `chat_template.jinja`, and `tokenizer_config.json["chat_template"]`. `chat_template.jinja` is present as an additive compatibility shim.
- No manual dtype edit was applied after conversion.
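The "template repair" row above can be sketched as a small post-quantization step that copies the Jinja template back into `tokenizer_config.json["chat_template"]`. This is a minimal sketch, not the exact script used here; the file names follow the dual-template contract described above, and the directory layout is an assumption.

```python
import json
from pathlib import Path

def reinject_chat_template(model_dir: str) -> None:
    """Re-inject chat_template.jinja into tokenizer_config.json["chat_template"].

    Sketch of the template-repair step noted in the table; paths are assumed
    to sit at the top level of the converted model directory.
    """
    d = Path(model_dir)
    template = (d / "chat_template.jinja").read_text()
    cfg_path = d / "tokenizer_config.json"
    cfg = json.loads(cfg_path.read_text())
    cfg["chat_template"] = template  # restore the field dropped during quantization
    cfg_path.write_text(json.dumps(cfg, indent=2))
```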
## Validation
This artifact passed local validation in this workspace:
- `mlx_vlm` prompt-packet validation: **PASS**
- `vllm-mlx` OpenAI-compatible serve validation: **PASS**
Local validation notes:
- output stayed in the same behavior envelope as the local `bf16` reference artifact
- schema stayed valid on the structured-action prompt and retained the requested `reason` field
- grounding drifted modestly lower/right relative to `bf16`, but still pointed at the correct `API Host` region
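The structured-action check described above can be sketched as a minimal schema validator: parse the model's JSON output and require the expected fields. Only the `reason` field is taken from this card; the `action` and `coordinate` field names and the sample payload are hypothetical, for illustration.

```python
import json

def validate_action(raw: str, required=("action", "reason")) -> dict:
    """Parse a structured-action response and require the given fields.

    "reason" comes from the validation notes above; "action" and
    "coordinate" are assumed field names for illustration.
    """
    obj = json.loads(raw)
    missing = [k for k in required if k not in obj]
    if missing:
        raise ValueError(f"structured action missing fields: {missing}")
    return obj

# Hypothetical model output for a grounding prompt:
sample = '{"action": "click", "coordinate": [412, 236], "reason": "targets the API Host field"}'
print(validate_action(sample)["action"])  # click
```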
## Performance
- Artifact size on disk: `7.28G`
- Local fixed-packet `mlx_vlm` validation used about `20.82 GB` peak memory
- Observed local fixed-packet throughput was about `183-189` prompt tok/s and `40.7-51.9` generation tok/s across the four validation prompts
- Local `vllm-mlx` serve validation completed in about `25.18 s` non-stream and `26.84 s` streamed
These are local validation measurements, not a full benchmark suite.
## Usage
### Install

```bash
pip install -U mlx-vlm
```
### CLI

```bash
python -m mlx_vlm.generate \
  --model mlx-community/UI-Venus-1.5-8B-6bit \
  --image path/to/image.png \
  --prompt "Describe the visible controls on this screen." \
  --max-tokens 256 \
  --temperature 0.0
```
### Python

```python
from mlx_vlm import load, generate

model, processor = load("mlx-community/UI-Venus-1.5-8B-6bit")

result = generate(
    model,
    processor,
    prompt="Describe the visible controls on this screen.",
    image="path/to/image.png",
    max_tokens=256,
    temp=0.0,
)
print(result.text)
```
### vllm-mlx Serve

```bash
python -m vllm_mlx.cli serve mlx-community/UI-Venus-1.5-8B-6bit --mllm --localhost --port 8000
```
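Once the server is running, it can be queried like any OpenAI-compatible endpoint. The sketch below uses only the Python standard library; the `/v1/chat/completions` route and the image/text message shape follow the OpenAI chat API convention and are assumptions here, so adjust if the server exposes a different path.

```python
import json
import urllib.request

# Assumed default route for the serve command above (--localhost --port 8000).
BASE_URL = "http://localhost:8000/v1/chat/completions"

def build_request(prompt: str, image_url: str) -> dict:
    """Assemble an OpenAI-style chat payload with one image part and one text part."""
    return {
        "model": "mlx-community/UI-Venus-1.5-8B-6bit",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }],
        "max_tokens": 256,
        "temperature": 0.0,
    }

def send(payload: dict) -> str:
    """POST the payload and return the first choice's text (requires a running server)."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

payload = build_request("Describe the visible controls on this screen.",
                        "file:///path/to/image.png")
print(payload["messages"][0]["content"][1]["text"])
```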
## Links
- Upstream model: inclusionAI/UI-Venus-1.5-8B
- Paper: UI-Venus-1.5 Technical Report
- Paper: UI-Venus Technical Report: Building High-performance UI Agents with RFT
- GitHub: inclusionAI/UI-Venus
- MLX framework: ml-explore/mlx
- mlx-vlm: Blaizzy/mlx-vlm
## Other Quantizations

Planned sibling repos in this wave:

- `mlx-community/UI-Venus-1.5-8B-bf16`
- `mlx-community/UI-Venus-1.5-8B-6bit` (this model)
- `mlx-community/UI-Venus-1.5-8B-4bit`
## Notes and Limitations
- This card reports local MLX conversion and validation results only.
- Upstream benchmark claims belong to the original UI-Venus model family and were not re-run here unless explicitly stated.
- Quantization changes numerical behavior relative to the local `bf16` reference artifact.
- The main qualitative change relative to `bf16` was modest grounding drift, not schema breakage or text collapse.
## Citation
If you use this MLX conversion, please cite the original UI-Venus papers:
- UI-Venus-1.5 Technical Report
- UI-Venus Technical Report: Building High-performance UI Agents with RFT
## License
This repo follows the upstream model license: Apache 2.0. See the upstream model card for the authoritative license details: inclusionAI/UI-Venus-1.5-8B.