UI-Venus-1.5-8B 6bit

This is a 6-bit quantized MLX conversion of inclusionAI/UI-Venus-1.5-8B, optimized for Apple Silicon.

UI-Venus-1.5 is a unified end-to-end GUI agent family built for grounding, web navigation, and mobile navigation. The 1.5 family spans dense 2B and 8B variants plus a 30B-A3B MoE variant; upstream, it is built around a shared GUI-semantics stage, online RL for long-horizon navigation, and model merging across the grounding, web, and mobile domains.

This artifact was derived from the locally validated bf16 MLX reference conversion and then quantized with mlx-vlm. It was validated locally with both mlx_vlm prompt-packet checks and vllm-mlx OpenAI-compatible serve checks.

Conversion Details

Upstream model: inclusionAI/UI-Venus-1.5-8B
Artifact type: 6-bit quantized MLX conversion
Source artifact: local validated bf16 MLX artifact
Conversion tool: mlx_vlm.convert via mlx-vlm 0.3.12
Python: 3.11.14
MLX: 0.31.0
Transformers: 5.2.0
Validation backend: vllm-mlx (phase/p1 @ 8a5d41b)
Quantization: 6-bit
Group size: 64
Quantization mode: affine
Converter dtype: float16
Reported effective bits per weight: 7.125
Artifact size: 7.28 GB
Template repair: tokenizer_config.json["chat_template"] was re-injected after quantization
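The gap between the nominal 6 bits and the reported 7.125 effective bits per weight can be sanity-checked. A minimal sketch, assuming MLX-style affine quantization stores one float16 scale and one float16 bias per weight group:

```python
# Per-group storage for 6-bit affine quantization with group size 64,
# assuming one float16 scale and one float16 bias per group (MLX-style).
Q_BITS = 6
GROUP_SIZE = 64
metadata_bits = 16 + 16  # scale + bias per group
per_weight_bits = Q_BITS + metadata_bits / GROUP_SIZE
print(per_weight_bits)  # 6.5
```

Under these assumptions quantized tensors land at 6.5 bits/weight; the reported 7.125 average is higher because some tensors (typically embeddings and norms) are left in float16 rather than quantized.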

Additional notes:

  • This MLX artifact preserves the dual-template contract across chat_template.json, chat_template.jinja, and tokenizer_config.json["chat_template"].
  • chat_template.jinja is present as an additive compatibility shim.
  • No manual dtype edit was applied after conversion.
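The template-repair step noted in the table can be sketched as follows. The script itself is an assumption (only the repaired key and file names come from this card), demonstrated here on dummy files in a temporary directory rather than the real artifact:

```python
import json
import tempfile
from pathlib import Path

# Stand-in repo layout with a dummy template (not the real artifact files).
repo = Path(tempfile.mkdtemp())
(repo / "chat_template.json").write_text(
    json.dumps({"chat_template": "{{ messages }}"})
)
(repo / "tokenizer_config.json").write_text(
    json.dumps({"tokenizer_class": "DummyTokenizer"})
)

# Re-injection: copy the template from chat_template.json back into
# tokenizer_config.json["chat_template"] after quantization drops it.
template = json.loads((repo / "chat_template.json").read_text())["chat_template"]
cfg_path = repo / "tokenizer_config.json"
cfg = json.loads(cfg_path.read_text())
cfg["chat_template"] = template
cfg_path.write_text(json.dumps(cfg, indent=2))
```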

Validation

This artifact passed local validation in this workspace:

  • mlx_vlm prompt-packet validation: PASS
  • vllm-mlx OpenAI-compatible serve validation: PASS

Local validation notes:

  • Output stayed within the same behavior envelope as the local bf16 reference artifact.
  • The schema stayed valid on the structured-action prompt and retained the requested reason field.
  • Grounding drifted modestly down and to the right relative to bf16, but still pointed at the correct API Host region.
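A minimal sketch of the kind of structured-action check described above. The action payload and field names are illustrative stand-ins, not the actual validation packet; only the requested reason field comes from this card:

```python
import json

# Illustrative structured-action output (fields other than "reason" are
# assumptions, not the real prompt-packet schema).
raw = '{"action": "click", "coordinate": [412, 287], "reason": "API Host field"}'
action = json.loads(raw)

# Schema stays valid JSON and the requested reason field is retained.
assert isinstance(action.get("reason"), str) and action["reason"]
assert {"action", "coordinate"} <= action.keys()
print("schema OK")
```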

Performance

  • Artifact size on disk: 7.28 GB
  • Local fixed-packet mlx_vlm validation peaked at about 20.82 GB of memory.
  • Observed local fixed-packet throughput was about 183-189 prompt tok/s and 40.7-51.9 generation tok/s across the four validation prompts.
  • Local vllm-mlx serve validation completed in about 25.18 s (non-streaming) and 26.84 s (streamed).

These are local validation measurements, not a full benchmark suite.

Usage

Install

pip install -U mlx-vlm

CLI

python -m mlx_vlm.generate \
  --model mlx-community/UI-Venus-1.5-8B-6bit \
  --image path/to/image.png \
  --prompt "Describe the visible controls on this screen." \
  --max-tokens 256 \
  --temperature 0.0

Python

from mlx_vlm import load, generate

model, processor = load("mlx-community/UI-Venus-1.5-8B-6bit")
result = generate(
    model,
    processor,
    prompt="Describe the visible controls on this screen.",
    image="path/to/image.png",
    max_tokens=256,
    temp=0.0,
)
print(result.text)

vllm-mlx Serve

python -m vllm_mlx.cli serve mlx-community/UI-Venus-1.5-8B-6bit --mllm --localhost --port 8000
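Once serving, the endpoint speaks the OpenAI chat-completions API. A sketch of a request body for it; the base64 data-URL image pattern is an assumption about how this server accepts images, and the endpoint path is the standard /v1/chat/completions:

```python
import base64
import json

# Placeholder bytes: a real screenshot's PNG bytes go here.
png_bytes = b"\x89PNG-placeholder"
data_url = "data:image/png;base64," + base64.b64encode(png_bytes).decode()

payload = {
    "model": "mlx-community/UI-Venus-1.5-8B-6bit",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": data_url}},
            {"type": "text",
             "text": "Describe the visible controls on this screen."},
        ],
    }],
    "max_tokens": 256,
    "temperature": 0.0,
}
# POST this body to http://localhost:8000/v1/chat/completions
body = json.dumps(payload)
```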

Links

Other Quantizations

Planned sibling repos in this wave:

Notes and Limitations

  • This card reports local MLX conversion and validation results only.
  • Upstream benchmark claims belong to the original UI-Venus model family and were not re-run here unless explicitly stated.
  • Quantization changes numerical behavior relative to the local bf16 reference artifact.
  • The main qualitative change relative to bf16 was modest grounding drift, not schema breakage or text collapse.

Citation

If you use this MLX conversion, please cite the original UI-Venus papers:

License

This repo follows the upstream model license (Apache 2.0). See the upstream model card, inclusionAI/UI-Venus-1.5-8B, for the authoritative license details.
