TVP-OPD-Qwen2VL-2B

Unified distilled LoRA adapter for Thinking with Visual Primitives.

Stage 3: On-Policy Distillation. This stage combines the Box and Point expert capabilities into a single model.

  • Base model: Qwen/Qwen2-VL-2B-Instruct
  • LoRA: r=64, alpha=128
  • Distillation: Forward KL (temperature=1.0) + CE (ce_coeff=0.5), lr=1e-6
  • KL divergence: 0.35 (high-fidelity distillation)
  • Capabilities: Grounding, counting, spatial reasoning, maze navigation, path tracing
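
The distillation objective listed above (forward KL plus a weighted cross-entropy term) can be sketched for a single token position as follows. This is a framework-free illustration, not the project's training code. It assumes the two terms are combined as KL + ce_coeff * CE, which the card does not spell out, and all function names are hypothetical. In on-policy distillation the scored sequences are sampled from the student itself, and that sampling step is not shown here.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

def forward_kl(teacher_logits, student_logits, temperature=1.0):
    """Forward KL, i.e. KL(teacher || student), at one token position."""
    log_p = log_softmax(teacher_logits, temperature)  # teacher distribution
    log_q = log_softmax(student_logits, temperature)  # student distribution
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def cross_entropy(student_logits, target_id):
    """Negative log-likelihood of the reference token under the student."""
    return -log_softmax(student_logits)[target_id]

def distill_loss(teacher_logits, student_logits, target_id,
                 temperature=1.0, ce_coeff=0.5):
    # Assumption: the terms are summed as KL + ce_coeff * CE.
    kl = forward_kl(teacher_logits, student_logits, temperature)
    ce = cross_entropy(student_logits, target_id)
    return kl + ce_coeff * ce

# Toy vocabulary of three tokens; the reference token is index 2.
teacher = [0.5, 1.0, 3.0]
student = [0.2, 1.5, 2.0]
print(distill_loss(teacher, student, target_id=2))
```

When the student matches the teacher exactly, the KL term is zero and only the weighted cross-entropy term remains.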

Example Output

1. **Analyzing the request**
The user asks me to locate the sports ball in this image.
2. **Object grounding**
I see a <|ref|>sports ball<|/ref|><|box|>[[277,244,479,510]]<|/box|>.
3. **Conclusion**
The sports ball is located at the specified coordinates.
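
The tagged span in the example output can be pulled out with a regular expression. The sketch below is an illustration only: parse_boxes is a hypothetical helper, not part of the project's API, and it handles only the <|ref|>...<|/ref|><|box|>...<|/box|> pattern shown above. It returns the raw integers without interpreting them, since the card does not state the coordinate convention (corner order, pixel or normalized units).

```python
import json
import re

# Matches one "<|ref|>label<|/ref|><|box|>[[...]]<|/box|>" span.
PRIMITIVE_RE = re.compile(
    r"<\|ref\|>(.*?)<\|/ref\|><\|box\|>(.*?)<\|/box\|>", re.DOTALL
)

def parse_boxes(text):
    """Return a list of (label, boxes) pairs found in the model output.

    `boxes` is the JSON-decoded payload, e.g. [[277, 244, 479, 510]].
    """
    results = []
    for label, payload in PRIMITIVE_RE.findall(text):
        results.append((label.strip(), json.loads(payload)))
    return results

sample = "I see a <|ref|>sports ball<|/ref|><|box|>[[277,244,479,510]]<|/box|>."
print(parse_boxes(sample))
# prints [('sports ball', [[277, 244, 479, 510]])]
```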

Usage

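# Assumption: `model` is a module from the project repo, so run this from a checkout of that repo.
from model import VisualPrimitiveVLM

# Load the adapter from the Hugging Face Hub and place it on the GPU.
model = VisualPrimitiveVLM.from_pretrained("yunfengwang/TVP-OPD-Qwen2VL-2B", device_map="cuda")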

See the project repo for full instructions.

Framework versions

  • PEFT 0.12.0