TVP-OPD-Qwen2VL-2B

Unified distilled LoRA adapter for Thinking with Visual Primitives.

Stage 3 (On-Policy Distillation): combines the Box and Point expert capabilities into a single model.

Base model: Qwen/Qwen2-VL-2B-Instruct
LoRA: r=64, alpha=128
Distillation: forward KL (temperature=1.0) + cross-entropy (ce_coeff=0.5), lr=1e-6
KL divergence: 0.35 (high-fidelity distillation)
Capabilities: Grounding, counting, spatial reasoning, maze navigation, path tracing
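
To make the distillation objective above concrete, here is a minimal pure-Python sketch of a forward KL term plus a weighted cross-entropy term for a single token position. It is an illustration under assumptions, not the project's training code: the function names, the way the two terms are summed (KL + ce_coeff * CE), and the use of a single `target_id` are all hypothetical. A real training loop would use a tensor library and average over the tokens sampled on-policy from the student.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - lse for x in scaled]


def forward_kl(teacher_logits, student_logits, temperature=1.0):
    """Forward KL, i.e. KL(teacher || student), for one token position."""
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def cross_entropy(student_logits, target_id):
    """Negative log-likelihood of the target token under the student."""
    return -log_softmax(student_logits)[target_id]


def distill_loss(teacher_logits, student_logits, target_id,
                 temperature=1.0, ce_coeff=0.5):
    """Assumed combination: KL + ce_coeff * CE (defaults taken from the card)."""
    kl = forward_kl(teacher_logits, student_logits, temperature)
    ce = cross_entropy(student_logits, target_id)
    return kl + ce_coeff * ce


# Toy example: 3-token vocabulary, teacher prefers token 0.
loss = distill_loss([2.0, 0.5, -1.0], [1.0, 1.0, 0.0], target_id=0)
print(f"loss = {loss:.4f}")
```

When the student matches the teacher exactly, the KL term is zero and only the weighted cross-entropy term remains.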

Example Output

1. **Analyzing the request**
The user asks me to locate the sports ball in this image.
2. **Object grounding**
I see a <|ref|>sports ball<|/ref|><|box|>[[277,244,479,510]]<|/box|>.
3. **Conclusion**
The sports ball is located at the specified coordinates.
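
The grounding tags in the output above can be extracted with a regular expression. The following is a minimal sketch, assuming the tag layout is exactly as shown in the example (`<|ref|>label<|/ref|><|box|>[[...]]<|/box|>`). The helper name `parse_boxes` is hypothetical, and the coordinate order (likely `x1, y1, x2, y2`) and scale are not stated in this card, so the values are returned unmodified.

```python
import json
import re

# Matches one <|ref|>label<|/ref|><|box|>[[...]]<|/box|> group.
BOX_PATTERN = re.compile(
    r"<\|ref\|>(.*?)<\|/ref\|><\|box\|>(\[\[.*?\]\])<\|/box\|>"
)


def parse_boxes(text):
    """Return a list of (label, [box, ...]) pairs found in the model output.

    Each box is a tuple of the raw numbers inside the tag; no rescaling
    is applied because the coordinate convention is an assumption.
    """
    results = []
    for label, raw in BOX_PATTERN.findall(text):
        boxes = [tuple(box) for box in json.loads(raw)]
        results.append((label, boxes))
    return results


output = "I see a <|ref|>sports ball<|/ref|><|box|>[[277,244,479,510]]<|/box|>."
print(parse_boxes(output))
# [('sports ball', [(277, 244, 479, 510)])]
```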

Usage

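# `model` is not a PyPI package; it is assumed to be the module shipped in the project repo.
from model import VisualPrimitiveVLM

# Load the adapter by its Hub ID and place it on the GPU.
model = VisualPrimitiveVLM.from_pretrained("yunfengwang/TVP-OPD-Qwen2VL-2B", device_map="cuda")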

See the project repo for full instructions.

Framework versions

PEFT 0.12.0


