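How to use yunfengwang/TVP-OPD-Qwen2VL-2B with PEFT:
from peft import PeftModel
from transformers import Qwen2VLForConditionalGeneration
# Qwen2-VL is a vision-language model, so load it with its dedicated class
# rather than AutoModelForCausalLM.
base_model = Qwen2VLForConditionalGeneration.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(base_model, "yunfengwang/TVP-OPD-Qwen2VL-2B")

Unified distilled LoRA adapter for Thinking with Visual Primitives.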
Stage 3: On-Policy Distillation — Combines Box and Point expert capabilities into a single model.
1. **Analyzing the request**
The user asks me to locate the sports ball in this image.
2. **Object grounding**
I see a <|ref|>sports ball<|/ref|><|box|>[[277,244,479,510]]<|/box|>.
3. **Conclusion**
The sports ball is located at the specified coordinates.
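The grounding markup in the sample response above can be pulled out with a small parser. The following is a minimal sketch, not part of the project's code: the helper name `parse_grounding` is hypothetical, and it assumes only the `<|ref|>…<|/ref|><|box|>[[…]]<|/box|>` layout shown in the example. The coordinate convention (pixel versus normalized) is not stated here, so the numbers are returned exactly as emitted.

```python
import json
import re

# One grounded object: a <|ref|> label immediately followed by a <|box|> list.
# This pattern is inferred from the sample response, not from a published spec.
GROUNDING_RE = re.compile(
    r"<\|ref\|>(?P<label>.*?)<\|/ref\|><\|box\|>(?P<boxes>\[\[.*?\]\])<\|/box\|>",
    re.DOTALL,
)


def parse_grounding(text):
    """Return a list of (label, box) pairs, where box is the list of numbers as emitted."""
    results = []
    for match in GROUNDING_RE.finditer(text):
        label = match.group("label").strip()
        # The box payload is a JSON-style nested list, e.g. [[277,244,479,510]].
        for box in json.loads(match.group("boxes")):
            results.append((label, box))
    return results


sample = "I see a <|ref|>sports ball<|/ref|><|box|>[[277,244,479,510]]<|/box|>."
print(parse_grounding(sample))  # [('sports ball', [277, 244, 479, 510])]
```

If a response lists several boxes under one label, each box is returned as its own pair with the same label.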
from model import VisualPrimitiveVLM
model = VisualPrimitiveVLM.from_pretrained("yunfengwang/TVP-OPD-Qwen2VL-2B", device_map="cuda")
See the project repo for full instructions.