Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage Paper • 2412.15606 • Published Dec 20, 2024 • 2
LongViTU: Instruction Tuning for Long-Form Video Understanding Paper • 2501.05037 • Published Jan 9, 2025 • 1
FIRE: A Dataset for Feedback Integration and Refinement Evaluation of Multimodal Models Paper • 2407.11522 • Published Jul 16, 2024 • 9
Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World Paper • 2310.10207 • Published Oct 16, 2023