Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models Paper • 2511.09809 • Published 23 days ago • 4
Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs Paper • 2510.24514 • Published Oct 28 • 21
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration Paper • 2510.10395 • Published Oct 12 • 29
PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning Paper • 2509.19894 • Published Sep 24 • 33
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25 • 208
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19 • 118
BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models Paper • 2506.07961 • Published Jun 9 • 11