Daily Papers

by AK and the research community

Nov 7

Submitted by

lkdhy

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

OpenMOSS-Team

Submitted by

taesiri

V-Thinker: Interactive Thinking with Images

·
13 authors

Submitted by

taesiri

Scaling Agent Learning via Experience Synthesis

metaresearch

Submitted by

taesiri

Cambrian-S: Towards Spatial Supersensing in Video

·
15 authors

Submitted by

taesiri

NVIDIA Nemotron Nano V2 VL

nvidia

Submitted by

vyokky

GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents

microsoft

Submitted by

mucai

Contamination Detection for VLMs using Multi-Modal Semantic Perturbation

uw-madison

University of Wisconsin - Madison

Submitted by

h-otsuka

The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms

·
6 authors

Submitted by

jihanyang

Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts

nyu-visionx

Submitted by

taesiri

Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots

ByteDance-Seed

Submitted by

spapi

How to Evaluate Speech Translation with Source-Aware Neural MT Metrics

·
5 authors

Submitted by

ellisbrown

SIMS-V: Simulated Instruction-Tuning for Spatial Video Understanding

nyu-visionx

Submitted by

nielsr

RDMA Point-to-Point Communication for LLM Systems

perplexity-ai

Submitted by

Shuhuhuhu

SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning

BytedanceDouyinContent

Submitted by

liushanyuan18

EVTAR: End-to-End Try on with Additional Unpaired Visual Reference

qihoo360

北京奇虎科技有限公司
