Submitted by Elizaveta · 73 · When Less is Enough: Adaptive Token Reduction for Efficient Image Representation · 3 authors · 2
Submitted by VentureZJ · 55 · MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving · 9 authors · 2
Submitted by VentureZJ · 45 · MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization · 6 authors · 3 · 2
Submitted by IranQin · 41 · RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints · 8 authors · 2
Submitted by akhaliq · 37 · Modifying Large Language Model Post-Training for Diverse Creative Writing · 5 authors · 3
Submitted by Epiphqny · 34 · Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation · 7 authors · 130 · 4
Submitted by akhaliq · 27 · TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting · 7 authors · 3
Submitted by ydeng9 · 24 · OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement · 6 authors · 102 · 2
Submitted by JacobYuan · 15 · MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems · 8 authors · 6 · 3
Submitted by yairshp · 14 · Single Image Iterative Subject-driven Generation and Editing · 3 authors · 97 · 2
Submitted by akhaliq · 11 · FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models · 7 authors · 52 · 3
Submitted by Guan123 · 11 · ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering · 8 authors · 2
Submitted by hitsmy · 9 · From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration · 4 authors · 2
Submitted by ChengmingX · 6 · When Preferences Diverge: Aligning Diffusion Models with Minority-Aware Adaptive DPO · 8 authors · 2
Submitted by ZhaochongAn · 5 · Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model · 7 authors · 30 · 2