Submitted by Wenxuan123 133 Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model HKUSTGZ 29 2
Submitted by xiaochonglinghu 90 Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training AMAP-ML 66 3
Submitted by Everything-is-Ok 90 DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation CLAIN-WHU 7 2
Submitted by gowitheflow 82 Scaling Language-Centric Omnimodal Representation Learning DAMO Academy 14 4
Submitted by YuyaoGe 31 A Survey of Vibe Coding with Large Language Models Institute of Computing Technology, Chinese Academy of Sciences 10 2
Submitted by taesiri 30 FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution · 7 authors 76 3
Submitted by raymin0223 26 Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models KAIST AI 2
Submitted by Ray2333 20 ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning University of Illinois at Urbana-Champaign 2
Submitted by Wayne-King 15 SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models The University of Hong Kong 29 3
Submitted by taesiri 14 UniFusion: Vision-Language Model as Unified Encoder in Image Generation Adobe 3
Submitted by XingweiT 13 Deconstructing Attention: Investigating Design Principles for Effective Language Modeling · 3 authors 2
Submitted by TokerZ 12 Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks · 6 authors 2
Submitted by NeoZ123 12 Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models Z.ai 5 2
Submitted by simonycl 11 Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity Stanford NLP 51 3
Submitted by taesiri 8 SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model ByteDance 2
Submitted by dongyuanjushi 7 R-WoM: Retrieval-augmented World Model For Computer-use Agents · 7 authors 2
Submitted by ArmelRandy 4 LLM Reasoning for Machine Translation: Synthetic Data Generation over Thinking Tokens ALMAnaCH (Inria) 0 2
Submitted by ruihangxu 4 ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation · 4 authors 4 2
Submitted by MasterZhou 3 The Geometry of Reasoning: Flowing Logics in Representation Space · 5 authors 3 2
Submitted by Franck-Dernoncourt 3 MLLM as a UI Judge: Benchmarking Multimodal LLMs for Predicting Human Perception of User Interfaces · 15 authors 2
Submitted by codezakh 2 One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration · 5 authors 1 2
Submitted by linghan199 2 ExpVid: A Benchmark for Experiment Video Understanding & Reasoning OpenGVLab 4 2
Submitted by YongdingTao 2 Detecting Data Contamination from Reinforcement Learning Post-training for Large Language Models Peking University 3 2
Submitted by orpatashnik 2 Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing · 6 authors 2
Submitted by CuiLong7 1 ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution OpenGVLab 2
Submitted by ConnorZhong 1 Mitigating the Noise Shift for Denoising Generative Models via Noise Awareness Guidance THUML @ Tsinghua University 2
Submitted by ttttonyhe 1 Locket: Robust Feature-Locking Technique for Language Models University of Waterloo 2
Submitted by ShuoChen99 1 Bag of Tricks for Subverting Reasoning-based Safety Guardrails · 9 authors 2
Submitted by JiayuDing 1 Information-Preserving Reformulation of Reasoning Traces for Antidistillation Microsoft 2
Submitted by southKH 1 Diffusion-Link: Diffusion Probabilistic Model for Bridging the Audio-Text Modality Gap · 5 authors 2
Submitted by MasterZhou 1 Why Do Transformers Fail to Forecast Time Series In-Context? · 4 authors 2 2
Submitted by cesun 1 ReFIne: A Framework for Trustworthy Large Reasoning Models with Reliability, Faithfulness, and Interpretability · 4 authors 2
Submitted by zhengda1936 - dInfer: An Efficient Inference Framework for Diffusion Language Models · 23 authors 2
Submitted by sunweiwei - Scaling LLM Multi-turn RL with End-to-end Summarization-based Context Management · 7 authors 2