Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning Paper • 2512.20848 • Published 3 days ago • 24
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times Paper • 2512.16093 • Published 9 days ago • 70
Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning Paper • 2510.27623 • Published Oct 31 • 12
Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning Paper • 2510.24320 • Published Oct 28 • 19 • 3
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16 • 39
Stress Testing Generalization: How Minor Modifications Undermine Large Language Model Performance Paper • 2502.12459 • Published Feb 18 • 2
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16 • 39
LaSeR Collection Models from the paper "LaSeR: Reinforcement Learning with Last-Token Self-Rewarding" • 5 items • Updated Oct 17 • 1
LaSeR: Reinforcement Learning with Last-Token Self-Rewarding Paper • 2510.14943 • Published Oct 16 • 39 • 2
LaSeR Collection Models from the paper "LaSeR: Reinforcement Learning with Last-Token Self-Rewarding" • 5 items • Updated Oct 17 • 1