ECHO-2: A Large-Scale Distributed Rollout Framework for Cost-Efficient Reinforcement Learning Paper • 2602.02192 • Published 11 days ago • 12
Surprisal Guided Selection Collection Training at test-time for kernel optimization • 2 items • Updated about 24 hours ago • 1
OpenSec: Incident Response Agent Calibration Collection OpenSec is a dual-control RL environment, dataset, and evaluation suite that measures agent calibration on incident response tasks. • 4 items • Updated about 24 hours ago • 1
Surprisal-Guided Selection: Compute-Optimal Test-Time Strategies for Execution-Grounded Code Generation Paper • 2602.07670 • Published 6 days ago • 1
Article Where should test-time compute go? Surprisal-guided selection in verifiable environments 6 days ago • 1
Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents Paper • 2601.18217 • Published 18 days ago • 11
OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence Paper • 2601.21083 • Published 16 days ago • 1
Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano v3. • 8 items • Updated 9 days ago • 63
Article Frontier Security Agents Don't Lack Detection. They Lack Restraint 21 days ago • 2
PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models Paper • 2601.11087 • Published 28 days ago • 11
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking Paper • 2601.06487 • Published Jan 10 • 52
CausalARC: Abstract Reasoning with Causal World Models Paper • 2509.03636 • Published Sep 3, 2025 • 1
Parakeet Collection NeMo Parakeet ASR Models attain strong speech recognition accuracy while being efficient for inference. Available in CTC and RNN-Transducer variants. • 16 items • Updated 9 days ago • 54
Article Prefill and Decode for Concurrent Requests - Optimizing LLM Performance Apr 16, 2025 • 61