Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning Paper • 2509.09284 • Published Sep 11 • 2
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning Paper • 2312.14878 • Published Dec 22, 2023 • 15
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning Paper • 2406.19741 • Published Jun 28, 2024 • 62
Almost Surely Safe Alignment of Large Language Models at Inference-Time Paper • 2502.01208 • Published Feb 3 • 11
Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving Paper • 2507.02726 • Published Jul 3 • 14
Rethinking Large Language Model Distillation: A Constrained Markov Decision Process Perspective Paper • 2509.22921 • Published Sep 26 • 11
Bridging the Gap: Making Robotics Feel Like Machine Learning Article • By hba123 • Aug 12 • 12
Experience is the Best Teacher: Grounding VLMs for Robotics through Self-Generated Memory Paper • 2507.16713 • Published Jul 22 • 21
Bourbaki (7b): SOTA 7B Algorithms for Putnam Bench (Part I: Reasoning MDPs) Article • By hba123 and 2 others • Jul 13 • 11
Accelerating Language Model Inference with Mixture of Attentions Article • By hba123 and 1 other • Jan 7 • 24