AstroReason-Bench: Evaluating Unified Agentic Planning across Heterogeneous Space Planning Problems Paper • 2601.11354 • Published Jan 16, 2026 • 4
LSRIF: Logic-Structured Reinforcement Learning for Instruction Following Paper • 2601.06431 • Published Jan 10, 2026 • 12
Pre-Trained Policy Discriminators are General Reward Models Paper • 2507.05197 • Published Jul 7, 2025 • 39
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles Paper • 2505.19914 • Published May 26, 2025 • 46
A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models Paper • 2505.07591 • Published May 12, 2025 • 11
BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation Paper • 2504.14538 • Published Apr 20, 2025 • 30
Implicit Reasoning in Transformers is Reasoning through Shortcuts Paper • 2503.07604 • Published Mar 10, 2025 • 23
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles Paper • 2502.09082 • Published Feb 13, 2025 • 30
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published Jan 20, 2025 • 109
Exploring the Inquiry-Diagnosis Relationship with Advanced Patient Simulators Paper • 2501.09484 • Published Jan 16, 2025 • 19
ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use Paper • 2501.02506 • Published Jan 5, 2025 • 10
AAAR-1.0: Assessing AI's Potential to Assist Research Paper • 2410.22394 • Published Oct 29, 2024 • 16
Revealing the Barriers of Language Agents in Planning Paper • 2410.12409 • Published Oct 16, 2024 • 27
QUERT: Continual Pre-training of Language Model for Query Understanding in Travel Domain Search Paper • 2306.06707 • Published Jun 11, 2023
Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs Paper • 2407.00653 • Published Jun 30, 2024 • 13
From Persona to Personalization: A Survey on Role-Playing Language Agents Paper • 2404.18231 • Published Apr 28, 2024 • 1
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation Paper • 2404.12753 • Published Apr 19, 2024 • 43