ORION: Teaching Language Models to Reason Efficiently in the Language of Thought
Abstract
ORION models improve reasoning efficiency and cost-effectiveness by compressing reasoning steps into compact, structured tokens, reducing latency and training costs while maintaining high accuracy.
Large Reasoning Models (LRMs) achieve strong performance in mathematics, code generation, and task planning, but their reliance on long chains of verbose "thinking" tokens leads to high latency, redundancy, and incoherent reasoning paths. Inspired by the Language of Thought Hypothesis, which posits that human reasoning operates over a symbolic, compositional mental language called Mentalese, we introduce a framework that trains models to reason in a similarly compact style. Mentalese encodes abstract reasoning as ultra-compressed, structured tokens, enabling models to solve complex problems with far fewer steps. To improve both efficiency and accuracy, we propose Shorter Length Preference Optimization (SLPO), a reinforcement learning method that rewards concise solutions that stay correct, while still allowing longer reasoning when needed. Applied to Mentalese-aligned models, SLPO yields significantly higher compression rates by enabling concise reasoning that preserves the benefits of detailed thinking without the computational overhead. Across benchmarks including AIME 2024 and 2025, MinervaMath, OlympiadBench, Math500, and AMC, our ORION models produce reasoning traces with 4–16× fewer tokens, achieve up to 5× lower inference latency, and reduce training costs by 7–9× relative to the DeepSeek R1 Distilled model, while maintaining 90–98% of its accuracy. ORION also surpasses Claude and ChatGPT-4o by up to 5% in accuracy while maintaining 2× compression. These results show that Mentalese-style compressed reasoning offers a step toward human-like cognitive efficiency, enabling real-time, cost-effective reasoning without sacrificing accuracy.
Community
Paper Overview
We introduce ORION, a framework designed to teach language models to reason efficiently using a compact, symbolic language-of-thought representation. Instead of generating long, often redundant chain-of-thought explanations, we train models to express reasoning through concise, structured symbolic steps. Our approach combines supervised learning with a novel reinforcement learning objective to optimize both correctness and efficiency.
Evolutionary Context
Reasoning in large language models has evolved from prompt-based chain-of-thought, to supervised step-by-step reasoning, to reinforcement-learning-driven verifiable reasoning. We position ORION as the next step: efficient symbolic reasoning, moving toward future systems capable of scalable, faithful, and cost-effective problem solving with minimal redundancy.
Core Framework
Symbolic Reasoning Format
We formalize a Mentalese-style representation that encodes reasoning using atomic operations. This removes natural-language verbosity and keeps every operation semantically necessary.
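To make the contrast concrete, here is a hypothetical illustration of a verbose chain-of-thought versus a compressed symbolic trace. The operation names and trace syntax below are assumptions for illustration only, not the actual Mentalese encoding used by ORION.

```python
# Hypothetical illustration: the operation names and trace format are assumptions,
# not the actual Mentalese encoding.

verbose_cot = (
    "First, let's find the sum of the roots. By Vieta's formulas, for "
    "x^2 - 5x + 6 = 0 the sum of the roots equals 5 and the product equals 6. "
    "Now we need the sum of the squares of the roots, which is "
    "(sum)^2 - 2*(product) = 25 - 12 = 13. So the answer is 13."
)

# A compact, symbolic trace keeping only the semantically necessary operations.
symbolic_trace = "VIETA(x^2-5x+6): s=5, p=6; SUMSQ = s^2 - 2p = 13; ANS 13"

# Rough length comparison (whitespace tokens as a crude proxy for model tokens).
print(len(verbose_cot.split()), "vs", len(symbolic_trace.split()), "tokens")
```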
Training Pipeline
- Foundation Stage: Supervised fine-tuning on symbolic reasoning traces from our MentaleseR dataset.
- RL Stage: Reinforcement learning with verifier feedback using Shorter Length Preference Optimization (SLPO), which encourages concise yet correct reasoning without rigid length constraints (a minimal reward sketch follows below).
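The exact SLPO objective is defined in the paper; the sketch below shows one plausible shape for a correctness-gated, length-aware reward, assuming a group of sampled completions per prompt and an external verifier. All function names and constants here are illustrative assumptions.

```python
# Minimal sketch of a length-shaped reward in the spirit of SLPO.
# Assumptions: `is_correct` comes from an external verifier, rewards are computed
# per group of sampled completions for the same prompt, and the 0.5/0.5 shaping
# constants are illustrative; the paper's exact formulation may differ.
from typing import List

def slpo_rewards(completions: List[str], is_correct: List[bool]) -> List[float]:
    lengths = [len(c.split()) for c in completions]  # token-count proxy
    correct_lengths = [l for l, ok in zip(lengths, is_correct) if ok]
    if not correct_lengths:
        # No correct sample in the group: flat zero reward, so the policy is
        # not pushed toward short-but-wrong answers.
        return [0.0] * len(completions)
    max_len = max(correct_lengths)
    rewards = []
    for l, ok in zip(lengths, is_correct):
        if not ok:
            rewards.append(0.0)
        else:
            # Correct answers are rewarded, with shorter ones preferred; longer
            # correct answers still receive positive reward, so the model can
            # "think longer" when a problem demands it.
            rewards.append(0.5 + 0.5 * (1.0 - l / (max_len + 1)))
    return rewards
```

Because incorrect completions never outscore correct ones in this sketch, length acts only as a preference among correct solutions rather than a hard constraint, matching the "concise yet correct" goal described above.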
Evaluation System
We benchmark ORION on high-difficulty math reasoning tasks such as AIME, AMC, and MATH-500, measuring trace length, accuracy, inference cost, and faithfulness.
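For reference, a minimal sketch of the bookkeeping behind these metrics follows; the record layout and field names are assumptions, not the paper's evaluation code.

```python
# Per-benchmark accuracy and trace-length compression relative to a verbose
# baseline. The record structure is an illustrative assumption.
from statistics import mean

def summarize(records):
    # Each record: {"correct": bool, "trace_tokens": int, "baseline_tokens": int}
    accuracy = mean(1.0 if r["correct"] else 0.0 for r in records)
    compression = mean(r["baseline_tokens"] / max(r["trace_tokens"], 1) for r in records)
    avg_len = mean(r["trace_tokens"] for r in records)
    return {"accuracy": accuracy, "compression_x": compression, "avg_trace_tokens": avg_len}

example = [
    {"correct": True, "trace_tokens": 180, "baseline_tokens": 2400},
    {"correct": False, "trace_tokens": 210, "baseline_tokens": 2100},
]
print(summarize(example))
```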
Capability Optimization
Our RL objective adaptively balances correctness and compactness, enabling models to keep reasoning steps minimal while preserving validity.
Efficiency Engineering
We implement a full pipeline optimized for deployment: shorter traces yield 4–16× fewer tokens, 5× lower latency, and 7–9× lower compute cost.
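As a back-of-envelope estimate of what these figures mean for serving, consider the sketch below; the baseline trace length and per-token decode latency are illustrative assumptions, not measurements from the paper.

```python
# Back-of-envelope estimate of serving savings from shorter traces.
# Baseline trace length and per-token latency are assumed values.
baseline_tokens = 4000        # typical verbose chain-of-thought trace (assumed)
compression = 8               # within the reported 4-16x range
per_token_latency_s = 0.02    # decode latency per token (assumed)

orion_tokens = baseline_tokens / compression
baseline_latency = baseline_tokens * per_token_latency_s
orion_latency = orion_tokens * per_token_latency_s

# Note: real end-to-end latency improves less than the raw token reduction
# (prefill and fixed overheads do not shrink), consistent with the reported
# up-to-5x latency gain alongside 4-16x token compression.
print(f"tokens: {baseline_tokens} -> {orion_tokens:.0f}")
print(f"latency: {baseline_latency:.1f}s -> {orion_latency:.1f}s")
```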
Interpretability & Faithfulness
Structured symbolic traces allow transparent inspection of each reasoning step, improving debuggability and model reliability.
Key Insights
- Most chain-of-thought traces are highly redundant; symbolic reasoning eliminates unnecessary natural-language overhead.
- Conciseness improves both model efficiency and faithfulness, reducing hallucinated or irrelevant steps.
- Reward shaping through SLPO is more flexible and robust than rigid length constraints.
- Efficient reasoning is essential for real-time agents and resource-constrained deployments.
Value Proposition
ORION provides a clear technical pathway for efficient reasoning in LLMs. By combining symbolic thinking with reinforcement learning, we demonstrate that models can achieve strong accuracy with far fewer reasoning tokens. This work serves as both a conceptual shift away from verbose chain-of-thought and a practical system enabling faster, cheaper, and more interpretable reasoning, suitable for both academic research and real-world applications.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Efficient Reasoning via Thought-Training and Thought-Free Inference (2025)
- ARM2: Adaptive Reasoning Model with Vision Understanding and Executable Code (2025)
- Teaching Language Models to Reason with Tools (2025)
- PEAR: Phase Entropy Aware Reward for Efficient Reasoning (2025)
- DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains (2025)
- Beyond Correctness: Rewarding Faithful Reasoning in Retrieval-Augmented Generation (2025)
- Beyond Reasoning Gains: Mitigating General Capabilities Forgetting in Large Reasoning Models (2025)