arxiv:2511.22891

ORION: Teaching Language Models to Reason Efficiently in the Language of Thought

Published on Nov 28
· Submitted by Kumar Tanmay on Dec 2

Abstract

ORION models improve reasoning efficiency and cost-effectiveness by encoding reasoning steps as ultra-compressed, structured tokens, reducing latency and training costs while maintaining high accuracy.

AI-generated summary

Large Reasoning Models (LRMs) achieve strong performance in mathematics, code generation, and task planning, but their reliance on long chains of verbose "thinking" tokens leads to high latency, redundancy, and incoherent reasoning paths. Inspired by the Language of Thought Hypothesis, which posits that human reasoning operates over a symbolic, compositional mental language called Mentalese, we introduce a framework that trains models to reason in a similarly compact style. Mentalese encodes abstract reasoning as ultra-compressed, structured tokens, enabling models to solve complex problems with far fewer steps. To improve both efficiency and accuracy, we propose SHORTER LENGTH PREFERENCE OPTIMIZATION (SLPO), a reinforcement learning method that rewards concise solutions that stay correct, while still allowing longer reasoning when needed. Applied to Mentalese-aligned models, SLPO yields significantly higher compression rates by enabling concise reasoning that preserves the benefits of detailed thinking without the computational overhead. Across benchmarks including AIME 2024 and 2025, MinervaMath, OlympiadBench, Math500, and AMC, our ORION models produce reasoning traces with 4-16x fewer tokens, achieve up to 5x lower inference latency, and reduce training costs by 7-9x relative to the DeepSeek R1 Distilled model, while maintaining 90-98% of its accuracy. ORION also surpasses Claude and ChatGPT-4o by up to 5% in accuracy while maintaining 2x compression. These results show that Mentalese-style compressed reasoning offers a step toward human-like cognitive efficiency, enabling real-time, cost-effective reasoning without sacrificing accuracy.

Community

Paper submitter

📜 Paper Overview

We introduce ORION, a framework designed to teach language models to reason efficiently using a compact, symbolic language-of-thought representation. Instead of generating long, often redundant chain-of-thought explanations, we train models to express reasoning through concise, structured symbolic steps. Our approach combines supervised learning with a novel reinforcement learning objective to optimize both correctness and efficiency.


โณ Evolutionary Context

Reasoning in large language models has evolved from prompt-based chain-of-thought, to supervised step-by-step reasoning, to reinforcement-learning-driven verifiable reasoning. We position ORION as the next step: efficient symbolic reasoning, moving toward future systems capable of scalable, faithful, and cost-effective problem solving with minimal redundancy.


🧬 Core Framework

Symbolic Reasoning Format

We formalize a Mentalese-style representation that encodes reasoning as a sequence of atomic operations. This strips away natural-language verbosity so that every remaining operation is semantically necessary.
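
For intuition, here is a small, hypothetical contrast between a verbose chain-of-thought string and a compressed, structured trace. The tuple-based format below is an assumption made for exposition; the paper's exact Mentalese notation is not reproduced in this overview.

```python
# Hypothetical illustration: a verbose chain-of-thought step versus a
# compressed, structured rendering of the same reasoning. The tuple format
# is assumed for exposition and is not the paper's exact Mentalese notation.

verbose_cot = (
    "First, recall that the sum of the first n odd numbers is n squared. "
    "We need the sum of the first 12 odd numbers, so we compute 12 * 12, "
    "which gives 144. Therefore the answer is 144."
)

mentalese_trace = [
    ("RULE", "sum_odd(n) = n^2"),
    ("BIND", "n = 12"),
    ("EVAL", "12^2 = 144"),
    ("ANS",  "144"),
]

print(f"verbose words: {len(verbose_cot.split())}, compressed steps: {len(mentalese_trace)}")
```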

Training Pipeline

  • Foundation Stage: Supervised fine-tuning on symbolic reasoning traces from our MentaleseR dataset.
  • RL Stage: Reinforcement learning with verifier feedback using Shorter Length Preference Optimization (SLPO), which encourages concise yet correct reasoning without rigid length constraints.
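
Below is a minimal sketch of what a shorter-length preference reward could look like, assuming a group-relative formulation over traces sampled for one problem. The correctness gating and the bonus scale are illustrative assumptions, not the paper's exact SLPO objective.

```python
# Minimal sketch of a shorter-length preference reward, assuming a
# group-relative formulation over sampled traces for one problem.
# The correctness gate and the 0.5 bonus scale are illustrative assumptions,
# not the paper's exact SLPO objective.
from typing import List

def slpo_rewards(lengths: List[int], correct: List[bool]) -> List[float]:
    correct_lengths = [l for l, c in zip(lengths, correct) if c]
    rewards = []
    for l, c in zip(lengths, correct):
        if not c:
            rewards.append(0.0)  # correctness gates the reward entirely
        elif len(correct_lengths) < 2 or max(correct_lengths) == min(correct_lengths):
            rewards.append(1.0)  # no length preference can be formed
        else:
            # Shortest correct trace earns the largest bonus; longer correct
            # traces are still rewarded, so length never overrides correctness.
            span = max(correct_lengths) - min(correct_lengths)
            bonus = (max(correct_lengths) - l) / span
            rewards.append(1.0 + 0.5 * bonus)
    return rewards

# Four sampled traces for one problem: three correct, one long and incorrect.
print(slpo_rewards(lengths=[120, 300, 80, 500], correct=[True, True, True, False]))
```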

Evaluation System

We benchmark ORION on high-difficulty math reasoning tasks such as AIME, AMC, and MATH-500, measuring trace length, accuracy, inference cost, and faithfulness.
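
The evaluation loop implied here can be sketched as follows, assuming each benchmark item carries a gold answer; `model.generate` and `check_answer` are placeholder interfaces, not the paper's actual harness.

```python
# Sketch of a benchmark evaluation loop, assuming each item has a gold answer.
# `model.generate` and `check_answer` are placeholder interfaces, not the
# paper's actual evaluation harness.
from statistics import mean

def evaluate(model, benchmark, check_answer):
    accuracies, trace_lengths = [], []
    for item in benchmark:
        trace_tokens, answer = model.generate(item["question"])
        accuracies.append(float(check_answer(answer, item["gold"])))
        trace_lengths.append(len(trace_tokens))  # reasoning-token count per item
    return {
        "accuracy": mean(accuracies),
        "avg_trace_tokens": mean(trace_lengths),
    }
```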

Capability Optimization

Our RL objective adaptively balances correctness and compactness, enabling models to keep reasoning steps minimal while preserving validity.

Efficiency Engineering

We implement a full pipeline optimized for deployment: shorter traces yield 4–16× fewer tokens, 5× lower latency, and 7–9× lower compute cost.
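
As a back-of-the-envelope sketch of how token reduction translates into latency, assume autoregressive decoding where generation time scales roughly linearly with output tokens. The token counts and decoding speed below are hypothetical, not the paper's measurements.

```python
# Back-of-the-envelope latency estimate, assuming decoding time scales roughly
# linearly with generated tokens. Token counts and decoding speed are
# hypothetical, not numbers from the paper.

def decode_latency_s(num_tokens: int, tokens_per_second: float = 40.0) -> float:
    return num_tokens / tokens_per_second

baseline_tokens = 8000   # hypothetical verbose chain-of-thought trace
orion_tokens = 1000      # hypothetical compressed trace (8x fewer tokens)

print(f"baseline: {decode_latency_s(baseline_tokens):.0f}s, "
      f"ORION: {decode_latency_s(orion_tokens):.0f}s, "
      f"token reduction: {baseline_tokens / orion_tokens:.0f}x")
```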

Interpretability & Faithfulness

Structured symbolic traces allow transparent inspection of each reasoning step, improving debuggability and model reliability.
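
As a sketch of step-level inspection, the snippet below walks a structured trace in the hypothetical (operation, content) format used earlier; the paper's real trace schema may differ.

```python
# Sketch of step-level inspection over a structured trace, reusing the
# hypothetical (operation, content) format from the earlier example.

def audit_trace(trace):
    """Print each atomic step so it can be checked in isolation."""
    for i, (op, content) in enumerate(trace, start=1):
        print(f"step {i:02d} [{op:>4}] {content}")

audit_trace([
    ("RULE", "sum_odd(n) = n^2"),
    ("BIND", "n = 12"),
    ("EVAL", "12^2 = 144"),
    ("ANS",  "144"),
])
```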


🔭 Key Insights

  • Most chain-of-thought traces are highly redundant; symbolic reasoning eliminates unnecessary natural-language overhead.
  • Conciseness improves both model efficiency and faithfulness, reducing hallucinated or irrelevant steps.
  • Reward shaping through SLPO is more flexible and robust than rigid length constraints.
  • Efficient reasoning is essential for real-time agents and resource-constrained deployments.

💡 Value Proposition

ORION provides a clear technical pathway to efficient reasoning in LLMs. By combining symbolic thinking with reinforcement learning, we demonstrate that models can achieve strong accuracy with far fewer reasoning tokens. This work is both a conceptual shift away from verbose chain-of-thought and a practical system that enables faster, cheaper, and more interpretable reasoning, suitable for academic research and real-world applications alike.

