Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency
Abstract
NoWait suppresses explicit self-reflection tokens during inference to enhance efficiency in multimodal reasoning without reducing model utility.
Recent advances in large reasoning models have enabled complex, step-by-step reasoning but often introduce significant overthinking, resulting in verbose and redundant outputs that hinder efficiency. In this study, we examine whether explicit self-reflection, signaled by tokens such as "Wait" and "Hmm", is necessary for advanced reasoning. We propose NoWait, a simple yet effective approach that disables explicit self-reflection by suppressing these tokens during inference. Extensive experiments on ten benchmarks across textual, visual, and video reasoning tasks show that NoWait reduces chain-of-thought trajectory length by 27%–51% across five R1-style model series, without compromising model utility. NoWait thus offers a plug-and-play solution for efficient and utility-preserving multimodal reasoning.
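To make the keyword-suppression idea concrete, below is a minimal sketch of how banning self-reflection tokens at decode time might look with Hugging Face `transformers`. It uses the library's `bad_words_ids` generation argument; the model name, keyword list, and prompt are illustrative assumptions, not the authors' released implementation or exact configuration.

```python
# Minimal sketch (not the authors' code): suppress self-reflection keywords
# such as "Wait" / "Hmm" during decoding by banning their token ids.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/QwQ-32B"  # assumed example; any R1-style reasoning model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Illustrative keyword list; include common surface forms (leading space,
# capitalization) so the usual tokenizations of each keyword are covered.
keywords = ["Wait", " Wait", "wait", " wait", "Hmm", " Hmm", "hmm", " hmm"]
bad_words_ids = [tokenizer(k, add_special_tokens=False).input_ids for k in keywords]

prompt = "Solve step by step: what is 17 * 24?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# bad_words_ids routes through NoBadWordsLogitsProcessor, which sets the logits
# of tokens that would complete a banned sequence to -inf, so the model never
# emits the suppressed keywords.
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    bad_words_ids=bad_words_ids,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```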
Community
Do we really need to "Wait" in AI reasoning?

NEW RESEARCH: Removing "Wait" and "Hmm" thinking tokens cuts reasoning trace length by 27%–51%!

Key Findings
❌ "Wait, let me think again..."
❌ "Hmm, maybe I should..."
✅ Direct reasoning = up to 2x efficiency!

NoWait Method Highlights:
• Training-Free: Plug-and-play solution
• Massive Token Reduction: Up to 51% shorter outputs
• Accuracy Preserved: Performance maintained or improved
• Multimodal: Text + Vision + Video reasoning

Extensive Validation:
• 10 benchmarks tested
• 5 R1-style model families
• QwQ-32B, Phi-4, Qwen3, Kimi-VL, and QVQ models

Core Insight:
Explicit self-reflection ≠ better reasoning
Simple keyword suppression → dramatic efficiency gains

This could reshape how we think about AI reasoning!
Paper: https://arxiv.org/pdf/2506.08343
#AI #MachineLearning #Reasoning #Efficiency #LLM #Research
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Dynamic Early Exit in Reasoning Models (2025)
- DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models (2025)
- When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning (2025)
- ThinkSwitcher: When to Think Hard, When to Think Fast (2025)
- ConCISE: Confidence-guided Compression in Step-by-step Efficient Reasoning (2025)
- ThinkLess: A Training-Free Inference-Efficient Method for Reducing Reasoning Redundancy (2025)
- Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition (2025)