---
title: README
emoji: 👁
colorFrom: green
colorTo: purple
sdk: static
pinned: false
---

<!-- Banner -------------------------------------------------------------- -->
<p align="center">
  <b>Fine-grained evaluation of Large Reasoning Models that <i>fail in reasoning</i> due to <i>reasoning rigidity</i>.</b><br/>
  ConditionedMath (AIME &amp; MATH500) · PuzzleTrivial · Zero-shot pipelines
</p>

---

## 📜 Why ReasoningTrap?

> Current RL-tuned Reasoning LLMs excel at *producing* answers but often ignore explicit user constraints.  
> **ReasoningTrap** surfaces these failure modes with carefully crafted, *conditioned* problems.

* **Modified from Famous Math Reasoning Benchmarks** – AIME & MATH500 problems altered with minimal constraints to divert reasoning paths.
* **Puzzles Trivialized by Subtle Modifications** – Well-known puzzles where a small change turns a challenging problem into a trivial one.
* **Plug-and-play** – Evaluate any 🤗 Transformers model with vLLM in a few simple steps.
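To make the idea of a *conditioned* problem concrete, here is a minimal sketch of how one might score a single response. It assumes, hypothetically, that each item stores both the answer to the unmodified benchmark problem and the answer once the added condition is respected, and that models put their final answer in `\boxed{...}`. The names `ConditionedProblem`, `extract_boxed`, and `classify`, as well as the toy problem, are illustrative only and are not part of the ReasoningTrap pipeline. A response that matches the *original* answer is labeled `trapped`, since the model appears to have solved the familiar problem while ignoring the constraint.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConditionedProblem:
    """Hypothetical record: a familiar problem plus a minimal added condition."""
    original_question: str
    condition: str           # small modification that changes the answer
    original_answer: str     # answer to the unmodified problem
    conditioned_answer: str  # answer once the condition is respected

    @property
    def prompt(self) -> str:
        # The model sees the original question with the condition appended.
        return f"{self.original_question} {self.condition}"


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} (no nested braces), or None."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None


def classify(problem: ConditionedProblem, response: str) -> str:
    """Label a response as correct, trapped, wrong, or no_answer."""
    answer = extract_boxed(response)
    if answer is None:
        return "no_answer"
    if answer == problem.conditioned_answer:
        return "correct"
    if answer == problem.original_answer:
        # Matches the unmodified problem: the condition was ignored.
        return "trapped"
    return "wrong"


# Toy example (made up for illustration, not taken from the benchmark).
toy = ConditionedProblem(
    original_question="What is the sum of the integers from 1 to 10?",
    condition="Only add the even integers.",
    original_answer="55",
    conditioned_answer="30",
)

print(classify(toy, r"Using n(n+1)/2, the sum is \boxed{55}."))  # trapped
print(classify(toy, r"2+4+6+8+10 = \boxed{30}"))                  # correct
```

Separating `trapped` from generic `wrong` answers is what makes this kind of scoring fine-grained: it distinguishes a model that cannot solve the problem from one that solves a different, more familiar problem than the one it was asked.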