---
title: README
emoji: πŸ‘
colorFrom: green
colorTo: purple
sdk: static
pinned: false
---

<!-- Banner -------------------------------------------------------------- -->
<p align="center">
  <b>Fine-grained evaluation of Large Reasoning Models that <i>fail in reasoning</i> due to <i>reasoning rigidity</i>.</b><br/>
  ConditionedMath (AIME &amp; MATH500) · PuzzleTrivial · Zero-shot pipelines
</p>

---

## 📜 Why ReasoningTrap?

> Current RL-tuned Reasoning LLMs excel at *producing* answers but often ignore explicit user constraints.  
> **ReasoningTrap** surfaces these failure modes with carefully crafted, *conditioned* problems.
* **Modified from Famous Math Reasoning Benchmarks** – AIME & MATH500 problems altered with minimal added constraints that divert the expected reasoning path.
* **Puzzles Trivialized by Subtle Modifications** – Well-known puzzles where a small change turns a challenging problem into a trivial one.
* **Plug-and-play** – evaluate any 🤗 Transformers model with vLLM by following a few simple instructions.
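
As a rough illustration of what a *conditioned* problem might look like and how reasoning rigidity could be detected, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `ConditionedProblem` fields, the `classify` and `evaluate` helpers, and the toy question are hypothetical and are not the actual ReasoningTrap schema or scoring code. `generate` stands for any callable that maps a prompt to a final answer string, for example a thin wrapper around a vLLM-served model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ConditionedProblem:
    """Hypothetical record; field names are illustrative, not the dataset schema."""
    original_question: str
    original_answer: str
    conditioned_question: str  # original question plus a small added constraint
    conditioned_answer: str    # correct answer once the constraint is respected


def classify(problem: ConditionedProblem, model_answer: str) -> str:
    """Label a final answer by whether it respects the added condition."""
    answer = model_answer.strip()
    if answer == problem.conditioned_answer:
        return "follows_condition"
    if answer == problem.original_answer:
        # The model solved the familiar, unmodified problem instead.
        return "reverts_to_original"
    return "other"


def evaluate(
    problems: List[ConditionedProblem],
    generate: Callable[[str], str],
) -> Dict[str, int]:
    """Count outcome labels over a list of conditioned problems."""
    counts = {"follows_condition": 0, "reverts_to_original": 0, "other": 0}
    for problem in problems:
        counts[classify(problem, generate(problem.conditioned_question))] += 1
    return counts


# Toy example (invented for this sketch, not taken from the benchmark).
toy = ConditionedProblem(
    original_question="What is the sum of the first 10 positive integers?",
    original_answer="55",
    conditioned_question=(
        "What is the sum of the first 10 positive integers, "
        "if every integer is replaced by 0 before summing?"
    ),
    conditioned_answer="0",
)


def rigid_model(prompt: str) -> str:
    """Stub that ignores the added condition and answers the original problem."""
    return "55"


result = evaluate([toy], rigid_model)
```

With the stub above, the single toy problem is counted under `reverts_to_original`, which is the kind of failure the conditioned problems are meant to surface. Exact string matching is used only to keep the sketch short; a real pipeline would likely need answer extraction and normalization.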