---
title: README
emoji: π
colorFrom: green
colorTo: purple
sdk: static
pinned: false
---
<!-- Banner -------------------------------------------------------------- -->
<p align="center">
<b>Fine-grained evaluation of Large Reasoning Models that <i>fail in reasoning</i> due to <i>reasoning rigidity</i>.</b><br/>
ConditionedMath (AIME &amp; MATH500) · PuzzleTrivial · Zero-shot pipelines
</p>
---
## Why ReasoningTrap?
> Current RL-tuned Reasoning LLMs excel at *producing* answers but often ignore explicit user constraints.
> **ReasoningTrap** surfaces these failure modes with carefully crafted, *conditioned* problems.
* **Modified from famous math reasoning benchmarks** - AIME & MATH500 problems altered with minimal added constraints that divert the usual reasoning path.
* **Puzzles Trivialized by Subtle Modifications** - Well-known puzzles where a small change transforms a challenging problem into a trivial one.
* **Plug-and-play** - evaluate any 🤗 Transformers model with vLLM in a few simple steps.
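
To make the idea of a *conditioned* problem concrete, here is a minimal scoring sketch. It is not the project's actual pipeline: the `ConditionedProblem` fields, the `\boxed{...}` answer convention, and the three labels are all assumptions chosen for illustration. It separates a model that follows the added constraint from one that falls back to the original benchmark answer.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass
class ConditionedProblem:
    """Hypothetical record layout; the real dataset schema may differ."""
    prompt: str              # benchmark question with the added constraint
    conditioned_answer: str  # correct answer once the constraint is respected
    original_answer: str     # answer to the unmodified benchmark question


# Assumption: the model is asked to put its final answer in \boxed{...}.
_BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_answer(completion: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in the text, or None."""
    matches = _BOXED.findall(completion)
    return matches[-1].strip() if matches else None


def classify(problem: ConditionedProblem, completion: str) -> str:
    """Label one completion as 'followed', 'rigid', or 'other'."""
    answer = extract_answer(completion)
    if answer == problem.conditioned_answer:
        return "followed"  # respected the user's constraint
    if answer == problem.original_answer:
        return "rigid"     # reverted to the familiar reasoning path
    return "other"         # missing or unrelated answer


def evaluate(
    problems: Iterable[ConditionedProblem],
    generate: Callable[[str], str],
) -> Dict[str, int]:
    """Count labels over a problem set for any prompt -> text function."""
    counts = {"followed": 0, "rigid": 0, "other": 0}
    for problem in problems:
        counts[classify(problem, generate(problem.prompt))] += 1
    return counts
```

Because `generate` is any function from prompt to text, a vLLM-backed model can be plugged in by wrapping `LLM.generate` and returning the first output's `.text`. A stub function that returns a fixed string is enough to check the scoring logic without a GPU.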