---
title: README
emoji: 👁
colorFrom: green
colorTo: purple
sdk: static
pinned: false
---

Fine-grained evaluation of Large Reasoning Models that fail because of reasoning rigidity.
ConditionedMath (AIME & MATH500) · PuzzleTrivial · Zero-shot pipelines

---

## 📜 Why ReasoningTrap?

> Current RL-tuned reasoning LLMs excel at *producing* answers but often ignore explicit user constraints.
> **ReasoningTrap** surfaces these failure modes with carefully crafted, *conditioned* problems.

* **Modified from Famous Math Reasoning Benchmarks** – AIME & MATH500 problems altered with minimal constraints that divert the usual reasoning path.
* **Puzzles Trivialized by Subtle Modifications** – well-known puzzles where a small change turns a challenging problem into a trivial one.
* **Plug-and-play** – evaluate any 🤗 Transformers model with vLLM using a few simple instructions.
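As a rough illustration of the plug-and-play idea, a zero-shot evaluation loop could be sketched as follows. This is a minimal sketch under stated assumptions, not the repository's actual pipeline: it assumes the model is asked to put its final answer in `\boxed{}`, that reference answers are exact strings, and that prompts are already formatted with the model's chat template. `extract_boxed`, `accuracy`, and `generate_with_vllm` are hypothetical helpers. Only `vllm.LLM`, `vllm.SamplingParams`, and `LLM.generate` come from vLLM itself.

```python
from typing import List, Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in `text`, handling nested braces.

    Hypothetical helper: assumes the model reports its final answer in \\boxed{}.
    """
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Closing brace of the \boxed{...} itself: answer is complete.
                return "".join(chars).strip()
        chars.append(ch)
        i += 1
    return None  # unbalanced braces, no usable answer


def accuracy(completions: List[str], references: List[str]) -> float:
    """Fraction of completions whose boxed answer exactly matches the reference."""
    if not references:
        return 0.0
    correct = sum(extract_boxed(c) == r for c, r in zip(completions, references))
    return correct / len(references)


def generate_with_vllm(model_name: str, prompts: List[str], max_tokens: int = 4096) -> List[str]:
    """Greedy zero-shot generation with vLLM (hypothetical wrapper).

    Prompts are assumed to be already formatted with the model's chat template.
    vLLM is imported lazily so the scoring helpers above work without it installed.
    """
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_name)
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate(prompts, params)
    # Each RequestOutput holds one or more completions; take the first.
    return [out.outputs[0].text for out in outputs]
```

Greedy decoding (`temperature=0.0`) keeps runs reproducible. Exact string matching is deliberately strict; a real pipeline would likely normalize answers (for example `1/2` versus `\frac{1}{2}`) before comparing.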