Snowflake/dare-bench
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections
When Agents Disagree With Themselves: Measuring Behavioral Consistency in LLM-Based Agents