Papers
arxiv:2507.02726

Bourbaki: Self-Generated and Goal-Conditioned MDPs for Theorem Proving

Published on Jul 3
· Submitted by hba123 on Jul 4
Authors:
,
,
,
,
,

Abstract

A new framework using self-generated goal-conditioned MDPs with MCTS-like algorithms enhances LLM performance in automated theorem proving, particularly on benchmarks like PutnamBench.

AI-generated summary

Reasoning remains a challenging task for large language models (LLMs), especially within the logically constrained environment of automated theorem proving (ATP), due to sparse rewards and the vast scale of proofs. These challenges are amplified in benchmarks like PutnamBench, which contains university-level problems requiring complex, multi-step reasoning. To address this, we introduce self-generated goal-conditioned MDPs (sG-MDPs), a new framework in which agents generate and pursue their subgoals based on the evolving proof state. Given this more structured generation of goals, the resulting problem becomes more amenable to search. We then apply Monte Carlo Tree Search (MCTS)-like algorithms to solve the sG-MDP, instantiating our approach in Bourbaki (7B), a modular system that can ensemble multiple 7B LLMs for subgoal generation and tactic synthesis. On PutnamBench, Bourbaki (7B) solves 26 problems, achieving new state-of-the-art results with models at this scale.

Community

Paper submitter

We present a new algorithm for automatic theorem proving, showing new state-of-the-art results on PutnamBench. We call our method Bourbaki (7B) inspired by the pseudonymous group of French mathematicians known for their rigorous and systematic approach to mathematics; check them out (https://en.wikipedia.org/wiki/Nicolas_Bourbaki) :=D

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2507.02726 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2507.02726 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2507.02726 in a Space README.md to link it from this page.

Collections including this paper 1