In our latest paper, Bourbaki (7b), we show how to build a state-of-the-art 7B theorem prover on PutnamBench by applying MCTS to what we call self-generated, goal-conditioned MDPs. I've started a series of blog posts on this!
Why a series of blog posts 😝? I want everyone to understand what Bourbaki (7b) is and what it does. I don't want to just hand you a ChatGPT summary with some result hype. There are many things left to improve, and I'm hoping that with more exposure, beyond the experiments and code, some people will be interested in helping us improve it!
In this first post, we cover the basics: 1) MCTS and why it should be applied to LLMs, so the whole world isn't just fine-tuning a 100000000000000000000000 B model on 10 data points (not that I haven't done that before 🤪🤪), 2) the basics of MDPs, and 3) the vanilla MCTS algorithm. A minimal sketch of that algorithm follows below.
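To give a taste of point 3, here's a minimal Python sketch of the vanilla MCTS loop (selection via UCB1, expansion, random rollout, backpropagation). The toy counting MDP and all the names in it are illustrative only, not from the paper or the blog:

```python
import math
import random

# Toy deterministic MDP, used only to make the sketch runnable:
# states are integers, actions add +1 or +2, the episode ends once
# we reach 10 or beyond, and we get reward 1.0 only by landing exactly on 10.
ACTIONS = [1, 2]
GOAL = 10

def step(state, action):
    return state + action

def is_terminal(state):
    return state >= GOAL

def reward(state):
    return 1.0 if state == GOAL else 0.0

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # action -> Node
        self.visits = 0
        self.value = 0.0     # sum of rollout returns seen through this node

    def ucb1(self, c=1.4):
        # UCB1: exploit the mean return, explore rarely visited children.
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits
        )

def select(node):
    # Descend the tree, always taking the child with the best UCB1 score,
    # until we hit a terminal node or one that still has untried actions.
    while not is_terminal(node.state) and len(node.children) == len(ACTIONS):
        node = max(node.children.values(), key=lambda n: n.ucb1())
    return node

def expand(node):
    # Try one previously untried action and add the resulting child.
    untried = [a for a in ACTIONS if a not in node.children]
    action = random.choice(untried)
    child = Node(step(node.state, action), parent=node)
    node.children[action] = child
    return child

def rollout(state):
    # Simulate to the end of the episode with a uniformly random policy.
    while not is_terminal(state):
        state = step(state, random.choice(ACTIONS))
    return reward(state)

def backpropagate(node, ret):
    # Propagate the rollout return from the leaf back up to the root.
    while node is not None:
        node.visits += 1
        node.value += ret
        node = node.parent

def mcts(root_state, iterations=1000):
    root = Node(root_state)
    for _ in range(iterations):
        leaf = select(root)
        if not is_terminal(leaf.state):
            leaf = expand(leaf)
        backpropagate(leaf, rollout(leaf.state))
    # Act greedily with respect to visit counts at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print(mcts(0))  # best first action from state 0
```

The four functions are exactly the four phases every MCTS variant shares; what Bourbaki (7b) changes is where the MDP comes from (self-generated, goal-conditioned) and what plays the role of the rollout policy, which the blog walks through.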
Check it out: https://huggingface.co/blog/hba123/bourbaki7b
If you find it useful, consider upvoting and sharing this post and the Hugging Face blog! Thank you 🥰🥰