CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images
This repository contains the CodePlot-CoT model, a core component of the paper CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images. CodePlot-CoT is a code-driven Chain-of-Thought (CoT) paradigm that enables Vision Language Models (VLMs) to "think with images" when solving mathematical problems. Instead of generating pixel-based images directly, the model outputs executable plotting code to represent its "visual thoughts". This code is executed to render a precise figure, which is then fed back to the model as visual input for subsequent reasoning steps.
The model is built upon the Qwen2.5-VL architecture and is compatible with the transformers library.
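
The render-and-feed-back loop described above can be sketched roughly as follows. This is a minimal illustration rather than the repository's actual inference code: `render_plot_code`, `reason_with_plots`, the `generate` callable, and the `figure.png` file name are all hypothetical, and the sketch assumes the generated plotting code saves its figure to a known file name in its working directory. In practice `generate` would wrap a call to the model through transformers.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, List, Optional, Tuple


def render_plot_code(code: str, image_name: str = "figure.png",
                     timeout: float = 30.0) -> Optional[bytes]:
    """Run plotting code in a separate process and return the saved image bytes.

    Assumption: the code writes its figure to `image_name` in the current
    working directory. Returns None if the code fails, times out, or
    produces no image.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "plot.py"
        script.write_text(code, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                cwd=workdir,
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return None
        image = Path(workdir) / image_name
        if result.returncode != 0 or not image.exists():
            return None
        return image.read_bytes()


# Hypothetical model interface: takes the question and the images rendered so
# far, returns the generated text and, optionally, plotting code to execute.
GenerateFn = Callable[[str, List[bytes]], Tuple[str, Optional[str]]]


def reason_with_plots(generate: GenerateFn, question: str,
                      max_steps: int = 4) -> str:
    """Alternate between generation and rendering until no more code is emitted."""
    images: List[bytes] = []
    text = ""
    for _ in range(max_steps):
        text, code = generate(question, images)
        if code is None:
            break  # the model produced text only, so treat it as the final step
        image = render_plot_code(code)
        if image is None:
            break  # rendering failed, so stop with the latest text
        images.append(image)  # the rendered figure becomes input for the next step
    return text
```

Running the generated code in a subprocess with a timeout keeps a faulty or long-running plot script from taking down the reasoning loop. It is not a security sandbox, so untrusted code would still need proper isolation.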

For more details, please refer to the project homepage and the GitHub repository.
Citation
If you find this work helpful, please consider citing our paper:
@article{duan2025code,
title={CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images},
author={Duan, Chengqi and Fang, Rongyao and Wang, Yuqing and Wang, Kun and Huang, Linjiang and Zeng, Xingyu and Li, Hongsheng and Liu, Xihui},
journal={arXiv preprint arXiv:2510.11718},
year={2025}
}