Code2Video: A Code-centric Paradigm for Educational Video Generation
Abstract
Code2Video generates educational videos using a code-centric agent framework, improving coherence and interpretability compared to direct code generation.
While recent generative models advance pixel-space video synthesis, they remain limited in producing professional educational videos, which demand disciplinary knowledge, precise visual structures, and coherent transitions. Intuitively, such requirements are better addressed through the manipulation of a renderable environment, which can be explicitly controlled via logical commands (e.g., code). In this work, we propose Code2Video, a code-centric agent framework for generating educational videos via executable Python code. The framework comprises three collaborative agents: (i) Planner, which structures lecture content into temporally coherent flows and prepares the corresponding visual assets; (ii) Coder, which converts structured instructions into executable Python code while incorporating scope-guided auto-fix to enhance efficiency; and (iii) Critic, which leverages vision-language models (VLMs) with visual anchor prompts to refine spatial layout and ensure clarity. To support systematic evaluation, we build MMMC, a benchmark of professionally produced, discipline-specific educational videos. We evaluate on MMMC across diverse dimensions, including VLM-as-a-Judge aesthetic scores, code efficiency, and particularly TeachQuiz, a novel end-to-end metric that quantifies how well a VLM, after unlearning, can recover knowledge by watching the generated videos. Our results demonstrate the potential of Code2Video as a scalable, interpretable, and controllable approach, achieving a 40% improvement over direct code generation and producing videos comparable to human-crafted tutorials. The code and datasets are available at https://github.com/showlab/Code2Video.
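To make the pipeline concrete, the sketch below mirrors the Planner → Coder → Critic loop described in the abstract. It is a minimal illustration only: the function names (`call_llm`, `render_video`), prompts, and retry logic are hypothetical placeholders rather than the authors' implementation; the real system (see the GitHub repository) includes scope-guided auto-fix and visual anchor prompting details omitted here.

```python
# Hypothetical sketch of the three-agent Code2Video loop.
# call_llm, render_video, and all prompts are placeholders, not the official API.
from dataclasses import dataclass, field


@dataclass
class LecturePlan:
    topic: str
    sections: list[str]                 # temporally ordered lecture flow
    assets: list[str] = field(default_factory=list)  # visual assets prepared by the Planner


def call_llm(prompt: str) -> str:
    """Placeholder for a call to a language / vision-language model."""
    raise NotImplementedError("plug in your model client here")


def render_video(code: str) -> tuple[bool, str]:
    """Placeholder: execute the generated Python code and report success or an error message."""
    raise NotImplementedError("plug in a renderer / executor here")


def planner(topic: str) -> LecturePlan:
    """Planner: structure lecture content into a coherent flow."""
    outline = call_llm(f"Outline a temporally coherent lecture flow for: {topic}")
    sections = [line.strip() for line in outline.splitlines() if line.strip()]
    return LecturePlan(topic=topic, sections=sections)


def coder(plan: LecturePlan, max_fix_rounds: int = 3) -> str:
    """Coder: turn structured instructions into executable Python code,
    retrying on failures (a stand-in for the paper's scope-guided auto-fix)."""
    code = call_llm(f"Write executable Python animation code for sections: {plan.sections}")
    for _ in range(max_fix_rounds):
        ok, error = render_video(code)
        if ok:
            break
        code = call_llm(f"Fix only the failing scope.\nError: {error}\nCode:\n{code}")
    return code


def critic(code: str) -> str:
    """Critic: VLM-style feedback on spatial layout and clarity, then a revision pass."""
    feedback = call_llm(
        "Review the rendered frames for layout and clarity; "
        "suggest edits keyed to visual anchor positions."
    )
    return call_llm(f"Revise the code according to this feedback:\n{feedback}\nCode:\n{code}")


def code2video(topic: str) -> str:
    plan = planner(topic)
    code = coder(plan)
    return critic(code)
```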
Community
TL;DR: Video Generation via Code.
arXiv: https://arxiv.org/abs/2510.01174
Website: https://showlab.github.io/Code2Video/
GitHub: https://github.com/showlab/code2video
HF datasets: https://huggingface.co/datasets/YanzheChen/MMMC
Related papers recommended by the Semantic Scholar API:
- VideoAgent: Personalized Synthesis of Scientific Videos (2025)
- PixelCraft: A Multi-Agent System for High-Fidelity Visual Reasoning on Structured Images (2025)
- Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis (2025)
- LEARN: A Story-Driven Layout-to-Image Generation Framework for STEM Instruction (2025)
- Adaptive Fast-and-Slow Visual Program Reasoning for Long-Form VideoQA (2025)
- Preacher: Paper-to-Video Agentic System (2025)
- Interleaving Reasoning for Better Text-to-Image Generation (2025)