arxiv:2507.16713

Experience is the Best Teacher: Grounding VLMs for Robotics through Self-Generated Memory

Published on Jul 22 · Submitted by hba123 on Jul 23
Abstract

Vision-language models (VLMs) have been widely adopted in robotics to enable autonomous planning. However, grounding VLMs, originally trained on internet data, to diverse real-world robots remains a challenge. This paper presents ExpTeach, a framework that grounds VLMs to physical robots by building a self-generated memory of real-world experiences. In ExpTeach, the VLM autonomously plans actions, verifies outcomes, reflects on failures, and adapts robot behaviors in a closed loop. The self-generated experiences during this process are then summarized into a long-term memory, enabling retrieval of learned knowledge to guide future tasks via retrieval-augmented generation (RAG). Additionally, ExpTeach enhances the spatial understanding of VLMs with an on-demand image annotation module. In experiments, we show that reflection improves success rates from 36% to 84% on four challenging robotic tasks and observe the emergence of intelligent object interactions, including creative tool use. Across extensive tests on 12 real-world scenarios (including eight unseen ones), we find that grounding with long-term memory boosts single-trial success rates from 22% to 80%, demonstrating the effectiveness and generalizability of ExpTeach.
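
Below is a minimal sketch of the closed loop the abstract describes: plan, execute, verify, reflect on failure, then summarize the episode into long-term memory for retrieval on future tasks. The `vlm` and `robot` interfaces, the `Experience` record, and the keyword-overlap `retrieve` are hypothetical stand-ins for illustration, not the paper's implementation (ExpTeach uses a VLM with retrieval-augmented generation).

```python
# Illustrative sketch of a plan-verify-reflect loop with self-generated memory.
# All names here are assumptions, not ExpTeach's actual API.
from dataclasses import dataclass, field


@dataclass
class Experience:
    task: str
    plan: str
    outcome: str
    reflection: str = ""


@dataclass
class LongTermMemory:
    experiences: list[Experience] = field(default_factory=list)

    def add(self, exp: Experience) -> None:
        self.experiences.append(exp)

    def retrieve(self, task: str, k: int = 3) -> list[Experience]:
        # Placeholder retrieval: naive keyword overlap stands in for the
        # embedding-based RAG retrieval a real system would use.
        scored = sorted(
            self.experiences,
            key=lambda e: len(set(task.split()) & set(e.task.split())),
            reverse=True,
        )
        return scored[:k]


def run_task(task, vlm, robot, memory, max_attempts=3):
    """One episode: retrieve past experience, then plan-execute-verify-reflect."""
    context = memory.retrieve(task)           # ground planning in prior episodes
    reflection = ""
    for _ in range(max_attempts):
        plan = vlm.plan(task, context=context, reflection=reflection)
        observation = robot.execute(plan)
        if vlm.verify(task, observation):      # VLM checks the outcome
            memory.add(Experience(task, plan, "success", reflection))
            return True
        reflection = vlm.reflect(task, plan, observation)  # analyze the failure
    memory.add(Experience(task, plan, "failure", reflection))
    return False
```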

Community

Paper author and submitter:

Everyone wants VLMs in robotics, but VLMs are not trained or grounded in robotics! We fix that with ExpTeach: Experience is the Best Teacher, where we ground VLMs using self-generated memory.

It's remarkable to see how it can recover from failures by leveraging past experiences stored in memory.

