metadata
license: apache-2.0
language:
- en
base_model:
- stable-diffusion-v1-5/stable-diffusion-v1-5
- liuhaotian/llava-llama-2-13b-chat-lightning-preview
tags:
- Image-to-Image
- Action-Generation
- HOI
- Egocentric-Vision
- Vision-Language-Model
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
ECCV 2024 (Oral, Best Paper Finalist)
Project Page | Paper | Dataset | Code
Bolin Lai, Xiaoliang Dai, Lawrence Chen, Guan Pang, James M. Rehg, Miao Liu

This repo is the model weights finetuned on Ego4D for our paper "LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning".
Please refer to the code on github for detailed instructions on how to use it. More repos are available in this collection.
If you find LEGO useful for your work, please cite using this BibTeX.
@inproceedings{lai2024lego,
title={Lego: Learning egocentric action frame generation via visual instruction tuning},
author={Lai, Bolin and Dai, Xiaoliang and Chen, Lawrence and Pang, Guan and Rehg, James M and Liu, Miao},
booktitle={European Conference on Computer Vision},
pages={135--155},
year={2024},
organization={Springer}
}