Unmasked Teacher: Towards Training-Efficient Video Foundation Models Paper • 2303.16058 • Published Mar 28, 2023
Harvest Video Foundation Models via Efficient Post-Pretraining Paper • 2310.19554 • Published Oct 30, 2023
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark Paper • 2311.17005 • Published Nov 28, 2023
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation Paper • 2307.06942 • Published Jul 13, 2023
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer Paper • 2211.09552 • Published Nov 17, 2022
InternVideo: General Video Foundation Models via Generative and Discriminative Learning Paper • 2212.03191 • Published Dec 6, 2022
DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models Paper • 2412.04446 • Published Dec 5, 2024
AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation Paper • 2506.03126 • Published Jun 3, 2025
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation Paper • 2412.04432 • Published Dec 5, 2024
Moto: Latent Motion Token as the Bridging Language for Robot Manipulation Paper • 2412.04445 • Published Dec 5, 2024