arxiv:2506.18851

Phantom-Data: Towards a General Subject-Consistent Video Generation Dataset

Published on Jun 23 · Submitted by ZhuoweiChen on Jun 24


Authors: Zhuowei Chen, Bingchuan Li, Tianxiang Ma, Mingcong Liu, Xinghui Li, Xinglong Wu

AI-generated summary: A cross-pair dataset called Phantom-Data improves subject-to-video generation by enhancing prompt alignment and visual quality while maintaining identity consistency.

Abstract

Subject-to-video generation has witnessed substantial progress in recent years. However, existing models still face significant challenges in faithfully following textual instructions. This limitation, commonly known as the copy-paste problem, arises from the widely used in-pair training paradigm. This approach inherently entangles subject identity with background and contextual attributes by sampling reference images from the same scene as the target video. To address this issue, we introduce Phantom-Data, the first general-purpose cross-pair subject-to-video consistency dataset, containing approximately one million identity-consistent pairs across diverse categories. Our dataset is constructed via a three-stage pipeline: (1) a general and input-aligned subject detection module, (2) large-scale cross-context subject retrieval from more than 53 million videos and 3 billion images, and (3) prior-guided identity verification to ensure visual consistency under contextual variation. Comprehensive experiments show that training with Phantom-Data significantly improves prompt alignment and visual quality while preserving identity consistency on par with in-pair baselines.
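The abstract contrasts in-pair sampling (reference taken from the same scene as the target) with cross-pair construction (reference retrieved from a different context, then checked for identity). The sketch below illustrates only that contrast, under stated assumptions: every detected subject has already been mapped to an identity embedding by some encoder, retrieval is brute-force cosine similarity, and verification is a fixed similarity threshold. `SubjectCrop`, `in_pair_references`, `cross_pair_references`, `top_k`, and `threshold` are hypothetical names; the paper's actual detection module, large-scale retrieval, and prior-guided verification are not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SubjectCrop:
    """A detected subject instance (hypothetical structure, not from the paper)."""
    source_id: str                 # video or image the crop was taken from
    embedding: tuple[float, ...]   # identity embedding from an assumed encoder


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def in_pair_references(target: SubjectCrop, pool: list[SubjectCrop]) -> list[SubjectCrop]:
    """In-pair: references come from the SAME source as the target,
    so background and context are shared (the copy-paste risk)."""
    return [c for c in pool if c.source_id == target.source_id and c is not target]


def cross_pair_references(
    target: SubjectCrop,
    pool: list[SubjectCrop],
    top_k: int = 5,
    threshold: float = 0.8,
) -> list[SubjectCrop]:
    """Cross-pair: retrieve candidates from OTHER sources, then keep only
    those whose identity similarity passes a fixed threshold (assumed
    stand-in for the paper's identity verification stage)."""
    candidates = [c for c in pool if c.source_id != target.source_id]
    # Sort by score only, so SubjectCrop objects are never compared.
    scored = sorted(
        ((cosine(target.embedding, c.embedding), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return [c for score, c in scored[:top_k] if score >= threshold]
```

For example, with a target crop from `video_A`, a second crop from `video_A`, a near-identical subject in `image_B`, and an unrelated subject in `image_C`, the in-pair function returns only the other `video_A` crop, while the cross-pair function returns only the `image_B` crop. Excluding the target's own source is what forces the reference to differ in background and context.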


Community

ZhuoweiChen (paper author, paper submitter), Jun 24

https://phantom-video.github.io/Phantom-Data/
