Generate video from audio and image
Audio-Driven Multi-Person Conversational Video Generation
4D Motion Tokenization for Open-World Human Image Animation