MIBURI: Towards Expressive Interactive Gesture Synthesis Paper • 2603.03282 • Published 17 days ago • 4
BBQ-to-Image: Numeric Bounding Box and Qolor Control in Large-Scale Text-to-Image Models Paper • 2602.20672 • Published 24 days ago • 9
GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant Paper • 2603.01059 • Published 19 days ago • 1
Proact-VL: A Proactive VideoLLM for Real-Time AI Companions Paper • 2603.03447 • Published 17 days ago • 36
ID-LoRA: Identity-Driven Audio-Video Personalization with In-Context LoRA Paper • 2603.10256 • Published 10 days ago • 19
Accent Vector: Controllable Accent Manipulation for Multilingual TTS Without Accented Data Paper • 2603.07534 • Published 12 days ago • 5
EvoTok: A Unified Image Tokenizer via Residual Latent Evolution for Visual Understanding and Generation Paper • 2603.12108 • Published 8 days ago • 8
Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation Paper • 2603.12793 • Published 7 days ago • 36
Rethinking UMM Visual Generation: Masked Modeling for Efficient Image-Only Pre-training Paper • 2603.16139 • Published 4 days ago • 29