COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning • Paper • arXiv 2504.21850 • Published Apr 30, 2025
Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images • Paper • arXiv 2504.09621 • Published Apr 13, 2025
Attention IoU: Examining Biases in CelebA using Attention Maps • Paper • arXiv 2503.19846 • Published Mar 25, 2025
Unifying Specialized Visual Encoders for Video Language Models • Paper • arXiv 2501.01426 • Published Jan 2, 2025
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark • Paper • arXiv 2410.03051 • Published Oct 4, 2024