VisR-Bench: An Empirical Study on Visual Retrieval-Augmented Generation for Multilingual Long Document Understanding Paper • 2508.07493 • Published Aug 10, 2025 • 8
Towards Visual Text Grounding of Multimodal Large Language Model Paper • 2504.04974 • Published Apr 7, 2025 • 16
MusiXQA: Advancing Visual Music Understanding in Multimodal Large Language Models Paper • 2506.23009 • Published Jun 28, 2025 • 10
Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints Paper • 2402.04754 • Published Feb 7, 2024 • 1
LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models Paper • 2407.19185 • Published Jul 27, 2024 • 2
MMR: Evaluating Reading Ability of Large Multimodal Models Paper • 2408.14594 • Published Aug 26, 2024 • 1
TextLap: Customizing Language Models for Text-to-Layout Planning Paper • 2410.12844 • Published Oct 9, 2024 • 1
LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding Paper • 2411.01106 • Published Nov 2, 2024 • 4