Vision Language Model - a kiennt Collection

kiennt 's Collections

Vision Language Model

Vision Language Model

updated Nov 12, 2023

LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

Paper • 2311.00571 • Published Nov 1, 2023 • 43
Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?

Paper • 2311.00047 • Published Oct 31, 2023 • 10
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models

Paper • 2310.08825 • Published Oct 13, 2023 • 1
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

Paper • 2311.05332 • Published Nov 9, 2023 • 13
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51