Multimodal/Vision LLMs - a youali Collection

Multimodal/Vision LLMs

updated Dec 11, 2023

GLaMM: Pixel Grounding Large Multimodal Model

Paper • 2311.03356 • Published Nov 6, 2023 • 36

CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding

Paper • 2311.03354 • Published Nov 6, 2023 • 8

CogVLM: Visual Expert for Pretrained Language Models

Paper • 2311.03079 • Published Nov 6, 2023 • 27

UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework

Paper • 2311.10125 • Published Nov 16, 2023 • 6

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Paper • 2311.10122 • Published Nov 16, 2023 • 27

Localized Symbolic Knowledge Distillation for Visual Commonsense Models

Paper • 2312.04837 • Published Dec 8, 2023 • 3