youali's Collections
LLMs
Multimodal/Vision LLMs
Standalone Neural Modules
Transformers
Diffusion Models
Graphics/3D
RL
Computer Vision
Efficient ML

Multimodal/Vision LLMs

updated Dec 11, 2023

  • GLaMM: Pixel Grounding Large Multimodal Model

    Paper • 2311.03356 • Published Nov 6, 2023 • 36

  • CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding

    Paper • 2311.03354 • Published Nov 6, 2023 • 8

  • CogVLM: Visual Expert for Pretrained Language Models

    Paper • 2311.03079 • Published Nov 6, 2023 • 27

  • UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework

    Paper • 2311.10125 • Published Nov 16, 2023 • 6

  • Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

    Paper • 2311.10122 • Published Nov 16, 2023 • 27

  • Localized Symbolic Knowledge Distillation for Visual Commonsense Models

    Paper • 2312.04837 • Published Dec 8, 2023 • 3