xGen-MM (BLIP-3): A Family of Open Large Multimodal Models • arXiv:2408.08872 • Published Aug 16, 2024
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models • arXiv:2209.07511 • Published Sep 15, 2022
What do Vision Transformers Learn? A Visual Exploration • arXiv:2212.06727 • Published Dec 13, 2022
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Interleaved Tokens • arXiv:2406.11271 • Published Jun 17, 2024
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks • arXiv:2310.19909 • Published Oct 30, 2023
On the Reliability of Watermarks for Large Language Models • arXiv:2306.04634 • Published Jun 7, 2023
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models • arXiv:2306.13651 • Published Jun 23, 2023