arxiv:2507.09071

BlindSight: Harnessing Sparsity for Efficient VLMs

Published on Jul 11
Authors:

Abstract

BlindSight optimizes VLM inference by applying an attention sparsity mask derived from dataset samples, reducing FLOPs with minimal impact on accuracy.

AI-generated summary

Large vision-language models (VLMs) enable the joint processing of text and images. However, the inclusion of vision data significantly expands the prompt length, and combined with the quadratic complexity of attention computation, this results in a longer prefill duration. One approach to mitigating this bottleneck is to leverage the inherent sparsity in the attention computation. In our analysis of attention patterns in VLMs, we observe that a substantial portion of layers exhibit minimal cross-image attention, except through attention-sink tokens per image. These sparse attention patterns fall into distinct categories: sink-only, document-mask, and hybrid document-sink masks. Based on this, we propose BlindSight: a training-free approach to optimizing VLM inference using an input template-aware attention sparsity mask. We use samples from a dataset to derive a prompt-agnostic sparsity categorization for every attention head. We evaluate the proposed technique on VLMs such as Qwen2-VL, Qwen2.5-VL, and Gemma-3. BlindSight yields a 32%-41% reduction in FLOPs on average, with accuracy within -2% to +2% of the original model on most evaluated multi-image understanding benchmarks.
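The three mask categories named in the summary can be made concrete with a small sketch. The following is a minimal illustration (not the authors' code) of one plausible construction of these masks, assuming each image occupies a contiguous token span whose first token acts as its attention sink; the helper name make_mask and the image_spans argument are hypothetical.

import numpy as np

def make_mask(seq_len, image_spans, kind):
    """Return a boolean (query, key) mask; True = attention allowed.

    image_spans : list of (start, end) token ranges, one per image (assumption:
                  contiguous spans whose first token is the attention sink)
    kind        : 'sink-only', 'document', or 'hybrid'
    """
    causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))

    in_image = np.zeros(seq_len, dtype=bool)  # key belongs to some image
    sink = np.zeros(seq_len, dtype=bool)      # key is an image's sink token
    for start, end in image_spans:
        in_image[start:end] = True
        sink[start] = True

    mask = causal.copy()
    if kind == 'sink-only':
        # Image keys are reachable only through their sink token.
        mask[:, in_image & ~sink] = False
    elif kind in ('document', 'hybrid'):
        # Document mask: block cross-image attention, so queries inside one
        # image cannot see keys inside a different image.
        for qs, qe in image_spans:
            for ks, ke in image_spans:
                if (qs, qe) != (ks, ke):
                    mask[qs:qe, ks:ke] = False
        if kind == 'hybrid':
            # Hybrid document-sink: re-open every image's sink token globally.
            mask[:, sink] = causal[:, sink]
    return mask

# Example: a 20-token prompt with two 6-token images at positions 2-7 and 10-15.
m = make_mask(20, [(2, 8), (10, 16)], kind='hybrid')

Per the summary, each attention head would be assigned one of these categories offline, using attention statistics gathered from dataset samples, so no per-prompt profiling is needed at inference time.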
