arxiv:2507.03117

BLaST: High Performance Inference and Pretraining using BLock Sparse Transformers

Published on Jul 3

Abstract

AI-generated summary: BLaST is a sparsification method for linear layers that achieves high sparsity with minimal accuracy loss, leading to significant speedups and memory reductions in large-scale ML models.

The energy consumption of large-scale ML models is dominated by data movement: shuffling billions of parameters across memory hierarchies and data centers. Effective sparsification to prune redundant parameters is still challenging: existing methods incur significant accuracy degradation, performance overhead, or both. We introduce (Bl)ock (a)nd (S)parse (T)ransformers (BLaST), a general, robust, and reliable sparsification method applicable to linear layers in all settings. Our method iteratively sparsifies weight matrices into a block sparsity pattern suitable for efficient sparse matrix-matrix (SpMM) multiplication. BLaST achieves up to 95% sparsity in MLP weights with negligible accuracy loss. Our fused, highly optimized Sparse MLP kernel delivers up to a 16.7x speedup over dense MLPs across 9 architectures and 8 datasets, resulting in up to a 1.6x inference speedup, a 1.11x pretraining speedup, and up to a 3.12x reduction in inference memory usage. BLaST enables the next generation of large-scale AI systems by reducing energy use, memory footprint, and latency.
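
The core mechanism described in the abstract, iteratively pruning MLP weight matrices to a block-sparse pattern that a fused SpMM kernel can exploit, can be illustrated with a short sketch. The PyTorch snippet below is not the BLaST implementation: the 32x32 block size, the per-block mean-magnitude score, and the linear sparsity schedule are assumptions chosen only to make the idea concrete, and the paper's actual procedure, kernel, and hyperparameters differ.

```python
# Illustrative sketch only: block-wise magnitude pruning of a linear layer's
# weights to a block-sparse pattern. Block size, scoring rule, and schedule
# are assumptions for this example, not BLaST's actual algorithm.
import torch


def block_sparsity_mask(weight: torch.Tensor, block: int, sparsity: float) -> torch.Tensor:
    """Boolean mask that zeroes the lowest-magnitude (block x block) tiles."""
    rows, cols = weight.shape
    assert rows % block == 0 and cols % block == 0, "pad weights to a multiple of the block size"
    # Score each tile by its mean absolute value.
    tiles = weight.abs().reshape(rows // block, block, cols // block, block)
    scores = tiles.mean(dim=(1, 3))              # shape: (rows/block, cols/block)
    num_prune = int(sparsity * scores.numel())   # tiles to drop at this sparsity level
    if num_prune == 0:
        return torch.ones_like(weight, dtype=torch.bool)
    threshold = scores.flatten().kthvalue(num_prune).values
    keep = scores > threshold                    # True = tile survives
    # Expand the tile-level mask back to element granularity.
    return keep.repeat_interleave(block, dim=0).repeat_interleave(block, dim=1)


@torch.no_grad()
def iterative_block_prune(linear: torch.nn.Linear, block: int = 32,
                          target_sparsity: float = 0.95, steps: int = 10) -> None:
    """Raise block sparsity toward the target in steps, re-masking each time."""
    for step in range(1, steps + 1):
        sparsity = target_sparsity * step / steps
        mask = block_sparsity_mask(linear.weight, block, sparsity)
        linear.weight.mul_(mask.to(linear.weight.dtype))
        # In a real pipeline, a few training iterations would run here so the
        # surviving blocks can recover accuracy before the next pruning step.


# Example: prune the up-projection of a toy MLP to ~95% block sparsity.
layer = torch.nn.Linear(1024, 4096, bias=False)
iterative_block_prune(layer, block=32, target_sparsity=0.95, steps=10)
print(f"achieved sparsity: {(layer.weight == 0).float().mean():.2%}")
```

A block-structured zero pattern like this is what allows the pruned layers to be executed with a block-sparse SpMM kernel instead of a dense GEMM, which is where the speedups and memory savings reported in the abstract come from.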
