Flash-Attention 2.7.4 Prebuilt Wheels for NVIDIA Blackwell (cu128/cu129) on Windows
This repository provides prebuilt wheels for Flash-Attention 2.7.4, built against CUDA 12.8 (cu128) and CUDA 12.9 (cu129), targeting NVIDIA Blackwell GPUs on Windows. Wheels are available for Python 3.10 and 3.11, so you can use high-performance attention in deep learning workflows without compiling the extension from source.
Available Wheels
- flash_attn-2.7.4.post1-cp310-cp310-win_amd64.whl (Python 3.10, PyTorch 2.7, cu128)
- flash_attn-2.7.4.post1-cp311-cp311-win_amd64.whl (Python 3.11, PyTorch 2.7, cu128)
- flash_attn-2.7.4.post1-cp310-cp310-win_amd64.whl (Python 3.10, PyTorch 2.8, cu129)
Compatibility
The prebuilt wheels are designed for NVIDIA Blackwell GPUs and have also been tested and confirmed working on previous-generation NVIDIA GPUs. Confirmed cards include (a quick verification snippet follows the list):
- NVIDIA RTX 5090
- NVIDIA RTX 3090
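Before picking a wheel, it helps to confirm which PyTorch/CUDA build you have and which GPU generation you are on. Below is a minimal check using PyTorch's standard device-query API; it assumes a PyTorch build with CUDA support is already installed:

```python
import torch

# Report the PyTorch version and its bundled CUDA version; this should match
# the wheel you pick (cu128 for PyTorch 2.7, cu129 for PyTorch 2.8).
print(torch.__version__, torch.version.cuda)

# Report the GPU name and compute capability.
# Blackwell consumer GPUs (e.g. RTX 5090) report (12, 0); Ampere (RTX 3090) reports (8, 6).
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))
```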
Installation
To install, use pip with the wheel matching your Python version and PyTorch/CUDA build:
```bash
pip install flash_attn-2.7.4.post1-cp310-cp310-win_amd64.whl
# or
pip install flash_attn-2.7.4.post1-cp311-cp311-win_amd64.whl
```
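After installation, a quick smoke test confirms the compiled extension loads and runs on your GPU. This sketch uses flash-attn's flash_attn_func interface and assumes a CUDA-capable GPU; flash-attn requires fp16 or bf16 inputs:

```python
import torch
from flash_attn import flash_attn_func

# flash-attn expects (batch, seqlen, nheads, headdim) tensors
# in fp16 or bf16 on a CUDA device.
q = torch.randn(2, 256, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 256, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 256, 8, 64, device="cuda", dtype=torch.float16)

out = flash_attn_func(q, k, v, causal=True)  # causal self-attention
print(out.shape)  # torch.Size([2, 256, 8, 64])
```

If this runs without an import or CUDA error and prints the expected shape, the wheel matches your Python, PyTorch, and CUDA setup.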