Precompiled Windows wheels for rare Python / PyTorch / CUDA combinations that are no longer commonly supported by upstream projects.
This repository focuses on Python 3.10 (cp310), PyTorch 2.9.1, and CUDA 13.0 (cu130) — combinations that are increasingly hard to find prebuilt binaries for.
Important: these wheels require PyTorch 2.9.1 built against CUDA 13.0. Install it with:

```
pip install torch==2.9.1 torchvision==0.24.1 torchaudio==2.9.1 --index-url https://download.pytorch.org/whl/cu130
```

Useful tips (optional, if needed): you can install a compatible triton-windows with:

```
pip install triton-windows==3.5.1.post24
```

and a compatible xformers with:

```
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu130
```

This repository currently provides precompiled Windows wheels for:
- flash-attn 2.8.3
- sageattention 2.2.0
- llama-cpp-python 0.3.16
All wheels:
- Target Windows x86_64
- Are built for Python 3.10 (cp310)
- Expect PyTorch 2.9.1
- Target CUDA 13.0 (cu130)
- Are built and validated on RTX 5090 / RTX 5080
- Are expected to be backwards compatible with earlier NVIDIA GPUs (driver/runtime permitting)
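Before installing, you can sanity-check that a wheel's `cp310` / `win_amd64` filename tags match your interpreter. A minimal standard-library sketch (the helper name is illustrative, not part of any of these projects):

```python
import sys
import sysconfig

def wheel_matches_interpreter(wheel_name: str) -> bool:
    """Check a wheel filename's Python/platform tags against this interpreter."""
    # Wheel filenames follow: name-version-pythontag-abitag-platformtag.whl
    stem = wheel_name[:-len(".whl")]
    py_tag, _abi_tag, plat_tag = stem.split("-")[-3:]
    this_py = "cp{}{}".format(*sys.version_info[:2])  # e.g. "cp310" on Python 3.10
    # sysconfig reports e.g. "win-amd64"; wheel tags use underscores
    this_plat = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return py_tag == this_py and plat_tag == this_plat

# True on a matching Python 3.10 / 64-bit Windows setup, False elsewhere
print(wheel_matches_interpreter("flash_attn-2.8.3-cp310-cp310-win_amd64.whl"))
```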
Option A — Download from the Releases page:

- Go to the Releases page of this repository
- Download the `.whl` file you need
- Install it locally:

```
pip install flash_attn-2.8.3-cp310-cp310-win_amd64.whl
pip install sageattention-2.2.0-cp310-cp310-win_amd64.whl
pip install llama_cpp_python-0.3.16-cp310-cp310-win_amd64.whl
```

Option B — Install directly from GitHub (advanced):
Flash attention:

```
pip install "https://github.com/MarkOrez/essential-wheels/releases/download/flash_attn-2.8.3-cp310-cp310-win_amd64.whl"
```

Sage attention:

```
pip install "https://github.com/MarkOrez/essential-wheels/releases/download/sageattention-2.2.0-cp310-cp310-win_amd64.whl"
```

Llama cpp python:

```
pip install "https://github.com/MarkOrez/essential-wheels/releases/download/llama_cpp_python-0.3.16-cp310-cp310-win_amd64.whl"
```

Notes:

- These wheels are not official upstream builds
- They are provided as a convenience for users stuck on specific environments
- CUDA / driver compatibility depends on your local system
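Once installed, the points above can be checked mechanically. A minimal post-install sketch (the helper names are illustrative; note that import names differ from wheel names — llama-cpp-python imports as `llama_cpp`):

```python
def check_imports(modules):
    """Try importing each module; map name -> True/False."""
    results = {}
    for name in modules:
        try:
            __import__(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results

def cuda_report() -> str:
    """Summarize the locally visible torch / CUDA setup, if any."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: CUDA not available (check your driver)"
    return (f"torch {torch.__version__}, CUDA {torch.version.cuda}, "
            f"device: {torch.cuda.get_device_name(0)}")

# Import names for the three wheels provided here
print(check_imports(("flash_attn", "sageattention", "llama_cpp")))
print(cuda_report())
```

If any entry prints `False`, reinstall the corresponding wheel and confirm it matches your Python and CUDA versions.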
📜 Licensing & attribution
Each wheel is built from its respective upstream project. Please refer to the original repositories for their licenses and source code:
- [FlashAttention](https://github.com/Dao-AILab/flash-attention)
- [SageAttention](https://github.com/thu-ml/SageAttention)
- [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)
This repository redistributes binaries for convenience only.