fix(gpu): try all available backends before falling back to CPU #240

Open

haddadrm wants to merge 1 commit into tobi:main from haddadrm:main

Conversation

@haddadrm

Problem

When the first preferred GPU backend (CUDA) was detected as available but failed to initialize, QMD fell straight back to CPU, skipping other viable backends such as Vulkan entirely.

This is a common scenario on machines where CUDA is installed but the prebuilt binary is incompatible:

  • Older Pascal-generation NVIDIA GPUs (GTX 10xx, compute_61) with newer CUDA toolkits
  • CUDA toolkit version mismatches
  • Systems where Vulkan works fine but CUDA does not

Solution

Instead of trying one backend and giving up, iterate through all detected GPU backends in priority order before falling back to CPU (sketched below):

cuda → metal → vulkan

This ensures:

  • Users with working CUDA/Metal always get the best backend (no regression)
  • Users whose CUDA fails fall through gracefully to Vulkan instead of dropping to CPU
  • CPU is used only after every GPU backend has been attempted and failed
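
A minimal sketch of this selection loop, assuming node-llama-cpp's getLlama({ gpu }) option throws when the requested backend cannot initialize, and that LlamaGpuType is the exported backend-name type the PR's cast refers to; the helper name tryGpuBackends and the hard-coded priority list are illustrative, not the exact code in src/llm.ts:

    import { getLlama, type Llama, type LlamaGpuType } from "node-llama-cpp";

    const GPU_PRIORITY: LlamaGpuType[] = ["cuda", "metal", "vulkan"];

    // Try each backend in priority order; return the first one that
    // initializes, plus the reasons any earlier candidates failed.
    async function tryGpuBackends(): Promise<{ llama: Llama | null; failures: string[] }> {
      const failures: string[] = [];
      for (const gpu of GPU_PRIORITY) {
        try {
          // First backend that initializes wins; CUDA/Metal users see no change.
          return { llama: await getLlama({ gpu }), failures };
        } catch (err) {
          failures.push(`${gpu}: ${err instanceof Error ? err.message : String(err)}`);
        }
      }
      return { llama: null, failures };
    }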

Changes

  • src/llm.ts: Rework ensureLlama() to loop through all available backends (see the sketch after this list)
  • Clear warnings listing which backends were tried and why each failed
  • Respect QMD_FORCE_CPU / FORCE_CPU env vars to skip GPU entirely
  • Proper LlamaGpuType casting (removes @ts-expect-error hack)
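
Building on the tryGpuBackends sketch above, the env-var gate, the aggregated warning, and the final CPU fallback could fit together roughly as follows. The truthiness check on QMD_FORCE_CPU / FORCE_CPU and any warning wording beyond what the Copilot review quotes are assumptions, and ensureLlamaSketch stands in for the real ensureLlama():

    // Hypothetical wrapper around the loop sketched above.
    async function ensureLlamaSketch(): Promise<Llama> {
      if (process.env.QMD_FORCE_CPU || process.env.FORCE_CPU) {
        // Explicit opt-out: skip GPU selection entirely.
        process.stderr.write("QMD Warning: GPU disabled via QMD_FORCE_CPU/FORCE_CPU. Running on CPU.\n");
        return getLlama({ gpu: false });
      }

      const { llama, failures } = await tryGpuBackends();
      if (llama) return llama;

      // Every detected GPU backend was attempted and failed; say which ones
      // and why before falling back to CPU.
      process.stderr.write(`QMD Warning: all GPU backends failed (${failures.join("; ")}). Running on CPU.\n`);
      return getLlama({ gpu: false });
    }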

Tested on

  • GTX 1050 Ti (compute_61) + CUDA 13.1 + Vulkan, Node 25.2.1, Windows 11
  • Before: GPU: none (running on CPU)
  • After: GPU: vulkan (offloading: yes)

Copilot AI review requested due to automatic review settings February 21, 2026 14:35

Copilot AI left a comment

Pull request overview

This PR improves GPU backend selection by attempting all available GPU backends (CUDA, Metal, Vulkan) in priority order before falling back to CPU, instead of trying only one backend and immediately falling back to CPU on failure.

Changes:

  • Implements a fallback loop that tries all detected GPU backends in order (CUDA → Metal → Vulkan) before falling back to CPU
  • Adds support for QMD_FORCE_CPU and FORCE_CPU environment variables to explicitly disable GPU
  • Improves error reporting with detailed messages about which backends were tried and why each failed
Comments suppressed due to low confidence (1)

src/llm.ts:572

  • When forceCpu is true, two warning messages will be printed: first "GPU disabled via QMD_FORCE_CPU/FORCE_CPU. Running on CPU." (line 561), then "no GPU acceleration, running on CPU (slow). Run 'qmd status' for details." (lines 569-571). This creates redundant output for users who explicitly requested CPU mode.

To fix, add && !forceCpu to the condition on line 568 to prevent the second message when CPU mode is explicitly requested.

      if (!llama.gpu) {
        process.stderr.write(
          "QMD Warning: no GPU acceleration, running on CPU (slow). Run 'qmd status' for details.\n"
        );
      }
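
Applied to the quoted snippet, the suggested guard would look roughly like this, assuming a forceCpu boolean derived from QMD_FORCE_CPU / FORCE_CPU is in scope at that point in ensureLlama():

      // Skip the generic CPU warning when CPU mode was explicitly requested,
      // since a dedicated QMD_FORCE_CPU/FORCE_CPU message was already printed.
      if (!llama.gpu && !forceCpu) {
        process.stderr.write(
          "QMD Warning: no GPU acceleration, running on CPU (slow). Run 'qmd status' for details.\n"
        );
      }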
