fix(llm): try all available GPU types before falling back to CPU by JonBasse · Pull Request #237 · tobi/qmd

JonBasse · 2026-02-21T07:02:54Z

Summary

GPU initialization used .find() to pick only the first available GPU type (typically CUDA). If that backend failed to initialize, the code fell straight back to CPU and never tried Vulkan or Metal.
Changed to .filter() plus a loop that tries each available GPU type in priority order (CUDA → Metal → Vulkan), falling back to CPU only if all of them fail.

Problem

On Linux systems where getLlamaGpuTypes() returns both cuda and vulkan, the current code picks cuda via .find(). When CUDA initialization fails (no toolkit installed), the catch block falls straight to gpu: false (CPU), never attempting Vulkan.

This affects any system with Vulkan GPU support but no CUDA toolkit — e.g., AMD GPUs with vulkan-radeon.
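To make the difference concrete, here is a small sketch with sample values. The `gpuTypes` array is a hypothetical stand-in for what getLlamaGpuTypes() might return on such a machine, and `PRIORITY`, `single`, and `candidates` are illustrative names, not identifiers from the codebase.

```typescript
type GpuType = "cuda" | "metal" | "vulkan";
const PRIORITY: readonly GpuType[] = ["cuda", "metal", "vulkan"];

// Assumed sample data: a Linux machine that reports both backends as available.
const gpuTypes: GpuType[] = ["cuda", "vulkan"];

// .find() yields a single candidate. If it fails to initialize,
// there is nothing left to try except CPU.
const single = PRIORITY.find(g => gpuTypes.includes(g));
// single === "cuda"

// .filter() keeps every available backend, still in priority order,
// so a later backend can be attempted when an earlier one fails.
const candidates = PRIORITY.filter(g => gpuTypes.includes(g));
// candidates is ["cuda", "vulkan"]
```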

Fix

Replace .find() (first match) with .filter() + for loop (try all in order):

// Before: picks first, skips rest on failure
const preferred = (["cuda", "metal", "vulkan"] as const).find(g => gpuTypes.includes(g));

// After: tries each in priority order
const gpuOrder = (["cuda", "metal", "vulkan"] as const).filter(g => gpuTypes.includes(g));
for (const gpu of gpuOrder) {
  try {
    llama = await getLlama({ gpu, logLevel: LlamaLogLevel.error });
    break;
  } catch {
    process.stderr.write(`QMD Warning: ${gpu} reported available but failed to initialize. Trying next...\n`);
  }
}
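The excerpt above stops at the loop. As a rough, self-contained sketch of the whole chain including the final CPU step, the version below takes the initializer as a parameter so the fallback can be exercised without a GPU. `initWithFallback`, `init`, `warn`, and `fakeInit` are hypothetical names introduced for illustration; in the real code the initializer would presumably be a call like getLlama({ gpu, ... }).

```typescript
type GpuType = "cuda" | "metal" | "vulkan";
type Backend = GpuType | false; // false stands for CPU, as in `gpu: false`

const PRIORITY: readonly GpuType[] = ["cuda", "metal", "vulkan"];

// `init` is a hypothetical stand-in for the real initializer.
async function initWithFallback<T>(
  available: readonly GpuType[],
  init: (gpu: Backend) => Promise<T>,
  warn: (msg: string) => void = msg => console.warn(msg),
): Promise<{ backend: Backend; instance: T }> {
  // Keep every available backend, in priority order.
  const gpuOrder = PRIORITY.filter(g => available.includes(g));
  for (const gpu of gpuOrder) {
    try {
      // Awaiting inside the try block so a rejected init is caught here.
      return { backend: gpu, instance: await init(gpu) };
    } catch {
      warn(`${gpu} reported available but failed to initialize. Trying next...`);
    }
  }
  // Every GPU backend failed, or none was available: fall back to CPU.
  return { backend: false, instance: await init(false) };
}

// Usage: simulate a machine where CUDA is listed but cannot initialize.
const fakeInit = async (gpu: Backend): Promise<string> => {
  if (gpu === "cuda") throw new Error("CUDA toolkit not installed");
  return `llama on ${gpu === false ? "cpu" : gpu}`;
};

initWithFallback(["cuda", "vulkan"], fakeInit).then(({ backend }) => {
  console.log(backend); // vulkan
});
```

Injecting the initializer is a design choice made for this sketch only. It keeps the ordering logic testable on machines without any GPU.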

Testing

Tested on Linux (Arch, kernel 6.19.3) with AMD Radeon 890M (RADV GFX1150, 47GB shared VRAM):

Before fix: CUDA fails → falls to CPU (0.26s BM25 only, 30+ min embedding, 3,361 errors)
After fix: CUDA fails → Vulkan succeeds → GPU acceleration working (3.5s hybrid queries, 4m 44s embedding, 0 errors)

Device
  GPU:      vulkan (offloading: yes)
  Devices:  AMD Radeon 890M Graphics (RADV GFX1150)
  VRAM:     47.1 GB free / 47.3 GB total

Fixes #213

The GPU initialization used `.find()` to pick only the first available GPU type (typically CUDA), and on failure fell directly to CPU, skipping Vulkan and Metal entirely. On Linux systems with Vulkan but no CUDA toolkit, this meant queries always ran on CPU despite a working GPU. Changed to `.filter()` + loop to try each available GPU in priority order (CUDA > Metal > Vulkan) before falling back to CPU.

Fixes tobi#213

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

JonBasse wants to merge 1 commit into tobi:main from JonBasse:fix/gpu-fallback-chain
