Skip to content

Comments

fix(llm): try all available GPU types before falling back to CPU#237

Open
JonBasse wants to merge 1 commit intotobi:mainfrom
JonBasse:fix/gpu-fallback-chain
Open

fix(llm): try all available GPU types before falling back to CPU#237
JonBasse wants to merge 1 commit intotobi:mainfrom
JonBasse:fix/gpu-fallback-chain

Conversation

@JonBasse
Copy link

Summary

  • GPU initialization used .find() to pick only the first available GPU type (typically CUDA), and on failure fell directly to CPU — skipping Vulkan and Metal entirely
  • Changed to .filter() + loop to try each available GPU in priority order (CUDA → Metal → Vulkan → CPU)

Problem

On Linux systems where getLlamaGpuTypes() returns both cuda and vulkan, the current code picks cuda via .find(). When CUDA initialization fails (no toolkit installed), the catch block falls straight to gpu: false (CPU), never attempting Vulkan.

This affects any system with Vulkan GPU support but no CUDA toolkit — e.g., AMD GPUs with vulkan-radeon.

Fix

Replace .find() (first match) with .filter() + for loop (try all in order):

// Before: picks first, skips rest on failure
const preferred = (["cuda", "metal", "vulkan"] as const).find(g => gpuTypes.includes(g));

// After: tries each in priority order
const gpuOrder = (["cuda", "metal", "vulkan"] as const).filter(g => gpuTypes.includes(g));
for (const gpu of gpuOrder) {
  try {
    llama = await getLlama({ gpu, logLevel: LlamaLogLevel.error });
    break;
  } catch {
    process.stderr.write(`QMD Warning: ${gpu} reported available but failed to initialize. Trying next...\n`);
  }
}

Testing

Tested on Linux (Arch, kernel 6.19.3) with AMD Radeon 890M (RADV GFX1150, 47GB shared VRAM):

  • Before fix: CUDA fails → falls to CPU (0.26s BM25 only, 30+ min embedding, 3,361 errors)
  • After fix: CUDA fails → Vulkan succeeds → GPU acceleration working (3.5s hybrid queries, 4m 44s embedding, 0 errors)
Device
  GPU:      vulkan (offloading: yes)
  Devices:  AMD Radeon 890M Graphics (RADV GFX1150)
  VRAM:     47.1 GB free / 47.3 GB total

Fixes #213

The GPU initialization used `.find()` to pick only the first available
GPU type (typically CUDA), and on failure fell directly to CPU — skipping
Vulkan and Metal entirely. On Linux systems with Vulkan but no CUDA
toolkit, this meant queries always ran on CPU despite a working GPU.

Changed to `.filter()` + loop to try each available GPU in priority order
(CUDA > Metal > Vulkan) before falling back to CPU.

Fixes tobi#213

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GPU fallback skips Vulkan when CUDA fails — falls directly to CPU

1 participant