GitHub Action edited this page Aug 3, 2025 · 1 revision

# VulkanIlm 🚀🔥 – GPU-Accelerated Local LLMs for Everyone


**VulkanIlm** (from "Vulkan" 🔥 + "Ilm" 📚, meaning "knowledge" in Urdu/Arabic) is a Python library that brings GPU-accelerated local LLM inference to AMD and Intel GPU users, with no CUDA required.

Built specifically for developers with legacy GPUs, VulkanIlm enables blazing-fast local inference using llama.cpp's Vulkan backend.


## 🎯 Why VulkanIlm?

> Not everyone has an NVIDIA GPU. Everyone deserves fast local AI.

**The Problem:** Most GPU-acceleration libraries target CUDA, leaving AMD and Intel users with slow CPU-only inference.

**The Solution:** VulkanIlm democratizes GPU-accelerated AI across all GPU brands using Vulkan.

### Key Features

- 🚀 **4-6x faster** than CPU-only inference on legacy GPUs
- 🎮 **Universal GPU support:** AMD, Intel, and NVIDIA
- 🐍 **Python-first** with simple CLI tools
- ⚡ **Auto-detection** and optimization for your GPU
- 📦 **Zero-config installation** with auto-building
- 🔄 **Real-time streaming** token generation

## 📊 Performance Results

| Hardware | CPU time | Vulkan time | Speedup |
|----------|----------|-------------|---------|
| AMD RX 580 8GB | 188.47 s | 44.74 s | 4.21x |
| Intel Arc A770 | ~120 s | ~25 s | 4.8x |
| AMD RX 6600 | ~90 s | ~18 s | 5.0x |

*Benchmarked with the Gemma-3n-E4B-it model (6.9B parameters).*
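The speedup column is simply the ratio of CPU time to Vulkan time. A quick sanity check, using the timings from the table above:

```python
# Speedup = CPU wall-clock time / Vulkan wall-clock time, per the table above.
timings = {
    "AMD RX 580 8GB": (188.47, 44.74),
    "Intel Arc A770": (120.0, 25.0),
    "AMD RX 6600": (90.0, 18.0),
}

for gpu, (cpu_s, vulkan_s) in timings.items():
    print(f"{gpu}: {cpu_s / vulkan_s:.2f}x speedup")
```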


## 📦 Installation

### Quick Start

```bash
git clone https://github.com/Talnz007/VulkanIlm.git
cd VulkanIlm
pip install -e .
```

### Prerequisites

- Python 3.9+
- Vulkan-capable GPU (AMD RX 400+, Intel Arc/Xe, NVIDIA GTX 900+)
- Vulkan drivers installed
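A quick way to confirm your interpreter meets the Python requirement before installing:

```python
import sys

# VulkanIlm targets Python 3.9+; fail early with a clear message otherwise.
if sys.version_info < (3, 9):
    raise RuntimeError(f"Python 3.9+ required, found {sys.version.split()[0]}")
print("Python version OK:", sys.version.split()[0])
```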

### Install Vulkan Drivers (if needed)

```bash
# Ubuntu/Debian
sudo apt install vulkan-tools libvulkan-dev

# Fedora/RHEL
sudo dnf install vulkan-tools vulkan-devel

# Test the installation
vulkaninfo
```

## 🚀 Usage

### CLI Interface

```bash
# Auto-install llama.cpp with Vulkan support
vulkanilm install

# Check your GPU setup
vulkanilm vulkan-info

# Search for and download models
vulkanilm search "llama"
vulkanilm download microsoft/DialoGPT-medium

# Generate text
vulkanilm ask model.gguf --prompt "Explain quantum computing"

# Stream tokens in real time
vulkanilm stream model.gguf "Tell me a story about AI"

# Benchmark CPU vs. GPU performance
vulkanilm benchmark "Test prompt"
```

### Python API

```python
from vulkan_ilm import Llama

# Load a model with automatic GPU optimization
llm = Llama("path/to/model.gguf", gpu_layers=16)

# Generate text
response = llm.ask("Explain the term 'ilm' in an AI context.")
print(response)

# Stream tokens in real time
for token in llm.stream_ask_real("Tell me about the Vulkan API"):
    print(token, end='', flush=True)
```
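When you need both the live output and the final text, the streaming generator can be drained into a string. A minimal helper for that (not part of the VulkanIlm API; it works with any iterable of text chunks, such as `llm.stream_ask_real(...)`):

```python
def collect_stream(stream, echo=True):
    """Echo tokens as they arrive and return the concatenated text."""
    parts = []
    for token in stream:
        if echo:
            print(token, end="", flush=True)
        parts.append(token)
    return "".join(parts)

# Works with any token iterable, e.g. a VulkanIlm stream or a plain list:
full_text = collect_stream(["Vulkan ", "is ", "portable."], echo=False)
```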

## 🎮 Supported Hardware

### Tested GPUs

| Brand | Models | Status | Performance |
|-------|--------|--------|-------------|
| AMD | RX 580, RX 590, RX 6600, RX 6700 | ✅ Excellent | 4-5x speedup |
| Intel | Arc A770, Arc A750, Xe Graphics | ✅ Great | 4-5x speedup |
| NVIDIA | GTX 1060+, RTX series | ✅ Excellent | 4-6x speedup |

๐Ÿ› ๏ธ Troubleshooting

Common Issues

โŒ vulkanilm: command not found

# Ensure you installed the package correctly
pip install -e .
poetry install  # if using Poetry

# Check if CLI is available
which vulkanilm

โŒ No Vulkan support detected

# Install Vulkan tools and test
sudo apt install vulkan-tools libvulkan-dev
vulkaninfo

# Update GPU drivers

โŒ Model hangs or uses too much VRAM

# Use fewer GPU layers
vulkanilm ask model.gguf --gpu-layers 8 -p "test"

# Or force CPU mode for testing
vulkanilm ask model.gguf --cpu -p "test"
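If you are unsure how many layers fit in your VRAM, a simple halving loop finds a workable value. This is an illustrative sketch, not VulkanIlm functionality: `try_load` is a hypothetical stand-in for whatever raises when the model does not fit (for example, constructing `Llama(path, gpu_layers=n)`).

```python
def find_gpu_layers(try_load, start=32):
    """Halve gpu_layers until loading succeeds; 0 means fall back to CPU."""
    n = start
    while n > 0:
        try:
            try_load(n)
            return n
        except MemoryError:
            n //= 2
    return 0

# Demo with a fake loader whose VRAM only fits 8 layers:
def fake_load(n):
    if n > 8:
        raise MemoryError("out of VRAM")

best = find_gpu_layers(fake_load, start=32)  # tries 32, 16, then 8
```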

๐Ÿ—๏ธ Architecture

VulkanIlm/
โ”œโ”€โ”€ vulkan_ilm/
โ”‚   โ”œโ”€โ”€ cli.py              # Command-line interface
โ”‚   โ”œโ”€โ”€ llama.py            # Main Python API
โ”‚   โ”œโ”€โ”€ vulkan/
โ”‚   โ”‚   โ””โ”€โ”€ detector.py     # GPU detection & optimization
โ”‚   โ”œโ”€โ”€ benchmark.py        # Performance testing
โ”‚   โ”œโ”€โ”€ installer.py        # Auto-build llama.cpp
โ”‚   โ””โ”€โ”€ streaming.py        # Real-time token streaming
โ”œโ”€โ”€ pyproject.toml          # Poetry configuration
โ””โ”€โ”€ README.md

๐Ÿค Contributing

We welcome contributions! Areas where you can help:

  • GPU Testing: Test on different AMD/Intel/NVIDIA cards
  • Model Support: Add support for new model formats
  • Performance: Optimize memory usage and speed
  • Documentation: Improve guides and examples

See CONTRIBUTING.md for details.


## 🧾 The Story Behind "VulkanIlm"

In South Asian and Islamic culture, "Ilm" (علم) represents knowledge, wisdom, and enlightenment.

Combined with Vulkan, a high-performance GPU API, this project embodies "knowledge on fire" 🔥: making advanced AI accessible to everyone, regardless of GPU brand or budget.

Our mission: Democratize local AI inference for the global developer community.


## 📄 License

MIT License - see LICENSE for details.


## 📞 Support & Links


🔥 Built with passion by @Talnz007, bringing fast, local AI to legacy GPUs everywhere.