GPU-aware inference mesh for large-scale AI serving
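This entry names an architecture rather than showing code, but one core ingredient of a GPU-aware mesh is routing each request to the least-loaded GPU. Below is a minimal sketch using NVML via the `pynvml` bindings; the `pick_gpu()` helper is purely illustrative and is not this repository's API.

```python
# A minimal sketch of GPU-aware routing: pick the local GPU with the most
# free memory via NVML. The helper name and the idea of routing by free
# memory are illustrative assumptions, not this repository's design.
import pynvml

def pick_gpu() -> int:
    """Return the index of the visible GPU with the most free memory."""
    pynvml.nvmlInit()
    try:
        best_idx, best_free = 0, -1
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            free = pynvml.nvmlDeviceGetMemoryInfo(handle).free
            if free > best_free:
                best_idx, best_free = i, free
        return best_idx
    finally:
        pynvml.nvmlShutdown()

if __name__ == "__main__":
    print(f"Routing next request to cuda:{pick_gpu()}")
```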
🚀 ClipServe: A fast API server for embedding text and images and for performing zero-shot classification with OpenAI's CLIP model. Powered by FastAPI, Redis, and CUDA for lightning-fast, scalable AI applications. Transform texts and images into embeddings or classify images with custom labels, all through easy-to-use endpoints. 🌐📊
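A minimal sketch of the kind of text-embedding endpoint such a server exposes, built on FastAPI and the Hugging Face `transformers` CLIP classes; the `/embed/text` route and the checkpoint are assumptions, not ClipServe's actual API.

```python
# Sketch of a CLIP text-embedding endpoint; route name and checkpoint are
# illustrative assumptions, not ClipServe's documented interface.
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

app = FastAPI()

class TextRequest(BaseModel):
    texts: list[str]

@app.post("/embed/text")
def embed_text(req: TextRequest):
    inputs = processor(text=req.texts, return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        features = model.get_text_features(**inputs)
    # L2-normalize so dot products between embeddings are cosine similarities.
    features = features / features.norm(dim=-1, keepdim=True)
    return {"embeddings": features.cpu().tolist()}
```

Saved as `app.py`, this runs under `uvicorn app:app`; POSTing `{"texts": ["a photo of a cat"]}` to `/embed/text` returns the normalized embedding vectors.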
Docker-based GPU inference of machine learning models
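A typical entrypoint for such a container might look like the following; the model choice is an illustrative assumption. The container would be started with GPUs exposed via the NVIDIA Container Toolkit, e.g. `docker run --gpus all`.

```python
# Illustrative container entrypoint: load a model onto the GPU exposed by
# `docker run --gpus all`, falling back to CPU when none is visible.
import torch
from torchvision.models import resnet18, ResNet18_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"
model = resnet18(weights=ResNet18_Weights.DEFAULT).eval().to(device)

@torch.no_grad()
def predict(batch: torch.Tensor) -> torch.Tensor:
    """Run one inference pass on the selected device."""
    return model(batch.to(device)).softmax(dim=-1).cpu()

if __name__ == "__main__":
    # Dummy batch standing in for real preprocessed images.
    print(predict(torch.randn(1, 3, 224, 224)).argmax(dim=-1))
```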
Generating images with diffusion models from a mobile device, using an intranet GPU box as the backend
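The backend half of that setup could be a small HTTP service on the GPU box that turns a prompt into PNG bytes, so the phone only needs to issue a request and display the result. A sketch assuming the `diffusers` library; the route name and checkpoint are illustrative.

```python
# Sketch of the intranet GPU backend: the phone GETs /generate?prompt=...
# and displays the returned PNG. Route and checkpoint are assumptions.
import io
import torch
from diffusers import StableDiffusionPipeline
from fastapi import FastAPI
from fastapi.responses import Response

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

app = FastAPI()

@app.get("/generate")
def generate(prompt: str):
    image = pipe(prompt, num_inference_steps=30).images[0]
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return Response(content=buf.getvalue(), media_type="image/png")
```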
End-to-end scalable ML inference on EKS: KEDA-driven pod autoscaling with Prometheus custom metrics, Cluster Autoscaler for GPU node scaling, and NVIDIA GPU time-slicing to run multiple pods per GPU.
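On the application side, the custom metric KEDA scales on has to be exported somewhere Prometheus can scrape. A minimal sketch using `prometheus_client`; the metric name, port, and queue source are assumptions.

```python
# Export a queue-depth gauge for Prometheus; a KEDA ScaledObject with a
# prometheus trigger can then scale pods on this value. The metric name,
# port, and queue source are illustrative assumptions.
import random
import time
from prometheus_client import Gauge, start_http_server

QUEUE_DEPTH = Gauge(
    "inference_queue_depth",
    "Number of inference requests waiting for a GPU worker",
)

def poll_queue_depth() -> int:
    # Stand-in for reading the real work queue (e.g. Redis LLEN).
    return random.randint(0, 50)

if __name__ == "__main__":
    start_http_server(9100)  # serves /metrics for Prometheus to scrape
    while True:
        QUEUE_DEPTH.set(poll_queue_depth())
        time.sleep(5)
```

A KEDA `ScaledObject` with a `prometheus` trigger would then query this series (for example `sum(inference_queue_depth)`) and add or remove pods as the value crosses its threshold.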