GPU-aware inference mesh for large-scale AI serving
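This entry names an architecture rather than showing code, but one core ingredient of a GPU-aware mesh is routing each request to the least-loaded GPU. Below is a minimal sketch using NVML via the `pynvml` bindings; the `pick_gpu()` helper is purely illustrative and is not this repository's API.

```python
# A minimal sketch of GPU-aware routing: pick the local GPU with the most
# free memory via NVML. The helper name and the idea of routing by free
# memory are illustrative assumptions, not this repository's design.
import pynvml

def pick_gpu() -> int:
    """Return the index of the visible GPU with the most free memory."""
    pynvml.nvmlInit()
    try:
        best_idx, best_free = 0, -1
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            free = pynvml.nvmlDeviceGetMemoryInfo(handle).free
            if free > best_free:
                best_idx, best_free = i, free
        return best_idx
    finally:
        pynvml.nvmlShutdown()

if __name__ == "__main__":
    print(f"Routing next request to cuda:{pick_gpu()}")
```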
🚀 ClipServe: A fast API server for embedding text and images and for performing zero-shot classification with OpenAI's CLIP model. Powered by FastAPI, Redis, and CUDA for lightning-fast, scalable AI applications. Transform texts and images into embeddings or classify images with custom labels, all through easy-to-use endpoints. 🌐📊
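A minimal sketch of the kind of text-embedding endpoint such a server exposes, built on FastAPI and the Hugging Face `transformers` CLIP classes; the `/embed/text` route and the checkpoint are assumptions, not ClipServe's actual API.

```python
# Sketch of a CLIP text-embedding endpoint; route name and checkpoint are
# illustrative assumptions, not ClipServe's documented interface.
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

app = FastAPI()

class TextRequest(BaseModel):
    texts: list[str]

@app.post("/embed/text")
def embed_text(req: TextRequest):
    inputs = processor(text=req.texts, return_tensors="pt", padding=True).to(device)
    with torch.no_grad():
        features = model.get_text_features(**inputs)
    # L2-normalize so dot products between embeddings are cosine similarities.
    features = features / features.norm(dim=-1, keepdim=True)
    return {"embeddings": features.cpu().tolist()}
```

Saved as `app.py`, this runs under `uvicorn app:app`; POSTing `{"texts": ["a photo of a cat"]}` to `/embed/text` returns the normalized embedding vectors.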
Docker-based GPU inference of machine learning models
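A typical entrypoint for such a container might look like the following; the model choice is an illustrative assumption. The container would be started with GPUs exposed via the NVIDIA Container Toolkit, e.g. `docker run --gpus all`.

```python
# Illustrative container entrypoint: load a model onto the GPU exposed by
# `docker run --gpus all`, falling back to CPU when none is visible.
import torch
from torchvision.models import resnet18, ResNet18_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"
model = resnet18(weights=ResNet18_Weights.DEFAULT).eval().to(device)

@torch.no_grad()
def predict(batch: torch.Tensor) -> torch.Tensor:
    """Run one inference pass on the selected device."""
    return model(batch.to(device)).softmax(dim=-1).cpu()

if __name__ == "__main__":
    # Dummy batch standing in for real preprocessed images.
    print(predict(torch.randn(1, 3, 224, 224)).argmax(dim=-1))
```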
Generating images with diffusion models from a mobile device, using an intranet GPU box as the backend
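The backend half of that setup could be a small HTTP service on the GPU box that turns a prompt into PNG bytes, so the phone only needs to issue a request and display the result. A sketch assuming the `diffusers` library; the route name and checkpoint are illustrative.

```python
# Sketch of the intranet GPU backend: the phone GETs /generate?prompt=...
# and displays the returned PNG. Route and checkpoint are assumptions.
import io
import torch
from diffusers import StableDiffusionPipeline
from fastapi import FastAPI
from fastapi.responses import Response

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

app = FastAPI()

@app.get("/generate")
def generate(prompt: str):
    image = pipe(prompt, num_inference_steps=30).images[0]
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return Response(content=buf.getvalue(), media_type="image/png")
```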
End-to-end scalable ML inference on EKS: KEDA-driven pod autoscaling with Prometheus custom metrics, Cluster Autoscaler for GPU node scaling, and NVIDIA GPU time-slicing to run multiple pods per GPU.
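On the application side, the custom metric KEDA scales on has to be exported somewhere Prometheus can scrape. A minimal sketch using `prometheus_client`; the metric name, port, and queue source are assumptions.

```python
# Export a queue-depth gauge for Prometheus; a KEDA ScaledObject with a
# prometheus trigger can then scale pods on this value. The metric name,
# port, and queue source are illustrative assumptions.
import random
import time
from prometheus_client import Gauge, start_http_server

QUEUE_DEPTH = Gauge(
    "inference_queue_depth",
    "Number of inference requests waiting for a GPU worker",
)

def poll_queue_depth() -> int:
    # Stand-in for reading the real work queue (e.g. Redis LLEN).
    return random.randint(0, 50)

if __name__ == "__main__":
    start_http_server(9100)  # serves /metrics for Prometheus to scrape
    while True:
        QUEUE_DEPTH.set(poll_queue_depth())
        time.sleep(5)
```

A KEDA `ScaledObject` with a `prometheus` trigger would then query this series (for example `sum(inference_queue_depth)`) and add or remove pods as the value crosses its threshold.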