vllm

Here are 457 public repositories matching this topic...

meta-llama / llama-cookbook

Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using the Llama model family and how to use the models on various provider services.

python machine-learning ai pytorch llama finetuning llm langchain vllm llama2

Updated Nov 3, 2025
Jupyter Notebook

xorbitsai / inference

Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-prem, or your laptop — all through one unified, production-ready inference API.

Updated Jan 5, 2026
Python

OpenRLHF / OpenRLHF

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)

reinforcement-learning raylib transformers proximal-policy-optimization large-language-models reinforcement-learning-from-human-feedback vllm openai-o1

Updated Jan 6, 2026
Python

LMCache / LMCache

Supercharge Your LLM with the Fastest KV Cache Layer

fast amd cuda inference pytorch speed rocm kv-cache llm vllm

Updated Jan 5, 2026
Python

katanaml / sparrow

Structured data extraction and instruction calling with ML, LLM and Vision LLM

computer-vision machinelearning gpt nlp-machine-learning rag huggingface-transformers llm vllm

Updated Dec 19, 2025
Python

kserve / kserve

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

Updated Jan 2, 2026
Go

xlite-dev / Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

mla vllm llm-inference awesome-llm flash-attention tensorrt-llm paged-attention deepseek flash-attention-3 deepseek-v3 minimax-01 deepseek-r1 flash-mla qwen3

Updated Nov 28, 2025
Python

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

inference rdma disaggregation llm vllm sglang kvcache

Updated Jan 5, 2026
C++

gpustack / gpustack

Performance-Optimized AI Inference on Your GPUs. Unlock it by selecting and tuning the optimal inference engine for your model.

cuda inference openai llama maas rocm ascend llm llm-serving vllm genai llm-inference qwen deepseek sglang distributed-inference high-performance-inference mindie

Updated Jan 5, 2026
Python

skyzh / tiny-llm

A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

python course serving llm large-language-model vllm qwen qwen2

Updated Dec 18, 2025
Python

PaddlePaddle / FastDeploy

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

inference openai serving ernie llm llm-serving vllm ernie-45 ernie-45-vl

Updated Jan 5, 2026
Python

vllm-project / semantic-router

System Level Intelligent Router for Mixture-of-Models

Updated Jan 5, 2026
Go

containers / ramalama

RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

ai containers cuda intel hip hacktoberfest inference-server podman llm llamacpp vllm

Updated Jan 5, 2026
Python

OpenBMB / UltraRAG

UltraRAG v2: A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines

mcp openai easy gpt embedding rag sentence-transformers jina llm vllm qwen deepseek bge-m3 mcp-server mcp-client

Updated Jan 5, 2026
Python

mostlygeek / llama-swap

Reliable model swapping for any local OpenAI/Anthropic compatible server - llama.cpp, vllm, etc

golang openai llama openai-api llamacpp vllm localllm localllama

Updated Jan 1, 2026
Go

apconw / sanic-web

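A lightweight large-model application project (Large Model Data Assistant) that covers the full pipeline and is easy to customize and extend. It supports large models such as DeepSeek and Qwen3, and is a one-stop large-model application development project built on Dify, LangChain/LangGraph, Ollama & Vllm, Sanic, and Text2SQL 📊, with a modern UI built using Vue3, TypeScript, and Vite 5. It supports LLM-based data visualization Q&A through ECharts 📈 and can handle table Q&A over CSV files 📂. It can also easily integrate with third-party open-source RAG retrieval systems 🌐 to support broad general-knowledge Q&A.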

python neo4j mcp bigdata sanic echarts dify vue3 text2sql llm langchain vllm ollama qwen lamaindex langgraph-python deepseek-r1

Updated Dec 27, 2025
JavaScript

vllm-project / vllm-ascend

Community maintained hardware plugin for vLLM on Ascend

inference transformer model-serving mlops ascend llm llmops llm-serving vllm

Updated Jan 6, 2026
Python

bricks-cloud / BricksLLM

🔒 Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and monitoring per user, application, or environment. Supports OpenAI, Azure OpenAI, Anthropic, vLLM, and open-source LLMs.

api docker golang open-source security privacy ai azure rest-api postgresql self-hosted artificial-intelligence ycombinator openai gpt llm generative-ai anthropic vllm

Updated Jan 5, 2025
Go

kubeai-project / kubeai

AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.

kubernetes ai k8s whisper autoscaler openai-api llm vllm faster-whisper ollama vllm-operator ollama-operator inference-operator

Updated Dec 15, 2025
Go

prometheus-eval / prometheus-eval

Evaluate your LLM's response with Prometheus and GPT4 💯

python evaluation gpt4 llm llmops vllm litellm llm-as-a-judge llm-as-evaluator

Updated Apr 25, 2025
Python
