A monorepo containing multiple vLLM plugins for extending vLLM's functionality without maintaining a fork.
```
vllm-plugins/
├── plugins/                      # Plugin packages
│   ├── vllm-plugin-example/      # Template plugin with best practices
│   ├── vllm-custom-models/       # Custom model architectures
│   ├── vllm-patches/             # Surgical patches for vLLM classes
│   ├── vllm-custom-loggers/      # Custom stat loggers for metrics
│   ├── vllm-entropy-decoder/     # Adaptive entropy-based decoding
│   └── vllm-cot-decoder/         # Confidence-weighted CoT decoding
├── shared/                       # Shared utilities
│   └── vllm_plugin_utils/        # Version checking, common helpers
├── scripts/                      # Development scripts
│   ├── build_all.sh              # Build all plugins
│   ├── test_all.sh               # Run all tests
│   ├── install_dev.sh            # Install all in dev mode
│   └── lint.sh                   # Lint all code
└── docs/                         # Documentation
    └── plugin-development.md     # Comprehensive development guide
```
Install all plugins in development mode:

```bash
./scripts/install_dev.sh
```

Or install a single plugin:

```bash
cd plugins/vllm-custom-models
pip install -e ".[dev]"
```

Run all tests:

```bash
./scripts/test_all.sh
```

| Plugin | Entry Point Group | Description |
|---|---|---|
| `vllm-plugin-example` | `vllm.general_plugins` | Template demonstrating best practices |
| `vllm-custom-models` | `vllm.general_plugins` | Register custom model architectures |
| `vllm-patches` | `vllm.general_plugins` | Apply surgical patches to vLLM classes |
| `vllm-custom-loggers` | `vllm.stat_logger_plugins` | Custom metrics and logging backends |
| `vllm-entropy-decoder` | `vllm.logits_processors` | Adaptive entropy-based decoding strategy |
| `vllm-cot-decoder` | `vllm.logits_processors` | Confidence-weighted Chain-of-Thought decoding |
vLLM supports several plugin entry point groups:
- `vllm.general_plugins`: General extensions, custom models, patches
- `vllm.platform_plugins`: Hardware backend integrations
- `vllm.stat_logger_plugins`: Custom metrics/logging
- `vllm.logits_processors`: Custom decoding strategies and logits manipulation (applies globally)
- `vllm.io_processor_plugins`: Input/output processing
Note on logits processors: These plugins apply to ALL requests when installed. vLLM v1 does not support per-request selection. Deploy one strategy per server instance.
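For context, here is a hedged sketch of how a general plugin hooks in: the package exposes a callable through the `vllm.general_plugins` entry point group, and vLLM discovers and calls it at startup. The module and function names below (`vllm_my_plugin`, `register`) are placeholders, not part of this repo.

```python
# vllm_my_plugin/__init__.py (placeholder name)
#
# Declared in pyproject.toml, e.g.:
#   [project.entry-points."vllm.general_plugins"]
#   my_plugin = "vllm_my_plugin:register"

_REGISTERED = False


def register() -> None:
    """Called by vLLM at startup; must be safe to call more than once."""
    global _REGISTERED
    if _REGISTERED:  # re-entrancy guard
        return
    _REGISTERED = True
    # Perform registrations here: register models, apply patches, etc.
```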
After installing `vllm-custom-models`, your custom architectures are automatically available:

```python
from vllm import LLM

# Load a custom model (architecture registered by plugin)
llm = LLM(model="path/to/custom/model")
```
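Under the hood, a model plugin like this typically registers its architectures with vLLM's `ModelRegistry` from its entry-point callable. The architecture name and module path below are hypothetical stand-ins, not the actual contents of vllm-custom-models:

```python
from vllm import ModelRegistry


def register() -> None:
    # Map an architecture name (as it appears in the model's config.json)
    # to an implementation class, referenced lazily by module path.
    ModelRegistry.register_model(
        "MyModelForCausalLM",                               # hypothetical
        "vllm_custom_models.my_model:MyModelForCausalLM",   # hypothetical
    )
```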
Control which patches apply via environment variable:

```bash
# Enable specific patches
VLLM_CUSTOM_PATCHES=PrioritySchedulerPatch python app.py

# Enable all available patches
VLLM_CUSTOM_PATCHES=* python app.py
```
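For illustration, the patch plugin could honor `VLLM_CUSTOM_PATCHES` roughly as sketched below; the `AVAILABLE_PATCHES` mapping and `apply_patches` helper are assumed names, not the repo's actual internals:

```python
import os

# Hypothetical registry of patch names -> callables that apply them.
AVAILABLE_PATCHES = {
    "PrioritySchedulerPatch": lambda: None,  # placeholder implementation
}


def apply_patches() -> None:
    """Apply the patches selected via the VLLM_CUSTOM_PATCHES variable."""
    selected = os.environ.get("VLLM_CUSTOM_PATCHES", "").strip()
    if not selected:
        return
    if selected == "*":
        names = list(AVAILABLE_PATCHES)
    else:
        names = [n.strip() for n in selected.split(",") if n.strip()]
    for name in names:
        patch = AVAILABLE_PATCHES.get(name)
        if patch is None:
            raise ValueError(f"Unknown patch: {name}")
        patch()
```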
Important: Logits processor plugins apply globally to all requests when installed. vLLM v1 does not support per-request logits processor selection via the OpenAI API. Install only ONE decoding strategy plugin per deployment.

Use entropy-based or CoT-based decoding by installing the appropriate plugin:

```bash
# For entropy-based adaptive decoding
pip install -e plugins/vllm-entropy-decoder/

# OR for confidence-weighted CoT decoding (not both)
pip install -e plugins/vllm-cot-decoder/
```

Once installed, the decoder applies automatically to all requests:
```python
from vllm import LLM, SamplingParams

llm = LLM(model="your-model")

# The installed decoder is applied automatically - no need to specify it
sampling_params = SamplingParams(temperature=0.8)
output = llm.generate("Your prompt here", sampling_params)
```

To switch strategies, use different Docker images or reinstall with the desired plugin.
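To make the behaviour concrete, here is a minimal, self-contained illustration of the kind of adjustment an entropy-based decoder makes: sharpen the distribution when the model is confident and soften it when token-level entropy is high. This is an assumed illustration of the concept only, not the vllm-entropy-decoder implementation or the vLLM logits-processor interface:

```python
import torch


def entropy_scaled_logits(logits: torch.Tensor,
                          low: float = 0.7,
                          high: float = 1.3) -> torch.Tensor:
    """Rescale logits by a temperature derived from their entropy.

    Low entropy (confident prediction)  -> temperature near `low`.
    High entropy (uncertain prediction) -> temperature near `high`.
    """
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * torch.log(probs.clamp_min(1e-10))).sum(dim=-1)
    max_entropy = torch.log(torch.tensor(float(logits.shape[-1])))
    frac = (entropy / max_entropy).clamp(0.0, 1.0)  # 0 = confident, 1 = uncertain
    temperature = low + (high - low) * frac
    return logits / temperature.unsqueeze(-1)
```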
Load only specific plugins:
```bash
VLLM_PLUGINS=custom_models,vllm_patches python app.py
```
- Copy the example plugin as a template:

  ```bash
  cp -r plugins/vllm-plugin-example plugins/vllm-my-plugin
  ```

- Rename packages and update `pyproject.toml`
- Implement your plugin logic in `register.py`
- Add tests in `tests/` (see the example test sketched below)
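As an example of what a plugin test might cover, the sketch below imports a hypothetical `vllm_my_plugin` package created from the template and calls its entry-point function twice, mirroring the re-entrancy principle listed further down. All names are placeholders for your own package:

```python
# tests/test_register.py -- illustrative only
import vllm_my_plugin  # hypothetical package name


def test_register_is_reentrant():
    # Registration must be safe to call multiple times, so calling the
    # entry-point function twice should not raise or double-register.
    vllm_my_plugin.register()
    vllm_my_plugin.register()
```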
See docs/plugin-development.md for the comprehensive development guide.
- Re-entrancy: Plugin registration functions must be safe to call multiple times
- Version compatibility: Always check/declare vLLM version requirements (see the sketch after this list)
- Minimal changes: Patches should be surgical, not wholesale replacements
- Graceful degradation: Handle missing dependencies gracefully
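As an example of the version-compatibility and graceful-degradation principles, here is a standalone sketch; the shared `vllm_plugin_utils` package may provide its own helpers for this, and the minimum version shown is an assumption:

```python
from packaging.version import Version

_MIN_VLLM = Version("0.6.0")  # assumed minimum; declare your real requirement


def register() -> None:
    try:
        import vllm
    except ImportError:
        # Graceful degradation: the plugin is installed but vLLM is not.
        return

    if Version(vllm.__version__) < _MIN_VLLM:
        raise RuntimeError(
            f"This plugin requires vLLM >= {_MIN_VLLM}, found {vllm.__version__}"
        )

    # ... perform registrations ...
```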
- Plugin Development Guide - Comprehensive guide with best practices
- vLLM Plugin System - Official documentation
- vLLM Plugin Blog Post - Architecture deep-dive
Apache-2.0