
vLLM-Tuner


An intelligent tuner for vLLM that automatically monitors GPU metrics and uses Bayesian optimization to tune parameters (batch_size, max_num_batched_tokens, max_num_seqs, gpu_memory_utilization), maximizing throughput while minimizing latency and balancing memory use, all subject to user-provided constraints.

Features

  • Intelligent Profiling: Monitor GPU memory, utilization, and vLLM metrics automatically
  • Adaptive Parameter Search: Bayesian optimization (Optuna) with multi-objective support (throughput, latency, memory)
  • vLLM-Aware Integration: Parse vLLM logs for KV cache utilization, preemption tracking, and guidance
  • Multi-GPU Support: Handle data-parallel and model-parallel (tensor/pipeline) configurations
  • User-Friendly Configuration: Simple YAML configs to specify objectives and constraints
  • Rich Reporting: Plotly interactive HTML reports with trial progression, Pareto front, and GPU telemetry
  • Extensibility: Custom workloads and plugins for specific deployment scenarios
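The adaptive search and Pareto-front reporting above can be illustrated with a stdlib-only sketch (the `benchmark` function is a stub standing in for a real vLLM benchmark run, and the metric formulas are invented for illustration):

```python
import random

def benchmark(max_num_seqs, gpu_mem):
    """Stub for a real vLLM benchmark; returns (throughput, latency).

    The formulas below are fake: they just encode a plausible trade-off
    where larger batches raise both throughput and latency.
    """
    throughput = max_num_seqs * gpu_mem
    latency = 0.5 + max_num_seqs / 512
    return throughput, latency

def pareto_front(trials):
    """Keep trials not dominated on (maximize throughput, minimize latency)."""
    front = []
    for t in trials:
        dominated = any(
            o is not t
            and o["throughput"] >= t["throughput"]
            and o["latency"] <= t["latency"]
            for o in trials
        )
        if not dominated:
            front.append(t)
    return front

random.seed(0)
trials = []
for _ in range(20):
    seqs = random.randint(16, 256)
    mem = random.uniform(0.6, 0.99)
    tp, lat = benchmark(seqs, mem)
    trials.append({"max_num_seqs": seqs, "gpu_memory_utilization": mem,
                   "throughput": tp, "latency": lat})

print(f"{len(pareto_front(trials))} Pareto-optimal trials out of {len(trials)}")
```

The actual tool delegates this search to Optuna's samplers rather than random sampling; the sketch only shows the shape of the multi-objective trade-off.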

Installation

# Create and activate uv environment
uv venv --seed --python 3.10
source .venv/bin/activate

# Install vllm-tuner
uv pip install git+https://github.com/jranaraki/vllm-tuner

# Install vLLM
uv pip install vllm --torch-backend=auto

Configuration

Configuration is done via a YAML file (see default.yaml). The key settings are:

Multi-Objective Weights (must sum to 100)

objectives:
  throughput: 60  # Weight for throughput maximization
  latency: 30     # Weight for latency minimization
  memory: 10      # Weight for memory efficiency
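One way such percentage weights can be combined into a single score is a normalized weighted sum; the sketch below is an assumption about the scoring scheme, not the tool's actual formula, and all bounds are illustrative:

```python
def weighted_score(throughput, latency, memory_used, weights,
                   max_throughput, max_latency, total_memory):
    """Scalarize three objectives with percentage weights summing to 100.

    Throughput is maximized; latency and memory use are minimized, so
    their normalized values are inverted before weighting.
    """
    assert sum(weights.values()) == 100
    tp_norm = throughput / max_throughput
    lat_norm = 1 - latency / max_latency
    mem_norm = 1 - memory_used / total_memory
    return (weights["throughput"] * tp_norm
            + weights["latency"] * lat_norm
            + weights["memory"] * mem_norm) / 100

score = weighted_score(throughput=900, latency=0.2, memory_used=60,
                       weights={"throughput": 60, "latency": 30, "memory": 10},
                       max_throughput=1000, max_latency=1.0, total_memory=80)
print(round(score, 3))  # 0.805
```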

Search Space

search_space:
  batch_size: [1, 256]  # Range or override defaults
  gpu_memory_utilization: [0.6, 0.99]
  tensor_parallel_size: [1, 2, 4]
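The search-space entries above mix continuous ranges and discrete choice lists. A sketch of how a tuner might interpret them (purely illustrative, not the tool's actual parser; note the heuristic would misread a two-element choice list as a range):

```python
import random

def sample(search_space, rng=random):
    """Draw one configuration from a YAML-style search space.

    Two-element [low, high] lists of a uniform numeric type are treated
    as ranges; anything else is a categorical choice list.
    """
    config = {}
    for name, spec in search_space.items():
        if len(spec) == 2 and all(isinstance(v, int) for v in spec):
            config[name] = rng.randint(spec[0], spec[1])      # integer range
        elif len(spec) == 2 and all(isinstance(v, float) for v in spec):
            config[name] = rng.uniform(spec[0], spec[1])      # float range
        else:
            config[name] = rng.choice(spec)                   # categorical

    return config

space = {
    "batch_size": [1, 256],
    "gpu_memory_utilization": [0.6, 0.99],
    "tensor_parallel_size": [1, 2, 4],
}
print(sample(space))
```

Optuna's samplers would replace the random draws here with model-guided suggestions, but the range-vs-choice distinction is the same.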

Workload

workload:
  dataset_name: "tatsu-lab/alpaca"  # HF dataset
  sample_size: 100                  # Number of prompts
  concurrent_requests: 10           # Concurrent clients
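These workload settings map naturally onto a bounded concurrent client driver. A stdlib-only sketch, where `send_request` is a stub standing in for an actual call to a running vLLM server:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def send_request(prompt):
    """Stub: a real driver would send the prompt to the vLLM server."""
    start = time.perf_counter()
    time.sleep(0.001)  # pretend the server generated a completion
    return time.perf_counter() - start

def run_workload(prompts, concurrent_requests):
    """Issue prompts through a bounded pool; collect per-request latencies."""
    with ThreadPoolExecutor(max_workers=concurrent_requests) as pool:
        return list(pool.map(send_request, prompts))

# sample_size=100 prompts, concurrent_requests=10 clients, as in the config above
latencies = run_workload([f"prompt {i}" for i in range(100)],
                         concurrent_requests=10)
print(f"p50 latency: {sorted(latencies)[len(latencies) // 2]:.4f}s")
```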

Store the config file under the configs folder.

Run

Basic Tuning

# Run tuning study
vllm-tuner tune --config configs/default.yaml --study-name my_study

Output Structure

Studies are saved under studies/<study_name>/ and reports under reports/<study_name>/:

├── configs
│   └── default.yaml                    # vLLM-Tuner config
├── reports
│   └── my_study
│       └── report.html                 # Interactive Plotly report
└── studies
    └── my_study
        ├── baseline                    # Baseline metrics (if enabled)
        │   ├── baseline_config.yaml
        │   ├── baseline_metrics.json
        │   ├── baseline_summary.txt
        │   └── logs
        │       └── vllm_baseline.log
        ├── configs                     # Summary & best configs
        │   ├── best_config.json
        │   ├── best_config.yaml
        │   ├── summary.json
        │   └── trials.json
        ├── logs                        # vLLM server logs
        │   ├── vllm_trial_0.log
        │   ├── vllm_trial_1.log
        │   ├── vllm_trial_2.log
        │   ├── vllm_trial_3.log
        │   ├── vllm_trial_4.log
        │   └── ...
        └── optuna.db                   # SQLite study database
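The artifacts under studies/<study_name>/configs are plain JSON, so they can be inspected programmatically. A sketch assuming a simple trials.json layout (the actual schema may differ; the records below are invented for illustration):

```python
import json

# Hypothetical trials.json contents; the real schema may differ.
trials_json = json.dumps([
    {"trial": 0, "params": {"max_num_seqs": 64},  "throughput": 812.5},
    {"trial": 1, "params": {"max_num_seqs": 128}, "throughput": 1024.0},
    {"trial": 2, "params": {"max_num_seqs": 256}, "throughput": 976.3},
])

# Pick the trial with the highest throughput.
trials = json.loads(trials_json)
best = max(trials, key=lambda t: t["throughput"])
print(best["trial"], best["params"])  # 1 {'max_num_seqs': 128}
```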

An example of the final interactive report is shown in report_screenshot.png in the repository.

Documentation

For detailed information, see the comprehensive documentation.

Citing

If you find vllm-tuner useful and would like to cite this work, please use the following BibTeX entry:

@software{vllmtuner2026,
  author = {Javad Anaraki},
  title = {vllm-tuner: Automated Parameter Tuning for vLLM via Bayesian Optimization},
  url = {https://github.com/jranaraki/vllm-tuner},
  version = {0.1.0},
  year = {2026},
}

Acknowledgments
