A comprehensive benchmarking and optimization tool for Ollama models with both Terminal User Interface (TUI) and Command-Line Interface (CLI) modes.
Jeremiah Pegues jeremiah@pegues.io
- Dual Interface: Full-featured TUI with real-time graphs or simple CLI mode
- Model Benchmarking: Compare performance across multiple Ollama models
- System Optimization: Automatically tune models for your hardware
- Resource Monitoring: Real-time CPU, GPU, and RAM usage tracking
- Batch Processing: Optimize multiple models in parallel
- Export Results: Save benchmark data in CSV format
- Modelfile Optimization: Generate optimized configurations based on system specs
- Batch Model Optimization: Optimize all models with a single command
- Hardware Detection: Automatic detection of CPU, RAM, and GPU capabilities
- Platform-Specific Tuning: Special optimizations for Apple Silicon
- Performance Profiling: Detailed metrics including tokens/sec and memory usage
```bash
git clone https://github.com/peguesj/ollama-bench.git
cd ollama-bench
pip install -e .
```

Or install the dependencies directly:

```bash
pip install psutil pynvml py-cpuinfo
```

Launch the TUI:

```bash
ollama-bench
# or
python -m ollama_bench
```

Run in CLI mode:

```bash
ollama-bench --cli
```

Optimize models from the command line:

```bash
# Optimize a single model
python optimize_model.py llama2

# Optimize all models
python optimize_all.py --parallel

# Clean up optimized models
python optimize_all.py --cleanup
```

The TUI provides a rich interactive experience:

```
┌─────────────────────────────────────────────────────────┐
│ Ollama Bench v2.0.0 │
├─────────────────┬───────────────────────────────────────┤
│ === Models === │ Benchmark Results │
│ qwen2.5-coder │ Model: qwen2.5-coder │
│ llama2:7b │ Tokens/sec: 42.3 │
│ codellama:34b │ Time: 1.2s │
│ │ Peak RAM: 7.2 GB │
│ === Actions ===│ │
│> Run Benchmark │ ┌─Performance Graph──────┐ │
│ Configuration │ │ ████████████████ │ │
│ Optimize Model │ │ CPU: 45% GPU: 80% │ │
│ Export Results │ └───────────────────────┘ │
├─────────────────┴───────────────────────────────────────┤
│ [Up/Down] Navigate [Enter] Select [O] Optimize [Q] Quit │
├─────────────────────────────────────────────────────────┤
│ Ready CPU: 12% RAM: 8GB │
└─────────────────────────────────────────────────────────┘
```

Keyboard shortcuts:

- Arrow Keys: Navigate menu
- Enter: Select menu item
- Space: Start/stop benchmark
- O: Optimize selected model (or all if none selected)
- E: Edit configuration
- M: Edit Modelfile
- X: Export results
- Q: Quit
The CLI provides a simple menu-driven interface:
```
$ ollama-bench --cli

============================================================
          Ollama Bench CLI - Benchmarking Tool
============================================================

=== Main Menu ===
1. Run Benchmark
2. List Models
3. Show Configuration
4. Export Results
5. Optimize Single Model
6. Optimize All Models
7. Show System Info
8. Clean Optimized Models
Q. Quit

Enter choice:
```

Option 7 (Show System Info) reports the detected hardware and the parameters the optimizer will apply:

```
$ ollama-bench --cli
# Select option 7

System Specifications
============================================================
Platform: Darwin (Apple Silicon)
CPU: 12 cores @ 3.2 GHz
RAM: 48.0 GB total, 32.0 GB available
GPU: Apple Silicon GPU (36.0 GB)
Optimal Parameters
============================================================
Context Size: 4096 tokens
Batch Size: 512
Threads: 11
GPU Layers: 999
```
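These values come from simple hardware heuristics: roughly one thread per physical core minus one, a context and batch size scaled to available RAM, and full GPU offload whenever a GPU is present. As a rough illustration only, not the exact logic in `system_optimizer.py`, the selection could look like this sketch built on psutil:

```python
# Illustrative sketch: derive Ollama runtime parameters from detected hardware.
# The real optimizer may use different thresholds and additional checks.
import platform
import psutil

def pick_parameters() -> dict:
    cores = psutil.cpu_count(logical=False) or psutil.cpu_count() or 1
    ram_gb = psutil.virtual_memory().total / 1024**3
    apple_silicon = platform.system() == "Darwin" and platform.machine() == "arm64"

    return {
        # Leave one core free for the OS and the resource monitor.
        "num_thread": max(1, cores - 1),
        # Larger contexts need more RAM, so scale with total memory.
        "num_ctx": 8192 if ram_gb >= 64 else 4096 if ram_gb >= 16 else 2048,
        # Bigger batches raise throughput when memory allows it.
        "num_batch": 512 if ram_gb >= 16 else 256,
        # 999 asks Ollama to offload every layer it can; NVIDIA detection
        # (via pynvml) would set this the same way.
        "num_gpu": 999 if apple_silicon else 0,
    }

if __name__ == "__main__":
    print(pick_parameters())
```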
```bash
# Optimize all models with parallel processing
$ python optimize_all.py --parallel --workers 4
# Optimize specific models
$ python optimize_all.py llama2:7b codellama:13b
# Generate benchmark comparison script
$ python optimize_all.py --benchmark
# Clean up when done
$ python optimize_all.py --cleanup
```

The optimizer automatically configures:
| Parameter | Description | Impact |
|---|---|---|
| `num_ctx` | Context window size | Larger = better comprehension |
| `num_batch` | Batch processing size | Larger = higher throughput |
| `num_gpu` | GPU layers to offload | 999 = full GPU acceleration |
| `num_thread` | CPU threads | Optimized for core count |
| `use_mlock` | Memory locking | Prevents swapping |
| `use_mmap` | Memory mapping | Efficient for large models |
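Under the hood this amounts to writing an adjusted Modelfile and registering an optimized copy of the model with Ollama. The sketch below shows the idea; the `-optimized` naming, the helper name, and the exact parameter values are illustrative assumptions, not necessarily what `optimize_model.py` produces:

```python
# Illustrative sketch: bake tuned parameters into a Modelfile and register
# the result with `ollama create`. Names and values are only examples.
import subprocess
from pathlib import Path

def create_optimized_model(base: str, params: dict, workdir: str = ".") -> str:
    lines = [f"FROM {base}"]
    lines += [f"PARAMETER {key} {value}" for key, value in params.items()]

    modelfile = Path(workdir) / f"Modelfile.{base.replace(':', '-')}"
    modelfile.write_text("\n".join(lines) + "\n")

    optimized = f"{base}-optimized"
    subprocess.run(["ollama", "create", optimized, "-f", str(modelfile)], check=True)
    return optimized

# Example: apply the parameters shown above to llama2:7b.
create_optimized_model("llama2:7b", {
    "num_ctx": 4096,
    "num_batch": 512,
    "num_gpu": 999,
    "num_thread": 11,
    "use_mlock": "true",
    "use_mmap": "true",
})
```

Recommended model sizes by available RAM: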
| Available RAM | Model Size | Example Models |
|---|---|---|
| < 8 GB | 3B-7B | qwen2.5:3b, tinyllama |
| 8-16 GB | 7B | llama2:7b, mistral:7b |
| 16-32 GB | 13B | llama2:13b, codellama:13b |
| 32-64 GB | 34B | codellama:34b |
| > 64 GB | 70B+ | llama2:70b, mixtral:8x7b |
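If you want the same recommendation in a script, a tiny sketch with thresholds copied from the table (the function name is illustrative):

```python
# Map available RAM to the model-size tiers from the table above.
import psutil

def recommend_model_size(available_gb: float) -> str:
    if available_gb < 8:
        return "3B-7B (e.g. qwen2.5:3b, tinyllama)"
    if available_gb < 16:
        return "7B (e.g. llama2:7b, mistral:7b)"
    if available_gb < 32:
        return "13B (e.g. llama2:13b, codellama:13b)"
    if available_gb < 64:
        return "34B (e.g. codellama:34b)"
    return "70B+ (e.g. llama2:70b, mixtral:8x7b)"

print(recommend_model_size(psutil.virtual_memory().available / 1024**3))
```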
Results are saved in CSV format with detailed metrics:
```csv
model,iteration,elapsed_s,tokens_per_sec,peak_rss_bytes,cpu_percent,gpu_percent
qwen2.5-coder,1,1.234,42.3,7516192768,45.2,78.9
llama2:7b,1,2.456,38.1,8589934592,52.1,82.3
```
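A few lines of Python are enough to compare models from an export; the filename below is an assumption, so substitute whatever path you exported to:

```python
# Summarize an exported benchmark CSV: average tokens/sec and peak RAM per model.
import csv
from collections import defaultdict

runs_by_model = defaultdict(list)
with open("benchmark_results.csv", newline="") as f:  # assumed export filename
    for row in csv.DictReader(f):
        runs_by_model[row["model"]].append(row)

for model, runs in runs_by_model.items():
    avg_tps = sum(float(r["tokens_per_sec"]) for r in runs) / len(runs)
    peak_gb = max(int(r["peak_rss_bytes"]) for r in runs) / 1024**3
    print(f"{model}: {avg_tps:.1f} tok/s avg, {peak_gb:.1f} GB peak RSS")
```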
Configuration is stored in ~/.config/ollama_bench/config.yaml:
```yaml
benchmark:
  iterations: 3
  timeout: 120
  num_predict: 100
  temperature: 0.7
  seed: 42
  workdir: ~/.ollama_bench
resources:
  max_cpu_percent: 80
  max_gpu_percent: 90
  max_ram_gb: null
  throttle_enabled: false
ui:
  theme: default
  refresh_rate: 0.5
  show_graph: true
```
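Because it is plain YAML, the file can also be read or adjusted from a script; a minimal sketch assuming PyYAML is available:

```python
# Read and tweak the ollama-bench configuration file (plain YAML).
from pathlib import Path
import yaml

config_path = Path.home() / ".config" / "ollama_bench" / "config.yaml"
config = yaml.safe_load(config_path.read_text())

config["benchmark"]["iterations"] = 5           # run each model five times
config["resources"]["throttle_enabled"] = True  # back off when the system is busy

config_path.write_text(yaml.safe_dump(config, sort_keys=False))
```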
Typical optimization results:

- Speed: 20-70% faster token generation
- Memory: 10-20% lower RAM usage
- Stability: Reduced out-of-memory errors
- Efficiency: Better CPU/GPU utilization
```
ollama-bench/
├── ollama_bench/                # Main package
│   ├── core/                    # Core functionality
│   │   ├── benchmark.py         # Benchmarking engine
│   │   ├── models.py            # Model management
│   │   ├── monitor.py           # Resource monitoring
│   │   ├── config.py            # Configuration
│   │   ├── system_optimizer.py  # Hardware optimization
│   │   └── batch_optimizer.py   # Batch processing
│   ├── tui/                     # Terminal UI
│   │   ├── app.py               # Main TUI application
│   │   ├── components/          # UI components
│   │   └── widgets/             # Interactive widgets
│   ├── cli.py                   # CLI interface
│   └── utils/                   # Utilities
├── optimize_model.py            # Single model optimizer
├── optimize_all.py              # Batch optimizer
└── setup.py                     # Package setup
```
```bash
# Run tests
python test_optimization.py

# Test TUI import
python -c "from ollama_bench.tui import OllamaBenchTUI"

# Test CLI
python -m ollama_bench.cli
```

- If you see Unicode errors, the tool automatically falls back to ASCII
- For best results, use a terminal that supports UTF-8
- NVIDIA: Requires nvidia-ml-py (see the detection sketch after this list)
- Apple Silicon: Automatic Metal acceleration
- No GPU: Falls back to CPU-only optimization
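GPU detection with these libraries is straightforward; here is a minimal sketch that mirrors the fallback order above (NVIDIA via pynvml, then Apple Silicon, then CPU-only), not necessarily the tool's exact code:

```python
# Detect which GPU path applies: NVIDIA (via pynvml), Apple Silicon, or CPU-only.
import platform

def detect_gpu() -> str:
    try:
        import pynvml  # provided by the nvidia-ml-py / pynvml package
        pynvml.nvmlInit()
        if pynvml.nvmlDeviceGetCount() > 0:
            handle = pynvml.nvmlDeviceGetHandleByIndex(0)
            vram_gb = pynvml.nvmlDeviceGetMemoryInfo(handle).total / 1024**3
            return f"NVIDIA GPU ({vram_gb:.0f} GB VRAM)"
    except Exception:
        pass  # pynvml missing or no NVIDIA driver: fall through

    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "Apple Silicon GPU (Metal)"
    return "No GPU: CPU-only optimization"

print(detect_gpu())
```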
- Reduce `num_ctx` for lower memory usage (see the sketch after this list)
- Enable `low_vram` mode for limited GPU memory
- Use quantized models (q4_0, q4_K_M)
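One way to try a smaller context without editing a Modelfile is to pass it per request through Ollama's local HTTP API (standard library only; assumes Ollama is running on its default port):

```python
# Request a reduced context window at generation time via the Ollama HTTP API.
import json
import urllib.request

payload = {
    "model": "llama2:7b",                  # any locally available model
    "prompt": "Say hello in one sentence.",
    "stream": False,
    "options": {"num_ctx": 2048},          # smaller context to lower memory use
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```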
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Run tests and benchmarks
- Submit a pull request
MIT License - see LICENSE file
- Ollama team for the excellent local LLM platform
- Python curses library for terminal UI capabilities
- psutil for cross-platform system monitoring
- Added Modelfile optimization based on system specs
- Implemented batch model optimization
- Added hardware detection and profiling
- Improved TUI with optimization features
- Added parallel processing support
- Fixed terminal compatibility issues
- Initial release with TUI and CLI interfaces
- Basic benchmarking functionality
- Resource monitoring
- Model management
Author: Jeremiah Pegues
Email: jeremiah@pegues.io
GitHub: github.com/peguesj
Built with ❤️ for the Ollama community