Ollama Bench

A comprehensive benchmarking and optimization tool for Ollama models with both Terminal User Interface (TUI) and Command-Line Interface (CLI) modes.

Author

Jeremiah Pegues <jeremiah@pegues.io>

Features

🚀 Core Features

  • Dual Interface: Full-featured TUI with real-time graphs or simple CLI mode
  • Model Benchmarking: Compare performance across multiple Ollama models
  • System Optimization: Automatically tune models for your hardware
  • Resource Monitoring: Real-time CPU, GPU, and RAM usage tracking
  • Batch Processing: Optimize multiple models in parallel
  • Export Results: Save benchmark data in CSV format

🎯 Version 2.0 Features

  • Modelfile Optimization: Generate optimized configurations based on system specs
  • Batch Model Optimization: Optimize all models with a single command
  • Hardware Detection: Automatic detection of CPU, RAM, and GPU capabilities (see the sketch below)
  • Platform-Specific Tuning: Special optimizations for Apple Silicon
  • Performance Profiling: Detailed metrics including tokens/sec and memory usage
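
As a rough sketch of what the hardware-detection step involves, the snippet below gathers comparable specs using the psutil and py-cpuinfo dependencies listed under Installation; the function name and fields are illustrative, not the tool's actual API:

import psutil
import cpuinfo  # provided by the py-cpuinfo package

def detect_hardware():
    """Collect the basic specs that drive parameter tuning."""
    vm = psutil.virtual_memory()
    return {
        "cpu": cpuinfo.get_cpu_info().get("brand_raw", "unknown"),
        "physical_cores": psutil.cpu_count(logical=False),
        "ram_total_gb": round(vm.total / 1024**3, 1),
        "ram_available_gb": round(vm.available / 1024**3, 1),
    }

print(detect_hardware())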

Installation

From Source

git clone https://github.com/peguesj/ollama-bench.git
cd ollama-bench
pip install -e .

Dependencies

pip install psutil pynvml py-cpuinfo

Quick Start

TUI Mode (Interactive)

ollama-bench
# or
python -m ollama_bench

CLI Mode (Non-Interactive)

ollama-bench --cli

Direct Scripts

# Optimize a single model
python optimize_model.py llama2

# Optimize all models
python optimize_all.py --parallel

# Clean up optimized models
python optimize_all.py --cleanup

Usage Guide

TUI Interface

The TUI provides a rich interactive experience:

┌────────────────────────────────────────────────────────────┐
│                    Ollama Bench v2.0.0                     │
├─────────────────┬──────────────────────────────────────────┤
│  === Models === │      Benchmark Results                   │
│  qwen2.5-coder  │  Model: qwen2.5-coder                    │
│  llama2:7b      │  Tokens/sec: 42.3                        │
│  codellama:34b  │  Time: 1.2s                              │
│                 │  Peak RAM: 7.2 GB                        │
│  === Actions ===│                                          │
│> Run Benchmark  │  ┌─Performance Graph──────┐              │
│  Configuration  │  │ ████████████████       │              │
│  Optimize Model │  │ CPU: 45%  GPU: 80%     │              │
│  Export Results │  └────────────────────────┘              │
├─────────────────┴──────────────────────────────────────────┤
│ [Up/Down] Navigate  [Enter] Select  [O] Optimize  [Q] Quit │
├────────────────────────────────────────────────────────────┤
│ Ready                                    CPU: 12% RAM: 8GB │
└────────────────────────────────────────────────────────────┘

Keyboard Shortcuts

  • Arrow Keys: Navigate menu
  • Enter: Select menu item
  • Space: Start/stop benchmark
  • O: Optimize selected model (or all if none selected)
  • E: Edit configuration
  • M: Edit Modelfile
  • X: Export results
  • Q: Quit

CLI Interface

The CLI provides a simple menu-driven interface:

$ ollama-bench --cli

============================================================
Ollama Bench CLI - Benchmarking Tool
============================================================

=== Main Menu ===
1. Run Benchmark
2. List Models
3. Show Configuration
4. Export Results
5. Optimize Single Model
6. Optimize All Models
7. Show System Info
8. Clean Optimized Models
Q. Quit

Enter choice: 

Model Optimization

System Analysis

$ ollama-bench --cli
# Select option 7

System Specifications
============================================================
Platform: Darwin (Apple Silicon)
CPU: 12 cores @ 3.2 GHz
RAM: 48.0 GB total, 32.0 GB available
GPU: Apple Silicon GPU (36.0 GB)

Optimal Parameters
============================================================
Context Size: 4096 tokens
Batch Size: 512
Threads: 11
GPU Layers: 999
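
As an illustration of how these values follow from the detected hardware, here is a minimal Python sketch; the thresholds are assumptions, and the tool's actual heuristics live in system_optimizer.py and may differ:

def optimal_params(physical_cores, ram_available_gb, has_gpu):
    """Rough, illustrative mapping from system specs to Ollama parameters."""
    return {
        "num_ctx": 4096 if ram_available_gb >= 16 else 2048,  # context window
        "num_batch": 512,                                     # batch size
        "num_thread": max(1, physical_cores - 1),             # leave one core for the OS
        "num_gpu": 999 if has_gpu else 0,                     # 999 = offload all layers
    }

# 12 cores, 32 GB available, GPU present -> matches the output above
print(optimal_params(12, 32.0, True))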

Batch Optimization

# Optimize all models with parallel processing
$ python optimize_all.py --parallel --workers 4

# Optimize specific models
$ python optimize_all.py llama2:7b codellama:13b

# Generate benchmark comparison script
$ python optimize_all.py --benchmark

# Clean up when done
$ python optimize_all.py --cleanup

Optimization Parameters

The optimizer automatically configures:

Parameter    Description             Impact
num_ctx      Context window size     Larger = better comprehension
num_batch    Batch processing size   Larger = higher throughput
num_gpu      GPU layers to offload   999 = full GPU acceleration
num_thread   CPU threads             Optimized for core count
use_mlock    Memory locking          Prevents swapping
use_mmap     Memory mapping          Efficient for large models
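
For illustration, an optimized Modelfile for the Apple Silicon system shown earlier might look like the following sketch; actual values depend on your hardware, and parameter availability varies across Ollama versions:

FROM llama2:7b
PARAMETER num_ctx 4096
PARAMETER num_batch 512
PARAMETER num_gpu 999
PARAMETER num_thread 11
PARAMETER use_mlock true
PARAMETER use_mmap true

A file like this can be registered as a new model with:

ollama create llama2-optimized -f Modelfile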

Model Recommendations by RAM

Available RAM   Model Size   Example Models
< 8 GB          3B-7B        qwen2.5:3b, tinyllama
8-16 GB         7B           llama2:7b, mistral:7b
16-32 GB        13B          llama2:13b, codellama:13b
32-64 GB        34B          codellama:34b
> 64 GB         70B+         llama2:70b, mixtral:8x7b

Benchmark Results

Results are saved in CSV format with detailed metrics:

model,iteration,elapsed_s,tokens_per_sec,peak_rss_bytes,cpu_percent,gpu_percent
qwen2.5-coder,1,1.234,42.3,7516192768,45.2,78.9
llama2:7b,1,2.456,38.1,8589934592,52.1,82.3
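
For example, per-model averages can be computed from a results file with the Python standard library alone (the filename is illustrative):

import csv
from collections import defaultdict
from statistics import mean

rates = defaultdict(list)
with open("benchmark_results.csv", newline="") as f:  # illustrative filename
    for row in csv.DictReader(f):
        rates[row["model"]].append(float(row["tokens_per_sec"]))

for model, values in sorted(rates.items()):
    print(f"{model}: {mean(values):.1f} tokens/sec over {len(values)} run(s)")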

Configuration

Configuration is stored in ~/.config/ollama_bench/config.yaml:

benchmark:
  iterations: 3
  timeout: 120
  num_predict: 100
  temperature: 0.7
  seed: 42
  workdir: ~/.ollama_bench

resources:
  max_cpu_percent: 80
  max_gpu_percent: 90
  max_ram_gb: null
  throttle_enabled: false

ui:
  theme: default
  refresh_rate: 0.5
  show_graph: true
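
If you edit the file by hand, you can quickly confirm that it still parses (assuming PyYAML is installed in your environment):

python -c "import pathlib, yaml; yaml.safe_load(pathlib.Path('~/.config/ollama_bench/config.yaml').expanduser().read_text()); print('config OK')"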

Performance Improvements

Typical optimization results:

  • Speed: 20-70% faster token generation
  • Memory: 10-20% lower RAM usage
  • Stability: Reduced out-of-memory errors
  • Efficiency: Better CPU/GPU utilization

Development

Project Structure

ollama-bench/
├── ollama_bench/           # Main package
│   ├── core/              # Core functionality
│   │   ├── benchmark.py   # Benchmarking engine
│   │   ├── models.py      # Model management
│   │   ├── monitor.py     # Resource monitoring
│   │   ├── config.py      # Configuration
│   │   ├── system_optimizer.py  # Hardware optimization
│   │   └── batch_optimizer.py   # Batch processing
│   ├── tui/               # Terminal UI
│   │   ├── app.py        # Main TUI application
│   │   ├── components/   # UI components
│   │   └── widgets/      # Interactive widgets
│   ├── cli.py            # CLI interface
│   └── utils/            # Utilities
├── optimize_model.py      # Single model optimizer
├── optimize_all.py        # Batch optimizer
└── setup.py              # Package setup

Testing

# Run tests
python test_optimization.py

# Test TUI import
python -c "from ollama_bench.tui import OllamaBenchTUI"

# Test CLI
python -m ollama_bench.cli

Troubleshooting

Terminal Issues

  • If you see Unicode errors, the tool automatically falls back to ASCII
  • For best results, use a terminal that supports UTF-8

GPU Detection

  • NVIDIA: Requires the pynvml package (NVIDIA's NVML bindings, listed under Dependencies above; see the sketch below)
  • Apple Silicon: Automatic Metal acceleration
  • No GPU: Falls back to CPU-only optimization
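
For NVIDIA systems, detection boils down to querying NVML; here is a minimal sketch using pynvml, with error handling that mirrors the CPU-only fallback:

import pynvml

try:
    pynvml.nvmlInit()
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older pynvml versions return bytes
            name = name.decode()
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"{name}: {mem.total / 1024**3:.1f} GB VRAM")
    pynvml.nvmlShutdown()
except pynvml.NVMLError:
    print("No NVIDIA GPU detected; falling back to CPU-only optimization")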

Memory Issues

  • Reduce num_ctx for lower memory usage
  • Enable low_vram mode for limited GPU memory
  • Use quantized models such as q4_0 or q4_K_M (see the sketch below)
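
Putting the first and third tips together, a low-memory Modelfile sketch might look like this; the quantized tag and model name are illustrative, so check ollama list for what you have pulled:

FROM llama2:7b-chat-q4_0
PARAMETER num_ctx 2048

ollama create llama2-lowmem -f Modelfile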

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Run tests and benchmarks
  4. Submit a pull request

License

MIT License - see LICENSE file

Acknowledgments

  • Ollama team for the excellent local LLM platform
  • Python curses library for terminal UI capabilities
  • psutil for cross-platform system monitoring

Changelog

Version 2.0.0 (2024)

  • Added Modelfile optimization based on system specs
  • Implemented batch model optimization
  • Added hardware detection and profiling
  • Improved TUI with optimization features
  • Added parallel processing support
  • Fixed terminal compatibility issues

Version 1.0.0 (2024)

  • Initial release with TUI and CLI interfaces
  • Basic benchmarking functionality
  • Resource monitoring
  • Model management

Contact

Author: Jeremiah Pegues
Email: jeremiah@pegues.io
GitHub: github.com/peguesj


Built with ❤️ for the Ollama community
