lmgo is a suite of tools for running local LLMs with the llama.cpp server and ROCm GPU acceleration. It includes:
- lmgo: System tray application for Windows with model management
- lmc: Terminal-based control interface using BubbleTea
Both tools are specifically optimized for systems with AMD RYZEN AI MAX+ 395 / Radeon 8060S graphics.
This application only works on:
- Operating System: Windows 11
- Processor: AMD RYZEN AI MAX+ 395
- Graphics: Radeon 8060S
- Architecture: x86_64
The embedded llama-server is compiled specifically for ROCm GFX1151 architecture and will not work on other hardware configurations.
- System Tray Interface: Runs in the Windows system tray for easy access
- Automatic Model Discovery: Scans directories for .gguf model files
- Single Model Support: Load and run one model at a time
- Web Interface: Built-in web interface for each loaded model
- Auto-start on Boot: Option to start automatically with Windows
- Notifications: Windows toast notifications for model status
- Multi-Configuration Support: Multiple configurations for the same model, each displayed as a separate option
- Automatic Web Browser Launch: Option to automatically open web interface when models load
- Model Exclusion Patterns: Support for excluding specific models or folders using glob patterns
- Config Refresh: Refresh button to reload configuration and rescan models without restarting
- Terminal Interface: TUI-based model management with keyboard shortcuts
- Real-time Status: Live display of model loading/unloading status
- API Integration: Communicates with lmgo's REST API for model control
- Key Bindings: Intuitive keyboard controls (Arrow keys, Enter, U, Q)
- Multi-Configuration Support: Displays all model configurations as separate entries
The application creates an lmgo.json configuration file with the following structure:
```json
{
  "modelDir": "./models",
  "autoOpenWebEnabled": true,
  "basePort": 8080,
  "llamaServerPort": 8081,
  "defaultArgs": [
    "--host", "0.0.0.0",
    "--prio-batch", "3",
    "--no-host",
    "--ctx-size", "131072",
    "--batch-size", "4096",
    "--ubatch-size", "4096",
    "--threads", "0",
    "--threads-batch", "0",
    "-ngl", "999",
    "--flash-attn", "on",
    "--cache-type-k", "f16",
    "--cache-type-v", "f16",
    "--kv-offload",
    "--no-mmap",
    "--no-repack",
    "--direct-io",
    "--mlock",
    "--split-mode", "layer",
    "--main-gpu", "0"
  ],
  "modelSpecificArgs": [],
  "excludePatterns": []
}
```

- modelDir: Directory containing .gguf model files
- autoOpenWebEnabled: Automatically open browser when model loads
- basePort: API server port (default: 8080) - used by lmc and HTTP API
- llamaServerPort: llama-server port (default: 8081) - where models run
- defaultArgs: Default arguments passed to llama-server
- modelSpecificArgs: Array of model configurations, allowing multiple configurations per model
- excludePatterns: List of glob patterns to exclude models from the list (similar to .gitignore)
You can define multiple configurations for the same model, each displayed as a separate option in the menu and API:
"modelSpecificArgs": [
{
"name": "Llama-3 (Fast Mode)",
"target": "Llama-3-8B-Instruct",
"args": ["-ngl", "10", "-c", "2048", "--batch-size", "512"]
},
{
"name": "Llama-3 (Long Context)",
"target": "Llama-3-8B-Instruct",
"args": ["-ngl", "99", "-c", "32768"]
},
{
"name": "Qwen2.5-7B (Optimized)",
"target": "Qwen2.5-7B-Instruct",
"args": ["-ngl", "35", "-c", "8192", "--batch-size", "1024"]
}
]Note: When a model has configurations defined in modelSpecificArgs, the default configuration is not shown as an option.
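The expansion behavior described in the note can be sketched as follows. The `Entry` and `flatten` names are illustrative (not lmgo's internals); the field names are chosen to mirror the API response fields `index`, `modelIndex`, and `configIndex` shown further below.

```go
package main

import "fmt"

// ModelConfig is one "modelSpecificArgs" entry (illustrative names).
type ModelConfig struct {
	Name, Target string
	Args         []string
}

// Entry is one flattened menu/API item.
type Entry struct {
	Index, ModelIndex, ConfigIndex int
	Name                           string
}

// flatten expands each model into one entry per configuration that
// targets it, or a single default entry when no configuration does.
func flatten(models []string, configs []ModelConfig) []Entry {
	var out []Entry
	for mi, m := range models {
		cc := 0 // per-model configuration counter
		for _, c := range configs {
			if c.Target == m {
				out = append(out, Entry{Index: len(out), ModelIndex: mi, ConfigIndex: cc, Name: c.Name})
				cc++
			}
		}
		// Per the note above: a model with explicit configurations
		// does not also get a default entry.
		if cc == 0 {
			out = append(out, Entry{Index: len(out), ModelIndex: mi, Name: m})
		}
	}
	return out
}

func main() {
	entries := flatten(
		[]string{"Llama-3-8B-Instruct", "Qwen2.5-7B-Instruct"},
		[]ModelConfig{
			{Name: "Llama-3 (Fast Mode)", Target: "Llama-3-8B-Instruct"},
			{Name: "Llama-3 (Long Context)", Target: "Llama-3-8B-Instruct"},
		},
	)
	for _, e := range entries {
		fmt.Printf("%d: %s (model %d, config %d)\n", e.Index, e.Name, e.ModelIndex, e.ConfigIndex)
	}
}
```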
You can exclude specific models or folders using glob patterns:
"excludePatterns": [
"mmproj-35B-F16.gguf",
"*-test.gguf",
"experimental/*",
"backup/**/*.gguf"
]Patterns support:
*matches any sequence of non-separator characters?matches any single non-separator character[abc]matches any character in the set**matches zero or more directories
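For the first three rules, Go's standard `path/filepath.Match` implements exactly these semantics; the `**` form is not supported by `filepath.Match` and would need extra handling (e.g. a library such as doublestar). A minimal exclusion-check sketch under that caveat (`isExcluded` is an illustrative helper, not lmgo's actual implementation):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// isExcluded reports whether relPath matches any exclude pattern.
// filepath.Match covers *, ?, and [abc]; the ** form is not handled here.
func isExcluded(relPath string, patterns []string) bool {
	for _, p := range patterns {
		if ok, err := filepath.Match(p, relPath); err == nil && ok {
			return true
		}
		// Also match against the bare filename, so a pattern like
		// "*-test.gguf" excludes files in any directory.
		if ok, _ := filepath.Match(p, filepath.Base(relPath)); ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"mmproj-35B-F16.gguf", "*-test.gguf", "experimental/*"}
	fmt.Println(isExcluded("experimental/tiny.gguf", patterns))   // true
	fmt.Println(isExcluded("Llama-3-8B-Instruct.gguf", patterns)) // false
}
```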
- `GET /api/models` - List all available models and configurations
- `GET /api/status` - Get current model status
- `POST /api/load?index=N` - Load the model at index N (configurations count as separate indices)
- `POST /api/unload` - Unload the current model
- `GET /api/health` - Health check
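The endpoints above can be exercised with Go's standard net/http client. A minimal sketch: the response-wrapper shape follows the API response example below, the base URL assumes lmgo's default basePort, and the `listModels`/`loadModel` helper names are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// listModels fetches /api/models and decodes the {success, data} wrapper.
func listModels(baseURL string) ([]map[string]any, error) {
	resp, err := http.Get(baseURL + "/api/models")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var out struct {
		Success bool             `json:"success"`
		Data    []map[string]any `json:"data"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Data, nil
}

// loadModel asks lmgo to load the entry at the given index.
func loadModel(baseURL string, index int) error {
	resp, err := http.Post(fmt.Sprintf("%s/api/load?index=%d", baseURL, index), "", nil)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("load failed: %s", resp.Status)
	}
	return nil
}

func main() {
	base := "http://127.0.0.1:8080" // basePort from lmgo.json
	models, err := listModels(base)
	if err != nil {
		fmt.Println("lmgo not reachable:", err)
		return
	}
	for _, m := range models {
		fmt.Printf("%v: %v\n", m["index"], m["name"])
	}
	if len(models) > 0 {
		if err := loadModel(base, 0); err != nil {
			fmt.Println("load failed:", err)
		}
	}
}
```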
API Response Example:
```json
{
  "success": true,
  "data": [
    {
      "index": 0,
      "modelIndex": 0,
      "configIndex": 0,
      "name": "Llama-3 (Fast Mode)",
      "path": "D:/LLM/Llama-3-8B-Instruct.gguf",
      "filename": "Llama-3-8B-Instruct.gguf",
      "hasConfig": true,
      "configName": "Llama-3 (Fast Mode)"
    },
    {
      "index": 1,
      "modelIndex": 0,
      "configIndex": 1,
      "name": "Llama-3 (Long Context)",
      "path": "D:/LLM/Llama-3-8B-Instruct.gguf",
      "filename": "Llama-3-8B-Instruct.gguf",
      "hasConfig": true,
      "configName": "Llama-3 (Long Context)"
    }
  ]
}
```

First download the latest llama-b*-windows-rocm-gfx1151-x64.zip file from the releases page, then build:
```shell
go mod tidy
go build -ldflags "-s -w -H windowsgui" -buildvcs=false .

cd lmc
go mod tidy
go build -buildvcs=false .
```

lmc uses an embedded configuration file, baseURL.json, that specifies the lmgo API endpoint:
```json
{
  "baseURL": "http://127.0.0.1:9696"
}
```

Note: lmc automatically displays all model configurations from lmgo as separate entries in the terminal interface. Each configuration appears as an independent model option.
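Reading this file reduces to decoding a single JSON field. A sketch, assuming a hypothetical `readBaseURL` helper that falls back to the documented default endpoint; note that lmc actually embeds baseURL.json at build time, whereas this illustration reads it at runtime:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// readBaseURL decodes baseURL.json, falling back to the documented
// default endpoint when the file is missing or malformed.
func readBaseURL(path string) string {
	const fallback = "http://127.0.0.1:9696"
	data, err := os.ReadFile(path)
	if err != nil {
		return fallback
	}
	var cfg struct {
		BaseURL string `json:"baseURL"`
	}
	if err := json.Unmarshal(data, &cfg); err != nil || cfg.BaseURL == "" {
		return fallback
	}
	return cfg.BaseURL
}

func main() {
	fmt.Println("lmgo API endpoint:", readBaseURL("baseURL.json"))
}
```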

