Releases: KolosalAI/kolosal-cli
v0.1.3
New:
- Vulkan GPU inference (`llama-vulkan`)
- Auto GPU detection; automatic CPU fallback
Usage:
- Auto: `--engine auto`
- GPU: `--engine llama-vulkan`
- CPU: `--engine llama-cpu`
Linux setup:
- Loader/tools: `libvulkan1`, `vulkan-tools`
- Drivers: `nvidia-vulkan-icd` or `mesa-vulkan-drivers`
Validate:
- `ldconfig -p | grep libvulkan.so.1`
- `vulkaninfo | head`
Notes:
- CPU-only environments continue to work
- If Vulkan is missing and a GPU engine is forced, startup may fail or warn, depending on configuration (see the sketch below)
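As a rough illustration of the auto-detect and fallback behavior, here is a minimal TypeScript sketch; the `Engine` type and `resolveEngine` helper are illustrative names, not the CLI's actual internals:

```ts
// Sketch of `--engine auto` resolution with CPU fallback.
type Engine = 'llama-vulkan' | 'llama-cpu';

function resolveEngine(requested: 'auto' | Engine, vulkanAvailable: boolean): Engine {
  if (requested === 'auto') {
    // Prefer the GPU engine when Vulkan is usable, otherwise fall back to CPU.
    return vulkanAvailable ? 'llama-vulkan' : 'llama-cpu';
  }
  if (requested === 'llama-vulkan' && !vulkanAvailable) {
    // GPU was forced but Vulkan is missing: warn here; whether startup
    // fails or only warns depends on configuration, as noted above.
    console.warn('Vulkan not available; llama-vulkan may fail to start');
  }
  return requested;
}
```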
v0.1.2
- Fixed premature exit in useAuthCommand.ts during headless authentication
- Added graceful cleanup handlers in gemini.tsx to prevent crashes on process exit
- Enhanced signal handling for SIGINT/SIGTERM with proper cleanup
Linux kolosal-server Support (a62381b):
- Extended build.js to build kolosal-server on Linux (previously macOS-only)
- Added Linux paths to kolosal-server-manager.ts for executable discovery (sketched below)
- Fixed spawn argument order for port override functionality
- Created config.linux.yaml with proper library paths (.so files)
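A hedged sketch of what platform-aware executable discovery can look like; the candidate paths and helper name below are illustrative, not the actual contents of kolosal-server-manager.ts:

```ts
import { existsSync } from 'node:fs';
import * as path from 'node:path';

// Illustrative per-platform candidate paths for the kolosal-server binary.
const CANDIDATES: Record<string, string[]> = {
  darwin: [path.join(__dirname, 'bin', 'kolosal-server')],
  linux: [
    path.join(__dirname, 'bin', 'kolosal-server'),
    '/usr/local/bin/kolosal-server',
  ],
};

// Return the first existing executable for the current platform, if any.
function findServerExecutable(platform = process.platform): string | undefined {
  return (CANDIDATES[platform] ?? []).find((p) => existsSync(p));
}
```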
Platform Compatibility (a62381b):
- Updated build scripts for cross-platform library handling
- Fixed port configuration (8080→8087) to match CLI expectations
- Added Linux-specific dependency detection and installation
Input Handling (KeypressContext.tsx):
- Fixed spurious keypress detection on Linux during terminal initialization
- Added platform-specific delays to prevent false triggers (see the sketch below)
- Improved paste mode detection and handling
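The delay idea can be sketched as a short grace window that drops key events arriving before the terminal has settled; the window length below is an assumed value, not the one used in KeypressContext.tsx:

```ts
// Drop keypresses that arrive during a short initialization window; on Linux
// the terminal can emit spurious escape sequences while it is being set up.
const GRACE_MS = process.platform === 'linux' ? 100 : 0; // assumed values
const startedAt = Date.now();

function shouldHandleKeypress(): boolean {
  return Date.now() - startedAt >= GRACE_MS;
}
```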
Message Spacing:
- Standardized all message components on consistent marginTop={1} spacing
- Updated 8 message components for consistent one-row vertical margins
- Removed double spacing from UserMessage (marginBottom removed), ToolGroupMessage (paddingTop removed), and ErrorMessage
- Added missing spacing to UserShellMessage, CompressionMessage, and SummaryMessage
- Simplified conditional spacing in GeminiMessage and GeminiMessageContent (removed isFirstAssistantMessage logic)
v0.1.1-pre
Version 0.1.1 - Enhanced Tool Call Support & Stability Improvements
Major New Features
XML-Style Tool Call Parser
A groundbreaking addition that enables seamless tool call support across a wide range of AI models and inference engines, including those that don't natively support structured tool calling.
Key Benefits:
- Universal Compatibility: Works with local AI models and inference engines that lack native tool call support
- Seamless Integration: Automatically detects and parses XML-style tool call markers in model responses
- Future-Proof Architecture: Enables support for emerging models and custom inference implementations
- Zero Configuration: Works out of the box with OpenAI-compatible APIs
Technical Details:
- Supports both standard OpenAI JSON-style tool calls and XML-style tool call markers
- Intelligent streaming parser handles cross-chunk tool call boundaries
- Robust error handling with graceful fallback to text-only responses
- Prevents tool call marker leakage in text output
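To make the buffering idea concrete, here is a simplified TypeScript sketch of a streaming parser. It assumes an illustrative `<tool_call>…</tool_call>` marker syntax (the real marker format may differ): completed markers are surfaced as tool calls, text before a marker is emitted as soon as it is safe, and any suffix that could still become a marker is held back so markers never leak into the text output.

```ts
// Simplified streaming parser: splits each chunk into plain text and
// completed <tool_call>...</tool_call> payloads, buffering partial markers
// across chunk boundaries so they never leak into the text output.
const OPEN = '<tool_call>';
const CLOSE = '</tool_call>';

class XmlToolCallParser {
  private buffer = '';

  feed(chunk: string): { text: string; toolCalls: string[] } {
    this.buffer += chunk;
    let text = '';
    const toolCalls: string[] = [];

    for (;;) {
      const open = this.buffer.indexOf(OPEN);
      if (open === -1) {
        // Keep any tail that could be the start of a marker in the next chunk.
        const safe = this.safeEmitLength();
        text += this.buffer.slice(0, safe);
        this.buffer = this.buffer.slice(safe);
        return { text, toolCalls };
      }
      const close = this.buffer.indexOf(CLOSE, open + OPEN.length);
      if (close === -1) {
        // Marker opened but not yet closed: emit preceding text, keep the rest.
        text += this.buffer.slice(0, open);
        this.buffer = this.buffer.slice(open);
        return { text, toolCalls };
      }
      text += this.buffer.slice(0, open);
      toolCalls.push(this.buffer.slice(open + OPEN.length, close));
      this.buffer = this.buffer.slice(close + CLOSE.length);
    }
  }

  private safeEmitLength(): number {
    // Hold back any suffix that is a prefix of OPEN (e.g. "<tool_ca").
    for (let k = Math.min(OPEN.length - 1, this.buffer.length); k > 0; k--) {
      if (OPEN.startsWith(this.buffer.slice(-k))) return this.buffer.length - k;
    }
    return this.buffer.length;
  }
}
```

Feeding `'Hello <tool_'` followed by `'call>{"name":"ls"}</tool_call>'` yields the text `Hello ` and one tool call payload, with no marker fragments leaking into the text stream.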
Impact:
This feature dramatically expands the ecosystem of AI models compatible with Kolosal CLI, making it possible to use local models, custom inference engines, and emerging AI providers that previously couldn't support tool calling capabilities. This is a major step toward true vendor independence and flexibility in AI model selection.
Technical Improvements
Enhanced Tool Call ID Management
- Implemented deterministic tool call ID remapping system
- Prevents ID collisions when converting from Gemini format to OpenAI format
- Proper ID pairing between tool calls and their responses
- Sequential ID generation with suffix counters (`0`, `0__1`, `0__2`, etc.)
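A minimal sketch of deterministic remapping with suffix counters, matching the `0`, `0__1`, `0__2` pattern above; the class and method names are illustrative:

```ts
// Remap tool call IDs deterministically when converting between formats.
// A repeated source ID gets a suffix counter: "0", "0__1", "0__2", ...
class ToolCallIdRemapper {
  private counts = new Map<string, number>();
  private latest = new Map<string, string>();

  // Called when a tool call is emitted; returns a collision-free ID.
  idForCall(sourceId: string): string {
    const n = this.counts.get(sourceId) ?? 0;
    this.counts.set(sourceId, n + 1);
    const mapped = n === 0 ? sourceId : `${sourceId}__${n}`;
    this.latest.set(sourceId, mapped); // remember for response pairing
    return mapped;
  }

  // Called when the matching tool response is converted, so call and
  // response carry the same remapped ID.
  idForResponse(sourceId: string): string {
    return this.latest.get(sourceId) ?? sourceId;
  }
}
```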
Improved Streaming Robustness
- Better handling of cross-chunk tool call boundaries
- Intelligent buffering prevents premature text emission
- XML tool call detection and parsing during streaming
- Graceful error handling for malformed tool calls
Code Quality Enhancements
- Added extensive debug logging for troubleshooting
- Improved error messages with context
- Better separation of concerns in converter architecture
- Enhanced test coverage for edge cases
Use Cases Enabled
With the new XML parser, you can now:
- Use Local AI Models: Run models locally with inference engines that don't have native tool calling support
- Custom Inference Engines: Integrate custom AI inference implementations using XML-style tool markers
- Legacy Model Support: Enable tool calling on older models that predate structured function calling
- Vendor Independence: Easily switch between different AI providers without compatibility concerns
- Emerging Models: Support new models and providers as they come online, even before they implement structured tool calling
Support
If you encounter any issues with this release:
- Check the Troubleshooting Guide
- Review the API Documentation
- Open an issue on GitHub
Full Changelog: See CHANGELOG.md for detailed commit history.
Release Date: October 14, 2025
Version: 0.1.1-pre
Branch: agent → main
v0.1.0-pre
Kolosal CLI - First Release
This is the initial release of Kolosal CLI, an agentic AI coding assistant that brings intelligent code generation and automation to your terminal.
Key Features
- Agentic AI assistant with autonomous task execution and planning
- Interactive command-line interface with rich terminal UI
- VS Code integration via companion extension for seamless IDE workflow
- Multi-turn conversations with context awareness
- Built-in tool system for file operations, shell commands, and code analysis
- MCP (Model Context Protocol) support for extensibility
- Subagent framework for specialized coding tasks
- Memory and session management for persistent workflows
- Support for the new Kolosal Cloud and any OpenAI-compatible API
Offline Capabilities
- Full offline mode powered by kolosal-server
- Run local LLM models without internet connection
- Privacy-first architecture with on-device inference
- Support for GGUF model format
- Local model management and selection
Architecture
- Modern TypeScript/Node.js implementation
- Modular package structure: cli, core, test-utils, and vscode-ide-companion
- Integrated kolosal-server for local inference
- Comprehensive testing and build infrastructure
- Cross-platform support with macOS packaging
This release establishes the foundation for an intelligent, context-aware coding assistant that works directly in your development environment, with the flexibility to run completely offline.
v0-pre
Kolosal CLI v0.1 is here to let you run and deploy any local LLM with ease.
- Single Binary: One lightweight executable, install anywhere instantly
- Universal GPU Support: Powered by llama.cpp + Vulkan; works on every GPU
- Auto-Scaling: Models scale down when idle, seamless switching
- Instant API: Every model available at localhost:8080 (OpenAI compatible; see the example below)
- Smart Memory: Built-in memory estimator shows which models fit your hardware
- Hugging Face Ready: Run any GGUF model with one command
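Because the local server speaks the OpenAI API, any OpenAI-compatible client works. Here is a minimal fetch example against the default localhost:8080 endpoint, assuming the standard `/v1/chat/completions` route; the model name is a placeholder for whichever GGUF model you loaded:

```ts
// Query a locally served model through the OpenAI-compatible API.
const res = await fetch('http://localhost:8080/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'my-local-model', // placeholder for your loaded model's name
    messages: [{ role: 'user', content: 'Hello from Kolosal CLI!' }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);
```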