Development Proposal: AI-Powered Text Generation (LLM Integration) for Studio-Whip
1. Introduction & Purpose
This proposal outlines the plan to integrate Large Language Model (LLM) capabilities into the Studio-Whip application. The primary goal is to empower users with AI-assisted text generation directly within their creative workflows, such as scriptwriting and story development. This feature will allow users to leverage local LLMs to generate, continue, or modify text content seamlessly within Studio-Whip's existing text editing interface.
The initial implementation will focus on supporting GGUF-formatted LLMs (e.g., Qwen3, Gemma3, QwQ, Mistral) via the `llama-cpp-2` library, targeting NVIDIA CUDA-enabled GPUs and CPU inference. User experience is paramount, with features like real-time streaming of generated text and the ability to instantly cancel ongoing generations.
2. Goals & Scope
- Core Functionality:
  - Load user-specified GGUF LLM models.
  - Perform text generation based on user prompts within existing Studio-Whip text objects.
  - Stream generated text tokens in real time into the target text object.
  - Allow users to cancel ongoing text generation requests, with the cancellation perceived as instant.
- Technical Scope (Phase 1):
  - Integration of the `llama-cpp-2` Rust library.
  - Support for CUDA (NVIDIA GPU) and CPU-based inference.
  - Configuration of models via TOML files in a `user/ai_models/` directory.
  - Robust error handling and logging using `bevy_log`.
- Out of Scope (Phase 1):
  - Advanced prompt engineering UI (initial input will be simple text prompts).
  - Concurrent generation from multiple LLMs (generation will be sequential initially).
  - Vulkan compute backend for `llama.cpp` (deferred for cross-platform GPU support).
  - Support for other AI modalities (image, audio, etc.); this LLM system will serve as a foundational pattern.
3. Proposed Architecture & Design
The AI functionality will be encapsulated within a new, dedicated Rust module (src/ai/) within the existing rusty_whip crate. This module will integrate with the Bevy application primarily through Bevy's ECS (Entities, Components, Systems) and event system.
3.1. Core Components of the AI Module:
- `AiPlugin` (Bevy Plugin):
  - Purpose: Initializes AI-related resources and systems.
  - Functionality: Manages the lifecycle of the AI module within the Bevy app.
- `AiModelManager` (Bevy Resource):
  - Purpose: Manages the loading, unloading, and access to AI models.
  - Functionality:
    - Discovers available models from user configuration files.
    - Handles asynchronous model loading (to prevent UI freezes).
    - Tracks loaded models (`HashMap<ModelId, Arc<dyn LlmModel>>`).
    - Manages active generation tasks and their cancellation flags (`HashMap<Uuid, Arc<AtomicBool>>`).
    - Provides an interface for other systems to request model operations.
- Model Abstractions (`ai::llm::model`; a trait sketch follows this list):
  - `LlmModel` trait: Defines a common interface for LLMs (e.g., `stream_generate`, `metadata`).
  - `LlamaCppModel` struct: Implements `LlmModel` using the `llama-cpp-2` backend. Handles the specifics of token-by-token generation and cancellation polling.
- Backend Wrapper (`ai::backends::llama_cpp_2`):
  - Purpose: Provides a safe and ergonomic Rust interface over the `llama-cpp-2` library.
  - Functionality: Manages `llama.cpp` model and context lifecycles, parameter translation, GPU offloading, and the core inference loop.
- Event System (`ai::events`):
  - Purpose: Decouples AI operations from direct GUI calls, enabling asynchronous processing.
  - Key Events:
    - `LlmLoadRequest`: Signals a request to load a model.
    - `LlmLoadResult`: Reports the outcome of a model load attempt.
    - `LlmGenerateRequest`: Signals a request to generate text.
    - `LlmCancelRequest`: Signals a request to cancel an ongoing generation.
    - `LlmTokenStreamEvent`: Carries individual generated tokens to be appended to the UI.
    - `LlmGenerationComplete`: Signals the end or failure of a generation task.
- Configuration (`user/ai_models/*.toml` & `ai::llm::config`):
  - Purpose: Allows users to define which models to load and their specific parameters (e.g., GGUF path, GPU layers).
  - `LlmModelUserConfig` struct: Deserializes these TOML files.
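A minimal sketch of how the `LlmModel` trait could look, assuming the crate-local types named above (`ModelMetadata`, `AiError`); the `GenerationParams` struct and its fields are illustrative placeholders, not a settled API:

```rust
// Sketch of ai::llm::model — names beyond those in this proposal are illustrative.
use std::sync::{atomic::AtomicBool, mpsc, Arc};

use crate::ai::common::ModelMetadata;
use crate::ai::error::AiError;

/// Hypothetical per-request parameters (field names are placeholders).
pub struct GenerationParams {
    pub max_tokens: usize,
    pub temperature: f32,
}

/// Common interface implemented by every LLM backend (e.g. `LlamaCppModel`).
pub trait LlmModel: Send + Sync {
    /// Static information about the loaded model (name, context size, etc.).
    fn metadata(&self) -> &ModelMetadata;

    /// Generate tokens for `prompt`, sending each token through `token_tx`
    /// and polling `cancel_flag` between tokens so cancellation feels instant.
    fn stream_generate(
        &self,
        prompt: &str,
        params: &GenerationParams,
        token_tx: mpsc::Sender<Result<String, AiError>>,
        cancel_flag: Arc<AtomicBool>,
    ) -> Result<(), AiError>;
}
```

Keeping the trait object-safe (`&self` methods only) is what allows the manager to store models as `Arc<dyn LlmModel>`.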
3.2. Interaction with GUI Framework:
The AI module will interface with the existing Bevy-based GUI framework as follows:
- Request Initiation (GUI -> AI), sketched below:
  - A user action in the GUI (e.g., clicking a "Generate" button associated with an `EditableText` entity, or a hotkey) triggers a GUI system.
  - This GUI system gathers the prompt (from the Yrs data of the target `EditableText`), the target `Entity` ID, and other parameters.
  - It then sends an `LlmGenerateRequest` Bevy event.
  - The GUI system can also update its state (e.g., show a loading spinner, add an `AwaitingAiResponse` component to the target entity).
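A rough sketch of this request-initiation path, assuming recent Bevy input/event APIs; the event fields, the `F5` hotkey, the stand-in `EditableText`/`AwaitingAiResponse` components, and the hardcoded prompt are illustrative only (the real prompt would come from the entity's Yrs text):

```rust
use bevy::prelude::*;
use uuid::Uuid;

// Stand-ins for existing project components; shapes are assumptions.
#[derive(Component)]
pub struct EditableText;
#[derive(Component)]
pub struct AwaitingAiResponse;

// Assumed shape of the ai::events request; field names are illustrative.
#[derive(Event)]
pub struct LlmGenerateRequest {
    pub request_id: Uuid,
    pub model_id: String,
    pub prompt: String,
    pub target_entity: Entity,
}

/// GUI-side system: on a hotkey, request generation for the focused text object.
fn trigger_generation_on_hotkey(
    keys: Res<ButtonInput<KeyCode>>,
    focused_text: Query<Entity, With<EditableText>>, // assumes one focused EditableText
    mut commands: Commands,
    mut requests: EventWriter<LlmGenerateRequest>,
) {
    if !keys.just_pressed(KeyCode::F5) {
        return;
    }
    if let Ok(entity) = focused_text.get_single() {
        requests.send(LlmGenerateRequest {
            request_id: Uuid::new_v4(),
            model_id: "default".into(),
            prompt: "Continue the scene:".into(), // really read from the Yrs doc
            target_entity: entity,
        });
        // Mark the target so the GUI can show a spinner while the request is in flight.
        commands.entity(entity).insert(AwaitingAiResponse);
    }
}
```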
- Asynchronous AI Processing (sketched below):
  - The `AiPlugin`'s `handle_llm_generate_requests_system` receives the request.
  - It uses the `AiModelManager` to get the specified `LlmModel`.
  - An asynchronous Bevy task is spawned to perform the generation via `LlmModel::stream_generate`. This task includes:
    - An `mpsc` channel for sending generated tokens back.
    - An `Arc<AtomicBool>` cancellation flag, stored by the `AiModelManager`.
  - The `LlamaCppModel`'s `stream_generate` implementation calls `llama-cpp-2` in a loop, polling the cancellation flag after each token.
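A simplified sketch of the request handler and the shared output queue, building on the `LlmModel` trait and `LlmGenerateRequest` sketches above. The `AiTaskOutput`/`AsyncAiTaskOutputs` shapes, the `String` model key, and the fixed `GenerationParams` values are assumptions, and the blocking `mpsc` receive inside the compute task is just one possible forwarding strategy:

```rust
use bevy::prelude::*;
use bevy::tasks::AsyncComputeTaskPool;
use std::collections::HashMap;
use std::sync::{atomic::AtomicBool, mpsc, Arc, Mutex};
use uuid::Uuid;

/// Messages pushed from async tasks back to the main thread.
pub enum AiTaskOutput {
    Token { request_id: Uuid, target: Entity, token: String },
    Complete { request_id: Uuid, result: Result<(), AiError> },
}

/// Shared, thread-safe queue drained by `forward_async_ai_events_system`.
#[derive(Resource, Default, Clone)]
pub struct AsyncAiTaskOutputs(pub Arc<Mutex<Vec<AiTaskOutput>>>);

#[derive(Resource, Default)]
pub struct AiModelManager {
    pub models: HashMap<String, Arc<dyn LlmModel>>, // ModelId simplified to String here
    pub active_generations: HashMap<Uuid, Arc<AtomicBool>>,
}

fn handle_llm_generate_requests_system(
    mut requests: EventReader<LlmGenerateRequest>,
    mut manager: ResMut<AiModelManager>,
    outputs: Res<AsyncAiTaskOutputs>,
) {
    for req in requests.read() {
        let Some(model) = manager.models.get(&req.model_id).cloned() else { continue };

        // Register a cancellation flag so handle_llm_cancel_requests_system can find it.
        let cancel = Arc::new(AtomicBool::new(false));
        manager.active_generations.insert(req.request_id, cancel.clone());

        let (token_tx, token_rx) = mpsc::channel::<Result<String, AiError>>();
        let (request_id, target, prompt) = (req.request_id, req.target_entity, req.prompt.clone());
        let pool = AsyncComputeTaskPool::get();

        // Task 1: run the (blocking) backend call; tokens flow out through `token_tx`.
        let done_queue = outputs.0.clone();
        pool.spawn(async move {
            let params = GenerationParams { max_tokens: 512, temperature: 0.8 };
            let result = model.stream_generate(&prompt, &params, token_tx, cancel);
            done_queue.lock().unwrap().push(AiTaskOutput::Complete { request_id, result });
        })
        .detach();

        // Task 2: forward each token to the shared queue as soon as it arrives.
        let token_queue = outputs.0.clone();
        pool.spawn(async move {
            while let Ok(Ok(token)) = token_rx.recv() {
                token_queue.lock().unwrap().push(AiTaskOutput::Token { request_id, target, token });
            }
        })
        .detach();
    }
}
```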
- Streaming Results (AI -> GUI via Yrs), sketched below:
  - As the async task receives tokens from `llama-cpp-2` (via the `mpsc` channel within `stream_generate`), it queues `LlmTokenStreamEvent` data using a shared, thread-safe queue (e.g., an `AsyncAiTaskOutputs` resource).
  - A Bevy system (`forward_async_ai_events_system`) on the main thread drains this queue and sends the actual `LlmTokenStreamEvent` Bevy events.
  - The `apply_llm_tokens_to_yrs_system` (in `AiPlugin`) listens for `LlmTokenStreamEvent`:
    - It retrieves the `YrsDocResource` and the target `yrs::TextRef` (identified by `event.target_yrs_text_entity` from the `text_map` in `YrsDocResource`).
    - It appends `event.token` to the `yrs::TextRef`.
    - Crucially, it then sends a `YrsTextChanged { entity: event.target_yrs_text_entity }` event.
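A sketch of the Yrs append step. The `YrsDocResource` shape shown here (a `Doc` plus an `Entity`-to-`TextRef` map) and the event field names are assumptions about the existing resource described above; the yrs calls follow the crate's `Transact`/`Text` traits:

```rust
use bevy::prelude::*;
use std::collections::HashMap;
use yrs::{Doc, Text, TextRef, Transact};

// Assumed shape of the existing resource; the real YrsDocResource may differ.
#[derive(Resource)]
pub struct YrsDocResource {
    pub doc: Doc,
    pub text_map: HashMap<Entity, TextRef>,
}

#[derive(Event)]
pub struct LlmTokenStreamEvent {
    pub target_yrs_text_entity: Entity,
    pub token: String,
}

#[derive(Event)]
pub struct YrsTextChanged {
    pub entity: Entity,
}

/// Append each streamed token to the target Yrs text and notify the layout system.
fn apply_llm_tokens_to_yrs_system(
    mut tokens: EventReader<LlmTokenStreamEvent>,
    yrs_res: Res<YrsDocResource>,
    mut changed: EventWriter<YrsTextChanged>,
) {
    for event in tokens.read() {
        let Some(text_ref) = yrs_res.text_map.get(&event.target_yrs_text_entity) else {
            warn!("No Yrs text registered for {:?}", event.target_yrs_text_entity);
            continue;
        };
        // Append at the end of the shared text; the CRDT handles concurrent edits.
        let mut txn = yrs_res.doc.transact_mut();
        text_ref.push(&mut txn, &event.token);
        drop(txn); // commit the transaction before emitting the change event

        changed.send(YrsTextChanged { entity: event.target_yrs_text_entity });
    }
}
```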
- GUI Update (Reactive via Yrs):
  - Studio-Whip's existing `text_layout_system` (in `gui_framework::plugins::core.rs`) already listens for `YrsTextChanged` events.
  - Upon receiving this event, it re-lays out the text, and the custom Vulkan renderer displays the updated content in the next frame. This provides real-time streaming without direct AI-to-renderer calls.
- Cancellation (GUI -> AI -> Task), sketched below:
  - The user clicks a "Cancel" button.
  - A GUI system sends `LlmCancelRequest { request_id }`.
  - `handle_llm_cancel_requests_system` (in `AiPlugin`) finds the `request_id`'s `Arc<AtomicBool>` in `AiModelManager` and sets it to `true`.
  - The `LlamaCppModel::stream_generate` loop detects the flag and terminates, sending an `LlmGenerationComplete` event with a "Cancelled" status.
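The cancellation handler is small; a sketch, reusing the `AiModelManager` shape from the earlier sketch:

```rust
use bevy::prelude::*;
use std::sync::atomic::Ordering;
use uuid::Uuid;

#[derive(Event)]
pub struct LlmCancelRequest {
    pub request_id: Uuid,
}

/// Flip the shared flag; the generation loop polls it after every token.
fn handle_llm_cancel_requests_system(
    mut cancels: EventReader<LlmCancelRequest>,
    manager: Res<AiModelManager>,
) {
    for cancel in cancels.read() {
        if let Some(flag) = manager.active_generations.get(&cancel.request_id) {
            flag.store(true, Ordering::Relaxed);
        } else {
            warn!("Cancel requested for unknown generation {}", cancel.request_id);
        }
    }
}
```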
- Completion/Error Handling (AI -> GUI):
  - When generation finishes (normally, cancelled, or with an error), the async task queues an `LlmGenerationComplete` event.
  - `forward_async_ai_events_system` sends this Bevy event.
  - GUI systems listen for `LlmGenerationComplete` to update the UI (hide the spinner, show error messages, remove the `AwaitingAiResponse` component).
3.3. Reasoning Behind Design Decisions:
- Modularity (`ai` module): Keeps AI concerns separate, facilitating future expansion to other AI types (image, audio) using similar patterns.
- Bevy Events: Provide loose coupling between GUI and AI, essential for asynchronous operations and testability. Aligns with existing GUI framework patterns.
- Yrs for Text Streaming: Leverages Studio-Whip's existing CRDT infrastructure for efficient, real-time, and potentially collaborative text updates. The AI module simply "pushes" data into Yrs, and the GUI reacts.
- `llama-cpp-2`: Chosen for its active development, focus on staying current with `llama.cpp`, and direct C++ bindings suitable for Rust.
- Asynchronous Tasks (`AsyncComputeTaskPool`): Prevents UI freezes during model loading and inference, crucial for good UX.
- `AtomicBool` for Cancellation: A standard, lightweight mechanism for signalling cancellation to long-running tasks, ensuring responsiveness.
- User Configuration (TOML): Simple, human-readable way for users to manage their local models.
4. Implementation Plan & Actionable Steps
The implementation will be phased. Each step should be testable.
Phase 1.0: AI Module Skeleton & Configuration
- Create Directory Structure:
  - Create `src/ai/` with `mod.rs`, `common.rs`, `error.rs`, `events.rs`.
  - Create `src/ai/llm/` with `mod.rs`, `model.rs`, `config.rs`.
  - Create `src/ai/backends/` with `mod.rs`.
  - Create `src/ai/backends/llama_cpp_2/` with `mod.rs`.
  - Create the `user/ai_models/` directory.
- Define Core Types & Events (partially sketched below):
  - Implement structs/enums in `ai::common.rs` (`ModelId`, `ModelType`, `InferenceDevice`, `ModelLoadConfig`, `ModelMetadata`).
  - Implement `AiError` in `ai::error.rs`.
  - Implement event structs in `ai::events.rs` (all `Llm*` events, including `LlmCancelRequest`).
  - Implement `LlmModelUserConfig` in `ai::llm::config.rs` for TOML deserialization.
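A sketch of some of these core types and events, assuming the `thiserror` crate for `AiError` and plain `String`s in the result-carrying events so they stay trivially cloneable; all field names beyond those mentioned in this proposal are illustrative:

```rust
// Sketch of ai::common / ai::error / ai::events.
use bevy::prelude::*;
use std::path::PathBuf;
use uuid::Uuid;

pub type ModelId = String; // could become a newtype later

#[derive(Clone, Copy, Debug)]
pub enum InferenceDevice { Cpu, Cuda }

#[derive(Clone, Debug)]
pub struct ModelLoadConfig {
    pub model_id: ModelId,
    pub gguf_path: PathBuf,
    pub gpu_layers: u32,
    pub device: InferenceDevice,
}

#[derive(Debug, thiserror::Error)]
pub enum AiError {
    #[error("model not found: {0}")]
    ModelNotFound(ModelId),
    #[error("backend error: {0}")]
    Backend(String),
    #[error("generation cancelled")]
    Cancelled,
}

#[derive(Event)]
pub struct LlmLoadRequest { pub config: ModelLoadConfig }

#[derive(Event)]
pub struct LlmLoadResult { pub model_id: ModelId, pub result: Result<(), String> }

#[derive(Event)]
pub struct LlmGenerationComplete { pub request_id: Uuid, pub result: Result<(), String> }
```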
- `AiPlugin` and `AiModelManager` (Basic Structure):
  - Create `AiPlugin` in `ai::mod.rs` (a registration sketch follows this list).
  - Create the `AiModelManager` resource in `ai::manager.rs` with empty `HashMap`s for models and active generations.
  - Implement `discover_and_request_model_loads_system` to scan `user/ai_models/` and send `LlmLoadRequest` events (initially, these requests won't be fully processed).
  - Add `AiPlugin` to `main.rs`.
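A minimal registration sketch for `AiPlugin`, assuming Bevy's 0.11+ `add_systems(Schedule, ...)` API; the systems referenced here are stubs at this phase and get real implementations in Phases 1.1 and 1.2:

```rust
use bevy::prelude::*;

pub struct AiPlugin;

impl Plugin for AiPlugin {
    fn build(&self, app: &mut App) {
        app.init_resource::<AiModelManager>()
            .init_resource::<AsyncAiTaskOutputs>()
            // Register all Llm* events so GUI and AI systems can exchange them.
            .add_event::<LlmLoadRequest>()
            .add_event::<LlmLoadResult>()
            .add_event::<LlmGenerateRequest>()
            .add_event::<LlmCancelRequest>()
            .add_event::<LlmTokenStreamEvent>()
            .add_event::<LlmGenerationComplete>()
            // Discover configured models once at startup, process everything else per frame.
            .add_systems(Startup, discover_and_request_model_loads_system)
            .add_systems(
                Update,
                (
                    handle_llm_generate_requests_system,
                    handle_llm_cancel_requests_system,
                    forward_async_ai_events_system,
                    apply_llm_tokens_to_yrs_system,
                ),
            );
    }
}
```

In `main.rs`, the plugin would then be registered alongside the existing GUI plugins, e.g. `App::new().add_plugins(AiPlugin)`.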
- Testing:
  - Verify the plugin loads.
  - Create a sample `model.toml` file (an example is sketched below).
  - Verify `discover_and_request_model_loads_system` runs and sends `LlmLoadRequest` events (log the events).
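A sketch of `LlmModelUserConfig` with `serde`/`toml` and a unit test that parses a sample `model.toml`; the key names (`name`, `gguf_path`, `gpu_layers`, `context_length`) are placeholders to be settled when the config format is finalized:

```rust
// Sketch of ai::llm::config — field names are assumptions matching the sample TOML below.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
pub struct LlmModelUserConfig {
    pub name: String,
    pub gguf_path: String,
    #[serde(default)]
    pub gpu_layers: u32,
    #[serde(default = "default_ctx")]
    pub context_length: u32,
}

fn default_ctx() -> u32 { 4096 }

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parses_sample_model_toml() {
        // Example contents of user/ai_models/model.toml (keys are illustrative).
        let sample = r#"
            name = "mistral-7b-instruct"
            gguf_path = "models/mistral-7b-instruct.Q4_K_M.gguf"
            gpu_layers = 32
        "#;
        let cfg: LlmModelUserConfig = toml::from_str(sample).expect("valid TOML");
        assert_eq!(cfg.context_length, 4096); // default applied when the key is absent
    }
}
```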
Phase 1.1: llama-cpp-2 Backend Wrapper & Model Loading
- Add the `llama-cpp-2` Dependency:
  - Add `llama-cpp-2` to `Cargo.toml` with appropriate feature flags for CPU and CUDA (e.g., `features = ["cuda"]` if building for CUDA).
  - Ensure `git submodule update --init --recursive` is run if `llama-cpp-2` is a git submodule or if its dependencies require it.
- Implement `ai::backends::llama_cpp_2::mod.rs`:
  - Write wrapper functions to:
    - Initialize the `llama.cpp` backend.
    - Load a GGUF model using `llama_cpp_2::LlamaModel::load_from_file` based on `ModelLoadConfig` (path, `gpu_layers`).
    - Handle potential errors from `llama-cpp-2` and convert them to `AiError`.
- Implement the `LlmModel` Trait and `LlamaCppModel`:
  - Define the `LlmModel` trait in `ai::llm::model.rs`.
  - Implement the `LlamaCppModel` struct holding the loaded `llama_cpp_2::LlamaModel`.
  - Implement the `metadata` and `estimate_vram`/`estimate_ram` methods (initially these can be placeholders, or read from GGUF metadata if the API allows).
- Enhance `AiModelManager` and Loading Systems:
  - Implement `handle_model_load_requests_system`:
    - Spawn an async Bevy task.
    - The task uses the `llama_cpp_2` wrapper to load the model.
    - The task sends results (model instance or error) back to the main thread (e.g., via the `AsyncAiTaskOutputs` queue).
  - Implement `process_model_load_results_system` (sketched below):
    - Drains the result queue.
    - If successful, creates an `Arc<LlamaCppModel>` and stores it in `AiModelManager.models`.
    - Sends an `LlmLoadResult` Bevy event.
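A sketch of the result-draining side, assuming the load task pushes its outcome into a dedicated shared queue (it could equally be a new variant on `AsyncAiTaskOutputs`); types reuse the earlier sketches, with the load task expected to wrap the concrete `LlamaCppModel` in an `Arc<dyn LlmModel>` before queueing it:

```rust
use bevy::prelude::*;
use std::sync::{Arc, Mutex};

/// Completed load attempts pushed by the async load tasks, drained on the main thread.
#[derive(Resource, Default, Clone)]
pub struct PendingModelLoads(pub Arc<Mutex<Vec<(ModelId, Result<Arc<dyn LlmModel>, AiError>)>>>);

fn process_model_load_results_system(
    pending: Res<PendingModelLoads>,
    mut manager: ResMut<AiModelManager>,
    mut results: EventWriter<LlmLoadResult>,
) {
    // Holds the queue lock only for the duration of this short drain.
    for (model_id, outcome) in pending.0.lock().unwrap().drain(..) {
        match outcome {
            Ok(model) => {
                info!("Loaded model '{model_id}'");
                manager.models.insert(model_id.clone(), model);
                results.send(LlmLoadResult { model_id, result: Ok(()) });
            }
            Err(err) => {
                error!("Failed to load model '{model_id}': {err}");
                results.send(LlmLoadResult { model_id, result: Err(err.to_string()) });
            }
        }
    }
}
```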
- Testing:
  - Place a small GGUF model (e.g., a tiny test model or a small Mistral quant) and its TOML config in `user/ai_models/`.
  - Run the app. Verify (via logs and `LlmLoadResult` events) that the model loads successfully on CPU.
  - If CUDA is set up, test GPU offloading by setting `gpu_layers` in the TOML. Verify `llama.cpp` logs indicate GPU usage.
  - Test error handling for incorrect paths or corrupted GGUF files.
Phase 1.2: Text Generation Streaming & Cancellation
- Implement `LlamaCppModel::stream_generate` (loop sketched below):
  - Takes `prompt`, `params`, `token_tx: mpsc::Sender`, and `cancel_flag: Arc<AtomicBool>`.
  - Sets up a `llama_cpp_2::LlamaContext`.
  - Enters the `llama.cpp` token generation loop.
  - Inside the loop:
    - Poll `cancel_flag`. If true, break, send `Err(AiError::Cancelled)` via `token_tx` (or a separate completion channel), and return.
    - Get the next token string from `llama-cpp-2`.
    - Send `Ok(token_string)` via `token_tx`.
  - When the loop finishes (EOS or error), ensure `token_tx` is closed or a final completion message is sent.
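A sketch of the loop's shape, focused on the cancellation and channel behaviour. It builds on the `LlmModel` trait sketch from section 3.1 and assumes a `LlamaCppModel { metadata, .. }` struct; `begin_session` and `next_token` are hypothetical helpers standing in for the `llama-cpp-2` context setup and decode/sample calls that the backend wrapper will provide:

```rust
use std::sync::{atomic::{AtomicBool, Ordering}, mpsc, Arc};

impl LlmModel for LlamaCppModel {
    fn metadata(&self) -> &ModelMetadata {
        &self.metadata
    }

    fn stream_generate(
        &self,
        prompt: &str,
        params: &GenerationParams,
        token_tx: mpsc::Sender<Result<String, AiError>>,
        cancel_flag: Arc<AtomicBool>,
    ) -> Result<(), AiError> {
        // Hypothetical helper: tokenizes the prompt and prepares a llama-cpp-2 context.
        let mut session = self.begin_session(prompt, params)?;

        for _ in 0..params.max_tokens {
            // Poll the flag between tokens so cancellation feels instant.
            if cancel_flag.load(Ordering::Relaxed) {
                let _ = token_tx.send(Err(AiError::Cancelled));
                return Err(AiError::Cancelled);
            }
            // Hypothetical helper: decodes/samples one token; backend errors propagate via `?`
            // and are reported to the GUI through LlmGenerationComplete by the spawning task.
            match session.next_token()? {
                Some(token) => {
                    // If the receiver is gone (UI closed), stop generating.
                    if token_tx.send(Ok(token)).is_err() {
                        break;
                    }
                }
                None => break, // EOS reached
            }
        }
        Ok(()) // dropping `token_tx` closes the channel for the forwarding task
    }
}
```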
- Implement `handle_llm_generate_requests_system`:
  - As described in section 3.2 (Asynchronous AI Processing), this system receives `LlmGenerateRequest`, sets up cancellation, and spawns the async task, which calls `stream_generate` and forwards tokens/completion via `AsyncAiTaskOutputs`.
- Implement `handle_llm_cancel_requests_system`:
  - Receives `LlmCancelRequest`, finds the `Arc<AtomicBool>` in `AiModelManager.active_generations`, and sets it to `true`.
- Implement `forward_async_ai_events_system`:
  - Drains `AsyncAiTaskOutputs` and sends `LlmTokenStreamEvent` and `LlmGenerationComplete` Bevy events.
- Implement `apply_llm_tokens_to_yrs_system`:
  - Receives `LlmTokenStreamEvent`.
  - Appends the token to the target Yrs `TextRef`.
  - Sends a `YrsTextChanged` event.
- GUI Integration (Basic):
  - Create a temporary Bevy system (or use a debug UI if available) that can:
    - Send an `LlmGenerateRequest` for a hardcoded prompt and target `EditableText` entity (the one created in `setup_scene_ecs`).
    - Send an `LlmCancelRequest` for that generation.
  - Add basic UI state components (`AwaitingAiResponse`) and systems to manage them based on `LlmGenerateRequest` and `LlmGenerationComplete`.
- Testing:
  - Trigger a generation. Verify text streams into the sample `EditableText` UI element.
  - Verify the `text_layout_system` and renderer update the display in real time.
  - Trigger cancellation during generation. Verify generation stops promptly and the UI updates accordingly (e.g., a "Cancelled" message, spinner stops).
  - Test generation completion (EOS token).
  - Test error conditions during inference (if possible to simulate).
Phase 1.3: Refinement & Polish
- Error Handling & Logging:
  - Ensure all `Result` types are handled.
  - Use `bevy_log` macros (`info!`, `warn!`, `error!`) appropriately throughout the AI module.
  - Propagate meaningful errors to the user via `LlmGenerationComplete` or `LlmLoadResult` events, allowing the GUI to display them.
- Code Cleanup & Documentation:
  - Add comments and documentation to the new AI modules and systems.
  - Refactor for clarity and efficiency.
- Basic GUI Integration for Triggering:
  - Implement a simple button or hotkey within the main application to trigger generation on the currently focused `EditableText` (see the Request Initiation sketch in section 3.2).
5. Future Considerations (Post-Phase 1)
- Concurrent LLM generations.
- Vulkan compute backend for `llama.cpp`.
- Integration of other AI modalities (image, audio), following similar architectural patterns.
- Advanced prompt engineering UI.
- More sophisticated model management UI within Studio-Whip.
- Memory monitoring and dynamic unloading/loading of models based on usage and system resources.
This phased approach allows for incremental development and testing, ensuring each part of the system is functional before building upon it. The focus on leveraging existing Bevy and Yrs patterns should make the integration relatively smooth and maintainable.