AI Integration Plan #9

@MrScripty

Development Proposal: AI-Powered Text Generation (LLM Integration) for Studio-Whip

1. Introduction & Purpose

This proposal outlines the plan to integrate Large Language Model (LLM) capabilities into the Studio-Whip application. The primary goal is to empower users with AI-assisted text generation directly within their creative workflows, such as scriptwriting and story development. This feature will allow users to leverage local LLMs to generate, continue, or modify text content seamlessly within Studio-Whip's existing text editing interface.

The initial implementation will focus on supporting GGUF-formatted LLMs (e.g., Qwen3, Gemma3, QWQ, Mistral) via the llama-cpp-2 library, targeting NVIDIA CUDA-enabled GPUs and CPU inference. User experience is paramount, with features like real-time streaming of generated text and the ability to instantly cancel ongoing generations.

2. Goals & Scope

  • Core Functionality:
    • Load user-specified GGUF LLM models.
    • Perform text generation based on user prompts within existing Studio-Whip text objects.
    • Stream generated text tokens in real-time into the target text object.
    • Allow users to cancel ongoing text generation requests, with cancellation perceived as instantaneous.
  • Technical Scope (Phase 1):
    • Integration of the llama-cpp-2 Rust library.
    • Support for CUDA (NVIDIA GPU) and CPU-based inference.
    • Configuration of models via TOML files in a user/ai_models/ directory.
    • Robust error handling and logging using bevy_log.
  • Out of Scope (Phase 1):
    • Advanced prompt engineering UI (initial input will be simple text prompts).
    • Concurrent generation from multiple LLMs (generation will be sequential initially).
    • Vulkan compute backend for llama.cpp (deferred; would enable cross-platform GPU support).
    • Support for other AI modalities (image, audio, etc. – this LLM system will serve as a foundational pattern).

3. Proposed Architecture & Design

The AI functionality will be encapsulated within a new, dedicated Rust module (src/ai/) within the existing rusty_whip crate. This module will integrate with the Bevy application primarily through Bevy's ECS (Entity Component System) and event system.

3.1. Core Components of the AI Module:

  • AiPlugin (Bevy Plugin):
    • Purpose: Initializes AI-related resources and systems.
    • Functionality: Manages the lifecycle of the AI module within the Bevy app.
  • AiModelManager (Bevy Resource):
    • Purpose: Manages the loading, unloading, and access to AI models.
    • Functionality:
      • Discovers available models from user configuration files.
      • Handles asynchronous model loading (to prevent UI freezes).
      • Tracks loaded models (HashMap<ModelId, Arc<dyn LlmModel>>).
      • Manages active generation tasks and their cancellation flags (HashMap<Uuid, Arc<AtomicBool>>).
      • Provides an interface for other systems to request model operations.
  • Model Abstractions (ai::llm::model):
    • LlmModel Trait: Defines a common interface for LLMs (e.g., stream_generate, metadata); see the sketch after this list.
    • LlamaCppModel Struct: Implements LlmModel using the llama-cpp-2 backend. Handles the specifics of token-by-token generation and cancellation polling.
  • Backend Wrapper (ai::backends::llama_cpp_2):
    • Purpose: Provides a safe and ergonomic Rust interface over the llama-cpp-2 library.
    • Functionality: Manages llama.cpp model and context lifecycles, parameter translation, GPU offloading, and the core inference loop.
  • Event System (ai::events):
    • Purpose: Decouples AI operations from direct GUI calls, enabling asynchronous processing.
    • Key Events:
      • LlmLoadRequest: Signals a request to load a model.
      • LlmLoadResult: Reports the outcome of a model load attempt.
      • LlmGenerateRequest: Signals a request to generate text.
      • LlmCancelRequest: Signals a request to cancel an ongoing generation.
      • LlmTokenStreamEvent: Carries individual generated tokens to be appended to the UI.
      • LlmGenerationComplete: Signals the end or failure of a generation task.
  • Configuration (user/ai_models/*.toml & ai::llm::config):
    • Purpose: Allows users to define which models to load and their specific parameters (e.g., GGUF path, GPU layers).
    • LlmModelUserConfig Struct: Deserializes these TOML files.
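
To make the abstractions above concrete, here is a minimal sketch of the LlmModel trait and the key request/streaming events. None of these types exist yet, so the exact fields, the error variants, and the use of the uuid and thiserror crates are illustrative assumptions; the sketch also assumes a recent Bevy where events use #[derive(Event)].

```rust
use std::sync::{atomic::AtomicBool, mpsc, Arc};

use bevy::prelude::*;
use uuid::Uuid;

pub type ModelId = String;

/// Basic descriptive data reported by a loaded model.
#[derive(Debug, Clone)]
pub struct ModelMetadata {
    pub id: ModelId,
    pub context_length: usize,
}

/// Placeholder error type (ai::error); variants are illustrative.
#[derive(Debug, thiserror::Error)]
pub enum AiError {
    #[error("model load failed: {0}")]
    LoadFailed(String),
    #[error("generation cancelled")]
    Cancelled,
}

/// Common interface implemented by every LLM backend (e.g. LlamaCppModel).
pub trait LlmModel: Send + Sync {
    fn metadata(&self) -> &ModelMetadata;

    /// Generate tokens for `prompt`, pushing each one through `token_tx`
    /// and polling `cancel_flag` between tokens.
    fn stream_generate(
        &self,
        prompt: &str,
        token_tx: mpsc::Sender<Result<String, AiError>>,
        cancel_flag: Arc<AtomicBool>,
    ) -> Result<(), AiError>;
}

/// Request to generate text into an existing EditableText entity.
#[derive(Event)]
pub struct LlmGenerateRequest {
    pub request_id: Uuid,
    pub model_id: ModelId,
    pub prompt: String,
    pub target_yrs_text_entity: Entity,
}

/// One generated token, forwarded to the Yrs document on the main thread.
#[derive(Event)]
pub struct LlmTokenStreamEvent {
    pub request_id: Uuid,
    pub target_yrs_text_entity: Entity,
    pub token: String,
}

/// Cancel an in-flight generation by request id.
#[derive(Event)]
pub struct LlmCancelRequest {
    pub request_id: Uuid,
}
```

Keeping the trait separate from the events means LlamaCppModel can later be swapped for other backends without touching the GUI-facing event surface.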

3.2. Interaction with GUI Framework:

The AI module will interface with the existing Bevy-based GUI framework as follows:

  1. Request Initiation (GUI -> AI):
    • A user action in the GUI (e.g., clicking a "Generate" button associated with an EditableText entity, or a hotkey) triggers a GUI system.
    • This GUI system gathers the prompt (from the Yrs data of the target EditableText), the target Entity ID, and other parameters.
    • It then sends an LlmGenerateRequest Bevy event.
    • The GUI system can also update its state (e.g., show a loading spinner, add an AwaitingAiResponse component to the target entity).
  2. Asynchronous AI Processing:
    • The AiPlugin's handle_llm_generate_requests_system receives the request.
    • It uses the AiModelManager to get the specified LlmModel.
    • An asynchronous Bevy task is spawned to perform the generation via LlmModel::stream_generate (see the sketch after this list). This task includes:
      • An mpsc channel for sending generated tokens back.
      • An Arc<AtomicBool> cancellation flag, stored by the AiModelManager.
    • The LlamaCppModel's stream_generate implementation calls llama-cpp-2 in a loop, polling the cancellation flag after each token.
  3. Streaming Results (AI -> GUI via Yrs):
    • As the async task receives tokens from llama-cpp-2 (via the mpsc channel within stream_generate), it queues LlmTokenStreamEvent data using a shared, thread-safe queue (e.g., AsyncAiTaskOutputs resource).
    • A Bevy system (forward_async_ai_events_system) on the main thread drains this queue and sends the actual LlmTokenStreamEvent Bevy events.
    • The apply_llm_tokens_to_yrs_system (in AiPlugin) listens for LlmTokenStreamEvent.
      • It retrieves the YrsDocResource and the target yrs::TextRef (identified by event.target_yrs_text_entity from the text_map in YrsDocResource).
      • It appends the event.token to the yrs::TextRef.
      • Crucially, it then sends a YrsTextChanged { entity: event.target_yrs_text_entity } event.
  4. GUI Update (Reactive via Yrs):
    • Studio-Whip's existing text_layout_system (in gui_framework::plugins::core.rs) already listens for YrsTextChanged events.
    • Upon receiving this event, it re-layouts the text, and the custom Vulkan renderer displays the updated content in the next frame. This provides real-time streaming without direct AI-to-renderer calls.
  5. Cancellation (GUI -> AI -> Task):
    • User clicks a "Cancel" button.
    • GUI system sends LlmCancelRequest { request_id }.
    • handle_llm_cancel_requests_system (in AiPlugin) finds the request_id's Arc<AtomicBool> in AiModelManager and sets it to true.
    • The LlamaCppModel::stream_generate loop detects the flag and terminates, sending an LlmGenerationComplete event with a "Cancelled" status.
  6. Completion/Error Handling (AI -> GUI):
    • When generation finishes (normally, cancelled, or error), the async task queues an LlmGenerationComplete event.
    • forward_async_ai_events_system sends this Bevy event.
    • GUI systems listen for LlmGenerationComplete to update UI (hide spinner, show error messages, remove AwaitingAiResponse component).
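
The sketch below ties steps 2–4 together, reusing the types from the section 3.1 sketch and assuming a recent Bevy (EventReader::read, AsyncComputeTaskPool). AiModelManager::register_generation, the internal shape of AsyncAiTaskOutputs and YrsDocResource (a doc plus the text_map), and running the blocking llama.cpp loop on its own thread while the task drains the channel are illustrative choices, not settled API.

```rust
use std::sync::{atomic::AtomicBool, mpsc, Arc, Mutex};

use bevy::prelude::*;
use bevy::tasks::AsyncComputeTaskPool;
use yrs::{Text, Transact};

// AiModelManager, YrsDocResource, and YrsTextChanged are the project's existing
// types; their fields/methods as used here are assumptions.

/// Thread-safe queue the async task writes into and the main thread drains.
#[derive(Resource, Default)]
pub struct AsyncAiTaskOutputs(pub Arc<Mutex<Vec<LlmTokenStreamEvent>>>);

pub fn handle_llm_generate_requests_system(
    mut requests: EventReader<LlmGenerateRequest>,
    mut manager: ResMut<AiModelManager>,
    outputs: Res<AsyncAiTaskOutputs>,
) {
    for request in requests.read() {
        let Some(model) = manager.models.get(&request.model_id).cloned() else { continue };

        // Assumed helper: stores the flag in active_generations so that
        // handle_llm_cancel_requests_system can find it by request id.
        let cancel_flag = Arc::new(AtomicBool::new(false));
        manager.register_generation(request.request_id, cancel_flag.clone());

        let (prompt, target, request_id) =
            (request.prompt.clone(), request.target_yrs_text_entity, request.request_id);
        let queue = outputs.0.clone();

        AsyncComputeTaskPool::get()
            .spawn(async move {
                let (token_tx, token_rx) = mpsc::channel();
                // The blocking llama.cpp loop runs on its own thread; this task
                // drains tokens as they arrive and queues them for the main thread.
                let worker = std::thread::spawn(move || {
                    model.stream_generate(&prompt, token_tx, cancel_flag)
                });
                for result in token_rx {
                    if let Ok(token) = result {
                        queue.lock().unwrap().push(LlmTokenStreamEvent {
                            request_id,
                            target_yrs_text_entity: target,
                            token,
                        });
                    }
                }
                let _ = worker.join();
            })
            .detach();
    }
}

/// Main-thread system: drains the queue and emits real Bevy events.
pub fn forward_async_ai_events_system(
    outputs: Res<AsyncAiTaskOutputs>,
    mut writer: EventWriter<LlmTokenStreamEvent>,
) {
    for event in outputs.0.lock().unwrap().drain(..) {
        writer.send(event);
    }
}

/// Appends each streamed token to the target yrs::TextRef and notifies the GUI.
pub fn apply_llm_tokens_to_yrs_system(
    mut tokens: EventReader<LlmTokenStreamEvent>,
    yrs_doc: Res<YrsDocResource>,
    mut changed: EventWriter<YrsTextChanged>,
) {
    for event in tokens.read() {
        if let Some(text_ref) = yrs_doc.text_map.get(&event.target_yrs_text_entity) {
            let mut txn = yrs_doc.doc.transact_mut();
            text_ref.push(&mut txn, &event.token);
        }
        // The existing text_layout_system reacts to this and re-layouts the text.
        changed.send(YrsTextChanged { entity: event.target_yrs_text_entity });
    }
}
```

Because only apply_llm_tokens_to_yrs_system touches the Yrs document, and it runs on the main thread, no extra locking is needed around the CRDT itself.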

3.3. Reasoning Behind Design Decisions:

  • Modularity (ai module): Keeps AI concerns separate, facilitating future expansion to other AI types (image, audio) using similar patterns.
  • Bevy Events: Provides loose coupling between GUI and AI, essential for asynchronous operations and testability. Aligns with existing GUI framework patterns.
  • Yrs for Text Streaming: Leverages Studio-Whip's existing CRDT infrastructure for efficient, real-time, and potentially collaborative text updates. The AI module simply "pushes" data into Yrs, and the GUI reacts.
  • llama-cpp-2: Chosen for its active development, focus on staying current with llama.cpp, and direct C++ bindings suitable for Rust.
  • Asynchronous Tasks (AsyncComputeTaskPool): Prevents UI freezes during model loading and inference, crucial for good UX.
  • AtomicBool for Cancellation: A standard, lightweight mechanism for signalling cancellation to long-running tasks, ensuring responsiveness.
  • User Configuration (TOML): Simple, human-readable way for users to manage their local models.

4. Implementation Plan & Actionable Steps

The implementation will be phased. Each step should be testable.

Phase 1.0: AI Module Skeleton & Configuration

  1. Create Directory Structure:
    • Create src/ai/ with mod.rs, common.rs, error.rs, events.rs.
    • Create src/ai/llm/ with mod.rs, model.rs, config.rs.
    • Create src/ai/backends/ with mod.rs.
    • Create src/ai/backends/llama_cpp_2/ with mod.rs.
    • Create user/ai_models/ directory.
  2. Define Core Types & Events:
    • Implement structs/enums in ai::common.rs (ModelId, ModelType, InferenceDevice, ModelLoadConfig, ModelMetadata).
    • Implement AiError in ai::error.rs.
    • Implement event structs in ai::events.rs (all Llm* events, including LlmCancelRequest).
    • Implement LlmModelUserConfig in ai::llm::config.rs for TOML deserialization (see the example after this list).
  3. AiPlugin and AiModelManager (Basic Structure):
    • Create AiPlugin in ai::mod.rs.
    • Create AiModelManager resource in ai::manager.rs with empty HashMaps for models and active generations.
    • Implement the discover_and_request_model_loads_system to scan user/ai_models/ and send LlmLoadRequest events (initially, these requests won't be fully processed).
    • Add AiPlugin to main.rs.
  4. Testing:
    • Verify the plugin loads.
    • Create a sample model.toml file.
    • Verify discover_and_request_model_loads_system runs and sends LlmLoadRequest events (log the events).
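
A minimal sketch of what LlmModelUserConfig and a user/ai_models/*.toml file might look like, assuming the serde and toml crates; the field names (name, model_path, gpu_layers, context_length) and defaults are placeholders, not a final schema.

```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
pub struct LlmModelUserConfig {
    /// Display name, also used as the ModelId.
    pub name: String,
    /// Path to the GGUF file, relative to user/ai_models/ or absolute.
    pub model_path: String,
    /// Number of layers to offload to the GPU (0 = CPU-only).
    #[serde(default)]
    pub gpu_layers: u32,
    /// Context window to request when creating the llama.cpp context.
    #[serde(default = "default_context_length")]
    pub context_length: u32,
}

fn default_context_length() -> u32 {
    4096
}

fn main() {
    // Example of what a user/ai_models/*.toml file might contain.
    let example = r#"
        name = "mistral-7b-instruct"
        model_path = "mistral-7b-instruct-q4_k_m.gguf"
        gpu_layers = 32
        context_length = 8192
    "#;
    let config: LlmModelUserConfig = toml::from_str(example).expect("valid model TOML");
    println!("{config:?}");
}
```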

Phase 1.1: llama-cpp-2 Backend Wrapper & Model Loading

  1. Add llama-cpp-2 Dependency:
    • Add llama-cpp-2 to Cargo.toml with appropriate feature flags for CPU and CUDA (e.g., features = ["cuda"] if building for CUDA).
    • Ensure git submodule update --init --recursive is run if llama-cpp-2 is a git submodule or if its dependencies require it.
  2. Implement ai::backends::llama_cpp_2::mod.rs (see the sketch after this list):
    • Write wrapper functions to:
      • Initialize llama.cpp backend.
      • Load a GGUF model using llama_cpp_2::LlamaModel::load_from_file based on ModelLoadConfig (path, gpu_layers).
      • Handle potential errors from llama-cpp-2 and convert them to AiError.
  3. Implement LlmModel Trait and LlamaCppModel:
    • Define the LlmModel trait in ai::llm::model.rs.
    • Implement LlamaCppModel struct holding the loaded llama_cpp_2::LlamaModel.
    • Implement the metadata and estimate_vram/estimate_ram methods (initially placeholders, or read from GGUF metadata if the API allows).
  4. Enhance AiModelManager and Loading Systems:
    • Implement handle_model_load_requests_system:
      • Spawn an async Bevy task.
      • Task uses the llama_cpp_2 wrapper to load the model.
      • Task sends results (model instance or error) back to the main thread (e.g., via AsyncAiTaskOutputs queue).
    • Implement process_model_load_results_system:
      • Drains the result queue.
      • If successful, creates Arc<LlamaCppModel> and stores it in AiModelManager.models.
      • Sends LlmLoadResult Bevy event.
  5. Testing:
    • Place a small GGUF model (e.g., a tiny test model or a small Mistral quant) and its TOML config in user/ai_models/.
    • Run the app. Verify (via logs and LlmLoadResult events) that the model loads successfully on CPU.
    • If CUDA is set up, test GPU offloading by setting gpu_layers in the TOML. Verify llama.cpp logs indicate GPU usage.
    • Test error handling for incorrect paths or corrupted GGUF files.
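
A hedged sketch of the loading wrapper from step 2. llama-cpp-2's module paths and parameter-builder methods differ between crate versions, so the LlamaModelParams calls below should be checked against the version pinned in Cargo.toml; the fields of ModelLoadConfig are assumed from the description above.

```rust
use std::path::Path;

use llama_cpp_2::llama_backend::LlamaBackend;
use llama_cpp_2::model::{params::LlamaModelParams, LlamaModel};

use crate::ai::common::ModelLoadConfig;
use crate::ai::error::AiError;

/// Load a GGUF model, optionally offloading `gpu_layers` layers to the GPU.
pub fn load_gguf_model(
    backend: &LlamaBackend,
    config: &ModelLoadConfig,
) -> Result<LlamaModel, AiError> {
    let mut params = LlamaModelParams::default();
    if config.gpu_layers > 0 {
        // Assumed builder method; only meaningful when built with the "cuda" feature.
        params = params.with_n_gpu_layers(config.gpu_layers);
    }
    LlamaModel::load_from_file(backend, Path::new(&config.model_path), &params)
        .map_err(|e| AiError::LoadFailed(format!("{}: {e}", config.model_path)))
}
```

The LlamaBackend handle (from LlamaBackend::init()) would likely be created once, for example when AiPlugin builds, and shared with every load and context.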

Phase 1.2: Text Generation Streaming & Cancellation

  1. Implement LlamaCppModel::stream_generate (see the control-flow sketch after this list):
    • Takes prompt, params, token_tx: mpsc::Sender, and cancel_flag: Arc<AtomicBool>.
    • Sets up a llama_cpp_2::LlamaContext.
    • Enters the llama.cpp token generation loop.
    • Inside the loop:
      • Poll cancel_flag. If true, break, send Err(AiError::Cancelled) via token_tx (or a separate completion channel), and return.
      • Get the next token string from llama-cpp-2.
      • Send Ok(token_string) via token_tx.
    • When the loop finishes (EOS or error), ensure token_tx is closed or a final completion message is sent.
  2. Implement handle_llm_generate_requests_system:
    • As described in section 3.2 (step 2), this system receives LlmGenerateRequest, sets up cancellation, and spawns the async task, which calls stream_generate and forwards tokens/completion via AsyncAiTaskOutputs.
  3. Implement handle_llm_cancel_requests_system:
    • Receives LlmCancelRequest, finds the Arc<AtomicBool> in AiModelManager.active_generations, and sets it to true.
  4. Implement forward_async_ai_events_system:
    • Drains AsyncAiTaskOutputs and sends LlmTokenStreamEvent and LlmGenerationComplete Bevy events.
  5. Implement apply_llm_tokens_to_yrs_system:
    • Receives LlmTokenStreamEvent.
    • Appends token to the target Yrs TextRef.
    • Sends YrsTextChanged event.
  6. GUI Integration (Basic):
    • Create a temporary Bevy system (or use a debug UI if you have one) that can:
      • Send an LlmGenerateRequest for a hardcoded prompt and target EditableText entity (the one created in setup_scene_ecs).
      • Send an LlmCancelRequest for that generation.
    • Add basic UI state components (AwaitingAiResponse) and systems to manage them based on LlmGenerateRequest and LlmGenerationComplete.
  7. Testing:
    • Trigger a generation. Verify text streams into the sample EditableText UI element.
    • Verify your text_layout_system and renderer update the display in real-time.
    • Trigger cancellation during generation. Verify generation stops promptly and the UI updates accordingly (e.g., "Cancelled" message, spinner stops).
    • Test generation completion (EOS token).
    • Test error conditions during inference (if possible to simulate).
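
The control-flow sketch below focuses on step 1's cancellation and channel plumbing. new_context and decode_next_token are hypothetical stand-ins for the real llama-cpp-2 tokenize/batch/decode/sample calls, which are elided; AiError is the placeholder enum from the section 3.1 sketch, and in practice this method would live in the impl of the LlmModel trait.

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    mpsc, Arc,
};

use crate::ai::error::AiError;

/// Placeholder; the real struct wraps the loaded llama_cpp_2::LlamaModel.
pub struct LlamaCppModel;

/// Stand-in for per-generation llama.cpp context + sampler state.
pub struct GenContext;

pub enum DecodeStep {
    Token(String),
    EndOfStream,
}

impl LlamaCppModel {
    /// Hypothetical helper: tokenizes the prompt and creates a LlamaContext.
    fn new_context(&self, _prompt: &str) -> Result<GenContext, AiError> {
        Ok(GenContext)
    }

    /// Hypothetical helper standing in for the real batch/decode/sample step.
    fn decode_next_token(&self, _ctx: &mut GenContext) -> Result<DecodeStep, AiError> {
        Ok(DecodeStep::EndOfStream)
    }

    pub fn stream_generate(
        &self,
        prompt: &str,
        token_tx: mpsc::Sender<Result<String, AiError>>,
        cancel_flag: Arc<AtomicBool>,
    ) -> Result<(), AiError> {
        let mut ctx = self.new_context(prompt)?;

        loop {
            // Polled once per token: cancellation latency is at most one decode
            // step, which is what makes it feel effectively instant to the user.
            if cancel_flag.load(Ordering::Relaxed) {
                let _ = token_tx.send(Err(AiError::Cancelled));
                return Err(AiError::Cancelled);
            }

            match self.decode_next_token(&mut ctx)? {
                DecodeStep::Token(text) => {
                    // A closed receiver means the consumer went away; stop early.
                    if token_tx.send(Ok(text)).is_err() {
                        return Ok(());
                    }
                }
                // EOS: returning drops token_tx, which closes the channel and
                // lets the async task know the stream is finished.
                DecodeStep::EndOfStream => return Ok(()),
            }
        }
    }
}
```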

Phase 1.3: Refinement & Polish

  1. Error Handling & Logging:
    • Ensure all Result types are handled.
    • Use bevy_log macros (info!, warn!, error!) appropriately throughout the AI module.
    • Propagate meaningful errors to the user via LlmGenerationComplete or LlmLoadResult events, allowing the GUI to display them (see the sketch after this list).
  2. Code Cleanup & Documentation:
    • Add comments and documentation to new AI modules and systems.
    • Refactor for clarity and efficiency.
  3. Basic GUI Integration for Triggering:
    • Implement a simple button or hotkey within the main application to trigger generation on the currently focused EditableText (as outlined in section 3.2).
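
A small sketch of how a finished, cancelled, or failed generation could reach the logs and the GUI in step 1. The exact shape of LlmGenerationComplete and the GenerationStatus enum are assumptions consistent with the events listed in section 3.1.

```rust
use bevy::log::{error, info};
use bevy::prelude::*;
use uuid::Uuid;

#[derive(Debug, Clone)]
pub enum GenerationStatus {
    Finished,
    Cancelled,
    Failed(String),
}

#[derive(Event)]
pub struct LlmGenerationComplete {
    pub request_id: Uuid,
    pub status: GenerationStatus,
}

/// GUI-side listener: logs the outcome and updates UI state.
pub fn handle_generation_complete_system(mut events: EventReader<LlmGenerationComplete>) {
    for ev in events.read() {
        match &ev.status {
            GenerationStatus::Failed(msg) => error!("generation {} failed: {}", ev.request_id, msg),
            status => info!("generation {} ended: {:?}", ev.request_id, status),
        }
        // Here the GUI would also remove AwaitingAiResponse from the target
        // entity and hide any loading indicator.
    }
}
```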

5. Future Considerations (Post-Phase 1)

  • Concurrent LLM generations.
  • Vulkan compute backend for llama.cpp.
  • Integration of other AI modalities (Image, Audio), following similar architectural patterns.
  • Advanced prompt engineering UI.
  • More sophisticated model management UI within Studio-Whip.
  • Memory monitoring and dynamic unloading/loading of models based on usage and system resources.

This phased approach allows for incremental development and testing, ensuring each part of the system is functional before building upon it. The focus on leveraging existing Bevy and Yrs patterns should make the integration relatively smooth and maintainable.
