feat: Add AI Narration (TTS) feature for book reader#1

Closed
Quickkill0 wants to merge 2 commits into tts-implementation from feature/tts-implementation

@Quickkill0
Owner

Summary

This PR implements an AI Narration (text-to-speech) feature for Kavita. Users can generate AI-narrated audiobooks from their ebooks and listen to them in the book reader, using Local AI with the Kokoro and VibeVoice 0.5 models.

Key Features

  • Admin Settings Tab: New "AI Narration" configuration with Local AI server URL, engine selection (Kokoro/VibeVoice), voice selection, and advanced parameters (CFG, speed, temperature, etc.)
  • Generation System: Queue-based narration generation with progress tracking, ETA, and chapter-level details
  • Integrated Audio Player: Full-featured player within book reader including:
    • Play/Pause with seek bar
    • Variable playback speed (0.5x - 2x)
    • Volume control with visual indicator
    • Sleep timer (15/30/45/60 min or end of chapter)
    • Audio bookmarks
    • Skip forward/back controls
  • Text-Audio Sync: Paragraph-level highlighting and scrolling during playback
  • Streaming Mode: On-demand TTS generation option for immediate playback
  • Cross-Device Sync: Real-time listening progress sync via SignalR
  • Export: Download narration as MP3/M4A with proper metadata
  • Lifecycle Management: Handles source book deletion/modification gracefully
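
To illustrate the paragraph-level text-audio sync listed above, here is a minimal sketch of how a player could map the current playback time to the paragraph to highlight. The `SegmentTiming` shape and the `findActiveParagraph` name are assumptions made for this example and are not taken from the PR; the sketch also assumes the segments are sorted by start time and do not overlap.

```typescript
// Hypothetical shape: one entry per paragraph, with the time range it occupies in the audio.
interface SegmentTiming {
  paragraphIndex: number;
  startSec: number; // inclusive
  endSec: number;   // exclusive
}

// Binary search over segments sorted by startSec.
// Returns the paragraph playing at `timeSec`, or null when the time
// falls in a gap between segments or outside the audio entirely.
function findActiveParagraph(segments: SegmentTiming[], timeSec: number): number | null {
  let lo = 0;
  let hi = segments.length - 1;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    const seg = segments[mid];
    if (timeSec < seg.startSec) {
      hi = mid - 1; // target is earlier than this segment
    } else if (timeSec >= seg.endSec) {
      lo = mid + 1; // target is later than this segment
    } else {
      return seg.paragraphIndex;
    }
  }
  return null;
}
```

A player could call this from its time-update handler and only re-highlight and scroll when the returned index changes.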

Technical Implementation

Backend:

  • NarrationController with comprehensive API endpoints
  • NarrationService for business logic
  • NarrationGenerationService for background processing
  • Entity models: NarrationSettings, NarrationJob, NarrationSegment, NarrationProgress, NarrationBookmark
  • SignalR events for progress and sync
  • AutoMapper profiles for DTOs
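
The progress tracking and ETA mentioned for the generation queue can be sketched as a simple average-rate estimate. This is only an illustration: the `JobSnapshot` fields and the `estimateProgress` function are hypothetical, and the PR does not state how its ETA is actually computed.

```typescript
// Hypothetical snapshot of a generation job; field names are assumptions.
interface JobSnapshot {
  totalSegments: number;
  completedSegments: number;
  elapsedMs: number; // time spent generating so far
}

interface JobProgress {
  percent: number;      // 0..100
  etaMs: number | null; // null until at least one segment has finished
}

// Estimates remaining time from the average time per completed segment.
function estimateProgress(job: JobSnapshot): JobProgress {
  if (job.totalSegments <= 0) {
    return { percent: 0, etaMs: null };
  }
  // Clamp so a bad counter can never report less than 0% or more than 100%.
  const done = Math.min(Math.max(job.completedSegments, 0), job.totalSegments);
  const percent = (done * 100) / job.totalSegments;
  if (done === 0) {
    return { percent, etaMs: null }; // no rate information yet
  }
  const msPerSegment = job.elapsedMs / done;
  const remaining = job.totalSegments - done;
  return { percent, etaMs: Math.round(msPerSegment * remaining) };
}
```

An average over all completed segments is the simplest choice; a moving average over recent segments would react faster if segment lengths vary a lot.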

Frontend:

  • ManageNarrationSettingsComponent for admin configuration
  • NarrationPlayerComponent integrated in book reader
  • NarrationService for API communication
  • MessageHub integration for real-time updates
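
For the real-time progress updates, the client has to decide what to do when an update from another device arrives. The sketch below uses a last-write-wins rule based on a timestamp. The `ListeningProgress` shape and `mergeProgress` function are hypothetical, and the PR does not describe its conflict rule, so treat this as one possible approach rather than the implemented one.

```typescript
// Hypothetical payload for a progress update; field names are assumptions.
interface ListeningProgress {
  chapterId: number;
  positionSec: number;
  updatedAtMs: number; // epoch milliseconds when the position was recorded
  deviceId: string;
}

// Last-write-wins merge: keep whichever record was written most recently.
// Ties keep the local record so the current device does not jump needlessly.
function mergeProgress(
  local: ListeningProgress | null,
  incoming: ListeningProgress
): ListeningProgress {
  if (local === null) {
    return incoming;
  }
  return incoming.updatedAtMs > local.updatedAtMs ? incoming : local;
}
```

A handler subscribed to the hub's progress event could call `mergeProgress` and seek the player only when the returned record differs from the local one.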

UI Naming

Uses "Listen" / "Narration" terminology as specified (not TTS).

Test Plan

  • Verify admin settings page loads and saves correctly
  • Test connection to Local AI server
  • Generate narration for a chapter (pre-generate mode)
  • Test streaming mode playback
  • Verify audio player controls (play/pause, speed, volume, sleep timer)
  • Test bookmark creation and navigation
  • Verify text-audio sync highlighting
  • Test export functionality (MP3/M4A)
  • Verify cross-device progress sync
  • Test narration cleanup when source book is deleted

Related

Implements AI Narration feature as specified in plan at .claude/plans/wondrous-yawning-quilt.md

- Fix TypeScript null safety issues in narration-player component template
- Add optional chaining for activeJob properties
- Change calculateTotalTime from private to protected for template access
- Fix SCSS darken() incompatibility with CSS variables
- Add DecimalPipe import to manage-narration-settings component
- Add volume control with slider and dynamic icon to audio player

Adds a workflow_dispatch-triggered workflow that:
- Allows selecting any branch to build
- Supports custom Docker tags or auto-generates from branch name
- Offers platform selection (amd64 only, arm64, or all three)
- Pushes to ghcr.io/<your-username>/kavita (not the upstream repo)
- Includes a build summary with all image details
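
As an illustration of auto-generating a Docker tag from a branch name, the following sketch replaces characters that are not valid in a tag. The `dockerTagFromBranch` name and the `latest` fallback are assumptions for this example; the workflow itself is not shown in the PR and may derive its tags differently.

```typescript
// Derives a Docker tag from a Git branch name.
// Docker tags may contain letters, digits, "_", "." and "-", may not start
// with "." or "-", and are limited to 128 characters.
function dockerTagFromBranch(branch: string): string {
  const cleaned = branch
    .replace(/[^A-Za-z0-9_.-]+/g, "-") // e.g. "/" in "feature/x" becomes "-"
    .replace(/^[.-]+/, "")             // drop a leading "." or "-"
    .slice(0, 128);                    // enforce the length limit
  // Assumed fallback for branch names that contain no usable characters.
  return cleaned.length > 0 ? cleaned : "latest";
}
```

With this rule, the branch `feature/tts-implementation` would yield the tag `feature-tts-implementation`.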

Quickkill0 deleted the branch tts-implementation on January 18, 2026 06:18
Quickkill0 closed this on Jan 18, 2026