Voxly

Live Native Captions for macOS

Real-time, AI-powered captions for everything you hear on your Mac. 100% offline. Total privacy.


Features

  • Real-time transcription using Apple's Speech Framework
  • 2-line rolling captions - the display holds at most 2 lines of about 70 characters each
  • AI-formatted captions - clean, readable text (removes filler words, adds punctuation)
  • Raw Transcription Mode - bypasses AI formatting for the lowest latency
  • 100% offline - no data leaves your Mac
  • Ultra-low latency - under 500ms from audio to caption in Raw mode
  • Beautiful overlay - floating captions over any app
  • Language support - English (US/UK) at launch, with more languages planned
  • Fully customizable - font size, position, transparency, formatting mode
  • Native macOS - optimized for Apple Silicon
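
To make the 2-line rolling display concrete, here is a minimal sketch of one way such a buffer could work. It is not taken from the Voxly source: `RollingCaptionBuffer` and its parameters are hypothetical names, and the greedy word wrap is an assumed strategy.

```swift
import Foundation

/// Hypothetical helper (not from the Voxly source): keeps only the most
/// recent `maxLines` lines of greedily wrapped caption text.
struct RollingCaptionBuffer {
    let maxLines: Int
    let maxCharsPerLine: Int
    private(set) var lines: [String] = []

    init(maxLines: Int = 2, maxCharsPerLine: Int = 70) {
        self.maxLines = maxLines
        self.maxCharsPerLine = maxCharsPerLine
    }

    /// Appends the words of `text`, starting a new line when the current one
    /// is full and dropping the oldest line once the limit is exceeded.
    /// A single word longer than `maxCharsPerLine` is kept whole on its own line.
    mutating func append(_ text: String) {
        for word in text.split(separator: " ") {
            if let last = lines.last, last.count + 1 + word.count <= maxCharsPerLine {
                lines[lines.count - 1] = "\(last) \(word)"
            } else {
                lines.append(String(word))
            }
            if lines.count > maxLines {
                lines.removeFirst(lines.count - maxLines)
            }
        }
    }
}
```

Feeding each new piece of transcribed text to `append(_:)` and rendering `lines` keeps the overlay at two lines: older text scrolls away as soon as a third line would be needed.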

Use Cases

Perfect for:

  • Language learners - see subtitles for YouTube, podcasts, courses
  • International meetings - follow along in Zoom, Meet, Teams
  • Online learning - capture every detail in lectures
  • Accessibility - real-time captions for any audio content
  • Streaming - add captions to Netflix, videos, presentations

Quick Start

Requirements

  • macOS 15+ (Sequoia)
  • Apple Silicon (M1, M2, M3, M4)
  • ~200MB disk space

Installation

  1. Download Voxly.dmg from Releases
  2. Open the DMG and drag Voxly to Applications
  3. Launch Voxly
  4. Grant permissions:
    • Screen Recording (to capture system audio)
    • Speech Recognition (to transcribe audio)
  5. Click the menu bar icon and select "Start Capturing"

That's it! Captions will appear at the bottom of your screen.


Documentation


Development

Prerequisites

  • Xcode 16+
  • macOS 15+
  • Apple Silicon Mac

Setup

# Clone the repository
git clone https://github.com/iamjoaovytor/voxly.git
cd voxly

# Open in Xcode
open Voxly.xcodeproj

# Build and run (⌘R)

Architecture

Voxly uses the MVVM + Coordinator pattern with the following stack:

  • Swift 6.0+
  • SwiftUI (UI)
  • AppKit (Menu bar, window management)
  • Combine (Reactive programming)
  • ScreenCaptureKit (Audio capture)
  • Speech Framework (Transcription)
  • Foundation Models (AI formatting)

See APPLE_APIS_GUIDE.md for Apple APIs reference.
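
As an illustration of how MVVM and Combine can fit together in an app like this, the sketch below shows a view model that turns a stream of transcripts into display text. It is an assumption-laden example, not Voxly's actual code: `CaptionViewModel`, `FormattingMode`, and the `format` closure are hypothetical names, and the closure stands in for whatever AI formatting step the app uses.

```swift
import Combine

/// Hypothetical formatting modes for raw and AI-formatted captions.
enum FormattingMode {
    case raw
    case aiFormatted
}

/// Hypothetical view model: publishes the text the overlay should display.
final class CaptionViewModel: ObservableObject {
    @Published private(set) var captionText = ""
    var mode: FormattingMode = .raw

    private var cancellables = Set<AnyCancellable>()

    /// `transcripts` is assumed to emit recognized text; `format` is a
    /// placeholder for the formatting step and is only applied in
    /// `.aiFormatted` mode.
    init(transcripts: AnyPublisher<String, Never>,
         format: @escaping (String) -> String) {
        transcripts
            .map { [weak self] text -> String in
                guard let self, self.mode == .aiFormatted else { return text }
                return format(text)
            }
            // A real app would likely add `.receive(on: DispatchQueue.main)`
            // here so that UI state only changes on the main thread.
            .sink { [weak self] text in
                self?.captionText = text
            }
            .store(in: &cancellables)
    }
}
```

A SwiftUI overlay view could observe `captionText`, while the transcription layer only needs to expose a publisher. Keeping the formatter as an injected closure also makes the view model easy to unit test.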

Testing

# Run all tests in Xcode: press ⌘U

# Run the full test suite from the command line
xcodebuild test -scheme Voxly -destination 'platform=macOS'

Roadmap

v1.0 (MVP) - Current

  • Audio capture from system
  • Real-time speech-to-text
  • AI caption formatting
  • Floating overlay window
  • Menu bar app
  • Settings panel
  • English (US/UK) support

v1.1 (Planned)

  • More languages (Spanish, Portuguese, French, German, Japanese)
  • Save & export transcripts
  • Caption history
  • Improved AI formatting with context awareness

v1.2 (Future)

  • Real-time translation
  • Custom themes
  • Integration with learning apps (Anki, etc.)
  • Presentation mode (highlight technical terms)

v2.0 (Vision)

  • iOS/iPadOS companion app
  • iCloud sync
  • Collaborative captions
  • Advanced analytics

Contributing

Contributions are welcome! Please read CONTRIBUTING.md for details.

How to Contribute

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'feat: add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.


Acknowledgments

  • Apple for providing amazing on-device ML frameworks
  • The open-source community
  • All beta testers and early adopters

Contact


Privacy

Voxly takes privacy seriously:

  • All processing happens on-device
  • No data is sent to external servers
  • No tracking or analytics
  • No network requests during operation
  • Open source - audit the code yourself

Your audio never leaves your Mac. Period.
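
For readers auditing the on-device claim, the sketch below shows what an offline-only request can look like with the Speech framework's `SFSpeechRecognizer` API. Whether Voxly uses these exact calls is an assumption, and `makeOnDeviceRequest` is a hypothetical helper name.

```swift
import Speech

/// Hypothetical helper (not from the Voxly source): builds a recognition
/// request that is only ever served by on-device recognition.
func makeOnDeviceRequest(for recognizer: SFSpeechRecognizer) -> SFSpeechAudioBufferRecognitionRequest? {
    // Return nil rather than fall back to server-based recognition.
    guard recognizer.supportsOnDeviceRecognition else { return nil }

    let request = SFSpeechAudioBufferRecognitionRequest()
    request.requiresOnDeviceRecognition = true   // keep audio on this Mac
    request.shouldReportPartialResults = true    // stream partial captions
    return request
}
```

When reading the source, a search for `requiresOnDeviceRecognition` is a quick way to check how recognition requests are configured.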


Why Voxly?

Other solutions upload your audio to cloud services, add noticeable latency, or charge a monthly subscription. Voxly is:

  • Free (open source)
  • Fast (under 500ms latency in Raw mode)
  • Private (100% offline)
  • Native (optimized for Apple Silicon)

Built for language learners and accessibility.


Made by João Vitor Sousa
