34 changes: 31 additions & 3 deletions README.md
@@ -25,7 +25,8 @@ Hit your <span style="color:#FF4500">**hotkey shortcut**</span> -> speak -> hotk
| Feature | Notes |
| -------------------------------- | ----------------------------------------------------------------------- |
| **Whisper.cpp** backend | Local, offline, fast ASR. |
| **Simulated typing** | instantly types straight into any currently focused input window. Even on Wayland! (*ydotool*). |
| **Streaming transcription** | Real-time incremental typing as you speak. Text appears word-by-word, not after recording stops. |
| **Simulated typing** | Instantly types straight into any currently focused input window. Even on Wayland! (*ydotool*). |
| **Clipboard** | Auto-copies into clipboard - ready for pasting, if desired |
| **Languages** | 99+ languages. Provides default language config and session language override |
| **AIPP**, AI Post-Processing | AI-rewriting via local or cloud LLMs. GUI prompt editor. |
@@ -97,6 +98,8 @@ cd voxd && ./setup.sh

Setup is non-interactive with minimal console output; a detailed setup log is saved in the repo directory (e.g. `2025-09-18-setup-log.txt`).

**Note:** The setup script detects and uses the highest Python 3.x version available on your system (3.9 or newer). For example, if Python 3.13 or 3.14 is installed, it is selected automatically.
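
The selection rule in the note above can be sketched in Python. This is an illustration only, not the setup script itself (which is written in shell); the helper names `probe_version` and `pick_highest` are hypothetical. The point it shows is that versions are compared as integer pairs, so 3.14 correctly ranks above 3.9.

```python
import shutil
import subprocess
from typing import Dict, Optional, Tuple

MIN_VERSION = (3, 9)  # oldest Python the project accepts (requires-python >= 3.9)


def probe_version(cmd: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) reported by `cmd`, or None if it is missing or fails."""
    path = shutil.which(cmd)
    if path is None:
        return None
    try:
        out = subprocess.run(
            [path, "-c", "import sys; print(sys.version_info[0], sys.version_info[1])"],
            capture_output=True, text=True, timeout=10, check=True,
        ).stdout.split()
        return int(out[0]), int(out[1])
    except (OSError, subprocess.SubprocessError, ValueError, IndexError):
        return None


def pick_highest(versions: Dict[str, Optional[Tuple[int, int]]]) -> Optional[str]:
    """Pick the command with the newest version >= MIN_VERSION (ties keep the earliest candidate)."""
    ok = {cmd: v for cmd, v in versions.items() if v is not None and v >= MIN_VERSION}
    if not ok:
        return None
    return max(ok, key=ok.get)


def pick_python() -> Optional[str]:
    """Probe python3.20 down to python3.9, then python3 and python, mirroring the note above."""
    candidates = [f"python3.{i}" for i in range(20, 8, -1)] + ["python3", "python"]
    return pick_highest({c: probe_version(c) for c in candidates})
```

Comparing tuples rather than strings matters here: as text, `"3.9"` sorts after `"3.14"`, which would pick the older interpreter.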

**Reboot** the system!
(Not needed on X11. Most modern systems use Wayland, where **ydotool** is required for typing, and its user setup takes effect only after a reboot.)

@@ -138,9 +141,25 @@ Leave VOXD running in the background -> go to any app where you want to voice-ty
| Press hotkey … | VOXD does … |
| ---------------- | ----------------------------------------------------------- |
| **First press** | start recording |
| **Second press** | stop ⇒ [transcribe ⇒ copy to clipboard] ⇒ types the output into any focused app |
| **Second press** | stop ⇒ [finalize transcription ⇒ copy to clipboard] ⇒ types any remaining output into any focused app |

### 🎙️ Streaming Mode (Default)

VOXD uses **streaming transcription** by default, which means:

- **Real-time typing**: Text appears incrementally as you speak, not after you stop recording
- **Chunk-based processing**: Audio is processed in overlapping chunks (default: 3 seconds) for continuous transcription
- **Incremental updates**: Text is typed word-by-word or phrase-by-phrase as it's transcribed (typically every 2 seconds or 3 words)
- **Seamless experience**: You see your words appear in real-time, making it feel like natural voice-typing
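
As a rough illustration of the chunk-based processing described in the list above, the sketch below computes overlapping chunk boundaries for a finished audio buffer. It is not VOXD's implementation: the function name `chunk_bounds` and the 16 kHz default sample rate are assumptions, and a live stream would produce these windows incrementally rather than from a known total length.

```python
from typing import List, Tuple


def chunk_bounds(
    total_samples: int,
    sample_rate: int = 16000,      # assumed rate; not stated in the README
    chunk_seconds: float = 3.0,    # README default chunk size
    overlap_seconds: float = 0.5,  # README default overlap
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of overlapping chunks covering the buffer."""
    chunk = int(chunk_seconds * sample_rate)
    overlap = int(overlap_seconds * sample_rate)
    if chunk <= 0 or not 0 <= overlap < chunk:
        raise ValueError("overlap must be non-negative and smaller than the chunk size")

    step = chunk - overlap  # each new chunk starts this many samples after the previous one
    bounds: List[Tuple[int, int]] = []
    start = 0
    while start < total_samples:
        end = min(start + chunk, total_samples)
        bounds.append((start, end))
        if end == total_samples:
            break  # the final (possibly shorter) chunk reached the end of the buffer
        start += step
    return bounds
```

With the defaults, consecutive chunks share 0.5 seconds of audio, which gives the recognizer context across chunk edges at the cost of transcribing that overlap twice.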

Otherwise, if in --flux (beta), **just speak**.
**How it works:**
1. Press hotkey to start → VOXD begins recording and transcribing
2. As you speak → Text appears incrementally in your focused application
3. Press hotkey again → Finalizes any remaining transcription and copies to clipboard

This streaming behavior is enabled by default in CLI (`voxd`), GUI (`voxd --gui`), and Tray (`voxd --tray`) modes. The old "record-then-transcribe" behavior is no longer used.
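
The incremental typing described above ("typically every 2 seconds or 3 words") can be pictured as a small gate that buffers words and releases them when a threshold is met. The sketch below is hypothetical: the class name `EmitGate` is invented for illustration, and it assumes the two thresholds are combined as "whichever comes first", which the README does not state explicitly.

```python
from typing import List, Optional


class EmitGate:
    """Buffer transcribed words; release them once enough words or enough time has accumulated."""

    def __init__(self, interval_seconds: float = 2.0, word_count: int = 3) -> None:
        self.interval_seconds = interval_seconds
        self.word_count = word_count
        self._pending: List[str] = []
        self._last_emit: Optional[float] = None

    def feed(self, words: List[str], now: float) -> str:
        """Add newly transcribed words; return text to type now, or '' to keep waiting."""
        if self._last_emit is None:
            self._last_emit = now  # start the clock on the first call
        self._pending.extend(words)
        enough_words = len(self._pending) >= self.word_count
        enough_time = now - self._last_emit >= self.interval_seconds
        if self._pending and (enough_words or enough_time):
            return self._release(now)
        return ""

    def flush(self) -> str:
        """Return whatever is still buffered (the second hotkey press in the steps above)."""
        return self._release(self._last_emit or 0.0)

    def _release(self, now: float) -> str:
        text = " ".join(self._pending)
        self._pending.clear()
        self._last_emit = now
        return text
```

The caller passes the current time explicitly (for example from `time.monotonic()`), which keeps the gate easy to test.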

**Note:** In `--flux` mode (beta), **just speak**: no hotkey is needed, because voice activity detection triggers recording automatically.

### Autostart
For practical reasons (always ready to type & low system footprint), it is advised to enable the voxd user daemon:
@@ -307,6 +326,15 @@ llamacpp_server_timeout: 30
# Selected models per provider (automatically updated by VOXD)
aipp_selected_models:
llamacpp_server: "qwen2.5-3b-instruct-q4_k_m"

# Streaming transcription settings (default: enabled)
streaming_enabled: true # Enable/disable streaming mode
streaming_chunk_seconds: 3.0 # Audio chunk size in seconds (default: 3.0)
streaming_overlap_seconds: 0.5 # Overlap between chunks in seconds (default: 0.5)
streaming_emit_interval_seconds: 2.0 # Minimum time between text updates (default: 2.0)
streaming_emit_word_count: 3 # Minimum words before emitting text (default: 3)
streaming_typing_delay: 0.01 # Delay between typed characters in streaming mode (default: 0.01)
streaming_min_chars_to_type: 3 # Minimum characters before typing incremental text (default: 3)
```
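
To show how the streaming keys above relate to one another, here is a minimal sketch that overlays user values on the documented defaults and checks basic consistency. It is an assumption-laden illustration: the function name `resolve_streaming_settings` and the validation rules are hypothetical, and the diff does not show how VOXD itself validates these keys.

```python
from typing import Any, Dict

# Defaults copied from the config comments above.
STREAMING_DEFAULTS: Dict[str, Any] = {
    "streaming_enabled": True,
    "streaming_chunk_seconds": 3.0,
    "streaming_overlap_seconds": 0.5,
    "streaming_emit_interval_seconds": 2.0,
    "streaming_emit_word_count": 3,
    "streaming_typing_delay": 0.01,
    "streaming_min_chars_to_type": 3,
}


def resolve_streaming_settings(user_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay user-supplied streaming_* values on the defaults and sanity-check them."""
    settings = dict(STREAMING_DEFAULTS)
    for key, value in user_cfg.items():
        if key in STREAMING_DEFAULTS:  # ignore unrelated config keys
            settings[key] = value

    # Hypothetical checks: the overlap has to fit inside a chunk, and sizes must be positive.
    if settings["streaming_chunk_seconds"] <= 0:
        raise ValueError("streaming_chunk_seconds must be positive")
    if not 0 <= settings["streaming_overlap_seconds"] < settings["streaming_chunk_seconds"]:
        raise ValueError("streaming_overlap_seconds must be >= 0 and smaller than the chunk size")
    if settings["streaming_emit_word_count"] < 1:
        raise ValueError("streaming_emit_word_count must be at least 1")
    return settings
```

For example, passing `{"streaming_chunk_seconds": 5.0}` keeps every other default and only lengthens the chunk.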

---
51 changes: 42 additions & 9 deletions packaging/postinstall.sh
@@ -41,29 +41,62 @@ echo "voxd installed. Each user should run: voxd --setup"
# We inherit system site-packages to avoid duplicating distro Python libs
APPDIR="/opt/voxd"

# Pick a Python >= 3.9 if available; attempt RPM install on openSUSE if too old
# Pick the highest available Python 3.x version (>=3.9)
# Attempt RPM install on openSUSE if too old
pick_python() {
for c in python3.12 python3.11 python3.10 python3.9 python3 python; do
best_cmd=""
best_ver=""
# Check common python3.x commands (3.9 to 3.20) and python3/python
# Build candidate list
i=20
while [ "$i" -ge 9 ]; do
if command -v "python3.$i" >/dev/null 2>&1; then
ver="$(python3.$i - <<'PY'
import sys
print(f"{sys.version_info.major}.{sys.version_info.minor}")
PY
)" 2>/dev/null || { i=$((i - 1)); continue; }
# Check if version is >= 3.9
IFS='.' read -r major minor <<EOF
$ver
EOF
if [ "$major" -gt 3 ] || ([ "$major" -eq 3 ] && [ "$minor" -ge 9 ]); then
# Compare versions: if no best yet, or this is newer
if [ -z "$best_ver" ] || [ "$(printf '%s\n' "$best_ver" "$ver" | sort -V | tail -1)" = "$ver" ]; then
best_cmd="python3.$i"
best_ver="$ver"
fi
fi
fi
i=$((i - 1))
done
# Also check python3 and python
for c in python3 python; do
if command -v "$c" >/dev/null 2>&1; then
ver="$("$c" - <<'PY'
import sys
print(f"{sys.version_info.major}.{sys.version_info.minor}")
PY
)"
case "$ver" in
3.9|3.10|3.11|3.12|3.13) echo "$c"; return 0 ;;
*) ;;
esac
)" 2>/dev/null || continue
IFS='.' read -r major minor <<EOF
$ver
EOF
if [ "$major" -gt 3 ] || ([ "$major" -eq 3 ] && [ "$minor" -ge 9 ]); then
if [ -z "$best_ver" ] || [ "$(printf '%s\n' "$best_ver" "$ver" | sort -V | tail -1)" = "$ver" ]; then
best_cmd="$c"
best_ver="$ver"
fi
fi
fi
done
echo ""
[ -n "$best_cmd" ] && echo "$best_cmd"
}

PY="$(pick_python)"

# If no suitable python found, on zypper try to install a newer one
if [ -z "$PY" ] && command -v zypper >/dev/null 2>&1; then
for pkg in python311 python3.11 python310 python3.10 python39 python3.9; do
for pkg in python314 python3.14 python313 python3.13 python311 python3.11 python310 python3.10 python39 python3.9; do
if zypper --non-interactive --no-gpg-checks install -y "$pkg" >/dev/null 2>&1; then
break
fi
43 changes: 29 additions & 14 deletions packaging/voxd.wrapper
@@ -44,25 +44,41 @@ print(f"{sys.version_info.major}.{sys.version_info.minor}")
PY
)"
log "System Python version: $ver"
case "$ver" in
3.9|3.10|3.11|3.12|3.13) : ;;
*)
# Attempt to create a user-local venv with any newer Python found
# Check if version is >= 3.9
IFS='.' read -r major minor <<< "$ver"
if [[ "$major" -gt 3 ]] || [[ "$major" -eq 3 && "$minor" -ge 9 ]]; then
: # version is acceptable
else
# Attempt to create a user-local venv with any newer Python found
# Pick the highest available Python 3.x version (>=3.9)
pick_python() {
for c in python3.12 python3.11 python3.10 python3.9 python3; do
local best_cmd="" best_ver=""
# Check common python3.x commands (3.9 to 3.20) and python3/python
local candidates=()
for i in {20..9}; do
candidates+=("python3.$i")
done
candidates+=("python3" "python")

for c in "${candidates[@]}"; do
if command -v "$c" >/dev/null 2>&1; then
v="$($c - <<'PY'
v="$("$c" - <<'PY'
import sys
print(f"{sys.version_info.major}.{sys.version_info.minor}")
PY
)"
case "$v" in
3.9|3.10|3.11|3.12|3.13) echo "$c"; return 0 ;;
*) : ;;
esac
)" 2>/dev/null || continue
# Check if version is >= 3.9
IFS='.' read -r v_major v_minor <<< "$v"
if [[ "$v_major" -gt 3 ]] || [[ "$v_major" -eq 3 && "$v_minor" -ge 9 ]]; then
# Compare versions: if no best yet, or this is newer
if [[ -z "$best_ver" ]] || [[ "$(printf '%s\n' "$best_ver" "$v" | sort -V | tail -1)" == "$v" ]]; then
best_cmd="$c"
best_ver="$v"
fi
fi
fi
done
echo ""
[[ -n "$best_cmd" ]] && echo "$best_cmd"
}
log "Attempting to locate newer system Python (>=3.9)"
CAND="$(pick_python)"
@@ -86,8 +102,7 @@ PY
echo "[voxd] System Python $ver is unsupported and no newer Python was found. Use 'bash packaging/install_voxd.sh <rpm>' to provision a newer Python, or create $APPDIR/.venv with Python >= 3.9." >&2
exit 1
fi
;;
esac
fi
fi

# Ensure Python can import the embedded source tree
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "voxd"
version = "mr.batman" # bump manually on releases
version = "1.7.0"
description = "Voice-typing helper powered by whisper.cpp"
authors = [{ name = "Jakov", email = "jakov.iv@proton.me" }]
requires-python = ">=3.9"
45 changes: 42 additions & 3 deletions setup.sh
@@ -523,10 +523,47 @@ fi
spinner_stop 0

# ────────────────── 4. python venv & deps ─────────────────────────────────––
# Pick the highest available Python 3.x version (>=3.9)
pick_python() {
local best_cmd="" best_ver=""
# Check common python3.x commands (3.9 to 3.20) and python3/python
local candidates=()
for i in {20..9}; do
candidates+=("python3.$i")
done
candidates+=("python3" "python")

for c in "${candidates[@]}"; do
if command -v "$c" >/dev/null 2>&1; then
ver="$("$c" - <<'PY'
import sys
print(f"{sys.version_info.major}.{sys.version_info.minor}")
PY
)" 2>/dev/null || continue
# Check if version is >= 3.9
IFS='.' read -r major minor <<< "$ver"
if [[ "$major" -gt 3 ]] || [[ "$major" -eq 3 && "$minor" -ge 9 ]]; then
# Compare versions: if no best yet, or this is newer
if [[ -z "$best_ver" ]] || [[ "$(printf '%s\n' "$best_ver" "$ver" | sort -V | tail -1)" == "$ver" ]]; then
best_cmd="$c"
best_ver="$ver"
fi
fi
fi
done
[[ -n "$best_cmd" ]] && echo "$best_cmd"
}

spinner_start "Setting up Python env and installing VOXD"
PYTHON_CMD="$(pick_python)"
if [[ -z "$PYTHON_CMD" ]]; then
die "No suitable Python (>=3.9) found. Please install Python 3.9 or newer."
fi
msg "Using Python: $PYTHON_CMD"

if [[ ! -d .venv ]]; then
msg "Creating virtualenv (.venv)…"
python3 -m venv .venv
msg "Creating virtualenv (.venv) with $PYTHON_CMD…"
"$PYTHON_CMD" -m venv .venv
else
msg "Using existing virtualenv (.venv)"
fi
@@ -556,7 +593,9 @@ fi
pip install -e .

# Fix editable install .pth file if it's empty (hatchling bug workaround)
PTH_FILE=".venv/lib/python3.12/site-packages/_voxd.pth"
# Dynamically determine Python version for PTH file path
PY_VERSION="$($PY -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
PTH_FILE=".venv/lib/python${PY_VERSION}/site-packages/_voxd.pth"
if [[ -f "$PTH_FILE" && ! -s "$PTH_FILE" ]]; then
echo "$PWD/src" > "$PTH_FILE"
msg "Fixed editable install .pth file"
48 changes: 47 additions & 1 deletion src/voxd/__main__.py
@@ -172,6 +172,31 @@ def _systemd_user_available() -> bool:
    except Exception:
        return False

def _find_voxd_executable() -> str:
    """Find the voxd executable path, checking PATH and common locations.

    Returns the path to voxd executable, or 'voxd' if not found (relying on PATH).
    """
    # First, try to find voxd in PATH
    voxd_path = shutil.which("voxd")
    if voxd_path:
        return voxd_path

    # Fall back to common installation locations
    home = Path.home()
    candidates = [
        home / ".local/bin/voxd",
        Path("/usr/local/bin/voxd"),
        Path("/usr/bin/voxd"),
    ]

    for candidate in candidates:
        if candidate.exists() and os.access(candidate, os.X_OK):
            return str(candidate)

    # If nothing found, return 'voxd' and let systemd use PATH
    return "voxd"

def _ensure_voxd_tray_unit() -> None:
    """Ensure a voxd-tray.service user unit exists (packaged or per-user fallback)."""
    try:
@@ -184,20 +209,28 @@ def _ensure_voxd_tray_unit() -> None:
        except Exception:
            pass
        unit_path = user_dir / "voxd-tray.service"
        voxd_exec = _find_voxd_executable()

        if not unit_path.exists():
            unit_path.write_text(
                "[Unit]\n"
                "Description=VOXD tray mode (user)\n"
                "After=default.target\n\n"
                "[Service]\n"
                "Type=simple\n"
                "ExecStart=/usr/bin/voxd --tray\n"
                f"ExecStart={voxd_exec} --tray\n"
                "Restart=on-failure\n"
                "RestartSec=2s\n"
                "Environment=YDOTOOL_SOCKET=%h/.ydotool_socket\n\n"
                "[Install]\n"
                "WantedBy=default.target\n"
            )
        else:
            # Update existing file if it has the hardcoded /usr/bin/voxd path
            content = unit_path.read_text()
            if "ExecStart=/usr/bin/voxd --tray" in content and voxd_exec != "/usr/bin/voxd":
                content = content.replace("ExecStart=/usr/bin/voxd --tray", f"ExecStart={voxd_exec} --tray")
                unit_path.write_text(content)
    except Exception:
        pass

@@ -352,6 +385,12 @@ def main():
        dest="lang",
        help="Transcription language (ISO 639-1, e.g. 'en', 'sv', or 'auto' for detection)"
    )
    parser.add_argument(
        "-v",
        "--verbose",
        action="store_true",
        help="Enable verbose logging (shows detailed debug output)"
    )
    args, unknown = parser.parse_known_args()

    if args.version:
@@ -419,6 +458,13 @@ def main():
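        sys.exit(0)

    cfg = AppConfig()
    # Session-only override for verbosity
    if args.verbose:
        cfg.data["verbosity"] = True
        setattr(cfg, "verbosity", True)
        # `os` is assumed to be imported at module level (it is already used by
        # _find_voxd_executable). A function-local `import os` here would make `os`
        # a local name for all of main() and could raise UnboundLocalError elsewhere.
        os.environ["VOXD_VERBOSE"] = "1"

    # Session-only override for language
    if args.lang:
        try: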