A command-line interface providing access to core AI services: speech-to-text, large language models, and text-to-speech. This toolkit brings advanced AI capabilities directly to your terminal.
- Speech-to-Text (STT): Transcribe speech from your microphone to text.
- Large Language Model (LLM): Interact with powerful language models for various tasks.
- Text-to-Speech (TTS): Convert text to natural-sounding speech.
- Remote Server Support: All AI services can be installed on remote servers for enhanced performance and flexibility.
- Modular Design: Each tool functions independently and can be combined for complex workflows.
- Real-time Processing: Supports real-time transcription and speech synthesis.
- Multiple LLM Providers: Switch between different LLM providers (LMStudio, Anthropic, OpenAI, Ollama).
- Customizable TTS: Options for different TTS models and voices, including RVC support.
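Because each tool reads from stdin and writes to stdout, they can be chained with ordinary shell pipes. For example:
stt | llm | tts
[Transcribes your speech, sends the text to the LLM, and speaks the response]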
This project is in an early alpha stage. The servers are not yet stable, and the current implementation offers only a working prototype with opinionated, streamlined commands. While customization options are limited at this phase, the foundation is set for expansion into a comprehensive CLI Swiss Army knife for STT, LLM, and TTS applications.
- Enhance customization options for each tool
- Expand API parameter accessibility
- Improve documentation and usage examples
- Gather user feedback for prioritizing new features
The STT command transcribes speech from your microphone to text.
- Basic:
stt
- File output:
stt > file.txt
C:\Dev\Test\ai-cli-tools>stt
This is detected speech.
C:\Dev\Test\ai-cli-tools>stt > transcription.txt
[Transcription saved to file]
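Since stt writes plain text to stdout, any standard shell redirection applies. For instance, to append successive transcriptions to a running log (standard shell behavior, not an stt-specific feature):
stt >> transcript.txt
[Appends the new transcription to transcript.txt]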
The LLM command sends queries to a Large Language Model (by default, LM Studio running on the server).
- Inline query:
llm your question here
- Piped input:
echo your question | llm
- Speech input:
stt | llm
- File output:
llm your question > answer.txt
- Custom system message (query piped in, system message inline):
echo your query or question | llm your system message
- Provider and model selection:
llm your question --provider [lmstudio|anthropic|openai|ollama] --model [model_name]
C:\Dev\Test\ai-cli-tools>llm When was Barack Obama born?
Barack Obama, the 44th President of the United States, was born on August 4, 1961.
C:\Dev\Test\ai-cli-tools>echo List 5 reputable sources for information on climate change: | llm > sources.txt
C:\Dev\Test\ai-cli-tools>llm Tell me about AI --anthropic
[Response from Anthropic's Claude model]
C:\Dev\Test\ai-cli-tools>llm Explain quantum computing --openai --model gpt-4
[Response from OpenAI's GPT-4 model]
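The custom system message form can be combined with piped input in the same way. As a sketch (the system message wording here is illustrative, not a required format):
C:\Dev\Test\ai-cli-tools>echo What is the capital of France? | llm answer with a single word
[One-word answer from the model]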
The TTS command synthesizes text into speech in real time.
- Inline text:
tts text to speak
- Piped input:
echo text to speak | tts
- Full pipeline:
stt | llm | tts
- File output:
tts Hello, world! > output.wav
- RVC toggle:
tts Hello, world! --rvc
C:\Dev\Test\ai-cli-tools>tts Let me hear this
Let me hear this
C:\Dev\Test\ai-cli-tools>echo hey there | llm answer short | tts
[Speaks the short answer]
C:\Dev\Test\ai-cli-tools>stt | llm translate to english, only return the translation | tts --rvc > translation.wav
[Transcribes speech, translates it to English, applies RVC, and saves to file]
You can combine these tools in various ways to create powerful workflows:
stt | llm --anthropic | tts --rvc
[Transcribes speech, processes it with Anthropic's LLM, and speaks the result using RVC]
echo Explain relativity | llm --openai --model gpt-4 | tts > explanation.wav
[Processes text with OpenAI's GPT-4, then speaks and saves the result]
stt | llm translate to english, only return the translation | tts --xtts-voice french_voice.wav
[Transcribes speech, translates it to English, and speaks the translation with a French voice]
- Python 3.10.9
- CUDA 12.1
- Clone the Repository
git clone https://github.com/KoljaB/ai_cli_tools.git
cd ai_cli_tools
- Create a Virtual Environment
For Windows:
_create_venv.bat
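If you are not on Windows, or want to see what the batch file boils down to, the standard equivalent is presumably plain venv creation (the directory name venv is an assumption; check the batch file for the actual name):
python -m venv venv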
- Activate the Virtual Environment
For Windows:
_start_venv.bat
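On non-Windows systems, the equivalent is presumably the standard activation script (again assuming the environment directory is named venv):
source venv/bin/activate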
- Install Dependencies
With the virtual environment activated, run:
pip install -r requirements.txt
- Upgrade PyTorch to Use GPU
With the virtual environment still active, run:
pip install torch==2.3.1+cu121 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu121
(Adjust the CUDA version according to your system.)
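For example, on a CUDA 11.8 system the command should become:
pip install torch==2.3.1+cu118 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu118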
- Install Models for TTS and RVC
Run this in either the virtual environment or your main environment:
For Windows:
_install_models.bat
- Install CLI Commands
Run this in your main Python environment or your client's environment, NOT inside the virtual environment.
For Windows:
pip install -r requirements_client.txt
_install_cli_commands.bat
- Start the Servers
Run this in your main Python environment or your client's environment, NOT inside the venv.
For Windows:
_start_stt_server.bat
_start_llm_server.bat
_start_tts_server.bat
or execute:
stt-server
llm-server
tts-server
To use certain LLM providers, set the following environment variables on the server machine:
- For OpenAI:
OPENAI_API_KEY
- For Anthropic:
ANTHROPIC_API_KEY
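On Windows, for example, you can set these persistently with setx (the values below are placeholders):
setx OPENAI_API_KEY "your-openai-key"
setx ANTHROPIC_API_KEY "your-anthropic-key"
Note that setx only affects new sessions, so restart the terminal (and the servers) afterwards.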
- You can use CTRL+C to immediately abort any command (stt, llm, tts).
- The system will prompt to start a local server if it's not running when you try to use a command.
- Default models:
- OpenAI: gpt-4o-mini
- Anthropic: claude-3-sonnet-20240229
- Ollama: llama3.1
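To use something other than a default model, pass the provider and model explicitly (the model name here is just an example):
llm Write a haiku about the sea --provider openai --model gpt-4o
[Response from the specified OpenAI model]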