This Discord bot uses voice recognition to interact with users in a voice channel through transcription, processing with a Large Language Model (LLM), and responding with synthesized voice. The bot converts spoken audio to text, sends it to an LLM for processing, and uses Text-to-Speech (TTS) to voice the response.
You can also use the bot in text channels alone, without joining a voice channel.
- Conversation: Engage in a conversation with the bot using voice or text input.
- Music Playback: Play music from YouTube in the voice channel. Say `play [query] on youtube` or `play [query] song` to play a song. You can also use the `>play` command.
- Timers: Set a timer by saying `set a timer for [time]` or `set an alarm for [time]`. The bot will notify you when the timer is up.
- Internet search: Ask the bot to search the internet for you by saying `search [query] on internet` or `search on internet for [query]`. The bot will respond using the web.
- Vision: Send an image mentioning the bot, and it will react to it in voice chat.
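As a rough illustration of how the spoken phrases listed above could be told apart once the audio has been transcribed, here is a minimal sketch. The function name `parseVoiceCommand` and the shape of the returned object are assumptions made for this example; they are not taken from the bot's source, which may match phrases differently.

```javascript
// Hypothetical helper: maps a transcribed phrase to an action.
// The phrase patterns come from the feature list; everything else is illustrative.
function parseVoiceCommand(transcript) {
  // Normalize case and drop trailing punctuation that STT engines often add.
  const text = transcript.trim().toLowerCase().replace(/[.!?]+$/, '');

  let m;
  if ((m = text.match(/^play (.+) on youtube$/)) || (m = text.match(/^play (.+) song$/))) {
    return { action: 'play', query: m[1] };
  }
  if ((m = text.match(/^set (?:a timer|an alarm) for (.+)$/))) {
    const action = text.startsWith('set an alarm') ? 'alarm' : 'timer';
    return { action, time: m[1] };
  }
  // Check the "search on internet for ..." form first so its query is captured cleanly.
  if ((m = text.match(/^search on internet for (.+)$/)) || (m = text.match(/^search (.+) on internet$/))) {
    return { action: 'search', query: m[1] };
  }
  // Anything else is treated as ordinary conversation for the LLM.
  return { action: 'chat', text: transcript.trim() };
}

console.log(parseVoiceCommand('Play lofi beats on YouTube'));
console.log(parseVoiceCommand('Set a timer for 5 minutes.'));
```

Matching on the normalized transcript keeps the rules tolerant of capitalization and sentence-final punctuation, both of which vary between speech-to-text backends.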
- Node.js and npm installed
- A Discord Bot Token
- Access to OpenAI-compatible APIs for STT (Speech to Text), LLM, and TTS services (for a fully local setup, check out `openedai-whisper`, `ollama` and `openedai-speech`)
- If you wish to use timers and alarms, you need `alarm.mp3` and `timer.mp3` files in the `sounds` folder.
- Clone the Repository: run `git clone <repository-url>`, then `cd <repository-directory>`.
- Install Dependencies: run `npm install`.
- Configure the Environment:
  - Rename `.env.example` to `.env`.
  - Update the `.env` file with your specific credentials and API endpoints.
- Start the Bot: run `node bot.js`.
- Invite the Bot to Your Discord Server:
  - Use the invite link generated through your Discord application page. Here is a quick link with all the permissions the bot should ever need: https://discord.com/oauth2/authorize?client_id=REPLACEME&permissions=964220516416&scope=bot (replace "REPLACEME" with your bot's ID).
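If you prefer not to edit the invite link by hand, the substitution can be sketched in a few lines. The function name `buildInviteUrl` is made up for this example; the base URL, `scope`, and permissions integer are the ones quoted in the link above.

```javascript
// Builds the invite link for a given application (client) ID.
// Pass the ID as a string: Discord IDs can exceed JavaScript's safe integer range.
function buildInviteUrl(clientId, permissions = '964220516416') {
  if (!/^\d+$/.test(String(clientId))) {
    throw new Error('clientId should be the numeric application ID');
  }
  const url = new URL('https://discord.com/oauth2/authorize');
  url.searchParams.set('client_id', String(clientId));
  url.searchParams.set('permissions', String(permissions));
  url.searchParams.set('scope', 'bot');
  return url.toString();
}

// The ID below is a placeholder, not a real application.
console.log(buildInviteUrl('123456789012345678'));
```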
- Using the Bot in Discord:
  - Ensure the bot has permission to join voice channels and speak.
  - In a Discord server where the bot is a member, join a voice channel and type the command `>join` or `>join free`.
  - The bot will join the channel and start listening to users who are speaking. Spoken phrases are processed and responded to in real-time.
- To start a new conversation, mention the bot in your message.
- To continue a conversation, just reply to the bot's message.
- You can also continue conversations by creating a thread from the bot's message. In that case, you no longer need to reply or mention the bot within the thread.
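The three text-channel rules above (mention to start, reply to continue, thread to continue without either) can be summarized as a small decision function. This is only a sketch of the described behavior: `message` here is a plain object with hypothetical field names, not a discord.js `Message`, and the bot's actual checks may differ.

```javascript
// Hypothetical routing rule for incoming text messages.
// Assumed fields: authorId, mentionedIds, repliedToAuthorId, inBotThread.
function classifyTextMessage(message, botId) {
  // Never react to the bot's own messages.
  if (message.authorId === botId) return 'ignore';
  // Inside a thread created from a bot message, no mention or reply is needed.
  if (message.inBotThread) return 'continue';
  // A reply to the bot continues the conversation. This is checked before
  // mentions because Discord replies usually mention the replied-to author.
  if (message.repliedToAuthorId === botId) return 'continue';
  // A plain mention starts a new conversation.
  if (message.mentionedIds.includes(botId)) return 'new';
  return 'ignore';
}

const BOT_ID = '1'; // placeholder ID for the example
console.log(classifyTextMessage(
  { authorId: '2', mentionedIds: ['1'], repliedToAuthorId: null, inBotThread: false },
  BOT_ID
));
```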
- `/join`: Command for the bot to join the voice channel you are currently in. The bot will listen to voice input, transcribe it, send it to the LLM if you used a trigger word, and respond with a spoken answer using TTS.
- `/join free`: Similar to `>join`, but will respond to everything without using trigger words. Best for solo usage.
- `/join silent`: Similar to `>join`, but no confirmation sound will play when a trigger word is detected or when the LLM responds.
- `/join transcribe`: Similar to `>join`, but will save the transcriptions to a file and send it once you use the `>leave` command.
- `/play [song name or URL]`: Play a song from YouTube using either its name (search via API) or direct URL. Please note that the search function requires a valid API key. You may also say `play [query] on youtube` or `play [query] song` in voice chat.
- `/search [query]`: Use Perplexity LLM search to find the best answer to your query. You may also say `search [query] on internet` or `search on internet for [query]` in voice chat.
- `/reminder [timestamp] [message]`: Set a reminder for a specific time using Discord timestamps.
- `/reset`: Reset the LLM chat history. You may also say `reset chat history` in voice chat.
- `/leave`: Command for the bot to leave the voice channel. You may also say `leave voice chat` in voice chat.
- `/help`: Display the list of available commands.
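To make the difference between the join modes concrete, the sketch below turns the mode argument into a set of flags and uses them to decide whether a transcript goes to the LLM. It assumes each mode changes exactly one behavior relative to the default, as the descriptions above suggest. The function names, flag names, and the trigger word `hey bot` are all hypothetical.

```javascript
// Maps the optional argument of the join command to behavior flags.
function joinOptions(mode = '') {
  const m = mode.trim().toLowerCase();
  if (!['', 'free', 'silent', 'transcribe'].includes(m)) {
    throw new Error(`Unknown join mode: ${mode}`);
  }
  return {
    requireTrigger: m !== 'free',        // "free" responds to everything
    confirmationSound: m !== 'silent',   // "silent" skips the confirmation sound
    saveTranscript: m === 'transcribe',  // "transcribe" keeps a transcript file
  };
}

// Decides whether a transcribed phrase should be forwarded to the LLM.
function shouldSendToLlm(transcript, options, triggerWords) {
  if (!options.requireTrigger) return true;
  const text = transcript.toLowerCase();
  return triggerWords.some((word) => text.includes(word.toLowerCase()));
}

const triggers = ['hey bot']; // placeholder trigger word
console.log(shouldSendToLlm('what time is it', joinOptions('free'), triggers));
console.log(shouldSendToLlm('what time is it', joinOptions(), triggers));
```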
You may at any time say `stop` to stop the bot while it is speaking.
- Bot Doesn't Join Channel: Ensure the bot has the correct permissions in your Discord server, including the ability to join and speak in voice channels.
- No Audio from Bot: Check that the TTS API is returning valid MP3 audio data and that the bot has permissions to play audio in the channel.
- Errors in Transcription or Response: Verify that the API endpoints and models specified in the `.env` file are correct and that the APIs are operational.
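A quick sanity check of the configuration can rule out the most common mistakes before you look at the APIs themselves. The sketch below reports missing values and malformed endpoint URLs. The variable names in the example (`BOT_TOKEN`, `LLM_ENDPOINT`, and so on) are placeholders; use the names that actually appear in `.env.example`.

```javascript
// Returns a list of human-readable problems found in an env-like object.
// requiredKeys: keys that must be set; urlKeys: keys that must hold http(s) URLs.
function checkConfig(env, requiredKeys, urlKeys = []) {
  const problems = [];
  for (const key of requiredKeys) {
    if (!env[key] || String(env[key]).trim() === '') {
      problems.push(`${key} is missing or empty`);
    }
  }
  for (const key of urlKeys) {
    if (!env[key]) continue; // unset values are handled by the required check
    try {
      const parsed = new URL(env[key]);
      if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
        problems.push(`${key} should be an http(s) URL`);
      }
    } catch {
      problems.push(`${key} is not a valid URL`);
    }
  }
  return problems;
}

// Placeholder names and values, for illustration only.
const sampleEnv = { BOT_TOKEN: 'abc', LLM_ENDPOINT: 'http://localhost:11434/v1', TTS_ENDPOINT: 'nope' };
console.log(checkConfig(
  sampleEnv,
  ['BOT_TOKEN', 'LLM_ENDPOINT', 'STT_ENDPOINT'],
  ['LLM_ENDPOINT', 'STT_ENDPOINT', 'TTS_ENDPOINT']
));
```

In a real run you would pass `process.env` after the `.env` file has been loaded. This only validates the shape of the values, so a check that passes does not confirm the services are reachable.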