Ever wanted to clone yourself, your friend, or just build the ultimate custom AI assistant directly in Telegram? That's exactly what this project does.
I put together this Colab notebook so you can spin up a fully multimodal AI companion for free using Google Colab's T4 GPU. It reads texts, listens to voice memos, looks at pictures, and replies with realistic voice messages. It’s essentially your own local pocket AI.
- 🧠 Actually Smart: Uses a local LLM running right in Colab. No need to pay for expensive OpenAI API keys.
- 🗣️ Cloned Voice: Talks back to you with a custom voice using Coqui XTTS.
- 👁️ Sees Your Memes: Send it photos, GIFs, or stickers. It uses Groq's Vision API to understand exactly what it's looking at.
- 👂 Hears You: Send a voice message, and it instantly transcribes and replies in context.
- ⚡ Context Aware: It actually remembers your conversation (and you can wipe its memory anytime with a simple command).
*(Demo video: Example.mp4)*
What it took to build this specific clone:
- 🧠 LLM Dataset: Only 120 lines of dialogue used for fine-tuning the personality.
- 🗣️ TTS Audio: Just a 30-second clean audio clip for the voice engine.
Here is the tech stack powering the bot:
- Unsloth: Makes running heavy LLMs fast and memory-efficient on Colab's T4 GPU.
- Coqui XTTS: The magic behind the text-to-speech and zero-shot voice cloning.
- Groq API: Does the heavy lifting for lightning-fast image recognition (`meta-llama/llama-4-scout-17b-16e-instruct`) and audio transcription (`whisper-large-v3-turbo`).
- Aiogram: The engine keeping the Telegram bot running smoothly.
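To give a feel for the vision piece: Groq's chat endpoint accepts OpenAI-style messages with an inline base64 image. A minimal sketch (the helper name and prompt are illustrative, not the notebook's actual code):

```python
import base64

# Model name taken from the stack above.
VISION_MODEL = "meta-llama/llama-4-scout-17b-16e-instruct"

def build_vision_messages(image_bytes: bytes, prompt: str) -> list[dict]:
    """Build an OpenAI-style chat payload with an inline base64 image,
    the shape expected by Groq's vision-capable chat endpoint."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }]

# With the official `groq` client the call would look roughly like:
# from groq import Groq
# client = Groq(api_key=GROQ_API)
# resp = client.chat.completions.create(
#     model=VISION_MODEL,
#     messages=build_vision_messages(photo_bytes, "What's in this meme?"),
# )
# caption = resp.choices[0].message.content
```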
To make this work, you'll need to drop two things into your Google Drive: a "Brain" (LLM) and a "Voice" (TTS model).
Grab any Unsloth-compatible model (like a 4-bit Llama-3 or Mistral to save RAM) from Hugging Face and upload the folder to your Drive.
- Want it to sound exactly like you? You can actually fine-tune an LLM on your own exported Telegram or Discord chats! The Unsloth GitHub has awesome, free Colab notebooks to do this in just a few minutes.
You need an XTTS model folder containing a config.json, the model weights, and a clean 10-15 second audio clip of the voice you want to clone (make sure to name it reference.wav).
- The Quick Way: Grab the base `coqui/XTTS-v2` model from Hugging Face. XTTS is crazy good at "zero-shot" cloning: it'll just copy the voice from your `reference.wav` without any extra training.
- The Pro Way: If the zero-shot clone sounds a bit off or has the wrong accent, you can fine-tune it. I highly recommend xtts-finetune-webui by daswer123. It's a super simple web interface for training XTTS on your own audio datasets.
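A quick sanity check for the folder layout described above, plus a sketch of what zero-shot synthesis looks like with Coqui's `TTS` package (the helper function is hypothetical; the actual notebook code may differ):

```python
from pathlib import Path

# Files the setup above says must be in the XTTS model folder.
REQUIRED = ("config.json", "reference.wav")

def check_xtts_folder(folder: str) -> list[str]:
    """Return the required files missing from the XTTS model folder."""
    root = Path(folder)
    return [name for name in REQUIRED if not (root / name).exists()]

# Zero-shot cloning with Coqui's TTS package would then look roughly like:
# from TTS.api import TTS
# tts = TTS(model_path=folder, config_path=f"{folder}/config.json").to("cuda")
# tts.tts_to_file(
#     text="Hello from your pocket AI!",
#     speaker_wav=f"{folder}/reference.wav",  # the voice to clone
#     language="en",
#     file_path="reply.wav",
# )
```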
- Get your Tokens: Grab a bot token from @BotFather on Telegram and a free API key from the Groq Console.
- Drive Setup: Upload your LLM and TTS model folders to your Google Drive.
- Open Colab: Click that blue "Open in Colab" badge at the top of this page. Make sure the runtime is set to T4 GPU (`Runtime` => `Change runtime type`).
- Hide your keys: Click the 🔑 Secrets tab on the left sidebar in Colab. Add `BOT_API` and `GROQ_API`, paste your keys, and turn ON "Notebook access" for both. (Never paste keys directly into the code!)
- Run the Blocks:
  - Run Block 1 (Install). It will download the required libraries and automatically restart the session.
  - Set up Block 2 (Settings). Paste your folder paths and tweak the AI sliders.
    ⚠️ Important LLM Loading Settings: To avoid Out Of Memory (OOM) errors on a 15GB T4 GPU, adjust the Unsloth parameters based on your model:
    - 7B-9B Models (Llama-3, DeepSeek): `LLM_MAX_SEQ_LENGTH` = 2048, `LLM_LOAD_IN_4BIT` = True.
    - 12B Models (Mistral-Nemo): `LLM_MAX_SEQ_LENGTH` = 2048, `LLM_LOAD_IN_4BIT` = True.
    - 1.5B-3B Models (Qwen): `LLM_MAX_SEQ_LENGTH` = 8192, `LLM_LOAD_IN_4BIT` = False.
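The table above boils down to a simple rule of thumb. A sketch of how you might encode it, plus what the Unsloth load call looks like (the function, threshold, and folder path are illustrative assumptions):

```python
def llm_load_settings(param_count_b: float) -> dict:
    """Map model size (billions of parameters) to the Unsloth settings
    recommended above for a 15GB T4. The 3B threshold is illustrative."""
    if param_count_b <= 3:  # 1.5B-3B models (Qwen): full precision fits
        return {"LLM_MAX_SEQ_LENGTH": 8192, "LLM_LOAD_IN_4BIT": False}
    # 7B-9B (Llama-3, DeepSeek) and 12B (Mistral-Nemo) both need 4-bit
    return {"LLM_MAX_SEQ_LENGTH": 2048, "LLM_LOAD_IN_4BIT": True}

# Feeding this into Unsloth would look roughly like:
# from unsloth import FastLanguageModel
# cfg = llm_load_settings(8)
# model, tokenizer = FastLanguageModel.from_pretrained(
#     model_name="/content/drive/MyDrive/your_llm_folder",  # your Drive path
#     max_seq_length=cfg["LLM_MAX_SEQ_LENGTH"],
#     load_in_4bit=cfg["LLM_LOAD_IN_4BIT"],
# )
```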
- Run Block 3 (Load Models). This loads the heavy weights into the GPU and spins up the TTS server.
- Customize Block 4 (Personalization). Here you can completely change the bot's core personality (`SYSTEM_INSTRUCTION`), adjust the typing debounce timer, and translate all UI messages (like "Typing..." or "Generating voice...") to fit your style.
- Run Block 5 (Start Telegram bot). Once the console says `Bot is running!`, jump into Telegram and type `/start`.
- `/start` - Wake up the bot.
- `/voice_mode` - Toggle between text and voice message replies.
- `/reset` - Regenerate the last response if you didn't like what the AI said.
- `/reset_memory` - Wipe the current conversation context and start fresh.
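Under the hood, `/reset_memory` just drops the stored context for that chat. A minimal sketch (the store shape and handler wiring are assumptions, not the notebook's actual code):

```python
# Hypothetical in-memory context store: one message history per chat ID.
histories: dict[int, list[dict]] = {}

def remember(chat_id: int, role: str, text: str) -> None:
    """Append one conversation turn to a chat's history."""
    histories.setdefault(chat_id, []).append({"role": role, "content": text})

def wipe_history(chat_id: int) -> None:
    """Drop the stored context so the next reply starts fresh."""
    histories.pop(chat_id, None)

# Wired into an aiogram v3 handler, this would look roughly like:
# from aiogram import Router
# from aiogram.filters import Command
# router = Router()
#
# @router.message(Command("reset_memory"))
# async def on_reset_memory(message):
#     wipe_history(message.chat.id)
#     await message.answer("Memory wiped - starting fresh!")
```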
If you had fun building your own AI companion or just found this project useful, I’d massively appreciate a ⭐️ Star on this repository!
It helps more people discover the project and keeps me motivated to push more cool AI stuff. ♥
Something broke? Got a killer feature idea? Don't be shy, let's make this bot even better together:
- 🐛 Open an Issue right here on GitHub if you catch any bugs or OOM errors.
- 💬 Hit me up on Telegram: @Just_Xirexxx
- 📧 Drop me an email: xyroset+dev@gmail.com
Pull requests and forks are always welcome. Let's build the ultimate open-source pocket AI together! 🚀
This project is open-source and available under the MIT License. Note: The AI models used in conjunction with this code (like Llama, XTTS, etc.) are subject to their own respective licenses and terms of use.
