Ever wanted to clone yourself, your friend, or just build the ultimate custom AI assistant directly in Telegram? That's exactly what this project does.
I put together this Colab notebook so you can spin up a fully multimodal AI companion for free using Google Colab's T4 GPU. It reads texts, listens to voice memos, looks at pictures, and replies with realistic voice messages. It’s essentially your own local pocket AI.
- 🧠 Actually Smart: Uses a local LLM running right in Colab. No need to pay for expensive OpenAI API keys.
- 🗣️ Cloned Voice: Talks back to you with a custom voice using Coqui XTTS.
- 👁️ Sees Your Memes: Send it photos, GIFs, or stickers. It uses Groq's Vision API to understand exactly what it's looking at.
- 👂 Hears You: Send a voice message, and it instantly transcribes and replies in context.
- ⚡ Context Aware: It actually remembers your conversation (and you can wipe its memory anytime with a simple command).
*(Demo video: Example.mp4)*
What it took to build this specific clone:
- 🧠 LLM Dataset: Only 120 lines of dialogue used for fine-tuning the personality.
- 🗣️ TTS Audio: Just a 30-second clean audio clip for the voice engine.
Here is the tech stack powering the bot:
- Unsloth: Makes running heavy LLMs fast and memory-efficient on Colab's T4 GPU.
- Coqui XTTS: The magic behind the text-to-speech and zero-shot voice cloning.
- Groq API: Does the heavy lifting for lightning-fast image recognition (`meta-llama/llama-4-scout-17b-16e-instruct`) and audio transcription (`whisper-large-v3-turbo`).
- Aiogram: The engine keeping the Telegram bot running smoothly.
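To give a feel for the vision piece: Groq's chat endpoint accepts OpenAI-style messages with an inline base64 image. A minimal sketch (the helper name and prompt are illustrative, not the notebook's actual code):

```python
import base64

# Model name taken from the stack above.
VISION_MODEL = "meta-llama/llama-4-scout-17b-16e-instruct"

def build_vision_messages(image_bytes: bytes, prompt: str) -> list[dict]:
    """Build an OpenAI-style chat payload with an inline base64 image,
    the shape expected by Groq's vision-capable chat endpoint."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }]

# With the official `groq` client the call would look roughly like:
# from groq import Groq
# client = Groq(api_key=GROQ_API)
# resp = client.chat.completions.create(
#     model=VISION_MODEL,
#     messages=build_vision_messages(photo_bytes, "What's in this meme?"),
# )
# caption = resp.choices[0].message.content
```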
To make this work, you'll need to drop two things into your Google Drive: a "Brain" (LLM) and a "Voice" (TTS model).
Grab any Unsloth-compatible model (like a 4-bit Llama-3 or Mistral to save RAM) from Hugging Face and upload the folder to your Drive.
- Want it to sound exactly like you? You can actually fine-tune an LLM on your own exported Telegram or Discord chats! The Unsloth GitHub has awesome, free Colab notebooks to do this in just a few minutes.
You need an XTTS model folder containing a config.json, the model weights, and a clean 10-15 second audio clip of the voice you want to clone (make sure to name it reference.wav).
- The Quick Way: Grab the base `coqui/XTTS-v2` model from Hugging Face. XTTS is crazy good at "zero-shot" cloning: it'll just copy the voice from your `reference.wav` without any extra training.
- The Pro Way: If the zero-shot clone sounds a bit off or has the wrong accent, you can fine-tune it. I highly recommend xtts-finetune-webui by daswer123. It's a super simple web interface for training XTTS on your own audio datasets.
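A quick sanity check for the folder layout described above, plus a sketch of what zero-shot synthesis looks like with Coqui's `TTS` package (the helper function is hypothetical; the actual notebook code may differ):

```python
from pathlib import Path

# Files the setup above says must be in the XTTS model folder.
REQUIRED = ("config.json", "reference.wav")

def check_xtts_folder(folder: str) -> list[str]:
    """Return the required files missing from the XTTS model folder."""
    root = Path(folder)
    return [name for name in REQUIRED if not (root / name).exists()]

# Zero-shot cloning with Coqui's TTS package would then look roughly like:
# from TTS.api import TTS
# tts = TTS(model_path=folder, config_path=f"{folder}/config.json").to("cuda")
# tts.tts_to_file(
#     text="Hello from your pocket AI!",
#     speaker_wav=f"{folder}/reference.wav",  # the voice to clone
#     language="en",
#     file_path="reply.wav",
# )
```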
- Get your Tokens: Grab a bot token from @BotFather on Telegram and a free API key from the Groq Console.
- Drive Setup: Upload your LLM and TTS model folders to your Google Drive.
- Open Colab: Click that blue "Open in Colab" badge at the top of this page. Make sure the runtime is set to T4 GPU (`Runtime` => `Change runtime type`).
- Hide your keys: Click the 🔑 Secrets tab on the left sidebar in Colab. Add `BOT_API` and `GROQ_API`, paste your keys, and turn ON "Notebook access" for both. (Never paste keys directly into the code!)
- Run the Blocks:
  - Run Block 1 (Install). It will download the required libraries and automatically restart the session.
  - Set up Block 2 (Settings). Paste your folder paths and tweak the AI sliders.
    ⚠️ Important LLM Loading Settings: To avoid Out Of Memory (OOM) errors on a 15GB T4 GPU, adjust the Unsloth parameters based on your model:
    - 7B-9B Models (Llama-3, DeepSeek): `LLM_MAX_SEQ_LENGTH` = 2048, `LLM_LOAD_IN_4BIT` = True.
    - 12B Models (Mistral-Nemo): `LLM_MAX_SEQ_LENGTH` = 2048, `LLM_LOAD_IN_4BIT` = True.
    - 1.5B-3B Models (Qwen): `LLM_MAX_SEQ_LENGTH` = 8192, `LLM_LOAD_IN_4BIT` = False.
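The table above boils down to a simple rule of thumb. A sketch of how you might encode it, plus what the Unsloth load call looks like (the function, threshold, and folder path are illustrative assumptions):

```python
def llm_load_settings(param_count_b: float) -> dict:
    """Map model size (billions of parameters) to the Unsloth settings
    recommended above for a 15GB T4. The 3B threshold is illustrative."""
    if param_count_b <= 3:  # 1.5B-3B models (Qwen): full precision fits
        return {"LLM_MAX_SEQ_LENGTH": 8192, "LLM_LOAD_IN_4BIT": False}
    # 7B-9B (Llama-3, DeepSeek) and 12B (Mistral-Nemo) both need 4-bit
    return {"LLM_MAX_SEQ_LENGTH": 2048, "LLM_LOAD_IN_4BIT": True}

# Feeding this into Unsloth would look roughly like:
# from unsloth import FastLanguageModel
# cfg = llm_load_settings(8)
# model, tokenizer = FastLanguageModel.from_pretrained(
#     model_name="/content/drive/MyDrive/your_llm_folder",  # your Drive path
#     max_seq_length=cfg["LLM_MAX_SEQ_LENGTH"],
#     load_in_4bit=cfg["LLM_LOAD_IN_4BIT"],
# )
```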
- Run Block 3 (Load Models). This loads the heavy weights into the GPU and spins up the TTS server.
- Customize Block 4 (Personalization). Here you can completely change the bot's core personality (`SYSTEM_INSTRUCTION`), adjust the typing debounce timer, and translate all UI messages (like "Typing..." or "Generating voice...") to fit your style.
- Run Block 5 (Start Telegram bot). Once the console says `Bot is running!`, jump into Telegram and type `/start`.
- `/start` - Wake up the bot.
- `/voice_mode` - Toggle between text and voice message replies.
- `/reset` - Regenerate the last response if you didn't like what the AI said.
- `/reset_memory` - Wipe the current conversation context and start fresh.
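Under the hood, `/reset_memory` just drops the stored context for that chat. A minimal sketch (the store shape and handler wiring are assumptions, not the notebook's actual code):

```python
# Hypothetical in-memory context store: one message history per chat ID.
histories: dict[int, list[dict]] = {}

def remember(chat_id: int, role: str, text: str) -> None:
    """Append one conversation turn to a chat's history."""
    histories.setdefault(chat_id, []).append({"role": role, "content": text})

def wipe_history(chat_id: int) -> None:
    """Drop the stored context so the next reply starts fresh."""
    histories.pop(chat_id, None)

# Wired into an aiogram v3 handler, this would look roughly like:
# from aiogram import Router
# from aiogram.filters import Command
# router = Router()
#
# @router.message(Command("reset_memory"))
# async def on_reset_memory(message):
#     wipe_history(message.chat.id)
#     await message.answer("Memory wiped - starting fresh!")
```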
If you had fun building your own AI companion or just found this project useful, I’d massively appreciate a ⭐️ Star on this repository!
It helps more people discover the project and keeps me motivated to push more cool AI stuff. ♥
Something broke? Got a killer feature idea? Don't be shy, let's make this bot even better together:
- 🐛 Open an Issue right here on GitHub if you catch any bugs or OOM errors.
- 💬 Hit me up on Telegram: @Just_Xirexxx
- 📧 Drop me an email: xyroset+dev@gmail.com
Pull requests and forks are always welcome. Let's build the ultimate open-source pocket AI together! 🚀
This project is open-source and available under the MIT License. Note: The AI models used in conjunction with this code (like Llama, XTTS, etc.) are subject to their own respective licenses and terms of use.
