DialogSmith lets you fine-tune large language models (LLMs) like Qwen on your own Telegram conversations. Built on top of LLaMA-Factory, it automatically formats data into the ShareGPT format for supervised fine-tuning (SFT).
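For reference, a ShareGPT-style record is a list of conversation turns. The field names below follow LLaMA-Factory's `sharegpt` format; the messages themselves are invented for illustration:

```json
[
  {
    "conversations": [
      { "from": "human", "value": "Hey, are we still on for tonight?" },
      { "from": "gpt", "value": "Yep, see you at 8!" }
    ]
  }
]
```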
Fine-tuning on Telegram data can capture aspects of your text style, including:
- Writing tone, vocabulary, and phrasing
- Typical response lengths and structure
- Repeated expressions or idioms
- Conversational flow and habits
However, this method won’t replicate your deeper beliefs, private memories, or behavior outside the chat. It reflects how you write — not necessarily how you think.
For stronger emulation, consider incorporating:
- Additional sources like emails or forum posts
- Clear prompt instructions during inference
- Domain-specific datasets (e.g., technical messages, inside jokes)
Fine-tuning on real chat history may unintentionally encode:
- Personal identifiers (names, locations, contact info)
- Confidential conversations
- Sensitive or offensive content
Always review and sanitize your exported dataset (`result.json`) before training. You are responsible for ensuring compliance with privacy laws and personal data protection.
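A minimal redaction sketch is shown below. It only masks e-mail addresses and phone-number-like strings in plain-text messages, so treat it as a starting point rather than a complete anonymizer, and keep a backup of the original export.

```python
# Sketch: mask e-mails and phone-like numbers in data/result.json before training.
# Assumes the Telegram Desktop single-chat export layout ("messages" list, "text" field);
# list-valued "text" entries (formatted messages) are left untouched here.
import json
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s()-]{7,}\d")

def redact(text: str) -> str:
    return PHONE.sub("[phone]", EMAIL.sub("[email]", text))

with open("data/result.json", encoding="utf-8") as f:
    export = json.load(f)

for message in export.get("messages", []):
    if isinstance(message.get("text"), str):
        message["text"] = redact(message["text"])

# Overwrites the file in place; keep a copy of the raw export elsewhere.
with open("data/result.json", "w", encoding="utf-8") as f:
    json.dump(export, f, ensure_ascii=False, indent=1)
```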
- Open Telegram Desktop.
- Go to Settings > Advanced > Export Telegram Data.
- Select your personal chat or group to export.
- Ensure JSON format is selected (not HTML).
- Place the exported `result.json` file into:

```
DialogSmith/
├── data/
│   └── result.json   ← Place here
```
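Optionally, you can sanity-check the export before running setup. This sketch assumes a single-chat export, where the JSON has top-level `name` and `messages` keys; a full-account export is structured differently:

```python
# Quick sanity check of the exported file (single-chat export layout assumed).
import json

with open("data/result.json", encoding="utf-8") as f:
    export = json.load(f)

print("Chat:", export.get("name", "<unknown>"))
print("Messages:", len(export.get("messages", [])))
```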
Run the automated setup script from Command Prompt (not PowerShell):
```
setup.bat
```

This will:

- Create and activate a Python virtual environment
- Upgrade `pip`
- Clone the official LLaMA-Factory repository and install Python dependencies from `requirements.txt`
- Patch `dataset_info.json` to register your dataset (`chat_sharegpt`); an example entry is shown after this list
- Process your exported Telegram chat (`result.json`) into `chat_sharegpt.json`
- Place the converted dataset in the correct directory (`LLaMA-Factory/data`)
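For reference, the entry added to `dataset_info.json` typically looks like the snippet below (illustrative; the exact entry written by `setup.bat` may differ slightly):

```json
"chat_sharegpt": {
  "file_name": "chat_sharegpt.json",
  "formatting": "sharegpt",
  "columns": {
    "messages": "conversations"
  }
}
```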
Once complete, you will see:
```
All steps completed successfully.
Please refer to the README.md for the next steps.
You will find instructions on how to launch training.
```
Make sure your `result.json` file is already located at `./data/result.json`.
The following example uses Qwen1.5-1.8B-Chat, but you can replace it with any Hugging Face-compatible model.
```
python LLaMA-Factory\src\train.py --stage sft --do_train ^
--model_name_or_path Qwen/Qwen1.5-1.8B-Chat ^
--dataset chat_sharegpt ^
--dataset_dir .\LLaMA-Factory\data ^
--template qwen ^
--finetuning_type lora ^
--lora_target q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj ^
--output_dir saves\Qwen1.5-1.8B-Chat-lora ^
--overwrite_cache ^
--per_device_train_batch_size 2 ^
--gradient_accumulation_steps 4 ^
--lr_scheduler_type cosine ^
--logging_steps 10 ^
--save_strategy steps ^
--save_steps 100 ^
--learning_rate 5e-5 ^
--num_train_epochs 3.0 ^
--plot_loss
```

Modify these flags to match your model:
| Option | Description |
|---|---|
| `--model_name_or_path` | Hugging Face model ID or local model path |
| `--template` | Prompt template type (e.g., `qwen`, `chatml`, `default`) |
| `--lora_target` | LoRA target modules (refer to the model's architecture) |
| `--output_dir` | Destination to save the LoRA checkpoints |
If you're using a model like `mistralai/Mistral-7B-Instruct-v0.2`, you would change:

```
--model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 ^
--template mistral ^
--lora_target q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj ^
--output_dir saves\Mistral-7B-Instruct-lora ^
```

Refer to the LLaMA-Factory model table for recommended values.
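If you are unsure which module names a given model exposes for `--lora_target`, one option is to list its linear layers with the `transformers` API. This is a sketch that downloads the full model, and the model ID is only an example:

```python
# List linear-layer names to choose --lora_target values for an arbitrary model.
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-1.8B-Chat")
names = {name.split(".")[-1] for name, module in model.named_modules()
         if isinstance(module, nn.Linear)}
print(sorted(names))  # e.g. ['down_proj', 'gate_proj', 'k_proj', 'lm_head', ...]
```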
Find your latest checkpoint under your saves folder, then add this flag:
```
--resume_from_checkpoint saves\Qwen1.5-1.8B-Chat-lora\checkpoint-400
```

Edit the `export_lora.yaml` file to match your model:
```yaml
# export_lora.yaml (key names follow LLaMA-Factory's LoRA export format)
model_name_or_path: Qwen/Qwen1.5-1.8B-Chat
adapter_name_or_path: saves/Qwen1.5-1.8B-Chat-lora
template: qwen
finetuning_type: lora
export_dir: merged/Qwen1.5-1.8B-Chat-merged
```

Then run:
```
llamafactory-cli export export_lora.yaml
```

Test your merged model in an interactive shell:
```
llamafactory-cli chat ^
--model_name_or_path merged/Qwen1.5-1.8B-Chat-merged ^
--template qwen
```

Update `--template` to match the one used during training.
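You can also test the merged model directly from Python. This is a sketch using the Hugging Face `transformers` API (available in the venv, since LLaMA-Factory depends on it); generation runs on CPU unless you move the model to a GPU:

```python
# Minimal chat test against the merged model (paths mirror the example above).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "merged/Qwen1.5-1.8B-Chat-merged"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir)

messages = [{"role": "user", "content": "Hey, what are you up to tonight?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```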
If you’ve already run setup.bat, the virtual environment is created automatically.
In future sessions, you can activate it manually before running any Python scripts:
```
venv\Scripts\activate
```

You should see the prompt change to show that the environment is active:

```
(venv) C:\Users\yourname\DialogSmith>
```
Once activated, you can run `python`, `pip`, or training commands as usual.