DialogSmith – Fine-Tune Models on Your Telegram History

DialogSmith lets you fine-tune large language models (LLMs) such as Qwen on your own Telegram conversations. Built on top of LLaMA-Factory, it automatically converts your exported chat history into the ShareGPT format for supervised fine-tuning (SFT).
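
For reference, a single conversation in the ShareGPT format looks roughly like this (a hypothetical sample; the actual content comes from your export):

[
  {
    "conversations": [
      { "from": "human", "value": "hey, are we still on for tonight?" },
      { "from": "gpt", "value": "yep, see you at 7!" }
    ]
  }
]

Your chat partner's messages are typically mapped to the "human" role and your own replies to the "gpt" role, so the model learns to answer in your voice; check the generated file to confirm the mapping.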

Purpose

Fine-tuning on Telegram data can capture aspects of your text style, including:

  • Writing tone, vocabulary, and phrasing
  • Typical response lengths and structure
  • Repeated expressions or idioms
  • Conversational flow and habits

However, this method won’t replicate your deeper beliefs, private memories, or behavior outside the chat. It reflects how you write — not necessarily how you think.

For stronger emulation, consider incorporating:

  • Additional sources like emails or forum posts
  • Clear prompt instructions during inference
  • Domain-specific datasets (e.g., technical messages, inside jokes)

Warning: Risk of Sensitive Data Exposure

Fine-tuning on real chat history may unintentionally encode:

  • Personal identifiers (names, locations, contact info)
  • Confidential conversations
  • Sensitive or offensive content

Always review and sanitize your exported dataset (result.json) before training. You are responsible for ensuring compliance with privacy laws and personal data protection.
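
As a starting point for that review, a small script can flag obvious identifiers before training. This is a minimal sketch, assuming the single-chat export layout (a top-level "messages" list whose "text" field is a string or a list of fragments); the patterns are illustrative, not exhaustive:

import json
import re

# Illustrative patterns only -- extend these for your own data.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def message_text(msg):
    # Telegram exports "text" as a plain string, or as a list of
    # strings and dicts (links, mentions, formatted spans).
    text = msg.get("text", "")
    if isinstance(text, list):
        return "".join(p if isinstance(p, str) else p.get("text", "") for p in text)
    return text

with open("data/result.json", encoding="utf-8") as f:
    export = json.load(f)

for msg in export.get("messages", []):
    text = message_text(msg)
    for label, pattern in PATTERNS.items():
        if pattern.search(text):
            print(f"[{label}] message {msg.get('id')}: {text[:80]}")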

Export Telegram Chat

  1. Open Telegram Desktop.
  2. Go to: Settings > Advanced > Export Telegram Data.
  3. Select your personal chat or group to export.
  4. Ensure JSON format is selected (not HTML).
  5. Place the exported result.json file into:
DialogSmith/
├── data/
│   └── result.json  ← Place here

Setup Instructions (Windows)

Run the automated setup script from Command Prompt (not PowerShell):

setup.bat

This will:

  • Create and activate a Python virtual environment
  • Upgrade pip
  • Clone the official LLaMA-Factory repository and install Python dependencies from requirements.txt
  • Patch dataset_info.json to register your dataset (chat_sharegpt); an example entry is shown after this list
  • Process your exported Telegram chat (result.json) into chat_sharegpt.json
  • Place the converted dataset in the correct directory (LLaMA-Factory/data)
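
For reference, the registered entry in dataset_info.json should look roughly like this (the exact tags depend on how setup.bat writes it; this follows LLaMA-Factory's ShareGPT schema):

"chat_sharegpt": {
  "file_name": "chat_sharegpt.json",
  "formatting": "sharegpt",
  "columns": {
    "messages": "conversations"
  }
}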

Once complete, you will see:

All steps completed successfully.
Please refer to the README.md for the next steps.
You will find instructions on how to launch training.

Make sure your result.json file is already located at:

./data/result.json
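
Before training, you can spot-check the converted dataset from the project root (a minimal sketch, assuming setup placed the file at LLaMA-Factory/data/chat_sharegpt.json as described above):

import json

with open("LLaMA-Factory/data/chat_sharegpt.json", encoding="utf-8") as f:
    data = json.load(f)

print(f"{len(data)} conversations")
# Print the first conversation to verify roles and text look right.
print(json.dumps(data[0], ensure_ascii=False, indent=2))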

Fine-Tune Your Model (LoRA)

The following example uses Qwen1.5-1.8B-Chat, but you can replace it with any Hugging Face-compatible model.

Basic LoRA Fine-Tuning Command

python LLaMA-Factory\src\train.py --stage sft --do_train ^
  --model_name_or_path Qwen/Qwen1.5-1.8B-Chat ^
  --dataset chat_sharegpt ^
  --dataset_dir .\LLaMA-Factory\data ^
  --template qwen ^
  --finetuning_type lora ^
  --lora_target q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj ^
  --output_dir saves\Qwen1.5-1.8B-Chat-lora ^
  --overwrite_cache ^
  --per_device_train_batch_size 2 ^
  --gradient_accumulation_steps 4 ^
  --lr_scheduler_type cosine ^
  --logging_steps 10 ^
  --save_strategy steps ^
  --save_steps 100 ^
  --learning_rate 5e-5 ^
  --num_train_epochs 3.0 ^
  --plot_loss

How to Customize for Your Model

Modify these flags to match your model:

Option                  Description
--model_name_or_path    Hugging Face model ID or local model path
--template              Prompt template type (e.g., qwen, chatml, default)
--lora_target           LoRA target modules (refer to the model's architecture)
--output_dir            Directory where the LoRA checkpoints are saved

If you're using a model like mistralai/Mistral-7B-Instruct-v0.2, you would change:

--model_name_or_path mistralai/Mistral-7B-Instruct-v0.2 ^
--template mistral ^
--lora_target q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj ^
--output_dir saves\Mistral-7B-Instruct-lora ^

Refer to the LLaMA-Factory model table for recommended values.


To Resume Training

Find your latest checkpoint under your saves folder, then append this flag to your training command:

--resume_from_checkpoint saves\Qwen1.5-1.8B-Chat-lora\checkpoint-400

Merge LoRA Adapter with Base Model

Edit the export_lora.yaml file to match your model:

# export_lora.yaml
model_name_or_path: Qwen/Qwen1.5-1.8B-Chat
adapter_name_or_path: saves/Qwen1.5-1.8B-Chat-lora
template: qwen
finetuning_type: lora
export_dir: merged/Qwen1.5-1.8B-Chat-merged

Then run:

llamafactory-cli export export_lora.yaml

Chat Inference with Fine-Tuned Model

Test your merged model in an interactive shell:

llamafactory-cli chat ^
  --model_name_or_path merged/Qwen1.5-1.8B-Chat-merged ^
  --template qwen

Update --template to match the one used during training.
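
If you prefer to test outside LLaMA-Factory, the merged model is a regular Hugging Face checkpoint, so a quick smoke test with transformers also works (a minimal sketch; adjust device and generation settings to your hardware):

from transformers import AutoModelForCausalLM, AutoTokenizer

path = "merged/Qwen1.5-1.8B-Chat-merged"
tokenizer = AutoTokenizer.from_pretrained(path)
# device_map="auto" requires the accelerate package.
model = AutoModelForCausalLM.from_pretrained(path, device_map="auto")

# The tokenizer's chat template applies the same qwen prompt format.
messages = [{"role": "user", "content": "hey, what are you up to this weekend?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))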

Manually Activate the Virtual Environment

If you’ve already run setup.bat, the virtual environment is created automatically. In future sessions, you can activate it manually before running any Python scripts:

Activate on Windows (Command Prompt)

venv\Scripts\activate

You should see the prompt change to show that the environment is active:

(venv) C:\Users\yourname\DialogSmith>

Once activated, you can run python, pip, or training commands as usual.