Skip to content

[EMNLP2024] Official Implementation of the paper "Text Grafting: Near-Distribution Weak Supervision for Minority Classes in Text Classification"

Notifications You must be signed in to change notification settings

KomeijiForce/TextGrafting

Repository files navigation

Text Grafting: Near-Distribution Weak Supervision for Minority Classes

Overview Text Grafting generates high-quality, in-distribution weak supervision for minority classes in text classification tasks, combining the strengths of pseudo-label mining and LLM-based text synthesis.

Motivation Current methods struggle with limited or no samples for minority classes, leading to poor classifier performance.

Solution Text Grafting mines masked templates from the raw corpus and fills them with state-of-the-art LLMs to synthesize near-distribution texts for minority classes.

Running

The bash script is provided in run.sh. You need to fill in your own your_openai_key and your_huggingface_key.

python grafting.py --api_key "your_openai_key"\
	--hf_token "your_huggingface_key"\
	--model_engine "gpt-4o"\
	--miner_id "google/gemma-1.1-7b-it" \
	--ratio_k 0.25 \
	--ratio_t 0.1 \
	--raw_text_lim 10000 \
	--label_name "surprised" \
	--label_id 5 \
	--dataset_path "SetFit/emotion" \
	--dataset_name "emotion" \
	--style "sentence" \
	--device "0" 

About

[EMNLP2024] Official Implementation of the paper "Text Grafting: Near-Distribution Weak Supervision for Minority Classes in Text Classification"

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published