The implementation of paper Yao fu, Hao Zhou, Jiaze Chen, and Lei Li, Rethinking Text Attribute Transfer: A Lexical Analysis. INLG 2019 (oral). link
In this paper, we discuss the observation that in many text style transfer datasets and models, only a few style-related words are changed during the transfer process, while the higher-level sentence structures remain unchanged. E.g. to change a negetive sentence "The food is awful" in Yelp to positive, one only need to substitute the word "awful" -> "The food is awesome".
How can quantitatively identify, measure, and visualize the influence of these words? We propose three algorithms for this purpose: the pivot word discovery, the pivot classifier, and the precision-recall histogram algorithms. They are all implemented in this repo.
We gather 8 major style-transfer dataset, standarlize them (so in your future work you could use them from this repo with minimal modification :), and analyze the pivot effects in these dataset. All analytical results from the paper can be reproduced and find out in the outputs/
folder.
The datasets used in the paper are:
- yelp
- amazon
- caption
- gender
- paper
- politics
All organized as: train.0, train.1/ dev.0, dev.1/ test.0, test.1. Download from here
But note that the caption dataset does not have the right test data (because they made a mistake in their release, the positive and negative sentences in the test set are the same).
Other data are from the corresponding papers, with renaming and re-organization to fit our code.
mkdir outputs
python main.py --dataset=yelp --pivot_thres_cnt=1 --prec_thres=0.5 --recl_thres=0.0
and the outputs would something like:
...
Pivot word discovery:
class 0, 4929 pivots, pivot recall: 0.3348
class 1, 4129 pivots, pivot recall: 0.3435
...
Pivot classifier:
train accuracy: 0.8401
dev accuracy: 0.8313
test accuracy: 0.8333
...
output stored in
../outputs/yelp_1.pivot
Sample outputs
yelp_0.pivot: word/ precision/ recall (negative sentiment)
sadly 0.9924 0.0002
mistaken 0.7778 0.0000
general 0.6285 0.0001
run 0.6795 0.0003
mill 0.6226 0.0000
yelp_1.pivot: word/ precision/ recall (positive sentiment)
hoagies 0.7903 0.0000
italian 0.7029 0.0004
ton 0.7260 0.0001
really 0.5998 0.0038
worthy 0.6548 0.0000
yelp_0.sent: (pivot words are annotated with their precision)
ok(0.927) never(0.897) going(0.680) back(0.616) to this place again .
easter(0.786) day(0.502) nothing(0.918) open(0.516) , heard(0.778) about this place figured(0.781) it would ok(0.927) .
yelp_1.sent: (pivot words are annotated with their precision)
staff(0.791) behind the deli(0.696) counter were super(0.845) nice(0.907) and efficient(0.943) !
the staff(0.791) are always(0.918) very nice(0.907) and helpful(0.890) .
Parameters tunning:
prec_thres
gives the confidence of how a word may determine the classification. To find strong pivot words, increase this parameter (e.g. [0.7, 1.0]). To achieve better classification performance, decrease this parameter (e.g. [0.5, 0.7])
recl_thres
and pivot_thres_cnt
prevents overfitting on single words. To increase confidence of the pivot words, increase them; to increase classification performance, decrease them.
Yao Fu, yao.fu@columbia.edu