Anna Scherbakova, Anna Sukhanova, Anna Palatkina, Elina Sigdel
Modern dialogue systems have learned to communicate with users, yet it has not been possible to completely eliminate situations in which conversational agents use inappropriate language. It is also difficult to examine the generalisability of dialogue models, as they are usually evaluated only on held-out data. In this paper, we present RuHateBe, a novel Russian benchmark dataset for evaluating dialogue systems on hate speech. To illustrate RuHateBe's usefulness, we test several state-of-the-art generative models and reveal their weaknesses. To the best of our knowledge, this is the first study that examines whether Russian dialogue models use hate speech towards specific social groups.
Link to the RuHateBe dataset: https://disk.yandex.ru/d/hi3PF0XuoyCRlg
Link to the paper: https://disk.yandex.ru/i/Divcpu7LaJwchw