Anna Scherbakova, Anna Sukhanova, Anna Palatkina, Elina Sigdel
Modern dialogue systems have learned to communicate with users, yet it has not been possible to completely eliminate situations in which conversational agents use inappropriate language. It is also difficult to examine the generalisability of dialogue models, as they are usually evaluated only on held-out data. In this paper, we present RuHateBe, a novel Russian benchmark dataset for evaluating dialogue systems on hate speech. To illustrate RuHateBe's usefulness, we test several state-of-the-art generative models and reveal their weaknesses. To the best of our knowledge, this is the first study that examines whether Russian dialogue models use hate speech towards specific social groups.
Link to the RuHateBe dataset: https://disk.yandex.ru/d/hi3PF0XuoyCRlg
Link to the paper: https://disk.yandex.ru/i/Divcpu7LaJwchw