This dataset was developed by Dr. Bhimrao Ambedkar University, Agra in collaboration with IIT-Kharagpur, Panlingua Language Processing LLP and UnReaL-TecE LLP for the project titled "Communal and Misogynistic Aggression in Hindi-English-Bangla" (the ComMA Project), funded by Facebook Research. It contains 20,000 data points in three Indian languages (Meitei, Bangla and Hindi), all collected from social media and code-mixed with English, and richly annotated for different levels of aggression and bias.
The dataset is licensed under CC BY-NC-SA 4.0. For commercial licensing of the dataset, contact UnReaL-TecE LLP.
If you use the data, please cite the following papers:
@article{kumar_multilingual_2023,
title = {A multilingual, multimodal dataset of aggression and bias: the {ComMA} dataset},
issn = {1574-0218},
shorttitle = {A multilingual, multimodal dataset of aggression and bias},
url = {https://doi.org/10.1007/s10579-023-09696-7},
doi = {10.1007/s10579-023-09696-7},
language = {en},
urldate = {2024-01-10},
journal = {Language Resources and Evaluation},
author = {Kumar, Ritesh and Ratan, Shyam and Singh, Siddharth and Nandi, Enakshi and Devi, Laishram Niranjana and Bhagat, Akash and Dawer, Yogesh and Lahiri, Bornini and Bansal, Akanksha},
month = nov,
year = {2023}
}
@inproceedings{kumar-etal-2022-comma,
title = "The {C}om{MA} Dataset V0.2: Annotating Aggression and Bias in Multilingual Social Media Discourse",
author = "Kumar, Ritesh and Ratan, Shyam and Singh, Siddharth and Nandi, Enakshi and Devi, Laishram Niranjana and Bhagat, Akash and Dawer, Yogesh and Lahiri, Bornini and Bansal, Akanksha and Ojha, Atul Kr.",
editor = "Calzolari, Nicoletta and B{\'e}chet, Fr{\'e}d{\'e}ric and Blache, Philippe and Choukri, Khalid and Cieri, Christopher and Declerck, Thierry and Goggi, Sara and Isahara, Hitoshi and Maegaard, Bente and Mariani, Joseph and Mazo, H{\'e}l{\`e}ne and Odijk, Jan and Piperidis, Stelios",
booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
month = jun,
year = "2022",
address = "Marseille, France",
publisher = "European Language Resources Association",
url = "https://aclanthology.org/2022.lrec-1.441",
pages = "4149--4161"
}
For any queries, please feel free to write to riteshkr[dot]kmi. The address is at the most popular email domain starting with 'g'.