kmi-linguistics/ComMA

ComMA Dataset

This dataset was created by Dr. Bhimrao Ambedkar University, Agra, in collaboration with IIT-Kharagpur, Panlingua Language Processing LLP and UnReaL-TecE LLP for the project titled "Communal and Misogynistic Aggression in Hindi-English-Bangla" (the ComMA Project), funded by Facebook Research. It contains 20,000 data points in three Indian languages, Meitei, Bangla and Hindi (all from social media and code-mixed with English), richly annotated with different levels of aggression and bias.

The dataset is licensed under CC BY-NC-SA 4.0. For commercial licensing of the dataset, contact UnReaL-TecE LLP.

If you use the data, please cite the following papers:

@article{kumar_multilingual_2023,
    title = {A multilingual, multimodal dataset of aggression and bias: the {ComMA} dataset},
    issn = {1574-0218},
    shorttitle = {A multilingual, multimodal dataset of aggression and bias},
    url = {https://doi.org/10.1007/s10579-023-09696-7},
    doi = {10.1007/s10579-023-09696-7},
    language = {en},
    urldate = {2024-01-10},
    journal = {Language Resources and Evaluation},
    author = {Kumar, Ritesh and Ratan, Shyam and Singh, Siddharth and Nandi, Enakshi and Devi, Laishram Niranjana and Bhagat, Akash and Dawer, Yogesh and Lahiri, Bornini and Bansal, Akanksha},
    month = nov,
    year = {2023}
}

@inproceedings{kumar-etal-2022-comma,
    title = "The {C}om{MA} Dataset V0.2: Annotating Aggression and Bias in Multilingual Social Media Discourse",
    author = "Kumar, Ritesh  and Ratan, Shyam  and Singh, Siddharth  and Nandi, Enakshi  and Devi, Laishram Niranjana  and Bhagat, Akash  and Dawer, Yogesh  and Lahiri, Bornini  and Bansal, Akanksha  and Ojha, Atul Kr.",
    editor = "Calzolari, Nicoletta  and B{\'e}chet, Fr{\'e}d{\'e}ric  and Blache, Philippe  and Choukri, Khalid  and Cieri, Christopher  and Declerck, Thierry  and Goggi, Sara  and Isahara, Hitoshi  and Maegaard, Bente  and Mariani, Joseph  and Mazo, H{\'e}l{\`e}ne  and Odijk, Jan  and Piperidis, Stelios",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.441",
    pages = "4149--4161"    
}

For any queries, please contact riteshkr[dot]kmi; the address is at the most popular email domain starting with 'g'.
