This is the code for the paper "Hope Speech detection in under-resourced Kannada language"
- Download the corresponding files from Zenodo: https://zenodo.org/record/5006517/
- Set the path to `path_to_repo/KanHope/Dual Channel models/`.
- For the models that follow the BERT architecture, open `classifier.py` and find the string `read_csv`. Add the paths to the train, test, and validation dataframes, pointing them to wherever the files were stored after downloading from Zenodo.
- Run `test.py` for inference.
- Under the same directory, run `get_predictions.py` to view the classification reports and confusion matrix.
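For reference, reports of this kind are typically built with scikit-learn; a minimal, self-contained illustration of the same two outputs (with made-up labels, not the repository's data) is:

```python
from sklearn.metrics import classification_report, confusion_matrix

# Made-up gold labels and predictions (0 = not hope speech, 1 = hope speech).
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

# Per-class precision/recall/F1 table.
print(classification_report(y_true, y_pred, target_names=["not-hope", "hope"]))
# Rows are true classes, columns are predicted classes.
print(confusion_matrix(y_true, y_pred))
```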
- Download the English translations of the code-mixed Kannada-English dataset, along with the splits: https://zenodo.org/record/4904729/
- Run `dc_classifier.py` to train the dual-channel BERT model.
- For the model names (`model1`, `model2`), follow the naming conventions listed in Hugging Face Transformers' pretrained models: a) `model1` is a monolingual English language model (for the translated texts); b) `model2` is a multilingual language model (for the Kannada-English code-mixed text).
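For instance, the two channels might be named as follows (these identifiers are only illustrative choices, not the ones used in the paper; any valid Hugging Face Hub model IDs with the right language coverage work):

```python
# Hypothetical choices for the two channels; substitute any pretrained
# checkpoints from the Hugging Face Hub with matching language coverage.
model1 = "bert-base-uncased"             # monolingual English (translated texts)
model2 = "bert-base-multilingual-cased"  # multilingual (Kannada-English code-mixed text)
```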
- Under the same directory, run `get_predictions.py` to view the classification reports and confusion matrix.
- The architecture of the dual-channel model is as follows:
This approach can be used for any multilingual dataset. The weights of the fine-tuned models are available on my Hugging Face account, [AdWeeb](https://huggingface.co/AdWeeb).
We have provided the notebooks for reference: the code and explanations for all the experiments are in the Jupyter notebooks. Interesting findings, results, discussions, and qualitative analysis are documented in the manuscript.
If you use our dataset and/or find our code useful, please cite our paper:
```bibtex
@misc{hande2021hope,
  title={Hope Speech detection in under-resourced Kannada language},
  author={Adeep Hande and Ruba Priyadharshini and Anbukkarasi Sampath and Kingston Pal Thamburaj and Prabakaran Chandran and Bharathi Raja Chakravarthi},
  year={2021},
  eprint={2108.04616},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```