DeepOffense : Multilingual Offensive Language Identification with Cross-lingual Embeddings

DeepOffense provides state-of-the-art models for multilingual offensive language identification. In this project, we use the available English data, together with cross-lingual contextual word embeddings and transfer learning, to make predictions in low-resource languages. We project predictions onto comparable data in Bengali, Hindi, and Spanish, and report macro F1 scores of 0.8415 for Bengali, 0.8568 for Hindi, and 0.7513 for Spanish.
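For readers unfamiliar with the metric, macro F1 is the unweighted mean of the per-class F1 scores, so the minority class counts as much as the majority class. The following is a minimal sketch of that computation; the function name `macro_f1` and the toy labels are illustrative only and are not taken from the repository or the paper.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Return the unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical, not from any dataset used in this project).
gold = ["OFF", "NOT", "NOT", "OFF", "NOT"]
pred = ["OFF", "NOT", "OFF", "NOT", "NOT"]
print(round(macro_f1(gold, pred), 4))  # 0.5833
```

Here the "OFF" class scores 0.5 and the "NOT" class scores about 0.667, giving a macro average of 7/12.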

Installation

You first need to install PyTorch. The recommended PyTorch version is 1.5. Please refer to the PyTorch installation page for the install command specific to your platform.

When PyTorch has been installed, you can install from source by cloning the repository and running:

git clone https://github.com/TharinduDR/DeepOffense.git
cd DeepOffense
pip install -r requirements.txt

Run the examples

Examples are included in the repository but are not shipped with the library. Please refer to the examples directory for them. Each directory in the examples folder covers a different language.

Pretrained Models

An English offensive language detection model, trained with the XLM-R large model on OffensEval data, can be downloaded using this link.

To see how to begin the training process, please refer to the examples directory. Once the model has been downloaded and unzipped, it can be loaded easily:

import torch  # used only to check whether a GPU is available
model = ClassificationModel("xlmroberta", "path", use_cuda=torch.cuda.is_available())
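Once a model is loaded, predictions can be turned into readable labels. The sketch below is hedged: it assumes that `ClassificationModel` exposes a `predict(texts)` method returning a `(predictions, raw_outputs)` pair in the style of Simple Transformers, and that class id 0 means "NOT" and 1 means "OFF". Neither assumption is confirmed by this README, so please check the examples directory for the actual label encoding. The helper `label_texts` and the mapping `ID_TO_LABEL` are hypothetical names introduced here.

```python
from typing import Any, List, Sequence, Tuple

# Assumed label encoding; verify it against the examples directory.
ID_TO_LABEL = {0: "NOT", 1: "OFF"}


def label_texts(model: Any, texts: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each input text with the label string predicted for it.

    `model` is any object with a `predict(list_of_texts)` method that
    returns `(predictions, raw_outputs)`; this interface is an assumption.
    """
    predictions, _raw_outputs = model.predict(list(texts))
    return [(text, ID_TO_LABEL[int(pred)]) for text, pred in zip(texts, predictions)]
```

With the model loaded as shown above, a call such as `label_texts(model, ["some sentence"])` would return a list of `(text, label)` pairs, provided the assumptions hold.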

Citation

Please consider citing us if you use the library.

@inproceedings{ranasinghe-etal-2020-multilingual,
    title = "Multilingual Offensive Language Identification with Cross-lingual Embeddings",
    author = "Ranasinghe, Tharindu and Zampieri, Marcos",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2020",
    }

Citation for the Malayalam-specific paper:

@inproceedings{ranasinghe-etal-2020-wlv,
    title={WLV-RIT at HASOC 2020: Offensive Language Identification in Code-switched Texts},
    author={Ranasinghe, Tharindu and Zampieri, Marcos},
    year={2020},
    booktitle={Proceedings of FIRE}
}
