MorphDiv/TeDDi_sample

TeDDi

This is the repository for the Text Data Diversity Sample (TeDDi Sample), part of the project "Non-randomness in Morphological Diversity: A Computational Approach Based on Multilingual Corpora", funded by the Swiss National Science Foundation and realised at the University of Zurich URPP 'Language and Space'.

This repository contains the corpus data and the code that processes and analyzes it. It is currently a work in progress.

If you use TeDDi, please cite as:

Steven Moran, Christian Bentz, Ximena Gutierrez-Vasques, Olga Pelloni, and Tanja Samardzic. 2022. TeDDi Sample: Text Data Diversity Sample for Language Comparison and Multilingual NLP. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1150–1158, Marseille, France. European Language Resources Association. Online: https://aclanthology.org/2022.lrec-1.123/

To contribute code or data to the repository, please first refer to our guidelines on contributing.

Different data formats are available for direct download.

Main Contributors (alphabetical order):

  • Bentz, Christian
  • Gutierrez-Vasques, Ximena
  • Moran, Steven
  • Samardžić, Tanja
  • Sozinova, Olga

Language-specific contributors (alphabetical order):

  • Kalessa, Jule (Paiwan)
  • Mächler, Alina
  • Rood, David S. (Wichita)
  • Roth, Rainer (Wari')

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

License: CC BY-NC-SA 4.0

