CLDF Dataset derived from the Bahnaric data in Sidwell's "Austroasiatic dataset for phylogenetic analysis" from 2015
If you use these data, please cite
- the original source
Sidwell, Paul. 2015. Austroasiatic dataset for phylogenetic analysis: 2015 version. Mon-Khmer Studies (Notes, Reviews, Data-Papers) 44. lxviii-ccclvii.
- the derived dataset, using the DOI of the particular released version you used
This dataset is licensed under a CC-BY-4.0 license
Conceptlists in Concepticon:
This dataset by Sidwell (2015) was used as a gold-standard benchmark in the study by List et al. (2017) on automated cognate detection. It forms part of the test data used in that study, and the form in which you find it here is the form in which it was prepared for that study.
List, J.-M., S. Greenhill, and R. Gray (2017): The potential of automatic word comparison for historical linguistics. PLOS ONE 12.1. 1-18. DOI: https://doi.org/10.1371/journal.pone.0170046
- Varieties: 24 (linked to 20 different Glottocodes)
- Concepts: 200 (linked to 200 different Concepticon concept sets)
- Lexemes: 4,546
- Sources: 1
- Synonymy: 1.06
- Cognacy: 4,546 cognates in 1,055 cognate sets (524 singletons)
- Cognate Diversity: 0.20
- Invalid lexemes: 0
- Tokens: 17,314
- Segments: 133 (0 BIPA errors, 0 CLTS sound class errors, 133 CLTS modified)
- Inventory size (avg): 47.12
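
The cognate diversity figure above can be checked against the other counts in the list. The sketch below assumes the definition commonly used for these statistics, namely (cognate sets - concepts) / (cognates - concepts); this formula is not stated in this README, and the function name `cognate_diversity` is purely illustrative.

```python
def cognate_diversity(n_cognate_sets: int, n_cognates: int, n_concepts: int) -> float:
    """Share of cognate sets beyond the minimum of one set per concept.

    Assumed definition (not taken from this README):
    (cognate sets - concepts) / (cognates - concepts).
    0 means every concept has a single cognate set; 1 means every form
    is a cognate set of its own.
    """
    return (n_cognate_sets - n_concepts) / (n_cognates - n_concepts)


# Counts taken from the statistics listed above.
diversity = cognate_diversity(n_cognate_sets=1055, n_cognates=4546, n_concepts=200)
print(f"{diversity:.2f}")  # prints 0.20
```

Under this assumed definition, the counts reproduce the reported value of 0.20.
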
Name | GitHub user | Description | Role |
---|---|---|---|
Johann-Mattis List | @LinguList | maintainer | Editor |
Paul Sidwell | | data collection | Author |
The following CLDF datasets are available in cldf:
- CLDF Wordlist at cldf/cldf-metadata.json
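
As a minimal sketch of how the wordlist might be inspected, the snippet below counts forms per variety using only the Python standard library. It assumes that the forms table is a CSV file with a `Language_ID` column, as in the default CLDF Wordlist layout; the actual file names and column names are declared in `cldf/cldf-metadata.json` and should be checked there. The path `cldf/forms.csv` in the usage note is likewise an assumption, and `forms_per_language` is a hypothetical helper, not part of any CLDF tooling.

```python
import csv
from collections import Counter


def forms_per_language(forms_path: str) -> Counter:
    """Count rows in a CLDF forms table, grouped by variety.

    Assumes a CSV file with a header row that includes a
    `Language_ID` column (the CLDF default for a FormTable).
    """
    counts = Counter()
    with open(forms_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            counts[row["Language_ID"]] += 1
    return counts
```

Calling `forms_per_language("cldf/forms.csv")` from the repository root would then return one count per variety, provided the file exists under that name. For anything beyond quick inspection, a dedicated CLDF library that reads the metadata file is the safer route, since it resolves table and column names instead of assuming them.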