CLDF Dataset derived from the Bahnaric data in Sidwell's "Austroasiatic dataset for phylogenetic analysis" from 2015

lexibank/sidwellbahnaric

How to cite

If you use these data, please cite:

  • the original source

    Sidwell, Paul. 2015. Austroasiatic dataset for phylogenetic analysis: 2015 version. Mon-Khmer Studies (Notes, Reviews, Data-Papers) 44. lxviii-ccclvii.

  • the derived dataset, using the DOI of the particular released version you used

Description

This dataset is licensed under a CC-BY-4.0 license.

Conceptlists in Concepticon:

Notes

This dataset by Sidwell (2015) was used as a gold-standard benchmark in the study by List et al. (2017) on automated cognate detection. It forms part of the test data used in that study, and it is provided here in the form in which it was prepared for that study.

List, J.-M., S. Greenhill, and R. Gray (2017): The potential of automatic word comparison for historical linguistics. PLOS ONE 12.1. 1-18. DOI: https://doi.org/10.1371/journal.pone.0170046

Statistics

CLDF validation Glottolog: 100% Concepticon: 100% Source: 100% BIPA: 100% CLTS SoundClass: 100%

  • Varieties: 24 (linked to 20 different Glottocodes)
  • Concepts: 200 (linked to 200 different Concepticon concept sets)
  • Lexemes: 4,546
  • Sources: 1
  • Synonymy: 1.06
  • Cognacy: 4,546 cognates in 1,055 cognate sets (524 singletons)
  • Cognate Diversity: 0.20
  • Invalid lexemes: 0
  • Tokens: 17,314
  • Segments: 133 (0 BIPA errors, 0 CLTS sound class errors, 133 CLTS modified)
  • Inventory size (avg): 47.12
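
The figures above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the formula used for cognate diversity, (cognate sets - concepts) / (lexemes - concepts), is an assumption that is not stated in this README, although it does reproduce the reported value of 0.20. The function and variable names are hypothetical.

```python
# Figures copied from the Statistics list above.
concepts = 200
lexemes = 4546
cognate_sets = 1055
singletons = 524
tokens = 17314


def cognate_diversity(n_sets, n_lexemes, n_concepts):
    """Assumed formula (not confirmed by this README).

    Yields 0 when every concept has exactly one cognate set and
    1 when every lexeme forms its own cognate set.
    """
    return (n_sets - n_concepts) / (n_lexemes - n_concepts)


diversity = cognate_diversity(cognate_sets, lexemes, concepts)

# Cognate sets that contain more than one lexeme.
non_singleton_sets = cognate_sets - singletons

# Average number of sound segments (tokens) per lexeme.
mean_tokens_per_lexeme = tokens / lexemes

print(f"Cognate diversity: {diversity:.2f}")  # prints "Cognate diversity: 0.20"
print(f"Non-singleton cognate sets: {non_singleton_sets}")
print(f"Mean tokens per lexeme: {mean_tokens_per_lexeme:.2f}")
```

Under this reading, roughly half of the 1,055 cognate sets are singletons, and the low diversity value reflects that most lexemes share a cognate set with forms from other varieties.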

Contributors

| Name | GitHub user | Description | Role |
| --- | --- | --- | --- |
| Johann-Mattis List | @LinguList | maintainer | Editor |
| Paul Sidwell | | data collection | Author |

CLDF Datasets

The following CLDF datasets are available in cldf: