CLDF datasets

Wordlist Hindu Kush Areal Typology

CLDF Metadata: cldf-metadata.json

Sources: sources.bib

The wordlist is mainly generated from the processed wordlists that were part of the elicitation package (see Project design and data collection). That is reflected in the category labels: ASJPlist, Kinship and Numerals.

Selecting an item gives you a) a map of the region with the languages plotted and a label displaying a transcription for each language, and b) an alphabetical list of the languages with the transcribed item for each. Audio recordings are linked for a subset of the items.

We have aimed at providing a broad phonetic transcription of each item, using the International Phonetic Alphabet. While it is our intention to give each item an accurate written representation, the primary objective was not to represent phonemes or to consistently reflect language-internal phonological contrasts.

Elicitation of kinship terms turned out to be particularly challenging. Although conscious efforts were made to carefully explain the task to the consultants and to guide them prior to the recording sessions, we are aware that some of the terms listed, in a few of the languages, are most likely explanatory terms rather than reference terms actually used in the local community.

property value
dc:conformsTo CLDF Wordlist
dc:format
  1. http://concepticon.clld.org/contributions/Holman-2008-40
dc:identifier https://hindukush.clld.org/
dc:license https://creativecommons.org/licenses/by/4.0/
dcat:accessURL https://github.com/cldf-datasets/liljegrenhindukush
prov:wasDerivedFrom
  1. cldf-datasets/liljegrenhindukush v1.1
  2. Glottolog v4.8
  3. Concepticon v3.1.0
  4. CLTS v2.2.0
prov:wasGeneratedBy
  1. lingpy-rcParams: lingpy-rcParams.json
  2. python: 3.10.12
  3. python-packages: requirements.txt
rdf:ID liljegrenhindukush
rdf:type http://www.w3.org/ns/dcat#Distribution

Table forms.csv

property value
dc:conformsTo CLDF FormTable
dc:extent 11600

Columns

Name/Property Datatype Description
ID string Primary key
Local_ID string
Language_ID string References languages.csv::ID
Parameter_ID string References parameters.csv::ID
Value string
Form string
Segments list of string (separated by a space)
Comment string
Source list of string (separated by ;) References sources.bib::BibTeX-key
Cognacy string
Loan boolean
Graphemes string
Profile string
Audio_Files list of string (separated by a space) References media.csv::ID
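
As a rough illustration of the list-valued columns above, the following sketch reads rows shaped like forms.csv with the standard library and splits Segments and Source into lists. It assumes that Segments is separated by a single space and Source by ";". The sample rows, IDs and source keys are invented for the example and are not taken from the dataset; `read_forms` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical two-row excerpt in the shape of forms.csv. The values are
# invented for illustration and do not come from the dataset.
SAMPLE_FORMS = """ID,Language_ID,Parameter_ID,Form,Segments,Source
lang1-water-1,lang1,water,pani,p a n i,SourceA;SourceB
lang2-water-1,lang2,water,ab,a b,
"""


def read_forms(handle):
    """Yield form rows with list-valued columns split into Python lists."""
    for row in csv.DictReader(handle):
        # Segments: assumed to hold one segment per space-separated token.
        row["Segments"] = row["Segments"].split()
        # Source: BibTeX keys separated by ";"; an empty cell gives [].
        row["Source"] = [key for key in row["Source"].split(";") if key]
        yield row


forms = list(read_forms(io.StringIO(SAMPLE_FORMS)))
for form in forms:
    print(form["ID"], form["Segments"], form["Source"])
```

For real use, the same function could be applied to an open file handle on forms.csv instead of the in-memory sample.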

Table languages.csv

property value
dc:conformsTo CLDF LanguageTable
dc:extent 59

Columns

Name/Property Datatype Description
ID string Primary key
Name string
Glottocode string
Glottolog_Name string
ISO639P3code string
Macroarea string
Latitude decimal (≥ -90, ≤ 90)
Longitude decimal (≥ -180, ≤ 180)
Family string
SubGroup string
Location string
Elicitation string
Consultant string
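
The Latitude and Longitude columns carry range constraints (-90 to 90 and -180 to 180). A minimal sketch of checking those bounds on a row is given below. The function names and the sample rows are hypothetical, and treating an unparseable cell as a problem is an assumption of this example, not something the metadata prescribes.

```python
from decimal import Decimal, InvalidOperation

# Bounds as listed for the Latitude and Longitude columns.
BOUNDS = {"Latitude": (-90, 90), "Longitude": (-180, 180)}


def in_range(value, lower, upper):
    """Return True if value parses as a finite decimal within [lower, upper]."""
    try:
        number = Decimal(value)
    except InvalidOperation:
        return False
    return number.is_finite() and lower <= number <= upper


def coordinate_problems(row):
    """Return the names of coordinate columns whose value is out of range."""
    return [
        column
        for column, (lower, upper) in BOUNDS.items()
        if not in_range(row[column], lower, upper)
    ]


# Hypothetical rows, invented for illustration only.
good_row = {"ID": "lang1", "Latitude": "35.5", "Longitude": "71.8"}
bad_row = {"ID": "lang2", "Latitude": "95.0", "Longitude": "71.8"}

print(coordinate_problems(good_row))
print(coordinate_problems(bad_row))
```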

Table parameters.csv

property value
dc:conformsTo CLDF ParameterTable
dc:extent 194

Columns

Name/Property Datatype Description
ID string Primary key
Name string
Concepticon_ID string
Concepticon_Gloss string
domain string

Table media.csv

property value
dc:conformsTo CLDF MediaTable
dc:extent 4862

Columns

Name/Property Datatype Description
ID string (Regex: [a-zA-Z0-9_\-]+) Primary key
Name string
Description string
Media_Type string (Regex: [^/]+/.+)
Path_In_Zip string
objid string
fname string
size integer
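
The two regular expressions listed for media.csv can be checked with Python's `re` module. The sketch below assumes the patterns are meant to match the whole cell value, which is why it uses `fullmatch`. `is_valid_media_row` is a hypothetical helper, and the sample IDs and media types are invented.

```python
import re

# Patterns copied from the column descriptions of media.csv.
ID_PATTERN = re.compile(r"[a-zA-Z0-9_\-]+")
MEDIA_TYPE_PATTERN = re.compile(r"[^/]+/.+")


def is_valid_media_row(row):
    """Return True if ID and Media_Type both match their full patterns."""
    return (
        ID_PATTERN.fullmatch(row["ID"]) is not None
        and MEDIA_TYPE_PATTERN.fullmatch(row["Media_Type"]) is not None
    )


# Hypothetical rows, invented for illustration only.
ok_row = {"ID": "audio_001", "Media_Type": "audio/mpeg"}
bad_id_row = {"ID": "audio 001", "Media_Type": "audio/mpeg"}
bad_type_row = {"ID": "audio_002", "Media_Type": "audio"}

for row in (ok_row, bad_id_row, bad_type_row):
    print(row["ID"], is_valid_media_row(row))
```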

StructureDataset Hindu Kush Areal Typology

CLDF Metadata: StructureDataset-metadata.json

Sources: sources.bib

A feature, as the term is used here, is a structural property of a language. A conscious choice was made to define features as binary. That means that a given language is classified either as displaying a particular feature (value=1) or as lacking it (value=0), as far as the present data set is concerned. In some cases, for instance when conclusive data is missing, the value is marked as indeterminate.

Our features are not meant to reflect all possible language properties. Instead, they have been chosen a) to represent areally relevant features, and b) to cover different parts of the language system (hence distributed across five macro-categories: Clause structure, Grammatical categories, Lexicon, Phonology and Word order). The choice of features to include is also constrained by our present, and relatively limited, data set.

Each feature presentation contains:

  1. A table showing the number of sample languages displaying the feature, the number of languages in which it is absent, and the number of languages for which the feature value is indeterminate.
  2. A map showing the geographical distribution of feature values across the region.
  3. A short prose description of the feature as it occurs in the region, accompanied by one or more illustrative examples drawn from the data set.
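
The per-feature counts of present, absent and indeterminate values could be computed along the following lines. This is only a sketch: it assumes that the Value column holds "1" for present and "0" for absent, and it treats every other value (here written "?") as indeterminate. The actual encoding of indeterminate values is not stated in this document, and the sample rows and feature IDs are invented.

```python
from collections import Counter

# Assumed encoding: "1" = feature present, "0" = feature absent.
# Anything else is counted as indeterminate in this sketch.
LABELS = {"1": "present", "0": "absent"}


def tally(values):
    """Map each feature ID to a Counter of present/absent/indeterminate."""
    counts = {}
    for row in values:
        label = LABELS.get(row["Value"], "indeterminate")
        counts.setdefault(row["Parameter_ID"], Counter())[label] += 1
    return counts


# Hypothetical value rows, invented for illustration only.
sample_values = [
    {"Language_ID": "lang1", "Parameter_ID": "F1", "Value": "1"},
    {"Language_ID": "lang2", "Parameter_ID": "F1", "Value": "1"},
    {"Language_ID": "lang3", "Parameter_ID": "F1", "Value": "0"},
    {"Language_ID": "lang4", "Parameter_ID": "F1", "Value": "?"},
    {"Language_ID": "lang1", "Parameter_ID": "F2", "Value": "0"},
]

counts = tally(sample_values)
for feature, counter in counts.items():
    print(feature, dict(counter))
```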

Data references are given in the following way: Sample language-Data component-Language consultant:Item number. The reference HNO-Val-RH:061 should for instance be read as Hindko language data, provided by a consultant given the identifier RH, related to item number 061 in the Valency Questionnaire. Most of the data used in the feature descriptions belong to one of the seven components listed (along with their abbreviations) in Project design and data collection.
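
The reference format described above (Sample language-Data component-Language consultant:Item number) can be split into its parts with a small regular expression. The sketch below is one possible reading of that format. The class and function names are hypothetical, and it assumes that none of the first three parts contains a hyphen or a colon.

```python
import re
from typing import NamedTuple


class DataReference(NamedTuple):
    language: str    # sample language, e.g. "HNO"
    component: str   # data component, e.g. "Val"
    consultant: str  # consultant identifier, e.g. "RH"
    item: str        # item number, kept as text to preserve leading zeros


# Assumes that the first three parts contain neither "-" nor ":".
REFERENCE = re.compile(
    r"(?P<language>[^-:]+)-(?P<component>[^-:]+)"
    r"-(?P<consultant>[^-:]+):(?P<item>.+)"
)


def parse_reference(text):
    """Split a data reference string into its four parts."""
    match = REFERENCE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a data reference: {text!r}")
    return DataReference(**match.groupdict())


ref = parse_reference("HNO-Val-RH:061")
print(ref)
```

The item number is kept as a string so that "061" is not reduced to 61.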

The abbreviations in the interlinear glossing are to a large extent standard abbreviations according to the Leipzig Glossing Rules. Additional abbreviations are listed below.

AN animate
CONJ conjunction
EMPH emphatic
EQ equative copula
ESS essive
EX existential copula
HF human feminine (gender)
HM human masculine (gender)
INAN inanimate
MED medial (distance)
R recipient (argument)
T theme (argument)
V verb
X x (non-human gender)
Y y (non-human gender)

property value
dc:conformsTo CLDF StructureDataset
dc:format
  1. http://concepticon.clld.org/contributions/Holman-2008-40
dc:identifier https://hindukush.clld.org/
dc:license https://creativecommons.org/licenses/by/4.0/
dcat:accessURL https://github.com/cldf-datasets/liljegrenhindukush
prov:wasDerivedFrom
  1. cldf-datasets/liljegrenhindukush v1.1
  2. Glottolog v4.8
  3. Concepticon v3.1.0
  4. CLTS v2.2.0
prov:wasGeneratedBy
  1. python: 3.10.12
  2. python-packages: requirements.txt
rdf:ID liljegrenhindukush
rdf:type http://www.w3.org/ns/dcat#Distribution
property value
dc:conformsTo CLDF ValueTable
dc:extent 4720

Columns

Name/Property Datatype Description
ID string (Regex: [a-zA-Z0-9_\-]+) Primary key
Language_ID string References languages.csv::ID
Parameter_ID string References features.csv::ID
Value string
Code_ID string References codes.csv::ID
Comment string
Source list of string (separated by ;) References sources.bib::BibTeX-key

Table features.csv

property value
dc:conformsTo CLDF ParameterTable
dc:extent 80

Columns

Name/Property Datatype Description
ID string (Regex: [a-zA-Z0-9_\-]+) Primary key
Name string
Description string
ColumnSpec json
Category string

Table languages.csv

property value
dc:conformsTo CLDF LanguageTable
dc:extent 59

Columns

Name/Property Datatype Description
ID string Primary key
Name string
Glottocode string
Glottolog_Name string
ISO639P3code string
Macroarea string
Latitude decimal (≥ -90, ≤ 90)
Longitude decimal (≥ -180, ≤ 180)
Family string
SubGroup string
Location string
Elicitation string
Consultant string

Table codes.csv

property value
dc:conformsTo CLDF CodeTable
dc:extent 240

Columns

Name/Property Datatype Description
ID string (Regex: [a-zA-Z0-9_\-]+) Primary key
Parameter_ID string The parameter or variable the code belongs to. References features.csv::ID
Name string
Description string
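
Since values reference codes through Code_ID, and codes reference features through Parameter_ID, a value row can be resolved to the name of its code with a simple lookup. The following sketch shows one way to do that on in-memory rows. `resolve_codes` and the added `Code_Name` key are hypothetical, and all IDs and names in the sample data are invented.

```python
def resolve_codes(values, codes):
    """Attach the referenced code's Name to each value row as Code_Name."""
    codes_by_id = {code["ID"]: code for code in codes}
    resolved = []
    for value in values:
        code = codes_by_id.get(value["Code_ID"])
        # A code should belong to the same feature as the value using it.
        if code is not None and code["Parameter_ID"] != value["Parameter_ID"]:
            raise ValueError(f"code/feature mismatch in value {value['ID']}")
        name = code["Name"] if code is not None else None
        resolved.append({**value, "Code_Name": name})
    return resolved


# Hypothetical rows, invented for illustration only.
sample_codes = [
    {"ID": "F1-1", "Parameter_ID": "F1", "Name": "present"},
    {"ID": "F1-0", "Parameter_ID": "F1", "Name": "absent"},
]
sample_values = [
    {"ID": "lang1-F1", "Language_ID": "lang1", "Parameter_ID": "F1",
     "Value": "1", "Code_ID": "F1-1"},
    {"ID": "lang2-F1", "Language_ID": "lang2", "Parameter_ID": "F1",
     "Value": "0", "Code_ID": "F1-0"},
    {"ID": "lang3-F1", "Language_ID": "lang3", "Parameter_ID": "F1",
     "Value": "?", "Code_ID": ""},
]

resolved = resolve_codes(sample_values, sample_codes)
for row in resolved:
    print(row["ID"], row["Code_Name"])
```

Rows without a matching code keep `Code_Name` as `None` rather than being dropped, so the number of rows stays the same.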