CLDF Metadata: cldf-metadata.json
Sources: sources.bib
The wordlist is mainly generated from the processed wordlists that were part of the elicitation package (see Project design and data collection). That is reflected in the category labels: ASJPlist, Kinship and Numerals.
Selecting an item gives you a) a map of the region with the languages plotted and a label displaying a transcription for each language, and b) a list of the languages in alphabetical order with the transcribed item for each. For a subset of the items, audio recordings are linked.
We have aimed at providing a broad phonetic transcription of each item, using the International Phonetic Alphabet. While it is our intention to give each item an accurate written representation, the primary objective was not to represent phonemes or to consistently reflect language-internal phonological contrasts.
Elicitation of kinship terms turned out to be particularly challenging. Although conscious efforts were made to carefully explain and guide the consultants prior to the recording sessions, we are aware that some of the terms listed for a few of the languages are most likely explanatory terms rather than reference terms actually used in the local community.
property | value |
---|---|
dc:conformsTo | CLDF Wordlist |
dc:format | |
dc:identifier | https://hindukush.clld.org/ |
dc:license | https://creativecommons.org/licenses/by/4.0/ |
dcat:accessURL | https://github.com/cldf-datasets/liljegrenhindukush |
prov:wasDerivedFrom | |
prov:wasGeneratedBy | |
rdf:ID | liljegrenhindukush |
rdf:type | http://www.w3.org/ns/dcat#Distribution |
Table forms.csv
property | value |
---|---|
dc:conformsTo | CLDF FormTable |
dc:extent | 11600 |
Name/Property | Datatype | Description |
---|---|---|
ID | string | Primary key |
Local_ID | string | |
Language_ID | string | References languages.csv::ID |
Parameter_ID | string | References parameters.csv::ID |
Value | string | |
Form | string | |
Segments | list of string (separated by space) | |
Comment | string | |
Source | list of string (separated by ;) | References sources.bib::BibTeX-key |
Cognacy | string | |
Loan | boolean | |
Graphemes | string | |
Profile | string | |
Audio_Files | list of string (separated by space) | References media.csv::ID |
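As a minimal sketch of how the list-valued and boolean columns above might be read, the following uses only the Python standard library. The two sample rows are invented for illustration and are not taken from the dataset; the sketch assumes that Segments and Audio_Files are space-separated, that Source is separated by `;`, and that Loan uses the lexical values `true` and `false`.

```python
import csv
import io

# Hypothetical excerpt shaped like forms.csv; all values are invented.
SAMPLE = """ID,Language_ID,Parameter_ID,Form,Segments,Source,Loan,Audio_Files
f1,lang1,p1,abc,a b c,key1;key2,false,m1 m2
f2,lang2,p1,de,d e,,,
"""


def split_list(cell, sep):
    """Split a list-valued cell; an empty cell yields an empty list."""
    return [part for part in cell.split(sep) if part] if cell else []


def parse_form(row):
    """Convert the list and boolean columns of one forms.csv row."""
    parsed = dict(row)
    parsed["Segments"] = split_list(row["Segments"], " ")      # assumed separator
    parsed["Source"] = split_list(row["Source"], ";")
    parsed["Audio_Files"] = split_list(row["Audio_Files"], " ")  # assumed separator
    # An empty Loan cell is kept as None (unknown) rather than False.
    parsed["Loan"] = {"true": True, "false": False}.get(row["Loan"].lower())
    return parsed


forms = [parse_form(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

Keeping an empty Loan cell as `None` rather than `False` avoids turning missing information into a claim that the form is not a loan.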
Table languages.csv
property | value |
---|---|
dc:conformsTo | CLDF LanguageTable |
dc:extent | 59 |
Name/Property | Datatype | Description |
---|---|---|
ID | string | Primary key |
Name | string | |
Glottocode | string | |
Glottolog_Name | string | |
ISO639P3code | string | |
Macroarea | string | |
Latitude | decimal ≥ -90 ≤ 90 | |
Longitude | decimal ≥ -180 ≤ 180 | |
Family | string | |
SubGroup | string | |
Location | string | |
Elicitation | string | |
Consultant | string | |
Table parameters.csv
property | value |
---|---|
dc:conformsTo | CLDF ParameterTable |
dc:extent | 194 |
Name/Property | Datatype | Description |
---|---|---|
ID | string | Primary key |
Name | string | |
Concepticon_ID | string | |
Concepticon_Gloss | string | |
domain | string | |
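The foreign keys in forms.csv (Language_ID and Parameter_ID) can be resolved against languages.csv and parameters.csv to produce a per-item listing of languages and their forms in alphabetical order. The sketch below is one possible way to do this; the rows and the function name `forms_by_item` are hypothetical and only illustrate the join.

```python
from collections import defaultdict

# Hypothetical rows shaped like the three tables; values are invented.
languages = [{"ID": "l1", "Name": "Beta"}, {"ID": "l2", "Name": "Alpha"}]
parameters = [{"ID": "p1", "Name": "hand"}]
forms = [
    {"ID": "f1", "Language_ID": "l1", "Parameter_ID": "p1", "Form": "xyz"},
    {"ID": "f2", "Language_ID": "l2", "Parameter_ID": "p1", "Form": "uvw"},
]


def forms_by_item(forms, languages, parameters):
    """Group forms by item name, sorted by language name within each item."""
    language_name = {row["ID"]: row["Name"] for row in languages}
    item_name = {row["ID"]: row["Name"] for row in parameters}
    listing = defaultdict(list)
    for form in forms:
        item = item_name[form["Parameter_ID"]]
        listing[item].append((language_name[form["Language_ID"]], form["Form"]))
    return {item: sorted(pairs) for item, pairs in listing.items()}


listing = forms_by_item(forms, languages, parameters)
```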
Table media.csv
property | value |
---|---|
dc:conformsTo | CLDF MediaTable |
dc:extent | 4862 |
Name/Property | Datatype | Description |
---|---|---|
ID | string Regex: [a-zA-Z0-9_\-]+ | Primary key |
Name | string | |
Description | string | |
Media_Type | string Regex: [^/]+/.+ | |
Path_In_Zip | string | |
objid | string | |
fname | string | |
size | integer | |
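The two regular expressions listed for media.csv can be checked with a few lines of Python. This is a sketch that assumes each pattern must match the whole cell value; the function name `media_row_errors` and the sample rows are hypothetical.

```python
import re

# Patterns copied from the column descriptions of media.csv.
ID_PATTERN = re.compile(r"[a-zA-Z0-9_\-]+")
MEDIA_TYPE_PATTERN = re.compile(r"[^/]+/.+")


def media_row_errors(row):
    """Return the names of the columns whose values fail their pattern."""
    errors = []
    if not ID_PATTERN.fullmatch(row["ID"]):
        errors.append("ID")
    if not MEDIA_TYPE_PATTERN.fullmatch(row["Media_Type"]):
        errors.append("Media_Type")
    return errors


# Invented rows: the first is well-formed, the second fails both checks.
good = media_row_errors({"ID": "rec_001", "Media_Type": "audio/mpeg"})
bad = media_row_errors({"ID": "rec 001", "Media_Type": "audio"})
```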
CLDF Metadata: StructureDataset-metadata.json
Sources: sources.bib
A feature, as the term is used here, is a structural property of a language. A conscious choice was made to define features as binary: a given language is classified either as displaying a particular feature (value=1) or as lacking it (value=0), as far as the present data set is concerned. In some cases, for instance when conclusive data is missing, the feature is marked as indeterminate.
Our features are not meant to reflect all possible language properties. Instead, they have been chosen a) to represent areally relevant features, and b) to cover different parts of the language system (hence their distribution across five macro-categories: Clause structure, Grammatical categories, Lexicon, Phonology and Word order). The choice of features to include is also constrained by our present, relatively limited, data set.
Each feature presentation contains:
1. A table showing the number of sample languages displaying the feature, the number of languages in which it is absent, and the number of languages for which the feature value is indeterminate.
2. A map showing the geographical distribution of feature values across the region.
3. A short prose description of the feature as it occurs in the region, accompanied by one or more illustrative examples drawn from the data set.
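The per-feature counts of present, absent and indeterminate languages can be derived from rows shaped like values.csv. The sketch below assumes that `1` and `0` are stored as strings in the Value column and treats any other value as indeterminate; how indeterminate values are actually encoded in the data is not stated here, so that part is an assumption. The rows and the function name `tally_feature` are hypothetical.

```python
def tally_feature(values, feature_id):
    """Count present (1), absent (0) and indeterminate values for a feature."""
    counts = {"present": 0, "absent": 0, "indeterminate": 0}
    for row in values:
        if row["Parameter_ID"] != feature_id:
            continue
        if row["Value"] == "1":
            counts["present"] += 1
        elif row["Value"] == "0":
            counts["absent"] += 1
        else:
            # Assumption: anything other than "1" or "0" is indeterminate.
            counts["indeterminate"] += 1
    return counts


# Invented rows for two hypothetical features, F1 and F2.
values = [
    {"Language_ID": "l1", "Parameter_ID": "F1", "Value": "1"},
    {"Language_ID": "l2", "Parameter_ID": "F1", "Value": "0"},
    {"Language_ID": "l3", "Parameter_ID": "F1", "Value": "?"},
    {"Language_ID": "l4", "Parameter_ID": "F1", "Value": "1"},
    {"Language_ID": "l1", "Parameter_ID": "F2", "Value": "1"},
]
f1_counts = tally_feature(values, "F1")
```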
Data references are given in the following way: Sample language-Data component-Language consultant:Item number. The reference HNO-Val-RH:061 should, for instance, be read as Hindko language data, provided by a consultant given the identifier RH, related to item number 061 in the Valency Questionnaire. Most of the data used in the feature descriptions belong to one of the seven components listed (along with their abbreviations) in Project design and data collection.
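The reference format described above can be split into its four parts with a regular expression. This is a minimal sketch: the function name `parse_reference` and the dictionary keys are hypothetical, and the pattern assumes that none of the parts contains a hyphen or colon and that the item number consists of digits only. The item number is kept as a string so that leading zeros are preserved.

```python
import re

# language-component-consultant:item, e.g. HNO-Val-RH:061
REFERENCE_PATTERN = re.compile(
    r"(?P<language>[^-:]+)-(?P<component>[^-:]+)-(?P<consultant>[^-:]+):(?P<item>\d+)"
)


def parse_reference(reference):
    """Split a data reference into language, component, consultant and item."""
    match = REFERENCE_PATTERN.fullmatch(reference)
    if match is None:
        raise ValueError(f"not a valid data reference: {reference!r}")
    return match.groupdict()


parsed = parse_reference("HNO-Val-RH:061")
```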
The abbreviations in the interlinear glossing are to a large extent standard abbreviations according to the Leipzig Glossing Rules. Additional abbreviations are listed below.
Abbreviation | Meaning |
---|---|
AN | animate |
CONJ | conjunction |
EMPH | emphatic |
EQ | equative copula |
ESS | essive |
EX | existential copula |
HF | human feminine (gender) |
HM | human masculine (gender) |
INAN | inanimate |
MED | medial (distance) |
R | recipient (argument) |
T | theme (argument) |
V | verb |
X | x (non-human gender) |
Y | y (non-human gender) |
property | value |
---|---|
dc:conformsTo | CLDF StructureDataset |
dc:format | |
dc:identifier | https://hindukush.clld.org/ |
dc:license | https://creativecommons.org/licenses/by/4.0/ |
dcat:accessURL | https://github.com/cldf-datasets/liljegrenhindukush |
prov:wasDerivedFrom | |
prov:wasGeneratedBy | |
rdf:ID | liljegrenhindukush |
rdf:type | http://www.w3.org/ns/dcat#Distribution |
Table values.csv
property | value |
---|---|
dc:conformsTo | CLDF ValueTable |
dc:extent | 4720 |
Name/Property | Datatype | Description |
---|---|---|
ID | string Regex: [a-zA-Z0-9_\-]+ | Primary key |
Language_ID | string | References languages.csv::ID |
Parameter_ID | string | References features.csv::ID |
Value | string | |
Code_ID | string | References codes.csv::ID |
Comment | string | |
Source | list of string (separated by ;) | References sources.bib::BibTeX-key |
Table features.csv
property | value |
---|---|
dc:conformsTo | CLDF ParameterTable |
dc:extent | 80 |
Name/Property | Datatype | Description |
---|---|---|
ID | string Regex: [a-zA-Z0-9_\-]+ | Primary key |
Name | string | |
Description | string | |
ColumnSpec | json | |
Category | string | |
Table languages.csv
property | value |
---|---|
dc:conformsTo | CLDF LanguageTable |
dc:extent | 59 |
Name/Property | Datatype | Description |
---|---|---|
ID | string | Primary key |
Name | string | |
Glottocode | string | |
Glottolog_Name | string | |
ISO639P3code | string | |
Macroarea | string | |
Latitude | decimal ≥ -90 ≤ 90 | |
Longitude | decimal ≥ -180 ≤ 180 | |
Family | string | |
SubGroup | string | |
Location | string | |
Elicitation | string | |
Consultant | string | |
Table codes.csv
property | value |
---|---|
dc:conformsTo | CLDF CodeTable |
dc:extent | 240 |
Name/Property | Datatype | Description |
---|---|---|
ID | string Regex: [a-zA-Z0-9_\-]+ | Primary key |
Parameter_ID | string | The parameter or variable the code belongs to. References features.csv::ID |
Name | string | |
Description | string | |
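Since values.csv refers to codes.csv through Code_ID, and each code in turn belongs to a feature through Parameter_ID, the two tables can be cross-checked while resolving code names. The sketch below is one way to do this under the assumption that a value row may have an empty Code_ID; the rows, identifiers and the function name `resolve_codes` are hypothetical.

```python
def resolve_codes(values, codes):
    """Attach the code name to each value row and check feature consistency."""
    code_by_id = {code["ID"]: code for code in codes}
    resolved = []
    for value in values:
        code_id = value.get("Code_ID")
        if not code_id:
            # Assumption: rows without a code are kept, with no code name.
            resolved.append({**value, "Code_Name": None})
            continue
        code = code_by_id[code_id]  # KeyError signals a dangling reference
        if code["Parameter_ID"] != value["Parameter_ID"]:
            raise ValueError(f"code {code_id} belongs to another feature")
        resolved.append({**value, "Code_Name": code["Name"]})
    return resolved


# Invented rows shaped like codes.csv and values.csv.
codes = [
    {"ID": "F1-1", "Parameter_ID": "F1", "Name": "present"},
    {"ID": "F1-0", "Parameter_ID": "F1", "Name": "absent"},
]
values = [
    {"ID": "v1", "Parameter_ID": "F1", "Value": "1", "Code_ID": "F1-1"},
    {"ID": "v2", "Parameter_ID": "F1", "Value": "0", "Code_ID": "F1-0"},
    {"ID": "v3", "Parameter_ID": "F1", "Value": "", "Code_ID": ""},
]
resolved = resolve_codes(values, codes)
```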