CLDF datasets

Wordlist
StructureDataset

Wordlist Hindu Kush Areal Typology

CLDF Metadata: cldf-metadata.json

Sources: sources.bib

The wordlist is mainly generated from the processed wordlists that were part of the elicitation package (see Project design and data collection). That is reflected in the category labels: ASJPlist, Kinship and Numerals.

Selecting an item gives you (a) a map of the region with the languages plotted, each labelled with its transcription of the item, and (b) an alphabetical list of the languages with the transcribed item for each. For a subset of the items, audio recordings are linked.

We have aimed at providing a broad phonetic transcription of each item, using the International Phonetic Alphabet. While it is our intention to give each item an accurate written representation, the primary objective was not to represent phonemes or to consistently reflect language-internal phonological contrasts.

Elicitation of kinship terms turned out to be particularly challenging. Although conscious efforts were made to explain the task carefully and guide the consultants prior to the recording sessions, we are aware that some of the terms listed, in a few of the languages, are most likely explanatory terms rather than reference terms actually used in the local community.

property	value
dc:conformsTo	CLDF Wordlist
dc:format	http://concepticon.clld.org/contributions/Holman-2008-40
dc:identifier	https://hindukush.clld.org/
dc:license	https://creativecommons.org/licenses/by/4.0/
dcat:accessURL	https://github.com/cldf-datasets/liljegrenhindukush
prov:wasDerivedFrom	cldf-datasets/liljegrenhindukush v1.1 Glottolog v4.8 Concepticon v3.1.0 CLTS v2.2.0
prov:wasGeneratedBy	lingpy-rcParams: lingpy-rcParams.json python: 3.10.12 python-packages: requirements.txt
rdf:ID	liljegrenhindukush
rdf:type	http://www.w3.org/ns/dcat#Distribution
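
The metadata file ties the tables below together. As a minimal sketch, assuming the file follows the usual CSVW layout with a top-level `tables` array whose entries carry `url` and `dc:extent` keys, the declared row counts can be listed with the standard `json` module. The inline JSON is a trimmed, invented stand-in for cldf-metadata.json, and `table_extents` is a hypothetical helper name.

```python
import json

# A trimmed, invented stand-in for cldf-metadata.json (layout is an assumption).
METADATA = json.loads("""
{
  "dc:conformsTo": "http://cldf.clld.org/v1.0/terms.rdf#Wordlist",
  "tables": [
    {"url": "forms.csv", "dc:extent": 11600},
    {"url": "languages.csv", "dc:extent": 59}
  ]
}
""")


def table_extents(metadata):
    """Map each table's file name to its declared row count (dc:extent)."""
    return {t["url"]: t.get("dc:extent") for t in metadata.get("tables", [])}


extents = table_extents(METADATA)
```

For the real file, replace the inline string with `json.load(open("cldf-metadata.json", encoding="utf-8"))`.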

Table forms.csv

property	value
dc:conformsTo	CLDF FormTable
dc:extent	11600

Columns

Name/Property	Datatype	Description
ID	`string`	Primary key
Local_ID	`string`
Language_ID	`string`	References languages.csv::ID
Parameter_ID	`string`	References parameters.csv::ID
Value	`string`
Form	`string`
Segments	list of `string` (separated by ` `)
Comment	`string`
Source	list of `string` (separated by `;`)	References sources.bib::BibTeX-key
`Cognacy`	`string`
`Loan`	`boolean`
`Graphemes`	`string`
`Profile`	`string`
Audio_Files	list of `string` (separated by ` `)	References media.csv::ID
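
The list-valued columns above need to be split after reading. The following is a minimal sketch using Python's standard `csv` module; the sample rows are invented for illustration and are not taken from the dataset, `read_forms` is a hypothetical helper, and the whitespace separator for `Segments` is an assumption based on the usual CLDF convention.

```python
import csv
import io

# Invented sample rows; the real data lives in forms.csv.
SAMPLE = """ID,Language_ID,Parameter_ID,Value,Form,Segments,Source
f1,lang1,water,paːni,paːni,p aː n i,src1;src2
f2,lang2,water,wi,wi,w i,
"""


def read_forms(handle):
    """Yield form rows with Segments and Source split into lists."""
    for row in csv.DictReader(handle):
        # Assumption: Segments are whitespace-separated, Source keys `;`-separated.
        row["Segments"] = row["Segments"].split()
        row["Source"] = [key for key in row["Source"].split(";") if key]
        yield row


forms = list(read_forms(io.StringIO(SAMPLE)))
```

With the real file, pass `open("forms.csv", encoding="utf-8", newline="")` instead of the `io.StringIO` object.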

Table languages.csv

property	value
dc:conformsTo	CLDF LanguageTable
dc:extent	59

Columns

Name/Property	Datatype	Description
ID	`string`	Primary key
Name	`string`
Glottocode	`string`
`Glottolog_Name`	`string`
ISO639P3code	`string`
Macroarea	`string`
Latitude	`decimal` ≥ -90 ≤ 90
Longitude	`decimal` ≥ -180 ≤ 180
`Family`	`string`
`SubGroup`	`string`
`Location`	`string`
`Elicitation`	`string`
`Consultant`	`string`

Table parameters.csv

property	value
dc:conformsTo	CLDF ParameterTable
dc:extent	194

Columns

Name/Property	Datatype	Description
ID	`string`	Primary key
Name	`string`
Concepticon_ID	`string`
`Concepticon_Gloss`	`string`
`domain`	`string`

Table media.csv

property	value
dc:conformsTo	CLDF MediaTable
dc:extent	4862

Columns

Name/Property	Datatype	Description
ID	`string` Regex: `[a-zA-Z0-9_\-]+`	Primary key
Name	`string`
Description	`string`
Media_Type	`string` Regex: `[^/]+/.+`
Path_In_Zip	`string`
`objid`	`string`
`fname`	`string`
`size`	`integer`
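
The two regular expressions in the column list can be checked directly. This is a sketch only, assuming that each pattern must match the whole cell value; the rows are invented and `validate_media_row` is a hypothetical helper, not part of any CLDF tooling.

```python
import re

# Patterns copied from the column descriptions above.
ID_PATTERN = re.compile(r"[a-zA-Z0-9_\-]+")
MEDIA_TYPE_PATTERN = re.compile(r"[^/]+/.+")


def validate_media_row(row):
    """Return a list of problems found in one media.csv row (a dict)."""
    problems = []
    # Assumption: the pattern has to cover the entire value, hence fullmatch.
    if not ID_PATTERN.fullmatch(row.get("ID", "")):
        problems.append("invalid ID")
    if not MEDIA_TYPE_PATTERN.fullmatch(row.get("Media_Type", "")):
        problems.append("invalid Media_Type")
    return problems


# Invented rows for illustration.
good = {"ID": "audio_001", "Media_Type": "audio/mpeg"}
bad = {"ID": "audio 001", "Media_Type": "mpeg"}
```

Calling `validate_media_row(good)` returns an empty list, while `bad` fails both checks.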

StructureDataset Hindu Kush Areal Typology

CLDF Metadata: StructureDataset-metadata.json

Sources: sources.bib

A feature, as the term is used here, is a structural property of a language. A conscious choice was made to define features as binary: a given language is classified either as displaying a particular feature (value=1) or as lacking it (value=0), as far as the present data set is concerned. In some cases, for instance when conclusive data is missing, the value is marked as indeterminate.

Our features are not meant to reflect all possible language properties. Instead, they have been chosen a) to represent areally relevant features, and b) to cover different parts of the language system (hence their distribution across five macro-categories: Clause structure, Grammatical categories, Lexicon, Phonology and Word order). The choice of features is also constrained by our present, relatively limited data set.

Each feature presentation contains:

1. A table showing the number of sample languages displaying the feature, the number of languages in which it is absent, and the number of languages for which the feature value is indeterminate.
2. A map showing the geographical distribution of feature values across the region.
3. A short prose description of the feature as it occurs in the region, accompanied by one or more illustrative examples drawn from the data set.
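
The per-feature counts in such a table could be recomputed from the `Value` column of values.csv roughly as follows. This is a hedged sketch: how indeterminate values are actually encoded is not stated here, so the code assumes that anything other than `1` or `0` counts as indeterminate. The input list is invented and `tally_feature` is a hypothetical helper.

```python
from collections import Counter


def tally_feature(values):
    """Count present / absent / indeterminate codings for one feature."""
    counts = Counter({"present": 0, "absent": 0, "indeterminate": 0})
    for value in values:
        if value == "1":
            counts["present"] += 1
        elif value == "0":
            counts["absent"] += 1
        else:
            # Assumption: any value other than "1" or "0" is indeterminate.
            counts["indeterminate"] += 1
    return dict(counts)


# Invented Value entries for a single feature.
summary = tally_feature(["1", "0", "1", "?", "1"])
```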

Data references are given in the following format: Sample language-Data component-Language consultant:Item number. For instance, the reference HNO-Val-RH:061 should be read as Hindko language data, provided by a consultant with the identifier RH, relating to item number 061 in the Valency Questionnaire. Most of the data used in the feature descriptions belong to one of the seven components listed (along with their abbreviations) in Project design and data collection.
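
The reference format lends itself to mechanical parsing. The sketch below splits a reference into its four parts; it assumes that none of the hyphen-separated parts contains a hyphen or colon itself, and `DataReference` and `parse_reference` are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple


class DataReference(NamedTuple):
    language: str    # sample language abbreviation, e.g. "HNO"
    component: str   # data component abbreviation, e.g. "Val"
    consultant: str  # consultant identifier, e.g. "RH"
    item: str        # item number, kept as text to preserve leading zeros


# Assumption: the hyphen-separated parts contain no hyphens or colons themselves.
REFERENCE = re.compile(r"([^-:]+)-([^-:]+)-([^-:]+):(\S+)")


def parse_reference(text):
    """Split a reference such as 'HNO-Val-RH:061' into its four parts."""
    match = REFERENCE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a data reference: {text!r}")
    return DataReference(*match.groups())


ref = parse_reference("HNO-Val-RH:061")
```

The item number stays a string so that `061` is not silently turned into `61`.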

The abbreviations in the interlinear glossing are to a large extent standard abbreviations according to the Leipzig Glossing Rules. Additional abbreviations are listed below.

AN	animate
CONJ	conjunction
EMPH	emphatic
EQ	equative copula
ESS	essive
EX	existential copula
HF	human feminine (gender)
HM	human masculine (gender)
INAN	inanimate
MED	medial (distance)
R	recipient (argument)
T	theme (argument)
V	verb
X	x (non-human gender)
Y	y (non-human gender)

property	value
dc:conformsTo	CLDF StructureDataset
dc:format	http://concepticon.clld.org/contributions/Holman-2008-40
dc:identifier	https://hindukush.clld.org/
dc:license	https://creativecommons.org/licenses/by/4.0/
dcat:accessURL	https://github.com/cldf-datasets/liljegrenhindukush
prov:wasDerivedFrom	cldf-datasets/liljegrenhindukush v1.1 Glottolog v4.8 Concepticon v3.1.0 CLTS v2.2.0
prov:wasGeneratedBy	python: 3.10.12 python-packages: requirements.txt
rdf:ID	liljegrenhindukush
rdf:type	http://www.w3.org/ns/dcat#Distribution

Table values.csv

property	value
dc:conformsTo	CLDF ValueTable
dc:extent	4720

Columns

Name/Property	Datatype	Description
ID	`string` Regex: `[a-zA-Z0-9_\-]+`	Primary key
Language_ID	`string`	References languages.csv::ID
Parameter_ID	`string`	References features.csv::ID
Value	`string`
Code_ID	`string`	References codes.csv::ID
Comment	`string`
Source	list of `string` (separated by `;`)	References sources.bib::BibTeX-key
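
The foreign keys listed above can be followed with plain dictionaries. The following sketch resolves `Code_ID` against codes.csv using only the standard library. Both sample tables, including the `feat1-1` style identifiers and the descriptions, are invented for illustration, and `index_by_id` and `describe` are hypothetical helpers.

```python
import csv
import io

# Invented sample tables; the real files are values.csv and codes.csv.
VALUES = """ID,Language_ID,Parameter_ID,Value,Code_ID
v1,lang1,feat1,1,feat1-1
v2,lang2,feat1,0,feat1-0
"""
CODES = """ID,Parameter_ID,Name,Description
feat1-1,feat1,1,present
feat1-0,feat1,0,absent
"""


def index_by_id(text):
    """Map each row's ID to the row itself."""
    return {row["ID"]: row for row in csv.DictReader(io.StringIO(text))}


codes = index_by_id(CODES)


def describe(value_row):
    """Resolve Code_ID against the codes index and return a readable triple."""
    code = codes[value_row["Code_ID"]]
    return (value_row["Language_ID"], value_row["Parameter_ID"], code["Description"])


resolved = [describe(row) for row in csv.DictReader(io.StringIO(VALUES))]
```

The same indexing approach applies to `Language_ID` and `Parameter_ID`, which point at languages.csv and features.csv respectively.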

Table features.csv

property	value
dc:conformsTo	CLDF ParameterTable
dc:extent	80

Columns

Name/Property	Datatype	Description
ID	`string` Regex: `[a-zA-Z0-9_\-]+`	Primary key
Name	`string`
Description	`string`
ColumnSpec	`json`
`Category`	`string`

Table languages.csv

property	value
dc:conformsTo	CLDF LanguageTable
dc:extent	59

Columns

Name/Property	Datatype	Description
ID	`string`	Primary key
Name	`string`
Glottocode	`string`
`Glottolog_Name`	`string`
ISO639P3code	`string`
Macroarea	`string`
Latitude	`decimal` ≥ -90 ≤ 90
Longitude	`decimal` ≥ -180 ≤ 180
`Family`	`string`
`SubGroup`	`string`
`Location`	`string`
`Elicitation`	`string`
`Consultant`	`string`

Table codes.csv

property	value
dc:conformsTo	CLDF CodeTable
dc:extent	240

Columns

Name/Property	Datatype	Description
ID	`string` Regex: `[a-zA-Z0-9_\-]+`	Primary key
Parameter_ID	`string`	The parameter or variable the code belongs to. References features.csv::ID
Name	`string`
Description	`string`
