diff --git a/cldf/README.md b/cldf/README.md
index 8c93585..f8b2011 100644
--- a/cldf/README.md
+++ b/cldf/README.md
@@ -13,8 +13,8 @@ property | value
 [dc:identifier](http://purl.org/dc/terms/identifier) | https://wals.info
 [dc:license](http://purl.org/dc/terms/license) | https://creativecommons.org/licenses/by/4.0/
 [dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | https://github.com/cldf-datasets/wals
-[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) |
  1. cldf-datasets/wals v2020-37-g8c3c30b
  2. Glottolog v4.3
-[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) |
  1. python: 3.8.5
  2. python-packages: requirements.txt
+[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) |
  1. cldf-datasets/wals v2020.1-6-g0950885
  2. Glottolog v4.6
+[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) |
  1. python: 3.8.10
  2. python-packages: requirements.txt
 [rdf:ID](http://www.w3.org/1999/02/22-rdf-syntax-ns#ID) | wals
 [rdf:type](http://www.w3.org/1999/02/22-rdf-syntax-ns#type) | http://www.w3.org/ns/dcat#Distribution
@@ -70,7 +70,7 @@ property | value
 Name/Property | Datatype | Description
 --- | --- | ---
 [ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key
-[Parameter_ID](http://cldf.clld.org/v1.0/terms.rdf#parameterReference) | `string` | References [parameters.csv::ID](#table-parameterscsv)
+[Parameter_ID](http://cldf.clld.org/v1.0/terms.rdf#parameterReference) | `string` | The parameter or variable the code belongs to.
References [parameters.csv::ID](#table-parameterscsv)
 [Name](http://cldf.clld.org/v1.0/terms.rdf#name) | `string` |
 [Description](http://cldf.clld.org/v1.0/terms.rdf#description) | `string` |
 `Number` | `integer` |
@@ -119,11 +119,11 @@ Name/Property | Datatype | Description
 --- | --- | ---
 [ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key
 [Language_ID](http://cldf.clld.org/v1.0/terms.rdf#languageReference) | `string` | References [languages.csv::ID](#table-languagescsv)
-[Primary_Text](http://cldf.clld.org/v1.0/terms.rdf#primaryText) | `string` |
-[Analyzed_Word](http://cldf.clld.org/v1.0/terms.rdf#analyzedWord) | list of `string` (separated by `\t`) |
-[Gloss](http://cldf.clld.org/v1.0/terms.rdf#gloss) | list of `string` (separated by `\t`) |
-[Translated_Text](http://cldf.clld.org/v1.0/terms.rdf#translatedText) | `string` |
-[Meta_Language_ID](http://cldf.clld.org/v1.0/terms.rdf#metaLanguageReference) | `string` | References [languages.csv::ID](#table-languagescsv)
+[Primary_Text](http://cldf.clld.org/v1.0/terms.rdf#primaryText) | `string` | The example text in the source language.
+[Analyzed_Word](http://cldf.clld.org/v1.0/terms.rdf#analyzedWord) | list of `string` (separated by `\t`) | The sequence of words of the primary text to be aligned with glosses
+[Gloss](http://cldf.clld.org/v1.0/terms.rdf#gloss) | list of `string` (separated by `\t`) | The sequence of glosses aligned with the words of the primary text
+[Translated_Text](http://cldf.clld.org/v1.0/terms.rdf#translatedText) | `string` | The translation of the example text in a meta language
+[Meta_Language_ID](http://cldf.clld.org/v1.0/terms.rdf#metaLanguageReference) | `string` | References the language of the translated text
References [languages.csv::ID](#table-languagescsv)
 [Comment](http://cldf.clld.org/v1.0/terms.rdf#comment) | `string` |
 ## Table [language_names.csv](./language_names.csv)
diff --git a/cldf/StructureDataset-metadata.json b/cldf/StructureDataset-metadata.json
index 82ff58f..64d7093 100644
--- a/cldf/StructureDataset-metadata.json
+++ b/cldf/StructureDataset-metadata.json
@@ -16,7 +16,7 @@
 {
     "rdf:about": "https://github.com/cldf-datasets/wals",
     "rdf:type": "prov:Entity",
-    "dc:created": "v2020.1-10-g9116b7e",
+    "dc:created": "v2020.1-6-g0950885",
     "dc:title": "Repository"
 },
 {
diff --git a/cldf/docs/chapter_s1.html b/cldf/docs/chapter_s1.html
index dd0c282..080bb0d 100644
--- a/cldf/docs/chapter_s1.html
+++ b/cldf/docs/chapter_s1.html
@@ -54,7 +54,7 @@

Due to these practical considerations, many features of current or potential future interest to linguistics had to be excluded. If the reader is disappointed that his or her favourite feature is not to be found in the atlas, chances are that this feature simply has not been described for a sufficiently large number of diverse languages to have warranted its inclusion. Hopefully, the absence of such features from the atlas will motivate future linguists to go out and collect the necessary data.

2.2. The maps

-The great majority of maps show two hundred languages or more. Map 83A (“Order of object and verb”) shows the greatest number of languages, while the two maps on sign languages (Maps 139A and 140A) show a much smaller number, for the simple reason that linguists have only recently begun to study the grammatical structure of sign languages in a comparative perspective. On average, the maps show about 400 languages. This is less than 10 percent of the world’s languages, so the picture that we see in this atlas is far from complete. However, not more than 10-15 percent of languages have been described comprehensively, and many hundreds of languages are still completely or almost completely unknown. But both descriptive and comparative linguistics have made enormous progress in recent decades, and these efforts are reflected in the current work. Altogether 2676 languages, just little less than one half of the world’s languages, occur somewhere in the atlas — we call these the WALS languages. More than 6700 books and articles have been consulted by the authors and the relevant bibliographical references can be accessed in the online version of the atlas. In addition to the maps and accompanying texts, the atlas contains a genealogically organized list of the languages (the Genealogical Language List, by Matthew S. Dryer), to facilitate identification of each language. Issues having to do with the identification and designation of languages and language families are discussed in detail in §3.

+The great majority of maps show two hundred languages or more. Map 83A (“Order of object and verb”) shows the greatest number of languages, while the two maps on sign languages (Maps 139A and 140A) show a much smaller number, for the simple reason that linguists have only recently begun to study the grammatical structure of sign languages in a comparative perspective. On average, the maps show about 400 languages. This is less than 10 percent of the world’s languages, so the picture that we see in this atlas is far from complete. However, not more than 10-15 percent of languages have been described comprehensively, and many hundreds of languages are still completely or almost completely unknown. But both descriptive and comparative linguistics have made enormous progress in recent decades, and these efforts are reflected in the current work. Altogether 2662 languages, just little less than one half of the world’s languages, occur somewhere in the atlas — we call these the WALS languages. More than 6700 books and articles have been consulted by the authors and the relevant bibliographical references can be accessed in the online version of the atlas. In addition to the maps and accompanying texts, the atlas contains a genealogically organized list of the languages (the Genealogical Language List, by Matthew S. Dryer), to facilitate identification of each language. Issues having to do with the identification and designation of languages and language families are discussed in detail in §3.

One chapter, the chapter on writing systems (chapter 141), is somewhat special with respect to the maps: it shows differently coloured areas rather than differently coloured dots.

2.3. The feature values

@@ -67,7 +67,7 @@

3. The languages

3.1. The WALS samples

-There is a total of 2676 languages which appear on at least one map in the atlas. Some of these languages (262 in number) appear on only one map, while some, such as English, appear on most of the maps. There are 180 languages which appear on at least 80 maps, and 449 languages which appear on at least 40 maps. The choice of which languages to include on particular maps was the choice of individual authors. However, there is a set of 100 languages (hereafter the 100-language sample) which authors were asked to include on their maps if at all possible, and a further 100 languages which authors were encouraged to include on their maps (hereafter these two sets of 100 languages together are referred to as the 200-language sample).

+There is a total of 2662 languages which appear on at least one map in the atlas. Some of these languages (262 in number) appear on only one map, while some, such as English, appear on most of the maps. There are 180 languages which appear on at least 80 maps, and 449 languages which appear on at least 40 maps. The choice of which languages to include on particular maps was the choice of individual authors. However, there is a set of 100 languages (hereafter the 100-language sample) which authors were asked to include on their maps if at all possible, and a further 100 languages which authors were encouraged to include on their maps (hereafter these two sets of 100 languages together are referred to as the 200-language sample).

A general desideratum for a good language sample is that it maximize both genealogical and areal diversity. Samples which include too many languages from one area of the world or too many languages from one family can provide a misleading picture of the relative frequency of different types of languages. Typological studies in the past have often included a disproportionate number of Indo-European languages or of languages of Europe or Eurasia. While Eurasia has a larger land mass than any other continental region in the world, fewer than twenty percent of the languages of the world are spoken on mainland Eurasia (i.e. excluding the languages of Indonesia and the Philippines and other islands). In fact, there are more languages spoken on the island of New Guinea than in mainland Eurasia. Furthermore, as a number of the maps in this atlas show, there are patterns of similarity among languages of Eurasia that one does not find elsewhere in the world. For example, Map 97A shows that the vast majority of the OV languages of Eurasia (i.e. ones that place the object before the verb) place the modifying adjective before the noun. From this, linguists in the past erroneously concluded that this was a normal feature of OV languages. But as Map 97A shows, this is not true outside of Eurasia, where OV languages more often place adjectives after the noun. Only by using samples of languages which include many languages from outside Eurasia can we avoid making erroneous inferences of this sort.

Maximizing genealogical and areal diversity were major considerations in constructing the 100- and 200-language samples. However, there were a number of other considerations that played a role in constructing these samples that would not generally play the same role in constructing samples of languages. First, most of the languages of the islands of the Pacific fall within the Oceanic branch of the Austronesian family and thus are closely related to each other. For instance, one would normally not include more than one of these languages in a sample of 100 or even 200 languages. However, because the sample used here is for an atlas, we decided that we ought to include more of these languages, since otherwise there would be few dots on the maps in the Pacific. For this reason, there are two Oceanic languages in the 100-language sample and seven in the 200-language sample. Similar considerations led to the inclusion of three Bantu languages in the 100-language sample and five in the 200-language sample. Without these, many of the maps would have shown few languages in sub-Saharan Africa, and the majority of those shown would have been non-Bantu languages that are in some ways atypical of this region. A second consideration that would not normally play a role in constructing a language sample is that we felt that we ought to include a number of the major languages of Eurasia, even when this meant including pairs of languages which are too close genealogically to be otherwise included in a sample of 100 or 200 languages, including English and German, French and Spanish, and Modern Hebrew and Egyptian Arabic.

A further consideration in choosing languages for the 100- and 200-language samples was the ready availability of detailed grammatical descriptions. In most cases, the choice of a language over genealogically related languages was based on the availability of detailed descriptions. Some of the languages that were included in the samples are ones for which there was no detailed description at the beginning of the WALS project (in 1999) but for which an expert on the language was willing to answer questions from authors (see §4). Some of the languages in the 200-language sample were chosen primarily for the purposes of maximizing genealogical or areal diversity, despite the fact that the available descriptions of these languages are somewhat meagre, thus making it impossible for many authors to include them on their maps. One language in the 200-language sample, Minica Huitoto, appears on only 32 maps; however this was because we eventually realized the need to distinguish this language from other Huitoto languages and some authors in attempting to include Huitoto used sources for one of these other languages.

diff --git a/cldf/requirements.txt b/cldf/requirements.txt
index f5edae6..9b9e5ce 100644
--- a/cldf/requirements.txt
+++ b/cldf/requirements.txt
@@ -3,7 +3,7 @@ bs4==0.0.1
 certifi==2019.11.28
 chardet==3.0.4
 -e git+https://github.com/cldf/cldfbench@4ec9bdd9f3f1c0d2d6f4473e66daf0fbd388c1e0#egg=cldfbench
--e git+https://github.com/cldf-datasets/wals@9116b7e03e08c6272b528f4c8d04dcd80c1298bc#egg=cldfbench_wals
+-e git+https://github.com/cldf-datasets/wals@0950885494ae16d2cd797143656fc4230711c669#egg=cldfbench_wals
 cldfcatalog==1.5.1
 clldutils==3.10.1
 colorama==0.4.3