Commit

v2020.2
xrotwang committed Jul 7, 2022
1 parent 0950885 commit 806dc97
Showing 4 changed files with 12 additions and 12 deletions.
16 changes: 8 additions & 8 deletions cldf/README.md
@@ -13,8 +13,8 @@ property | value
[dc:identifier](http://purl.org/dc/terms/identifier) | https://wals.info
[dc:license](http://purl.org/dc/terms/license) | https://creativecommons.org/licenses/by/4.0/
[dcat:accessURL](http://www.w3.org/ns/dcat#accessURL) | https://github.com/cldf-datasets/wals
-[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/cldf-datasets/wals/tree/8c3c30b">cldf-datasets/wals v2020-37-g8c3c30b</a></li><li><a href="https://github.com/glottolog/glottolog/tree/v4.3">Glottolog v4.3</a></li></ol>
-[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>python</strong>: 3.8.5</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
+[prov:wasDerivedFrom](http://www.w3.org/ns/prov#wasDerivedFrom) | <ol><li><a href="https://github.com/cldf-datasets/wals/tree/0950885">cldf-datasets/wals v2020.1-6-g0950885</a></li><li><a href="https://github.com/glottolog/glottolog/tree/v4.6">Glottolog v4.6</a></li></ol>
+[prov:wasGeneratedBy](http://www.w3.org/ns/prov#wasGeneratedBy) | <ol><li><strong>python</strong>: 3.8.10</li><li><strong>python-packages</strong>: <a href="./requirements.txt">requirements.txt</a></li></ol>
[rdf:ID](http://www.w3.org/1999/02/22-rdf-syntax-ns#ID) | wals
[rdf:type](http://www.w3.org/1999/02/22-rdf-syntax-ns#type) | http://www.w3.org/ns/dcat#Distribution

@@ -70,7 +70,7 @@ property | value
Name/Property | Datatype | Description
--- | --- | ---
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key
-[Parameter_ID](http://cldf.clld.org/v1.0/terms.rdf#parameterReference) | `string` | References [parameters.csv::ID](#table-parameterscsv)
+[Parameter_ID](http://cldf.clld.org/v1.0/terms.rdf#parameterReference) | `string` | The parameter or variable the code belongs to.<br>References [parameters.csv::ID](#table-parameterscsv)
[Name](http://cldf.clld.org/v1.0/terms.rdf#name) | `string` |
[Description](http://cldf.clld.org/v1.0/terms.rdf#description) | `string` |
`Number` | `integer` |
@@ -119,11 +119,11 @@ Name/Property | Datatype | Description
--- | --- | ---
[ID](http://cldf.clld.org/v1.0/terms.rdf#id) | `string` | Primary key
[Language_ID](http://cldf.clld.org/v1.0/terms.rdf#languageReference) | `string` | References [languages.csv::ID](#table-languagescsv)
-[Primary_Text](http://cldf.clld.org/v1.0/terms.rdf#primaryText) | `string` |
-[Analyzed_Word](http://cldf.clld.org/v1.0/terms.rdf#analyzedWord) | list of `string` (separated by `\t`) |
-[Gloss](http://cldf.clld.org/v1.0/terms.rdf#gloss) | list of `string` (separated by `\t`) |
-[Translated_Text](http://cldf.clld.org/v1.0/terms.rdf#translatedText) | `string` |
-[Meta_Language_ID](http://cldf.clld.org/v1.0/terms.rdf#metaLanguageReference) | `string` | References [languages.csv::ID](#table-languagescsv)
+[Primary_Text](http://cldf.clld.org/v1.0/terms.rdf#primaryText) | `string` | The example text in the source language.
+[Analyzed_Word](http://cldf.clld.org/v1.0/terms.rdf#analyzedWord) | list of `string` (separated by `\t`) | The sequence of words of the primary text to be aligned with glosses
+[Gloss](http://cldf.clld.org/v1.0/terms.rdf#gloss) | list of `string` (separated by `\t`) | The sequence of glosses aligned with the words of the primary text
+[Translated_Text](http://cldf.clld.org/v1.0/terms.rdf#translatedText) | `string` | The translation of the example text in a meta language
+[Meta_Language_ID](http://cldf.clld.org/v1.0/terms.rdf#metaLanguageReference) | `string` | References the language of the translated text<br>References [languages.csv::ID](#table-languagescsv)
[Comment](http://cldf.clld.org/v1.0/terms.rdf#comment) | `string` |
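
Note on the `Analyzed_Word` and `Gloss` descriptions added in this hunk: they state that the two tab-separated lists are aligned with each other. A minimal sketch of what that alignment might look like when reading a row, assuming the row is already available as a dict of raw strings; the helper name `align_glosses` and the sample row are hypothetical and not taken from the dataset or from any CLDF library:

```python
from typing import Dict, List, Tuple


def align_glosses(row: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pair each analyzed word with the gloss at the same position."""
    # Both columns are documented as lists of strings separated by a tab.
    words = row["Analyzed_Word"].split("\t") if row["Analyzed_Word"] else []
    glosses = row["Gloss"].split("\t") if row["Gloss"] else []
    # Alignment only makes sense when both lists have the same length.
    if len(words) != len(glosses):
        raise ValueError(f"{len(words)} words but {len(glosses)} glosses")
    return list(zip(words, glosses))


# Placeholder row invented for illustration, not a row of examples.csv.
row = {"Analyzed_Word": "word1\tword2-sfx", "Gloss": "GLOSS1\tGLOSS2-SFX"}
pairs = align_glosses(row)
```

Raising on a length mismatch is one possible design choice; a reader that prefers to keep partially glossed examples could pad the shorter list instead.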

## <a name="table-languagenamescsv"></a>Table [language_names.csv](./language_names.csv)
2 changes: 1 addition & 1 deletion cldf/StructureDataset-metadata.json
@@ -16,7 +16,7 @@
{
"rdf:about": "https://github.com/cldf-datasets/wals",
"rdf:type": "prov:Entity",
-"dc:created": "v2020.1-10-g9116b7e",
+"dc:created": "v2020.1-6-g0950885",
"dc:title": "Repository"
},
{
4 changes: 2 additions & 2 deletions cldf/docs/chapter_s1.html
@@ -54,7 +54,7 @@ <h3 class="heading-2">
<p class="Firstlineindent">Due to these practical considerations, many features of current or potential future interest to linguistics had to be excluded. If the reader is disappointed that his or her favourite feature is not to be found in the atlas, chances are that this feature simply has not been described for a sufficiently large number of diverse languages to have warranted its inclusion. Hopefully, the absence of such features from the atlas will motivate future linguists to go out and collect the necessary data.</p>
<h3 class="heading-2">
<a name="2.2._The_maps"></a>2.2. The maps</h3>
-<p class="Textbody">The great majority of <a href="http://wals.info/feature">maps</a> show two hundred languages or more. Map <a class="Feature" href="http://wals.info/feature/83A?tg_format=map">83A</a> (“Order of object and verb”) shows the greatest number of languages, while the two maps on sign languages (Maps <a class="Feature" href="http://wals.info/feature/139A?tg_format=map">139A</a> and <a class="Feature" href="http://wals.info/feature/140A?tg_format=map">140A</a>) show a much smaller number, for the simple reason that linguists have only recently begun to study the grammatical structure of sign languages in a comparative perspective. On average, the maps show about 400 languages. This is less than 10 percent of the world’s languages, so the picture that we see in this atlas is far from complete. However, not more than 10-15 percent of languages have been described comprehensively, and many hundreds of languages are still completely or almost completely unknown. But both descriptive and comparative linguistics have made enormous progress in recent decades, and these efforts are reflected in the current work. Altogether 2676 languages, just little less than one half of the world’s languages, occur somewhere in the atlas — we call these the <i><b>WALS</b></i> <b>languages</b>. More than 6700 books and articles have been consulted by the authors and the relevant bibliographical references can be accessed in the online version of the atlas. In addition to the maps and accompanying texts, the atlas contains a genealogically organized list of the languages (the <a class="genealogy" href="http://wals.info/languoid/genealogy">Genealogical Language List</a>, by <a href="http://wals.info/author#dryerms">Matthew S. Dryer</a>), to facilitate identification of each language. Issues having to do with the identification and designation of languages and language families are discussed in detail in <a href="#3._The_languages">§3</a>.</p>
+<p class="Textbody">The great majority of <a href="http://wals.info/feature">maps</a> show two hundred languages or more. Map <a class="Feature" href="http://wals.info/feature/83A?tg_format=map">83A</a> (“Order of object and verb”) shows the greatest number of languages, while the two maps on sign languages (Maps <a class="Feature" href="http://wals.info/feature/139A?tg_format=map">139A</a> and <a class="Feature" href="http://wals.info/feature/140A?tg_format=map">140A</a>) show a much smaller number, for the simple reason that linguists have only recently begun to study the grammatical structure of sign languages in a comparative perspective. On average, the maps show about 400 languages. This is less than 10 percent of the world’s languages, so the picture that we see in this atlas is far from complete. However, not more than 10-15 percent of languages have been described comprehensively, and many hundreds of languages are still completely or almost completely unknown. But both descriptive and comparative linguistics have made enormous progress in recent decades, and these efforts are reflected in the current work. Altogether 2662 languages, just little less than one half of the world’s languages, occur somewhere in the atlas — we call these the <i><b>WALS</b></i> <b>languages</b>. More than 6700 books and articles have been consulted by the authors and the relevant bibliographical references can be accessed in the online version of the atlas. In addition to the maps and accompanying texts, the atlas contains a genealogically organized list of the languages (the <a class="genealogy" href="http://wals.info/languoid/genealogy">Genealogical Language List</a>, by <a href="http://wals.info/author#dryerms">Matthew S. Dryer</a>), to facilitate identification of each language. Issues having to do with the identification and designation of languages and language families are discussed in detail in <a href="#3._The_languages">§3</a>.</p>
<p class="Firstlineindent">One chapter, the chapter on writing systems (chapter <a class="Chapter" href="http://wals.info/chapter/141">141</a>), is somewhat special with respect to the maps: it shows differently coloured areas rather than differently coloured dots.</p>
<h3 class="heading-2">
<a name="2.3._The_feature_values"></a>2.3. The feature values</h3>
@@ -67,7 +67,7 @@ <h2 class="heading-1">
<a name="3._The_languages"></a>3. The languages</h2>
<h3 class="heading-2">
<a name="3.1._The_WALS_samples"></a>3.1. The WALS samples</h3>
-<p class="Textbody">There is a total of 2676 languages which appear on at least one map in the atlas. Some of these languages (262 in number) appear on only one map, while some, such as <a class="Language" href="http://wals.info/languoid/lect/wals_code_eng" title="view language details">English</a>, appear on most of the maps. There are 180 languages which appear on at least 80 maps, and 449 languages which appear on at least 40 maps. The choice of which languages to include on particular maps was the choice of individual authors. However, there is a set of 100 languages (hereafter the <a href="http://wals.info/languoid/samples/100">100-language sample</a>) which authors were asked to include on their maps if at all possible, and a further 100 languages which authors were encouraged to include on their maps (hereafter these two sets of 100 languages together are referred to as the <a href="http://wals.info/languoid/samples/200">200-language sample</a>).</p>
+<p class="Textbody">There is a total of 2662 languages which appear on at least one map in the atlas. Some of these languages (262 in number) appear on only one map, while some, such as <a class="Language" href="http://wals.info/languoid/lect/wals_code_eng" title="view language details">English</a>, appear on most of the maps. There are 180 languages which appear on at least 80 maps, and 449 languages which appear on at least 40 maps. The choice of which languages to include on particular maps was the choice of individual authors. However, there is a set of 100 languages (hereafter the <a href="http://wals.info/languoid/samples/100">100-language sample</a>) which authors were asked to include on their maps if at all possible, and a further 100 languages which authors were encouraged to include on their maps (hereafter these two sets of 100 languages together are referred to as the <a href="http://wals.info/languoid/samples/200">200-language sample</a>).</p>
<p class="Firstlineindent">A general desideratum for a good language sample is that it maximize both genealogical and areal diversity. Samples which include too many languages from one area of the world or too many languages from one family can provide a misleading picture of the relative frequency of different types of languages. Typological studies in the past have often included a disproportionate number of Indo-European languages or of languages of Europe or Eurasia. While Eurasia has a larger land mass than any other continental region in the world, fewer than twenty percent of the languages of the world are spoken on mainland Eurasia (i.e. excluding the languages of Indonesia and the Philippines and other islands). In fact, there are more languages spoken on the island of New Guinea than in mainland Eurasia. Furthermore, as a number of the maps in this atlas show, there are patterns of similarity among languages of Eurasia that one does not find elsewhere in the world. For example, Map <a class="Feature" href="http://wals.info/feature/97A?tg_format=map">97A</a> shows that the vast majority of the OV languages of Eurasia (i.e. ones that place the object before the verb) place the modifying adjective before the noun. From this, linguists in the past erroneously concluded that this was a normal feature of OV languages. But as Map <a class="Feature" href="http://wals.info/feature/97A?tg_format=map">97A</a> shows, this is not true outside of Eurasia, where OV languages more often place adjectives after the noun. Only by using samples of languages which include many languages from outside Eurasia can we avoid making erroneous inferences of this sort.</p>
<p class="Firstlineindent">Maximizing genealogical and areal diversity were major considerations in constructing the 100- and 200-language samples. However, there were a number of other considerations that played a role in constructing these samples that would not generally play the same role in constructing samples of languages. First, most of the languages of the islands of the Pacific fall within the <a href="http://wals.info/languoid/genus/oceanic" class="Genus" title="Oceanic">Oceanic</a> branch of the <a href="http://wals.info/languoid/family/austronesian" class="Family" title="Austronesian">Austronesian</a> family and thus are closely related to each other. For instance, one would normally not include more than one of these languages in a sample of 100 or even 200 languages. However, because the sample used here is for an atlas, we decided that we ought to include more of these languages, since otherwise there would be few dots on the maps in the Pacific. For this reason, there are two Oceanic languages in the 100-language sample and seven in the 200-language sample. Similar considerations led to the inclusion of three Bantu languages in the 100-language sample and five in the 200-language sample. Without these, many of the maps would have shown few languages in sub-Saharan Africa, and the majority of those shown would have been non-Bantu languages that are in some ways atypical of this region. 
A second consideration that would not normally play a role in constructing a language sample is that we felt that we ought to include a number of the major languages of Eurasia, even when this meant including pairs of languages which are too close genealogically to be otherwise included in a sample of 100 or 200 languages, including <a class="Language" href="http://wals.info/languoid/lect/wals_code_eng" title="view language details">English</a> and <a class="Language" href="http://wals.info/languoid/lect/wals_code_ger" title="view language details">German</a>, <a class="Language" href="http://wals.info/languoid/lect/wals_code_fre" title="view language details">French</a> and <a class="Language" href="http://wals.info/languoid/lect/wals_code_spa" title="view language details">Spanish</a>, and <a class="Language" href="http://wals.info/languoid/lect/wals_code_heb" title="view language details">Modern Hebrew</a> and <a class="Language" href="http://wals.info/languoid/lect/wals_code_aeg" title="view language details">Egyptian Arabic</a>.</p>
<p class="Firstlineindent">A further consideration in choosing languages for the 100- and 200-language samples was the ready availability of detailed grammatical descriptions. In most cases, the choice of a language over genealogically related languages was based on the availability of detailed descriptions. Some of the languages that were included in the samples are ones for which there was no detailed description at the beginning of the <i>WALS</i> project (in 1999) but for which an expert on the language was willing to answer questions from authors (see <a href="#4._The_data_sources">§4</a>). Some of the languages in the 200-language sample were chosen primarily for the purposes of maximizing genealogical or areal diversity, despite the fact that the available descriptions of these languages are somewhat meagre, thus making it impossible for many authors to include them on their maps. One language in the 200-language sample, <a class="Language" href="http://wals.info/languoid/lect/wals_code_hmi" title="view language details">Minica Huitoto</a>, appears on only 32 maps; however this was because we eventually realized the need to distinguish this language from other <a href="http://wals.info/languoid/genus/huitoto" class="Genus" title="Huitoto">Huitoto</a> languages and some authors in attempting to include Huitoto used sources for one of these other languages.</p>
2 changes: 1 addition & 1 deletion cldf/requirements.txt
@@ -3,7 +3,7 @@ bs4==0.0.1
certifi==2019.11.28
chardet==3.0.4
-e git+https://github.com/cldf/cldfbench@4ec9bdd9f3f1c0d2d6f4473e66daf0fbd388c1e0#egg=cldfbench
--e git+https://github.com/cldf-datasets/wals@9116b7e03e08c6272b528f4c8d04dcd80c1298bc#egg=cldfbench_wals
+-e git+https://github.com/cldf-datasets/wals@0950885494ae16d2cd797143656fc4230711c669#egg=cldfbench_wals
cldfcatalog==1.5.1
clldutils==3.10.1
colorama==0.4.3
