---
title: "Digitised, searchable Holle List in Stokhof (1980)"
author:
  - name:
      given: Gede Primahadi Wijaya
      family: Rajeg
    url: https://www.ling-phil.ox.ac.uk/people/gede-rajeg
    orcid: 0000-0002-2047-8621
    affiliations:
      - ref: ox
      - ref: unud
affiliations:
  - id: ox
    name: University of Oxford
    department: Faculty of Linguistics, Philology and Phonetics
    country: United Kingdom
    url: https://www.ling-phil.ox.ac.uk/
  - id: unud
    name: Universitas Udayana
    department: Bachelor of English Literature, Faculty of Humanities
    country: Indonesia
    url: https://udayananetworking.unud.ac.id/lecturer/880-gede-primahadi-wijaya-rajeg
title-prefix: "Holle List"
date: 2023-05-23
date-modified: now
crossref:
  fig-title: "**Table**"
  fig-prefix: "Table"
citation:
  type: dataset
  publisher: University of Oxford, UK
  doi: 10.5281/zenodo.7972273
  version: 1.3.0
  url: "https://engganolang.github.io/digitised-holle-list/"
number-sections: true
google-scholar: true
appendix-cite-as: display
license: "CC BY-SA"
doi: 10.5281/zenodo.7972273
cap-location: top
format:
  html:
    code-fold: true
    code-tools:
      source: https://github.com/engganolang/digitised-holle-list
    toc: true
    toc-location: right
    theme:
      - default
      - custom.scss
    mainfont: "Palatino"
    citations-hover: true
    footnotes-hover: true
bibliography: references.bib
keywords: "Holle List, Word List, Language Documentation, Lexical Database, Indonesian languages"
search: true
csl: unified-style-sheet-for-linguistics.csl
---
<br>[![](file-oxweb-logo.gif){width="84"}](https://www.ox.ac.uk/) [![](file-lingphil.png){width="83"}](https://www.ling-phil.ox.ac.uk/) [![Arts and Humanities Research Council](file-ahrc.png){width="325"}](https://www.ukri.org/councils/ahrc/) <br><br><a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/" target="_blank"><img src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" alt="Creative Commons License" style="border-width:0"/></a> <a href="https://doi.org/10.5281/zenodo.7972273" target="_blank"><img src="https://zenodo.org/badge/DOI/10.5281/zenodo.7972273.svg" alt="DOI"/></a> [![DOI](https://img.shields.io/badge/doi-10.25446/oxford.23205173-blue.svg?style=flat&labelColor=gainsboro)](http://dx.doi.org/10.25446/oxford.23205173) <br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
## Introduction {#sec-intro}
The Holle List (hereafter HL) consists of approximately 1000 lexical items designed by K. F. Holle (1829-1896), an "eminent authority and lover of the Netherlands Indies and their people" [@holleli1980, 1]. The HL was designed to be distributed across the Indonesian archipelago, then a Dutch colony, to gather knowledge about its linguistic situation.
The HL exists in three variants (versions 1894, 1904/1911, and 1931) differing slightly in content and the order of the items. The HL in the [`engganolang`](https://github.com/engganolang/digitised-holle-list) GitHub repository and in the [Oxford University Research Archive (ORA)](https://ora.ox.ac.uk/objects/uuid:a511951b-86fb-4019-94d4-280efa83de02) [@Rajeg_Holle_2023] is the "new basic list (NBL)" set up by Stokhof [-@holleli1980, 17, 22-72] "to facilitate comparative work" across the three different variants of the HL (see @fig-table for the interactive version and the [raw file here](https://github.com/engganolang/digitised-holle-list/blob/main/data/digitised-holle-list-in-stokhof-1980.tsv)). The NBL captures all lexical items appearing in the three variants of the HL, except those items "which never or hardly ever appeared to be filled in by the researchers" [@holleli1980, 17]. These excluded items appear as footnotes in the word list of each target language.
## The rationale for the digitisation of the Holle List {#sec-rational}
The publication of the three variants of the Holle List as the new basic list (NBL) in Stokhof [-@holleli1980] is available as an [open-access PDF](https://core.ac.uk/reader/159464813) file under the [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/) license (the license is stated in the footer of the cover page of the PDF file). While the PDF is searchable via the basic find function of a PDF viewer, the list cannot be manipulated (e.g., filtered for certain items). Nor does it support computational processing, such as automatically matching the IDs of the list with the IDs of the vocabulary in the target languages.
Given that the CC BY-SA license allows us to copy, adapt and build upon the material for any purpose, as long as we provide attribution (i.e., citation) to the original material and distribute our adaptation under the same license, we decided to digitise the NBL into a fully searchable, portable format (i.e., a UTF-8 encoded, tab-separated plain text) (see @fig-table for the interactive version). The digitisation was conducted in conjunction with our AHRC-funded project to build lexical resources for Enggano ("Lexical resources for Enggano, a threatened language of Indonesia", <https://enggano.ling-phil.ox.ac.uk/>). The project aims, among other things, to bring together a host of historical, paper-based resources available for Enggano. The Enggano vocabulary in the Holle List is one of the oldest of these, dating from the late 19^th^ century [collected in 1895 by Abs vd Noord: see @stokhof-1987, 189]; this word list of Enggano has also been digitised and deposited on [GitHub](https://github.com/engganolang/holle-list-enggano-1895), the [Oxford Research Archive (ORA)](https://ora.ox.ac.uk/objects/uuid:070e5dd2-512f-4c2b-812d-f24bb944a81f), and [Zenodo](https://doi.org/10.5281/zenodo.8038974) [@rajeg2023a].
## Content of the digitised Holle List {#sec-content}
The digitised NBL of the Holle List (HL) preserves the original columns. The columns containing the years of the three versions of the Holle List were renamed so that their names do not begin with numbers. Note that the first four columns are not labelled in the original PDF; we named them `Index`, `Dutch`, `English`, and `Indonesian`. The Indonesian glosses were taken from the 1931 version of the HL [@holleli1980, 18]. The values in the `Index` column are what can be computationally matched with the Index in the (also digitised) word lists of the target languages (published as subsequent volumes after Stokhof [-@holleli1980]); this computational matching was used in preparing the Enggano word list that is part of the HL [see @rajeg2023a].
The values of the `English` column in @fig-table are hyperlinked to the [Concept sets](https://concepticon.clld.org/parameters) in the [Concepticon](https://concepticon.clld.org/) catalogue [@list]. The initial mapping of the English glosses to the Concepticon Concept sets was performed programmatically using [pyconcepticon](https://pypi.org/project/pyconcepticon/ "pyconcepticon") [@forkel2022], a Python package to access and curate the Concepticon data, following the tutorial in Tjuka [-@tjuka2020]. The output of the mapping was then manually curated and checked (track the changes [here](https://github.com/engganolang/digitised-holle-list/commits/main/data/concepticon-mapping.tsv)). However, some English glosses cannot be linked to a relevant Concept set because they are not yet mapped in the Concepticon data. In such cases, the glosses are not hyperlinked.
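As a minimal sketch of this `Index`-based matching, the base R snippet below joins a toy excerpt of the NBL with a hypothetical target-language word list. All rows and values are placeholders invented for illustration, and the `Form` column name is an assumption; only the `Index` and `English` column names come from the digitised list. `Index` is kept as character so that it is compared as a label rather than as a number.

```r
# Toy excerpt of the NBL; the values are placeholders, not real entries.
nbl <- data.frame(
  Index   = c("1", "2", "3"),
  English = c("gloss_a", "gloss_b", "gloss_c"),
  stringsAsFactors = FALSE
)

# Hypothetical target-language word list keyed by the same Index.
# `Form` is an assumed column name; index "2" is deliberately missing.
wordlist <- data.frame(
  Index = c("1", "3"),
  Form  = c("form_a", "form_c"),
  stringsAsFactors = FALSE
)

# Left join: keep every NBL item, with NA where the word list has no entry.
matched <- merge(nbl, wordlist, by = "Index", all.x = TRUE)
print(matched)
```

Items without a counterpart in the word list stay in the result with a missing `Form`, which makes gaps in a target-language list easy to spot.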
We added several new columns after the version-year columns. One of these is the `Swadesh` column (Boolean `true`/`false`), indicating whether an entry is part of the Swadesh items (`true`) or not (`false`)[^1] [see @holleli1980, 141-143]. With this column, the Swadesh items can easily be filtered, which is not possible in the PDF version because the NBL table there has no column marking which items are from the Swadesh list. We hand-coded the `Swadesh` column based on the index numbers provided by Stokhof [-@holleli1980, 141-143].
[^1]: There are also Swadesh items that do not have an index number in the NBL. These items are marked with "`--`" in the PDF [@holleli1980, 141-143]; they can be viewed and downloaded [here](https://github.com/engganolang/digitised-holle-list/blob/main/data/swadesh-unindexed-in-NBL.txt).
An additional column after the `Swadesh` column is the `Swadesh_orig` column. It lists the English forms/labels given in the Swadesh appendix [@holleli1980, 141-143], which may be phrased differently from those in the `English` column of the NBL. When the form in the Swadesh appendix and the form in the `English` column match exactly, the `Swadesh_orig` column is left empty. Moreover, typos in the entries of the three language columns were corrected (either typos in the original PDF or errors from the first-pass OCR); these corrections are listed in the `Remark` column. Finally, there are two additional tables (in the tabs to the right of @fig-table), which contain phrases and clauses from the 1904/1911 ([raw file](https://github.com/engganolang/digitised-holle-list/blob/main/data/digitised-holle-list-in-stokhof-1980-add-1904_1911.tsv)) and the 1931 ([raw file](https://github.com/engganolang/digitised-holle-list/blob/main/data/digitised-holle-list-in-stokhof-1980-add-1931.tsv)) editions of the HL.
We hope that the digitised NBL of the Holle List will be helpful for, and speed up the workflow of, other computationally oriented researchers. Readers/users are also encouraged to check the original PDF list in Stokhof [-@holleli1980].
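The following base R sketch illustrates one way the `Swadesh` and `Swadesh_orig` columns might be used together. The rows are invented placeholders, `Swadesh_label` is a hypothetical helper column, and the sketch assumes that `Swadesh` has been read in as a logical vector and that empty `Swadesh_orig` cells arrive as `NA` or empty strings.

```r
# Toy rows mimicking the relevant columns; the values are placeholders.
holle <- data.frame(
  Index        = c("1", "2", "3"),
  English      = c("gloss_a", "gloss_b", "gloss_c"),
  Swadesh      = c(TRUE, FALSE, TRUE),
  Swadesh_orig = c(NA, NA, "label_c"),
  stringsAsFactors = FALSE
)

# Keep only the entries marked as Swadesh items.
swadesh_items <- holle[holle$Swadesh, ]

# Recover the Swadesh label: use Swadesh_orig when it is filled in,
# otherwise fall back on the English gloss (the two forms then match).
empty <- is.na(swadesh_items$Swadesh_orig) | swadesh_items$Swadesh_orig == ""
swadesh_items$Swadesh_label <- ifelse(empty,
                                      swadesh_items$English,
                                      swadesh_items$Swadesh_orig)
print(swadesh_items[, c("Index", "English", "Swadesh_label")])
```

Because `Swadesh_orig` is only filled in when the wording differs, the fallback to `English` is needed to obtain a complete set of Swadesh labels.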
```{r}
#| message: false
#| warning: false
#| echo: false
# code to process the digitised Holle List in Stokhof (1980)
# load the package =====
library(tidyverse)
library(readxl)
library(reactable)
library(tippy)
```
::: column-page
::: panel-tabset
### The new basic list (NBL)
```{r}
#| fig-cap: The digitised, new basic list of the Holle List in Stokhof [-@holleli1980, 22-72]
#| label: fig-table
#| message: false
#| warning: false
# read the concepticon mapping
concepticon <- read_tsv("data/concepticon-mapping.tsv") |>
  rename(Index = NUMBER,
         English = GLOSS,
         Concepticon_Gloss = CONCEPTICON_GLOSS) |>
  select(-SIMILARITY) |>
  mutate(concept_url = paste("https://concepticon.clld.org/parameters/",
                             CONCEPTICON_ID,
                             sep = ""))
concepticon_checked <- concepticon |>
  filter(CHECKED == "y") |>
  select(English, Index, Concepticon_Gloss, concept_url) |>
  mutate(Index = as.character(Index))
holle_tb <- read_tsv("data/digitised-holle-list-in-stokhof-1980.tsv")
holle_tb <- holle_tb |>
  # merge with the checked concepticon mapping
  left_join(concepticon_checked, by = join_by(Index, English)) |>
  mutate(English = replace(English, is.na(English), ""))
url_eng <- '<a href="%s" target="_blank">%s</a>'
holle_tb |>
  reactable(style = list(fontFamily = "Canela Text"),
            elementId = "digitised-holle-list",
            filterable = TRUE,
            highlight = TRUE,
            resizable = TRUE,
            bordered = TRUE,
            borderless = TRUE,
            defaultPageSize = 20,
            wrap = FALSE,
            columns = list(
              Index = colDef(align = "center",
                             sticky = "left"),
              Dutch = colDef(minWidth = 150,
                             cell = function(value, index, name) {tippy(text = value, tooltip = value)}),
              English = colDef(minWidth = 150,
                               cell = function(value, index, name) {tippy(text = if_else(!is.na(holle_tb$Concepticon_Gloss[index]),
                                                                                         sprintf(url_eng,
                                                                                                 holle_tb$concept_url[index],
                                                                                                 value),
                                                                                         value),
                                                                          tooltip = value)}),
              Indonesian = colDef(minWidth = 150,
                                  cell = function(value, index, name) {tippy(text = value, tooltip = value)}),
              v1894 = colDef(align = "center"),
              `v1904/1911` = colDef(align = "center"),
              v1931 = colDef(align = "center"),
              Swadesh = colDef(align = "center"),
              Swadesh_orig = colDef(minWidth = 150),
              Concepticon_Gloss = colDef(show = FALSE),
              concept_url = colDef(show = FALSE)
            ))
```
### Additional list (1904/1911 edition)
```{r}
#| fig-cap: The additional list from the 1904/1911 edition [@holleli1980, 73-74]
#| label: fig-1904
#| message: false
#| warning: false
holle_1904_tb <- read_tsv("data/digitised-holle-list-in-stokhof-1980-add-1904_1911.tsv")
holle_1904_tb |>
  reactable(style = list(fontFamily = "Canela Text"),
            elementId = "1904-edition",
            filterable = TRUE,
            highlight = TRUE,
            resizable = TRUE,
            bordered = TRUE,
            borderless = TRUE,
            defaultPageSize = 10,
            wrap = FALSE,
            columns = list(
              Index = colDef(align = "center",
                             sticky = "left"),
              Dutch = colDef(minWidth = 150,
                             cell = function(value, index, name) {tippy(text = value, tooltip = value)}),
              English = colDef(minWidth = 150,
                               cell = function(value, index, name) {tippy(text = value, tooltip = value)}),
              Indonesian = colDef(minWidth = 150,
                                  cell = function(value, index, name) {tippy(text = value, tooltip = value)})
            ))
```
### Additional list (1931 edition)
```{r}
#| fig-cap: The additional list from the 1931 edition [@holleli1980, 72-73]
#| label: fig-1931
#| message: false
#| warning: false
holle_1931_tb <- read_tsv("data/digitised-holle-list-in-stokhof-1980-add-1931.tsv")
holle_1931_tb |>
  reactable(style = list(fontFamily = "Canela Text"),
            elementId = "1931-edition",
            filterable = TRUE,
            highlight = TRUE,
            resizable = TRUE,
            bordered = TRUE,
            borderless = TRUE,
            defaultPageSize = 10,
            wrap = FALSE,
            columns = list(
              Index = colDef(align = "center",
                             sticky = "left"),
              Dutch = colDef(minWidth = 150,
                             cell = function(value, index, name) {tippy(text = value, tooltip = value)}),
              English = colDef(minWidth = 150,
                               cell = function(value, index, name) {tippy(text = value, tooltip = value)}),
              Indonesian = colDef(minWidth = 150,
                                  cell = function(value, index, name) {tippy(text = value, tooltip = value)})
            ))
```
:::
:::