The main purpose of this Python script is the conversion of each dictionary and inflection entry into an appropriate JSON representation.
The late col. William Whitaker created a large dictionary of the Latin language, which contains about 39500 entries gathered from various sources. He also made a very useful program to aid in dictionary search.
Whitaker wrote this program a long while ago, unaware of the technologies that we - for the better or the worse - deal with today. Thus, the original program has lots of room for improvement and modernisation, and as of this moment there are several independent projects doing exactly that. I have resolved to undertake one such project myself.
Therefore, I have found organising the data my first priority. The original version read the lines from the plaintext source files (DICTLINE.txt
and INFLECTS.txt
), processed them and finally saved them in a binary format for easy reading/writing from/to the disk.
Back then, when computers used to have major memory limitations, this was perfectly reasonable. Nowadays, there is no reason not to store all this data in a widely used and supported data format like JSON. This makes it easier to build a new, modern, more powerful implementation of WORDS; easily integrated in modern languages and technologies.
Each dictionary entry and each inflection represents a single JSON entry and each one of them has certain attributes which depend on the entry/inflection in question.
Below is a series of table documenting all entries with explanations for each attribute.
But first, there is a series of attributes which are the same for all dictionary entries. In the tables below, I shall omit them, but you should assume that all entries have them:
Attribute | Description |
---|---|
pos |
Type of word |
stems |
List of all stems the word has* |
age |
Approximate time of word's origin |
geography |
Approximate location word's usage |
area |
Area of study the word pertains to |
frequency |
How often the word appears in texts |
source |
Dictionary the word was taken from |
senses |
Word definitions and meanings |
* Depending on word type, the number of stems ranges anywhere from 1 to 4. In the tables below, each table entry will have a detailed explanation.
The same goes for attributes in the inflection entries:
Attribute | Description |
---|---|
pos |
Type of word the inflection is applied to |
stem |
Stem the ending should be applied to |
characters |
How many characters the ending has |
ending |
Ending itself |
age |
Approximate time of inflection's origin |
frequency |
How often the inflection appears in texts |
Attribute | Description |
---|---|
stems (1) |
Nominative stem |
stems (2) |
Genitive stem* |
declension |
Noun declension |
declension_variant |
Specific type of noun declension |
gender |
Grammatical gender of the noun |
noun_kind |
What the noun refers to |
* May be empty for some (rare) nouns, such as abbreviations
Example entry:
{
"stems": ["abac", "abac"],
"pos": "N",
"declension": "2",
"declension_variant": "1",
"gender": "M",
"noun_kind": "T",
"age": "E",
"area": "E",
"geography": "X",
"frequency": "C",
"source": "E",
"senses": "small table for cruets, credence, buffet;"
}
Attribute | Description |
---|---|
stems (1) |
Nominative stem |
stems (2) |
Genitive stem* |
declension |
Noun declension |
declension_variant |
Specific type of noun declension |
gender |
Grammatical gender of the noun |
pronoun_kind |
Specific type of pronoun |
* May be empty for some pronouns
Example entry:
{
"stems": ["qu", "NO_STEM"],
"pos": "PRON",
"declension": "1",
"declension_variant": "3",
"pronoun_kind": "ADJECT",
"age": "X",
"area": "X",
"geography": "X",
"frequency": "A",
"source": "O",
"senses": "any; anyone/anything, any such; (after si sin/sive/ne);"
}
Attribute | Description |
---|---|
stems (1) |
First principal part minus -o * |
stems (2) |
Second principal part (minus infinitive ending) |
stems (3) |
Third principal part (minus -i ) |
stems (4) |
Fourth principal part (minus -us/um ) |
conjugation |
Verb conjugation |
conjugation_variant |
Specific type of conjugation |
verb_kind |
Type of action the verb represents |
* May be the only stem in (rare) Biblical/Aramaic verbs
Example entry:
{
"stems": ["put", "put", "putav", "putat"],
"pos": "V",
"conjugation": "1",
"conjugation_variant": "1",
"verb_kind": "TRANS",
"age": "X",
"area": "X",
"geography": "X",
"frequency": "A",
"source": "X",
"senses": "think, believe, suppose, hold; reckon, estimate, value; clear up, settle;"
}
Attribute | Description |
---|---|
stems (1) |
Nominative stem |
stems (2) |
Genitive stem |
stems (3) |
Comparative stem* |
stems (4) |
Superlative stem* |
declension |
Adjective declension |
declension_variant |
Specific type of adjective declension |
gender |
Grammatical gender of the noun |
comparison |
Comparison this adjective is in |
* Some adjectives are incomparable and therefore have no 3rd and 4th stems
Example form:
{
"stems": ["bon", "bon", "meli", "opti"],
"pos": "ADJ",
"declension": "1",
"declension_variant": "1",
"comparison": "X",
"age": "X",
"area": "X",
"geography": "X",
"frequency": "A",
"source": "O",
"senses": "good, honest, brave, noble, kind, pleasant, right, useful; valid; healthy;"
}
Attribute | Description |
---|---|
stems (1) |
Positive stem |
stems (2) |
Comparative stem* |
stems (3) |
Superlative stem* |
declension |
Adjective declension |
declension_variant |
Specific type of adjective declension |
gender |
Grammatical gender of the noun |
comparison |
Comparison this adverb is in |
* Adverbs that are derived from adjectives are comparable and have all three stems. Other adverbs are incomparable and have only the 1st stem.
Example form:
{
"stems": ["bene", "melius", "optime"],
"pos": "ADV",
"comparison": "X",
"age": "X",
"area": "X",
"geography": "X",
"frequency": "A",
"source": "O",
"senses": "well, very, quite, rightly, agreeably, cheaply, in good style; better; best;"
}
Attribute | Description |
---|---|
stems (1) |
Preposition |
case |
Case it determines |
Example form:
{
"stems": ["ad"],
"pos": "PREP",
"case": "ACC",
"age": "X",
"area": "X",
"geography": "X",
"frequency": "A",
"source": "O",
"senses": "to, up to, towards; near, at; until, on, by; almost; according to; about w/NUM;"
}
Attribute | Description |
---|---|
stems (1) |
Interjection |
Example form:
{
"stems": ["vae"],
"pos": "INTERJ",
"age": "X",
"area": "X",
"geography": "X",
"frequency": "B",
"source": "X",
"senses": "alas, woe, ah; oh dear; (Vae, puto deus fio. - Vespasian); Bah!, Curses!;"
}
Attribute | Description |
---|---|
stems (1) |
Cardinal number stem* |
stems (2) |
Ordinal number stem |
stems (3) |
Distributive stem |
stems (4) |
Numerical adverb stem |
declension |
Numeral declension |
declension_variant |
Specific type of numeral declension |
numeral_sort |
Specific type of number |
numeral_value |
Value the number holds (an integer) |
* Depending on the number, some may have all stems while others will lack some.
Example form:
{
"stems": ["undeviginti", "undevicesim", "undevicen", "undevic"],
"pos": "NUM",
"declension": "2",
"declension_variant": "0",
"numeral_sort": "X",
"numeral_value": "19",
"age": "X",
"area": "X",
"geography": "X",
"frequency": "D",
"source": "X",
"senses": "nineteen;"
}
Attribute | Description |
---|---|
stems (1) |
Conjunction |
Example form:
{
"stems": ["ubi"],
"pos": "CONJ",
"age": "X",
"area": "X",
"geography": "X",
"frequency": "A",
"source": "X",
"senses": "where, whereby;"
}
(Artificial constructs used for the dictionary software. More specifically, they are used to represent all the -qu-/-cu-
pronouns.)
Attribute | Description |
---|---|
stems (1) |
-qu packon |
stems (2) |
-cu packon* |
declension |
Declension inflections required |
declension_variant |
Variant of the inflections |
packon_kind |
Which pronoun type it pertains to |
* Some packons have no -cu
stem.
{
"stems": ["qu", "cu"],
"pos": "PACK",
"declension": "1",
"declension_variant": "0",
"packon_kind": "REL",
"age": "X",
"area": "X",
"geography": "X",
"frequency": "A",
"source": "X",
"senses": "(w/-cumque) who/whatever, no matter who/what, in any time/way, however small;"
}
Attribute | Description |
---|---|
declension |
Noun declension the ending is applied to |
declension_variant |
Specific declension variant for the ending |
case |
Case the ending represents |
number |
Number (singular/plural) the ending represents |
gender |
Gender the ending represents |
Example form:
{
"pos": "N",
"declension": "2",
"declension_variant": "1",
"case": "GEN",
"number": "S",
"gender": "X",
"stem": "2",
"characters": "1",
"ending": "i",
"age": "X",
"frequency": "A"
}
would correspond to the -i
ending of the 2nd declension.
Attribute | Description |
---|---|
declension |
Pronoun declension the ending is applied to |
declension_variant |
Specific declension variant for the ending |
case |
Case the ending represents |
number |
Number (singular/plural) the ending represents |
gender |
Gender the ending represents |
{
"pos": "PRON",
"declension": "4",
"declension_variant": "2",
"case": "NOM",
"number": "P",
"gender": "N",
"stem": "2",
"characters": "1",
"ending": "a",
"age": "X",
"frequency": "A"
}
Attribute | Description |
---|---|
conjugation |
Conjugation the ending is applied to |
conjugation_variant |
Conjugation variant for the ending |
tense |
Tense the ending represents |
voice |
Voice the ending represents |
mood |
Mood the ending represents |
person |
Person the ending represents |
number |
Number (singular/plural) the ending represents |
Example form:
{
"pos": "V",
"conjugation": "1",
"conjugation_variant": "1",
"tense": "PRES",
"voice": "ACTIVE",
"mood": "IND",
"person": "1",
"number": "S",
"stem": "1",
"characters": "1",
"ending": "o",
"age": "X",
"frequency": "A"
}
Attribute | Description |
---|---|
conjugation |
Conjugation the ending is applied to |
conjugation_variant |
Conjugation variant for the ending |
case |
Case the ending represents |
tense |
Tense the ending represents |
voice |
Voice the ending represents |
mood |
Mood the ending represents |
number |
Number (singular/plural) the ending represents |
Example form:
{
"pos": "VPAR",
"conjugation": "1",
"conjugation_variant": "0",
"case": "NOM",
"number": "S",
"gender": "X",
"tense": "PRES",
"voice": "ACTIVE",
"mood": "PPL",
"stem": "1",
"characters": "PPL",
"ending": "1",
"age": "3",
"frequency": "ans"
}
Attribute | Description |
---|---|
declension |
Adjective declension the ending is applied to |
declension_variant |
Specific declension variant for the ending |
case |
Case the ending represents |
number |
Number (singular/plural) the ending represents |
gender |
Gender the ending represents |
comparison |
Adjective comparison the ending represents |
Example form:
{
"pos": "ADJ",
"declension": "1",
"declension_variant": "1",
"case": "NOM",
"number": "S",
"gender": "M",
"comparison": "POS",
"stem": "1",
"characters": "2",
"ending": "us",
"age": "X",
"frequency": "A"
}
Attribute | Description |
---|---|
declension |
Number declension the ending is applied to |
declension_variant |
Specific declension variant for the ending |
case |
Case the ending represents |
number |
Number (singular/plural) the ending represents |
gender |
Gender the ending represents |
numeral_sort |
Sort of the number this ending is applied to |
Example form:
{
"pos": "NUM",
"declension": "1",
"declension_variant": "1",
"case": "NOM",
"number": "S",
"gender": "M",
"numeral_sort": "CARD",
"stem": "1",
"characters": "2",
"ending": "us",
"age": "X",
"frequency": "A"
}
Attribute | Description |
---|---|
conjugation |
Conjugation the ending is applied to |
conjugation_variant |
Conjugation variant for the ending |
case |
Case the ending represents |
gender |
Gender the ending represents |
number |
Number (singular/plural) the ending represents |
Example form:
{
"pos": "SUPINE",
"conjugation": "0",
"conjugation_variant": "0",
"case": "ACC",
"number": "S",
"gender": "N",
"stem": "4",
"characters": "2",
"ending": "um",
"age": "X",
"frequency": "A"
}
Attribute | Description |
---|---|
comparison |
Adverb comparison the ending represents |
Example form:
{
"pos": "ADV",
"comparison": "COMP",
"stem": "1",
"characters": "0",
"ending": "NO_ENDING",
"age": "X",
"frequency": "A"
}
* Adverbs, prepositions, conjunctions and interjections are not inflected. Thus they all have NO_ENDING and barely any additional attributes.
Attributes | Description |
---|---|
case |
Case the prepositions determines |
Example form:
{
"pos": "PREP",
"case": "ACC",
"stem": "1",
"characters": "0",
"ending": "NO_ENDING",
"age": "X",
"frequency": "A"
}
Attributes | Description |
---|---|
N/A | N/A |
Example form:
{
"pos": "INTERJ",
"stem": "1",
"characters": "0",
"ending": "NO_ENDING",
"age": "X",
"frequency": "A"
}
Attributes | Description |
---|---|
N/A | N/A |
Example form:
{
"pos": "CONJ",
"stem": "1",
"characters": "0",
"ending": "NO_ENDING",
"age": "X",
"frequency": "A"
}