Commit e3dac87

Re-wrote sections on architecture in readme
1 parent a7c6853 commit e3dac87

1 file changed (+55, -34 lines)

README.md

@@ -8,10 +8,40 @@ Anyone with an interest in Latin can get something out of velut, but it is aimed
 
 This GitHub repo is publicly visible. The site is hosted by [Fly](https://fly.io/) from the main branch, at https://www.velut.co.uk.
 
+## Functionality
+
+On visiting the [homepage](https://www.velut.co.uk), you are invited to type in a Latin word, and select the type of rhymes you want to search for. “Types of rhyme” here also include anagrams, words that scan the same metrically, or words with the same consonants in order (consonyms).
+
+This will return words that rhyme with the input, as well as information about the “lemmata” of the input — that is, the headwords that the word can be an inflected form of. The lemmata information includes the part of speech, definitions, any notes or transliterations, the inflected forms, and lemmata that I think have the same etymology (cognates). At the bottom of the page are links to external online resources (such as Logeion and Wiktionary) that may have more details about the input word.
+
+All my Latin words are macronized, meaning every long vowel is marked with a macron, but you can input a word without the macra, or with hyphens instead of the macra, and velut will find the word you mean, if I have it. For example, you can search for “vocabulorum”, “voca-bulo-rum”, or “vocābulōrum”, and get results for “vocābulōrum”. Similarly, proper nouns and related adjectives are capitalised, but the input is not case-sensitive except in instances of ambiguity (eg, between “Cōs” the Greek island and “cōs” meaning “whetstone”).
+
+Other sections of the site let you find:
+
+- Latin words whose letters are contained in an input string (I call them [subwords](https://www.velut.co.uk/subwords)),
+- Latin [phrases that are anagrams](https://www.velut.co.uk/anagramphrases) of an input (this is not actually linked from elsewhere on the site, because it can be very slow!),
+- Latin lemmata [from an English meaning](https://www.velut.co.uk/english),
+- Latin words [that fit either](https://www.velut.co.uk/advanced) an input pattern of letters or an input metrical scansion, or both (added in November 2020), and
+- [many Latin words at once](https://www.velut.co.uk/multiword) (added in May 2021).
+
 ## Architecture
 
 The velut website (in this repository) is a Next.js site that reads from two MongoDB collections in accordance with what the user searches for. None of its functionality requires client-side JavaScript, because the site is entirely server-side–rendered. However, the Multi-word page (www.velut.co.uk/multiword) uses client-side rendering if possible, as does the [Search component](https://github.com/DuncanRitchie/velut/blob/main/components/search/Search.jsx).
 
+Vocabulary data and scripts for processing them are in separate repos. I have a private Json file listing all lemmata (dictionary headwords), and public JavaScript scripts that process the Json into more Json, and that is what goes into the two MongoDB collections.
+
+The scripts include my:
+- [Inflector](https://github.com/DuncanRitchie/velut-inflector), which generates inflected forms for all lemmata; and
+- [Word Data Generator](https://github.com/DuncanRitchie/velut-word-data-generator), which generates phonetic and other information about all words.
+
+There will also be a script that converts the output of the Inflector into the format needed for the Word Data Generator. This will enable all inflected forms to be words that you can search for and see rhymes for on the velut website.
+
+The two MongoDB collections are:
+- `lemmata`, in which every document (database record) is a “lemma” with information from the source Json file plus the inflected forms from the Inflector; and
+- `words`, in which every document is a “word” with information from the Word Data Generator.
+
+There’s also a MongoDB collection called `summary`, but this is temporary.
+
 ### Old version with Create React App
 
 When I first made the velut website, it was a single-page application that had the same functionality, but using an [Express](https://expressjs.com/) server on the backend and client-side–rendered [React.js](https://reactjs.org/) on the frontend (using [Create React App](https://create-react-app.dev/)).
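The new “Functionality” text describes a forgiving search (macra optional, hyphens accepted in their place, case ignored unless that would be ambiguous) and several match types: anagrams, consonyms, and subwords. This commit does not include velut's matching code, so the following is only a minimal JavaScript sketch of how such lookup keys could work. Every function name is hypothetical. It also assumes that a hyphen always directly follows the vowel it lengthens (as in “voca-bulo-rum”), that an ambiguous query simply returns every candidate, that a, e, i, o, u, and y count as vowels for consonyms, and that each input letter may be used at most once in a subword.

```javascript
// Hypothetical sketch of search keys; not velut's actual implementation.

// Macronized form of each plain vowel, in both cases.
const MACRA = {
  a: "ā", e: "ē", i: "ī", o: "ō", u: "ū", y: "ȳ",
  A: "Ā", E: "Ē", I: "Ī", O: "Ō", U: "Ū", Y: "Ȳ",
};

// "voca-bulo-rum" -> "vocābulōrum": a hyphen right after a vowel marks it long (assumption).
function hyphensToMacra(input) {
  return input.replace(/([aeiouyAEIOUY])-/g, (_, vowel) => MACRA[vowel]);
}

// "vocābulōrum" -> "vocabulorum": decompose, drop combining macrons (U+0304), recompose.
function stripMacra(word) {
  return word.normalize("NFD").replace(/\u0304/g, "").normalize("NFC");
}

// Case- and macron-insensitive key shared by every spelling of the same word.
function searchKey(input) {
  return stripMacra(hyphensToMacra(input)).toLowerCase();
}

// Prefer an exact match (macra and case as typed); otherwise return every word
// with the same key, so "cos" finds both "Cōs" and "cōs" but "Cōs" finds only one.
function findWords(input, words) {
  const typed = hyphensToMacra(input);
  const exact = words.filter((w) => w === typed);
  if (exact.length > 0) return exact;
  const key = searchKey(input);
  return words.filter((w) => searchKey(w) === key);
}

// Anagram key: letters sorted alphabetically, ignoring macra and case.
function anagramKey(word) {
  return [...searchKey(word)].filter((c) => /[a-z]/.test(c)).sort().join("");
}

// Consonym key: consonants in order, treating a, e, i, o, u, y as vowels.
function consonymKey(word) {
  return searchKey(word).replace(/[^a-z]|[aeiouy]/g, "");
}

// Subword test: can `word` be spelt from `letters`, using each letter at most once?
function isSubword(word, letters) {
  const counts = new Map();
  for (const c of searchKey(letters)) counts.set(c, (counts.get(c) ?? 0) + 1);
  for (const c of searchKey(word)) {
    const n = counts.get(c) ?? 0;
    if (n === 0) return false;
    counts.set(c, n - 1);
  }
  return true;
}
```

For example, `findWords("cos", ["Cōs", "cōs"])` returns both words, while `findWords("Cōs", ["Cōs", "cōs"])` returns only the island. Precomputing keys like these once per word would turn each search into an equality lookup; whether velut stores anything similar in its `words` collection is not stated in this commit.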
@@ -20,55 +50,48 @@ The code was on a branch called mern, whose last commit was [413ddae4](https://g
 
 (MERN stands for “MongoDB, Express, React, Node”. Technically the Next.js version is also MERN, because Next.js uses an Express server internally, but with the client-side–rendered version I wrote code that directly — expressly?! — calls Express, so the branchnames “main” and “mern” made sense to me.)
 
-### Local data storage
+### Excel and de-Excellation
 
-Much of the word information you see on the website is stored in an Excel file, which is now more than 90MB in size. Until recently, I added to it frequently, converted the data to Json — using a [webpage I made specifically for this purpose](https://github.com/DuncanRitchie/velut-json-generator) — and used mongoimport to replace my two MongoDB Atlas collections.
+velut started life as an Excel file, which over the years grew to more than 90MB in size.
+Much of the word information you see on the website is stored in it.
 
-I am now well into the process of replacing Excel with custom Json, JavaScript, and MongoDB. See the “Ongoing work” section below.
+In 2019, I created the website to show the data publicly.
+But I still relied heavily on Excel for generating, checking, and storing the data.
+I added to the Excel file frequently, converted the data to Json — using a [webpage I made specifically for this purpose](https://github.com/DuncanRitchie/velut-json-generator) — and used mongoimport to replace my two MongoDB Atlas collections.
 
-## Functionality
-
-On visiting the [homepage](https://www.velut.co.uk), you are invited to type in a Latin word, and select the type of rhymes you want to search for. “Types of rhyme” here also include anagrams, words that scan the same metrically, or words with the same consonants in order (consonyms).
+I am now well into the process of replacing Excel with my custom Json, JavaScript, and MongoDB.
+It feels good to not have to open up a 90MB file!
 
-This will return words that rhyme with the input, as well as information about the “lemmata” of the input — that is, the headwords that the word can be an inflected form of. The lemmata information includes the part of speech, definitions, any notes or transliterations, the inflected forms, and lemmata that I think have the same etymology (cognates). At the bottom of the page are links to external online resources (such as Logeion and Wiktionary) that may have more details about the input word.
+My current stage is manually reviewing the output of my Inflector script (inflection-tables for all lemmata).
+This stage may take another few months because accuracy is important to me and I’m reviewing each of my 14,127 lemmata individually.
+So far, I have reviewed most of the lemmata and published their inflection-tables to the live website.
 
-All my Latin words are macronized, meaning every long vowel is marked with a macron, but you can input a word without the macra, or with hyphens instead of the macra, and velut will find the word you mean, if I have it. For example, you can search for “vocabulorum”, “voca-bulo-rum”, or “vocābulōrum”, and get results for “vocābulōrum”. Similarly, proper nouns and related adjectives are capitalised, but the input is not case-sensitive except in instances of ambiguity (eg, between “Cōs” the Greek island and “cōs” meaning “whetstone”).
+The `words` collection, at the moment, consists of Latin words that I had in Excel, fed through the Word Data Generator.
+This means that words that I didn’t have in Excel cannot be searched for on the velut website — even if they appear in the inflection-tables that the Inflector creates.
+Eventually, I will be able to use the Inflector’s output as the input to the Word Data Generator (via a new script I alluded to earlier).
+That will mean every form in the inflection-tables will be in the `words` collection, and every form therefore will be a word that can be searched for on the website.
 
-Other sections of the site let you find:
+There’s still a lot of common Latin vocabulary that is not yet in the velut database, and that I’d like to include.
+But, that will have to wait.
+My priority is finishing my script for generating forms (or finishing checking that it’s all correct) and completing the new architecture without Excel.
 
-- Latin words whose letters are contained in an input string (I call them [subwords](https://www.velut.co.uk/subwords)),
-- Latin [phrases that are anagrams](https://www.velut.co.uk/anagramphrases) of an input (this is not actually linked from elsewhere on the site, because it can be very slow!),
-- Latin lemmata [from an English meaning](https://www.velut.co.uk/english),
-- Latin words [that fit either](https://www.velut.co.uk/advanced) an input pattern of letters or an input metrical scansion, or both (added in November 2020), and
-- [many Latin words at once](https://www.velut.co.uk/multiword) (added in May 2021).
+For the details, see my [plan of de-Excellation](https://github.com/DuncanRitchie/velut/blob/main/plan.md).
 
 ## Screenshots
 
-### Excel
-
-The velut Excel file has nine sheets, of which four are shown below. The “words” sheet stores data on Latin words as plaintext. The “wordsform” sheet generates the data for the “words” sheet based on the inputs in columns B and C. The “lemmata” sheet stores data on Latin lemmata. The “output” sheet displays information (including rhymes and inflected forms) about whatever Latin word is typed in the orange cell.
-
-![Composite screenshot of four Excel sheets](https://github.com/DuncanRitchie/velut-screenshots/blob/main/compressed/velut-excel-4sheets.png)
-
 ### Website
 
 Displayed below is the page for the word “opportūna”, showing that it is different to “opportūnā”, scans as long-long-long-short metrically, rhymes with words like “ūna” and “lūna”, and is a form of the lemma “opportūnus” (an adjective meaning “timely; suitable”). https://www.velut.co.uk/opportu-na.
 
 ![“opportūna” on velut](https://github.com/DuncanRitchie/velut-screenshots/blob/main/compressed/velut-web-opportuna.png)
 
-## Ongoing work
-
-Most of my efforts on velut nowadays are towards its de-Excellation. I do not want to be using Excel for this project!
+### Excel
 
-I relied heavily on Excel for generating, checking, and storing the data. I am gradually weaning myself off it by creating webpages and websites that replicate the functionality that I have had in spreadsheets. The [velut website](https://www.velut.co.uk) itself is one example; the [Json generator](https://www.github.com/DuncanRitchie/velut-json-generator) is another; I’ve made and am making [more](https://www.duncanritchie.co.uk/code#velut-projects).
+(The velut Excel file is pretty much deprecated, but some screenshots here won’t hurt.)
 
-At the moment, most of this work involves a script I’ve written to generate all the forms I want of all the lemmata I have.
-My script generates forms for all the lemmata, but I need to manually review its output.
-For the details, see my [plan of de-Excellation](https://github.com/DuncanRitchie/velut/blob/main/plan.md).
+The Excel file has nine sheets, of which four are shown below. The “words” sheet stores data on Latin words as plaintext. The “wordsform” sheet generates the data for the “words” sheet based on the inputs in columns B and C. The “lemmata” sheet stores data on Latin lemmata. The “output” sheet displays information (including rhymes and inflected forms) about whatever Latin word is typed in the orange cell.
 
-There’s still a lot of common Latin vocabulary that is not yet in the velut database, and that I’d like to include.
-But, that will have to wait.
-My priority is finishing my script for generating forms (or finishing checking that it’s all correct) and completing the new architecture without Excel.
+![Composite screenshot of four Excel sheets](https://github.com/DuncanRitchie/velut-screenshots/blob/main/compressed/velut-excel-4sheets.png)
 
 ## Environment variables
 
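The “Website” caption says “opportūna” scans as long-long-long-short and differs from “opportūnā”. A toy JavaScript sketch of that kind of syllable weighing follows; it is not velut's scansion. It assumes a syllable is heavy (“L”) when its vowel has a macron, when its nucleus is one of the diphthongs ae, au, or oe, or when two or more consonants follow it inside the word (x and z counting double, h not at all, qu counting once), and light (“S”) otherwise. It deliberately ignores muta cum liquida, consonantal i, elision, and anything across word boundaries. `scan` is a hypothetical name.

```javascript
// Hypothetical, heavily simplified scansion: "L" for a heavy syllable, "S" for a light one.
const LONG_VOWELS = "āēīōūȳ";
const SHORT_VOWELS = "aeiouy";
const DIPHTHONGS = ["ae", "au", "oe"];

const isVowel = (c) => LONG_VOWELS.includes(c) || SHORT_VOWELS.includes(c);

function scan(word) {
  // Use precomposed macron vowels, and treat "qu" as one consonant so its "u" is not a vowel.
  const w = word.normalize("NFC").toLowerCase().replace(/qu/g, "q");
  const weights = [];
  let i = 0;
  while (i < w.length) {
    if (!isVowel(w[i])) {
      i++;
      continue;
    }
    // The syllable's nucleus: a diphthong, a long vowel, or a short vowel.
    let heavy = LONG_VOWELS.includes(w[i]);
    let next = i + 1;
    if (DIPHTHONGS.includes(w.slice(i, i + 2))) {
      heavy = true;
      next = i + 2;
    }
    // Count consonants before the next vowel: x and z count double, h is ignored.
    let consonants = 0;
    for (let k = next; k < w.length && !isVowel(w[k]); k++) {
      if (w[k] === "x" || w[k] === "z") consonants += 2;
      else if (w[k] !== "h") consonants += 1;
    }
    weights.push(heavy || consonants >= 2 ? "L" : "S");
    i = next;
  }
  return weights.join("");
}
```

Under these assumptions, two words scan the same when their strings are equal, so “lūna” and “ūna” (both `LS`) match each other but not “opportūna” (`LLLS`), which in turn differs from “opportūnā” (`LLLL`).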
@@ -108,11 +131,9 @@ npm run dev
 
 To redeploy, I simply push to the main branch on GitHub.
 
-### Editing the data
+For how I edit the data with Json files and JavaScript scripts, see [“Architecture”](#architecture) above.
 
-The source data about Latin vocabulary are in a private repo.
-My scripts for processing the data are in public repos (eg the [Inflector](https://github.com/DuncanRitchie/velut-inflector) and [Word Data Generator](https://github.com/DuncanRitchie/velut-word-data-generator)) — these are written in JavaScript and they read and/or write Json.
-My scripts to refresh the database with the Json data (using mongoimport) are private, so as not to expose the MongoDB connection string.
+My scripts to refresh the database (using mongoimport) are private, so as not to expose the MongoDB connection string.
 
 ## Miscellanea
 
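The new “Architecture” section and the removed “Editing the data” lines both describe Json-in, Json-out JavaScript scripts, and the README says a script will convert the Inflector’s output into the Word Data Generator’s input so that every inflected form becomes searchable. The real formats are not shown in this commit, so the following JavaScript sketch invents them. It assumes the Inflector emits an object keyed by lemma whose values are nested inflection tables with string (or array-of-string) leaves, and that the Word Data Generator wants one entry per unique form listing the lemmata it belongs to. `collectForms` and `inflectorOutputToWords` are hypothetical names.

```javascript
// Hypothetical input and output shapes; the real Inflector and
// Word Data Generator formats are not shown in this commit.

// Recursively collect every string leaf of a nested inflection table.
function collectForms(table, out = []) {
  if (typeof table === "string") {
    out.push(table);
  } else if (Array.isArray(table)) {
    for (const item of table) collectForms(item, out);
  } else if (table && typeof table === "object") {
    for (const value of Object.values(table)) collectForms(value, out);
  }
  return out;
}

// Flatten every lemma's tables into one entry per unique form,
// recording all the lemmata that the form belongs to.
function inflectorOutputToWords(inflectorOutput) {
  const lemmataByWord = new Map();
  for (const [lemma, tables] of Object.entries(inflectorOutput)) {
    for (const form of collectForms(tables)) {
      if (!lemmataByWord.has(form)) lemmataByWord.set(form, new Set());
      lemmataByWord.get(form).add(lemma);
    }
  }
  // Sort for stable, diff-friendly Json output.
  return [...lemmataByWord.keys()]
    .sort((a, b) => a.localeCompare(b))
    .map((word) => ({ word, lemmata: [...lemmataByWord.get(word)] }));
}
```

Grouping through a `Map` of sets means a form shared by several lemmata appears once, listing all of them, which matches the README’s description of a word page showing every lemma the word can be an inflected form of.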