-
-
Notifications
You must be signed in to change notification settings - Fork 2
/
TODO.txt
125 lines (78 loc) · 5.79 KB
/
TODO.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
Prevent the program from inserting two words that have the same canonical form and part of speech (I think it does so for different etymologies)
Add the forms given by the new Wiktextract dumps (Russian but also maybe for Spanish)
Create public project with name ebookreader_dictionary_creator
Exact prevents uppercase symbols to be matched, so I should evaluate if I should use it
Add a command line interface that executes everything
Fix create_openrussian_database_from_csv so that it doesn't crash from wrong csv data
Fix/report баско́й obsolete
Alternative_canonical_forms is sometimes wrongly filled
Report to Wiktextract that accusative would be nice if it had animate/inanimate tags
Refactor create_db_tables_spanish into functions and reuse for english version
Add a custom collation to the Spanish SQLITE file so that it can be searched properly with a nocase version
Delete the openrussian.db database before adding elements again.
Maybe fix the double linkage that occurs when base_form links to a word that is two times in database.
-> But first assure that there are no valid base_form links across part of speech tokens (who knows?)
Recommend dont_append source in pyglossary
A hurtadillas is currently not recognized -> Change this with an algorithm to detect words occuring only in phrases
Check for German if the Kindle algorithm really changes ä to a
This term needs a translation to english -> Remove entries that have this
Latin generation doesn't work -> for example aspiciō
Release Tabfiles
Fix upper-lowercase-cases in ambiguity resolver (russian stress)
Add кубарем
Fix cases where the word type gets incorrectly detected, but this could be detected if the cases don't match up
Create a version of the dictionary that includes declension tables
Check why приня́лся does not cause the OpenRussian words not to be added.
Make a real effort to generate a dictionary for each language
Find words without glosses in kaikki data
Fix generation of Russian dictionaries to include the better formatting similar to the other dictionaries
Add generation of Russian dictionaries
Add this component that automatically generates entries for derived forms that don't have their own entries
(After creating the entire database) For each word, iterate through derived forms:
if the inflection exists, do nothing
if it doesn't, check the ending: does it end the same as the base word?
-> if no, do some extra handling for imperfect form or so
-> some form of imperfect form detection?
-> if yes, also check if the accent is captured by the identical ending
-> if so, the prefix can simply be added to all inflections
-> else the accented prefix can be added to the unaccented base word inflections
-> Additionally ся handling
Fix that adjectives for Russian are not added to the DB (?)
Test installation
Test Stardict Metadata
Create all dictionaries
Value auxiliary verb field not as inflection for Italian
if a word has no space and inflections with space, split by space and choose the one more similar to the word
-> This fixes problems with German
Algorithm that creates reverse dictionaries (English to other language)
Korean Kindle error:
Info:I9021:Option: (verborgen) erstellt mobi für ältere Geräte.
Info(prcgen):I1047: Metadaten hinzugefügt dc:Title "Korean-English Wiktionary dictionary"
Info(prcgen):I1047: Metadaten hinzugefügt dc:Creator "Vuizur"
Info(prcgen):I1047: Metadaten hinzugefügt BASICCode "REF008000"
Info(prcgen):I1047: Metadaten hinzugefügt dc:Subject "Dictionaries"
Warnung(inputpreprocessor):W29004: Gewaltsam geschlossenes offenes Tag: <idx:iform value="ryeon">
In Datei: C:\Users\hanne\Documents\Programme\wiktionary_extract_test\Korean-English Wiktionary dictionary\OEBPS\g000002.xhtml Zeile: 0004372
Warnung(inputpreprocessor):W29004: Gewaltsam geschlossenes offenes Tag: <idx:iform value="ryŏn">
In Datei: C:\Users\hanne\Documents\Programme\wiktionary_extract_test\Korean-English Wiktionary dictionary\OEBPS\g000002.xhtml Zeile: 0004372
Warnung(inputpreprocessor):W29004: Gewaltsam geschlossenes offenes Tag: <idx:iform value="련">
In Datei: C:\Users\hanne\Documents\Programme\wiktionary_extract_test\Korean-English Wiktionary dictionary\OEBPS\g000002.xhtml Zeile: 0004372
Warnung(inputpreprocessor):W29004: Gewaltsam geschlossenes offenes Tag: <idx:iform value="lyen">
In Datei: C:\Users\hanne\Documents\Programme\wiktionary_extract_test\Korean-English Wiktionary dictionary\OEBPS\g000002.xhtml Zeile: 0004372
Warnung(inputpreprocessor):W29004: Gewaltsam geschlossenes offenes Tag: <idx:iform value="료">
In Datei: C:\Users\hanne\Documents\Programme\wiktionary_extract_test\Korean-English Wiktionary dictionary\OEBPS\g000002.xhtml Zeile: 0008956
Fehler(core):E1005: Zugriff auf Datei nicht möglich.
In Datei: C:\Users\hanne\Documents\Programme\wiktionary_extract_test\Korean-English Wiktionary dictionary\OEBPS\style.css
Info(prcgen):I1002: Parsing von Dateien 0000093
Fix dictionary creation so that glossary does not have to be created 3 times -> Performance improvement
Pull request to pyglossary with language codes
Add support for creating reverse dictionaries (English to Language X)
verstopft|verstopfter|am verstopftesten in German does not allow proper lookup
Fix Russian dictionary to use the Russian specific methods for all dictionary creator
Display canonical form instead of the title for
Regenerate dictionaries that have stress marks/"canonical forms" (because I think the words might have problems with being found)
and because it is useful for people to know the stress
Euklidův algoritmus|Euklidův algoritmus m Category:cs:Mathematics
Include IPA
Include Etymology section
Add proper Russian-English dictionary so that it is listed in Koreader dict file