日本語先生
Live: nihongosensei.app
This project provides a collection of useful tools for learning Japanese.
A simple generator that adds furigana to Japanese text using kuroshiro is available at nihongosensei.app/furigana.
A dictionary based on the wadoku.de XML data is also available.
For now the dictionary is built around a single main table, entry, which holds each converted XML entry as JSON; additional fields and tables enable text search.
create table entry
(
    id         int unsigned not null
        primary key,
    entry_json json null,
    lastchange timestamp default CURRENT_TIMESTAMP not null on update CURRENT_TIMESTAMP,
    jlpt       tinyint unsigned null
)
    charset = utf8;

create table entry_map
(
    entry_id int not null,
    text     varchar(255) charset utf8mb4 not null,
    primary key (entry_id, text)
);

create fulltext index entry_map__text
    on entry_map (text);

create index entry_map__text_index
    on entry_map (text);

create table entry_ref
(
    target_id    int not null,
    source_id    int not null,
    type         varchar(255) not null,
    subentrytype varchar(255) null,
    primary key (target_id, source_id)
);

create index entry_ref__target_id
    on entry_ref (target_id);
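As a rough sketch of how the schema above could be queried for text search (MySQL syntax, matching the DDL): the search terms '先生' and 'Lehrer' are made-up examples, and the assumption that entry_map.text holds the searchable strings for each entry is an inference from the indexes, not something the project states.

```sql
-- Exact lookup: can use the regular index entry_map__text_index.
select e.id, e.entry_json
from entry_map as m
         join entry as e on e.id = m.entry_id
where m.text = '先生';

-- Full-text lookup: uses the fulltext index entry_map__text
-- and ranks the matching rows by relevance.
select m.entry_id,
       m.text,
       match(m.text) against('Lehrer' in natural language mode) as score
from entry_map as m
where match(m.text) against('Lehrer' in natural language mode)
order by score desc
limit 20;
```

Note that MySQL's default full-text parser splits on whitespace and punctuation, so it may not tokenize Japanese text usefully unless the index is created with the ngram parser; the exact lookup is not affected by this.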
See https://github.com/nihongosensei/wadoku-export-reader for the Wadoku export reader.
See https://www.wadoku.de/downloads/xml-export/ for the Wadoku XML export.
See https://www.wadoku.de/wiki/display/WAD/Wadoku.de-Daten+Lizenz for the Wadoku data license.
JLPT levels are imported from Wiktionary: https://en.wiktionary.org/wiki/Appendix:JLPT
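A quick way to check the imported levels might look like the following. It only relies on the jlpt column of the entry table; how the level is encoded in that column is not documented here, so the query simply groups by whatever values are present.

```sql
-- Count entries per stored JLPT value, skipping entries without a level.
select jlpt, count(*) as entries
from entry
where jlpt is not null
group by jlpt
order by jlpt;
```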
Example entries by id, one per structural case:
Multiple senses: 167612
Def and text inside a tr: 1707
With a ref inside a sense: 273
Tr followed by a def: 208
Tr followed by a def with multiple tr: 515
Multiple defs after a tr: 4029690
Usg with type and reg: 4151
Long list of senses: 8042046
With etym: 11712
With etym which has a ref: 8545
With etym which has a foreign word: 490814
With multiple etyms: 2516676
Usg with type HINT: 3778315
Usg with type TIME: 8444455
Def followed by text: 8444455
Usg on entry level: 5075870
Famn and title: 226081
Season word: 10000528
Many senses: 5260527
Verb with 2 doushi definitions: 2972828
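Assuming the numbers in the list above are values of entry.id (this is an assumption, the list does not say so explicitly), one of the cases could be inspected like this:

```sql
-- Fetch the converted JSON for the "Multiple senses" example.
select id, entry_json, jlpt
from entry
where id = 167612;

-- List the reference rows that name this entry as their target.
select source_id, type, subentrytype
from entry_ref
where target_id = 167612;
```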