Add lexica
Introduce OLang/MDF for use with
+Err/Lex+OLang/RUS when there are inflected
rueter committed Oct 12, 2023
1 parent a620fb7 commit 3bf3cc7
Showing 4 changed files with 311 additions and 212 deletions.
35 changes: 35 additions & 0 deletions src/fst/root.lexc
@@ -601,6 +601,41 @@ Multichar_Symbols !!≈!!@CODE@ / Analysis symbols
+Der/PatrMal
+Der/PatrFem

!! # Tags for originating language

!! The following tags are used to guide conversion to IPA: loan words
!! and foreign names are usually pronounced (approximately) as in the
!! originating (majority) language. Instead of trying to identify the
!! correct pronunciation based on phonotactics (orthotactics actually),
!! we tag all words that can't be correctly transcribed using the SME
!! transcriber with source language codes. Once tagged, it is possible
!! to split the lexical transducer into smaller ones according to
!! language, and apply a different IPA conversion to each of them.
!!
!! The principle of tagging is that we only tag to the extent needed,
!! and following a priority:
!! 1. any untagged word is pronounced with SME orthographic conventions
!! 1. NNO and NOB have identical pronunciation, NNO is only used if
!! different in spelling from NOB
!! 1. SWE has mostly the same pronunciation as NOB, and is only used
!! if different in spelling from NOB
!! 1. Occasionally even SME (the default) may be tagged, to block other
!! languages from being specified, mainly during semi-automatic
!! language tagging sessions
!!
!! All in all, we want to get as much correctly transcribed to IPA
!! with as little work as possible. On the other hand, if more words
!! are tagged than strictly needed, this should pose no problem as
!! long as the IPA conversion is correct - at least some words will
!! get the same pronunciation whether read as SME or NOB/NNO/SWE.

+OLang/CHV !!≈ * **@CODE@** = Chuvash
+OLang/MDF !!≈ * **@CODE@** = Moksha
+OLang/MYV !!≈ * **@CODE@** = Erzya
+OLang/RUS !!≈ * **@CODE@** = Russian
+OLang/TAT !!≈ * **@CODE@** = Tatar


!! Morphophonology
! ---------------
!! To represent phonologic variations in word forms we use the following
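
The routing described in the originating-language comment can be illustrated with a minimal Python sketch. It works on analysis strings rather than on the transducer itself, which is an assumption made for illustration: the commit only declares the tags and does not show how the split is implemented. The function names, the `DEFAULT` label for untagged words, and the sample analysis strings are all hypothetical.

```python
import re
from collections import defaultdict

# Matches an originating-language tag such as +OLang/RUS and captures the code.
OLANG_TAG = re.compile(r"\+OLang/([A-Z]{3})")


def origin_language(analysis, default="DEFAULT"):
    """Return the code of the first +OLang/ tag, or `default` if untagged.

    Untagged words fall through to the default orthographic conventions,
    mirroring the first priority rule in the comment above.
    """
    match = OLANG_TAG.search(analysis)
    return match.group(1) if match else default


def split_by_origin(analyses, default="DEFAULT"):
    """Group analysis strings by originating language code."""
    groups = defaultdict(list)
    for analysis in analyses:
        groups[origin_language(analysis, default)].append(analysis)
    return dict(groups)


# Hypothetical analysis strings, for illustration only.
sample = ["kniga+N+OLang/RUS+Sg+Nom", "kudo+N+Sg+Nom"]
groups = split_by_origin(sample)
```

Each resulting group could then be passed to its own IPA converter, with the `DEFAULT` group handled by the native transcriber. If an analysis carries more than one `+OLang/` tag, this sketch uses only the first.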