The lexer should allow raw source in 2 forms:
- All directives (categories, syllables, replacements, rejections, letters) in one file/text block
- Each of the aforementioned in separate files/text blocks
There are 5 directives, distinguished by their starting token:
Token | Description | Multiple Allowed? |
---|---|---|
[any category name] | Category definition | ✅ |
syllable | Syllable definition | ✅ |
reject | Rejection rule | ✅ |
replace | Replacement rule | ✅ |
letters | Sorting order | ❌ |
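As a rough illustration of dispatching on the starting token, the sketch below classifies one statement into the five directive kinds listed in the table. The names `classify_directive`, `is_identifier`, `KEYWORDS`, and `FORBIDDEN` are hypothetical, and the sketch assumes that a category definition is any statement whose text before the first "=" is a valid identifier.

```python
# Characters that may not appear in an identifier, copied from the grammar.
FORBIDDEN = set("\n;*#$=[](){}|,/_:>^@!\\&%")
# Keyword tokens from the table; category names are handled separately.
KEYWORDS = ("syllable", "reject", "replace", "letters")


def is_identifier(text):
    """True if text is non-empty and has no whitespace or forbidden characters."""
    return bool(text) and not any(ch.isspace() or ch in FORBIDDEN for ch in text)


def classify_directive(statement):
    """Return the directive kind for one statement (hypothetical helper)."""
    text = statement.strip()
    head, colon, _ = text.partition(":")
    if colon and head.strip() in KEYWORDS:
        return head.strip()
    # Assumption: anything of the form `identifier = ...` is a category.
    name, equals, _ = text.partition("=")
    if equals and is_identifier(name.strip()):
        return "category"
    raise ValueError(f"unrecognized directive: {statement!r}")
```

For example, `classify_directive("C = p t k")` returns `"category"`, while a statement that starts with none of the listed tokens raises `ValueError`.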
The following grammar is common to all rules below:
line-ending = "\n" | ";" ;
with-weight = "*" weight ;
weight = [1-9]+[0-9]* ; # any positive decimal integer
comment = "#" .* line-ending ;
context-prefix = "^" | "@" | "!";
context-suffix = "\" | "&" ;
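One possible way to apply the line-ending and comment rules before any directive is parsed is sketched below. `split_statements` is a hypothetical helper, and it assumes that a comment always runs to the end of the physical line (so a ";" inside a comment does not start a new statement), which the grammar does not state explicitly.

```python
def split_statements(source):
    """Split raw source into statements on newlines and semicolons.

    Assumption: "#" starts a comment that extends to the end of the line.
    Empty statements are dropped.
    """
    statements = []
    for line in source.split("\n"):
        code = line.split("#", 1)[0]  # discard the comment, if any
        for part in code.split(";"):
            part = part.strip()
            if part:
                statements.append(part)
    return statements
```

Under these assumptions, `split_statements("C = p t; V = a # vowels\nsyllable: $C$V")` yields three statements: two category definitions and one syllable definition.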
Components are the base unit of all directives.
They are used in all directives (except for letters).
All components can be weighted.
phoneme = any non-space text not in "\n;*#$=[](){}|,/_:>^@!\\&%" ;
category-reference = "$" identifier ;
identifier = any non-space text not in "\n;*#$=[](){}|,/_:>^@!\\&%" ;
weighted-component = syllable-component with-weight? ;
weighted-components = syllable-components with-weight? ;
Defines a category. One category per line, or multiple when separated by semicolons.
category-definition = identifier "=" space-separated-elements line-ending ;
space-separated-elements = weighted-category-element (" " weighted-category-element)* ;
weighted-category-element = category-element with-weight? ;
category-element = phoneme | category-reference ;
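To make the category grammar concrete, here is a minimal parsing sketch for a single category definition. `parse_category` is a hypothetical name. The sketch accepts any run of whitespace between elements (the grammar asks for a single space) and assigns a weight of 1 to unweighted elements; it does not validate the characters of each element.

```python
import re


def parse_category(statement):
    """Parse `name = elem elem*weight ...` into (name, [(element, weight)])."""
    name, equals, body = statement.partition("=")
    name = name.strip()
    if not equals or not name:
        raise ValueError(f"not a category definition: {statement!r}")
    elements = []
    for token in body.split():
        text, star, weight = token.partition("*")
        if not text:
            raise ValueError(f"missing element before weight: {token!r}")
        # weight must be a positive decimal integer without leading zeros
        if star and not re.fullmatch(r"[1-9][0-9]*", weight):
            raise ValueError(f"invalid weight: {token!r}")
        elements.append((text, int(weight) if star else 1))
    return name, elements
```

For instance, `parse_category("C = p*3 t $S")` returns `("C", [("p", 3), ("t", 1), ("$S", 1)])`; the reference `$S` is kept as text for later resolution.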
Defines all possible syllables in the language. Multiple syllable directives per file are allowed.
syllable-definition = "syllable" ":" syllable-components line-ending ;
syllable-components = syllable-component+ ;
syllable-component = phoneme | category-reference | component-reference | grouping | selection | optional ;
grouping = "{" "}" | "{" syllable-components (" "? syllable-components)* "}" ;
optional = "(" syllable-components ")" with-weight? ;
selection = "[" "]" | "[" selection-elements "]" ;
selection-elements = selection-element ("," selection-element)* ;
selection-element = syllable-components with-weight? ;
Named syllable components are syllable components with an assigned name.
They are defined with the component directive and invoked similarly to categories, using %.
component-definition = "component" ":" identifier "=" syllable-components line-ending ;
component-reference = "%" identifier ;
Defines a rejection rule. Multiple rejection rules per file are allowed.
They are of the form reject: ... where ... is any syllable component.
rejection-definition = "reject" ":" rejection-elements line-ending ;
rejection-elements = rejection-element ("|" rejection-element)? ;
rejection-element = context-prefix? syllable-components context-suffix? ;
replacement-definition = "replace" ":" source? ">" replacement? "/" replace-condition ("//" replace-exception)? line-ending ;
source = (reference | phoneme)+ ;
replacement = phoneme+ ;
replace-condition = env ;
replace-exception = env ;
env = (context-prefix? syllable-components)? "_" (syllable-components context-suffix?)? ;
letters-definition = "letters" ":" letters line-ending ;
letters = phoneme (" " phoneme)* ;
- Create categories
- Create syllables
- Create rejection rules
- Create replacement rules
- Create letter sorting order
- Pick the syllable count
- Generate that many syllables
- Apply rejection rules
- Apply replacement rules
- Generate the word's letterization
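The generation steps above can be sketched roughly as follows. Everything here is hypothetical: `generate_word` and its callable parameters are illustrative names, the rejected word is retried up to `max_attempts` times (the text does not say what happens after a rejection), and the letterization step is left out because its details are not specified.

```python
import random


def generate_word(make_syllable, is_rejected, apply_replacements,
                  min_syllables, max_syllables, rng, max_attempts=100):
    """Generate one word, or return None if every attempt is rejected."""
    for _ in range(max_attempts):
        # Pick the syllable count, then generate that many syllables.
        count = rng.randint(min_syllables, max_syllables)
        word = "".join(make_syllable(rng) for _ in range(count))
        # Apply rejection rules (assumption: a rejected word is retried).
        if is_rejected(word):
            continue
        # Apply replacement rules to the surviving word.
        return apply_replacements(word)
    return None
```

A call might look like `generate_word(make_syllable, is_rejected, apply_replacements, 1, 3, random.Random())`, where the three callables wrap the prepared syllables, rejection rules, and replacement rules.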
A component's weight represents how much more likely it is to be selected relative to other components. Components with the same weight are equally likely to be selected. This follows the semantics of the weightedrandom package.
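As an illustration of these weight semantics, the sketch below draws one item in proportion to its weight. It uses Python's standard `random.choices` rather than the weightedrandom package, and `pick` is a hypothetical helper name.

```python
import random


def pick(weighted, rng=random):
    """Return one item from [(item, weight), ...], chosen proportionally."""
    items = [item for item, _ in weighted]
    weights = [weight for _, weight in weighted]
    return rng.choices(items, weights=weights, k=1)[0]
```

With `[("a", 3), ("b", 1)]`, "a" is three times as likely as "b", so it should come up in roughly 75% of draws; with equal weights the two would be equally likely.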
Categories should assign a weight to every phoneme during preparation. By default, every phoneme has a weight of 1. Weighted phonemes use their written weight.
When Get is called, it should return a phoneme from its collection of phonemes.
If the selected element is a reference, it should get from the referenced category instead.
Recursive/looped references (C = $A; A = $C) are not allowed.
This is checked during preparation, after the categories are generated.
Syllables should be prepared once, and run many times (assuming the source input doesn't change).
When a syllable's Get is called, it calls Get on each of its components and concatenates the results into a single string.
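The prepare-once, run-many idea could look roughly like the sketch below. The class names `Literal`, `Category`, and `Syllable`, and the lowercase `get` method taking a random generator, are assumptions made for illustration; only plain phonemes and categories are modelled, not groupings, selections, or optionals.

```python
import random


class Literal:
    """A fixed phoneme."""
    def __init__(self, text):
        self.text = text

    def get(self, rng):
        return self.text


class Category:
    """A weighted collection of components (phonemes or other categories)."""
    def __init__(self, elements):
        self.elements = elements  # list of (component, weight)

    def get(self, rng):
        components = [c for c, _ in self.elements]
        weights = [w for _, w in self.elements]
        chosen = rng.choices(components, weights=weights, k=1)[0]
        return chosen.get(rng)  # a referenced category resolves itself


class Syllable:
    """An ordered sequence of components, built once and reused."""
    def __init__(self, components):
        self.components = components

    def get(self, rng):
        return "".join(c.get(rng) for c in self.components)
```

Usage under these assumptions: build `Syllable([consonants, vowels])` once from two `Category` objects, then call `get(rng)` as many times as needed; each call returns one consonant followed by one vowel.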
Words are groupings of multiple syllables. The number of syllables in a word is dependent on the min/max syllable count options. The number of words generated is determined by the word count, "allow duplicates", and "force generate to word limit" options.
- If on, remove duplicates
- If on, generate again until the word count is reached
- If on, sort the words based on letters
- If on, display the word split into syllables
- Otherwise, display the word as one word
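The duplicate-removal and sorting options above can be sketched as follows. `sort_key` and `post_process` are hypothetical helpers. The sketch assumes that a word is split greedily into the longest matching entries of the letters directive and that characters not covered by any letter sort last; neither rule is stated in the text.

```python
def sort_key(word, letters):
    """Rank a word by the position of each of its letters in `letters`."""
    rank = {letter: index for index, letter in enumerate(letters)}
    # Try longer letters first so multi-character letters win over prefixes.
    by_length = sorted((l for l in letters if l), key=len, reverse=True)
    key, i = [], 0
    while i < len(word):
        for letter in by_length:
            if word.startswith(letter, i):
                key.append(rank[letter])
                i += len(letter)
                break
        else:
            key.append(len(letters))  # assumption: unknown characters sort last
            i += 1
    return key


def post_process(words, letters, remove_duplicates=True, sort_words=True):
    """Apply the optional duplicate-removal and sorting steps."""
    if remove_duplicates:
        words = list(dict.fromkeys(words))  # keeps first occurrences, in order
    if sort_words:
        words = sorted(words, key=lambda w: sort_key(w, letters))
    return words
```

With `letters = ["a", "ch", "c", "b"]`, the word "ch" sorts before "ca" because "ch" is treated as a single letter that precedes "c" in the sorting order.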