[REQ] Simplify token iterator #129

mhatzl · 2024-03-20T13:29:13Z

Is your feature request related to other issues/PRs?

unimarkup/specification#55 unimarkup/specification#56

Remove matching fns

Prefix matching will be done on the number of indent spaces.
Therefore, only a number needs to be stored instead of a complex matching fn.
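
A minimal sketch of count-based prefix matching, assuming for brevity that it runs on a raw line string instead of on tokens. The helper name `strip_indent` is hypothetical and not part of the current code base.

```rust
/// Hypothetical helper: removes `indent` leading spaces from `line`.
/// Returns `None` if the line starts with fewer spaces than required,
/// which means the line does not belong to the indented element.
fn strip_indent(line: &str, indent: usize) -> Option<&str> {
    // Count leading ASCII spaces only; tabs are not treated as indentation here.
    let leading = line.bytes().take_while(|b| *b == b' ').count();
    if leading >= indent {
        // Slicing is safe: the first `leading` bytes are single-byte spaces.
        Some(&line[indent..])
    } else {
        None
    }
}
```

Only the `usize` has to be stored per iterator; no closure or function pointer is involved.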

For end matching, it might be possible to provide an enum to cover all needed scenarios.
Storing the enum directly in the iterator should make parsing faster, and significantly reduce complexity. Scoping is still needed for enclosed elements.

Blankline ... Iterator ends on a blank line, or when the end of the iterator is reached.
Needed for Paragraph, Quote Block, Line Block.

NewlineMatch(Vec) ... Matches the given tokens once a newline is matched.
Needed for enclosed blocks, assuming issue unimarkup/specification#56 ([REQ] Relax enclosing block parsing) gets accepted.

BlankOrNewlineMatch(Vec) ... Either ends on a blank line, or matches the given tokens once a newline is matched.
Needed for Heading and lists, because they do not require a blank line if the next line starts with the element keyword.

Match(Token) ... Ends if the token is matched.
Needed mostly for inline elements, but also for tables.

MatchEither(Token1, Token2) ... Ends if either Token1 or Token2 matches.
Needed to handle ambiguous inline elements.
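
The variants above could be sketched roughly as follows. The `Token` enum, the name `EndMatch`, the `is_end` signature, and the reading of `Vec` as a `Vec<Token>` sequence that must directly follow a newline are all assumptions made for illustration.

```rust
/// Hypothetical, heavily reduced token type.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq)]
enum Token { Newline, Blankline, Hash, Star, Underscore, Text }

/// Hypothetical end-matching enum, stored directly in the iterator.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum EndMatch {
    Blankline,
    NewlineMatch(Vec<Token>),
    BlankOrNewlineMatch(Vec<Token>),
    Match(Token),
    MatchEither(Token, Token),
}

impl EndMatch {
    /// `prev` is the token returned last, `upcoming` starts with the token
    /// that would be returned next. An empty `upcoming` means the underlying
    /// iterator is exhausted, which ends every variant.
    fn is_end(&self, prev: Option<&Token>, upcoming: &[Token]) -> bool {
        let Some(next) = upcoming.first() else { return true; };
        let after_newline = prev == Some(&Token::Newline);
        match self {
            EndMatch::Blankline => *next == Token::Blankline,
            EndMatch::NewlineMatch(seq) => after_newline && upcoming.starts_with(seq),
            EndMatch::BlankOrNewlineMatch(seq) => {
                *next == Token::Blankline || (after_newline && upcoming.starts_with(seq))
            }
            EndMatch::Match(token) => next == token,
            EndMatch::MatchEither(a, b) => next == a || next == b,
        }
    }
}
```

Because the end condition is plain data, an iterator can check it with a single `match` instead of calling a stored matching fn.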

Try to combine block and inline iterator

The inline iterator is currently needed, because base tokens are converted to inline tokens,
and open formats are stored in a slice.

Generic token iterator

It might be possible to make the base token iterator generic.
The generic token type must have functions to determine if a token is a newline, blankline, or EOI.
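
One possible shape for this requirement is a small trait that the iterator is generic over. The names `TokenLike`, `BaseToken`, and `TokenIter` are hypothetical, and the iterator below only implements the "end on blankline or EOI" behaviour to keep the sketch short.

```rust
/// Hypothetical trait every token type has to implement.
trait TokenLike {
    fn is_newline(&self) -> bool;
    fn is_blankline(&self) -> bool;
    fn is_eoi(&self) -> bool;
}

/// Hypothetical, reduced base token.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BaseToken { Text, Newline, Blankline, Eoi }

impl TokenLike for BaseToken {
    fn is_newline(&self) -> bool { *self == BaseToken::Newline }
    fn is_blankline(&self) -> bool { *self == BaseToken::Blankline }
    fn is_eoi(&self) -> bool { *self == BaseToken::Eoi }
}

/// Iterator that works for any token type implementing `TokenLike`.
struct TokenIter<'a, T> {
    tokens: &'a [T],
    pos: usize,
    /// Number of newline tokens returned so far.
    newlines: usize,
}

impl<'a, T: TokenLike> TokenIter<'a, T> {
    fn new(tokens: &'a [T]) -> Self {
        TokenIter { tokens, pos: 0, newlines: 0 }
    }
}

impl<'a, T: TokenLike> Iterator for TokenIter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<Self::Item> {
        let token = self.tokens.get(self.pos)?;
        // Stop without consuming the token on blankline or end of input.
        if token.is_blankline() || token.is_eoi() {
            return None;
        }
        if token.is_newline() {
            self.newlines += 1;
        }
        self.pos += 1;
        Some(token)
    }
}
```

An inline token type would only need its own `TokenLike` impl to reuse the same iterator.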

With tokens being generic, a conversion layer may be added to convert base tokens to inline tokens. This has the benefit of reducing API duplication, because base and inline iterators get merged.

Use end matching for inline formats

Inline formats use an open format map to determine if a format is open or not.
This open map is needed to decide if a keyword should open or close a format.
With iterators being nested, it might be possible to add a function that checks whether a format is already open (by having a parent parser that handles the respective format), or not.
If this works, no open map would be needed for inline parsing, which makes inline parsing much easier.

To achieve this, iterators must know for what element parsing they are used.
This could be done by adding a field of type ElementKind.
To resolve ambiguous tokens, it must be possible to cache exactly one token.
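
A rough sketch of the parent lookup, assuming each nested iterator holds a reference to its parent and an `ElementKind` field. The struct name `ScopedIter`, the reduced `ElementKind` variants, and the choice to count the current iterator's own kind as open are illustrative assumptions; token handling and the one-token cache are left out.

```rust
/// Hypothetical, reduced set of element kinds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ElementKind { Paragraph, Bold, Italic }

/// Hypothetical nested iterator that knows which element it is parsing.
struct ScopedIter<'p> {
    kind: ElementKind,
    parent: Option<&'p ScopedIter<'p>>,
}

impl<'p> ScopedIter<'p> {
    fn root(kind: ElementKind) -> Self {
        ScopedIter { kind, parent: None }
    }

    fn nested(parent: &'p ScopedIter<'p>, kind: ElementKind) -> Self {
        ScopedIter { kind, parent: Some(parent) }
    }

    /// Walks from this iterator up through all parents and reports whether
    /// any of them, including this one, parses an element of `kind`.
    fn is_open(&self, kind: ElementKind) -> bool {
        let mut current = Some(self);
        while let Some(iter) = current {
            if iter.kind == kind {
                return true;
            }
            current = iter.parent;
        }
        false
    }
}
```

With such a lookup, a keyword closes a format if `is_open` returns true for it and opens a new one otherwise, so no separate open map has to be maintained.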

mhatzl · 2024-03-20T14:50:01Z

It must be possible to get the number of prefix spaces of all parent iterators for correct indentation of block quoted logic.

Unimarkup block content in the logic part is started with """.
To get indentation consistency, the prefix for all enclosed lines must be set so that the content starts at the same "visual depth" as the leftmost double quote (in left-to-right flow).

let block = """
            # Heading

            Paragraph
            """;

The number of spaces that were skipped by parent iterators makes it easy to calculate the needed indentation, because with this information, the start.col_grapheme value of the first quote token can be used.

prefix_indent = start.col_grapheme - sum_parent_indents

Without this information, the start.col_grapheme value of the first token an iterator returns after a newline would need to be kept.
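
The calculation above can be sketched as a small function. The function name and the slice of parent indents are hypothetical, and whether `start.col_grapheme` is zero- or one-based is not stated here, so the sketch just subtracts the values as given.

```rust
/// Hypothetical helper implementing
/// `prefix_indent = start.col_grapheme - sum_parent_indents`.
/// Returns `None` if the parents skipped more spaces than the column value,
/// which would indicate inconsistent input.
fn prefix_indent(start_col_grapheme: usize, parent_indents: &[usize]) -> Option<usize> {
    let sum_parent_indents: usize = parent_indents.iter().sum();
    start_col_grapheme.checked_sub(sum_parent_indents)
}
```

For example, with a first quote token at column 12 and parent iterators that skipped 4 and 2 spaces, the enclosed lines get a prefix of 6 spaces.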

mhatzl added the waiting-on-assignee label (Issue/PR author or reviewer is awaiting response from assignee) · Mar 20, 2024
