Writing Extensions

Vladimir Schneider edited this page Jan 20, 2017 · 34 revisions

Extensions need to extend the parser, or the HTML renderer, or both. To use an extension, the builder objects can be configured with a list of extensions. Because extensions are optional, they live in separate artifacts, so additional dependencies need to be added as well.

The best way to create an extension is to start with a copy of an existing one and modify the source and tests to suit the new extension.

Parsing Process

Parsing proceeds in distinct steps, with the ability to add custom processing at each step.

Flexmark Parsing

  1. Text in the source is broken up into Block nodes. Block parsers decide whether the current line begins a block they recognize as one they create. Any lines not claimed by some block parser are claimed by the core ParagraphParser.

    A block parser can claim lines currently accumulated by the paragraph parser with the ability to replace it as the currently active parser.

  2. Paragraph blocks are processed to remove leading lines that should not be processed as text. For example, link references are processed and removed from the paragraphs. Only full lines should be processed at this step; partial removal of text should be done at the next step.

    The inline parser instance is available to blocks at this time to allow them to use the API to help process the text. Processors that require inline processing of their contents should be run after the core reference link paragraph processor, otherwise some link refs will not be recognized because they are not defined yet. A custom processor factory should return ReferencePreProcessorFactory.class from getAfterDependents(). Similarly, if a paragraph processor needs to run before another pre-processor then it should return the processor's factory class from getBeforeDependents().

    Any circular dependencies will cause an IllegalStateException to be thrown during the builder preparation stage.

    The paragraph pre-processing is divided into stages of paragraph pre-processors to allow for dependencies to be respected.

    Paragraph pre-processors can also report that they affectsGlobalStat(), which means that some document properties are affected by the result of their processing. For example, ReferencePreProcessorFactory does so because reference link definitions affect the REFERENCES node repository.

    Since a global processor will be run on all the paragraphs in the document before one of its dependents is allowed to process any paragraphs, global processors will be the only processor in their respective pre-processing stage.

    Non global processors within the same stage will be run sequentially on each paragraph block until no more changes to the paragraph are made. This means that non-global processors of the same stage are allowed to have intermixed content, while global ones will only be run once for each paragraph. Non-global processors dependent on one or more global processors will be run at the first available stage after all their global dependencies have completed processing.

    The order of pre-processors within a stage will be based on dependencies between processors and where not so constrained on the registration order of their corresponding extensions.

    ⚠️ It is best to implement the desired customization by using block parsers rather than paragraph pre-processors. Use the latter only if proper interpretation is not possible without using the inline parser API. Using the inline parser API during block parsing is a serious performance issue.

  3. Block pre-processing is strictly for custom processors. At this point block nodes can be added, replaced or removed. Any other nodes can also be added to the AST; however, no inline blocks have been created at this point.

    Node creation and removal should be communicated to the ParserState instance via its implementation of the BlockParserTracker and BlockTracker interfaces. This is necessary to allow the internal parser optimization structures to be updated so that further block pre-processing will proceed correctly.

  4. During inline processing each block is given the chance to process inline elements contained in its text node or nodes. There are two types of customizations available at this step: link ref processing and delimiter processing. Delimiters are runs of text that have a start and end character to determine their span. Delimiters may be nested and have a minimum and maximum level of nesting.

    Link ref processors are responsible for processing custom elements that are recognized as possible link refs, i.e. they are delimited by ![ or [ and terminated by ]. Link ref processors determine whether brackets can be nested, whether the ! should be processed as text or as part of their node, and whether they accept the potential link ref text as their node. The full text, ![] or [], is given for approval to give maximum flexibility in handling contained white space.

    Footnote [^footnote] and Wiki link [[wiki link]] extensions make use of this extension mechanism.

  5. Post processing step is for final AST modifications. Post processors come in two varieties: node and document post processors. Although the PostProcessor interface is used for both types, a post processor can only be one or the other.

  6. HTML rendering step. Renders the final AST. Extensions provide default renderers for their custom nodes. Rendering for any node can be customized by replacing the default renderer or through an attribute provider that will override HTML element attributes for default renderers. LinkResolvers are responsible for converting the link url text from the text in the markdown element to the rendered URL.

    Node Post Processors

    Node post processors specify the classes of nodes they want to post process, with ancestor exclusion criteria. The process(NodeTracker, Node) function of the processor will be called for every AST node that matches the given criteria.

    Any modifications to the AST must be communicated to the NodeTracker instance, which is responsible for updating the internal structures used to optimize node selection for processors.

    Specifically, each new node added or moved in the AST hierarchy will need to have its ancestor list updated for further node post processing. These notification functions should be called after the particular hierarchy change is complete, to eliminate unnecessary updates for intermediate AST changes.

    Nodes that contain child nodes which are new or have been moved from their previous parents should be reported via nodeAddedWithChildren(Node), rather than by calling nodeAdded(Node) for each individual node. Similarly, deeper changes should be communicated via the nodeAddedWithDescendants(Node) notification.

    Complete node removals should be communicated via the nodeRemoved() function, after all of the node's children that need to be moved elsewhere have been removed from it.

    ⚠️ All node removal functions will perform node removal of the node and all its descendants since any child nodes of an unlinked node are removed from the AST.

    Document Post Processors

    Document post processors are invoked using the processDocument(Document) member function and the returned document will be used as the document for further processing.

    Document processors are responsible for finding nodes of interest by recursively traversing the AST. For this reason, using a document post processor should only be done when processing cannot be done on individual nodes.

    Although traversing the AST for a single extension is faster than creating, maintaining and accessing the optimization structures in the parser, doing so for as few as two extensions on a large document is much slower.

    The performance gain of node post processors is especially pronounced for extensions that exclude nodes based on their ancestor types in the AST. For node post processors this hierarchy is determined during a single traversal of the AST to build all the node tracking structures. If the extension determines ancestry by walking back through getParent() for each node, this becomes very inefficient on large documents.

  7. Include File Support allows extensions to copy their custom reference defining elements from an included document to the including document, so that any included custom elements requiring definition of these references will be resolved correctly before rendering. This is an optional step that should be performed by the application between parsing the document and rendering it. See Include Markdown and HTML content with Jekyll Tag Extension.
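The staging rules for paragraph pre-processors described in step 2 can be illustrated with a small standalone sketch. This is not flexmark's implementation: the StageSketch class, its Proc type and the resolve method are hypothetical names, and the staging is simplified. It only shows the three properties named in the text: dependencies are respected, registration order breaks ties, each global processor sits alone in its stage, and an unresolvable (circular) dependency raises IllegalStateException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StageSketch {
    /** Hypothetical processor description: name, global flag, names it must run after. */
    static final class Proc {
        final String name;
        final boolean global;
        final List<String> after;

        Proc(String name, boolean global, String... after) {
            this.name = name;
            this.global = global;
            this.after = Arrays.asList(after);
        }
    }

    /** Orders processors by dependency, then splits the order into stages. */
    static List<List<String>> resolve(List<Proc> registered) {
        List<Proc> ordered = new ArrayList<>();
        Set<String> done = new HashSet<>();
        List<Proc> pending = new ArrayList<>(registered);

        while (!pending.isEmpty()) {
            Proc next = null;
            // The first pending processor (in registration order) whose dependencies are all placed.
            for (Proc p : pending) {
                if (done.containsAll(p.after)) {
                    next = p;
                    break;
                }
            }
            if (next == null) {
                throw new IllegalStateException("circular or unresolved pre-processor dependency");
            }
            pending.remove(next);
            ordered.add(next);
            done.add(next.name);
        }

        List<List<String>> stages = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (Proc p : ordered) {
            if (p.global) {
                // A global processor closes the running stage and gets a stage of its own.
                if (!current.isEmpty()) {
                    stages.add(current);
                    current = new ArrayList<>();
                }
                stages.add(Collections.singletonList(p.name));
            } else {
                current.add(p.name);
            }
        }
        if (!current.isEmpty()) stages.add(current);
        return stages;
    }

    public static void main(String[] args) {
        List<Proc> procs = Arrays.asList(
                new Proc("custom", false, "references"),
                new Proc("references", true),
                new Proc("other", false, "custom"));
        System.out.println(resolve(procs)); // prints [[references], [custom, other]]
    }
}
```

In this sketch "custom" is registered first but still lands after the global "references" stage, which mirrors the advice to run after ReferencePreProcessorFactory when link refs must already be defined.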

Source Tracking

To track source location in the AST, all parsing is performed using the BasedSequence class, which extends CharSequence and wraps the original source character sequence of the document with start and end offsets to represent its own contents. subSequence() returns another BasedSequence instance with the original base sequence and new start/end offsets.

In this way the source location representing any string being parsed can be obtained using the BasedSequence.getStartOffset() and BasedSequence.getEndOffset(). At the same time parsing is no more complicated than working with CharSequence. Any string stored in the AST has to be a subSequence() of the original source. This constraint makes sense since the AST represents the source.
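The idea of a sub-sequence that remembers its position in the original source can be sketched in a few lines of plain Java. The OffsetSequence class below is a hypothetical stand-in written for illustration, not flexmark's BasedSequence; it only demonstrates that chained subSequence() calls keep offsets relative to the original base.

```java
class OffsetSequence implements CharSequence {
    private final CharSequence base; // the whole original source
    private final int start;         // inclusive offset into base
    private final int end;           // exclusive offset into base

    private OffsetSequence(CharSequence base, int start, int end) {
        this.base = base;
        this.start = start;
        this.end = end;
    }

    static OffsetSequence of(CharSequence source) {
        return new OffsetSequence(source, 0, source.length());
    }

    int getStartOffset() { return start; }

    int getEndOffset() { return end; }

    @Override
    public int length() { return end - start; }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length()) throw new IndexOutOfBoundsException("index " + index);
        return base.charAt(start + index);
    }

    @Override
    public OffsetSequence subSequence(int from, int to) {
        if (from < 0 || to > length() || from > to) {
            throw new IndexOutOfBoundsException(from + ".." + to);
        }
        // Same base, new offsets: the result still knows where it came from.
        return new OffsetSequence(base, start + from, start + to);
    }

    @Override
    public String toString() { return base.subSequence(start, end).toString(); }

    public static void main(String[] args) {
        OffsetSequence doc = OffsetSequence.of("# Heading\n\nSome *text* here");
        OffsetSequence paragraph = doc.subSequence(11, doc.length());
        OffsetSequence emphasis = paragraph.subSequence(5, 11);
        // prints "*text* @ 16-22"
        System.out.println(emphasis + " @ " + emphasis.getStartOffset() + "-" + emphasis.getEndOffset());
    }
}
```

Code that parses the paragraph works with indices relative to the paragraph, yet the resulting node text still reports offsets into the whole document.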

The fly in the ointment is that parsing unescaped text is a bit more involved, since it is the escaped original which must be added to the AST. For this, methods were added to the Escaping utility class that take a BasedSequence and a ReplacedTextMapper. The returned result is a modified sequence whose contents can be mapped to the original source using the methods of the ReplacedTextMapper object. This allows parsing of massaged text while keeping the ability to extract the un-massaged counterpart for placement in the AST. See the implementation in the flexmark-ext-autolink AutolinkNodePostProcessor for an example of how this is achieved in a working extension.

Similarly, when using regex matching you cannot simply take the string returned by group() but must extract a subSequence from the input using the start/end offsets for the group. The best way to do this is to take a subSequence() from the original sequence that was used to create the matcher(). Examples of this are abundant in the core parser implementation.
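A minimal sketch of the regex pattern described above, using only java.util.regex and a plain CharSequence. The GroupSpan class and its pattern are made up for illustration. With a source-tracking sequence as input, the subSequence() call would preserve the source offsets that group() discards.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupSpan {
    // Illustrative pattern: text between a pair of square brackets.
    private static final Pattern REF = Pattern.compile("\\[([^\\]]+)\\]");

    /** Returns group 1 as a sub-sequence of the input, or null when nothing matches. */
    static CharSequence firstRefText(CharSequence input) {
        Matcher m = REF.matcher(input);
        if (!m.find()) return null;
        // m.group(1) would return a detached String. Taking the sub-sequence from the
        // sequence the matcher was created on keeps whatever tracking that sequence has.
        return input.subSequence(m.start(1), m.end(1));
    }

    public static void main(String[] args) {
        System.out.println(firstRefText("see [ref] here")); // prints "ref"
    }
}
```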

This is a small price to pay for having complete source reference in the AST and ease of parsing without having to carry dedicated separate state to represent source position or use dedicated grammar tools.

Source tracking in the core was complicated by leading tab expansion and prefix removal from parsed lines with later concatenation of these partial results for inline parsing, which too must track the original source position. This is addressed with additional BasedSequence implementation classes: PrefixedSubSequence for partially used tabs and SegmentedSequence for concatenated sequences. The result is almost a transparent propagation of source position throughout the parsing process.

If there are any missed or erroneous settings in the AST then these should be caught by tests that also validate the generated AST.

Configuring Options

A generic options API was added to allow easy configuration of the parser, renderer and extensions. It consists of DataKey<T> instances defined by various components. Each data key defines the type of its value and a default value. A DynamicDefaultKey<T> will not return a default value generated in the constructor but will invoke the default value factory function every time the default value is requested. This is useful if you want the default value to be the current value of another key in the DataHolder.

The values are accessed via the DataHolder and MutableDataHolder interfaces, with the former being a read only container. Since the data key provides a unique identifier for the data there is no collision for options.
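The relationship between typed keys, defaults and dynamic defaults can be sketched without flexmark. Everything below (OptionsSketch, Key, Holder, ESCAPE_ALL, ESCAPE_BLOCKS) is a hypothetical miniature written for illustration, not the DataKey or DataHolder API. It shows a key whose default follows the current value of another key, in the way the ESCAPE_HTML_BLOCKS default is described as "value of ESCAPE_HTML".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class OptionsSketch {
    /** Hypothetical typed key; the key object itself is the unique identifier. */
    static final class Key<T> {
        final String name;
        final Function<Holder, T> defaultFactory;

        private Key(String name, Function<Holder, T> defaultFactory) {
            this.name = name;
            this.defaultFactory = defaultFactory;
        }

        /** Key with a fixed default value. */
        static <T> Key<T> of(String name, T defaultValue) {
            return new Key<>(name, holder -> defaultValue);
        }

        /** Key whose default is recomputed from the holder on every request. */
        static <T> Key<T> dynamic(String name, Function<Holder, T> defaultFactory) {
            return new Key<>(name, defaultFactory);
        }
    }

    /** Hypothetical mutable option container. */
    static final class Holder {
        private final Map<Key<?>, Object> values = new HashMap<>();

        <T> Holder set(Key<T> key, T value) {
            values.put(key, value);
            return this;
        }

        @SuppressWarnings("unchecked")
        <T> T get(Key<T> key) {
            if (values.containsKey(key)) return (T) values.get(key);
            return key.defaultFactory.apply(this);
        }
    }

    static final Key<Boolean> ESCAPE_ALL = Key.of("ESCAPE_ALL", false);
    static final Key<Boolean> ESCAPE_BLOCKS =
            Key.dynamic("ESCAPE_BLOCKS", holder -> holder.get(ESCAPE_ALL));

    public static void main(String[] args) {
        Holder options = new Holder();
        System.out.println(options.get(ESCAPE_BLOCKS)); // false: follows ESCAPE_ALL's default
        options.set(ESCAPE_ALL, true);
        System.out.println(options.get(ESCAPE_BLOCKS)); // true: follows the value now set
        options.set(ESCAPE_BLOCKS, false);
        System.out.println(options.get(ESCAPE_BLOCKS)); // false: explicit value wins
    }
}
```

Because the map is keyed by the key object rather than by its name, two components can define keys with the same name without colliding.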

The Parser.EXTENSIONS option holds a list of extensions to use for the Parser and HtmlRenderer. This allows configuring the parser and renderer with a single set of options.

To configure the parser or renderer, pass a data holder to the builder() method.

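import java.util.Arrays;

// flexmark imports (Parser, HtmlRenderer, TablesExtension, MutableDataHolder,
// MutableDataSet, KeepType) are omitted here.
public class SomeClass {
    // One option set shared by the parser and the renderer.
    static final MutableDataHolder OPTIONS = new MutableDataSet()
            .set(Parser.REFERENCES_KEEP, KeepType.LAST)   // keep the last duplicate reference definition
            .set(HtmlRenderer.INDENT_SIZE, 2)             // indent nested tags by 2 spaces
            .set(HtmlRenderer.PERCENT_ENCODE_URLS, true)  // percent encode urls
            .set(Parser.EXTENSIONS, Arrays.asList(TablesExtension.create()));

    static final Parser PARSER = Parser.builder(OPTIONS).build();
    static final HtmlRenderer RENDERER = HtmlRenderer.builder(OPTIONS).build();
}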

In the code sample above, Parser.REFERENCES_KEEP defines the behavior of references when duplicate references are defined in the source. In this case it is configured to keep the last value, whereas the default behavior is to keep the first value.

The HtmlRenderer.INDENT_SIZE and HtmlRenderer.PERCENT_ENCODE_URLS define options to use for rendering. Similarly, other extension options can be added at the same time. Any options not set will default to their respective defaults as defined by their data keys.

All markdown element reference types should be stored using a subclass of NodeRepository<T> as is the case for references, abbreviations and footnotes. This provides a consistent mechanism for overriding the default behavior of these references for duplicates from keep first to keep last.
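The keep-first versus keep-last behavior for duplicate definitions can be sketched as follows. KeepSketch, Keep and Repository are hypothetical names and model only the two policies mentioned in the text; this is not the NodeRepository<T> class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeepSketch {
    /** Only the two policies discussed in the text are modelled here. */
    enum Keep { FIRST, LAST }

    /** Hypothetical repository of definitions keyed by reference id. */
    static final class Repository<T> {
        private final Map<String, T> entries = new LinkedHashMap<>();
        private final Keep keep;

        Repository(Keep keep) {
            this.keep = keep;
        }

        /** Stores a definition; on a duplicate id the policy decides which one survives. */
        void define(String id, T value) {
            if (keep == Keep.LAST || !entries.containsKey(id)) {
                entries.put(id, value);
            }
        }

        T lookup(String id) {
            return entries.get(id);
        }
    }

    public static void main(String[] args) {
        Repository<String> first = new Repository<>(Keep.FIRST);
        first.define("ref", "/one");
        first.define("ref", "/two");
        System.out.println(first.lookup("ref")); // prints "/one"

        Repository<String> last = new Repository<>(Keep.LAST);
        last.define("ref", "/one");
        last.define("ref", "/two");
        System.out.println(last.lookup("ref")); // prints "/two"
    }
}
```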

By convention, data keys are defined in the extension class and in the case of the core in the Parser or HtmlRenderer.

⚠️ The DataHolder argument passed to the DataValueFactory::create() method will be null when creating a read-only default value instance for use by the key. The class constructor should be able to handle this case seamlessly. To make it convenient to implement such classes, use the DataKey::getFrom(DataHolder) method instead of the DataHolder::get(DataKey) method to access the values of interest. The former will provide the key's default value if the data holder argument is null, the latter will generate a run time java.lang.ExceptionInInitializerError error.

Option data keys for the Parser:

| Static Field | Default Value | Description |
| --- | --- | --- |
| ASTERISK_DELIMITER_PROCESSOR | true | enable asterisk delimiter inline processing |
| BLOCK_QUOTE_PARSER | true | enable parsing of block quotes |
| BLOCK_QUOTE_IGNORE_BLANK_LINE | false | block quotes will include blank lines between block quotes and treat them as if the blank lines are also preceded by the block quote marker |
| BLOCK_QUOTE_TO_BLANK_LINE | false | block quotes extend to blank line when true. Enables more customary block quote parsing than commonmark strict standard |
| EXTENSIONS | empty list | list of extensions to use for builders. Can use this option instead of passing extensions to parser builder and renderer builder. |
| FENCED_CODE_BLOCK_PARSER | true | enable parsing of fenced code blocks |
| HEADING_NO_ATX_SPACE | false | allow headers without a space between # and the header text if true |
| HEADING_NO_LEAD_SPACE | false | do not allow non-indent spaces before # for atx headers and text or -/= marker for setext, if true (pegdown and GFM), if false commonmark rules. |
| HEADING_PARSER | true | enable parsing of headings |
| HEADING_SETEXT_MARKER_LENGTH | 1 | sets the minimum number of - or = needed under a setext heading text before it is recognized as a heading. |
| HTML_BLOCK_PARSER | true | enable parsing of html blocks |
| INDENTED_CODE_BLOCK_PARSER | true | enable parsing of indented code block |
| INDENTED_CODE_NO_TRAILING_BLANK_LINES | false | enable removing trailing blank lines from indented code blocks |
| INTELLIJ_DUMMY_IDENTIFIER | false | add '\u001f' to all parse patterns as an allowable character, used by plugin to allow for IntelliJ completion location marker |
| LIST_BLOCK_PARSER | true | enable parsing of lists |
| LISTS_AUTO_LOOSE | true | enable setting list to loose when a single item in the list is loose |
| LISTS_BULLET_ITEM_INTERRUPTS_PARAGRAPH | true | controls whether bullet lists are allowed to interrupt paragraphs, i.e. lists do not need to start with a blank line. |
| LISTS_BULLET_ITEM_INTERRUPTS_ITEM_PARAGRAPH | true | controls whether bullet list items are allowed to interrupt other item's text paragraphs, i.e. second items and sublist items do not need to have a preceding blank line |
| LISTS_BULLET_MATCH | true | enable starting a new bullet list when list marker does not match |
| LISTS_END_ON_DOUBLE_BLANK | false | enable closing all nested lists on double blank line. Old markdown spec |
| LISTS_FIXED_INDENT | 0 | set list item indent to a fixed number of spaces, 0 means item indent is the list item prefix or 4 spaces, whichever is smaller |
| LISTS_ITEM_TYPE_MATCH | true | enable starting a new list if list item types are different or treating the mismatch as a sub-item, i.e. one is a bullet item the other ordered item, when false a bullet list will contain ordered items, and vice versa unless LISTS_ITEM_MISMATCH_TO_SUBITEM is true |
| LISTS_ITEM_MISMATCH_TO_SUBITEM | false | enable treating a mismatched item as a sub-item instead of starting a new list |
| LISTS_LOOSE_ON_PREV_LOOSE_ITEM | false | enable setting the next list item as loose when making an item loose. This makes list items mimic GFM quirky list item parsing. |
| LISTS_ORDERED_ITEM_INTERRUPTS_PARAGRAPH | true | controls whether ordered lists are allowed to interrupt paragraphs, i.e. lists do not need to start with a blank line. |
| LISTS_ORDERED_ITEM_INTERRUPTS_ITEM_PARAGRAPH | true | controls whether ordered list items are allowed to interrupt other item's text paragraphs, i.e. second items and sublist items do not need to have a preceding blank line |
| LISTS_ORDERED_LIST_MANUAL_START | true | enable lists to start with first item's number taken from the markdown item marker, else always 1 |
| LISTS_ORDERED_NON_ONE_ITEM_INTERRUPTS_PARAGRAPH | false | control whether items numbered >1. can break paragraphs. |
| LISTS_ORDERED_NON_ONE_ITEM_INTERRUPTS_PARENT_ITEM_PARAGRAPH | false | control whether sub items numbered >1. can break parent item's item text paragraphs. |
| MATCH_CLOSING_FENCE_CHARACTERS | true | whether the closing fence character has to match opening character, when false then back ticks can open and tildes close and vice versa. The number of characters in the opener and close still have to be the same. |
| MATCH_NESTED_LINK_REFS_FIRST | true | custom link ref processors that take nested [] have priority over ones that do not, i.e. [[^f]][test] is a wiki link with ^f as page ref followed by ref link test when this option is true. If false then the same would be a ref link test with a footnote ^f reference for text |
| ORDERED_LIST_DOT_ONLY | false | ordered list items can only use . as delimiter |
| ORDERED_LIST_INTERRUPTS_PARAGRAPH | true | enable an ordered list to interrupt a paragraph. if ORDERED_LIST_START is true then it will only interrupt a paragraph when list item number is 1. |
| ORDERED_LIST_START | true | enables lists being generated with a start="" attribute if their number is >1, when true and ORDERED_LIST_INTERRUPTS_PARAGRAPH is true then any numbered item will interrupt a paragraph |
| ORDERED_SUBITEM_INTERRUPTS_PARENT_ITEM | false | allow an ordered sub-item to interrupt the parent item's paragraph, even if ORDERED_LIST_INTERRUPTS_PARAGRAPH is set to false |
| PARSE_INNER_HTML_COMMENTS | false | when true will parse inner HTML comments in HTML blocks |
| PARSE_MULTI_LINE_IMAGE_URLS | false | not implemented yet, but when it is will allow parsing of multi line urls, ![](something? ...... ) spread over several lines, as part of a single paragraph. |
| REFERENCE_PARAGRAPH_PRE_PROCESSOR | true | enable parsing of reference definitions |
| REFERENCES | new repository | repository for document's reference definitions |
| REFERENCES_KEEP | KeepType.FIRST | which duplicates to keep. |
| THEMATIC_BREAK_PARSER | true | enable parsing of thematic breaks |
| THEMATIC_BREAK_RELAXED_START | true | enable parsing of thematic breaks which are not preceded by a blank line |
| UNDERSCORE_DELIMITER_PROCESSOR | true | whether to process underscore delimiters |

Option data keys for the HtmlRenderer:

| Static Field | Default Value | Description |
| --- | --- | --- |
| DO_NOT_RENDER_LINKS | false | Disable link rendering in the document. This will cause sub-contexts to also have link rendering disabled. |
| ESCAPE_HTML | false | escape all html found in the document |
| ESCAPE_HTML_BLOCKS | value of ESCAPE_HTML | escape html blocks found in the document |
| ESCAPE_HTML_COMMENT_BLOCKS | value of ESCAPE_HTML_BLOCKS | escape html comment blocks found in the document. |
| ESCAPE_INLINE_HTML | value of ESCAPE_HTML | escape inline html found in the document |
| ESCAPE_INLINE_HTML_COMMENTS | value of ESCAPE_HTML_BLOCKS | escape inline html comments found in the document |
| GENERATE_HEADER_ID | false | Generate a header id attribute using the configured HtmlIdGenerator but not render it. Use this when an |
| HARD_BREAK | "<br />\n" | string to use for rendering hard breaks |
| INDENT_SIZE | 0 | how many spaces to use for each indent level of nested tags |
| FENCED_CODE_LANGUAGE_CLASS_PREFIX | "language-" | prefix used for generating the <code> class for a fenced code block, only used if info is not empty |
| PERCENT_ENCODE_URLS | false | percent encode urls |
| RENDER_HEADER_ID | false | Render a header id attribute for headers using the configured HtmlIdGenerator |
| SOFT_BREAK | "\n" | string to use for rendering soft breaks |
| SOURCE_POSITION_ATTRIBUTE | "" | name of the source position HTML attribute, source position is assigned as startOffset + '-' + endOffset |
| SOURCE_POSITION_PARAGRAPH_LINES | false | if true enables wrapping individual paragraph source lines in span with source position attribute set |
| SUPPRESS_HTML | false | suppress html output for all html |
| SUPPRESS_HTML_BLOCKS | value of SUPPRESS_HTML | suppress html output for html blocks |
| SUPPRESS_HTML_COMMENT_BLOCKS | value of SUPPRESS_HTML_BLOCKS | suppress html output for html comment blocks |
| SUPPRESS_INLINE_HTML | value of SUPPRESS_HTML | suppress html output for inline html |
| SUPPRESS_INLINE_HTML | false | |
| SUPPRESS_INLINE_HTML_COMMENTS | value of SUPPRESS_INLINE_HTML | suppress html output for inline html comments |
| TYPE | "HTML" | renderer type. Renderer type extensions can add their own. JiraConverterExtension defines JIRA |

Changes to Extension API from commonmark-java

  1. PhasedNodeRenderer and ParagraphPreProcessor interfaces were added with associated Builder methods for extending the parser.

    PhasedNodeRenderer allows an extension to generate HTML for various parts of the HTML document. These phases are listed in the order of their occurrence during document rendering:

    • HEAD_TOP

    • HEAD

    • HEAD_CSS

    • HEAD_SCRIPTS

    • HEAD_BOTTOM

    • BODY_TOP

    • BODY

    • BODY_BOTTOM

    • BODY_LOAD_SCRIPTS

    • BODY_SCRIPTS

    BODY phase is the standard HTML generation phase using the NodeRenderer::render(Node ast) method. It is called for every node in the document.

    The other phases are only called on the Document root node and only for custom renderers that implement the PhasedNodeRenderer interface. They are rendered through the PhasedNodeRenderer::render(Node ast, RenderingPhase phase) method.

    The extension can call context.render(ast) and context.renderChildren(ast) during any rendering phase. The functions will process the node as they do during the BODY rendering phase. The FootnoteExtension uses the BODY_BOTTOM phase to render the footnotes referenced within the page. Similarly, Table of Contents extension can use the BODY_TOP phase to insert the table of contents at the top of the document.

    The HEAD... phases are not used by any extension but can be used to generate a full HTML document, with style sheets and scripts.

  2. CustomBlockParserFactory, BlockParserFactory and BlockParser are used to extend the parsing of blocks that handle partitioning of the document into blocks, which are then parsed for inlines and post processed.

  3. ParagraphPreProcessor and ParagraphPreProcessorFactory interfaces allow customization of pre-processing of block elements at the time they are closed by the parser. This is done by the ParagraphParser to extract leading reference definition from the paragraph. Special handling of ParagraphParser block was removed from the parser and instead a generic mechanism was added to allow any BlockParser to perform similar functionality and to allow adding custom pre-processors to handle elements other than the built in reference definitions.

  4. BlockPreProcessor and BlockPreProcessorFactory interfaces allow pre-processing of blocks after ParagraphPreProcessor instances have run but before inline parsing is performed. Useful if you want to replace a standard node with a custom one based on its context or children but not inline element information. Currently this mechanism is not used. May be removed in the future if it does not prove to be useful.

  5. Document level, extensible properties were added to allow extensions to have document level properties which are available during rendering. While parsing, these are available from ParserState::getProperties() of the state parameter; during post-processing and rendering they are available from the Document node, reachable via the getDocument() method of any Node.

    The DocumentParser and Document properties will also contain options passed or defined on the Parser.builder() object, in addition to any added in the process of parsing the document.

    ⚠️ HtmlRenderer options are only available on the rendering context object. NodeRenderer extensions should check for their options using the NodeRendererContext.getOptions() not the getDocument() method. If HtmlRenderer was customized with options which were not passed to Parser.Builder then these options will not be available through the document properties. The node renderer context options will contain all custom options defined for HtmlRenderer.builder() and all document properties, which will contain all options passed to the Parser.builder() plus any defined during the parsing process. If an option is customized or defined in the renderer, its value from the document will not be accessible. For these you will need to use the document available through the rendering context getDocument() method.

    DataKey defines the property, its type and default value instantiation. DataHolder and MutableDataHolder interfaces are used to access or set properties, respectively.

    NodeRepository is an abstract class used to create repositories for nodes: references, footnotes and abbreviations.

  6. Since the AST now represents the source of the document not the HTML to be rendered, the text stored in the AST must be as it is in the source. This means that all un-escaping and resolving of references has to be done during the rendering phase. For example a footnote reference to an undefined footnote will be rendered as if it was a Text node, including any emphasis embedded in the footnote id. If the footnote reference is defined it will render both as expected.

    Handling of the disparate end of line sequences used in the source must now also be done in the rendering phase. This means that text which contains end of lines must be normalized before it is rendered, since it is no longer normalized during parsing.

    This extra processing is not difficult to implement since the necessary member methods were added to the BasedSequence class, which is used to represent all text in the AST.

  7. Nodes do not define accept(Visitor) method. Instead visitor handling is delegated via VisitHandler instances and NodeVisitor derived classes.
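The rendering phases from item 1 can be pictured with a standalone sketch. PhaseSketch, Phase and PhasedRenderer are hypothetical names, and plain strings stand in for AST nodes; this is not the PhasedNodeRenderer API. It shows only the control flow described above: the BODY phase renders every node, every other phase is offered once to the phased renderers, and a footnote-like renderer contributes output in BODY_BOTTOM.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PhaseSketch {
    /** Phases in the order listed in the text. */
    enum Phase {
        HEAD_TOP, HEAD, HEAD_CSS, HEAD_SCRIPTS, HEAD_BOTTOM,
        BODY_TOP, BODY, BODY_BOTTOM, BODY_LOAD_SCRIPTS, BODY_SCRIPTS
    }

    /** Hypothetical renderer that is offered every non-BODY phase once per document. */
    interface PhasedRenderer {
        void renderPhase(Phase phase, StringBuilder out);
    }

    /** Paragraph strings stand in for the document's nodes. */
    static String render(List<String> paragraphs, List<PhasedRenderer> phased) {
        StringBuilder out = new StringBuilder();
        for (Phase phase : Phase.values()) {
            if (phase == Phase.BODY) {
                // Standard rendering: called for every node.
                for (String text : paragraphs) {
                    out.append("<p>").append(text).append("</p>\n");
                }
            } else {
                // Document-level phases: called once, only for phased renderers.
                for (PhasedRenderer renderer : phased) {
                    renderer.renderPhase(phase, out);
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        PhasedRenderer footnotes = (phase, out) -> {
            if (phase == Phase.BODY_BOTTOM) {
                out.append("<div class=\"footnotes\"></div>\n");
            }
        };
        System.out.print(render(Arrays.asList("one", "two"), Collections.singletonList(footnotes)));
    }
}
```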

Parser

Unified options handling was added, which can also be used to selectively disable loading of core processors for greater customization.

Parser.builder() now implements MutableDataHolder so you can use get/set to customize properties.

New extension points for the parser:

  • ParagraphPreProcessor is used by the ParagraphBlock to extract reference definitions from the beginning of the paragraph, but can be used by any other block for the same purpose. Any custom block pre-processors will be called first, in order of their registration. Multiple calls may result since removal of some text can expose text for another pre-processor. Block pre-processors are called until no changes to the block are made.

  • InlineParserFactory is used to override the default inline parser. Only one custom inline parser factory can be set. If none are set then the default will be used.

  • LinkRefProcessor is used to create custom elements that syntactically derive from link refs: [] or ![]. This will work correctly for nested [] in the element and allows for treating the leading ! as plain text if the custom element does not use it. Footnotes ([^footnote ref]) and wiki links ([[]] or [[text|link]]) are examples of such elements.
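The decisions a link ref processor makes can be sketched with a simplified stand-in. BracketProcessor, its fields and classify are hypothetical names chosen for illustration; this is not the LinkRefProcessor interface. The sketch covers three points from the text: processors that accept nested brackets are tried first, a leading ! is treated as plain text when the processor does not want it, and each processor approves or rejects the full bracketed text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LinkRefSketch {
    /** Hypothetical processor description for a bracket-delimited custom element. */
    static final class BracketProcessor {
        final String name;
        final String open;
        final String close;
        final boolean wantsExclamationPrefix;
        final int nestingLevel;

        BracketProcessor(String name, String open, String close,
                         boolean wantsExclamationPrefix, int nestingLevel) {
            this.name = name;
            this.open = open;
            this.close = close;
            this.wantsExclamationPrefix = wantsExclamationPrefix;
            this.nestingLevel = nestingLevel;
        }

        /** Approves the full text, brackets included. */
        boolean isMatch(CharSequence fullText) {
            String s = fullText.toString();
            return s.length() >= open.length() + close.length()
                    && s.startsWith(open) && s.endsWith(close);
        }
    }

    static final BracketProcessor FOOTNOTE = new BracketProcessor("footnote", "[^", "]", false, 0);
    static final BracketProcessor WIKI_LINK = new BracketProcessor("wiki link", "[[", "]]", false, 1);

    static final List<BracketProcessor> PROCESSORS =
            new ArrayList<>(Arrays.asList(FOOTNOTE, WIKI_LINK));

    static {
        // Processors that take nested brackets are tried first.
        PROCESSORS.sort((a, b) -> Integer.compare(b.nestingLevel, a.nestingLevel));
    }

    static String classify(String text) {
        for (BracketProcessor p : PROCESSORS) {
            String candidate = text;
            if (text.startsWith("!") && !p.wantsExclamationPrefix) {
                candidate = text.substring(1); // the ! stays plain text for this processor
            }
            if (p.isMatch(candidate)) return p.name;
        }
        return "link ref"; // left to the default link ref handling
    }

    public static void main(String[] args) {
        for (String text : Arrays.asList("[^note]", "[[Page]]", "![^note]", "[plain]")) {
            System.out.println(text + " -> " + classify(text));
        }
    }
}
```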

Renderer

Unified options handling was added; existing configuration options were kept, but they now modify the corresponding unified property.

Renderer Builder() now has an indentSize(int) method to set the size of indentation for hierarchical tags. This is the same as setting the HtmlRenderer.INDENT_SIZE data key in options.

All the HtmlWriter methods now return this so method chaining can be used. Additionally, tag() and indentedTag() methods that take a Runnable will automatically close the tag, and un-indent after the run() method is executed. This makes seeing the HTML hierarchy easier in the rendered output.

Instead of writing out all the opening, closing tags and attributes individually:

class CustomNodeRenderer implements NodeRenderer {
    @Override
    public void render(NodeRendererContext context, HtmlWriter html, BlockQuote node) {
        html.line();
        html.tag("blockquote", getAttrs(node));
        html.line();
        visitChildren(node);
        html.line();
        html.tag("/blockquote");
        html.line();
    }
}

You can combine them and use a lambda to render the children; that way indentation and the closing tag are handled automatically:

class CustomNodeRenderer implements NodeRenderer {
    @Override
    public void render(NodeRendererContext context, HtmlWriter html, BlockQuote node) {
        html.withAttr().tagIndent("blockquote", () -> {
            context.visitChildren(node);
        });
    }
}

At the cost of increased stack use, the added benefits are:

  • indenting child tags
  • attributes are easier to handle since they only require setting the attributes with .attr() and using the .withAttr() call before the tag() method
  • the tag is automatically closed

The previous behavior of using an explicit attribute parameter is still preserved.

The indentation is useful for testing because it is easier to visually validate and correlate:

> - item 1
> - item 2
>     1. item 1
>     2. item 2

with the rendered html:

<blockquote>
  <ul>
    <li>item 1</li>
    <li>item 2
      <ol>
        <li>item 1</li>
        <li>item 2</li>
      </ol>
    </li>
  </ul>
</blockquote>

than this:

<blockquote>
<ul>
<li>item 1</li>
<li>item 2
<ol>
<li>item 1</li>
<li>item 2</li>
</ol>
</li>
</ul>
</blockquote>
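How a tag method that takes a Runnable can close the tag and restore the indentation on its own can be shown with a minimal standalone writer. MiniHtmlWriter and its line method are hypothetical and far simpler than HtmlWriter; only the tagIndent name is borrowed from the text.

```java
class MiniHtmlWriter {
    private final StringBuilder out = new StringBuilder();
    private final int indentSize;
    private int depth = 0;

    MiniHtmlWriter(int indentSize) {
        this.indentSize = indentSize;
    }

    /** Writes one line of output at the current indent level. */
    MiniHtmlWriter line(String text) {
        for (int i = 0; i < depth * indentSize; i++) out.append(' ');
        out.append(text).append('\n');
        return this;
    }

    /** Opens the tag, runs the children one level deeper, then closes the tag. */
    MiniHtmlWriter tagIndent(String name, Runnable children) {
        line("<" + name + ">");
        depth++;
        try {
            children.run();
        } finally {
            depth--; // un-indent even if rendering a child fails
        }
        line("</" + name + ">");
        return this; // returning this allows method chaining
    }

    @Override
    public String toString() {
        return out.toString();
    }

    public static void main(String[] args) {
        MiniHtmlWriter html = new MiniHtmlWriter(2);
        html.tagIndent("blockquote", () ->
                html.tagIndent("ul", () -> {
                    html.line("<li>item 1</li>");
                    html.line("<li>item 2</li>");
                }));
        System.out.print(html);
    }
}
```

The nesting of the lambdas follows the nesting of the tags, and the caller never writes a closing tag or adjusts the indent level by hand.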

Some methods of HtmlWriter were changed to be more descriptive instead of passing boolean arguments. New methods were added to allow accumulation of attributes without having to create a hash map and then invoke extendRenderingNodeAttributes():

  • tagVoid() a void tag
  • tagVoidLine() a void tag that should be by itself on a line, equivalent to .line().tag().line()
  • tagLine() a non-void tag that should start on a new line and should have its closing tag the last one on the line.
  • tagIndent() a tag that indents contained lines, takes a Runnable argument.
  • attr(String name, String value) set an attribute for the next .withAttr() tag generation.
  • withAttr() the next tag should take accumulated attributes as default and allow overrides by extensions.
  • withCondIndent() will do an indent before an indenting child tag. Used by a tight list item, which does not normally do an indent, but if it contains other indenting tags then these should be indented.
  • withCondLine() will output an EOL after the opening tag, but only if a child node produces output. Used to conditionally put parent open/close tags on separate lines and on the same line if there is no text between the tags.
  • HtmlWriter.openPre() open a pre-formatted context to prevent \n from inserting an indent prefix.
  • HtmlWriter.closePre() close a pre-formatted context to allow \n to insert an indent prefix again. Indentation will resume when the last pre-formatted context is closed.

You can also get a renderer sub-context from the current rendering context and give it a different html writer and a different do-not-render-links setting. There is also a TextCollectingAppendable which you can pass to the NodeRendererContext.getSubContext() method when you need to capture html from nodes. TocExtension uses this to get the header text html, but without any possible links.

withCondLine() and withCondIndent() only work on tag..() functions that take a Runnable argument for handling child node output.