Preserve ESI tags verbatim during processing #7

Merged: mpdude merged 5 commits into main from fix-esi-processing on Jan 30, 2026

@mpdude commented on Jan 30, 2026

This PR improves the handling of ESI (Edge Side Includes) tags during HTML5 parsing and serialization. It adds support for ESI tags that are self-closing, that span HTML element boundaries, or that contain characters that are not HTML-encoded, such as a bare &.

Approach taken

  • Complete ESI tag preservation: Every ESI tag (opening, closing, or self-closing) is wrapped in an HTML comment during pre-processing and restored verbatim during post-processing
  • Support for arbitrary nesting: ESI tags can now span across HTML element boundaries without being "repaired" by the HTML5 parser
  • Attribute preservation: Original attributes are preserved exactly, preventing encoding changes (e.g., & becoming &amp;)

Each ESI tag is wrapped in an HTML comment that the HTML5 parser treats as atomic. The original tags are preserved verbatim inside the comments and restored exactly during post-processing.

Important: During processing, ESI tags appear as Comment nodes in the DOM, not as Elements. If RewriteHandler transformations move or delete these comment nodes, the final result may not match expectations.
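
To illustrate the point about Comment nodes, here is a minimal Python sketch. It is not the library's code: Python's html.parser is only a tokenizer, not an HTML5 tree builder, and the NodeLogger class and the exact comment layout are assumptions made for this example. It shows that a wrapped ESI tag is reported as a comment, never as an element, so a handler that only looks at elements would not see it.

```python
from html.parser import HTMLParser


class NodeLogger(HTMLParser):
    """Hypothetical helper: records which kinds of nodes the tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("element", tag))

    def handle_comment(self, data):
        self.events.append(("comment", data))


# Assumed wrapped form of an ESI tag; the real marker layout may differ.
wrapped = '<p>Hi <!--esi html5-tagrewriter <esi:include src="/a?x=1&y=2" /> --></p>'

logger = NodeLogger()
logger.feed(wrapped)
logger.close()

for kind, value in logger.events:
    print(kind, repr(value))
```

The only element reported is p; the ESI tag arrives as comment data with its attribute value untouched.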

We use the ESI comment syntax defined in Section 3.7 of the ESI specification (<!--esi ... -->) to hide ESI tags from the HTML5 parser, but include an extra html5-tagrewriter marker token.
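
The wrap-and-restore round trip described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation from this PR: the function names wrap_esi and unwrap_esi, the regular expressions, and the exact spacing around the marker token are all hypothetical, and the simplified tag pattern assumes that attribute values contain neither ">" nor "--".

```python
import re

# Assumed marker layout; the PR only states that an "html5-tagrewriter"
# token is added to the <!--esi ... --> comment syntax.
MARKER = "html5-tagrewriter"

# Simplified pattern for opening, closing, and self-closing esi:* tags.
# Assumption: attribute values never contain ">" or "--".
ESI_TAG = re.compile(r"</?esi:[A-Za-z]+(?:\s[^>]*)?>")
WRAPPED = re.compile(r"<!--esi " + re.escape(MARKER) + r" (.*?) -->", re.DOTALL)


def wrap_esi(html: str) -> str:
    """Pre-processing: hide every ESI tag inside an HTML comment."""
    return ESI_TAG.sub(lambda m: f"<!--esi {MARKER} {m.group(0)} -->", html)


def unwrap_esi(html: str) -> str:
    """Post-processing: put the original tag text back, byte for byte."""
    return WRAPPED.sub(lambda m: m.group(1), html)


source = (
    "<p>Start <esi:remove>content</p><p>more</esi:remove> "
    '<esi:include src="/a?x=1&y=2" /></p>'
)
wrapped = wrap_esi(source)
restored = unwrap_esi(wrapped)

print(wrapped)
print(restored == source)
```

In a real pipeline the HTML5 parse, any rewriting, and serialization would happen between the two calls; the sketch only checks that wrapping and unwrapping alone leave the ESI tags unchanged.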

Why is this approach necessary?

ESI tags present multiple challenges for HTML5 parsing:

  1. Self-closing syntax: ESI tags like <esi:include src="..." /> use self-closing syntax, which HTML5 does not support for non-void elements. The parser ignores the trailing slash and treats the tag as an opening tag, so the content that follows is nested inside it incorrectly.

  2. Arbitrary interleaving: ESI tags can span across HTML element boundaries:

    <p>Start <esi:remove>content</p><p>more</esi:remove> end</p>

    HTML5 parsers would "repair" such structures, breaking the intended ESI behavior.

  3. Attribute encoding: HTML5 serializers encode special characters (& becomes &amp;), but ESI processors work on a text basis and expect the original characters.
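
The attribute encoding problem from the third point can be made concrete with a small Python sketch. It uses html.escape from the standard library as a stand-in for a serializer's attribute escaping; the URL is made up for the example.

```python
from html import escape

# Hypothetical src value as the author wrote it in the ESI tag.
original = "/fragment?a=1&b=2"

# What an HTML serializer would emit for this attribute value.
serialized = escape(original, quote=True)

print(serialized)  # /fragment?a=1&amp;b=2
```

An ESI processor that reads the tag as plain text would use the serialized string as-is, including the literal "&amp;", instead of the URL the author intended.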

What does the ESI standard say?

The ESI Language Specification 1.0 describes ESI as an "XML-based markup language" (Section 1). However, the standard also explicitly states:

"As a result, the markup that is emitted by the origin server is not valid; it contains interposed elements from the ESI namespace." (Section 1.1)

ESI elements can be arbitrarily interleaved with the underlying content, and that content does not even need to be HTML. The standard makes no statements about whether HTML entities must be applied to attribute values.

Since parsing ESI-containing documents with an XML parser is likely not possible anyway, assuming XML encoding rules (&amp;) is not warranted. The safest approach is to preserve ESI tags verbatim.

@mpdude changed the title from "fix esi processing" to "Preserve ESI tags verbatim during processing" on Jan 30, 2026
@mpdude merged commit 60a22bd into main on Jan 30, 2026
4 checks passed
@mpdude deleted the fix-esi-processing branch on January 30, 2026 at 10:26