Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewriter trigger the "text" event even after the write callback is executed #1046

Closed
ericmorand opened this issue Oct 11, 2023 · 10 comments
Closed

Comments

@ericmorand
Copy link

ericmorand commented Oct 11, 2023

Consider the following code:

import RewritingStream from "parse5-html-rewriting-stream";

const rewritingStream = new RewritingStream();

rewritingStream.on("text", (token, rawHtml) => {
  console.warn('text EVENT', rawHtml);

  rewritingStream.emitText(token);
});

rewritingStream.write(`<div>foo</div>bar`, () => {
  console.warn('WRITE');

  rewritingStream.end();
});

Executing it with node shows that the rewritingStream.write callback is executed before the second text event listener is executed.

I'm not sure it is expected or not - hence I hesitate to call this a bug - but it prevents the safe usage of @RReverser's solution to the #273 issue (#273 (comment)).

Note that it only happens with the text event and only when there is a text after the last close tag (or when there is not tag at all). It happens even is the text is a an empty string.

Context

To give a little bit of context, we use the rewriter to rebase resources in HTML document based on the source map of that HTML document. The library that executes this rebasing is html-source-map-rebase. This is when writing tests for this library that we discovered the issue with the rewriter.

@fb55
Copy link
Collaborator

fb55 commented Oct 15, 2023

This is expected behavior in the rewriter: not all data will have been processed yet, and ending parsing will emit remaining data.

Seems like there is an await missing at the end of the referenced comment.

@fb55
Copy link
Collaborator

fb55 commented Oct 15, 2023

I've updated the comment to add the await

@fb55 fb55 closed this as completed Oct 15, 2023
@ericmorand
Copy link
Author

ericmorand commented Oct 15, 2023

@fb55 how can we get the remaining data if the event is not triggered? Is there a method that we are missing here?

And even if there is a way, how can we avoid the event to be triggered? This event listener write the raw text back to the stream and crashes since the stream is considered as closed.

@fb55
Copy link
Collaborator

fb55 commented Oct 16, 2023

The event is triggered, you just have to wait for the queue to clear.

@RReverser
Copy link
Collaborator

Seems like there is an await missing at the end of the referenced comment.

No, it wasn't really missing. queue was an implementation detail in my implementation and not meant for user consumption.

Rewriter is a Node.js transform stream, and, like with any stream, users should be able to listen for close or finish events if they want to handle end of stream situation explicitly.

As a more general comment, unless it's something minor like stylistic issue, please don't edit someone else's comments without discussing with the author :)

@ericmorand
Copy link
Author

Rewriter is a Node.js transform stream, and, like with any stream, users should be able to listen for close or finish events if they want to handle end of stream situation explicitly.

Oh, I feel so stypid now. My IDE autocompletion didn't propose either close or finish as supported event name for the on method, but it actually is supported - as you explained because it is a Node stream.

Thanks a lot for your time and support. I should be able to handle the issue now.

@RReverser
Copy link
Collaborator

Glad it helped :) I removed the await queue; from my comment back.

@RReverser
Copy link
Collaborator

RReverser commented Oct 16, 2023

My IDE autocompletion didn't propose either close or finish as supported event name for the on method, but it actually is supported - as you explained because it is a Node stream.

I wonder if it's because we export RewritingStream as an interface separately from class for some reason, which overrides the inherited on from SAXParser / TransformStream:

export interface RewritingStream {

@ericmorand
Copy link
Author

ericmorand commented Oct 16, 2023

Maybe, yes. But I would have assumed class and interface merging would trigger here.

Still, compilation is fine because there is an on fallback that accepts any string as an event name:

on(event: string, handler: (...args: any[]) => void): this;

The only drawback is that we don't know at compile time what arguments do the listener accept. Still, for finish and close events, we know from Node documentation that there is none so it is a trivial issue.

@RReverser
Copy link
Collaborator

RReverser commented Oct 16, 2023

class and interface merging would trigger here

there is an on fallback that accepts any string as an event name:

on(event: string, handler: (...args: any[]) => void): this;

Right, I suspect that is the problem - our class inherits from Transform (via SAXParser), which has specialised on, once and other signatures, but we add our own generic untyped on handler that likely overloads those more specialised ones from the base class, so TS picks it first for intellisense.

I might be wrong but worth investigating.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants