Escaping and quoting issues #34
Conversation
I'm leaning towards quoting only SVG and other XML tags, and all their descendants, when in the compact*HTML functions. The idea behind those is to be used as a simple HTML compressor. For the normal output, and your current change will greatly reduce the effectiveness for this use-case. Enabling it only where necessary seems like a sensible solution. Maybe even do it for all tags except the known "HTML5" ones. What do you think?
Minification is indeed a nice goal as long as it doesn't lead to invalid output. Hmm, it might be a good idea to keep a whitelist of HTML block tags. Then an emitter could work in one of three modes:
In HTML mode it omits quotation marks; in XML mode it "self-closes" empty tags (which is not legal in HTML). When stepping into a non-whitelisted tag, it switches to XML. However, since this library can be used not only for complete documents but also for fragments, an emitter should start in generic mode by default (to not accidentally break anything). There should be an optional parameter in Any remarks on that? Edit: I just realized there already is "HTML" word in the method's name, so specifying it explicitly would be a bit redundant. Does it attract enough attention that we could make
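The three-mode idea above can be sketched roughly as follows. This is an illustration in Python, not this library's API: `Mode`, `Node`, `emit`, `HTML_TAGS` and `VOID_TAGS` are hypothetical names, the whitelist is deliberately tiny, and the rule for which values may go unquoted is a conservative assumption.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    GENERIC = "generic"  # safe default: always quote, never rely on HTML-only rules
    HTML = "html"        # compact: drop quotes where the value allows it
    XML = "xml"          # always quote, self-close empty elements


# Hypothetical, intentionally incomplete whitelist of known HTML tags.
HTML_TAGS = frozenset({"html", "head", "body", "div", "p", "span", "a", "br", "img"})
VOID_TAGS = frozenset({"br", "img"})

# Conservative assumption: a value may stay unquoted only if it is non-empty
# and free of whitespace, quotes, '=', '<', '>', '`' and '&'.
UNQUOTED_SAFE = re.compile(r"^[^\s\"'=<>`&]+$")


@dataclass
class Node:
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # Node or str items


def escape_text(text: str) -> str:
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def escape_attr(value: str) -> str:
    return value.replace("&", "&amp;").replace('"', "&quot;").replace("<", "&lt;")


def emit(node, mode: Mode = Mode.GENERIC) -> str:
    if isinstance(node, str):
        return escape_text(node)
    # Stepping into a non-whitelisted tag switches HTML mode to XML mode
    # for that element and all of its descendants.
    if mode is Mode.HTML and node.tag not in HTML_TAGS:
        mode = Mode.XML
    parts = ["<", node.tag]
    for name, value in node.attrs.items():
        if mode is Mode.HTML and UNQUOTED_SAFE.match(value):
            parts.append(f" {name}={value}")
        else:
            parts.append(f' {name}="{escape_attr(value)}"')
    if not node.children:
        if mode is Mode.XML:
            parts.append("/>")
        elif node.tag in VOID_TAGS:
            parts.append(">" if mode is Mode.HTML else "/>")
        else:
            parts.append(f"></{node.tag}>")
        return "".join(parts)
    parts.append(">")
    parts.extend(emit(child, mode) for child in node.children)
    parts.append(f"</{node.tag}>")
    return "".join(parts)


doc = Node("div", {"class": "box", "title": "a b"}, [
    Node("br"),
    Node("svg", {"width": "10"}, [Node("circle", {"r": "5"})]),
])
compact = emit(doc, Mode.HTML)
generic = emit(doc)
```

In this sketch the mode is passed down the recursion, so the switch to XML at `svg` applies to `circle` too, while the sibling `br` is still emitted in compact HTML form.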
Personally, I think that attribute values should either always be quoted, or that there should be some sort of setting to specify whether the output should be "compact" (non-quoted) or fully quoted. I've rarely if ever seen HTML with non-quoted attribute values before encountering it with htmld, and while yeah, it does save a few bytes here and there, it just feels sloppy and inconsistent to me to leave certain values unquoted even if it is technically allowed by the spec.
Yes, I also consistently quote values when I have to write HTML by hand. But here we deal with machine-generated code. I see nothing bad in saving a few bytes if it is not intended to be human-readable. If, however, it is (I don't know your use-case), then you might want to turn minification off altogether (with
The idea of the compactHTML methods was to produce the smallest possible valid representation of the nodes, not for humans to look at or have feelings about, but for crawlers and browsers to ingest. Note that being XML compatible is not at all the primary goal of this library, but I do agree it doesn't hurt to aim for it. I think having the non-compact methods allow for a setting is a great idea, but compactHTML methods should try to live up to their name.
How about having a compactXML method? Then we could switch to this when we encounter, for example, an SVG node. It would be useful for someone using the library for XML as well.
I'm afraid there would be too much duplicated code, so I'd stick with an The idea is to be able to get the smallest possible representation that is:
Edit: By the way, there's XHTML as well.
Quotation marks are mandatory in XML. Since there is no easy way of telling HTML apart from HTML-looking XML (<title> is both an HTML and SVG tag, for example), we should quote everything. This also addresses "Attribute values not quoted when getting HTML from Document" (#31).
Some entities must be preserved to prevent JS injection attacks:
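A minimal sketch of the general concern, assuming the worry is markup characters being written out unescaped. It uses Python's standard `html.escape` rather than this library's API, and `render_link` is a hypothetical helper: if `<`, `>`, `&` or quotes in text or attribute values reach the output as literal characters, a crafted string can break out of an attribute or introduce a `<script>` element.

```python
from html import escape


def render_link(href: str, label: str) -> str:
    # quote=True also escapes " and ', which matters inside attribute values;
    # in text content only &, < and > need escaping.
    return f'<a href="{escape(href, quote=True)}">{escape(label, quote=False)}</a>'


payload = '"><script>alert(1)</script>'
safe = render_link(payload, payload)
unsafe = f'<a href="{payload}">{payload}</a>'  # what happens without escaping
```

Here `unsafe` contains a literal `<script>` element, while `safe` carries the same characters as entities, which a browser treats as data rather than markup.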