Why do control characters disappear? #163
I noticed this while writing a round-trip property test for an HTML representation of a markup datatype, which uses Lucid for rendering. When the text contains a control character (like �), Lucid's output omits the character entirely, whereas I would expect it to print the escape sequence (like
Is this considered a bug, or are my expectations wrong?
Replies: 1 comment 1 reply
Both ChatGPT and an HTML3 spec say that control characters are to be avoided because they have no visible representation and cause parsing issues. Presumably blaze-builder, which Lucid is based on, made the choice to filter them out. Not a bad choice, I think. I believe even an ampersand-encoded form would be considered sketchy, since the character still has no visible representation. Quote from spec:
I don’t consider GPT an authority, but it echoes what I found on w3c.org, so that’s something. Source, c >= ' ':

    condB (\c -> c >= ' ' || c == '\t' || c == '\n' || c == '\r')
        (P.liftFixedToBounded P.char7) $
    …
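For the round-trip property test, one option is to model that predicate on the input side. The following is only a sketch using `base`: `isKept` copies the condition from the snippet above, while `dropControls` is a hypothetical helper (not part of Lucid or blaze-builder). It assumes that characters failing the check are simply emitted as nothing, which matches the behaviour described in the question.

```haskell
module Main where

-- Condition taken from the quoted snippet: anything at or above space,
-- plus tab, newline, and carriage return, is kept.
isKept :: Char -> Bool
isKept c = c >= ' ' || c == '\t' || c == '\n' || c == '\r'

-- Hypothetical helper: models the assumed "emit nothing" fallback for
-- characters that fail the check. Not an API of Lucid or blaze-builder.
dropControls :: String -> String
dropControls = filter isKept

main :: IO ()
main =
  -- "\SOH" is the control character U+0001; tab and newline survive.
  print (dropControls "a\SOHb\tc\n")  -- prints "ab\tc\n"
```

With something like this, the property could compare the parsed result against `dropControls input` rather than the raw input, or the generator could avoid producing such characters in the first place.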