0.3.0

@jmdavis jmdavis released this 19 Apr 14:30
· 42 commits to master since this release

The biggest changes here are that basic writer support has been added with dxml.writer and that throwOnEntityRef was added to the parser config for managing how non-standard entity references are handled.

  • dxml.writer has been added. It provides functionality for creating XML documents.

  • The deprecated dxml.parser.stax (the old name for dxml.parser) has been removed. Any code that still imports dxml.parser.stax will no longer compile until it is updated to import dxml.parser instead.

  • Fixed dxml issue #5: dxml.parser.Config has a new option: throwOnEntityRef. It controls how EntityRange handles entity references.

    Any entity references other than the five defined by the XML spec must be declared in the DTD, and dxml does not support parsing the DTD beyond what is required to skip it. So, dxml cannot correctly handle such entity references. Prior to this release of dxml, EntityRange always threw when encountering such entity references. Without DTD support, the only alternative would be to ignore them, and doing so means losing an unknown amount of XML (since they can refer to arbitrarily complex XML), which could be a non-issue or a complete disaster depending on the XML document and what the program is trying to do. However, that tradeoff may be acceptable to some programs, particularly since the alternative is not being able to parse the document at all past the point that such an entity reference is encountered.

    So, with this release, Config.throwOnEntityRef has been added. The default is ThrowOnEntityRef.yes, which is the same behavior as in previous releases of dxml - an XMLParsingException is thrown if a non-standard entity reference is encountered. However, if throwOnEntityRef == ThrowOnEntityRef.no, then non-standard entity references are treated as normal text so long as they are syntactically valid (i.e. they have to contain the correct characters to be a valid entity reference). As the DTD was not parsed, they are not semantically validated (and thus may or may not have been correctly declared in the DTD), and they are not replaced with whatever they refer to (which EntityRange doesn't even do with the five predefined entity references, since that doesn't work with its slicing semantics). So, aside from the fact that they are syntactically validated, they are treated as normal text and passed on to the application.

    simpleXML.throwOnEntityRef == ThrowOnEntityRef.yes, so its behavior is unchanged.
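
    The difference between the two settings can be sketched as follows. This is a minimal example rather than a complete program for real documents; it assumes dxml 0.3.0 is available as a dependency, and the XML strings are made up for illustration.

    ```d
    import std.exception : assertThrown;
    import dxml.parser;

    void main()
    {
        // "&foo;" is syntactically valid but is not one of the five
        // predefined entity references, so it would need a DTD declaration.
        auto xml = "<root>&foo;</root>";

        // Default config (ThrowOnEntityRef.yes): parsing the text throws.
        {
            auto range = parseXML(xml);
            assert(range.front.type == EntityType.elementStart);
            assertThrown!XMLParsingException(range.popFront());
        }

        // ThrowOnEntityRef.no: the reference is passed on as normal text.
        {
            auto range = parseXML!(makeConfig(ThrowOnEntityRef.no))(xml);
            assert(range.front.type == EntityType.elementStart);
            range.popFront();
            assert(range.front.type == EntityType.text);
            assert(range.front.text == "&foo;");
        }

        // Even with ThrowOnEntityRef.no, a syntactically invalid
        // reference such as "&--;" still throws.
        {
            auto range = parseXML!(makeConfig(ThrowOnEntityRef.no))("<root>&--;</root>");
            assertThrown!XMLParsingException(range.popFront());
        }
    }
    ```

    Note that the text is returned exactly as it appears in the document; the application decides what, if anything, to do with "&foo;".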

  • Fixed dxml issue #4: Renamed normalize to decodeXML and asNormalized to asDecodedXML. The old names are still provided as deprecated aliases but will be removed in dxml 0.4.0.

    The new names fit better with the newly added writer functionality and don't risk conflicting with std.uni.normalize (which is a function that is fairly likely to be used in conjunction with dxml and would therefore have resulted in symbol conflicts).
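
    Code using the new names might look like the following sketch. It assumes dxml 0.3.0 is available as a dependency, and the input strings are made up for illustration.

    ```d
    import std.algorithm.comparison : equal;
    import dxml.util : asDecodedXML, decodeXML;

    void main()
    {
        // decodeXML (formerly normalize) returns a string with the predefined
        // entity references and character references replaced.
        assert(decodeXML("&lt;b&gt; &amp; &#65;") == "<b> & A");

        // asDecodedXML (formerly asNormalized) does the same lazily,
        // returning a range of characters instead of a string.
        assert(equal(asDecodedXML("&quot;hi&quot;"), `"hi"`));
    }
    ```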

  • Fixed dxml issue #3: Improved the error message when the XML document has whitespace before the <?xml...?> declaration.

  • Fixed an @safety bug in asDecodedXML/asNormalized. Internally, the return type maintained a static array of code units for a decoded character reference, and a dynamic array which sliced it to keep track of the iteration through that buffer. It was smart enough to use the postblit constructor to fix up the dynamic array when the struct was copied, but it did not take into account the fact that the struct could be moved, in which case the dynamic array would refer to an invalid slice of memory and produce the wrong results. Without -dip1000, the compiler doesn't catch that @safety hole, and -dip1000 isn't ready yet, so it wasn't caught by the compiler when testing dxml. Fortunately, that particular issue was recently discussed in D.Learn with regard to -dip1000 as well as in the discussions which sparked the opMove DIP. So, the @safety bug in asDecodedXML/asNormalized was discovered, and it has now been fixed.
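
    These notes do not say how the bug was fixed. One common way to make such a type safe to move, sketched below, is to track the position with offsets into the static array rather than with a slice of it, so that no field ever points into the struct itself. SmallCharRange and make are hypothetical names for illustration and are not dxml's actual implementation.

    ```d
    import std.algorithm.comparison : equal;

    // Hypothetical type for illustration only; not taken from dxml.
    struct SmallCharRange
    {
        private char[4] _buf; // storage for the decoded code units
        private size_t _pos;  // index of the current front within _buf
        private size_t _len;  // number of valid code units in _buf

        this(const(char)[] units) @safe pure nothrow @nogc
        {
            assert(units.length <= _buf.length);
            _buf[0 .. units.length] = units[];
            _len = units.length;
        }

        @property bool empty() const @safe pure nothrow @nogc { return _pos == _len; }
        @property char front() const @safe pure nothrow @nogc { return _buf[_pos]; }
        void popFront() @safe pure nothrow @nogc { ++_pos; }
    }

    // Returning by value may move the struct; the offsets remain valid
    // because they are relative to the struct's own buffer.
    SmallCharRange make() @safe
    {
        auto r = SmallCharRange("abc");
        r.popFront();
        return r;
    }

    void main() @safe
    {
        auto r = make();
        auto copy = r; // a plain copy needs no postblit fix-up either
        assert(equal(r, "bc"));
        assert(equal(copy, "bc"));
    }
    ```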