Releases: databricks/spark-xml

Version 0.8.0

08 Jan 16:05

New Features

  • Support for validating XML rows against an XSD
  • from_xml for parsing an existing column or string to a struct
  • schema_of_xml for inferring schema of XML in a string column
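As a rough illustration of the XSD validation feature, the sketch below reads XML from PySpark with a per-row XSD check. It assumes the `rowValidationXSDPath` reader option and the `xml` format short name as described in the spark-xml README. The helper name `read_validated_xml` and its arguments are hypothetical, and `spark` is expected to be an existing SparkSession with the spark-xml package available.

```python
def read_validated_xml(spark, path, row_tag, xsd_path):
    """Read XML rows, validating each row against an XSD (sketch only)."""
    return (
        spark.read.format("xml")
        .option("rowTag", row_tag)
        # Assumed option name, taken from the spark-xml README.
        .option("rowValidationXSDPath", xsd_path)
        .load(path)
    )
```

A call might look like `read_validated_xml(spark, "books.xml", "book", "book.xsd")`, where both file names are placeholders.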

Changes: https://github.com/databricks/spark-xml/milestone/5?closed=1

Version 0.7.0

06 Nov 16:08
cc977ef

Fixes

  • Important fix to XML writing, which could cause newlines to be inserted in the wrong place in output (#417)
  • Ignore XML processing instructions, which otherwise fail parsing (#412)
  • Ignore text children in mixed text/element nodes, instead of parsing element incorrectly (#416)

Changes: https://github.com/databricks/spark-xml/milestone/4?closed=1

Version 0.6.0

08 Aug 18:10
348bb13

Fixes:

  • Fixed an error that could cause records to be dropped when uncompressed files are read and XML tags happen to span an input split boundary, but fit within the stream read buffer (#400)
  • Fixed an issue with nested tag names in attributes (#374)

Improvements:

  • inferSchema can now be set to false during parsing to leave all values as string type (#393)
  • Also treat empty values as null if the nullValue is "" (#381)
  • Log malformed records for debugging (#372)
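The two parsing options above can be combined in a single read. The following is a minimal sketch that only builds the option dictionary. `inferSchema` and `nullValue` come from the notes above, `rowTag` is the usual spark-xml row selector, and the helper name `string_only_options` is hypothetical.

```python
def string_only_options(row_tag):
    """Reader options that skip type inference and read empty values as null."""
    return {
        "rowTag": row_tag,
        "inferSchema": "false",  # leave every value as a string (#393)
        "nullValue": "",         # empty values are then read as null (#381)
    }

# Possible usage with an existing SparkSession named `spark` (not created here):
# df = spark.read.format("xml").options(**string_only_options("book")).load("books.xml")
```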

Changes: https://github.com/databricks/spark-xml/issues?utf8=%E2%9C%93&q=milestone%3A0.6.0+is%3Aclosed+

Version 0.5.0

30 Dec 16:13
1a712ea

Spark-xml 0.5.0 includes many bug fixes as well as the following

Improvements:

  • Partial results support #358, #368 and #370
  • XML self-closing tag support #352
  • Scala 2.12 support #343
  • Hadoop 2.9+ support #282
  • Add an ignoreSurroundingSpaces option to trim whitespace surrounding values #237

Removals, Behavior Changes and Deprecations

  • Drop Scala 2.10 support #343

Issues Closed

https://github.com/databricks/spark-xml/milestone/1?closed=1

Version 0.4.1

06 Nov 10:50

Spark-xml 0.4.1 adds the following

Improvements:

  • Produce the correct results instead of null from pruned scan in some cases - #186, #197
  • Treat string types for nullValue - #182

Removals, Behavior Changes and Deprecations

  • Deprecates treatEmptyAsNulls option - #182

Version 0.3.5

06 Nov 11:18

Spark-xml 0.3.5 adds the following

Improvements:

  • Produce the correct results instead of null from pruned scan in some cases - #189, #199

Version 0.4.0

10 Sep 11:30

Spark-xml 0.4.0 adds the following

Features:

  • Support for PERMISSIVE/DROPMALFORMED mode and corrupt record option - #107
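To make the two modes concrete, here is a small sketch that builds reader options for either one. The option names `mode` and `columnNameOfCorruptRecord`, and the `_corrupt_record` default, are assumptions based on the spark-xml README rather than on this release note. The helper `parse_mode_options` is hypothetical.

```python
VALID_MODES = {"PERMISSIVE", "DROPMALFORMED"}  # the two modes named in this release

def parse_mode_options(row_tag, mode="PERMISSIVE", corrupt_column="_corrupt_record"):
    """Build reader options for a parse mode (sketch, option names assumed)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    opts = {"rowTag": row_tag, "mode": mode}
    if mode == "PERMISSIVE":
        # Malformed records are kept, with the raw text stored in this column.
        opts["columnNameOfCorruptRecord"] = corrupt_column
    return opts

# Possible usage with an existing SparkSession named `spark`:
# df = spark.read.format("xml").options(**parse_mode_options("book")).load("books.xml")
```

With `DROPMALFORMED`, the corrupt-record column is omitted from the options because malformed rows are discarded rather than kept.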

Removals, Behavior Changes and Deprecations

  • Deprecate saveAsXmlFile and promote the usage of write() - #150
  • Deprecate xmlFile and promote the usage of read() - #150
  • Drop 1.x compatibility from 0.4.0 - #150
  • Drop support for UserDefinedType, as it became private - #150
  • Change default values for attributePrefix and valueTag to _ and _VALUE - #142
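The effect of the new defaults can be pictured with a small standalone example. This is not spark-xml's implementation. It is a simplified Python sketch using the standard library that mimics the naming convention for a leaf element with attributes: attribute names get the `_` prefix and the element text goes under `_VALUE`. The function name `element_to_row` is hypothetical.

```python
import xml.etree.ElementTree as ET

def element_to_row(xml_text, attribute_prefix="_", value_tag="_VALUE"):
    """Map a leaf element's attributes and text to field names (illustration only)."""
    elem = ET.fromstring(xml_text)
    # Each attribute becomes a field named prefix + attribute name.
    row = {attribute_prefix + name: value for name, value in elem.attrib.items()}
    text = (elem.text or "").strip()
    if text:
        # The element's own text is stored under the value tag.
        row[value_tag] = text
    return row

print(element_to_row('<price unit="USD">12.50</price>'))
# {'_unit': 'USD', '_VALUE': '12.50'}
```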

Version 0.3.4

10 Sep 07:35

XML Data Source 0.3.4 adds the following

Improvements:

  • Produce the correct column order for nested rows when the user specifies a schema - #125
  • Fix ArrayIndexOutOfBoundsException raised when a nested struct has no value - #121 by @lokm01
  • Add compression as an alias for the codec option - #145
  • Remove dead code - #144
  • Fix nested element with name of parent bug - #161 by @mattroberts297
  • Do not allow empty strings for attributePrefix, valueTag and rowTag - #170
  • Add a missing default case when parsing/inferring XML documents - #166
  • Minor documentation changes - #159 by @mattroberts297 and #143 by @anastasia

Version 0.3.3

25 Apr 10:01

XML Data Source 0.3.3 adds the following

Improvements:

  • Correctly parse array elements that have attributes
  • Correctly parse duplicated valueTag fields in a few special cases
  • Parse a non-existent element in an array as null
  • Support parsing XML documents in which the same element appears with both structural and non-structural data types
  • Ignore comments
  • Improvement of documentation

Version 0.3.2

25 Apr 09:55

Spark-xml 0.3.2 adds the following

Improvements:

  • Fix a bug in type inference for empty values in structural types
  • Performance improvement
  • Support for parsing correctly when structural data types are specified
  • Parse long characters within tags
  • Added some more tests
  • Parse correctly even when some attributes appear only sparsely
  • Ignore namespaces
  • Improvement of documentation