Changes for 0.3.3 release
Updates the build and README files only.

Due to some bugs, I am bumping the version from `0.3.2` to `0.3.3`.

The bugs include some major issues with core functionality in a few cases:

1. Fails to read some attribute values in a few special cases, #89.
2. Duplicated `valueTag` field in a few special cases, #96.
3. Non-existent element in an array when it should be `null`, #95 (see the sketch after this list).
4. Fails to parse XML documents when the same element mixes structural and non-structural data types, #106.
5. Always ignores comments, #110.
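
As an illustration of the third issue, here is a minimal, hedged sketch (assuming Spark 1.x with the shell's `sqlContext` and a hypothetical `books.xml` in which one record omits an element); with the fix, the missing element should read back as `null` rather than being dropped or misaligned:

```scala
// Hypothetical books.xml, for illustration only; the second <book> has no <author>:
//
//   <books>
//     <book><title>Ulysses</title><author>James Joyce</author></book>
//     <book><title>Anonymous Pamphlet</title></book>
//   </books>

val df = sqlContext.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "book")          // treat each <book> element as one row
  .load("books.xml")

// Expected with 0.3.3: the second row shows null in the author column
// instead of a missing or misaligned value (#95).
df.select("title", "author").show()
```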

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #99 from HyukjinKwon/version-0.3.3.
HyukjinKwon committed Apr 25, 2016
1 parent 41e0fc5 commit ad7abbd
Showing 2 changed files with 8 additions and 8 deletions.
14 changes: 7 additions & 7 deletions README.md
@@ -20,30 +20,30 @@ You can link against this library in your program at the following coordinates:
```
groupId: com.databricks
artifactId: spark-xml_2.10
-version: 0.3.2
+version: 0.3.3
```
### Scala 2.11
```
groupId: com.databricks
artifactId: spark-xml_2.11
-version: 0.3.2
+version: 0.3.3
```
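
For reference, in an sbt build these coordinates would translate to something like the following sketch (the commented `%%` form assumes your project's `scalaVersion` matches one of the published suffixes):

```scala
// In a user project's build.sbt: depend on the Scala 2.11 artifact explicitly.
libraryDependencies += "com.databricks" % "spark-xml_2.11" % "0.3.3"

// Or let sbt append the Scala binary-version suffix automatically:
// libraryDependencies += "com.databricks" %% "spark-xml" % "0.3.3"
```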

## Using with Spark shell
This package can be added to Spark using the `--packages` command line option. For example, to include it when starting the Spark shell:

### Spark compiled with Scala 2.10
```
-$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-xml_2.10:0.3.2
+$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-xml_2.10:0.3.3
```

### Spark compiled with Scala 2.11
```
-$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-xml_2.11:0.3.2
+$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-xml_2.11:0.3.3
```
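
Once the shell starts with the package on the classpath, a minimal read might look like the sketch below (assuming Spark 1.x, where the shell already provides `sqlContext`, and a local `books.xml`):

```scala
// Load an XML file as a DataFrame through the spark-xml data source.
val df = sqlContext.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "book")   // each <book> element becomes one row
  .load("books.xml")

df.printSchema()   // inspect the inferred schema
df.show(5)
```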

## Features
-This package allows reading XML files in local or distributed filesystem as [Spark DataFrames](https://spark.apache.org/docs/1.3.0/sql-programming-guide.html).
+This package allows reading XML files in local or distributed filesystem as [Spark DataFrames](https://spark.apache.org/docs/1.6.0/sql-programming-guide.html).
When reading files, the API accepts several options:
* `path`: Location of files. As in Spark, it can accept standard Hadoop globbing expressions.
* `rowTag`: The row tag of your XML files to treat as a row. For example, in this XML `<books><book>...</book><book>...</book></books>`, the appropriate value would be `book`. Default is `ROW`. (A combined sketch follows below.)
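
A short sketch of combining these two options (the file layout is assumed; the glob follows standard Hadoop globbing):

```scala
// Read every matching XML file under an assumed /data/books/ directory,
// treating each <book> element as one row.
val books = sqlContext.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "book")
  .load("/data/books/*.xml")   // the path argument; Hadoop expands the glob

println(books.count())
```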
@@ -436,7 +436,7 @@ Automatically infer schema (data types)
```R
library(SparkR)

-Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-xml_2.10:0.3.2" "sparkr-shell"')
+Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-xml_2.10:0.3.3" "sparkr-shell"')
sqlContext <- sparkRSQL.init(sc)

df <- read.df(sqlContext, "books.xml", source = "com.databricks.spark.xml", rowTag = "book")
@@ -449,7 +449,7 @@ You can manually specify schema:
```R
library(SparkR)

-Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:0.3.2" "sparkr-shell"')
+Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:0.3.3" "sparkr-shell"')
sqlContext <- sparkRSQL.init(sc)
customSchema <- structType(
structField("@id", "string"),
2 changes: 1 addition & 1 deletion build.sbt
@@ -1,6 +1,6 @@
name := "spark-xml"

-version := "0.3.2"
+version := "0.3.3"

organization := "com.databricks"

