v0.5.0 (Spark 2.4) - 2020-10-21
Released by EnricoMi on 14 Jul 10:29
Added
- Load data from a Dgraph cluster as a GraphFrames `GraphFrame`.
- Optionally read all partitions within the same transaction. This guarantees a consistent snapshot of the graph (issue #6). However, concurrent mutations shorten the lifetime of such a transaction and cause an exception once that lifetime is exceeded.
- Add a Python API that mirrors the Scala API. The README.md fully documents how to load Dgraph data in PySpark.
- Fixed dependency conflicts between connector dependencies and Spark
by shading the Java Dgraph client and all its dependencies.
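The trade-off behind the same-transaction option can be illustrated with a toy multi-version store in plain Python. This is not the connector's implementation or Dgraph's API. `ToyVersionedStore`, `read_partitions`, `TransactionTooOld` and the `max_commits_behind` lifespan are hypothetical names chosen for illustration, and the transaction lifespan is modelled, as an assumption, as a fixed number of concurrent commits. Reading every partition at one shared read timestamp yields a consistent snapshot; reading each partition at "latest" lets a concurrent mutation leak into later partitions; and too many concurrent commits make the shared timestamp expire.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


class TransactionTooOld(Exception):
    """Toy stand-in for the error raised once a read transaction outlives its lifespan."""


@dataclass
class ToyVersionedStore:
    """In-memory multi-version store: every commit gets a new, increasing timestamp."""
    max_commits_behind: int = 3  # hypothetical lifespan, counted in concurrent commits
    clock: int = 0
    versions: Dict[str, List[Tuple[int, str]]] = field(default_factory=dict)

    def commit(self, key: str, value: str) -> None:
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read(self, key: str, read_ts: Optional[int] = None) -> Optional[str]:
        if read_ts is None:
            read_ts = self.clock  # no shared transaction: see the latest committed state
        elif self.clock - read_ts > self.max_commits_behind:
            raise TransactionTooOld(f"read_ts {read_ts} is too old at clock {self.clock}")
        visible = [value for ts, value in self.versions.get(key, []) if ts <= read_ts]
        return visible[-1] if visible else None


def read_partitions(store: ToyVersionedStore, partitions: List[List[str]],
                    between_partitions: Callable[[ToyVersionedStore], None],
                    same_transaction: bool) -> Dict[str, Optional[str]]:
    """Read partitions one after another; `between_partitions` simulates concurrent writers."""
    read_ts = store.clock if same_transaction else None  # one snapshot for all partitions
    rows: Dict[str, Optional[str]] = {}
    for i, keys in enumerate(partitions):
        if i > 0:
            between_partitions(store)
        for key in keys:
            rows[key] = store.read(key, read_ts)
    return rows


def fresh_store() -> ToyVersionedStore:
    store = ToyVersionedStore()
    store.commit("alice", "v1")
    store.commit("bob", "v1")
    return store


def update_bob(store: ToyVersionedStore) -> None:
    store.commit("bob", "v2")


def heavy_writes(store: ToyVersionedStore) -> None:
    for n in range(5):
        store.commit("bob", f"v{n + 2}")


partitions = [["alice"], ["bob"]]
print(read_partitions(fresh_store(), partitions, update_bob, same_transaction=False))
# -> {'alice': 'v1', 'bob': 'v2'}  (partitions observe different states)
print(read_partitions(fresh_store(), partitions, update_bob, same_transaction=True))
# -> {'alice': 'v1', 'bob': 'v1'}  (one consistent snapshot)
```

With `heavy_writes` between partitions, the shared read timestamp falls more than `max_commits_behind` commits behind and the read raises `TransactionTooOld`, mirroring the exception described above.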
Changed
- Refactored the connector API: renamed the `spark.read.dgraph*` methods to `spark.read.dgraph.*`.
- Moved the `triples`, `edges` and `nodes` sources from package `uk.co.gresearch.spark.dgraph.connector` to `uk.co.gresearch.spark.dgraph`.
- Upgraded the Java Dgraph client to 20.03.1 and the Dgraph test cluster to 20.07.0.
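The renamed API groups the readers under a `dgraph` namespace, so a call of the form `spark.read.dgraphTriples(...)` presumably becomes `spark.read.dgraph.triples(...)`. Below is a minimal sketch of one way such a namespace can be built on top of a reader's `format(...).load(...)` calls. It is not the connector's code: `FakeReader` is a hypothetical stand-in for Spark's `DataFrameReader` that only records calls, and the format strings are an assumption derived from the package named in the notes (check the README for the exact source constants).

```python
from typing import List, Optional, Tuple

# Assumed source names, following the package uk.co.gresearch.spark.dgraph from the notes.
TRIPLES_SOURCE = "uk.co.gresearch.spark.dgraph.triples"
EDGES_SOURCE = "uk.co.gresearch.spark.dgraph.edges"
NODES_SOURCE = "uk.co.gresearch.spark.dgraph.nodes"


class FakeReader:
    """Hypothetical stand-in for a DataFrameReader that only records format()/load() calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Optional[str], Tuple[str, ...]]] = []
        self._format: Optional[str] = None

    def format(self, source: str) -> "FakeReader":
        self._format = source
        return self

    def load(self, *targets: str) -> Tuple[Optional[str], Tuple[str, ...]]:
        self.calls.append((self._format, targets))
        return (self._format, targets)  # a real reader would return a DataFrame here

    @property
    def dgraph(self) -> "DgraphReader":
        # The `dgraph` namespace: reader.dgraph.triples(...) instead of one flat method per source.
        return DgraphReader(self)


class DgraphReader:
    """Groups the Dgraph sources; each method just picks a source and loads the targets."""

    def __init__(self, reader: FakeReader) -> None:
        self._reader = reader

    def triples(self, *targets: str):
        return self._reader.format(TRIPLES_SOURCE).load(*targets)

    def edges(self, *targets: str):
        return self._reader.format(EDGES_SOURCE).load(*targets)

    def nodes(self, *targets: str):
        return self._reader.format(NODES_SOURCE).load(*targets)


reader = FakeReader()
print(reader.dgraph.triples("localhost:9080"))
# -> ('uk.co.gresearch.spark.dgraph.triples', ('localhost:9080',))
```

Keeping the namespace as a thin wrapper means the fully qualified source names remain usable directly via `format(...)`, which is why moving the sources to a new package and renaming the convenience methods go hand in hand.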