
v0.5.0 (Spark 2.4) - 2020-10-21

@EnricoMi released this 14 Jul 10:29

Added

  • Load data from a Dgraph cluster as a GraphFrames GraphFrame.
  • Optionally read all partitions within the same transaction, which guarantees a consistent snapshot of the graph (issue #6).
    However, concurrent mutations shorten the lifetime of such a transaction, and an exception is thrown once that lifetime is exceeded.
  • Add a Python API that mirrors the Scala API. README.md fully documents how to load Dgraph data in PySpark.
  • Fixed dependency conflicts between the connector's dependencies and Spark
    by shading the Java Dgraph client and all its dependencies.
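Why reading all partitions in one transaction gives a consistent snapshot, and why concurrent mutations can end that transaction early, can be illustrated with a toy multi-version store. This is a minimal Python sketch under stated assumptions: VersionedStore, read_partitions and TransactionExpired are hypothetical names, the pruning rule (keep at most max_versions versions per key) is an invented stand-in for however Dgraph actually discards old versions, and nothing here calls the connector or Dgraph.

```python
class TransactionExpired(Exception):
    """Raised when a read timestamp predates the oldest version still retained."""


class VersionedStore:
    """Toy multi-version key-value store (hypothetical, NOT Dgraph's implementation)."""

    def __init__(self, max_versions=2):
        self.ts = 0                  # latest commit timestamp
        self.data = {}               # key -> list of (commit_ts, value), oldest first
        self.max_versions = max_versions
        self.min_readable_ts = 0     # reads below this timestamp can no longer be served

    def commit(self, updates):
        """Apply all updates at one new commit timestamp, pruning old versions."""
        self.ts += 1
        for key, value in updates.items():
            versions = self.data.setdefault(key, [])
            versions.append((self.ts, value))
            if len(versions) > self.max_versions:
                versions.pop(0)
                # Reads older than the oldest surviving version of this key are now unsafe.
                self.min_readable_ts = max(self.min_readable_ts, versions[0][0])
        return self.ts

    def read(self, key, read_ts):
        """Return the newest value of key committed at or before read_ts."""
        if read_ts < self.min_readable_ts:
            raise TransactionExpired(f"read_ts {read_ts} < {self.min_readable_ts}")
        value = None
        for commit_ts, v in self.data.get(key, []):
            if commit_ts <= read_ts:
                value = v
        return value


def read_partitions(store, partitions, snapshot, mutate_between=None):
    """Read each partition (a list of keys) in turn.

    snapshot=True fixes one read timestamp for all partitions (one transaction);
    snapshot=False reads every partition at the latest timestamp at that moment.
    mutate_between, if given, simulates a concurrent writer between partitions.
    """
    read_ts = store.ts
    result = []
    for i, keys in enumerate(partitions):
        ts = read_ts if snapshot else store.ts
        result.append({key: store.read(key, ts) for key in keys})
        if mutate_between is not None and i < len(partitions) - 1:
            mutate_between(store)
    return result
```

With partitions [["a"], ["b"]] and one mutation committed between them, reading without a fixed timestamp returns a=1 but b=2, a mix of two graph states, while fixing the read timestamp returns a=1 and b=1. With three partitions and a mutation between each, the second mutation prunes a version the fixed timestamp still needs, so the third read raises TransactionExpired, mirroring the lifetime caveat in the notes.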

Changed

  • Refactored the connector API: renamed the spark.read.dgraph* methods to spark.read.dgraph.*.
  • Moved the triples, edges and nodes sources from package uk.co.gresearch.spark.dgraph.connector to uk.co.gresearch.spark.dgraph.
  • Updated the Java Dgraph client to 20.03.1 and the Dgraph test cluster to 20.07.0.
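The API rename groups the Dgraph readers behind a single dgraph accessor, so a call shaped like spark.read.dgraphTriples(...) (assuming the old methods were named this way) becomes spark.read.dgraph.triples(...). The Python sketch below shows only this namespacing pattern, not the connector's code: DgraphReader, DataFrameReaderStub and load_source are hypothetical names, and the targets are plain strings such as "localhost:9080" rather than a live cluster.

```python
class DgraphReader:
    """Hypothetical namespace object grouping the Dgraph read methods."""

    def __init__(self, reader):
        self._reader = reader

    def triples(self, *targets):
        return self._reader.load_source("triples", targets)

    def edges(self, *targets):
        return self._reader.load_source("edges", targets)

    def nodes(self, *targets):
        return self._reader.load_source("nodes", targets)


class DataFrameReaderStub:
    """Stand-in for a DataFrameReader (hypothetical, NOT the PySpark class)."""

    def load_source(self, source, targets):
        # A real reader would return a DataFrame; the stub just records the request.
        return (source, list(targets))

    @property
    def dgraph(self):
        # One dgraph attribute replaces several dgraph-prefixed methods.
        return DgraphReader(self)


read = DataFrameReaderStub()
print(read.dgraph.triples("localhost:9080"))  # prints ('triples', ['localhost:9080'])
```

One benefit of this layout is that further readers can be added to the namespace object without adding more dgraph-prefixed methods to the reader itself.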