[DO NOT MERGE] Support Spark Connect #651
// The transform method receives protobuf Any from Spark Connect
// Scala compiler sees com.google.protobuf.Any in the interface signature
override def transform(
Feel free to ignore
In Spark 4.x the signature was changed from relation: protobuf.Any to relation: Array[Byte]. To avoid pain during the migration, I would strongly recommend keeping transform as small as possible, ideally in a separate class. In GraphFrames we separated the plugin implementation from the plugin logic so that we can ship two versions for different Spark versions. You can see an example here: spark3 and spark4
Otherwise you may need to duplicate the whole logic on the day you start working on Spark 4.x support.
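A minimal sketch of that split, assuming nothing about the actual PyDeequ or GraphFrames code: every name here (DeequRelationLogic, FakeAny, Spark3Adapter, Spark4Adapter) is hypothetical, and FakeAny is only a stand-in for com.google.protobuf.Any so the snippet compiles without Spark or protobuf on the classpath. The real plugin method also takes other arguments and returns a plan rather than a String; those are left out here.

```scala
// Version-independent logic. It only ever sees raw bytes, so it does not
// depend on which Spark Connect plugin signature is in use.
object DeequRelationLogic {
  // Placeholder for the real work (parsing the message, building the plan).
  def handle(payload: Array[Byte]): String =
    s"plan for ${payload.length} bytes"
}

// Stand-in for com.google.protobuf.Any (assumption: illustration only).
final case class FakeAny(value: Array[Byte])

// Spark 3.x-style adapter: receives the Any-like wrapper and unwraps it.
class Spark3Adapter {
  def transform(relation: FakeAny): Option[String] =
    Some(DeequRelationLogic.handle(relation.value))
}

// Spark 4.x-style adapter: receives the bytes directly.
class Spark4Adapter {
  def transform(relation: Array[Byte]): Option[String] =
    Some(DeequRelationLogic.handle(relation))
}
```

With this shape, only the thin adapter class differs between the two Spark builds, and the shared logic is compiled once.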
Great call, thanks. I haven't thought much about the Spark 3.x to 4.x breaking change yet (it seems more annoying than I expected...). Let me revisit this in a new revision.
Issue #, if available:
See awslabs/python-deequ#254
Description of changes:
Initial effort to evolve PyDeequ to use Spark Connect instead of the current, fragile Py4J-based bridge.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.