== DataSource — Pluggable Data Source

`DataSource` is an internal Spark SQL class that represents a pluggable data source in a streaming query, i.e. the source or the sink of streaming data.

`DataSource` is created when `DataStreamReader` is requested to load data from a streaming source and when `DataStreamWriter` is requested to start a streaming query.

TIP: Read _DataSource — Pluggable Data Sources_ (for Spark SQL's batch structured queries).
.DataSource's Internal Registries and Counters
[cols="1,2",options="header",width="100%"]
|===
| Name
| Description

| `sourceInfo`
a| [[sourceInfo]] `SourceInfo` with the name, the schema, and optional partitioning columns of a source.

Used when `DataSource` is requested to create a streaming source.
|===

=== [[sourceSchema]] Describing Name and Schema of Streaming Source — `sourceSchema` Internal Method

[source, scala]
----
sourceSchema(): SourceInfo
----

`sourceSchema` creates a `SourceInfo` with the name, the schema, and the optional partitioning columns of the source (requesting a `StreamSourceProvider` for the name and schema, or inferring the schema of a `FileFormat` from the input paths).

NOTE: `sourceSchema` is used exclusively when `DataSource` is requested for the `SourceInfo`.
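As an illustration (a sketch in `spark-shell`, using the built-in `rate` streaming source), the schema that `printSchema` displays for a streaming source is the schema that `sourceSchema` describes internally:

[source, scala]
----
// Sketch (spark-shell): loading a streaming source resolves its schema,
// which is what sourceSchema computes internally (name, schema, partitions).
// "rate" is a built-in streaming source with a fixed two-column schema
// (timestamp and value).
spark
  .readStream
  .format("rate")
  .load
  .printSchema
----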

=== [[creating-instance]] Creating DataSource Instance

`DataSource` takes the following when created:

* `SparkSession`
* Name of the class
* Paths (default: `Nil`, i.e. an empty collection)
* Optional user-defined schema (default: `None`)
* Names of the partition columns (default: empty)
* Optional `BucketSpec` (default: `None`)
* Configuration options (default: empty)
* Optional `CatalogTable` (default: `None`)

`DataSource` initializes the internal registries and counters.
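For instance (a sketch in `spark-shell`), `DataStreamReader.load` is one place where a `DataSource` is created under the covers, with the format becoming the class name and the reader options becoming the configuration options:

[source, scala]
----
// Sketch (spark-shell): DataStreamReader.load creates a DataSource
// behind the scenes.
val rates = spark
  .readStream
  .format("rate")               // becomes DataSource's class name (an alias here)
  .option("rowsPerSecond", "1") // becomes DataSource's configuration options
  .load
----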

=== [[createSource]] `createSource` Method

[source, scala]
----
createSource(metadataPath: String): Source
----

`createSource` creates a streaming `Source` for the data source provider: a `StreamSourceProvider` is requested to create the source (with the given metadata path), while for a `FileFormat` a `FileStreamSource` is created.

NOTE: `createSource` is used when `MicroBatchExecution` is requested for the logical plan (and creates a streaming source for every streaming relation).

=== [[createSink]] Creating Streaming Sink — `createSink` Method

[source, scala]
----
createSink(outputMode: OutputMode): Sink
----

`createSink` creates a streaming `Sink` for the data source provider: a `StreamSinkProvider` is requested to create the sink (with the given output mode), while for a `FileFormat` a `FileStreamSink` is created (with `Append` as the only supported output mode).

NOTE: `createSink` is used exclusively when `DataStreamWriter` is requested to start a streaming query.
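As an end-to-end illustration (a sketch in `spark-shell`, using the built-in `rate` source and `console` sink), `DataStreamWriter.start` is where a `DataSource` is created for the sink and requested to create the `Sink` for the given output mode:

[source, scala]
----
// Sketch (spark-shell): DataStreamWriter.start creates a DataSource for
// the sink and requests it to create the Sink for the output mode.
val sq = spark
  .readStream
  .format("rate")
  .load
  .writeStream
  .format("console")    // becomes the sink DataSource's class name (an alias here)
  .outputMode("append") // the OutputMode passed to createSink
  .start
----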