Caution
|
FIXME
|
DataStreamWriter is a part of the Structured Streaming API.
import org.apache.spark.sql.streaming.ProcessingTime
import scala.concurrent.duration._

val df: DataFrame = ...

df.writeStream
  .queryName("textStream")
  .trigger(ProcessingTime(10.seconds))
  .format("console")
  .start
start to begin a continuous write

DataStreamWriter comes with two start methods that return a StreamingQuery object to continually write data:

start(): StreamingQuery
start(path: String): StreamingQuery // (1)
(1) Sets the path option to path and calls start().
Note
|
start uses StreamingQueryManager.startQuery to create a StreamingQuery.
|
Note
|
Whether or not you have to specify the path option depends on the DataSource in use.
|
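Here is a minimal sketch of the two variants. The parquet format and the /tmp paths are made-up examples, and df is assumed to be a streaming Dataset:

// start() with the path given explicitly as an option
df.writeStream
  .format("parquet")
  .option("checkpointLocation", "/tmp/checkpoint")
  .option("path", "/tmp/output")
  .start()

// start(path) sets the path option and then calls start()
df.writeStream
  .format("parquet")
  .option("checkpointLocation", "/tmp/checkpoint")
  .start("/tmp/output")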
Recognized options (a short sketch follows the note below):
- queryName is the name of the active streaming query.
- checkpointLocation is the directory for checkpointing.
Note
|
It is a new feature of Spark 2.0.0.
|
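Since both are plain options, they can also be set through option. A sketch, with a hypothetical checkpoint directory:

df.writeStream
  .format("console")
  .option("queryName", "textStream")
  .option("checkpointLocation", "/tmp/checkpoints/textStream")
  .start()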
outputMode(outputMode: OutputMode): DataStreamWriter[T]
outputMode specifies the output mode of a streaming Dataset, i.e. what gets written to a streaming sink when new data is available.
Currently, the following output modes are supported:
- OutputMode.Append — only the new rows in the streaming dataset will be written to a sink.
- OutputMode.Complete — the entire streaming dataset (with all the rows) will be written to a sink every time there are updates. It is supported only for streaming queries with aggregations.
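For instance, a minimal sketch of OutputMode.Complete with a streaming aggregation, assuming df is a streaming Dataset with a value column:

import org.apache.spark.sql.streaming.OutputMode

df.groupBy("value")
  .count()
  .writeStream
  .outputMode(OutputMode.Complete())
  .format("console")
  .start()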
queryName(queryName: String): DataStreamWriter[T]
queryName sets the name of a streaming query.

Internally, it is just an additional option with the key queryName.
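As a sketch, the name set here is what the started StreamingQuery (and the StreamingQueryManager available as spark.streams) reports back:

val query = df.writeStream
  .queryName("textStream")
  .format("console")
  .start()

query.name                       // "textStream"
spark.streams.active.map(_.name) // includes "textStream"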
trigger(trigger: Trigger): DataStreamWriter[T]
trigger sets the time interval of the trigger (batch) for a streaming query.
Note
|
Trigger specifies how often results should be produced by a StreamingQuery. See Trigger.
|
The default trigger is ProcessingTime(0L), which runs a streaming query as often as possible.
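As a sketch, the following are equivalent ways to request a 10-second processing-time trigger:

import org.apache.spark.sql.streaming.ProcessingTime
import scala.concurrent.duration._

df.writeStream.trigger(ProcessingTime(10.seconds))
df.writeStream.trigger(ProcessingTime("10 seconds"))
df.writeStream.trigger(ProcessingTime(10000L)) // interval in milliseconds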
Tip
|
Consult Trigger to learn about Trigger and ProcessingTime types.
|