ContinuousWriteRDD — RDD of WriteToContinuousDataSourceExec Unary Physical Operator

ContinuousWriteRDD is a specialized RDD (RDD[Unit]) that is used exclusively as the underlying RDD of the WriteToContinuousDataSourceExec unary physical operator to write records continuously.

ContinuousWriteRDD is created exclusively when the WriteToContinuousDataSourceExec unary physical operator is requested to execute and generate an RDD.

ContinuousWriteRDD uses the parent RDD for the partitions and the partitioner.

ContinuousWriteRDD takes the following to be created:

  • Parent RDD (RDD[InternalRow])

  • Write task (DataWriterFactory[InternalRow])
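The relationship between ContinuousWriteRDD and its parent can be illustrated with a small model. This is only a sketch under stated assumptions: SimplePartition, SimplePartitioner, SimpleRDD, ParentRDD and WriteRDD are hypothetical stand-ins invented for illustration (they are not Spark classes), and the write task is reduced to a plain function.

```scala
// Hypothetical stand-ins for Spark's Partition, Partitioner and RDD; not the real API.
final case class SimplePartition(index: Int)
final case class SimplePartitioner(numPartitions: Int)

abstract class SimpleRDD[T] {
  def partitions: Seq[SimplePartition]
  def partitioner: Option[SimplePartitioner]
}

// A parent with a fixed layout, standing in for the RDD[InternalRow] parent.
final class ParentRDD(n: Int) extends SimpleRDD[String] {
  val partitions: Seq[SimplePartition] = (0 until n).map(i => SimplePartition(i))
  val partitioner: Option[SimplePartitioner] = Some(SimplePartitioner(n))
}

// The write-side wrapper: its element type is Unit and its layout is borrowed from the parent.
// writeTask is an assumed simplification of the DataWriterFactory and is not used in this sketch.
final class WriteRDD(prev: SimpleRDD[String], writeTask: String => Unit) extends SimpleRDD[Unit] {
  def partitions: Seq[SimplePartition] = prev.partitions
  def partitioner: Option[SimplePartitioner] = prev.partitioner
}
```

Since the wrapper defines no layout of its own, every write task lines up one-to-one with a partition of the parent.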

Computing Partition — compute Method

compute(
  split: Partition,
  context: TaskContext): Iterator[Unit]
Note
compute is part of the RDD Contract to compute a partition.

compute requests the EpochCoordinatorRef helper for a remote reference to the EpochCoordinator RPC endpoint (using the __epoch_coordinator_id local property).

Note
The EpochCoordinator RPC endpoint runs on the driver as the single point to coordinate epochs across partition tasks.

compute uses the EpochTracker helper to initializeCurrentEpoch (using the __continuous_start_epoch local property).
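The two local properties and the epoch bookkeeping can be sketched without Spark on the classpath. The property names are the ones given above; everything else (EpochTrackerSketch, LocalPropsSketch, readProps, and the choice of a thread-local to hold the epoch) is an assumption made for illustration, not the actual EpochTracker implementation.

```scala
import java.util.Properties

// Hypothetical stand-in for an epoch tracker that keeps one epoch value per thread (assumed design).
object EpochTrackerSketch {
  private val current: ThreadLocal[Option[Long]] = new ThreadLocal[Option[Long]] {
    override def initialValue(): Option[Long] = None
  }
  def initializeCurrentEpoch(startEpoch: Long): Unit = current.set(Some(startEpoch))
  def getCurrentEpoch: Option[Long] = current.get()
  def incrementCurrentEpoch(): Unit = current.set(current.get().map(_ + 1))
}

object LocalPropsSketch {
  // Property names as given in the text.
  val EpochCoordinatorIdKey = "__epoch_coordinator_id"
  val StartEpochKey = "__continuous_start_epoch"

  // Reads both properties the way a task might; returns (coordinatorId, startEpoch).
  def readProps(props: Properties): (String, Long) = {
    val coordinatorId = props.getProperty(EpochCoordinatorIdKey)
    val startEpoch = props.getProperty(StartEpochKey).toLong
    (coordinatorId, startEpoch)
  }
}
```

A task would call readProps once at the start, use the first value to look up the coordinator, and pass the second to initializeCurrentEpoch.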

compute then executes the following steps in a loop (one iteration per epoch) until the task (as tracked by the given TaskContext) is killed or completed.

compute requests the parent RDD to compute the given partition (that gives an Iterator[InternalRow]).

compute requests the DataWriterFactory to create a DataWriter (for the partition and the task attempt IDs from the given TaskContext and the current epoch from the EpochTracker helper) and requests it to write all records (from the Iterator[InternalRow]).

compute prints out the following INFO message to the logs:

Writer for partition [partitionId] in epoch [epoch] is committing.

compute requests the DataWriter to commit (that gives a WriterCommitMessage).

compute requests the EpochCoordinator RPC endpoint reference to send out a CommitPartitionEpoch message (with the WriterCommitMessage).

compute prints out the following INFO message to the logs:

Writer for partition [partitionId] in epoch [epoch] is committed.

In the end (of the loop), compute uses the EpochTracker helper to incrementCurrentEpoch.
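The loop steps above can be sketched as a self-contained model. All names here (CommitMessage, CountingWriter, EpochLoopSketch, readEpoch) are hypothetical and only mirror the roles described in the text; the coordinator is replaced by an in-memory buffer, the loop condition by a fixed iteration count, and logging by println.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins; none of these are Spark's real types.
final case class CommitMessage(partitionId: Int, epoch: Long, rowCount: Int)

final class CountingWriter(partitionId: Int, epoch: Long) {
  private var written = 0
  def write(row: String): Unit = written += 1
  def commit(): CommitMessage = CommitMessage(partitionId, epoch, written)
}

object EpochLoopSketch {
  // Runs `epochs` iterations for one partition and returns what the coordinator would receive.
  def run(partitionId: Int, startEpoch: Long, epochs: Int,
          readEpoch: Long => Iterator[String]): Seq[CommitMessage] = {
    val sentToCoordinator = ArrayBuffer.empty[CommitMessage]
    var currentEpoch = startEpoch
    var remaining = epochs // stands in for "task not killed or completed"
    while (remaining > 0) {
      val rows = readEpoch(currentEpoch)                         // parent computes the partition
      val writer = new CountingWriter(partitionId, currentEpoch) // a fresh writer per epoch
      rows.foreach(row => writer.write(row))
      println(s"Writer for partition $partitionId in epoch $currentEpoch is committing.")
      val msg = writer.commit()
      sentToCoordinator += msg                                   // stands in for CommitPartitionEpoch
      println(s"Writer for partition $partitionId in epoch $currentEpoch is committed.")
      currentEpoch += 1                                          // stands in for incrementCurrentEpoch
      remaining -= 1
    }
    sentToCoordinator.toSeq
  }
}
```

In this model each iteration creates a new writer, so a commit message always describes exactly one partition in exactly one epoch.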

In case of an error, compute prints out the following ERROR message to the logs and requests the DataWriter to abort.

Writer for partition [partitionId] is aborting.

Once the DataWriter has been requested to abort, compute prints out the following ERROR message to the logs:

Writer for partition [partitionId] aborted.
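The error path can be modelled in the same spirit. This is a minimal sketch, not the actual implementation: RecordingWriter, AbortSketch and writeOnce are hypothetical names, and two details are assumptions of the sketch rather than statements from the text above, namely that the writer is aborted only if it was created and that the original exception is rethrown after the abort.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.control.NonFatal

// Hypothetical writer that records lifecycle calls and fails on a chosen row.
final class RecordingWriter(failOn: String, events: ArrayBuffer[String]) {
  def write(row: String): Unit = {
    if (row == failOn) throw new IllegalStateException(s"cannot write $row")
    events += s"write:$row"
  }
  def commit(): Unit = events += "commit"
  def abort(): Unit = events += "abort"
}

object AbortSketch {
  // Writes all rows and commits; on failure logs, aborts the writer, logs again and rethrows.
  def writeOnce(partitionId: Int, rows: Iterator[String], failOn: String,
                events: ArrayBuffer[String]): Unit = {
    var writer: RecordingWriter = null
    try {
      writer = new RecordingWriter(failOn, events)
      rows.foreach(row => writer.write(row))
      writer.commit()
    } catch {
      case NonFatal(e) =>
        System.err.println(s"Writer for partition $partitionId is aborting.")
        if (writer != null) writer.abort() // assumed guard: abort only a writer that exists
        System.err.println(s"Writer for partition $partitionId aborted.")
        throw e                            // assumed: the failure still propagates to the caller
    }
  }
}
```

With rows "a", "bad", "c" and failOn set to "bad", the recorded events are one write followed by an abort, and no commit is ever recorded.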