ContinuousWriteRDD
is a specialized RDD
(RDD[Unit]
) that is used exclusively as the underlying RDD of WriteToContinuousDataSourceExec
unary physical operator to write records continuously.
ContinuousWriteRDD
is created exclusively when WriteToContinuousDataSourceExec
unary physical operator is requested to execute and generate an RDD.
ContinuousWriteRDD
uses the parent RDD for the partitions and the partitioner.
ContinuousWriteRDD
takes the following to be created:
compute(
split: Partition,
context: TaskContext): Iterator[Unit]
Note
|
compute is part of the RDD Contract to compute a partition.
|
compute
requests the EpochCoordinatorRef
helper for a remote reference to the EpochCoordinator RPC endpoint (using the __epoch_coordinator_id local property).
Note
|
The EpochCoordinator RPC endpoint runs on the driver as the single point to coordinate epochs across partition tasks. |
compute
uses the EpochTracker
helper to initializeCurrentEpoch (using the __continuous_start_epoch local property).
compute
then executes the following steps (in a loop) until the task (as the given TaskContext
) is killed or completed.
compute
requests the parent RDD to compute the given partition (that gives an Iterator[InternalRow]
).
compute
requests the DataWriterFactory to create a DataWriter
(for the partition and the task attempt IDs from the given TaskContext
and the current epoch from the EpochTracker
helper) and requests it to write all records (from the Iterator[InternalRow]
).
compute
prints out the following INFO message to the logs:
Writer for partition [partitionId] in epoch [epoch] is committing.
compute
requests the DataWriter
to commit (that gives a WriterCommitMessage
).
compute
requests the EpochCoordinator RPC endpoint reference to send out a CommitPartitionEpoch message (with the WriterCommitMessage
).
compute
prints out the following INFO message to the logs:
Writer for partition [partitionId] in epoch [epoch] is committed.
In the end (of the loop), compute
uses the EpochTracker
helper to incrementCurrentEpoch.
In case of an error, compute
prints out the following ERROR message to the logs and requests the DataWriter
to abort.
Writer for partition [partitionId] is aborting.
In the end, compute
prints out the following ERROR message to the logs:
Writer for partition [partitionId] aborted.