Skip to content

Latest commit

 

History

History
91 lines (60 loc) · 5.71 KB

spark-sql-streaming-IncrementalExecution.adoc

File metadata and controls

91 lines (60 loc) · 5.71 KB

IncrementalExecution — QueryExecution of Streaming Datasets

IncrementalExecution is a QueryExecution of a streaming Dataset that StreamExecution creates when incrementally executing the logical query plan (every trigger).

IncrementalExecution StreamExecution
Figure 1. StreamExecution creates IncrementalExecution (every trigger / streaming batch)
Tip
Details on QueryExecution contract can be found in the Mastering Apache Spark 2 gitbook.

IncrementalExecution registers state physical preparation rule with the parent QueryExecution's preparations that prepares the streaming physical plan (using batch-specific execution properties).

IncrementalExecution is created when:

Table 1. IncrementalExecution’s Internal Registries and Counters (in alphabetical order)
Name Description

planner

SparkPlanner with the following extra planning strategies (in the order of execution):

Note

planner is used to plan (aka convert) an optimized logical plan into a physical plan (that is later available as sparkPlan).

sparkPlan physical plan is then prepared for execution using preparations physical optimization rules. The result is later available as executedPlan physical plan.

state

State preparation rule (i.e. Rule[SparkPlan]) that transforms a streaming physical plan (i.e. SparkPlan with StateStoreSaveExec, StreamingDeduplicateExec and FlatMapGroupsWithStateExec physical operators) and fills missing properties that are batch-specific, e.g.

Used when IncrementalExecution prepares a physical plan (i.e. SparkPlan) for execution (which is when StreamExecution runs a streaming batch and plans a streaming query).

statefulOperatorId

Java’s AtomicInteger

  • 0 when IncrementalExecution is created

  • Incremented…​FIXME

nextStatefulOperationStateInfo Internal Method

nextStatefulOperationStateInfo(): StatefulOperatorStateInfo

nextStatefulOperationStateInfo creates a StatefulOperatorStateInfo with checkpointLocation, runId, the next statefulOperatorId and currentBatchId.

Note
All the properties of StatefulOperatorStateInfo are specified when IncrementalExecution is created.
Note
nextStatefulOperationStateInfo is used exclusively when IncrementalExecution is requested to transform a streaming physical plan using state preparation rule.

Creating IncrementalExecution Instance

IncrementalExecution takes the following when created:

IncrementalExecution initializes the internal registries and counters.