Skip to content

Latest commit

 

History

History
35 lines (18 loc) · 1.88 KB

spark-sql-queryplanner.adoc

File metadata and controls

35 lines (18 loc) · 1.88 KB

QueryPlanner

QueryPlanner transforms a LogicalPlan through a chain of GenericStrategy objects to produce a PhysicalPlan, e.g. SparkPlan for SparkPlanner or the custom SparkPlanner for HiveSessionState.

QueryPlanner contract defines three operations:

  • strategies that returns a collection of GenericStrategy objects.

  • planLater(plan: LogicalPlan): PhysicalPlan that skips the current plan.

  • plan(plan: LogicalPlan) that returns an Iterator[PhysicalPlan] with elements being the result of applying each GenericStrategy object from strategies collection to plan input parameter.

SparkStrategies

SparkStrategies is an abstract QueryPlanner for SparkPlan.

It serves as a source of concrete Strategy objects.

Among available SparkStrategies is SparkPlanner.

SparkPlanner

SparkPlanner is a concrete QueryPlanner (extending SparkStrategies) that uses a SparkContext, a SQLConf, and a collection of Strategy objects (as extraStrategies).

It defines numPartitions method that is the value of spark.sql.shuffle.partitions for the number of partitions to use for joins and aggregations.

strategies collection uses predefined Strategy objects as well as the constructor’s extraStrategies.

Among the Strategy objects is JoinSelection.

Custom SparkPlanner for HiveSessionState

HiveSessionState class uses an custom anonymous SparkPlanner for planner method (part of SessionState contract).

The custom anonymous SparkPlanner uses Strategy objects defined in HiveStrategies.