SQL Parsers

ParserInterface — SQL Parser Contract

ParserInterface is the parser contract for extracting LogicalPlan, Expressions, and TableIdentifiers from a given SQL string.

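package org.apache.spark.sql.catalyst.parser

import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

trait ParserInterface {
  // Parses a complete SQL statement into a logical plan
  def parsePlan(sqlText: String): LogicalPlan

  // Parses a SQL fragment into a Catalyst expression
  def parseExpression(sqlText: String): Expression

  // Parses a (possibly database-qualified) table name
  def parseTableIdentifier(sqlText: String): TableIdentifier
}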

ParserInterface has a single abstract implementation, AbstractSqlParser.
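As a minimal sketch of the contract in use, the three methods can be called on CatalystSqlParser, one of the concrete parsers. The sketch assumes the spark-catalyst classes are on the classpath (e.g. inside spark-shell); the SQL strings and the `db.tbl` name are made up for illustration.

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.catalyst.parser.{CatalystSqlParser, ParserInterface}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Program against the contract; CatalystSqlParser is just one implementation.
val parser: ParserInterface = CatalystSqlParser

// A complete statement becomes an (unresolved) logical plan.
val plan: LogicalPlan = parser.parsePlan("SELECT a FROM t WHERE a > 1")

// A SQL fragment becomes a Catalyst expression.
val expression: Expression = parser.parseExpression("a + 1")

// A possibly database-qualified name becomes a TableIdentifier.
val tableId: TableIdentifier = parser.parseTableIdentifier("db.tbl")

println(plan.treeString)
println(expression)
println(s"database=${tableId.database} table=${tableId.table}")
```

Parsing only builds the tree; nothing is looked up in a catalog, so the table `t` does not have to exist.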

AbstractSqlParser

AbstractSqlParser is an abstract ParserInterface that provides the foundation for the SQL parsing infrastructure in Spark SQL. It has two concrete implementations: SparkSqlParser and CatalystSqlParser.

AbstractSqlParser expects subclasses to provide a custom AstBuilder (as astBuilder) that converts an ANTLR ParseTree into an AST.
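The division of labour between the abstract parser and its astBuilder can be illustrated with a toy model. Everything in the sketch below (`ToyAstBuilder`, `ToyAbstractParser`, `ToyCatalystParser`, the `ParseTree` alias) is a hypothetical stand-in written for this illustration, not a Spark class; it only mirrors the shape described above, in which shared parsing logic lives in the base class and each subclass plugs in its own builder.

```scala
// Toy stand-ins, NOT Spark classes: they only mirror the shape described above.
object AstBuilderSketch {
  // Stand-in for ANTLR's ParseTree: here simply a list of whitespace-separated tokens.
  type ParseTree = List[String]

  // Stand-in for AstBuilder: turns a parse tree into an "AST" (a String in this toy).
  trait ToyAstBuilder {
    def visit(tree: ParseTree): String
  }

  abstract class ToyAbstractParser {
    // Subclasses supply their own builder, as with astBuilder in AbstractSqlParser.
    protected def astBuilder: ToyAstBuilder

    // Shared logic: produce a parse tree, then delegate to the builder.
    def parsePlan(sqlText: String): String =
      astBuilder.visit(sqlText.trim.split("\\s+").toList)
  }

  // A concrete parser only has to provide the builder.
  object ToyCatalystParser extends ToyAbstractParser {
    protected val astBuilder: ToyAstBuilder = new ToyAstBuilder {
      def visit(tree: ParseTree): String = tree.mkString("Plan(", ", ", ")")
    }
  }
}

println(AstBuilderSketch.ToyCatalystParser.parsePlan("SELECT 1"))
```

A second parser with different behaviour would reuse `parsePlan` unchanged and only swap the builder.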

SparkSqlParser

SparkSqlParser is the default parser for the SQL statements supported in Spark SQL. It is available as a ParserInterface object in SessionState (as sqlParser).

SparkSqlParser uses its own specialized astBuilder, SparkSqlAstBuilder, which extends the AstBuilder used by CatalystSqlParser.

SparkSqlParser is the parser behind the expr function.

scala> expr("token = 'hello'")
16/07/07 18:32:53 INFO SparkSqlParser: Parsing command: token = 'hello'
res0: org.apache.spark.sql.Column = (token = hello)
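
The parser can also be called directly through SessionState rather than via expr. The following is a minimal sketch, assuming Spark 2.2 or later (where `sessionState` is publicly accessible, although marked unstable) and a local session; the application name, the table name `t` and the query are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.parser.ParserInterface

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("sql-parser-sketch") // hypothetical application name
  .getOrCreate()

// SessionState exposes the session's parser as sqlParser.
val sqlParser: ParserInterface = spark.sessionState.sqlParser

// Parsing alone does not look up `t` in the catalog, so the plan stays unresolved.
val plan = sqlParser.parsePlan("SELECT * FROM t WHERE token = 'hello'")
println(plan.numberedTreeString)

// The same predicate parsed as a standalone expression.
val predicate = sqlParser.parseExpression("token = 'hello'")
println(predicate)
```
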
Tip

Enable the INFO logging level for the org.apache.spark.sql.execution.SparkSqlParser logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.execution.SparkSqlParser=INFO

Refer to Logging.

Caution
FIXME Review parse method.

CatalystSqlParser

CatalystSqlParser is an AbstractSqlParser that comes with its own specialized astBuilder (i.e. AstBuilder).

CatalystSqlParser is used to parse data types from their canonical string representation, e.g. when adding a field to a StructType or calling the cast function on a Column with a type name.

scala> import org.apache.spark.sql.types._
scala> val struct = (new StructType).add("a", "int")
16/07/07 18:50:52 INFO CatalystSqlParser: Parsing command: int
struct: org.apache.spark.sql.types.StructType = StructType(StructField(a,IntegerType,true))

scala> expr("token = 'hello'").cast("int")
16/07/07 19:00:26 INFO SparkSqlParser: Parsing command: token = 'hello'
16/07/07 19:00:26 INFO CatalystSqlParser: Parsing command: int
res0: org.apache.spark.sql.Column = CAST((token = hello) AS INT)

CatalystSqlParser is also used in SimpleCatalogRelation, MetastoreRelation, and OrcFileOperator.
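The data-type parsing can be exercised directly as well. This is a minimal sketch that assumes the `parseDataType` method of CatalystSqlParser (not named in the text above) and a Spark classpath such as spark-shell; the type strings are arbitrary examples.

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types._

// Canonical string representations are parsed into DataType instances.
val intType = CatalystSqlParser.parseDataType("int")
val arrayType = CatalystSqlParser.parseDataType("array<int>")
val structType = CatalystSqlParser.parseDataType("struct<a:int,b:string>")

println(intType)
println(arrayType.simpleString)
println(structType.catalogString)
```

No SparkSession is needed here, since CatalystSqlParser lives in the Catalyst module and only deals with syntax.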

Tip

Enable the INFO logging level for the org.apache.spark.sql.catalyst.parser.CatalystSqlParser logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.catalyst.parser.CatalystSqlParser=INFO

Refer to Logging.