Columns

Caution
FIXME

Column type represents…FIXME
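
A Column can be obtained in a few equivalent ways (a quick sketch, assuming a spark-shell session with spark.implicits._ in scope):

import org.apache.spark.sql.functions.col

val df = Seq((0, "hello")).toDF("id", "text")

df("id")   // by name on the Dataset
col("id")  // org.apache.spark.sql.functions.col
$"id"      // string interpolation from spark.implicits._
'id        // Scala symbol from spark.implicits._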

like

Caution
FIXME
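// df is the two-row DataFrame defined under "Symbols As Column Names" below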
scala> df("id") like "0"
res0: org.apache.spark.sql.Column = id LIKE 0

scala> df.filter('id like "0").show
+---+-----+
| id| text|
+---+-----+
|  0|hello|
+---+-----+

Symbols As Column Names

scala> val df = Seq((0, "hello"), (1, "world")).toDF("id", "text")
df: org.apache.spark.sql.DataFrame = [id: int, text: string]

scala> df.select('id)
res0: org.apache.spark.sql.DataFrame = [id: int]

scala> df.select('id).show
+---+
| id|
+---+
|  0|
|  1|
+---+

over function

over(window: expressions.WindowSpec): Column

over function defines a windowing column, i.e. a column that applies a window function over a given window. The window itself is specified using WindowSpec.

Tip
Read about windows in Windows.
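
A minimal sketch of over at work (reusing the two-row df from the Symbols As Column Names example above, in a spark-shell session):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.rank

// a window per text value, with rows ordered by id
val windowSpec = Window.partitionBy("text").orderBy("id")

// rank() over windowSpec is a windowing column; every row here ranks
// 1st within its single-row partition
df.select('id, 'text, rank() over windowSpec as "rank").show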

cast

cast method casts a column to a data type. This makes maps over Row objects type-safe, as the values come out with the proper runtime type (not Any).

cast(to: String): Column
cast(to: DataType): Column

The String variant uses CatalystSqlParser to parse the data type from its canonical string representation.
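
A one-line sketch of the String variant (with col imported from org.apache.spark.sql.functions, applied to the df from the example below):

import org.apache.spark.sql.functions.col

// "double" is parsed by CatalystSqlParser into DoubleType
df.select(col("label").cast("double"))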

cast Example

scala> val df = Seq((0f, "hello")).toDF("label", "text")
df: org.apache.spark.sql.DataFrame = [label: float, text: string]

scala> df.printSchema
root
 |-- label: float (nullable = false)
 |-- text: string (nullable = true)

// without cast
import org.apache.spark.sql.Row
scala> df.select("label").map { case Row(label) => label.getClass.getName }.show(false)
+---------------+
|value          |
+---------------+
|java.lang.Float|
+---------------+

// with cast
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType
scala> df.select(col("label").cast(DoubleType)).map { case Row(label) => label.getClass.getName }.show(false)
+----------------+
|value           |
+----------------+
|java.lang.Double|
+----------------+