Getting started with Apache CarbonData

This tutorial provides a quick introduction to using CarbonData.

Examples

We suggest first going through all the examples to understand how to create a table, load data, and run queries.

Interactive Query with the Spark Shell

1. Install

  • Download a packaged release of Spark 1.5.0 or later
  • Configure the Hive Metastore to use MySQL (search for "mysql hive metastore" to find instructions) and copy the mysql-connector-java jar to ${SPARK_HOME}/lib
  • Download Thrift, rename the executable to thrift, and add it to your PATH.
  • Download the Apache CarbonData source code and build it:
$ git clone https://github.com/apache/incubator-carbondata.git carbondata
$ cd carbondata
$ mvn clean install -DskipTests
$ cp assembly/target/scala-2.10/carbondata_*.jar ${SPARK_HOME}/lib
$ mkdir ${SPARK_HOME}/carbondata
$ cp -r processing/carbonplugins ${SPARK_HOME}/carbondata
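Before moving on, it can help to confirm that the copy commands above left the files where the later steps look for them. The following is a minimal sketch, not part of CarbonData: `check_carbondata_install` is a hypothetical helper name, and it only checks for the two artifacts copied above (the `carbondata_*.jar` in `lib` and the `carbonplugins` directory).

```shell
#!/bin/sh
# Hypothetical helper (not shipped with CarbonData or Spark):
# verify that the install step left its files under the given Spark home ($1).
check_carbondata_install() {
  spark_home="$1"
  status=0
  # The assembly jar copied from assembly/target/scala-2.10/
  if ! ls "$spark_home"/lib/carbondata_*.jar >/dev/null 2>&1; then
    echo "missing: carbondata jar in $spark_home/lib"
    status=1
  fi
  # The plugin directory copied from processing/carbonplugins
  if [ ! -d "$spark_home/carbondata/carbonplugins" ]; then
    echo "missing: $spark_home/carbondata/carbonplugins"
    status=1
  fi
  if [ "$status" -eq 0 ]; then
    echo "install layout looks complete"
  fi
  return "$status"
}
```

After defining the function, call it as `check_carbondata_install "${SPARK_HOME}"`. It does not verify the Hive Metastore or Thrift setup.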

2. Interactive Data Query

  • Run the Spark shell
$ cd ${SPARK_HOME}
$ carbondata_jar=./lib/$(ls -1 lib |grep "^carbondata_.*\.jar$")
$ mysql_jar=./lib/$(ls -1 lib |grep "^mysql.*\.jar$")
$ ./bin/spark-shell --master local --jars ${carbondata_jar},${mysql_jar}
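  • Create a CarbonContext instance
import org.apache.spark.sql.CarbonContext
import java.io.File
import org.apache.hadoop.hive.conf.HiveConf
// Create the context; the second argument is the CarbonData store path
val cc = new CarbonContext(sc, "./carbondata/store")
// Point CarbonData at the plugin directory copied during installation
cc.setConf("carbon.kettle.home","./carbondata/carbonplugins")
// Keep the Hive warehouse under ${SPARK_HOME}/carbondata/metadata
val metadata = new File("").getCanonicalPath + "/carbondata/metadata"
cc.setConf("hive.metastore.warehouse.dir", metadata)
// Turn off Hive's file format check
cc.setConf(HiveConf.ConfVars.HIVECHECKFILEFORMAT.varname, "false")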
  • Create a table
cc.sql("create table if not exists table1 (id string, name string, city string, age Int) STORED BY 'org.apache.carbondata.format'")
  • Create a sample.csv file in the ${SPARK_HOME}/carbondata directory (run this in a regular terminal, not in the Spark shell)
cd ${SPARK_HOME}/carbondata
cat > sample.csv << EOF
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF
  • Load data into table1 in the Spark shell
val dataFilePath = new File("").getCanonicalPath + "/carbondata/sample.csv"
cc.sql(s"load data inpath '$dataFilePath' into table table1")
  • Query data from table1
cc.sql("select * from table1").show
cc.sql("select city, avg(age), sum(age) from table1 group by city").show
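Two of the steps above depend on assumptions that are never checked: the `ls | grep` lookups assume exactly one matching jar in `lib`, and the load assumes every row of sample.csv has the same four comma-separated fields as the header. The sketch below shows one way to check both from a regular terminal. The function names `find_single_jar` and `check_csv_columns` are hypothetical and not part of CarbonData or Spark, and the CSV check splits on plain commas, so it would miscount quoted fields that contain commas.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative and not part of CarbonData or Spark.

# Print the path of the single file in directory $1 whose name matches regex $2.
# Fails if there are zero or several matches.
find_single_jar() {
  dir="$1"
  pattern="$2"
  matches=$(ls -1 "$dir" 2>/dev/null | grep -E "$pattern" || true)
  # Count non-empty lines; grep -c prints 0 when nothing matches
  count=$(printf '%s\n' "$matches" | grep -c . || true)
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one match for $pattern in $dir, found $count" >&2
    return 1
  fi
  printf '%s/%s\n' "$dir" "$matches"
}

# Report every line of CSV file $1 whose comma-separated field count is not $2.
# Exits non-zero if any such line is found.
check_csv_columns() {
  awk -F',' -v n="$2" '
    NF != n { printf "line %d has %d fields, expected %d\n", NR, NF, n; bad = 1 }
    END { exit bad + 0 }
  ' "$1"
}
```

For example, `carbondata_jar=$(find_single_jar ./lib '^carbondata_.*\.jar$')` could stand in for the plain `ls | grep` assignment when run from ${SPARK_HOME}, and `check_csv_columns "${SPARK_HOME}/carbondata/sample.csv" 4` checks the sample file against the four columns of table1.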