sh script/run.sh
sh script/runPyspark.sh

- Quick Start: a quick introduction to the Spark API; start here!
- RDD Programming Guide: overview of Spark basics - RDDs (core but old API), accumulators, and broadcast variables
- Spark SQL, Datasets, and DataFrames: processing structured data with relational queries (newer API than RDDs)
- Structured Streaming: processing structured data streams with relational queries (using Datasets and DataFrames, newer API than DStreams)
  - Micro-batch processing model
  - Continuous processing model
- Cluster Mode Overview: overview of concepts and components when running on a cluster
- Submitting Applications: packaging and deploying applications
- Amazon EC2: scripts that let you launch a cluster on EC2 in about 5 minutes
- Standalone Deploy Mode: the simplest way to deploy Spark on a private cluster; launches a standalone cluster quickly without a third-party cluster manager
  - The most lightweight option.
  - Cannot run applications other than Spark.
- Apache Mesos: deploy a private cluster using Apache Mesos
  - Heavyweight.
  - Fault tolerant; makes it easy to build an elastic distributed system.
  - Suited to large clusters.
- Hadoop YARN: deploy Spark on top of Hadoop NextGen (YARN)
  - Suited to applications that use HDFS (tightly coupled with HDFS).
  - Does not support cloud environments well.
- Kubernetes: deploy Spark on top of Kubernetes
- Understanding Cluster Manager, Master, and Worker Nodes
- Understanding Job, Stage, Task
- Setup Cluster
- Zeppelin Guide
- Pyspark Guide
- Spark Tuning & Monitoring
- Spark Kotlin
- Configuration: customize Spark via its configuration system
- Job Scheduling
- Shuffle
- Security: Spark security support
- Hardware Provisioning: recommendations for cluster hardware
- Integration with other storage systems:
  - Cloud Infrastructures
  - OpenStack Swift
- Migration Guide: Migration guides for Spark components
- Building Spark: build Spark using the Maven system
- Contributing to Spark
- Third Party Projects: related third party Spark projects
- Zeppelin: adding a Maven package
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-csv_2.10:1.2.0"

- https://zeppelin.apache.org/docs/latest/interpreter/spark.html#1-export-spark_home
- Spark with Excel
- Spark: The Definitive Guide
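
As a rough sketch of how the cluster managers listed above show up in practice: each one corresponds to a different `--master` URL scheme for `spark-submit`, and `--packages` (the option used in the `SPARK_SUBMIT_OPTIONS` export above) takes a comma-separated list of Maven coordinates. The helper names `master_url` and `join_packages`, the host names, the ports, and `app.py` are placeholders chosen for illustration and are not part of this repository. The script only prints a command line; it does not run Spark.

```shell
# Map a cluster manager name to the --master URL scheme spark-submit expects.
# Hosts and ports are placeholders (assumptions), not values from this repo.
master_url() {
  case "$1" in
    standalone) echo "spark://master-host:7077" ;;
    mesos)      echo "mesos://mesos-host:5050" ;;
    yarn)       echo "yarn" ;;
    kubernetes) echo "k8s://https://k8s-apiserver:6443" ;;
    local)      echo "local[*]" ;;
    *)          echo "unknown cluster manager: $1" >&2; return 1 ;;
  esac
}

# Join Maven coordinates with commas, the format --packages takes.
# The subshell keeps the IFS change from leaking into the caller.
join_packages() {
  ( IFS=,; echo "$*" )
}

# Print (do not execute) an example spark-submit command line.
echo "spark-submit --master $(master_url yarn)" \
     "--packages $(join_packages com.databricks:spark-csv_2.10:1.2.0) app.py"
```

Swapping `yarn` for `standalone`, `mesos`, or `kubernetes` changes only the `--master` value; the application arguments stay the same, which is the main reason the deploy mode can be chosen late.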