dss99911/ds-study

Spark

Run on local

sh script/run.sh
sh script/runPyspark.sh

TODO

References (not yet organized)

Programming Guides:

  • Quick Start: a quick introduction to the Spark API; start here!
  • RDD Programming Guide: overview of Spark basics - RDDs (core but old API), accumulators, and broadcast variables
  • Spark SQL, Datasets, and DataFrames: processing structured data with relational queries (newer API than RDDs)
  • Structured Streaming: processing structured data streams with relational queries (using Datasets and DataFrames, newer API than DStreams)
    • Micro-batch processing model
    • Continuous processing model

Deployment Guides:

  • Cluster Mode Overview: overview of concepts and components when running on a cluster
  • Submitting Applications: packaging and deploying applications
    • Amazon EC2: scripts that let you launch a cluster on EC2 in about 5 minutes
    • Standalone Deploy Mode: simplest way to deploy Spark on a private cluster; launch a standalone cluster quickly without a third-party cluster manager
      • Most lightweight option.
      • Cannot run applications other than Spark.
    • Apache Mesos: deploy a private cluster using Apache Mesos
      • Heavyweight.
      • Fault-tolerant; makes it easy to build elastic distributed systems.
      • Suited to large clusters.
    • Hadoop YARN: deploy Spark on top of Hadoop NextGen (YARN)
      • Suited to applications that use HDFS (tightly coupled with HDFS).
      • Does not support cloud environments well.
    • Kubernetes: deploy Spark on top of Kubernetes
  • Understand Cluster Manager, Master, Worker node
  • Understanding Job, Stage, Task
  • Setup Cluster
  • Zeppelin Guide
  • Pyspark Guide
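The deployment options above differ mainly in which cluster manager `spark-submit` contacts, and the `--master` URL selects it. The following is a minimal sketch that only builds the argument list and does not run Spark. The helper names `master_url` and `spark_submit_cmd` are hypothetical. Ports 7077 (standalone) and 5050 (Mesos) are the usual defaults. Using 443 for the Kubernetes API server is an assumption, so replace it with your cluster's real values.

```python
from typing import List, Optional, Sequence

# Usual default master ports. 7077 is Spark standalone's and 5050 is Mesos's.
# 443 for the Kubernetes API server is an ASSUMPTION; pass your real port.
DEFAULT_PORTS = {"standalone": 7077, "mesos": 5050, "kubernetes": 443}


def master_url(manager: str, host: str = "", port: Optional[int] = None,
               cores: str = "*") -> str:
    """Return the --master value spark-submit expects (hypothetical helper)."""
    if manager == "local":
        # Single JVM on this machine with `cores` threads ("*" = all cores).
        return f"local[{cores}]"
    if manager == "yarn":
        # YARN's address is read from HADOOP_CONF_DIR / YARN_CONF_DIR, not the URL.
        return "yarn"
    if manager not in DEFAULT_PORTS:
        raise ValueError(f"unknown cluster manager: {manager!r}")
    if not host:
        raise ValueError(f"{manager} needs a master host")
    port = port if port is not None else DEFAULT_PORTS[manager]
    if manager == "standalone":
        return f"spark://{host}:{port}"
    if manager == "mesos":
        return f"mesos://{host}:{port}"
    # Kubernetes: the k8s:// prefix followed by the API server URL.
    return f"k8s://https://{host}:{port}"


def spark_submit_cmd(app: str, manager: str, deploy_mode: str = "client",
                     host: str = "", port: Optional[int] = None,
                     app_args: Sequence[str] = ()) -> List[str]:
    """Build (but do not run) a spark-submit argument list (hypothetical helper)."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    if manager == "local" and deploy_mode == "cluster":
        # spark-submit rejects cluster deploy mode with a local master.
        raise ValueError("cluster deploy mode is not compatible with a local master")
    cmd = ["spark-submit", "--master", master_url(manager, host, port),
           "--deploy-mode", deploy_mode, app]
    cmd.extend(app_args)  # arguments forwarded to the application itself
    return cmd


if __name__ == "__main__":
    print(" ".join(spark_submit_cmd("app.py", "standalone", host="master-host")))
```

Running it prints `spark-submit --master spark://master-host:7077 --deploy-mode client app.py`. In `client` mode the driver runs where `spark-submit` is invoked. In `cluster` mode the cluster manager launches the driver inside the cluster.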

Other Documents:

External Resources:

Example

Books

  • Spark The Definitive Guide

Later

  • UnitTest

  • MLlib: applying machine learning algorithms

    • Spark NLP
  • GraphX: processing graphs

  • Spark Streaming: processing data streams using DStreams (old API)

  • Spark Security

Zeppelin

Machine Learning
