Spark

Study Doc

Run locally

sh script/run.sh
sh script/runPyspark.sh

TODO

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

References (Not arranged)

Programming Guides:

Quick Start: a quick introduction to the Spark API; start here!
RDD Programming Guide: overview of Spark basics - RDDs (core but old API), accumulators, and broadcast variables
Spark SQL, Datasets, and DataFrames: processing structured data with relational queries (newer API than RDDs)
Structured Streaming: processing structured data streams with relational queries (using Datasets and DataFrames, newer API than DStreams)
- Micro-batch processing model
- Continuous processing model
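
The two Structured Streaming execution models listed above differ mainly in when records reach the query: micro-batch collects whatever arrived during a trigger interval and processes it as one small batch, while continuous processing handles each record as it arrives. The sketch below is a plain-Python toy that only illustrates that grouping idea. It does not use the Spark API, and `micro_batches`, `continuous`, and the sample events are hypothetical names chosen for illustration.

```python
from itertools import groupby


def micro_batches(events, interval):
    """Group (arrival_time, value) events into batches, one per trigger interval.

    Assumes events are sorted by arrival time. Batch k holds the events with
    k * interval <= arrival_time < (k + 1) * interval.
    """
    batches = []
    for batch_id, group in groupby(events, key=lambda e: int(e[0] // interval)):
        batches.append((batch_id, [value for _, value in group]))
    return batches


def continuous(events):
    """Hand every event to the query on its own, in arrival order."""
    return [[value] for _, value in events]


# Hypothetical sample data: (arrival time in seconds, payload)
events = [(0.1, "a"), (0.4, "b"), (1.2, "c"), (2.7, "d"), (2.9, "e")]

batched = micro_batches(events, interval=1.0)
streamed = continuous(events)

print(batched)
print(streamed)
```

With a one-second interval the five events fall into three batches, whereas the continuous variant yields five single-record units. That is the latency versus batching trade-off between the two models.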

Deployment Guides:

Cluster Mode Overview: overview of concepts and components when running on a cluster
Submitting Applications: packaging and deploying applications
- Amazon EC2: scripts that let you launch a cluster on EC2 in about 5 minutes
- Standalone Deploy Mode: the simplest way to deploy Spark on a private cluster; launches a standalone cluster quickly without a third-party cluster manager
  - The lightest option.
  - Cannot run applications other than Spark.
- Apache Mesos: deploy a private cluster using Apache Mesos
  - Heavyweight.
  - Fault tolerant; makes it easy to build an elastic distributed system.
  - Suited to large clusters.
- Hadoop YARN: deploy Spark on top of Hadoop NextGen (YARN)
  - Suited to applications that use HDFS (tightly coupled with HDFS).
  - Does not support cloud environments well.
- Kubernetes: deploy Spark on top of Kubernetes
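
Each cluster manager in the list above is selected through the `--master` value passed to `spark-submit`. As a rough aid, the helper below builds that value in the formats the Spark documentation describes (`local[*]`, `spark://HOST:PORT`, `mesos://HOST:PORT`, `yarn`, `k8s://https://HOST:PORT`). The function name `master_url` and the host names are hypothetical; this is a sketch of the URL shapes, not part of Spark.

```python
# URL scheme per cluster manager, following the spark-submit master URL formats.
SCHEMES = {
    "standalone": "spark://",
    "mesos": "mesos://",
    "kubernetes": "k8s://https://",
}

# Conventional default ports; Kubernetes has none here because the port
# must always be given explicitly in a k8s:// master URL.
DEFAULT_PORTS = {"standalone": 7077, "mesos": 5050}


def master_url(manager, host=None, port=None):
    """Return a --master value for the given cluster manager (hypothetical helper)."""
    if manager == "local":
        return "local[*]"  # run locally with one worker thread per core
    if manager == "yarn":
        return "yarn"  # cluster location is read from the Hadoop configuration
    if manager not in SCHEMES:
        raise ValueError(f"unknown cluster manager: {manager}")
    if host is None:
        raise ValueError(f"{manager} needs a host")
    if port is None:
        port = DEFAULT_PORTS.get(manager)
    if port is None:
        raise ValueError(f"{manager} needs an explicit port")
    return f"{SCHEMES[manager]}{host}:{port}"


# Hypothetical host names, for illustration only.
standalone = master_url("standalone", "spark-master")
kubernetes = master_url("kubernetes", "k8s-api.example.com", 6443)
print(standalone)
print(kubernetes)
```

The result would be passed as, for example, `spark-submit --master spark://spark-master:7077 ...`.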

Other Documents:

Understanding Cluster Manager, Master, and Worker nodes
Understanding Job, Stage, Task
Setup Cluster
Zeppelin Guide
PySpark Guide
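
For the "Understanding Job, Stage, Task" item above, the core idea is that an action triggers a job, the job is cut into stages wherever a shuffle is needed, and each stage runs one task per partition. The following plain-Python toy illustrates only that splitting rule. It does not call Spark, `plan_stages` is a hypothetical name, and it assumes for simplicity that the partition count stays the same after every shuffle.

```python
# Transformations that typically require a shuffle (wide dependencies).
WIDE = {"groupByKey", "reduceByKey", "repartition", "sortByKey"}


def plan_stages(transformations, num_partitions):
    """Split a chain of transformation names into stages at shuffle boundaries.

    A wide transformation starts a new stage; narrow ones (map, filter, ...)
    are pipelined into the current stage. Each stage gets one task per
    partition (assumed constant across stages in this toy).
    """
    stages = [[]]
    for name in transformations:
        if name in WIDE and stages[-1]:
            stages.append([])  # shuffle boundary: begin a new stage
        stages[-1].append(name)
    return [
        {"stage": i, "ops": ops, "tasks": num_partitions}
        for i, ops in enumerate(stages)
    ]


# One job, as if triggered by a single action at the end of this chain.
plan = plan_stages(
    ["map", "filter", "reduceByKey", "map", "sortByKey"], num_partitions=4
)
for stage in plan:
    print(stage)
```

Here the chain becomes three stages of four tasks each, so the single job runs twelve tasks in total.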

External Resources:

Example

Books

Spark The Definitive Guide

Later

UnitTest
MLlib: applying machine learning algorithms
- Spark NLP
GraphX: processing graphs
Spark Streaming: processing data streams using DStreams (old API)
Spark Security

Zeppelin

Doc

Machine Learning

Doc

dss99911/ds-study
