Spotguide specification

Apache Spark

One of the default spotguides is for Apache Spark. There are many examples on how to run Spark on Kubernetes/cloud, however they are all running in standalone or YARN mode. Neither of these is really efective, Spark on YARN on Kubernetes is using multiple resource managers that doesn't have any knowledge about each other. Pipeline's spotguide understands the Spark job and the Apache Spark internals as well - all the Spark components are deployed in containers, scheduled by the Kubernetes scheduler and can be (auto)scaled vertically or horizontally.

A typical example of a Spark flow looks like this:

The sequence of events/internals within a cluster is the following:

Apache Zeppelin

The Apache Zeppelin spotguide picks up a change in a Spark notebook and deploys and executes it on Kubernetes/cloud in cluster mode. This is built on top of the Spark spotguide - the Spark Driver runs inside the Kubernetes cluster. Zeppelin uses spark-submit to start RemoteInterpreterServer which is able to execute notebooks on Spark.

A typical example of a Zeppelin flow looks like this:

Apache Kafka

The Apache Kafka spotguide has a good understanding of consumers and producers but more importantly it monitors, scales, rebalances and auto-heals the Kafka cluster. It autodetects broker failures, reassigns workloads and edits partition reassignment files.

Note: Kafka on Kubernetes does not use Zookeper at all. For all quotas, controller election, cluster membership and configuration it is using etcd, a faster and more reliable cloud-native distributed system for coordination and metadata storage.

TiDB

The TiDB spotguide provisions, runs, scales and monitors a TiDB cluster (TiDB, TiKV, PD) on the Pipeline PaaS. It detects failures and auto-scales, heals or rebalances the cluster.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Files

spotguides.md

spotguides.md

Spotguide specification

Apache Spark

Apache Zeppelin

Apache Kafka

TiDB

Collapse file tree

Files

spotguides.md

Latest commit

History

spotguides.md

File metadata and controls

Spotguide specification

Apache Spark

Apache Zeppelin

Apache Kafka

TiDB