Home
Twitter Storm is an open source, distributed real-time computation engine for high-performance computing. It was developed by BackType, a company that Twitter acquired in 2011, and Twitter uses Storm internally. Nathan Marz is the main contributor to the project.
Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
A Storm cluster is somewhat similar to a Hadoop cluster, but while a Hadoop cluster runs map-reduce jobs, a Storm cluster runs topologies. One of the primary differences between the two is that a map-reduce job eventually ends, while a topology runs until you explicitly kill it. A Storm cluster has two types of nodes:
Master node: This node runs a daemon process called Nimbus. Nimbus is responsible for distributing code across the cluster, assigning tasks to machines, and monitoring the success and failure of units of work.
Worker nodes: These nodes run a daemon process called the Supervisor. A Supervisor listens for work assigned to its machine and starts and stops worker processes accordingly. Each worker process executes a subset of a topology, so the execution of a topology is spread across many worker processes running on many machines.
Storm and ZooKeeper: Sitting between the Nimbus and the various Supervisors is the Apache open source project ZooKeeper. ZooKeeper's goal is to enable highly reliable distributed coordination, mainly by acting as a centralized service for distributed cluster functionality.
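The division of labor described above can be pictured with a small toy model. This is only a sketch and does not use any Storm or ZooKeeper API: `assign_tasks`, `CoordinationStore`, and `Supervisor` are hypothetical names invented for illustration, the round-robin policy is an assumption rather than Storm's actual scheduler, and the task names are placeholders.

```python
from collections import defaultdict
from itertools import cycle


def assign_tasks(tasks, supervisor_names):
    """Toy stand-in for Nimbus: spread tasks over supervisors round-robin.

    Assumption: round-robin is used only for illustration; it is not
    Storm's real scheduling policy.
    """
    assignments = defaultdict(list)
    for task, name in zip(tasks, cycle(supervisor_names)):
        assignments[name].append(task)
    return dict(assignments)


class CoordinationStore:
    """In-memory stand-in for ZooKeeper: a shared place to publish assignments."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        self._data[key] = value

    def read(self, key, default=None):
        return self._data.get(key, default)


class Supervisor:
    """Toy supervisor: reads its assignments and tracks which tasks it runs."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.running = set()

    def sync(self):
        """Start newly assigned tasks and stop tasks that are no longer assigned."""
        wanted = set(self.store.read(self.name, []))
        started = wanted - self.running
        stopped = self.running - wanted
        self.running = wanted
        return started, stopped


if __name__ == "__main__":
    store = CoordinationStore()
    supervisors = [Supervisor("worker-1", store), Supervisor("worker-2", store)]
    tasks = ["task-0", "task-1", "task-2", "task-3"]  # placeholder task names

    # "Nimbus" publishes the plan; the supervisors never talk to it directly.
    plan = assign_tasks(tasks, [s.name for s in supervisors])
    for name, assigned in plan.items():
        store.write(name, assigned)

    for s in supervisors:
        s.sync()
        print(s.name, sorted(s.running))
```

The point of the sketch is the indirection: the master only writes to the shared store and each supervisor only reads its own entry, which is the role the text assigns to ZooKeeper between Nimbus and the Supervisors.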
- Take a look at the Storm programming model
- Take a look at Storm parallelism
Storm has wide community support, and several companies, such as Twitter, are using it; go to this link to see more companies. There are plenty of tutorials; however, none of them covers the whole setup process needed to get the full system working. So, this section intends to gather all the information and give a painless start to Storm for those who are not familiar with it and its related tools.
Basically, Storm can be set up in three different operation modes. Although all three share common steps, each one involves different configuration. Because installation, compilation, and run issues are usually tied to the operation mode, the tutorial is replicated for each mode.