David edited this page Jun 3, 2013 · 12 revisions

Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!

Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.

How Storm works

Storm starting tutorial

Storm has wide community support, and several companies, such as Twitter, are using it; go to this link to see more companies. There are plenty of tutorials; however, none of them covers the entire setup process needed to get the whole system working. This section therefore gathers all that information to give a painless start to those who are not yet familiar with Storm and its related tooling.

Storm can be set up in three different operation modes. Although all three share common steps, each one involves a different configuration. Because installation, compilation, and runtime issues are usually tied to the operation mode, the tutorial is replicated for each mode.

Installing Storm for testing

Installing Storm for production in single node

Installing Storm for production in a cluster

Storm elaborate examples

Documentation
