Home
This project contains two email processing applications used for a detailed performance comparison between IBM InfoSphere Streams and Apache Storm. Both applications process emails from the Enron dataset and calculate metrics on them.
For a detailed description of the applications, please refer to the report here: https://developer.ibm.com/streamsdev/wp-content/uploads/sites/15/2014/04/Streams-and-Storm-April-2014-Final.pdf
Each application involves three distinct tasks:
- Preprocessing: Merge the Enron dataset into a single file. This step is common to Storm and Streams.
- Dataset Creation: Take the merged Enron dataset, and serialize and compress it.
- Execution: Execute the main processing benchmark.
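The first two tasks can be pictured with a minimal Python sketch. It is not the project's code: the function names `merge_emails` and `compress_dataset`, the one-file-per-email directory layout, and the JSON-lines-plus-gzip format are all assumptions made for illustration. The Streams benchmark lists Avro C++ as a requirement, so the real serialization format likely differs; the linked pages describe the actual tools.

```python
import gzip
import json
import shutil
from pathlib import Path


def merge_emails(src_dir: Path, merged_path: Path) -> int:
    """Preprocessing (illustrative): merge every file under src_dir into a
    single file with one JSON record per line. Returns the email count."""
    # List the inputs before creating the output, so the merged file is
    # never picked up as an input if it lives under src_dir.
    inputs = sorted(p for p in src_dir.rglob("*") if p.is_file())
    with merged_path.open("w", encoding="utf-8") as out:
        for path in inputs:
            text = path.read_text(encoding="utf-8", errors="replace")
            record = {"id": path.relative_to(src_dir).as_posix(), "body": text}
            out.write(json.dumps(record) + "\n")
    return len(inputs)


def compress_dataset(merged_path: Path, dataset_path: Path) -> None:
    """Dataset creation (illustrative): gzip the merged file. The real
    benchmarks use their own serialization; gzip is a stand-in here."""
    with merged_path.open("rb") as src, gzip.open(dataset_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


if __name__ == "__main__":
    # Hypothetical paths; adjust to wherever the Enron dataset is unpacked.
    n = merge_emails(Path("maildir"), Path("enron_merged.jsonl"))
    compress_dataset(Path("enron_merged.jsonl"), Path("enron_merged.jsonl.gz"))
    print(f"merged and compressed {n} emails")
```

Keeping the merge and the serialization as separate steps mirrors the task list above: the merged file is produced once and can then feed both the Storm and the Streams dataset creation.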
Before you get started, make sure the following software is installed on your system.
For Apache Storm Benchmark:
- Apache Storm 0.8.2 or 0.9.0.1 (http://storm.incubator.apache.org/)
- Apache Maven >= 3.0 (http://maven.apache.org/index.html)
For InfoSphere Streams Benchmark:
- InfoSphere Streams 3.2 (https://developer.ibm.com/streamsdev/)
- Avro C++ 1.7.4 (http://avro.apache.org/docs/1.7.4/api/cpp/html/index.html). Note: Make sure the include files are located at /usr/local/include and the shared libraries at /usr/local/lib.
- Boost 1.54.0, required by Avro (http://www.boost.org/doc/libs/1_54_0/doc/html/bbv2/installation.html). Note: Make sure the include files are located at /usr/local/include and the shared libraries at /usr/local/lib.
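The install locations in the notes above can be checked with a short script. This is a sketch under assumptions: the header subdirectories `avro` and `boost` and the library name prefixes `libavrocpp` and `libboost_` reflect the usual layout of those packages, not anything this project specifies, and `missing_prerequisites` is a hypothetical helper name.

```python
from pathlib import Path


def missing_prerequisites(prefix: Path = Path("/usr/local")) -> list[str]:
    """Return the expected Avro C++ and Boost locations that are absent
    under prefix. An empty list means everything checked was found."""
    # Assumed header directories: <prefix>/include/avro and .../boost.
    header_dirs = [prefix / "include" / "avro", prefix / "include" / "boost"]
    missing = [str(d) for d in header_dirs if not d.is_dir()]

    # Assumed shared library name prefixes under <prefix>/lib.
    lib_dir = prefix / "lib"
    for stem in ("libavrocpp", "libboost_"):
        if not any(lib_dir.glob(f"{stem}*")):
            missing.append(str(lib_dir / f"{stem}*"))
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Not found:")
        for item in problems:
            print(f"  {item}")
    else:
        print("Avro C++ and Boost appear to be installed under /usr/local")
```

The check only looks at file locations; it does not verify the versions listed above.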
- [Preprocess Enron Email Dataset](Preprocess Enron Email Dataset)
- [Create dataset for InfoSphere Streams benchmark](Create dataset for InfoSphere Streams benchmark)
- [Running InfoSphere Streams benchmark](Running InfoSphere Streams benchmark)
- [Create dataset for Apache Storm benchmark](Create dataset for Apache Storm benchmark)
- [Running Apache Storm benchmark](Running Apache Storm benchmark)