THIS REPOSITORY IS NOT MAINTAINED ANYMORE

Please go here: https://github.com/dataArtisans/performance

Apache Flink: Performance and Testing

Flink performance tests

Tests to add:

WordCount

WordCount NoComb

K-Means

low dimensional (3 dimensions, k = 20)
high dimensional (1000 dimensions, k = 200)

TPC-H with two joins and aggregation (Q3 if suitable)

Connected components

PageRank

For now, this repository contains one large test job that exercises all Flink components.

Background information

This code originated in the Stratosphere.eu project, which is now called Apache Flink (incubating).

Original issue for planning this feature: stratosphere/stratosphere#379

Preparation

Generate TPC-H Data

Download and build the TPC-H data generation tool using ./prepareTPCH.sh.

Generate the data using ./generateTPCH.sh.

Generate Special Data

Generate Avro File

The following command generates an Avro file from the original TPC-H orders file. (This step is not necessary when running the test job locally.)

java -cp target/testjob-0.1-SNAPSHOT.jar eu.stratosphere.test.testPlan.LargeTestPlan '/input/orders.tbl' '/output/orders.avro'

Generate Sequence File

java -cp target/testjob-0.1-SNAPSHOT.jar eu.stratosphere.test.testPlan.SequenceFileGenerator SeqOut 1000000 15

(The generated file is about 45 MB.)

Execute Plan

Execute Plan with LocalExecutor

Pass the following parameters to the main() method of LargeTestPlan, in this order.

| Parameter | Description | Example |
| --- | --- | --- |
| customer | path to TPC-H file | file:///.../customer.tbl |
| lineitem | path to TPC-H file | file:///.../lineitem.tbl |
| nation | path to TPC-H file | file:///.../nation.tbl |
| orders | path to TPC-H file | file:///.../orders.tbl |
| region | path to TPC-H file | file:///.../region.tbl |
| orderAvroFile | path to generated Avro file | file:///.../orders.avro |
| sequenceFileInput | path to generated Sequence file | file:///.../seqfile |
| outputTableDirectory | path to test output directory | file:///.../directory/ |
| maxBulkIterations | maximum number of bulk iterations (Test 10); should be greater than the number of orders | 1000 |
| ordersPath | plain path (no file:// scheme) to the TPC-H orders file | /.../orders.tbl |
| outputAccumulatorsPath | plain path where the accumulator results are stored | /.../accumulators.txt |
| outputKeylessReducerPath | plain path where the AllReduce count results are stored | /.../allreduce.txt |
| outputOrderAvroPath | plain path where the generated Avro file is stored | /.../orders.avro |
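
Because the parameters are positional, it is easy to swap two of them. The following is a minimal sketch of a wrapper that assembles the 13 arguments in the order of the table above and prints the resulting command instead of running it. `DATA_DIR`, `OUT_DIR`, and the file names below `OUT_DIR` are hypothetical placeholders. Starting the plan through `java -cp`, as the generator commands in this README do, is also an assumption, as is pointing `orderAvroFile` and `outputOrderAvroPath` at the same file.

```shell
#!/bin/sh
# Hypothetical locations (assumptions, not defined by this repository).
DATA_DIR="${DATA_DIR:-/tmp/tpch}"
OUT_DIR="${OUT_DIR:-/tmp/testjob-out}"

set --                                            # start with an empty argument list
set -- "$@" "file://${DATA_DIR}/customer.tbl"     #  1 customer
set -- "$@" "file://${DATA_DIR}/lineitem.tbl"     #  2 lineitem
set -- "$@" "file://${DATA_DIR}/nation.tbl"       #  3 nation
set -- "$@" "file://${DATA_DIR}/orders.tbl"       #  4 orders
set -- "$@" "file://${DATA_DIR}/region.tbl"       #  5 region
set -- "$@" "file://${OUT_DIR}/orders.avro"       #  6 orderAvroFile (assumed to be the file from 13)
set -- "$@" "file://${DATA_DIR}/seqfile"          #  7 sequenceFileInput
set -- "$@" "file://${OUT_DIR}/tables/"           #  8 outputTableDirectory
set -- "$@" "1000"                                #  9 maxBulkIterations (should exceed the number of orders)
set -- "$@" "${DATA_DIR}/orders.tbl"              # 10 ordersPath (plain path)
set -- "$@" "${OUT_DIR}/accumulators.txt"         # 11 outputAccumulatorsPath
set -- "$@" "${OUT_DIR}/allreduce.txt"            # 12 outputKeylessReducerPath
set -- "$@" "${OUT_DIR}/orders.avro"              # 13 outputOrderAvroPath

# Dry run: print the command so the argument order can be reviewed first.
echo java -cp target/testjob-0.1-SNAPSHOT.jar \
  eu.stratosphere.test.testPlan.LargeTestPlan "$@"
```

Once the printed command looks right, removing the leading `echo` would run it, provided the jar's classpath contains everything the plan needs.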

Execute Plan on Cluster

Pass the following parameters to the job, in this order. The Avro file and the Sequence file must be generated beforehand.

| Parameter | Description | Example |
| --- | --- | --- |
| customer | path to TPC-H file | file:///.../customer.tbl |
| lineitem | path to TPC-H file | file:///.../lineitem.tbl |
| nation | path to TPC-H file | file:///.../nation.tbl |
| orders | path to TPC-H file | file:///.../orders.tbl |
| region | path to TPC-H file | file:///.../region.tbl |
| orderAvroFile | path to generated Avro file | file:///.../orders.avro |
| sequenceFileInput | path to generated Sequence file | file:///.../seqfile |
| outputTableDirectory | path to test output directory | file:///.../directory/ |
| maxBulkIterations | maximum number of bulk iterations (Test 10); should be greater than the number of orders | 1000 |
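
Since the cluster run depends on the pre-generated Avro and Sequence files, a small pre-flight check can catch a missing input before the job is submitted. The sketch below is one possible way to do that. `strip_scheme` and `check_input` are hypothetical helpers, not part of this repository, and the check only makes sense for `file://` paths that are visible from the machine running the script.

```shell
#!/bin/sh
# Turn a file:// URI into a plain path; other values pass through unchanged.
strip_scheme() {
  case "$1" in
    file://*) printf '%s\n' "${1#file://}" ;;
    *)        printf '%s\n' "$1" ;;
  esac
}

# Return 0 if the file behind the URI exists; otherwise report it and return 1.
check_input() {
  path=$(strip_scheme "$1")
  if [ -e "$path" ]; then
    return 0
  fi
  echo "missing input: $1" >&2
  return 1
}
```

A submission script could call `check_input` once for the `orderAvroFile` value and once for the `sequenceFileInput` value, and stop if either call fails.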
