Please go here: https://github.com/dataArtisans/performance
Flink performance tests
Add
WordCount, WordCount NoComb
K-Means
- low dimensional (3 dimensions, k = 20)
- high dimensional (1000 dimensions, k = 200)
TPC-H with two joins and aggregation (Q3 if suitable)
Connected components
PageRank
This repository contains (for now) one large test job that tests all Flink components.
This code originated in the Stratosphere.eu project, which is now called Apache Flink (incubating).
Original issue for planning this feature: stratosphere/stratosphere#379
Download and build the TPC-H data generation tool using ./prepareTPCH.sh.
Generate data using ./generateTPCH.sh
- Generate Avro File
This command will generate an Avro file from the original TPC-H orders file. (This step is not necessary when running the test job locally.)
java -cp target/testjob-0.1-SNAPSHOT.jar eu.stratosphere.test.testPlan.LargeTestPlan '/input/orders.tbl' '/output/orders.avro'
- Generate Sequence File
java -cp target/testjob-0.1-SNAPSHOT.jar eu.stratosphere.test.testPlan.SequenceFileGenerator SeqOut 1000000 15
(The generated sequence file is about 45 MB.)
The following parameters must be passed to the main() method of LargeTestPlan, in this order.
Parameter | Description | Example |
---|---|---|
customer | path to TPC-H file | file:///.../customer.tbl |
lineitem | path to TPC-H file | file:///.../lineitem.tbl |
nation | path to TPC-H file | file:///.../nation.tbl |
orders | path to TPC-H file | file:///.../orders.tbl |
region | path to TPC-H file | file:///.../region.tbl |
orderAvroFile | path to generated Avro file | file:///.../orders.avro |
sequenceFileInput | path to generated Sequence file | file:///.../seqfile |
outputTableDirectory | path to test output directory | file:///.../directory/ |
maxBulkIterations | max bulk iterations (Test 10), should be greater than number of orders | 1000 |
ordersPath | ordinary path to TPC-H order file | /.../orders.tbl |
outputAccumulatorsPath | ordinary path where accumulator results get stored | /.../accumulators.txt |
outputKeylessReducerPath | ordinary path where AllReduce count results get stored | /.../allreduce.txt |
outputOrderAvroPath | ordinary path where the generated Avro file gets stored | /.../orders.avro |
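
Because the arguments are positional, it is easy to pass them in the wrong order or to forget one. The following is a minimal sketch, not code from this repository, of how the thirteen arguments could be given names and checked before the job starts. The class `LargeTestPlanArgs` and its `parse` method are hypothetical; only the parameter names and their order are taken from the table above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the repository: names the positional
// arguments of LargeTestPlan's main() in the order given in the table above.
final class LargeTestPlanArgs {

    // Parameter names copied from the table, in the required order.
    static final String[] NAMES = {
        "customer", "lineitem", "nation", "orders", "region",
        "orderAvroFile", "sequenceFileInput", "outputTableDirectory",
        "maxBulkIterations", "ordersPath", "outputAccumulatorsPath",
        "outputKeylessReducerPath", "outputOrderAvroPath"
    };

    // Maps each positional argument to its name. Fails fast on a wrong
    // argument count or a non-numeric maxBulkIterations.
    static Map<String, String> parse(String[] args) {
        if (args.length != NAMES.length) {
            throw new IllegalArgumentException(
                "Expected " + NAMES.length + " arguments but got " + args.length);
        }
        Map<String, String> byName = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            byName.put(NAMES[i], args[i]);
        }
        // Throws NumberFormatException if the value is not an integer.
        Integer.parseInt(byName.get("maxBulkIterations"));
        return byName;
    }

    public static void main(String[] args) {
        // Print the name/value pairs so the ordering can be checked by eye.
        parse(args).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Running this helper with the same argument list intended for LargeTestPlan prints each value next to its parameter name, which makes a swapped path easy to spot.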
The following parameters must be passed to the job, in this order. The Avro file and the Sequence file must be generated beforehand.
Parameter | Description | Example |
---|---|---|
customer | path to TPC-H file | file:///.../customer.tbl |
lineitem | path to TPC-H file | file:///.../lineitem.tbl |
nation | path to TPC-H file | file:///.../nation.tbl |
orders | path to TPC-H file | file:///.../orders.tbl |
region | path to TPC-H file | file:///.../region.tbl |
orderAvroFile | path to generated Avro file | file:///.../orders.avro |
sequenceFileInput | path to generated Sequence file | file:///.../seqfile |
outputTableDirectory | path to test output directory | file:///.../directory/ |
maxBulkIterations | max bulk iterations (Test 10), should be greater than number of orders | 1000 |
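
The table says maxBulkIterations should be greater than the number of orders. One way to pick a value is to count the rows in orders.tbl first. The sketch below is an illustration only, not code from this repository: the class `BulkIterationHint` is hypothetical, and it assumes that orders.tbl holds exactly one order per line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of the repository.
final class BulkIterationHint {

    // Counts non-empty lines, assuming one order per line in orders.tbl.
    static long countOrders(Path ordersTbl) throws IOException {
        try (Stream<String> lines = Files.lines(ordersTbl)) {
            return lines.filter(line -> !line.isEmpty()).count();
        }
    }

    // Smallest value that is strictly greater than the number of orders.
    static long minBulkIterations(long orderCount) {
        return orderCount + 1;
    }

    public static void main(String[] args) throws IOException {
        // Expects an ordinary file system path, not a file:// URI.
        long orders = countOrders(Paths.get(args[0]));
        System.out.println("orders: " + orders);
        System.out.println("maxBulkIterations should be at least: " + minBulkIterations(orders));
    }
}
```

Pass the ordinary path to orders.tbl as the only argument; the printed value is a lower bound, so any larger number also satisfies the requirement.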