chisel-benchmark

Benchmark tools for CHiSEL.

Overview

This package includes utilities for generating, driving, and plotting benchmark tests. Each module is described below and includes a main routine; to learn about its options, run python ./chiselbenchmark/<module>.py -h from the command line, where <module> is one of generator, driver, or plotter.

Benchmark data generator

The Python module chiselbenchmark.generator may be used to generate a table of data with different denormalization characteristics. The main routine requires an argument for the number of rows to generate; all other parameters are optional. By default, it generates a dataset exhibiting all supported characteristics: simple typed columns, unconstrained "term" columns, denormalized "term" list columns, and repeated embedded subconcepts within the main entity. Data are written to stdout, so redirect the output to save it to a file. To run the generator programmatically, import the module and invoke help(entities) to learn more.
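For example, to read that documentation without opening an interpreter (assuming entities is exposed at the generator module level, as the note above suggests):

python -c "from chiselbenchmark.generator import entities; help(entities)"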

Note that the generator requires an input file, looked for by default at ~/terms.txt, containing a line-delimited list of "terms". The terms are sampled randomly to populate the generated term and termlist columns.
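A minimal sketch of creating such a file; the terms here are placeholders, and any line-delimited vocabulary will do:

printf 'alpha\nbeta\ngamma\ndelta\n' >~/terms.txt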

To generate a dataset of 1000000 rows, run:

python ./chiselbenchmark/generator.py 1000000 >~/1000000.csv

Benchmark test driver

The Python module chiselbenchmark.driver may be used to drive the test cases against the datasets produced by the generator. Options include the set of test cases to run, the parameters for the test cases, and the conditions to be tested. The required arguments are the number of rounds per test and a list of datasets by "name", where a name may be a filename without its extension when connecting to the default local file catalog. Without further arguments, the script runs all default test cases (see -h for the defaults), both conditions, and all default parameters for each test case. Output is written to stdout and stderr, so redirect both to files to save the results.

Note that the driver requires a local file system data catalog, looked for by default at ~/benchmarks. The generated datasets (e.g., 1000.csv, 10000.csv, etc.) should be saved directly under this directory. The results of each round of tests are saved under ~/benchmarks/output and deleted during teardown; to debug, use the --disable-teardown option.
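A minimal setup sketch, assuming a dataset was generated as above:

mkdir -p ~/benchmarks
mv ~/1000000.csv ~/benchmarks/1000000.csv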

To run the script with the defaults, using 10 rounds per test and datasets 1000, 10000, and 100000, run:

python ./chiselbenchmark/driver.py 10  1000 10000 100000 2>~/error.log >~/results.csv

To run the script for a single example test case (more than one is allowed), using dataset 1000, with only 3 rounds per test and a single param of 1, run:

python ./chiselbenchmark/driver.py 3  1000  --testcases create_vocabulary_then_align_and_tag --params 1  2>~/error.log >~/results.csv
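To keep the intermediate outputs under ~/benchmarks/output for inspection, add the --disable-teardown option noted above; a single round is usually enough for debugging:

python ./chiselbenchmark/driver.py 1 1000 --disable-teardown 2>~/error.log >~/results.csv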

Benchmark test result plotter

The Python module chiselbenchmark.plotter may be used to plot results from the driver. The main routine requires the filename of the results to be plotted; the format should conform to the output produced by the driver module. Specify --show and/or --save to display the plots and/or save the figures to files. Other options include the output format (any accepted by matplotlib), dots per inch (dpi), and the y-axis timescale (s or ms). When saving, files are named <test_case>.<format> and written to the current working directory.

To plot results.csv, showing and saving the figures, run:

python ./chiselbenchmark/plotter.py ~/results.csv --show --save
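The option names for the output format, dpi, and timescale are not spelled out here, so the flags below are assumptions; confirm the actual spellings with -h:

python ./chiselbenchmark/plotter.py ~/results.csv --save --format pdf --dpi 300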
