This is the code base for the evaluation presented in the paper Simplicity Done Right for Join Ordering.
A short presentation is available at CIDR DB.
- Focused Sampling: The code mimics an in-memory column store and exploits specific access patterns to boost sampling speed.
- Conditional Sampling: The code simulates an index over arbitrary filter predicates. The approach exploits index or index-like structures to boost estimation accuracy.
- JOB-Queries: Queries of the Join Order Benchmark by Leis et al., both in their original form with implicit joins in the WHERE clause and transformed into an explicit join order according to our enumeration scheme.
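As a rough illustration of sampling-based selectivity estimation over a column-wise layout, the following minimal sketch draws a uniform row sample and reads only the columns a predicate needs. It is not the implementation in this repository; the function name `estimate_selectivity`, its parameters, and the toy data are assumptions made for the example.

```python
import random


def estimate_selectivity(columns, needed, predicate, sample_size, seed=0):
    """Estimate the fraction of rows that satisfy `predicate`.

    columns:     dict mapping column name -> list of values (toy column store)
    needed:      names of the columns the predicate reads; only these are touched
    predicate:   function taking a dict {column name: value} and returning bool
    sample_size: number of rows to sample uniformly without replacement
    """
    num_rows = len(next(iter(columns.values()))) if columns else 0
    if num_rows == 0:
        return 0.0
    rng = random.Random(seed)
    k = min(sample_size, num_rows)
    row_ids = rng.sample(range(num_rows), k)
    hits = 0
    for rid in row_ids:
        # Access only the referenced columns at the sampled positions.
        row = {name: columns[name][rid] for name in needed}
        if predicate(row):
            hits += 1
    return hits / k


# Hypothetical toy table, not taken from the IMDB data set.
toy = {
    "year": [1990, 2000, 2010, 2020],
    "kind": ["movie", "tv", "movie", "movie"],
    "title": ["a", "b", "c", "d"],
}
estimate = estimate_selectivity(
    toy,
    needed=["year", "kind"],
    predicate=lambda r: r["year"] >= 2000 and r["kind"] == "movie",
    sample_size=4,
)
print(estimate)
```

Because the sample here covers all four rows, the estimate equals the true selectivity; with a smaller sample it is only an approximation.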
Please run the following to compile the code for the focused sampling approach and to download the necessary data sets.
./prepare.bash
To run Focused Sampling:
cd FocusedSampling/bin
./focusedSampling
To run Conditional Sampling:
cd ConditionalSampling
python3 evalCondSample.py
To build new conditional samples:
cd buildCondSample
python3 buildCondSample.py
To compare the implicit with the explicit JOB queries, you need to install Postgres and load the frozen, official IMDB data or the CSV files.
To run the explicit queries on Postgres, please use the following session settings to prevent join reordering and to restrict the physical join operator selection to hash joins (merge joins for sorted attributes):
set join_collapse_limit = 1;
set enable_nestloop to false;
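With join_collapse_limit set to 1, Postgres keeps explicit JOIN clauses in the order they are written. The following sketch shows one way to prepend the two settings to a query before sending it to Postgres. The helper name `with_fixed_join_order` and the example query are assumptions for illustration; the query is not one of the JOB queries in this repository.

```python
def with_fixed_join_order(query):
    """Prefix a query with the two session settings listed above."""
    settings = [
        "set join_collapse_limit = 1;",
        "set enable_nestloop to false;",
    ]
    body = query.strip()
    if not body.endswith(";"):
        body += ";"
    return "\n".join(settings + [body])


# Hypothetical explicit-join query over IMDB tables, for illustration only.
example_query = """
SELECT count(*)
FROM title AS t
JOIN movie_companies AS mc ON mc.movie_id = t.id
JOIN company_name AS cn ON cn.id = mc.company_id
"""

script = with_fixed_join_order("EXPLAIN " + example_query.strip())
print(script)
```

The resulting script can be saved to a file and executed in a single session, so that the settings apply to the query that follows them.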
The submodule Simplicity++ (WIP) is a workload-independent implementation that extends the original concept, e.g., by taking multiple frequency statistics into account.
If you want to follow the upstream development of Simplicity++, check out the GitHub repository.