# Database Benchmark Report Framework

A modular framework for running and documenting database benchmarks, with a focus on comparing **Exasol** with
other database systems. This repository provides reusable building blocks to launch benchmark environments,
collect detailed system information, run benchmark workloads, and generate reports documenting the results.

## Features

- 🏗️ **Modular Architecture**: Fine-grained templates for setup, execution, and reporting
- ☁️ **Multi-Cloud Support**: Infrastructure automation with separate instances per database
- 📊 **Benchmark Workloads**: TPC-H with support for custom workloads
- 📝 **Self-Contained Reports**: Generate reproducible reports with all attachments
- 🔧 **Extensible**: Easy to add new systems, workloads, and cloud providers
- 📈 **Rich Visualizations**: Automated generation of performance plots and tables
- 🔍 **Result Verification**: Validate query correctness against expected outputs

## Requirements

- Python 3.10+
- **Terraform** (for cloud infrastructure) - [Installation Guide](https://developer.hashicorp.com/terraform/install)

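A quick sanity check that both prerequisites are available (version output will vary on your machine):

```shell
python --version    # should report Python 3.10 or newer
terraform -version  # confirms Terraform is installed and on the PATH
```
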
## Quick Start

> [!TIP]
> You may need to set up a Python virtual environment before installing the Python packages.

> [!CAUTION]
> The sample benchmark uses AWS cloud infrastructure. See the [Getting Started Guide](user-docs/GETTING_STARTED.md)
> for detailed cloud setup instructions.
> Note that AWS infrastructure is usually not free to use.

```shell
# 1. Clone and enter the repository
git clone https://github.com/exasol/benchkit.git
cd benchkit

# 2. Install dependencies and the local package
python -m pip install -e .

# 3. Copy and edit the example environment file
cp .env.example .env
$EDITOR .env

# 3b. (temporary) Fix the hardcoded SSH key names in the 'env' section of the configuration
$EDITOR configs/exa_vs_ch_1g.yaml

# 4. Validate your configuration
python scripts/check_aws_credentials.py --config configs/exa_vs_ch_1g.yaml

# 5. Run the sample benchmark
make all CFG=configs/exa_vs_ch_1g.yaml

# 6. Clean up AWS resources
make infra-destroy CFG=configs/exa_vs_ch_1g.yaml

# 7. View the benchmark report
...TBD
```
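
Step 3 expects your cloud settings in `.env`. Judging by the AWS setup this sample targets, the file typically
holds an AWS profile and region; a minimal sketch with placeholder values:

```shell
# .env: example values only; replace with your own profile and region
AWS_PROFILE=default-mfa
AWS_REGION=eu-west-1
```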

## Usage

The framework provides 9 commands for complete benchmark lifecycle management:

```bash
# Manage infrastructure
benchkit infra apply --provider aws --config configs/my_benchmark.yaml

# System information collection
benchkit probe --config configs/my_benchmark.yaml

# Run benchmarks
benchkit run --config configs/my_benchmark.yaml [--systems exasol] [--queries ...]

# Generate reports
benchkit report --config configs/my_benchmark.yaml

# Other commands: execute, status, package, verify, cleanup
```

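The Quick Start drives this same lifecycle through `make`. Assuming the Makefile accepts any configuration via
`CFG=`, the sample targets can be pointed at your own config:

```bash
# Full pipeline for one config: provision, probe, run, and report
make all CFG=configs/my_benchmark.yaml

# Tear down the cloud infrastructure created for that config
make infra-destroy CFG=configs/my_benchmark.yaml
```
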
**Status Command** provides comprehensive project insights (see the example below):

- Overview of all projects (probe, benchmark, report status)
- Detailed status for specific configs (system info, infrastructure, timing)
- Cloud infrastructure details (IPs, connection strings)
- Multiple config support and smart project lookup
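
A typical status check might look like the following sketch; the flags are assumed to mirror the other commands
shown above:

```bash
# Overview of all known projects
benchkit status

# Detailed status for a single configuration
benchkit status --config configs/my_benchmark.yaml
```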

📖 **See [Getting Started Guide](user-docs/GETTING_STARTED.md) for comprehensive CLI documentation and examples.**

## Repository Structure (User Version)

```
benchkit/
├── benchkit/    # Core framework
├── configs/     # Benchmark configurations
└── results/     # Generated results (auto-created)
```

See the [Developer Guide](dev-docs/DEVELOPERS.md) for a more detailed breakdown of the repository structure.

## Defining Your Own Benchmarks

You can create your own benchmark by writing a YAML configuration file that combines:

- One infrastructure provider (aws/docker/local/...)
- One workload (benchmark type) to be executed
- Multiple systems (software) to be tested

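For illustration, here is a trimmed configuration sketch based on the bundled Exasol-vs-ClickHouse example;
treat the exact keys and nesting as indicative rather than a schema:

```yaml
project_id: "exasol_vs_clickhouse_tpch"
title: "Exasol vs ClickHouse Performance on TPC-H"

env:                        # infrastructure provider
  mode: "aws"
  region: "eu-west-1"
  instances:
    exasol:
      instance_type: "m7i.4xlarge"
    clickhouse:
      instance_type: "m7i.4xlarge"

systems:                    # software under test
  - name: "exasol"
    kind: "exasol"
    version: "2025.1.0"
  - name: "clickhouse"
    kind: "clickhouse"
    version: "24.12"

workload:                   # benchmark to run
  name: "tpch"
  scale_factor: 1
```
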
📖 **See [Getting Started Guide](user-docs/GETTING_STARTED.md) for information on how to create
benchmark configurations using the supported modules.**

## Support Matrix

### Setup / Installation

| System     | local | aws           | docker | gcp | azure |
|------------|-------|---------------|--------|-----|-------|
| Exasol     | ✗     | ✓<sup>1</sup> | ✗      | ✗   | ✗     |
| ClickHouse | ✗     | ✓<sup>1</sup> | ✗      | ✗   | ✗     |

Notes:

1. Only single-node deployments are supported at this time.

### "tpch" Workload

| System     | local | aws | docker | gcp | azure |
|------------|-------|-----|--------|-----|-------|
| Exasol     | ✗     | ✓   | ✗      | ✗   | ✗     |
| ClickHouse | ✗     | ✓   | ✗      | ✗   | ✗     |

## Documentation

### For Users

- 📖 [Getting Started Guide](user-docs/GETTING_STARTED.md) - Installation, usage, and examples

### For Developers

- 🔧 [Extending the Framework](dev-docs/EXTENDING.md) - Adding systems, workloads, and features

## License

This project is licensed under the MIT License - see the LICENSE file for details.
All names used are the property of their respective owners.

---
