From 7c97f5dd03e11bb1c6b680130debb5b671171832 Mon Sep 17 00:00:00 2001 From: Stefan Reich Date: Wed, 26 Nov 2025 14:52:24 +0100 Subject: [PATCH 1/6] Restructuring documentation --- README.md | 245 ++++++------------ dev-docs/DEVELOPERS.md | 111 ++++++++ EXTENDING.md => dev-docs/EXTENDING.md | 80 ++---- .../GETTING_STARTED.md | 54 ++-- 4 files changed, 247 insertions(+), 243 deletions(-) create mode 100644 dev-docs/DEVELOPERS.md rename EXTENDING.md => dev-docs/EXTENDING.md (92%) rename GETTING_STARTED.md => user-docs/GETTING_STARTED.md (94%) diff --git a/README.md b/README.md index cf2cdbd..042c3cf 100644 --- a/README.md +++ b/README.md @@ -1,44 +1,80 @@ # Database Benchmark Report Framework -A modular framework for running and documenting database benchmarks, with a focus on comparing **Exasol** with other database systems. This repository provides reusable building blocks to launch benchmark environments, collect detailed system information, run benchmark workloads, and generate reports documenting the results. +A modular framework for running and documenting database benchmarks, with a focus on comparing **Exasol** with +other database systems. This repository provides reusable building blocks to launch benchmark environments, +collect detailed system information, run benchmark workloads, and generate reports documenting the results. 
## Features - 🏗️ **Modular Architecture**: Fine-grained templates for setup, execution, and reporting -- ☁️ **Multi-Cloud Support**: AWS infrastructure automation with separate instances per database +- ☁️ **Multi-Cloud Support[^todo-cloud]**: Infrastructure automation with separate instances per database - 📊 **Benchmark Workloads**: TPC-H with support for custom workloads - 📝 **Self-Contained Reports**: Generate reproducible reports with all attachments - 🔧 **Extensible**: Easy to add new systems, workloads, and cloud providers - 📈 **Rich Visualizations**: Automated generation of performance plots and tables - 🔍 **Result Verification**: Validate query correctness against expected outputs +[^todo-cloud]: Currently, only AWS is fully supported. Local and docker-based deployments are work in progress. + +## Requirements + +- Python 3.10+ +- **Terraform** (for cloud infrastructure) - [Installation Guide](https://developer.hashicorp.com/terraform/install) + ## Quick Start -```bash -# Clone the repository -git clone +```shell +# 1. Clone and enter the repository +git clone https://github.com/exasol/benchkit.git cd benchkit -# Install dependencies +# 2. Install dependencies and local package python -m pip install -e . +``` + +> [!TIP] +> You might have to set up a python virtual environment for this first. -# Run a sample benchmark +```shell +# 3. Copy and edit example environment +cp .env.example .env +$EDITOR .env + +# 4. Validate your configuration +python scripts/check_aws_credentials.py --config configs/exa_vs_ch_1g.yaml + +# 5. Run sample benchmark make all CFG=configs/exa_vs_ch_1g.yaml ``` -This will: -1. Provision cloud infrastructure (if configured) -2. Probe system information -3. Run Exasol vs ClickHouse TPC-H benchmark -4. Generate a complete report with results and reproducibility instructions +> [!CAUTION] +> Please note that the sample benchmark will use cost-incurring AWS resources, and requires your account +> to be properly set up. 
+> +> **Required AWS Permissions**: `ec2:*`, `ec2:DescribeImages`, `ec2:DescribeAvailabilityZones` +> +> 📖 **See [Getting Started Guide](user-docs/GETTING_STARTED.md) for detailed cloud setup instructions.** + +> [!NOTE] +> Currently, the `env` section of the sample benchmark contains references to AWS key pair name and +> ssh key files. You will also have to edit those parts accordingly. -📖 **See [Getting Started Guide](GETTING_STARTED.md) for detailed installation and usage instructions.** +```shell +# 6. Clean up AWS resources +make infra-destroy CFG=configs/exa_vs_ch_1g.yaml + +# 7. view benchmark report +...TBD +``` ## Usage The framework provides 9 commands for complete benchmark lifecycle management: ```bash +# Manage infrastructure +benchkit infra apply --provider aws --config configs/my_benchmark.yaml + # System information collection benchkit probe --config configs/my_benchmark.yaml @@ -48,207 +84,84 @@ benchkit run --config configs/my_benchmark.yaml [--systems exasol] [--queries Q0 # Generate reports benchkit report --config configs/my_benchmark.yaml -# Manage infrastructure -benchkit infra apply --provider aws --config configs/my_benchmark.yaml - # Other commands: execute, status, package, verify, cleanup ``` **Status Command** provides comprehensive project insights: + - Overview of all projects (probe, benchmark, report status) - Detailed status for specific configs (system info, infrastructure, timing) - Cloud infrastructure details (IPs, connection strings) - Multiple config support and smart project lookup -📖 **See [Getting Started Guide](GETTING_STARTED.md) for comprehensive CLI documentation and examples.** +📖 **See [Getting Started Guide](user-docs/GETTING_STARTED.md) for comprehensive CLI documentation and examples.** -## Repository Structure +## Repository Structure (User Version) ``` benchkit/ ├── benchkit/ # Core framework -│ ├── cli.py # Command-line interface (9 commands) -│ ├── systems/ # Database system implementations -│ ├── workloads/ 
# Benchmark workloads (TPC-H) -│ ├── gather/ # System information collection -│ ├── run/ # Benchmark execution -│ ├── report/ # Report generation -│ ├── infra/ # Cloud infrastructure management -│ ├── package/ # Minimal package creation -│ └── verify/ # Result verification -├── templates/ # Jinja2 templates for reports ├── configs/ # Benchmark configurations -├── infra/aws/ # AWS Terraform modules -├── workloads/tpch/ # TPC-H queries and schemas └── results/ # Generated results (auto-created) ``` -## Configuration Example - -```yaml -project_id: "exasol_vs_clickhouse_tpch" -title: "Exasol vs ClickHouse Performance on TPC-H" - -env: - mode: "aws" - region: "eu-west-1" - instances: - exasol: - instance_type: "m7i.4xlarge" - clickhouse: - instance_type: "m7i.4xlarge" - -systems: - - name: "exasol" - kind: "exasol" - version: "2025.1.0" - setup: - method: "installer" - extra: - dbram: "32g" - - - name: "clickhouse" - kind: "clickhouse" - version: "24.12" - setup: - method: "native" - extra: - memory_limit: "32g" - -workload: - name: "tpch" - scale_factor: 1 - queries: - include: ["Q01", "Q03", "Q06", "Q13"] - runs_per_query: 3 - warmup_runs: 1 -``` +See [Developer Guide](dev-docs/DEVELOPERS.md) for a more detailed structure definition. -📖 **See [Getting Started Guide](GETTING_STARTED.md) for more configuration examples.** +## Defining Your Own Benchmarks -## Requirements +You can easily create your own benchmark by creating a yaml configuration file combining -- Python 3.10+ -- **Terraform** (for cloud infrastructure) - [Installation Guide](https://developer.hashicorp.com/terraform/install) -- At least 16GB RAM (32GB+ recommended for larger benchmarks) -- SSD storage recommended - -### AWS Setup (Optional) +- One infrastructure provider (aws/docker/local/...) +- Multiple systems (software) to be tested +- Infrastructure definition per system (e.g. 
AWS instance types) -For cloud deployments, configure AWS credentials: - -```bash -# Create .env file (recommended) -cat > .env << EOF -AWS_PROFILE=default-mfa -AWS_REGION=eu-west-1 -EOF -``` - -**Required AWS Permissions**: `ec2:*`, `ec2:DescribeImages`, `ec2:DescribeAvailabilityZones` - -📖 **See [Getting Started Guide](GETTING_STARTED.md) for detailed cloud setup instructions.** +📖 See [Getting Started Guide](user-docs/GETTING_STARTED.md) for information on how to create +benchmark configurations using supported modules. ## Extending the Framework -The framework is designed for easy extension: - -### Quick Example: Adding a New Database System - -1. Create `benchkit/systems/newsystem.py`: - -```python -from .base import SystemUnderTest +The framework is designed for easy extension. -class NewSystem(SystemUnderTest): - @classmethod - def get_python_dependencies(cls) -> list[str]: - return ["newsystem-driver>=1.0.0"] - - def execute_query(self, query: str, query_name: str = None): - # Use native Python driver for universal connectivity - pass - - # ... implement other required methods -``` - -2. Register in `benchkit/systems/__init__.py`: - -```python -SYSTEM_IMPLEMENTATIONS = { - "exasol": "ExasolSystem", - "clickhouse": "ClickHouseSystem", - "newsystem": "NewSystem", # Add this line -} -``` +📖 **See [Extending the Framework](dev-docs/EXTENDING.md) for comprehensive guides on:** -📖 **See [Extending the Framework](EXTENDING.md) for comprehensive guides on:** - Adding new database systems - Creating custom workloads - Adding cloud providers - Customizing reports and visualizations - Implementing result verification -## Key Design Principles - -### 1. Self-Contained Reports - -Every report is a complete directory with: -- All result data as attachments -- Exact configuration files -- Minimal reproduction package -- Complete setup commands +## Support Matrix -### 2. 
Installation-Independent Connectivity +### setup / installation -Uses official Python drivers for universal database connectivity: -- **Exasol**: `pyexasol` - works with Docker, native, cloud, preinstalled -- **ClickHouse**: `clickhouse-connect` - works with any deployment +| system | local | aws | docker | gcp | azure | +|------------|-------|-----------------|--------|-----|-------| +| Exasol | ✗ | ✓[^single-node] | ✗ | ✗ | ✗ | +| ClickHouse | ✗ | ✓[^single-node] | ✗ | ✗ | ✗ | -### 3. Dynamic Dependency Management +[^single-node]: Single-node system support for now. -Each system defines its own dependencies via `get_python_dependencies()`. Packages only include drivers for databases actually benchmarked. +### tcph workload -### 4. Environment-Agnostic Templates - -Templates work everywhere - AWS, GCP, Azure, local, on-premises. All tuning parameters documented as copy-pasteable commands. +| system | local | aws | docker | gcp | azure | +|------------|-------|-----|--------|-----|-------| +| Exasol | ✗ | ✓ | ✗ | ✗ | ✗ | +| ClickHouse | ✗ | ✓ | ✗ | ✗ | ✗ | ## Documentation -- 📖 [Getting Started Guide](GETTING_STARTED.md) - Installation, usage, and examples -- 🔧 [Extending the Framework](EXTENDING.md) - Adding systems, workloads, and features - -## Dependencies - -Core dependencies (automatically installed): -- `typer` - CLI framework -- `jinja2` - Template rendering -- `pyyaml` - Configuration parsing -- `pandas` - Data manipulation -- `matplotlib` - Plotting -- `rich` - CLI formatting -- `boto3` - AWS integration (optional) -- `python-dotenv` - .env file support (optional) - -Database-specific drivers loaded dynamically based on systems used. - -## Contributing +### For Users -1. Fork the repository -2. Create a feature branch -3. Make your changes -4. Add tests for new functionality -5. 
Submit a pull request +- 📖 [Getting Started Guide](user-docs/GETTING_STARTED.md) - Installation, usage, and examples -## Security +### For Developers -- Database credentials and licenses should not be committed to the repository -- Use environment variables or `.env` file for sensitive data -- The framework includes basic security practices but should be reviewed for production use +- 🔧 [Extending the Framework](dev-docs/EXTENDING.md) - Adding systems, workloads, and features ## License This project is licensed under the MIT License - see the LICENSE file for details. +All names used are copyright and owned by the respective companies. --- diff --git a/dev-docs/DEVELOPERS.md b/dev-docs/DEVELOPERS.md new file mode 100644 index 0000000..a14a7d8 --- /dev/null +++ b/dev-docs/DEVELOPERS.md @@ -0,0 +1,111 @@ +# Main Developers Guide + +## Key Design Principles + +### 1. Self-Contained Reports + +Every report is a complete directory with: +- All result data as attachments +- Full configuration files being used +- Minimal reproduction package +- Complete setup commands + +### 2. Installation-Independent Connectivity + +Uses official Python drivers for universal database connectivity: + +- **Exasol**: `pyexasol` - works with Docker, native, cloud, preinstalled +- **ClickHouse**: `clickhouse-connect` - works with any deployment + +### 3. Dynamic Dependency Management + +Each system defines its own dependencies via `get_python_dependencies()`. Packages only include drivers for databases actually benchmarked. + +### 4. Environment-Agnostic Templates + +Templates work everywhere - AWS, GCP, Azure, local, on-premises. All tuning parameters documented as copy-pasteable commands. 
+ +## Repository Structure + +``` +benchkit/ +├── benchkit/ # Core framework +│ ├── cli.py # Command-line interface (9 commands) +│ ├── systems/ # Database system implementations +│ ├── workloads/ # Benchmark workloads (TPC-H) +│ ├── gather/ # System information collection +│ ├── run/ # Benchmark execution +│ ├── report/ # Report generation +│ ├── infra/ # Cloud infrastructure management +│ ├── package/ # Minimal package creation +│ └── verify/ # Result verification +├── templates/ # Jinja2 templates for reports +├── configs/ # Benchmark configurations +├── infra/aws/ # AWS Terraform modules +├── workloads/tpch/ # TPC-H queries and schemas +└── results/ # Generated results (auto-created) +``` + +## Adding New Systems + +See [Extending Guide](EXTENDING.md) + +## Adding New Infrastructure Providers + +See [Extending Guide](EXTENDING.md) + +## Adding New Workloads + +See [Extending Guide](EXTENDING.md) + +## Best Practices + +### Code Quality + +1. **Follow existing patterns**: Study `ExasolSystem` and `ClickHouseSystem` implementations +2. **Error handling**: Always include proper error handling and logging +3. **Documentation**: Add docstrings explaining complex logic +4. **Type hints**: Use type hints for better code clarity + +### Installation Independence + +1. **Use Python drivers**: Prefer official Python drivers over CLI tools +2. **Universal connectivity**: Code should work with Docker, native, cloud, preinstalled +3. **Graceful fallback**: Provide fallback mechanisms when drivers unavailable + +### Dynamic Dependencies + +1. **Implement `get_python_dependencies()`**: Each system declares its dependencies +2. **Minimal packages**: Only include what's needed for the specific benchmark +3. **Version pinning**: Specify minimum versions for dependencies + +### Testing + +1. **Unit tests**: Create tests for new functionality in `tests/` +2. **Integration tests**: Test with actual database systems when possible +3. 
**Cross-environment**: Test across Docker, native, and cloud deployments + +### Configuration + +1. **Validation**: Add configuration validation for new parameters +2. **Defaults**: Provide sensible defaults for optional parameters +3. **Documentation**: Document all configuration options + +### Security + +1. **Credentials**: Never commit credentials or sensitive data +2. **Input validation**: Validate all user inputs +3. **Least privilege**: Use minimal required permissions + +### Extension Checklist + +When adding a new component, verify: + +- [ ] Follows base class interface +- [ ] Implements `get_python_dependencies()` (for systems) +- [ ] Configuration validation includes new parameters +- [ ] Documentation updated +- [ ] Tests added for new functionality +- [ ] Error handling implemented +- [ ] Resource cleanup implemented +- [ ] Works across deployment methods diff --git a/EXTENDING.md b/dev-docs/EXTENDING.md similarity index 92% rename from EXTENDING.md rename to dev-docs/EXTENDING.md index 1d94f16..0dc3828 100644 --- a/EXTENDING.md +++ b/dev-docs/EXTENDING.md @@ -4,6 +4,8 @@ This guide explains how to extend the database benchmark framework with new syst ## Table of Contents +- [Dependencies](#dependencies) +- [Contributing](#contributing) - [Adding New Database Systems](#adding-new-database-systems) - [Adding New Workloads](#adding-new-workloads) - [Adding Cloud Providers](#adding-cloud-providers) @@ -11,6 +13,28 @@ This guide explains how to extend the database benchmark framework with new syst - [Adding Result Verification](#adding-result-verification) - [Best Practices](#best-practices) +## Dependencies + +Core dependencies (automatically installed): +- `typer` - CLI framework +- `jinja2` - Template rendering +- `pyyaml` - Configuration parsing +- `pandas` - Data manipulation +- `matplotlib` - Plotting +- `rich` - CLI formatting +- `boto3` - AWS integration (optional) +- `python-dotenv` - .env file support (optional) + +Database-specific drivers 
loaded dynamically based on systems used. + +## Contributing + +1. Fork the repository +2. Create a feature branch +3. Make your changes +4. Add tests for new functionality +5. Submit a pull request + ## Adding New Database Systems ### Overview @@ -768,61 +792,5 @@ class ResultVerifier: The `verify` command is already implemented in `benchkit/cli.py`. Extend the verification logic in `benchkit/verify/__init__.py`. -## Best Practices - -### Code Quality - -1. **Follow existing patterns**: Study `ExasolSystem` and `ClickHouseSystem` implementations -2. **Error handling**: Always include proper error handling and logging -3. **Documentation**: Add docstrings explaining complex logic -4. **Type hints**: Use type hints for better code clarity - -### Installation Independence - -1. **Use Python drivers**: Prefer official Python drivers over CLI tools -2. **Universal connectivity**: Code should work with Docker, native, cloud, preinstalled -3. **Graceful fallback**: Provide fallback mechanisms when drivers unavailable - -### Dynamic Dependencies - -1. **Implement `get_python_dependencies()`**: Each system declares its dependencies -2. **Minimal packages**: Only include what's needed for the specific benchmark -3. **Version pinning**: Specify minimum versions for dependencies - -### Testing - -1. **Unit tests**: Create tests for new functionality in `tests/` -2. **Integration tests**: Test with actual database systems when possible -3. **Cross-environment**: Test across Docker, native, and cloud deployments - -### Configuration - -1. **Validation**: Add configuration validation for new parameters -2. **Defaults**: Provide sensible defaults for optional parameters -3. **Documentation**: Document all configuration options - -### Security - -1. **Credentials**: Never commit credentials or sensitive data -2. **Input validation**: Validate all user inputs -3. 
**Least privilege**: Use minimal required permissions - -### Extension Checklist - -When adding a new component, verify: - -- [ ] Follows base class interface -- [ ] Implements `get_python_dependencies()` (for systems) -- [ ] Configuration validation includes new parameters -- [ ] Documentation updated -- [ ] Tests added for new functionality -- [ ] Error handling implemented -- [ ] Resource cleanup implemented -- [ ] Works across deployment methods - -## References - -- [Getting Started Guide](GETTING_STARTED.md) - Basic usage instructions -- [README](../README.md) - Quick start and overview This extensible design allows the framework to grow and adapt to new requirements while maintaining consistency and reliability across all components. diff --git a/GETTING_STARTED.md b/user-docs/GETTING_STARTED.md similarity index 94% rename from GETTING_STARTED.md rename to user-docs/GETTING_STARTED.md index adf76f9..5db9b05 100644 --- a/GETTING_STARTED.md +++ b/user-docs/GETTING_STARTED.md @@ -16,17 +16,22 @@ This comprehensive guide will help you install, configure, and run your first da ## Prerequisites -### System Requirements +### System Requirements (Benchmark Host) -- **Operating System**: Linux (Ubuntu 20.04+ recommended) +- **Operating System**: Linux (Ubuntu 22.04+ recommended) - **Python**: 3.10 or higher -- **Memory**: 16GB RAM minimum (32GB+ recommended for larger benchmarks) -- **Storage**: 100GB+ free space (SSD recommended) -- **Docker**: Optional, for containerized database systems +- **Memory**: 2GB RAM minimum -### Software Dependencies +### System Requirements (System Host) -```bash +- **Operating System**: Linux (Ubuntu 22.04+ recommended) +- **Python**: 3.10 or higher +- **Memory**: Depends on benchmark settings +- **Storage**: Depends on benchmark settings + +### Software Dependencies (ubuntu syntax, when starting at zero) + +```shell # Update system packages sudo apt-get update && sudo apt-get upgrade -y @@ -46,16 +51,16 @@ sudo usermod -aG docker 
$USER

### 1. Clone the Repository

-```bash
-git clone
+```shell
+git clone https://github.com/exasol/benchkit.git
cd benchkit
```

### 2. Set Up Python Environment

-```bash
+```shell
# Create virtual environment
-python3 -m venv .venv
+python3 -m venv --system-site-packages .venv
source .venv/bin/activate

# Install the framework
@@ -67,7 +72,7 @@ benchkit --help

You should see the framework's help message with 9 available commands.

-### 3. Install TPC-H Tools (Optional)
+### 3. Install TPC-H Tools (Optional/Obsolete?)

For TPC-H benchmarks, install the data generation tools:

@@ -94,12 +99,11 @@ cat configs/exa_vs_ch_1g.yaml
```

This configuration defines:
-- Two systems: Exasol and ClickHouse
-- TPC-H workload at scale factor 100
-- Selected queries: Q01, Q03, Q05, Q06, Q09, Q12, Q13, Q18, Q22
-- 3 runs per query with 1 warmup run
+- Three systems: Exasol, ClickHouse, and ClickHouse with tuned queries
+- TPC-H workload at scale factor 1
+- 7 runs per query with 1 warmup run

-### 2. Prepare Data Directory
+### 2. Prepare Data Directory (Obsolete?)

```bash
# Create data directory
@@ -114,6 +118,9 @@ mkdir -p /data/{exasol,clickhouse,tpch}

#### Option A: Run Everything at Once

+The included Makefile provides some shortcuts combining multiple calls to `benchkit`.
+Run `make` without arguments to get a command overview.
+
```bash
# Run the complete benchmark pipeline
make all CFG=configs/exa_vs_ch_1g.yaml
@@ -122,7 +129,8 @@ make all CFG=configs/exa_vs_ch_1g.yaml
This will:
1. Probe system information
2. Run the benchmark
-3. Generate a report report
+3. Generate a report
+4. Leave the cloud infrastructure running

#### Option B: Run Step by Step

@@ -169,6 +177,9 @@ benchkit probe --config configs/my_benchmark.yaml --debug

**Output**: Creates `results//system.json` (or `system_.json` for cloud setups)

+> [!IMPORTANT]
+> Note that `probe` will automatically call `infra apply` if necessary, possibly starting cost-incurring services.
+

### 2. 
`run` - Execute Benchmarks Execute benchmarks against configured database systems: @@ -364,7 +375,8 @@ benchkit verify --config configs/my_benchmark.yaml --systems exasol benchkit verify --config configs/my_benchmark.yaml --debug ``` -**Note**: Requires expected results in `workloads//expected/` +> [!NOTE] +> Requires expected results in `workloads//expected/` ### 9. `cleanup` - System Cleanup @@ -938,7 +950,7 @@ results/my-benchmark/reports// Once you have a basic benchmark working: 1. **Explore Configurations**: Try different scale factors and query sets -2. **Add Systems**: See [EXTENDING.md](EXTENDING.md) for adding new databases +2. **Add Systems**: See [EXTENDING.md](../dev-docs/EXTENDING.md) for adding new databases 3. **Custom Workloads**: Create domain-specific benchmarks 4. **Automate**: Set up CI/CD pipelines for regular benchmarking 5. **Share Results**: Publish your benchmark methodology and results @@ -946,6 +958,6 @@ Once you have a basic benchmark working: ## Additional Resources - [README](../README.md) - Framework overview and quick reference -- [Extending the Framework](EXTENDING.md) - Add systems, workloads, and features +- [Extending the Framework](../dev-docs/EXTENDING.md) - Add systems, workloads, and features This framework provides a solid foundation for database benchmarking. Start with the simple examples above and gradually explore more advanced features as you become comfortable with the system. 
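Patch 1 above moves the "Adding a New Database System" example out of the README and into the extending guide. For readers of this series, the registration pattern it described can be sketched as a self-contained illustration. Note the `SystemUnderTest` stand-in below is simplified — the real base class in `benchkit/systems/base.py` defines more methods, and `NewSystem` with its `newsystem-driver` dependency is a hypothetical example carried over from the removed README section:

```python
from abc import ABC, abstractmethod
from typing import Optional


class SystemUnderTest(ABC):
    """Simplified stand-in for benchkit's base class (illustrative only)."""

    @classmethod
    def get_python_dependencies(cls) -> list[str]:
        # Systems override this so packages only bundle the drivers
        # for databases actually benchmarked.
        return []

    @abstractmethod
    def execute_query(self, query: str, query_name: Optional[str] = None):
        ...


class NewSystem(SystemUnderTest):
    """Hypothetical system showing the extension pattern."""

    @classmethod
    def get_python_dependencies(cls) -> list[str]:
        return ["newsystem-driver>=1.0.0"]

    def execute_query(self, query: str, query_name: Optional[str] = None):
        # A real implementation would call the system's native Python driver.
        return {"query": query_name or "unnamed", "status": "ok"}


# Registry mapping config 'kind' values to implementation class names,
# mirroring benchkit/systems/__init__.py in the removed README example.
SYSTEM_IMPLEMENTATIONS = {
    "exasol": "ExasolSystem",
    "clickhouse": "ClickHouseSystem",
    "newsystem": "NewSystem",  # the new entry
}
```

Because dependencies are declared per class, the packaging step can collect them by simply calling `get_python_dependencies()` on each configured system — no central requirements list to keep in sync.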
From 9b7d07b7a8e64105fa0aca3d6779bf857ea2c99b Mon Sep 17 00:00:00 2001 From: Stefan Reich Date: Wed, 26 Nov 2025 16:16:49 +0100 Subject: [PATCH 2/6] more doc shuffling --- README.md | 55 ++++++++++++++---------------------- user-docs/GETTING_STARTED.md | 3 ++ 2 files changed, 24 insertions(+), 34 deletions(-) diff --git a/README.md b/README.md index 042c3cf..6f88158 100644 --- a/README.md +++ b/README.md @@ -7,15 +7,13 @@ collect detailed system information, run benchmark workloads, and generate repor ## Features - 🏗️ **Modular Architecture**: Fine-grained templates for setup, execution, and reporting -- ☁️ **Multi-Cloud Support[^todo-cloud]**: Infrastructure automation with separate instances per database +- ☁️ **Multi-Cloud Support**: Infrastructure automation with separate instances per database - 📊 **Benchmark Workloads**: TPC-H with support for custom workloads - 📝 **Self-Contained Reports**: Generate reproducible reports with all attachments - 🔧 **Extensible**: Easy to add new systems, workloads, and cloud providers - 📈 **Rich Visualizations**: Automated generation of performance plots and tables - 🔍 **Result Verification**: Validate query correctness against expected outputs -[^todo-cloud]: Currently, only AWS is fully supported. Local and docker-based deployments are work in progress. - ## Requirements - Python 3.10+ @@ -27,39 +25,40 @@ collect detailed system information, run benchmark workloads, and generate repor # 1. Clone and enter the repository git clone https://github.com/exasol/benchkit.git cd benchkit - -# 2. Install dependencies and local package -python -m pip install -e . ``` > [!TIP] -> You might have to set up a python virtual environment for this first. +> You might have to set up a python virtual environment before running the next command. ```shell +# 2. Install dependencies and local package +python -m pip install -e . + # 3. 
Copy and edit example environment cp .env.example .env $EDITOR .env +``` +> [!TIP] +> The sample benchmark uses AWS cloud infrastructure. See [Getting Started Guide](user-docs/GETTING_STARTED.md) +> for detailed cloud setup instructions. + +```shell # 4. Validate your configuration python scripts/check_aws_credentials.py --config configs/exa_vs_ch_1g.yaml - -# 5. Run sample benchmark -make all CFG=configs/exa_vs_ch_1g.yaml ``` > [!CAUTION] -> Please note that the sample benchmark will use cost-incurring AWS resources, and requires your account -> to be properly set up. -> -> **Required AWS Permissions**: `ec2:*`, `ec2:DescribeImages`, `ec2:DescribeAvailabilityZones` -> -> 📖 **See [Getting Started Guide](user-docs/GETTING_STARTED.md) for detailed cloud setup instructions.** +> Please note that running the sample benchmark will use cost-incurring AWS resources. > [!NOTE] > Currently, the `env` section of the sample benchmark contains references to AWS key pair name and > ssh key files. You will also have to edit those parts accordingly. ```shell +# 5. Run sample benchmark +make all CFG=configs/exa_vs_ch_1g.yaml + # 6. Clean up AWS resources make infra-destroy CFG=configs/exa_vs_ch_1g.yaml @@ -118,30 +117,18 @@ You can easily create your own benchmark by creating a yaml configuration file c 📖 See [Getting Started Guide](user-docs/GETTING_STARTED.md) for information on how to create benchmark configurations using supported modules. -## Extending the Framework - -The framework is designed for easy extension. 
- -📖 **See [Extending the Framework](dev-docs/EXTENDING.md) for comprehensive guides on:** - -- Adding new database systems -- Creating custom workloads -- Adding cloud providers -- Customizing reports and visualizations -- Implementing result verification - ## Support Matrix ### setup / installation -| system | local | aws | docker | gcp | azure | -|------------|-------|-----------------|--------|-----|-------| -| Exasol | ✗ | ✓[^single-node] | ✗ | ✗ | ✗ | -| ClickHouse | ✗ | ✓[^single-node] | ✗ | ✗ | ✗ | +| system | local | aws | docker | gcp | azure | +|------------|-------|------|--------|-----|-------| +| Exasol | ✗ | ✓^1^ | ✗ | ✗ | ✗ | +| ClickHouse | ✗ | ✓^1^ | ✗ | ✗ | ✗ | -[^single-node]: Single-node system support for now. +^1^ Only single-node deployments supported at this time. -### tcph workload +### "tpch" workload | system | local | aws | docker | gcp | azure | |------------|-------|-----|--------|-----|-------| diff --git a/user-docs/GETTING_STARTED.md b/user-docs/GETTING_STARTED.md index 5db9b05..b048dd4 100644 --- a/user-docs/GETTING_STARTED.md +++ b/user-docs/GETTING_STARTED.md @@ -534,6 +534,9 @@ workload: ### AWS Deployment +> [!NOTE] +> **Required AWS Permissions**: `ec2:*`, `ec2:DescribeImages`, `ec2:DescribeAvailabilityZones` + #### 1. Configure AWS Credentials Choose one of these methods: From 86da7903b4ee7f042a0736713b3141f51a6d4cb5 Mon Sep 17 00:00:00 2001 From: Stefan Reich Date: Wed, 26 Nov 2025 16:21:08 +0100 Subject: [PATCH 3/6] shorten quickstart --- README.md | 29 ++++++++++------------------- 1 file changed, 10 insertions(+), 19 deletions(-) diff --git a/README.md b/README.md index 6f88158..f092e3f 100644 --- a/README.md +++ b/README.md @@ -21,41 +21,32 @@ collect detailed system information, run benchmark workloads, and generate repor ## Quick Start +> [!TIP] +> You might have to set up a python virtual environment for installing python packages. + +> [!CAUTION] +> The sample benchmark uses AWS cloud infrastructure. 
See [Getting Started Guide](user-docs/GETTING_STARTED.md) +> for detailed cloud setup instructions. +> Note that AWS infrastructure is usually not free to use. + ```shell # 1. Clone and enter the repository git clone https://github.com/exasol/benchkit.git cd benchkit -``` -> [!TIP] -> You might have to set up a python virtual environment before running the next command. - -```shell # 2. Install dependencies and local package python -m pip install -e . # 3. Copy and edit example environment cp .env.example .env $EDITOR .env -``` -> [!TIP] -> The sample benchmark uses AWS cloud infrastructure. See [Getting Started Guide](user-docs/GETTING_STARTED.md) -> for detailed cloud setup instructions. +# 3b. (temporary) fix hardcoded ssh-key names in 'env' section of configuration +$EDITOR configs/exa_vs_ch_1g.yaml -```shell # 4. Validate your configuration python scripts/check_aws_credentials.py --config configs/exa_vs_ch_1g.yaml -``` - -> [!CAUTION] -> Please note that running the sample benchmark will use cost-incurring AWS resources. -> [!NOTE] -> Currently, the `env` section of the sample benchmark contains references to AWS key pair name and -> ssh key files. You will also have to edit those parts accordingly. - -```shell # 5. Run sample benchmark make all CFG=configs/exa_vs_ch_1g.yaml From 21f997ea9750d46bf8f8afd0c12e4137c8488f19 Mon Sep 17 00:00:00 2001 From: Stefan Reich Date: Wed, 26 Nov 2025 16:23:28 +0100 Subject: [PATCH 4/6] fix superscript --- README.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index f092e3f..34bd93e 100644 --- a/README.md +++ b/README.md @@ -112,12 +112,14 @@ benchmark configurations using supported modules. 
### setup / installation -| system | local | aws | docker | gcp | azure | -|------------|-------|------|--------|-----|-------| -| Exasol | ✗ | ✓^1^ | ✗ | ✗ | ✗ | -| ClickHouse | ✗ | ✓^1^ | ✗ | ✗ | ✗ | +| system | local | aws | docker | gcp | azure | +|------------|-------|---------------|--------|-----|-------| +| Exasol | ✗ | ✓1 | ✗ | ✗ | ✗ | +| ClickHouse | ✗ | ✓1 | ✗ | ✗ | ✗ | -^1^ Only single-node deployments supported at this time. +Notes: + +1. Only single-node deployments supported at this time. ### "tpch" workload From e959fa47763805656c7ec456435406ec91a99a20 Mon Sep 17 00:00:00 2001 From: Stefan Reich Date: Thu, 27 Nov 2025 09:48:28 +0100 Subject: [PATCH 5/6] more shuffling --- README.md | 6 +++--- dev-docs/DEVELOPERS.md | 25 ++++++++++++++++++++----- 2 files changed, 23 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 34bd93e..4de5831 100644 --- a/README.md +++ b/README.md @@ -102,11 +102,11 @@ See [Developer Guide](dev-docs/DEVELOPERS.md) for a more detailed structure defi You can easily create your own benchmark by creating a yaml configuration file combining - One infrastructure provider (aws/docker/local/...) +- One workload (benchmark type) to be executed - Multiple systems (software) to be tested -- Infrastructure definition per system (e.g. AWS instance types) -📖 See [Getting Started Guide](user-docs/GETTING_STARTED.md) for information on how to create -benchmark configurations using supported modules. 
+📖 **See [Getting Started Guide](user-docs/GETTING_STARTED.md) for information on how to create +benchmark configurations using supported modules.** ## Support Matrix diff --git a/dev-docs/DEVELOPERS.md b/dev-docs/DEVELOPERS.md index a14a7d8..467de14 100644 --- a/dev-docs/DEVELOPERS.md +++ b/dev-docs/DEVELOPERS.md @@ -46,17 +46,32 @@ benchkit/ └── results/ # Generated results (auto-created) ``` +## Adding New Workloads + +A workload defines the contents of the benchmark, in terms of + +- **data model**, typically a set of DDL queries stored under `workloads/` +- **table data**, generated or otherwise defined by code under `benchkit/workloads/` +- **query execution logic**, defined by code under `benchkit/workloads/` +- **benchmark queries**, stored as SQL files under `workloads/` + +📖 **See [Extending Guide](EXTENDING.md) for details** + ## Adding New Systems -See [Extending Guide](EXTENDING.md) +Systems are defined by python code at `benchkit/systems/`, which needs to provide methods to -## Adding New Infrastructure Providers +- **deploy** the software on supported infrastructure providers +- **configure** the software according to the benchmark configuration +- **execute SQL** statements +- **load data** (CSV) into tables -See [Extending Guide](EXTENDING.md) +📖 **See [Extending Guide](EXTENDING.md) for details** -## Adding New Workloads +## Adding New Infrastructure Providers + +📖 **See [Extending Guide](EXTENDING.md) for details** -See [Extending Guide](EXTENDING.md) ## Best Practices From 6dd1423b276fdb0edba31e6e2f6212624270e394 Mon Sep 17 00:00:00 2001 From: Stefan Reich Date: Thu, 27 Nov 2025 09:54:04 +0100 Subject: [PATCH 6/6] more shuffling --- dev-docs/DEVELOPERS.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/dev-docs/DEVELOPERS.md b/dev-docs/DEVELOPERS.md index 467de14..0500032 100644 --- a/dev-docs/DEVELOPERS.md +++ b/dev-docs/DEVELOPERS.md @@ -57,6 +57,11 @@ A workload defines the contents of the benchmark, in terms of 📖 **See [Extending 
Guide](EXTENDING.md) for details** +## Adding Query Variants To Existing Workloads + +> [!NOTE] +> Section needs content. + ## Adding New Systems Systems are defined by python code at `benchkit/systems/`, which needs to provide methods to @@ -70,6 +75,9 @@ Systems are defined by python code at `benchkit/systems/`, which needs to provid ## Adding New Infrastructure Providers +> [!NOTE] +> Section needs content. + 📖 **See [Extending Guide](EXTENDING.md) for details**
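A closing note on the workload parameters that recur throughout these patches (`runs_per_query`, `warmup_runs`, and the `queries.include` list): how they combine into execution counts can be sketched with the values from the config example that patch 1 removes from the README. The exact scheduling semantics belong to the framework, so treat this as an assumption rather than a specification:

```python
# Workload section from the config example removed from README.md in patch 1.
workload = {
    "name": "tpch",
    "scale_factor": 1,
    "queries": {"include": ["Q01", "Q03", "Q06", "Q13"]},
    "runs_per_query": 3,
    "warmup_runs": 1,
}


def executions_per_system(cfg: dict) -> tuple[int, int]:
    """Return (timed runs, total runs including warmup) for one system,
    assuming warmup runs happen once per query before its timed runs."""
    n_queries = len(cfg["queries"]["include"])
    timed = n_queries * cfg["runs_per_query"]
    total = n_queries * (cfg["runs_per_query"] + cfg["warmup_runs"])
    return timed, total


timed, total = executions_per_system(workload)
print(timed, total)  # prints: 12 16
```

With the sample config's three systems (per the updated GETTING_STARTED description), the benchmark host would drive three times these counts across the provisioned instances.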