From 7c97f5dd03e11bb1c6b680130debb5b671171832 Mon Sep 17 00:00:00 2001 From: Stefan Reich Date: Wed, 26 Nov 2025 14:52:24 +0100 Subject: [PATCH 1/6] Restructuring documentation --- README.md | 245 ++++++------------ dev-docs/DEVELOPERS.md | 111 ++++++++ EXTENDING.md => dev-docs/EXTENDING.md | 80 ++---- .../GETTING_STARTED.md | 54 ++-- 4 files changed, 247 insertions(+), 243 deletions(-) create mode 100644 dev-docs/DEVELOPERS.md rename EXTENDING.md => dev-docs/EXTENDING.md (92%) rename GETTING_STARTED.md => user-docs/GETTING_STARTED.md (94%) diff --git a/README.md b/README.md index cf2cdbd..042c3cf 100644 --- a/README.md +++ b/README.md @@ -1,44 +1,80 @@ # Database Benchmark Report Framework -A modular framework for running and documenting database benchmarks, with a focus on comparing **Exasol** with other database systems. This repository provides reusable building blocks to launch benchmark environments, collect detailed system information, run benchmark workloads, and generate reports documenting the results. +A modular framework for running and documenting database benchmarks, with a focus on comparing **Exasol** with +other database systems. This repository provides reusable building blocks to launch benchmark environments, +collect detailed system information, run benchmark workloads, and generate reports documenting the results. 
## Features - 🏗️ **Modular Architecture**: Fine-grained templates for setup, execution, and reporting -- ☁️ **Multi-Cloud Support**: AWS infrastructure automation with separate instances per database +- ☁️ **Multi-Cloud Support[^todo-cloud]**: Infrastructure automation with separate instances per database - 📊 **Benchmark Workloads**: TPC-H with support for custom workloads - 📝 **Self-Contained Reports**: Generate reproducible reports with all attachments - 🔧 **Extensible**: Easy to add new systems, workloads, and cloud providers - 📈 **Rich Visualizations**: Automated generation of performance plots and tables - 🔍 **Result Verification**: Validate query correctness against expected outputs +[^todo-cloud]: Currently, only AWS is fully supported. Local and docker-based deployments are work in progress. + +## Requirements + +- Python 3.10+ +- **Terraform** (for cloud infrastructure) - [Installation Guide](https://developer.hashicorp.com/terraform/install) + ## Quick Start -```bash -# Clone the repository -git clone +```shell +# 1. Clone and enter the repository +git clone https://github.com/exasol/benchkit.git cd benchkit -# Install dependencies +# 2. Install dependencies and local package python -m pip install -e . +``` + +> [!TIP] +> You might have to set up a python virtual environment for this first. -# Run a sample benchmark +```shell +# 3. Copy and edit example environment +cp .env.example .env +$EDITOR .env + +# 4. Validate your configuration +python scripts/check_aws_credentials.py --config configs/exa_vs_ch_1g.yaml + +# 5. Run sample benchmark make all CFG=configs/exa_vs_ch_1g.yaml ``` -This will: -1. Provision cloud infrastructure (if configured) -2. Probe system information -3. Run Exasol vs ClickHouse TPC-H benchmark -4. Generate a complete report with results and reproducibility instructions +> [!CAUTION] +> Please note that the sample benchmark will use cost-incurring AWS resources, and requires your account +> to be properly set up. 
+> +> **Required AWS Permissions**: `ec2:*`, `ec2:DescribeImages`, `ec2:DescribeAvailabilityZones` +> +> 📖 **See [Getting Started Guide](user-docs/GETTING_STARTED.md) for detailed cloud setup instructions.** + +> [!NOTE] +> Currently, the `env` section of the sample benchmark contains references to AWS key pair name and +> ssh key files. You will also have to edit those parts accordingly. -📖 **See [Getting Started Guide](GETTING_STARTED.md) for detailed installation and usage instructions.** +```shell +# 6. Clean up AWS resources +make infra-destroy CFG=configs/exa_vs_ch_1g.yaml + +# 7. view benchmark report +...TBD +``` ## Usage The framework provides 9 commands for complete benchmark lifecycle management: ```bash +# Manage infrastructure +benchkit infra apply --provider aws --config configs/my_benchmark.yaml + # System information collection benchkit probe --config configs/my_benchmark.yaml @@ -48,207 +84,84 @@ benchkit run --config configs/my_benchmark.yaml [--systems exasol] [--queries Q0 # Generate reports benchkit report --config configs/my_benchmark.yaml -# Manage infrastructure -benchkit infra apply --provider aws --config configs/my_benchmark.yaml - # Other commands: execute, status, package, verify, cleanup ``` **Status Command** provides comprehensive project insights: + - Overview of all projects (probe, benchmark, report status) - Detailed status for specific configs (system info, infrastructure, timing) - Cloud infrastructure details (IPs, connection strings) - Multiple config support and smart project lookup -📖 **See [Getting Started Guide](GETTING_STARTED.md) for comprehensive CLI documentation and examples.** +📖 **See [Getting Started Guide](user-docs/GETTING_STARTED.md) for comprehensive CLI documentation and examples.** -## Repository Structure +## Repository Structure (User Version) ``` benchkit/ ├── benchkit/ # Core framework -│ ├── cli.py # Command-line interface (9 commands) -│ ├── systems/ # Database system implementations -│ ├── workloads/ 
# Benchmark workloads (TPC-H) -│ ├── gather/ # System information collection -│ ├── run/ # Benchmark execution -│ ├── report/ # Report generation -│ ├── infra/ # Cloud infrastructure management -│ ├── package/ # Minimal package creation -│ └── verify/ # Result verification -├── templates/ # Jinja2 templates for reports ├── configs/ # Benchmark configurations -├── infra/aws/ # AWS Terraform modules -├── workloads/tpch/ # TPC-H queries and schemas └── results/ # Generated results (auto-created) ``` -## Configuration Example - -```yaml -project_id: "exasol_vs_clickhouse_tpch" -title: "Exasol vs ClickHouse Performance on TPC-H" - -env: - mode: "aws" - region: "eu-west-1" - instances: - exasol: - instance_type: "m7i.4xlarge" - clickhouse: - instance_type: "m7i.4xlarge" - -systems: - - name: "exasol" - kind: "exasol" - version: "2025.1.0" - setup: - method: "installer" - extra: - dbram: "32g" - - - name: "clickhouse" - kind: "clickhouse" - version: "24.12" - setup: - method: "native" - extra: - memory_limit: "32g" - -workload: - name: "tpch" - scale_factor: 1 - queries: - include: ["Q01", "Q03", "Q06", "Q13"] - runs_per_query: 3 - warmup_runs: 1 -``` +See [Developer Guide](dev-docs/DEVELOPERS.md) for a more detailed structure definition. -📖 **See [Getting Started Guide](GETTING_STARTED.md) for more configuration examples.** +## Defining Your Own Benchmarks -## Requirements +You can easily create your own benchmark by creating a yaml configuration file combining -- Python 3.10+ -- **Terraform** (for cloud infrastructure) - [Installation Guide](https://developer.hashicorp.com/terraform/install) -- At least 16GB RAM (32GB+ recommended for larger benchmarks) -- SSD storage recommended - -### AWS Setup (Optional) +- One infrastructure provider (aws/docker/local/...) +- Multiple systems (software) to be tested +- Infrastructure definition per system (e.g. 
AWS instance types) -For cloud deployments, configure AWS credentials: - -```bash -# Create .env file (recommended) -cat > .env << EOF -AWS_PROFILE=default-mfa -AWS_REGION=eu-west-1 -EOF -``` - -**Required AWS Permissions**: `ec2:*`, `ec2:DescribeImages`, `ec2:DescribeAvailabilityZones` - -📖 **See [Getting Started Guide](GETTING_STARTED.md) for detailed cloud setup instructions.** +📖 See [Getting Started Guide](user-docs/GETTING_STARTED.md) for information on how to create +benchmark configurations using supported modules. ## Extending the Framework -The framework is designed for easy extension: - -### Quick Example: Adding a New Database System - -1. Create `benchkit/systems/newsystem.py`: - -```python -from .base import SystemUnderTest +The framework is designed for easy extension. -class NewSystem(SystemUnderTest): - @classmethod - def get_python_dependencies(cls) -> list[str]: - return ["newsystem-driver>=1.0.0"] - - def execute_query(self, query: str, query_name: str = None): - # Use native Python driver for universal connectivity - pass - - # ... implement other required methods -``` - -2. Register in `benchkit/systems/__init__.py`: - -```python -SYSTEM_IMPLEMENTATIONS = { - "exasol": "ExasolSystem", - "clickhouse": "ClickHouseSystem", - "newsystem": "NewSystem", # Add this line -} -``` +📖 **See [Extending the Framework](dev-docs/EXTENDING.md) for comprehensive guides on:** -📖 **See [Extending the Framework](EXTENDING.md) for comprehensive guides on:** - Adding new database systems - Creating custom workloads - Adding cloud providers - Customizing reports and visualizations - Implementing result verification -## Key Design Principles - -### 1. Self-Contained Reports - -Every report is a complete directory with: -- All result data as attachments -- Exact configuration files -- Minimal reproduction package -- Complete setup commands +## Support Matrix -### 2. 
Installation-Independent Connectivity +### setup / installation -Uses official Python drivers for universal database connectivity: -- **Exasol**: `pyexasol` - works with Docker, native, cloud, preinstalled -- **ClickHouse**: `clickhouse-connect` - works with any deployment +| system | local | aws | docker | gcp | azure | +|------------|-------|-----------------|--------|-----|-------| +| Exasol | ✗ | ✓[^single-node] | ✗ | ✗ | ✗ | +| ClickHouse | ✗ | ✓[^single-node] | ✗ | ✗ | ✗ | -### 3. Dynamic Dependency Management +[^single-node]: Single-node system support for now. -Each system defines its own dependencies via `get_python_dependencies()`. Packages only include drivers for databases actually benchmarked. +### tcph workload -### 4. Environment-Agnostic Templates - -Templates work everywhere - AWS, GCP, Azure, local, on-premises. All tuning parameters documented as copy-pasteable commands. +| system | local | aws | docker | gcp | azure | +|------------|-------|-----|--------|-----|-------| +| Exasol | ✗ | ✓ | ✗ | ✗ | ✗ | +| ClickHouse | ✗ | ✓ | ✗ | ✗ | ✗ | ## Documentation -- 📖 [Getting Started Guide](GETTING_STARTED.md) - Installation, usage, and examples -- 🔧 [Extending the Framework](EXTENDING.md) - Adding systems, workloads, and features - -## Dependencies - -Core dependencies (automatically installed): -- `typer` - CLI framework -- `jinja2` - Template rendering -- `pyyaml` - Configuration parsing -- `pandas` - Data manipulation -- `matplotlib` - Plotting -- `rich` - CLI formatting -- `boto3` - AWS integration (optional) -- `python-dotenv` - .env file support (optional) - -Database-specific drivers loaded dynamically based on systems used. - -## Contributing +### For Users -1. Fork the repository -2. Create a feature branch -3. Make your changes -4. Add tests for new functionality -5. 
Submit a pull request +- 📖 [Getting Started Guide](user-docs/GETTING_STARTED.md) - Installation, usage, and examples -## Security +### For Developers -- Database credentials and licenses should not be committed to the repository -- Use environment variables or `.env` file for sensitive data -- The framework includes basic security practices but should be reviewed for production use +- 🔧 [Extending the Framework](dev-docs/EXTENDING.md) - Adding systems, workloads, and features ## License This project is licensed under the MIT License - see the LICENSE file for details. +All names used are copyright and owned by the respective companies. --- diff --git a/dev-docs/DEVELOPERS.md b/dev-docs/DEVELOPERS.md new file mode 100644 index 0000000..a14a7d8 --- /dev/null +++ b/dev-docs/DEVELOPERS.md @@ -0,0 +1,111 @@ +# Main Developers Guide + +## Key Design Principles + +### 1. Self-Contained Reports + +Every report is a complete directory with: +- All result data as attachments +- Full configuration files being used +- Minimal reproduction package +- Complete setup commands + +### 2. Installation-Independent Connectivity + +Uses official Python drivers for universal database connectivity: + +- **Exasol**: `pyexasol` - works with Docker, native, cloud, preinstalled +- **ClickHouse**: `clickhouse-connect` - works with any deployment + +### 3. Dynamic Dependency Management + +Each system defines its own dependencies via `get_python_dependencies()`. Packages only include drivers for databases actually benchmarked. + +### 4. Environment-Agnostic Templates + +Templates work everywhere - AWS, GCP, Azure, local, on-premises. All tuning parameters documented as copy-pasteable commands. 
+ +## Repository Structure + +``` +benchkit/ +├── benchkit/ # Core framework +│ ├── cli.py # Command-line interface (9 commands) +│ ├── systems/ # Database system implementations +│ ├── workloads/ # Benchmark workloads (TPC-H) +│ ├── gather/ # System information collection +│ ├── run/ # Benchmark execution +│ ├── report/ # Report generation +│ ├── infra/ # Cloud infrastructure management +│ ├── package/ # Minimal package creation +│ └── verify/ # Result verification +├── templates/ # Jinja2 templates for reports +├── configs/ # Benchmark configurations +├── infra/aws/ # AWS Terraform modules +├── workloads/tpch/ # TPC-H queries and schemas +└── results/ # Generated results (auto-created) +``` + +## Adding New Systems + +See [Extending Guide](EXTENDING.md) + +## Adding New Infrastructure Providers + +See [Extending Guide](EXTENDING.md) + +## Adding New Workloads + +See [Extending Guide](EXTENDING.md) + +## Best Practices + +### Code Quality + +1. **Follow existing patterns**: Study `ExasolSystem` and `ClickHouseSystem` implementations +2. **Error handling**: Always include proper error handling and logging +3. **Documentation**: Add docstrings explaining complex logic +4. **Type hints**: Use type hints for better code clarity + +### Installation Independence + +1. **Use Python drivers**: Prefer official Python drivers over CLI tools +2. **Universal connectivity**: Code should work with Docker, native, cloud, preinstalled +3. **Graceful fallback**: Provide fallback mechanisms when drivers unavailable + +### Dynamic Dependencies + +1. **Implement `get_python_dependencies()`**: Each system declares its dependencies +2. **Minimal packages**: Only include what's needed for the specific benchmark +3. **Version pinning**: Specify minimum versions for dependencies + +### Testing + +1. **Unit tests**: Create tests for new functionality in `tests/` +2. **Integration tests**: Test with actual database systems when possible +3. 
**Cross-environment**: Test across Docker, native, and cloud deployments + +### Configuration + +1. **Validation**: Add configuration validation for new parameters +2. **Defaults**: Provide sensible defaults for optional parameters +3. **Documentation**: Document all configuration options + +### Security + +1. **Credentials**: Never commit credentials or sensitive data +2. **Input validation**: Validate all user inputs +3. **Least privilege**: Use minimal required permissions + +### Extension Checklist + +When adding a new component, verify: + +- [ ] Follows base class interface +- [ ] Implements `get_python_dependencies()` (for systems) +- [ ] Configuration validation includes new parameters +- [ ] Documentation updated +- [ ] Tests added for new functionality +- [ ] Error handling implemented +- [ ] Resource cleanup implemented +- [ ] Works across deployment methods diff --git a/EXTENDING.md b/dev-docs/EXTENDING.md similarity index 92% rename from EXTENDING.md rename to dev-docs/EXTENDING.md index 1d94f16..0dc3828 100644 --- a/EXTENDING.md +++ b/dev-docs/EXTENDING.md @@ -4,6 +4,8 @@ This guide explains how to extend the database benchmark framework with new syst ## Table of Contents +- [Dependencies](#dependencies) +- [Contributing](#contributing) - [Adding New Database Systems](#adding-new-database-systems) - [Adding New Workloads](#adding-new-workloads) - [Adding Cloud Providers](#adding-cloud-providers) @@ -11,6 +13,28 @@ This guide explains how to extend the database benchmark framework with new syst - [Adding Result Verification](#adding-result-verification) - [Best Practices](#best-practices) +## Dependencies + +Core dependencies (automatically installed): +- `typer` - CLI framework +- `jinja2` - Template rendering +- `pyyaml` - Configuration parsing +- `pandas` - Data manipulation +- `matplotlib` - Plotting +- `rich` - CLI formatting +- `boto3` - AWS integration (optional) +- `python-dotenv` - .env file support (optional) + +Database-specific drivers 
loaded dynamically based on systems used. + +## Contributing + +1. Fork the repository +2. Create a feature branch +3. Make your changes +4. Add tests for new functionality +5. Submit a pull request + ## Adding New Database Systems ### Overview @@ -768,61 +792,5 @@ class ResultVerifier: The `verify` command is already implemented in `benchkit/cli.py`. Extend the verification logic in `benchkit/verify/__init__.py`. -## Best Practices - -### Code Quality - -1. **Follow existing patterns**: Study `ExasolSystem` and `ClickHouseSystem` implementations -2. **Error handling**: Always include proper error handling and logging -3. **Documentation**: Add docstrings explaining complex logic -4. **Type hints**: Use type hints for better code clarity - -### Installation Independence - -1. **Use Python drivers**: Prefer official Python drivers over CLI tools -2. **Universal connectivity**: Code should work with Docker, native, cloud, preinstalled -3. **Graceful fallback**: Provide fallback mechanisms when drivers unavailable - -### Dynamic Dependencies - -1. **Implement `get_python_dependencies()`**: Each system declares its dependencies -2. **Minimal packages**: Only include what's needed for the specific benchmark -3. **Version pinning**: Specify minimum versions for dependencies - -### Testing - -1. **Unit tests**: Create tests for new functionality in `tests/` -2. **Integration tests**: Test with actual database systems when possible -3. **Cross-environment**: Test across Docker, native, and cloud deployments - -### Configuration - -1. **Validation**: Add configuration validation for new parameters -2. **Defaults**: Provide sensible defaults for optional parameters -3. **Documentation**: Document all configuration options - -### Security - -1. **Credentials**: Never commit credentials or sensitive data -2. **Input validation**: Validate all user inputs -3. 
**Least privilege**: Use minimal required permissions - -### Extension Checklist - -When adding a new component, verify: - -- [ ] Follows base class interface -- [ ] Implements `get_python_dependencies()` (for systems) -- [ ] Configuration validation includes new parameters -- [ ] Documentation updated -- [ ] Tests added for new functionality -- [ ] Error handling implemented -- [ ] Resource cleanup implemented -- [ ] Works across deployment methods - -## References - -- [Getting Started Guide](GETTING_STARTED.md) - Basic usage instructions -- [README](../README.md) - Quick start and overview This extensible design allows the framework to grow and adapt to new requirements while maintaining consistency and reliability across all components. diff --git a/GETTING_STARTED.md b/user-docs/GETTING_STARTED.md similarity index 94% rename from GETTING_STARTED.md rename to user-docs/GETTING_STARTED.md index adf76f9..5db9b05 100644 --- a/GETTING_STARTED.md +++ b/user-docs/GETTING_STARTED.md @@ -16,17 +16,22 @@ This comprehensive guide will help you install, configure, and run your first da ## Prerequisites -### System Requirements +### System Requirements (Benchmark Host) -- **Operating System**: Linux (Ubuntu 20.04+ recommended) +- **Operating System**: Linux (Ubuntu 22.04+ recommended) - **Python**: 3.10 or higher -- **Memory**: 16GB RAM minimum (32GB+ recommended for larger benchmarks) -- **Storage**: 100GB+ free space (SSD recommended) -- **Docker**: Optional, for containerized database systems +- **Memory**: 2GB RAM minimum -### Software Dependencies +### System Requirements (System Host) -```bash +- **Operating System**: Linux (Ubuntu 22.04+ recommended) +- **Python**: 3.10 or higher +- **Memory**: Depends on benchmark settings +- **Storage**: Depends on benchmark settings + +### Software Dependencies (ubuntu syntax, when starting at zero) + +```shell # Update system packages sudo apt-get update && sudo apt-get upgrade -y @@ -46,16 +51,16 @@ sudo usermod -aG docker 
$USER

### 1. Clone the Repository

-```bash
-git clone
+```shell
+git clone https://github.com/exasol/benchkit.git
cd benchkit
```

### 2. Set Up Python Environment

-```bash
+```shell
# Create virtual environment
-python3 -m venv .venv
+python3 -m venv --system-site-packages .venv
source .venv/bin/activate

# Install the framework
@@ -67,7 +72,7 @@ benchkit --help

You should see the framework's help message with 9 available commands.

-### 3. Install TPC-H Tools (Optional)
+### 3. Install TPC-H Tools (Optional/Obsolete?)

For TPC-H benchmarks, install the data generation tools:

@@ -94,12 +99,11 @@ cat configs/exa_vs_ch_1g.yaml
```

This configuration defines:
-- Two systems: Exasol and ClickHouse
-- TPC-H workload at scale factor 100
-- Selected queries: Q01, Q03, Q05, Q06, Q09, Q12, Q13, Q18, Q22
-- 3 runs per query with 1 warmup run
+- Three systems: Exasol, ClickHouse, and ClickHouse with tuned queries
+- TPC-H workload at scale factor 1
+- 7 runs per query with 1 warmup run

-### 2. Prepare Data Directory
+### 2. Prepare Data Directory (Obsolete?)

```bash
# Create data directory
@@ -114,6 +118,9 @@ mkdir -p /data/{exasol,clickhouse,tpch}

#### Option A: Run Everything at Once

+The included Makefile provides some shortcuts combining multiple calls to `benchkit`.
+Run `make` without arguments to get a command overview.
+
```bash
# Run the complete benchmark pipeline
make all CFG=configs/exa_vs_ch_1g.yaml
@@ -122,7 +129,8 @@ make all CFG=configs/exa_vs_ch_1g.yaml
This will:
1. Probe system information
2. Run the benchmark
-3. Generate a report report
+3. Generate a report
+4. Leave the cloud infrastructure running

#### Option B: Run Step by Step

@@ -169,6 +177,9 @@ benchkit probe --config configs/my_benchmark.yaml --debug

**Output**: Creates `results//system.json` (or `system_.json` for cloud setups)

+> [!IMPORTANT]
+> Note that `probe` will automatically call `infra apply` if necessary, possibly starting cost-incurring services.
+

### 2. 
`run` - Execute Benchmarks Execute benchmarks against configured database systems: @@ -364,7 +375,8 @@ benchkit verify --config configs/my_benchmark.yaml --systems exasol benchkit verify --config configs/my_benchmark.yaml --debug ``` -**Note**: Requires expected results in `workloads//expected/` +> [!NOTE] +> Requires expected results in `workloads//expected/` ### 9. `cleanup` - System Cleanup @@ -938,7 +950,7 @@ results/my-benchmark/reports// Once you have a basic benchmark working: 1. **Explore Configurations**: Try different scale factors and query sets -2. **Add Systems**: See [EXTENDING.md](EXTENDING.md) for adding new databases +2. **Add Systems**: See [EXTENDING.md](../dev-docs/EXTENDING.md) for adding new databases 3. **Custom Workloads**: Create domain-specific benchmarks 4. **Automate**: Set up CI/CD pipelines for regular benchmarking 5. **Share Results**: Publish your benchmark methodology and results @@ -946,6 +958,6 @@ Once you have a basic benchmark working: ## Additional Resources - [README](../README.md) - Framework overview and quick reference -- [Extending the Framework](EXTENDING.md) - Add systems, workloads, and features +- [Extending the Framework](../dev-docs/EXTENDING.md) - Add systems, workloads, and features This framework provides a solid foundation for database benchmarking. Start with the simple examples above and gradually explore more advanced features as you become comfortable with the system. 
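Patch 1 above moves the "Adding a New Database System" example out of the README and into the extending guide. For readers of this series, the registration pattern it described can be sketched as a self-contained illustration. Note the `SystemUnderTest` stand-in below is simplified — the real base class in `benchkit/systems/base.py` defines more methods, and `NewSystem` with its `newsystem-driver` dependency is a hypothetical example carried over from the removed README section:

```python
from abc import ABC, abstractmethod
from typing import Optional


class SystemUnderTest(ABC):
    """Simplified stand-in for benchkit's base class (illustrative only)."""

    @classmethod
    def get_python_dependencies(cls) -> list[str]:
        # Systems override this so packages only bundle the drivers
        # for databases actually benchmarked.
        return []

    @abstractmethod
    def execute_query(self, query: str, query_name: Optional[str] = None):
        ...


class NewSystem(SystemUnderTest):
    """Hypothetical system showing the extension pattern."""

    @classmethod
    def get_python_dependencies(cls) -> list[str]:
        return ["newsystem-driver>=1.0.0"]

    def execute_query(self, query: str, query_name: Optional[str] = None):
        # A real implementation would call the system's native Python driver.
        return {"query": query_name or "unnamed", "status": "ok"}


# Registry mapping config 'kind' values to implementation class names,
# mirroring benchkit/systems/__init__.py in the removed README example.
SYSTEM_IMPLEMENTATIONS = {
    "exasol": "ExasolSystem",
    "clickhouse": "ClickHouseSystem",
    "newsystem": "NewSystem",  # the new entry
}
```

Because dependencies are declared per class, the packaging step can collect them by simply calling `get_python_dependencies()` on each configured system — no central requirements list to keep in sync.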
From 9b7d07b7a8e64105fa0aca3d6779bf857ea2c99b Mon Sep 17 00:00:00 2001 From: Stefan Reich Date: Wed, 26 Nov 2025 16:16:49 +0100 Subject: [PATCH 2/6] more doc shuffling --- README.md | 55 ++++++++++++++---------------------- user-docs/GETTING_STARTED.md | 3 ++ 2 files changed, 24 insertions(+), 34 deletions(-) diff --git a/README.md b/README.md index 042c3cf..6f88158 100644 --- a/README.md +++ b/README.md @@ -7,15 +7,13 @@ collect detailed system information, run benchmark workloads, and generate repor ## Features - 🏗️ **Modular Architecture**: Fine-grained templates for setup, execution, and reporting -- ☁️ **Multi-Cloud Support[^todo-cloud]**: Infrastructure automation with separate instances per database +- ☁️ **Multi-Cloud Support**: Infrastructure automation with separate instances per database - 📊 **Benchmark Workloads**: TPC-H with support for custom workloads - 📝 **Self-Contained Reports**: Generate reproducible reports with all attachments - 🔧 **Extensible**: Easy to add new systems, workloads, and cloud providers - 📈 **Rich Visualizations**: Automated generation of performance plots and tables - 🔍 **Result Verification**: Validate query correctness against expected outputs -[^todo-cloud]: Currently, only AWS is fully supported. Local and docker-based deployments are work in progress. - ## Requirements - Python 3.10+ @@ -27,39 +25,40 @@ collect detailed system information, run benchmark workloads, and generate repor # 1. Clone and enter the repository git clone https://github.com/exasol/benchkit.git cd benchkit - -# 2. Install dependencies and local package -python -m pip install -e . ``` > [!TIP] -> You might have to set up a python virtual environment for this first. +> You might have to set up a python virtual environment before running the next command. ```shell +# 2. Install dependencies and local package +python -m pip install -e . + # 3. 
Copy and edit example environment cp .env.example .env $EDITOR .env +``` +> [!TIP] +> The sample benchmark uses AWS cloud infrastructure. See [Getting Started Guide](user-docs/GETTING_STARTED.md) +> for detailed cloud setup instructions. + +```shell # 4. Validate your configuration python scripts/check_aws_credentials.py --config configs/exa_vs_ch_1g.yaml - -# 5. Run sample benchmark -make all CFG=configs/exa_vs_ch_1g.yaml ``` > [!CAUTION] -> Please note that the sample benchmark will use cost-incurring AWS resources, and requires your account -> to be properly set up. -> -> **Required AWS Permissions**: `ec2:*`, `ec2:DescribeImages`, `ec2:DescribeAvailabilityZones` -> -> 📖 **See [Getting Started Guide](user-docs/GETTING_STARTED.md) for detailed cloud setup instructions.** +> Please note that running the sample benchmark will use cost-incurring AWS resources. > [!NOTE] > Currently, the `env` section of the sample benchmark contains references to AWS key pair name and > ssh key files. You will also have to edit those parts accordingly. ```shell +# 5. Run sample benchmark +make all CFG=configs/exa_vs_ch_1g.yaml + # 6. Clean up AWS resources make infra-destroy CFG=configs/exa_vs_ch_1g.yaml @@ -118,30 +117,18 @@ You can easily create your own benchmark by creating a yaml configuration file c 📖 See [Getting Started Guide](user-docs/GETTING_STARTED.md) for information on how to create benchmark configurations using supported modules. -## Extending the Framework - -The framework is designed for easy extension. 
- -📖 **See [Extending the Framework](dev-docs/EXTENDING.md) for comprehensive guides on:** - -- Adding new database systems -- Creating custom workloads -- Adding cloud providers -- Customizing reports and visualizations -- Implementing result verification - ## Support Matrix ### setup / installation -| system | local | aws | docker | gcp | azure | -|------------|-------|-----------------|--------|-----|-------| -| Exasol | ✗ | ✓[^single-node] | ✗ | ✗ | ✗ | -| ClickHouse | ✗ | ✓[^single-node] | ✗ | ✗ | ✗ | +| system | local | aws | docker | gcp | azure | +|------------|-------|------|--------|-----|-------| +| Exasol | ✗ | ✓^1^ | ✗ | ✗ | ✗ | +| ClickHouse | ✗ | ✓^1^ | ✗ | ✗ | ✗ | -[^single-node]: Single-node system support for now. +^1^ Only single-node deployments supported at this time. -### tcph workload +### "tpch" workload | system | local | aws | docker | gcp | azure | |------------|-------|-----|--------|-----|-------| diff --git a/user-docs/GETTING_STARTED.md b/user-docs/GETTING_STARTED.md index 5db9b05..b048dd4 100644 --- a/user-docs/GETTING_STARTED.md +++ b/user-docs/GETTING_STARTED.md @@ -534,6 +534,9 @@ workload: ### AWS Deployment +> [!NOTE] +> **Required AWS Permissions**: `ec2:*`, `ec2:DescribeImages`, `ec2:DescribeAvailabilityZones` + #### 1. Configure AWS Credentials Choose one of these methods: From 86da7903b4ee7f042a0736713b3141f51a6d4cb5 Mon Sep 17 00:00:00 2001 From: Stefan Reich Date: Wed, 26 Nov 2025 16:21:08 +0100 Subject: [PATCH 3/6] shorten quickstart --- README.md | 29 ++++++++++------------------- 1 file changed, 10 insertions(+), 19 deletions(-) diff --git a/README.md b/README.md index 6f88158..f092e3f 100644 --- a/README.md +++ b/README.md @@ -21,41 +21,32 @@ collect detailed system information, run benchmark workloads, and generate repor ## Quick Start +> [!TIP] +> You might have to set up a python virtual environment for installing python packages. + +> [!CAUTION] +> The sample benchmark uses AWS cloud infrastructure. 
See [Getting Started Guide](user-docs/GETTING_STARTED.md) +> for detailed cloud setup instructions. +> Note that AWS infrastructure is usually not free to use. + ```shell # 1. Clone and enter the repository git clone https://github.com/exasol/benchkit.git cd benchkit -``` -> [!TIP] -> You might have to set up a python virtual environment before running the next command. - -```shell # 2. Install dependencies and local package python -m pip install -e . # 3. Copy and edit example environment cp .env.example .env $EDITOR .env -``` -> [!TIP] -> The sample benchmark uses AWS cloud infrastructure. See [Getting Started Guide](user-docs/GETTING_STARTED.md) -> for detailed cloud setup instructions. +# 3b. (temporary) fix hardcoded ssh-key names in 'env' section of configuration +$EDITOR configs/exa_vs_ch_1g.yaml -```shell # 4. Validate your configuration python scripts/check_aws_credentials.py --config configs/exa_vs_ch_1g.yaml -``` - -> [!CAUTION] -> Please note that running the sample benchmark will use cost-incurring AWS resources. -> [!NOTE] -> Currently, the `env` section of the sample benchmark contains references to AWS key pair name and -> ssh key files. You will also have to edit those parts accordingly. - -```shell # 5. Run sample benchmark make all CFG=configs/exa_vs_ch_1g.yaml From 21f997ea9750d46bf8f8afd0c12e4137c8488f19 Mon Sep 17 00:00:00 2001 From: Stefan Reich Date: Wed, 26 Nov 2025 16:23:28 +0100 Subject: [PATCH 4/6] fix superscript --- README.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index f092e3f..34bd93e 100644 --- a/README.md +++ b/README.md @@ -112,12 +112,14 @@ benchmark configurations using supported modules. 
### setup / installation -| system | local | aws | docker | gcp | azure | -|------------|-------|------|--------|-----|-------| -| Exasol | ✗ | ✓^1^ | ✗ | ✗ | ✗ | -| ClickHouse | ✗ | ✓^1^ | ✗ | ✗ | ✗ | +| system | local | aws | docker | gcp | azure | +|------------|-------|---------------|--------|-----|-------| +| Exasol | ✗ | ✓1 | ✗ | ✗ | ✗ | +| ClickHouse | ✗ | ✓1 | ✗ | ✗ | ✗ | -^1^ Only single-node deployments supported at this time. +Notes: + +1. Only single-node deployments supported at this time. ### "tpch" workload From e959fa47763805656c7ec456435406ec91a99a20 Mon Sep 17 00:00:00 2001 From: Stefan Reich Date: Thu, 27 Nov 2025 09:48:28 +0100 Subject: [PATCH 5/6] more shuffling --- README.md | 6 +++--- dev-docs/DEVELOPERS.md | 25 ++++++++++++++++++++----- 2 files changed, 23 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 34bd93e..4de5831 100644 --- a/README.md +++ b/README.md @@ -102,11 +102,11 @@ See [Developer Guide](dev-docs/DEVELOPERS.md) for a more detailed structure defi You can easily create your own benchmark by creating a yaml configuration file combining - One infrastructure provider (aws/docker/local/...) +- One workload (benchmark type) to be executed - Multiple systems (software) to be tested -- Infrastructure definition per system (e.g. AWS instance types) -📖 See [Getting Started Guide](user-docs/GETTING_STARTED.md) for information on how to create -benchmark configurations using supported modules. 
+📖 **See [Getting Started Guide](user-docs/GETTING_STARTED.md) for information on how to create +benchmark configurations using supported modules.** ## Support Matrix diff --git a/dev-docs/DEVELOPERS.md b/dev-docs/DEVELOPERS.md index a14a7d8..467de14 100644 --- a/dev-docs/DEVELOPERS.md +++ b/dev-docs/DEVELOPERS.md @@ -46,17 +46,32 @@ benchkit/ └── results/ # Generated results (auto-created) ``` +## Adding New Workloads + +A workload defines the contents of the benchmark, in terms of + +- **data model**, typically a set of DDL queries stored under `workloads/` +- **table data**, generated or otherwise defined by code under `benchkit/workloads/` +- **query execution logic**, defined by code under `benchkit/workloads/` +- **benchmark queries**, stored as SQL files under `workloads/` + +📖 **See [Extending Guide](EXTENDING.md) for details** + ## Adding New Systems -See [Extending Guide](EXTENDING.md) +Systems are defined by python code at `benchkit/systems/`, which needs to provide methods to -## Adding New Infrastructure Providers +- **deploy** the software on supported infrastructure providers +- **configure** the software according to the benchmark configuration +- **execute SQL** statements +- **load data** (CSV) into tables -See [Extending Guide](EXTENDING.md) +📖 **See [Extending Guide](EXTENDING.md) for details** -## Adding New Workloads +## Adding New Infrastructure Providers + +📖 **See [Extending Guide](EXTENDING.md) for details** -See [Extending Guide](EXTENDING.md) ## Best Practices From 6dd1423b276fdb0edba31e6e2f6212624270e394 Mon Sep 17 00:00:00 2001 From: Stefan Reich Date: Thu, 27 Nov 2025 09:54:04 +0100 Subject: [PATCH 6/6] more shuffling --- dev-docs/DEVELOPERS.md | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/dev-docs/DEVELOPERS.md b/dev-docs/DEVELOPERS.md index 467de14..0500032 100644 --- a/dev-docs/DEVELOPERS.md +++ b/dev-docs/DEVELOPERS.md @@ -57,6 +57,11 @@ A workload defines the contents of the benchmark, in terms of 📖 **See [Extending 
Guide](EXTENDING.md) for details** +## Adding Query Variants To Existing Workloads + +> [!NOTE] +> Section needs content. + ## Adding New Systems Systems are defined by python code at `benchkit/systems/`, which needs to provide methods to @@ -70,6 +75,9 @@ Systems are defined by python code at `benchkit/systems/`, which needs to provid ## Adding New Infrastructure Providers +> [!NOTE] +> Section needs content. + 📖 **See [Extending Guide](EXTENDING.md) for details**
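A closing note on the workload parameters that recur throughout these patches (`runs_per_query`, `warmup_runs`, and the `queries.include` list): how they combine into execution counts can be sketched with the values from the config example that patch 1 removes from the README. The exact scheduling semantics belong to the framework, so treat this as an assumption rather than a specification:

```python
# Workload section from the config example removed from README.md in patch 1.
workload = {
    "name": "tpch",
    "scale_factor": 1,
    "queries": {"include": ["Q01", "Q03", "Q06", "Q13"]},
    "runs_per_query": 3,
    "warmup_runs": 1,
}


def executions_per_system(cfg: dict) -> tuple[int, int]:
    """Return (timed runs, total runs including warmup) for one system,
    assuming warmup runs happen once per query before its timed runs."""
    n_queries = len(cfg["queries"]["include"])
    timed = n_queries * cfg["runs_per_query"]
    total = n_queries * (cfg["runs_per_query"] + cfg["warmup_runs"])
    return timed, total


timed, total = executions_per_system(workload)
print(timed, total)  # prints: 12 16
```

With the sample config's three systems (per the updated GETTING_STARTED description), the benchmark host would drive three times these counts across the provisioned instances.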