Commit e08f285

Restructuring documentation (#7)
- Massive cuts to the main README
- moved GETTING_STARTED and EXTENDING to separate folders for user- and developer- documentation
- added fancy github info boxes
1 parent b520016 commit e08f285

4 files changed (+263 / -253 lines)

README.md

Lines changed: 69 additions & 176 deletions
@@ -1,44 +1,70 @@
 # Database Benchmark Report Framework

-A modular framework for running and documenting database benchmarks, with a focus on comparing **Exasol** with other database systems. This repository provides reusable building blocks to launch benchmark environments, collect detailed system information, run benchmark workloads, and generate reports documenting the results.
+A modular framework for running and documenting database benchmarks, with a focus on comparing **Exasol** with
+other database systems. This repository provides reusable building blocks to launch benchmark environments,
+collect detailed system information, run benchmark workloads, and generate reports documenting the results.

 ## Features

 - 🏗️ **Modular Architecture**: Fine-grained templates for setup, execution, and reporting
-- ☁️ **Multi-Cloud Support**: AWS infrastructure automation with separate instances per database
+- ☁️ **Multi-Cloud Support**: Infrastructure automation with separate instances per database
 - 📊 **Benchmark Workloads**: TPC-H with support for custom workloads
 - 📝 **Self-Contained Reports**: Generate reproducible reports with all attachments
 - 🔧 **Extensible**: Easy to add new systems, workloads, and cloud providers
 - 📈 **Rich Visualizations**: Automated generation of performance plots and tables
 - 🔍 **Result Verification**: Validate query correctness against expected outputs

+## Requirements
+
+- Python 3.10+
+- **Terraform** (for cloud infrastructure) - [Installation Guide](https://developer.hashicorp.com/terraform/install)
+
 ## Quick Start

-```bash
-# Clone the repository
-git clone <repository-url>
+> [!TIP]
+> You might have to set up a python virtual environment for installing python packages.
+
+> [!CAUTION]
+> The sample benchmark uses AWS cloud infrastructure. See [Getting Started Guide](user-docs/GETTING_STARTED.md)
+> for detailed cloud setup instructions.
+> Note that AWS infrastructure is usually not free to use.
+
+```shell
+# 1. Clone and enter the repository
+git clone https://github.com/exasol/benchkit.git
 cd benchkit

-# Install dependencies
+# 2. Install dependencies and local package
 python -m pip install -e .

-# Run a sample benchmark
+# 3. Copy and edit example environment
+cp .env.example .env
+$EDITOR .env
+
+# 3b. (temporary) fix hardcoded ssh-key names in 'env' section of configuration
+$EDITOR configs/exa_vs_ch_1g.yaml
+
+# 4. Validate your configuration
+python scripts/check_aws_credentials.py --config configs/exa_vs_ch_1g.yaml
+
+# 5. Run sample benchmark
 make all CFG=configs/exa_vs_ch_1g.yaml
-```

-This will:
-1. Provision cloud infrastructure (if configured)
-2. Probe system information
-3. Run Exasol vs ClickHouse TPC-H benchmark
-4. Generate a complete report with results and reproducibility instructions
+# 6. Clean up AWS resources
+make infra-destroy CFG=configs/exa_vs_ch_1g.yaml

-📖 **See [Getting Started Guide](GETTING_STARTED.md) for detailed installation and usage instructions.**
+# 7. view benchmark report
+...TBD
+```

 ## Usage

 The framework provides 9 commands for complete benchmark lifecycle management:

 ```bash
+# Manage infrastructure
+benchkit infra apply --provider aws --config configs/my_benchmark.yaml
+
 # System information collection
 benchkit probe --config configs/my_benchmark.yaml

@@ -48,207 +74,74 @@ benchkit run --config configs/my_benchmark.yaml [--systems exasol] [--queries Q0
 # Generate reports
 benchkit report --config configs/my_benchmark.yaml

-# Manage infrastructure
-benchkit infra apply --provider aws --config configs/my_benchmark.yaml
-
 # Other commands: execute, status, package, verify, cleanup
 ```

 **Status Command** provides comprehensive project insights:
+
 - Overview of all projects (probe, benchmark, report status)
 - Detailed status for specific configs (system info, infrastructure, timing)
 - Cloud infrastructure details (IPs, connection strings)
 - Multiple config support and smart project lookup

-📖 **See [Getting Started Guide](GETTING_STARTED.md) for comprehensive CLI documentation and examples.**
+📖 **See [Getting Started Guide](user-docs/GETTING_STARTED.md) for comprehensive CLI documentation and examples.**

-## Repository Structure
+## Repository Structure (User Version)

 ```
 benchkit/
 ├── benchkit/ # Core framework
-│ ├── cli.py # Command-line interface (9 commands)
-│ ├── systems/ # Database system implementations
-│ ├── workloads/ # Benchmark workloads (TPC-H)
-│ ├── gather/ # System information collection
-│ ├── run/ # Benchmark execution
-│ ├── report/ # Report generation
-│ ├── infra/ # Cloud infrastructure management
-│ ├── package/ # Minimal package creation
-│ └── verify/ # Result verification
-├── templates/ # Jinja2 templates for reports
 ├── configs/ # Benchmark configurations
-├── infra/aws/ # AWS Terraform modules
-├── workloads/tpch/ # TPC-H queries and schemas
 └── results/ # Generated results (auto-created)
 ```

-## Configuration Example
-
-```yaml
-project_id: "exasol_vs_clickhouse_tpch"
-title: "Exasol vs ClickHouse Performance on TPC-H"
-
-env:
-mode: "aws"
-region: "eu-west-1"
-instances:
-exasol:
-instance_type: "m7i.4xlarge"
-clickhouse:
-instance_type: "m7i.4xlarge"
-
-systems:
-- name: "exasol"
-kind: "exasol"
-version: "2025.1.0"
-setup:
-method: "installer"
-extra:
-dbram: "32g"
-
-- name: "clickhouse"
-kind: "clickhouse"
-version: "24.12"
-setup:
-method: "native"
-extra:
-memory_limit: "32g"
-
-workload:
-name: "tpch"
-scale_factor: 1
-queries:
-include: ["Q01", "Q03", "Q06", "Q13"]
-runs_per_query: 3
-warmup_runs: 1
-```
+See [Developer Guide](dev-docs/DEVELOPERS.md) for a more detailed structure definition.

-📖 **See [Getting Started Guide](GETTING_STARTED.md) for more configuration examples.**
+## Defining Your Own Benchmarks

-## Requirements
+You can easily create your own benchmark by creating a yaml configuration file combining

-- Python 3.10+
-- **Terraform** (for cloud infrastructure) - [Installation Guide](https://developer.hashicorp.com/terraform/install)
-- At least 16GB RAM (32GB+ recommended for larger benchmarks)
-- SSD storage recommended
+- One infrastructure provider (aws/docker/local/...)
+- One workload (benchmark type) to be executed
+- Multiple systems (software) to be tested

-### AWS Setup (Optional)
+📖 **See [Getting Started Guide](user-docs/GETTING_STARTED.md) for information on how to create
+benchmark configurations using supported modules.**

-For cloud deployments, configure AWS credentials:
+## Support Matrix

-```bash
-# Create .env file (recommended)
-cat > .env << EOF
-AWS_PROFILE=default-mfa
-AWS_REGION=eu-west-1
-EOF
-```
-
-**Required AWS Permissions**: `ec2:*`, `ec2:DescribeImages`, `ec2:DescribeAvailabilityZones`
-
-📖 **See [Getting Started Guide](GETTING_STARTED.md) for detailed cloud setup instructions.**
-
-## Extending the Framework
-
-The framework is designed for easy extension:
+### setup / installation

-### Quick Example: Adding a New Database System
+| system     | local | aws           | docker | gcp | azure |
+|------------|-------|---------------|--------|-----|-------|
+| Exasol     |       | ✓<sup>1</sup> |        |     |       |
+| ClickHouse |       | ✓<sup>1</sup> |        |     |       |

-1. Create `benchkit/systems/newsystem.py`:
-
-```python
-from .base import SystemUnderTest
-
-class NewSystem(SystemUnderTest):
-    @classmethod
-    def get_python_dependencies(cls) -> list[str]:
-        return ["newsystem-driver>=1.0.0"]
-
-    def execute_query(self, query: str, query_name: str = None):
-        # Use native Python driver for universal connectivity
-        pass
-
-    # ... implement other required methods
-```
-
-2. Register in `benchkit/systems/__init__.py`:
-
-```python
-SYSTEM_IMPLEMENTATIONS = {
-    "exasol": "ExasolSystem",
-    "clickhouse": "ClickHouseSystem",
-    "newsystem": "NewSystem", # Add this line
-}
-```
+Notes:

-📖 **See [Extending the Framework](EXTENDING.md) for comprehensive guides on:**
-- Adding new database systems
-- Creating custom workloads
-- Adding cloud providers
-- Customizing reports and visualizations
-- Implementing result verification
+1. Only single-node deployments supported at this time.

-## Key Design Principles
+### "tpch" workload

-### 1. Self-Contained Reports
-
-Every report is a complete directory with:
-- All result data as attachments
-- Exact configuration files
-- Minimal reproduction package
-- Complete setup commands
-
-### 2. Installation-Independent Connectivity
-
-Uses official Python drivers for universal database connectivity:
-- **Exasol**: `pyexasol` - works with Docker, native, cloud, preinstalled
-- **ClickHouse**: `clickhouse-connect` - works with any deployment
-
-### 3. Dynamic Dependency Management
-
-Each system defines its own dependencies via `get_python_dependencies()`. Packages only include drivers for databases actually benchmarked.
-
-### 4. Environment-Agnostic Templates
-
-Templates work everywhere - AWS, GCP, Azure, local, on-premises. All tuning parameters documented as copy-pasteable commands.
+| system     | local | aws | docker | gcp | azure |
+|------------|-------|-----|--------|-----|-------|
+| Exasol     |       |     |        |     |       |
+| ClickHouse |       |     |        |     |       |

 ## Documentation

-- 📖 [Getting Started Guide](GETTING_STARTED.md) - Installation, usage, and examples
-- 🔧 [Extending the Framework](EXTENDING.md) - Adding systems, workloads, and features
-
-## Dependencies
-
-Core dependencies (automatically installed):
-- `typer` - CLI framework
-- `jinja2` - Template rendering
-- `pyyaml` - Configuration parsing
-- `pandas` - Data manipulation
-- `matplotlib` - Plotting
-- `rich` - CLI formatting
-- `boto3` - AWS integration (optional)
-- `python-dotenv` - .env file support (optional)
-
-Database-specific drivers loaded dynamically based on systems used.
-
-## Contributing
+### For Users

-1. Fork the repository
-2. Create a feature branch
-3. Make your changes
-4. Add tests for new functionality
-5. Submit a pull request
+- 📖 [Getting Started Guide](user-docs/GETTING_STARTED.md) - Installation, usage, and examples

-## Security
+### For Developers

-- Database credentials and licenses should not be committed to the repository
-- Use environment variables or `.env` file for sensitive data
-- The framework includes basic security practices but should be reviewed for production use
+- 🔧 [Extending the Framework](dev-docs/EXTENDING.md) - Adding systems, workloads, and features

 ## License

 This project is licensed under the MIT License - see the LICENSE file for details.
+All names used are copyright and owned by the respective companies.

 ---

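The new "Defining Your Own Benchmarks" section states that a configuration combines one infrastructure provider, one workload, and multiple systems. The following is a minimal sketch of a pre-flight check for that rule. It is not part of benchkit, and several pieces are illustrative assumptions:

- `check_benchmark_config`, `KNOWN_PROVIDERS`, and the `min_systems=2` default are hypothetical.
- The key names (`env.mode`, `workload.name`, `systems[].name` / `kind`) are borrowed from the example configuration this commit removes, so they may not match the current schema.

```python
from typing import Any

# Hypothetical provider set, mirroring the README bullet "(aws/docker/local/...)".
KNOWN_PROVIDERS = {"aws", "docker", "local"}


def check_benchmark_config(cfg: dict[str, Any], min_systems: int = 2) -> list[str]:
    """Return human-readable problems; an empty list means the composition looks valid.

    Key names follow the example config removed in this commit (an assumption).
    """
    problems: list[str] = []

    # One infrastructure provider, read from env.mode.
    mode = (cfg.get("env") or {}).get("mode")
    if mode is None:
        problems.append("env.mode is missing: one infrastructure provider is required")
    elif mode not in KNOWN_PROVIDERS:
        problems.append(f"env.mode {mode!r} is not one of {sorted(KNOWN_PROVIDERS)}")

    # One workload, given as a mapping that has a name.
    workload = cfg.get("workload")
    if not isinstance(workload, dict) or "name" not in workload:
        problems.append("workload.name is missing: one workload is required")

    # Several systems, each with a unique name and a kind.
    systems = cfg.get("systems") or []
    if len(systems) < min_systems:
        problems.append(f"at least {min_systems} systems required, found {len(systems)}")
    names = [s.get("name") for s in systems]
    if len(names) != len(set(names)):
        problems.append("system names must be unique")
    for s in systems:
        if "kind" not in s:
            problems.append(f"system {s.get('name')!r} has no kind")
    return problems


example = {
    "env": {"mode": "aws", "region": "eu-west-1"},
    "systems": [
        {"name": "exasol", "kind": "exasol"},
        {"name": "clickhouse", "kind": "clickhouse"},
    ],
    "workload": {"name": "tpch", "scale_factor": 1},
}
print(check_benchmark_config(example))  # → []
```

The check operates on a plain dict, so a YAML file loaded with `yaml.safe_load` (PyYAML appears in the dependency list removed above) can be passed in directly. Returning a list of problems rather than raising on the first one reports every issue in a single pass.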
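This commit removes two parts of the README:

- The "Dynamic Dependency Management" principle: each system declares its drivers through `get_python_dependencies()`, and packages include only the drivers for databases actually benchmarked.
- The `SYSTEM_IMPLEMENTATIONS` registration example.

The following is a self-contained sketch of that idea, not benchkit's code. Several names are illustrative assumptions:

- `SystemUnderTest` here is a hypothetical stand-in that models only that one hook. It does not import `benchkit.systems.base`.
- `SYSTEM_CLASSES` and `collect_dependencies` are hypothetical names.
- The unpinned driver names `pyexasol` and `clickhouse-connect` come from the removed text.

```python
from abc import ABC, abstractmethod


class SystemUnderTest(ABC):
    """Hypothetical stand-in: models only the get_python_dependencies() hook."""

    @classmethod
    @abstractmethod
    def get_python_dependencies(cls) -> list[str]:
        """Return the PyPI requirement strings this system's driver needs."""


class ExasolSystem(SystemUnderTest):
    @classmethod
    def get_python_dependencies(cls) -> list[str]:
        return ["pyexasol"]


class ClickHouseSystem(SystemUnderTest):
    @classmethod
    def get_python_dependencies(cls) -> list[str]:
        return ["clickhouse-connect"]


# Hypothetical registry keyed by a config's "kind" field. The removed README
# mapped kinds to class *names*; mapping to the classes skips a lookup step.
SYSTEM_CLASSES: dict[str, type[SystemUnderTest]] = {
    "exasol": ExasolSystem,
    "clickhouse": ClickHouseSystem,
}


def collect_dependencies(systems: list[dict]) -> list[str]:
    """Sorted, de-duplicated drivers for only the systems listed in a config."""
    deps: set[str] = set()
    for entry in systems:
        # An unregistered kind raises KeyError instead of being silently skipped.
        system_cls = SYSTEM_CLASSES[entry["kind"]]
        deps.update(system_cls.get_python_dependencies())
    return sorted(deps)


print(collect_dependencies([{"name": "exasol", "kind": "exasol"}]))  # → ['pyexasol']
```

In this sketch, adding a system means one new subclass plus one registry entry. That parallels the removed "Add this line" step, and a config that lists only Exasol pulls in only `pyexasol`.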