90 commits
d926979
:shirt: fixed ruff checks
anlowee Oct 21, 2024
f9d9e24
:memo: updated methodology
anlowee Oct 21, 2024
3ae6fb4
:memo: canceled modification
anlowee Oct 21, 2024
3e662e0
:shirt: fixed linting issues by black
anlowee Oct 21, 2024
3b89387
:shirt: fixed spelling and line wrapping issues
anlowee Oct 23, 2024
1847341
merge
anlowee Oct 24, 2024
3d1b344
:shirt: idk what happened, some previous fixes were gone. Just readde…
anlowee Oct 24, 2024
16920ad
:construction: wip
anlowee Oct 24, 2024
629da6a
:tada: added MongoDB executor
anlowee Oct 24, 2024
a25a435
:shirt: fixed some coderabitai's suggestions
anlowee Oct 24, 2024
f56c994
:shirt: fixed coderabbitai suggestions
anlowee Oct 25, 2024
b0eb437
:shirt: fixed coderabbit suggestions
anlowee Oct 25, 2024
6ed8020
:shirt: fixed coderabbit issues
anlowee Oct 25, 2024
f013db8
:construction: rebased from xiaochong-fix-ruff-check
anlowee Oct 24, 2024
96520dd
:construction: merged
anlowee Oct 25, 2024
ce49181
:construction: wip, separting ingest and query
anlowee Oct 27, 2024
845e7de
:tada: added MongoDB executor
anlowee Oct 24, 2024
f8f0e16
:construction: wip
anlowee Oct 27, 2024
2701df2
Merge branch 'xiaochong-add-mongodb-benchmark-toolset' of https://git…
anlowee Oct 27, 2024
68484fc
:sparkles: finished mongodb executor and results, but there are still…
anlowee Oct 28, 2024
6459d1b
:memo: start working on mongodb document, but before that lets do som…
anlowee Oct 28, 2024
8a5d1b8
:memo: initially updated the methodology for mongodb
anlowee Oct 28, 2024
fe24519
:construction: wip refactoring
anlowee Oct 29, 2024
4eaa056
:bug: minor fixed
anlowee Oct 29, 2024
89243e5
:bug: added dataset path argument
anlowee Oct 29, 2024
dd54161
:bug: minor fix
anlowee Oct 29, 2024
f320ae4
:bug: minor fix
anlowee Oct 29, 2024
0e26d0a
:construction: wip
anlowee Oct 29, 2024
64af2a3
:construction: wip
anlowee Oct 29, 2024
edb5a05
:tada: added MongoDB executor
anlowee Oct 24, 2024
dd386d8
:test_tube: added assets for clickhouse
anlowee Oct 29, 2024
7d166bc
:construction: wip
anlowee Oct 29, 2024
6411cea
:construction: wip
anlowee Oct 29, 2024
049ee44
:construction: wip
anlowee Oct 29, 2024
efc0649
:construction: wip
anlowee Oct 30, 2024
598997b
:construction: wip
anlowee Oct 30, 2024
8167cf1
:test_tube: added clickhouse results
anlowee Oct 30, 2024
cf9cc18
:lipstick: changed coloar calculation algorithm to logarithmic
anlowee Oct 30, 2024
b98c897
Merge branch 'xiaochong-fix-ruff-checks' of github.com:anlowee/clp-be…
anlowee Oct 30, 2024
501ed07
:tada: added MongoDB executor
anlowee Oct 24, 2024
3aa5606
:construction: wip
anlowee Oct 27, 2024
8073f53
:sparkles: finished mongodb executor and results, but there are still…
anlowee Oct 28, 2024
b88472c
:memo: start working on mongodb document, but before that lets do som…
anlowee Oct 28, 2024
df5e59f
:memo: initially updated the methodology for mongodb
anlowee Oct 28, 2024
4c12584
:rocket: Merge branch 'xiaochong-add-mongodb-benchmark-toolset' of gi…
anlowee Nov 3, 2024
895fb27
:rocket: Merge branch 'xiaochong-add-clickhouse-benchmark-toolset' in…
anlowee Nov 3, 2024
338b448
:rocket: Merge branch 'xiaochong-add-mongodb-benchmark-toolset' into …
anlowee Nov 3, 2024
b439d42
:hammer: refactored mongodb
anlowee Nov 3, 2024
a7ca501
:construction: added clp-s
anlowee Nov 3, 2024
3e9b573
:construction: wip-refactored semi-structure elasticsearch assets
anlowee Nov 4, 2024
d5587e2
:construction: wip-refactored glt
anlowee Nov 4, 2024
292b86f
:construction: wip-elasticsearch unstructured finished
anlowee Nov 4, 2024
095b483
:construction: wip-refactored loki
anlowee Nov 4, 2024
4a88cde
:construction: wip-refactored grep
anlowee Nov 4, 2024
6a005fa
:memo: updated pydoc
anlowee Nov 4, 2024
da5116a
:lipstick: formatted
anlowee Nov 4, 2024
dd67084
:fire: removed binaries
Nov 5, 2024
afdbb94
:memo: updated docs
anlowee Nov 6, 2024
8e291ba
:rocket: Merge branch 'xiaochong-refactor-and-add-new-results' of git…
Nov 6, 2024
58cd0b2
:construction: wip-updated splunk asstes and refactered the results
Nov 6, 2024
3d88950
:memo: updated splunk docs
Nov 6, 2024
0ef5a94
:memo: updated docs
Nov 6, 2024
62f1150
:shirt: addressed a senior comment
Nov 6, 2024
3ee697d
:memo: fixed doc error
Nov 6, 2024
babe2fc
:lipstick: changed the color representing the worst to very dark red …
Nov 6, 2024
8e485ea
:construction: wip-addressing some comments from a senior
Nov 7, 2024
8fa08a1
:rocket: Merge branch 'xiaochong-refactor-and-add-new-results' of git…
Nov 7, 2024
dae42f6
:hammer: fixed naming issues
anlowee Nov 7, 2024
f90734b
:construction: updated splunk results; some linting fix
Nov 7, 2024
88dfafb
:construction: fixed splunk script bugs
Nov 7, 2024
c11f491
:lipstick: wrapped the lines in clickhouse assets
anlowee Nov 8, 2024
ff1b7c9
:lipstick: finished wrapping lines for semistructure assets
anlowee Nov 8, 2024
6ca8007
:lipstick: fixed all wrapping lines and end with newline issue
anlowee Nov 8, 2024
336dc8e
Format Markdown files with Prettier and some manual tweaks.
kirkrodrigues Nov 11, 2024
6136f93
Edit README.md and associated files.
kirkrodrigues Nov 11, 2024
08f1f75
Sentence case for headings; minor edit.
kirkrodrigues Nov 11, 2024
6c848c2
Alphabetize env vars; Add newline at end of file.
kirkrodrigues Nov 11, 2024
d040292
:memo: fixed some lint issues of markdown
Nov 11, 2024
af15244
Minor edits.
kirkrodrigues Nov 11, 2024
5cd9997
:construction: wip
Nov 11, 2024
dae3cce
:construction: wip
Nov 11, 2024
552f43f
:construction: wip
Nov 13, 2024
2cfb374
:construction: wip
Nov 14, 2024
838fbb3
:construction: wip
Nov 14, 2024
66f605e
:lipstick: finished
Nov 14, 2024
e064936
:bug: fixed semi-structured Elasticsearch methodology link
Nov 17, 2024
45d2419
:shirt: fixed some comments from a senior
Nov 21, 2024
2fb824e
:construction: fixed some commnets but some comments need further det…
Nov 25, 2024
c867307
:construction: fixed the rest of comments
Dec 4, 2024
44d7ff2
:lipstick: fixed the mermaid issue
Jan 16, 2025
49 changes: 40 additions & 9 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,23 +1,54 @@
# clp-bench
clp-bench is a tool for benchmarking [CLP] as well as other log management tools. The tool itself is
a Python package, and we also provide a [web interface][ui] for viewing results.

The methodology for the benchmarks is described [here](docs/methodology.md).
**clp-bench** is a tool for benchmarking [glt] and other log management systems. It functions as a
Python package and includes a [web interface][ui] for displaying benchmark results.

## Requirements

* Docker
* Python v3.10 or higher
- Docker
- Python v3.10 or higher

# Set up
# Setup

```shell
python3 -m venv venv
. venv/bin/activate
pip install -e .
```

You can use `clp-bench --help` to see usage instructions.
To view usage instructions, run `clp-bench --help`.

[CLP]: https://github.com/y-scope/clp
[ui]: ui
# Results
You can view the current benchmark results [here][webui]. The benchmark currently evaluates
ingestion and query performance for the following tools:

| Tool | Version |
|--------------------------------|-------------|
| [ClickHouse][clickhouse] | 23.3.1.2823 |
| [glt][glt] | 0.2.0 |
| [clp-s][clp-s] | 0.2.0 |
| [Elasticsearch][elasticsearch] | 8.6.2 |
| `grep` | 3.7 |
| [Loki][loki] | 3.0.0 |
| [MongoDB][mongodb] | 6.0.19 |
| [Splunk][splunk] | 9.3.2 |

Don't see a tool here? Feel free to file a [GitHub issue][new-issue] for one or follow this
[guide][adding-a-tool] for how to add one.

For a detailed description of the benchmarking methodology, see [here][methodology].

[assets]: assets
[clickhouse]: https://clickhouse.com/
[glt]: https://github.com/y-scope/clp
[clp-s]: https://docs.yscope.com/clp/main/user-guide/core-clp-s.html
[webui]: https://benchmarks.yscope.com/clp/
[elasticsearch]: https://www.elastic.co/downloads/elasticsearch
[loki]: https://grafana.com/oss/loki/
[mongodb]: https://www.mongodb.com/
[methodology]: docs/methodology.md
[new-issue]: https://github.com/y-scope/clp-bench/issues/new
[adding-a-tool]: docs/adding-a-tool.md
[splunk]: https://www.splunk.com/
[ui]: ui
1 change: 1 addition & 0 deletions assets/dynamically-structured/clickhouse/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
FROM clickhouse/clickhouse-server:23.3.1.2823
6 changes: 6 additions & 0 deletions assets/dynamically-structured/clickhouse/clear-cache.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
#!/usr/bin/env bash

clickhouse-client --query "SYSTEM DROP UNCOMPRESSED CACHE"
clickhouse-client --query "SYSTEM DROP MARK CACHE"
sync
echo 1 >/proc/sys/vm/drop_caches
29 changes: 29 additions & 0 deletions assets/dynamically-structured/clickhouse/config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
system_metric:
  enable: true
  memory:
    ingest_polling_interval: 5
    run_query_benchmark_polling_interval: 5

container_id: clickhouse-clp-bench
assets_path: /home/assets
datasets_path: /home/datasets/mongod.log
hot_run_warm_up_times: 3
related_processes:
  - clickhouse-server
  - clickhouse-client
  - clickhouse-watchd
queries:
  - '"JSON_EXISTS(raw, ''$.attr.tickets'')"'
  - '"JSONExtractInt(raw, ''id'') = 22419"'
  - >
    "JSON_VALUE(raw, '$.attr.message.msg') like 'log_release%'
    AND JSON_VALUE(raw, '$.attr.message.session_name') = 'connection'"
  - >
    "JSON_VALUE(raw, '$.ctx') = 'initandlisten' AND (JSON_VALUE(raw, '$.attr.message.msg')
    like 'log_remove%' OR JSON_VALUE(raw, '$.msg') != 'WiredTiger message')"
  - >
    "JSON_VALUE(raw, '$.c') = 'WTWRTLOG' and JSONExtractInt(JSONExtractString(
    JSONExtractString(raw, 'attr'), 'message'), 'ts_sec') > 1679490000"
  - >
    "JSON_VALUE(raw, '$.ctx') = 'FlowControlRefresher' AND JSONExtractInt(JSONExtractString(
    raw, 'attr'), 'numTrimmed') = 0"
1 change: 1 addition & 0 deletions assets/dynamically-structured/clickhouse/container-name
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
clickhouse-clp-bench
10 changes: 10 additions & 0 deletions assets/dynamically-structured/clickhouse/docker-build.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
#!/usr/bin/env bash

set -e

script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
container_name=$(cat "$script_dir/container-name")

docker build \
    --tag "$container_name" \
    "$script_dir"
25 changes: 25 additions & 0 deletions assets/dynamically-structured/clickhouse/docker-run.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
#!/usr/bin/env bash

set -e

if [ -z "$1" ]; then
    echo "Error: Datasets path argument is missing."
    echo "Usage: bash ./docker-run.sh <absolute_datasets_path>"
    exit 1
fi

script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
container_name=$(cat "$script_dir/container-name")
workdir=/home

docker run \
    --privileged \
    -it \
    --rm \
    --workdir "$workdir" \
    --network host \
    --name "$container_name" \
    --mount "type=bind,src=$script_dir,dst=/home/assets" \
    --mount "type=bind,src=$1,dst=/home/datasets" \
    "$container_name" \
    bash -c "cd ${workdir} && /bin/bash -l"
15 changes: 15 additions & 0 deletions assets/dynamically-structured/clickhouse/ingest.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
#!/usr/bin/env bash

set -e
if [ -z "$1" ]; then
    echo "Error: Datasets path argument is missing."
    echo "Usage: bash ./ingest.sh <absolute_datasets_path_in_container>"
    exit 1
fi

collection_name=clickhouse_clp_bench

clickhouse-client \
    --max_threads 1 \
    --query "INSERT INTO ${collection_name} FROM INFILE '$1' FORMAT JSONAsString" \
    >/dev/null 2>&1
9 changes: 9 additions & 0 deletions assets/dynamically-structured/clickhouse/launch.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
#!/usr/bin/env bash

# Start the ClickHouse server in daemon mode
clickhouse-server --daemon

# Wait until ClickHouse server is running
while [ "$(clickhouse-client --query "SELECT 1" 2>/dev/null)" != "1" ]; do
    sleep 1
done
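The readiness loop in `launch.sh` polls `SELECT 1` once per second with no upper bound. As a rough Python sketch of the same idea, the snippet below adds a timeout so a server that never starts does not hang the caller. The function names (`clickhouse_is_ready`, `wait_until_ready`) and the timeout parameter are hypothetical and not part of clp-bench; only the `clickhouse-client --query "SELECT 1"` probe is taken from the script.

```python
import subprocess
import time
from typing import Callable


def clickhouse_is_ready() -> bool:
    """Return True when `clickhouse-client --query "SELECT 1"` prints 1."""
    try:
        proc = subprocess.run(
            ["clickhouse-client", "--query", "SELECT 1"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # The client binary is not installed in this environment.
        return False
    return proc.stdout.strip() == "1"


def wait_until_ready(
    probe: Callable[[], bool] = clickhouse_is_ready,
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
) -> bool:
    """Poll `probe` until it succeeds or `timeout_s` elapses (assumed limit)."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval_s)
    return False
```

Passing the probe as a parameter keeps the polling logic testable without a running server.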
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
#!/usr/bin/env bash

collection_name=clickhouse_clp_bench

clickhouse-client \
    --query "SELECT SUM(bytes) FROM system.parts WHERE active AND table = '${collection_name}'"
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
#!/usr/bin/env bash

set -e
if [ -z "$1" ]; then
    echo "Error: Datasets path argument is missing."
    echo "Usage: bash ./measure-decompressed-size.sh <absolute_datasets_path_in_container>"
    exit 1
fi

du "$1" -bc | awk "END {print \$1}"
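For readers less familiar with `du -bc ... | awk "END {print \$1}"`, the pipeline prints the grand total of apparent sizes in bytes. A minimal Python sketch of a comparable measurement follows; `apparent_size_bytes` is a hypothetical helper, not something clp-bench provides. For a single file such as the `mongod.log` dataset path used in `config.yaml`, it should report the same number as `du -b`; for directories it sums regular files only, so it can differ from `du`, which also counts the directory entries themselves.

```python
import os


def apparent_size_bytes(path: str) -> int:
    """Sum the apparent sizes (in bytes) of `path` or of the files under it."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so a link target is not counted twice.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total
```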
25 changes: 25 additions & 0 deletions assets/dynamically-structured/clickhouse/methodology.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
# ClickHouse methodology

## Basics

Version: [23.3.1.2823][download]

## Setup

We start the ClickHouse server in daemon mode.

## Specifics

To store JSON records, we use a
[single string field][jsonasstring] in ClickHouse, eliminating the need for preprocessing.

For query benchmarking, we operate in [single-thread mode][max_threads] by setting
`max_threads = 1`. Additionally, we configure the
[minimum data volume required for direct I/O access][direct_io] to 1 byte
(`min_bytes_to_use_direct_io = 1`) on the storage disk.


[download]: https://hub.docker.com/layers/clickhouse/clickhouse-server/23.3.1.2823/images/sha256-b88fd8c71b64d3158751337557ff089ff7b0d1ebf81d9c4c7aa1f0b37a31ee64?context=explore
[direct_io]: https://clickhouse.com/docs/en/operations/settings/settings#min_bytes_to_use_direct_io
[jsonasstring]: https://clickhouse.com/docs/en/interfaces/formats#jsonasstring
[max_threads]: https://clickhouse.com/docs/en/operations/settings/settings#max_threads
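To make the settings above concrete, the following Python sketch assembles roughly the same statement that `search.sh` in this PR passes to `clickhouse-client`. The helper name `build_search_sql` is hypothetical and only for illustration; the table name, the `raw` column, and the two `SETTINGS` values are taken from the scripts in this change.

```python
def build_search_sql(predicate: str, table: str = "clickhouse_clp_bench") -> str:
    """Return the SELECT statement for one benchmark query.

    `predicate` is a WHERE-clause expression over the single `raw` String
    column that holds each JSON record.
    """
    return (
        f"SELECT * from {table} where {predicate} "
        "SETTINGS max_threads = 1, min_bytes_to_use_direct_io = 1"
    )


if __name__ == "__main__":
    # One of the predicates listed in config.yaml.
    print(build_search_sql("JSONExtractInt(raw, 'id') = 22419"))
```

Because every record lives in one string column, each predicate has to parse the JSON at query time with functions such as `JSON_VALUE` or `JSONExtractInt`.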
11 changes: 11 additions & 0 deletions assets/dynamically-structured/clickhouse/reset.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
#!/usr/bin/env bash

collection_name=clickhouse_clp_bench

clickhouse-client \
    --max_threads 1 \
    --query "DROP TABLE IF EXISTS ${collection_name}" >/dev/null 2>&1
clickhouse-client \
    --max_threads 1 \
    --query "CREATE TABLE ${collection_name}(raw String CODEC(ZSTD(3))) ENGINE = MergeTree ORDER \
        BY tuple()" >/dev/null 2>&1
36 changes: 36 additions & 0 deletions assets/dynamically-structured/clickhouse/results.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
{
  "target": "clickhouse",
  "targetDisplayedName": "ClickHouse",
  "displayedOrder": 3,
  "isEnable": true,
  "type": 2,
  "ingestTime": 636409,
  "compressedSize": 1050348093,
  "avgIngestMem": 39008949248,
  "metrics": [
    {
      "metric": 1,
      "avgQueryMem": 707801088,
      "queryTimes": [
        189293,
        117725,
        199560,
        226136,
        221677,
        219761
      ]
    },
    {
      "metric": 2,
      "avgQueryMem": 603047936,
      "queryTimes": [
        189381,
        115290,
        198320,
        225983,
        227425,
        224492
      ]
    }
  ]
}
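As a hedged illustration of how a results file like this could be consumed, the sketch below averages `queryTimes` per `metric` entry. The PR does not document the field semantics or units, so this assumes each `queryTimes` element is the latency of one query from `config.yaml`; the function name `mean_query_time_by_metric` is hypothetical, and the sample is a trimmed copy of the values above.

```python
import json

# Trimmed copy of results.json: only the fields this sketch reads.
SAMPLE = """
{
  "target": "clickhouse",
  "metrics": [
    {"metric": 1, "queryTimes": [189293, 117725, 199560, 226136, 221677, 219761]},
    {"metric": 2, "queryTimes": [189381, 115290, 198320, 225983, 227425, 224492]}
  ]
}
"""


def mean_query_time_by_metric(results: dict) -> dict:
    """Map each `metric` id to the arithmetic mean of its `queryTimes`."""
    return {
        entry["metric"]: sum(entry["queryTimes"]) / len(entry["queryTimes"])
        for entry in results["metrics"]
    }


if __name__ == "__main__":
    for metric, avg in mean_query_time_by_metric(json.loads(SAMPLE)).items():
        print(f"metric {metric}: mean query time {avg:.1f}")
```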
15 changes: 15 additions & 0 deletions assets/dynamically-structured/clickhouse/search.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
#!/usr/bin/env bash

set -e
if [ -z "$1" ]; then
    echo "Error: Query argument is missing."
    echo "Usage: bash ./search.sh <query>"
    exit 1
fi

collection_name=clickhouse_clp_bench

clickhouse-client \
    --max_threads 1 \
    --query "SELECT * from ${collection_name} where $1 SETTINGS max_threads = 1, \
        min_bytes_to_use_direct_io = 1" 2>/dev/null | wc -l
3 changes: 3 additions & 0 deletions assets/dynamically-structured/clickhouse/terminate.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
#!/usr/bin/env bash

/etc/init.d/clickhouse-server stop >/dev/null 2>&1
11 changes: 11 additions & 0 deletions assets/dynamically-structured/clp-s/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
FROM ghcr.io/y-scope/clp/clp-core-dependencies-x86-ubuntu-jammy:main

RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    htop \
    jq \
    python3-venv \
    rsync \
    sqlite3 \
    tmux \
    vim
4 changes: 4 additions & 0 deletions assets/dynamically-structured/clp-s/clear-cache.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
#!/usr/bin/env bash

sync
echo 1 >/proc/sys/vm/drop_caches
19 changes: 19 additions & 0 deletions assets/dynamically-structured/clp-s/config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
system_metric:
  enable: true
  memory:
    ingest_polling_interval: 5
    run_query_benchmark_polling_interval: 5

container_id: clp-clp-bench
assets_path: /home/assets
datasets_path: /home/datasets/mongod.log
hot_run_warm_up_times: 3
related_processes:
  - /home/assets/clp-s
queries:
  - '"attr.tickets:*"'
  - '"id: 22419"'
  - '"attr.message.msg: log_release* AND attr.message.session_name: connection"'
  - '''ctx: initandlisten AND (NOT msg: "WiredTiger message" OR attr.message.msg: log_remove*)'''
  - '"c: WTWRTLOG AND attr.message.ts_sec > 1679490000"'
  - '"ctx: FlowControlRefresher AND attr.numTrimmed: 0"'
1 change: 1 addition & 0 deletions assets/dynamically-structured/clp-s/container-name
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
clp-clp-bench
10 changes: 10 additions & 0 deletions assets/dynamically-structured/clp-s/docker-build.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
#!/usr/bin/env bash

set -e

script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
container_name=$(cat "$script_dir/container-name")

docker build \
    --tag "$container_name" \
    "$script_dir"
25 changes: 25 additions & 0 deletions assets/dynamically-structured/clp-s/docker-run.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
#!/usr/bin/env bash

set -e

if [ -z "$1" ]; then
    echo "Error: Datasets path argument is missing."
    echo "Usage: bash ./docker-run.sh <absolute_datasets_path>"
    exit 1
fi

script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
container_name=$(cat "$script_dir/container-name")
workdir=/home

docker run \
    --privileged \
    -it \
    --rm \
    --workdir "$workdir" \
    --network host \
    --name "$container_name" \
    --mount "type=bind,src=$script_dir,dst=/home/assets" \
    --mount "type=bind,src=$1,dst=/home/datasets" \
    "$container_name" \
    bash -c "cd ${workdir} && /bin/bash -l"
13 changes: 13 additions & 0 deletions assets/dynamically-structured/clp-s/ingest.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
#!/usr/bin/env bash

set -e
if [ -z "$1" ]; then
    echo "Error: Datasets path argument is missing."
    echo "Usage: bash ./ingest.sh <absolute_datasets_path_in_container>"
    exit 1
fi

clp_s_binary=/home/assets/clp-s
data_path=/home/archives

"${clp_s_binary}" c --timestamp-key "t.\$date" --target-encoded-size 268435456 "$data_path" "$1"
5 changes: 5 additions & 0 deletions assets/dynamically-structured/clp-s/launch.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
#!/usr/bin/env bash

data_path=/home/archives

mkdir -p "$data_path"
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
#!/usr/bin/env bash

data_path=/home/archives
du "$data_path" -bc | awk "END {print \$1}"
10 changes: 10 additions & 0 deletions assets/dynamically-structured/clp-s/measure-decompressed-size.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
#!/usr/bin/env bash

set -e
if [ -z "$1" ]; then
    echo "Error: Datasets path argument is missing."
    echo "Usage: bash ./measure-decompressed-size.sh <absolute_datasets_path_in_container>"
    exit 1
fi

du "$1" -bc | awk "END {print \$1}"