From f7152bce1eae7d8c5011e85a614c91911eed9075 Mon Sep 17 00:00:00 2001 From: DanRoscigno Date: Tue, 10 Feb 2026 18:35:00 -0500 Subject: [PATCH 1/3] trigger Signed-off-by: DanRoscigno --- .../administration/management/BE_blacklist.md | 103 + .../management/BE_configuration.md | 3474 +++++++++++++ .../management/Backup_and_restore.md | 650 +++ .../management/FE_configuration.md | 4502 +++++++++++++++++ .../management/Scale_up_down.md | 99 + .../administration/management/audit_loader.md | 221 + .../administration/management/compaction.md | 303 ++ .../management/configuration.mdx | 11 + .../administration/management/enable_fqdn.md | 163 + .../management/graceful_exit.md | 276 + docs/en/administration/management/logs.md | 272 + .../administration/management/management.mdx | 9 + .../management/monitor_manage_big_queries.md | 256 + .../monitoring/Monitor_and_Alert.md | 913 ++++ .../management/monitoring/alert.md | 759 +++ .../monitoring/metrics-materialized_view.md | 113 + .../monitoring/metrics-shared-data.md | 251 + .../management/monitoring/metrics.md | 1973 ++++++++ .../administration/management/proc_profile.md | 115 + .../resource_management/Blacklist.md | 138 + .../resource_management/Load_balance.md | 127 + .../resource_management/Memory_management.md | 99 + .../resource_management/Query_management.md | 105 + .../management/resource_management/Replica.md | 576 +++ .../resource_management/be_label.md | 128 + .../resource_management/filemanager.md | 42 + .../resource_management/query_queues.md | 187 + .../resource_management/resource_group.md | 519 ++ .../resource_management/spill_to_disk.md | 91 + docs/en/administration/management/timezone.md | 55 + 30 files changed, 16530 insertions(+) create mode 100644 docs/en/administration/management/BE_blacklist.md create mode 100644 docs/en/administration/management/BE_configuration.md create mode 100644 docs/en/administration/management/Backup_and_restore.md create mode 100644 docs/en/administration/management/FE_configuration.md create mode 100644 docs/en/administration/management/Scale_up_down.md create mode 100644 docs/en/administration/management/audit_loader.md create mode 100644 docs/en/administration/management/compaction.md create mode 100644 docs/en/administration/management/configuration.mdx create mode 100644 docs/en/administration/management/enable_fqdn.md create mode 100644 docs/en/administration/management/graceful_exit.md create mode 100644 docs/en/administration/management/logs.md create mode 100644 docs/en/administration/management/management.mdx create mode 100644 docs/en/administration/management/monitor_manage_big_queries.md create mode 100644 docs/en/administration/management/monitoring/Monitor_and_Alert.md create mode 100644 docs/en/administration/management/monitoring/alert.md create mode 100644 docs/en/administration/management/monitoring/metrics-materialized_view.md create mode 100644 docs/en/administration/management/monitoring/metrics-shared-data.md create mode 100644 docs/en/administration/management/monitoring/metrics.md create mode 100644 docs/en/administration/management/proc_profile.md create mode 100644 docs/en/administration/management/resource_management/Blacklist.md create mode 100644 docs/en/administration/management/resource_management/Load_balance.md create mode 100644 docs/en/administration/management/resource_management/Memory_management.md create mode 100644 docs/en/administration/management/resource_management/Query_management.md create mode 100644 
docs/en/administration/management/resource_management/Replica.md create mode 100644 docs/en/administration/management/resource_management/be_label.md create mode 100644 docs/en/administration/management/resource_management/filemanager.md create mode 100644 docs/en/administration/management/resource_management/query_queues.md create mode 100644 docs/en/administration/management/resource_management/resource_group.md create mode 100644 docs/en/administration/management/resource_management/spill_to_disk.md create mode 100644 docs/en/administration/management/timezone.md diff --git a/docs/en/administration/management/BE_blacklist.md b/docs/en/administration/management/BE_blacklist.md new file mode 100644 index 0000000..0fdc629 --- /dev/null +++ b/docs/en/administration/management/BE_blacklist.md @@ -0,0 +1,103 @@ +--- +displayed_sidebar: docs +--- + +# Manage BE and CN Blacklist + + +From v3.3.0 onwards, StarRocks supports the BE Blacklist feature, which allows you to exclude certain BE nodes from query execution, thereby avoiding frequent query failures or other unexpected behaviors caused by failed connections to those nodes. A network issue preventing connections to one or more BEs would be an example of when to use the blacklist. + +From v4.0 onwards, StarRocks supports adding Compute Nodes (CNs) to the Blacklist. + +By default, StarRocks automatically manages the BE and CN Blacklist, adding the BE or CN nodes that have lost connection to the blacklist and removing them from the blacklist when the connection is reestablished. However, StarRocks will not remove a node from the Blacklist if it was manually blacklisted. + +:::note + +- Only users with the SYSTEM-level BLACKLIST privilege can use this feature. +- Each FE node keeps its own BE and CN Blacklist, and will not share it with other FE nodes. + +::: + +## Add a BE/CN to the blacklist + +You can manually add a BE/CN node to the Blacklist using [ADD BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/ADD_BACKEND_BLACKLIST.md). In this statement, you must specify the ID of the BE/CN node to be blacklisted. You can obtain the BE ID by executing [SHOW BACKENDS](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_BACKENDS.md) and the CN ID by executing [SHOW COMPUTE NODES](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_COMPUTE_NODES.md). + +Example: + +```SQL +-- Obtain BE ID. +SHOW BACKENDS\G +*************************** 1. row *************************** + BackendId: 10001 + IP: xxx.xx.xx.xxx + ... +-- Add BE to the blacklist. +ADD BACKEND BLACKLIST 10001; + +-- Obtain CN ID. +SHOW COMPUTE NODES\G +*************************** 1. row *************************** + ComputeNodeId: 10005 + IP: xxx.xx.xx.xxx + ... +-- Add CN to the blacklist. +ADD COMPUTE NODE BLACKLIST 10005; +``` + +## Remove a BE/CN from the blacklist + +You can manually remove a BE/CN node from the Blacklist using [DELETE BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/DELETE_BACKEND_BLACKLIST.md). In this statement, you must also specify the ID of the BE/CN node. + +Example: + +```SQL +-- Remove a BE from the Blacklist. +DELETE BACKEND BLACKLIST 10001; + +-- Remove a CN from the Blacklist.
+DELETE COMPUTE NODE BLACKLIST 10005; +``` + +## View BE/CN Blacklist + +You can view the BE/CN nodes in the Blacklist using [SHOW BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_BACKEND_BLACKLIST.md). + +Example: + +```SQL +-- View the BE Blacklist. +SHOW BACKEND BLACKLIST; ++-----------+------------------+---------------------+------------------------------+--------------------+ +| BackendId | AddBlackListType | LostConnectionTime | LostConnectionNumberInPeriod | CheckTimePeriod(s) | ++-----------+------------------+---------------------+------------------------------+--------------------+ +| 10001 | MANUAL | 2024-04-28 11:52:09 | 0 | 5 | ++-----------+------------------+---------------------+------------------------------+--------------------+ + +-- View the CN Blacklist. +SHOW COMPUTE NODE BLACKLIST; ++---------------+------------------+---------------------+------------------------------+--------------------+ +| ComputeNodeId | AddBlackListType | LostConnectionTime | LostConnectionNumberInPeriod | CheckTimePeriod(s) | ++---------------+------------------+---------------------+------------------------------+--------------------+ +| 10005 | MANUAL | 2025-08-18 10:47:51 | 0 | 5 | ++---------------+------------------+---------------------+------------------------------+--------------------+ +``` + +The following fields are returned: + +- `AddBlackListType`: How the BE/CN node was added to the blacklist. `MANUAL` indicates it is manually blacklisted by the user. `AUTO` indicates it is automatically blacklisted by StarRocks. +- `LostConnectionTime`: + - For the `MANUAL` type, it indicates the time when the BE/CN node was manually added to the blacklist. + - For the `AUTO` type, it indicates the time when the last successful connection was established. +- `LostConnectionNumberInPeriod`: The number of disconnections detected within `CheckTimePeriod(s)`, which is the interval at which StarRocks checks the connection status of the BE/CN nodes in the blacklist. +- `CheckTimePeriod(s)`: The interval at which StarRocks checks the connection status of the blacklisted BE/CN nodes. Its value is determined by the FE configuration item `black_host_history_sec`. Unit: Seconds. + +## Configure automatic management of BE/CN Blacklist + +Each time a BE/CN node loses connection to the FE node, or a query fails due to timeout on a BE/CN node, the FE node adds the BE/CN node to its BE and CN Blacklist. The FE node constantly assesses the connectivity of the blacklisted BE/CN nodes by counting their connection failures within a certain period of time. StarRocks removes a blacklisted BE/CN node only if the number of its connection failures is below a pre-specified threshold. + +You can configure the automatic management of the BE and CN Blacklist using the following [FE configurations](./FE_configuration.md): + +- `black_host_history_sec`: The time duration for retaining historical connection failures of BE/CN nodes in the Blacklist. +- `black_host_connect_failures_within_time`: The threshold of connection failures allowed for a blacklisted BE/CN node. + +If a BE/CN node is added to the Blacklist automatically, StarRocks will assess its connectivity and judge whether it can be removed from the Blacklist. A blacklisted BE/CN node can be removed from the Blacklist only if it has fewer connection failures within `black_host_history_sec` than the threshold set in `black_host_connect_failures_within_time`.
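+ +For example, the following statements tighten the automatic management policy. This is a minimal sketch: the two configuration item names come from the list above, the values are illustrative only, and it assumes both items can be modified at runtime with `ADMIN SET FRONTEND CONFIG` (alternatively, set them in the FE configuration file and restart the FE). + +```SQL +-- Illustrative values only. +-- Retain the connection failure history of blacklisted nodes for 120 seconds. +ADMIN SET FRONTEND CONFIG ("black_host_history_sec" = "120"); +-- Allow a blacklisted node to be removed only when it has fewer than +-- 5 connection failures within that window. +ADMIN SET FRONTEND CONFIG ("black_host_connect_failures_within_time" = "5"); +``` + +Note that `ADMIN SET FRONTEND CONFIG` takes effect only on the FE node you are connected to; because each FE node keeps its own Blacklist, repeat the statements on each FE node whose behavior you want to change.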
diff --git a/docs/en/administration/management/BE_configuration.md b/docs/en/administration/management/BE_configuration.md new file mode 100644 index 0000000..6a3bde6 --- /dev/null +++ b/docs/en/administration/management/BE_configuration.md @@ -0,0 +1,3474 @@ +--- +displayed_sidebar: docs +--- + +import BEConfigMethod from '../../_assets/commonMarkdown/BE_config_method.mdx' + +import CNConfigMethod from '../../_assets/commonMarkdown/CN_config_method.mdx' + +import PostBEConfig from '../../_assets/commonMarkdown/BE_dynamic_note.mdx' + +import StaticBEConfigNote from '../../_assets/commonMarkdown/StaticBE_config_note.mdx' + +# BE Configuration + +<BEConfigMethod /> + +<CNConfigMethod /> + +## View BE configuration items + +You can view the BE configuration items using the following command: + +```shell +curl http://<BE_IP>:<BE_HTTP_PORT>/varz +``` + +## Configure BE parameters + +<PostBEConfig /> + +<StaticBEConfigNote /> + +## Understand BE Parameters + +### Logging + +##### diagnose_stack_trace_interval_ms + +- Default: 1800000 (30 minutes) +- Type: Int +- Unit: Milliseconds +- Is mutable: Yes +- Description: Controls the minimum time gap between successive stack-trace diagnostics performed by DiagnoseDaemon for `STACK_TRACE` requests. When a diagnose request arrives, the daemon skips collecting and logging stack traces if the last collection happened less than `diagnose_stack_trace_interval_ms` milliseconds ago. Increase this value to reduce CPU overhead and log volume from frequent stack dumps; decrease it to capture more frequent traces to debug transient issues (for example, in load fail-point simulations of long `TabletsChannel::add_chunk` blocking). +- Introduced in: v3.5.0 + +##### lake_replication_slow_log_ms + +- Default: 30000 +- Type: Int +- Unit: Milliseconds +- Is mutable: Yes +- Description: Threshold for emitting slow-log entries during lake replication. After each file copy the code measures elapsed time in microseconds and marks the operation as slow when elapsed time is greater than or equal to `lake_replication_slow_log_ms * 1000`. When triggered, StarRocks writes an INFO log with file size, cost, and trace metrics for that replicated file. Increase the value to reduce noisy slow logs for large/slow transfers; decrease it to detect and surface smaller slow-copy events sooner. +- Introduced in: - + +##### load_rpc_slow_log_frequency_threshold_seconds + +- Default: 60 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: Controls how frequently the system prints slow-log entries for load RPCs that exceed their configured RPC timeout. The slow-log also includes the load channel runtime profile. Setting this value to 0 causes per-timeout logging in practice. +- Introduced in: v3.4.3, v3.5.0 + +##### log_buffer_level + +- Default: Empty string +- Type: String +- Unit: - +- Is mutable: No +- Description: The strategy for flushing logs. The default value indicates that logs are buffered in memory. Valid values are `-1` and `0`. `-1` indicates that logs are not buffered in memory. +- Introduced in: - + +##### pprof_profile_dir + +- Default: `${STARROCKS_HOME}/log` +- Type: String +- Unit: - +- Is mutable: No +- Description: Directory path where StarRocks writes pprof artifacts (Jemalloc heap snapshots and gperftools CPU profiles). +- Introduced in: v3.2.0 + +##### sys_log_dir + +- Default: `${STARROCKS_HOME}/log` +- Type: String +- Unit: - +- Is mutable: No +- Description: The directory that stores system logs (including INFO, WARNING, ERROR, and FATAL).
+- Introduced in: - + +##### sys_log_level + +- Default: INFO +- Type: String +- Unit: - +- Is mutable: Yes (from v3.3.0, v3.2.7, and v3.1.12) +- Description: The severity levels into which system log entries are classified. Valid values: INFO, WARN, ERROR, and FATAL. This item was changed to a dynamic configuration from v3.3.0, v3.2.7, and v3.1.12 onwards. +- Introduced in: - + +##### sys_log_roll_mode + +- Default: SIZE-MB-1024 +- Type: String +- Unit: - +- Is mutable: No +- Description: The mode in which system logs are segmented into log rolls. Valid values include `TIME-DAY`, `TIME-HOUR`, and `SIZE-MB-<size>`. The default value indicates that logs are segmented into rolls, each of which is 1 GB. +- Introduced in: - + +##### sys_log_roll_num + +- Default: 10 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The number of log rolls to reserve. +- Introduced in: - + +##### sys_log_timezone + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: Whether to show timezone information in the log prefix. `true` indicates showing timezone information, and `false` indicates not showing it. +- Introduced in: - + +##### sys_log_verbose_level + +- Default: 10 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The level of the logs to be printed. This configuration item is used to control the output of logs initiated with VLOG in the code. +- Introduced in: - + +##### sys_log_verbose_modules + +- Default: +- Type: Strings +- Unit: - +- Is mutable: No +- Description: The module of the logs to be printed. For example, if you set this configuration item to OLAP, StarRocks only prints the logs of the OLAP module. Valid values are namespaces in BE, including `starrocks`, `starrocks::debug`, `starrocks::fs`, `starrocks::io`, `starrocks::lake`, `starrocks::pipeline`, `starrocks::query_cache`, `starrocks::stream`, and `starrocks::workgroup`. +- Introduced in: - + +### Server + +##### abort_on_large_memory_allocation + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: When a single allocation request exceeds the configured large-allocation threshold (g_large_memory_alloc_failure_threshold `>` 0 and requested size `>` threshold), this flag controls how the process responds. If true, StarRocks calls std::abort() immediately (hard crash) when such a large allocation is detected. If false, the allocation is blocked and the allocator returns failure (nullptr or ENOMEM) so callers can handle the error. This check only takes effect for allocations that are not wrapped with the TRY_CATCH_BAD_ALLOC path (the mem hook uses a different flow when bad-alloc is being caught). Enable for fail-fast debugging of unexpected huge allocations; keep disabled in production unless you want an immediate process abort on over-large allocation attempts. +- Introduced in: v3.4.3, v3.5.0, v4.0.0 + +##### arrow_flight_port + +- Default: -1 +- Type: Int +- Unit: - +- Is mutable: No +- Description: TCP port for the BE Arrow Flight SQL server. `-1` indicates to disable the Arrow Flight service. On non-macOS builds, BE invokes Arrow Flight SQL Server with this port during startup; if the port is unavailable, the server startup fails and the BE process exits. The configured port is reported to the FE in the heartbeat payload. +- Introduced in: v3.4.0, v3.5.0 + +##### be_exit_after_disk_write_hang_second + +- Default: 60 +- Type: Int +- Unit: Seconds +- Is mutable: No +- Description: The length of time that the BE waits to exit after the disk hangs.
+- Introduced in: - + +##### be_http_num_workers + +- Default: 48 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The number of threads used by the HTTP server. +- Introduced in: - + +##### be_http_port + +- Default: 8040 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The BE HTTP server port. +- Introduced in: - + +##### be_port + +- Default: 9060 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The BE thrift server port, which is used to receive requests from FEs. +- Introduced in: - + +##### be_service_threads + +- Default: 64 +- Type: Int +- Unit: Threads +- Is mutable: No +- Description: Number of worker threads the BE Thrift server uses to serve backend RPC/execution requests. This value is passed to ThriftServer when creating the BackendService and controls how many concurrent request handlers are available; requests are queued when all worker threads are busy. Tune based on expected concurrent RPC load and available CPU/memory: increasing it raises concurrency but also per-thread memory and context-switch cost, decreasing it limits parallel handling and may increase request latency. +- Introduced in: v3.2.0 + +##### brpc_connection_type + +- Default: `"single"` +- Type: String +- Unit: - +- Is mutable: No +- Description: The bRPC channel connection mode. Valid values: + - `"single"` (Default): One persistent TCP connection for each channel. + - `"pooled"`: A pool of persistent connections for higher concurrency at the cost of more sockets/file descriptors. + - `"short"`: Short‑lived connections created per RPC to reduce persistent resource usage but with higher latency. + The choice affects per-socket buffering behavior and can influence `Socket.Write` failures (EOVERCROWDED) when unwritten bytes exceed socket limits. +- Introduced in: v3.2.5 + +##### brpc_max_body_size + +- Default: 2147483648 +- Type: Int +- Unit: Bytes +- Is mutable: No +- Description: The maximum body size of a bRPC. +- Introduced in: - + +##### brpc_max_connections_per_server + +- Default: 1 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The maximum number of persistent bRPC connections the client keeps for each remote server endpoint. For each endpoint `BrpcStubCache` creates a `StubPool` whose `_stubs` vector is reserved to this size. On first access, new stubs are created until the limit is reached. After that, existing stubs are returned in a round‑robin fashion. Increasing this value raises per‑endpoint concurrency (reduces contention on a single channel) at the cost of more file descriptors, memory, and channels. +- Introduced in: v3.2.0 + +##### brpc_num_threads + +- Default: -1 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The number of bthreads of a bRPC. The value `-1` indicates the same number with the CPU threads. +- Introduced in: - + +##### brpc_port + +- Default: 8060 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The BE bRPC port, which is used to view the network statistics of bRPCs. +- Introduced in: - + +##### brpc_socket_max_unwritten_bytes + +- Default: 1073741824 +- Type: Int +- Unit: Bytes +- Is mutable: No +- Description: Sets the per-socket limit for unwritten outbound bytes in the bRPC server. When the amount of buffered, not-yet-written data on a socket reaches this limit, subsequent `Socket.Write` calls fail with EOVERCROWDED. This prevents unbounded per-connection memory growth but can cause RPC send failures for very large messages or slow peers.
Align this value with `brpc_max_body_size` to ensure single-message bodies are not larger than the allowed unwritten buffer. Increasing the value raises memory usage per connection. +- Introduced in: v3.2.0 + +##### brpc_stub_expire_s + +- Default: 3600 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: The expiration time of the bRPC stub cache. The default value (3600 seconds) equals 60 minutes. +- Introduced in: - + +##### compress_rowbatches + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: A boolean value to control whether to compress the row batches in RPCs between BEs. `true` indicates compressing the row batches, and `false` indicates not compressing them. +- Introduced in: - + +##### consistency_max_memory_limit_percent + +- Default: 20 +- Type: Int +- Unit: - +- Is mutable: No +- Description: Percentage cap used to compute the memory budget for consistency-related tasks. During BE startup, the final consistency limit is computed as the minimum of the value parsed from `consistency_max_memory_limit` (bytes) and (`process_mem_limit * consistency_max_memory_limit_percent / 100`). If `process_mem_limit` is unset (-1), consistency memory is considered unlimited. For `consistency_max_memory_limit_percent`, values less than 0 or greater than 100 are treated as 100. Adjusting this value increases or decreases memory reserved for consistency operations and therefore affects memory available for queries and other services. +- Introduced in: v3.2.0 + +##### delete_worker_count_normal_priority + +- Default: 2 +- Type: Int +- Unit: Threads +- Is mutable: No +- Description: Number of normal-priority worker threads dedicated to handling delete (REALTIME_PUSH with DELETE) tasks on the BE agent. At startup this value is added to `delete_worker_count_high_priority` to size the DeleteTaskWorkerPool (see agent_server.cpp). The pool assigns the first `delete_worker_count_high_priority` threads as HIGH priority and the rest as NORMAL; normal-priority threads process standard delete tasks and contribute to overall delete throughput. Increase to raise concurrent delete capacity (higher CPU/IO usage); decrease to reduce resource contention. +- Introduced in: v3.2.0 + +##### disable_mem_pools + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: Whether to disable MemPool. When this item is set to `true`, the MemPool chunk pooling is disabled so each allocation gets its own sized chunk instead of reusing or increasing pooled chunks. Disabling pooling reduces long-lived retained buffer memory at the cost of more frequent allocations, increased number of chunks, and skipped integrity checks (which are avoided because of the large chunk count). Keep `disable_mem_pools` as `false` (default) to benefit from allocation reuse and fewer system calls. Set it to `true` only when you must avoid large pooled memory retention (for example, low-memory environments or diagnostic runs). +- Introduced in: v3.2.0 + +##### enable_https + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: When this item is set to `true`, the BE's bRPC server is configured to use TLS: `ServerOptions.ssl_options` will be populated with the certificate and private key specified by `ssl_certificate_path` and `ssl_private_key_path` at BE startup. This enables HTTPS/TLS for incoming bRPC connections; clients must connect using TLS. Ensure the certificate and key files exist, are accessible to the BE process, and match bRPC/SSL expectations.
+- Introduced in: v4.0.0 + +##### enable_jemalloc_memory_tracker + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: When this item is set to `true`, the BE starts a background thread (jemalloc_tracker_daemon) that polls jemalloc statistics (once per second) and updates the GlobalEnv jemalloc metadata MemTracker with the jemalloc "stats.metadata" value. This ensures jemalloc metadata consumption is included in StarRocks process memory accounting and prevents under‑reporting of memory used by jemalloc internals. The tracker is only compiled/started on non‑macOS builds (#ifndef __APPLE__) and runs as a daemon thread named "jemalloc_tracker_daemon". Because this setting affects startup behavior and threads that maintain MemTracker state, changing it requires a restart. Disable only if jemalloc is not used or when jemalloc tracking is intentionally managed differently; otherwise keep enabled to maintain accurate memory accounting and allocation safeguards. +- Introduced in: v3.2.12 + +##### enable_jvm_metrics + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: Controls whether the system initializes and registers JVM-specific metrics at startup. When enabled the metrics subsystem will create JVM-related collectors (for example, heap, GC, and thread metrics) for export, and when disabled, those collectors are not initialized. This parameter is intended for forward compatibility and may be removed in a future release. Use `enable_system_metrics` to control system-level metric collection. +- Introduced in: v4.0.0 + +##### get_pindex_worker_count + +- Default: 0 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: Sets the number of worker threads for the "get_pindex" thread pool in UpdateManager, which is used to load / fetch persistent index data (used when applying rowsets for primary-key tables). At runtime, a config update will adjust the pool's maximum threads: if `>0` that value is applied; if 0 the runtime callback uses the number of CPU cores (CpuInfo::num_cores()). On initialization the pool's max threads is computed as max(get_pindex_worker_count, max_apply_thread_cnt * 2) where max_apply_thread_cnt is the apply-thread pool maximum. Increase to raise parallelism for pindex loading; lowering reduces concurrency and memory/CPU usage. +- Introduced in: v3.2.0 + +##### heartbeat_service_port + +- Default: 9050 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The BE heartbeat service port, which is used to receive heartbeats from FEs. +- Introduced in: - + +##### heartbeat_service_thread_count + +- Default: 1 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The thread count of the BE heartbeat service. +- Introduced in: - + +##### local_library_dir + +- Default: `${UDF_RUNTIME_DIR}` +- Type: string +- Unit: - +- Is mutable: No +- Description: Local directory on the BE where UDF (user-defined function) libraries are staged and where Python UDF worker processes operate. StarRocks copies UDF libraries from HDFS into this path, creates per-worker Unix domain sockets (`pyworker_*`) under this directory, and changes the working directory of Python worker processes to this directory before exec. The directory must exist, be writable by the BE process, and reside on a filesystem that supports Unix domain sockets (i.e., a local filesystem). Because this config is immutable at runtime, set it before startup and ensure adequate permissions and disk space on each BE.
+- Introduced in: v3.2.0 + +##### max_transmit_batched_bytes + +- Default: 262144 +- Type: Int +- Unit: Bytes +- Is mutable: No +- Description: Maximum number of serialized bytes to accumulate in a single transmit request before it is flushed to the network. Sender implementations add serialized ChunkPB payloads into a PTransmitChunkParams request and send the request once the accumulated bytes exceed `max_transmit_batched_bytes` or when EOS is reached. Increase this value to reduce RPC frequency and improve throughput at the cost of higher per-request latency and memory use; reduce it to lower latency and memory but increase RPC rate. +- Introduced in: v3.2.0 + +##### mem_limit + +- Default: 90% +- Type: String +- Unit: - +- Is mutable: No +- Description: BE process memory upper limit. You can set it as a percentage ("80%") or a physical limit ("100G"). The default hard limit is 90% of the server's memory size, and the soft limit is 80%. You need to configure this parameter if you want to deploy StarRocks with other memory-intensive services on the same server. +- Introduced in: - + +##### memory_max_alignment + +- Default: 16 +- Type: Int +- Unit: Bytes +- Is mutable: No +- Description: Sets the maximum byte alignment that MemPool will accept for aligned allocations. Increase this value only when callers require larger alignment (for SIMD, device buffers, or ABI constraints). Larger values increase per-allocation padding and reserved memory waste and must remain within what the system allocator and platform support. +- Introduced in: v3.2.0 + +##### memory_urgent_level + +- Default: 85 +- Type: long +- Unit: Percentage (0-100) +- Is mutable: Yes +- Description: The emergency memory water‑level expressed as a percentage of the process memory limit. When process memory consumption exceeds `(limit * memory_urgent_level / 100)`, BE triggers immediate memory reclamation, which forces data cache shrinkage, evicts update caches, and causes persistent/lake MemTables to be treated as "full" so they will be flushed/compacted soon. The code validates that this setting must be greater than `memory_high_level`, and that `memory_high_level` must be greater than or equal to `1` and less than or equal to `100`. A lower value causes more aggressive, earlier reclamation, that is, more frequent cache evictions and flushes. A higher value delays reclamation and risks OOM if too close to 100. Tune this item together with `memory_high_level` and Data Cache-related auto‑adjust settings. +- Introduced in: v3.2.0 + +##### net_use_ipv6_when_priority_networks_empty + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: A boolean value to control whether to use IPv6 addresses preferentially when `priority_networks` is not specified. `true` indicates to allow the system to use an IPv6 address preferentially when the server that hosts the node has both IPv4 and IPv6 addresses and `priority_networks` is not specified. +- Introduced in: v3.3.0 + +##### num_cores + +- Default: 0 +- Type: Int +- Unit: Cores +- Is mutable: No +- Description: Controls the number of CPU cores the system will use for CPU-aware decisions (for example, thread-pool sizing and runtime scheduling). A value of 0 enables auto-detection: the system reads `/proc/cpuinfo` and uses all available cores. If set to a positive integer, that value overrides the detected core count and becomes the effective core count.
When running inside containers, cgroup cpuset or cpu quota settings can further restrict usable cores; `CpuInfo` also respects those cgroup limits. +- Introduced in: v3.2.0 + +##### plugin_path + +- Default: `${STARROCKS_HOME}/plugin` +- Type: String +- Unit: - +- Is mutable: No +- Description: Filesystem directory where StarRocks loads external plugins (dynamic libraries, connector artifacts, UDF binaries, etc.). `plugin_path` should point to a directory accessible by the BE process (read and execute permissions) and must exist before plugins are loaded. Ensure correct ownership and that plugin files use the platform's native binary extension (for example, .so on Linux). +- Introduced in: v3.2.0 + +##### priority_networks + +- Default: An empty string +- Type: String +- Unit: - +- Is mutable: No +- Description: Declares a selection strategy for servers that have multiple IP addresses. Note that at most one IP address must match the list specified by this parameter. The value of this parameter is a list that consists of entries, which are separated with semicolons (;) in CIDR notation, such as `10.10.10.0/24`. If no IP address matches the entries in this list, an available IP address of the server will be randomly selected. From v3.3.0, StarRocks supports deployment based on IPv6. If the server has both IPv4 and IPv6 addresses, and this parameter is not specified, the system uses an IPv4 address by default. You can change this behavior by setting `net_use_ipv6_when_priority_networks_empty` to `true`. +- Introduced in: - + +##### rpc_compress_ratio_threshold + +- Default: 1.1 +- Type: Double +- Unit: - +- Is mutable: Yes +- Description: Threshold (uncompressed_size / compressed_size) used when deciding whether to send serialized row-batches over the network in compressed form. When compression is attempted (e.g., in DataStreamSender, exchange sink, tablet sink index channel, dictionary cache writer), StarRocks computes compress_ratio = uncompressed_size / compressed_size; it uses the compressed payload only if compress_ratio `>` rpc_compress_ratio_threshold. With the default 1.1, compressed data must be at least ~9.1% smaller than uncompressed to be used. Lower the value to prefer compression (more CPU for smaller bandwidth savings); raise it to avoid compression overhead unless it yields larger size reductions. Note: this applies to RPC/shuffle serialization and is effective only when row-batch compression is enabled (compress_rowbatches). +- Introduced in: v3.2.0 + +##### ssl_private_key_path + +- Default: An empty string +- Type: String +- Unit: - +- Is mutable: No +- Description: File system path to the TLS/SSL private key (PEM) that the BE's brpc server uses as the private key for the default certificate. When `enable_https` is set to `true`, the system sets `brpc::ServerOptions::ssl_options().default_cert.private_key` to this path at process start. The file must be accessible by the BE process and must match the certificate provided by `ssl_certificate_path`. If this value is not set or the file is missing or inaccessible, HTTPS will not be configured and the bRPC server may fail to start. Protect this file with restrictive filesystem permissions (for example, 600). +- Introduced in: v4.0.0 + +##### thrift_client_retry_interval_ms + +- Default: 100 +- Type: Int +- Unit: Milliseconds +- Is mutable: Yes +- Description: The time interval at which a thrift client retries.
+- Introduced in: - + +##### thrift_connect_timeout_seconds + +- Default: 3 +- Type: Int +- Unit: Seconds +- Is mutable: No +- Description: Connection timeout (in seconds) used when creating Thrift clients. ClientCacheHelper::_create_client multiplies this value by 1000 and passes it to ThriftClientImpl::set_conn_timeout(), so it controls the TCP/connect handshake timeout for new Thrift connections opened by the BE client cache. This setting affects only connection establishment; send/receive timeouts are configured separately. Very small values can cause spurious connection failures on high-latency networks, while large values delay detection of unreachable peers. +- Introduced in: v3.2.0 + +##### thrift_port + +- Default: 0 +- Type: Int +- Unit: - +- Is mutable: No +- Description: Port used to export the internal Thrift-based BackendService. When the process runs as a Compute Node and this item is set to a non-zero value, it overrides `be_port` and the Thrift server binds to this value; otherwise `be_port` is used. This configuration is deprecated — setting a non-zero `thrift_port` logs a warning advising to use `be_port` instead. +- Introduced in: v3.2.0 + +##### thrift_rpc_connection_max_valid_time_ms + +- Default: 5000 +- Type: Int +- Unit: Milliseconds +- Is mutable: No +- Description: Maximum valid time for a thrift RPC connection. A connection will be closed if it has existed in the connection pool for longer than this value. It must be set consistently with the FE configuration item `thrift_client_timeout_ms`. +- Introduced in: - + +##### thrift_rpc_max_body_size + +- Default: 0 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The maximum string body size of a thrift RPC. `0` indicates the size is unlimited. +- Introduced in: - + +##### thrift_rpc_strict_mode + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: Whether thrift's strict execution mode is enabled. For more information on thrift strict mode, see [Thrift Binary protocol encoding](https://github.com/apache/thrift/blob/master/doc/specs/thrift-binary-protocol.md). +- Introduced in: - + +##### thrift_rpc_timeout_ms + +- Default: 5000 +- Type: Int +- Unit: Milliseconds +- Is mutable: Yes +- Description: The timeout for a thrift RPC. +- Introduced in: - + +##### transaction_apply_thread_pool_num_min + +- Default: 0 +- Type: Int +- Unit: Threads +- Is mutable: Yes +- Description: Sets the minimum number of threads for the "update_apply" thread pool in BE's UpdateManager — the pool that applies rowsets for primary-key tables. A value of 0 disables a fixed minimum (no enforced lower bound); when `transaction_apply_worker_count` is also 0 the pool's max threads defaults to the number of CPU cores, so effective worker capacity equals CPU cores. You can raise this to guarantee a baseline concurrency for applying transactions; setting it too high may increase CPU contention. Changes are applied at runtime via the update_config HTTP handler (it calls update_min_threads on the apply thread pool). +- Introduced in: v3.2.11 + +##### transaction_publish_version_thread_pool_num_min + +- Default: 0 +- Type: Int +- Unit: Threads +- Is mutable: Yes +- Description: Sets the minimum number of threads reserved in the AgentServer "publish_version" dynamic thread pool (used to publish transaction versions / handle TTaskType::PUBLISH_VERSION tasks).
At startup the pool is created with min = max(config value, MIN_TRANSACTION_PUBLISH_WORKER_COUNT) (MIN_TRANSACTION_PUBLISH_WORKER_COUNT = 1), so the default 0 results in a minimum of 1 thread. Changing this value at runtime invokes the update callback to call ThreadPool::update_min_threads, raising or lowering the pool's guaranteed minimum (but not below the enforced minimum of 1). Coordinate with `transaction_publish_version_worker_count` (max threads) and `transaction_publish_version_thread_pool_idle_time_ms` (idle timeout). +- Introduced in: v3.2.11 + +##### use_mmap_allocate_chunk + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: When this item is set to `true`, the system allocates chunks using anonymous private mmap mappings (MAP_ANONYMOUS | MAP_PRIVATE) and frees them with munmap. Enabling this may create many virtual memory mappings, thus you must raise the kernel limit (as the root user, run `sysctl -w vm.max_map_count=262144` or `echo 262144 > /proc/sys/vm/max_map_count`), and set `chunk_reserved_bytes_limit` to a relatively large value. Otherwise, enabling mmap can cause very poor performance due to frequent mapping/unmapping. +- Introduced in: v3.2.0 + +### Metadata and cluster management + +##### cluster_id + +- Default: -1 +- Type: Int +- Unit: - +- Is mutable: No +- Description: Global cluster identifier for this StarRocks backend. At startup StorageEngine reads config::cluster_id into its effective cluster id and verifies that all data root paths contain the same cluster id (see StorageEngine::_check_all_root_path_cluster_id). A value of -1 means "unset" — the engine may derive the effective id from existing data directories or from master heartbeats. If a non‑negative id is configured, any mismatch between configured id and ids stored in data directories will cause startup verification to fail (Status::Corruption). When some roots lack an id and the engine is allowed to write ids (options.need_write_cluster_id), it will persist the effective id into those roots. +- Introduced in: v3.2.0 + +##### consistency_max_memory_limit + +- Default: 10G +- Type: String +- Unit: - +- Is mutable: No +- Description: Memory size specification for the CONSISTENCY memory tracker. +- Introduced in: v3.2.0 + +##### make_snapshot_rpc_timeout_ms + +- Default: 20000 +- Type: Int +- Unit: Milliseconds +- Is mutable: No +- Description: Sets the Thrift RPC timeout in milliseconds used when making a snapshot on a remote BE. Increase this value when remote snapshot creation regularly exceeds the default timeout; reduce it to fail faster on unresponsive BEs. Note other timeouts may affect end-to-end operations (for example the effective tablet-writer open timeout can relate to `tablet_writer_open_rpc_timeout_sec` and `load_timeout_sec`). +- Introduced in: v3.2.0 + +##### metadata_cache_memory_limit_percent + +- Default: 30 +- Type: Int +- Unit: Percent +- Is mutable: Yes +- Description: Sets the metadata LRU cache size as a percentage of the process memory limit. At startup StarRocks computes cache bytes as (process_mem_limit * metadata_cache_memory_limit_percent / 100) and passes that to the metadata cache allocator. The cache is only used for non-PRIMARY_KEYS rowsets (PK tables are not supported) and is enabled only when `metadata_cache_memory_limit_percent` is greater than 0; set it to a value less than or equal to 0 to disable the metadata cache. Increasing this value raises metadata cache capacity but reduces memory available to other components; tune based on workload and system memory.
Not active in BE_TEST builds. +- Introduced in: v3.2.10 + +##### retry_apply_interval_second + +- Default: 30 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: Base interval (in seconds) used when scheduling retries of failed tablet apply operations. It is used directly to schedule a retry after a submission failure and as the base multiplier for backoff: the next retry delay is calculated as min(600, `retry_apply_interval_second` * failed_attempts). The code also uses `retry_apply_interval_second` to compute the cumulative retry duration (arithmetic-series sum) which is compared against `retry_apply_timeout_second` to decide whether to keep retrying. Effective only when `enable_retry_apply` is true. Increasing this value lengthens both individual retry delays and the cumulative time spent retrying; decreasing it makes retries more frequent and may increase the number of attempts before reaching `retry_apply_timeout_second`. +- Introduced in: v3.2.9 + +##### retry_apply_timeout_second + +- Default: 7200 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: Maximum cumulative retry time (in seconds) allowed for applying a pending version before the apply process gives up and the tablet enters an error state. The apply logic accumulates backoff intervals based on `retry_apply_interval_second` and compares the total duration against `retry_apply_timeout_second`. If `enable_retry_apply` is true and the error is considered retryable, apply attempts will be rescheduled until the accumulated backoff exceeds `retry_apply_timeout_second`; then apply stops and the tablet transitions to error. Explicitly non-retryable errors (e.g., Corruption) are not retried regardless of this setting. Tune this value to control how long StarRocks will keep retrying apply operations (default 7200s = 2 hours). +- Introduced in: v3.3.13, v3.4.3, v3.5.0 + +##### txn_commit_rpc_timeout_ms + +- Default: 60000 +- Type: Int +- Unit: Milliseconds +- Is mutable: Yes +- Description: Maximum allowed lifetime (in milliseconds) for Thrift RPC connections used by BE stream-load and transaction commit calls. StarRocks sets this value as the `thrift_rpc_timeout_ms` on requests sent to FE (used in stream_load planning, loadTxnBegin/loadTxnPrepare/loadTxnCommit, and getLoadTxnStatus). If a connection has been pooled longer than this value it will be closed. When a per-request timeout (`ctx->timeout_second`) is provided, the BE computes the RPC timeout as rpc_timeout_ms = max(ctx*1000/4, min(ctx*1000/2, txn_commit_rpc_timeout_ms)), so the effective RPC timeout is bounded by the context and this configuration. Keep this consistent with FE's `thrift_client_timeout_ms` to avoid mismatched timeouts. +- Introduced in: v3.2.0 + +##### txn_map_shard_size + +- Default: 128 +- Type: Int +- Unit: - +- Is mutable: No +- Description: Number of lock-map shards used by the transaction manager to partition transaction locks and reduce contention. Its value should be a power of two (2^n); increasing it improves concurrency and reduces lock contention at the cost of additional memory and marginal bookkeeping overhead. Choose a shard count sized for expected concurrent transactions and available memory. +- Introduced in: v3.2.0 + +##### txn_shard_size + +- Default: 1024 +- Type: Int +- Unit: - +- Is mutable: No +- Description: Controls the number of lock shards used by the transaction manager. This value determines the shard size for txn locks.
It must be a power of two; setting it to a larger value reduces lock contention and improves concurrent COMMIT/PUBLISH throughput at the expense of additional memory and finer-grained internal bookkeeping. +- Introduced in: v3.2.0 + +##### update_schema_worker_count + +- Default: 3 +- Type: Int +- Unit: Threads +- Is mutable: No +- Description: Sets the maximum number of worker threads in the backend's "update_schema" dynamic ThreadPool that processes TTaskType::UPDATE_SCHEMA tasks. The ThreadPool is created in agent_server during startup with a minimum of 0 threads (it can scale down to zero when idle) and a max equal to this setting; the pool uses the default idle timeout and an effectively unlimited queue. Increase this value to allow more concurrent schema-update tasks (higher CPU and memory usage), or lower it to limit parallel schema operations. +- Introduced in: v3.2.3 + +##### update_tablet_meta_info_worker_count + +- Default: 1 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: Sets the maximum number of worker threads in the backend thread pool that handles tablet metadata update tasks. The thread pool is created during backend startup with a minimum of 0 threads (it can scale down to zero when idle) and a max equal to this setting (clamped to at least 1). Updating this value at runtime adjusts the pool's max threads. Increase it to allow more concurrent metadata-update tasks, or lower it to limit concurrency. +- Introduced in: v4.1.0, v4.0.6, v3.5.13 + +### User, role, and privilege + +##### ssl_certificate_path + +- Default: An empty string +- Type: String +- Unit: - +- Is mutable: No +- Description: Absolute path to the TLS/SSL certificate file (PEM) that the BE's brpc server will use when `enable_https` is `true`. At BE startup this value is copied into `brpc::ServerOptions::ssl_options().default_cert.certificate`; you must also set `ssl_private_key_path` to the matching private key. Provide the server certificate and any intermediate certificates in PEM format (certificate chain) if required by your CA. The file must be readable by the StarRocks BE process and is applied only at startup. If unset or invalid while `enable_https` is enabled, brpc TLS setup may fail and prevent the server from starting correctly. +- Introduced in: v4.0.0 + +### Query engine + +##### clear_udf_cache_when_start + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: When enabled, the BE's UserFunctionCache will clear all locally cached user function libraries on startup. During UserFunctionCache::init, the code calls _reset_cache_dir(), which removes UDF files from the configured UDF library directory (organized into kLibShardNum subdirectories) and deletes files with Java/Python UDF suffixes (.jar/.py). When disabled (default), the BE loads existing cached UDF files instead of deleting them. Enabling this forces UDF binaries to be re-downloaded on first use after restart (increasing network traffic and first-use latency). +- Introduced in: v4.0.0 + +##### dictionary_speculate_min_chunk_size + +- Default: 10000 +- Type: Int +- Unit: Rows +- Is mutable: No +- Description: Minimum number of rows (chunk size) used by StringColumnWriter and DictColumnWriter to trigger dictionary-encoding speculation. If an incoming column (or the accumulated buffer plus incoming rows) has a size larger than or equal to `dictionary_speculate_min_chunk_size` the writer will run speculation immediately and set an encoding (DICT, PLAIN or BIT_SHUFFLE) rather than buffering more rows.
Speculation uses `dictionary_encoding_ratio` for string columns and `dictionary_encoding_ratio_for_non_string_column` for numeric/non-string columns to decide whether dictionary encoding is beneficial. Also, a large column byte_size (larger than or equal to UINT32_MAX) forces immediate speculation to avoid `BinaryColumn` overflow. +- Introduced in: v3.2.0 + +##### disable_storage_page_cache + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: A boolean value to control whether to disable PageCache. + - When PageCache is enabled, StarRocks caches the recently scanned data. + - PageCache can significantly improve the query performance when similar queries are repeated frequently. + - `true` indicates disabling PageCache. + - The default value of this item has been changed from `true` to `false` since StarRocks v2.4. +- Introduced in: - + +##### enable_bitmap_index_memory_page_cache + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: Whether to enable memory cache for Bitmap index. Memory cache is recommended if you want to use Bitmap indexes to accelerate point queries. +- Introduced in: v3.1 + +##### enable_compaction_flat_json + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: Whether to enable compaction for Flat JSON data. +- Introduced in: v3.3.3 + +##### enable_json_flat + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: Whether to enable the Flat JSON feature. After this feature is enabled, newly loaded JSON data will be automatically flattened, improving JSON query performance. +- Introduced in: v3.3.0 + +##### enable_lazy_dynamic_flat_json + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: Whether to enable Lazy Dynamic Flat JSON when a query misses Flat JSON schema in read process. When this item is set to `true`, StarRocks will postpone the Flat JSON operation from the read process to the calculation process. +- Introduced in: v3.3.3 + +##### enable_ordinal_index_memory_page_cache + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: Whether to enable memory cache for ordinal index. Ordinal index is a mapping from row IDs to data page positions, and it can be used to accelerate scans. +- Introduced in: - + +##### enable_string_prefix_zonemap + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: Whether to enable ZoneMap for string (CHAR/VARCHAR) columns using prefix-based min/max. For non-key string columns, the min/max values are truncated to a fixed prefix length configured by `string_prefix_zonemap_prefix_len`. +- Introduced in: - + +##### enable_zonemap_index_memory_page_cache + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: Whether to enable memory cache for zonemap index. Memory cache is recommended if you want to use zonemap indexes to accelerate scans. +- Introduced in: - + +##### exchg_node_buffer_size_bytes + +- Default: 10485760 +- Type: Int +- Unit: Bytes +- Is mutable: Yes +- Description: The maximum buffer size on the receiver end of an exchange node for each query. This configuration item is a soft limit. Backpressure is triggered when data is sent to the receiver end at an excessive speed. +- Introduced in: - + +##### file_descriptor_cache_capacity + +- Default: 16384 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The number of file descriptors that can be cached.
+- Introduced in: - + +##### flamegraph_tool_dir + +- Default: `${STARROCKS_HOME}/bin/flamegraph` +- Type: String +- Unit: - +- Is mutable: No +- Description: Directory of the flamegraph tool, which should contain pprof, stackcollapse-go.pl, and flamegraph.pl scripts for generating flame graphs from profile data. +- Introduced in: - + +##### fragment_pool_queue_size + +- Default: 2048 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The upper limit of the query number that can be processed on each BE node. +- Introduced in: - + +##### fragment_pool_thread_num_max + +- Default: 4096 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The maximum number of threads used for queries. +- Introduced in: - + +##### fragment_pool_thread_num_min + +- Default: 64 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The minimum number of threads used for queries. +- Introduced in: - + +##### hdfs_client_enable_hedged_read + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: Specifies whether to enable the hedged read feature. +- Introduced in: v3.0 + +##### hdfs_client_hedged_read_threadpool_size + +- Default: 128 +- Type: Int +- Unit: - +- Is mutable: No +- Description: Specifies the size of the Hedged Read thread pool on your HDFS client. The thread pool size limits the number of threads to dedicate to the running of hedged reads in your HDFS client. It is equivalent to the `dfs.client.hedged.read.threadpool.size` parameter in the **hdfs-site.xml** file of your HDFS cluster. +- Introduced in: v3.0 + +##### hdfs_client_hedged_read_threshold_millis + +- Default: 2500 +- Type: Int +- Unit: Milliseconds +- Is mutable: No +- Description: Specifies the number of milliseconds to wait before starting up a hedged read. For example, you have set this parameter to `30`. In this situation, if a read from a block has not returned within 30 milliseconds, your HDFS client immediately starts up a new read against a different block replica. It is equivalent to the `dfs.client.hedged.read.threshold.millis` parameter in the **hdfs-site.xml** file of your HDFS cluster. +- Introduced in: v3.0 + +##### io_coalesce_adaptive_lazy_active + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: Based on the selectivity of predicates, adaptively determines whether to combine the I/O of predicate columns and non-predicate columns. +- Introduced in: v3.2 + +##### jit_lru_cache_size + +- Default: 0 +- Type: Int +- Unit: Bytes +- Is mutable: Yes +- Description: The LRU cache size for JIT compilation. It represents the actual size of the cache if it is set to greater than 0. If it is set to less than or equal to 0, the system will adaptively set the cache using the formula `jit_lru_cache_size = min(mem_limit*0.01, 1GB)` (while `mem_limit` of the node must be greater than or equal to 16 GB). +- Introduced in: - + +##### json_flat_column_max + +- Default: 100 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The maximum number of sub-fields that can be extracted by Flat JSON. This parameter takes effect only when `enable_json_flat` is set to `true`. +- Introduced in: v3.3.0 + +##### json_flat_create_zonemap + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: Whether to create ZoneMaps for flattened JSON sub-columns during write. This parameter takes effect only when `enable_json_flat` is set to `true`.
+- Introduced in: - + +##### json_flat_null_factor + +- Default: 0.3 +- Type: Double +- Unit: - +- Is mutable: Yes +- Description: The proportion of NULL values in the column to extract for Flat JSON. A column will not be extracted if its proportion of NULL values is higher than this threshold. This parameter takes effect only when `enable_json_flat` is set to `true`. +- Introduced in: v3.3.0 + +##### json_flat_sparsity_factor + +- Default: 0.3 +- Type: Double +- Unit: - +- Is mutable: Yes +- Description: The proportion of columns with the same name for Flat JSON. Extraction is not performed if the proportion of columns with the same name is lower than this value. This parameter takes effect only when `enable_json_flat` is set to `true`. +- Introduced in: v3.3.0 + +##### lake_tablet_ignore_invalid_delete_predicate + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: A boolean value to control whether to ignore invalid delete predicates in tablet rowset metadata. Such predicates may be introduced by logical deletion on a Duplicate Key table after a column is renamed. +- Introduced in: v4.0 + +##### late_materialization_ratio + +- Default: 10 +- Type: Int +- Unit: - +- Is mutable: No +- Description: Integer ratio in range [0-1000] that controls the use of late materialization in the SegmentIterator (vector query engine). A value of `0` (or ≤ 0) disables late materialization; `1000` (or ≥ 1000) forces late materialization for all reads. Values > 0 and < 1000 enable a conditional strategy where both late and early materialization contexts are prepared and the iterator selects behavior based on predicate filter ratios (higher values favor late materialization). When a segment contains complex metric types, StarRocks uses `metric_late_materialization_ratio` instead. If `lake_io_opts.cache_file_only` is set, late materialization is disabled. +- Introduced in: v3.2.0 + +##### max_hdfs_file_handle + +- Default: 1000 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The maximum number of HDFS file descriptors that can be opened. +- Introduced in: - + +##### max_memory_sink_batch_count + +- Default: 20 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The maximum number of Scan Cache batches. +- Introduced in: - + +##### max_pushdown_conditions_per_column + +- Default: 1024 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The maximum number of conditions that allow pushdown in each column. If the number of conditions exceeds this limit, the predicates are not pushed down to the storage layer. +- Introduced in: - + +##### max_scan_key_num + +- Default: 1024 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The maximum number of scan keys segmented by each query. +- Introduced in: - + +##### metric_late_materialization_ratio + +- Default: 1000 +- Type: Int +- Unit: - +- Is mutable: No +- Description: Controls when the late-materialization row access strategy is used for reads that include complex metric columns. Valid range: [0-1000]. `0` disables late materialization; `1000` forces late materialization for all applicable reads. Values 1–999 enable a conditional strategy where both late and early materialization contexts are prepared and chosen at runtime based on predicate/selectivity. When complex metric types exist, `metric_late_materialization_ratio` overrides the general `late_materialization_ratio`. Note: `cache_file_only` I/O mode will cause late materialization to be disabled regardless of this setting.
+##### min_file_descriptor_number
+
+- Default: 60000
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The minimum number of file descriptors in the BE process.
+- Introduced in: -
+
+##### object_storage_connect_timeout_ms
+
+- Default: -1
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: No
+- Description: Timeout duration to establish socket connections with object storage. `-1` indicates to use the default timeout duration of the SDK configurations.
+- Introduced in: v3.0.9
+
+##### object_storage_request_timeout_ms
+
+- Default: -1
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: No
+- Description: Timeout duration to establish HTTP connections with object storage. `-1` indicates to use the default timeout duration of the SDK configurations.
+- Introduced in: v3.0.9
+
+##### parquet_late_materialization_enable
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: A boolean value to control whether to enable late materialization in the Parquet reader to improve performance. `true` indicates enabling late materialization, and `false` indicates disabling it.
+- Introduced in: -
+
+##### parquet_page_index_enable
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: A boolean value to control whether to use the page index of Parquet files to improve performance. `true` indicates enabling the page index, and `false` indicates disabling it.
+- Introduced in: v3.3
+
+##### parquet_reader_bloom_filter_enable
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: A boolean value to control whether to use the bloom filters in Parquet files to improve performance. `true` indicates enabling bloom filters, and `false` indicates disabling them. You can also control this behavior at the session level using the system variable `enable_parquet_reader_bloom_filter`. Bloom filters in Parquet are maintained **at the column level within each row group**. If a Parquet file contains bloom filters for certain columns, queries can use predicates on those columns to efficiently skip row groups.
+- Introduced in: v3.5
+
+##### path_gc_check_step
+
+- Default: 1000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of files that can be scanned continuously each time.
+- Introduced in: -
+
+##### path_gc_check_step_interval_ms
+
+- Default: 10
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: The time interval between file scans.
+- Introduced in: -
+
+##### path_scan_interval_second
+
+- Default: 86400
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval at which GC cleans expired data.
+- Introduced in: -
+
+##### pipeline_connector_scan_thread_num_per_cpu
+
+- Default: 8
+- Type: Double
+- Unit: -
+- Is mutable: Yes
+- Description: The number of scan threads assigned to the Pipeline Connector per CPU core on the BE node. This configuration is changed to dynamic from v3.1.7 onwards.
+- Introduced in: -
+
+##### pipeline_poller_timeout_guard_ms
+
+- Default: -1
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: When this item is set to greater than `0`, if a driver takes longer than `pipeline_poller_timeout_guard_ms` for a single dispatch in the poller, the information of the driver and operator is printed.
+- Introduced in: -
+
+##### pipeline_prepare_thread_pool_queue_size
+
+- Default: 102400
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum queue length of the PREPARE fragment thread pool of the pipeline execution engine.
+- Introduced in: -
+
+##### pipeline_prepare_thread_pool_thread_num
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of threads in the PREPARE fragment thread pool of the pipeline execution engine. `0` indicates that the value is equal to the number of vCPU cores in the system.
+- Introduced in: -
+
+##### pipeline_prepare_timeout_guard_ms
+
+- Default: -1
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: When this item is set to greater than `0`, if a plan fragment exceeds `pipeline_prepare_timeout_guard_ms` during the PREPARE process, a stack trace of the plan fragment is printed.
+- Introduced in: -
+
+##### pipeline_scan_thread_pool_queue_size
+
+- Default: 102400
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum task queue length of the SCAN thread pool of the pipeline execution engine.
+- Introduced in: -
+
+##### pk_index_parallel_get_threadpool_size
+
+- Default: 1048576
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Sets the maximum queue size (number of pending tasks) for the "cloud_native_pk_index_get" thread pool used by PK index parallel get operations in shared-data (cloud-native/lake) mode. The actual thread count for that pool is controlled by `pk_index_parallel_get_threadpool_max_threads`; this setting only limits how many tasks may be queued awaiting execution. The very large default (2^20) effectively makes the queue unbounded; lowering it prevents excessive memory growth from queued tasks but may cause task submissions to block or fail when the queue is full. Tune together with `pk_index_parallel_get_threadpool_max_threads` based on workload concurrency and memory constraints.
+- Introduced in: -
+
+##### priority_queue_remaining_tasks_increased_frequency
+
+- Default: 512
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Controls how often the BlockingPriorityQueue increases ("ages") the priority of all remaining tasks to avoid starvation. Each successful get/pop increments an internal `_upgrade_counter`; when `_upgrade_counter` exceeds `priority_queue_remaining_tasks_increased_frequency`, the queue increments every element's priority, rebuilds the heap, and resets the counter. Lower values cause more frequent priority aging (reducing starvation but increasing CPU cost due to iterating and re-heapifying); higher values reduce that overhead but delay priority adjustments. The value is a simple operation count threshold, not a time duration.
+- Introduced in: v3.2.0
+
+##### query_cache_capacity
+
+- Default: 536870912
+- Type: Int
+- Unit: Bytes
+- Is mutable: No
+- Description: The size of the query cache in the BE. The default size is 512 MB. The size cannot be less than 4 MB. If the memory capacity of the BE is insufficient to provision your expected query cache size, you can increase the memory capacity of the BE.
+- Introduced in: -
+
+##### query_pool_spill_mem_limit_threshold
+
+- Default: 1.0
+- Type: Double
+- Unit: -
+- Is mutable: No
+- Description: If automatic spilling is enabled, when the memory usage of all queries exceeds `query_pool memory limit * query_pool_spill_mem_limit_threshold`, intermediate result spilling is triggered.
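+- Introduced in: v3.2.7
+
+As a worked example of this threshold (with hypothetical numbers): if the query pool memory limit on a BE is 32 GB and this item keeps its default value of `1.0`, spilling starts once total query memory usage crosses 32 GB:
+
+```SQL
+-- Hypothetical sizing: 32 GB query pool x 1.0 threshold = 32 GB trigger point.
+SELECT 32 * 1.0 AS spill_trigger_gb;
+```
+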
+##### query_scratch_dirs
+
+- Default: `${STARROCKS_HOME}`
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: A list of writable scratch directories used by query execution to spill intermediate data (for example, external sorts, hash joins, and other operators). Specify one or more paths separated by `;` (e.g. `/mnt/ssd1/tmp;/mnt/ssd2/tmp`). Directories should be accessible and writable by the BE process and have sufficient free space; StarRocks will pick among them to distribute spill I/O. Changes require a restart to take effect. If a directory is missing, not writable, or full, spilling may fail or degrade query performance.
+- Introduced in: v3.2.0
+
+##### result_buffer_cancelled_interval_time
+
+- Default: 300
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The wait time before BufferControlBlock releases data.
+- Introduced in: -
+
+##### scan_context_gc_interval_min
+
+- Default: 5
+- Type: Int
+- Unit: Minutes
+- Is mutable: Yes
+- Description: The time interval at which to clean the Scan Context.
+- Introduced in: -
+
+##### scanner_row_num
+
+- Default: 16384
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum row count returned by each scan thread in a scan.
+- Introduced in: -
+
+##### scanner_thread_pool_queue_size
+
+- Default: 102400
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of scan tasks supported by the storage engine.
+- Introduced in: -
+
+##### scanner_thread_pool_thread_num
+
+- Default: 48
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The number of threads that the storage engine uses for concurrent storage volume scanning. All threads are managed in the thread pool.
+- Introduced in: -
+
+##### string_prefix_zonemap_prefix_len
+
+- Default: 16
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The prefix length used for string ZoneMap min/max values when `enable_string_prefix_zonemap` is enabled.
+- Introduced in: -
+
+##### udf_thread_pool_size
+
+- Default: 1
+- Type: Int
+- Unit: Threads
+- Is mutable: No
+- Description: Sets the size of the UDF call PriorityThreadPool created in ExecEnv (used for executing user-defined functions and UDF-related tasks). The value is used as both the pool thread count and the pool queue capacity when constructing the thread pool (`PriorityThreadPool("udf", thread_num, queue_size)`). Increase it to allow more concurrent UDF executions; keep it small to avoid excessive CPU and memory contention.
+- Introduced in: v3.2.0
+
+##### update_memory_limit_percent
+
+- Default: 60
+- Type: Int
+- Unit: Percent
+- Is mutable: No
+- Description: The fraction of the BE process memory reserved for update-related memory and caches. During startup, `GlobalEnv` computes the `MemTracker` for updates as `process_mem_limit * clamp(update_memory_limit_percent, 0, 100) / 100`. `UpdateManager` also uses this percentage to size its primary index cache capacity (index cache capacity = `GlobalEnv::process_mem_limit * update_memory_limit_percent / 100`). The HTTP config update logic registers a callback that calls `update_primary_index_memory_limit` on the update managers, so a changed value would be propagated to the update subsystem. Increasing this value gives more memory to update and primary-index paths (reducing memory available for other pools); decreasing it reduces update memory and cache capacity. Values are clamped to the range 0–100.
+##### vector_chunk_size
+
+- Default: 4096
+- Type: Int
+- Unit: Rows
+- Is mutable: No
+- Description: The number of rows per vectorized chunk (batch) used throughout the execution and storage code paths. This value controls Chunk and RuntimeState batch_size creation, affects operator throughput, memory footprint per operator, spill and sort buffer sizing, and I/O heuristics (for example, the ORC writer natural write size). Increasing it can improve CPU and I/O efficiency for wide or CPU-bound workloads but raises peak memory usage and can increase latency for small-result queries. Tune it only when profiling shows batch size is a bottleneck; otherwise keep the default for balanced memory and performance.
+- Introduced in: v3.2.0
+
+### Loading
+
+##### clear_transaction_task_worker_count
+
+- Default: 1
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of threads used for clearing transactions.
+- Introduced in: -
+
+##### column_mode_partial_update_insert_batch_size
+
+- Default: 4096
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Batch size for column mode partial update when processing inserted rows. If this item is set to `0` or a negative value, it is clamped to `1` to avoid an infinite loop. This item controls the number of newly inserted rows processed in each batch. Larger values can improve write performance but consume more memory.
+- Introduced in: v3.5.10, v4.0.2
+
+##### enable_load_spill_parallel_merge
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Specifies whether to enable parallel spill merge within a single tablet. Enabling this can improve the performance of spill merge during data loading.
+- Introduced in: -
+
+##### enable_stream_load_verbose_log
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Specifies whether to log the HTTP requests and responses for Stream Load jobs.
+- Introduced in: v2.5.17, v3.0.9, v3.1.6, v3.2.1
+
+##### flush_thread_num_per_store
+
+- Default: 2
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The number of threads that are used for flushing MemTable in each store.
+- Introduced in: -
+
+##### lake_flush_thread_num_per_store
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The number of threads that are used for flushing MemTable in each store in a shared-data cluster.
+When this value is set to `0`, the system uses twice the CPU core count as the value.
+When this value is set to less than `0`, the system uses the product of its absolute value and the CPU core count as the value.
+- Introduced in: v3.1.12, v3.2.7
+
+##### load_data_reserve_hours
+
+- Default: 4
+- Type: Int
+- Unit: Hours
+- Is mutable: No
+- Description: The reservation time for the files produced by small-scale load jobs.
+- Introduced in: -
+
+##### load_error_log_reserve_hours
+
+- Default: 48
+- Type: Int
+- Unit: Hours
+- Is mutable: Yes
+- Description: The time for which data loading logs are reserved.
+- Introduced in: -
+
+##### load_process_max_memory_limit_bytes
+
+- Default: 107374182400
+- Type: Int
+- Unit: Bytes
+- Is mutable: No
+- Description: The maximum size limit of memory resources that can be taken up by all load processes on a BE node.
+- Introduced in: -
+
+##### load_spill_memory_usage_per_merge
+
+- Default: 1073741824
+- Type: Int
+- Unit: Bytes
+- Is mutable: Yes
+- Description: The maximum memory usage per merge operation during spill merge. The default is 1 GB (1073741824 bytes). This parameter controls the memory consumption of individual merge tasks during spill merge at data loading to prevent excessive memory usage.
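+- Introduced in: -
+
+Because this item is mutable, it can be adjusted without a restart. A minimal sketch, assuming a version that supports updating mutable BE items through the `information_schema.be_configs` view (on older versions, use the BE HTTP `update_config` interface instead):
+
+```SQL
+-- Raise the per-merge spill memory cap from 1 GB to 2 GB on all BEs.
+UPDATE information_schema.be_configs
+SET VALUE = '2147483648'
+WHERE NAME = 'load_spill_memory_usage_per_merge';
+```
+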
+##### max_consumer_num_per_group
+
+- Default: 3
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of consumers in a consumer group of Routine Load.
+- Introduced in: -
+
+##### max_runnings_transactions_per_txn_map
+
+- Default: 100
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of transactions that can run concurrently in each partition.
+- Introduced in: -
+
+##### number_tablet_writer_threads
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The number of tablet writer threads used in ingestion, such as Stream Load, Broker Load, and INSERT. When the parameter is set to less than or equal to `0`, the system uses half of the number of CPU cores, with a minimum of 16. When the parameter is set to greater than `0`, the system uses that value. This configuration is changed to dynamic from v3.1.7 onwards.
+- Introduced in: -
+
+##### push_worker_count_high_priority
+
+- Default: 3
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of threads used to handle a load task with HIGH priority.
+- Introduced in: -
+
+##### push_worker_count_normal_priority
+
+- Default: 3
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of threads used to handle a load task with NORMAL priority.
+- Introduced in: -
+
+##### streaming_load_max_batch_size_mb
+
+- Default: 100
+- Type: Int
+- Unit: MB
+- Is mutable: Yes
+- Description: The maximum size of a JSON file that can be streamed into StarRocks.
+- Introduced in: -
+
+##### streaming_load_max_mb
+
+- Default: 102400
+- Type: Int
+- Unit: MB
+- Is mutable: Yes
+- Description: The maximum size of a file that can be streamed into StarRocks. From v3.0, the default value has been changed from `10240` to `102400`.
+- Introduced in: -
+
+##### streaming_load_rpc_max_alive_time_sec
+
+- Default: 1200
+- Type: Int
+- Unit: Seconds
+- Is mutable: No
+- Description: The RPC timeout for Stream Load.
+- Introduced in: -
+
+##### transaction_publish_version_thread_pool_idle_time_ms
+
+- Default: 60000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: No
+- Description: The idle time before a thread is reclaimed by the Publish Version thread pool.
+- Introduced in: -
+
+##### transaction_publish_version_worker_count
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of threads used to publish a version. When this value is set to less than or equal to `0`, the system uses the CPU core count as the value, so as to avoid insufficient thread resources when load concurrency is high but only a fixed number of threads are used. From v2.5, the default value has been changed from `8` to `0`.
+- Introduced in: -
+
+##### write_buffer_size
+
+- Default: 104857600
+- Type: Int
+- Unit: Bytes
+- Is mutable: Yes
+- Description: The buffer size of MemTable in the memory. This configuration item is the threshold to trigger a flush.
+- Introduced in: -
+
+### Loading and unloading
+
+##### broker_write_timeout_seconds
+
+- Default: 30
+- Type: Int
+- Unit: Seconds
+- Is mutable: No
+- Description: Timeout (in seconds) used by backend broker operations for write and I/O RPCs.
The value is multiplied by 1000 to produce millisecond timeouts and is passed as the default timeout_ms to BrokerFileSystem and BrokerServiceConnection instances (e.g., file export and snapshot upload/download). Increase this when brokers or network are slow or when transferring large files to avoid premature timeouts; decreasing it may cause broker RPCs to fail earlier. This value is defined in common/config and is applied at process start (not dynamically reloadable). +- Introduced in: v3.2.0 + +##### enable_load_channel_rpc_async + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: When enabled, handling of load-channel open RPCs (for example, `PTabletWriterOpen`) is offloaded from the BRPC worker to a dedicated thread pool: the request handler creates a `ChannelOpenTask` and submits it to the internal `_async_rpc_pool` instead of running `LoadChannelMgr::_open` inline. This reduces work and blocking inside BRPC threads and allows tuning concurrency via `load_channel_rpc_thread_pool_num` and `load_channel_rpc_thread_pool_queue_size`. If the thread pool submission fails (when pool is full or shut down), the request is canceled and an error status is returned. The pool is shut down on `LoadChannelMgr::close()`, so consider capacity and lifecycle when you want to enable this feature so as to avoid request rejections or delayed processing. +- Introduced in: v3.5.0 + +##### enable_load_diagnose + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: When enabled, StarRocks will attempt an automated load diagnosis from BE OlapTableSink/NodeChannel after a brpc timeout matching "[E1008]Reached timeout". The code creates a `PLoadDiagnoseRequest` and sends an RPC to the remote LoadChannel to collect a profile and/or stack trace (controlled by `load_diagnose_rpc_timeout_profile_threshold_ms` and `load_diagnose_rpc_timeout_stack_trace_threshold_ms`). The diagnose RPC uses `load_diagnose_send_rpc_timeout_ms` as its timeout. Diagnosis is skipped if a diagnose request is already in progress. Enabling this produces additional RPCs and profiling work on target nodes; disable on sensitive production workloads to avoid extra overhead. +- Introduced in: v3.5.0 + +##### enable_load_segment_parallel + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: When enabled, rowset segment loading and rowset-level reads are performed concurrently using StarRocks background thread pools (ExecEnv::load_segment_thread_pool and ExecEnv::load_rowset_thread_pool). Rowset::load_segments and TabletReader::get_segment_iterators submit per-segment or per-rowset tasks to these pools, falling back to serial loading and logging a warning if submission fails. Enable this to reduce read/load latency for large rowsets at the cost of increased CPU/IO concurrency and memory pressure. Note: parallel loading can change the load completion order of segments and therefore prevents partial compaction (code checks `_parallel_load` and disables partial compaction when enabled); consider implications for operations that rely on segment order. +- Introduced in: v3.3.0, v3.4.0, v3.5.0 + +##### enable_streaming_load_thread_pool + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: Controls whether streaming load scanners are submitted to the dedicated streaming load thread pool. 
When enabled and a query is a LOAD with `TLoadJobType::STREAM_LOAD`, ConnectorScanNode submits scanner tasks to the `streaming_load_thread_pool` (which is configured with INT32_MAX threads and queue sizes, i.e. effectively unbounded). When disabled, scanners use the general `thread_pool` and its `PriorityThreadPool` submission logic (priority computation, try_offer/offer behavior). Enabling isolates streaming-load work from regular query execution to reduce interference; however, because the dedicated pool is effectively unbounded, enabling may increase concurrent threads and resource usage under heavy streaming-load traffic. This option is on by default and typically does not require modification. +- Introduced in: v3.2.0 + +##### es_http_timeout_ms + +- Default: 5000 +- Type: Int +- Unit: Milliseconds +- Is mutable: No +- Description: HTTP connection timeout (in milliseconds) used by the ES network client in ESScanReader for Elasticsearch scroll requests. This value is applied via network_client.set_timeout_ms() before sending subsequent scroll POSTs and controls how long the client waits for an ES response during scrolling. Increase this value for slow networks or large queries to avoid premature timeouts; decrease to fail faster on unresponsive ES nodes. This setting complements `es_scroll_keepalive`, which controls the scroll context keep-alive duration. +- Introduced in: v3.2.0 + +##### es_index_max_result_window + +- Default: 10000 +- Type: Int +- Unit: - +- Is mutable: No +- Description: Limits the maximum number of documents StarRocks will request from Elasticsearch in a single batch. StarRocks sets the ES request batch size to min(`es_index_max_result_window`, `chunk_size`) when building `KEY_BATCH_SIZE` for the ES reader. If an ES request exceeds the Elasticsearch index setting `index.max_result_window`, Elasticsearch returns HTTP 400 (Bad Request). Adjust this value when scanning large indexes or increase the ES `index.max_result_window` on the Elasticsearch side to permit larger single requests. +- Introduced in: v3.2.0 + +##### ignore_load_tablet_failure + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: When this item is set to `false`, the system will treat any tablet header load failures (non-NotFound and non-AlreadyExist errors) as fatal: the code logs the error and calls LOG(FATAL) to stop the BE process. When it is set to `true`, the BE continues startup despite such per-tablet load errors — failed tablet IDs are recorded and skipped while successful tablets are still loaded. Note that this parameter does NOT suppress fatal errors from the RocksDB meta scan itself, which always cause the process to quit. +- Introduced in: v3.2.0 + +##### load_channel_abort_clean_up_delay_seconds + +- Default: 600 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: Controls how long (in seconds) the system keeps the load IDs of aborted load channels before removing them from `_aborted_load_channels`. When a load job is cancelled or fails, the load ID stays recorded so any late-arriving load RPCs can be rejected immediately; once the delay expires, the entry is cleaned during the periodic background sweep (minimum sweep interval is 60 seconds). Setting the delay too low risks accepting stray RPCs after an abort, while setting it too high may retain state and consume resources longer than necessary. Tune this to balance correctness of late-request rejection and resource retention for aborted loads. 
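+- Introduced in: v3.5.11, v4.0.4
+
+The load diagnose items in this section (`enable_load_diagnose` above and the `load_diagnose_*` thresholds below) gate one another, so it can help to review them as a group. A quick check, assuming the `information_schema.be_configs` view is available:
+
+```SQL
+-- Review the load diagnose knobs together.
+SELECT BE_ID, NAME, VALUE
+FROM information_schema.be_configs
+WHERE NAME LIKE 'load_diagnose%' OR NAME = 'enable_load_diagnose';
+```
+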
+##### load_channel_rpc_thread_pool_num
+
+- Default: -1
+- Type: Int
+- Unit: Threads
+- Is mutable: Yes
+- Description: The maximum number of threads for the load-channel async RPC thread pool. When set to less than or equal to `0` (default `-1`), the pool size is auto-set to the number of CPU cores (`CpuInfo::num_cores()`). The configured value is used as ThreadPoolBuilder's max threads, and the pool's min threads are set to min(5, max_threads). The pool queue size is controlled separately by `load_channel_rpc_thread_pool_queue_size`. This setting was introduced to align the async RPC pool size with brpc workers' default (`brpc_num_threads`) so behavior remains compatible after switching load RPC handling from synchronous to asynchronous. Changing this config at runtime triggers `ExecEnv::GetInstance()->load_channel_mgr()->async_rpc_pool()->update_max_threads(...)`.
+- Introduced in: v3.5.0
+
+##### load_channel_rpc_thread_pool_queue_size
+
+- Default: 1024000
+- Type: Int
+- Unit: Count
+- Is mutable: No
+- Description: Sets the maximum pending-task queue size for the load channel RPC thread pool created by LoadChannelMgr. This thread pool executes asynchronous `open` requests when `enable_load_channel_rpc_async` is enabled; the pool size is paired with `load_channel_rpc_thread_pool_num`. The large default (1024000) aligns with brpc workers' defaults to preserve behavior after switching from synchronous to asynchronous handling. If the queue is full, ThreadPool::submit() will fail and the incoming open RPC is cancelled with an error, causing the caller to receive a rejection. Increase this value to buffer larger bursts of concurrent `open` requests; reducing it tightens backpressure but may cause more rejections under load.
+- Introduced in: v3.5.0
+
+##### load_diagnose_rpc_timeout_profile_threshold_ms
+
+- Default: 60000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: When a load RPC times out (the error contains "[E1008]Reached timeout") and `enable_load_diagnose` is true, this threshold controls whether a full profiling diagnose is requested. If the request-level RPC timeout `_rpc_timeout_ms` is greater than `load_diagnose_rpc_timeout_profile_threshold_ms`, profiling is enabled for that diagnose. For smaller `_rpc_timeout_ms` values, profiling is sampled once every 20 timeouts to avoid frequent heavy diagnostics for real-time/short-timeout loads. This value affects the `profile` flag in the `PLoadDiagnoseRequest` sent; stack-trace behavior is controlled separately by `load_diagnose_rpc_timeout_stack_trace_threshold_ms` and the send timeout by `load_diagnose_send_rpc_timeout_ms`.
+- Introduced in: v3.5.0
+
+##### load_diagnose_rpc_timeout_stack_trace_threshold_ms
+
+- Default: 600000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: Threshold (in ms) used to decide when to request remote stack traces for long-running load RPCs. When a load RPC times out with a timeout error and the effective RPC timeout (`_rpc_timeout_ms`) exceeds this value, `OlapTableSink`/`NodeChannel` will include `stack_trace=true` in a `load_diagnose` RPC to the target BE so the BE can return stack traces for debugging. `LocalTabletsChannel::SecondaryReplicasWaiter` also triggers a best-effort stack-trace diagnose from the primary if waiting for secondary replicas exceeds this interval.
This behavior requires `enable_load_diagnose` and uses `load_diagnose_send_rpc_timeout_ms` for the diagnose RPC timeout; profiling is gated separately by `load_diagnose_rpc_timeout_profile_threshold_ms`. Lowering this value increases how aggressively stack traces are requested. +- Introduced in: v3.5.0 + +##### load_diagnose_send_rpc_timeout_ms + +- Default: 2000 +- Type: Int +- Unit: Milliseconds +- Is mutable: Yes +- Description: Timeout (in milliseconds) applied to diagnosis-related brpc calls initiated by BE load paths. It is used to set the controller timeout for `load_diagnose` RPCs (sent by NodeChannel/OlapTableSink when a LoadChannel brpc call times out) and for replica-status queries (used by SecondaryReplicasWaiter / LocalTabletsChannel when checking primary replica state). Choose a value high enough to allow the remote side to respond with profile or stack-trace data, but not so high that failure handling is delayed. This parameter works together with `enable_load_diagnose`, `load_diagnose_rpc_timeout_profile_threshold_ms`, and `load_diagnose_rpc_timeout_stack_trace_threshold_ms` which control when and what diagnostic information is requested. +- Introduced in: v3.5.0 + +##### load_fp_brpc_timeout_ms + +- Default: -1 +- Type: Int +- Unit: Milliseconds +- Is mutable: Yes +- Description: Overrides the per-channel brpc RPC timeout used by OlapTableSink when the `node_channel_set_brpc_timeout` fail point is triggered. If set to a positive value, NodeChannel will set its internal `_rpc_timeout_ms` to this value (in milliseconds) causing open/add-chunk/cancel RPCs to use the shorter timeout and enabling simulation of brpc timeouts that produce the "[E1008]Reached timeout" error. Default (`-1`) disables the override. Changing this value is intended for testing and fault injection; small values may produce false timeouts and trigger load diagnostics (see `enable_load_diagnose`, `load_diagnose_rpc_timeout_profile_threshold_ms`, `load_diagnose_rpc_timeout_stack_trace_threshold_ms`, and `load_diagnose_send_rpc_timeout_ms`). +- Introduced in: v3.5.0 + +##### load_fp_tablets_channel_add_chunk_block_ms + +- Default: -1 +- Type: Int +- Unit: Milliseconds +- Is mutable: Yes +- Description: When enabled (set to a positive milliseconds value) this fail-point configuration makes TabletsChannel::add_chunk sleep for the specified time during load processing. It is used to simulate BRPC timeout errors (e.g., "[E1008]Reached timeout") and to emulate an expensive add_chunk operation that increases load latency. A value less than or equal to 0 (default `-1`) disables the injection. Intended for testing fault handling, timeouts, and replica synchronization behavior — do not enable in normal production workloads as it delays write completion and can trigger upstream timeouts or replica aborts. +- Introduced in: v3.5.0 + +##### load_segment_thread_pool_num_max + +- Default: 128 +- Type: Int +- Unit: - +- Is mutable: No +- Description: Sets the maximum number of worker threads for BE load-related thread pools. This value is used by ThreadPoolBuilder to limit threads for both `load_rowset_pool` and `load_segment_pool` in exec_env.cpp, controlling concurrency for processing loaded rowsets and segments (e.g., decoding, indexing, writing) during streaming and batch loads. Increasing this value raises parallelism and can improve load throughput but also increases CPU, memory usage, and potential contention; decreasing it limits concurrent load processing and may reduce throughput. 
Tune together with `load_segment_thread_pool_queue_size` and `streaming_load_thread_pool_idle_time_ms`. Change requires BE restart. +- Introduced in: v3.3.0, v3.4.0, v3.5.0 + +##### load_segment_thread_pool_queue_size + +- Default: 10240 +- Type: Int +- Unit: Tasks +- Is mutable: No +- Description: Sets the maximum queue length (number of pending tasks) for the load-related thread pools created as "load_rowset_pool" and "load_segment_pool". These pools use `load_segment_thread_pool_num_max` for their max thread count and this configuration controls how many load segment/rowset tasks can be buffered before the ThreadPool's overflow policy takes effect (further submissions may be rejected or blocked depending on the ThreadPool implementation). Increase to allow more pending load work (uses more memory and can raise latency); decrease to limit buffered load concurrency and reduce memory usage. +- Introduced in: v3.3.0, v3.4.0, v3.5.0 + +##### max_pulsar_consumer_num_per_group + +- Default: 10 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: Controls the maximum number of Pulsar consumers that may be created in a single data consumer group for routine load on a BE. Because cumulative acknowledge is not supported for multi-topic subscriptions, each consumer subscribes exactly one topic/partition; if the number of partitions in `pulsar_info->partitions` exceeds this value, group creation fails with an error advising to increase `max_pulsar_consumer_num_per_group` on the BE or add more BEs. This limit is enforced when constructing a PulsarDataConsumerGroup and prevents a BE from hosting more than this many consumers for one routine load group. For Kafka routine load, `max_consumer_num_per_group` is used instead. +- Introduced in: v3.2.0 + +##### pull_load_task_dir + +- Default: `${STARROCKS_HOME}/var/pull_load` +- Type: string +- Unit: - +- Is mutable: No +- Description: Filesystem path where the BE stores data and working files for "pull load" tasks (downloaded source files, task state, temporary output, etc.). The directory must be writable by the BE process and have sufficient disk space for incoming loads. The default is relative to STARROCKS_HOME; tests create and expect this directory to exist (see test configuration). +- Introduced in: v3.2.0 + +##### routine_load_kafka_timeout_second + +- Default: 10 +- Type: Int +- Unit: Seconds +- Is mutable: No +- Description: Timeout in seconds used for Kafka-related routine load operations. When a client request does not specify a timeout, `routine_load_kafka_timeout_second` is used as the default RPC timeout (converted to milliseconds) for `get_info`. It is also used as the per-call consume poll timeout for the librdkafka consumer (converted to milliseconds and capped by remaining runtime). Note: the internal `get_info` path reduces this value to 80% before passing it to librdkafka to avoid FE-side timeout races. Set this to a value that balances timely failure reporting and sufficient time for network/broker responses; changes require a restart because the setting is not mutable. +- Introduced in: v3.2.0 + +##### routine_load_pulsar_timeout_second + +- Default: 10 +- Type: Int +- Unit: Seconds +- Is mutable: No +- Description: Default timeout (in seconds) that the BE uses for Pulsar-related routine load operations when the request does not supply an explicit timeout. 
Specifically, `PInternalServiceImplBase::get_pulsar_info` multiplies this value by 1000 to form the millisecond timeout passed to the routine load task executor methods that fetch Pulsar partition metadata and backlog. Increase to allow slower Pulsar responses at the cost of longer failure detection; decrease to fail faster on slow brokers. Analogous to `routine_load_kafka_timeout_second` used for Kafka. +- Introduced in: v3.2.0 + +##### streaming_load_thread_pool_idle_time_ms + +- Default: 2000 +- Type: Int +- Unit: Milliseconds +- Is mutable: No +- Description: Sets the thread idle timeout (in milliseconds) for streaming-load related thread pools. The value is used as the idle timeout passed to ThreadPoolBuilder for the `stream_load_io` pool and also for `load_rowset_pool` and `load_segment_pool`. Threads in these pools are reclaimed when idle for this duration; lower values reduce idle resource usage but increase thread creation overhead, while higher values keep threads alive longer. The `stream_load_io` pool is used when `enable_streaming_load_thread_pool` is enabled. +- Introduced in: v3.2.0 + +##### streaming_load_thread_pool_num_min + +- Default: 0 +- Type: Int +- Unit: - +- Is mutable: No +- Description: Minimum number of threads for the streaming load IO thread pool ("stream_load_io") created during ExecEnv initialization. The pool is built with `set_max_threads(INT32_MAX)` and `set_max_queue_size(INT32_MAX)` so it is effectively unbounded to avoid deadlocks for concurrent streaming loads. A value of 0 lets the pool start with no threads and grow on demand; setting a positive value reserves that many threads at startup. This pool is used when `enable_streaming_load_thread_pool` is true and its idle timeout is controlled by `streaming_load_thread_pool_idle_time_ms`. Overall concurrency is still constrained by `fragment_pool_thread_num_max` and `webserver_num_workers`; changing this value is rarely necessary and may increase resource usage if set too high. +- Introduced in: v3.2.0 + +### Statistic report + +##### enable_metric_calculator + +- Default: true +- Type: boolean +- Unit: - +- Is mutable: No +- Description: When true, the BE process launches a background "metrics_daemon" thread (started in Daemon::init on non-Apple platforms) that runs every ~15 seconds to invoke `StarRocksMetrics::instance()->metrics()->trigger_hook()` and compute derived/system metrics (e.g., push/query bytes/sec, max disk I/O util, max network send/receive rates), log memory breakdowns and run table metrics cleanup. When false, those hooks are executed synchronously inside `MetricRegistry::collect` at metric collection time, which can increase metric-scrape latency. Requires process restart to take effect. +- Introduced in: v3.2.0 + +##### enable_system_metrics + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: When true, StarRocks initializes system-level monitoring during startup: it discovers disk devices from the configured store paths and enumerates network interfaces, then passes this information into the metrics subsystem to enable collection of disk I/O, network traffic and memory-related system metrics. If device or interface discovery fails, initialization logs a warning and aborts system metrics setup. This flag only controls whether system metrics are initialized; periodic metric aggregation threads are controlled separately by `enable_metric_calculator`, and JVM metrics initialization is controlled by `enable_jvm_metrics`. 
Changing this value requires a restart.
+- Introduced in: v3.2.0
+
+##### profile_report_interval
+
+- Default: 30
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The interval in seconds that the ProfileReportWorker uses to (1) decide when to report per-fragment profile information for LOAD queries and (2) sleep between reporting cycles. The worker compares the current time against each task's last_report_time using (profile_report_interval * 1000) ms to determine whether a profile should be re-reported for both non-pipeline and pipeline load tasks. At each loop the worker reads the current value (mutable at runtime); if the configured value is less than or equal to 0, the worker forces it to 1 and emits a warning. Changing this value affects the next reporting decision and sleep duration.
+- Introduced in: v3.2.0
+
+##### report_disk_state_interval_seconds
+
+- Default: 60
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval at which to report the storage volume state, which includes the size of data within the volume.
+- Introduced in: -
+
+##### report_resource_usage_interval_ms
+
+- Default: 1000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: The interval, in milliseconds, between periodic resource-usage reports sent by the BE agent to the FE (master). The agent worker thread collects TResourceUsage (number of running queries, memory used/limit, CPU used permille, and resource-group usages) and calls report_task, then sleeps for this configured interval (see task_worker_pool). Lower values increase reporting timeliness but raise CPU, network, and master load; higher values reduce overhead but make resource information less current. The reporting updates related metrics (report_resource_usage_requests_total, report_resource_usage_requests_failed). Tune according to cluster scale and FE load.
+- Introduced in: v3.2.0
+
+##### report_tablet_interval_seconds
+
+- Default: 60
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval at which to report the most updated version of all tablets.
+- Introduced in: -
+
+##### report_task_interval_seconds
+
+- Default: 10
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval at which to report the state of a task. A task can be creating a table, dropping a table, loading data, or changing a table schema.
+- Introduced in: -
+
+##### report_workgroup_interval_seconds
+
+- Default: 5
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval at which to report the most updated version of all workgroups.
+- Introduced in: -
+
+### Storage
+
+##### alter_tablet_worker_count
+
+- Default: 3
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The number of threads used for Schema Change.
+- Introduced in: -
+
+##### avro_ignore_union_type_tag
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to strip the type tag from the JSON string serialized from the Avro Union data type.
+- Introduced in: v3.3.7, v3.4
+
+##### base_compaction_check_interval_seconds
+
+- Default: 60
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval of thread polling for a Base Compaction.
+- Introduced in: -
+
+##### base_compaction_interval_seconds_since_last_operation
+
+- Default: 86400
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval since the last Base Compaction.
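This configuration item is one of the conditions that trigger a Base Compaction.
+- Introduced in: -
+
+Both Base Compaction triggers above are mutable. A minimal sketch of tightening the polling interval, assuming a version that supports updating mutable BE items through `information_schema.be_configs` (older versions can use the BE HTTP `update_config` interface instead):
+
+```SQL
+-- Poll for Base Compaction every 30 seconds instead of 60.
+UPDATE information_schema.be_configs
+SET VALUE = '30'
+WHERE NAME = 'base_compaction_check_interval_seconds';
+```
+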
+##### base_compaction_num_threads_per_disk
+
+- Default: 1
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of threads used for Base Compaction on each storage volume.
+- Introduced in: -
+
+##### base_cumulative_delta_ratio
+
+- Default: 0.3
+- Type: Double
+- Unit: -
+- Is mutable: Yes
+- Description: The ratio of cumulative file size to base file size. The ratio reaching this value is one of the conditions that trigger the Base Compaction.
+- Introduced in: -
+
+##### chaos_test_enable_random_compaction_strategy
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When this item is set to `true`, TabletUpdates::compaction() uses the random compaction strategy (compaction_random) intended for chaos engineering tests. This flag forces compaction to follow a nondeterministic/random policy instead of normal strategies (e.g., size-tiered compaction), and takes precedence during compaction selection for the tablet. It is intended only for controlled testing: enabling it can produce unpredictable compaction order, increased I/O and CPU usage, and test flakiness. Do not enable it in production; use it only for fault-injection or chaos-test scenarios.
+- Introduced in: v3.3.12, v3.4.2, v3.5.0, v4.0.0
+
+##### check_consistency_worker_count
+
+- Default: 1
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of threads used for checking the consistency of tablets.
+- Introduced in: -
+
+##### clear_expired_replication_snapshots_interval_seconds
+
+- Default: 3600
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval at which the system clears the expired snapshots left by abnormal replications.
+- Introduced in: v3.3.5
+
+##### compact_threads
+
+- Default: 4
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of threads used for concurrent compaction tasks. This configuration is changed to dynamic from v3.1.7 and v3.2.2 onwards.
+- Introduced in: v3.0.0
+
+##### compaction_max_memory_limit
+
+- Default: -1
+- Type: Long
+- Unit: Bytes
+- Is mutable: No
+- Description: The global upper bound (in bytes) for memory available to compaction tasks on this BE. During BE initialization, the final compaction memory limit is computed as min(`compaction_max_memory_limit`, process_mem_limit * `compaction_max_memory_limit_percent` / 100). If `compaction_max_memory_limit` is negative (default `-1`), it falls back to the BE process memory limit derived from `mem_limit`. The percent value is clamped to [0, 100]. If the process memory limit is not set (negative), compaction memory remains unlimited (`-1`). This computed value is used to initialize the `_compaction_mem_tracker`. See also `compaction_max_memory_limit_percent` and `compaction_memory_limit_per_worker`.
+- Introduced in: v3.2.0
+
+##### compaction_max_memory_limit_percent
+
+- Default: 100
+- Type: Int
+- Unit: Percent
+- Is mutable: No
+- Description: The percentage of the BE process memory that may be used for compaction. The BE computes the compaction memory cap as the minimum of `compaction_max_memory_limit` and (process memory limit × this percent / 100). If this value is < 0 or > 100, it is treated as 100. If `compaction_max_memory_limit` < 0, the process memory limit is used instead. The calculation also considers the BE process memory derived from `mem_limit`.
Combined with `compaction_memory_limit_per_worker` (per-worker cap), this setting controls total compaction memory available and therefore affects compaction concurrency and OOM risk. +- Introduced in: v3.2.0 + +##### compaction_memory_limit_per_worker + +- Default: 2147483648 +- Type: Int +- Unit: Bytes +- Is mutable: No +- Description: The maximum memory size allowed for each Compaction thread. +- Introduced in: - + +##### compaction_trace_threshold + +- Default: 60 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: The time threshold for each compaction. If a compaction takes more time than the time threshold, StarRocks prints the corresponding trace. +- Introduced in: - + +##### create_tablet_worker_count + +- Default: 3 +- Type: Int +- Unit: Threads +- Is mutable: Yes +- Description: Sets the maximum number of worker threads in the AgentServer thread pool that process TTaskType::CREATE (create-tablet) tasks submitted by FE. At BE startup this value is used as the thread-pool max (the pool is created with min threads = 1 and max queue size = unlimited), and changing it at runtime triggers `ExecEnv::agent_server()->get_thread_pool(TTaskType::CREATE)->update_max_threads(...)`. Increase this to raise concurrent tablet creation throughput (useful during bulk load or partition creation); decreasing it throttles concurrent create operations. Raising the value increases CPU, memory and I/O concurrency and may cause contention; the thread pool enforces at least one thread, so values less than 1 have no practical effect. +- Introduced in: v3.2.0 + +##### cumulative_compaction_check_interval_seconds + +- Default: 1 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: The time interval of thread polling for a Cumulative Compaction. +- Introduced in: - + +##### cumulative_compaction_num_threads_per_disk + +- Default: 1 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The number of Cumulative Compaction threads per disk. +- Introduced in: - + +##### data_page_size + +- Default: 65536 +- Type: Int +- Unit: Bytes +- Is mutable: No +- Description: Target uncompressed page size (in bytes) used when building column data and index pages. This value is copied into ColumnWriterOptions.data_page_size and IndexedColumnWriterOptions.index_page_size and is consulted by page builders (e.g., BinaryPlainPageBuilder::is_page_full and buffer reservation logic) to decide when to finish a page and how much memory to reserve. A value of 0 disables the page-size limit in builders. Changing this value affects page count, metadata overhead, memory reservation and I/O/compression trade-offs (smaller pages → more pages and metadata; larger pages → fewer pages, potentially better compression but larger memory spikes). +- Introduced in: v3.2.4 + +##### default_num_rows_per_column_file_block + +- Default: 1024 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The maximum number of rows that can be stored in each row block. +- Introduced in: - + +##### delete_worker_count_high_priority + +- Default: 1 +- Type: Int +- Unit: Threads +- Is mutable: No +- Description: Number of worker threads in the DeleteTaskWorkerPool that are allocated as HIGH-priority delete threads. 
On startup AgentServer creates the delete pool with total threads = delete_worker_count_normal_priority + delete_worker_count_high_priority; the first delete_worker_count_high_priority threads are marked to exclusively try to pop TPriority::HIGH tasks (they poll for high-priority delete tasks and sleep/loop if none are available). Increasing this value increases concurrency for high-priority delete requests; decreasing it reduces dedicated capacity and may increase latency for high-priority deletes. +- Introduced in: v3.2.0 + +##### dictionary_encoding_ratio + +- Default: 0.7 +- Type: Double +- Unit: - +- Is mutable: No +- Description: Fraction (0.0–1.0) used by StringColumnWriter during the encode-speculation phase to decide between dictionary (DICT_ENCODING) and plain (PLAIN_ENCODING) encoding for a chunk. The code computes max_card = row_count * `dictionary_encoding_ratio` and scans the chunk’s distinct key count; if the distinct count exceeds max_card the writer chooses PLAIN_ENCODING. The check is performed only when the chunk size passes `dictionary_speculate_min_chunk_size` (and when row_count > dictionary_min_rowcount). Setting the value higher favors dictionary encoding (tolerates more distinct keys); setting it lower causes earlier fallback to plain encoding. A value of 1.0 effectively forces dictionary encoding (distinct count can never exceed row_count). +- Introduced in: v3.2.0 + +##### dictionary_encoding_ratio_for_non_string_column + +- Default: 0 +- Type: double +- Unit: - +- Is mutable: No +- Description: Ratio threshold used to decide whether to use dictionary encoding for non-string columns (numeric, date/time, decimal types). When enabled (value > 0.0001) the writer computes max_card = row_count * dictionary_encoding_ratio_for_non_string_column and, for samples with row_count > `dictionary_min_rowcount`, chooses DICT_ENCODING only if distinct_count ≤ max_card; otherwise it falls back to BIT_SHUFFLE. A value of `0` (default) disables non-string dictionary encoding. This parameter is analogous to `dictionary_encoding_ratio` but applies to non-string columns. Use values in (0,1] — smaller values restrict dictionary encoding to lower-cardinality columns and reduce dictionary memory/IO overhead. +- Introduced in: v3.3.0, v3.4.0, v3.5.0 + +##### dictionary_page_size + +- Default: 1048576 +- Type: Int +- Unit: Bytes +- Is mutable: No +- Description: Size in bytes of dictionary pages used when building rowset segments. This value is read into `PageBuilderOptions::dict_page_size` in the BE rowset code and controls how many dictionary entries can be stored in a single dictionary page. Increasing this value can improve compression ratio for dictionary-encoded columns by allowing larger dictionaries, but larger pages consume more memory during write/encode and can increase I/O and latency when reading or materializing pages. Set conservatively for large-memory, write-heavy workloads and avoid excessively large values to prevent runtime performance degradation. +- Introduced in: v3.3.0, v3.4.0, v3.5.0 + +##### disk_stat_monitor_interval + +- Default: 5 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: The time interval at which to monitor health status of disks. +- Introduced in: - + +##### download_low_speed_limit_kbps + +- Default: 50 +- Type: Int +- Unit: KB/Second +- Is mutable: Yes +- Description: The download speed lower limit of each HTTP request. 
An HTTP request aborts when it constantly runs with a lower speed than this value within the time span specified in the configuration item `download_low_speed_time`. +- Introduced in: - + +##### download_low_speed_time + +- Default: 300 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: The maximum time that an HTTP request can run with a download speed lower than the limit. An HTTP request aborts when it constantly runs with a lower speed than the value of `download_low_speed_limit_kbps` within the time span specified in this configuration item. +- Introduced in: - + +##### download_worker_count + +- Default: 0 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The maximum number of threads for the download tasks of restore jobs on a BE node. `0` indicates setting the value to the number of CPU cores on the machine where the BE resides. +- Introduced in: - + +##### drop_tablet_worker_count + +- Default: 0 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The number of threads used to drop a tablet. `0` indicates half of the CPU cores in the node. +- Introduced in: - + +##### enable_check_string_lengths + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: Whether to check the data length during loading to solve compaction failures caused by out-of-bound VARCHAR data. +- Introduced in: - + +##### enable_event_based_compaction_framework + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: Whether to enable the Event-based Compaction Framework. `true` indicates Event-based Compaction Framework is enabled, and `false` indicates it is disabled. Enabling Event-based Compaction Framework can greatly reduce the overhead of compaction in scenarios where there are many tablets or a single tablet has a large amount of data. +- Introduced in: - + +##### enable_lazy_delta_column_compaction + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: When enabled, compaction will prefer a "lazy" strategy for delta columns produced by partial column updates: StarRocks will avoid eagerly merging delta-column files back into their main segment files to save compaction I/O. In practice the compaction selection code checks for partial column-update rowsets and multiple candidates; if found and this flag is true, the engine will either stop adding further inputs to the compaction or only merge empty rowsets (level -1), leaving delta columns separate. This reduces immediate I/O and CPU during compaction at the cost of delayed consolidation (potentially more segments and temporary storage overhead). Correctness and query semantics are unchanged. +- Introduced in: v3.2.3 + +##### enable_new_load_on_memory_limit_exceeded + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: Whether to allow new loading processes when the hard memory resource limit is reached. `true` indicates new loading processes will be allowed, and `false` indicates they will be rejected. +- Introduced in: v3.3.2 + +##### enable_pk_index_parallel_compaction + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: Whether to enable parallel Compaction for Primary Key index in a shared-data cluster. +- Introduced in: - + +##### enable_pk_index_parallel_execution + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: Whether to enable parallel execution for Primary Key index operations in a shared-data cluster. 
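When enabled, the system uses a thread pool to process segments concurrently during publish operations, significantly improving performance for large tablets.
+- Introduced in: -
+
+You can check the Primary Key index acceleration switches in this group together, assuming the `information_schema.be_configs` view is available:
+
+```SQL
+-- Check the PK index acceleration switches on each BE.
+SELECT BE_ID, NAME, VALUE
+FROM information_schema.be_configs
+WHERE NAME IN ('enable_pk_index_parallel_compaction',
+               'enable_pk_index_parallel_execution',
+               'enable_pk_index_eager_build');
+```
+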
+##### enable_pk_index_eager_build
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to eagerly build Primary Key index files during data import and compaction phases. When enabled, the system generates persistent PK index files immediately during data writes, improving subsequent query performance.
+- Introduced in: -
+
+##### enable_pk_size_tiered_compaction_strategy
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Whether to enable the Size-tiered Compaction policy for Primary Key tables. `true` indicates the Size-tiered Compaction strategy is enabled, and `false` indicates it is disabled.
+- Introduced in: This item takes effect for shared-data clusters from v3.2.4 and v3.1.10 onwards, and for shared-nothing clusters from v3.2.5 and v3.1.10 onwards.
+
+##### enable_rowset_verify
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to verify the correctness of generated rowsets. When enabled, the correctness of the generated rowsets will be checked after Compaction and Schema Change.
+- Introduced in: -
+
+##### enable_size_tiered_compaction_strategy
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Whether to enable the Size-tiered Compaction policy (excluding Primary Key tables). `true` indicates the Size-tiered Compaction strategy is enabled, and `false` indicates it is disabled.
+- Introduced in: -
+
+##### enable_strict_delvec_crc_check
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to perform a strict CRC32 check on delete vectors. When this item is set to `true`, the system performs a strict CRC32 check on the delete vector and returns a failure if a mismatch is detected.
+- Introduced in: -
+
+##### enable_transparent_data_encryption
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: When enabled, StarRocks will create encrypted on-disk artifacts for newly written storage objects (segment files, delete/update files, rowset segments, lake SSTs, persistent index files, etc.). Writers (RowsetWriter/SegmentWriter, lake UpdateManager/LakePersistentIndex, and related code paths) will request encryption info from the KeyCache, attach encryption_info to writable files, and persist encryption_meta into rowset/segment/sstable metadata (segment_encryption_metas, delete/update encryption metadata). The Frontend and Backend/CN encryption flags must match; a mismatch causes the BE to abort on heartbeat (LOG(FATAL)). This flag is not runtime-mutable; enable it before deployment and ensure key management (KEK) and the KeyCache are properly configured and synchronized across the cluster.
+- Introduced in: v3.3.1, v3.4.0, v3.5.0, v4.0.0
+
+##### enable_zero_copy_from_page_cache
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When enabled, FixedLengthColumnBase may avoid copying bytes when appending data that originates from a page-cache-backed buffer. In append_numbers, the code acquires the incoming ContainerResource and sets the column's internal resource pointer (zero-copy) if all conditions are met: the config is true, the incoming resource is owned, the resource memory is aligned for the column element type, the column is empty, and the resource length is a multiple of the element size.
Enabling this reduces CPU and memory-copy overhead and can improve ingestion/scan throughput. Drawbacks: it couples the column lifetime to the acquired buffer and relies on correct ownership/alignment; disable to force safe copying. +- Introduced in: - + +##### file_descriptor_cache_clean_interval + +- Default: 3600 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: The time interval at which to clean file descriptors that have not been used for a certain period of time. +- Introduced in: - + +##### ignore_broken_disk + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: Controls startup behavior when configured storage paths fail read/write checks or fail to parse. When `false` (default), BE treats any broken entry in `storage_root_path` or `spill_local_storage_dir` as fatal and will abort startup. When `true`, StarRocks will skip (log a warning and remove) any storage path that fails `check_datapath_rw` or fails parsing so the BE can continue starting with the remaining healthy paths. Note: if all configured paths are removed, BE will still exit. Enabling this can mask misconfigured or failed disks and cause data on ignored paths to be unavailable; monitor logs and disk health accordingly. +- Introduced in: v3.2.0 + +##### inc_rowset_expired_sec + +- Default: 1800 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: The expiration time of the incoming data. This configuration item is used in incremental clone. +- Introduced in: - + +##### load_process_max_memory_hard_limit_ratio + +- Default: 2 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The hard limit (ratio) of memory resources that can be taken up by all load processes on a BE node. When `enable_new_load_on_memory_limit_exceeded` is set to `false`, and the memory consumption of all loading processes exceeds `load_process_max_memory_limit_percent * load_process_max_memory_hard_limit_ratio`, new loading processes will be rejected. +- Introduced in: v3.3.2 + +##### load_process_max_memory_limit_percent + +- Default: 30 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The soft limit (in percentage) of memory resources that can be taken up by all load processes on a BE node. +- Introduced in: - + +##### lz4_acceleration + +- Default: 1 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: Controls the LZ4 "acceleration" parameter used by the built-in LZ4 compressor (passed to LZ4_compress_fast_continue). Higher values prioritize compression speed at the cost of compression ratio; lower values (1) produce better compression but are slower. Valid range: MIN=1, MAX=65537. This setting affects all LZ4-based codecs in BlockCompression (e.g., LZ4 and Hadoop-LZ4) and only changes how compression is performed — it does not change the LZ4 format or decompression compatibility. Tune upward (e.g., 4, 8, ...) for CPU-bound or low-latency workloads where larger output is acceptable; keep at 1 for storage- or IO-sensitive workloads. Test with representative data before changing, since throughput vs. size trade-offs are highly data-dependent. +- Introduced in: v3.4.1, 3.5.0, 4.0.0 + +##### lz4_expected_compression_ratio + +- Default: 2.1 +- Type: double +- Unit: Dimensionless (compression ratio) +- Is mutable: Yes +- Description: Threshold used by the serialization compression strategy to judge whether observed LZ4 compression is "good". 
In compress_strategy.cpp this value divides the observed compress_ratio when computing a reward metric together with lz4_expected_compression_speed_mbps; if the combined reward `>` 1.0 the strategy records positive feedback. Increasing this value raises the expected compression ratio (making the condition harder to satisfy), while lowering it makes it easier for observed compression to be considered satisfactory. Tune to match typical data compressibility. Valid range: MIN=1, MAX=65537. +- Introduced in: v3.4.1, 3.5.0, 4.0.0 + +##### lz4_expected_compression_speed_mbps + +- Default: 600 +- Type: double +- Unit: MB/s +- Is mutable: Yes +- Description: Expected LZ4 compression throughput in megabytes per second used by the adaptive compression policy (CompressStrategy). The feedback routine computes a reward_ratio = (observed_compression_ratio / lz4_expected_compression_ratio) * (observed_speed / lz4_expected_compression_speed_mbps). A reward_ratio `>` 1.0 increments the positive counter (alpha), otherwise the negative counter (beta); this influences whether future data will be compressed. Tune this value to reflect typical LZ4 throughput on your hardware — raising it makes the policy harder to classify a run as "good" (requires higher observed speed), lowering it makes classification easier. Must be a positive finite number. +- Introduced in: v3.4.1, 3.5.0, 4.0.0 + +##### make_snapshot_worker_count + +- Default: 5 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The maximum number of threads for the make snapshot tasks on a BE node. +- Introduced in: - + +##### manual_compaction_threads + +- Default: 4 +- Type: Int +- Unit: - +- Is mutable: No +- Description: Number of threads for Manual Compaction. +- Introduced in: - + +##### max_base_compaction_num_singleton_deltas + +- Default: 100 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The maximum number of segments that can be compacted in each Base Compaction. +- Introduced in: - + +##### max_compaction_candidate_num + +- Default: 40960 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The maximum number of candidate tablets for compaction. If the value is too large, it will cause high memory usage and high CPU load. +- Introduced in: - + +##### max_compaction_concurrency + +- Default: -1 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The maximum concurrency of compactions (including both Base Compaction and Cumulative Compaction). The value `-1` indicates that no limit is imposed on the concurrency. `0` indicates disabling compaction. This parameter is mutable when the Event-based Compaction Framework is enabled. +- Introduced in: - + +##### max_cumulative_compaction_num_singleton_deltas + +- Default: 1000 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The maximum number of segments that can be merged in a single Cumulative Compaction. You can reduce this value if OOM occurs during compaction. +- Introduced in: - + +##### max_download_speed_kbps + +- Default: 50000 +- Type: Int +- Unit: KB/Second +- Is mutable: Yes +- Description: The maximum download speed of each HTTP request. This value affects the performance of data replica synchronization across BE nodes. +- Introduced in: - + +##### max_garbage_sweep_interval + +- Default: 3600 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: The maximum time interval for garbage collection on storage volumes. This configuration is changed to dynamic from v3.0 onwards. 
+- Introduced in: - + +##### max_percentage_of_error_disk + +- Default: 0 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The maximum percentage of error that is tolerable in a storage volume before the corresponding BE node quits. +- Introduced in: - + +##### max_queueing_memtable_per_tablet + +- Default: 2 +- Type: Long +- Unit: Count +- Is mutable: Yes +- Description: Controls per-tablet backpressure for write paths: when a tablet's number of queueing (not-yet-flushing) memtables reaches or exceeds `max_queueing_memtable_per_tablet`, writers in `LocalTabletsChannel` and `LakeTabletsChannel` will block (sleep/retry) before submitting more write work. This reduces simultaneous memtable flush concurrency and peak memory use at the cost of increased latency or RPC timeouts for heavy load. Set higher to allow more concurrent memtables (more memory and I/O burst); set lower to limit memory pressure and increase write throttling. +- Introduced in: v3.2.0 + +##### max_row_source_mask_memory_bytes + +- Default: 209715200 +- Type: Int +- Unit: Bytes +- Is mutable: No +- Description: The maximum memory size of the row source mask buffer. When the buffer is larger than this value, data will be persisted to a temporary file on the disk. This value should be set lower than the value of `compaction_memory_limit_per_worker`. +- Introduced in: - + +##### max_tablet_write_chunk_bytes + +- Default: 536870912 +- Type: Int +- Unit: Bytes +- Is mutable: Yes +- Description: Maximum allowed memory (in bytes) for the current in-memory tablet write chunk before it is treated as full and enqueued for sending. Increase this value to reduce the frequency of RPCs when loading wide tables (many columns), which can improve throughput at the cost of higher memory usage and larger RPC payloads. Tune to balance fewer RPCs against memory and serialization/BRPC limits. +- Introduced in: v3.2.12 + +##### max_update_compaction_num_singleton_deltas + +- Default: 1000 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The maximum number of rowsets that can be merged in a single Compaction for Primary Key tables. +- Introduced in: - + +##### memory_limitation_per_thread_for_schema_change + +- Default: 2 +- Type: Int +- Unit: GB +- Is mutable: Yes +- Description: The maximum memory size allowed for each schema change task. +- Introduced in: - + +##### memory_ratio_for_sorting_schema_change + +- Default: 0.8 +- Type: Double +- Unit: - (unitless ratio) +- Is mutable: Yes +- Description: Fraction of the per-thread schema-change memory limit used as the memtable maximum buffer size during sorting schema-change operations. The ratio is multiplied by memory_limitation_per_thread_for_schema_change (configured in GB and converted to bytes) to compute max_buffer_size, and that result is capped at 4GB. Used by SchemaChangeWithSorting and SortedSchemaChange when creating MemTable/DeltaWriter. Increasing this ratio allows larger in-memory buffers (fewer flushes/merges) but raises risk of memory pressure; reducing it causes more frequent flushes and higher I/O/merge overhead. +- Introduced in: v3.2.0 + +##### min_base_compaction_num_singleton_deltas + +- Default: 5 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The minimum number of segments that trigger a Base Compaction. 
+- Introduced in: - + +##### min_compaction_failure_interval_sec + +- Default: 120 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: The minimum time interval at which a tablet compaction can be scheduled since the previous compaction failure. +- Introduced in: - + +##### min_cumulative_compaction_failure_interval_sec + +- Default: 30 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: The minimum time interval at which Cumulative Compaction retries upon failures. +- Introduced in: - + +##### min_cumulative_compaction_num_singleton_deltas + +- Default: 5 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The minimum number of segments to trigger Cumulative Compaction. +- Introduced in: - + +##### min_garbage_sweep_interval + +- Default: 180 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: The minimum time interval for garbage collection on storage volumes. This configuration is changed to dynamic from v3.0 onwards. +- Introduced in: - + +##### parallel_clone_task_per_path + +- Default: 8 +- Type: Int +- Unit: Threads +- Is mutable: Yes +- Description: Number of parallel clone worker threads allocated per storage path on a BE. At BE startup the clone thread-pool max threads is computed as max(number_of_store_paths * parallel_clone_task_per_path, MIN_CLONE_TASK_THREADS_IN_POOL). For example, with 4 storage paths and default=8 the clone pool max = 32. This setting directly controls concurrency of CLONE tasks (tablet replica copies) handled by the BE: increasing it raises parallel clone throughput but also increases CPU, disk and network contention; decreasing it limits simultaneous clone tasks and can throttle FE-scheduled clone operations. The value is applied to the dynamic clone thread pool and can be changed at runtime via the update-config path (causes agent_server to update the clone pool max threads). +- Introduced in: v3.2.0 + +##### partial_update_memory_limit_per_worker + +- Default: 2147483648 +- Type: long +- Unit: Bytes +- Is mutable: Yes +- Description: Maximum memory (in bytes) a single worker may use for assembling a source chunk when performing partial column updates (used in compaction / rowset update processing). The reader estimates per-row update memory (total_update_row_size / num_rows_upt) and multiplies it by the number of rows read; when that product exceeds this limit the current chunk is flushed and processed to avoid additional memory growth. Set this to match the available memory per update worker—too low increases I/O/processing overhead (many small chunks); too high risks memory pressure or OOM. If the per-row estimate is zero (legacy rowsets), this config does not impose a byte-based limit (only the INT32_MAX row count limit applies). +- Introduced in: v3.2.10 + +##### path_gc_check + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: When enabled, StorageEngine starts per-data-dir background threads that perform periodic path scanning and garbage collection. On startup `start_bg_threads()` spawns `_path_scan_thread_callback` (calls `DataDir::perform_path_scan` and `perform_tmp_path_scan`) and `_path_gc_thread_callback` (calls `DataDir::perform_path_gc_by_tablet`, `DataDir::perform_path_gc_by_rowsetid`, `DataDir::perform_delta_column_files_gc`, and `DataDir::perform_crm_gc`). The scan and GC intervals are controlled by `path_scan_interval_second` and `path_gc_check_interval_second`; CRM file cleanup uses `unused_crm_file_threshold_second`. 
Disable this to prevent automatic path-level cleanup (you must then manage orphaned/temp files manually). Changing this flag requires restarting the process.
+- Introduced in: v3.2.0
+
+##### path_gc_check_interval_second
+
+- Default: 86400
+- Type: Int
+- Unit: Seconds
+- Is mutable: No
+- Description: Interval in seconds between runs of the storage engine's path garbage-collection background thread. Each wake triggers DataDir to perform path GC by tablet, by rowset ID, delta column file GC, and CRM GC (the CRM GC call uses `unused_crm_file_threshold_second`). If set to a non-positive value, the code forces the interval to 1800 seconds (half an hour) and emits a warning. Tune this to control how frequently on-disk temporary or downloaded files are scanned and removed.
+- Introduced in: v3.2.0
+
+##### pending_data_expire_time_sec
+
+- Default: 1800
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The expiration time of the pending data in the storage engine.
+- Introduced in: -
+
+##### pindex_major_compaction_limit_per_disk
+
+- Default: 1
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum concurrency of compaction on each disk. This addresses uneven I/O across disks caused by compaction, which can otherwise drive excessively high I/O on certain disks.
+- Introduced in: v3.0.9
+
+##### pk_index_compaction_score_ratio
+
+- Default: 1.5
+- Type: Double
+- Unit: -
+- Is mutable: Yes
+- Description: Compaction score ratio for Primary Key index in a shared-data cluster. For example, if there are N filesets, the Compaction score will be `N * pk_index_compaction_score_ratio`.
+- Introduced in: -
+
+##### pk_index_early_sst_compaction_threshold
+
+- Default: 5
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The threshold that triggers early SSTable Compaction for Primary Key index in a shared-data cluster.
+- Introduced in: -
+
+##### pk_index_map_shard_size
+
+- Default: 4096
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: Number of shards used by the Primary Key index shard map in the lake UpdateManager. UpdateManager allocates a vector of `PkIndexShard` of this size and maps a tablet ID to a shard via a bitmask. Increasing this value reduces lock contention among tablets that would otherwise share the same shard, at the cost of more mutex objects and slightly higher memory usage. The value must be a power of two because the code relies on bitmask indexing. For sizing guidance see the `tablet_map_shard_size` heuristic: `total_num_of_tablets_in_BE / 512`.
+- Introduced in: v3.2.0
+
+##### pk_index_memtable_flush_threadpool_max_threads
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of threads in the thread pool for Primary Key index MemTable flush in a shared-data cluster. `0` means automatically set to half of the number of CPU cores.
+- Introduced in: -
+
+##### pk_index_memtable_flush_threadpool_size
+
+- Default: 1048576
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Controls the maximum queue size (number of pending tasks) for the Primary Key index memtable flush thread pool used in shared-data (cloud-native / lake) mode. The thread pool is created as "cloud_native_pk_index_flush" in ExecEnv; its max thread count is governed by `pk_index_memtable_flush_threadpool_max_threads`. Increasing this value permits more memtable flush tasks to be buffered before execution, which can reduce immediate backpressure but increases memory consumed by queued task objects. Decreasing it limits buffered tasks and can cause earlier backpressure or task rejections depending on thread-pool behavior. Tune according to available memory and expected concurrent flush workload.
+- Introduced in: -
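+
+Many of the thread-pool items above default to `0`, meaning "auto-size from the number of CPU cores", so their effective values can differ across heterogeneous BE nodes. As a quick check, you can inspect the values actually in effect on each node. This is a minimal sketch, assuming your StarRocks version exposes the `information_schema.be_configs` view:
+
+```SQL
+-- List the effective Primary Key index thread pool settings per BE node.
+SELECT BE_ID, NAME, VALUE
+FROM information_schema.be_configs
+WHERE NAME LIKE 'pk_index_%threadpool%';
+```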
+
+##### pk_index_memtable_max_count
+
+- Default: 2
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of MemTables for Primary Key index in a shared-data cluster.
+- Introduced in: -
+
+##### pk_index_memtable_max_wait_flush_timeout_ms
+
+- Default: 30000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: The maximum timeout for waiting for Primary Key index MemTable flush completion in a shared-data cluster. When synchronously flushing all MemTables (for example, before an ingest SST operation), the system waits up to this timeout. The default is 30 seconds.
+- Introduced in: -
+
+##### pk_index_parallel_compaction_task_split_threshold_bytes
+
+- Default: 33554432
+- Type: Int
+- Unit: Bytes
+- Is mutable: Yes
+- Description: The splitting threshold for Primary Key index Compaction tasks. When the total size of the files involved in a task is smaller than this threshold, the task will not be split.
+- Introduced in: -
+
+##### pk_index_parallel_compaction_threadpool_max_threads
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of threads in the thread pool for cloud native Primary Key index parallel Compaction in a shared-data cluster. `0` means automatically set to half of the number of CPU cores.
+- Introduced in: -
+
+##### pk_index_parallel_compaction_threadpool_size
+
+- Default: 1048576
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum queue size (number of pending tasks) for the thread pool used by cloud-native Primary Key index parallel compaction in shared-data mode. This setting controls how many Compaction tasks can be enqueued before the thread pool rejects new submissions. The effective parallelism is bounded by `pk_index_parallel_compaction_threadpool_max_threads`; increase this value to avoid task rejections when you expect many concurrent Compaction tasks, but be aware larger queues can increase memory and latency for queued work.
+- Introduced in: -
+
+##### pk_index_parallel_execution_min_rows
+
+- Default: 16384
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The minimum rows threshold to enable parallel execution for Primary Key index operations in a shared-data cluster.
+- Introduced in: -
+
+##### pk_index_parallel_execution_threadpool_max_threads
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of threads in the thread pool for Primary Key index parallel execution in a shared-data cluster. `0` means automatically set to half of the number of CPU cores.
+- Introduced in: -
+
+##### pk_index_size_tiered_level_multiplier
+
+- Default: 10
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The level multiplier parameter for Primary Key index size-tiered Compaction strategy.
+- Introduced in: -
+
+##### pk_index_size_tiered_max_level
+
+- Default: 5
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum level for Primary Key index size-tiered Compaction strategy.
+- Introduced in: -
+
+##### pk_index_size_tiered_min_level_size
+
+- Default: 131072
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The minimum level size for Primary Key index size-tiered Compaction strategy.
+- Introduced in: - + +##### pk_index_sstable_sample_interval_bytes + +- Default: 16777216 +- Type: Int +- Unit: Bytes +- Is mutable: Yes +- Description: The sampling interval size for SSTable files in a shared-data cluster. When the size of an SSTable file exceeds this threshold, the system samples keys from the SSTable at this interval to optimize the boundary partitioning of Compaction tasks. For SSTables smaller than this threshold, only the start key is used as the boundary key. The default is 16 MB. +- Introduced in: - + +##### pk_index_target_file_size + +- Default: 67108864 +- Type: Int +- Unit: Bytes +- Is mutable: Yes +- Description: The target file size for Primary Key index in a shared-data cluster. +- Introduced in: - + +##### pk_index_eager_build_threshold_bytes + +- Default: 104857600 +- Type: Int +- Unit: Bytes +- Is mutable: Yes +- Description: When `enable_pk_index_eager_build` is set to true, the system will eagerly build PK index files only if the data generated during import or compaction exceeds this threshold. Default is 100MB. +- Introduced in: - + +##### primary_key_limit_size + +- Default: 128 +- Type: Int +- Unit: Bytes +- Is mutable: Yes +- Description: The maximum size of a key column in Primary Key tables. +- Introduced in: v2.5 + +##### release_snapshot_worker_count + +- Default: 5 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The maximum number of threads for the release snapshot tasks on a BE node. +- Introduced in: - + +##### repair_compaction_interval_seconds + +- Default: 600 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: The time interval to poll Repair Compaction threads. +- Introduced in: - + +##### replication_max_speed_limit_kbps + +- Default: 50000 +- Type: Int +- Unit: KB/s +- Is mutable: Yes +- Description: The maximum speed of each replication thread. +- Introduced in: v3.3.5 + +##### replication_min_speed_limit_kbps + +- Default: 50 +- Type: Int +- Unit: KB/s +- Is mutable: Yes +- Description: The minimum speed of each replication thread. +- Introduced in: v3.3.5 + +##### replication_min_speed_time_seconds + +- Default: 300 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: The time duration allowed for a replication thread to be under the minimum speed. Replication will fail if the time when the actual speed is lower than `replication_min_speed_limit_kbps` exceeds this value. +- Introduced in: v3.3.5 + +##### replication_threads + +- Default: 0 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The maximum number of threads used for replication. `0` indicates setting the thread number to four times the BE CPU core count. +- Introduced in: v3.3.5 + +##### size_tiered_level_multiple + +- Default: 5 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The multiple of data size between two contiguous levels in the Size-tiered Compaction policy. +- Introduced in: - + +##### size_tiered_level_multiple_dupkey + +- Default: 10 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: In the Size-tiered Compaction policy, the multiple of the data amount difference between two adjacent levels for Duplicate Key tables. +- Introduced in: - + +##### size_tiered_level_num + +- Default: 7 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The number of levels for the Size-tiered Compaction policy. At most one rowset is reserved for each level. Therefore, under a stable condition, there are, at most, as many rowsets as the level number specified in this configuration item. 
+- Introduced in: - + +##### size_tiered_max_compaction_level + +- Default: 3 +- Type: Int +- Unit: Levels +- Is mutable: Yes +- Description: Limits how many size-tiered levels may be merged into a single primary-key real-time compaction task. During the PK size-tiered compaction selection, StarRocks builds ordered "levels" of rowsets by size and will add successive levels into the chosen compaction input until this limit is reached (the code uses compaction_level `<=` size_tiered_max_compaction_level). The value is inclusive and counts the number of distinct size tiers merged (the top level is counted as 1). Effective only when the PK size-tiered compaction strategy is enabled; raising it lets a compaction task include more levels (larger, more I/O- and CPU-intensive merges, potential higher write amplification), while lowering it restricts merges and reduces task size and resource usage. +- Introduced in: v4.0.0 + +##### size_tiered_min_level_size + +- Default: 131072 +- Type: Int +- Unit: Bytes +- Is mutable: Yes +- Description: The data size of the minimum level in the Size-tiered Compaction policy. Rowsets smaller than this value immediately trigger the data compaction. +- Introduced in: - + +##### small_dictionary_page_size + +- Default: 4096 +- Type: Int +- Unit: Bytes +- Is mutable: No +- Description: Threshold (in bytes) used by BinaryPlainPageDecoder to decide whether to eagerly parse a dictionary (binary/plain) page. If a page's encoded size is < `small_dictionary_page_size`, the decoder pre-parses all string entries into an in-memory vector (`_parsed_datas`) to accelerate random access and batch reads. Raising this value causes more pages to be pre-parsed (which can reduce per-access decoding overhead and may increase effective compression for larger dictionaries) but increases memory usage and CPU spent parsing; excessively large values can degrade overall performance. Tune only after measuring memory and access-latency trade-offs. +- Introduced in: v3.4.1, v3.5.0 + +##### snapshot_expire_time_sec + +- Default: 172800 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: The expiration time of snapshot files. +- Introduced in: - + +##### stale_memtable_flush_time_sec + +- Default: 0 +- Type: long +- Unit: Seconds +- Is mutable: Yes +- Description: When a sender job's memory usage is high, memtables that have not been updated for longer than `stale_memtable_flush_time_sec` seconds will be flushed to reduce memory pressure. This behavior is only considered when memory limits are approaching (`limit_exceeded_by_ratio(70)` or higher). In `LocalTabletsChannel`, an additional path at very high memory usage (`limit_exceeded_by_ratio(95)`) may flush memtables whose size exceeds `write_buffer_size / 4`. A value of `0` disables this age-based stale-memtable flushing (immutable-partition memtables still flush immediately when idle or on high memory). +- Introduced in: v3.2.0 + +##### storage_flood_stage_left_capacity_bytes + +- Default: 107374182400 +- Type: Int +- Unit: Bytes +- Is mutable: Yes +- Description: Hard limit of the remaining storage space in all BE directories. If the remaining storage space of the BE storage directory is less than this value and the storage usage (in percentage) exceeds `storage_flood_stage_usage_percent`, Load and Restore jobs are rejected. You need to set this item together with the FE configuration item `storage_usage_hard_limit_reserve_bytes` to allow the configurations to take effect. 
+- Introduced in: -
+
+##### storage_flood_stage_usage_percent
+
+- Default: 95
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Hard limit of the storage usage percentage in all BE directories. If the storage usage (in percentage) of the BE storage directory exceeds this value and the remaining storage space is less than `storage_flood_stage_left_capacity_bytes`, Load and Restore jobs are rejected. You need to set this item together with the FE configuration item `storage_usage_hard_limit_percent` to allow the configurations to take effect.
+- Introduced in: -
+
+##### storage_high_usage_disk_protect_ratio
+
+- Default: 0.1
+- Type: double
+- Unit: -
+- Is mutable: Yes
+- Description: When selecting a storage root for tablet creation, StorageEngine sorts candidate disks by `disk_usage(0)` and computes the average usage. Any disk whose usage is greater than (average usage + `storage_high_usage_disk_protect_ratio`) is excluded from the preferential selection pool (it does not participate in the randomized preferential shuffle and is therefore deferred from initial selection). Set to 0 to disable this protection. Values are fractional (typical range 0.0–1.0); larger values make the scheduler more tolerant of higher-than-average disks.
+- Introduced in: v3.2.0
+
+##### storage_medium_migrate_count
+
+- Default: 3
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of threads used for storage medium migration (from SATA to SSD).
+- Introduced in: -
+
+##### storage_root_path
+
+- Default: `${STARROCKS_HOME}/storage`
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The directory and medium of the storage volume. Example: `/data1,medium:hdd;/data2,medium:ssd`.
+  - Multiple volumes are separated by semicolons (`;`).
+  - If the storage medium is SSD, add `,medium:ssd` at the end of the directory.
+  - If the storage medium is HDD, add `,medium:hdd` at the end of the directory.
+- Introduced in: -
+
+##### sync_tablet_meta
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: A boolean value to control whether to enable the synchronization of the tablet metadata. `true` indicates enabling synchronization, and `false` indicates disabling it.
+- Introduced in: -
+
+##### tablet_map_shard_size
+
+- Default: 1024
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The tablet map shard size. The value must be a power of two.
+- Introduced in: -
+
+##### tablet_max_pending_versions
+
+- Default: 1000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of pending versions that are tolerable on a Primary Key tablet. Pending versions refer to versions that are committed but not applied yet.
+- Introduced in: -
+
+##### tablet_max_versions
+
+- Default: 1000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of versions allowed on a tablet. If the number of versions exceeds this value, new write requests will fail.
+- Introduced in: -
+
+##### tablet_meta_checkpoint_min_interval_secs
+
+- Default: 600
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval at which the thread polls for TabletMeta Checkpoints.
+- Introduced in: -
+
+##### tablet_meta_checkpoint_min_new_rowsets_num
+
+- Default: 10
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The minimum number of rowsets to create since the last TabletMeta Checkpoint.
+- Introduced in: -
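+
+Items flagged `Is mutable: Yes`, such as `tablet_max_versions` above, can be adjusted at runtime. The sketch below assumes your StarRocks release supports dynamic BE configuration through the `information_schema.be_configs` view; on releases without it, use each BE's `update_config` HTTP endpoint instead. Either way, the change is not persisted across restarts unless you also write it to `be.conf`:
+
+```SQL
+-- Raise the per-tablet version cap on all BE nodes at runtime.
+-- Assumes this release allows UPDATE on information_schema.be_configs.
+UPDATE information_schema.be_configs
+SET VALUE = '2000'
+WHERE NAME = 'tablet_max_versions';
+```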
+
+##### tablet_rowset_stale_sweep_time_sec
+
+- Default: 1800
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval at which to sweep the stale rowsets in tablets.
+- Introduced in: -
+
+##### tablet_stat_cache_update_interval_second
+
+- Default: 300
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval at which the Tablet Stat Cache updates.
+- Introduced in: -
+
+##### tablet_writer_open_rpc_timeout_sec
+
+- Default: 300
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: Timeout (in seconds) for the RPC that opens a tablet writer on a remote BE. The value is converted to milliseconds and applied to both the request timeout and the brpc control timeout when issuing the open call. The runtime uses the effective timeout as the minimum of `tablet_writer_open_rpc_timeout_sec` and half of the overall load timeout (i.e., min(`tablet_writer_open_rpc_timeout_sec`, `load_timeout_sec` / 2)). Set this to balance timely failure detection (too small may cause premature open failures) and giving BEs enough time to initialize writers (too large delays error handling).
+- Introduced in: v3.2.0
+
+##### transaction_apply_worker_count
+
+- Default: 0
+- Type: Int
+- Unit: Threads
+- Is mutable: Yes
+- Description: Controls the maximum number of worker threads used by the UpdateManager's "update_apply" thread pool, which applies rowsets for transactions (notably for Primary Key tables). A value greater than `0` sets a fixed maximum thread count; `0` (the default) makes the pool size equal to the number of CPU cores. The configured value is applied at startup (UpdateManager::init) and can be changed at runtime via the update-config HTTP action, which updates the pool's max threads. Tune this to increase apply concurrency (throughput) or limit CPU/memory contention; min threads and idle timeout are governed by `transaction_apply_thread_pool_num_min` and `transaction_apply_worker_idle_time_ms` respectively.
+- Introduced in: v3.2.0
+
+##### transaction_apply_worker_idle_time_ms
+
+- Default: 500
+- Type: int
+- Unit: Milliseconds
+- Is mutable: No
+- Description: Sets the idle timeout (in milliseconds) for the UpdateManager's "update_apply" thread pool used to apply transactions/updates. The value is passed to ThreadPoolBuilder::set_idle_timeout via MonoDelta::FromMilliseconds, so worker threads that remain idle longer than this timeout may be terminated (subject to the pool's configured minimum thread count and max threads). Lower values free resources faster but increase thread creation/teardown overhead under bursty load; higher values keep workers warm for short bursts at the cost of higher baseline resource usage.
+- Introduced in: v3.2.11
+
+##### trash_file_expire_time_sec
+
+- Default: 86400
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval at which to clean trash files. The default value has been changed from 259,200 to 86,400 since v2.5.17, v3.0.9, and v3.1.6.
+- Introduced in: -
+
+##### unused_rowset_monitor_interval
+
+- Default: 30
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval at which to clean the expired rowsets.
+- Introduced in: -
+
+##### update_cache_expire_sec
+
+- Default: 360
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The expiration time of Update Cache.
+- Introduced in: - + +##### update_compaction_check_interval_seconds + +- Default: 10 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: The time interval at which to check compaction for Primary Key tables. +- Introduced in: - + +##### update_compaction_delvec_file_io_amp_ratio + +- Default: 2 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: Used to control the priority of compaction for rowsets that contain Delvec files in Primary Key tables. The larger the value, the higher the priority. +- Introduced in: - + +##### update_compaction_num_threads_per_disk + +- Default: 1 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The number of Compaction threads per disk for Primary Key tables. +- Introduced in: - + +##### update_compaction_per_tablet_min_interval_seconds + +- Default: 120 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: The minimum time interval at which compaction is triggered for each tablet in a Primary Key table. +- Introduced in: - + +##### update_compaction_ratio_threshold + +- Default: 0.5 +- Type: Double +- Unit: - +- Is mutable: Yes +- Description: The maximum proportion of data that a compaction can merge for a Primary Key table in a shared-data cluster. It is recommended to shrink this value if a single tablet becomes excessively large. +- Introduced in: v3.1.5 + +##### update_compaction_result_bytes + +- Default: 1073741824 +- Type: Int +- Unit: Bytes +- Is mutable: Yes +- Description: The maximum result size of a single compaction for Primary Key tables. +- Introduced in: - + +##### update_compaction_size_threshold + +- Default: 268435456 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The Compaction Score of Primary Key tables is calculated based on the file size, which is different from other table types. This parameter can be used to make the Compaction Score of Primary Key tables similar to that of other table types, making it easier for users to understand. +- Introduced in: - + +##### upload_worker_count + +- Default: 0 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The maximum number of threads for the upload tasks of backup jobs on a BE node. `0` indicates setting the value to the number of CPU cores on the machine where the BE resides. +- Introduced in: - + +##### vertical_compaction_max_columns_per_group + +- Default: 5 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The maximum number of columns per group of Vertical Compactions. +- Introduced in: - + +### Shared-data + +##### download_buffer_size + +- Default: 4194304 +- Type: Int +- Unit: Bytes +- Is mutable: Yes +- Description: Size (in bytes) of the in-memory copy buffer used when downloading snapshot files. SnapshotLoader::download passes this value to fs::copy as the per-transfer chunk size when reading from the remote sequential file into the local writable file. Larger values can improve throughput on high-bandwidth links by reducing syscall/IO overhead; smaller values reduce peak memory use per active transfer. Note: this parameter controls buffer size per stream, not the number of download threads—total memory consumption = download_buffer_size * number_of_concurrent_downloads. +- Introduced in: v3.2.13 + +##### graceful_exit_wait_for_frontend_heartbeat + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: Determines whether to await at least one frontend heartbeat response indicating SHUTDOWN status before completing graceful exit. 
When enabled, the graceful shutdown process remains active until a SHUTDOWN confirmation is received via a heartbeat RPC, ensuring the frontend has sufficient time to detect the termination state between two regular heartbeat intervals.
+- Introduced in: v3.4.5
+
+##### lake_compaction_stream_buffer_size_bytes
+
+- Default: 1048576
+- Type: Int
+- Unit: Bytes
+- Is mutable: Yes
+- Description: The reader's remote I/O buffer size for cloud-native table compaction in a shared-data cluster. The default value is 1 MB. You can increase this value to accelerate the compaction process.
+- Introduced in: v3.2.3
+
+##### lake_pk_compaction_max_input_rowsets
+
+- Default: 500
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of input rowsets allowed in a Primary Key table compaction task in a shared-data cluster. The default value of this parameter is changed from `5` to `1000` since v3.2.4 and v3.1.10, and to `500` since v3.3.1 and v3.2.9. After the Size-tiered Compaction policy is enabled for Primary Key tables (by setting `enable_pk_size_tiered_compaction_strategy` to `true`), StarRocks does not need to limit the number of rowsets for each compaction to reduce write amplification. Therefore, the default value of this parameter is increased.
+- Introduced in: v3.1.8, v3.2.3
+
+##### loop_count_wait_fragments_finish
+
+- Default: 2
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The number of loops to wait when the BE/CN process exits. Each loop lasts a fixed interval of 10 seconds. You can set it to `0` to disable the loop wait. From v3.4 onwards, this item is mutable and its default value is changed from `0` to `2`.
+- Introduced in: v2.5
+
+##### max_client_cache_size_per_host
+
+- Default: 10
+- Type: Int
+- Unit: entries (cached client instances) per host
+- Is mutable: No
+- Description: The maximum number of cached client instances retained for each remote host by BE-wide client caches. This single setting is used when creating BackendServiceClientCache, FrontendServiceClientCache, and BrokerServiceClientCache during ExecEnv initialization, so it limits the number of client stubs/connections kept per host across those caches. Raising this value reduces reconnects and stub creation overhead at the cost of increased memory and file-descriptor usage; lowering it saves resources but may increase connection churn. The value is read at startup and cannot be changed at runtime. Currently one shared setting controls all client cache types; separate per-cache configuration may be introduced later.
+- Introduced in: v3.2.0
+
+##### starlet_filesystem_instance_cache_capacity
+
+- Default: 10000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The cache capacity of starlet filesystem instances.
+- Introduced in: v3.2.16, v3.3.11, v3.4.1
+
+##### starlet_filesystem_instance_cache_ttl_sec
+
+- Default: 86400
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The cache expiration time of starlet filesystem instances.
+- Introduced in: v3.3.15, v3.4.5
+
+##### starlet_port
+
+- Default: 9070
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: An extra agent service port for BE and CN.
+- Introduced in: -
+
+##### starlet_star_cache_disk_size_percent
+
+- Default: 80
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The percentage of disk capacity that Data Cache can use at most in a shared-data cluster.
+- Introduced in: v3.1
+
+##### starlet_use_star_cache
+
+- Default: false in v3.1 and true from v3.2.3 onwards
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable Data Cache in a shared-data cluster. `true` indicates enabling this feature and `false` indicates disabling it. The default value was changed from `false` to `true` in v3.2.3.
+- Introduced in: v3.1
+
+##### starlet_write_file_with_tag
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: In a shared-data cluster, whether to attach object storage tags to the files written to object storage, facilitating custom file management.
+- Introduced in: v3.5.3
+
+##### table_schema_service_max_retries
+
+- Default: 3
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of retries for Table Schema Service requests.
+- Introduced in: v4.1
+
+### Data Lake
+
+##### datacache_block_buffer_enable
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Whether to enable Block Buffer to optimize Data Cache efficiency. When Block Buffer is enabled, the system reads the Block data from the Data Cache and caches it in a temporary buffer, thus reducing the extra overhead caused by frequent cache reads.
+- Introduced in: v3.2.0
+
+##### datacache_disk_adjust_interval_seconds
+
+- Default: 10
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The interval of Data Cache automatic capacity scaling. At regular intervals, the system checks the cache disk usage, and triggers Automatic Scaling when necessary.
+- Introduced in: v3.3.0
+
+##### datacache_disk_idle_seconds_for_expansion
+
+- Default: 7200
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The minimum wait time for Data Cache automatic expansion. Automatic scaling up is triggered only if the disk usage remains below `datacache_disk_low_level` for longer than this duration.
+- Introduced in: v3.3.0
+
+##### datacache_disk_size
+
+- Default: 0
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum amount of data that can be cached on a single disk. You can set it as a percentage (for example, `80%`) or a physical limit (for example, `2T`, `500G`). For example, if you use two disks and set the value of the `datacache_disk_size` parameter as `21474836480` (20 GB), a maximum of 40 GB data can be cached on these two disks. The default value is `0`, which indicates that only memory is used to cache data.
+- Introduced in: -
+
+##### datacache_enable
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Whether to enable Data Cache. `true` indicates Data Cache is enabled, and `false` indicates Data Cache is disabled. The default value was changed to `true` in v3.3.
+- Introduced in: -
+
+##### datacache_eviction_policy
+
+- Default: slru
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The eviction policy of Data Cache. Valid values: `lru` (least recently used) and `slru` (Segmented LRU).
+- Introduced in: v3.4.0
+
+##### datacache_inline_item_count_limit
+
+- Default: 130172
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum number of inline cache items in Data Cache. For some particularly small cache blocks, Data Cache stores them in `inline` mode, which caches the block data and metadata together in memory.
+- Introduced in: v3.4.0
+
+##### datacache_mem_size
+
+- Default: 0
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum amount of data that can be cached in memory.
You can set it as a percentage (for example, `10%`) or a physical limit (for example, `10G`, `21474836480`). +- Introduced in: - + +##### datacache_min_disk_quota_for_adjustment + +- Default: 10737418240 +- Type: Int +- Unit: Bytes +- Is mutable: Yes +- Description: The minimum effective capacity for Data Cache Automatic Scaling. If the system tries to adjust the cache capacity to less than this value, the cache capacity will be directly set to `0` to prevent suboptimal performance caused by frequent cache fills and evictions due to insufficient cache capacity. +- Introduced in: v3.3.0 + +##### disk_high_level + +- Default: 90 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The upper limit of disk usage (in percentage) that triggers the automatic scaling up of the cache capacity. When the disk usage exceeds this value, the system automatically evicts cache data from the Data Cache. From v3.4.0 onwards, the default value is changed from `80` to `90`. This item is renamed from `datacache_disk_high_level` to `disk_high_level` from v4.0 onwards. +- Introduced in: v3.3.0 + +##### disk_low_level + +- Default: 60 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The lower limit of disk usage (in percentage) that triggers the automatic scaling down of the cache capacity. When the disk usage remains below this value for the period specified in `datacache_disk_idle_seconds_for_expansion`, and the space allocated for Data Cache is fully utilized, the system will automatically expand the cache capacity by increasing the upper limit. This item is renamed from `datacache_disk_low_level` to `disk_low_level` from v4.0 onwards. +- Introduced in: v3.3.0 + +##### disk_safe_level + +- Default: 80 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The safe level of disk usage (in percentage) for Data Cache. When Data Cache performs automatic scaling, the system adjusts the cache capacity with the goal of maintaining disk usage as close to this value as possible. From v3.4.0 onwards, the default value is changed from `70` to `80`. This item is renamed from `datacache_disk_safe_level` to `disk_safe_level` from v4.0 onwards. +- Introduced in: v3.3.0 + +##### enable_connector_sink_spill + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: Whether to enable Spilling for writes to external tables. Enabling this feature prevents the generation of a large number of small files as a result of writing to an external table when memory is insufficient. Currently, this feature only supports writing to Iceberg tables. +- Introduced in: v4.0.0 + +##### enable_datacache_disk_auto_adjust + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: Whether to enable Automatic Scaling for Data Cache disk capacity. When it is enabled, the system dynamically adjusts the cache capacity based on the current disk usage rate. This item is renamed from `datacache_auto_adjust_enable` to `enable_datacache_disk_auto_adjust` from v4.0 onwards. +- Introduced in: v3.3.0 + +##### jdbc_connection_idle_timeout_ms + +- Default: 600000 +- Type: Int +- Unit: Milliseconds +- Is mutable: No +- Description: The length of time after which an idle connection in the JDBC connection pool expires. If the connection idle time in the JDBC connection pool exceeds this value, the connection pool closes idle connections beyond the number specified in the configuration item `jdbc_minimum_idle_connections`. 
+- Introduced in: -
+
+##### jdbc_connection_pool_size
+
+- Default: 8
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The JDBC connection pool size. On each BE node, queries that access the external table with the same `jdbc_url` share the same connection pool.
+- Introduced in: -
+
+##### jdbc_minimum_idle_connections
+
+- Default: 1
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The minimum number of idle connections in the JDBC connection pool.
+- Introduced in: -
+
+##### lake_clear_corrupted_cache_data
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to allow the system to clear the corrupted data cache in a shared-data cluster.
+- Introduced in: v3.4
+
+##### lake_clear_corrupted_cache_meta
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to allow the system to clear the corrupted metadata cache in a shared-data cluster.
+- Introduced in: v3.3
+
+##### lake_enable_vertical_compaction_fill_data_cache
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to allow vertical compaction tasks to cache data on local disks in a shared-data cluster.
+- Introduced in: v3.1.7, v3.2.3
+
+##### lake_replication_read_buffer_size
+
+- Default: 16777216
+- Type: Long
+- Unit: Bytes
+- Is mutable: Yes
+- Description: The read buffer size used when downloading lake segment files during lake replication. This value determines the per-read allocation for reading remote files; the implementation uses the larger of this setting and a 1 MB minimum. A larger value reduces the number of read calls and can improve throughput but increases memory used per concurrent download; a smaller value lowers memory usage at the cost of more I/O calls. Tune according to network bandwidth, storage I/O characteristics, and the number of parallel replication threads.
+- Introduced in: -
+
+##### lake_service_max_concurrency
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum concurrency of RPC requests in a shared-data cluster. Incoming requests will be rejected when this threshold is reached. When this item is set to `0`, no limit is imposed on the concurrency.
+- Introduced in: -
+
+##### max_hdfs_scanner_num
+
+- Default: 50
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: Limits the maximum number of concurrently running connector (HDFS/remote) scanners that a ConnectorScanNode can have. During scan startup the node computes an estimated concurrency (based on memory, chunk size and scanner_row_num) and then caps it with this value to determine how many scanners and chunks to reserve and how many scanner threads to start. It is also consulted when scheduling pending scanners at runtime (to avoid oversubscription) and when deciding how many pending scanners can be re-submitted considering file-handle limits. Lowering this reduces threads, memory and open-file pressure at the cost of potential throughput; increasing it raises concurrency and resource usage.
+- Introduced in: v3.2.0
+
+##### query_max_memory_limit_percent
+
+- Default: 90
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum memory that the Query Pool can use. It is expressed as a percentage of the Process memory limit.
+- Introduced in: v3.1.0
+
+##### rocksdb_max_write_buffer_memory_bytes
+
+- Default: 1073741824
+- Type: Int64
+- Unit: Bytes
+- Is mutable: No
+- Description: The maximum size of the RocksDB write buffer used for metadata. The default is 1 GB.
+- Introduced in: v3.5.0
+
+##### rocksdb_write_buffer_memory_percent
+
+- Default: 5
+- Type: Int64
+- Unit: Percent
+- Is mutable: No
+- Description: The percentage of system memory allocated to the RocksDB write buffer used for metadata. The default is 5% of system memory. The final calculated write buffer size is bounded: it will be no less than 64 MB and no more than `rocksdb_max_write_buffer_memory_bytes` (1 GB by default). For example, on a machine with 4 GB of memory, 5% is about 205 MB, which falls within the bounds; on a machine with 32 GB of memory, 5% is 1.6 GB, so the buffer is capped at 1 GB.
+- Introduced in: v3.5.0
+
+### Other
+
+##### default_mv_resource_group_concurrency_limit
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum concurrency (per BE node) of the materialized view refresh tasks in the resource group `default_mv_wg`. The default value `0` indicates no limits.
+- Introduced in: v3.1
+
+##### default_mv_resource_group_cpu_limit
+
+- Default: 1
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of CPU cores (per BE node) that can be used by the materialized view refresh tasks in the resource group `default_mv_wg`.
+- Introduced in: v3.1
+
+##### default_mv_resource_group_memory_limit
+
+- Default: 0.8
+- Type: Double
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum memory proportion (per BE node) that can be used by the materialized view refresh tasks in the resource group `default_mv_wg`. The default value indicates 80% of the memory.
+- Introduced in: v3.1
+
+##### default_mv_resource_group_spill_mem_limit_threshold
+
+- Default: 0.8
+- Type: Double
+- Unit: -
+- Is mutable: Yes
+- Description: The memory usage threshold before a materialized view refresh task in the resource group `default_mv_wg` triggers intermediate result spilling. The default value indicates 80% of the memory.
+- Introduced in: v3.1
+
+##### enable_resolve_hostname_to_ip_in_load_error_url
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to resolve hostnames to IP addresses in load job error URLs (`error_urls`). This lets operators choose, based on their environment, between keeping the original hostname reported by the FE heartbeat and forcing resolution to an IP address.
+  - `true`: Resolves hostnames to IP addresses.
+  - `false` (Default): Keeps the original hostname in the error URL.
+- Introduced in: v4.0.1
+
+##### enable_retry_apply
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When enabled, Tablet apply failures that are classified as retryable (for example transient memory-limit errors) are rescheduled for retry instead of immediately marking the tablet in error. The retry path in TabletUpdates schedules the next attempt using `retry_apply_interval_second` multiplied by the current failure count and clamped to a 600s maximum, so backoff grows with successive failures. Explicitly non-retryable errors (for example corruption) bypass retries and cause the apply process to enter the error state immediately. Retries continue until an overall timeout/terminal condition is reached, after which the apply will enter the error state. Turning this off disables automatic rescheduling of failed apply tasks and causes failed applies to transition to error state without retries.
+- Introduced in: v3.2.9
+
+##### enable_token_check
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: A boolean value to control whether to enable the token check. `true` indicates enabling the token check, and `false` indicates disabling it.
+- Introduced in: -
+
+##### es_scroll_keepalive
+
+- Default: 5m
+- Type: String
+- Unit: - (duration string, for example, "5m")
"5m") +- Is mutable: No +- Description: The keep-alive duration sent to Elasticsearch for scroll search contexts. The value is used verbatim (for example "5m") when building the initial scroll URL (`?scroll=`) and when sending subsequent scroll requests (via ESScrollQueryBuilder). This controls how long the ES search context is retained before garbage collection on the ES side; setting it longer keeps scroll contexts alive for more time but prolongs resource usage on the ES cluster. The value is read at startup by the ES scan reader and is not changeable at runtime. +- Introduced in: v3.2.0 + +##### load_replica_status_check_interval_ms_on_failure + +- Default: 2000 +- Type: Int +- Unit: Milliseconds +- Is mutable: Yes +- Description: The interval that the secondary replica checks it's status on the primary replica if the last check rpc fails. +- Introduced in: v3.5.1 + +##### load_replica_status_check_interval_ms_on_success + +- Default: 15000 +- Type: Int +- Unit: Milliseconds +- Is mutable: Yes +- Description: The interval that the secondary replica checks it's status on the primary replica if the last check rpc successes. +- Introduced in: v3.5.1 + +##### max_length_for_bitmap_function + +- Default: 1000000 +- Type: Int +- Unit: Bytes +- Is mutable: No +- Description: The maximum length of input values for bitmap functions. +- Introduced in: - + +##### max_length_for_to_base64 + +- Default: 200000 +- Type: Int +- Unit: Bytes +- Is mutable: No +- Description: The maximum length of input values for the to_base64() function. +- Introduced in: - + +##### memory_high_level + +- Default: 75 +- Type: Long +- Unit: Percent +- Is mutable: Yes +- Description: High water memory threshold expressed as a percentage of the process memory limit. When total memory consumption rises above this percentage, BE begins to free memory gradually (currently by evicting data cache and update cache) to relieve pressure. The monitor uses this value to compute memory_high = mem_limit * memory_high_level / 100 and, if consumption `>` memory_high, performs controlled eviction guided by the GC advisor; if consumption exceeds memory_urgent_level (a separate config), more aggressive immediate reductions occur. This value is also consulted to disable certain memory‑intensive operations (for example, primary-key preload) when the threshold is exceeded. Must satisfy validation with memory_urgent_level (memory_urgent_level `>` memory_high_level, memory_high_level `>=` 1, memory_urgent_level `<=` 100). +- Introduced in: v3.2.0 + +##### report_exec_rpc_request_retry_num + +- Default: 10 +- Type: Int +- Unit: - +- Is mutable: Yes +- Description: The retry times of rpc request to report exec rpc request to FE. The default value is 10, which means that the rpc request will be retried 10 times if it fails only if it's fragment instatnce finish rpc. Report exec rpc request is important for load job, if one fragment instance finish report failed, the load job will be hang until timeout. +- Introduced in: - + +##### sleep_one_second + +- Default: 1 +- Type: Int +- Unit: Seconds +- Is mutable: No +- Description: A small, global sleep interval (in seconds) used by BE agent worker threads as a one-second pause when the master address/heartbeat is not yet available or when a short retry/backoff is needed. In the codebase it is referenced by several report worker pools (e.g., ReportDiskStateTaskWorkerPool, ReportOlapTableTaskWorkerPool, ReportWorkgroupTaskWorkerPool) to avoid busy-waiting and reduce CPU consumption while retrying. 
+##### sleep_one_second
+
+- Default: 1
+- Type: Int
+- Unit: Seconds
+- Is mutable: No
+- Description: A small, global sleep interval (in seconds) used by BE agent worker threads as a one-second pause when the master address/heartbeat is not yet available or when a short retry/backoff is needed. In the codebase it is referenced by several report worker pools (e.g., ReportDiskStateTaskWorkerPool, ReportOlapTableTaskWorkerPool, ReportWorkgroupTaskWorkerPool) to avoid busy-waiting and reduce CPU consumption while retrying. Increasing this value slows the retry frequency and responsiveness to master availability; reducing it increases the polling rate and CPU usage. Adjust it only with awareness of the trade-off between responsiveness and resource use.
+- Introduced in: v3.2.0
+
+##### small_file_dir
+
+- Default: `${STARROCKS_HOME}/lib/small_file/`
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The directory used to store the files downloaded by the file manager.
+- Introduced in: -
+
+##### upload_buffer_size
+
+- Default: 4194304
+- Type: Int
+- Unit: Bytes
+- Is mutable: Yes
+- Description: Buffer size (in bytes) used by file copy operations when uploading snapshot files to remote storage (broker or direct FileSystem). In the upload path (snapshot_loader.cpp) this value is passed to fs::copy as the read/write chunk size for each upload stream. The default is 4 MiB. Increasing this value can improve throughput on high-latency or high-bandwidth links but increases memory usage per concurrent upload; decreasing it reduces per-stream memory but may reduce transfer efficiency. Tune it together with upload_worker_count and the overall available memory.
+- Introduced in: v3.2.13
+
+##### user_function_dir
+
+- Default: `${STARROCKS_HOME}/lib/udf`
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The directory used to store User-defined Functions (UDFs).
+- Introduced in: -
+
+##### web_log_bytes
+
+- Default: 1048576 (1 MB)
+- Type: Long
+- Unit: Bytes
+- Is mutable: No
+- Description: Maximum number of bytes to read from the INFO logfile and show on the BE debug webserver's log page. The handler uses this value to compute a seek offset (showing the last N bytes) to avoid reading or serving very large log files. If the logfile is smaller than this value, the whole file is shown. Note: in the current implementation the code that reads and serves the INFO log is commented out and the handler reports that the INFO log file couldn't be opened, so this parameter may have no effect unless the log-serving code is enabled.
+- Introduced in: v3.2.0
+
+### Removed parameters
+
+##### enable_bit_unpack_simd
+
+- Status: Removed
+- Description: This parameter has been removed. Bit-unpack SIMD selection is now handled at compile time (AVX2/BMI2) with automatic fallback to the default implementation.
+- Removed in: -
diff --git a/docs/en/administration/management/Backup_and_restore.md b/docs/en/administration/management/Backup_and_restore.md
new file mode 100644
index 0000000..701703c
--- /dev/null
+++ b/docs/en/administration/management/Backup_and_restore.md
@@ -0,0 +1,650 @@
+---
+displayed_sidebar: docs
+---
+
+# Back up and restore data
+
+This topic describes how to back up and restore data in StarRocks, or migrate data to a new StarRocks cluster.
+
+StarRocks supports backing up data as snapshots into a remote storage system and restoring the data to any StarRocks cluster.
+
+From v3.4.0 onwards, StarRocks has enhanced the functionality of BACKUP and RESTORE by supporting more objects and refactoring the syntax for better flexibility.
+ +StarRocks supports the following remote storage systems: + +- Apache™ Hadoop® (HDFS) cluster +- AWS S3 +- Google GCS +- MinIO + +StarRocks supports backing up the following objects: + +- Internal databases, tables (of all types and partitioning strategies), and partitions +- Metadata of external catalogs (supported from v3.4.0 onwards) +- Synchronous materialized views and asynchronous materialized views +- Logical views (supported from v3.4.0 onwards) +- User-defined functions (supported from v3.4.0 onwards) + +> **NOTE** +> +> Shared-data StarRocks clusters do not support data BACKUP and RESTORE. + +## Create a repository + +Before backing up data, you need to create a repository, which is used to store data snapshots in a remote storage system. You can create multiple repositories in a StarRocks cluster. For detailed instructions, see [CREATE REPOSITORY](../../sql-reference/sql-statements/backup_restore/CREATE_REPOSITORY.md). + +- Create a repository in HDFS + +The following example creates a repository named `test_repo` in an HDFS cluster. + +```SQL +CREATE REPOSITORY test_repo +WITH BROKER +ON LOCATION "hdfs://:/repo_dir/backup" +PROPERTIES( + "username" = "", + "password" = "" +); +``` + +- Create a repository in AWS S3 + + You can choose IAM user-based credential (Access Key and Secret Key), Instance Profile, or Assumed Role as the credential method for accessing AWS S3. + + - The following example creates a repository named `test_repo` in the AWS S3 bucket `bucket_s3` using IAM user-based credentials as the credential method. + + ```SQL + CREATE REPOSITORY test_repo + WITH BROKER + ON LOCATION "s3a://bucket_s3/backup" + PROPERTIES( + "aws.s3.access_key" = "XXXXXXXXXXXXXXXXX", + "aws.s3.secret_key" = "yyyyyyyyyyyyyyyyyyyyyyyy", + "aws.s3.region" = "us-east-1" + ); + ``` + + - The following example creates a repository named `test_repo` in the AWS S3 bucket `bucket_s3` using Instance Profile as the credential method. + + ```SQL + CREATE REPOSITORY test_repo + WITH BROKER + ON LOCATION "s3a://bucket_s3/backup" + PROPERTIES( + "aws.s3.use_instance_profile" = "true", + "aws.s3.region" = "us-east-1" + ); + ``` + + - The following example creates a repository named `test_repo` in the AWS S3 bucket `bucket_s3` using Assumed Role as the credential method. + + ```SQL + CREATE REPOSITORY test_repo + WITH BROKER + ON LOCATION "s3a://bucket_s3/backup" + PROPERTIES( + "aws.s3.use_instance_profile" = "true", + "aws.s3.iam_role_arn" = "arn:aws:iam::xxxxxxxxxx:role/yyyyyyyy", + "aws.s3.region" = "us-east-1" + ); + ``` + +> **NOTE** +> +> StarRocks supports creating repositories in AWS S3 only according to the S3A protocol. Therefore, when you create repositories in AWS S3, you must replace `s3://` in the S3 URI you pass as a repository location in `ON LOCATION` with `s3a://`. + +- Create a repository in Google GCS + +The following example creates a repository named `test_repo` in the Google GCS bucket `bucket_gcs`. + +```SQL +CREATE REPOSITORY test_repo +WITH BROKER +ON LOCATION "s3a://bucket_gcs/backup" +PROPERTIES( + "fs.s3a.access.key" = "xxxxxxxxxxxxxxxxxxxx", + "fs.s3a.secret.key" = "yyyyyyyyyyyyyyyyyyyy", + "fs.s3a.endpoint" = "storage.googleapis.com" +); +``` + +> **NOTE** +> +> - StarRocks supports creating repositories in Google GCS only according to the S3A protocol. Therefore, when you create repositories in Google GCS, you must replace the prefix in the GCS URI you pass as a repository location in `ON LOCATION` with `s3a://`. +> - Do not specify `https` in the endpoint address. 
+
+- Create a repository in MinIO
+
+The following example creates a repository named `test_repo` in the MinIO bucket `bucket_minio`.
+
+```SQL
+CREATE REPOSITORY test_repo
+WITH BROKER
+ON LOCATION "s3://bucket_minio/backup"
+PROPERTIES(
+    "aws.s3.access_key" = "XXXXXXXXXXXXXXXXX",
+    "aws.s3.secret_key" = "yyyyyyyyyyyyyyyyy",
+    "aws.s3.endpoint" = "http://minio:9000"
+);
+```
+
+After the repository is created, you can check the repository via [SHOW REPOSITORIES](../../sql-reference/sql-statements/backup_restore/SHOW_REPOSITORIES.md). After restoring data, you can delete the repository in StarRocks using [DROP REPOSITORY](../../sql-reference/sql-statements/backup_restore/DROP_REPOSITORY.md). However, data snapshots backed up in the remote storage system cannot be deleted through StarRocks. You need to delete them manually in the remote storage system.
+
+## Back up data
+
+After the repository is created, you need to create a data snapshot and back it up to the remote repository. For detailed instructions, see [BACKUP](../../sql-reference/sql-statements/backup_restore/BACKUP.md). BACKUP is an asynchronous operation. You can check the status of a BACKUP job using [SHOW BACKUP](../../sql-reference/sql-statements/backup_restore/SHOW_BACKUP.md), or cancel a BACKUP job using [CANCEL BACKUP](../../sql-reference/sql-statements/backup_restore/CANCEL_BACKUP.md).
+
+StarRocks supports FULL backup at the granularity of database, table, or partition.
+
+If you have stored a large amount of data in a table, we recommend that you back up and restore data by partition. This way, you can reduce the cost of retries in case of job failures. If you need to back up incremental data on a regular basis, you can configure a [partitioning plan](../../table_design/data_distribution/Data_distribution.md#partitioning) for your table, and back up only new partitions each time.
+
+### Back up database
+
+Performing a full BACKUP on a database will back up all tables, synchronous and asynchronous materialized views, logical views, and UDFs within the database.
+
+The following examples back up the database `sr_hub` in the snapshot `sr_hub_backup` and upload the snapshot to the repository `test_repo`.
+
+```SQL
+-- Supported from v3.4.0 onwards.
+BACKUP DATABASE sr_hub SNAPSHOT sr_hub_backup
+TO test_repo;
+
+-- Compatible with the syntax in earlier versions.
+BACKUP SNAPSHOT sr_hub.sr_hub_backup
+TO test_repo;
+```
+
+### Back up table
+
+StarRocks supports backing up and restoring tables of all types and partitioning strategies. Performing a full BACKUP on a table will back up the table and the synchronous materialized views built on it.
+
+The following examples back up the table `sr_member` from the database `sr_hub` in the snapshot `sr_member_backup` and upload the snapshot to the repository `test_repo`.
+
+```SQL
+-- Supported from v3.4.0 onwards.
+BACKUP DATABASE sr_hub SNAPSHOT sr_member_backup
+TO test_repo
+ON (TABLE sr_member);
+
+-- Compatible with the syntax in earlier versions.
+BACKUP SNAPSHOT sr_hub.sr_member_backup
+TO test_repo
+ON (sr_member);
+```
+
+The following example backs up two tables, `sr_member` and `sr_pmc`, from the database `sr_hub` in the snapshot `sr_core_backup` and uploads the snapshot to the repository `test_repo`.
+
+```SQL
+BACKUP DATABASE sr_hub SNAPSHOT sr_core_backup
+TO test_repo
+ON (TABLE sr_member, TABLE sr_pmc);
+```
+
+The following example backs up all tables from the database `sr_hub` in the snapshot `sr_all_backup` and uploads the snapshot to the repository `test_repo`.
+
+```SQL
+BACKUP DATABASE sr_hub SNAPSHOT sr_all_backup
+TO test_repo
+ON (ALL TABLES);
+```
+
+### Back up partition
+
+The following examples back up the partition `p1` of the table `sr_member` from the database `sr_hub` in the snapshot `sr_par_backup` and upload the snapshot to the repository `test_repo`.
+
+```SQL
+-- Supported from v3.4.0 onwards.
+BACKUP DATABASE sr_hub SNAPSHOT sr_par_backup
+TO test_repo
+ON (TABLE sr_member PARTITION (p1));
+
+-- Compatible with the syntax in earlier versions.
+BACKUP SNAPSHOT sr_hub.sr_par_backup
+TO test_repo
+ON (sr_member PARTITION (p1));
+```
+
+You can specify multiple partition names separated by commas (`,`) to back up partitions in batch.
+
+### Back up materialized view
+
+You do not need to manually back up synchronous materialized views because they will be backed up along with the BACKUP operation of the base table.
+
+Asynchronous materialized views can be backed up along with the BACKUP operation of the database they belong to. You can also back them up manually.
+
+The following example backs up the materialized view `sr_mv1` from the database `sr_hub` in the snapshot `sr_mv1_backup` and uploads the snapshot to the repository `test_repo`.
+
+```SQL
+BACKUP DATABASE sr_hub SNAPSHOT sr_mv1_backup
+TO test_repo
+ON (MATERIALIZED VIEW sr_mv1);
+```
+
+The following example backs up two materialized views, `sr_mv1` and `sr_mv2`, from the database `sr_hub` in the snapshot `sr_mv2_backup` and uploads the snapshot to the repository `test_repo`.
+
+```SQL
+BACKUP DATABASE sr_hub SNAPSHOT sr_mv2_backup
+TO test_repo
+ON (MATERIALIZED VIEW sr_mv1, MATERIALIZED VIEW sr_mv2);
+```
+
+The following example backs up all materialized views from the database `sr_hub` in the snapshot `sr_mv3_backup` and uploads the snapshot to the repository `test_repo`.
+
+```SQL
+BACKUP DATABASE sr_hub SNAPSHOT sr_mv3_backup
+TO test_repo
+ON (ALL MATERIALIZED VIEWS);
+```
+
+### Back up logical view
+
+The following example backs up the logical view `sr_view1` from the database `sr_hub` in the snapshot `sr_view1_backup` and uploads the snapshot to the repository `test_repo`.
+
+```SQL
+BACKUP DATABASE sr_hub SNAPSHOT sr_view1_backup
+TO test_repo
+ON (VIEW sr_view1);
+```
+
+The following example backs up two logical views, `sr_view1` and `sr_view2`, from the database `sr_hub` in the snapshot `sr_view2_backup` and uploads the snapshot to the repository `test_repo`.
+
+```SQL
+BACKUP DATABASE sr_hub SNAPSHOT sr_view2_backup
+TO test_repo
+ON (VIEW sr_view1, VIEW sr_view2);
+```
+
+The following example backs up all logical views from the database `sr_hub` in the snapshot `sr_view3_backup` and uploads the snapshot to the repository `test_repo`.
+
+```SQL
+BACKUP DATABASE sr_hub SNAPSHOT sr_view3_backup
+TO test_repo
+ON (ALL VIEWS);
+```
+
+### Back up UDF
+
+The following example backs up the UDF `sr_udf1` from the database `sr_hub` in the snapshot `sr_udf1_backup` and uploads the snapshot to the repository `test_repo`.
+
+```SQL
+BACKUP DATABASE sr_hub SNAPSHOT sr_udf1_backup
+TO test_repo
+ON (FUNCTION sr_udf1);
+```
+
+The following example backs up two UDFs, `sr_udf1` and `sr_udf2`, from the database `sr_hub` in the snapshot `sr_udf2_backup` and uploads the snapshot to the repository `test_repo`.
+
+```SQL
+BACKUP DATABASE sr_hub SNAPSHOT sr_udf2_backup
+TO test_repo
+ON (FUNCTION sr_udf1, FUNCTION sr_udf2);
+```
+
+The following example backs up all UDFs from the database `sr_hub` in the snapshot `sr_udf3_backup` and uploads the snapshot to the repository `test_repo`.
+
+```SQL
+BACKUP DATABASE sr_hub SNAPSHOT sr_udf3_backup
+TO test_repo
+ON (ALL FUNCTIONS);
+```
+
+### Back up metadata of external catalog
+
+The following example backs up the metadata of the external catalog `iceberg` in the snapshot `iceberg_backup` and uploads the snapshot to the repository `test_repo`.
+
+```SQL
+BACKUP EXTERNAL CATALOG (iceberg) SNAPSHOT iceberg_backup
+TO test_repo;
+```
+
+The following example backs up the metadata of two external catalogs, `iceberg` and `hive`, in the snapshot `iceberg_hive_backup` and uploads the snapshot to the repository `test_repo`.
+
+```SQL
+BACKUP EXTERNAL CATALOGS (iceberg, hive) SNAPSHOT iceberg_hive_backup
+TO test_repo;
+```
+
+The following example backs up the metadata of all external catalogs in the snapshot `all_catalog_backup` and uploads the snapshot to the repository `test_repo`.
+
+```SQL
+BACKUP ALL EXTERNAL CATALOGS SNAPSHOT all_catalog_backup
+TO test_repo;
+```
+
+To cancel the BACKUP operation on external catalogs, execute the following statement:
+
+```SQL
+CANCEL BACKUP FOR EXTERNAL CATALOG;
+```
+
+## Restore data
+
+You can restore the data snapshot backed up in the remote storage system to the current or other StarRocks clusters to restore or migrate data.
+
+**When you restore an object from a snapshot, you must specify the timestamp of the snapshot.**
+
+Use the [RESTORE](../../sql-reference/sql-statements/backup_restore/RESTORE.md) statement to restore data snapshots in the remote storage system.
+
+RESTORE is an asynchronous operation. You can check the status of a RESTORE job using [SHOW RESTORE](../../sql-reference/sql-statements/backup_restore/SHOW_RESTORE.md), or cancel a RESTORE job using [CANCEL RESTORE](../../sql-reference/sql-statements/backup_restore/CANCEL_RESTORE.md).
+
+### (Optional) Create a repository in the new cluster
+
+To migrate data to another StarRocks cluster, you need to create a repository with the same **repository name** and **location** in the target cluster. Otherwise, you will not be able to view the previously backed-up data snapshots. See [Create a repository](#create-a-repository) for details.
+
+### Obtain snapshot timestamp
+
+Before restoring data, you can check the snapshots in the repository to obtain the timestamps using [SHOW SNAPSHOT](../../sql-reference/sql-statements/backup_restore/SHOW_SNAPSHOT.md).
+
+The following example checks the snapshot information in `test_repo`.
+
+```Plain
+mysql> SHOW SNAPSHOT ON test_repo;
++------------------+-------------------------+--------+
+| Snapshot         | Timestamp               | Status |
++------------------+-------------------------+--------+
+| sr_member_backup | 2023-02-07-14-45-53-143 | OK     |
++------------------+-------------------------+--------+
+1 row in set (1.16 sec)
+```
+
+### Restore database
+
+The following examples restore the database `sr_hub` in the snapshot `sr_hub_backup` to the database `sr_hub` in the target cluster. If the database does not exist in the snapshot, the system will return an error. If the database does not exist in the target cluster, the system will create it automatically.
+
+```SQL
+-- Supported from v3.4.0 onwards.
+RESTORE SNAPSHOT sr_hub_backup
+FROM test_repo
+DATABASE sr_hub
+PROPERTIES("backup_timestamp" = "2024-12-09-10-25-58-842");
+
+-- Compatible with the syntax in earlier versions.
+RESTORE SNAPSHOT sr_hub.sr_hub_backup
+FROM `test_repo`
+PROPERTIES("backup_timestamp" = "2024-12-09-10-25-58-842");
+```
+
+The following example restores the database `sr_hub` in the snapshot `sr_hub_backup` to the database `sr_hub_new` in the target cluster. If the database `sr_hub` does not exist in the snapshot, the system will return an error. If the database `sr_hub_new` does not exist in the target cluster, the system will create it automatically.
+
+```SQL
+-- Supported from v3.4.0 onwards.
+RESTORE SNAPSHOT sr_hub_backup
+FROM test_repo
+DATABASE sr_hub AS sr_hub_new
+PROPERTIES("backup_timestamp" = "2024-12-09-10-25-58-842");
+```
+
+### Restore table
+
+The following examples restore the table `sr_member` of the database `sr_hub` in the snapshot `sr_member_backup` to the table `sr_member` of the database `sr_hub` in the target cluster.
+
+```SQL
+-- Supported from v3.4.0 onwards.
+RESTORE SNAPSHOT sr_member_backup
+FROM test_repo
+DATABASE sr_hub
+ON (TABLE sr_member)
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+
+-- Compatible with the syntax in earlier versions.
+RESTORE SNAPSHOT sr_hub.sr_member_backup
+FROM test_repo
+ON (sr_member)
+PROPERTIES ("backup_timestamp"="2024-12-09-10-52-10-940");
+```
+
+The following example restores the table `sr_member` of the database `sr_hub` in the snapshot `sr_member_backup` to the table `sr_member_new` of the database `sr_hub_new` in the target cluster.
+
+```SQL
+RESTORE SNAPSHOT sr_member_backup
+FROM test_repo
+DATABASE sr_hub AS sr_hub_new
+ON (TABLE sr_member AS sr_member_new)
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```
+
+The following example restores two tables, `sr_member` and `sr_pmc`, of the database `sr_hub` in the snapshot `sr_core_backup` to two tables, `sr_member` and `sr_pmc`, of the database `sr_hub` in the target cluster.
+
+```SQL
+RESTORE SNAPSHOT sr_core_backup
+FROM test_repo
+DATABASE sr_hub
+ON (TABLE sr_member, TABLE sr_pmc)
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```
+
+The following example restores all tables from the database `sr_hub` in the snapshot `sr_all_backup`.
+
+```SQL
+RESTORE SNAPSHOT sr_all_backup
+FROM test_repo
+DATABASE sr_hub
+ON (ALL TABLES);
+```
+
+The following example restores only one of the tables from the database `sr_hub` in the snapshot `sr_all_backup`.
+
+```SQL
+RESTORE SNAPSHOT sr_all_backup
+FROM test_repo
+DATABASE sr_hub
+ON (TABLE sr_member)
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```
+
+### Restore partition
+
+The following examples restore the partition `p1` of the table `sr_member` in the snapshot `sr_par_backup` to the partition `p1` of the table `sr_member` in the target cluster.
+
+```SQL
+-- Supported from v3.4.0 onwards.
+RESTORE SNAPSHOT sr_par_backup
+FROM test_repo
+DATABASE sr_hub
+ON (TABLE sr_member PARTITION (p1))
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+
+-- Compatible with the syntax in earlier versions.
+RESTORE SNAPSHOT sr_hub.sr_par_backup
+FROM test_repo
+ON (sr_member PARTITION (p1))
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```
+
+You can specify multiple partition names separated by commas (`,`) to restore partitions in batch.
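+
+For example, the following sketch restores two partitions in one statement (the partition name `p2` is illustrative and assumes such a partition exists in the snapshot):
+
+```SQL
+RESTORE SNAPSHOT sr_par_backup
+FROM test_repo
+DATABASE sr_hub
+ON (TABLE sr_member PARTITION (p1, p2))
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```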
+
+### Restore materialized view
+
+The following example restores the materialized view `sr_mv1` from the database `sr_hub` in the snapshot `sr_mv1_backup` to the target cluster.
+
+```SQL
+RESTORE SNAPSHOT sr_mv1_backup
+FROM test_repo
+DATABASE sr_hub
+ON (MATERIALIZED VIEW sr_mv1)
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```
+
+The following example restores two materialized views, `sr_mv1` and `sr_mv2`, from the database `sr_hub` in the snapshot `sr_mv2_backup` to the target cluster.
+
+```SQL
+RESTORE SNAPSHOT sr_mv2_backup
+FROM test_repo
+DATABASE sr_hub
+ON (MATERIALIZED VIEW sr_mv1, MATERIALIZED VIEW sr_mv2)
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```
+
+The following example restores all materialized views from the database `sr_hub` in the snapshot `sr_mv3_backup` to the target cluster.
+
+```SQL
+RESTORE SNAPSHOT sr_mv3_backup
+FROM test_repo
+DATABASE sr_hub
+ON (ALL MATERIALIZED VIEWS)
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```
+
+The following example restores one of the materialized views from the database `sr_hub` in the snapshot `sr_mv3_backup` to the target cluster.
+
+```SQL
+RESTORE SNAPSHOT sr_mv3_backup
+FROM test_repo
+DATABASE sr_hub
+ON (MATERIALIZED VIEW sr_mv1)
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```
+
+:::info
+
+After RESTORE, you can check the status of the materialized view using [SHOW MATERIALIZED VIEWS](../../sql-reference/sql-statements/materialized_view/SHOW_MATERIALIZED_VIEW.md).
+
+- If the materialized view is active, it can be used directly.
+- If the materialized view is inactive, it might be because its base tables are not restored. After all the base tables are restored, you can use [ALTER MATERIALIZED VIEW](../../sql-reference/sql-statements/materialized_view/ALTER_MATERIALIZED_VIEW.md) to re-activate the materialized view.
+
+:::
+
+### Restore logical view
+
+The following example restores the logical view `sr_view1` from the database `sr_hub` in the snapshot `sr_view1_backup` to the target cluster.
+
+```SQL
+RESTORE SNAPSHOT sr_view1_backup
+FROM test_repo
+DATABASE sr_hub
+ON (VIEW sr_view1)
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```
+
+The following example restores two logical views, `sr_view1` and `sr_view2`, from the database `sr_hub` in the snapshot `sr_view2_backup` to the target cluster.
+
+```SQL
+RESTORE SNAPSHOT sr_view2_backup
+FROM test_repo
+DATABASE sr_hub
+ON (VIEW sr_view1, VIEW sr_view2)
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```
+
+The following example restores all logical views from the database `sr_hub` in the snapshot `sr_view3_backup` to the target cluster.
+
+```SQL
+RESTORE SNAPSHOT sr_view3_backup
+FROM test_repo
+DATABASE sr_hub
+ON (ALL VIEWS)
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```
+
+The following example restores only one of the logical views from the database `sr_hub` in the snapshot `sr_view3_backup` to the target cluster.
+
+```SQL
+RESTORE SNAPSHOT sr_view3_backup
+FROM test_repo
+DATABASE sr_hub
+ON (VIEW sr_view1)
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```
+
+### Restore UDF
+
+The following example restores the UDF `sr_udf1` from the database `sr_hub` in the snapshot `sr_udf1_backup` to the target cluster.
+
+```SQL
+RESTORE SNAPSHOT sr_udf1_backup
+FROM test_repo
+DATABASE sr_hub
+ON (FUNCTION sr_udf1)
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```
+
+The following example restores two UDFs, `sr_udf1` and `sr_udf2`, from the database `sr_hub` in the snapshot `sr_udf2_backup` to the target cluster.
+
+```SQL
+RESTORE SNAPSHOT sr_udf2_backup
+FROM test_repo
+DATABASE sr_hub
+ON (FUNCTION sr_udf1, FUNCTION sr_udf2)
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```
+
+The following example restores all UDFs from the database `sr_hub` in the snapshot `sr_udf3_backup` to the target cluster.
+
+```SQL
+RESTORE SNAPSHOT sr_udf3_backup
+FROM test_repo
+DATABASE sr_hub
+ON (ALL FUNCTIONS)
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```
+
+The following example restores only one of the UDFs from the database `sr_hub` in the snapshot `sr_udf3_backup` to the target cluster.
+
+```SQL
+RESTORE SNAPSHOT sr_udf3_backup
+FROM test_repo
+DATABASE sr_hub
+ON (FUNCTION sr_udf1)
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```
+
+### Restore metadata of external catalog
+
+The following example restores the metadata of the external catalog `iceberg` in the snapshot `iceberg_backup` to the target cluster, and renames it `iceberg_new`.
+
+```SQL
+RESTORE SNAPSHOT iceberg_backup
+FROM test_repo
+EXTERNAL CATALOG (iceberg AS iceberg_new)
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```
+
+The following example restores the metadata of two external catalogs, `iceberg` and `hive`, in the snapshot `iceberg_hive_backup` to the target cluster.
+
+```SQL
+RESTORE SNAPSHOT iceberg_hive_backup
+FROM test_repo
+EXTERNAL CATALOGS (iceberg, hive)
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```
+
+The following example restores the metadata of all external catalogs in the snapshot `all_catalog_backup` to the target cluster.
+
+```SQL
+RESTORE SNAPSHOT all_catalog_backup
+FROM test_repo
+ALL EXTERNAL CATALOGS
+PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940");
+```
+
+To cancel the RESTORE operation on external catalogs, execute the following statement:
+
+```SQL
+CANCEL RESTORE FOR EXTERNAL CATALOG;
+```
+
+## Configure BACKUP or RESTORE jobs
+
+You can optimize the performance of BACKUP or RESTORE jobs by modifying the following configuration items in the BE configuration file **be.conf**:
+
+| Configuration item            | Description                                                  |
+| ----------------------------- | ------------------------------------------------------------ |
+| make_snapshot_worker_count    | The maximum number of threads for the make snapshot tasks of BACKUP jobs on a BE node. Default: `5`. Increase the value of this configuration item to increase the concurrency of the make snapshot task. |
+| release_snapshot_worker_count | The maximum number of threads for the release snapshot tasks of failed BACKUP jobs on a BE node. Default: `5`. Increase the value of this configuration item to increase the concurrency of the release snapshot task. |
+| upload_worker_count           | The maximum number of threads for the upload tasks of BACKUP jobs on a BE node. Default: `0`. `0` indicates setting the value to the number of CPU cores on the machine where the BE resides. Increase the value of this configuration item to increase the concurrency of the upload task. |
+| download_worker_count         | The maximum number of threads for the download tasks of RESTORE jobs on a BE node. Default: `0`. `0` indicates setting the value to the number of CPU cores on the machine where the BE resides. Increase the value of this configuration item to increase the concurrency of the download task. |
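+
+For example, to raise upload and download concurrency for large BACKUP or RESTORE jobs, you might set the following in **be.conf** (the values here are illustrative; tune them to your hardware and restart the BE if required):
+
+```Plain
+# be.conf -- illustrative values only
+upload_worker_count = 8
+download_worker_count = 8
+```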
+
+## Usage notes
+
+- Performing backup and restore operations at the global, database, table, or partition level requires different privileges. For detailed information, see [Customize roles based on scenarios](../user_privs/authorization/User_privilege.md#customize-roles-based-on-scenarios).
+- Each database allows only one running BACKUP or RESTORE job at a time. Otherwise, StarRocks returns an error.
+- Because BACKUP and RESTORE jobs occupy many resources of your StarRocks cluster, we recommend that you back up and restore your data while your StarRocks cluster is not heavily loaded.
+- StarRocks does not support specifying data compression algorithms for data backup.
+- Because data is backed up as snapshots, the data loaded after snapshot generation is not included in the snapshot. Therefore, if you load data into the old cluster after the snapshot is generated and before the RESTORE job is completed, you also need to load the data into the cluster that the data is restored into. It is recommended that you load data into both clusters in parallel for a period of time after the data migration is complete, and then migrate your application to the new cluster after verifying the correctness of the data and services.
+- Before the RESTORE job is completed, you cannot perform operations on the table being restored.
+- Primary Key tables cannot be restored to a StarRocks cluster earlier than v2.5.
+- You do not need to create the table to be restored in the new cluster before restoring it. The RESTORE job automatically creates it.
+- If an existing table has the same name as the table to be restored, StarRocks first checks whether the schema of the existing table matches that of the table to be restored. If the schemas match, StarRocks overwrites the existing table with the data in the snapshot. If the schemas do not match, the RESTORE job fails. You can either rename the table to be restored using the keyword `AS`, or delete the existing table before restoring data.
+- If the RESTORE job overwrites an existing database, table, or partition, the overwritten data cannot be restored after the job enters the COMMIT phase. If the RESTORE job fails or is canceled at this point, the data may be corrupted and inaccessible. In this case, you can only perform the RESTORE operation again and wait for the job to complete. Therefore, we recommend that you do not restore data by overwriting unless you are sure that the current data is no longer used. The overwrite operation first checks metadata consistency between the snapshot and the existing database, table, or partition. If an inconsistency is detected, the RESTORE operation cannot be performed.
+- Currently, StarRocks does not support backing up and restoring the configuration data related to user accounts, privileges, and resource groups.
+- Currently, StarRocks does not support backing up and restoring the Colocate Join relationship among tables.
diff --git a/docs/en/administration/management/FE_configuration.md b/docs/en/administration/management/FE_configuration.md new file mode 100644 index 0000000..ae97d6b --- /dev/null +++ b/docs/en/administration/management/FE_configuration.md @@ -0,0 +1,4502 @@ +--- +displayed_sidebar: docs +--- + +import FEConfigMethod from '../../_assets/commonMarkdown/FE_config_method.mdx' + +import AdminSetFrontendNote from '../../_assets/commonMarkdown/FE_config_note.mdx' + +import StaticFEConfigNote from '../../_assets/commonMarkdown/StaticFE_config_note.mdx' + +import EditionSpecificFEItem from '../../_assets/commonMarkdown/Edition_Specific_FE_Item.mdx' + +# FE Configuration + + + +## View FE configuration items + +After your FE is started, you can run the ADMIN SHOW FRONTEND CONFIG command on your MySQL client to check the parameter configurations. If you want to query the configuration of a specific parameter, run the following command: + +```SQL +ADMIN SHOW FRONTEND CONFIG [LIKE "pattern"]; +``` + +For detailed description of the returned fields, see [ADMIN SHOW CONFIG](../../sql-reference/sql-statements/cluster-management/config_vars/ADMIN_SHOW_CONFIG.md). + +:::note +You must have administrator privileges to run cluster administration-related commands. +::: + +## Configure FE parameters + +### Configure FE dynamic parameters + +You can configure or modify the settings of FE dynamic parameters using [ADMIN SET FRONTEND CONFIG](../../sql-reference/sql-statements/cluster-management/config_vars/ADMIN_SET_CONFIG.md). + +```SQL +ADMIN SET FRONTEND CONFIG ("key" = "value"); +``` + + + +### Configure FE static parameters + + + +## Understand FE parameters + +### Logging + +##### audit_log_delete_age + +- Default: 30d +- Type: String +- Unit: - +- Is mutable: No +- Description: The retention period of audit log files. The default value `30d` specifies that each audit log file can be retained for 30 days. StarRocks checks each audit log file and deletes those that were generated 30 days ago. +- Introduced in: - + +##### audit_log_dir + +- Default: StarRocksFE.STARROCKS_HOME_DIR + "/log" +- Type: String +- Unit: - +- Is mutable: No +- Description: The directory that stores audit log files. +- Introduced in: - + +##### audit_log_enable_compress + +- Default: false +- Type: Boolean +- Unit: N/A +- Is mutable: No +- Description: When true, the generated Log4j2 configuration appends a ".gz" postfix to rotated audit log filenames (fe.audit.log.*) so that Log4j2 will produce compressed (.gz) archived audit log files on rollover. The setting is read during FE startup in Log4jConfig.initLogging and is applied to the RollingFile appender for audit logs; it only affects rotated/archived files, not the active audit log. Because the value is initialized at startup, changing it requires restarting the FE to take effect. Use alongside audit log rotation settings (audit_log_dir, audit_log_roll_interval, audit_roll_maxsize, audit_log_roll_num). +- Introduced in: 3.2.12 + +##### audit_log_json_format + +- Default: false +- Type: Boolean +- Unit: N/A +- Is mutable: Yes +- Description: When true, FE audit events are emitted as structured JSON (Jackson ObjectMapper serializing a Map of annotated AuditEvent fields) instead of the default pipe-separated "key=value" string. The setting affects all built-in audit sinks handled by AuditLogBuilder: connection audit, query audit, big-query audit (big-query threshold fields are added to the JSON when the event qualifies), and slow-audit output. 
Fields annotated for big-query thresholds and the "features" field are treated specially (excluded from normal audit entries; included in big-query or feature logs as applicable). Enable this to make logs machine-parsable for log collectors or SIEMs; note that it changes the log format and may require updating any existing parsers that expect the legacy pipe-separated format.
+- Introduced in: v3.2.7
+
+##### audit_log_modules
+
+- Default: slow_query, query
+- Type: String[]
+- Unit: -
+- Is mutable: No
+- Description: The modules for which StarRocks generates audit log entries. By default, StarRocks generates audit logs for the `slow_query` module and the `query` module. The `connection` module is supported from v3.0. Separate the module names with a comma (,) and a space.
+- Introduced in: -
+
+##### audit_log_roll_interval
+
+- Default: DAY
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The time interval at which StarRocks rotates audit log entries. Valid values: `DAY` and `HOUR`.
+  - If this parameter is set to `DAY`, a suffix in the `yyyyMMdd` format is added to the names of audit log files.
+  - If this parameter is set to `HOUR`, a suffix in the `yyyyMMddHH` format is added to the names of audit log files.
+- Introduced in: -
+
+##### audit_log_roll_num
+
+- Default: 90
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum number of audit log files that can be retained within each retention period specified by the `audit_log_roll_interval` parameter.
+- Introduced in: -
+
+##### bdbje_log_level
+
+- Default: INFO
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: Controls the logging level used by Berkeley DB Java Edition (BDB JE) in StarRocks. During BDB environment initialization, BDBEnvironment.initConfigs() applies this value to the Java logger for the `com.sleepycat.je` package and to the BDB JE environment file logging level (EnvironmentConfig.FILE_LOGGING_LEVEL). Accepts standard java.util.logging.Level names such as SEVERE, WARNING, INFO, CONFIG, FINE, FINER, FINEST, ALL, OFF. Setting it to ALL enables all log messages. Increasing verbosity will raise log volume and may impact disk I/O and performance; the value is read when the BDB environment is initialized, so it takes effect only after environment (re)initialization.
+- Introduced in: v3.2.0
+
+##### big_query_log_delete_age
+
+- Default: 7d
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: Controls how long FE big query log files (`fe.big_query.log.*`) are retained before automatic deletion. The value is passed to Log4j's deletion policy as the IfLastModified age — any rotated big query log whose last-modified time is older than this value will be removed. Supported suffixes: `d` (day), `h` (hour), `m` (minute), and `s` (second). Examples: `7d` (7 days), `10h` (10 hours), `60m` (60 minutes), and `120s` (120 seconds). This item works together with `big_query_log_roll_interval` and `big_query_log_roll_num` to determine which files are kept or purged.
+- Introduced in: v3.2.0
+
+##### big_query_log_dir
+
+- Default: `Config.STARROCKS_HOME_DIR + "/log"`
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: Directory where the FE writes big query dump logs (`fe.big_query.log.*`). The Log4j configuration uses this path to create a RollingFile appender for `fe.big_query.log` and its rotated files.
Rotation and retention are governed by `big_query_log_roll_interval` (time-based suffix), `log_roll_size_mb` (size trigger), `big_query_log_roll_num` (max files), and `big_query_log_delete_age` (age-based deletion). Big query records are logged for queries that exceed user-defined thresholds such as `big_query_log_cpu_second_threshold`, `big_query_log_scan_rows_threshold`, or `big_query_log_scan_bytes_threshold`. Use `big_query_log_modules` to control which modules log to this file. +- Introduced in: v3.2.0 + +##### big_query_log_modules + +- Default: `{"query"}` +- Type: String[] +- Unit: - +- Is mutable: No +- Description: List of module name suffixes that enable per-module big query logging. Typical values are logical component names. For example, the default `query` produces `big_query.query`. +- Introduced in: v3.2.0 + +##### big_query_log_roll_interval + +- Default: `"DAY"` +- Type: String +- Unit: - +- Is mutable: No +- Description: Specifies the time interval used to construct the date component of the rolling file name for the `big_query` log appender. Valid values (case-insensitive) are `DAY` (default) and `HOUR`. `DAY` produces a daily pattern (`"%d{yyyyMMdd}"`) and `HOUR` produces an hourly pattern (`"%d{yyyyMMddHH}"`). The value is combined with size-based rollover (`big_query_roll_maxsize`) and index-based rollover (`big_query_log_roll_num`) to form the RollingFile filePattern. An invalid value causes log configuration generation to fail (IOException) and may prevent log initialization or reconfiguration. Use alongside `big_query_log_dir`, `big_query_roll_maxsize`, `big_query_log_roll_num`, and `big_query_log_delete_age`. +- Introduced in: v3.2.0 + +##### big_query_log_roll_num + +- Default: 10 +- Type: Int +- Unit: - +- Is mutable: No +- Description: Maximum number of rotated FE big query log files to retain per `big_query_log_roll_interval`. This value is bound to the RollingFile appender's DefaultRolloverStrategy `max` attribute for `fe.big_query.log`; when logs roll (by time or by `log_roll_size_mb`), StarRocks keeps up to `big_query_log_roll_num` indexed files (filePattern uses a time suffix plus index). Files older than this count may be removed by rollover, and `big_query_log_delete_age` can additionally delete files by last-modified age. +- Introduced in: v3.2.0 + +##### dump_log_delete_age + +- Default: 7d +- Type: String +- Unit: - +- Is mutable: No +- Description: The retention period of dump log files. The default value `7d` specifies that each dump log file can be retained for 7 days. StarRocks checks each dump log file and deletes those that were generated 7 days ago. +- Introduced in: - + +##### dump_log_dir + +- Default: StarRocksFE.STARROCKS_HOME_DIR + "/log" +- Type: String +- Unit: - +- Is mutable: No +- Description: The directory that stores dump log files. +- Introduced in: - + +##### dump_log_modules + +- Default: query +- Type: String[] +- Unit: - +- Is mutable: No +- Description: The modules for which StarRocks generates dump log entries. By default, StarRocks generates dump logs for the query module. Separate the module names with a comma (,) and a space. +- Introduced in: - + +##### dump_log_roll_interval + +- Default: DAY +- Type: String +- Unit: - +- Is mutable: No +- Description: The time interval at which StarRocks rotates dump log entries. Valid values: `DAY` and `HOUR`. + - If this parameter is set to `DAY`, a suffix in the `yyyyMMdd` format is added to the names of dump log files. 
+ - If this parameter is set to `HOUR`, a suffix in the `yyyyMMddHH` format is added to the names of dump log files. +- Introduced in: - + +##### dump_log_roll_num + +- Default: 10 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The maximum number of dump log files that can be retained within each retention period specified by the `dump_log_roll_interval` parameter. +- Introduced in: - + +##### edit_log_write_slow_log_threshold_ms + +- Default: 2000 +- Type: Int +- Unit: Milliseconds +- Is mutable: Yes +- Description: Threshold (in ms) used by JournalWriter to detect and log slow edit-log batch writes. After a batch commit, if the batch duration exceeds this value, JournalWriter emits a WARN with batch size, duration and current journal queue size (rate-limited to once every ~2s). This setting only controls logging/alerts for potential IO or replication latency on the FE leader; it does not change commit or roll behavior (see `edit_log_roll_num` and commit-related settings). Metric updates still occur regardless of this threshold. +- Introduced in: v3.2.3 + +##### enable_audit_sql + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: When this item is set to `true`, the FE audit subsystem records the SQL text of statements into FE audit logs (`fe.audit.log`) processed by ConnectProcessor. The stored statement respects other controls: encrypted statements are redacted (`AuditEncryptionChecker`), sensitive credentials may be redacted or desensitized if `enable_sql_desensitize_in_log` is set, and digest recording is controlled by `enable_sql_digest`. When it is set to `false`, ConnectProcessor replaces the statement text with "?" in audit events — other audit fields (user, host, duration, status, slow-query detection via `qe_slow_log_ms`, and metrics) are still recorded. Enabling SQL audit increases forensic and troubleshooting visibility but may expose sensitive SQL content and increase log volume and I/O; disabling it improves privacy at the cost of losing full-statement visibility in audit logs. +- Introduced in: - + +##### enable_profile_log + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: No +- Description: Whether to enable profile logging. When this feature is enabled, the FE writes per-query profile logs (the serialized `queryDetail` JSON produced by `ProfileManager`) to the profile log sink. This logging is performed only if `enable_collect_query_detail_info` is also enabled; when `enable_profile_log_compress` is enabled, the JSON may be gzipped before logging. Profile log files are managed by `profile_log_dir`, `profile_log_roll_num`, `profile_log_roll_interval` and rotated/deleted according to `profile_log_delete_age` (supports formats like `7d`, `10h`, `60m`, `120s`). Disabling this feature stops writing profile logs (reducing disk I/O, compression CPU and storage usage). +- Introduced in: v3.2.5 + +##### enable_qe_slow_log + +- Default: true +- Type: Boolean +- Unit: N/A +- Is mutable: Yes +- Description: When enabled, the FE builtin audit plugin (AuditLogBuilder) will write query events whose measured execution time ("Time" field) exceeds the threshold configured by qe_slow_log_ms into the slow-query audit log (AuditLog.getSlowAudit). If disabled, those slow-query entries are suppressed (regular query and connection audit logs are unaffected). The slow-audit entries follow the global audit_log_json_format setting (JSON vs. plain string). 
Use this flag to control generation of slow-query audit volume independently of regular audit logging; turning it off may reduce log I/O when qe_slow_log_ms is low or workloads produce many long-running queries.
+- Introduced in: v3.2.11
+
+##### enable_sql_desensitize_in_log
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: When this item is set to `true`, the system replaces or hides sensitive SQL content before it is written to logs and query-detail records. Code paths that honor this configuration include ConnectProcessor.formatStmt (audit logs), StmtExecutor.addRunningQueryDetail (query details), and SimpleExecutor.formatSQL (internal executor logs). With the feature enabled, invalid SQLs may be replaced with a fixed desensitized message, credentials (user/password) are hidden, and the SQL formatter is required to produce a sanitized representation (it can also enable digest-style output). This reduces leakage of sensitive literals and credentials in audit/internal logs but also means logs and query details no longer contain the original full SQL text (which can affect replay or debugging).
+- Introduced in: -
+
+##### internal_log_delete_age
+
+- Default: 7d
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: Specifies the retention period for FE internal log files (written to `internal_log_dir`). The value is a duration string. Supported suffixes: `d` (day), `h` (hour), `m` (minute), `s` (second). Examples: `7d` (7 days), `10h` (10 hours), `60m` (60 minutes), `120s` (120 seconds). This item is substituted into the log4j configuration as the `IfLastModified` predicate used by the RollingFile Delete policy. Files whose last-modified time is earlier than this duration will be removed during log rollover. Decrease this value to free disk space sooner, or increase it to retain internal materialized view or statistics logs longer.
+- Introduced in: v3.2.4
+
+##### internal_log_dir
+
+- Default: `Config.STARROCKS_HOME_DIR + "/log"`
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: Directory used by the FE logging subsystem for storing internal logs (`fe.internal.log`). This configuration is substituted into the Log4j configuration and determines where the InternalFile appender writes internal/materialized view/statistics logs and where per-module loggers under `internal.` place their files. Ensure the directory exists, is writable, and has sufficient disk space. Log rotation and retention for files in this directory are controlled by `log_roll_size_mb`, `internal_log_roll_num`, `internal_log_delete_age`, and `internal_log_roll_interval`. If `sys_log_to_console` is enabled, internal logs may be written to console instead of this directory.
+- Introduced in: v3.2.4
+
+##### internal_log_json_format
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When this item is set to `true`, internal statistic/audit entries are written as compact JSON objects to the statistic audit logger. The JSON contains the keys "executeType" (InternalType: QUERY or DML), "queryId", "sql", and "time" (elapsed milliseconds). When it is set to `false`, the same information is logged as a single formatted text line ("statistic execute: ... | QueryId: [...] | SQL: ..."). Enabling JSON improves machine parsing and integration with log processors but also causes raw SQL text to be included in logs, which may expose sensitive information and increase log size.
+- Introduced in: -
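+
+With this item enabled, each entry is a compact JSON object carrying the four keys described above. A hypothetical example (all values are illustrative):
+
+```Plain
+{"executeType": "QUERY", "queryId": "a1b2c3d4-0000-0000-0000-000000000000", "sql": "SELECT 1", "time": 42}
+```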
+
+##### internal_log_modules
+
+- Default: `{"base", "statistic"}`
+- Type: String[]
+- Unit: -
+- Is mutable: No
+- Description: A list of module identifiers that will receive dedicated internal logging. For each entry X, Log4j creates a logger named `internal.X` with level INFO and additivity="false". Those loggers are routed to the internal appender (written to `fe.internal.log`) or to console when `sys_log_to_console` is enabled. Use short names or package fragments as needed — the exact logger name becomes `internal.` + the configured string. Internal log file rotation and retention follow `internal_log_dir`, `internal_log_roll_num`, `internal_log_delete_age`, `internal_log_roll_interval`, and `log_roll_size_mb`. Adding a module causes its runtime messages to be separated into the internal logger stream for easier debugging and audit.
+- Introduced in: v3.2.4
+
+##### internal_log_roll_interval
+
+- Default: DAY
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: Controls the time-based roll interval for the FE internal log appender. Accepted values (case-insensitive) are `HOUR` and `DAY`. `HOUR` produces an hourly file pattern (`"%d{yyyyMMddHH}"`) and `DAY` produces a daily file pattern (`"%d{yyyyMMdd}"`), which are used by the RollingFile TimeBasedTriggeringPolicy to name rotated `fe.internal.log` files. An invalid value causes initialization to fail (an IOException is thrown when building the active Log4j configuration). Roll behavior also depends on related settings such as `internal_log_dir`, `internal_roll_maxsize`, `internal_log_roll_num`, and `internal_log_delete_age`.
+- Introduced in: v3.2.4
+
+##### internal_log_roll_num
+
+- Default: 90
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: Maximum number of rolled internal FE log files to retain for the internal appender (`fe.internal.log`). This value is used as the Log4j DefaultRolloverStrategy `max` attribute; when rollovers occur, StarRocks keeps up to `internal_log_roll_num` archived files and removes older ones (also governed by `internal_log_delete_age`). A lower value reduces disk usage but shortens log history; a higher value preserves more historical internal logs. This item works together with `internal_log_dir`, `internal_log_roll_interval`, and `internal_roll_maxsize`.
+- Introduced in: v3.2.4
+
+##### log_cleaner_audit_log_min_retention_days
+
+- Default: 3
+- Type: Int
+- Unit: Days
+- Is mutable: Yes
+- Description: Minimum retention days for audit log files. Audit log files newer than this will not be deleted even if disk usage is high. This ensures that audit logs are preserved for compliance and troubleshooting purposes.
+- Introduced in: -
+
+##### log_cleaner_check_interval_second
+
+- Default: 300
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: Interval in seconds to check disk usage and clean logs. The cleaner periodically checks each log directory's disk usage and triggers cleaning when necessary. Default is 300 seconds (5 minutes).
+- Introduced in: -
+
+##### log_cleaner_disk_usage_target
+
+- Default: 60
+- Type: Int
+- Unit: Percentage
+- Is mutable: Yes
+- Description: Target disk usage (percentage) after log cleaning. Log cleaning will continue until disk usage drops below this threshold. The cleaner deletes the oldest log files one by one until the target is reached.
+- Introduced in: -
+
+##### log_cleaner_disk_usage_threshold
+
+- Default: 80
+- Type: Int
+- Unit: Percentage
+- Is mutable: Yes
+- Description: Disk usage threshold (percentage) to trigger log cleaning. When disk usage exceeds this threshold, log cleaning will start. The cleaner checks each configured log directory independently and processes directories that exceed this threshold.
+- Introduced in: -
+
+##### log_cleaner_disk_util_based_enable
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Enable automatic log cleaning based on disk usage. When enabled, logs will be cleaned when disk usage exceeds the threshold. The log cleaner runs as a background daemon on the FE node and helps prevent disk space exhaustion from log file accumulation.
+- Introduced in: -
+
+##### log_plan_cancelled_by_crash_be
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to log the query execution plan when a query is cancelled due to a BE crash or an RPC exception. When this feature is enabled, StarRocks logs the query execution plan (at `TExplainLevel.COSTS`) as a WARN entry when a query is cancelled due to a BE crash or an `RpcException`. The log entry includes the QueryId, the SQL, and the COSTS plan; in the ExecuteExceptionHandler path, the exception stacktrace is also logged. The logging is skipped when `enable_collect_query_detail_info` is enabled (the plan is then stored in the query detail) — in code paths, the check is performed by verifying the query detail is null. Note that, in ExecuteExceptionHandler, the plan is logged only on the first retry (`retryTime == 0`). Enabling this may increase log volume because full COSTS plans can be large.
+- Introduced in: v3.2.0
+
+##### log_register_and_unregister_query_id
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to allow FE to log query registration and deregistration messages (e.g., `"register query id = {}"` and `"deregister query id = {}"`) from QeProcessorImpl. The log is emitted only when the query has a non-null ConnectContext and either the command is not `COM_STMT_EXECUTE` or the session variable `isAuditExecuteStmt()` is true. Because these messages are written for every query lifecycle event, enabling this feature can produce high log volume and become a throughput bottleneck in high concurrency environments. Enable it for debugging or auditing; disable it to reduce logging overhead and improve performance.
+- Introduced in: v3.3.0, v3.4.0, v3.5.0
+
+##### log_roll_size_mb
+
+- Default: 1024
+- Type: Int
+- Unit: MB
+- Is mutable: No
+- Description: The maximum size of a system log file or an audit log file.
+- Introduced in: -
+
+##### proc_profile_file_retained_days
+
+- Default: 1
+- Type: Int
+- Unit: Days
+- Is mutable: Yes
+- Description: Number of days to retain process profiling files (CPU and memory) generated under `sys_log_dir/proc_profile`. The ProcProfileCollector computes a cutoff by subtracting `proc_profile_file_retained_days` days from the current time (formatted as yyyyMMdd-HHmmss) and deletes profile files whose timestamp portion is lexicographically earlier than that cutoff (that is, `timePart.compareTo(timeToDelete) < 0`). File deletion also respects the size-based cutoff controlled by `proc_profile_file_retained_size_bytes`. Profile files use the prefixes `cpu-profile-` and `mem-profile-` and are compressed after collection.
+- Introduced in: v3.2.12
+
+##### proc_profile_file_retained_size_bytes
+
+- Default: 2L * 1024 * 1024 * 1024 (2147483648)
+- Type: Long
+- Unit: Bytes
+- Is mutable: Yes
+- Description: Maximum total bytes of collected CPU and memory profile files (files named with prefixes `cpu-profile-` and `mem-profile-`) to keep under the profile directory. When the sum of valid profile files exceeds `proc_profile_file_retained_size_bytes`, the collector deletes the oldest profile files until the remaining total size is less than or equal to `proc_profile_file_retained_size_bytes`. Files older than `proc_profile_file_retained_days` are also removed regardless of size. This setting controls disk usage for profile archives and interacts with `proc_profile_file_retained_days` to determine deletion order and retention.
+- Introduced in: v3.2.12
+
+##### profile_log_delete_age
+
+- Default: 1d
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: Controls how long FE profile log files are retained before they are eligible for deletion. The value is injected into Log4j's `IfLastModified` delete policy (via `Log4jConfig`) and is applied together with rotation settings such as `profile_log_roll_interval` and `profile_log_roll_num`. Supported suffixes: `d` (day), `h` (hour), `m` (minute), `s` (second). For example: `7d` (7 days), `10h` (10 hours), `60m` (60 minutes), `120s` (120 seconds).
+- Introduced in: v3.2.5
+
+##### profile_log_dir
+
+- Default: `Config.STARROCKS_HOME_DIR + "/log"`
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: Directory where FE profile logs are written. Log4jConfig uses this value to place profile-related appenders (creates files like `fe.profile.log` and `fe.features.log` under this directory). Rotation and retention for these files are governed by `profile_log_roll_size_mb`, `profile_log_roll_num` and `profile_log_delete_age`; the timestamp suffix format is controlled by `profile_log_roll_interval` (supports DAY or HOUR). Because the default directory is under `STARROCKS_HOME_DIR`, ensure the FE process has write and rotation/delete permissions on this directory.
+- Introduced in: v3.2.5
+
+##### profile_log_roll_interval
+
+- Default: DAY
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: Controls the time granularity used to generate the date part of profile log filenames. Valid values (case-insensitive) are `HOUR` and `DAY`. `HOUR` produces a pattern of `"%d{yyyyMMddHH}"` (hourly time bucket) and `DAY` produces `"%d{yyyyMMdd}"` (daily time bucket). This value is used when computing `profile_file_pattern` in the Log4j configuration and only affects the time-based component of rollover file names; size-based rollover is still controlled by `profile_log_roll_size_mb` and retention by `profile_log_roll_num` / `profile_log_delete_age`. Invalid values cause an IOException during logging initialization (error message: `"profile_log_roll_interval config error: "`). Choose `HOUR` for high-volume profiling to limit per-file size per hour, or `DAY` for daily aggregation.
+- Introduced in: v3.2.5
+
+##### profile_log_roll_num
+
+- Default: 5
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: Specifies the maximum number of rotated profile log files retained by Log4j's DefaultRolloverStrategy for the profile logger. This value is injected into the logging XML as the DefaultRolloverStrategy `max` attribute (`${profile_log_roll_num}`).
Rotations are triggered by `profile_log_roll_size_mb` or `profile_log_roll_interval`; when rotation occurs, Log4j keeps at most these indexed files and older index files become eligible for removal. Actual retention on disk is also affected by `profile_log_delete_age` and the `profile_log_dir` location. Lower values reduce disk usage but limit retained history; higher values preserve more historical profile logs. +- Introduced in: v3.2.5 + +##### profile_log_roll_size_mb + +- Default: 1024 +- Type: Int +- Unit: MB +- Is mutable: No +- Description: Sets the size threshold (in megabytes) that triggers a size-based rollover of the FE profile log file. This value is used by the Log4j RollingFile SizeBasedTriggeringPolicy for the `ProfileFile` appender; when a profile log exceeds `profile_log_roll_size_mb` it will be rotated. Rotation can also occur by time when `profile_log_roll_interval` is reached — either condition will trigger rollover. Combined with `profile_log_roll_num` and `profile_log_delete_age`, this item controls how many historical profile files are retained and when old files are deleted. Compression of rotated files is controlled by `enable_profile_log_compress`. +- Introduced in: v3.2.5 + +##### qe_slow_log_ms + +- Default: 5000 +- Type: Long +- Unit: Milliseconds +- Is mutable: Yes +- Description: The threshold used to determine whether a query is a slow query. If the response time of a query exceeds this threshold, it is recorded as a slow query in **fe.audit.log**. +- Introduced in: - + +##### slow_lock_log_every_ms + +- Default: 3000L +- Type: Long +- Unit: Milliseconds +- Is mutable: Yes +- Description: Minimum interval (in ms) to wait before emitting another "slow lock" warning for the same SlowLockLogStats instance. LockUtils checks this value after a lock wait exceeds slow_lock_threshold_ms and will suppress additional warnings until slow_lock_log_every_ms milliseconds have passed since the last logged slow-lock event. Use a larger value to reduce log volume during prolonged contention or a smaller value to get more frequent diagnostics. Changes take effect at runtime for subsequent checks. +- Introduced in: v3.2.0 + +##### slow_lock_print_stack + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: Whether to allow LockManager to include the owning thread's full stack trace in the JSON payload of slow-lock warnings emitted by `logSlowLockTrace` (the "stack" array is populated via `LogUtil.getStackTraceToJsonArray` with `start=0` and `max=Short.MAX_VALUE`). This configuration controls only the extra stack information for lock owners shown when a lock acquisition exceeds the threshold configured by `slow_lock_threshold_ms`. Enabling this feature helps debugging by giving precise thread stacks that hold the lock; disabling it reduces log volume and CPU/memory overhead caused by capturing and serializing stack traces in high concurrency environments. +- Introduced in: v3.3.16, v3.4.5, v3.5.1 + +##### slow_lock_threshold_ms + +- Default: 3000L +- Type: long +- Unit: Milliseconds +- Is mutable: Yes +- Description: Threshold (in ms) used to classify a lock operation or a held lock as "slow". When the elapsed wait or hold time for a lock exceeds this value, StarRocks will (depending on context) emit diagnostic logs, include stack traces or waiter/owner info, and—in LockManager—start deadlock detection after this delay. 
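+It's used by LockUtils (slow-lock logging), QueryableReentrantReadWriteLock (filtering slow readers), LockManager (deadlock-detection delay and slow-lock trace), LockChecker (periodic slow-lock detection), and other callers (e.g., DiskAndTabletLoadReBalancer logging). Lowering the value increases sensitivity and logging/diagnostic overhead; setting it to 0 or a negative value disables the initial wait-based deadlock-detection delay behavior. Tune together with `slow_lock_log_every_ms`, `slow_lock_print_stack`, and `slow_lock_stack_trace_reserve_levels`.
+- Introduced in: v3.2.0
+
+For example, to surface slow-lock diagnostics earlier and more often while investigating lock contention (a sketch; both items are mutable and the values are illustrative only):
+
+```SQL
+-- Flag lock waits longer than 1 second as slow.
+ADMIN SET FRONTEND CONFIG ("slow_lock_threshold_ms" = "1000");
+
+-- Allow a new slow-lock warning for the same lock at most once per second.
+ADMIN SET FRONTEND CONFIG ("slow_lock_log_every_ms" = "1000");
+```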
+##### sys_log_delete_age
+
+- Default: 7d
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The retention period of system log files. The default value `7d` specifies that each system log file can be retained for 7 days. StarRocks checks each system log file and deletes those that were generated 7 days ago.
+- Introduced in: -
+
+##### sys_log_dir
+
+- Default: StarRocksFE.STARROCKS_HOME_DIR + "/log"
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The directory that stores system log files.
+- Introduced in: -
+
+##### sys_log_enable_compress
+
+- Default: false
+- Type: boolean
+- Unit: -
+- Is mutable: No
+- Description: When this item is set to `true`, the system appends a `.gz` postfix to rotated system log filenames so Log4j will produce gzip-compressed rotated FE system logs (for example, `fe.log.*`). This value is read during Log4j configuration generation (Log4jConfig.initLogging / generateActiveLog4jXmlConfig) and controls the `sys_file_postfix` property used in the RollingFile filePattern. Enabling this feature reduces disk usage for retained logs, but increases CPU and I/O during rollovers and changes log filenames, so tools or scripts that read logs must be able to handle `.gz` files. Note that audit logs use a separate configuration for compression, namely `audit_log_enable_compress`.
+- Introduced in: v3.2.12
+
+##### sys_log_format
+
+- Default: "plaintext"
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: Selects the Log4j layout used for FE logs. Valid values: `"plaintext"` (default) and `"json"`. The values are case-insensitive. `"plaintext"` configures PatternLayout with human-readable timestamps, level, thread, class.method:line and stack traces for WARN/ERROR. `"json"` configures JsonTemplateLayout and emits structured JSON events (UTC timestamps, level, thread id/name, source file/method/line, message, exception stackTrace) suitable for log aggregators (ELK, Splunk). JSON output respects the maximum string lengths set by `sys_log_json_max_string_length` and `sys_log_json_profile_max_string_length`.
+- Introduced in: v3.2.10
+
+##### sys_log_json_max_string_length
+
+- Default: 1048576
+- Type: Int
+- Unit: Bytes
+- Is mutable: No
+- Description: Sets the JsonTemplateLayout "maxStringLength" value used for the JSON-formatted system logs. When `sys_log_format` is set to `"json"`, string-valued fields (for example "message" and stringified exception stack traces) are truncated if their length exceeds this limit. The value is injected into the generated Log4j XML in `Log4jConfig.generateActiveLog4jXmlConfig()`, and is applied to the default, warning, audit, dump, and bigquery layouts. The profile layout uses a separate configuration (`sys_log_json_profile_max_string_length`). Lowering this value reduces log size but can truncate useful information.
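+- Introduced in: v3.2.11
+
+The JSON layout items above are immutable at runtime: set them in `fe.conf` and restart the FE. You can inspect the values currently in effect over SQL, for example:
+
+```SQL
+ADMIN SHOW FRONTEND CONFIG LIKE "%sys_log_json%";
+```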
+##### sys_log_json_profile_max_string_length
+
+- Default: 104857600 (100 MB)
+- Type: Int
+- Unit: Bytes
+- Is mutable: No
+- Description: Sets the maxStringLength of JsonTemplateLayout for profile (and related feature) log appenders when `sys_log_format` is "json". String field values in JSON-formatted profile logs are truncated to this byte length; non-string fields are unaffected. This item is applied to the `maxStringLength` attribute of `JsonTemplateLayout` in `Log4jConfig` and is ignored when `plaintext` logging is used. Keep the value large enough for the full messages you need, but note that larger values increase log size and I/O.
+- Introduced in: v3.2.11
+
+##### sys_log_level
+
+- Default: INFO
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The severity levels into which system log entries are classified. Valid values: `INFO`, `WARN`, `ERROR`, and `FATAL`.
+- Introduced in: -
+
+##### sys_log_roll_interval
+
+- Default: DAY
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The time interval at which StarRocks rotates system log entries. Valid values: `DAY` and `HOUR`.
+  - If this parameter is set to `DAY`, a suffix in the `yyyyMMdd` format is added to the names of system log files.
+  - If this parameter is set to `HOUR`, a suffix in the `yyyyMMddHH` format is added to the names of system log files.
+- Introduced in: -
+
+##### sys_log_roll_num
+
+- Default: 10
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum number of system log files that can be retained within each retention period specified by the `sys_log_roll_interval` parameter.
+- Introduced in: -
+
+##### sys_log_to_console
+
+- Default: false (unless the environment variable `SYS_LOG_TO_CONSOLE` is set to "1")
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: When this item is set to `true`, the system configures Log4j to send all logs to the console (ConsoleErr appender) instead of the file-based appenders. This value is read when generating the active Log4j XML configuration (which affects the root logger and per-module logger appender selection). Its value is captured from the `SYS_LOG_TO_CONSOLE` environment variable at process startup, so changing it at runtime has no effect. This configuration is commonly used in containerized or CI environments where stdout/stderr log collection is preferred over writing log files.
+- Introduced in: v3.2.0
+
+##### sys_log_verbose_modules
+
+- Default: Empty string
+- Type: String[]
+- Unit: -
+- Is mutable: No
+- Description: The modules for which StarRocks generates system logs. If this parameter is set to `org.apache.starrocks.catalog`, StarRocks generates system logs only for the catalog module. Separate the module names with a comma (,) and a space.
+- Introduced in: -
+
+##### sys_log_warn_modules
+
+- Default: {}
+- Type: String[]
+- Unit: -
+- Is mutable: No
+- Description: A list of logger names or package prefixes that the system configures at startup as WARN-level loggers and routes to the warning appender (SysWF), that is, the `fe.warn.log` file. Entries are inserted into the generated Log4j configuration (alongside builtin warn modules such as org.apache.kafka, org.apache.hudi, and org.apache.hadoop.io.compress), and each entry produces a WARN-level `Logger` element in the generated XML. Fully-qualified package and class prefixes (for example, "com.example.lib") are recommended; they suppress noisy INFO/DEBUG output in the regular log and allow warnings to be captured separately.
+- Introduced in: v3.2.13
+
+### Server
+
+##### brpc_idle_wait_max_time
+
+- Default: 10000
+- Type: Int
+- Unit: ms
+- Is mutable: No
+- Description: The maximum length of time for which bRPC clients wait in the idle state.
+- Introduced in: -
+
+##### brpc_inner_reuse_pool
+
+- Default: true
+- Type: boolean
+- Unit: -
+- Is mutable: No
+- Description: Controls whether the underlying BRPC client uses an internal shared reuse pool for connections/channels. StarRocks reads `brpc_inner_reuse_pool` in BrpcProxy when constructing RpcClientOptions (via `rpcOptions.setInnerResuePool(...)`). When enabled (true), the RPC client reuses internal pools to reduce per-call connection creation, lowering connection churn, memory, and file-descriptor usage for FE-to-BE / LakeService RPCs. When disabled (false), the client may create more isolated pools (increasing concurrency isolation at the cost of higher resource usage). Changing this value requires restarting the process to take effect.
+- Introduced in: v3.3.11, v3.4.1, v3.5.0
+
+##### brpc_min_evictable_idle_time_ms
+
+- Default: 120000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: No
+- Description: Time in milliseconds that an idle BRPC connection must remain in the connection pool before it becomes eligible for eviction. Applied to the RpcClientOptions used by `BrpcProxy` (via RpcClientOptions.setMinEvictableIdleTime). Raise this value to keep idle connections longer (reducing reconnect churn); lower it to free unused sockets faster (reducing resource usage). Tune together with `brpc_connection_pool_size` and `brpc_idle_wait_max_time` to balance connection reuse, pool growth, and eviction behavior.
+- Introduced in: v3.3.11, v3.4.1, v3.5.0
+
+##### brpc_reuse_addr
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: When true, StarRocks sets the socket option to allow local address reuse for client sockets created by the brpc RpcClient (via RpcClientOptions.setReuseAddress). Enabling this reduces bind failures and allows faster rebinding of local ports after sockets are closed, which is helpful for high-rate connection churn or rapid restarts. When false, address/port reuse is disabled, which can reduce the chance of unintended port sharing but may increase transient bind errors. This option interacts with connection behavior configured by `brpc_connection_pool_size` and `brpc_short_connection` because it affects how rapidly client sockets can be rebound and reused.
+- Introduced in: v3.3.11, v3.4.1, v3.5.0
+
+##### cluster_name
+
+- Default: StarRocks Cluster
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The name of the StarRocks cluster to which the FE belongs. The cluster name is displayed as the `Title` on the web page.
+- Introduced in: -
+
+##### dns_cache_ttl_seconds
+
+- Default: 60
+- Type: Int
+- Unit: Seconds
+- Is mutable: No
+- Description: DNS cache TTL (Time-To-Live) in seconds for successful DNS lookups. This sets the Java security property `networkaddress.cache.ttl`, which controls how long the JVM caches successful DNS lookups. Set this item to `-1` to cache the information indefinitely, or `0` to disable caching. This is particularly useful in environments where IP addresses change frequently, such as Kubernetes deployments or when dynamic DNS is used.
+- Introduced in: v3.5.11, v4.0.4
+
+##### enable_http_async_handler
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to allow the system to process HTTP requests asynchronously.
+If this feature is enabled, HTTP requests received by Netty worker threads are submitted to a separate thread pool for service-logic handling so they do not block the HTTP server. If it is disabled, the Netty workers handle the service logic themselves.
+- Introduced in: v4.0.0
+
+##### enable_http_validate_headers
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Controls whether Netty's HttpServerCodec performs strict HTTP header validation. The value is passed to HttpServerCodec when the HTTP pipeline is initialized in `HttpServer`. Default is false for backward compatibility, because newer Netty versions enforce stricter header rules (https://github.com/netty/netty/pull/12760). Set it to true to enforce RFC-compliant header checks; doing so may cause malformed or nonconforming requests from legacy clients or proxies to be rejected. Changing it requires a restart of the HTTP server to take effect.
+- Introduced in: v3.3.0, v3.4.0, v3.5.0
+
+##### enable_https
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Whether to enable the HTTPS server alongside the HTTP server in FE nodes.
+- Introduced in: v4.0
+
+##### frontend_address
+
+- Default: 0.0.0.0
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The IP address of the FE node.
+- Introduced in: -
+
+##### http_async_threads_num
+
+- Default: 4096
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Size of the thread pool for asynchronous HTTP request processing. Alias: `max_http_sql_service_task_threads_num`.
+- Introduced in: v4.0.0
+
+##### http_backlog_num
+
+- Default: 1024
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The length of the backlog queue held by the HTTP server in the FE node.
+- Introduced in: -
+
+##### http_max_chunk_size
+
+- Default: 8192
+- Type: Int
+- Unit: Bytes
+- Is mutable: No
+- Description: Sets the maximum allowed size (in bytes) of a single HTTP chunk handled by Netty's HttpServerCodec in the FE HTTP server. It is passed as the third argument to HttpServerCodec and limits the length of chunks during chunked transfer or streaming requests/responses. If an incoming chunk exceeds this value, Netty will raise a frame-too-large error (e.g., TooLongFrameException) and the request may be rejected. Increase this for legitimate large chunked uploads; keep it small to reduce memory pressure and surface area for DoS attacks. This setting is used alongside `http_max_initial_line_length`, `http_max_header_size`, and `enable_http_validate_headers`.
+- Introduced in: v3.2.0
+
+##### http_max_header_size
+
+- Default: 32768
+- Type: Int
+- Unit: Bytes
+- Is mutable: No
+- Description: Maximum allowed size in bytes for the HTTP request header block parsed by Netty's `HttpServerCodec`. StarRocks passes this value to `HttpServerCodec` (as `Config.http_max_header_size`); if an incoming request's headers (names and values combined) exceed this limit, the codec will reject the request (decoder exception) and the connection/request will fail. Increase it only when clients legitimately send very large headers (large cookies or many custom headers); larger values increase per-connection memory use. Tune in conjunction with `http_max_initial_line_length` and `http_max_chunk_size`. Changes require an FE restart.
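+- Introduced in: v3.2.0
+
+The Netty codec limits above are likewise immutable; change them in `fe.conf` and restart the FE. To check the limits currently in effect, for example:
+
+```SQL
+ADMIN SHOW FRONTEND CONFIG LIKE "%http_max%";
+```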
+##### http_max_initial_line_length
+
+- Default: 4096
+- Type: Int
+- Unit: Bytes
+- Is mutable: No
+- Description: Sets the maximum allowed length (in bytes) of the HTTP initial request line (method + request-target + HTTP version) accepted by the Netty `HttpServerCodec` used in HttpServer. The value is passed to Netty's decoder, and requests with an initial line longer than this will be rejected (TooLongFrameException). Increase this only when you must support very long request URIs; larger values increase memory use and may increase exposure to malformed requests or request abuse. Tune together with `http_max_header_size` and `http_max_chunk_size`.
+- Introduced in: v3.2.0
+
+##### http_port
+
+- Default: 8030
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The port on which the HTTP server in the FE node listens.
+- Introduced in: -
+
+##### http_web_page_display_hardware
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When true, the HTTP index page (/index) will include a hardware information section populated via the oshi library (CPU, memory, processes, disks, filesystems, network, etc.). oshi may invoke system utilities or read system files indirectly (for example, it can execute commands such as `getent passwd`), which can surface sensitive system data. If you require stricter security or want to avoid executing those indirect commands on the host, set this configuration to false to disable collection and display of hardware details on the web UI.
+- Introduced in: v3.2.0
+
+##### http_worker_threads_num
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of worker threads the HTTP server uses to handle HTTP requests. If this value is 0 or negative, the number of threads is twice the number of CPU cores.
+- Introduced in: v2.5.18, v3.0.10, v3.1.7, v3.2.2
+
+##### https_port
+
+- Default: 8443
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The port on which the HTTPS server in the FE node listens.
+- Introduced in: v4.0
+
+##### max_mysql_service_task_threads_num
+
+- Default: 4096
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum number of threads that can be run by the MySQL server in the FE node to process tasks.
+- Introduced in: -
+
+##### max_task_runs_threads_num
+
+- Default: 512
+- Type: Int
+- Unit: Threads
+- Is mutable: No
+- Description: Controls the maximum number of threads in the task-run executor thread pool. This value is the upper bound of concurrent task-run executions; increasing it raises parallelism but also increases CPU, memory, and network usage, while reducing it can cause task-run backlog and higher latency. Tune this value according to the expected number of concurrent scheduled jobs and available system resources.
+- Introduced in: v3.2.0
+
+##### memory_tracker_enable
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Enables the FE memory tracker subsystem. When `memory_tracker_enable` is set to `true`, `MemoryUsageTracker` periodically scans registered metadata modules, updates the in-memory `MemoryUsageTracker.MEMORY_USAGE` map, logs totals, and causes `MetricRepo` to expose memory usage and object-count gauges in metrics output. Use `memory_tracker_interval_seconds` to control the sampling interval. Enabling this feature helps with monitoring and debugging memory consumption, but introduces CPU and I/O overhead and additional metric cardinality.
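+- Introduced in: v3.2.4
+
+For example, to sample FE memory usage more frequently while investigating memory growth (a sketch; both tracker items are mutable, and the interval shown is illustrative):
+
+```SQL
+-- Make sure the tracker is on (it is enabled by default).
+ADMIN SET FRONTEND CONFIG ("memory_tracker_enable" = "true");
+
+-- Sample every 30 seconds instead of the default 60.
+ADMIN SET FRONTEND CONFIG ("memory_tracker_interval_seconds" = "30");
+```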
+##### memory_tracker_interval_seconds
+
+- Default: 60
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: Interval in seconds for the FE `MemoryUsageTracker` daemon to poll and record memory usage of the FE process and registered `MemoryTrackable` modules. When `memory_tracker_enable` is set to `true`, the tracker runs on this cadence, updates `MEMORY_USAGE`, and logs aggregated JVM and tracked-module usage.
+- Introduced in: v3.2.4
+
+##### mysql_nio_backlog_num
+
+- Default: 1024
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The length of the backlog queue held by the MySQL server in the FE node.
+- Introduced in: -
+
+##### mysql_server_version
+
+- Default: 8.0.33
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: The MySQL server version returned to the client. Modifying this parameter will affect the version information in the following situations:
+  1. `select version();`
+  2. Handshake packet version
+  3. Value of the global variable `version` (`show variables like 'version';`)
+- Introduced in: -
+
+##### mysql_service_io_threads_num
+
+- Default: 4
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum number of threads that can be run by the MySQL server in the FE node to process I/O events.
+- Introduced in: -
+
+##### mysql_service_kill_after_disconnect
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Controls how the server handles the session when the MySQL TCP connection is detected closed (EOF on read). If it is set to `true`, the server immediately kills any running query for that connection and performs immediate cleanup. If it is `false`, the server does not kill running queries on disconnection and only performs cleanup when there are no pending request tasks, allowing long-running queries to continue after the client disconnects. Note: despite a code comment suggesting TCP keep-alive, this parameter specifically governs post-disconnection kill behavior; set it according to whether you want orphaned queries terminated (recommended behind unreliable or load-balanced clients) or allowed to finish.
+- Introduced in: -
+
+##### mysql_service_nio_enable_keep_alive
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Enable TCP Keep-Alive for MySQL connections. Useful for long-idle connections behind load balancers.
+- Introduced in: -
+
+##### net_use_ipv6_when_priority_networks_empty
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: A boolean value to control whether to use IPv6 addresses preferentially when `priority_networks` is not specified. `true` indicates to allow the system to use an IPv6 address preferentially when the server that hosts the node has both IPv4 and IPv6 addresses and `priority_networks` is not specified.
+- Introduced in: v3.3.0
+
+##### priority_networks
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: Declares a selection strategy for servers that have multiple IP addresses. Note that at most one of the server's IP addresses should match the list specified by this parameter. The value of this parameter is a list of entries in CIDR notation separated by semicolons (;), such as 10.10.10.0/24. If no IP address matches the entries in this list, an available IP address of the server is selected at random. From v3.3.0, StarRocks supports deployment based on IPv6.
+If the server has both IPv4 and IPv6 addresses, and this parameter is not specified, the system uses an IPv4 address by default. You can change this behavior by setting `net_use_ipv6_when_priority_networks_empty` to `true`.
+- Introduced in: -
+
+##### proc_profile_cpu_enable
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When this item is set to `true`, the background `ProcProfileCollector` will collect CPU profiles using `AsyncProfiler` and write HTML reports under `sys_log_dir/proc_profile`. Each collection run records CPU stacks for the duration configured by `proc_profile_collect_time_s` and uses `proc_profile_jstack_depth` for Java stack depth. Generated profiles are compressed, and old files are pruned according to `proc_profile_file_retained_days` and `proc_profile_file_retained_size_bytes`. `AsyncProfiler` requires the native library (`libasyncProfiler.so`); `one.profiler.extractPath` is set to `STARROCKS_HOME_DIR/bin` to avoid noexec issues on `/tmp`.
+- Introduced in: v3.2.12
+
+##### qe_max_connection
+
+- Default: 4096
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum number of connections that can be established by all users to the FE node. From v3.1.12 and v3.2.7 onwards, the default value has been changed from `1024` to `4096`.
+- Introduced in: -
+
+##### query_port
+
+- Default: 9030
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The port on which the MySQL server in the FE node listens.
+- Introduced in: -
+
+##### rpc_port
+
+- Default: 9020
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The port on which the Thrift server in the FE node listens.
+- Introduced in: -
+
+##### slow_lock_stack_trace_reserve_levels
+
+- Default: 15
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Controls how many stack-trace frames are captured and emitted when StarRocks dumps lock debug information for slow or held locks. This value is passed to `LogUtil.getStackTraceToJsonArray` by `QueryableReentrantReadWriteLock` when producing JSON for the exclusive lock owner, current thread, and oldest/shared readers. Increasing this value provides more context for diagnosing slow-lock or deadlock issues at the cost of larger JSON payloads and slightly higher CPU/memory for stack capture; decreasing it reduces overhead. Note: reader entries can be filtered by `slow_lock_threshold_ms` when only logging slow locks.
+- Introduced in: v3.4.0, v3.5.0
+
+##### ssl_cipher_blacklist
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: A comma-separated list (regex supported) of SSL cipher suites to blacklist, identified by their IANA names. If both the whitelist and the blacklist are set, the blacklist takes precedence.
+- Introduced in: v4.0
+
+##### ssl_cipher_whitelist
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: A comma-separated list (regex supported) of SSL cipher suites to whitelist, identified by their IANA names. If both the whitelist and the blacklist are set, the blacklist takes precedence.
+- Introduced in: v4.0
+
+##### task_runs_concurrency
+
+- Default: 4
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Global limit of concurrently running TaskRun instances. `TaskRunScheduler` stops scheduling new runs when the current running count is greater than or equal to `task_runs_concurrency`, so this value caps parallel TaskRun execution across the scheduler. It is also used by `MVPCTRefreshPartitioner` to compute per-TaskRun partition refresh granularity.
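+Increasing the value raises parallelism and resource usage; decreasing it reduces concurrency and makes partition refreshes larger per run. Do not set it to 0 or a negative value unless you intend to disable scheduling: such values effectively prevent new TaskRuns from being scheduled by `TaskRunScheduler`.
+- Introduced in: v3.2.0
+
+For example, to let more scheduled task runs (such as materialized view refreshes) execute in parallel (a sketch; the value is illustrative and should match your cluster's spare capacity):
+
+```SQL
+ADMIN SET FRONTEND CONFIG ("task_runs_concurrency" = "8");
+
+-- Verify the value currently in effect.
+ADMIN SHOW FRONTEND CONFIG LIKE "%task_runs_concurrency%";
+```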
+##### task_runs_queue_length
+
+- Default: 500
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Limits the maximum number of pending TaskRun items kept in the pending queue. `TaskRunManager` checks the current pending count and rejects new submissions when the valid pending TaskRun count is greater than or equal to `task_runs_queue_length`. The same limit is rechecked before merged/accepted TaskRuns are added. Tune this value to balance memory and scheduling backlog: set it higher for large bursty workloads to avoid rejections, or lower to bound memory and reduce the pending backlog.
+- Introduced in: v3.2.0
+
+##### thrift_backlog_num
+
+- Default: 1024
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The length of the backlog queue held by the Thrift server in the FE node.
+- Introduced in: -
+
+##### thrift_client_timeout_ms
+
+- Default: 5000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: No
+- Description: The length of time after which idle client connections time out.
+- Introduced in: -
+
+##### thrift_rpc_max_body_size
+
+- Default: -1
+- Type: Int
+- Unit: Bytes
+- Is mutable: No
+- Description: Controls the maximum allowed Thrift RPC message body size (in bytes) used when constructing the server's Thrift protocol (passed to TBinaryProtocol.Factory in `ThriftServer`). A value of `-1` disables the limit (unbounded). Setting a positive value enforces an upper bound so that messages larger than this are rejected by the Thrift layer, which helps limit memory usage and mitigate oversized-request or DoS risks. Set this to a size large enough for expected payloads (large structs or batched data) to avoid rejecting legitimate requests.
+- Introduced in: v3.2.0
+
+##### thrift_server_max_worker_threads
+
+- Default: 4096
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of worker threads that are supported by the Thrift server in the FE node.
+- Introduced in: -
+
+##### thrift_server_queue_size
+
+- Default: 4096
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The length of the queue where requests are pending. If the number of threads that are being processed in the Thrift server exceeds the value specified in `thrift_server_max_worker_threads`, new requests are added to the pending queue.
+- Introduced in: -
+
+### Metadata and cluster management
+
+##### alter_max_worker_queue_size
+
+- Default: 4096
+- Type: Int
+- Unit: Tasks
+- Is mutable: No
+- Description: Controls the capacity of the internal worker thread pool queue used by the alter subsystem. It is passed to `ThreadPoolManager.newDaemonCacheThreadPool` in `AlterHandler` together with `alter_max_worker_threads`. When the number of pending alter tasks exceeds `alter_max_worker_queue_size`, new submissions will be rejected and a `RejectedExecutionException` can be thrown (see `AlterHandler.handleFinishAlterTask`). Tune this value to balance memory usage and the amount of backlog you permit for concurrent alter tasks.
+- Introduced in: v3.2.0
+
+##### alter_max_worker_threads
+
+- Default: 4
+- Type: Int
+- Unit: Threads
+- Is mutable: No
+- Description: Sets the maximum number of worker threads in the AlterHandler's thread pool.
The AlterHandler constructs the executor with this value to run and finalize alter-related tasks (e.g., submitting `AlterReplicaTask` via handleFinishAlterTask). This value bounds concurrent execution of alter operations; raising it increases parallelism and resource usage, lowering it limits concurrent alters and may become a bottleneck. The executor is created together with `alter_max_worker_queue_size`, and the handler scheduling uses `alter_scheduler_interval_millisecond`. +- Introduced in: v3.2.0 + +##### automated_cluster_snapshot_interval_seconds + +- Default: 600 +- Type: Int +- Unit: Seconds +- Is mutable: Yes +- Description: The interval at which the Automated Cluster Snapshot tasks are triggered. +- Introduced in: v3.4.2 + +##### background_refresh_metadata_interval_millis + +- Default: 600000 +- Type: Int +- Unit: Milliseconds +- Is mutable: Yes +- Description: The interval between two consecutive Hive metadata cache refreshes. +- Introduced in: v2.5.5 + +##### background_refresh_metadata_time_secs_since_last_access_secs + +- Default: 3600 * 24 +- Type: Long +- Unit: Seconds +- Is mutable: Yes +- Description: The expiration time of a Hive metadata cache refresh task. For the Hive catalog that has been accessed, if it has not been accessed for more than the specified time, StarRocks stops refreshing its cached metadata. For the Hive catalog that has not been accessed, StarRocks will not refresh its cached metadata. +- Introduced in: v2.5.5 + +##### bdbje_cleaner_threads + +- Default: 1 +- Type: Int +- Unit: - +- Is mutable: No +- Description: Number of background cleaner threads for the Berkeley DB Java Edition (JE) environment used by StarRocks journal. This value is read during environment initialization in `BDBEnvironment.initConfigs` and applied to `EnvironmentConfig.CLEANER_THREADS` using `Config.bdbje_cleaner_threads`. It controls parallelism for JE log cleaning and space reclamation; increasing it can speed up cleaning at the cost of additional CPU and I/O interference with foreground operations. Changes take effect only when the BDB environment is (re)initialized, so a frontend restart is required to apply a new value. +- Introduced in: v3.2.0 + +##### bdbje_heartbeat_timeout_second + +- Default: 30 +- Type: Int +- Unit: Seconds +- Is mutable: No +- Description: The amount of time after which the heartbeats among the leader, follower, and observer FEs in the StarRocks cluster time out. +- Introduced in: - + +##### bdbje_lock_timeout_second + +- Default: 1 +- Type: Int +- Unit: Seconds +- Is mutable: No +- Description: The amount of time after which a lock in the BDB JE-based FE times out. +- Introduced in: - + +##### bdbje_replay_cost_percent + +- Default: 150 +- Type: Int +- Unit: Percent +- Is mutable: No +- Description: Sets the relative cost (as a percentage) of replaying transactions from a BDB JE log versus obtaining the same data via a network restore. The value is supplied to the underlying JE replication parameter REPLAY_COST_PERCENT and is typically `>100` to indicate that replay is usually more expensive than a network restore. When deciding whether to retain cleaned log files for potential replay, the system compares replay cost multiplied by log size against the cost of a network restore; files will be removed if network restore is judged more efficient. A value of 0 disables retention based on this cost comparison. Log files required for replicas within `REP_STREAM_TIMEOUT` or for any active replication are always retained. 
+- Introduced in: v3.2.0
+
+##### bdbje_replica_ack_timeout_second
+
+- Default: 10
+- Type: Int
+- Unit: Seconds
+- Is mutable: No
+- Description: The maximum amount of time for which the leader FE can wait for ACK messages from a specified number of follower FEs when metadata is written from the leader FE to the follower FEs. Unit: second. If a large amount of metadata is being written, the follower FEs require a long time before they can return ACK messages to the leader FE, causing ACK timeout. In this situation, metadata writes fail, and the FE process exits. We recommend that you increase the value of this parameter to prevent this situation.
+- Introduced in: -
+
+##### bdbje_reserved_disk_size
+
+- Default: 512 * 1024 * 1024 (536870912)
+- Type: Long
+- Unit: Bytes
+- Is mutable: No
+- Description: Limits the number of bytes Berkeley DB JE will reserve as "unprotected" (deletable) log/data files. StarRocks passes this value to JE via `EnvironmentConfig.RESERVED_DISK` in BDBEnvironment; JE's built-in default is 0 (unlimited). The StarRocks default (512 MiB) prevents JE from reserving excessive disk space for unprotected files while allowing safe cleanup of obsolete files. Tune this value on disk-constrained systems: decreasing it lets JE free more files sooner, increasing it lets JE retain more reserved space. Changes require restarting the process to take effect.
+- Introduced in: v3.2.0
+
+##### bdbje_reset_election_group
+
+- Default: false
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: Whether to reset the BDBJE replication group. If this parameter is set to `TRUE`, the FE will reset the BDBJE replication group (that is, remove the information of all electable FE nodes) and start as the leader FE. After the reset, this FE will be the only member in the cluster, and other FEs can rejoin this cluster by using `ALTER SYSTEM ADD/DROP FOLLOWER/OBSERVER 'xxx'`. Use this setting only when no leader FE can be elected because the data of most follower FEs has been damaged. `reset_election_group` is used to replace `metadata_failure_recovery`.
+- Introduced in: -
+
+##### black_host_connect_failures_within_time
+
+- Default: 5
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The threshold of connection failures allowed for a blacklisted BE node. If a BE node is added to the BE Blacklist automatically, StarRocks assesses its connectivity and judges whether it can be removed from the BE Blacklist. A blacklisted BE node can be removed from the BE Blacklist only if it has fewer connection failures than this threshold within the time window specified by `black_host_history_sec`.
+- Introduced in: v3.3.0
+
+##### black_host_history_sec
+
+- Default: 2 * 60
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time duration for retaining historical connection failures of BE nodes in the BE Blacklist. If a BE node is added to the BE Blacklist automatically, StarRocks assesses its connectivity and judges whether it can be removed from the BE Blacklist. A blacklisted BE node can be removed from the BE Blacklist only if it has fewer connection failures than the threshold set in `black_host_connect_failures_within_time` within this time window.
+- Introduced in: v3.3.0
+
+##### brpc_connection_pool_size
+
+- Default: 16
+- Type: Int
+- Unit: Connections
+- Is mutable: No
+- Description: The maximum number of pooled BRPC connections per endpoint used by the FE's BrpcProxy.
This value is applied to RpcClientOptions via `setMaxTotoal` and `setMaxIdleSize`, so it directly limits concurrent outgoing BRPC requests because each request must borrow a connection from the pool. In high concurrency scenarios increase this to avoid request queuing; increasing it raises socket and memory usage and may increase remote server load. When tuning, consider related settings such as `brpc_idle_wait_max_time`, `brpc_short_connection`, `brpc_inner_reuse_pool`, `brpc_reuse_addr`, and `brpc_min_evictable_idle_time_ms`. Changing this value is not hot-reloadable and requires a restart. +- Introduced in: v3.2.0 + +##### brpc_short_connection + +- Default: false +- Type: boolean +- Unit: - +- Is mutable: No +- Description: Controls whether the underlying brpc RpcClient uses short-lived connections. When enabled (`true`), RpcClientOptions.setShortConnection is set and connections are closed after a request completes, reducing the number of long-lived sockets at the cost of higher connection setup overhead and increased latency. When disabled (`false`, the default) persistent connections and connection pooling are used. Enabling this option affects connection-pool behavior and should be considered together with `brpc_connection_pool_size`, `brpc_idle_wait_max_time`, `brpc_min_evictable_idle_time_ms`, `brpc_reuse_addr`, and `brpc_inner_reuse_pool`. Keep it disabled for typical high-throughput deployments; enable only to limit socket lifetime or when short connections are required by network policy. +- Introduced in: v3.3.11, v3.4.1, v3.5.0 + +##### catalog_try_lock_timeout_ms + +- Default: 5000 +- Type: Long +- Unit: Milliseconds +- Is mutable: Yes +- Description: The timeout duration to obtain the global lock. +- Introduced in: - + +##### checkpoint_only_on_leader + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: When `true`, the CheckpointController will only select the leader FE as the checkpoint worker; when `false`, the controller may pick any frontend and prefers nodes with lower heap usage. With `false`, workers are sorted by recent failure time and `heapUsedPercent` (the leader is treated as having infinite usage to avoid selecting it). For operations that require cluster snapshot metadata, the controller already forces leader selection regardless of this flag. Enabling `true` centralizes checkpoint work on the leader (simpler but increases leader CPU/memory and network load); keeping it `false` distributes checkpoint load to less-loaded FEs. This setting affects worker selection and interaction with timeouts such as `checkpoint_timeout_seconds` and RPC settings like `thrift_rpc_timeout_ms`. +- Introduced in: v3.4.0, v3.5.0 + +##### checkpoint_timeout_seconds + +- Default: 24 * 3600 +- Type: Long +- Unit: Seconds +- Is mutable: Yes +- Description: Maximum time (in seconds) the leader's CheckpointController will wait for a checkpoint worker to complete a checkpoint. The controller converts this value to nanoseconds and polls the worker result queue; if no successful completion is received within this timeout the checkpoint is treated as failed and createImage returns failure. Increasing this value accommodates longer-running checkpoints but delays failure detection and subsequent image propagation; decreasing it causes faster failover/retries but can produce false timeouts for slow workers. 
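+This setting only controls the waiting period in `CheckpointController` during checkpoint creation and does not change the worker's internal checkpointing behavior.
+- Introduced in: v3.4.0, v3.5.0
+
+For example, if checkpoints of a very large metadata image legitimately take longer than the default one-day wait, you can raise the limit (a sketch; the item is mutable, and the value shown is 48 hours in seconds):
+
+```SQL
+ADMIN SET FRONTEND CONFIG ("checkpoint_timeout_seconds" = "172800");
+```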
+##### db_used_data_quota_update_interval_secs
+
+- Default: 300
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The interval at which the database used data quota is updated. StarRocks periodically updates the used data quota for all databases to track storage consumption. This value is used for quota enforcement and metrics collection. The minimum allowed interval is 30 seconds to prevent excessive system load. A value less than 30 will be rejected.
+- Introduced in: -
+
+##### drop_backend_after_decommission
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to delete a BE after the BE is decommissioned. `TRUE` indicates that the BE is deleted immediately after it is decommissioned. `FALSE` indicates that the BE is not deleted after it is decommissioned.
+- Introduced in: -
+
+##### edit_log_port
+
+- Default: 9010
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The port that is used for communication among the Leader, Follower, and Observer FEs in the cluster.
+- Introduced in: -
+
+##### edit_log_roll_num
+
+- Default: 50000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of metadata log entries that can be written before a log file is created for these log entries. This parameter is used to control the size of log files. The new log file is written to the BDBJE database.
+- Introduced in: -
+
+##### edit_log_type
+
+- Default: BDB
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The type of edit log that can be generated. Set the value to `BDB`.
+- Introduced in: -
+
+##### enable_background_refresh_connector_metadata
+
+- Default: true in v3.0 and later, false in v2.5
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable the periodic Hive metadata cache refresh. After it is enabled, StarRocks polls the metastore (Hive Metastore or AWS Glue) of your Hive cluster, and refreshes the cached metadata of the frequently accessed Hive catalogs to perceive data changes. `true` indicates to enable the Hive metadata cache refresh, and `false` indicates to disable it.
+- Introduced in: v2.5.5
+
+##### enable_collect_query_detail_info
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to collect the profile of a query. If this parameter is set to `TRUE`, the system collects the profile of the query. If this parameter is set to `FALSE`, the system does not collect the profile of the query.
+- Introduced in: -
+
+##### enable_create_partial_partition_in_batch
+
+- Default: false
+- Type: boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When this item is set to `false` (default), StarRocks enforces that batch-created range partitions align to the standard time unit boundaries, and rejects non-aligned ranges to avoid creating holes. Setting this item to `true` disables that alignment check and allows creating partial (non-standard) partitions in batch, which can produce gaps or misaligned partition ranges. You should only set it to `true` when you intentionally need partial batch partitions and accept the associated risks.
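+- Introduced in: v3.2.0
+
+A sketch of toggling the alignment check around an intentional partial batch-partition creation (the item is mutable; re-enable the check once the operation is done):
+
+```SQL
+-- Allow non-aligned batch partitions for the upcoming DDL.
+ADMIN SET FRONTEND CONFIG ("enable_create_partial_partition_in_batch" = "true");
+
+-- ... run the batch partition creation statement here ...
+
+-- Restore the alignment check.
+ADMIN SET FRONTEND CONFIG ("enable_create_partial_partition_in_batch" = "false");
+```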
+##### enable_internal_sql
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: When this item is set to `true`, internal SQL statements executed by internal components (for example, SimpleExecutor) are preserved and written into internal audit or log messages (and can be further desensitized if `enable_sql_desensitize_in_log` is set). When it is set to `false`, internal SQL text is suppressed: formatting code (SimpleExecutor.formatSQL) returns "?" and the actual statement is not emitted to internal audit or log messages. This configuration does not change the execution semantics of internal statements; it only controls logging and visibility of internal SQL for privacy or security.
+- Introduced in: -
+
+##### enable_legacy_compatibility_for_replication
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable Legacy Compatibility for Replication. StarRocks may behave differently between the old and new versions, causing problems during cross-cluster data migration. Therefore, you must enable Legacy Compatibility for the target cluster before data migration and disable it after data migration is completed. `true` indicates enabling this mode.
+- Introduced in: v3.1.10, v3.2.6
+
+##### enable_show_materialized_views_include_all_task_runs
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Controls which TaskRuns are returned by the SHOW MATERIALIZED VIEWS command. When this item is set to `false`, StarRocks returns only the newest TaskRun per task (legacy behavior for compatibility). When it is set to `true` (default), `TaskManager` may include additional TaskRuns for the same task only when they share the same start TaskRun ID (for example, belong to the same job), preventing unrelated duplicate runs from appearing while allowing multiple statuses tied to one job to be shown. Set this item to `false` to restore single-run output, or keep it `true` to surface multi-run job history for debugging and monitoring.
+- Introduced in: v3.3.0, v3.4.0, v3.5.0
+
+##### enable_statistics_collect_profile
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to generate profiles for statistics queries. You can set this item to `true` to allow StarRocks to generate query profiles for queries on system statistics.
+- Introduced in: v3.1.5
+
+##### enable_table_name_case_insensitive
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Whether to enable case-insensitive processing on catalog names, database names, table names, view names, and materialized view names. Currently, table names are case-sensitive by default.
+  - After enabling this feature, all related names will be stored in lowercase, and all SQL commands containing these names will automatically convert them to lowercase.
+  - You can enable this feature only when creating a cluster. **After the cluster is started, the value of this configuration cannot be modified by any means**. Any attempt to modify it will result in an error. FE will fail to start when it detects that the value of this configuration item is inconsistent with that when the cluster was first started.
+  - Currently, this feature does not support JDBC catalog and table names. Do not enable this feature if you want to perform case-insensitive processing on JDBC or ODBC data sources.
+- Introduced in: v4.0 + +##### enable_task_history_archive + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: When enabled, finished task-run records are archived to the persistent task-run history table and recorded to the edit log so lookups (e.g., `lookupHistory`, `lookupHistoryByTaskNames`, `lookupLastJobOfTasks`) include archived results. Archiving is performed by the FE leader and is skipped during unit tests (`FeConstants.runningUnitTest`). When enabled, in-memory expiration and forced-GC paths are bypassed (the code returns early from `removeExpiredRuns` and `forceGC`), so retention/eviction is handled by the persistent archive instead of `task_runs_ttl_second` and `task_runs_max_history_number`. When disabled, history stays in memory and is pruned by those configurations. +- Introduced in: v3.3.1, v3.4.0, v3.5.0 + +##### enable_task_run_fe_evaluation + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: When enabled, the FE will perform local evaluation for the system table `task_runs` in `TaskRunsSystemTable.supportFeEvaluation`. FE-side evaluation is only allowed for conjunctive equality predicates comparing a column to a constant and is limited to the columns `QUERY_ID` and `TASK_NAME`. Enabling this improves performance for targeted lookups by avoiding broader scans or additional remote processing; disabling it forces the planner to skip FE evaluation for `task_runs`, which may reduce predicate pruning and affect query latency for those filters. +- Introduced in: v3.3.13, v3.4.3, v3.5.0 + +##### heartbeat_mgr_blocking_queue_size + +- Default: 1024 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The size of the blocking queue that stores heartbeat tasks run by the Heartbeat Manager. +- Introduced in: - + +##### heartbeat_mgr_threads_num + +- Default: 8 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The number of threads that can be run by the Heartbeat Manager to run heartbeat tasks. +- Introduced in: - + +##### ignore_materialized_view_error + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: Whether FE ignores the metadata exception caused by materialized view errors. If FE fails to start due to the metadata exception caused by materialized view errors, you can set this parameter to `true` to allow FE to ignore the exception. +- Introduced in: v2.5.10 + +##### ignore_meta_check + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: Whether non-Leader FEs ignore the metadata gap from the Leader FE. If the value is TRUE, non-Leader FEs ignore the metadata gap from the Leader FE and continue providing data reading services. This parameter ensures continuous data reading services even when you stop the Leader FE for a long period of time. If the value is FALSE, non-Leader FEs do not ignore the metadata gap from the Leader FE and stop providing data reading services. +- Introduced in: - + +##### ignore_task_run_history_replay_error + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: Yes +- Description: When StarRocks deserializes TaskRun history rows for `information_schema.task_runs`, a corrupted or invalid JSON row will normally cause deserialization to log a warning and throw a RuntimeException. If this item is set to `true`, the system will catch deserialization errors, skip the malformed record, and continue processing remaining rows instead of failing the query. 
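+This will make `information_schema.task_runs` queries tolerant of bad entries in the `_statistics_.task_run_history` table. Note that enabling it will silently drop corrupted history records (potential data loss) instead of surfacing an explicit error.
+- Introduced in: v3.3.3, v3.4.0, v3.5.0
+
+For example, if queries against `information_schema.task_runs` fail because of a corrupted history row, you can skip the bad rows as an emergency measure (a sketch; accept that corrupted records are silently dropped):
+
+```SQL
+ADMIN SET FRONTEND CONFIG ("ignore_task_run_history_replay_error" = "true");
+
+-- Retry the query that previously failed.
+SELECT * FROM information_schema.task_runs LIMIT 10;
+```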
+##### lock_checker_interval_second
+
+- Default: 30
+- Type: long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: Interval, in seconds, between executions of the LockChecker frontend daemon (named "deadlock-checker"). The daemon performs deadlock detection and slow-lock scanning; the configured value is multiplied by 1000 to set the timer in milliseconds. Decreasing this value reduces detection latency but increases scheduling and CPU overhead; increasing it reduces overhead but delays detection and slow-lock reporting. Changes take effect at runtime because the daemon resets its interval each run. This setting interacts with `lock_checker_enable_deadlock_check` (enables deadlock checks) and `slow_lock_threshold_ms` (defines what constitutes a slow lock).
+- Introduced in: v3.2.0
+
+##### master_sync_policy
+
+- Default: SYNC
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The policy based on which the leader FE flushes logs to disk. This parameter is valid only when the current FE is a leader FE. Valid values:
+  - `SYNC`: When a transaction is committed, a log entry is generated and flushed to disk simultaneously.
+  - `NO_SYNC`: The generation and flushing of a log entry do not occur at the same time when a transaction is committed.
+  - `WRITE_NO_SYNC`: When a transaction is committed, a log entry is generated simultaneously but is not flushed to disk.
+
+  If you have deployed only one follower FE, we recommend that you set this parameter to `SYNC`. If you have deployed three or more follower FEs, we recommend that you set this parameter and the `replica_sync_policy` both to `WRITE_NO_SYNC`.
+
+- Introduced in: -
+
+##### max_bdbje_clock_delta_ms
+
+- Default: 5000
+- Type: Long
+- Unit: Milliseconds
+- Is mutable: No
+- Description: The maximum clock offset that is allowed between the leader FE and the follower or observer FEs in the StarRocks cluster.
+- Introduced in: -
+
+##### meta_delay_toleration_second
+
+- Default: 300
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The maximum duration by which the metadata on the follower and observer FEs can lag behind that on the leader FE. Unit: seconds. If this duration is exceeded, the non-leader FEs stop providing services.
+- Introduced in: -
+
+##### meta_dir
+
+- Default: StarRocksFE.STARROCKS_HOME_DIR + "/meta"
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The directory that stores metadata.
+- Introduced in: -
+
+##### metadata_ignore_unknown_operation_type
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to ignore an unknown log ID. When an FE is rolled back, the FEs of the earlier version may be unable to recognize some log IDs. If the value is `TRUE`, the FE ignores unknown log IDs. If the value is `FALSE`, the FE exits.
+- Introduced in: -
+
+##### profile_info_format
+
+- Default: default
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: The format of the Profile output by the system. Valid values: `default` and `json`. When set to `default`, the Profile uses the default format. When set to `json`, the system outputs the Profile in JSON format.
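+- Introduced in: v2.5
+
+For example, to have the system output profiles as JSON for programmatic consumption (a sketch; the item is mutable):
+
+```SQL
+ADMIN SET FRONTEND CONFIG ("profile_info_format" = "json");
+```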
+##### replica_ack_policy
+
+- Default: SIMPLE_MAJORITY
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The policy based on which a log entry is considered valid. The default value `SIMPLE_MAJORITY` specifies that a log entry is considered valid if a majority of follower FEs return ACK messages.
+- Introduced in: -
+
+##### replica_sync_policy
+
+- Default: SYNC
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The policy based on which the follower FE flushes logs to disk. This parameter is valid only when the current FE is a follower FE. Valid values:
+  - `SYNC`: When a transaction is committed, a log entry is generated and flushed to disk simultaneously.
+  - `NO_SYNC`: The generation and flushing of a log entry do not occur at the same time when a transaction is committed.
+  - `WRITE_NO_SYNC`: When a transaction is committed, a log entry is generated simultaneously but is not flushed to disk.
+- Introduced in: -
+
+##### start_with_incomplete_meta
+
+- Default: false
+- Type: boolean
+- Unit: -
+- Is mutable: No
+- Description: When set to `true`, the FE is allowed to start when image data exists but Berkeley DB JE (BDB) log files are missing or corrupted. `MetaHelper.checkMetaDir()` uses this flag to bypass the safety check that otherwise prevents starting from an image without corresponding BDB logs; starting this way can produce stale or inconsistent metadata and should only be used for emergency recovery. `RestoreClusterSnapshotMgr` temporarily sets this flag to true while restoring a cluster snapshot and then rolls it back; that component also toggles `bdbje_reset_election_group` during restore. Do not enable this item in normal operation; enable it only when recovering from corrupted BDB data or when explicitly restoring an image-based snapshot.
+- Introduced in: v3.2.0
+
+##### table_keeper_interval_second
+
+- Default: 30
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: Interval, in seconds, between executions of the TableKeeper daemon. The TableKeeperDaemon uses this value (multiplied by 1000) to set its internal timer and periodically runs keeper tasks that ensure history tables exist, correct table properties (replication number), and update partition TTLs. The daemon only performs work on the leader node and updates its runtime interval via setInterval when `table_keeper_interval_second` changes. Increase it to reduce scheduling frequency and load; decrease it for faster reaction to missing or stale history tables.
+- Introduced in: v3.3.1, v3.4.0, v3.5.0
+
+##### task_runs_ttl_second
+
+- Default: 7 * 24 * 3600
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: Controls the time-to-live (TTL) for task run history. Lowering this value shortens history retention and reduces memory/disk usage; raising it keeps histories longer but increases resource usage. Adjust together with `task_runs_max_history_number` and `enable_task_history_archive` for predictable retention and storage behavior.
+- Introduced in: v3.2.0
+
+##### task_ttl_second
+
+- Default: 24 * 3600
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: Time-to-live (TTL) for tasks. For manual tasks (when no schedule is set), TaskBuilder uses this value to compute the task's `expireTime` (`expireTime = now + task_ttl_second * 1000L`). TaskRun also uses this value as an upper bound when computing a run's execute timeout: the effective execute timeout is `min(task_runs_timeout_second, task_runs_ttl_second, task_ttl_second)`.
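+Adjusting this value changes how long manually created tasks remain valid and can indirectly limit the maximum allowed execution time of task runs.
+- Introduced in: v3.2.0
+
+For example, to let manual tasks expire after 12 hours instead of 24 (a sketch; the value is in seconds and illustrative):
+
+```SQL
+ADMIN SET FRONTEND CONFIG ("task_ttl_second" = "43200");
+```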
+
+##### thrift_rpc_retry_times
+
+- Default: 3
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Controls the total number of attempts a Thrift RPC call will make. This value is used by `ThriftRPCRequestExecutor` (and callers such as `NodeMgr` and `VariableMgr`) as the loop count for retries; that is, a value of 3 allows up to three attempts, including the initial try. On `TTransportException` the executor will try to reopen the connection and retry up to this count; it will not retry when the cause is a `SocketTimeoutException` or when the reopen fails. Each attempt is subject to the per-attempt timeout configured by `thrift_rpc_timeout_ms`. Increasing this value improves resilience to transient connection failures but can increase overall RPC latency and resource usage.
+- Introduced in: v3.2.0
+
+##### thrift_rpc_strict_mode
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Controls the TBinaryProtocol "strict read" mode used by the Thrift server. This value is passed as the first argument to org.apache.thrift.protocol.TBinaryProtocol.Factory in the Thrift server stack and affects how incoming Thrift messages are parsed and validated. When `true` (default), the server enforces strict Thrift encoding/version checks and honors the configured `thrift_rpc_max_body_size` limit; when `false`, the server accepts non-strict (legacy/lenient) message formats, which can improve compatibility with older clients but may bypass some protocol validations. Change this value with caution because it is not mutable at runtime and affects interoperability and parsing safety.
+- Introduced in: v3.2.0
+
+##### thrift_rpc_timeout_ms
+
+- Default: 10000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: Timeout (in milliseconds) used as the default network/socket timeout for Thrift RPC calls. It is passed to TSocket when creating Thrift clients in `ThriftConnectionPool` (used by the frontend and backend pools) and is also added to an operation's execution timeout (e.g., ExecTimeout*1000 + `thrift_rpc_timeout_ms`) when computing RPC call timeouts in places such as `ConfigBase`, `LeaderOpExecutor`, `GlobalStateMgr`, `NodeMgr`, `VariableMgr`, and `CheckpointWorker`. Increasing this value makes RPC calls tolerate longer network or remote processing delays; decreasing it causes faster failover on slow networks. Changing this value affects connection creation and request deadlines across the FE code paths that perform Thrift RPCs.
+- Introduced in: v3.2.0
+
+##### txn_latency_metric_report_groups
+
+- Default: An empty string
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: A comma-separated list of transaction latency metric groups to report. Load types are categorized into logical groups for monitoring. When a group is enabled, its name is added as a `type` label to transaction metrics. Valid values: `stream_load`, `routine_load`, `broker_load`, `insert`, and `compaction` (available only for shared-data clusters). Example: `"stream_load,routine_load"`.
+- Introduced in: v4.0
+
+##### txn_rollback_limit
+
+- Default: 100
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum number of transactions that can be rolled back.
+- Introduced in: -
+
+### User, role, and privilege
+
+##### enable_task_info_mask_credential
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When true, StarRocks redacts credentials from task SQL definitions before returning them in `information_schema.tasks` and `information_schema.task_runs` by applying `SqlCredentialRedactor.redact` to the DEFINITION column. In `information_schema.task_runs` the same redaction is applied whether the definition comes from the task run status or, when empty, from the task definition lookup. When false, raw task definitions are returned (and may expose credentials). Masking is CPU/string-processing work and can be time-consuming when the number of tasks or task runs is large; disable it only if you need unredacted definitions and accept the security risk.
+- Introduced in: v3.5.6
+
+##### privilege_max_role_depth
+
+- Default: 16
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum role depth (level of inheritance) of a role.
+- Introduced in: v3.0.0
+
+##### privilege_max_total_roles_per_user
+
+- Default: 64
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of roles a user can have.
+- Introduced in: v3.0.0
+
+### Query engine
+
+##### brpc_send_plan_fragment_timeout_ms
+
+- Default: 60000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: Timeout in milliseconds applied to the BRPC TalkTimeoutController before sending a plan fragment. `BackendServiceClient.sendPlanFragmentAsync` sets this value prior to calling the backend `execPlanFragmentAsync`. It governs how long BRPC will wait when borrowing an idle connection from the connection pool and while performing the send; if exceeded, the RPC will fail and may trigger the method's retry logic. Set this lower to fail fast under contention, or raise it to tolerate transient pool exhaustion or slow networks. Be cautious: very large values can delay failure detection and block request threads.
+- Introduced in: v3.3.11, v3.4.1, v3.5.0
+
+##### connector_table_query_trigger_analyze_large_table_interval
+
+- Default: 12 * 3600
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The interval for query-trigger ANALYZE tasks of large tables.
+- Introduced in: v3.4.0
+
+##### connector_table_query_trigger_analyze_max_pending_task_num
+
+- Default: 100
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Maximum number of query-trigger ANALYZE tasks that are in Pending state on the FE.
+- Introduced in: v3.4.0
+
+##### connector_table_query_trigger_analyze_max_running_task_num
+
+- Default: 2
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Maximum number of query-trigger ANALYZE tasks that are in Running state on the FE.
+- Introduced in: v3.4.0
+
+##### connector_table_query_trigger_analyze_small_table_interval
+
+- Default: 2 * 3600
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The interval for query-trigger ANALYZE tasks of small tables.
+- Introduced in: v3.4.0
+
+##### connector_table_query_trigger_analyze_small_table_rows
+
+- Default: 10000000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The threshold for determining whether a table is a small table for query-trigger ANALYZE tasks.
+- Introduced in: v3.4.0
+
+##### connector_table_query_trigger_task_schedule_interval
+
+- Default: 30
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The interval at which the Scheduler thread schedules the query-trigger background tasks. This item replaces `connector_table_query_trigger_analyze_schedule_interval`, which was introduced in v3.4.0. Here, the background tasks refer to ANALYZE tasks in v3.4 and, in versions later than v3.4, also the collection tasks for the dictionaries of low-cardinality columns.
+- Introduced in: v3.4.2
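+
+A tuning sketch for the query-trigger ANALYZE knobs above (all values are illustrative, not recommendations):
+
+```SQL
+-- Schedule the background tasks every 60 seconds instead of 30.
+ADMIN SET FRONTEND CONFIG ("connector_table_query_trigger_task_schedule_interval" = "60");
+-- Allow more ANALYZE tasks to queue in Pending state on the FE.
+ADMIN SET FRONTEND CONFIG ("connector_table_query_trigger_analyze_max_pending_task_num" = "200");
+```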
+
+##### create_table_max_serial_replicas
+
+- Default: 128
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of replicas to create serially. If the actual replica count exceeds this value, replicas will be created concurrently. Try to reduce this value if table creation is taking a long time to complete.
+- Introduced in: -
+
+##### default_mv_partition_refresh_number
+
+- Default: 1
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: When a materialized view refresh involves multiple partitions, this parameter controls how many partitions are refreshed in a single batch by default.
+Starting from version 3.3.0, the system defaults to refreshing one partition at a time to avoid potential out-of-memory (OOM) issues. In earlier versions, all partitions were refreshed at once by default, which could lead to memory exhaustion and task failure. However, note that when a materialized view refresh involves a large number of partitions, refreshing only one partition at a time may lead to excessive scheduling overhead, longer overall refresh time, and a large number of refresh records. In such cases, it is recommended to adjust this parameter appropriately to improve refresh efficiency and reduce scheduling costs.
+- Introduced in: v3.3.0
+
+##### default_mv_refresh_immediate
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to refresh an asynchronous materialized view immediately after creation. When this item is set to `true`, newly created materialized views are refreshed immediately.
+- Introduced in: v3.2.3
+
+##### dynamic_partition_check_interval_seconds
+
+- Default: 600
+- Type: Long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The interval at which new data is checked. If new data is detected, StarRocks automatically creates partitions for the data.
+- Introduced in: -
+
+##### dynamic_partition_enable
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable the dynamic partitioning feature. When this feature is enabled, StarRocks dynamically creates partitions for new data and automatically deletes expired partitions to ensure the freshness of data.
+- Introduced in: -
+
+##### enable_active_materialized_view_schema_strict_check
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to strictly check the length consistency of data types when activating an inactive materialized view. When this item is set to `false`, the activation of the materialized view is not affected if the length of the data types has changed in the base table.
+- Introduced in: v3.3.4
+
+##### enable_auto_collect_array_ndv
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable automatic collection for the NDV information of the ARRAY type.
+- Introduced in: v4.0
+
+##### enable_backup_materialized_view
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable the BACKUP and RESTORE of asynchronous materialized views when backing up or restoring a specific database. If this item is set to `false`, StarRocks will skip backing up asynchronous materialized views.
+- Introduced in: v3.2.0
+
+##### enable_collect_full_statistic
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable automatic full statistics collection. This feature is enabled by default.
+- Introduced in: -
+
+##### enable_colocate_mv_index
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to support colocating the synchronous materialized view index with the base table when creating a synchronous materialized view. If this item is set to `true`, the tablet sink can speed up writes to synchronous materialized views.
+- Introduced in: v3.2.0
+
+##### enable_decimal_v3
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to support the DECIMAL V3 data type.
+- Introduced in: -
+
+##### enable_experimental_mv
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable the asynchronous materialized view feature. `TRUE` indicates this feature is enabled. From v2.5.2 onwards, this feature is enabled by default. For versions earlier than v2.5.2, this feature is disabled by default.
+- Introduced in: v2.4
+
+##### enable_local_replica_selection
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to select local replicas for queries. Local replicas reduce the network transmission cost. If this parameter is set to `TRUE`, the CBO preferentially selects tablet replicas on BEs that have the same IP address as the current FE. If this parameter is set to `FALSE`, both local replicas and non-local replicas can be selected.
+- Introduced in: -
+
+##### enable_manual_collect_array_ndv
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable manual collection for the NDV information of the ARRAY type.
+- Introduced in: v4.0
+
+##### enable_materialized_view
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable the creation of materialized views.
+- Introduced in: -
+
+##### enable_materialized_view_external_table_precise_refresh
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Set this item to `true` to enable an internal optimization for materialized view refresh when a base table is an external (non-cloud-native) table. When enabled, the materialized view refresh processor computes candidate partitions and refreshes only the affected base-table partitions instead of all partitions, reducing I/O and refresh cost. Set it to `false` to force full-partition refresh of external tables.
+- Introduced in: v3.2.9
+
+##### enable_materialized_view_metrics_collect
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to collect monitoring metrics for asynchronous materialized views by default.
+- Introduced in: v3.1.11, v3.2.5
+
+##### enable_materialized_view_spill
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable Intermediate Result Spilling for materialized view refresh tasks.
+- Introduced in: v3.1.1
+
+##### enable_materialized_view_text_based_rewrite
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable text-based query rewrite by default. If this item is set to `true`, the system builds the abstract syntax tree while creating an asynchronous materialized view.
+- Introduced in: v3.2.5
+
+##### enable_mv_automatic_active_check
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable the system to automatically check and re-activate asynchronous materialized views that were set inactive because their base tables (or views) underwent schema changes or were dropped and re-created. Please note that this feature will not re-activate materialized views that were manually set inactive by users.
+- Introduced in: v3.1.6
+
+##### enable_mv_automatic_repairing_for_broken_base_tables
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When this item is set to `true`, StarRocks will attempt to automatically repair materialized view base-table metadata when a base external table is dropped and recreated or its table identifier changes. The repair flow can update the materialized view's base table information, collect partition-level repair information for external table partitions, and drive partition refresh decisions for async auto-refresh materialized views while honoring `autoRefreshPartitionsLimit`. Currently the automated repair supports Hive external tables; unsupported table types cause the materialized view to be set inactive and a repair exception to be raised. Partition information collection is non-blocking, and failures are logged.
+- Introduced in: v3.3.19, v3.4.8, v3.5.6
+
+##### enable_predicate_columns_collection
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable predicate columns collection. If disabled, predicate columns will not be recorded during query optimization.
+- Introduced in: -
+
+##### enable_query_queue_v2
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: When set to `true`, switches the FE slot-based query scheduler to Query Queue V2. The flag is read by the slot manager and trackers (for example, `BaseSlotManager.isEnableQueryQueueV2` and `SlotTracker#createSlotSelectionStrategy`) to choose `SlotSelectionStrategyV2` instead of the legacy strategy. `query_queue_v2_xxx` configuration options and `QueryQueueOptions` take effect only when this flag is enabled. From v4.1 onwards, the default value is changed from `false` to `true`.
+- Introduced in: v3.3.4, v3.4.0, v3.5.0
+
+##### enable_sql_blacklist
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable blacklist check for SQL queries. When this feature is enabled, queries in the blacklist cannot be executed.
+- Introduced in: -
+
+##### enable_statistic_collect
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to collect statistics for the CBO. This feature is enabled by default.
+- Introduced in: -
+
+##### enable_statistic_collect_on_first_load
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Controls automatic statistics collection and maintenance triggered by data loading operations. This includes:
+  - Statistics collection when data is first loaded into a partition (partition version equals 2).
+  - Statistics collection when data is loaded into empty partitions of multi-partition tables.
+  - Statistics copying and updating for INSERT OVERWRITE operations.
+
+  **Decision Policy for Statistics Collection Type:**
+
+  - For INSERT OVERWRITE: `deltaRatio = |targetRows - sourceRows| / (sourceRows + 1)`
+    - If `deltaRatio < statistic_sample_collect_ratio_threshold_of_first_load` (Default: 0.1), statistics collection will not be performed. Only the existing statistics will be copied.
+    - Else, if `targetRows > statistic_sample_collect_rows` (Default: 200000), SAMPLE statistics collection is used.
+    - Else, FULL statistics collection is used.
+
+  - For First Load: `deltaRatio = loadRows / (totalRows + 1)`
+    - If `deltaRatio < statistic_sample_collect_ratio_threshold_of_first_load` (Default: 0.1), statistics collection will not be performed.
+    - Else, if `loadRows > statistic_sample_collect_rows` (Default: 200000), SAMPLE statistics collection is used.
+    - Else, FULL statistics collection is used.
+
+  **Synchronization Behavior:**
+
+  - For DML statements (INSERT INTO/INSERT OVERWRITE): Synchronous mode with table lock. The load operation waits for statistics collection to complete (up to `semi_sync_collect_statistic_await_seconds`).
+  - For Stream Load and Broker Load: Asynchronous mode without lock. Statistics collection runs in the background without blocking the load operation.
+
+  :::note
+  Disabling this configuration will prevent all loading-triggered statistics operations, including statistics maintenance for INSERT OVERWRITE, which may result in tables lacking statistics. If new tables are frequently created and data is frequently loaded, enabling this feature will increase memory and CPU overhead.
+  :::
+
+- Introduced in: v3.1
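+
+A short sketch of how the thresholds above interact (numbers are illustrative only):
+
+```SQL
+-- With the defaults, loading 150,000 rows into a 1,000,000-row table gives
+-- deltaRatio = 150000 / (1000000 + 1) ≈ 0.15 >= 0.1, and 150000 <= 200000,
+-- so FULL collection runs. Raising the row threshold keeps larger loads on FULL:
+ADMIN SET FRONTEND CONFIG ("statistic_sample_collect_rows" = "500000");
+```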
+
+##### enable_statistic_collect_on_update
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Controls whether UPDATE statements can trigger automatic statistics collection. When enabled, UPDATE operations that modify table data may schedule statistics collection through the same ingestion-based statistics framework controlled by `enable_statistic_collect_on_first_load`. Disabling this configuration skips statistics collection for UPDATE statements while keeping load-triggered statistics collection behavior unchanged.
+- Introduced in: v3.5.11, v4.0.4
+
+##### enable_udf
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Whether to enable UDFs.
+- Introduced in: -
+
+##### expr_children_limit
+
+- Default: 10000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of child expressions allowed in an expression.
+- Introduced in: -
+
+##### histogram_buckets_size
+
+- Default: 64
+- Type: Long
+- Unit: -
+- Is mutable: Yes
+- Description: The default bucket number for a histogram.
+- Introduced in: -
+
+##### histogram_max_sample_row_count
+
+- Default: 10000000
+- Type: Long
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of rows to collect for a histogram.
+- Introduced in: -
+
+##### histogram_mcv_size
+
+- Default: 100
+- Type: Long
+- Unit: -
+- Is mutable: Yes
+- Description: The number of most common values (MCV) for a histogram.
+- Introduced in: -
+
+##### histogram_sample_ratio
+
+- Default: 0.1
+- Type: Double
+- Unit: -
+- Is mutable: Yes
+- Description: The sampling ratio for a histogram.
+- Introduced in: -
+
+##### http_slow_request_threshold_ms
+
+- Default: 5000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: If the response time for an HTTP request exceeds the value specified by this parameter, a log is generated to track this request.
+- Introduced in: v2.5.15, v3.1.5
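+
+For example (the value is illustrative), the threshold can be lowered at runtime to log slow HTTP requests more aggressively while diagnosing latency:
+
+```SQL
+-- Log any HTTP request that takes longer than 2 seconds.
+ADMIN SET FRONTEND CONFIG ("http_slow_request_threshold_ms" = "2000");
+```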
+
+##### lock_checker_enable_deadlock_check
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When enabled, the LockChecker thread performs JVM-level deadlock detection using `ThreadMXBean.findDeadlockedThreads()` and logs the offending threads' stack traces. The check runs inside the LockChecker daemon (whose frequency is controlled by `lock_checker_interval_second`) and writes detailed stack information to the log, which may be CPU- and I/O-intensive. Enable this option only for troubleshooting live or reproducible deadlock issues; leaving it enabled in normal operation can increase overhead and log volume.
+- Introduced in: v3.2.0
+
+##### low_cardinality_threshold
+
+- Default: 255
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The threshold of the low-cardinality dictionary.
+- Introduced in: v3.5.0
+
+##### materialized_view_min_refresh_interval
+
+- Default: 60
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The minimum allowed refresh interval (in seconds) for ASYNC materialized view schedules. When a materialized view is created with a time-based interval, the interval is converted to seconds and must not be less than this value; otherwise the CREATE/ALTER operation fails with a DDL error. If this value is greater than 0, the check is enforced, which prevents excessive TaskManager scheduling and high FE memory/CPU usage from overly frequent refreshes; set it to 0 or a negative value to disable the limit. This item does not apply to EVENT_TRIGGERED refreshes.
+- Introduced in: v3.3.0, v3.4.0, v3.5.0
+
+##### materialized_view_refresh_ascending
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When this item is set to `true`, materialized view partition refresh will iterate partitions in ascending partition-key order (oldest to newest). When it is set to `false` (default), the system iterates in descending order (newest to oldest). StarRocks uses this item in both list- and range-partitioned materialized view refresh logic to choose which partitions to process when partition refresh limits apply and to compute the next start/end partition boundaries for subsequent TaskRun executions. Changing this item alters which partitions are refreshed first and how the next partition range is derived; for range-partitioned materialized views, the scheduler validates the new start/end and will raise an error if a change would create a repeated boundary (dead loop), so set this item with care.
+- Introduced in: v3.3.1, v3.4.0, v3.5.0
+
+##### max_allowed_in_element_num_of_delete
+
+- Default: 10000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of elements allowed for the IN predicate in a DELETE statement.
+- Introduced in: -
+
+##### max_create_table_timeout_second
+
+- Default: 600
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The maximum timeout duration for creating a table.
+- Introduced in: -
+
+##### max_distribution_pruner_recursion_depth
+
+- Default: 100
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum recursion depth allowed by the partition pruner. Increasing the recursion depth can prune more elements but also increases CPU consumption.
+- Introduced in: -
+
+##### max_partitions_in_one_batch
+
+- Default: 4096
+- Type: Long
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of partitions that can be created when you bulk create partitions.
+- Introduced in: -
+
+##### max_planner_scalar_rewrite_num
+
+- Default: 100000
+- Type: Long
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of times that the optimizer can rewrite a scalar operator.
+- Introduced in: -
+
+##### max_query_queue_history_slots_number
+
+- Default: 0
+- Type: Int
+- Unit: Slots
+- Is mutable: Yes
+- Description: Controls how many recently released (history) allocated slots are retained per query queue for monitoring and observability. When `max_query_queue_history_slots_number` is set to a value `> 0`, BaseSlotTracker keeps up to that many most-recently released LogicalSlot entries in an in-memory queue, evicting the oldest when the limit is exceeded. Enabling this causes getSlots() to include these history entries (newest first), allows BaseSlotTracker to attempt registering slots with the ConnectContext for richer ExtraMessage data, and lets LogicalSlot.ConnectContextListener attach query finish metadata to history slots. When `max_query_queue_history_slots_number` is `<= 0`, the history mechanism is disabled (no extra memory is used). Use a reasonable value to balance observability and memory overhead.
+- Introduced in: v3.5.0
+
+##### max_query_retry_time
+
+- Default: 2
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of query retries on an FE.
+- Introduced in: -
+
+##### max_running_rollup_job_num_per_table
+
+- Default: 1
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of rollup jobs that can run in parallel for a table.
+- Introduced in: -
+
+##### max_scalar_operator_flat_children
+
+- Default: 10000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of flat children for ScalarOperator. You can set this limit to prevent the optimizer from using too much memory.
+- Introduced in: -
+
+##### max_scalar_operator_optimize_depth
+
+- Default: 256
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum depth that ScalarOperator optimization can be applied.
+- Introduced in: -
+
+##### mv_active_checker_interval_seconds
+
+- Default: 60
+- Type: Long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: When the background active_checker thread is enabled, the system will periodically detect and automatically reactivate materialized views that became Inactive due to schema changes or rebuilds of their base tables (or views). This parameter controls the scheduling interval of the checker thread, in seconds.
+- Introduced in: v3.1.6
+
+##### mv_rewrite_consider_data_layout_mode
+
+- Default: `enable`
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: Controls whether materialized view rewrite should take the base table data layout into account when selecting the best materialized view. Valid values:
+  - `disable`: Never use data-layout criteria when choosing between candidate materialized views.
+  - `enable`: Use data-layout criteria only when the query is recognized as layout-sensitive.
+  - `force`: Always apply data-layout criteria when selecting the best materialized view.
+  Changing this item affects `BestMvSelector` behavior and can improve or broaden rewrite applicability depending on whether physical layout matters for plan correctness or performance.
+- Introduced in: -
+
+##### publish_version_interval_ms
+
+- Default: 10
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: No
+- Description: The time interval at which publish version tasks are issued.
+- Introduced in: -
+
+##### query_queue_slots_estimator_strategy
+
+- Default: MAX
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: Selects the slot estimation strategy used for queue-based queries when `enable_query_queue_v2` is true. Valid values: `MBE` (memory-based), `PBE` (parallelism-based), `MAX` (takes the maximum of MBE and PBE), and `MIN` (takes the minimum of MBE and PBE). MBE estimates slots from predicted memory or plan costs divided by the per-slot memory target and is capped by `totalSlots`. PBE derives slots from fragment parallelism (scan range counts or cardinality / rows-per-slot) and a CPU-cost based calculation (using CPU costs per slot), then bounds the result within [numSlots/2, numSlots]. MAX and MIN combine MBE and PBE by taking their maximum or minimum, respectively. If the configured value is invalid, the default (`MAX`) is used.
+- Introduced in: v3.5.0
+
+##### query_queue_v2_concurrency_level
+
+- Default: 4
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Controls how many logical concurrency "layers" are used when computing the system's total query slots. In shared-nothing mode the total slots = `query_queue_v2_concurrency_level` * number_of_BEs * cores_per_BE (derived from `BackendResourceStat`). In multi-warehouse mode the effective concurrency is scaled down to max(1, `query_queue_v2_concurrency_level` / 4). If the configured value is non-positive, it is treated as `4`. Changing this value increases or decreases totalSlots (and therefore concurrent query capacity) and affects per-slot resources: memBytesPerSlot is derived by dividing per-worker memory by (cores_per_worker * concurrency), and CPU accounting uses `query_queue_v2_cpu_costs_per_slot`. Set it proportional to cluster size; very large values may reduce per-slot memory and cause resource fragmentation.
+- Introduced in: v3.3.4, v3.4.0, v3.5.0
+
+##### query_queue_v2_cpu_costs_per_slot
+
+- Default: 1000000000
+- Type: Long
+- Unit: planner CPU cost units
+- Is mutable: Yes
+- Description: Per-slot CPU cost threshold used to estimate how many slots a query needs from its planner CPU cost. The scheduler computes slots as integer(plan_cpu_costs / `query_queue_v2_cpu_costs_per_slot`) and then clamps the result to the range [1, totalSlots] (totalSlots is derived from the other query queue V2 parameters). The V2 code normalizes non-positive settings to 1 (Math.max(1, value)), so a non-positive value effectively becomes `1`. Increasing this value reduces slots allocated per query (favoring fewer, larger-slot queries); decreasing it increases slots per query. Tune together with `query_queue_v2_num_rows_per_slot` and concurrency settings to control parallelism vs. resource granularity.
+- Introduced in: v3.3.4, v3.4.0, v3.5.0
+
+##### query_queue_v2_num_rows_per_slot
+
+- Default: 4096
+- Type: Int
+- Unit: Rows
+- Is mutable: Yes
+- Description: The target number of source-row records assigned to a single scheduling slot when estimating per-query slot count. StarRocks computes estimated_slots = (cardinality of the Source Node) / `query_queue_v2_num_rows_per_slot`, then clamps the result to the range [1, totalSlots] and enforces a minimum of 1 if the computed value is non-positive. totalSlots is derived from available resources (roughly DOP * `query_queue_v2_concurrency_level` * number_of_workers/BE) and therefore depends on cluster/core counts. Increase this value to reduce slot count (each slot handles more rows) and lower scheduling overhead; decrease it to increase parallelism (more, smaller slots), up to the resource limit.
+- Introduced in: v3.3.4, v3.4.0, v3.5.0
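+
+A back-of-the-envelope illustration (the cluster shape and all numbers are assumed for the example): on 3 BEs with 16 cores each and `query_queue_v2_concurrency_level` = 4, totalSlots = 4 * 3 * 16 = 192; a scan of roughly 1,000,000 source rows then estimates 1000000 / 4096 ≈ 244 slots, which is clamped to 192.
+
+```SQL
+-- Halve the per-slot row target to favor more, smaller slots (illustrative value).
+ADMIN SET FRONTEND CONFIG ("query_queue_v2_num_rows_per_slot" = "2048");
+```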
+
+##### query_queue_v2_schedule_strategy
+
+- Default: SWRR
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: Selects the scheduling policy used by Query Queue V2 to order pending queries. Supported values (case-insensitive) are `SWRR` (Smooth Weighted Round Robin), the default, which suits mixed/hybrid workloads that need fair weighted sharing, and `SJF` (Short Job First with aging), which prioritizes short jobs while using aging to avoid starvation. The value is parsed with case-insensitive enum lookup; an unrecognized value is logged as an error and the default policy is used. This configuration only affects behavior when Query Queue V2 is enabled and interacts with V2 sizing settings such as `query_queue_v2_concurrency_level`.
+- Introduced in: v3.3.12, v3.4.2, v3.5.0
+
+##### semi_sync_collect_statistic_await_seconds
+
+- Default: 30
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: Maximum wait time for semi-synchronous statistics collection during DML operations (INSERT INTO and INSERT OVERWRITE statements). Stream Load and Broker Load use asynchronous mode and are not affected by this configuration. If statistics collection time exceeds this value, the load operation continues without waiting for collection to complete. This configuration works in conjunction with `enable_statistic_collect_on_first_load`.
+- Introduced in: v3.1
+
+##### slow_query_analyze_threshold
+
+- Default: 5
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The execution time threshold for queries to trigger the analysis of Query Feedback.
+- Introduced in: v3.4.0
+
+##### statistic_analyze_status_keep_second
+
+- Default: 3 * 24 * 3600
+- Type: Long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The duration to retain the history of collection tasks. The default value is 3 days.
+- Introduced in: -
+
+##### statistic_auto_analyze_end_time
+
+- Default: 23:59:59
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: The end time of automatic collection. Value range: `00:00:00` - `23:59:59`.
+- Introduced in: -
+
+##### statistic_auto_analyze_start_time
+
+- Default: 00:00:00
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: The start time of automatic collection. Value range: `00:00:00` - `23:59:59`.
+- Introduced in: -
+
+##### statistic_auto_collect_ratio
+
+- Default: 0.8
+- Type: Double
+- Unit: -
+- Is mutable: Yes
+- Description: The threshold for determining whether the statistics for automatic collection are healthy. If statistics health is below this threshold, automatic collection is triggered.
+- Introduced in: -
+
+##### statistic_auto_collect_small_table_rows
+
+- Default: 10000000
+- Type: Long
+- Unit: -
+- Is mutable: Yes
+- Description: Threshold to determine whether a table in an external data source (Hive, Iceberg, Hudi) is a small table during automatic collection. If a table has fewer rows than this value, it is considered a small table.
+- Introduced in: v3.2
+
+##### statistic_cache_columns
+
+- Default: 100000
+- Type: Long
+- Unit: -
+- Is mutable: No
+- Description: The number of rows that can be cached for the statistics table.
+- Introduced in: -
+
+##### statistic_cache_thread_pool_size
+
+- Default: 10
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The size of the thread pool used to refresh statistics caches.
+- Introduced in: -
+
+##### statistic_collect_interval_sec
+
+- Default: 5 * 60
+- Type: Long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The interval for checking data updates during automatic collection.
+- Introduced in: -
+
+##### statistic_max_full_collect_data_size
+
+- Default: 100 * 1024 * 1024 * 1024
+- Type: Long
+- Unit: Bytes
+- Is mutable: Yes
+- Description: The data size threshold for the automatic collection of statistics. If the total size exceeds this value, sampled collection is performed instead of full collection.
+- Introduced in: -
+
+##### statistic_sample_collect_rows
+
+- Default: 200000
+- Type: Long
+- Unit: -
+- Is mutable: Yes
+- Description: The row count threshold for deciding between SAMPLE and FULL statistics collection during loading-triggered statistics operations. If the number of loaded or changed rows exceeds this threshold (default 200,000), SAMPLE statistics collection is used; otherwise, FULL statistics collection is used. This setting works in conjunction with `enable_statistic_collect_on_first_load` and `statistic_sample_collect_ratio_threshold_of_first_load`.
+- Introduced in: -
+
+##### statistic_update_interval_sec
+
+- Default: 24 * 60 * 60
+- Type: Long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The interval at which the cache of statistical information is updated.
+- Introduced in: -
+
+##### task_check_interval_second
+
+- Default: 60
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: Interval between executions of task background jobs. GlobalStateMgr uses this value to schedule the TaskCleaner FrontendDaemon which invokes `doTaskBackgroundJob()`; the value is multiplied by 1000 to set the daemon interval in milliseconds. Decreasing the value makes background maintenance (task cleanup, checks) run more frequently and react faster but increases CPU/IO overhead; increasing it reduces overhead but delays cleanup and detection of stale tasks. Tune this value to balance maintenance responsiveness and resource usage.
+- Introduced in: v3.2.0
+
+##### task_min_schedule_interval_s
+
+- Default: 10
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: Minimum allowed schedule interval (in seconds) for task schedules checked by the SQL layer. When a task is submitted, TaskAnalyzer converts the schedule period to seconds and rejects the submission with ERR_INVALID_PARAMETER if the period is smaller than `task_min_schedule_interval_s`. This prevents creating tasks that run too frequently and protects the scheduler from high-frequency tasks. If a schedule has no explicit start time, TaskAnalyzer sets the start time to the current epoch seconds.
+- Introduced in: v3.3.0, v3.4.0, v3.5.0
+
+##### task_runs_timeout_second
+
+- Default: 4 * 3600
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: Default execution timeout (in seconds) for a TaskRun. This item is used by TaskRun execution as the baseline timeout. If the task run's properties include session variables `query_timeout` or `insert_timeout` with a positive integer value, the runtime uses the larger value between that session timeout and `task_runs_timeout_second`. The effective timeout is then bounded to not exceed the configured `task_runs_ttl_second` and `task_ttl_second`. Set this item to limit how long a task run may execute. Very large values may be clipped by the task/task-run TTL settings.
+- Introduced in: -
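+
+For instance (all values illustrative): with `task_runs_timeout_second` = 14400, `task_runs_ttl_second` = 604800, and `task_ttl_second` = 86400, a run with no session overrides times out after min(14400, 604800, 86400) = 14400 seconds; a session `insert_timeout` of 18000 raises the baseline to 18000 but remains bounded by the two TTLs.
+
+```SQL
+-- Cap task runs at 4 hours (illustrative value); per-run session timeouts can
+-- raise the baseline but never beyond task_runs_ttl_second / task_ttl_second.
+ADMIN SET FRONTEND CONFIG ("task_runs_timeout_second" = "14400");
+```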
+
+### Loading and unloading
+
+##### broker_load_default_timeout_second
+
+- Default: 14400
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The timeout duration for a Broker Load job.
+- Introduced in: -
+
+##### desired_max_waiting_jobs
+
+- Default: 1024
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of pending jobs in an FE. The number refers to all jobs, such as table creation, loading, and schema change jobs. If the number of pending jobs in an FE reaches this value, the FE will reject new load requests. This parameter takes effect only for asynchronous loading. From v2.5 onwards, the default value is changed from 100 to 1024.
+- Introduced in: -
+
+##### disable_load_job
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to disable loading when the cluster encounters an error. This prevents any loss caused by cluster errors. The default value is `FALSE`, indicating that loading is not disabled. `TRUE` indicates loading is disabled and the cluster is in a read-only state.
+- Introduced in: -
+
+##### empty_load_as_error
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to return an error message "all partitions have no load data" if no data is loaded. Valid values:
+  - `true`: If no data is loaded, the system displays a failure message and returns an error "all partitions have no load data".
+  - `false`: If no data is loaded, the system displays a success message and returns OK, instead of an error.
+- Introduced in: -
+
+##### enable_file_bundling
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable the File Bundling optimization for the cloud-native table. When this feature is enabled (set to `true`), the system automatically bundles the data files generated by loading, Compaction, or Publish operations, thereby reducing the API cost caused by high-frequency access to the external storage system. You can also control this behavior on the table level using the CREATE TABLE property `file_bundling`. For detailed instructions, see [CREATE TABLE](../../sql-reference/sql-statements/table_bucket_part_index/CREATE_TABLE.md).
+- Introduced in: v4.0
+
+##### enable_routine_load_lag_metrics
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to collect Routine Load Kafka partition offset lag metrics. Note that setting this item to `true` causes StarRocks to call the Kafka API to get the partition's latest offset.
+- Introduced in: -
+
+##### enable_sync_publish
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to synchronously execute the apply task at the publish phase of a load transaction. This parameter is applicable only to Primary Key tables. Valid values:
+  - `TRUE` (default): The apply task is synchronously executed at the publish phase of a load transaction. It means that the load transaction is reported as successful only after the apply task is completed, and the loaded data can truly be queried. When a task loads a large volume of data at a time or loads data frequently, setting this parameter to `true` can improve query performance and stability, but may increase load latency.
+  - `FALSE`: The apply task is asynchronously executed at the publish phase of a load transaction. It means that the load transaction is reported as successful after the apply task is submitted, but the loaded data cannot be immediately queried. In this case, concurrent queries need to wait for the apply task to complete or time out before they can continue. When a task loads a large volume of data at a time or loads data frequently, setting this parameter to `false` may affect query performance and stability.
+- Introduced in: v3.2.0
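+
+For example (a sketch; weigh load latency against query freshness before changing this on a production cluster):
+
+```SQL
+-- Report Primary Key table loads as successful once the apply task is submitted,
+-- trading immediate queryability for lower load latency.
+ADMIN SET FRONTEND CONFIG ("enable_sync_publish" = "false");
+```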
+
+##### export_checker_interval_second
+
+- Default: 5
+- Type: Int
+- Unit: Seconds
+- Is mutable: No
+- Description: The time interval at which export jobs are scheduled.
+- Introduced in: -
+
+##### export_max_bytes_per_be_per_task
+
+- Default: 268435456
+- Type: Long
+- Unit: Bytes
+- Is mutable: Yes
+- Description: The maximum amount of data that can be exported from a single BE by a single data unload task.
+- Introduced in: -
+
+##### export_running_job_num_limit
+
+- Default: 5
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of data exporting tasks that can run in parallel.
+- Introduced in: -
+
+##### export_task_default_timeout_second
+
+- Default: 2 * 3600
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The timeout duration for a data exporting task.
+- Introduced in: -
+
+##### export_task_pool_size
+
+- Default: 5
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The size of the unload task thread pool.
+- Introduced in: -
+
+##### external_table_commit_timeout_ms
+
+- Default: 10000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: The timeout duration for committing (publishing) a write transaction to a StarRocks external table. The default value `10000` indicates a 10-second timeout duration.
+- Introduced in: -
+
+##### finish_transaction_default_lock_timeout_ms
+
+- Default: 1000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: The default timeout for acquiring the database and table locks when finishing a transaction.
+- Introduced in: v4.0.0, v3.5.8
+
+##### history_job_keep_max_second
+
+- Default: 7 * 24 * 3600
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The maximum duration historical jobs, such as schema change jobs, can be retained.
+- Introduced in: -
+
+##### insert_load_default_timeout_second
+
+- Default: 3600
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The timeout duration for the INSERT INTO statement that is used to load data.
+- Introduced in: -
+
+##### label_clean_interval_second
+
+- Default: 4 * 3600
+- Type: Int
+- Unit: Seconds
+- Is mutable: No
+- Description: The time interval at which labels are cleaned up. We recommend that you specify a short time interval to ensure that historical labels can be cleaned up in a timely manner.
+- Introduced in: -
+
+##### label_keep_max_num
+
+- Default: 1000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of load jobs that can be retained within a period of time. If this number is exceeded, the information of historical jobs will be deleted.
+- Introduced in: -
+
+##### label_keep_max_second
+
+- Default: 3 * 24 * 3600
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The maximum duration in seconds to keep the labels of load jobs that have been completed and are in the FINISHED or CANCELLED state. The default value is 3 days. After this duration expires, the labels will be deleted. This parameter applies to all types of load jobs. An excessively large value consumes a lot of memory.
+- Introduced in: -
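+
+As an illustration (the values are examples only), label retention can be tightened at runtime to save FE memory:
+
+```SQL
+-- Keep completed load labels for one day instead of three.
+ADMIN SET FRONTEND CONFIG ("label_keep_max_second" = "86400");
+-- Retain at most 500 historical load jobs within that window.
+ADMIN SET FRONTEND CONFIG ("label_keep_max_num" = "500");
+```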
+
+##### load_checker_interval_second
+
+- Default: 5
+- Type: Int
+- Unit: Seconds
+- Is mutable: No
+- Description: The time interval at which load jobs are processed on a rolling basis.
+- Introduced in: -
+
+##### load_parallel_instance_num
+
+- Default: 1
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Controls the number of parallel load fragment instances created on a single host for broker and stream loads. LoadPlanner uses this value as the per-host degree of parallelism unless the session enables adaptive sink DOP; if the session variable `enable_adaptive_sink_dop` is true, the session's `sink_degree_of_parallelism` overrides this configuration. When shuffle is required, this value is applied to fragment parallel execution (scan fragment and sink fragment parallel exec instances). When no shuffle is needed, it is used as the sink pipeline DOP. Note: loads from local files are forced to a single instance (pipeline DOP = 1, parallel exec = 1) to avoid local disk contention. Increasing this number raises per-host concurrency and throughput but may increase CPU, memory, and I/O contention.
+- Introduced in: v3.2.0
+
+##### load_straggler_wait_second
+
+- Default: 300
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The maximum loading lag that can be tolerated by a BE replica. If this value is exceeded, cloning is performed to clone data from other replicas.
+- Introduced in: -
+
+##### loads_history_retained_days
+
+- Default: 30
+- Type: Int
+- Unit: Days
+- Is mutable: Yes
+- Description: Number of days to retain load history in the internal `_statistics_.loads_history` table. This value is used for table creation to set the table property `partition_live_number` and is passed to `TableKeeper` (clamped to a minimum of 1) to determine how many daily partitions to keep. Increasing or decreasing this value adjusts how long completed load jobs are retained in daily partitions; it affects new table creation and the keeper's pruning behavior but does not automatically recreate past partitions. The `LoadsHistorySyncer` relies on this retention when managing the loads history lifecycle; its sync cadence is controlled by `loads_history_sync_interval_second`.
+- Introduced in: v3.3.6, v3.4.0, v3.5.0
+
+##### loads_history_sync_interval_second
+
+- Default: 60
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: Interval (in seconds) used by LoadsHistorySyncer to schedule periodic syncs of finished load jobs from `information_schema.loads` into the internal `_statistics_.loads_history` table. The value is multiplied by 1000 in the constructor to set the FrontendDaemon interval. The syncer skips the first run (to allow table creation) and only imports loads that finished more than one minute ago; small values increase DML and executor load, while larger values delay availability of historical load records. See `loads_history_retained_days` for retention/partitioning behavior of the target table.
+- Introduced in: v3.3.6, v3.4.0, v3.5.0
+
+##### max_broker_load_job_concurrency
+
+- Default: 5
+- Alias: async_load_task_pool_size
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of concurrent Broker Load jobs allowed within the StarRocks cluster. This parameter is valid only for Broker Load. The value of this parameter must be less than the value of `max_running_txn_num_per_db`. From v2.5 onwards, the default value is changed from `10` to `5`.
+- Introduced in: -
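+
+For example (the value is illustrative; keep it below `max_running_txn_num_per_db`):
+
+```SQL
+-- Allow up to 10 Broker Load jobs to run concurrently.
+ADMIN SET FRONTEND CONFIG ("max_broker_load_job_concurrency" = "10");
+```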
+
+##### max_load_timeout_second
+
+- Default: 259200
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The maximum timeout duration allowed for a load job. The load job fails if this limit is exceeded. This limit applies to all types of load jobs.
+- Introduced in: -
+
+##### max_routine_load_batch_size
+
+- Default: 4294967296
+- Type: Long
+- Unit: Bytes
+- Is mutable: Yes
+- Description: The maximum amount of data that can be loaded by a Routine Load task.
+- Introduced in: -
+
+##### max_routine_load_task_concurrent_num
+
+- Default: 5
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of concurrent tasks for each Routine Load job.
+- Introduced in: -
+
+##### max_routine_load_task_num_per_be
+
+- Default: 16
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of concurrent Routine Load tasks on each BE. Since v3.1.0, the default value for this parameter is increased from 5 to 16, and it no longer needs to be less than or equal to the value of the BE static parameter `routine_load_thread_pool_size` (deprecated).
+- Introduced in: -
+
+##### max_running_txn_num_per_db
+
+- Default: 1000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of load transactions allowed to be running for each database within a StarRocks cluster. From v3.1 onwards, the default value is changed from `100` to `1000`. When the actual number of load transactions running for a database exceeds the value of this parameter, new load requests will not be processed. New requests for synchronous load jobs will be denied, and new requests for asynchronous load jobs will be placed in queue. We do not recommend you increase the value of this parameter because this will increase system load.
+- Introduced in: -
+
+##### max_stream_load_timeout_second
+
+- Default: 259200
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The maximum allowed timeout duration for a Stream Load job.
+- Introduced in: -
+
+##### max_tolerable_backend_down_num
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of faulty BE nodes allowed. If this number is exceeded, Routine Load jobs cannot be automatically recovered.
+- Introduced in: -
+
+##### min_bytes_per_broker_scanner
+
+- Default: 67108864
+- Type: Long
+- Unit: Bytes
+- Is mutable: Yes
+- Description: The minimum allowed amount of data that can be processed by a Broker Load instance.
+- Introduced in: -
+
+##### min_load_timeout_second
+
+- Default: 1
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The minimum timeout duration allowed for a load job. This limit applies to all types of load jobs.
+- Introduced in: -
+
+##### min_routine_load_lag_for_metrics
+
+- Default: 10000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The minimum offset lag of Routine Load jobs to be shown in monitoring metrics. Routine Load jobs whose offset lags are greater than this value will be displayed in the metrics.
+- Introduced in: -
+
+##### period_of_auto_resume_min
+
+- Default: 5
+- Type: Int
+- Unit: Minutes
+- Is mutable: Yes
+- Description: The interval at which Routine Load jobs are automatically recovered.
+- Introduced in: -
+
+##### prepared_transaction_default_timeout_second
+
+- Default: 86400
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The default timeout duration for a prepared transaction.
+- Introduced in: -
+
+##### routine_load_task_consume_second
+
+- Default: 15
+- Type: Long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The maximum time for each Routine Load task within the cluster to consume data. Since v3.1.0, Routine Load jobs support a new parameter `task_consume_second` in [job_properties](../../sql-reference/sql-statements/loading_unloading/routine_load/CREATE_ROUTINE_LOAD.md#job_properties). This parameter applies to individual load tasks within a Routine Load job, which is more flexible.
+- Introduced in: -
+
+##### routine_load_task_timeout_second
+
+- Default: 60
+- Type: Long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The timeout duration for each Routine Load task within the cluster. Since v3.1.0, Routine Load jobs support a new parameter `task_timeout_second` in [job_properties](../../sql-reference/sql-statements/loading_unloading/routine_load/CREATE_ROUTINE_LOAD.md#job_properties). This parameter applies to individual load tasks within a Routine Load job, which is more flexible.
+- Introduced in: -
+
+##### routine_load_unstable_threshold_second
+
+- Default: 3600
+- Type: Long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: A Routine Load job is set to the UNSTABLE state if any task within it lags; to be specific, if the difference between the timestamp of the message being consumed and the current time exceeds this threshold, and unconsumed messages exist in the data source.
+- Introduced in: -
+
+##### spark_dpp_version
+
+- Default: 1.0.0
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The version of Spark Dynamic Partition Pruning (DPP) used.
+- Introduced in: -
+
+##### spark_home_default_dir
+
+- Default: StarRocksFE.STARROCKS_HOME_DIR + "/lib/spark2x"
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The root directory of a Spark client.
+- Introduced in: -
+
+##### spark_launcher_log_dir
+
+- Default: sys_log_dir + "/spark_launcher_log"
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The directory that stores Spark log files.
+- Introduced in: -
+
+##### spark_load_default_timeout_second
+
+- Default: 86400
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The timeout duration for each Spark Load job.
+- Introduced in: -
+
+##### spark_load_submit_timeout_second
+
+- Default: 300
+- Type: Long
+- Unit: Seconds
+- Is mutable: No
+- Description: Maximum time in seconds to wait for a YARN response after submitting a Spark application. `SparkLauncherMonitor.LogMonitor` converts this value to milliseconds and will stop monitoring and forcibly kill the spark launcher process if the job remains in UNKNOWN/CONNECTED/SUBMITTED longer than this timeout. `SparkLoadJob` reads this configuration as the default and allows a per-load override via the `LoadStmt.SPARK_LOAD_SUBMIT_TIMEOUT` property. Set it high enough to accommodate YARN queueing delays; setting it too low may abort legitimately queued jobs, while setting it too high may delay failure handling and resource cleanup.
+- Introduced in: v3.2.0
+
+##### spark_resource_path
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The root directory of the Spark dependency package.
+- Introduced in: -
+
+##### stream_load_default_timeout_second
+
+- Default: 600
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The default timeout duration for each Stream Load job.
+- Introduced in: -
+
+##### stream_load_max_txn_num_per_be
+
+- Default: -1
+- Type: Int
+- Unit: Transactions
+- Is mutable: Yes
+- Description: Limits the number of concurrent stream-load transactions accepted from a single BE (backend) host. When set to a non-negative integer, FrontendServiceImpl checks the current transaction count for the BE (by client IP) and rejects new stream-load begin requests if the count `>=` this limit. A negative value disables the limit (unlimited). This check occurs during stream load begin and may cause a `streamload txn num per be exceeds limit` error when exceeded. Related runtime behavior uses `stream_load_default_timeout_second` for request timeout fallback.
+- Introduced in: v3.3.0, v3.4.0, v3.5.0
+
+##### stream_load_task_keep_max_num
+
+- Default: 1000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Maximum number of Stream Load tasks that StreamLoadMgr keeps in memory (global across all databases). When the number of tracked tasks (`idToStreamLoadTask`) exceeds this threshold, StreamLoadMgr first calls `cleanSyncStreamLoadTasks()` to remove completed synchronous stream-load tasks; if the size still remains greater than half of this threshold, it invokes `cleanOldStreamLoadTasks(true)` to force removal of older or finished tasks. Increase this value to retain more task history in memory; decrease it to reduce memory usage and make cleanup more aggressive. This value controls in-memory retention only and does not affect persisted/replayed tasks.
+- Introduced in: v3.2.0
+
+##### stream_load_task_keep_max_second
+
+- Default: 3 * 24 * 3600
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: Retention window for finished or cancelled Stream Load tasks. After a task reaches a final state and its end timestamp is older than this threshold (`currentMs - endTimeMs > stream_load_task_keep_max_second * 1000`), it becomes eligible for removal by `StreamLoadMgr.cleanOldStreamLoadTasks` and is discarded when loading persisted state. Applies to both `StreamLoadTask` and `StreamLoadMultiStmtTask`. If the total task count exceeds `stream_load_task_keep_max_num`, cleanup may be triggered earlier (synchronous tasks are prioritized by `cleanSyncStreamLoadTasks`). Set this to balance history/debuggability against memory usage.
+- Introduced in: v3.2.0
+
+##### transaction_clean_interval_second
+
+- Default: 30
+- Type: Int
+- Unit: Seconds
+- Is mutable: No
+- Description: The time interval at which finished transactions are cleaned up. We recommend that you specify a short time interval to ensure that finished transactions can be cleaned up in a timely manner.
+- Introduced in: -
+
+##### transaction_stream_load_coordinator_cache_capacity
+
+- Default: 4096
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The capacity of the cache that stores the mapping from transaction label to coordinator node.
+- Introduced in: -
+
+##### transaction_stream_load_coordinator_cache_expire_seconds
+
+- Default: 900
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time (TTL) to keep the coordinator mapping in the cache before it is evicted.
+- Introduced in: -
+
+##### yarn_client_path
+
+- Default: StarRocksFE.STARROCKS_HOME_DIR + "/lib/yarn-client/hadoop/bin/yarn"
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The root directory of the Yarn client package.
+- Introduced in: -
+- Introduced in: -
+
+##### yarn_config_dir
+
+- Default: StarRocksFE.STARROCKS_HOME_DIR + "/lib/yarn-config"
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The directory that stores the Yarn configuration file.
+- Introduced in: -
+
+### Statistic report
+
+##### enable_collect_warehouse_metrics
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When this item is set to `true`, the system will collect and export per-warehouse metrics. Enabling it adds warehouse-level metrics (slot/usage/availability) to the metric output and increases metric cardinality and collection overhead. Disable it to omit warehouse-specific metrics and reduce CPU/network and monitoring storage cost.
+- Introduced in: v3.5.0
+
+##### enable_http_detail_metrics
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When set to `true`, the HTTP server computes and exposes detailed HTTP worker metrics (notably the `HTTP_WORKER_PENDING_TASKS_NUM` gauge). Enabling this causes the server to iterate over Netty worker executors and call `pendingTasks()` on each `NioEventLoop` to sum pending task counts; when disabled, the gauge returns 0 to avoid that cost. This extra collection can be CPU- and latency-sensitive, so enable it only for debugging or detailed investigation.
+- Introduced in: v3.2.3
+
+##### proc_profile_collect_time_s
+
+- Default: 120
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: Duration in seconds for a single process profile collection. When `proc_profile_cpu_enable` or `proc_profile_mem_enable` is set to `true`, AsyncProfiler is started, the collector thread sleeps for this duration, then the profiler is stopped and the profile is written. Larger values increase sample coverage and file size but prolong profiler runtime and delay subsequent collections; smaller values reduce overhead but may produce insufficient samples. Ensure this value aligns with retention settings such as `proc_profile_file_retained_days` and `proc_profile_file_retained_size_bytes`.
+- Introduced in: v3.2.12
+
+### Storage
+
+##### alter_table_timeout_second
+
+- Default: 86400
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The timeout duration for the schema change operation (ALTER TABLE).
+- Introduced in: -
+
+##### capacity_used_percent_high_water
+
+- Default: 0.75
+- Type: Double
+- Unit: Fraction (0.0–1.0)
+- Is mutable: Yes
+- Description: The high-water threshold of the disk capacity used percent (fraction of total capacity) used when computing backend load scores. `BackendLoadStatistic.calcScore` uses `capacity_used_percent_high_water` to set `LoadScore.capacityCoefficient`: if a backend's used percent is less than 0.5, the coefficient is 0.5; if the used percent is greater than `capacity_used_percent_high_water`, the coefficient is 1.0; otherwise the coefficient transitions linearly with the used percent via `2 * usedPercent - 0.5`. When the coefficient is 1.0, the load score is driven entirely by the capacity proportion; lower values increase the weight of the replica count. Adjusting this value changes how aggressively the balancer penalizes backends with high disk utilization.
+- Introduced in: v3.2.0
+
+##### catalog_trash_expire_second
+
+- Default: 86400
+- Type: Long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The longest duration the metadata can be retained after a database, table, or partition is dropped. If this duration expires, the data will be deleted and cannot be recovered through the [RECOVER](../../sql-reference/sql-statements/backup_restore/RECOVER.md) command.
+- Introduced in: -
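+
+For example (the database and table names are hypothetical), a table dropped by mistake can be restored with RECOVER as long as its metadata is still within this retention window:
+
+```SQL
+-- Works only while the dropped table is still in the catalog trash.
+RECOVER TABLE example_db.orders;
+```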
+
+##### check_consistency_default_timeout_second
+
+- Default: 600
+- Type: Long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The timeout duration for a replica consistency check. You can set this parameter based on the size of your tablet.
+- Introduced in: -
+
+##### consistency_check_cooldown_time_second
+
+- Default: 24 * 3600
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: Controls the minimal interval (in seconds) required between consistency checks of the same tablet. During tablet selection, a tablet is considered eligible only if `tablet.getLastCheckTime()` is less than `(currentTimeMillis - consistency_check_cooldown_time_second * 1000)`. The default value (24 * 3600) enforces roughly one check per tablet per day to reduce backend disk I/O. Lowering this value increases check frequency and resource usage; raising it reduces I/O at the cost of slower detection of inconsistencies. The value is applied globally when filtering cooldowned tablets from an index's tablet list.
+- Introduced in: v3.5.5
+
+##### consistency_check_end_time
+
+- Default: "4"
+- Type: String
+- Unit: Hour of day (0-23)
+- Is mutable: No
+- Description: Specifies the end hour (hour-of-day) of the ConsistencyChecker work window. The value is parsed with `SimpleDateFormat("HH")` in the system time zone and accepted as 0-23 (single or two-digit). StarRocks uses it with `consistency_check_start_time` to decide when to schedule and add consistency-check jobs. When `consistency_check_start_time` is greater than `consistency_check_end_time`, the window spans midnight (for example, the default is `consistency_check_start_time` = "23" to `consistency_check_end_time` = "4"). When `consistency_check_start_time` is equal to `consistency_check_end_time`, the checker never runs. A parsing failure will cause FE startup to log an error and exit, so provide a valid hour string.
+- Introduced in: v3.2.0
+
+##### consistency_check_start_time
+
+- Default: "23"
+- Type: String
+- Unit: Hour of day (0-23)
+- Is mutable: No
+- Description: Specifies the start hour (hour-of-day) of the ConsistencyChecker work window. The value is parsed with `SimpleDateFormat("HH")` in the system time zone and accepted as 0-23 (single or two-digit). StarRocks uses it with `consistency_check_end_time` to decide when to schedule and add consistency-check jobs. When `consistency_check_start_time` is greater than `consistency_check_end_time`, the window spans midnight (for example, the default is `consistency_check_start_time` = "23" to `consistency_check_end_time` = "4"). When `consistency_check_start_time` is equal to `consistency_check_end_time`, the checker never runs. A parsing failure will cause FE startup to log an error and exit, so provide a valid hour string.
+- Introduced in: v3.2.0
+
+##### consistency_tablet_meta_check_interval_ms
+
+- Default: 2 * 3600 * 1000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: Interval used by the ConsistencyChecker to run a full tablet-meta consistency scan between `TabletInvertedIndex` and `LocalMetastore`. The daemon in `runAfterCatalogReady` triggers `checkTabletMetaConsistency` when `current time - lastTabletMetaCheckTime` exceeds this value. When an invalid tablet is first detected, its `toBeCleanedTime` is set to `now + (consistency_tablet_meta_check_interval_ms / 2)` so actual deletion is delayed until a subsequent scan. Increase this value to reduce scan frequency and load (slower cleanup); decrease it to detect and remove stale tablets faster (higher overhead).
+- Introduced in: v3.2.0
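+
+A hedged sketch of operating these checker settings (the interval value is illustrative; the window items are immutable and must be changed in `fe.conf`):
+
+```SQL
+-- Inspect the current consistency-checker configuration.
+ADMIN SHOW FRONTEND CONFIG LIKE 'consistency_%';
+-- Double the tablet-meta scan interval to reduce scan load (mutable item).
+ADMIN SET FRONTEND CONFIG ("consistency_tablet_meta_check_interval_ms" = "14400000");
+```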
+
+##### default_replication_num
+
+- Default: 3
+- Type: Short
+- Unit: -
+- Is mutable: Yes
+- Description: Sets the default number of replicas for each data partition when creating a table in StarRocks. This setting can be overridden when creating a table by specifying `replication_num=x` in the CREATE TABLE DDL.
+- Introduced in: -
+
+##### enable_auto_tablet_distribution
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to automatically set the number of buckets.
+  - If this parameter is set to `TRUE`, you don't need to specify the number of buckets when you create a table or add a partition. StarRocks automatically determines the number of buckets.
+  - If this parameter is set to `FALSE`, you need to manually specify the number of buckets when you create a table or add a partition. If you do not specify the bucket count when adding a new partition to a table, the new partition inherits the bucket count set at the creation of the table. However, you can also manually specify the number of buckets for the new partition.
+- Introduced in: v2.5.7
+
+##### enable_experimental_rowstore
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable the [hybrid row-column storage](../../table_design/hybrid_table.md) feature.
+- Introduced in: v3.2.3
+
+##### enable_fast_schema_evolution
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable fast schema evolution for all tables within the StarRocks cluster. Valid values: `TRUE` (default) and `FALSE`. Enabling fast schema evolution can increase the speed of schema changes and reduce resource usage when columns are added or dropped.
+- Introduced in: v3.2.0
+
+> **NOTE**
+>
+> - StarRocks shared-data clusters support this parameter from v3.3.0.
+> - If you need to configure fast schema evolution for a specific table, for example, to disable it for that table, you can set the table property [`fast_schema_evolution`](../../sql-reference/sql-statements/table_bucket_part_index/CREATE_TABLE.md#set-fast-schema-evolution) at table creation.
+
+##### enable_online_optimize_table
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Controls whether StarRocks will use the non-blocking online optimization path when creating an optimize job. When `enable_online_optimize_table` is true and the target table meets compatibility checks (no partition/keys/sort specification, distribution is not `RandomDistributionDesc`, storage type is not `COLUMN_WITH_ROW`, replicated storage enabled, and the table is not a cloud-native table or materialized view), the planner creates an `OnlineOptimizeJobV2` to perform optimization without blocking writes. If false or any compatibility condition fails, StarRocks falls back to `OptimizeJobV2`, which may block write operations during optimization.
+- Introduced in: v3.3.3, v3.4.0, v3.5.0
+
+##### enable_strict_storage_medium_check
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether the FE strictly checks the storage medium of BEs when users create tables. If this parameter is set to `TRUE`, the FE checks the storage medium of BEs when users create tables and returns an error if the storage medium of the BE is different from the `storage_medium` parameter specified in the CREATE TABLE statement. For example, the storage medium specified in the CREATE TABLE statement is SSD but the actual storage medium of BEs is HDD. As a result, the table creation fails. If this parameter is `FALSE`, the FE does not check the storage medium of BEs when users create a table.
+- Introduced in: -
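+
+For instance (a minimal sketch with hypothetical names), the check applies to the `storage_medium` property declared at table creation:
+
+```SQL
+-- Fails when the strict check is on and no BE provides SSD storage.
+CREATE TABLE example_db.events (
+    id BIGINT,
+    dt DATE
+)
+DISTRIBUTED BY HASH(id)
+PROPERTIES ("storage_medium" = "SSD");
+```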
+
+##### max_bucket_number_per_partition
+
+- Default: 1024
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of buckets that can be created in a partition.
+- Introduced in: v3.3.2
+
+##### max_column_number_per_table
+
+- Default: 10000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of columns that can be created in a table.
+- Introduced in: v3.3.2
+
+##### max_dynamic_partition_num
+
+- Default: 500
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Limits the maximum number of partitions that can be created at once when analyzing or creating a dynamic-partitioned table. During dynamic partition property validation, the system computes the expected number of partitions (end offset + history partition number) and throws a DDL error if that total exceeds `max_dynamic_partition_num`. Raise this value only when you expect legitimately large partition ranges; increasing it allows more partitions to be created but can increase metadata size, scheduling work, and operational complexity.
+- Introduced in: v3.2.0
+
+##### max_partition_number_per_table
+
+- Default: 100000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of partitions that can be created in a table.
+- Introduced in: v3.3.2
+
+##### max_task_consecutive_fail_count
+
+- Default: 10
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Maximum number of consecutive failures a task may have before the scheduler automatically suspends it. For materialized-view tasks (where `TaskSource.MV.equals(task.getSource())`), when `max_task_consecutive_fail_count` is greater than 0 and a task's consecutive failure counter reaches or exceeds it, the task is suspended via the TaskManager and the materialized view is inactivated. An exception is thrown indicating the suspension and how to reactivate (for example, `ALTER MATERIALIZED VIEW ACTIVE`). Set this item to 0 or a negative value to disable automatic suspension.
+- Introduced in: -
+
+##### partition_recycle_retention_period_secs
+
+- Default: 1800
+- Type: Long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The metadata retention time for partitions that are dropped by INSERT OVERWRITE or materialized view refresh operations. Note that such metadata cannot be recovered by executing [RECOVER](../../sql-reference/sql-statements/backup_restore/RECOVER.md).
+- Introduced in: v3.5.9
+
+##### recover_with_empty_tablet
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to replace a lost or corrupted tablet replica with an empty one. If a tablet replica is lost or corrupted, data queries on this tablet or other healthy tablets may fail. Replacing the lost or corrupted tablet replica with an empty tablet ensures that the query can still be executed. However, the result may be incorrect because data is lost. The default value is `FALSE`, which means lost or corrupted tablet replicas are not replaced with empty ones, and the query fails.
+- Introduced in: -
+
+##### storage_usage_hard_limit_percent
+
+- Default: 95
+- Alias: storage_flood_stage_usage_percent
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Hard limit of the storage usage percentage in a BE directory. If the storage usage (in percentage) of the BE storage directory exceeds this value and the remaining storage space is less than `storage_usage_hard_limit_reserve_bytes`, Load and Restore jobs are rejected. You need to set this item together with the BE configuration item `storage_flood_stage_usage_percent` to allow the configurations to take effect.
+- Introduced in: -
+
+##### storage_usage_hard_limit_reserve_bytes
+
+- Default: 100 * 1024 * 1024 * 1024
+- Alias: storage_flood_stage_left_capacity_bytes
+- Type: Long
+- Unit: Bytes
+- Is mutable: Yes
+- Description: Hard limit of the remaining storage space in a BE directory. If the remaining storage space in the BE storage directory is less than this value and the storage usage (in percentage) exceeds `storage_usage_hard_limit_percent`, Load and Restore jobs are rejected. You need to set this item together with the BE configuration item `storage_flood_stage_left_capacity_bytes` to allow the configurations to take effect.
+- Introduced in: -
+
+##### storage_usage_soft_limit_percent
+
+- Default: 90
+- Alias: storage_high_watermark_usage_percent
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Soft limit of the storage usage percentage in a BE directory. If the storage usage (in percentage) of the BE storage directory exceeds this value and the remaining storage space is less than `storage_usage_soft_limit_reserve_bytes`, tablets cannot be cloned into this directory.
+- Introduced in: -
+
+##### storage_usage_soft_limit_reserve_bytes
+
+- Default: 200 * 1024 * 1024 * 1024
+- Alias: storage_min_left_capacity_bytes
+- Type: Long
+- Unit: Bytes
+- Is mutable: Yes
+- Description: Soft limit of the remaining storage space in a BE directory. If the remaining storage space in the BE storage directory is less than this value and the storage usage (in percentage) exceeds `storage_usage_soft_limit_percent`, tablets cannot be cloned into this directory.
+- Introduced in: -
+
+##### tablet_checker_lock_time_per_cycle_ms
+
+- Default: 1000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: The maximum lock hold time per cycle for the tablet checker before it releases and reacquires the table lock. Values less than 100 will be treated as 100.
+- Introduced in: v3.5.9, v4.0.2
+
+##### tablet_create_timeout_second
+
+- Default: 10
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The timeout duration for creating a tablet. The default value is changed from 1 to 10 from v3.1 onwards.
+- Introduced in: -
+
+##### tablet_delete_timeout_second
+
+- Default: 2
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The timeout duration for deleting a tablet.
+- Introduced in: -
+
+##### tablet_sched_balance_load_disk_safe_threshold
+
+- Default: 0.5
+- Alias: balance_load_disk_safe_threshold
+- Type: Double
+- Unit: -
+- Is mutable: Yes
+- Description: The percentage threshold for determining whether the disk usage of BEs is balanced. If the disk usage of all BEs is lower than this value, it is considered balanced. If the disk usage is greater than this value and the difference between the highest and lowest BE disk usage is greater than 10%, the disk usage is considered unbalanced and a tablet re-balancing is triggered.
+- Introduced in: -
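+
+As an illustration (the values are only examples), this item and the related `tablet_sched_balance_load_score_threshold` below are both mutable and can be tightened at runtime:
+
+```SQL
+-- Treat disks below 60% usage as balanced, and narrow the load-score gap.
+ADMIN SET FRONTEND CONFIG ("tablet_sched_balance_load_disk_safe_threshold" = "0.6");
+ADMIN SET FRONTEND CONFIG ("tablet_sched_balance_load_score_threshold" = "0.05");
+```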
+
+##### tablet_sched_balance_load_score_threshold
+
+- Default: 0.1
+- Alias: balance_load_score_threshold
+- Type: Double
+- Unit: -
+- Is mutable: Yes
+- Description: The percentage threshold for determining whether the load of a BE is balanced. If a BE has a lower load than the average load of all BEs and the difference is greater than this value, this BE is in a low load state. Conversely, if a BE has a higher load than the average load and the difference is greater than this value, this BE is in a high load state.
+- Introduced in: -
+
+##### tablet_sched_be_down_tolerate_time_s
+
+- Default: 900
+- Type: Long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The maximum duration the scheduler allows for a BE node to remain inactive. After the time threshold is reached, tablets on that BE node will be migrated to other active BE nodes.
+- Introduced in: v2.5.7
+
+##### tablet_sched_disable_balance
+
+- Default: false
+- Alias: disable_balance
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to disable tablet balancing. `TRUE` indicates that tablet balancing is disabled. `FALSE` indicates that tablet balancing is enabled.
+- Introduced in: -
+
+##### tablet_sched_disable_colocate_balance
+
+- Default: false
+- Alias: disable_colocate_balance
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to disable replica balancing for Colocate tables. `TRUE` indicates that replica balancing is disabled. `FALSE` indicates that replica balancing is enabled.
+- Introduced in: -
+
+##### tablet_sched_max_balancing_tablets
+
+- Default: 500
+- Alias: max_balancing_tablets
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of tablets that can be balanced at the same time. If this value is exceeded, tablet re-balancing will be skipped.
+- Introduced in: -
+
+##### tablet_sched_max_clone_task_timeout_sec
+
+- Default: 2 * 60 * 60
+- Alias: max_clone_task_timeout_sec
+- Type: Long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The maximum timeout duration for cloning a tablet.
+- Introduced in: -
+
+##### tablet_sched_max_not_being_scheduled_interval_ms
+
+- Default: 15 * 60 * 1000
+- Type: Long
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: When tablet clone tasks are being scheduled, if a tablet has not been scheduled within the time specified by this parameter, StarRocks gives it a higher priority so that it is scheduled as soon as possible.
+- Introduced in: -
+
+##### tablet_sched_max_scheduling_tablets
+
+- Default: 10000
+- Alias: max_scheduling_tablets
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of tablets that can be scheduled at the same time. If the value is exceeded, tablet balancing and repair checks will be skipped.
+- Introduced in: -
+
+##### tablet_sched_min_clone_task_timeout_sec
+
+- Default: 3 * 60
+- Alias: min_clone_task_timeout_sec
+- Type: Long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The minimum timeout duration for cloning a tablet.
+- Introduced in: -
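+
+A minimal sketch (with example values) of widening the clone-task timeout range for clusters with very large tablets:
+
+```SQL
+-- Allow clone tasks to run between 5 minutes and 4 hours.
+ADMIN SET FRONTEND CONFIG ("tablet_sched_min_clone_task_timeout_sec" = "300");
+ADMIN SET FRONTEND CONFIG ("tablet_sched_max_clone_task_timeout_sec" = "14400");
+```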
+
+##### tablet_sched_num_based_balance_threshold_ratio
+
+- Default: 0.5
+- Alias: -
+- Type: Double
+- Unit: -
+- Is mutable: Yes
+- Description: Tablet-count-based balancing may break the disk size balance, but the maximum gap between disks cannot exceed `tablet_sched_num_based_balance_threshold_ratio` * `tablet_sched_balance_load_score_threshold`. If tablets in the cluster are constantly being balanced back and forth between two BEs, decrease this value. If you want the tablet distribution to be more even, increase this value.
+- Introduced in: v3.1
+
+##### tablet_sched_repair_delay_factor_second
+
+- Default: 60
+- Alias: tablet_repair_delay_factor_second
+- Type: Long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The interval at which replicas are repaired.
+- Introduced in: -
+
+##### tablet_sched_slot_num_per_path
+
+- Default: 8
+- Alias: schedule_slot_num_per_path
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of tablet-related tasks that can run concurrently in a BE storage directory. From v2.5 onwards, the default value of this parameter is changed from `4` to `8`.
+- Introduced in: -
+
+##### tablet_sched_storage_cooldown_second
+
+- Default: -1
+- Alias: storage_cooldown_second
+- Type: Long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The latency of automatic cooling starting from the time of table creation. The default value `-1` specifies that automatic cooling is disabled. If you want to enable automatic cooling, set this parameter to a value greater than `-1`.
+- Introduced in: -
+
+##### tablet_stat_update_interval_second
+
+- Default: 300
+- Type: Int
+- Unit: Seconds
+- Is mutable: No
+- Description: The time interval at which the FE retrieves tablet statistics from each BE.
+- Introduced in: -
+
+### Shared-data
+
+##### aws_s3_access_key
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The Access Key ID used to access your S3 bucket.
+- Introduced in: v3.0
+
+##### aws_s3_endpoint
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The endpoint used to access your S3 bucket, for example, `https://s3.us-west-2.amazonaws.com`.
+- Introduced in: v3.0
+
+##### aws_s3_external_id
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The external ID of the AWS account that is used for cross-account access to your S3 bucket.
+- Introduced in: v3.0
+
+##### aws_s3_iam_role_arn
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The ARN of the IAM role that has privileges on your S3 bucket in which your data files are stored.
+- Introduced in: v3.0
+
+##### aws_s3_path
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The S3 path used to store data. It consists of the name of your S3 bucket and the sub-path (if any) under it, for example, `testbucket/subpath`.
+- Introduced in: v3.0
+
+##### aws_s3_region
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The region in which your S3 bucket resides, for example, `us-west-2`.
+- Introduced in: v3.0
+
+##### aws_s3_secret_key
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The Secret Access Key used to access your S3 bucket.
+- Introduced in: v3.0
+
+##### aws_s3_use_aws_sdk_default_behavior
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Whether to use the default authentication credentials of the AWS SDK. Valid values: true and false (Default).
+- Introduced in: v3.0
+
+##### aws_s3_use_instance_profile
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Whether to use Instance Profile and Assumed Role as credential methods for accessing S3. Valid values: true and false (Default).
+  - If you use IAM user-based credentials (Access Key and Secret Key) to access S3, you must specify this item as `false`, and specify `aws_s3_access_key` and `aws_s3_secret_key`.
+  - If you use Instance Profile to access S3, you must specify this item as `true`.
+  - If you use Assumed Role to access S3, you must specify this item as `true`, and specify `aws_s3_iam_role_arn`. If you use an external AWS account, you must also specify `aws_s3_external_id`.
+- Introduced in: v3.0
+
+##### azure_adls2_endpoint
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The endpoint of your Azure Data Lake Storage Gen2 Account, for example, `https://test.dfs.core.windows.net`.
+- Introduced in: v3.4.1
+
+##### azure_adls2_oauth2_client_id
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The Client ID of the Managed Identity used to authorize requests for your Azure Data Lake Storage Gen2.
+- Introduced in: v3.4.4
+
+##### azure_adls2_oauth2_tenant_id
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The Tenant ID of the Managed Identity used to authorize requests for your Azure Data Lake Storage Gen2.
+- Introduced in: v3.4.4
+
+##### azure_adls2_oauth2_use_managed_identity
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Whether to use Managed Identity to authorize requests for your Azure Data Lake Storage Gen2.
+- Introduced in: v3.4.4
+
+##### azure_adls2_path
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The Azure Data Lake Storage Gen2 path used to store data. It consists of the file system name and the directory name, for example, `testfilesystem/starrocks`.
+- Introduced in: v3.4.1
+
+##### azure_adls2_sas_token
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The shared access signatures (SAS) used to authorize requests for your Azure Data Lake Storage Gen2.
+- Introduced in: v3.4.1
+
+##### azure_adls2_shared_key
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The Shared Key used to authorize requests for your Azure Data Lake Storage Gen2.
+- Introduced in: v3.4.1
+
+##### azure_blob_endpoint
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The endpoint of your Azure Blob Storage Account, for example, `https://test.blob.core.windows.net`.
+- Introduced in: v3.1
+
+##### azure_blob_path
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The Azure Blob Storage path used to store data. It consists of the name of the container within your storage account and the sub-path (if any) under the container, for example, `testcontainer/subpath`.
+- Introduced in: v3.1
+
+##### azure_blob_sas_token
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The shared access signatures (SAS) used to authorize requests for your Azure Blob Storage.
+- Introduced in: v3.1
+
+##### azure_blob_shared_key
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The Shared Key used to authorize requests for your Azure Blob Storage.
+- Introduced in: v3.1
+
+##### azure_use_native_sdk
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to use the native SDK to access Azure Blob Storage, thus allowing authentication with Managed Identities and Service Principals. If this item is set to `false`, only authentication with Shared Key and SAS Token is allowed.
+- Introduced in: v3.4.4
+
+##### cloud_native_hdfs_url
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The URL of the HDFS storage, for example, `hdfs://127.0.0.1:9000/user/xxx/starrocks/`.
+- Introduced in: -
+
+##### cloud_native_meta_port
+
+- Default: 6090
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The RPC listen port of the FE cloud-native metadata server.
+- Introduced in: -
+
+##### cloud_native_storage_type
+
+- Default: S3
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The type of object storage you use. In shared-data mode, StarRocks supports storing data in HDFS, Azure Blob (supported from v3.1.1 onwards), Azure Data Lake Storage Gen2 (supported from v3.4.1 onwards), Google Storage (with native SDK, supported from v3.5.1 onwards), and object storage systems that are compatible with the S3 protocol (such as AWS S3, and MinIO). Valid values: `S3` (Default), `HDFS`, `AZBLOB`, `ADLS2`, and `GS`. If you specify this parameter as `S3`, you must add the parameters prefixed by `aws_s3`. If you specify this parameter as `AZBLOB`, you must add the parameters prefixed by `azure_blob`. If you specify this parameter as `ADLS2`, you must add the parameters prefixed by `azure_adls2`. If you specify this parameter as `GS`, you must add the parameters prefixed by `gcp_gcs`. If you specify this parameter as `HDFS`, you only need to specify `cloud_native_hdfs_url`.
+- Introduced in: -
+
+##### enable_load_volume_from_conf
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Whether to allow StarRocks to create the built-in storage volume by using the object storage-related properties specified in the FE configuration file. The default value is changed from `true` to `false` from v3.4.1 onwards.
+- Introduced in: v3.1.0
+
+##### gcp_gcs_impersonation_service_account
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The Service Account that you want to impersonate if you use impersonation-based authentication to access Google Storage.
+- Introduced in: v3.5.1
+
+##### gcp_gcs_path
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The Google Cloud path used to store data. It consists of the name of your Google Cloud bucket and the sub-path (if any) under it, for example, `testbucket/subpath`.
+- Introduced in: v3.5.1
+
+##### gcp_gcs_service_account_email
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The email address in the JSON file generated at the creation of the Service Account, for example, `user@hello.iam.gserviceaccount.com`.
+- Introduced in: v3.5.1
+
+##### gcp_gcs_service_account_private_key
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The Private Key in the JSON file generated at the creation of the Service Account, for example, `-----BEGIN PRIVATE KEY----xxxx-----END PRIVATE KEY-----\n`.
+- Introduced in: v3.5.1
+
+##### gcp_gcs_service_account_private_key_id
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The Private Key ID in the JSON file generated at the creation of the Service Account.
+- Introduced in: v3.5.1
+
+##### gcp_gcs_use_compute_engine_service_account
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Whether to use the Service Account that is bound to your Compute Engine.
+- Introduced in: v3.5.1
+
+##### hdfs_file_system_expire_seconds
+
+- Default: 300
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: Time-to-live in seconds for an unused cached HDFS/ObjectStore FileSystem managed by `HdfsFsManager`. The `FileSystemExpirationChecker` (which runs every 60 seconds) calls each `HdfsFs.isExpired(...)` using this value; when expired, the manager closes the underlying FileSystem and removes it from the cache. Accessor methods (for example, `HdfsFs.getDFSFileSystem`, `getUserName`, `getConfiguration`) update the last-access timestamp, so expiry is based on inactivity. Lower values reduce idle resource holding but increase reopen overhead; higher values keep handles longer and may consume more resources.
+- Introduced in: v3.2.0
+
+##### lake_autovacuum_grace_period_minutes
+
+- Default: 30
+- Type: Long
+- Unit: Minutes
+- Is mutable: Yes
+- Description: The time range for retaining historical data versions in a shared-data cluster. Historical data versions within this time range are not automatically cleaned via AutoVacuum after Compactions. You need to set this value greater than the maximum query time to prevent data accessed by running queries from being deleted before the queries finish. The default value has been changed from `5` to `30` since v3.3.0, v3.2.5, and v3.1.10.
+- Introduced in: v3.1.0
+
+##### lake_autovacuum_parallel_partitions
+
+- Default: 8
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum number of partitions that can undergo AutoVacuum simultaneously in a shared-data cluster. AutoVacuum is the Garbage Collection after Compactions.
+- Introduced in: v3.1.0
+
+##### lake_autovacuum_partition_naptime_seconds
+
+- Default: 180
+- Type: Long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The minimum interval between AutoVacuum operations on the same partition in a shared-data cluster.
+- Introduced in: v3.1.0
+
+##### lake_autovacuum_stale_partition_threshold
+
+- Default: 12
+- Type: Long
+- Unit: Hours
+- Is mutable: Yes
+- Description: If a partition has no updates (loading, DELETE, or Compactions) within this time range, the system will not perform AutoVacuum on this partition.
+- Introduced in: v3.1.0
+
+##### lake_compaction_allow_partial_success
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: If this item is set to `true`, the system will consider the Compaction operation in a shared-data cluster as successful when one of the sub-tasks succeeds.
+- Introduced in: v3.5.2
+
+##### lake_compaction_disable_ids
+
+- Default: ""
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: The list of tables or partitions for which Compaction is disabled in shared-data mode. The format is `tableId1;partitionId2`, with IDs separated by semicolons, for example, `12345;98765`.
+- Introduced in: v3.4.4
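+
+For example (using the IDs from the format description above), Compaction can be switched off for one table and one partition at runtime:
+
+```SQL
+ADMIN SET FRONTEND CONFIG ("lake_compaction_disable_ids" = "12345;98765");
+```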
+
+##### lake_compaction_history_size
+
+- Default: 20
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The number of recent successful Compaction task records to keep in the memory of the Leader FE node in a shared-data cluster. You can view recent successful Compaction task records using the `SHOW PROC '/compactions'` command. Note that the Compaction history is stored in the FE process memory, and it will be lost if the FE process is restarted.
+- Introduced in: v3.1.0
+
+##### lake_compaction_max_tasks
+
+- Default: -1
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of concurrent Compaction tasks allowed in a shared-data cluster. Setting this item to `-1` indicates that the concurrent task number is calculated in an adaptive manner. Setting this value to `0` will disable compaction.
+- Introduced in: v3.1.0
+
+##### lake_compaction_score_selector_min_score
+
+- Default: 10.0
+- Type: Double
+- Unit: -
+- Is mutable: Yes
+- Description: The Compaction Score threshold that triggers Compaction operations in a shared-data cluster. When the Compaction Score of a partition is greater than or equal to this value, the system performs Compaction on that partition.
+- Introduced in: v3.1.0
+
+##### lake_compaction_score_upper_bound
+
+- Default: 2000
+- Type: Long
+- Unit: -
+- Is mutable: Yes
+- Description: The upper limit of the Compaction Score for a partition in a shared-data cluster. `0` indicates no upper limit. This item only takes effect when `lake_enable_ingest_slowdown` is set to `true`. When the Compaction Score of a partition reaches or exceeds this upper limit, incoming loading tasks will be rejected. From v3.3.6 onwards, the default value is changed from `0` to `2000`.
+- Introduced in: v3.2.0
+
+##### lake_enable_balance_tablets_between_workers
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to balance the number of tablets among Compute Nodes during the tablet migration of cloud-native tables in a shared-data cluster. `true` indicates balancing the tablets among Compute Nodes, and `false` indicates disabling this feature.
+- Introduced in: v3.3.4
+
+##### lake_enable_ingest_slowdown
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable Data Ingestion Slowdown in a shared-data cluster. When Data Ingestion Slowdown is enabled, if the Compaction Score of a partition exceeds `lake_ingest_slowdown_threshold`, loading tasks on that partition will be throttled down. This configuration only takes effect when `run_mode` is set to `shared_data`. From v3.3.6 onwards, the default value is changed from `false` to `true`.
+- Introduced in: v3.2.0
+
+##### lake_ingest_slowdown_threshold
+
+- Default: 100
+- Type: Long
+- Unit: -
+- Is mutable: Yes
+- Description: The Compaction Score threshold that triggers Data Ingestion Slowdown in a shared-data cluster. This configuration only takes effect when `lake_enable_ingest_slowdown` is set to `true`.
+- Introduced in: v3.2.0
+
+##### lake_publish_version_max_threads
+
+- Default: 512
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of threads for Version Publish tasks in a shared-data cluster.
+- Introduced in: v3.2.0
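+
+As a hedged operational sketch of the Compaction items above (the value is illustrative):
+
+```SQL
+-- Temporarily pause shared-data Compaction; -1 restores adaptive scheduling.
+ADMIN SET FRONTEND CONFIG ("lake_compaction_max_tasks" = "0");
+-- Review the recent Compaction task records kept on the Leader FE.
+SHOW PROC '/compactions';
+```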
+
+##### meta_sync_force_delete_shard_meta
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to allow deleting the metadata of the shared-data cluster directly, bypassing the cleaning of remote storage files. It is recommended to set this item to `true` only when there is an excessive number of shards to be cleaned, which leads to extreme memory pressure on the FE JVM. Note that the data files belonging to the shards or tablets cannot be automatically cleaned after this feature is enabled.
+- Introduced in: v3.2.10, v3.3.3
+
+##### run_mode
+
+- Default: shared_nothing
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The running mode of the StarRocks cluster. Valid values: `shared_data` and `shared_nothing` (Default).
+  - `shared_data` indicates running StarRocks in shared-data mode.
+  - `shared_nothing` indicates running StarRocks in shared-nothing mode.
+
+  > **CAUTION**
+  >
+  > - You cannot adopt the `shared_data` and `shared_nothing` modes simultaneously for a StarRocks cluster. Mixed deployment is not supported.
+  > - DO NOT change `run_mode` after the cluster is deployed. Otherwise, the cluster fails to restart. The transformation from a shared-nothing cluster to a shared-data cluster or vice versa is not supported.
+
+- Introduced in: -
+
+##### shard_group_clean_threshold_sec
+
+- Default: 3600
+- Type: Long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time before the FE cleans unused tablets and shard groups in a shared-data cluster. Tablets and shard groups created within this threshold will not be cleaned.
+- Introduced in: -
+
+##### star_mgr_meta_sync_interval_sec
+
+- Default: 600
+- Type: Long
+- Unit: Seconds
+- Is mutable: No
+- Description: The interval at which the FE runs the periodical metadata synchronization with StarMgr in a shared-data cluster.
+- Introduced in: -
+
+##### starmgr_grpc_server_max_worker_threads
+
+- Default: 1024
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of worker threads that are used by the gRPC server in the FE StarMgr module.
+- Introduced in: v4.0.0, v3.5.8
+
+##### starmgr_grpc_timeout_seconds
+
+- Default: 5
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The timeout duration for gRPC requests sent from the FE to StarMgr in a shared-data cluster.
+- Introduced in: -
+
+### Data Lake
+
+##### files_enable_insert_push_down_schema
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When enabled, the analyzer will attempt to push the target table schema into the `files()` table function for INSERT ... FROM files() operations. This only applies when the source is a FileTableFunctionRelation, the target is a native table, and the SELECT list contains corresponding slot-ref columns (or *). The analyzer will match select columns to target columns (counts must match), lock the target table briefly, and replace file-column types with deep-copied target column types for non-complex types (complex types such as Parquet JSON `->` `array` are skipped). Column names from the original files table are preserved. This reduces type mismatches and the looseness of file-based type inference during ingestion.
+- Introduced in: v3.4.0, v3.5.0
+
+##### hdfs_read_buffer_size_kb
+
+- Default: 8192
+- Type: Int
+- Unit: Kilobytes
+- Is mutable: Yes
+- Description: Size of the HDFS read buffer in kilobytes. StarRocks converts this value to bytes (`<< 10`) and uses it to initialize HDFS read buffers in `HdfsFsManager` and to populate the thrift field `hdfs_read_buffer_size_kb` sent to BE tasks (e.g., `TBrokerScanRangeParams`, `TDownloadReq`) when broker access is not used. Increasing `hdfs_read_buffer_size_kb` can improve sequential read throughput and reduce syscall overhead at the cost of higher per-stream memory usage; decreasing it reduces memory footprint but may lower IO efficiency. Consider workload (many small streams vs. few large sequential reads) when tuning.
+- Introduced in: v3.2.0
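+
+A minimal sketch (with example values) of tuning this read buffer together with the companion `hdfs_write_buffer_size_kb` below:
+
+```SQL
+-- 16 MB read buffer for large sequential scans, 4 MB write buffer.
+ADMIN SET FRONTEND CONFIG ("hdfs_read_buffer_size_kb" = "16384");
+ADMIN SET FRONTEND CONFIG ("hdfs_write_buffer_size_kb" = "4096");
+```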
+
+##### hdfs_write_buffer_size_kb
+
+- Default: 1024
+- Type: Int
+- Unit: Kilobytes
+- Is mutable: Yes
+- Description: Sets the HDFS write buffer size (in KB) used for direct writes to HDFS or object stores when not using a broker. The FE converts this value to bytes (`<< 10`) and initializes the local write buffer in HdfsFsManager, and it is propagated in Thrift requests (e.g., TUploadReq, TExportSink, sink options) so backends/agents use the same buffer size. Increasing this value can improve throughput for large sequential writes at the cost of more memory per writer; decreasing it reduces per-stream memory usage and may lower latency for small writes. Tune alongside `hdfs_read_buffer_size_kb` and consider available memory and concurrent writers.
+- Introduced in: v3.2.0
+
+##### lake_batch_publish_max_version_num
+
+- Default: 10
+- Type: Int
+- Unit: Count
+- Is mutable: Yes
+- Description: Sets the upper bound on how many consecutive transaction versions may be grouped together when building a publish batch for lake (cloud-native) tables. The value is passed to the transaction graph batching routine (see `getReadyToPublishTxnListBatch`) and works together with `lake_batch_publish_min_version_num` to determine the candidate range size for a TransactionStateBatch. Larger values can increase publish throughput by batching more commits, but increase the scope of an atomic publish (longer visibility latency and larger rollback surface) and may be limited at runtime when versions are not consecutive. Tune according to workload and visibility/latency requirements.
+- Introduced in: v3.2.0
+
+##### lake_batch_publish_min_version_num
+
+- Default: 1
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Sets the minimum number of consecutive transaction versions required to form a publish batch for lake tables. `DatabaseTransactionMgr.getReadyToPublishTxnListBatch` passes this value to `transactionGraph.getTxnsWithTxnDependencyBatch` together with `lake_batch_publish_max_version_num` to select dependent transactions. A value of `1` allows single-transaction publishes (no batching). Values `>1` require at least that many consecutively-versioned, single-table, non-replication transactions to be available; batching is aborted if versions are non-consecutive, a replication transaction appears, or a schema change consumes a version. Increasing this value can improve publish throughput by grouping commits but may delay publishing while waiting for enough consecutive transactions.
+- Introduced in: v3.2.0
+
+##### lake_enable_batch_publish_version
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When enabled, PublishVersionDaemon batches ready transactions for the same Lake (shared-data) table/partition and publishes their versions together instead of issuing per-transaction publishes. In shared-data mode, the daemon calls `getReadyPublishTransactionsBatch()` and uses `publishVersionForLakeTableBatch(...)` to perform grouped publish operations (reducing RPCs and improving throughput). When disabled, the daemon falls back to per-transaction publishing via `publishVersionForLakeTable(...)`. The implementation coordinates in-flight work using internal sets to avoid duplicate publishes when the switch is toggled and is affected by the thread pool sizing via `lake_publish_version_max_threads`.
+- Introduced in: v3.2.0
+
+##### lake_enable_tablet_creation_optimization
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When enabled, StarRocks optimizes tablet creation for cloud-native tables and materialized views in shared-data mode by creating a single shared tablet metadata for all tablets under a physical partition instead of distinct metadata per tablet. This reduces the number of tablet creation tasks and metadata/files produced during table creation, rollup, and schema-change jobs. The optimization is applied only for cloud-native tables/materialized views and is combined with `file_bundling` (the latter reuses the same optimization logic). Note: schema-change and rollup jobs explicitly disable the optimization for tables using `file_bundling` to avoid overwriting files with identical names. Enable it cautiously, as it changes the granularity of created tablet metadata and can affect how replica creation and file naming behave.
+- Introduced in: v3.3.1, v3.4.0, v3.5.0
+
+##### lake_use_combined_txn_log
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When this item is set to `true`, the system allows Lake tables to use the combined transaction log path for relevant transactions. Available for shared-data clusters only.
+- Introduced in: v3.3.7, v3.4.0, v3.5.0
+
+##### enable_iceberg_commit_queue
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable the commit queue for Iceberg tables to avoid concurrent commit conflicts. Iceberg uses optimistic concurrency control (OCC) for metadata commits. When multiple threads concurrently commit to the same table, conflicts can occur with errors like: "Cannot commit: Base metadata location is not same as the current table metadata location". When enabled, each Iceberg table has its own single-threaded executor for commit operations, ensuring that commits to the same table are serialized and preventing OCC conflicts. Different tables can commit concurrently, maintaining overall throughput. This is a system-level optimization to improve reliability and is enabled by default. If disabled, concurrent commits may fail due to optimistic locking conflicts.
+- Introduced in: v4.1.0
+
+##### iceberg_commit_queue_timeout_seconds
+
+- Default: 300
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The timeout in seconds for waiting for an Iceberg commit operation to complete. When using the commit queue (`enable_iceberg_commit_queue=true`), each commit operation must complete within this timeout. If a commit takes longer than this timeout, it will be cancelled and an error will be raised. Factors that affect commit time include the number of data files being committed, the metadata size of the table, and the performance of the underlying storage (for example, S3 or HDFS).
+- Introduced in: v4.1.0
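+
+For slow object storage or very large commits, the timeout can be raised at runtime (the value is illustrative):
+
+```SQL
+ADMIN SET FRONTEND CONFIG ("iceberg_commit_queue_timeout_seconds" = "600");
+```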
+
+##### iceberg_commit_queue_max_size
+
+- Default: 1000
+- Type: Int
+- Unit: Count
+- Is mutable: No
+- Description: The maximum number of pending commit operations per Iceberg table. When using the commit queue (`enable_iceberg_commit_queue=true`), this limits the number of commit operations that can be queued for a single table. When the limit is reached, additional commit operations will execute in the caller thread (blocking until capacity is available). This configuration is read at FE startup and applies to newly created table executors. Requires an FE restart to take effect. Increase this value if you expect many concurrent commits to the same table. If this value is too low, commits may block in the caller thread during high concurrency.
+- Introduced in: v4.1.0
+
+### Other
+
+##### agent_task_resend_wait_time_ms
+
+- Default: 5000
+- Type: Long
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: The duration the FE must wait before it can resend an agent task. An agent task can be resent only when the gap between the task creation time and the current time exceeds the value of this parameter. This parameter is used to prevent repetitive sending of agent tasks.
+- Introduced in: -
+
+##### allow_system_reserved_names
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to allow users to create columns whose names start with `__op` or `__row`. To enable this feature, set this parameter to `TRUE`. Please note that these name formats are reserved for special purposes in StarRocks and creating such columns may result in undefined behavior. Therefore, this feature is disabled by default.
+- Introduced in: v3.2.0
+
+##### auth_token
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The token that is used for identity authentication within the StarRocks cluster to which the FE belongs. If this parameter is left unspecified, StarRocks generates a random token for the cluster at the time when the leader FE of the cluster is started for the first time.
+- Introduced in: -
+
+##### authentication_ldap_simple_bind_base_dn
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: The base DN, which is the point from which the LDAP server starts to search for users' authentication information.
+- Introduced in: -
+
+##### authentication_ldap_simple_bind_root_dn
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: The administrator DN used to search for users' authentication information.
+- Introduced in: -
+
+##### authentication_ldap_simple_bind_root_pwd
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: The password of the administrator used to search for users' authentication information.
+- Introduced in: -
+
+##### authentication_ldap_simple_server_host
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: The host on which the LDAP server runs.
+- Introduced in: -
+
+##### authentication_ldap_simple_server_port
+
+- Default: 389
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The port of the LDAP server.
+- Introduced in: -
+
+##### authentication_ldap_simple_user_search_attr
+
+- Default: uid
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: The name of the attribute that identifies users in LDAP objects.
+- Introduced in: -
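+
+A hedged sketch of wiring up simple LDAP authentication with these mutable items (the host and DNs are placeholders):
+
+```SQL
+ADMIN SET FRONTEND CONFIG ("authentication_ldap_simple_server_host" = "ldap.example.com");
+ADMIN SET FRONTEND CONFIG ("authentication_ldap_simple_server_port" = "389");
+ADMIN SET FRONTEND CONFIG ("authentication_ldap_simple_bind_base_dn" = "ou=people,dc=example,dc=com");
+```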
+
+##### backup_job_default_timeout_ms
+
+- Default: 86400 * 1000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: The timeout duration of a backup job. If this value is exceeded, the backup job fails.
+- Introduced in: -
+
+##### enable_collect_tablet_num_in_show_proc_backend_disk_path
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable the collection of tablet numbers for each disk in the `SHOW PROC /BACKENDS/{id}` command.
+- Introduced in: v4.0.1, v3.5.8
+
+##### enable_colocate_restore
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable Backup and Restore for Colocate Tables. `true` indicates enabling Backup and Restore for Colocate Tables and `false` indicates disabling it.
+- Introduced in: v3.2.10, v3.3.3
+
+##### enable_materialized_view_concurrent_prepare
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to prepare materialized views concurrently to improve performance.
+- Introduced in: v3.4.4
+
+##### enable_metric_calculator
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Specifies whether to enable the feature that is used to periodically collect metrics. Valid values: `TRUE` and `FALSE`. `TRUE` specifies to enable this feature, and `FALSE` specifies to disable this feature.
+- Introduced in: -
+
+##### enable_table_metrics_collect
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to export table-level metrics in the FE. When disabled, the FE will skip exporting table metrics (such as table scan/load counters and table size metrics), but it still records the counters in memory.
+- Introduced in: -
+
+##### enable_mv_post_image_reload_cache
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to perform the reload flag check after the FE loads an image. If the check is performed for a base materialized view, it is not needed for other materialized views related to it.
+- Introduced in: v3.5.0
+
+##### enable_mv_query_context_cache
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable the query-level materialized view rewrite cache to improve query rewrite performance.
+- Introduced in: v3.3
+
+##### enable_mv_refresh_collect_profile
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to collect profiles during materialized view refresh by default for all materialized views.
+- Introduced in: v3.3.0
+
+##### enable_mv_refresh_extra_prefix_logging
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to prefix log entries with materialized view names for easier debugging.
+- Introduced in: v3.4.0
+
+##### enable_mv_refresh_query_rewrite
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable query rewrite during materialized view refresh so that the refresh query can read from the rewritten materialized view directly rather than the base table to improve query performance.
+- Introduced in: v3.3
+
+##### enable_trace_historical_node
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to allow the system to trace the historical nodes. By setting this item to `true`, you can enable the Cache Sharing feature and allow the system to choose the right cache nodes during elastic scaling.
+- Introduced in: v3.5.1
+
+##### es_state_sync_interval_second
+
+- Default: 10
+- Type: Long
+- Unit: Seconds
+- Is mutable: No
+- Description: The time interval at which the FE obtains Elasticsearch indexes and synchronizes the metadata of StarRocks external tables.
+- Introduced in: -
+
+##### hive_meta_cache_refresh_interval_s
+
+- Default: 3600 * 2
+- Type: Long
+- Unit: Seconds
+- Is mutable: No
+- Description: The time interval at which the cached metadata of Hive external tables is updated.
+- Introduced in: -
+
+##### hive_meta_store_timeout_s
+
+- Default: 10
+- Type: Long
+- Unit: Seconds
+- Is mutable: No
+- Description: The amount of time after which a connection to a Hive metastore times out.
+- Introduced in: -
+
+##### jdbc_connection_idle_timeout_ms
+
+- Default: 600000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: No
+- Description: The maximum amount of time a connection for accessing a JDBC catalog can remain idle before it times out.
+- Introduced in: -
+
+##### jdbc_connection_timeout_ms
+
+- Default: 10000
+- Type: Long
+- Unit: Milliseconds
+- Is mutable: No
+- Description: The timeout in milliseconds for the HikariCP connection pool to acquire a connection. If a connection cannot be acquired from the pool within this time, the operation will fail.
+- Introduced in: v3.5.13
+
+##### jdbc_query_timeout_ms
+
+- Default: 30000
+- Type: Long
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: The timeout in milliseconds for JDBC statement query execution. This timeout is applied to all SQL queries executed through JDBC catalogs (e.g., partition metadata queries). The value is converted to seconds when passed to the JDBC driver.
+- Introduced in: v3.5.13
+
+##### jdbc_network_timeout_ms
+
+- Default: 30000
+- Type: Long
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: The timeout in milliseconds for JDBC network operations (socket read). This timeout applies to database metadata calls (e.g., `getSchemas()`, `getTables()`, `getColumns()`) to prevent indefinite blocking when the external database is unresponsive.
+- Introduced in: v3.5.13
+
+##### jdbc_connection_pool_size
+
+- Default: 8
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum capacity of the JDBC connection pool for accessing JDBC catalogs.
+- Introduced in: -
+
+##### jdbc_meta_default_cache_enable
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: The default value for whether the JDBC Catalog metadata cache is enabled. When set to `true`, newly created JDBC Catalogs default to having metadata caching enabled.
+- Introduced in: -
+
+##### jdbc_meta_default_cache_expire_sec
+
+- Default: 600
+- Type: Long
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The default expiration time for the JDBC Catalog metadata cache. When `jdbc_meta_default_cache_enable` is set to true, newly created JDBC Catalogs use this value as the default expiration time of the metadata cache.
+- Introduced in: -
+
+##### jdbc_minimum_idle_connections
+
+- Default: 1
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The minimum number of idle connections in the JDBC connection pool for accessing JDBC catalogs.
+- Introduced in: -
+
+##### jwt_jwks_url
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The URL to the JSON Web Key Set (JWKS) service or the path to the public key local file under the `fe/conf` directory.
+- Introduced in: v3.5.0
+
+##### jwt_principal_field
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The string used to identify the field that indicates the subject (`sub`) in the JWT. The default value is `sub`. The value of this field must be identical with the username for logging in to StarRocks.
+- Introduced in: v3.5.0
+
+##### jwt_required_audience
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The list of strings used to identify the audience (`aud`) in the JWT. The JWT is considered valid only if one of the values in the list matches the JWT audience.
+- Introduced in: v3.5.0
+
+##### jwt_required_issuer
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The list of strings used to identify the issuers (`iss`) in the JWT. The JWT is considered valid only if one of the values in the list matches the JWT issuer.
+- Introduced in: v3.5.0
+
+##### locale
+
+- Default: zh_CN.UTF-8
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The character set that is used by the FE.
+- Introduced in: -
+
+##### max_agent_task_threads_num
+
+- Default: 4096
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum number of threads that are allowed in the agent task thread pool.
+- Introduced in: -
+
+##### max_download_task_per_be
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of download tasks StarRocks assigns to a BE node in each RESTORE operation. When this item is set to a value less than or equal to `0`, no limit is imposed on the task number.
+- Introduced in: v3.1.0
+
+##### max_mv_check_base_table_change_retry_times
+
+- Default: 10
+- Type: -
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of retries for detecting base table changes when refreshing a materialized view.
+- Introduced in: v3.3.0
+
+##### max_mv_refresh_failure_retry_times
+
+- Default: 1
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of retries when a materialized view refresh fails.
+- Introduced in: v3.3.0
+
+##### max_mv_refresh_try_lock_failure_retry_times
+
+- Default: 3
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of lock acquisition retries when a materialized view refresh fails to acquire the lock.
+- Introduced in: v3.3.0
+
+##### max_small_file_number
+
+- Default: 100
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of small files that can be stored on an FE directory.
+- Introduced in: -
+
+##### max_small_file_size_bytes
+
+- Default: 1024 * 1024
+- Type: Int
+- Unit: Bytes
+- Is mutable: Yes
+- Description: The maximum size of a small file.
+- Introduced in: -
+
+##### max_upload_task_per_be
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of upload tasks StarRocks assigns to a BE node in each BACKUP operation. When this item is set to a value less than or equal to `0`, no limit is imposed on the task number.
+- Introduced in: v3.1.0
+
+##### mv_create_partition_batch_interval_ms
+
+- Default: 1000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: During materialized view refresh, if multiple partitions need to be created in bulk, the system divides them into batches of 64 partitions each. To reduce the risk of failures caused by frequent partition creation, a default interval (in milliseconds) is set between each batch to control the creation frequency.
+- Introduced in: v3.3
+
+##### mv_plan_cache_max_size
+
+- Default: 1000
+- Type: Long
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum size of the materialized view plan cache (which is used for materialized view rewrite). If many materialized views are used for transparent query rewrite, you may increase this value.
+- Introduced in: v3.2
+
+##### mv_plan_cache_thread_pool_size
+
+- Default: 3
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The default thread pool size of the materialized view plan cache (which is used for materialized view rewrite).
+- Introduced in: v3.2
+
+##### mv_refresh_default_planner_optimize_timeout
+
+- Default: 30000
+- Type: -
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: The default timeout for the planning phase of the optimizer when refreshing materialized views.
+- Introduced in: v3.3.0
+
+##### mv_refresh_fail_on_filter_data
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether a materialized view refresh fails when data is filtered out during the refresh. `true` (the default) indicates that the refresh fails; `false` indicates that the refresh succeeds and the filtered data is ignored.
+- Introduced in: -
+
+##### mv_refresh_try_lock_timeout_ms
+
+- Default: 30000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: The default timeout for a materialized view refresh to acquire the DB lock of its base tables or materialized views.
+- Introduced in: v3.3.0
+
+##### oauth2_auth_server_url
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The authorization URL. The URL to which the users’ browser will be redirected in order to begin the OAuth 2.0 authorization process.
+- Introduced in: v3.5.0
+
+##### oauth2_client_id
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The public identifier of the StarRocks client.
+- Introduced in: v3.5.0
+
+##### oauth2_client_secret
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The secret used to authorize the StarRocks client with the authorization server.
+- Introduced in: v3.5.0
+
+##### oauth2_jwks_url
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The URL to the JSON Web Key Set (JWKS) service or the path to the local file under the `conf` directory.
+- Introduced in: v3.5.0
+
+##### oauth2_principal_field
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The string used to identify the field that indicates the subject (`sub`) in the JWT. The default value is `sub`. The value of this field must be identical with the username for logging in to StarRocks.
+- Introduced in: v3.5.0
+
+##### oauth2_redirect_url
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The URL to which the users’ browser will be redirected after the OAuth 2.0 authentication succeeds. The authorization code will be sent to this URL. In most cases, it needs to be configured as `http://<fe_host>:<fe_http_port>/api/oauth2`.
+- Introduced in: v3.5.0
+
+##### oauth2_required_audience
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The list of strings used to identify the audience (`aud`) in the JWT. The JWT is considered valid only if one of the values in the list matches the JWT audience.
+- Introduced in: v3.5.0
+
+##### oauth2_required_issuer
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The list of strings used to identify the issuers (`iss`) in the JWT. The JWT is considered valid only if one of the values in the list matches the JWT issuer.
+- Introduced in: v3.5.0
+
+##### oauth2_token_server_url
+
+- Default: Empty string
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The URL of the endpoint on the authorization server from which StarRocks obtains the access token.
+- Introduced in: v3.5.0
+
+##### plugin_dir
+
+- Default: System.getenv("STARROCKS_HOME") + "/plugins"
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The directory that stores plugin installation packages.
+- Introduced in: -
+
+##### plugin_enable
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether plugins can be installed on FEs. Plugins can be installed or uninstalled only on the Leader FE.
+- Introduced in: -
+
+##### proc_profile_jstack_depth
+
+- Default: 128
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum Java stack depth when the system collects CPU and memory profiles. This value controls how many Java stack frames are captured for each sampled stack: larger values increase trace detail and output size and may add profiling overhead, while smaller values reduce detail. This setting is used when the profiler is started for both CPU and memory profiling, so adjust it to balance diagnostic needs and performance impact.
+- Introduced in: -
+
+##### proc_profile_mem_enable
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable the collection of process memory allocation profiles. When this item is set to `true`, the system generates an HTML profile named `mem-profile-<timestamp>.html` under `sys_log_dir/proc_profile`, sleeps for `proc_profile_collect_time_s` seconds while sampling, and uses `proc_profile_jstack_depth` for the Java stack depth. Generated files are compressed and purged according to `proc_profile_file_retained_days` and `proc_profile_file_retained_size_bytes`. The native extraction path uses `STARROCKS_HOME_DIR` to avoid `/tmp` noexec issues. This item is intended for troubleshooting memory-allocation hotspots. Enabling it increases CPU, I/O, and disk usage and may produce large files.
+- Introduced in: v3.2.12
+
+##### query_detail_explain_level
+
+- Default: COSTS
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: The detail level of the query plan returned by the EXPLAIN statement. Valid values: `COSTS`, `NORMAL`, `VERBOSE`.
+- Introduced in: v3.2.12, v3.3.5
+
+##### replication_interval_ms
+
+- Default: 100
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: No
+- Description: The minimum time interval at which the replication tasks are scheduled.
+- Introduced in: v3.3.5
+
+##### replication_max_parallel_data_size_mb
+
+- Default: 1048576
+- Type: Int
+- Unit: MB
+- Is mutable: Yes
+- Description: The maximum size of data allowed for concurrent synchronization.
+- Introduced in: v3.3.5
+
+##### replication_max_parallel_replica_count
+
+- Default: 10240
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of tablet replicas allowed for concurrent synchronization.
+- Introduced in: v3.3.5
+
+##### replication_max_parallel_table_count
+
+- Default: 100
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of concurrent data synchronization tasks allowed. StarRocks creates one synchronization task for each table.
+- Introduced in: v3.3.5
+
+##### replication_transaction_timeout_sec
+
+- Default: 86400
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The timeout duration for synchronization tasks.
+- Introduced in: v3.3.5
+
+##### skip_whole_phase_lock_mv_limit
+
+- Default: 5
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Controls when StarRocks applies the "non-lock" optimization for tables that have related materialized views. When this item is set to a value less than 0, the system always applies the non-lock optimization and does not copy related materialized views for queries (FE memory usage and metadata copy/lock contention are reduced, but the risk of metadata concurrency issues can increase). When it is set to 0, the non-lock optimization is disabled (the system always uses the safe copy-and-lock path). When it is set to a value greater than 0, the non-lock optimization is applied only to tables whose number of related materialized views is less than or equal to the configured threshold. Additionally, when the value is greater than or equal to 0, the planner records the OLAP tables of a query into the optimizer context to enable materialized view-related rewrite paths; when it is less than 0, this step is skipped.
+- Introduced in: v3.2.1
+
+##### small_file_dir
+
+- Default: StarRocksFE.STARROCKS_HOME_DIR + "/small_files"
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The root directory of small files.
+- Introduced in: -
+
+##### task_runs_max_history_number
+
+- Default: 10000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of task run records to retain in memory and to use as a default LIMIT when querying archived task-run history. When `enable_task_history_archive` is false, this value bounds the in-memory history: Force GC trims older entries so that only the newest `task_runs_max_history_number` remain. When the archive history is queried (and no explicit LIMIT is provided), `TaskRunHistoryTable.lookup` uses `ORDER BY create_time DESC LIMIT <task_runs_max_history_number>` if this value is greater than 0. Note: setting this to 0 disables the query-side LIMIT (no cap) but will cause the in-memory history to be truncated to zero (unless archiving is enabled).
+- Introduced in: v3.2.0
+
+##### tmp_dir
+
+- Default: StarRocksFE.STARROCKS_HOME_DIR + "/temp_dir"
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The directory that stores temporary files such as files generated during backup and restore procedures. After these procedures finish, the generated temporary files are deleted.
+- Introduced in: -
+
+##### transform_type_prefer_string_for_varchar
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to prefer the STRING type for fixed-length VARCHAR columns in materialized view creation and CTAS operations.
+- Introduced in: v4.0.0
+
+
+
+
diff --git a/docs/en/administration/management/Scale_up_down.md b/docs/en/administration/management/Scale_up_down.md
new file mode 100644
index 0000000..7d7eb3f
--- /dev/null
+++ b/docs/en/administration/management/Scale_up_down.md
@@ -0,0 +1,99 @@
+---
+displayed_sidebar: docs
+---
+
+# Scale in and out
+
+This topic describes how to scale StarRocks nodes in and out.
+
+## Scale FE in and out
+
+StarRocks has two types of FE nodes: Follower and Observer. Followers are involved in election voting and writing. Observers are only used to synchronize logs and extend read performance.
+
+> * The number of follower FEs (including the leader) must be odd, and it is recommended to deploy 3 of them to form a High Availability (HA) mode.
+> * When the FE is in a high availability deployment (1 leader, 2 followers), it is recommended to add Observer FEs for better read performance.
+
+### Scale FE out
+
+After deploying the FE node and starting the service, run the following command to scale FE out.
+
+~~~sql
+alter system add follower "fe_host:edit_log_port";
+alter system add observer "fe_host:edit_log_port";
+~~~
+
+### Scale FE in
+
+FE scale-in is similar to the scale-out. Run the following command to scale FE in.
+
+~~~sql
+alter system drop follower "fe_host:edit_log_port";
+alter system drop observer "fe_host:edit_log_port";
+~~~
+
+After scaling out or in, you can view the node information by running `show proc '/frontends';`.
+
+## Scale BE in and out
+
+StarRocks automatically performs load-balancing after BEs are scaled in or out, without affecting the overall performance.
+
+When you add a new BE node, the system's Tablet Scheduler will detect the new node and its low load. It will then start moving tablets from high-load BE nodes to the new, low-load BE node to ensure an even distribution of data and load across the entire cluster.
+
+The balancing process is based on a loadScore calculated for each BE, which considers both disk utilization and replica count. The system aims to move tablets from nodes with a higher loadScore to nodes with a lower loadScore.
+
+You can check the FE configuration parameter `tablet_sched_disable_balance` to ensure that automatic balancing is not disabled (the parameter is false by default, which means that tablet balancing is enabled by default). More details are in the [manage replica docs](./resource_management/Replica.md).
+
+### Scale BE out
+
+Run the following command to scale BE out.
+
+~~~sql
+alter system add backend 'be_host:be_heartbeat_service_port';
+~~~
+
+Run the following command to check the BE status.
+
+~~~sql
+show proc '/backends';
+~~~
+
+### Scale BE in
+
+There are two ways to scale in a BE node: `DROP` and `DECOMMISSION`.
+
+`DROP` deletes the BE node immediately, and the lost replicas are replicated afterwards by FE scheduling. `DECOMMISSION` makes sure the replicas are replicated first, and then drops the BE node. `DECOMMISSION` is a bit more friendly and is recommended for BE scale-in. You can monitor the decommission progress as shown at the end of this topic.
+
+The commands of both methods are similar:
+
+* `alter system decommission backend "be_host:be_heartbeat_service_port";`
+* `alter system drop backend "be_host:be_heartbeat_service_port";`
+
+Dropping a backend is a dangerous operation, so you need to confirm it twice before executing it:
+
+* `alter system drop backend "be_host:be_heartbeat_service_port";`
+
+## Scale CN in and out
+
+### Scale CN out
+
+Run the following command to scale CN out.
+
+~~~sql
+ALTER SYSTEM ADD COMPUTE NODE "cn_host:cn_heartbeat_service_port";
+~~~
+
+Run the following command to check the CN status.
+
+~~~sql
+SHOW PROC '/compute_nodes';
+~~~
+
+### Scale CN in
+
+CN scale-in is similar to the scale-out. Run the following command to scale CN in.
+
+~~~sql
+ALTER SYSTEM DROP COMPUTE NODE "cn_host:cn_heartbeat_service_port";
+~~~
+
+You can view the node information by running `SHOW PROC '/compute_nodes';`.
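+
+## Monitor BE decommission progress
+
+During a `DECOMMISSION`, data is migrated off the node before the node is dropped. A simple way to track the progress, assuming the `TabletNum` column as shown in the output of the statement below, is to watch the tablet count of the decommissioning node decrease toward zero:
+
+~~~sql
+-- The TabletNum of the decommissioning BE should steadily decrease;
+-- the node is removed automatically once its data has been migrated away.
+show proc '/backends';
+~~~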
diff --git a/docs/en/administration/management/audit_loader.md b/docs/en/administration/management/audit_loader.md
new file mode 100644
index 0000000..818c9d0
--- /dev/null
+++ b/docs/en/administration/management/audit_loader.md
@@ -0,0 +1,221 @@
+---
+displayed_sidebar: docs
+---
+
+# Manage audit logs within StarRocks via AuditLoader
+
+This topic describes how to manage StarRocks audit logs within a table via the plugin - AuditLoader.
+
+StarRocks stores its audit logs in the local file **fe/log/fe.audit.log** rather than an internal database. The plugin AuditLoader allows you to manage audit logs directly within your cluster. Once installed, AuditLoader reads logs from the file and loads them into StarRocks via HTTP PUT. You can then query the audit logs in StarRocks using SQL statements.
+
+## Create a table to store audit logs
+
+Create a database and a table in your StarRocks cluster to store its audit logs. See [CREATE DATABASE](../../sql-reference/sql-statements/Database/CREATE_DATABASE.md) and [CREATE TABLE](../../sql-reference/sql-statements/table_bucket_part_index/CREATE_TABLE.md) for detailed instructions.
+
+Because the fields of audit logs vary among different StarRocks versions, it is important to follow the recommendations below to avoid compatibility issues during upgrades:
+
+> **CAUTION**
+>
+> - All new fields should be marked as `NULL`.
+> - Fields should NOT be renamed, as users may rely on them.
+> - Only backward compatible changes should be applied to field types, e.g. `VARCHAR(32)` -> `VARCHAR(64)`, to avoid errors during insert.
+> - `AuditEvent` fields are resolved by name only. The order of columns within the table doesn't matter and can be changed by the user at any time.
+> - `AuditEvent` fields that don't exist in the table are ignored, so users can remove columns they don't need.
+
+```SQL
+CREATE DATABASE starrocks_audit_db__;
+
+CREATE TABLE starrocks_audit_db__.starrocks_audit_tbl__ (
+    `queryId` VARCHAR(64) COMMENT "Unique ID of the query",
+    `timestamp` DATETIME NOT NULL COMMENT "Query start time",
+    `queryType` VARCHAR(12) COMMENT "Query type (query, slow_query, connection)",
+    `clientIp` VARCHAR(32) COMMENT "Client IP",
+    `user` VARCHAR(64) COMMENT "Query username",
+    `authorizedUser` VARCHAR(64) COMMENT "Unique identifier of the user, i.e., user_identity",
+    `resourceGroup` VARCHAR(64) COMMENT "Resource group name",
+    `catalog` VARCHAR(32) COMMENT "Catalog name",
+    `db` VARCHAR(96) COMMENT "Database where the query runs",
+    `state` VARCHAR(8) COMMENT "Query state (EOF, ERR, OK)",
+    `errorCode` VARCHAR(512) COMMENT "Error code",
+    `queryTime` BIGINT COMMENT "Query execution time (milliseconds)",
+    `scanBytes` BIGINT COMMENT "Number of bytes scanned by the query",
+    `scanRows` BIGINT COMMENT "Number of rows scanned by the query",
+    `returnRows` BIGINT COMMENT "Number of rows returned by the query",
+    `cpuCostNs` BIGINT COMMENT "CPU time consumed by the query (nanoseconds)",
+    `memCostBytes` BIGINT COMMENT "Memory consumed by the query (bytes)",
+    `stmtId` INT COMMENT "Incremental ID of the SQL statement",
+    `isQuery` TINYINT COMMENT "Whether the SQL is a query (1 or 0)",
+    `feIp` VARCHAR(128) COMMENT "FE IP that executed the statement",
+    `stmt` VARCHAR(1048576) COMMENT "Original SQL statement",
+    `digest` VARCHAR(32) COMMENT "Fingerprint of slow SQL",
+    `planCpuCosts` DOUBLE COMMENT "CPU usage during query planning (nanoseconds)",
+    `planMemCosts` DOUBLE COMMENT "Memory usage during query planning (bytes)",
+    `pendingTimeMs` BIGINT COMMENT "Time the query waited in the queue (milliseconds)",
+    `candidateMVs` VARCHAR(65533) NULL COMMENT "List of candidate materialized views",
+    `hitMvs` VARCHAR(65533) NULL COMMENT "List of matched materialized views",
+    `warehouse` VARCHAR(32) NULL COMMENT "Warehouse name"
+) ENGINE = OLAP
+DUPLICATE KEY (`queryId`, `timestamp`, `queryType`)
+COMMENT "Audit log table"
+PARTITION BY date_trunc('day', `timestamp`)
+PROPERTIES (
+  "replication_num" = "1",
+  "partition_live_number" = "30"
+);
+```
+
+`starrocks_audit_tbl__` is created with dynamic partitions. By default, the first dynamic partition is created 10 minutes after the table is created. Audit logs can then be loaded into the table. You can check the partitions in the table using the following statement:
+
+```SQL
+SHOW PARTITIONS FROM starrocks_audit_db__.starrocks_audit_tbl__;
+```
+
+After a partition is created, you can move on to the next step.
+
+## Download and configure AuditLoader
+
+1. [Download](https://releases.starrocks.io/resources/auditloader.zip) the AuditLoader installation package. The package is compatible with all available versions of StarRocks.
+
+2. Unzip the installation package.
+
+    ```shell
+    unzip auditloader.zip
+    ```
+
+    The following files are extracted:
+
+    - **auditloader.jar**: the JAR file of AuditLoader.
+    - **plugin.properties**: the properties file of AuditLoader. You do not need to modify this file.
+    - **plugin.conf**: the configuration file of AuditLoader. In most cases, you only need to modify the `user` and `password` fields in the file.
+
+3. Modify **plugin.conf** to configure AuditLoader. You must configure the following items to make sure AuditLoader can work properly:
+
+    - `frontend_host_port`: FE IP address and HTTP port, in the format `<fe_ip>:<fe_http_port>`. It is recommended to set it to the default value `127.0.0.1:8030`. Each FE in StarRocks manages its own audit logs independently. After the plugin is installed, each FE starts its own background thread to fetch and save audit logs and to write them via Stream Load. `frontend_host_port` provides the HTTP IP and port for this background Stream Load task, and the parameter does not support multiple values. The IP part of the parameter can be the IP of any FE in the cluster, but this is not recommended, because if that FE crashes, the audit log writing tasks in the background of the other FEs will also fail due to the broken communication. Setting it to the default value `127.0.0.1:8030` makes each FE use its own HTTP port to communicate, which avoids any impact in case another FE runs into an exception (all write tasks will eventually be forwarded to the Leader FE node to be executed).
+    - `database`: name of the database you created to host audit logs.
+    - `table`: name of the table you created to host audit logs.
+    - `user`: your cluster username. You MUST have the privilege to load data (LOAD_PRIV) into the table.
+    - `password`: your user password.
+    - `secret_key`: the key (a string no longer than 16 bytes) used to encrypt the password. If this parameter is not set, the password in **plugin.conf** will not be encrypted, and you only need to specify the plaintext password in `password`. If this parameter is specified, the password is encrypted by this key, and you need to specify the encrypted string in `password`. The encrypted password can be generated in StarRocks using the `AES_ENCRYPT` function: `SELECT TO_BASE64(AES_ENCRYPT('password','secret_key'));`.
+    - `filter`: the filter conditions for audit log loading. This parameter is based on the [WHERE parameter](../../sql-reference/sql-statements/loading_unloading/STREAM_LOAD.md#opt_properties) in Stream Load, that is, `-H "where: <condition>"`. It defaults to an empty string. Example: `filter=isQuery=1 and clientIp like '127.0.0.1%' and user='root'`.
+
+4. Zip the files back into a package.
+
+    ```shell
+    zip -q -m -r auditloader.zip auditloader.jar plugin.conf plugin.properties
+    ```
+
+5. Dispatch the package to all machines that host FE nodes. Make sure all packages are stored in an identical path. Otherwise, the installation fails. Remember to copy the absolute path to the package after you have dispatched it.
+
+    > **NOTE**
+    >
+    > You can also distribute **auditloader.zip** to an HTTP service accessible to all FEs (for example, `httpd` or `nginx`) and install it using the network. Note that in both cases **auditloader.zip** needs to be persisted in the path after the installation is performed, and the source files should not be deleted after installation.
+
+## Install AuditLoader
+
+Execute the following statement along with the path you copied to install AuditLoader as a plugin in StarRocks:
+
+```SQL
+INSTALL PLUGIN FROM "<absolute_path_to_auditloader.zip>";
+```
+
+Example of installation from a local package:
+
+```SQL
+-- Replace the path with the actual absolute path on your FE machines.
+INSTALL PLUGIN FROM "/home/users/starrocks/auditloader.zip";
+```
+
+If you want to install the plugin via a network path, you need to provide the MD5 checksum of the package in the properties of the INSTALL statement.
+
+Example:
+
+```sql
+INSTALL PLUGIN FROM "http://xx.xx.xxx.xxx/extra/auditloader.zip" PROPERTIES("md5sum" = "3975F7B880C9490FE95F42E2B2A28E2D");
+```
+
+See [INSTALL PLUGIN](../../sql-reference/sql-statements/cluster-management/plugin/INSTALL_PLUGIN.md) for detailed instructions.
+ +## Verify the installation and query audit logs + +1. You can check if the installation is successful via [SHOW PLUGINS](../../sql-reference/sql-statements/cluster-management/plugin/SHOW_PLUGINS.md). + + In the following example, the `Status` of the plugin `AuditLoader` is `INSTALLED`, meaning installation is successful. + + ```Plain + mysql> SHOW PLUGINS\G + *************************** 1. row *************************** + Name: __builtin_AuditLogBuilder + Type: AUDIT + Description: builtin audit logger + Version: 0.12.0 + JavaVersion: 1.8.31 + ClassName: com.starrocks.qe.AuditLogBuilder + SoName: NULL + Sources: Builtin + Status: INSTALLED + Properties: {} + *************************** 2. row *************************** + Name: AuditLoader + Type: AUDIT + Description: Available for versions 3.3.11+. Load audit log to starrocks, and user can view the statistic of queries + Version: 5.0.0 + JavaVersion: 11 + ClassName: com.starrocks.plugin.audit.AuditLoaderPlugin + SoName: NULL + Sources: /x/xx/xxx/xxxxx/auditloader.zip + Status: INSTALLED + Properties: {} + 2 rows in set (0.01 sec) + ``` + +2. Execute some random SQLs to generate audit logs, and wait for 60 seconds (or the time you have specified in the item `max_batch_interval_sec` when you configure AuditLoader) to allow AuditLoader to load audit logs into StarRocks. + +3. Check the audit logs by querying the table. + + ```SQL + SELECT * FROM starrocks_audit_db__.starrocks_audit_tbl__; + ``` + + The following example shows when audit logs are loaded into the table successfully: + + ```Plain + mysql> SELECT * FROM starrocks_audit_db__.starrocks_audit_tbl__\G + *************************** 1. row *************************** + queryId: 01975a33-4129-7520-97a2-05e641cec6c9 + timestamp: 2025-06-10 14:16:37 + queryType: query + clientIp: xxx.xx.xxx.xx:65283 + user: root + authorizedUser: 'root'@'%' + resourceGroup: default_wg + catalog: default_catalog + db: + state: EOF + errorCode: + queryTime: 3 + scanBytes: 0 + scanRows: 0 + returnRows: 1 + cpuCostNs: 33711 + memCostBytes: 4200 + stmtId: 102 + isQuery: 1 + feIp: xxx.xx.xxx.xx + stmt: SELECT * FROM starrocks_audit_db__.starrocks_audit_tbl__ + digest: + planCpuCosts: 908 + planMemCosts: 0 + pendingTimeMs: -1 + candidateMvs: null + hitMVs: null + ………… + ``` + +## Troubleshooting + +If no audit logs are loaded to the table after the dynamic partition is created and the plugin is installed, you can check whether **plugin.conf** is configured properly or not. To modify it, you must first uninstall the plugin: + +```SQL +UNINSTALL PLUGIN AuditLoader; +``` + +Logs of AuditLoader are printed in **fe.log**, you can retrieve them by searching the keyword `audit` in **fe.log**. After all configurations are set correctly, you can follow the above steps to install AuditLoader again. diff --git a/docs/en/administration/management/compaction.md b/docs/en/administration/management/compaction.md new file mode 100644 index 0000000..f5b0bbc --- /dev/null +++ b/docs/en/administration/management/compaction.md @@ -0,0 +1,303 @@ +--- +displayed_sidebar: docs +--- + +# Compaction for Shared-data Clusters + +This topic describes how to manage compaction in StarRocks shared-data clusters. + +## Overview + +Each data loading operation in StarRocks generates a new version of data files. Compaction merges data files from different versions into larger files, reducing the number of small files and improving query efficiency. 
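+As a quick health check, you can list the partitions with the most unmerged versions by sorting on the Max Compaction Score. The `information_schema.partitions_meta` view and its `MAX_CS` field used below are described in detail later in this topic:
+
+```SQL
+-- List the 10 partitions with the highest Max Compaction Score.
+SELECT DB_NAME, TABLE_NAME, PARTITION_NAME, MAX_CS
+FROM information_schema.partitions_meta
+ORDER BY MAX_CS DESC
+LIMIT 10;
+```
+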
+
+## Compaction Score
+
+### Overview
+
+The *Compaction Score* reflects the merging status of data files in a partition. A higher score indicates lower merging progress, meaning the partition has more unmerged data file versions. FE maintains Compaction Score information for each partition, including the Max Compaction Score (the highest score among all tablets in the partition).
+
+If a partition's Max Compaction Score is below the FE parameter `lake_compaction_score_selector_min_score` (default: 10), compaction for that partition is considered complete. A Max Compaction Score exceeding 100 indicates an unhealthy compaction state. When the score exceeds the FE parameter `lake_ingest_slowdown_threshold` (default: 100), the system slows down data loading transaction commits for the partition. If it surpasses `lake_compaction_score_upper_bound` (default: 2000), the system rejects import transactions for the partition.
+
+### Calculation Rules
+
+Typically, each data file contributes 1 to the Compaction Score. For example, if a partition has one tablet and 10 data files generated from the first loading operation, the partition’s Max Compaction Score is 10. All data files generated by a transaction within a tablet are grouped as a Rowset.
+
+During score calculation, a tablet’s Rowsets are grouped by size, and the group with the highest number of files determines the tablet’s Compaction Score.
+
+For example, suppose a tablet undergoes 7 loading operations, generating Rowsets with sizes: 100 MB, 100 MB, 100 MB, 10 MB, 10 MB, 10 MB, and 10 MB. During calculation, the system groups the three 100 MB Rowsets into one group and the four 10 MB Rowsets into another. The Compaction Score is calculated based on the group with more files; in this case, the second group has the higher score (4). Compaction prioritizes the higher-scoring group, so after the first compaction, the Rowset distribution would be: 100 MB, 100 MB, 100 MB, and 40 MB.
+
+## Compaction Workflow
+
+For shared-data clusters, StarRocks introduces a new FE-controlled compaction mechanism:
+
+1. Score Calculation: The Leader FE node calculates and stores Compaction Scores for partitions based on transaction publish results.
+2. Candidate Selection: FE selects partitions with the highest Max Compaction Scores as compaction candidates.
+3. Task Generation: FE initiates compaction transactions for selected partitions, generates tablet-level subtasks, and dispatches them to Compute Nodes (CNs) until reaching the limit set by the FE parameter `lake_compaction_max_tasks`.
+4. Subtask Execution: CNs execute compaction subtasks in the background. The number of concurrent subtasks per CN is controlled by the CN parameter `compact_threads`.
+5. Result Collection: FE aggregates subtask results and commits the compaction transaction.
+6. Publish: FE publishes the successfully committed compaction transaction.
+
+## Manage compaction
+
+### View compaction scores
+
+- You can view the compaction scores of partitions in a specific table by using the SHOW PROC statement. Typically, you only need to focus on the `MaxCS` field. If `MaxCS` is below 10, compaction is considered complete. If `MaxCS` is above 100, the Compaction Score is relatively high. If `MaxCS` exceeds 500, the Compaction Score is very high and manual intervention may be required.
+
+  ```SQL
+  SHOW PARTITIONS FROM <table_name>
+  SHOW PROC '/dbs/<db_name>/<table_name>/partitions'
+  ```
+
+  Example:
+
+  ```Plain
+  mysql> SHOW PROC '/dbs/load_benchmark/store_sales/partitions';
+  +-------------+---------------+----------------+----------------+-------------+--------+--------------+-------+------------------------------+---------+----------+-----------+----------+------------+-------+-------+-------+
+  | PartitionId | PartitionName | CompactVersion | VisibleVersion | NextVersion | State | PartitionKey | Range | DistributionKey | Buckets | DataSize | RowCount | CacheTTL | AsyncWrite | AvgCS | P50CS | MaxCS |
+  +-------------+---------------+----------------+----------------+-------------+--------+--------------+-------+------------------------------+---------+----------+-----------+----------+------------+-------+-------+-------+
+  | 38028 | store_sales | 913 | 921 | 923 | NORMAL | | | ss_item_sk, ss_ticket_number | 64 | 15.6GB | 273857126 | 2592000 | false | 10.00 | 10.00 | 10.00 |
+  +-------------+---------------+----------------+----------------+-------------+--------+--------------+-------+------------------------------+---------+----------+-----------+----------+------------+-------+-------+-------+
+  1 row in set (0.20 sec)
+  ```
+
+- You can also view the partition compaction scores by querying the system-defined view `information_schema.partitions_meta`.
+
+  Example:
+
+  ```Plain
+  mysql> SELECT * FROM information_schema.partitions_meta ORDER BY Max_CS LIMIT 10;
+  +--------------+----------------------------+----------------------------+--------------+-----------------+-----------------+----------------------+--------------+---------------+-----------------+-----------------------------------------+---------+-----------------+----------------+---------------------+-----------------------------+--------------+---------+-----------+------------+------------------+----------+--------+--------+-------------------------------------------------------------------+
+  | DB_NAME | TABLE_NAME | PARTITION_NAME | PARTITION_ID | COMPACT_VERSION | VISIBLE_VERSION | VISIBLE_VERSION_TIME | NEXT_VERSION | PARTITION_KEY | PARTITION_VALUE | DISTRIBUTION_KEY | BUCKETS | REPLICATION_NUM | STORAGE_MEDIUM | COOLDOWN_TIME | LAST_CONSISTENCY_CHECK_TIME | IS_IN_MEMORY | IS_TEMP | DATA_SIZE | ROW_COUNT | ENABLE_DATACACHE | AVG_CS | P50_CS | MAX_CS | STORAGE_PATH |
+  +--------------+----------------------------+----------------------------+--------------+-----------------+-----------------+----------------------+--------------+---------------+-----------------+-----------------------------------------+---------+-----------------+----------------+---------------------+-----------------------------+--------------+---------+-----------+------------+------------------+----------+--------+--------+-------------------------------------------------------------------+
+  | tpcds_1t | call_center | call_center | 11905 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | cc_call_center_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 12.3KB | 42 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11906/11905 |
+  | tpcds_1t | web_returns | web_returns | 12030 | 3 | 3 | 2024-03-17 08:40:48 | 4 | | | wr_item_sk, wr_order_number | 16 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 3.5GB | 71997522 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/12031/12030 |
+  | tpcds_1t | warehouse | warehouse | 11847 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | w_warehouse_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 4.2KB | 20 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11848/11847 |
+  | tpcds_1t | ship_mode | ship_mode | 11851 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | sm_ship_mode_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 1.7KB | 20 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11852/11851 |
+  | tpcds_1t | customer_address | customer_address | 11790 | 0 | 2 | 2024-03-17 08:32:19 | 3 | | | ca_address_sk | 16 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 120.9MB | 6000000 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11791/11790 |
+  | tpcds_1t | time_dim | time_dim | 11855 | 0 | 2 | 2024-03-17 08:30:48 | 3 | | | t_time_sk | 16 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 864.7KB | 86400 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11856/11855 |
+  | tpcds_1t | web_sales | web_sales | 12049 | 3 | 3 | 2024-03-17 10:14:20 | 4 | | | ws_item_sk, ws_order_number | 128 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 47.7GB | 720000376 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/12050/12049 |
+  | tpcds_1t | store | store | 11901 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | s_store_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 95.6KB | 1002 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11902/11901 |
+  | tpcds_1t | web_site | web_site | 11928 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | web_site_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 13.4KB | 54 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11929/11928 |
+  | tpcds_1t | household_demographics | household_demographics | 11932 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | hd_demo_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 2.1KB | 7200 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11933/11932 |
+  +--------------+----------------------------+----------------------------+--------------+-----------------+-----------------+----------------------+--------------+---------------+-----------------+-----------------------------------------+---------+-----------------+----------------+---------------------+-----------------------------+--------------+---------+-----------+------------+------------------+----------+--------+--------+-------------------------------------------------------------------+
+  ```
+
+### View compaction tasks
+
+As new data is loaded into the system, FE constantly schedules compaction tasks to be executed on different CN nodes. You can first view the general status of compaction tasks on the FE, and then view the execution details of each task on the CNs.
+
+#### View general status of compaction tasks
+
+You can view the general status of compaction tasks using the SHOW PROC statement.
+
+```SQL
+SHOW PROC '/compactions';
+```
+
+Example:
+
+```Plain
+mysql> SHOW PROC '/compactions';
++---------------------+-------+---------------------+---------------------+---------------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| Partition | TxnID | StartTime | CommitTime | FinishTime | Error | Profile |
++---------------------+-------+---------------------+---------------------+---------------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| ssb.lineorder.10081 | 15 | 2026-01-10 03:29:07 | 2026-01-10 03:29:11 | 2026-01-10 03:29:12 | NULL | {"sub_task_count":12,"read_local_sec":0,"read_local_mb":218,"read_remote_sec":0,"read_remote_mb":0,"read_segment_count":120,"write_segment_count":12,"write_segment_mb":219,"write_remote_sec":4,"in_queue_sec":18} |
+| ssb.lineorder.10068 | 16 | 2026-01-10 03:29:07 | 2026-01-10 03:29:13 | 2026-01-10 03:29:14 | NULL | {"sub_task_count":12,"read_local_sec":0,"read_local_mb":218,"read_remote_sec":0,"read_remote_mb":0,"read_segment_count":120,"write_segment_count":12,"write_segment_mb":218,"write_remote_sec":4,"in_queue_sec":38} |
+| ssb.lineorder.10055 | 20 | 2026-01-10 03:29:11 | 2026-01-10 03:29:15 | 2026-01-10 03:29:17 | NULL | {"sub_task_count":12,"read_local_sec":0,"read_local_mb":218,"read_remote_sec":0,"read_remote_mb":0,"read_segment_count":120,"write_segment_count":12,"write_segment_mb":218,"write_remote_sec":4,"in_queue_sec":23} |
++---------------------+-------+---------------------+---------------------+---------------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+```
+
+The following fields are returned:
+
+- `Partition`: The partition to which the compaction task belongs.
+- `TxnID`: The transaction ID assigned to the compaction task.
+- `StartTime`: The time when the compaction task starts. `NULL` indicates that the task has not yet been initiated.
+- `CommitTime`: The time when the compaction task commits the data. `NULL` indicates that the data has not yet been committed.
+- `FinishTime`: The time when the compaction task publishes the data. `NULL` indicates that the data has not yet been published.
+- `Error`: The error message (if any) of the compaction task.
+- `Profile`: (supported from v3.2.12 and v3.3.4) The profile of the compaction task after it finishes.
+  - `sub_task_count`: The number of sub-tasks (equivalent to tablets) in the partition.
+  - `read_local_sec`: The total time consumption of all sub-tasks on reading data from the local cache. Unit: Seconds.
+  - `read_local_mb`: The total size of data read from the local cache by all sub-tasks. Unit: MB.
+  - `read_remote_sec`: The total time consumption of all sub-tasks on reading data from the remote storage. Unit: Seconds.
+  - `read_remote_mb`: The total size of data read from the remote storage by all sub-tasks. Unit: MB.
+  - `read_segment_count`: The total number of files read by all sub-tasks.
+  - `write_segment_count`: The total number of new files generated by all sub-tasks.
+ - `write_segment_mb`: The total size of new files generated by all sub-tasks. Unit: MB. + - `write_remote_sec`: The total time consumption of all sub-tasks on writing data to the remote storage. Unit: Seconds. + - `in_queue_sec`: The total time of all sub-tasks staying in the queue. Unit: Seconds. + +#### View execution details of compaction tasks + +Each compaction task is divided into multiple sub-tasks, each of which corresponds to a tablet. You can view the execution details of each sub-task by querying the system-defined view `information_schema.be_cloud_native_compactions`. + +Example: + +```Plain +mysql> SELECT * FROM information_schema.be_cloud_native_compactions; ++-------+--------+-----------+---------+---------+------+---------------------+-------------+----------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| BE_ID | TXN_ID | TABLET_ID | VERSION | SKIPPED | RUNS | START_TIME | FINISH_TIME | PROGRESS | STATUS | PROFILE | ++-------+--------+-----------+---------+---------+------+---------------------+-------------+----------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| 10001 | 51047 | 43034 | 12 | 0 | 1 | 2024-09-24 19:15:15 | NULL | 82 | | {"read_local_sec":0,"read_local_mb":31,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":1900,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} | +| 10001 | 51048 | 43032 | 12 | 0 | 1 | 2024-09-24 19:15:15 | NULL | 82 | | {"read_local_sec":0,"read_local_mb":32,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":1900,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} | +| 10001 | 51049 | 43033 | 12 | 0 | 1 | 2024-09-24 19:15:15 | NULL | 82 | | {"read_local_sec":0,"read_local_mb":31,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":1900,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} | +| 10001 | 51051 | 43038 | 9 | 0 | 1 | 2024-09-24 19:15:15 | NULL | 84 | | {"read_local_sec":0,"read_local_mb":31,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":1900,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} | +| 10001 | 51052 | 43036 | 12 | 0 | 0 | NULL | NULL | 0 | | | +| 10001 | 51053 | 43035 | 12 | 0 | 1 | 2024-09-24 19:15:16 | NULL | 2 | | {"read_local_sec":0,"read_local_mb":1,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":100,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} | ++-------+--------+-----------+---------+---------+------+---------------------+-------------+----------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +``` + +The following fields are returned: + +- `BE_ID`: The ID of the CN. +- `TXN_ID`: The ID of transaction to which the sub-task belongs. +- `TABLET_ID`: The ID of tablet to which the sub-task belongs. +- `VERSION`: The version of the tablet. +- `RUNS`: The number of times the sub-task has been executed. +- `START_TIME`: The time when the sub-task starts. +- `FINISH_TIME`: The time when the sub-task finishes. 
+- `PROGRESS`: The compaction progress of the tablet, in percentage.
+- `STATUS`: The status of the sub-task. Error messages will be returned in this field if there is an error.
+- `PROFILE`: (supported from v3.2.12 and v3.3.4) The runtime profile of the sub-task.
+  - `read_local_sec`: The time consumption of the sub-task on reading data from the local cache. Unit: Seconds.
+  - `read_local_mb`: The size of data read from the local cache by the sub-task. Unit: MB.
+  - `read_remote_sec`: The time consumption of the sub-task on reading data from the remote storage. Unit: Seconds.
+  - `read_remote_mb`: The size of data read from the remote storage by the sub-task. Unit: MB.
+  - `read_local_count`: The number of times the sub-task reads data from the local cache.
+  - `read_remote_count`: The number of times the sub-task reads data from the remote storage.
+  - `in_queue_sec`: The time of the sub-task staying in the queue. Unit: Seconds.
+
+### Configure compaction tasks
+
+You can configure compaction tasks using the following FE and CN (BE) parameters.
+
+#### FE parameters
+
+You can configure the following FE parameters dynamically.
+
+```SQL
+ADMIN SET FRONTEND CONFIG ("lake_compaction_max_tasks" = "-1");
+```
+
+##### lake_compaction_max_tasks
+
+- Default: -1
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of concurrent compaction tasks allowed in a shared-data cluster. Setting this item to `-1` indicates calculating the concurrent task number in an adaptive manner, that is, the number of surviving CN nodes multiplied by 16. Setting this value to `0` disables compaction.
+- Introduced in: v3.1.0
+
+```SQL
+ADMIN SET FRONTEND CONFIG ("lake_compaction_disable_tables" = "11111;22222");
+```
+
+##### lake_compaction_disable_tables
+
+- Default: ""
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: Disables compaction for certain tables. This does not affect compaction tasks that have already started. The value of this item is the table ID. Multiple values are separated by semicolons (`;`).
+- Introduced in: v3.2.7
+
+#### CN parameters
+
+You can configure the following CN parameters dynamically.
+
+```SQL
+UPDATE information_schema.be_configs SET VALUE = 8
+WHERE name = "compact_threads";
+```
+
+##### compact_threads
+
+- Default: 4
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of threads used for concurrent compaction tasks. This configuration is changed to dynamic from v3.1.7 and v3.2.2 onwards.
+- Introduced in: v3.0.0
+
+> **NOTE**
+>
+> In production, it is recommended to set `compact_threads` to 25% of the BE/CN CPU core count.
+
+##### max_cumulative_compaction_num_singleton_deltas
+
+- Default: 500
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of segments that can be merged in a single Cumulative Compaction. You can reduce this value if OOM occurs during compaction.
+- Introduced in: -
+
+> **NOTE**
+>
+> In production, it is recommended to set `max_cumulative_compaction_num_singleton_deltas` to `100` to accelerate the compaction tasks and reduce their resource consumption.
+
+##### lake_pk_compaction_max_input_rowsets
+
+- Default: 500
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of input rowsets allowed in a Primary Key table compaction task in a shared-data cluster. The default value of this parameter was changed from `5` to `1000` since v3.2.4 and v3.1.10, and to `500` since v3.3.1 and v3.2.9. After the Sized-tiered Compaction policy was enabled for Primary Key tables (by setting `enable_pk_size_tiered_compaction_strategy` to `true`), StarRocks no longer needs to limit the number of rowsets for each compaction to reduce write amplification. Therefore, the default value of this parameter was increased.
+- Introduced in: v3.1.8, v3.2.3
+
+### Manually trigger compaction tasks
+
+```SQL
+-- Trigger compaction for the whole table.
+ALTER TABLE <table_name> COMPACT;
+-- Trigger compaction for a specific partition.
+ALTER TABLE <table_name> COMPACT <partition_name>;
+-- Trigger compaction for multiple partitions.
+ALTER TABLE <table_name> COMPACT (<partition_name1>, <partition_name2>, ...);
+```
+
+### Cancel compaction tasks
+
+You can manually cancel a compaction task using the transaction ID of the task.
+
+```SQL
+CANCEL COMPACTION WHERE TXN_ID = <txn_id>;
+```
+
+> **NOTE**
+>
+> - The CANCEL COMPACTION statement must be submitted from the Leader FE node.
+> - The CANCEL COMPACTION statement only applies to transactions that have not committed, that is, `CommitTime` is NULL in the return of `SHOW PROC '/compactions'`.
+> - CANCEL COMPACTION is an asynchronous process. You can check whether the task is cancelled by executing `SHOW PROC '/compactions'`.
+
+## Best practices
+
+Since compaction is crucial for query performance, it is recommended to regularly monitor the data merging status of tables and partitions. Here are some best practices and guidelines:
+
+- Try to increase the interval between load jobs (avoid intervals of less than 10 seconds) and increase the batch size per load (avoid batches of fewer than 100 rows of data).
+- Adjust the number of parallel compaction worker threads on CN to accelerate task execution. It is recommended to set `compact_threads` to 25% of the BE/CN CPU core count in a production environment.
+- Monitor the compaction task status using `SHOW PROC '/compactions'` and `SELECT * FROM information_schema.be_cloud_native_compactions;`.
+- Monitor the Compaction Score, and configure alerts based on it. StarRocks' built-in Grafana monitoring template includes this metric.
+- Pay attention to the resource consumption during compaction, especially memory usage. The Grafana monitoring template also includes this metric.
+
+## Troubleshooting
+
+### Slow queries
+
+To identify slow queries caused by untimely compaction, you can check, in the SQL Profile, the value of `SegmentsReadCount` divided by `TabletCount` within a single Fragment. If it is a large value, such as tens or more, untimely compaction may be the cause of the slow query.
+
+### High Max Compaction Score in the cluster
+
+1. Check whether the compaction-related parameters are within reasonable ranges using `ADMIN SHOW FRONTEND CONFIG LIKE "%lake_compaction%"` and `SELECT * FROM information_schema.be_configs WHERE name = "compact_threads"`.
+2. Check whether compaction is stuck using `SHOW PROC '/compactions'`:
+   - If `CommitTime` remains NULL, check the system view `information_schema.be_cloud_native_compactions` for the reason why compaction is stuck.
+   - If `FinishTime` remains NULL, search for the Publish failure reason in the Leader FE log using `TxnID`.
+3. Check whether compaction is running slowly using `SHOW PROC '/compactions'`:
+   - If `sub_task_count` is too large (check the size of each tablet in this partition using `SHOW PARTITIONS`), the table may be created improperly.
+   - If `read_remote_mb` is too large (more than 30% of the total read data), check the server disk size and also check the cache quota in the `DataCacheMetrics` field returned by `SHOW BACKENDS`.
+   - If `write_remote_sec` is too large (more than 90% of the total compaction time), writing to the remote storage may be too slow. You can verify this by checking the shared-data-specific monitoring metrics with the keywords `single upload latency` and `multi upload latency`.
+   - If `in_queue_sec` is too large (the average waiting time per tablet exceeds 60 seconds), the parameter settings may be unreasonable or other running compactions are too slow.
diff --git a/docs/en/administration/management/configuration.mdx b/docs/en/administration/management/configuration.mdx
new file mode 100644
index 0000000..5e926d6
--- /dev/null
+++ b/docs/en/administration/management/configuration.mdx
@@ -0,0 +1,11 @@
+---
+displayed_sidebar: docs
+---
+
+# Configuration
+
+Configuration parameters for FE and BE nodes.
+
+import DocCardList from '@theme/DocCardList';
+
+<DocCardList />
diff --git a/docs/en/administration/management/enable_fqdn.md b/docs/en/administration/management/enable_fqdn.md
new file mode 100644
index 0000000..c752df2
--- /dev/null
+++ b/docs/en/administration/management/enable_fqdn.md
@@ -0,0 +1,163 @@
+---
+displayed_sidebar: docs
+---
+
+# Enable FQDN access
+
+This topic describes how to enable cluster access by using a fully qualified domain name (FQDN). An FQDN is a **complete domain name** for a specific entity that can be accessed over the Internet. The FQDN consists of two parts: the hostname and the domain name.
+
+Before v2.4, StarRocks supported access to FEs and BEs via IP address only. Even if an FQDN was used to add a node to a cluster, it was transformed into an IP address eventually. This caused a huge inconvenience for DBAs, because changing the IP addresses of certain nodes in a StarRocks cluster could lead to access failures to the nodes. In v2.4, StarRocks decouples each node from its IP address. You can now manage nodes in StarRocks solely via their FQDNs.
+
+## Prerequisites
+
+To enable FQDN access for a StarRocks cluster, make sure the following requirements are satisfied:
+
+- Each machine in the cluster must have a hostname.
+
+- In the file **/etc/hosts** on each machine, you must specify the corresponding IP addresses and FQDNs of the other machines in the cluster.
+
+- IP addresses in the **/etc/hosts** file must be unique.
+
+## Set up a new cluster with FQDN access
+
+By default, FE nodes in a new cluster are started via IP address access. To start a new cluster with FQDN access, you must start the FE nodes by running the following command **when you start the cluster for the first time**:
+
+```Shell
+./bin/start_fe.sh --host_type FQDN --daemon
+```
+
+The property `--host_type` specifies the access method that is used to start the node. Valid values include `FQDN` and `IP`. You only need to specify this property ONCE when you start the node for the first time.
+
+Each BE node identifies itself with the `BE Address` defined in the FE metadata. Therefore, you DO NOT need to specify `--host_type` when you start BE nodes. If the `BE Address` defines a BE node with an FQDN, the BE node identifies itself with this FQDN.
+
+## Enable FQDN access in an existing cluster
+
+To enable FQDN access in an existing cluster that was previously started via IP addresses, you must first **upgrade** StarRocks to version 2.4.0 or later.
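+
+Before you modify any node, it can also help to confirm that every machine resolves the FQDNs of its peers, as required by the prerequisites above. A quick check from each machine might look like the following (the hostnames are illustrative):
+
+```Shell
+# Illustrative hostnames; substitute the FQDNs from your /etc/hosts files.
+ping -c 1 fe-01.example.com
+ping -c 1 be-01.example.com
+```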
+
+### Enable FQDN access for FE nodes
+
+You need to enable FQDN access for all the non-Leader Follower FE nodes before enabling it for the Leader FE node.
+
+> **CAUTION**
+>
+> Make sure that the cluster has at least three Follower FE nodes before you enable FQDN access for FE nodes.
+
+#### Enable FQDN access for non-Leader Follower FE nodes
+
+1. Navigate to the deployment directory of the FE node, and run the following command to stop the FE node:
+
+   ```Shell
+   ./bin/stop_fe.sh
+   ```
+
+2. Execute the following statement via your MySQL client to check the `Alive` status of the FE node that you have stopped. Wait until the `Alive` status becomes `false`.
+
+   ```SQL
+   SHOW PROC '/frontends'\G
+   ```
+
+3. Execute the following statement to replace the IP address with the FQDN.
+
+   ```SQL
+   ALTER SYSTEM MODIFY FRONTEND HOST "<fe_ip>" TO "<fe_hostname>";
+   ```
+
+4. Run the following command to start the FE node with FQDN access.
+
+   ```Shell
+   ./bin/start_fe.sh --host_type FQDN --daemon
+   ```
+
+   The property `--host_type` specifies the access method that is used to start the node. Valid values include `FQDN` and `IP`. You only need to specify this property ONCE when you restart the node after you modify the node.
+
+5. Check the `Alive` status of the FE node. Wait until the `Alive` status becomes `true`.
+
+   ```SQL
+   SHOW PROC '/frontends'\G
+   ```
+
+6. When the `Alive` status of the current FE node is `true`, repeat the above steps to enable FQDN access for the other non-Leader Follower FE nodes one after another.
+
+#### Enable FQDN access for the Leader FE node
+
+After all the non-Leader FE nodes have been modified and restarted successfully, you can now enable FQDN access for the Leader FE node.
+
+> **NOTE**
+>
+> Before the Leader FE node is enabled with FQDN access, the FQDNs used to add nodes to a cluster are still transformed into the corresponding IP addresses. After a Leader FE node with FQDN access enabled is elected for the cluster, the FQDNs will not be transformed into IP addresses.
+
+1. Navigate to the deployment directory of the Leader FE node, and run the following command to stop the Leader FE node.
+
+   ```Shell
+   ./bin/stop_fe.sh
+   ```
+
+2. Execute the following statement via your MySQL client to check whether a new Leader FE node has been elected for the cluster.
+
+   ```SQL
+   SHOW PROC '/frontends'\G
+   ```
+
+   Any FE node with both `Alive` and `isMaster` being `true` is a Leader FE that is running.
+
+3. Execute the following statement to replace the IP address with the FQDN.
+
+   ```SQL
+   ALTER SYSTEM MODIFY FRONTEND HOST "<fe_ip>" TO "<fe_hostname>";
+   ```
+
+4. Run the following command to start the FE node with FQDN access.
+
+   ```Shell
+   ./bin/start_fe.sh --host_type FQDN --daemon
+   ```
+
+   The property `--host_type` specifies the access method that is used to start the node. Valid values include `FQDN` and `IP`. You only need to specify this property ONCE when you restart the node after you modify the node.
+
+5. Check the `Alive` status of the FE node.
+
+   ```SQL
+   SHOW PROC '/frontends'\G
+   ```
+
+   If the `Alive` status becomes `true`, the FE node is successfully modified and added to the cluster as a Follower FE node.
+
+### Enable FQDN access for BE nodes
+
+Execute the following statement via your MySQL client to replace the IP address with the FQDN and thereby enable FQDN access for the BE node.
+
+```SQL
+ALTER SYSTEM MODIFY BACKEND HOST "<be_ip>" TO "<be_hostname>";
+```
+
+> **NOTE**
+>
+> You DO NOT need to restart the BE node after FQDN access is enabled.
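+
+For example, assuming a BE that was added as `192.168.1.5` and whose hostname is `sr-be-01.example.com` (both values are illustrative), the statement would be:
+
+```SQL
+-- Both the IP address and the hostname below are examples.
+ALTER SYSTEM MODIFY BACKEND HOST "192.168.1.5" TO "sr-be-01.example.com";
+```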
+
+## Rollback
+
+To roll back an FQDN access-enabled StarRocks cluster to an earlier version that does not support FQDN access, you must first enable IP address access for all nodes in the cluster. You can refer to [Enable FQDN access in an existing cluster](#enable-fqdn-access-in-an-existing-cluster) for general guidance, except that you need to change the SQL commands to the following ones:
+
+- Enable IP address access for an FE node:
+
+```SQL
+ALTER SYSTEM MODIFY FRONTEND HOST "<fe_hostname>" TO "<fe_ip>";
+```
+
+- Enable IP address access for a BE node:
+
+```SQL
+ALTER SYSTEM MODIFY BACKEND HOST "<be_hostname>" TO "<be_ip>";
+```
+
+The modification takes effect after your cluster is successfully restarted.
+
+## FAQ
+
+**Q: An error occurs when I enable FQDN access for an FE node: "required 1 replica. But none were active with this master". What should I do?**
+
+A: Make sure the cluster has at least three Follower FE nodes before you enable FQDN access for FE nodes.
+
+**Q: Can I add a new node by using an IP address to a cluster with FQDN access enabled?**
+
+A: Yes.
diff --git a/docs/en/administration/management/graceful_exit.md b/docs/en/administration/management/graceful_exit.md
new file mode 100644
index 0000000..0bc1be6
--- /dev/null
+++ b/docs/en/administration/management/graceful_exit.md
@@ -0,0 +1,276 @@
+---
+displayed_sidebar: docs
+---
+
+# Graceful Exit
+
+From v3.3 onwards, StarRocks supports Graceful Exit.
+
+## Overview
+
+Graceful Exit is a mechanism designed to support **non-disruptive upgrades and restarts** of StarRocks FE, BE, and CN nodes. Its primary objective is to minimize the impact on running queries and data ingestion tasks during maintenance operations such as node restarts, rolling upgrades, or cluster scaling.
+
+Graceful Exit ensures that:
+
+- The node **stops accepting new tasks** once the exit begins;
+- Existing queries and load jobs are **allowed to complete** within a controlled time window;
+- System components (FE/BE/CN) **coordinate status changes** so that the cluster correctly reroutes traffic.
+
+Graceful Exit mechanisms differ between FE and BE/CN nodes, as described below.
+
+### FE Graceful Exit Mechanism
+
+#### Trigger Signal
+
+FE Graceful Exit is initiated via:
+
+```bash
+stop_fe.sh -g
+```
+
+This sends a `SIGUSR1` signal, while the default exit (without the `-g` option) sends a `SIGTERM` signal.
+
+#### Load Balancer Awareness
+
+Upon receiving the signal:
+
+- FE immediately returns **HTTP 500** on the `/api/health` endpoint.
+- Load balancers detect the degraded state within ~15 seconds and stop routing new connections to this FE.
+
+#### Connection Drain and Shutdown Logic
+
+**Follower FE**
+
+- Handles read-only queries.
+- If the FE node has no active sessions, the connection is closed immediately.
+- If SQL is running, the FE node waits for execution to finish before closing the session.
+
+**Leader FE**
+
+- Read request handling is identical to that of Followers.
+- Write request handling requires:
+
+  - Closing BDBJE.
+  - Allowing a new Leader election to complete.
+  - Redirecting subsequent writes to the newly elected Leader.
+
+#### Timeout Control
+
+If a query runs for too long, FE forcibly exits after **60 seconds** (configurable via the `--timeout` option).
+
+### BE/CN Graceful Exit Mechanism
+
+#### Trigger Signal
+
+BE Graceful Exit is initiated via:
+
+```bash
+stop_be.sh -g
+```
+
+CN Graceful Exit is initiated via:
+
+```bash
+stop_cn.sh -g
+```
+
+This sends a `SIGTERM` signal, while the default exit (without the `-g` option) sends a `SIGKILL` signal.
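+
+For illustration, the stop scripts are, in effect, delivering the signals described above. A rough manual equivalent is sketched below; the process names are assumptions and may differ across versions and deployments:
+
+```bash
+# FE graceful exit: stop_fe.sh -g sends SIGUSR1 to the FE process.
+kill -USR1 "$(pgrep -f StarRocksFE)"
+
+# BE/CN graceful exit: stop_be.sh -g / stop_cn.sh -g send SIGTERM.
+kill -TERM "$(pgrep -f starrocks_be)"
+```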
+
+#### State Transition
+
+After receiving the signal:
+
+- The BE/CN node marks itself as **exiting**.
+- It rejects **new query fragments** by returning `INTERNAL_ERROR`.
+- It continues processing existing fragments.
+
+#### Wait Loop for In-Flight Queries
+
+How long a BE/CN waits for existing fragments to finish is controlled by the BE/CN configuration item `loop_count_wait_fragments_finish` (default: 2). The actual wait duration equals `loop_count_wait_fragments_finish × 10 seconds` (that is, 20 seconds by default). If fragments remain after the timeout, the BE/CN proceeds with a normal shutdown (closing threads, network connections, and other resources).
+
+#### Improved FE Awareness
+
+From v3.4 onwards, FE no longer marks a BE/CN as `DEAD` based on heartbeat failures. It correctly recognizes the BE/CN "exiting" state, allowing significantly longer graceful-exit windows for fragments to be completed.
+
+## Configurations
+
+### FE Configurations
+
+#### `stop_fe.sh -g --timeout`
+
+- Description: Maximum waiting time before FE is force-killed.
+- Default: 60 (seconds)
+- How to apply: Specify it in the script command, for example, `--timeout 120`.
+
+#### *Minimum LB detection time*
+
+- Description: The LB requires at least 15 seconds to detect degraded health.
+- Default: 15 (seconds)
+- How to apply: Fixed value; it cannot be configured.
+
+### BE/CN Configurations
+
+#### `loop_count_wait_fragments_finish`
+
+- Description: Controls the BE/CN wait duration for existing fragments. The actual wait equals this value multiplied by 10 seconds.
+- Default: 2
+- How to apply: Modify it in the BE/CN configuration file or update it dynamically.
+
+#### `graceful_exit_wait_for_frontend_heartbeat`
+
+- Description: Whether BE/CN waits for FE to confirm the **SHUTDOWN** state via heartbeat. Supported from v3.4.5 onwards.
+- Default: false
+- How to apply: Modify it in the BE/CN configuration file or update it dynamically.
+
+#### `stop_be.sh -g --timeout`, `stop_cn.sh -g --timeout`
+
+- Description: Maximum waiting time before BE/CN is force-killed. Set it to a value larger than `loop_count_wait_fragments_finish` * 10 to prevent termination before the BE/CN wait duration is reached.
+- Default: 60 (seconds)
+- How to apply: Specify it in the script command, for example, `--timeout 30`.
+
+### Global Switches
+
+Graceful Exit is **enabled by default** from v3.4 onwards. To disable it temporarily, set the BE/CN configuration item `loop_count_wait_fragments_finish` to `0`.
+
+## Expected Behavior During Graceful Exit
+
+### Query Workloads
+
+| Query Type | Expected Behavior |
+| ----------------------------------- | ----------------------------------------------------------------------------------- |
+| **Short (less than 20 seconds)** | BE/CN waits long enough, so queries complete normally. |
+| **Medium (20 to 60 seconds)** | Queries that complete within the BE/CN wait window return successfully; the rest are cancelled and require manual retry. |
+| **Long (more than 60 seconds)** | Queries are likely terminated by FE/BE/CN due to timeout and require manual retry. |
+
+### Data Ingestion Tasks
+
+- **Loading tasks via Flink or Kafka connectors** are automatically retried with no user-visible interruption.
+- **Stream Load (non-framework), Broker Load, and Routine Load tasks** may fail if the connection breaks. Manual retry is required.
+- **Background tasks** are automatically re-scheduled and executed by the FE retry mechanism.
+
+### Upgrade and Restart Operations
+
+Graceful Exit ensures:
+
+- No cluster-wide downtime;
+- Safe rolling upgrades by draining nodes one at a time.
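+
+Before draining a node that serves long-running queries, you may want to lengthen its wait window first. The following is a hedged sketch that uses the BE HTTP interface for dynamic configuration updates; the host is a hypothetical example, and 8040 is the default BE HTTP port:
+
+```bash
+# Extend the graceful-exit wait window to 5 x 10 = 50 seconds on one BE.
+curl -XPOST "http://10.0.0.2:8040/api/update_config?loop_count_wait_fragments_finish=5"
+
+# Then drain the node, giving the stop script a timeout larger than the wait window.
+./bin/stop_be.sh -g --timeout 120
+```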
+
+## Limitations and Version Differences
+
+### Version Behavior Differences
+
+| Version | Behavior |
+| --------- | ------------------------------------------------------------------------------------------------------------------------- |
+| **v3.3** | BE Graceful Exit is flawed: FE may prematurely mark the BE/CN as `DEAD`, causing queries to be cancelled. The effective wait is limited (15 seconds by default). |
+| **v3.4+** | Fully supports extended wait durations; FE correctly identifies the BE/CN "exiting" state. Recommended for production. |
+
+### Operational Limitations
+
+- In extreme cases (for example, when a BE/CN hangs), Graceful Exit may fail. Terminating the process then requires `kill -9`, risking partial data persistence (recoverable via snapshot).
+
+## Usage
+
+### Prerequisites
+
+**StarRocks version**:
+
+- **v3.3+**: Basic Graceful Exit support.
+- **v3.4+**: Enhanced status management, longer wait windows (up to several minutes).
+
+**Configuration**:
+
+- Make sure `loop_count_wait_fragments_finish` is set to a positive integer.
+- Set `graceful_exit_wait_for_frontend_heartbeat` to `true` to allow FE to detect the BE's "EXITING" state.
+
+### Perform FE Graceful Exit
+
+```bash
+./bin/stop_fe.sh -g --timeout 60
+```
+
+Parameters:
+
+- `--timeout`: The maximum time to wait before the FE node is force-killed.
+
+Behavior:
+
+- The system sends the `SIGUSR1` signal first.
+- After the timeout, it falls back to `SIGKILL`.
+
+#### Validate FE State
+
+You can check the FE health via the following API:
+
+```
+http://<fe_host>:8030/api/health
+```
+
+The LB removes the node after receiving consecutive non-200 responses.
+
+### Perform BE/CN Graceful Exit
+
+- **For v3.3:**
+
+  - BE:
+
+    ```bash
+    ./be/bin/stop_be.sh -g
+    ```
+
+  - CN:
+
+    ```bash
+    ./be/bin/stop_cn.sh -g
+    ```
+
+- **For v3.4+:**
+
+  - BE:
+
+    ```bash
+    ./bin/stop_be.sh -g --timeout 600
+    ```
+
+  - CN:
+
+    ```bash
+    ./bin/stop_cn.sh -g --timeout 600
+    ```
+
+The BE/CN exits immediately if no fragments remain.
+
+#### Validate BE/CN Status
+
+Run on FE:
+
+```sql
+SHOW BACKENDS;
+```
+
+`StatusCode`:
+
+- `SHUTDOWN`: BE/CN Graceful Exit is in progress.
+- `DISCONNECTED`: The BE/CN node has fully exited.
+
+## Rolling Upgrade Workflow
+
+### Procedure
+
+1. Perform Graceful Exit on node `A`.
+2. Confirm node `A` is shown as `DISCONNECTED` on the FE side.
+3. Upgrade and restart node `A`.
+4. Repeat the above for the remaining nodes.
+
+### Monitor Graceful Exit
+
+Check the FE logs (`fe.log`), BE logs (`be.log`), or CN logs (`cn.log`) to confirm whether tasks were still running during the exit.
+
+## Troubleshooting
+
+### BE/CN exits due to timeout
+
+If tasks fail to complete within the Graceful Exit period, the BE/CN will trigger forced termination (`SIGKILL`). Verify whether this is caused by excessively long task durations or improper configurations (for example, an overly small `--timeout` value).
+
+### Node status is not SHUTDOWN
+
+If the node status is not `SHUTDOWN`, verify whether `loop_count_wait_fragments_finish` is set to a positive integer, or whether the BE/CN reported a heartbeat before exiting (if not, set `graceful_exit_wait_for_frontend_heartbeat` to `true`).
diff --git a/docs/en/administration/management/logs.md b/docs/en/administration/management/logs.md
new file mode 100644
index 0000000..442cfe0
--- /dev/null
+++ b/docs/en/administration/management/logs.md
@@ -0,0 +1,272 @@
+---
+displayed_sidebar: docs
+---
+
+# Logs
+
+When deploying and operating StarRocks, understanding and properly using the logging system is critical for troubleshooting, performance analysis, and system tuning.
+This article provides a detailed overview of the log file types, typical content, configuration methods, and log rolling and retention strategies for both the Frontend (FE) and Backend (BE or CN) components of StarRocks.
+
+The information in this document is based on StarRocks Version 3.5.x.
+
+## FE Logging in detail
+
+### `fe.log`
+
+The main FE logs include the startup process, cluster state changes, DML/DQL requests, and scheduling-related information. These logs primarily record the behavior of the FE during its runtime.
+
+#### Configuration
+
+- `sys_log_dir`: Log storage directory. Default is `${STARROCKS_HOME}/log`
+- `sys_log_level`: Log level. Default is `INFO`
+- `sys_log_roll_num`: Controls the number of retained log files to prevent unlimited growth from consuming too much disk space. Default is 10
+- `sys_log_roll_interval`: Specifies the rotation frequency. Default is `DAY`, meaning logs are rotated daily
+- `sys_log_delete_age`: Controls how long to keep old log files before deletion. Default is 7 days
+- `sys_log_roll_mode`: Log rotation mode. Default is `SIZE-MB-1024`, meaning a new log file will be created when the current one reaches 1024 MB. Together with `sys_log_roll_interval`, this indicates that FE logs can be rotated either daily or based on file size
+- `sys_log_enable_compress`: Controls whether log compression is enabled. Default is false, meaning compression is disabled
+
+### `fe.warn.log`
+
+`fe.warn.log` is an important log file for system monitoring and troubleshooting:
+
+- Operations monitoring: monitors system health status
+- Fault diagnosis: quickly locates critical issues
+- Performance analysis: identifies system bottlenecks and anomalies
+- Security auditing: records permission and access errors
+
+Compared to `fe.log`, which records logs of all levels, `fe.warn.log` focuses on warnings and errors that require attention, helping operations personnel quickly identify and address system issues.
+
+### `fe.gc.log`
+
+`fe.gc.log` is the Java garbage collection log of the StarRocks FE, used to monitor and analyze JVM garbage collection behavior.
+
+It’s important to note that this log file uses the native JVM log rotation mechanism. For example, you can enable automatic rotation based on file size and file count with the following configuration:
+
+```bash
+-Xlog:gc*:${LOG_DIR}/fe.gc.log:time,tags:filecount=7,filesize=100M
+```
+
+### `fe.out`
+
+`fe.out` is the standard output log file of StarRocks FE. It records the content printed to standard output (stdout) and standard error (stderr) during the runtime of the FE process. The main content includes:
+
+- Console output during FE startup
+  - JVM startup information (e.g., heap settings, GC parameters)
+  - FE module initialization order (Catalog, Scheduler, RPC, HTTP Server, etc.)
+- Error stack traces from stderr
+  - Java exceptions (Exception/StackTrace)
+  - Uncaught errors like ClassNotFound, NullPointerException, etc.
+- Output not captured by other logging systems
+  - Some third-party libraries that use `System.out.println()` or `e.printStackTrace()`
+
+You should check `fe.out` in the following scenarios:
+
+- FE fails to start: Check `fe.out` for Java exceptions or invalid parameter messages.
+- FE crashes unexpectedly: Look for uncaught exception stack traces.
+- Logging system uncertainty: If `fe.out` is the only log file with output in the FE log directory, it’s likely that `log4j` failed to initialize, or the configuration is incorrect.
+
+By default, `fe.out` does not support automatic log rotation.
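+
+Since `fe.out` does not rotate automatically, you may want to rotate it externally. The following is a minimal sketch using the standard Linux logrotate tool; the path and thresholds are examples to adapt to your deployment:
+
+```bash
+# /etc/logrotate.d/starrocks-fe-out (example)
+/opt/starrocks/fe/log/fe.out {
+    daily
+    rotate 7
+    maxsize 1G
+    compress
+    missingok
+    notifempty
+    # copytruncate keeps the FE's open file handle valid by truncating in place.
+    copytruncate
+}
+```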
+
+### `fe.profile.log`
+
+The purpose of `fe.profile.log` is to record detailed query execution information for performance analysis. Its main functions include:
+
+- Query performance analysis: Logs detailed execution data for each query, including:
+  - Query ID, user, database, SQL statement
+  - Execution timing (startTime, endTime, latency)
+  - Resource usage (CPU, memory, number/size of scanned rows)
+  - Execution status (RUNNING, FINISHED, FAILED, CANCELLED)
+- Runtime metrics tracking: Captures key indicators via the QueryDetail class:
+  - `scanRows / scanBytes`: Amount of data scanned
+  - `returnRows`: Number of result rows returned
+  - `cpuCostNs`: CPU time consumed (in nanoseconds)
+  - `memCostBytes`: Memory usage
+  - `spillBytes`: Amount of data spilled to disk
+- Error diagnosis: Records error messages and stack traces for failed queries
+- Resource group monitoring: Tracks query execution metrics across different resource groups
+
+`fe.profile.log` is stored in JSON format.
+
+#### Configuration
+
+- `enable_profile_log`: Whether to enable profile logging
+- `profile_log_dir`: Directory for storing profile logs
+- `profile_log_roll_size_mb`: Log rotation size (in MB)
+- `profile_log_roll_num`: Controls the number of retained profile log files to prevent unlimited growth and excessive disk usage. Default is 5
+- `profile_log_roll_interval`: Specifies the rotation frequency. Default is `DAY`, meaning daily rotation. When rotation conditions are met, the latest 5 files are retained, and older files are deleted
+- `profile_log_delete_age`: Controls how long old files are kept before deletion. Default is 1 day
+
+### `fe.internal.log`
+
+The purpose of `fe.internal.log` is to record logs dedicated to internal operations of the FE (Frontend), primarily for system-level auditing and debugging. Its main functions include:
+
+1. Internal operation auditing: Logs system-initiated internal SQL executions separately from user queries
+2. Statistics tracking: Specifically records operations related to statistics collection
+3. Debugging support: Provides detailed logs for troubleshooting internal operation issues
+
+The log records include entries such as:
+
+- Statistics module (internal.statistic)
+- Core system module (internal.base)
+
+This log is especially useful for analyzing StarRocks’ internal statistics collection process and diagnosing issues related to internal operations.
+
+#### Configuration
+
+- `internal_log_dir`: Controls the storage directory for this log
+- `internal_log_modules`: An array configuring internal log modules, defining which internal operation modules need to be recorded in the `fe.internal.log` file. Default is `{"base", "statistic"}`
+- `internal_log_roll_num`: Number of files to retain. Default is 90
+- `internal_log_roll_interval`: Specifies the rotation frequency. Default is `DAY`, meaning daily rotation. When rotation conditions are met, the latest 90 files are retained, and older files are deleted
+- `internal_log_delete_age`: Controls how long old files are kept before deletion. Default is 7 days
+
+### `fe.audit.log`
+
+This is StarRocks’ query audit log, which records detailed information about all user queries and connections. It is used for monitoring, analysis, and auditing. Its main purposes include:
+
+1. Query monitoring: Logs the execution status and performance metrics of all SQL queries
+2. User auditing: Tracks user behavior and database access
+3. Performance analysis: Provides metrics such as query execution time and resource consumption
+4. Issue diagnosis: Records error statuses and error codes to facilitate troubleshooting
+
+#### Configuration
+
+- `audit_log_dir`: Controls the storage directory for this log
+- `audit_log_roll_num`: Number of files to retain. Default is 90
+- `audit_log_roll_interval`: Specifies the rotation frequency. Default is `DAY`, meaning daily rotation. When rotation conditions are met, the latest 90 files are retained, and older files are deleted
+- `audit_log_delete_age`: Controls how long old files are kept before deletion. Default is 7 days
+- `audit_log_json_format`: Whether to log in JSON format. Default is false
+- `audit_log_enable_compress`: Whether compression is enabled
+
+### `fe.big_query.log`
+
+This is StarRocks’ dedicated Big Query log file, used to monitor and analyze queries with high resource consumption. Its structure is similar to that of the audit log, but it includes three additional fields:
+
+- `bigQueryLogCPUSecondThreshold`: CPU time threshold
+- `bigQueryLogScanBytesThreshold`: Scan size (in bytes) threshold
+- `bigQueryLogScanRowsThreshold`: Scan row count threshold
+
+#### Configuration
+
+- `big_query_log_dir`: Controls the storage directory for this log
+- `big_query_log_roll_num`: Number of files to retain. Default is 10
+- `big_query_log_modules`: Types of internal log modules. Default is `query`
+- `big_query_log_roll_interval`: Specifies the rotation frequency. Default is `DAY`, meaning daily rotation. When rotation conditions are met, the latest 10 files are retained, and older files are deleted
+- `big_query_log_delete_age`: Controls how long old files are kept before deletion. Default is 7 days
+
+### `fe.dump.log`
+
+This is StarRocks’ query dump log, specifically used for detailed query debugging and issue diagnosis. Its main purposes include:
+
+- Exception debugging: Automatically records the complete query context when query execution encounters exceptions
+- Issue reproduction: Provides sufficiently detailed information to reproduce query problems
+- In-depth diagnosis: Contains debugging information such as metadata, statistics, execution plans, and more
+- Technical support: Provides comprehensive data for the technical support team to analyze issues
+
+It can be enabled using the following command:
+
+```SQL
+SET enable_query_dump = true;
+```
+
+#### Configuration
+
+- `dump_log_dir`: Controls the storage directory for this log
+- `dump_log_roll_num`: Number of files to retain. Default is 10
+- `dump_log_modules`: Types of internal log modules. Default is `query`
+- `dump_log_roll_interval`: Specifies the rotation frequency. Default is `DAY`, meaning daily rotation. When rotation conditions are met, the latest 10 files are retained, and older files are deleted
+- `dump_log_delete_age`: Controls how long old files are kept before deletion. Default is 7 days
+
+### `fe.features.log`
+
+This is StarRocks’ query plan feature log, used to collect and record feature information of query execution plans. It mainly serves machine learning and query optimization analysis. Key purposes include:
+
+1. Query plan feature collection: Extracting various characteristics from query execution plans
+2. Machine learning data source: Providing training data for query cost prediction models
+3. Query pattern analysis: Analyzing execution patterns and feature distributions
+4. Optimizer improvement: Supplying data to support enhancements in the cost-based optimizer (CBO)
+
+It can be enabled via the following FE configuration items.
+
+```Properties
+# Enable plan feature collection (disabled by default)
+enable_plan_feature_collection = false
+
+# Enable query cost prediction (disabled by default)
+enable_query_cost_prediction = false
+```
+
+#### Configuration
+
+- `feature_log_dir`: Controls the storage directory for this log
+- `feature_log_roll_num`: Number of files to retain. Default is 5
+- `feature_log_roll_interval`: Specifies the rotation frequency. Default is `DAY`, meaning daily rotation. When rotation conditions are met, the latest 5 files are retained, and older files are deleted
+- `feature_log_delete_age`: Controls how long old files are kept before deletion. Default is 3 days
+- `feature_log_roll_size_mb`: Log rotation size. Default is 1024 MB, meaning a new file is created every 1 GB
+
+## BE/CN Logging in detail
+
+### `{be or cn}.INFO.log`
+
+It primarily records various runtime behavior logs generated by BE/CN nodes, and these logs are at the `INFO` level. For example:
+
+- System startup information:
+  - BE process startup and initialization
+  - Hardware resource detection (CPU, memory, disk)
+  - Configuration parameter loading
+- Query execution information:
+  - Query reception and dispatch
+  - Fragment execution status
+- Storage-related information:
+  - Tablet loading and unloading
+  - Compaction execution process
+  - Data import status
+  - Storage space management
+
+#### Configuration
+
+- `sys_log_level`: Log level, default is `INFO`
+- `sys_log_dir`: Log storage directory, default is `${STARROCKS_HOME}/log`
+- `sys_log_roll_mode`: Log rotation mode, default is `SIZE-MB-1024`, meaning a new log file is created when the current one reaches 1024 MB
+- `sys_log_roll_num`: Number of retained log files, default is 10
+
+### `{be or cn}.WARN.log`
+
+`be.WARN.log` stores log entries at the `WARNING` level and above. Examples include:
+
+- Query execution warnings:
+  - Query execution time too long
+  - Memory allocation failure warning
+  - Operator execution exception
+- Storage-related warnings:
+  - Abnormal tablet status
+  - Slow Compaction execution
+  - Data file corruption warning
+  - Storage I/O exception
+
+#### Configuration
+
+- `sys_log_level`: Log level, default is `INFO`
+- `sys_log_dir`: Log storage directory, default is `${STARROCKS_HOME}/log`
+- `sys_log_roll_mode`: Log rotation mode, default is `SIZE-MB-1024`, meaning a new log file is created when the current one reaches 1024 MB
+- `sys_log_roll_num`: Number of retained log files, default is 10
+
+### `{be or cn}.ERROR.log`
+
+`be.ERROR.log` stores log entries at the `ERROR` level and above. Typical error log contents include:
+
+- Query execution errors:
+  - Query timeout or cancellation
+  - Query failure due to insufficient memory
+- Data processing errors:
+  - Data load failures (e.g., format errors, constraint violations)
+  - Data write failures
+  - Data read errors (e.g., file corruption, I/O errors)
+- Storage system errors:
+  - Tablet load failures
+  - Compaction execution failures
+  - Corrupted data files
+  - Disk I/O errors
+
+#### Configuration
+
+- `sys_log_level`: Log level, default is `INFO`
+- `sys_log_dir`: Log storage directory, default is `${STARROCKS_HOME}/log`
+- `sys_log_roll_mode`: Log rotation mode, default is `SIZE-MB-1024`, meaning a new log file is created when the current one reaches 1024 MB
+- `sys_log_roll_num`: Number of retained log files, default is 10
+
+### `{be or cn}.FATAL.log`
+
+It primarily records various runtime behavior logs generated by BE/CN nodes, and these logs are at the `FATAL` level.
+Once such a log is generated, the BE/CN node process will exit.
+
+#### Configuration
+
+- `sys_log_level`: Log level, default is `INFO`
+- `sys_log_dir`: Log storage directory, default is `${STARROCKS_HOME}/log`
+- `sys_log_roll_mode`: Log rotation mode, default is `SIZE-MB-1024`, meaning a new log file is created when the current one reaches 1024 MB
+- `sys_log_roll_num`: Number of retained log files, default is 10
+
+### `error_log`
+
+This log primarily records various errors, rejected records, and ETL issues encountered by BE/CN nodes during data import. Users can obtain the main reasons for import errors via `http://be_ip:be_port/api/get_log_file`. The log files are stored in the `${STARROCKS_HOME}/storage/error_log` directory.
+
+#### Configuration
+
+- `load_error_log_reserve_hours`: How long error log files are retained. The default is 48 hours, meaning the log files will be deleted 48 hours after they are created.
diff --git a/docs/en/administration/management/management.mdx b/docs/en/administration/management/management.mdx
new file mode 100644
index 0000000..2b5897c
--- /dev/null
+++ b/docs/en/administration/management/management.mdx
@@ -0,0 +1,9 @@
+---
+displayed_sidebar: docs
+---
+
+# Management
+
+import DocCardList from '@theme/DocCardList';
+
+<DocCardList />
diff --git a/docs/en/administration/management/monitor_manage_big_queries.md b/docs/en/administration/management/monitor_manage_big_queries.md
new file mode 100644
index 0000000..b882e07
--- /dev/null
+++ b/docs/en/administration/management/monitor_manage_big_queries.md
@@ -0,0 +1,256 @@
+---
+displayed_sidebar: docs
+---
+
+# Monitor and manage big queries
+
+This topic describes how to monitor and manage big queries in your StarRocks cluster.
+
+Big queries include queries that scan too many rows or occupy too many CPU and memory resources. They can easily exhaust cluster resources and cause system overload if no restrictions are imposed on them. To tackle this issue, StarRocks provides a series of measures to monitor and manage big queries, preventing queries from monopolizing cluster resources.
+
+The overall idea of handling big queries in StarRocks is as follows:
+
+1. Set automatic precautions against big queries with resource groups and query queues.
+2. Monitor big queries in real time, and terminate those that bypass the precautions.
+3. Analyze audit logs and Big Query Logs to study the patterns of big queries, and fine-tune the precaution mechanisms you set earlier.
+
+This feature is supported from v3.0.
+
+## Set precautions against big queries
+
+StarRocks provides two precautionary instruments for dealing with big queries: resource groups and query queues. You can use resource groups to stop big queries from being executed. Query queues, on the other hand, can help you queue the incoming queries when the concurrency threshold or resource limit is reached, preventing system overload.
+
+### Filter out big queries via resource groups
+
+Resource groups can automatically identify and terminate big queries. When creating a resource group, you can specify the upper limit of CPU time, memory usage, or scan row count that a query is entitled to. Among all queries that hit the resource group, any queries that require more resources are rejected with an error. For more information and instructions on resource groups, see [Resource Isolation](../../administration/management/resource_management/resource_group.md).
+
+Before creating resource groups, you must execute the following statement to enable the Pipeline Engine, on which the Resource Group feature depends:
+
+```SQL
+SET GLOBAL enable_pipeline_engine = true;
+```
+
+The following example creates a resource group `bigQuery` that limits the CPU time upper limit to `100` seconds, the scan row count upper limit to `100000`, and the memory usage upper limit to `1073741824` bytes (1 GB):
+
+```SQL
+CREATE RESOURCE GROUP bigQuery
+TO
+    (db='sr_hub')
+WITH (
+    'cpu_weight' = '10',
+    'mem_limit' = '20%',
+    'big_query_cpu_second_limit' = '100',
+    'big_query_scan_rows_limit' = '100000',
+    'big_query_mem_limit' = '1073741824'
+);
+```
+
+If the required resources of a query exceed any of the limits, the query will not be executed and is returned with an error. The following example shows the error message returned when a query demands too many scan rows:
+
+```Plain
+ERROR 1064 (HY000): exceed big query scan_rows limit: current is 4 but limit is 1
+```
+
+If it is your first time setting up resource groups, we recommend that you set relatively high limits so that they will not hinder regular queries. You can fine-tune these limits after you have a better knowledge of the big query patterns.
+
+### Ease system overload via query queues
+
+Query queues are designed to cushion system overload deterioration when the cluster resource occupation exceeds the prespecified thresholds. You can set thresholds for maximum concurrency, memory usage, and CPU usage. StarRocks automatically queues the incoming queries when any of these thresholds is reached. Pending queries either wait in the queue for execution or get cancelled when the prespecified queue length or pending timeout threshold is reached. For more information, see [Query Queues](../../administration/management/resource_management/query_queues.md).
+
+Execute the following statement to enable query queues for SELECT queries:
+
+```SQL
+SET GLOBAL enable_query_queue_select = true;
+```
+
+After the query queue feature is enabled, you can then define the rules that trigger query queues.
+
+- Specify the concurrency threshold for triggering the query queue.
+
+  The following example sets the concurrency threshold to `100`:
+
+  ```SQL
+  SET GLOBAL query_queue_concurrency_limit = 100;
+  ```
+
+- Specify the memory usage ratio threshold for triggering the query queue.
+
+  The following example sets the memory usage ratio threshold to `0.9`:
+
+  ```SQL
+  SET GLOBAL query_queue_mem_used_pct_limit = 0.9;
+  ```
+
+- Specify the CPU usage ratio threshold for triggering the query queue.
+
+  The following example sets the CPU usage permille (CPU usage * 1000) threshold to `800`:
+
+  ```SQL
+  SET GLOBAL query_queue_cpu_used_permille_limit = 800;
+  ```
+
+You can also decide how to deal with these queued queries by configuring the maximum queue length and the timeout for each pending query in the queue.
+
+- Specify the maximum query queue length. When this threshold is reached, incoming queries are rejected.
+
+  The following example sets the query queue length to `100`:
+
+  ```SQL
+  SET GLOBAL query_queue_max_queued_queries = 100;
+  ```
+
+- Specify the maximum timeout of a pending query in a queue. When this threshold is reached, the corresponding query is rejected.
+
+  The following example sets the maximum timeout to `480` seconds:
+
+  ```SQL
+  SET GLOBAL query_queue_pending_timeout_second = 480;
+  ```
+
+You can check whether a query is pending using [SHOW PROCESSLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_PROCESSLIST.md).
+
+```Plain
+mysql> SHOW PROCESSLIST;
++------+------+---------------------+-------+---------+---------------------+------+-------+-------------------+-----------+
+| Id   | User | Host                | Db    | Command | ConnectionStartTime | Time | State | Info              | IsPending |
++------+------+---------------------+-------+---------+---------------------+------+-------+-------------------+-----------+
+| 2    | root | xxx.xx.xxx.xx:xxxxx |       | Query   | 2022-11-24 18:08:29 | 0    | OK    | SHOW PROCESSLIST  | false     |
++------+------+---------------------+-------+---------+---------------------+------+-------+-------------------+-----------+
+```
+
+If `IsPending` is `true`, the corresponding query is pending in the query queue.
+
+## Monitor big queries in real-time
+
+From v3.0 onwards, StarRocks supports viewing the queries that are currently processed in the cluster and the resources they occupy. This allows you to monitor the cluster in case any big queries bypass the precautions and cause unexpected system overload.
+
+### Monitor via MySQL client
+
+1. You can view the queries that are currently processed (`current_queries`) using [SHOW PROC](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_PROC.md).
+
+   ```SQL
+   SHOW PROC '/current_queries';
+   ```
+
+   StarRocks returns the query ID (`QueryId`), connection ID (`ConnectionId`), and the resource consumption of each query, including the scanned data size (`ScanBytes`), processed row count (`ProcessRows`), CPU time (`CPUCostSeconds`), memory usage (`MemoryUsageBytes`), and execution time (`ExecTime`).
+
+   ```Plain
+   mysql> SHOW PROC '/current_queries';
+   +--------------------------------------+--------------+------------+------+-----------+----------------+----------------+------------------+----------+
+   | QueryId                              | ConnectionId | Database   | User | ScanBytes | ProcessRows    | CPUCostSeconds | MemoryUsageBytes | ExecTime |
+   +--------------------------------------+--------------+------------+------+-----------+----------------+----------------+------------------+----------+
+   | 7c56495f-ae8b-11ed-8ebf-00163e00accc | 4            | tpcds_100g | root | 37.88 MB  | 1075769 Rows   | 11.13 Seconds  | 146.70 MB        | 3804     |
+   | 7d543160-ae8b-11ed-8ebf-00163e00accc | 6            | tpcds_100g | root | 13.02 GB  | 487873176 Rows | 81.23 Seconds  | 6.37 GB          | 2090     |
+   +--------------------------------------+--------------+------------+------+-----------+----------------+----------------+------------------+----------+
+   2 rows in set (0.01 sec)
+   ```
+
+2. You can further examine a query's resource consumption on each BE node by specifying the query ID.
+
+   ```SQL
+   SHOW PROC '/current_queries/<QueryId>/hosts';
+   ```
+
+   StarRocks returns the query's scanned data size (`ScanBytes`), scanned row count (`ScanRows`), CPU time (`CPUCostSeconds`), and memory usage (`MemUsageBytes`) on each BE node.
+
+   ```Plain
+   mysql> SHOW PROC '/current_queries/7c56495f-ae8b-11ed-8ebf-00163e00accc/hosts';
+   +--------------------+-----------+-------------+----------------+---------------+
+   | Host               | ScanBytes | ScanRows    | CpuCostSeconds | MemUsageBytes |
+   +--------------------+-----------+-------------+----------------+---------------+
+   | 172.26.34.185:8060 | 11.61 MB  | 356252 Rows | 52.93 Seconds  | 51.14 MB      |
+   | 172.26.34.186:8060 | 14.66 MB  | 362646 Rows | 52.89 Seconds  | 50.44 MB      |
+   | 172.26.34.187:8060 | 11.60 MB  | 356871 Rows | 52.91 Seconds  | 48.95 MB      |
+   +--------------------+-----------+-------------+----------------+---------------+
+   3 rows in set (0.00 sec)
+   ```
+
+### Monitor via FE console
+
+In addition to the MySQL client, you can use the FE console for visualized, interactive monitoring.
+
+1. Navigate to the FE console in your browser using the following URL:
+
+   ```Bash
+   http://<fe_ip>:<fe_http_port>/system?path=//current_queries
+   ```
+
+   ![FE console 1](../../_assets/console_1.png)
+
+   You can view the queries that are currently processed and their resource consumption on the **System Info** page.
+
+2. Click the **QueryID** of the query.
+
+   ![FE console 2](../../_assets/console_2.png)
+
+   You can view the detailed, node-specific resource consumption information on the page that appears.
+
+### Manually terminate big queries
+
+If any big queries bypass the precautions you have set and threaten the system availability, you can terminate them manually using the corresponding connection ID in the [KILL](../../sql-reference/sql-statements/cluster-management/nodes_processes/KILL.md) statement:
+
+```SQL
+KILL QUERY <ConnectionId>;
+```
+
+## Analyze Big Query Logs
+
+From v3.0 onwards, StarRocks supports Big Query Logs, which are stored in the file **fe/log/fe.big_query.log**. Compared to the StarRocks audit logs, Big Query Logs print three additional fields:
+
+- `bigQueryLogCPUSecondThreshold`
+- `bigQueryLogScanBytesThreshold`
+- `bigQueryLogScanRowsThreshold`
+
+These three fields correspond to the resource consumption thresholds you defined to determine whether a query is a big query.
+
+To enable Big Query Logs, execute the following statement:
+
+```SQL
+SET GLOBAL enable_big_query_log = true;
+```
+
+After Big Query Logs are enabled, you can then define the rules that trigger Big Query Logs.
+
+- Specify the CPU time threshold for triggering Big Query Logs.
+
+  The following example sets the CPU time threshold to `600` seconds:
+
+  ```SQL
+  SET GLOBAL big_query_log_cpu_second_threshold = 600;
+  ```
+
+- Specify the scan data size threshold for triggering Big Query Logs.
+
+  The following example sets the scan data size threshold to `10737418240` bytes (10 GB):
+
+  ```SQL
+  SET GLOBAL big_query_log_scan_bytes_threshold = 10737418240;
+  ```
+
+- Specify the scan row count threshold for triggering Big Query Logs.
+
+  The following example sets the scan row count threshold to `1500000000`:
+
+  ```SQL
+  SET GLOBAL big_query_log_scan_rows_threshold = 1500000000;
+  ```
+
+## Fine-tune precautions
+
+From the statistics obtained from real-time monitoring and Big Query Logs, you can study the patterns of the omitted big queries (or regular queries that are mistakenly diagnosed as big queries) in your cluster, and then optimize the settings for resource groups and the query queue.
+
+If a notable proportion of big queries conform to a certain SQL pattern, and you want to permanently forbid this SQL pattern, you can add the pattern to the SQL Blacklist.
+StarRocks rejects all queries that match any patterns specified in the SQL Blacklist, and returns an error. For more information, see [Manage SQL Blacklist](../../administration/management/resource_management/Blacklist.md).
+
+To enable SQL Blacklist, execute the following statement:
+
+```SQL
+ADMIN SET FRONTEND CONFIG ("enable_sql_blacklist" = "true");
+```
+
+Then you can add the regular expression that represents the SQL pattern to the SQL Blacklist using [ADD SQLBLACKLIST](../../sql-reference/sql-statements/cluster-management/sql_blacklist/ADD_SQLBLACKLIST.md).
+
+The following example adds `COUNT(DISTINCT)` to the SQL Blacklist. Note that the parentheses must be escaped because the pattern is a regular expression:
+
+```SQL
+ADD SQLBLACKLIST "SELECT COUNT\\(DISTINCT .+\\) FROM .+";
+```
diff --git a/docs/en/administration/management/monitoring/Monitor_and_Alert.md b/docs/en/administration/management/monitoring/Monitor_and_Alert.md
new file mode 100644
index 0000000..b5d5773
--- /dev/null
+++ b/docs/en/administration/management/monitoring/Monitor_and_Alert.md
@@ -0,0 +1,913 @@
+---
+displayed_sidebar: docs
+---
+
+# Monitor and Alert with Prometheus and Grafana
+
+StarRocks provides a monitoring and alerting solution using Prometheus and Grafana. This allows you to visualize the running state of your cluster, facilitating monitoring and troubleshooting.
+
+## Overview
+
+StarRocks provides a Prometheus-compatible information collection interface. Prometheus can retrieve the metric information of StarRocks by connecting to the HTTP ports of BE and FE nodes, and store the information in its own time-series database. Grafana can then use Prometheus as a data source to visualize the metric information. By using the dashboard templates provided by StarRocks, you can easily monitor your StarRocks cluster and set alerts for it with Grafana.
+
+![MA-1](../../../_assets/monitor/monitor1.png)
+
+Follow these steps to integrate your StarRocks cluster with Prometheus and Grafana:
+
+1. Install the necessary components - Prometheus and Grafana.
+2. Understand the core monitoring metrics of StarRocks.
+3. Set up alert channels and alert rules.
+
+## Step 1: Install Monitoring Components
+
+The default ports of Prometheus and Grafana do not conflict with those of StarRocks. However, for production, it is recommended to deploy them on a different server from that of your StarRocks clusters. This reduces the risk of resource conflicts and avoids potential alert failures caused by an abnormal server shutdown.
+
+Additionally, please note that Prometheus and Grafana cannot monitor the availability of their own services. Therefore, in a production environment, it is recommended to use Supervisor to set up a heartbeat service for them.
+
+The following tutorial deploys monitoring components on the monitoring node (IP: 192.168.110.23) using the root OS user. They monitor the following StarRocks cluster (which uses default ports). When setting up a monitoring service for your own StarRocks cluster based on this tutorial, you only need to replace the IP addresses.
+
+| **Host** | **IP**          | **OS user** | **Services** |
+| -------- | --------------- | ----------- | ------------ |
+| node01   | 192.168.110.101 | root        | 1 FE + 1 BE  |
+| node02   | 192.168.110.102 | root        | 1 FE + 1 BE  |
+| node03   | 192.168.110.103 | root        | 1 FE + 1 BE  |
+
+> **NOTE**
+>
+> Prometheus and Grafana can only monitor FE, BE, and CN nodes, not Broker nodes.
+
+### 1.1 Deploy Prometheus
+
+#### 1.1.1 Download Prometheus
+
+For StarRocks, you only need to download the installation package of the Prometheus server. Download the package to the monitoring node.
+
+[Click here to download Prometheus](https://prometheus.io/download/).
+
+Taking the LTS version v2.45.0 as an example, click the package to download it.
+
+![MA-2](../../../_assets/monitor/monitor2.png)
+
+Alternatively, you can download it using the `wget` command:
+
+```Bash
+# The following example downloads the LTS version v2.45.0.
+# You can download other versions by replacing the version number in the command.
+wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
+```
+
+After the download is complete, upload or copy the installation package to the directory **/opt** on the monitoring node.
+
+#### 1.1.2 Install Prometheus
+
+1. Navigate to **/opt** and decompress the Prometheus installation package.
+
+   ```Bash
+   cd /opt
+   tar xvf prometheus-2.45.0.linux-amd64.tar.gz
+   ```
+
+2. For ease of management, rename the decompressed directory to **prometheus**.
+
+   ```Bash
+   mv prometheus-2.45.0.linux-amd64 prometheus
+   ```
+
+3. Create a data storage path for Prometheus.
+
+   ```Bash
+   mkdir prometheus/data
+   ```
+
+4. For ease of management, you can create a system service startup file for Prometheus.
+
+   ```Bash
+   vim /etc/systemd/system/prometheus.service
+   ```
+
+   Add the following content to the file:
+
+   ```Properties
+   [Unit]
+   Description=Prometheus service
+   After=network.target
+
+   [Service]
+   User=root
+   Type=simple
+   ExecReload=/bin/sh -c "/bin/kill -1 `/usr/bin/pgrep prometheus`"
+   ExecStop=/bin/sh -c "/bin/kill -9 `/usr/bin/pgrep prometheus`"
+   ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --storage.tsdb.path=/opt/prometheus/data --storage.tsdb.retention.time=30d --storage.tsdb.retention.size=30GB
+
+   [Install]
+   WantedBy=multi-user.target
+   ```
+
+   Then, save and exit the editor.
+
+   > **NOTE**
+   >
+   > If you deploy Prometheus under a different path, please make sure to synchronize the path in the ExecStart command in the file above. Additionally, the file configures the expiration conditions for Prometheus data storage as "30 days or more" or "greater than 30 GB". You can modify this according to your needs.
+
+5. Modify the Prometheus configuration file **prometheus/prometheus.yml**. This file has strict requirements for the format of the content. Please pay special attention to spaces and indentation when making modifications.
+
+   ```Bash
+   vim prometheus/prometheus.yml
+   ```
+
+   Add the following content to the file:
+
+   ```YAML
+   global:
+     scrape_interval: 15s # Set the global scrape interval to 15s. The default is 1 min.
+     evaluation_interval: 15s # Set the global rule evaluation interval to 15s. The default is 1 min.
+   scrape_configs:
+     - job_name: 'StarRocks_Cluster01' # A cluster being monitored corresponds to a job. You can customize the StarRocks cluster name here.
+       metrics_path: '/metrics' # Specify the Restful API for retrieving monitoring metrics.
+       static_configs:
+         # The following configuration specifies an FE group, which includes 3 FE nodes.
+         # Here, you need to fill in the IP and HTTP ports corresponding to each FE.
+         # If you modified the HTTP ports during cluster deployment, make sure to adjust them accordingly.
+         - targets: ['192.168.110.101:8030','192.168.110.102:8030','192.168.110.103:8030']
+           labels:
+             group: fe
+         # The following configuration specifies a BE group, which includes 3 BE nodes.
+         # Here, you need to fill in the IP and HTTP ports corresponding to each BE.
+         # If you modified the HTTP ports during cluster deployment, make sure to adjust them accordingly.
+         - targets: ['192.168.110.101:8040','192.168.110.102:8040','192.168.110.103:8040']
+           labels:
+             group: be
+   ```
+
+   :::note
+   Please note that Prometheus is unable to detect the service changes (`targets`) after the cluster has been scaled in or out. For example, for clusters deployed on AWS, you can grant the EC2 instance that hosts the Prometheus service the `ec2:DescribeInstances` and `ec2:DescribeTags` permissions, and add the `ec2_sd_configs` and `relabel_configs` properties to **prometheus/prometheus.yml**. For detailed instructions, see [Appendix - Enable Service Detection for Prometheus](#enable-service-detection-for-prometheus).
+   :::
+
+   After you have modified the configuration file, you can use `promtool` to verify whether the modification is valid.
+
+   ```Bash
+   ./prometheus/promtool check config prometheus/prometheus.yml
+   ```
+
+   The following prompt indicates that the check has passed. You can then proceed.
+
+   ```Bash
+   SUCCESS: prometheus/prometheus.yml is valid prometheus config file syntax
+   ```
+
+6. Start Prometheus.
+
+   ```Bash
+   systemctl daemon-reload
+   systemctl start prometheus.service
+   ```
+
+7. Check the status of Prometheus.
+
+   ```Bash
+   systemctl status prometheus.service
+   ```
+
+   If `Active: active (running)` is returned, it indicates that Prometheus has started successfully.
+
+   You can also use `netstat` to check the status of the default Prometheus port (9090).
+
+   ```Bash
+   netstat -nltp | grep 9090
+   ```
+
+8. Set Prometheus to start on boot.
+
+   ```Bash
+   systemctl enable prometheus.service
+   ```
+
+**Other commands**:
+
+- Stop Prometheus.
+
+  ```Bash
+  systemctl stop prometheus.service
+  ```
+
+- Restart Prometheus.
+
+  ```Bash
+  systemctl restart prometheus.service
+  ```
+
+- Reload configurations at runtime.
+
+  ```Bash
+  systemctl reload prometheus.service
+  ```
+
+- Disable start on boot.
+
+  ```Bash
+  systemctl disable prometheus.service
+  ```
+
+#### 1.1.3 Access Prometheus
+
+You can access the Prometheus Web UI through a browser, and the default port is 9090. For the monitoring node in this tutorial, you need to visit `192.168.110.23:9090`.
+
+On the Prometheus homepage, navigate to **Status** --> **Targets** in the top menu. Here, you can see all the monitored nodes for each group job configured in the **prometheus.yml** file. Usually, the status of all nodes should be UP, indicating that the service communication is normal.
+
+![MA-3](../../../_assets/monitor/monitor3.jpeg)
+
+At this point, Prometheus is configured and set up. For more detailed information, you can refer to the [Prometheus Documentation](https://prometheus.io/docs/).
+
+### 1.2 Deploy Grafana
+
+#### 1.2.1 Download Grafana
+
+[Click here to download Grafana](https://grafana.com/grafana/download).
+
+Alternatively, you can use the `wget` command to download the Grafana RPM installation package.
+
+```Bash
+# The following example downloads the LTS version v10.0.3.
+# You can download other versions by replacing the version number in the command.
+wget https://dl.grafana.com/enterprise/release/grafana-enterprise-10.0.3-1.x86_64.rpm
+```
+
+#### 1.2.2 Install Grafana
+
+1. Use the `yum` command to install Grafana. This command will automatically install the dependencies required for Grafana.
+
+   ```Bash
+   yum -y install grafana-enterprise-10.0.3-1.x86_64.rpm
+   ```
+
+2. Start Grafana.
+
+   ```Bash
+   systemctl start grafana-server.service
+   ```
+
+3. Check the status of Grafana.
+
+   ```Bash
+   systemctl status grafana-server.service
+   ```
+
+   If `Active: active (running)` is returned, it indicates that Grafana has started successfully.
+
+   You can also use `netstat` to check the status of the default Grafana port (3000).
+
+   ```Bash
+   netstat -nltp | grep 3000
+   ```
+
+4. Set Grafana to start on boot.
+
+   ```Bash
+   systemctl enable grafana-server.service
+   ```
+
+**Other commands**:
+
+- Stop Grafana.
+
+  ```Bash
+  systemctl stop grafana-server.service
+  ```
+
+- Restart Grafana.
+
+  ```Bash
+  systemctl restart grafana-server.service
+  ```
+
+- Disable start on boot.
+
+  ```Bash
+  systemctl disable grafana-server.service
+  ```
+
+For more information, refer to the [Grafana Documentation](https://grafana.com/docs/grafana/latest/).
+
+#### 1.2.3 Access Grafana
+
+You can access the Grafana Web UI through a browser, and the default port is 3000. For the monitoring node in this tutorial, you need to visit `192.168.110.23:3000`. The default username and password required for login are both set to `admin`. Upon the initial login, Grafana will prompt you to change the default login password. If you want to skip this for now, you can click `Skip`. Then, you will be redirected to the Grafana Web UI homepage.
+
+![MA-4](../../../_assets/monitor/monitor4.png)
+
+#### 1.2.4 Configure data sources
+
+Click on the menu button in the upper-left corner, expand **Administration**, and then click **Data sources**.
+
+![MA-5](../../../_assets/monitor/monitor5.png)
+
+On the page that appears, click **Add data source**, and then choose **Prometheus**.
+
+![MA-6](../../../_assets/monitor/monitor6.png)
+
+![MA-7](../../../_assets/monitor/monitor7.png)
+
+To integrate Grafana with your Prometheus service, you need to modify the following configuration:
+
+- **Name**: The name of the data source. You can customize the name for the data source.
+
+  ![MA-8](../../../_assets/monitor/monitor8.png)
+
+- **Prometheus Server URL**: The URL of the Prometheus server, which, in this tutorial, is `http://192.168.110.23:9090`.
+
+  ![MA-9](../../../_assets/monitor/monitor9.png)
+
+After the configuration is complete, click **Save & Test** to save and test the configuration. If **Successfully queried the Prometheus API** is displayed, it means the data source is accessible.
+
+![MA-10](../../../_assets/monitor/monitor10.png)
+
+#### 1.2.5 Configure Dashboard
+
+1. Download the corresponding Dashboard template based on your StarRocks version.
+
+   - [Dashboard template for All Architecture](https://releases.starrocks.io/resources/Dashboard-All-Arch-20260113.json)
+   - [Dashboard template for Shared-data Cluster - General](https://releases.starrocks.io/resources/Dashboard-Shared-data-General-3.5.json)
+   - [Dashboard template for Shared-data Cluster - Starlet](https://releases.starrocks.io/resources/Dashboard-Shared-data-Starlet-3.5.json)
+
+   > **NOTE**
+   >
+   > The template file needs to be uploaded through the Grafana Web UI. Therefore, you need to download the template file to the machine you use to access Grafana, not the monitoring node itself.
+
+2. Configure the Dashboard template.
+
+   Click on the menu button in the upper-left corner and click **Dashboards**.
+
+   ![MA-11](../../../_assets/monitor/monitor11.png)
+
+   On the page that appears, expand the **New** button and click **Import**.
+
+   ![MA-12](../../../_assets/monitor/monitor12.png)
+
+   On the new page, click on **Upload Dashboard JSON file** and upload the template file you downloaded earlier.
+
+   ![MA-13](../../../_assets/monitor/monitor13.png)
+
+   After uploading the file, you can rename the Dashboard. By default, it is named `StarRocks Overview`. Then, select the data source you created earlier (`starrocks_monitor`) and click **Import**.
+
+   ![MA-14](../../../_assets/monitor/monitor14.png)
+
+   After the import is complete, you should see the StarRocks Dashboard displayed.
+
+   ![MA-15](../../../_assets/monitor/monitor15.png)
+
+#### 1.2.6 Monitor StarRocks via Grafana
+
+Log in to the Grafana Web UI, click on the menu button in the upper-left corner, and click **Dashboards**.
+
+![MA-16](../../../_assets/monitor/monitor16.png)
+
+On the page that appears, select **StarRocks Overview** from the **General** directory.
+
+![MA-17](../../../_assets/monitor/monitor17.png)
+
+After you enter the StarRocks monitoring Dashboard, you can manually refresh the page in the upper-right corner or set the automatic refresh interval for monitoring the StarRocks cluster status.
+
+![MA-18](../../../_assets/monitor/monitor18.png)
+
+## Step 2: Understand the core monitoring metrics
+
+To accommodate the needs of development, operations, DBA, and more, StarRocks provides a wide range of monitoring metrics. This section only introduces some important metrics commonly used in business and their alert rules. For other metric details, please refer to [Monitoring Metrics](./metrics.md).
+
+### 2.1 Metrics for FE and BE status
+
+| **Metric** | **Description** | **Alert rule** | **Note** |
+| ---------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
+| Frontends Status | FE node status. The status of a live node is represented by `1`, while a node that is down (DEAD) will be displayed as `0`. | The status of all FE nodes should be alive, and any FE node with a status of DEAD should trigger an alert. | The failure of any FE or BE node is considered critical, and it requires prompt troubleshooting to identify the cause of failure. |
+| Backends Status | BE node status. The status of a live node is represented by `1`, while a node that is down (DEAD) will be displayed as `0`. | The status of all BE nodes should be alive, and any BE node with a status of DEAD should trigger an alert. | |
+
+### 2.2 Metrics for query failure
+
+| **Metric** | **Description** | **Alert rule** | **Note** |
+| ----------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
+| Query Error | The query failure (including timeout) rate within one minute. Its value is calculated as the number of failed queries in one minute divided by 60 seconds. | You can configure this based on the actual QPS of your business. 0.05, for example, can be used as a preliminary setting. You can adjust it later as needed. | Usually, the query failure rate should be kept low. Setting this threshold to 0.05 means allowing a maximum of 3 failed queries per minute. If you receive an alert from this item, you can check resource utilization or configure the query timeout appropriately. |
+
+### 2.3 Metrics for external operation failure
+
+| **Metric** | **Description** | **Alert rule** | **Note** |
+| ------------- | ----------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
+| Schema Change | The Schema Change operation failure rate. | Schema Change is a low-frequency operation. You can set this item to send an alert immediately upon failure. | Usually, Schema Change operations should not fail. If an alert is triggered for this item, you can consider increasing the memory limit of Schema Change operations, which is set to 2 GB by default. |
+
+### 2.4 Metrics for internal operation failure
+
+| **Metric** | **Description** | **Alert rule** | **Note** |
+| ------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
+| BE Compaction Score | The highest Compaction Score among all BE nodes, indicating the current compaction pressure. | In typical offline scenarios, this value is usually lower than 100. However, when there are a large number of loading tasks, the Compaction Score may increase significantly. In most cases, intervention is required when this value exceeds 800. | Usually, if the Compaction Score is greater than 1000, StarRocks will return an error "Too many versions". In such cases, you may consider reducing the loading concurrency and frequency. |
+| Clone | The tablet clone operation failure rate. | You can set this item to send an alert immediately upon failure. | If an alert is triggered for this item, you can check the status of BE nodes, disk status, and network status. |
+
+### 2.5 Metrics for service availability
+
+| **Metric** | **Description** | **Alert rule** | **Note** |
+| -------------- | ------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
+| Meta Log Count | The number of BDB metadata log entries on the FE node. | It is recommended to configure this item to trigger an immediate alert if it exceeds 100,000. | By default, the Leader FE node triggers a checkpoint to flush the log to disk when the number of logs exceeds 50,000. If this value exceeds 50,000 by a large margin, it usually indicates a checkpoint failure. You can check whether the Xmx heap memory configuration is reasonable in **fe.conf**. |
+
+### 2.6 Metrics for system load
+
+| **Metric** | **Description** | **Alert rule** | **Note** |
+| -------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
+| BE CPU Idle | CPU idle rate of the BE node. | It is recommended to configure this item to trigger an alert if the idle rate is lower than 10% for 30 consecutive seconds. | This item is used to monitor CPU resource bottlenecks. CPU usage can fluctuate significantly, and setting a small polling interval may result in false alerts. Therefore, you need to adjust this item based on the actual business conditions. If you have multiple batch processing tasks or a large number of queries, you may consider setting a lower threshold. |
+| BE Mem | Memory usage of the BE node. | It is recommended to configure this item to 90% of the available memory size for each BE. | This value is equivalent to the value of Process Mem, and BE's default memory limit is 90% of the server's memory size (controlled by the configuration item `mem_limit` in **be.conf**). If you have deployed other services on the same server, be sure to adjust this value to avoid OOM. The alert threshold for this item should be set to 90% of BE's actual memory limit so that you can confirm whether BE memory resources have reached a bottleneck. |
| It is recommended to configure this item to 90% of the available memory size for each BE. | This value is equivalent to the value of Process Mem, and BE's default memory limit is 90% of the server's memory size (controlled by configuration `mem_limit` in **be.conf**). If you have deployed other services on the same server, be sure to adjust this value to avoid OOM. The alert threshold for this item should be set to 90% of BE's actual memory limit so that you can confirm whether BE memory resources have reached a bottleneck. |
+| Disks Avail Capacity | Available disk space ratio (percentage) of the local disks on each BE node. | It is recommended to configure this item to trigger an alert if the value is less than 20%. | It is recommended to reserve sufficient available space for StarRocks based on your business requirements. |
+| FE JVM Heap Stat | JVM heap memory usage percentage for each FE node in the cluster. | It is recommended to configure this item to trigger an alert if the value is greater than or equal to 80%. | If an alert is triggered for this item, it is recommended to increase the Xmx heap memory configuration in **fe.conf**; otherwise, it may affect query efficiency or lead to FE OOM issues. |
+
+## Step 3: Configure alert via Email
+
+### 3.1 Configure SMTP service
+
+Grafana supports various alerting solutions, such as email and webhooks. This tutorial uses email as an example.
+
+To enable email alerting, you first need to configure SMTP information in Grafana, allowing Grafana to send emails to your mailbox. Most commonly used email providers support SMTP services, and you need to enable SMTP service for your email account and obtain an authorization code.
+
+After completing these steps, modify the Grafana configuration file on the node where Grafana is deployed.
+
+```bash
+vim /usr/share/grafana/conf/defaults.ini
+```
+
+Example:
+
+```Properties
+###################### SMTP / Emailing #####################
+[smtp]
+enabled = true
+host =
+user = johndoe@gmail.com
+# If the password contains # or ; you have to wrap it with triple quotes. Ex: """#password;"""
+password = ABCDEFGHIJKLMNOP # The authorization password obtained after enabling SMTP.
+cert_file =
+key_file =
+skip_verify = true ## Whether to skip SSL verification for the SMTP server.
+from_address = johndoe@gmail.com ## Address used when sending out emails.
+from_name = Grafana
+ehlo_identity =
+startTLS_policy =
+
+[emails]
+welcome_email_on_sign_up = false
+templates_pattern = emails/*.html, emails/*.txt
+content_types = text/html
+```
+
+You need to modify the following configuration items:
+
+- `enabled`: Whether to allow Grafana to send email alerts. Set this item to `true`.
+- `host`: The SMTP server address and port for your email, separated by a colon (`:`). Example: `smtp.gmail.com:465`.
+- `user`: SMTP username.
+- `password`: The authorization password obtained after enabling SMTP.
+- `skip_verify`: Whether to skip SSL verification for the SMTP server. Set this item to `true`.
+- `from_address`: The email address used to send alert emails.
+
+After the configuration is complete, restart Grafana.
+
+```bash
+systemctl daemon-reload
+systemctl restart grafana-server.service
+```
+
+### 3.2 Create alert channel
+
+You need to create an alert channel (Contact Point) in Grafana to specify how to notify contacts when an alert is triggered.
+
+1. Log in to the Grafana Web UI, click on the menu button in the upper-left corner, expand **Alerting**, and select **Contact Points**. On the **Contact points** page, click **Add contact point** to create a new alert channel.
+
+   ![MA-19](../../../_assets/monitor/monitor19.png)
+
+2. In the **Name** field, customize the name of the contact point. Then, in the **Integration** dropdown list, select **Email**.
+
+   ![MA-20](../../../_assets/monitor/monitor20.png)
+
+3. In the **Addresses** field, enter the email addresses of the contacts to receive the alert. If there are multiple email addresses, separate the addresses using semicolons (`;`), commas (`,`), or line breaks.
+
+   The configurations on the page can be left with their default values except for the following two items:
+
+   - **Single email**: When enabled, if there are multiple contacts, the alert will be sent to them through a single email. It's recommended to enable this item.
+   - **Disable resolved message**: By default, when the issue causing the alert is resolved, Grafana sends another notification notifying the service recovery. If you don't need this recovery notification, you can disable this item. It's not recommended to disable this option.
+
+4. After the configuration is complete, click the **Test** button in the upper-right corner of the page. In the prompt that appears, click **Send test notification**. If your SMTP service and address configuration are correct, the target email account should receive a test email with the subject "TestAlert Grafana". Once you confirm that you can receive the test alert email successfully, click the **Save contact point** button at the bottom of the page to complete the configuration.
+
+   ![MA-21](../../../_assets/monitor/monitor21.png)
+
+   ![MA-22](../../../_assets/monitor/monitor22.png)
+
+You can configure multiple notification methods for each contact point through **Add contact point integration**, which will not be detailed here. For more details about Contact Points, see the [Grafana Documentation](https://grafana.com/docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/notifications/contact-points/).
+
+For subsequent demonstration, let's assume that in this step, you have created two contact points, "StarRocksDev" and "StarRocksOp", using different email addresses.
+
+### 3.3 Set notification policies
+
+Grafana uses notification policies to associate contact points with alert rules. Notification policies use matching labels to provide a flexible way to route different alerts to different contacts, allowing for alert grouping during O&M.
+
+1. Log in to the Grafana Web UI, click on the menu button in the upper-left corner, expand **Alerting**, and select **Notification policies**.
+
+   ![MA-23](../../../_assets/monitor/monitor23.png)
+
+2. On the **Notification policies** page, click the more (**...**) icon to the right of **Default policy** and click **Edit** to modify the Default policy.
+
+   ![MA-24](../../../_assets/monitor/monitor24.png)
+
+   ![MA-25](../../../_assets/monitor/monitor25.png)
+
+   Notification policies use a tree-like structure, and the Default policy represents the default root policy for notification. When no other policies are set, all alert rules will default to matching this policy. It will then use the default contact point configured within it for notifications.
+
+   1. In the **Default contact point** field, select the contact point you created previously, for example, "StarRocksOp".
+
+   2. **Group by** is a key concept in Grafana Alerting, grouping alert instances with similar characteristics into a single funnel. This tutorial does not involve grouping, and you can use the default setting.
+
+      ![MA-26](../../../_assets/monitor/monitor26.png)
+
+   3. Expand the **Timing options** field and configure **Group wait**, **Group interval**, and **Repeat interval**.
+
+      - **Group wait**: The time to wait before sending the initial notification after a new alert creates a new group. Defaults to 30 seconds.
+      - **Group interval**: The minimum interval between two batches of notifications for an existing group. Defaults to 5 minutes, which means that notifications will not be sent to the group any sooner than 5 minutes after the previous batch was delivered, regardless of how frequently the underlying alert rules are evaluated.
+      - **Repeat interval**: The waiting time before a notification that has already been delivered is sent again if the alert is still firing. Defaults to 4 hours.
+
+      You can configure the parameters as shown below so that Grafana will send the alert by these rules: 0 seconds (Group wait) after the **alert conditions are met**, Grafana will send the first alert email. After that, Grafana will re-send the alert every 1 minute (Group interval + Repeat interval).
+
+      ![MA-27](../../../_assets/monitor/monitor27.png)
+
+      > **NOTE**
+      >
+      > The previous paragraph says "after the alert conditions are met" rather than "after the alert threshold is reached" because, to avoid false alerts, it is recommended to trigger the alert only after the threshold has been breached for a certain duration.
+
+3. After the configuration is complete, click **Update default policy**.
+
+4. If you need to create a nested policy, click on **New nested policy** on the **Notification policies** page.
+
+   Nested policies use labels to define matching rules. The labels defined in a nested policy can be used as conditions to match when configuring alert rules later. The following example configures a label as `Group=Development_team`.
+
+   ![MA-28](../../../_assets/monitor/monitor28.png)
+
+   In the **Contact point** field, select "StarRocksDev". This way, when configuring alert rules with the label `Group=Development_team`, "StarRocksDev" is set to receive the alerts.
+
+   You can have the nested policy inherit the timing options from the parent policy. After the configuration is complete, click **Save policy** to save the policy.
+
+   ![MA-29](../../../_assets/monitor/monitor29.png)
+
+If you are interested in the details of notification policies or if your business has more complex alerting scenarios, you can refer to the [Grafana Documentation](https://grafana.com/docs/grafana-cloud/alerting-and-irm/alerting/fundamentals/notifications/contact-points/) for more information.
+
+### 3.4 Define alert rules
+
+After setting up notification policies, you also need to define alert rules for StarRocks.
+
+Log in to the Grafana Web UI, and search for and navigate to the previously configured StarRocks Overview Dashboard.
+
+![MA-30](../../../_assets/monitor/monitor30.png)
+
+![MA-31](../../../_assets/monitor/monitor31.jpeg)
+
+#### 3.4.1 FE and BE status alert rule
+
+For a StarRocks cluster, the status of all FE and BE nodes must be alive. Any node with a status of DEAD should trigger an alert.
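+
+Before wiring alerts to these panels, you can sanity-check the underlying expression directly in Prometheus. The following is a minimal sketch, assuming the Job Name `StarRocks_Cluster01` used elsewhere in this tutorial:
+
+```Plain
+# One series per node; 1 means the node is alive, 0 means it is DEAD.
+up{job="StarRocks_Cluster01", group="fe"}
+up{job="StarRocks_Cluster01", group="be"}
+```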
+ +The following example uses the Frontends Status and Backends Status metrics under StarRocks Overview to monitor FE and BE status. As you can configure multiple StarRocks clusters in Prometheus, note that the Frontends Status and Backends Status metrics are for all clusters that you have registered. + +##### Configure the alert rule for FE + +Follow these procedures to configure alerts for **Frontends Status**: + +1. Click on the More (...) icon to the right of the **Frontends Status** monitoring item, and click **Edit**. + + ![MA-32](../../../_assets/monitor/monitor32.jpeg) + +2. On the new page, choose **Alert**, then click **Create alert rule** from this panel to enter the rule creation page. + + ![MA-33](../../../_assets/monitor/monitor33.jpeg) + +3. Set the rule name in the **Rule name** field. The default value is the title of the monitoring metric. If you have multiple clusters, you can add the cluster name as a prefix for differentiation, for example, "[PROD]Frontends Status". + + ![MA-34](../../../_assets/monitor/monitor34.jpeg) + +4. Configure the alert rule as follows. + + 1. Choose **Grafana managed alert**. + 2. For section **B**, modify the rule as `(up{group="fe"})`. + 3. Click on the delete icon on the right of section **A** to remove section **A**. + 4. For section **C**, modify the **Input** field to **B**. + 5. For section **D**, modify the condition to `IS BELOW 1`. + + After completing these settings, the page will appear as shown below: + + ![MA-35](../../../_assets/monitor/monitor35.jpeg) + +
+   <details>
+   <summary>Click to view detailed instructions</summary>
+
+   Configuring alert rules in Grafana typically involves three steps:
+
+   1. Retrieve the metric values from Prometheus through PromQL queries. PromQL is a data query DSL developed by Prometheus, and it is also used in the JSON templates of Dashboards. The `expr` property of each monitoring item corresponds to the respective PromQL. You can click **Run queries** on the rule settings page to view the query results.
+   2. Apply functions and modes to process the result data from the above queries. Usually, you need to use the Last function to retrieve the latest value and use Strict mode to ensure that if the returned value is non-numeric data, it can be displayed as `NaN`.
+   3. Set rules for the processed query results. Taking FE as an example, if the FE node status is alive, the output result is `1`. If the FE node is down, the result is `0`. Therefore, you can set the rule to `IS BELOW 1`, meaning an alert will be triggered when this condition occurs.
+
+   </details>
+
+5. Set up alert evaluation rules.
+
+   According to the Grafana documentation, you need to configure how often alert rules are evaluated and how long the abnormal state must last before the rule starts firing. In simple terms, this involves configuring "how often to check the alert rules" and "how long the abnormal state must persist after detection before triggering the alert (to avoid false alerts caused by transient spikes)". Each Evaluation group contains an independent evaluation interval to determine the frequency of checking the alert rules. You can create a new folder named **PROD** specifically for the StarRocks production cluster and create a new Evaluation group `01` within it. Then, configure this group to check every `10` seconds, and trigger the alert if the anomaly persists for `30` seconds.
+
+   ![MA-36](../../../_assets/monitor/monitor36.png)
+
+   > **NOTE**
+   >
+   > The previously mentioned "Disable resolved message" option in the alert channel configuration section, which controls the timing of sending emails for cluster service recovery, is also influenced by the "Evaluate every" parameter above. In other words, when Grafana performs a new check and detects that the service has recovered, it sends an email to notify the contacts.
+
+6. Add alert annotations.
+
+   In the **Add details for your alert rule** section, click **Add annotation** to configure the content of the alert email. Please note not to modify the **Dashboard UID** and **Panel ID** fields.
+
+   ![MA-37](../../../_assets/monitor/monitor37.jpeg)
+
+   In the **Choose** drop-down list, select **Description**, and add the descriptive content for the alert email, for example, "FE node in your StarRocks production cluster failed, please check!"
+
+7. Match notification policies.
+
+   Specify the notification policy for the alert rule. By default, all alert rules match the Default policy. When the alert condition is met, Grafana will use the "StarRocksOp" contact point in the Default policy to send alert messages to the configured email group.
+
+   ![MA-38](../../../_assets/monitor/monitor38.jpeg)
+
+   If you want to use a nested policy, set the **Label** field to the corresponding nested policy, for example, `Group=Development_team`.
+
+   Example:
+
+   ![MA-39](../../../_assets/monitor/monitor39.jpeg)
+
+   When the alert condition is met, emails will be sent to "StarRocksDev" instead of "StarRocksOp" in the Default policy.
+
+8. Once all configurations are complete, click **Save rule and exit**.
+
+   ![MA-40](../../../_assets/monitor/monitor40.jpeg)
+
+##### Test alert trigger
+
+You can manually stop an FE node to test the alert. At this point, the heart-shaped symbol to the right of Frontends Status will change from green to yellow and then to red.
+
+**Green**: Indicates that during the last periodic check, the status of each instance of the metric item was normal, and no alert was triggered. The green status does not guarantee that the current node is in normal status. There may be a delay in the status change after a node service anomaly, but the delay is typically a matter of seconds rather than minutes.
+
+![MA-41](../../../_assets/monitor/monitor41.png)
+
+**Yellow**: Indicates that during the last periodic check, an instance of the metric item was found abnormal, but the abnormal state duration has not yet reached the "Duration" configured above. At this point, Grafana will not send an alert and will continue periodic checks until the abnormal state duration reaches the configured "Duration". During this period, if the status is restored, the symbol will change back to green.
+
+![MA-42](../../../_assets/monitor/monitor42.jpeg)
+
+**Red**: When the abnormal state duration reaches the configured "Duration", the symbol changes to red, and Grafana will send an email alert. The symbol will remain red until the abnormal state is resolved, at which point it will change back to green.
+
+![MA-43](../../../_assets/monitor/monitor43.jpeg)
+
+##### Manually pause alerts
+
+If the anomaly requires an extended period to resolve, or alerts are continuously triggered for reasons other than an anomaly, you can temporarily pause the evaluation of the alert rule to prevent Grafana from persistently sending alert emails.
+
+Navigate to the Alert tab corresponding to the metric item on the Dashboard and click the edit icon:
+
+![MA-44](../../../_assets/monitor/monitor44.png)
+
+In the **Alert Evaluation Behavior** section, toggle the **Pause Evaluation** switch to the ON position.
+
+![MA-45](../../../_assets/monitor/monitor45.png)
+
+> **NOTE**
+>
+> After pausing the evaluation, you will receive an email notifying you that the service is restored.
+
+##### Configure the alert rule for BE
+
+You can follow the above process to configure alert rules for BE.
+
+Edit the configuration for the metric item Backends Status:
+
+1. In the **Set an alert rule name** section, configure the name as "[PROD]Backends Status".
+2. In the **Set a query and alert condition** section, set the PromQL to `(up{group="be"})`, and use the same settings as those in the FE alert rule for other items.
+3. In the **Alert evaluation behavior** section, choose the **PROD** directory and Evaluation group **01** created earlier, and set the duration to 30 seconds.
+4. In the **Add details for your alert rule** section, click **Add annotation**, select **Description**, and input the alert content, for example, "BE node in your StarRocks production cluster failed, please check! Stack information for BE failure will be printed in the BE log file **be.out**. You can identify the cause based on the logs".
+5. In the **Notifications** section, configure **Labels** the same as the FE alert rule. If Labels are not configured, Grafana will use the Default policy and send alert emails to the "StarRocksOp" alert channel.
+
+#### 3.4.2 Query alert rule
+
+The metric item for query failures is **Query Error** under **Query Statistic**.
+
+Configure the alert rule for the metric item "Query Error" as follows:
+
+1. In the **Set an alert rule name** section, configure the name as "[PROD] Query Error".
+2. In the **Set a query and alert condition** section, remove section **B**. Set **Input** in section **A** to **C**. In section **C**, use the default value for PromQL, which is `rate(starrocks_fe_query_err{job="StarRocks_Cluster01"}[1m])`, representing the number of failed queries within the last minute divided by 60 seconds. This includes both failed queries and queries that exceeded the timeout limit. Then, in section **D**, configure the rule as `A IS ABOVE 0.05`.
+3. In the **Alert evaluation behavior** section, choose the **PROD** directory and Evaluation group **01** created earlier, and set the duration to 30 seconds.
+4. In the **Add details for your alert rule** section, click **Add annotation**, select **Description**, and input the alert content, for example, "High query failure rate, please check the resource usage or configure query timeout reasonably. If queries are failing due to timeouts, you can adjust the query timeout by setting the system variable `query_timeout`".
+5. In the **Notifications** section, configure **Labels** the same as the FE alert rule. If Labels are not configured, Grafana will use the Default policy and send alert emails to the "StarRocksOp" alert channel.
+
+#### 3.4.3 User operation failure alert rule
+
+This item monitors the rate of Schema Change operation failures, corresponding to the metric item **Schema Change** under **BE tasks**. It should be configured to alert when greater than 0.
+
+1. In the **Set an alert rule name** section, configure the name as "[PROD] Schema Change".
+2. In the **Set a query and alert condition** section, remove section **A**. Set **Input** in section **C** to **B**. In section **B**, use the default value for PromQL, which is `irate(starrocks_be_engine_requests_total{job="StarRocks_Cluster01", type="create_rollup", status="failed"}[1m])`, representing the number of failed Schema Change tasks within the last minute divided by 60 seconds. Then, in section **D**, configure the rule as `C IS ABOVE 0`.
+3. In the **Alert evaluation behavior** section, choose the **PROD** directory and Evaluation group **01** created earlier, and set the duration to 30 seconds.
+4. In the **Add details for your alert rule** section, click **Add annotation**, select **Description**, and input the alert content, for example, "Failed Schema Change tasks detected, please check promptly. You can increase the memory limit available for Schema Change by adjusting the BE configuration parameter `memory_limitation_per_thread_for_schema_change`, which is set to 2GB by default".
+5. In the **Notifications** section, configure **Labels** the same as the FE alert rule. If Labels are not configured, Grafana will use the Default policy and send alert emails to the "StarRocksOp" alert channel.
+
+#### 3.4.4 StarRocks operation failure alert rule
+
+##### BE Compaction Score
+
+This item corresponds to **BE Compaction Score** under **Cluster Overview**, and is used to monitor the compaction pressure on the cluster.
+
+1. In the **Set an alert rule name** section, configure the name as "[PROD] BE Compaction Score".
+2. In the **Set a query and alert condition** section, configure the rule in section **C** as `B IS ABOVE 800`, in line with the guidance in section 2.4 that intervention is usually required when the Compaction Score exceeds 800. You can use default values for other items.
+3. In the **Alert evaluation behavior** section, choose the **PROD** directory and Evaluation group **01** created earlier, and set the duration to 30 seconds.
+4. In the **Add details for your alert rule** section, click **Add annotation**, select **Description**, and input the alert content, for example, "High compaction pressure. Please check whether there are high-frequency or high concurrency loading tasks and reduce the loading frequency. If the cluster has sufficient CPU, memory, and I/O resources, consider adjusting the cluster compaction strategy".
+5. In the **Notifications** section, configure **Labels** the same as the FE alert rule. If Labels are not configured, Grafana will use the Default policy and send alert emails to the "StarRocksOp" alert channel.
+
+##### Clone
+
+This item corresponds to **Clone** in **BE tasks** and is mainly used to monitor replica balancing or replica repair operations within StarRocks, which usually should not fail.
+
+1. In the **Set an alert rule name** section, configure the name as "[PROD] Clone".
+2. In the **Set a query and alert condition** section, remove section **A**. Set **Input** in section **C** to **B**. In section **B**, use the default value for PromQL, which is `irate(starrocks_be_engine_requests_total{job="StarRocks_Cluster01", type="clone", status="failed"}[1m])`, representing the number of failed Clone tasks within the last minute divided by 60 seconds. Then, in section **D**, configure the rule as `C IS ABOVE 0`.
+3. In the **Alert evaluation behavior** section, choose the **PROD** directory and Evaluation group **01** created earlier, and set the duration to 30 seconds.
+4. In the **Add details for your alert rule** section, click **Add annotation**, select **Description**, and input the alert content, for example, "Detected a failure in the clone task. Please check the cluster BE status, disk status, and network status".
+5. In the **Notifications** section, configure **Labels** the same as the FE alert rule. If Labels are not configured, Grafana will use the Default policy and send alert emails to the "StarRocksOp" alert channel.
+
+#### 3.4.5 Service availability alert rule
+
+This item monitors the metadata log count in BDB, corresponding to the **Meta Log Count** monitoring item under **Cluster Overview**.
+
+1. In the **Set an alert rule name** section, configure the name as "[PROD] Meta Log Count".
+2. In the **Set a query and alert condition** section, configure the rule in section **C** as `B IS ABOVE 100000`. You can use default values for other items.
+3. In the **Alert evaluation behavior** section, choose the **PROD** directory and Evaluation group **01** created earlier, and set the duration to 30 seconds.
+4. In the **Add details for your alert rule** section, click **Add annotation**, select **Description**, and input the alert content, for example, "Detected that the metadata count in FE BDB is significantly higher than the expected value, which can indicate a failed Checkpoint operation. Please check whether the Xmx heap memory configuration in the FE configuration file **fe.conf** is reasonable".
+5. In the **Notifications** section, configure **Labels** the same as the FE alert rule. If Labels are not configured, Grafana will use the Default policy and send alert emails to the "StarRocksOp" alert channel.
+
+#### 3.4.6 System overload alert rule
+
+##### BE CPU Idle
+
+This item monitors the CPU idle rate on BE nodes.
+
+1. In the **Set an alert rule name** section, configure the name as "[PROD] BE CPU Idle".
+2. In the **Set a query and alert condition** section, configure the rule in section **C** as `B IS BELOW 10`. You can use default values for other items.
+3. In the **Alert evaluation behavior** section, choose the **PROD** directory and Evaluation group **01** created earlier, and set the duration to 30 seconds.
+4. In the **Add details for your alert rule** section, click **Add annotation**, select **Description**, and input the alert content, for example, "Detected that BE CPU load is consistently high. It will impact other tasks in the cluster. Please check whether the cluster is abnormal or if there is a CPU resource bottleneck".
+5. In the **Notifications** section, configure **Labels** the same as the FE alert rule. If Labels are not configured, Grafana will use the Default policy and send alert emails to the "StarRocksOp" alert channel.
+
+##### BE Memory
+
+This item corresponds to **BE Mem** under **BE**, monitoring the memory usage on BE nodes.
+
+1. In the **Set an alert rule name** section, configure the name as "[PROD] BE Mem".
+2. In the **Set a query and alert condition** section, configure the PromQL as `starrocks_be_process_mem_bytes{job="StarRocks_Cluster01"}/($be_mem_limit*1024*1024*1024)`, where `$be_mem_limit` needs to be replaced with the current BE node's available memory limit in GB, that is, the server's memory size multiplied by the value of the BE configuration item `mem_limit`. Example: `starrocks_be_process_mem_bytes{job="StarRocks_Cluster01"}/(49*1024*1024*1024)`. Then, in section **C**, configure the rule as `B IS ABOVE 0.9`.
+3. In the **Alert evaluation behavior** section, choose the **PROD** directory and Evaluation group **01** created earlier, and set the duration to 30 seconds.
+4. In the **Add details for your alert rule** section, click **Add annotation**, select **Description**, and input the alert content, for example, "Detected that BE memory usage is consistently high. To prevent query failure, please consider expanding memory size or adding BE nodes".
+5. In the **Notifications** section, configure **Labels** the same as the FE alert rule. If Labels are not configured, Grafana will use the Default policy and send alert emails to the "StarRocksOp" alert channel.
+
+##### Disks Avail Capacity
+
+This item corresponds to **Disk Usage** under **BE**, monitoring the remaining space ratio in the directory where the BE storage path is located.
+
+1. In the **Set an alert rule name** section, configure the name as "[PROD] Disks Avail Capacity".
+2. In the **Set a query and alert condition** section, configure the rule in section **C** as `B IS BELOW 0.2`. You can use default values for other items.
+3. In the **Alert evaluation behavior** section, choose the **PROD** directory and Evaluation group **01** created earlier, and set the duration to 30 seconds.
+4. In the **Add details for your alert rule** section, click **Add annotation**, select **Description**, and input the alert content, for example, "Detected that BE disk available space is below 20%, please release disk space or expand the disk".
+5. In the **Notifications** section, configure **Labels** the same as the FE alert rule. If Labels are not configured, Grafana will use the Default policy and send alert emails to the "StarRocksOp" alert channel.
+
+##### FE JVM Heap Stat
+
+This item corresponds to **Cluster FE JVM Heap Stat** under **Overview**, monitoring the proportion of FE's JVM memory usage to FE heap memory limit.
+
+1. In the **Set an alert rule name** section, configure the name as "[PROD] FE JVM Heap Stat".
+2. In the **Set a query and alert condition** section, configure the rule in section **C** as `B IS ABOVE 80`. You can use default values for other items.
+3. In the **Alert evaluation behavior** section, choose the **PROD** directory and Evaluation group **01** created earlier, and set the duration to 30 seconds.
+4. In the **Add details for your alert rule** section, click **Add annotation**, select **Description**, and input the alert content, for example, "Detected that FE heap memory usage is high, please adjust the heap memory limit in the FE configuration file **fe.conf**".
+5. In the **Notifications** section, configure **Labels** the same as the FE alert rule. If Labels are not configured, Grafana will use the Default policy and send alert emails to the "StarRocksOp" alert channel.
+
+## Appendix
+
+### Enable Service Detection for Prometheus
+
+You can enable Service Detection for Prometheus so that it can automatically detect the services (nodes) after the cluster is scaled in or out.
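+
+If your deployment is not on a cloud platform that Prometheus can query for instances, file-based service discovery achieves a similar effect: Prometheus re-reads a target file periodically, so scaling only requires editing that file. The following is a minimal sketch; the file path and node addresses are hypothetical:
+
+```Yaml
+scrape_configs:
+  - job_name: 'StarRocks_Cluster01'
+    metrics_path: '/metrics'
+    file_sd_configs:
+      - files:
+          - '/etc/prometheus/starrocks_targets.yml'
+        refresh_interval: 1m
+```
+
+```Yaml
+# /etc/prometheus/starrocks_targets.yml
+- labels:
+    group: fe
+  targets: ['192.168.0.1:8030']
+- labels:
+    group: be
+  targets: ['192.168.0.2:8040']
+```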
+ +:::note +The following section uses AWS as an example. +::: + +1. Grant the EC2 instance that hosts your Prometheus service the following permissions using IAM Policy: + + ```JSON + { + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Action": [ + "ec2:DescribeInstances", + "ec2:DescribeTags" + ], + "Resource": "*" + } + ] + } + ``` + + For detailed instructions of authentication to AWS resources, see [Authenticate to AWS resources](../../../integrations/authenticate_to_aws_resources.md). + + With these permissions, Prometheus is able to list the instances and their tags in the region. + +2. Add the `ec2_sd_configs` and `relabel_configs` sections to **prometheus/prometheus.yml**. + + Example: + + ```Yaml + global: + scrape_interval: 15s # Set the global scrape interval to 15s. The default is 1 min. + evaluation_interval: 15s # Set the global rule evaluation interval to 15s. The default is 1 min. + scrape_configs: + - job_name: 'StarRocks_Cluster01' + metrics_path: '/metrics' + # highlight-start + ec2_sd_configs: + - region: us-west-2 + port: 8030 + filters: + - name: tag:ClusterName + values: ['test-stage-20251021'] + - name: tag:ProcessType + values: ['FE'] + - region: us-west-2 + port: 8040 + filters: + - name: tag:ClusterName + values: ['test-stage-20251021'] + - name: tag:ProcessType + values: ['BE'] + relabel_configs: + - source_labels: [__meta_ec2_tag_ClusterName] + regex: test-stage-20251021 + target_label: cluster + replacement: test-stage-20251021 + - source_labels: [__meta_ec2_tag_ProcessType] + regex: FE + target_label: group + replacement: fe + - source_labels: [__meta_ec2_tag_ProcessType] + regex: BE + target_label: group + replacement: be + # highlight-end + ``` + +## Q&A + +### Q: Why can't the Dashboard detect anomalies? + +A: Grafana Dashboard relies on the system time of the server it is hosted on to fetch values for monitoring items. If the Grafana Dashboard page remains unchanged after a cluster anomaly, you can check if the servers' system clock is synchronized and then perform cluster time calibration. + +### Q: How can I implement alert grading? + +A: Taking the Query Error item as an example, you can create two alert rules for it with different alert thresholds. For example: + +- **Risk Level**: Set the failure rate greater than 0.05, indicating a risk. Send the alert to the development team. +- **Severity Level**: Set the failure rate greater than 0.20, indicating a severity. At this point, the alert notification will be sent to both the development and operations teams simultaneously. + +### Q: How can I retrieve more detailed metrics, including table-level metrics, materialized view metrics, and connection statistics with user labels? + +A: By default, the `/metrics` endpoint collects metrics in a minified mode to minimize performance impact. To retrieve detailed metrics, you need to add specific parameters to the request and provide Basic Authentication credentials for a user with ADMIN privileges. + +**Supported Parameters:** + +- `with_table_metrics=all`: Collects all table-level metrics. +- `with_materialized_view_metrics=all`: Collects all materialized view metrics. +- `with_user_connections=all`: Collects connection statistics categorized by user labels. + +**Authentication Requirement:** + +These parameters take effect only when the request includes valid Basic Authentication credentials for an ADMIN user. 
If the request is anonymous or the user lacks ADMIN privileges, these parameters are ignored, and only default metrics are returned.
+
+**Example Curl Command:**
+
+```bash
+curl -u <username>:<password> \
+"http://<fe_host>:<fe_http_port>/metrics?with_table_metrics=all&with_materialized_view_metrics=all&with_user_connections=all"
+```
+
+**Prometheus Configuration Example:**
+
+To enable detailed metric collection in Prometheus, configure `params` and `basic_auth` in your `prometheus.yml`:
+
+```yaml
+scrape_configs:
+  - job_name: 'StarRocks_Detailed_Metrics'
+    metrics_path: '/metrics'
+    params:
+      with_table_metrics: ['all']
+      with_materialized_view_metrics: ['all']
+      with_user_connections: ['all']
+    basic_auth:
+      username: '<username>'
+      password: '<password>'
+    static_configs:
+      - targets: ['<fe_host>:<fe_http_port>']
+```
+
+:::note
+
+Collecting all table and materialized view metrics may increase the load on the FE node. Use these parameters with caution in large-scale environments.
+
+:::
diff --git a/docs/en/administration/management/monitoring/alert.md b/docs/en/administration/management/monitoring/alert.md
new file mode 100644
index 0000000..2bdc656
--- /dev/null
+++ b/docs/en/administration/management/monitoring/alert.md
@@ -0,0 +1,759 @@
+---
+displayed_sidebar: docs
+toc_max_heading_level: 4
+---
+
+# Manage Alerts
+
+This topic introduces various alert items from different dimensions, including business continuity, cluster availability, and machine load, and provides corresponding resolutions.
+
+:::note
+
+In the following examples, all variables are prefixed with `$`. They should be replaced according to your business environment. For example, `$job_name` should be replaced with the corresponding Job Name in the Prometheus configuration, and `$fe_leader` should be replaced with the IP address of the Leader FE.
+
+:::
+
+## Service Suspension Alerts
+
+### FE Service Suspension
+
+**PromSQL**
+
+```Plain
+count(up{group="fe", job="$job_name"}) >= 3
+```
+
+**Alert Description**
+
+An alert is triggered when the number of active FE nodes falls below a specified value. You can adjust this value based on the actual number of FE nodes.
+
+**Resolution**
+
+Try to restart the suspended FE node.
+
+### BE Service Suspension
+
+**PromSQL**
+
+```Plain
+node_info{type="be_node_num", job="$job_name",state="dead"} > 1
+```
+
+**Alert Description**
+
+An alert is triggered when more than one BE node is suspended.
+
+**Resolution**
+
+Try to restart the suspended BE node.
+
+## Machine Load Alerts
+
+### BE CPU Alert
+
+**PromSQL**
+
+```Plain
+(1-(sum(rate(starrocks_be_cpu{mode="idle", job="$job_name",instance=~".*"}[5m])) by (job, instance)) / (sum(rate(starrocks_be_cpu{job="$job_name",host=~".*"}[5m])) by (job, instance))) * 100 > 90
+```
+
+**Alert Description**
+
+An alert is triggered when BE CPU Utilization exceeds 90%.
+
+**Resolution**
+
+Check whether there are large queries or large-scale data loading and forward the details to the support team for further investigation.
+
+1. Use the `top` command to check resource usage by processes.
+
+   ```Bash
+   top -Hp $be_pid
+   ```
+
+2. Use the `perf` command to collect and analyze performance data.
+
+   ```Bash
+   # Execute the command for 1-2 minutes, and terminate it by pressing CTRL+C.
+   sudo perf top -p $be_pid -g >/tmp/perf.txt
+   ```
+
+:::note
+
+In emergencies, to quickly restore service, you can try to restart the corresponding BE node after preserving the stack. An emergency here refers to a situation where the BE node's CPU utilization remains abnormally high, and no effective means are available to reduce CPU usage.
+
+:::
+
+### Memory Alert
+
+**PromSQL**
+
+```Plain
+(1-node_memory_MemAvailable_bytes{instance=~".*"}/node_memory_MemTotal_bytes{instance=~".*"})*100 > 90
+```
+
+**Alert Description**
+
+An alert is triggered when memory usage exceeds 90%.
+
+**Resolution**
+
+Refer to the [Get Heap Profile](https://github.com/StarRocks/starrocks/pull/35322) for troubleshooting.
+
+:::note
+
+- In emergencies, you can try to restart the corresponding BE service to restore the service. An emergency here refers to a situation where the BE node's memory usage remains abnormally high, and no effective means are available to reduce memory usage.
+- If other mixed-deployed services are affecting the system, you may consider terminating those services in emergencies.
+
+:::
+
+### Disk Alerts
+
+#### Disk Load Alert
+
+**PromSQL**
+
+```SQL
+rate(node_disk_io_time_seconds_total{instance=~".*"}[1m]) * 100 > 90
+```
+
+**Alert Description**
+
+An alert is triggered when disk load exceeds 90%.
+
+**Resolution**
+
+If the cluster triggers a `node_disk_io_time_seconds_total` alert, first check if there are any business changes. If so, consider rolling back the changes to maintain the previous resource balance. If no changes are identified or rollback is not possible, consider whether normal business growth is driving the need for resource expansion. You can use the `iotop` tool to analyze disk I/O usage. `iotop` has a UI similar to `top` and includes information such as `pid`, `user`, and I/O.
+
+You can also use the following SQL query to identify the tablets consuming significant I/O and trace them back to specific tasks and tables.
+
+```SQL
+-- "all" indicates all services. 10 indicates the collection lasts 10 seconds. 3 indicates fetching the top 3 results.
+ADMIN EXECUTE ON $backend_id 'System.print(ExecEnv.io_profile_and_get_topn_stats("all", 10, 3))';
+```
+
+#### Root Path Capacity Alert
+
+**PromSQL**
+
+```SQL
+node_filesystem_free_bytes{mountpoint="/"} /1024/1024/1024 < 5
+```
+
+**Alert Description**
+
+An alert is triggered when the available space in the root directory is less than 5GB.
+
+**Resolution**
+
+Common directories that may occupy significant space include **/var**, **/opt**, and **/tmp**. Use the following command to check for large files and clear unnecessary files.
+
+```Bash
+du -sh / --max-depth=1
+```
+
+#### Data Disk Capacity Alert
+
+**PromSQL**
+
+```Bash
+(SUM(starrocks_be_disks_total_capacity{job="$job"}) by (host, path) - SUM(starrocks_be_disks_avail_capacity{job="$job"}) by (host, path)) / SUM(starrocks_be_disks_total_capacity{job="$job"}) by (host, path) * 100 > 90
+```
+
+**Alert Description**
+
+An alert is triggered when disk capacity utilization exceeds 90%.
+
+**Resolution**
+
+1. Check if there have been changes in the loaded data volume.
+
+   Monitor the `load_bytes` metric in Grafana. If there has been a significant increase in data loading volume, you may need to scale the system resources.
+
+2. Check for any DROP operations.
+
+   If data loading volume has not changed much, run `SHOW BACKENDS`. If the reported disk usage does not match the actual usage, check the FE Audit Log for recent DROP DATABASE, TABLE, or PARTITION operations.
+
+   Metadata for these operations remains in FE memory for one day, allowing you to restore data using the RECOVER statement within 24 hours to avoid misoperations.
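+
+   For example, a minimal sketch of recovering an accidentally dropped table (`example_db.example_tbl` is a hypothetical name):
+
+   ```SQL
+   -- Restore a table that was dropped within the last 24 hours.
+   RECOVER TABLE example_db.example_tbl;
+   ```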
+
+   After recovery, the actual disk usage may exceed what is shown in `SHOW BACKENDS`.
+
+   The retention period of deleted data in memory can be adjusted using the FE dynamic parameter `catalog_trash_expire_second` (default value: 86400).
+
+   ```Bash
+   ADMIN SET FRONTEND CONFIG ("catalog_trash_expire_second"="86400");
+   ```
+
+   To persist this change, add the configuration item to the FE configuration file **fe.conf**.
+
+   After that, deleted data will be moved to the **trash** directory on BE nodes (`$storage_root_path/trash`). By default, deleted data is kept in the **trash** directory for one day, which may also result in the actual disk usage exceeding what is shown in `SHOW BACKENDS`.
+
+   The retention time of deleted data in the **trash** directory can be adjusted using the BE dynamic parameter `trash_file_expire_time_sec` (default value: 86400).
+
+   ```Bash
+   curl http://$be_ip:$be_http_port/api/update_config?trash_file_expire_time_sec=86400
+   ```
+
+#### FE Metadata Disk Capacity Alert
+
+**PromSQL**
+
+```Bash
+node_filesystem_free_bytes{mountpoint="${meta_path}"} /1024/1024/1024 < 10
+```
+
+**Alert Description**
+
+An alert is triggered when the available disk space for FE metadata is less than 10GB.
+
+**Resolution**
+
+Use the following commands to check for directories occupying large amounts of space and clear unnecessary files. The metadata path is specified by the `meta_dir` configuration in **fe.conf**.
+
+```Bash
+du -sh ${meta_dir} --max-depth=1
+```
+
+If the metadata directory occupies a lot of space, it is usually because the **bdb** directory is large, possibly due to CheckPoint failure. Refer to the [CheckPoint Failure Alert](#checkpoint-failure-alert) for troubleshooting. If this method does not solve the issue, contact the technical support team.
+
+## Cluster Service Exception Alerts
+
+### Compaction Failure Alerts
+
+#### Cumulative Compaction Failure Alert
+
+**PromSQL**
+
+```Bash
+increase(starrocks_be_engine_requests_total{job="$job_name" ,status="failed",type="cumulative_compaction"}[1m]) > 3
+increase(starrocks_be_engine_requests_total{job="$job_name" ,status="failed",type="base_compaction"}[1m]) > 3
+```
+
+**Alert Description**
+
+An alert is triggered when there are three failures in Cumulative Compaction or Base Compaction within the last minute.
+
+**Resolution**
+
+Search the log of the corresponding BE node for the following keywords to identify the involved tablet.
+
+```Bash
+grep -E 'compaction' be.INFO | grep failed
+```
+
+A log record like the following indicates a Compaction failure.
+
+```Plain
+W0924 17:52:56.537041 123639 compaction_task.cpp:193] compaction task:8482. tablet:8423674 failed.
+```
+
+You can check the context of the log to analyze the failure. Typically, the failure may have been caused by a DROP TABLE or PARTITION operation during the Compaction process. The system has an internal retry mechanism for Compaction, and you can also manually set the tablet's status to BAD and trigger a Clone task to repair it.
+
+:::note
+
+Before performing the following operation, ensure that the table has at least three complete replicas.
+ +::: + +```Bash +ADMIN SET REPLICA STATUS PROPERTIES("tablet_id" = "$tablet_id", "backend_id" = "$backend_id", "status" = "bad"); +``` + +#### High Compaction Pressure Alert + +**PromSQL** + +```Bash +starrocks_fe_max_tablet_compaction_score{job="$job_name",instance="$fe_leader"} > 100 +``` + +**Alert Description** + +An alert is triggered when the highest Compaction Score exceeds 100, indicating high Compaction pressure. + +**Resolution** + +This alert is typically caused by frequent loading, `INSERT INTO VALUES`, or `DELETE` operations (at a rate of 1 per second). It is recommended to set the interval between loading or DELETE tasks to more than 5 seconds and avoid submitting high concurrency DELETE tasks. + +#### Exceeding Version Count Alert + +**PromSQL** + +```Bash +starrocks_be_max_tablet_rowset_num{job="$job_name"} > 700 +``` + +**Alert Description** + +An alert is triggered when a tablet on a BE node has more than 700 data versions. + +**Resolution** + +Use the following command to check the tablet with excessive versions: + +```SQL +SELECT BE_ID,TABLET_ID FROM information_schema.be_tablets WHERE NUM_ROWSET>700; +``` + +Example for Tablet with ID `2889156`: + +```SQL +SHOW TABLET 2889156; +``` + +Execute the command returned in the `DetailCmd` field: + +```SQL +SHOW PROC '/dbs/2601148/2889154/partitions/2889153/2889155/2889156'; +``` + +![show proc replica](../../../_assets/alert_show_proc_3.png) + +Under normal circumstances, as shown, all three replicas should be in `NORMAL` status, and other metrics like `RowCount` and `DataSize` should remain consistent. If only one replica exceeds the version limit of 700, you can trigger a Clone task based on other replicas using the following command: + +```SQL +ADMIN SET REPLICA STATUS PROPERTIES("tablet_id" = "$tablet_id", "backend_id" = "$backend_id", "status" = "bad"); +``` + +If two or more replicas exceed the version limit, you can temporarily increase the version count limit: + +```Bash +# Replace be_ip with the IP of the BE node which stores the tablet that exceeds the version limit. +# The default be_http_port is 8040. +# The default value of tablet_max_versions is 1000. +curl -XPOST http://$be_ip:$be_http_port/api/update_config?tablet_max_versions=2000 +``` + +### CheckPoint Failure Alert + +**PromSQL** + +```Bash +starrocks_fe_meta_log_count{job="$job_name",instance="$fe_master"} > 100000 +``` + +**Alert Description** + +An alert is triggered when the FE node's BDB log count exceeds 100,000. By default, the system performs a CheckPoint when the BDB log count exceeds 50,000, and then resets the count to 0. + +**Resolution** + +This alert indicates that a CheckPoint was not performed. You need to investigate the FE logs to analyze the CheckPoint process and resolve the issue: + +In the **fe.log** of the Leader FE node, search for records like `begin to generate new image: image.xxxx`. If found, it means the system has started generating a new image. Continue checking the logs for records like `checkpoint finished save image.xxxx` to confirm successful image creation. If you find `Exception when generate new image file`, the image generation failed. You should carefully handle the metadata based on the specific error. It is recommended to contact the support team for further analysis. + +### Excessive FE Thread Count Alert + +**PromSQL** + +```Bash +starrocks_fe_thread_pool{job="$job_name", type!="completed_task_count"} > 3000 +``` + +**Alert Description** + +An alert is triggered when the number of threads on the FE exceeds 3000. 
+
+**Resolution**
+
+The default thread count limit for FE and BE nodes is 4096. A large number of UNION ALL queries typically leads to an excessive thread count. It is recommended to reduce the concurrency of UNION ALL queries and adjust the system variable `pipeline_dop`. If it is not possible to adjust SQL query granularity, you can globally adjust `pipeline_dop`:

```SQL
SET GLOBAL pipeline_dop=8;
```

:::note

In emergencies, to restore services quickly, you can increase the FE dynamic parameter `thrift_server_max_worker_threads` (default value: 4096).

```SQL
ADMIN SET FRONTEND CONFIG ("thrift_server_max_worker_threads"="8192");
```

:::

### High FE JVM Usage Alert

**PromSQL**

```SQL
sum(jvm_heap_size_bytes{job="$job_name", type="used"}) * 100 / sum(jvm_heap_size_bytes{job="$job_name", type="max"}) > 90
```

**Alert Description**

An alert is triggered when the JVM usage on an FE node exceeds 90%.

**Resolution**

This alert indicates that JVM usage is too high. You can use the `jmap` command to analyze the situation. Since detailed monitoring information for this metric is still under development, direct insights are limited. Perform the following actions and send the results to the support team for analysis:

```Bash
# Note that specifying `live` in the command may cause FE to restart.
jmap -histo[:live] $fe_pid > jmap.dump
```

:::note

In emergencies, to quickly restore services, you can restart the corresponding FE node or increase the JVM (Xmx) size and then restart the FE service.

:::

## Service Availability Alerts

### Loading Exception Alerts

#### Loading Failure Alert

**PromSQL**

```SQL
rate(starrocks_fe_txn_failed{job="$job_name",instance="$fe_master"}[5m]) * 100 > 5
```

**Alert Description**

An alert is triggered when the number of failed loading transactions exceeds 5% of the total.

**Resolution**

Check the logs of the Leader FE node to find information about the loading errors. Search for the keyword `status: ABORTED` to identify failed loading tasks.

```Plain
2024-04-09 18:34:02.363+08:00 INFO (thrift-server-pool-8845163|12111749) [DatabaseTransactionMgr.abortTransaction():1279] transaction:[TransactionState. txn_id: 7398864, label: 967009-2f20a55e-368d-48cf-833a-762cf1fe07c5, db id: 10139, table id list: 155532, callback id: 967009, coordinator: FE: 192.168.2.1, transaction status: ABORTED, error replicas num: 0, replica ids: , prepare time: 1712658795053, commit time: -1, finish time: 1712658842360, total cost: 47307ms, reason: [E1008]Reached timeout=30000ms @192.168.1.1:8060 attachment: RLTaskTxnCommitAttachment [filteredRows=0, loadedRows=0, unselectedRows=0, receivedBytes=1033110486, taskExecutionTimeMs=0, taskId=TUniqueId(hi:3395895943098091727, lo:-8990743770681178171), jobId=967009, progress=KafkaProgress [partitionIdToOffset=2_1211970882|7_1211893755]]] successfully rollback
```

#### Routine Load Consumption Delay Alert

**PromSQL**

```SQL
(sum by (job_name)(starrocks_fe_routine_load_max_lag_of_partition{job="$job_name",instance="$fe_master"})) > 300000
starrocks_fe_routine_load_jobs{job="$job_name",host="$fe_master",state="NEED_SCHEDULE"} > 3
starrocks_fe_routine_load_jobs{job="$job_name",host="$fe_master",state="PAUSED"} > 0
starrocks_fe_routine_load_jobs{job="$job_name",host="$fe_master",state="UNSTABLE"} > 0
```

**Alert Description**

- An alert is triggered when over 300,000 entries are delayed in consumption.
+- An alert is triggered when the number of pending Routine Load tasks exceeds 3. +- An alert is triggered when there are tasks in the `PAUSED` state. +- An alert is triggered when there are tasks in the `UNSTABLE` state. + +**Resolution** + +1. First, check if the Routine Load task status is `RUNNING`. + + ```SQL + SHOW ROUTINE LOAD FROM $db; + ``` + + Pay attention to the `State` field in the returned data. + +2. If any Routine Load task is in the `PAUSED` state, examine the `ReasonOfStateChanged`, `ErrorLogUrls`, and `TrackingSQL` fields. Typically, executing the SQL query in `TrackingSQL` can reveal the specific error. + + Example: + + ![Tracking SQL](../../../_assets/alert_routine_load_tracking.png) + +3. If the Routine Load task status is `RUNNING`, you can try to increase the task’s concurrency. The concurrency of individual Routine Load jobs is determined by the minimum value of the following four parameters: + + - `kafka_partition_num`: Number of partitions in the Kafka Topic. + - `desired_concurrent_number`: The set concurrency for the task. + - `alive_be_num`: Number of live BE nodes. + - `max_routine_load_task_concurrent_num`: FE configuration parameter, with a default value of 5. + +In most cases, you may need to adjust the task’s concurrency or the number of Kafka Topic partitions (contact Kafka support if necessary). + +The following example shows how to set concurrency for the task. + +```SQL +ALTER ROUTINE LOAD FOR ${routine_load_jobname} +PROPERTIES +( + "desired_concurrent_number" = "5" +); +``` + +#### Loading Transaction Limit Alert for a Single Database + +**PromSQL** + +```SQL +sum(starrocks_fe_txn_running{job="$job_name"}) by(db) > 900 +``` + +**Alert Description** + +An alert is triggered when the number of loading transactions for a single database exceeds 900 (100 in versions prior to v3.1). + +**Resolution** + +This alert is typically triggered by a large number of newly added loading tasks. You can temporarily increase the limit on loading transactions for a single database. + +```SQL +ADMIN SET FRONTEND CONFIG ("max_running_txn_num_per_db" = "2000"); +``` + +### Query Exception Alerts + +#### Query Latency Alert + +**PromSQL** + +```SQL +starrocks_fe_query_latency_ms{job="$job_name", quantile="0.95"} > 5000 +``` + +**Alert Description** + +An alert is triggered when the P95 query latency exceeds 5 seconds. + +**Resolution** + +1. Investigate whether there are any big queries. + + Check whether large queries have consumed significant machine resources during the exception, leading to other queries timing out or failing. + + - Execute `show proc '/current_queries';` to view the `QueryId` of big queries. If you need to quickly restore service, you can use the `KILL` command to terminate the long-running queries. 
+
+     ```SQL
+     mysql> SHOW PROC '/current_queries';
+     +--------------------------------------+--------------+------------+------+-----------+----------------+----------------+------------------+----------+
+     | QueryId                              | ConnectionId | Database   | User | ScanBytes | ProcessRows    | CPUCostSeconds | MemoryUsageBytes | ExecTime |
+     +--------------------------------------+--------------+------------+------+-----------+----------------+----------------+------------------+----------+
+     | 7c56495f-ae8b-11ed-8ebf-00163e00accc | 4            | tpcds_100g | root | 37.88 MB  | 1075769 Rows   | 11.13 Seconds  | 146.70 MB        | 3804     |
+     | 7d543160-ae8b-11ed-8ebf-00163e00accc | 6            | tpcds_100g | root | 13.02 GB  | 487873176 Rows | 81.23 Seconds  | 6.37 GB          | 2090     |
+     +--------------------------------------+--------------+------------+------+-----------+----------------+----------------+------------------+----------+
+     2 rows in set (0.01 sec)
+     ```
+
+   - You can also restart the BE nodes with high CPU utilization to resolve the issue.
+
+2. Check if the machine resources are sufficient.
+
+   Verify whether CPU, memory, Disk I/O, and network traffic during the exception are normal. If anomalies are detected, investigate the root cause by examining peak traffic variations and cluster resource usage. If the issue persists, consider restarting the affected node.
+
+:::note
+
+In emergencies, you can resolve the issue by:
+
+- Reducing business traffic and restarting the affected BE node if a sudden traffic spike caused resource overuse and query failure.
+- Expanding node capacity if high resource usage is due to normal operations.
+
+:::
+
+#### Query Failure Alert
+
+**PromSQL**
+
+```Plain
+sum by (job,instance)(starrocks_fe_query_err_rate{job="$job_name"}) * 100 > 10
+
+# This PromSQL is supported from v3.1.15, v3.2.11, and v3.3.3 onwards.
+increase(starrocks_fe_query_internal_err{job="$job_name"}[1m]) > 10
+```
+
+**Alert Description**
+
+An alert is triggered when the query failure rate exceeds 0.1/second or 10 failed queries occur within one minute.
+
+**Resolution**
+
+When this alert is triggered, check the logs to identify the queries that failed.
+
+```Bash
+grep 'State=ERR' fe.audit.log
+```
+
+If you have the AuditLoader plugin installed, you can locate the corresponding queries using the following query.
+
+```SQL
+SELECT stmt FROM starrocks_audit_db__.starrocks_audit_tbl__ WHERE state='ERR';
+```
+
+Note that queries that fail due to syntax errors or timeouts are also recorded in `starrocks_fe_query_err_rate`.
+
+For query failures caused by kernel issues, search the `fe.log` for the error and obtain the complete stack trace and [Query Dump](../../../faq/Dump_query.md), and contact the support team for troubleshooting.
+
+#### Query Overload Alert
+
+**PromSQL**
+
+```Bash
+abs((sum by (exported_job)(rate(starrocks_fe_query_total{process="FE",job="$job_name"}[3m]))-sum by (exported_job)(rate(starrocks_fe_query_total{process="FE",job="$job_name"}[3m] offset 1m)))/sum by (exported_job)(rate(starrocks_fe_query_total{process="FE",job="$job_name"}[3m]))) * 100 > 100
+abs((sum(starrocks_fe_connection_total{job="$job_name"})-sum(starrocks_fe_connection_total{job="$job_name"} offset 3m))/sum(starrocks_fe_connection_total{job="$job_name"})) * 100 > 100
+```
+
+**Alert Description**
+
+An alert is triggered when the QPS or the number of connections increases by 100% within the last minute.
+
+**Resolution**
+
+Check whether the high-frequency queries in the `fe.audit.log` are expected.
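+
+As a rough way to spot the dominant query patterns, you can aggregate the audit log by statement prefix. The following is a sketch, assuming the default pipe-delimited `fe.audit.log` format with a `Stmt=` field:
+
+```Bash
+# Count the 20 most frequent statement prefixes (first 80 characters) in the audit log.
+grep -oP 'Stmt=\K[^|]*' fe.audit.log | cut -c1-80 | sort | uniq -c | sort -rn | head -20
+```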
+If there are legitimate changes in business behavior (for example, new services going live or increased data volumes), monitor machine load and scale BE nodes as needed.
+
+#### User Connection Limit Exceeded Alert
+
+**PromQL**
+
+```Plain
+sum(starrocks_fe_connection_total{job="$job_name"}) by(user) > 90
+```
+
+**Alert Description**
+
+An alert is triggered when the number of user connections exceeds 90. (User connection limits are supported from versions v3.1.16, v3.2.12, and v3.3.4 onward.)
+
+**Resolution**
+
+Use the SQL command `SHOW PROCESSLIST` to check if the number of current connections is as expected. You can terminate unexpected connections using the `KILL` command. Additionally, ensure that frontend services are not holding connections open for too long, and consider adjusting the system variable `wait_timeout` (Unit: Seconds) to accelerate the system's automatic termination of idle connections.
+
+```SQL
+SET wait_timeout = 3600;
+```
+
+:::note
+
+In emergencies, you can increase the user connection limit temporarily to restore service:
+
+- For v3.1.16, v3.2.12, and v3.3.4 or later:
+
+  ```SQL
+  ALTER USER 'jack' SET PROPERTIES ("max_user_connections" = "1000");
+  ```
+
+- For v2.5 and earlier:
+
+  ```SQL
+  SET PROPERTY FOR 'jack' 'max_user_connections' = '1000';
+  ```
+
+:::
+
+### Schema Change Exception Alert
+
+**PromQL**
+
+```Plain
+increase(starrocks_be_engine_requests_total{job="$job_name",type="schema_change", status="failed"}[1m]) > 1
+```
+
+**Alert Description**
+
+An alert is triggered when more than one Schema Change task fails in the last minute.
+
+**Resolution**
+
+Run the following statement to check if the `Msg` field contains any error messages:
+
+```SQL
+SHOW ALTER TABLE COLUMN FROM $db;
+```
+
+If no message is found, search for the JobId from the previous step in the Leader FE logs to retrieve the context.
+
+- Schema Change Out of Memory
+
+  If the Schema Change fails due to insufficient memory, search the **be.WARNING** logs for `failed to process the version`, `failed to process the schema change from tablet`, or `Memory of schema change task exceeded limit` to identify log records like the following:
+
+  ```Plain
+  fail to execute schema change: Memory of schema change task exceed limit. DirectSchemaChange Used: 2149621304, Limit: 2147483648. You can change the limit by modify BE config [memory_limitation_per_thread_for_schema_change]
+  ```
+
+  The memory limit error is typically caused by exceeding the 2 GB memory limit for a single Schema Change, which is controlled by the BE dynamic parameter `memory_limitation_per_thread_for_schema_change`. You can modify this parameter to resolve the issue.
+
+  ```Bash
+  curl -XPOST http://be_host:http_port/api/update_config?memory_limitation_per_thread_for_schema_change=8
+  ```
+
+- Schema Change Timeout
+
+  Except for adding columns, which is a lightweight implementation, most Schema Changes involve creating a large number of new tablets, rewriting the original data, and completing the operation via SWAP. If tablet creation times out, you will see errors like the following:
+
+  ```Plain
+  Create replicas failed. Error: Error replicas:21539953=99583471, 21539953=99583467, 21539953=99599851
+  ```
+
+  You can address this by:
+
+  - Increasing the timeout for creating tablets (default: 10 seconds).
+
+    ```SQL
+    ADMIN SET FRONTEND CONFIG ("tablet_create_timeout_second"="60");
+    ```
+
+  - Increasing the number of threads for creating tablets (default: 3).
+
+    ```Bash
+    curl -XPOST http://be_host:http_port/api/update_config?alter_tablet_worker_count=6
+    ```
+
+- Non-Normal Tablet State
+
+  1. If a tablet is in a non-normal state, search the **be.WARNING** logs for `tablet is not normal` and execute `SHOW PROC '/statistic'` to check the cluster-level `UnhealthyTabletNum`.
+
+     ![show statistic](../../../_assets/alert_show_statistic.png)
+
+  2. Execute `SHOW PROC '/statistic/$DbId'` to check the unhealthy tablet number in the specified database.
+
+     ![show statistic db](../../../_assets/alert_show_statistic_db.png)
+
+  3. Execute `SHOW TABLET $tablet_id` to view the table information of the corresponding tablet.
+
+     ![show tablet](../../../_assets/alert_show_tablet.png)
+
+  4. Execute the command returned in the `DetailCmd` field to identify the cause of the unhealthy tablets.
+
+     ![show proc](../../../_assets/alert_show_proc.png)
+
+  Unhealthy or inconsistent replicas are typically caused by high-frequency loading, where writes to different replicas do not progress in sync. Check whether the table receives a large number of real-time writes, and reduce the number of abnormal replicas by lowering the loading frequency or by temporarily suspending the service and retrying the task later.
+
+:::note
+
+In emergencies, to restore the service, you can set the non-normal replicas to `bad` status to trigger a Clone task.
+
+```SQL
+ADMIN SET REPLICA STATUS PROPERTIES("tablet_id" = "$tablet_id", "backend_id" = "$backend_id", "status" = "bad");
+```
+
+Before performing this operation, ensure the table has at least three complete replicas with only one non-normal replica.
+
+:::
+
+### Materialized View Refresh Exception Alert
+
+**PromQL**
+
+```Plain
+increase(starrocks_fe_mv_refresh_total_failed_jobs[5m]) > 0
+```
+
+**Alert Description**
+
+An alert is triggered when any materialized view refresh fails within the last five minutes.
+
+**Resolution**
+
+1. Check the materialized views that failed to refresh.
+
+   ```SQL
+   SELECT TABLE_NAME, IS_ACTIVE, INACTIVE_REASON, TASK_NAME FROM information_schema.materialized_views WHERE LAST_REFRESH_STATE != "SUCCESS";
+   ```
+
+2. Try manually refreshing the materialized view.
+
+   ```SQL
+   REFRESH MATERIALIZED VIEW $mv_name;
+   ```
+
+3. If the materialized view is in the `INACTIVE` state, try to manually activate it.
+
+   ```SQL
+   ALTER MATERIALIZED VIEW $mv_name ACTIVE;
+   ```
+
+4. Investigate the cause of the refresh failure.
+
+   ```SQL
+   SELECT * FROM information_schema.task_runs WHERE task_name = 'mv-112517' \G
+   ```
diff --git a/docs/en/administration/management/monitoring/metrics-materialized_view.md b/docs/en/administration/management/monitoring/metrics-materialized_view.md
new file mode 100644
index 0000000..13604d1
--- /dev/null
+++ b/docs/en/administration/management/monitoring/metrics-materialized_view.md
@@ -0,0 +1,113 @@
+---
+displayed_sidebar: docs
+---
+
+# Monitoring Metrics for Asynchronous Materialized Views
+
+From v3.1 onwards, StarRocks supports metrics for asynchronous materialized views.
+
+To allow Prometheus to access the materialized view metadata in your cluster, you must add the following configurations in the Prometheus configuration file **prometheus/prometheus.yml**:
+
+```YAML
+global:
+....
+scrape_configs:
+
+  - job_name: 'dev'
+    metrics_path: '/metrics'
+    # Add the following configurations.
+    basic_auth:
+      username: 'root'
+      password: ''
+    params:
+      'with_materialized_view_metrics' : ['all']
+....
+```
+
+- `username`: The username used to log into your StarRocks cluster. Unless using the root account, the user must be granted both the `user_admin` and `db_admin` roles.
+- `password`: The password used to log into your StarRocks cluster.
+- `'with_materialized_view_metrics'`: The scope of the metrics to collect. Valid values:
+  - `'all'`: All metrics relevant to materialized views are collected.
+  - `'minified'`: Gauge metrics and metrics whose values are `0` will not be collected.
+
+## Metric items
+
+### mv_refresh_jobs
+
+- Type: Counter
+- Description: Total number of refresh jobs of the materialized view.
+
+### mv_refresh_total_success_jobs
+
+- Type: Counter
+- Description: Number of successful refresh jobs of the materialized view.
+
+### mv_refresh_total_failed_jobs
+
+- Type: Counter
+- Description: Number of failed refresh jobs of the materialized view.
+
+### mv_refresh_total_empty_jobs
+
+- Type: Counter
+- Description: Number of canceled refresh jobs of the materialized view because the data to refresh is empty.
+
+### mv_refresh_total_retry_meta_count
+
+- Type: Counter
+- Description: Number of times when the materialized view refresh job checks whether the base table is updated.
+
+### mv_query_total_count
+
+- Type: Counter
+- Description: Number of times when the materialized view is used in the pre-processing of a query.
+
+### mv_query_total_hit_count
+
+- Type: Counter
+- Description: Number of times when the materialized view is considered able to rewrite a query in the query plan. This value may appear higher because the final query plan may skip rewriting due to a high cost.
+
+### mv_query_total_considered_count
+
+- Type: Counter
+- Description: Number of times when the materialized view rewrites a query (excluding direct queries against the materialized view).
+
+### mv_query_total_matched_count
+
+- Type: Counter
+- Description: Number of times when the materialized view is involved in the final plan of a query (including direct queries against the materialized view).
+
+### mv_refresh_pending_jobs
+
+- Type: Gauge
+- Description: Number of currently pending refresh jobs of the materialized view.
+
+### mv_refresh_running_jobs
+
+- Type: Gauge
+- Description: Number of currently running refresh jobs of the materialized view.
+
+### mv_row_count
+
+- Type: Gauge
+- Description: Row count of the materialized view.
+
+### mv_storage_size
+
+- Type: Gauge
+- Description: Size of the materialized view. Unit: Byte.
+
+### mv_inactive_state
+
+- Type: Gauge
+- Description: Status of the materialized view. Valid values: `0` (active) and `1` (inactive).
+
+### mv_partition_count
+
+- Type: Gauge
+- Description: Number of partitions in the materialized view. The value is `0` if the materialized view is not partitioned.
+
+### mv_refresh_duration
+
+- Type: Histogram
+- Description: Duration of the successful materialized view refresh jobs.
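+
+These metric items can be wired into Prometheus alerting rules. The following is a minimal sketch built on `mv_refresh_total_failed_jobs` and `mv_inactive_state`; the rule and group names, severities, and windows are illustrative assumptions, and the exported metric names may carry a `starrocks_fe_` prefix depending on your setup (as in the alert examples in this documentation).
+
+```YAML
+groups:
+  - name: starrocks_mv_alerts    # hypothetical group name
+    rules:
+      - alert: MvRefreshFailed
+        # Fires when any materialized view refresh job failed in the last 5 minutes.
+        expr: increase(mv_refresh_total_failed_jobs[5m]) > 0
+        labels:
+          severity: warning
+      - alert: MvInactive
+        # Fires when a materialized view stays inactive (1 = inactive) for 5 minutes.
+        expr: mv_inactive_state == 1
+        for: 5m
+        labels:
+          severity: critical
+```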
diff --git a/docs/en/administration/management/monitoring/metrics-shared-data.md b/docs/en/administration/management/monitoring/metrics-shared-data.md
new file mode 100644
index 0000000..dc78377
--- /dev/null
+++ b/docs/en/administration/management/monitoring/metrics-shared-data.md
@@ -0,0 +1,251 @@
+---
+displayed_sidebar: docs
+---
+
+# Monitoring Metrics for Shared-data Clusters
+
+StarRocks provides two Dashboard templates for shared-data clusters:
+
+- [Shared-data Dashboard](#shared-data-dashboard)
+- [Starlet Dashboard](#starlet-dashboard)
+
+## Shared-data Dashboard
+
+Shared-data Dashboard includes the following categories of monitoring metrics:
+
+- [Publish Version](#publish-version)
+- [Metadata](#metadata)
+- [Metacache](#metacache)
+- [Vacuum](#vacuum)
+- [Loading](#loading)
+
+### Publish Version
+
+#### Latency / QPS
+
+- Description: Quantile latency, average latency, and QPS of Publish Version tasks.
+
+#### Queued Tasks
+
+- Description: The number of Publish Version tasks in the queue.
+
+### Metadata
+
+#### Get Tablet Metadata
+
+- Description: Quantile latency, average latency, and QPS of Get Tablet Metadata tasks.
+
+#### Put Tablet Metadata
+
+- Description: Quantile latency, average latency, and QPS of Put Tablet Metadata tasks.
+
+#### Get Txn Log
+
+- Description: Quantile latency, average latency, and QPS of Get Txn Log tasks.
+
+#### Put Txn Log
+
+- Description: Quantile latency, average latency, and QPS of Put Txn Log tasks.
+
+### Metacache
+
+#### Metacache Usage
+
+- Description: Metacache utilization rate.
+
+#### Delvec Cache Miss Per Minute
+
+- Description: Number of cache misses in Delvec Cache per minute.
+
+#### Metadata Cache Miss Per Minute
+
+- Description: Number of cache misses in Metadata Cache per minute.
+
+#### Txn Log Cache Miss Per Minute
+
+- Description: Number of cache misses in Txn Log Cache per minute.
+
+#### Segment Cache Miss Per Minute
+
+- Description: Number of cache misses in Segment Cache per minute.
+
+### Vacuum
+
+#### Vacuum Deletes
+
+- Description: Quantile latency, average latency, and QPS of Vacuum Deletes tasks.
+
+#### Errors
+
+- Description: Number of failed Vacuum Deletes operations.
+
+### Loading
+
+#### Queue Size
+
+- Description: Queue size of BE Async Delta Writer.
+
+## Starlet Dashboard
+
+Starlet Dashboard includes the following categories of monitoring metrics:
+
+- [FSLIB READ IO METRICS](#fslib-read-io-metrics)
+- [FSLIB WRITE IO METRICS](#fslib-write-io-metrics)
+- [S3 IO METRICS](#s3-io-metrics)
+- [FSLIB CACHE METRICS](#fslib-cache-metrics)
+- [FSLIB FS METRICS](#fslib-fs-metrics)
+
+### FSLIB READ IO METRICS
+
+#### fslib read io_latency (quantile)
+
+- Type: Histogram
+- Description: Quantile latency for S3 reads.
+
+#### fslib read io_latency (average)
+
+- Type: Counter
+- Description: Average latency for S3 reads.
+
+#### fslib total read data
+
+- Type: Counter
+- Description: Total data size for S3 reads.
+
+#### fslib read iosize (quantile)
+
+- Type: Histogram
+- Description: Quantile I/O size for S3 reads.
+
+#### fslib read iosize (average)
+
+- Type: Counter
+- Description: Average I/O size for S3 reads.
+
+#### fslib read throughput
+
+- Type: Counter
+- Description: I/O throughput per second for S3 reads.
+
+#### fslib read iops
+
+- Type: Counter
+- Description: Number of I/O operations per second for S3 reads.
+
+### FSLIB WRITE IO METRICS
+
+#### fslib write io_latency (quantile)
+
+- Type: Histogram
+- Description: Quantile latency for application writes.
Please note that this value may appear lower because this metric monitors only data written to the buffer. + +#### fslib write io_latency (average) + +- Type: Counter +- Description: Average latency for application writes. Please note that this value may appear lower because this metric monitors only data written to the buffer. + +#### fslib total write data + +- Type: Counter +- Description: Total data size for application writes. + +#### fslib write iosize (quantile) + +- Type: Histogram +- Description: Quantile I/O size for application writes. + +#### fslib write iosize (average) + +- Type: Counter +- Description: Average I/O size for application writes. + +#### fslib write throughput + +- Type: Counter +- Description: I/O throughput per second for application writes. + +### S3 IO METRICS + +#### fslib s3 single upload iops + +- Type: Counter +- Description: Number of invocations per second for S3 Put Object. + +#### fslib s3 single upload iosize (quantile) + +- Type: Histogram +- Description: Quantile I/O size for S3 Put Object. + +#### fslib s3 single upload latency (quantile) + +- Type: Histogram +- Description: Quantile latency for S3 Put Object. + +#### fslib s3 multi upload iops + +- Type: Counter +- Description: Number of invocations per second for S3 Multi Upload Object. + +#### fslib s3 multi upload iosize (quantile) + +- Type: Histogram +- Description: Quantile I/O size for S3 Multi Upload Object. + +#### fslib s3 multi upload latency (quantile) + +- Type: Histogram +- Description: Quantile latency for S3 Multi Upload Object. + +#### fslib s3 complete multi upload latency (quantile) + +- Type: Histogram +- Description: Quantile latency for S3 Complete Multi Upload Object. + +### FSLIB CACHE METRICS + +#### fslib cache hit ratio + +- Type: Counter +- Description: Cache hit ratio. + +#### fslib cache hits/misses + +- Type: Counter +- Description: Number of cache hits/misses per second. + +### FSLIB FS METRICS + +#### fslib alive fs instances count + +- Type: Gauge +- Description: Number of file system instances that are alive. + +#### fslib open files + +- Type: Counter +- Description: Cumulative number of opened files. + +#### fslib create files + +- Type: Counter +- Description: Average number of files created per second. + +#### filesystem meta operations + +- Type: Counter +- Description: Average number of directory listing operations per second. + +#### fslib async caches + +- Type: Counter +- Description: Cumulative number of files in asynchronous cache. + +#### fslib create files (TOTAL) + +- Type: Counter +- Description: Cumulative number of files created. + +#### fslib async tasks + +- Type: Counter +- Description: Cumulative number of asynchronous tasks in the queue. diff --git a/docs/en/administration/management/monitoring/metrics.md b/docs/en/administration/management/monitoring/metrics.md new file mode 100644 index 0000000..cc8b683 --- /dev/null +++ b/docs/en/administration/management/monitoring/metrics.md @@ -0,0 +1,1973 @@ +--- +displayed_sidebar: docs +--- + +# General Monitoring Metrics + +This topic introduces some important general metrics of StarRocks. 
+
+For dedicated metrics for materialized views and shared-data clusters, please refer to the corresponding sections:
+
+- [Metrics for asynchronous materialized views](./metrics-materialized_view.md)
+- [Metrics for shared-data clusters (Shared-data Dashboard and Starlet Dashboard)](./metrics-shared-data.md)
+
+For more information on how to build a monitoring service for your StarRocks cluster, see [Monitor and Alert](./Monitor_and_Alert.md).
+
+## Metric items
+
+### be_broker_count
+
+- Unit: Count
+- Type: Average
+- Description: Number of brokers.
+
+### be_brpc_endpoint_count
+
+- Unit: Count
+- Type: Average
+- Description: Number of StubCache in bRPC.
+
+### be_bytes_read_per_second
+
+- Unit: Bytes/s
+- Type: Average
+- Description: Read speed of BE.
+
+### be_bytes_written_per_second
+
+- Unit: Bytes/s
+- Type: Average
+- Description: Write speed of BE.
+
+### be_base_compaction_bytes_per_second
+
+- Unit: Bytes/s
+- Type: Average
+- Description: Base compaction speed of BE.
+
+### be_cumulative_compaction_bytes_per_second
+
+- Unit: Bytes/s
+- Type: Average
+- Description: Cumulative compaction speed of BE.
+
+### be_base_compaction_rowsets_per_second
+
+- Unit: Count
+- Type: Average
+- Description: Base compaction speed of BE rowsets.
+
+### be_cumulative_compaction_rowsets_per_second
+
+- Unit: Count
+- Type: Average
+- Description: Cumulative compaction speed of BE rowsets.
+
+### be_base_compaction_failed
+
+- Unit: Count/s
+- Type: Average
+- Description: Base compaction failure of BE.
+
+### be_clone_failed
+
+- Unit: Count/s
+- Type: Average
+- Description: BE clone failure.
+
+### be_create_rollup_failed
+
+- Unit: Count/s
+- Type: Average
+- Description: Materialized view creation failure of BE.
+
+### be_create_tablet_failed
+
+- Unit: Count/s
+- Type: Average
+- Description: Tablet creation failure of BE.
+
+### be_cumulative_compaction_failed
+
+- Unit: Count/s
+- Type: Average
+- Description: Cumulative compaction failure of BE.
+
+### be_delete_failed
+
+- Unit: Count/s
+- Type: Average
+- Description: Delete failure of BE.
+
+### be_finish_task_failed
+
+- Unit: Count/s
+- Type: Average
+- Description: Task failure of BE.
+
+### be_publish_failed
+
+- Unit: Count/s
+- Type: Average
+- Description: Version publish failure of BE.
+
+### be_report_tables_failed
+
+- Unit: Count/s
+- Type: Average
+- Description: Table report failure of BE.
+
+### be_report_disk_failed
+
+- Unit: Count/s
+- Type: Average
+- Description: Disk report failure of BE.
+
+### be_report_tablet_failed
+
+- Unit: Count/s
+- Type: Average
+- Description: Tablet report failure of BE.
+
+### be_report_task_failed
+
+- Unit: Count/s
+- Type: Average
+- Description: Task report failure of BE.
+
+### be_schema_change_failed
+
+- Unit: Count/s
+- Type: Average
+- Description: Schema change failure of BE.
+
+### be_base_compaction_requests
+
+- Unit: Count/s
+- Type: Average
+- Description: Base compaction request of BE.
+
+### be_clone_total_requests
+
+- Unit: Count/s
+- Type: Average
+- Description: Clone request of BE.
+
+### be_create_rollup_requests
+
+- Unit: Count/s
+- Type: Average
+- Description: Materialized view creation request of BE.
+
+### be_create_tablet_requests
+
+- Unit: Count/s
+- Type: Average
+- Description: Tablet creation request of BE.
+
+### be_cumulative_compaction_requests
+
+- Unit: Count/s
+- Type: Average
+- Description: Cumulative compaction request of BE.
+
+### be_delete_requests
+
+- Unit: Count/s
+- Type: Average
+- Description: Delete request of BE.
+
+### be_finish_task_requests
+
+- Unit: Count/s
+- Type: Average
+- Description: Task finish request of BE.
+
+### be_publish_requests
+
+- Unit: Count/s
+- Type: Average
+- Description: Version publish request of BE.
+
+### be_report_tablets_requests
+
+- Unit: Count/s
+- Type: Average
+- Description: Tablet report request of BE.
+
+### be_report_disk_requests
+
+- Unit: Count/s
+- Type: Average
+- Description: Disk report request of BE.
+
+### be_report_tablet_requests
+
+- Unit: Count/s
+- Type: Average
+- Description: Tablet report request of BE.
+
+### be_report_task_requests
+
+- Unit: Count/s
+- Type: Average
+- Description: Task report request of BE.
+
+### be_schema_change_requests
+
+- Unit: Count/s
+- Type: Average
+- Description: Schema change report request of BE.
+
+### be_storage_migrate_requests
+
+- Unit: Count/s
+- Type: Average
+- Description: Migration request of BE.
+
+### be_fragment_endpoint_count
+
+- Unit: Count
+- Type: Average
+- Description: Number of BE DataStream.
+
+### be_fragment_request_latency_avg
+
+- Unit: ms
+- Type: Average
+- Description: Latency of fragment requests.
+
+### be_fragment_requests_per_second
+
+- Unit: Count/s
+- Type: Average
+- Description: Number of fragment requests.
+
+### be_http_request_latency_avg
+
+- Unit: ms
+- Type: Average
+- Description: Latency of HTTP requests.
+
+### be_http_requests_per_second
+
+- Unit: Count/s
+- Type: Average
+- Description: Number of HTTP requests.
+
+### be_http_request_send_bytes_per_second
+
+- Unit: Bytes/s
+- Type: Average
+- Description: Number of bytes sent for HTTP requests.
+
+### fe_connections_per_second
+
+- Unit: Count/s
+- Type: Average
+- Description: New connection rate of FE.
+
+### fe_connection_total
+
+- Unit: Count
+- Type: Cumulative
+- Description: Total number of FE connections.
+
+### fe_edit_log_read
+
+- Unit: Count/s
+- Type: Average
+- Description: Read speed of FE edit log.
+
+### fe_edit_log_size_bytes
+
+- Unit: Bytes
+- Type: Average
+- Description: Size of FE edit log.
+
+### fe_edit_log_write
+
+- Unit: Bytes/s
+- Type: Average
+- Description: Write speed of FE edit log.
+
+### fe_checkpoint_push_per_second
+
+- Unit: Count/s
+- Type: Average
+- Description: Number of FE checkpoints.
+
+### fe_pending_hadoop_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of pending Hadoop jobs.
+
+### fe_committed_hadoop_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of committed Hadoop jobs.
+
+### fe_loading_hadoop_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of loading Hadoop jobs.
+
+### fe_finished_hadoop_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of completed Hadoop jobs.
+
+### fe_cancelled_hadoop_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of cancelled Hadoop jobs.
+
+### fe_pending_insert_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of pending insert jobs.
+
+### fe_loading_insert_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of loading insert jobs.
+
+### fe_committed_insert_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of committed insert jobs.
+
+### fe_finished_insert_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of completed insert jobs.
+
+### fe_cancelled_insert_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of cancelled insert jobs.
+
+### fe_pending_broker_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of pending broker jobs.
+
+### fe_loading_broker_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of loading broker jobs.
+
+### fe_committed_broker_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of committed broker jobs.
+
+### fe_finished_broker_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of finished broker jobs.
+
+### fe_cancelled_broker_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of cancelled broker jobs.
+
+### fe_pending_delete_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of pending delete jobs.
+
+### fe_loading_delete_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of loading delete jobs.
+
+### fe_committed_delete_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of committed delete jobs.
+
+### fe_finished_delete_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of finished delete jobs.
+
+### fe_cancelled_delete_load_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of cancelled delete jobs.
+
+### fe_rollup_running_alter_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of running rollup jobs.
+
+### fe_schema_change_running_job
+
+- Unit: Count
+- Type: Average
+- Description: Number of running schema change jobs.
+
+### cpu_util
+
+- Unit: -
+- Type: Average
+- Description: CPU usage rate.
+
+### cpu_system
+
+- Unit: -
+- Type: Average
+- Description: cpu_system usage rate.
+
+### cpu_user
+
+- Unit: -
+- Type: Average
+- Description: cpu_user usage rate.
+
+### cpu_idle
+
+- Unit: -
+- Type: Average
+- Description: cpu_idle usage rate.
+
+### cpu_guest
+
+- Unit: -
+- Type: Average
+- Description: cpu_guest usage rate.
+
+### cpu_iowait
+
+- Unit: -
+- Type: Average
+- Description: cpu_iowait usage rate.
+
+### cpu_irq
+
+- Unit: -
+- Type: Average
+- Description: cpu_irq usage rate.
+
+### cpu_nice
+
+- Unit: -
+- Type: Average
+- Description: cpu_nice usage rate.
+
+### cpu_softirq
+
+- Unit: -
+- Type: Average
+- Description: cpu_softirq usage rate.
+
+### cpu_steal
+
+- Unit: -
+- Type: Average
+- Description: cpu_steal usage rate.
+
+### disk_free
+
+- Unit: Bytes
+- Type: Average
+- Description: Free disk capacity.
+
+### disk_io_svctm
+
+- Unit: ms
+- Type: Average
+- Description: Disk I/O service time.
+
+### disk_io_util
+
+- Unit: -
+- Type: Average
+- Description: Disk usage.
+
+### disk_used
+
+- Unit: Bytes
+- Type: Average
+- Description: Used disk capacity.
+
+### encryption_keys_created
+
+- Unit: Count
+- Type: Cumulative
+- Description: Number of file encryption keys created for file encryption.
+
+### encryption_keys_unwrapped
+
+- Unit: Count
+- Type: Cumulative
+- Description: Number of encryption metas unwrapped for file decryption.
+
+### encryption_keys_in_cache
+
+- Unit: Count
+- Type: Instantaneous
+- Description: Number of encryption keys currently in the key cache.
+
+### starrocks_fe_meta_log_count
+
+- Unit: Count
+- Type: Instantaneous
+- Description: The number of Edit Logs without a checkpoint. A value within `100000` is considered reasonable.
+
+### starrocks_fe_query_resource_group
+
+- Unit: Count
+- Type: Cumulative
+- Description: The number of queries for each resource group.
+
+### starrocks_fe_query_resource_group_latency
+
+- Unit: Seconds
+- Type: Average
+- Description: The query latency percentile for each resource group.
+
+### starrocks_fe_query_resource_group_err
+
+- Unit: Count
+- Type: Cumulative
+- Description: The number of failed queries for each resource group.
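+
+As a usage sketch, these resource group counters can be combined in PromQL, for example to estimate the per-group query failure ratio; the five-minute window is an illustrative choice.
+
+```Plain
+# Failed-query ratio per resource group over the last 5 minutes.
+rate(starrocks_fe_query_resource_group_err[5m])
+  / rate(starrocks_fe_query_resource_group[5m])
+```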
+ +### starrocks_be_resource_group_cpu_limit_ratio + +- Unit: - +- Type: Instantaneous +- Description: Instantaneous value of resource group cpu quota ratio. + +### starrocks_be_resource_group_cpu_use_ratio + +- Unit: - +- Type: Average +- Description: The ratio of CPU time used by the resource group to the CPU time of all resource groups. + +### starrocks_be_resource_group_mem_limit_bytes + +- Unit: Bytes +- Type: Instantaneous +- Description: Instantaneous value of resource group memory quota. + +### starrocks_be_resource_group_mem_allocated_bytes + +- Unit: Bytes +- Type: Instantaneous +- Description: Instantaneous value of resource group memory usage. + +### starrocks_be_pipe_prepare_pool_queue_len + +- Unit: Count +- Type: Instantaneous +- Description: Instantaneous value of pipeline prepare thread pool task queue length. + +### starrocks_fe_safe_mode + +- Unit: - +- Type: Instantaneous +- Description: Indicates whether Safe Mode is enabled. Valid values: `0` (disabled) and `1` (enabled). When Safe Mode is enabled, the cluster no longer accepts any loading requests. + +### starrocks_fe_unfinished_backup_job + +- Unit: Count +- Type: Instantaneous +- Description: Indicates the number of running BACKUP tasks under the specific warehouse. For a shared-nothing cluster, this item only monitors the default warehouse. For a shared-data cluster, this value is always `0`. + +### starrocks_fe_unfinished_restore_job + +- Unit: Count +- Type: Instantaneous +- Description: Indicates the number of running RESTORE tasks under the specific warehouse. For a shared-nothing cluster, this item only monitors the default warehouse. For a shared-data cluster, this value is always `0`. + +### starrocks_fe_memory_usage + +- Unit: Bytes or Count +- Type: Instantaneous +- Description: Indicates the memory statistics for various modules under the specific warehouse. For a shared-nothing cluster, this item only monitors the default warehouse. + +### starrocks_fe_unfinished_query + +- Unit: Count +- Type: Instantaneous +- Description: Indicates the number of queries currently running under the specific warehouse. For a shared-nothing cluster, this item only monitors the default warehouse. + +### starrocks_fe_last_finished_job_timestamp + +- Unit: ms +- Type: Instantaneous +- Description: Indicates the end time of the last query or loading under the specific warehouse. For a shared-nothing cluster, this item only monitors the default warehouse. + +### starrocks_fe_query_resource_group + +- Unit: Count +- Type: Cumulative +- Description: Indicates the total number of queries executed under the specific resource group. + +### starrocks_fe_query_resource_group_err + +- Unit: Count +- Type: Cumulative +- Description: Indicates the number of failed queries under the specific resource group. + +### starrocks_fe_query_resource_group_latency + +- Unit: ms +- Type: Cumulative +- Description: Indicates the latency statistics for queries under the specific resource group. + +### starrocks_fe_tablet_num + +- Unit: Count +- Type: Instantaneous +- Description: Indicates the number of tablets on each BE node. + +### starrocks_fe_tablet_max_compaction_score + +- Unit: Count +- Type: Instantaneous +- Description: Indicates the highest Compaction Score on each BE node. + +### starrocks_fe_slow_lock_held_time_ms + +- Unit: ms +- Type: Summary +- Description: Histogram tracking the lock held time (in milliseconds) when slow locks are detected. 
This metric is updated when lock wait time exceeds the `slow_lock_threshold_ms` configuration parameter. It tracks the maximum lock held time among all lock owners when a slow lock event is detected. Each metric includes quantile values (0.75, 0.95, 0.98, 0.99, 0.999), `_sum`, and `_count` outputs. Note: This metric may not accurately reflect the exact lock held time under high contention, because the metric is updated once the wait time exceeds the threshold, but the held time may continue to increase until the owner completes its operation and releases the lock. However, this metric can still be updated even when deadlock occurs. + +### starrocks_fe_slow_lock_wait_time_ms + +- Unit: ms +- Type: Summary +- Description: Histogram tracking the lock wait time (in milliseconds) when slow locks are detected. This metric is updated when lock wait time exceeds the `slow_lock_threshold_ms` configuration parameter. It accurately tracks how long threads wait to acquire locks during lock contention scenarios. Each metric includes quantile values (0.75, 0.95, 0.98, 0.99, 0.999), `_sum`, and `_count` outputs. This metric provides precise wait time measurements. Note: This metric cannot be updated when deadlock occurs, hence it cannot be used to detect deadlock situations. + +### update_compaction_outputs_total + +- Unit: Count +- Description: Total number of Primary Key table compactions. + +### update_del_vector_bytes_total + +- Unit: Bytes +- Description: Total memory used for caching DELETE vectors in Primary Key tables. + +### push_request_duration_us + +- Unit: us +- Description: Total time spent on Spark Load. + +### writable_blocks_total (Deprecated) + +### disks_data_used_capacity + +- Description: Used capacity of each disk (represented by a storage path). + +### query_scan_rows + +- Unit: Count +- Description: Total number of scanned rows. + +### update_primary_index_num + +- Unit: Count +- Description: Number of Primary Key indexes cached in memory. + +### result_buffer_block_count + +- Unit: Count +- Description: Number of blocks in the result buffer. + +### query_scan_bytes + +- Unit: Bytes +- Description: Total number of scanned bytes. + +### starrocks_be_files_scan_num_files_read + +- Unit: Count +- Description: Number of files read from external storage (CSV, Parquet, ORC, JSON, Avro). Labels: `file_format`, `scan_type`. + +### starrocks_be_files_scan_num_bytes_read + +- Unit: Bytes +- Description: Total bytes read from external storage. Labels: `file_format`, `scan_type`. + +### starrocks_be_files_scan_num_raw_rows_read + +- Unit: Count +- Description: Total raw rows read from external storage before format validation and predicate filtering. Labels: `file_format`, `scan_type`. + +### starrocks_be_files_scan_num_valid_rows_read + +- Unit: Count +- Description: Number of valid rows read (excluding rows with invalid format). Labels: `file_format`, `scan_type`. + +### starrocks_be_files_scan_num_rows_return + +- Unit: Count +- Description: Number of rows returned after predicate filtering. Labels: `file_format`, `scan_type`. + +### disk_reads_completed + +- Unit: Count +- Description: Number of successfully completed disk reads. + +### query_cache_hit_count + +- Unit: Count +- Description: Number of query cache hits. + +### jemalloc_resident_bytes + +- Unit: Bytes +- Description: Maximum number of bytes in physically resident data pages mapped by the allocator, comprising all pages dedicated to allocator metadata, pages backing active allocations, and unused dirty pages. 
+
+### blocks_open_writing (Deprecated)
+
+### disk_io_time_weigthed
+
+- Unit: ms
+- Description: Weighted time spent on I/Os.
+
+### update_compaction_task_byte_per_second
+
+- Unit: Bytes/s
+- Description: Estimated rate of Primary Key table compactions.
+
+### blocks_open_reading (Deprecated)
+
+### tablet_update_max_compaction_score
+
+- Unit: -
+- Description: Highest compaction score of tablets in Primary Key tables in the current BE.
+
+### segment_read
+
+- Unit: Count
+- Description: Total number of segment reads.
+
+### disk_io_time_ms
+
+- Unit: ms
+- Description: Time spent on I/Os.
+
+### load_mem_bytes
+
+- Unit: Bytes
+- Description: Memory cost of data loading.
+
+### delta_column_group_get_non_pk_total
+
+- Unit: Count
+- Description: Total number of times to get delta column group (for non-Primary Key tables only).
+
+### query_scan_bytes_per_second
+
+- Unit: Bytes/s
+- Description: Estimated rate of scanned bytes per second.
+
+### active_scan_context_count
+
+- Unit: Count
+- Description: Total number of scan tasks created by Flink/Spark SQL.
+
+### fd_num_limit
+
+- Unit: Count
+- Description: Maximum number of file descriptors.
+
+### update_compaction_task_cost_time_ns
+
+- Unit: ns
+- Description: Total time spent on Primary Key table compactions.
+
+### delta_column_group_get_hit_cache
+
+- Unit: Count
+- Description: Total number of delta column group cache hits (for Primary Key tables only).
+
+### data_stream_receiver_count
+
+- Unit: Count
+- Description: Cumulative number of instances serving as Exchange receivers in BE.
+
+### bytes_written_total
+
+- Unit: Bytes
+- Description: Total bytes written (sectors written * 512).
+
+### transaction_streaming_load_bytes
+
+- Unit: Bytes
+- Description: Total loading bytes of transaction load.
+
+### running_cumulative_compaction_task_num
+
+- Unit: Count
+- Description: Total number of running cumulative compactions.
+
+### transaction_streaming_load_requests_total
+
+- Unit: Count
+- Description: Total number of transaction load requests.
+
+### cpu
+
+- Unit: -
+- Description: CPU usage information returned by `/proc/stat`.
+
+### update_del_vector_num
+
+- Unit: Count
+- Description: Number of the DELETE vector cache items in Primary Key tables.
+
+### disks_avail_capacity
+
+- Description: Available capacity of a specific disk.
+
+### clone_mem_bytes
+
+- Unit: Bytes
+- Description: Memory used for replica clone.
+
+### fragment_requests_total
+
+- Unit: Count
+- Description: Total fragment instances executing on a BE (for non-pipeline engine).
+
+### disk_write_time_ms
+
+- Unit: ms
+- Description: Time spent on disk writing. Unit: ms.
+
+### schema_change_mem_bytes
+
+- Unit: Bytes
+- Description: Memory used for Schema Change.
+
+### thrift_connections_total
+
+- Unit: Count
+- Description: Total number of thrift connections (including finished connections).
+
+### thrift_opened_clients
+
+- Unit: Count
+- Description: Number of currently opened thrift clients.
+
+### thrift_used_clients
+
+- Unit: Count
+- Description: Number of thrift clients currently in use.
+
+### thrift_current_connections (Deprecated)
+
+### disk_bytes_read
+
+- Unit: Bytes
+- Description: Total bytes read from the disk (sectors read * 512).
+
+### base_compaction_task_cost_time_ms
+
+- Unit: ms
+- Description: Total time spent on base compactions.
+
+### update_primary_index_bytes_total
+
+- Unit: Bytes
+- Description: Total memory cost of the Primary Key index.
+
+### compaction_deltas_total
+
+- Unit: Count
+- Description: Total number of merged rowsets from base compactions and cumulative compactions.
+
+### max_network_receive_bytes_rate
+
+- Unit: Bytes
+- Description: Total bytes received over the network (maximum value among all network interfaces).
+
+### max_network_send_bytes_rate
+
+- Unit: Bytes
+- Description: Total bytes sent over the network (maximum value among all network interfaces).
+
+### chunk_allocator_mem_bytes
+
+- Unit: Bytes
+- Description: Memory used by the chunk allocator.
+
+### query_cache_usage_ratio
+
+- Unit: -
+- Description: Current query cache usage ratio.
+
+### process_fd_num_limit_soft
+
+- Unit: Count
+- Description: Soft limit on the maximum number of file descriptors. Please note that this item indicates the soft limit. You can set the hard limit using the `ulimit` command.
+
+### query_cache_hit_ratio
+
+- Unit: -
+- Description: Hit ratio of query cache.
+
+### tablet_metadata_mem_bytes
+
+- Unit: Bytes
+- Description: Memory used by tablet metadata.
+
+### jemalloc_retained_bytes
+
+- Unit: Bytes
+- Description: Total bytes of virtual memory mappings that were retained rather than returned to the operating system via operations like munmap(2).
+
+### unused_rowsets_count
+
+- Unit: Count
+- Description: Total number of unused rowsets. Please note that these rowsets will be reclaimed later.
+
+### update_rowset_commit_apply_total
+
+- Unit: Count
+- Description: Total number of COMMIT and APPLY for Primary Key tables.
+
+### segment_zonemap_mem_bytes
+
+- Unit: Bytes
+- Description: Memory used by segment zonemap.
+
+### base_compaction_task_byte_per_second
+
+- Unit: Bytes/s
+- Description: Estimated rate of base compactions.
+
+### transaction_streaming_load_duration_ms
+
+- Unit: ms
+- Description: Total time spent on the Stream Load transaction interface.
+
+### memory_pool_bytes_total
+
+- Unit: Bytes
+- Description: Memory used by the memory pool.
+
+### short_key_index_mem_bytes
+
+- Unit: Bytes
+- Description: Memory used by the short key index.
+
+### disk_bytes_written
+
+- Unit: Bytes
+- Description: Total bytes written to disk.
+
+### segment_flush_queue_count
+
+- Unit: Count
+- Description: Number of queued tasks in the segment flush thread pool.
+
+### jemalloc_metadata_bytes
+
+- Unit: Bytes
+- Description: Total number of bytes dedicated to metadata, comprising base allocations used for bootstrap-sensitive allocator metadata structures and internal allocations. The usage of transparent huge pages is not included in this item.
+
+### network_send_bytes
+
+- Unit: Bytes
+- Description: Number of bytes sent over the network.
+
+### update_mem_bytes
+
+- Unit: Bytes
+- Description: Memory used by Primary Key table APPLY tasks and Primary Key index.
+
+### blocks_created_total (Deprecated)
+
+### query_cache_usage
+
+- Unit: Bytes
+- Description: Current query cache usage.
+
+### memtable_flush_duration_us
+
+- Unit: us
+- Description: Total time spent on memtable flush.
+
+### push_request_write_bytes
+
+- Unit: Bytes
+- Description: Total bytes written via Spark Load.
+
+### jemalloc_active_bytes
+
+- Unit: Bytes
+- Description: Total bytes in active pages allocated by the application.
+
+### load_rpc_threadpool_size
+
+- Unit: Count
+- Description: The current size of the RPC thread pool, which is used for handling Routine Load and loading via table functions. The default value is 10, with a maximum value of 1000. This value is dynamically adjusted based on the usage of the thread pool.
+ +### publish_version_queue_count + +- Unit: Count +- Description: Queued task count in the Publish Version thread pool. + +### chunk_pool_system_free_count + +- Unit: Count +- Description: Memory chunk allocation/cache metrics. + +### chunk_pool_system_free_cost_ns + +- Unit: ns +- Description: Memory chunk allocation/cache metrics. + +### chunk_pool_system_alloc_count + +- Unit: Count +- Description: Memory chunk allocation/cache metrics. + +### chunk_pool_system_alloc_cost_ns + +- Unit: ns +- Description: Memory chunk allocation/cache metrics. + +### chunk_pool_local_core_alloc_count + +- Unit: Count +- Description: Memory chunk allocation/cache metrics. + +### chunk_pool_other_core_alloc_count + +- Unit: Count +- Description: Memory chunk allocation/cache metrics. + +### fragment_endpoint_count + +- Unit: Count +- Description: Cumulative number of instances serving as Exchange senders in BE. + +### disk_sync_total (Deprecated) + +### max_disk_io_util_percent + +- Unit: - +- Description: Maximum disk I/O utilization percentage. + +### network_receive_bytes + +- Unit: Bytes +- Description: Total bytes received via network. + +### storage_page_cache_mem_bytes + +- Unit: Bytes +- Description: Memory used by storage page cache. + +### jit_cache_mem_bytes + +- Unit: Bytes +- Description: Memory used by jit compiled function cache. + +### column_partial_update_apply_total + +- Unit: Count +- Description: Total number of APPLY for partial updates by column (Column mode) + +### disk_writes_completed + +- Unit: Count +- Description: Total number of successfully completed disk writes. + +### memtable_flush_total + +- Unit: Count +- Description: Total number of memtable flushes. + +### page_cache_hit_count + +- Unit: Count +- Description: Total number of hits in the storage page cache. + +### page_cache_insert_count + +- Unit: Count +- Description: Total number of insert operations in the storage page cache. + +### page_cache_insert_evict_count + +- Unit: Count +- Description: Total number of cache entries evicted during insert operations due to capacity constraints. + +### page_cache_release_evict_count + +- Unit: Count +- Description: Total number of cache entries evicted during release operations when cache usage exceeds capacity. + +### bytes_read_total (Deprecated) + +### update_rowset_commit_request_total + +- Unit: Count +- Description: Total number of rowset COMMIT requests in Primary Key tables. + +### tablet_cumulative_max_compaction_score + +- Unit: - +- Description: Highest cumulative compaction score of tablets in this BE. + +### column_zonemap_index_mem_bytes + +- Unit: Bytes +- Description: Memory used by column zonemaps. + +### push_request_write_bytes_per_second + +- Unit: Bytes/s +- Description: Data writing rate of Spark Load. + +### process_thread_num + +- Unit: Count +- Description: Total number of threads in this process. + +### query_cache_lookup_count + +- Unit: Count +- Description: Total number of query cache lookups. + +### http_requests_total (Deprecated) + +### http_request_send_bytes (Deprecated) + +### cumulative_compaction_task_cost_time_ms + +- Unit: ms +- Description: Total time spent on cumulative compactions. + +### column_metadata_mem_bytes + +- Unit: Bytes +- Description: Memory used by column metadata. + +### plan_fragment_count + +- Unit: Count +- Description: Number of currently running query plan fragments. + +### page_cache_lookup_count + +- Unit: Count +- Description: Total number of storage page cache lookups. 
+ +### query_mem_bytes + +- Unit: Bytes +- Description: Memory used by queries. + +### load_channel_count + +- Unit: Count +- Description: Total number of loading channels. + +### push_request_write_rows + +- Unit: Count +- Description: Total rows written via Spark Load. + +### running_base_compaction_task_num + +- Unit: Count +- Description: Total number of running base compaction tasks. + +### fragment_request_duration_us + +- Unit: us +- Description: Cumulative execution time of fragment instances (for non-pipeline engine). + +### process_mem_bytes + +- Unit: Bytes +- Description: Memory used by this process. + +### broker_count + +- Unit: Count +- Description: Total number of filesystem brokers (by host address) created. + +### segment_replicate_queue_count + +- Unit: Count +- Description: Queued task count in the Segment Replicate thread pool. + +### brpc_endpoint_stub_count + +- Unit: Count +- Description: Total number of bRPC stubs (by address). + +### readable_blocks_total (Deprecated) + +### delta_column_group_get_non_pk_hit_cache + +- Unit: Count +- Description: Total number of hits in the delta column group cache (for non-Primary Key tables). + +### update_del_vector_deletes_new + +- Unit: Count +- Description: Total number of newly generated DELETE vectors used in Primary Key tables. + +### compaction_bytes_total + +- Unit: Bytes +- Description: Total merged bytes from base compactions and cumulative compactions. + +### segment_metadata_mem_bytes + +- Unit: Bytes +- Description: Memory used by segment metadata. + +### column_partial_update_apply_duration_us + +- Unit: us +- Description: Total time spent on partial updates for columns' APPLY tasks (Column mode). + +### bloom_filter_index_mem_bytes + +- Unit: Bytes +- Description: Memory used by Bloomfilter indexes. + +### routine_load_task_count + +- Unit: Count +- Description: Number of currently running Routine Load tasks. + +### delta_column_group_get_total + +- Unit: Count +- Description: Total number of times to get delta column groups (for Primary Key tables). + +### disk_read_time_ms + +- Unit: ms +- Description: Time spent on reading from disk. Unit: ms. + +### update_compaction_outputs_bytes_total + +- Unit: Bytes +- Description: Total bytes written by Primary Key table compactions. + +### memtable_flush_queue_count + +- Unit: Count +- Description: Queued task count in the memtable flush thread pool. + +### query_cache_capacity + +- Description: Capacity of query cache. + +### streaming_load_duration_ms + +- Unit: ms +- Description: Total time spent on Stream Load. + +### streaming_load_current_processing + +- Unit: Count +- Description: Number of currently running Stream Load tasks. + +### streaming_load_bytes + +- Unit: Bytes +- Description: Total bytes loaded by Stream Load. + +### load_rows + +- Unit: Count +- Description: Total loaded rows. + +### load_bytes + +- Unit: Bytes +- Description: Total loaded bytes. + +### meta_request_total + +- Unit: Count +- Description: Total number of meta read/write requests. + +### meta_request_duration + +- Unit: us +- Description: Total meta read/write duration. + +### tablet_base_max_compaction_score + +- Unit: - +- Description: Highest base compaction score of tablets in this BE. + +### page_cache_capacity + +- Description: Capacity of the storage page cache. + +### cumulative_compaction_task_byte_per_second + +- Unit: Bytes/s +- Description: Rate of bytes processed during cumulative compactions. 
+ +### stream_load + +- Unit: - +- Description: Total loaded rows and received bytes. + +### stream_load_pipe_count + +- Unit: Count +- Description: Number of currently running Stream Load tasks. + +### binary_column_pool_bytes + +- Unit: Bytes +- Description: Memory used by the BINARY column pool. + +### int16_column_pool_bytes + +- Unit: Bytes +- Description: Memory used by the INT16 column pool. + +### decimal_column_pool_bytes + +- Unit: Bytes +- Description: Memory used by the DECIMAL column pool. + +### double_column_pool_bytes + +- Unit: Bytes +- Description: Memory used by the DOUBLE column pool. + +### int128_column_pool_bytes + +- Unit: Bytes +- Description: Memory used by the INT128 column pool. + +### date_column_pool_bytes + +- Unit: Bytes +- Description: Memory used by the DATE column pool. + +### int64_column_pool_bytes + +- Unit: Bytes +- Description: Memory used by the INT64 column pool. + +### int8_column_pool_bytes + +- Unit: Bytes +- Description: Memory used by the INT8 column pool. + +### datetime_column_pool_bytes + +- Unit: Bytes +- Description: Memory used by the DATETIME column pool. + +### int32_column_pool_bytes + +- Unit: Bytes +- Description: Memory used by the INT32 column pool. + +### float_column_pool_bytes + +- Unit: Bytes +- Description: Memory used by the FLOAT column pool. + +### uint8_column_pool_bytes + +- Unit: Bytes +- Description: Bytes used by the UINT8 column pool. + +### column_pool_mem_bytes + +- Unit: Bytes +- Description: Memory used by the column pools. + +### local_column_pool_bytes (Deprecated) + +### total_column_pool_bytes (Deprecated) + +### central_column_pool_bytes (Deprecated) + +### wait_base_compaction_task_num + +- Unit: Count +- Description: Number of base compaction tasks waiting for execution. + +### wait_cumulative_compaction_task_num + +- Unit: Count +- Description: Number of cumulative compaction tasks waiting for execution. + +### jemalloc_allocated_bytes + +- Unit: Bytes +- Description: Total number of bytes allocated by the application. + +### pk_index_compaction_queue_count + +- Unit: Count +- Description: Queued task count in the Primary Key index compaction thread pool. + +### disks_total_capacity + +- Description: Total capacity of the disk. + +### disks_state + +- Unit: - +- Description: State of each disk. `1` indicates that the disk is in use, and `0` indicates that it is not in use. + +### update_del_vector_deletes_total (Deprecated) + +### update_del_vector_dels_num (Deprecated) + +### result_block_queue_count + +- Unit: Count +- Description: Number of results in the result block queue. + +### engine_requests_total + +- Unit: Count +- Description: Total count of all types of requests between BE and FE, including CREATE TABLE, Publish Version, and tablet clone. + +### snmp + +- Unit: - +- Description: Metrics returned by `/proc/net/snmp`. + +### compaction_mem_bytes + +- Unit: Bytes +- Description: Memory used by compactions. + +### txn_request + +- Unit: - +- Description: Transaction requests of BEGIN, COMMIT, ROLLBACK, and EXEC. + +### small_file_cache_count + +- Unit: Count +- Description: Number of small file caches. + +### rowset_metadata_mem_bytes + +- Unit: Bytes +- Description: Total number of bytes for rowset metadata. + +### update_apply_queue_count + +- Unit: Count +- Description: Queued task count in the Primary Key table transaction APPLY thread pool. + +### async_delta_writer_queue_count + +- Unit: Count +- Description: Queued task count in the tablet delta writer thread pool. 
+ +### update_compaction_duration_us + +- Unit: us +- Description: Total time spent on Primary Key table compactions. + +### transaction_streaming_load_current_processing + +- Unit: Count +- Description: Number of currently running transactional Stream Load tasks. + +### ordinal_index_mem_bytes + +- Unit: Bytes +- Description: Memory used by ordinal indexes. + +### consistency_mem_bytes + +- Unit: Bytes +- Description: Memory used by replica consistency checks. + +### tablet_schema_mem_bytes + +- Unit: Bytes +- Description: Memory used by tablet schema. + +### fd_num_used + +- Unit: Count +- Description: Number of file descriptors currently in use. + +### running_update_compaction_task_num + +- Unit: Count +- Description: Total number of currently running Primary Key table compaction tasks. + +### rowset_count_generated_and_in_use + +- Unit: Count +- Description: Number of rowset IDs currently in use. + +### bitmap_index_mem_bytes + +- Unit: Bytes +- Description: Memory used by bitmap indexes. + +### update_rowset_commit_apply_duration_us + +- Unit: us +- Description: Total time spent on Primary Key table APPLY tasks. + +### process_fd_num_used + +- Unit: Count +- Description: Number of file descriptors currently in use in this BE process. + +### network_send_packets + +- Unit: Count +- Description: Total number of packets sent through the network. + +### network_receive_packets + +- Unit: Count +- Description: Total number of packets received through the network. + +### metadata_mem_bytes (Deprecated) + +### push_requests_total + +- Unit: Count +- Description: Total number of successful and failed Spark Load requests. + +### blocks_deleted_total (Deprecated) + +### jemalloc_mapped_bytes + +- Unit: Bytes +- Description: Total number of bytes in active extents mapped by the allocator. + +### process_fd_num_limit_hard + +- Unit: Count +- Description: Hard limit on the maximum number of file descriptors. + +### jemalloc_metadata_thp + +- Unit: Count +- Description: Number of Transparent Huge Pages used for metadata. + +### update_rowset_commit_request_failed + +- Unit: Count +- Description: Total number of failed rowset COMMIT requests in Primary Key tables. + +### streaming_load_requests_total + +- Unit: Count +- Description: Total number of Stream Load requests. + +### resource_group_running_queries + +- Unit: Count +- Description: Number of queries currently running in each resource group. This is an instantaneous value. + +### resource_group_total_queries + +- Unit: Count +- Description: Total number of queries executed in each resource group, including those currently running. This is an instantaneous value. + +### resource_group_bigquery_count + +- Unit: Count +- Description: Number of queries in each resource group that have triggered the limit for big queries. This is an instantaneous value. + +### resource_group_concurrency_overflow_count + +- Unit: Count +- Description: Number of queries in each resource group that have triggered the concurrency limit. This is an instantaneous value. + +### resource_group_mem_limit_bytes + +- Unit: Bytes +- Description: Memory limit for each resource group, measured in bytes. This is an instantaneous value. + +### resource_group_mem_inuse_bytes + +- Unit: Bytes +- Description: Currently used memory by each resource group, measured in bytes. This is an instantaneous value. + +### resource_group_cpu_limit_ratio + +- Unit: - +- Description: Ratio of the CPU core limit for each resource group to the total CPU core limit for all resource groups. 
This is an instantaneous value. + +### resource_group_inuse_cpu_cores + +- Unit: Count +- Description: Estimated number of CPU cores currently in use by each resource group. This is an average value over the time interval between two metric retrievals. + +### resource_group_cpu_use_ratio (Deprecated) + +- Unit: - +- Description: Ratio of pipeline thread time slices used by each resource group to the total used by all resource groups. This is an average value over the time interval between two metric retrievals. + +### resource_group_connector_scan_use_ratio (Deprecated) + +- Unit: - +- Description: Ratio of external table scan thread time slices used by each resource group to the total used by all resource groups. This is an average value over the time interval between two metric retrievals. + +### resource_group_scan_use_ratio (Deprecated) + +- Unit: - +- Description: Ratio of internal table scan thread time slices used by each resource group to the total used by all resource groups. This is an average value over the time interval between two metric retrievals. + +### pipe_poller_block_queue_len + +- Unit: Count +- Description: Current length of the block queue of PipelineDriverPoller in the pipeline engine. + +### pip_query_ctx_cnt + +- Unit: Count +- Description: Total number of currently running queries in the BE. + +### pipe_driver_schedule_count + +- Unit: Count +- Description: Cumulative number of driver scheduling times for pipeline executors in the BE. + +### pipe_scan_executor_queuing + +- Unit: Count +- Description: Current number of pending asynchronous I/O tasks launched by Scan Operators. + +### pipe_driver_queue_len + +- Unit: Count +- Description: Current number of ready drivers in the ready queue waiting for scheduling in BE. + +### pipe_driver_execution_time + +- Description: Cumulative time spent by PipelineDriver executors on processing PipelineDrivers. + +### pipe_prepare_pool_queue_len + +- Unit: Count +- Description: Queued task count in the pipeline PREPARE thread pool. This is an instantaneous value. + +### starrocks_be_exec_state_report_active_threads + +- Unit: Count +- Type: Instantaneous +- Description: The number of tasks being executed in the thread pool that reports the execution status of the Fragment instance. + +### starrocks_be_exec_state_report_running_threads + +- Unit: Count +- Type: Instantaneous +- Description: The number of threads in the thread pool that reports the execution status of the Fragment instance, with a minimum of 1 and a maximum of 2. + +### starrocks_be_exec_state_report_threadpool_size + +- Unit: Count +- Type: Instantaneous +- Description: The maximum number of threads in the thread pool that reports the execution status of the Fragment instance, defaults to 2. + +### starrocks_be_exec_state_report_queue_count + +- Unit: Count +- Type: Instantaneous +- Description: The number of tasks queued in the thread pool that reports the execution status of the Fragment instance, up to a maximum of 1000. + +### starrocks_be_priority_exec_state_report_active_threads + +- Unit: Count +- Type: Instantaneous +- Description: The number of tasks being executed in the thread pool that reports the final execution state of the Fragment instance. + +### starrocks_be_priority_exec_state_report_running_threads + +- Unit: Count +- Type: Instantaneous +- Description: The number of threads in the thread pool that reports the final execution status of the Fragment instance, with a minimum of 1 and a maximum of 2. 
+
+### starrocks_be_priority_exec_state_report_threadpool_size
+
+- Unit: Count
+- Type: Instantaneous
+- Description: The maximum number of threads in the thread pool that reports the final execution status of the Fragment instance, defaults to 2.
+
+### starrocks_be_priority_exec_state_report_queue_count
+
+- Unit: Count
+- Type: Instantaneous
+- Description: The number of tasks queued in the thread pool that reports the final execution status of the Fragment instance, up to a maximum of 2147483647.
+
+### starrocks_fe_routine_load_jobs
+
+- Unit: Count
+- Description: The total number of Routine Load jobs in different states. For example:
+
+  ```plaintext
+  starrocks_fe_routine_load_jobs{state="NEED_SCHEDULE"} 0
+  starrocks_fe_routine_load_jobs{state="RUNNING"} 1
+  starrocks_fe_routine_load_jobs{state="PAUSED"} 0
+  starrocks_fe_routine_load_jobs{state="STOPPED"} 0
+  starrocks_fe_routine_load_jobs{state="CANCELLED"} 1
+  starrocks_fe_routine_load_jobs{state="UNSTABLE"} 0
+  ```
+
+### starrocks_fe_routine_load_paused
+
+- Unit: Count
+- Description: The total number of times Routine Load jobs have been paused.
+
+### starrocks_fe_routine_load_rows
+
+- Unit: Count
+- Description: The total number of rows loaded by all Routine Load jobs.
+
+### starrocks_fe_routine_load_receive_bytes
+
+- Unit: Byte
+- Description: The total amount of data loaded by all Routine Load jobs.
+
+### starrocks_fe_routine_load_error_rows
+
+- Unit: Count
+- Description: The total number of error rows encountered during data loading by all Routine Load jobs.
+
+### starrocks_fe_routine_load_max_lag_of_partition
+
+- Unit: -
+- Description: The maximum Kafka partition offset lag for each Routine Load job. It is collected only when the FE configuration `enable_routine_load_lag_metrics` is set to `true` and the offset lag is greater than or equal to the FE configuration `min_routine_load_lag_for_metrics`. By default, `enable_routine_load_lag_metrics` is `false`, and `min_routine_load_lag_for_metrics` is `10000`.
+
+### starrocks_fe_routine_load_max_lag_time_of_partition
+
+- Unit: Seconds
+- Description: The maximum Kafka partition offset timestamp lag for each Routine Load job. It is collected only when the FE configuration `enable_routine_load_lag_time_metrics` is set to `true`. By default, `enable_routine_load_lag_time_metrics` is `false`.
+
+### starrocks_fe_sql_block_hit_count
+
+- Unit: Count
+- Description: The number of times blacklisted SQL statements have been intercepted.
+
+### starrocks_fe_scheduled_pending_tablet_num
+
+- Unit: Count
+- Type: Instantaneous
+- Description: The number of FE-scheduled Clone tasks in the Pending state, including both BALANCE and REPAIR types.
+
+### starrocks_fe_scheduled_running_tablet_num
+
+- Unit: Count
+- Type: Instantaneous
+- Description: The number of FE-scheduled Clone tasks in the Running state, including both BALANCE and REPAIR types.
+
+### starrocks_fe_clone_task_total
+
+- Unit: Count
+- Type: Cumulative
+- Description: The total number of Clone tasks in the cluster.
+
+### starrocks_fe_clone_task_success
+
+- Unit: Count
+- Type: Cumulative
+- Description: The number of successfully executed Clone tasks in the cluster.
+
+### starrocks_fe_clone_task_copy_bytes
+
+- Unit: Bytes
+- Type: Cumulative
+- Description: The total file size copied by Clone tasks in the cluster, including both INTER_NODE and INTRA_NODE types.
+
+### starrocks_fe_clone_task_copy_duration_ms
+
+- Unit: ms
+- Type: Cumulative
+- Description: The total copy time consumed by Clone tasks in the cluster, including both INTER_NODE and INTRA_NODE types.
+
+### starrocks_be_clone_task_copy_bytes
+
+- Unit: Bytes
+- Type: Cumulative
+- Description: The total file size copied by Clone tasks on the BE node, including both INTER_NODE and INTRA_NODE types.
+
+### starrocks_be_clone_task_copy_duration_ms
+
+- Unit: ms
+- Type: Cumulative
+- Description: The total copy time consumed by Clone tasks on the BE node, including both INTER_NODE and INTRA_NODE types.
+
+### Transaction Latency Metrics
+
+The following metrics are `summary`-type metrics that provide latency distributions for different phases of a transaction. These metrics are reported exclusively by the Leader FE node.
+
+Each metric includes the following outputs:
+- **Quantiles**: Latency values at different percentile boundaries. These are exposed via the `quantile` label, which can have values of `0.75`, `0.95`, `0.98`, `0.99`, and `0.999`.
+- **`_sum`**: The total cumulative time spent in this phase, for example, `starrocks_fe_txn_total_latency_ms_sum`.
+- **`_count`**: The total number of transactions recorded for this phase, for example, `starrocks_fe_txn_total_latency_ms_count`.
+
+All transaction metrics share the following labels:
+- `type`: Categorizes transactions by their load job source type (for example, `all`, `stream_load`, `routine_load`). This allows for monitoring both overall transaction performance and the performance of specific load types. The reported groups can be configured via the FE parameter [`txn_latency_metric_report_groups`](../FE_configuration.md#txn_latency_metric_report_groups).
+- `is_leader`: Indicates whether the reporting FE node is the Leader. Only the Leader FE (`is_leader="true"`) reports actual metric values. Followers will have `is_leader="false"` and report no data.
+
+#### starrocks_fe_txn_total_latency_ms
+
+- Unit: ms
+- Type: Summary
+- Description: The total latency for a transaction to complete, measured from the `prepare` time to the `finish` time. This metric represents the full end-to-end duration of a transaction.
+
+#### starrocks_fe_txn_write_latency_ms
+
+- Unit: ms
+- Type: Summary
+- Description: The latency of the `write` phase of a transaction, from `prepare` time to `commit` time. This metric isolates the performance of the data writing and preparation stage before the transaction is ready to be published.
+
+#### starrocks_fe_txn_publish_latency_ms
+
+- Unit: ms
+- Type: Summary
+- Description: The latency of the `publish` phase, from `commit` time to `finish` time. This is the duration it takes for a committed transaction to become visible to queries. It is the sum of the `schedule`, `execute`, and `ack` sub-phases.
+
+#### starrocks_fe_txn_publish_schedule_latency_ms
+
+- Unit: ms
+- Type: Summary
+- Description: The time a transaction spends waiting to be published after it has been committed, measured from `commit` time to when the publish task is picked up. This metric reflects scheduling delays or queueing time in the `publish` pipeline.
+
+#### starrocks_fe_txn_publish_execute_latency_ms
+
+- Unit: ms
+- Type: Summary
+- Description: The active execution time of the `publish` task, from when the task is picked up to when it finishes. This metric represents the time actually spent making the transaction's changes visible.
+ +#### starrocks_fe_txn_publish_ack_latency_ms + +- Unit: ms +- Type: Summary +- Description: The final acknowledgment latency, from when the `publish` task finishes to the final `finish` time when the transaction is marked as `VISIBLE`. This metric includes any final steps or acknowledgments required. + +### Merge Commit BE Metrics + +#### merge_commit_request_total + +- Unit: Count +- Type: Cumulative +- Description: Total number of merge commit requests received by BE. + +#### merge_commit_request_bytes + +- Unit: Bytes +- Type: Cumulative +- Description: Total bytes of data received across merge commit requests. + +#### merge_commit_success_total + +- Unit: Count +- Type: Cumulative +- Description: Merge commit requests that finished successfully. + +#### merge_commit_fail_total + +- Unit: Count +- Type: Cumulative +- Description: Merge commit requests that failed. + +#### merge_commit_pending_total + +- Unit: Count +- Type: Instantaneous +- Description: Merge commit tasks currently waiting in the execution queue. + +#### merge_commit_pending_bytes + +- Unit: Bytes +- Type: Instantaneous +- Description: Total bytes of data held by pending merge commit tasks. + +#### merge_commit_send_rpc_total + +- Unit: Count +- Type: Cumulative +- Description: RPC requests sent to FE for starting merge commit operations. + +#### merge_commit_register_pipe_total + +- Unit: Count +- Type: Cumulative +- Description: Stream load pipes registered for merge commit operations. + +#### merge_commit_unregister_pipe_total + +- Unit: Count +- Type: Cumulative +- Description: Stream load pipes unregistered from merge commit operations. + +Latency metrics expose percentile series such as `merge_commit_request_latency_99` and `merge_commit_request_latency_90`, reported in microseconds. The end-to-end latency obeys: + +`merge_commit_request = merge_commit_pending + merge_commit_wait_plan + merge_commit_append_pipe + merge_commit_wait_finish` + +> **Note**: Before v3.4.11, v3.5.12, and v4.0.4, these latency metrics were reported in nanoseconds. + +#### merge_commit_request + +- Unit: microsecond +- Type: Summary +- Description: End-to-end processing latency for merge commit requests. + +#### merge_commit_pending + +- Unit: microsecond +- Type: Summary +- Description: Time merge commit tasks spend waiting in the pending queue before execution. + +#### merge_commit_wait_plan + +- Unit: microsecond +- Type: Summary +- Description: Combined latency for the RPC request and waiting for the stream load pipe to become available. + +#### merge_commit_append_pipe + +- Unit: microsecond +- Type: Summary +- Description: Time spent appending data to the stream load pipe during merge commit. + +#### merge_commit_wait_finish + +- Unit: microsecond +- Type: Summary +- Description: Time spent waiting for merge commit load operations to finish. \ No newline at end of file diff --git a/docs/en/administration/management/proc_profile.md b/docs/en/administration/management/proc_profile.md new file mode 100644 index 0000000..20a7788 --- /dev/null +++ b/docs/en/administration/management/proc_profile.md @@ -0,0 +1,115 @@ +# Process Profile (Proc Profile) + +The **Process Profile** (Proc Profile) feature provides a built-in mechanism to collect and visualize performance profiles for StarRocks Frontend (FE) and Backend (BE) processes. By generating flame graphs for CPU and memory allocation, it helps developers and administrators diagnose performance bottlenecks, high resource utilization, and complex runtime issues directly from the Web UI. 
+ +## Overview + +Process Profiling is a system-level diagnostic tool that captures the state of the StarRocks processes over a period of time. Unlike Query Profiles, which focus on individual SQL execution, Process Profiles provide a holistic view of what the processes are doing, including background tasks, metadata management, and internal synchronization. + +## Page Examples + +The Proc Profile interface is integrated into the StarRocks Web UI under the **proc profiles** tab. + +### Profile List View + +The main page displays a list of collected profile files for the selected node. +You can switch between FE and different BE nodes using the tabs at the top. + +![Profile List Example](../../_assets/proc_profile_1.png) + +### Flame Graph + +![Flame Graph](../../_assets/proc_profile_2.png) + + +A Flame Graph is a visualization tool used to show the resource consumption distribution of functions or code paths in a program. It represents the call stack using stacked rectangles (usually in a horizontal "flame" shape), where: + +- **Each box represents a function (or method) call.** +- **The width of a box represents the amount of resources consumed by that function (such as CPU time, memory allocation frequency, or lock wait time). The wider the box, the more resources are consumed.** +- **Vertical stacking represents the calling relationship, with the bottom layer being the entry function and child functions stacked on top.** + +Flame graphs help developers and operations personnel quickly identify code hotspots, performance bottlenecks, and call paths with the highest resource consumption, and are commonly used for performance tuning and troubleshooting. + +In StarRocks Process Profile, flame graphs are used to visually display the data distribution of CPU usage, memory allocation, and (in BE scenarios) lock contention, helping to locate the most time-consuming code segments or call paths. + + + +## Use Scenarios + +- **CPU Hotspot Analysis**: Identify which code paths or functions are consuming the most CPU cycles. +- **Memory Allocation Profiling**: Track where memory is being frequently allocated to find potential memory pressure sources. +- **R&D Troubleshooting**: Assist developers in analyzing complex bugs or performance regressions in production environments without needing external profiling tools. + +## Feature Description + +### How to Use + +1. **Access**: Open the StarRocks Web UI (default port 8030) and click on the **proc profiles** tab. +2. **Select Node**: Choose the **FE** tab or a specific **BE** node tab. +3. **Visualize**: Click the **View** link for any entry. + - For **FE**, it extracts and displays a pre-generated HTML flame graph. + - For **BE**, it may convert a raw `pprof` file into an SVG flame graph on the fly (the first view might take a few seconds as it performs the conversion). +4. **Interact**: Use the flame graph to zoom into specific call stacks, search for function names, or analyze stack depth. + +### Information Provided + +- **Type**: CPU, Memory (Allocation), or Contention (BE only). +- **Timestamp**: When the profile collection was completed. +- **File Size**: Size of the compressed profile data. +- **Flame Graph**: A hierarchical visualization where the width of each box represents the relative resource consumption (CPU time, allocation frequency, or lock wait time). + +## Configuration + +The profiling feature performs function-level sampling to generate flame graphs for visualization. 
+
+### Frontend (FE) Configuration
+
+FE profiling is managed by an internal daemon and uses **AsyncProfiler** for data collection. You can configure the following parameters in `fe.conf`.
+
+| Parameter | Default | Description |
+| :--- | :--- | :--- |
+| `proc_profile_cpu_enable` | `true` | Whether to enable automatic CPU profiling for FE. |
+| `proc_profile_mem_enable` | `true` | Whether to enable automatic memory allocation profiling for FE. |
+| `proc_profile_collect_time_s` | `120` | Duration (seconds) for each profile collection. |
+| `proc_profile_jstack_depth` | `128` | Maximum Java stack depth to collect. |
+| `proc_profile_file_retained_days` | `1` | How many days to retain profile files. |
+| `proc_profile_file_retained_size_bytes` | `2147483648` (2GB) | Maximum total size of retained profile files. |
+
+### Backend (BE) Configuration
+
+BE profiles are collected using the built-in **gperftools**, typically via a background script or manual triggers. The collected data is then converted into flame graphs using **pprof**.
+
+#### Configuration in `be.conf`
+
+| Parameter | Default | Description |
+| :--- | :--- | :--- |
+| `brpc_port` | `8060` | Port used by collection scripts to fetch data from BE. |
+| `sys_log_dir` | `${STARROCKS_HOME}/log` | Base directory for storing collected profiles (stored in the `proc_profile` subdirectory). |
+| `flamegraph_tool_dir` | `${STARROCKS_HOME}/bin/flamegraph` | Path to conversion tools (**pprof**, `flamegraph.pl`). |
+| `COLLECT_BE_PROFILE_INTERVAL` | `60` | Collection interval in seconds when running the `collect_be_profile.sh` script in daemon mode. |
+
+#### Manual BE Collection Options
+
+The `collect_be_profile.sh` script supports the following command-line options:
+
+| Option | Default | Description |
+| :--- | :--- | :--- |
+| `--profiling-type` | `cpu` | Type of profile to collect: `cpu`, `contention`, or `both`. |
+| `--duration` | `10` | Duration (seconds) for each profile collection. |
+| `--interval` | `60` | Interval (seconds) between collections in daemon mode. |
+| `--cleanup-days` | `1` | Number of days to retain profile files. |
+| `--cleanup-size` | `2147483648` (2GB) | Maximum total size of retained profile files. |
+| `--daemon` | - | Run the collection script in daemon mode in the background. |
+
+### Manual BE Collection Example
+
+You can use the provided script to trigger or schedule BE collection:
+
+```bash
+# Collect a 30-second CPU profile
+./bin/collect_be_profile.sh --profiling-type cpu --duration 30
+
+# Run in daemon mode to collect profiles every hour
+./bin/collect_be_profile.sh --daemon --interval 3600
+```
+
diff --git a/docs/en/administration/management/resource_management/Blacklist.md b/docs/en/administration/management/resource_management/Blacklist.md
new file mode 100644
index 0000000..42066ca
--- /dev/null
+++ b/docs/en/administration/management/resource_management/Blacklist.md
@@ -0,0 +1,138 @@
+---
+displayed_sidebar: docs
+sidebar_position: 80
+---
+
+# Blacklist Management
+
+In some cases, administrators need to block certain patterns of SQL to prevent them from triggering cluster crashes or unexpectedly high-concurrency queries. The blacklist is only for SELECT statements, INSERT statements (from v3.1 onwards), and CTAS statements (from v3.4 onwards).
+
+StarRocks allows users to add, view, and delete SQL blacklists.
+
+## Syntax
+
+Enable SQL blacklisting via the FE configuration item `enable_sql_blacklist`. The default value is `false` (disabled).
+
+~~~sql
+admin set frontend config ("enable_sql_blacklist" = "true");
+~~~
+
+The admin user who has ADMIN_PRIV privileges can manage blacklists by executing the following commands:
+
+~~~sql
+ADD SQLBLACKLIST "<sql>";
+DELETE SQLBLACKLIST <sql_index_number>;
+SHOW SQLBLACKLIST;
+~~~
+
+* When `enable_sql_blacklist` is `true`, every SQL query is checked against the SQL blacklist. If a query matches, the user is informed that the SQL is in the blacklist. Otherwise, the SQL is executed normally. The message may be as follows when the SQL is blacklisted:
+
+`ERROR 1064 (HY000): Access denied; sql 'select count (*) from test_all_type_select_2556' is in blacklist`
+
+## Add blacklist
+
+~~~sql
+ADD SQLBLACKLIST "<sql>";
+~~~
+
+`<sql>` is a regular expression for a certain type of SQL.
+
+:::tip
+Currently, StarRocks supports adding SELECT and INSERT statements (and, from v3.4 onwards, CTAS statements) to the SQL Blacklist.
+:::
+
+Since SQL itself contains the common characters `(`, `)`, `*`, `.` that may be mixed up with the semantics of regular expressions, you need to distinguish those by using escape characters. Given that `(` and `)` are used too often in SQL, there is no need to escape them. Other special characters need the escape character `\` as a prefix. For example:
+
+* Prohibit `count(*)`:
+
+~~~sql
+ADD SQLBLACKLIST "select count(\\*) from .+";
+~~~
+
+* Prohibit `count(distinct)`:
+
+~~~sql
+ADD SQLBLACKLIST "select count(distinct .+) from .+";
+~~~
+
+* Prohibit `ORDER BY ... LIMIT x, y`, where `1 <= x <= 7` and `5 <= y <= 7`:
+
+~~~sql
+ADD SQLBLACKLIST "select id_int from test_all_type_select1 order by id_int limit [1-7], [5-7]";
+~~~
+
+* Prohibit complex SQL:
+
+~~~sql
+ADD SQLBLACKLIST "select id_int \\* 4, id_tinyint, id_varchar from test_all_type_nullable except select id_int, id_tinyint, id_varchar from test_basic except select (id_int \\* 9 \\- 8) \\/ 2, id_tinyint, id_varchar from test_all_type_nullable2 except select id_int, id_tinyint, id_varchar from test_basic_nullable";
+~~~
+
+* Prohibit all INSERT INTO statements:
+
+~~~sql
+ADD SQLBLACKLIST "(?i)^insert\\s+into\\s+.*";
+~~~
+
+* Prohibit all INSERT INTO ... VALUES statements:
+
+~~~sql
+ADD SQLBLACKLIST "(?i)^insert\\s+into\\s+.*values\\s*\\(";
+~~~
+
+* Prohibit all INSERT INTO ... VALUES statements except those against the system-defined view `_statistics_.column_statistics`:
+
+~~~sql
+ADD SQLBLACKLIST "(?i)^insert\\s+into\\s+(?!(_statistics_\\.)?column_statistics\\b).*values\\s*\\(";
+~~~
+
+## View blacklist
+
+~~~sql
+SHOW SQLBLACKLIST;
+~~~
+
+Result format: `Index | Forbidden SQL`
+
+For example:
+
+~~~sql
+mysql> show sqlblacklist;
++-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| Index | Forbidden SQL                                                                                                                                                                                                                                                                                          |
++-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| 1     | select count\(\*\) from .+                                                                                                                                                                                                                                                                             |
+| 2     | select id_int \* 4, id_tinyint, id_varchar from test_all_type_nullable except select id_int, id_tinyint, id_varchar from test_basic except select \(id_int \* 9 \- 8\) \/ 2, id_tinyint, id_varchar from test_all_type_nullable2 except select id_int, id_tinyint, id_varchar from test_basic_nullable |
+| 3     | select id_int from test_all_type_select1 order by id_int limit [1-7], [5-7]                                                                                                                                                                                                                            |
+| 4     | select count\(distinct .+\) from .+                                                                                                                                                                                                                                                                    |
++-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+~~~
+
+The SQL shown in `Forbidden SQL` is escaped for all SQL semantic characters.
+
+## Delete blacklist
+
+~~~sql
+DELETE SQLBLACKLIST <sql_index_number>;
+~~~
+
+`<sql_index_number>` is a list of SQL IDs separated by commas (`,`).
+
+For example, delete the No.3 and No.4 SQLs in the above blacklist:
+
+~~~sql
+delete sqlblacklist 3, 4;
+~~~
+
+Then, the remaining sqlblacklist is as follows:
+
+~~~sql
+mysql> show sqlblacklist;
++-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| Index | Forbidden SQL                                                                                                                                                                                                                                                                                          |
++-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| 1     | select count\(\*\) from .+                                                                                                                                                                                                                                                                             |
+| 2     | select id_int \* 4, id_tinyint, id_varchar from test_all_type_nullable except select id_int, id_tinyint, id_varchar from test_basic except select \(id_int \* 9 \- 8\) \/ 2, id_tinyint, id_varchar from test_all_type_nullable2 except select id_int, id_tinyint, id_varchar from test_basic_nullable |
++-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+~~~
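+
+As a quick sanity check, a query matching the remaining rule No.1 is rejected at submission time (the table name is reused from the error example above):
+
+~~~sql
+select count(*) from test_all_type_select_2556;
+-- ERROR 1064 (HY000): Access denied; sql 'select count (*) from test_all_type_select_2556' is in blacklist
+~~~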
diff --git a/docs/en/administration/management/resource_management/Load_balance.md b/docs/en/administration/management/resource_management/Load_balance.md
new file mode 100644
index 0000000..e631d74
--- /dev/null
+++ b/docs/en/administration/management/resource_management/Load_balance.md
@@ -0,0 +1,127 @@
+---
+displayed_sidebar: docs
+sidebar_position: 60
+---
+
+# Load Balancing
+
+When deploying multiple FE nodes, users can deploy a load balancing layer on top of the FEs to achieve high availability.
+
+The following are some high availability options:
+
+## Code approach
+
+One way is to implement retry and load balancing logic at the application layer. For example, if a connection is broken, the application automatically retries on another connection. This approach requires users to configure multiple FE node addresses.
+
+## JDBC Connector
+
+The JDBC connector supports automatic retry and load balancing:
+
+```sql
+jdbc:mysql:loadbalance://[host1][:port],[host2][:port][,[host3][:port]]...[/[database]][?propertyName1=propertyValue1[&propertyName2=propertyValue2]...]
+```
+
+## ProxySQL
+
+ProxySQL is a MySQL proxy layer that supports read/write separation, query routing, SQL caching, dynamic load configuration, failover, and SQL filtering.
+
+StarRocks FE is responsible for receiving connection and query requests, and it is horizontally scalable and highly available. However, users need to set up a proxy layer on top of FE to achieve automatic load balancing. See the following steps for setup:
+
+### 1. Install relevant dependencies
+
+```shell
+yum install -y gnutls perl-DBD-MySQL perl-DBI perl-devel
+```
+
+### 2. Download the installation package
+
+```shell
+wget https://github.com/sysown/proxysql/releases/download/v2.0.14/proxysql-2.0.14-1-centos7.x86_64.rpm
+```
+
+### 3. Decompress to the current directory
+
+```shell
+rpm2cpio proxysql-2.0.14-1-centos7.x86_64.rpm | cpio -ivdm
+```
+
+### 4. Modify the configuration file
+
+```shell
+vim ./etc/proxysql.cnf
+```
+
+Point the following items to a directory (absolute path) that the user has permission to access:
+
+```vim
+datadir="/var/lib/proxysql"
+errorlog="/var/lib/proxysql/proxysql.log"
+```
+
+### 5. Start
+
+```shell
+./usr/bin/proxysql -c ./etc/proxysql.cnf --no-monitor
+```
+
+### 6. Log in
+
+```shell
+mysql -u admin -padmin -h 127.0.0.1 -P6032
+```
+
+### 7. Configure the global log
+
+```shell
+SET mysql-eventslog_filename='proxysql_queries.log';
+SET mysql-eventslog_default_log=1;
+SET mysql-eventslog_format=2;
+LOAD MYSQL VARIABLES TO RUNTIME;
+SAVE MYSQL VARIABLES TO DISK;
+```
+
+### 8. Insert the leader node
+
+```sql
+insert into mysql_servers(hostgroup_id, hostname, port) values(1, '172.xx.xx.139', 9030);
+```
+
+### 9. Insert the observer nodes
+
+```sql
+insert into mysql_servers(hostgroup_id, hostname, port) values(2, '172.xx.xx.139', 9030);
+insert into mysql_servers(hostgroup_id, hostname, port) values(2, '172.xx.xx.140', 9030);
+```
+
+### 10. Load the configuration
+
+```sql
+load mysql servers to runtime;
+save mysql servers to disk;
+```
+
+### 11. Configure the username and password
+
+```sql
+insert into mysql_users(username, password, active, default_hostgroup, backend, frontend) values('root', '*94BDCEBE19083CE2A1F959FD02F964C7AF4CFC29', 1, 1, 1, 1);
+```
+
+### 12. Load the configuration
+
+```sql
+load mysql users to runtime;
+save mysql users to disk;
+```
+
+### 13. Write the proxy rules
+
+```sql
+insert into mysql_query_rules(rule_id, active, match_digest, destination_hostgroup, mirror_hostgroup, apply) values(1, 1, '.', 1, 2, 1);
+```
+
+### 14. Load the configuration
+
+```sql
+load mysql query rules to runtime;
+save mysql query rules to disk;
+```
diff --git a/docs/en/administration/management/resource_management/Memory_management.md b/docs/en/administration/management/resource_management/Memory_management.md
new file mode 100644
index 0000000..33a1c86
--- /dev/null
+++ b/docs/en/administration/management/resource_management/Memory_management.md
@@ -0,0 +1,99 @@
+---
+displayed_sidebar: docs
+sidebar_position: 40
+---
+
+# Memory Management
+
+This section briefly introduces memory classification and StarRocks' methods of managing memory.
+
+## Memory Classification
+
+Explanation:
+
+| Metric | Name | Description |
+| --- | --- | --- |
+| process | Total memory used by BE | |
+| query\_pool | Memory used by data querying | Consists of two parts: memory used by the execution layer and memory used by the storage layer. |
+| load | Memory used by data loading | Generally MemTable |
+| table_meta | Metadata memory | Schema, Tablet metadata, RowSet metadata, Column metadata, ColumnReader, IndexReader |
+| compaction | Multi-version memory compaction | Compaction that happens after data import completes |
+| snapshot | Snapshot memory | Generally used for clone; little memory usage |
+| column_pool | Column pool memory | Caches released Column objects to accelerate subsequent column reads |
+| page_cache | BE's own PageCache | Disabled by default; you can enable it by modifying the BE configuration file |
+
+## Memory-related configuration
+
+* **BE Configuration**
+
+| Name | Default | Description |
+| --- | --- | --- |
+| vector_chunk_size | 4096 | Number of rows in a chunk |
+| mem_limit | 90% | BE process memory upper limit. You can set it as a percentage ("80%") or a physical limit ("100G"). The default hard limit is 90% of the server's memory size, and the soft limit is 80%. You need to configure this parameter if you want to deploy StarRocks with other memory-intensive services on the same server. |
+| disable_storage_page_cache | false | The boolean value to control whether to disable PageCache. When PageCache is enabled, StarRocks caches the recently scanned data. PageCache can significantly improve the query performance when similar queries are repeated frequently. `true` indicates to disable PageCache. Used together with `storage_page_cache_limit`, this item can accelerate query performance in scenarios with sufficient memory resources and much data scan. The default value of this item has been changed from `true` to `false` since StarRocks v2.4. |
+| write_buffer_size | 104857600 | The capacity limit of a single MemTable, beyond which the MemTable is flushed to disk. |
+| load_process_max_memory_limit_bytes | 107374182400 | The upper limit of memory resources that can be taken up by all load processes on a BE node. Its value is the smaller one between `mem_limit * load_process_max_memory_limit_percent / 100` and `load_process_max_memory_limit_bytes`. If this threshold is exceeded, a flush and backpressure will be triggered. |
+| load_process_max_memory_limit_percent | 30 | The maximum percentage of memory resources that can be taken up by all load processes on a BE node. Its value is the smaller one between `mem_limit * load_process_max_memory_limit_percent / 100` and `load_process_max_memory_limit_bytes`. If this threshold is exceeded, a flush and backpressure will be triggered. |
+| default_load_mem_limit | 2147483648 | If the memory limit on the receiving side is reached for a single load instance, a disk flush will be triggered. This needs to be modified together with the session variable `load_mem_limit` to take effect. |
+| max_compaction_concurrency | -1 | The maximum concurrency of compactions (both Base Compaction and Cumulative Compaction). The value -1 indicates that no limit is imposed on the concurrency. This parameter is mutable when the Event-based Compaction Framework is enabled. |
+| cumulative_compaction_check_interval_seconds | 1 | The time interval of compaction checks |
+
+* **Session variables**
+
+| Name | Default | Description |
+| --- | --- | --- |
+| query_mem_limit | 0 | Memory limit of a query on each BE node |
+| load_mem_limit | 0 | Memory limit of a single load task. If the value is 0, `exec_mem_limit` is used |
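+
+As a usage sketch of the session variables above (the 8 GB values are illustrative, not recommendations):
+
+~~~sql
+SET query_mem_limit = 8589934592;
+SET load_mem_limit = 8589934592;
+~~~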
+
+## View memory usage
+
+* **mem\_tracker**
+
+~~~ bash
+//View the overall memory statistics
+http://be_ip:be_http_port/mem_tracker
+
+// View fine-grained memory statistics
+http://be_ip:be_http_port/mem_tracker?type=query_pool&upper_level=3
+~~~
+
+* **tcmalloc**
+
+~~~ bash
+http://be_ip:be_http_port/memz
+~~~
+
+~~~plain text
+------------------------------------------------
+MALLOC:      777276768 (  741.3 MiB) Bytes in use by application
+MALLOC: +   8851890176 ( 8441.8 MiB) Bytes in page heap freelist
+MALLOC: +    143722232 (  137.1 MiB) Bytes in central cache freelist
+MALLOC: +     21869824 (   20.9 MiB) Bytes in transfer cache freelist
+MALLOC: +    832509608 (  793.9 MiB) Bytes in thread cache freelists
+MALLOC: +     58195968 (   55.5 MiB) Bytes in malloc metadata
+MALLOC:   ------------
+MALLOC: =  10685464576 (10190.5 MiB) Actual memory used (physical + swap)
+MALLOC: +  25231564800 (24062.7 MiB) Bytes released to OS (aka unmapped)
+MALLOC:   ------------
+MALLOC: =  35917029376 (34253.1 MiB) Virtual address space used
+MALLOC:
+MALLOC:         112388 Spans in use
+MALLOC:            335 Thread heaps in use
+MALLOC:           8192 Tcmalloc page size
+------------------------------------------------
+Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
+Bytes released to the OS take up virtual address space but no physical memory.
+~~~
+
+The statistics queried by this method are accurate. Note, however, that tcmalloc counts memory that has been reserved from the OS, part of which StarRocks has reserved but is not actually using.
+
+Here `Bytes in use by application` refers to the memory currently in use.
+
+* **metrics**
+
+~~~bash
+curl -XGET http://be_ip:be_http_port/metrics | grep 'mem'
+curl -XGET http://be_ip:be_http_port/metrics | grep 'column_pool'
+~~~
+
+The metric values are updated every 10 seconds. Some of the memory statistics can also be monitored on older StarRocks versions.
diff --git a/docs/en/administration/management/resource_management/Query_management.md b/docs/en/administration/management/resource_management/Query_management.md
new file mode 100644
index 0000000..5948e03
--- /dev/null
+++ b/docs/en/administration/management/resource_management/Query_management.md
@@ -0,0 +1,105 @@
+---
+displayed_sidebar: docs
+sidebar_position: 30
+---
+
+# Query Management
+
+## Number of user connections
+
+Properties are set at the user granularity. To set the maximum number of connections between the client and FE, use the following command:
+
+```sql
+ALTER USER '<username>' SET PROPERTIES ("key"="value", ...)
+```
+
+User properties include the resources assigned to the user. The properties set here are for the user, not `user_identity`. That is, if two users `'jack'@'%'` and `'jack'@'192.%'` are created by the `CREATE USER` statement, then the `ALTER USER SET PROPERTIES` statement works on the user `jack`, not `'jack'@'%'` or `'jack'@'192.%'`.
+
+Example 1:
+
+```sql
+-- For the user `jack`, change the maximum number of connections to 1000
+ALTER USER 'jack' SET PROPERTIES ("max_user_connections" = "1000");
+
+-- Check the connection limit for the root user
+SHOW PROPERTY FOR 'root';
+```
+
+## Query-related session variables
+
+Session variables can be set in the `key = value` form to limit the concurrency, memory, and other query parameters in the current session. For example:
+
+- parallel_fragment_exec_instance_num
+
+  The parallelism of the query, with a default value of 1. It indicates the number of fragment instances on each BE. You can set this to half the number of CPU cores of the BE to improve query performance.
+
+- query_mem_limit
+
+  Memory limit of a query on each BE node. It can be adjusted when a query reports insufficient memory.
+
+- load_mem_limit
+
+  Memory limit of a single load task. It can be adjusted when a load job reports insufficient memory.
+
+Example 2:
+
+```sql
+set parallel_fragment_exec_instance_num = 8;
+set query_mem_limit = 137438953472;
+```
+
+## Capacity quota of database storage
+
+The capacity quota of database storage is unlimited by default. You can change the quota value using `ALTER DATABASE`:
+
+```sql
+ALTER DATABASE db_name SET DATA QUOTA quota;
+```
+
+The quota units are: B/K/KB/M/MB/G/GB/T/TB/P/PB
+
+Example 3:
+
+```sql
+ALTER DATABASE example_db SET DATA QUOTA 10T;
+```
+
+## Kill queries
+
+To terminate a query on a particular connection, use the following command:
+
+```sql
+kill connection_id;
+```
+
+The `connection_id` can be seen by `show processlist;` or `select connection_id();`.
+
+```plain text
+ show processlist;
++------+--------------+---------------------+-----------------+-------------------+---------+------+-------+------+
+| Id   | User         | Host                | Cluster         | Db                | Command | Time | State | Info |
++------+--------------+---------------------+-----------------+-------------------+---------+------+-------+------+
+| 1    | starrocksmgr | 172.26.34.147:56208 | default_cluster | starrocks_monitor | Sleep   | 8    |       |      |
+| 129  | root         | 172.26.92.139:54818 | default_cluster |                   | Query   | 0    |       |      |
+| 114  | test         | 172.26.34.147:57974 | default_cluster | ssb_100g          | Query   | 3    |       |      |
+| 3    | starrocksmgr | 172.26.34.147:57268 | default_cluster | starrocks_monitor | Sleep   | 8    |       |      |
+| 100  | root         | 172.26.34.147:58472 | default_cluster | ssb_100           | Sleep   | 637  |       |      |
+| 117  | starrocksmgr | 172.26.34.147:33790 | default_cluster | starrocks_monitor | Sleep   | 8    |       |      |
+| 6    | starrocksmgr | 172.26.34.147:57632 | default_cluster | starrocks_monitor | Sleep   | 8    |       |      |
+| 119  | starrocksmgr | 172.26.34.147:33804 | default_cluster | starrocks_monitor | Sleep   | 8    |       |      |
+| 111  | root         | 172.26.92.139:55472 | default_cluster |                   | Sleep   | 2758 |       |      |
++------+--------------+---------------------+-----------------+-------------------+---------+------+-------+------+
+9 rows in set (0.00 sec)
+
+mysql> select connection_id();
++-----------------+
+| CONNECTION_ID() |
++-----------------+
+| 98              |
++-----------------+
+
+
+mysql> kill 114;
+Query OK, 0 rows affected (0.02 sec)
+
+```
diff --git a/docs/en/administration/management/resource_management/Replica.md b/docs/en/administration/management/resource_management/Replica.md
new file mode 100644
index 0000000..65bc477
--- /dev/null
+++ b/docs/en/administration/management/resource_management/Replica.md
@@ -0,0 +1,576 @@
+---
+displayed_sidebar: docs
+sidebar_position: 70
+---
+
+# Manage replica
+
+Manage data replicas in your StarRocks cluster.
+
+This topic includes two sections - [shared-nothing](#shared-nothing) and [shared-data](#shared-data).
+
+## Shared-nothing
+
+### Overview
+
+For native tables in shared-nothing clusters, StarRocks adopts a multi-replica strategy to guarantee the high availability of data. When you create a table, you can specify the replica count of the table using the table property `replication_num` (default: `3`). When a loading transaction starts, data is simultaneously loaded into the specified number of replicas. The transaction returns success only after the data is stored in the majority of replicas. For detailed information, see [Write quorum](#write-quorum). Nonetheless, StarRocks allows you to specify a lower write quorum for a table to achieve better loading performance.
+
+StarRocks stores multiple replicas across different BE nodes. For example, if you want to store three replicas for a table, you must deploy at least three BE nodes in your StarRocks cluster. If any of the replicas fail, StarRocks clones a healthy replica, partially or wholly, from another BE node to repair the failed replica. By using the Multi-Version Concurrency Control (MVCC) technique, StarRocks accelerates replica repair by directly duplicating the physical files of the multi-version data.
+
+### Loading data into a multi-replica table
+
+![Replica-1](../../../_assets/replica-1.png)
+
+The routine of a loading transaction is as follows:
+
+1. The client submits a loading request to FE.
+
+2. FE chooses a BE node as the Coordinator BE node of this loading transaction, and generates an execution plan for the transaction.
+
+3. The Coordinator BE node reads the data to be loaded from the client.
+
+4. The Coordinator BE node dispatches the data to all the replicas of tablets.
+
+   > **NOTE**
+   >
+   > A tablet is a logical slice of a table. A table has multiple tablets, and each tablet has `replication_num` replicas. The number of tablets in a table is determined by the bucket number specified in the `DISTRIBUTED BY` clause when the table is created.
+
+5. After the data is loaded and stored in all the tablets, FE makes the loaded data visible.
+
+6. FE returns loading success to the client.
+
+Such a routine guarantees service availability even under extreme scenarios.
+
+### Write quorum
+
+Loading data into a multi-replica table can be very time-consuming. If you want to improve the loading performance and you can tolerate relatively lower data availability, you can set a lower write quorum for tables. A write quorum refers to the minimum number of replicas that need to acknowledge a write operation before it is considered successful. You can specify the write quorum by adding the property `write_quorum` when you [CREATE TABLE](../../../sql-reference/sql-statements/table_bucket_part_index/CREATE_TABLE.md), or add this property to an existing table using [ALTER TABLE](../../../sql-reference/sql-statements/table_bucket_part_index/ALTER_TABLE.md). This property is supported from v2.5.
+
+`write_quorum` supports the following values:
+
+- `MAJORITY`: Default value. When the majority of data replicas return loading success, StarRocks returns loading task success. Otherwise, StarRocks returns loading task failed.
+- `ONE`: When one of the data replicas returns loading success, StarRocks returns loading task success. Otherwise, StarRocks returns loading task failed.
+- `ALL`: When all of the data replicas return loading success, StarRocks returns loading task success. Otherwise, StarRocks returns loading task failed.
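+
+As a minimal sketch (the table and column names are illustrative, and the bucket count is omitted assuming a version that supports automatic bucketing), the following creates a table whose loads are acknowledged as soon as a single replica succeeds, and then switches it back to the default quorum:
+
+```SQL
+CREATE TABLE example_tbl (
+    id BIGINT,
+    city VARCHAR(32)
+)
+DISTRIBUTED BY HASH(id)
+PROPERTIES (
+    "replication_num" = "3",
+    "write_quorum" = "ONE"
+);
+
+ALTER TABLE example_tbl SET ("write_quorum" = "MAJORITY");
+```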
+
+### Automatic replica repair
+
+Replicas can fail because certain BE nodes crash or some loading tasks fail. StarRocks automatically repairs these failed replicas.
+
+Every `tablet_sched_checker_interval_seconds` (default: 20 seconds), the Tablet Checker in FE scans all tablet replicas of all tables in your StarRocks cluster, and determines whether a replica is healthy by checking the version number of the currently visible data and the health status of the BE node. If the visible version of a replica lags behind those of the other replicas, StarRocks performs an incremental clone to repair the failed replica. If a BE node fails to receive heartbeats or is dropped from the cluster, or if a replica lags too far behind to be repaired by an incremental clone, StarRocks performs a full clone to repair the lost replica.
+
+After detecting tablet replicas that need repair, FE generates a tablet scheduling task, and adds the task to the scheduling task queue. Tablet Scheduler in the FE receives the scheduling task from the queue, creates clone tasks for each failed replica in accordance with the clone type they need, and assigns the tasks to the executor BE nodes.
+
+A clone task essentially copies data from a source BE node (which has a healthy replica) and loads the data into the destination BE node (which has a failed replica). For a replica with a lagged data version, FE assigns an incremental clone task to the executor BE node that stores the failed replica, and informs the executor BE node from which peer BE node it can find a healthy replica and clone the new data. If a replica is lost, FE chooses a surviving BE node as the executor BE node, creates an empty replica on the BE node, and assigns a full clone task to the BE node.
+
+For each clone task, regardless of its type, the executor BE node duplicates the physical data files from a healthy replica, and then updates its metadata accordingly. After the clone task is completed, the executor BE node reports task success to Tablet Scheduler in FE. After removing the redundant tablet replicas, FE updates its metadata, marking the completion of the replica repair.
+
+![Replica-2](../../../_assets/replica-2.png)
+
+During tablet repair, StarRocks can still execute queries. StarRocks can load data into the table as long as the number of healthy replicas satisfies `write_quorum`.
+
+### Repair replica manually
+
+Manual replica repair consists of two steps:
+
+1. Check the replica status.
+2. Set the replica priority level.
+
+#### Check replica status
+
+Follow these steps to check the replica status of tablets to identify the unhealthy (failed) tablets.
+
+1. **Check the status of all tablets in the cluster.**
+
+    ```SQL
+    SHOW PROC '/statistic';
+    ```
+
+    Example:
+
+    ```Plain
+    mysql> SHOW PROC '/statistic';
+    +----------+----------------------------+----------+--------------+----------+-----------+------------+--------------------+-----------------------+
+    | DbId     | DbName                     | TableNum | PartitionNum | IndexNum | TabletNum | ReplicaNum | UnhealthyTabletNum | InconsistentTabletNum |
+    +----------+----------------------------+----------+--------------+----------+-----------+------------+--------------------+-----------------------+
+    | 35153636 | default_cluster:DF_Newrisk | 3        | 3            | 3        | 96        | 288        | 0                  | 0                     |
+    | 48297972 | default_cluster:PaperData  | 0        | 0            | 0        | 0         | 0          | 0                  | 0                     |
+    | 5909381  | default_cluster:UM_TEST    | 7        | 7            | 10       | 320       | 960        | 1                  | 0                     |
+    | Total    | 240                        | 10       | 10           | 13       | 416       | 1248       | 1                  | 0                     |
+    +----------+----------------------------+----------+--------------+----------+-----------+------------+--------------------+-----------------------+
+    ```
+
+    - `UnhealthyTabletNum`: indicates the number of unhealthy tablets in the corresponding database.
+    - `InconsistentTabletNum`: indicates the number of tablets whose replicas are inconsistent.
+
+    If the value of `UnhealthyTabletNum` or `InconsistentTabletNum` is not `0` in a specific database, you can check the unhealthy tablets in the database with its `DbId`.
+
+    ```SQL
+    SHOW PROC '/statistic/<DbId>'
+    ```
+
+    Example:
+
+    ```Plain
+    mysql> SHOW PROC '/statistic/5909381';
+    +------------------+---------------------+
+    | UnhealthyTablets | InconsistentTablets |
+    +------------------+---------------------+
+    | [40467980]       | []                  |
+    +------------------+---------------------+
+    ```
+
+    The ID of the unhealthy tablet is returned in the field `UnhealthyTablets`.
+
+2. **Check the tablet status in a specific table or partition.**
+
+    You can use the WHERE clause in [ADMIN SHOW REPLICA STATUS](../../../sql-reference/sql-statements/cluster-management/tablet_replica/ADMIN_SHOW_REPLICA_STATUS.md) to filter the tablets with a certain `STATUS`.
+
+    ```SQL
+    ADMIN SHOW REPLICA STATUS FROM <table_name>
+    [PARTITION (<partition_name_1>[, <partition_name_2>, ...])]
+    [WHERE STATUS = {'OK'|'DEAD'|'VERSION_ERROR'|'SCHEMA_ERROR'|'MISSING'}]
+    ```
+
+    Example:
+
+    ```Plain
+    mysql> ADMIN SHOW REPLICA STATUS FROM tbl PARTITION (p1, p2) WHERE STATUS = "OK";
+    +----------+-----------+-----------+---------+-------------------+--------------------+------------------+------------+------------+-------+--------+--------+
+    | TabletId | ReplicaId | BackendId | Version | LastFailedVersion | LastSuccessVersion | CommittedVersion | SchemaHash | VersionNum | IsBad | State  | Status |
+    +----------+-----------+-----------+---------+-------------------+--------------------+------------------+------------+------------+-------+--------+--------+
+    | 29502429 | 29502432  | 10006     | 2       | -1                | 2                  | 1                | -1         | 2          | false | NORMAL | OK     |
+    | 29502429 | 36885996  | 10002     | 2       | -1                | -1                 | 1                | -1         | 2          | false | NORMAL | OK     |
+    | 29502429 | 48100551  | 10007     | 2       | -1                | -1                 | 1                | -1         | 2          | false | NORMAL | OK     |
+    | 29502433 | 29502434  | 10001     | 2       | -1                | 2                  | 1                | -1         | 2          | false | NORMAL | OK     |
+    | 29502433 | 44900737  | 10004     | 2       | -1                | -1                 | 1                | -1         | 2          | false | NORMAL | OK     |
+    | 29502433 | 48369135  | 10006     | 2       | -1                | -1                 | 1                | -1         | 2          | false | NORMAL | OK     |
+    +----------+-----------+-----------+---------+-------------------+--------------------+------------------+------------+------------+-------+--------+--------+
+    ```
+
+    If the field `IsBad` is `true`, this tablet is corrupted.
+
+    For detailed information provided in the field `Status`, see [ADMIN SHOW REPLICA STATUS](../../../sql-reference/sql-statements/cluster-management/tablet_replica/ADMIN_SHOW_REPLICA_STATUS.md).
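+
+    As a troubleshooting sketch, you can also filter for problematic replicas directly (the table name is illustrative):
+
+    ```SQL
+    ADMIN SHOW REPLICA STATUS FROM tbl WHERE STATUS = "VERSION_ERROR";
+    ```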
+
+    You can further explore the details of tablets in the table using [SHOW TABLET](../../../sql-reference/sql-statements/table_bucket_part_index/SHOW_TABLET.md).
+
+    ```SQL
+    SHOW TABLET FROM <table_name>
+    ```
+
+    Example:
+
+    ```Plain
+    mysql> SHOW TABLET FROM tbl1;
+    +----------+-----------+-----------+------------+---------+-------------+-------------------+-----------------------+------------------+----------------------+---------------+----------+----------+--------+-------------------------+--------------+------------------+--------------+----------------------+---------+------------------+
+    | TabletId | ReplicaId | BackendId | SchemaHash | Version | VersionHash | LstSuccessVersion | LstSuccessVersionHash | LstFailedVersion | LstFailedVersionHash | LstFailedTime | DataSize | RowCount | State  | LstConsistencyCheckTime | CheckVersion | CheckVersionHash | VersionCount | PathHash             | MetaUrl | CompactionStatus |
+    +----------+-----------+-----------+------------+---------+-------------+-------------------+-----------------------+------------------+----------------------+---------------+----------+----------+--------+-------------------------+--------------+------------------+--------------+----------------------+---------+------------------+
+    | 29502429 | 29502432  | 10006     | 1421156361 | 2       | 0           | 2                 | 0                     | -1               | 0                    | N/A           | 784      | 0        | NORMAL | N/A                     | -1           | -1               | 2            | -5822326203532286804 | url     | url              |
+    | 29502429 | 36885996  | 10002     | 1421156361 | 2       | 0           | -1                | 0                     | -1               | 0                    | N/A           | 784      | 0        | NORMAL | N/A                     | -1           | -1               | 2            | -1441285706148429853 | url     | url              |
+    | 29502429 | 48100551  | 10007     | 1421156361 | 2       | 0           | -1                | 0                     | -1               | 0                    | N/A           | 784      | 0        | NORMAL | N/A                     | -1           | -1               | 2            | -4784691547051455525 | url     | url              |
+    +----------+-----------+-----------+------------+---------+-------------+-------------------+-----------------------+------------------+----------------------+---------------+----------+----------+--------+-------------------------+--------------+------------------+--------------+----------------------+---------+------------------+
+    ```
+
+    The returned results show the size, row count, version, and URL of the tablets.
+
+    The field `State` returned by SHOW TABLET indicates the task state of the tablet, including `CLONE`, `SCHEMA_CHANGE`, and `ROLLUP`.
+
+    You can also check the replica distribution of a specific table or partition to see if the replicas are distributed evenly using [ADMIN SHOW REPLICA DISTRIBUTION](../../../sql-reference/sql-statements/cluster-management/tablet_replica/ADMIN_SHOW_REPLICA_DISTRIBUTION.md).
+
+    ```SQL
+    ADMIN SHOW REPLICA DISTRIBUTION FROM <table_name>
+    ```
+
+    Example:
+
+    ```Plain
+    mysql> ADMIN SHOW REPLICA DISTRIBUTION FROM tbl1;
+    +-----------+------------+-------+---------+
+    | BackendId | ReplicaNum | Graph | Percent |
+    +-----------+------------+-------+---------+
+    | 10000     | 7          |       | 7.29 %  |
+    | 10001     | 9          |       | 9.38 %  |
+    | 10002     | 7          |       | 7.29 %  |
+    | 10003     | 7          |       | 7.29 %  |
+    | 10004     | 9          |       | 9.38 %  |
+    | 10005     | 11         | >     | 11.46 % |
+    | 10006     | 18         | >     | 18.75 % |
+    | 10007     | 15         | >     | 15.62 % |
+    | 10008     | 13         | >     | 13.54 % |
+    +-----------+------------+-------+---------+
+    ```
+
+    The returned results show the number of tablet replicas on each BE node, and their corresponding percentages.
+
+3. **Check the replica status of a specific tablet.**
+
+    With the `TabletId` of the unhealthy tablets you obtained in the preceding procedures, you can examine their replica status.
+
+    ```SQL
+    SHOW TABLET <TabletId>
+    ```
+
+    Example:
+
+    ```Plain
+    mysql> SHOW TABLET 29502553;
+    +----------------------+-----------+---------------+-----------+----------+----------+-------------+----------+--------+---------------------------------------------------------------------------+
+    | DbName               | TableName | PartitionName | IndexName | DbId     | TableId  | PartitionId | IndexId  | IsSync | DetailCmd                                                                 |
+    +----------------------+-----------+---------------+-----------+----------+----------+-------------+----------+--------+---------------------------------------------------------------------------+
+    | default_cluster:test | test      | test          | test      | 29502391 | 29502428 | 29502427    | 29502428 | true   | SHOW PROC '/dbs/29502391/29502428/partitions/29502427/29502428/29502553'; |
+    +----------------------+-----------+---------------+-----------+----------+----------+-------------+----------+--------+---------------------------------------------------------------------------+
+    ```
+
+    The returned results show detailed information about the database, table, partition, and index (Rollup) of the tablet.
+
+    You can copy the SQL statement in the field `DetailCmd` to further examine the replica status of the tablet.
+
+    Example:
+
+    ```Plain
+    mysql> SHOW PROC '/dbs/29502391/29502428/partitions/29502427/29502428/29502553';
+    +-----------+-----------+---------+-------------+-------------------+-----------------------+------------------+----------------------+---------------+------------+----------+----------+--------+-------+--------------+----------------------+---------+------------------+
+    | ReplicaId | BackendId | Version | VersionHash | LstSuccessVersion | LstSuccessVersionHash | LstFailedVersion | LstFailedVersionHash | LstFailedTime | SchemaHash | DataSize | RowCount | State  | IsBad | VersionCount | PathHash             | MetaUrl | CompactionStatus |
+    +-----------+-----------+---------+-------------+-------------------+-----------------------+------------------+----------------------+---------------+------------+----------+----------+--------+-------+--------------+----------------------+---------+------------------+
+    | 43734060  | 10004     | 2       | 0           | -1                | 0                     | -1               | 0                    | N/A           | -1         | 784      | 0        | NORMAL | false | 2            | -8566523878520798656 | url     | url              |
+    | 29502555  | 10002     | 2       | 0           | 2                 | 0                     | -1               | 0                    | N/A           | -1         | 784      | 0        | NORMAL | false | 2            | 1885826196444191611  | url     | url              |
+    | 39279319  | 10007     | 2       | 0           | -1                | 0                     | -1               | 0                    | N/A           | -1         | 784      | 0        | NORMAL | false | 2            | 1656508631294397870  | url     | url              |
+    +-----------+-----------+---------+-------------+-------------------+-----------------------+------------------+----------------------+---------------+------------+----------+----------+--------+-------+--------------+----------------------+---------+------------------+
+    ```
+
+    The returned results show all the replicas of the tablet.
+
+#### Set replica priority level
+
+Tablet Scheduler automatically assigns a different priority level to each different type of clone task.
+
+If you want the tablets from a certain table or certain partitions to be repaired at the earliest opportunity, you can manually assign a `VERY_HIGH` priority level to them using [ADMIN REPAIR TABLE](../../../sql-reference/sql-statements/cluster-management/tablet_replica/ADMIN_REPAIR.md).
+
+```SQL
+ADMIN REPAIR TABLE <table_name>
+[PARTITION (<partition_name_1>[, <partition_name_2>, ...])]
+```
+
+> **NOTE**
+>
+> - Executing this SQL statement only submits a hint to modify the priority level of the tablets to be repaired. It does not guarantee that these tablets can be successfully repaired.
+> - Tablet Scheduler might still assign different priority levels to these tablets after you execute this SQL statement.
+> - When the Leader FE node is changed or restarted, the hint this SQL statement submitted expires.
+
+You can cancel this operation using [ADMIN CANCEL REPAIR TABLE](../../../sql-reference/sql-statements/cluster-management/tablet_replica/ADMIN_CANCEL_REPAIR.md).
+
+```SQL
+ADMIN CANCEL REPAIR TABLE <table_name>
+[PARTITION (<partition_name_1>[, <partition_name_2>, ...])]
+```
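+
+For example, to prioritize the repair of partitions `p1` and `p2` of table `tbl` (the names are illustrative), and to cancel that hint afterwards:
+
+```SQL
+ADMIN REPAIR TABLE tbl PARTITION (p1, p2);
+ADMIN CANCEL REPAIR TABLE tbl PARTITION (p1, p2);
+```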
+
+### Replica balancing
+
+StarRocks automatically balances the tablets across BE nodes.
+
+To move a tablet from a high-load node to a low-load node, StarRocks first creates a replica of the tablet on the low-load node, and then drops the corresponding replica on the high-load node. If different types of storage mediums are used in the cluster, StarRocks categorizes all the BE nodes in accordance with the storage medium types. StarRocks moves a tablet across BE nodes of the same storage medium type whenever possible. Replicas of the same tablet are stored on different BE nodes.
+
+#### BE load
+
+StarRocks shows the load statistics of each BE node in the cluster using `ClusterLoadStatistics` (CLS). Tablet Scheduler triggers replica balancing based on `ClusterLoadStatistics`. StarRocks evaluates the **disk utilization** and the **replica count** of each BE node and calculates a `loadScore` accordingly. The higher the `loadScore` of a BE node, the higher the load on the node. Tablet Scheduler updates `ClusterLoadStatistics` every minute.
+
+`capacityCoefficient` and `replicaNumCoefficient` are the weighting factors for the disk utilization and the replica count respectively. The sum of `capacityCoefficient` and `replicaNumCoefficient` is one. `capacityCoefficient` is dynamically adjusted according to the actual disk usage. When the overall disk utilization of a BE node is below 50%, the `capacityCoefficient` value is 0.5. When the disk utilization is above 75%, the value is 1. You can configure this limit via the FE configuration item `capacity_used_percent_high_water`. If the utilization is between 50% and 75%, `capacityCoefficient` increases smoothly based on this formula:
+
+```
+capacityCoefficient = 2 * Disk utilization - 0.5
+```
+
+For example, at 60% disk utilization, `capacityCoefficient` = 2 * 0.6 - 0.5 = 0.7.
+
+`capacityCoefficient` ensures that when the disk usage is exceedingly high, the `loadScore` of the BE node gets higher, forcing the system to reduce the load on this BE node at the earliest opportunity.
+
+#### Balancing policy
+
+Each time Tablet Scheduler runs, it uses Load Balancer to select a certain number of healthy tablets as candidates for balancing. In subsequent scheduling rounds, Tablet Scheduler generates balancing tasks for these candidate tablets.
+
+#### View System Balance Status
+
+You can view the current overall balance status of the system and the details of different balance types.
+ +- **View the current overall balance status of the system:** + + ```SQL + SHOW PROC '/cluster_balance/balance_stat'; + ``` + + Example: + + ```Plain + +---------------+--------------------------------+----------+----------------+----------------+ + | StorageMedium | BalanceType | Balanced | PendingTablets | RunningTablets | + +---------------+--------------------------------+----------+----------------+----------------+ + | HDD | inter-node disk usage | true | 0 | 0 | + | HDD | inter-node tablet distribution | true | 0 | 0 | + | HDD | intra-node disk usage | true | 0 | 0 | + | HDD | intra-node tablet distribution | true | 0 | 0 | + | HDD | colocation group | true | 0 | 0 | + | HDD | label-aware location | true | 0 | 0 | + +---------------+--------------------------------+----------+----------------+----------------+ + ``` + + - `StorageMedium`: Storage medium. + - `BalanceType`: Type of balance. + - `Balanced`: Whether the balanced state is achieved. + - `PendingTablets`: Number of tablets with task status Pending. + - `RunningTablets`: Number of tablets with task status Running. + +- **View the balance of disk utilization by node:** + + ```SQL + SHOW PROC '/cluster_balance/cluster_load_stat'; + ``` + + Example: + + ```Plain + +---------------+----------------------------------------------------------------------------------------------------------------------+ + | StorageMedium | ClusterDiskBalanceStat | + +---------------+----------------------------------------------------------------------------------------------------------------------+ + | HDD | {"balanced":false,"maxBeId":1,"minBeId":2,"maxUsedPercent":0.9,"minUsedPercent":0.1,"type":"INTER_NODE_DISK_USAGE"} | + | SSD | {"balanced":true} | + +---------------+----------------------------------------------------------------------------------------------------------------------+ + ``` + + - `StorageMedium`: Storage medium. + - `ClusterDiskBalanceStat`: Balance status across nodes based on disk usage. If not balanced, displays the maximum and minimum disk utilization and the corresponding BEs. 
+
+- **View the balance of disk usage within a node:**
+
+  ```SQL
+  SHOW PROC '/cluster_balance/cluster_load_stat/HDD';
+  ```
+
+  Example:
+
+  ```Plain
+  +-------+-----------------+-----------+--------------+--------------+-------------+------------+----------+-----------+-------+-------+---------------------------------------------------------------------------------------------------------------------------------------------+
+  | BeId  | Cluster         | Available | UsedCapacity | Capacity     | UsedPercent | ReplicaNum | CapCoeff | ReplCoeff | Score | Class | BackendDiskBalanceStat                                                                                                                      |
+  +-------+-----------------+-----------+--------------+--------------+-------------+------------+----------+-----------+-------+-------+---------------------------------------------------------------------------------------------------------------------------------------------+
+  | 10004 | default_cluster | true      | 651509602    | 243695955810 | 0.267       | 339        | 0.5      | 0.5       | 1.0   | MID   | {"maxUsedPercent":0.9,"minUsedPercent":0.1,"beId":1,"maxPath":"/disk1","minPath":"/disk2","type":"INTRA_NODE_DISK_USAGE","balanced":false}  |
+  | 10005 | default_cluster | true      | 651509602    | 243695955810 | 0.267       | 339        | 0.5      | 0.5       | 1.0   | MID   | {"balanced":true}                                                                                                                           |
+  +-------+-----------------+-----------+--------------+--------------+-------------+------------+----------+-----------+-------+-------+---------------------------------------------------------------------------------------------------------------------------------------------+
+  ```
+
+  - `BeId`: ID of the BE node.
+  - `BackendDiskBalanceStat`: Balance status between disks within the node based on disk utilization. If not balanced, displays the maximum and minimum disk usage and the corresponding disk paths.
+
+- **View the balance of tablet distribution:**
+
+  ```SQL
+  SHOW PROC '/dbs/ssb/lineorder/partitions/lineorder';
+  ```
+
+  Example:
+
+  ```Plain
+  +---------+-----------+--------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
+  | IndexId | IndexName | State  | LastConsistencyCheckTime | TabletBalanceStat                                                                                                                                  |
+  +---------+-----------+--------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
+  | 11129   | lineorder | NORMAL | NULL                     | {"maxTabletNum":23,"minTabletNum":21,"maxBeId":10012,"minBeId":10013,"type":"INTER_NODE_TABLET_DISTRIBUTION","balanced":false}                     |
+  | 11230   | lineorder | NORMAL | NULL                     | {"maxTabletNum":23,"minTabletNum":21,"beId":10012,"maxPath":"/disk1","minPath":"/disk2","type":"INTRA_NODE_TABLET_DISTRIBUTION","balanced":false}  |
+  | 10432   | lineorder | NORMAL | NULL                     | {"tabletId":10435,"currentBes":[10002,10004],"expectedBes":[10003,10004],"type":"COLOCATION_GROUP","balanced":false}                               |
+  | 10436   | lineorder | NORMAL | NULL                     | {"tabletId":10438,"currentBes":[10005,10006],"expectedLocations":{"rack":["rack1","rack2"]},"type":"LABEL_AWARE_LOCATION","balanced":false}        |
+  +---------+-----------+--------+--------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------+
+  ```
+
+  - `IndexId`: ID of the Materialized Index within the partition.
+  - `TabletBalanceStat`: Balance status of tablet distribution between nodes or within nodes. If not balanced, displays the details of the imbalance, including colocation group and label-aware location information.
+
+- **View partitions with imbalanced tablet distribution:**
+
+  ```SQL
+  SELECT DB_NAME, TABLE_NAME, PARTITION_NAME, TABLET_BALANCED FROM information_schema.partitions_meta WHERE TABLET_BALANCED = 0;
+  ```
+
+  Example:
+
+  ```Plain
+  +--------------+---------------+----------------+-----------------+
+  | DB_NAME      | TABLE_NAME    | PARTITION_NAME | TABLET_BALANCED |
+  +--------------+---------------+----------------+-----------------+
+  | ssb          | lineorder     | lineorder      | 0               |
+  +--------------+---------------+----------------+-----------------+
+  ```
+
+  - `TABLET_BALANCED`: Whether the tablet distribution is balanced.
+
+#### Check tablet scheduling tasks
+
+You can check tablet scheduling tasks that are pending, running, or finished.
+
+- **Check pending tablet scheduling tasks**
+
+  ```SQL
+  SHOW PROC '/cluster_balance/pending_tablets';
+  ```
+
+  Example:
+
+  ```Plain
+  +----------+--------+-----------------+---------+----------+----------+-------+---------+--------+----------+---------+---------------------+---------------------+---------------------+----------+------+-------------+---------------+---------------------+------------+---------------------+--------+---------------------+-------------------------------+
+  | TabletId | Type   | Status          | State   | OrigPrio | DynmPrio | SrcBe | SrcPath | DestBe | DestPath | Timeout | Create              | LstSched            | LstVisit            | Finished | Rate | FailedSched | FailedRunning | LstAdjPrio          | VisibleVer | VisibleVerHash      | CmtVer | CmtVerHash          | ErrMsg                        |
+  +----------+--------+-----------------+---------+----------+----------+-------+---------+--------+----------+---------+---------------------+---------------------+---------------------+----------+------+-------------+---------------+---------------------+------------+---------------------+--------+---------------------+-------------------------------+
+  | 4203036  | REPAIR | REPLICA_MISSING | PENDING | HIGH     | LOW      | -1    | -1      | -1     | -1       | 0       | 2019-02-21 15:00:20 | 2019-02-24 11:18:41 | 2019-02-24 11:18:41 | N/A      | N/A  | 2           | 0             | 2019-02-21 15:00:43 | 1          | 0                   | 2      | 0                   | unable to find source replica |
+  +----------+--------+-----------------+---------+----------+----------+-------+---------+--------+----------+---------+---------------------+---------------------+---------------------+----------+------+-------------+---------------+---------------------+------------+---------------------+--------+---------------------+-------------------------------+
+  ```
+
+  - `TabletId`: the ID of the tablet to be scheduled. Each scheduling task handles only one tablet.
+  - `Type`: the task type. Valid values: REPAIR and BALANCE.
+  - `Status`: the current status of the tablet, such as REPLICA_MISSING.
+  - `State`: the state of the scheduling task. Valid values: PENDING, RUNNING, FINISHED, CANCELLED, TIMEOUT, and UNEXPECTED.
+  - `OrigPrio`: the original priority of the task.
+  - `DynmPrio`: the current priority of the task after dynamic adjustment.
+  - `SrcBe`: the ID of the source BE node.
+  - `SrcPath`: the hash value of the path on the source BE node.
+  - `DestBe`: the ID of the destination BE node.
+  - `DestPath`: the hash value of the path on the destination BE node.
+  - `Timeout`: the timeout of the task when the task is scheduled successfully. Unit: second.
+  - `Create`: the time when the task was created.
+  - `LstSched`: the time when the task was scheduled most recently.
+  - `LstVisit`: the time when the task was visited most recently. Here, to visit a task means to schedule it or to report its execution.
+  - `Finished`: the time when the task was finished.
+  - `Rate`: the rate at which the data is cloned.
+  - `FailedSched`: the number of task scheduling failures.
+  - `FailedRunning`: the number of task execution failures.
+  - `LstAdjPrio`: the time when the task priority was adjusted most recently.
+  - `CmtVer`, `CmtVerHash`, `VisibleVer`, and `VisibleVerHash`: the version information used to execute the clone task.
+  - `ErrMsg`: the error message generated while the task is scheduled and run.
+
+- **Check running tablet scheduling tasks**
+
+  ```SQL
+  SHOW PROC '/cluster_balance/running_tablets';
+  ```
+
+  The returned results are identical to those of the pending tasks.
+
+- **Check finished tablet scheduling tasks**
+
+  ```SQL
+  SHOW PROC '/cluster_balance/history_tablets';
+  ```
+
+  The returned results are identical to those of the pending tasks. If the `State` of a task is `FINISHED`, the task completed successfully. If not, check the `ErrMsg` field for the cause of the failure.
+
+### Resource control
+
+Because StarRocks repairs and balances tablets by cloning them from one BE node to another, the I/O load of a BE node can increase dramatically if the node executes such tasks too frequently within a short time. To avoid this situation, StarRocks sets a concurrency limit on clone tasks for each BE node. The minimum unit of resource control is a disk, that is, a data storage path (`storage_root_path`) specified in the BE configuration file. By default, StarRocks allocates two slots for each disk to process tablet repair tasks. A clone task occupies one slot on the source BE node and one on the destination BE node. If all the slots on a BE node are occupied, StarRocks stops scheduling tasks to the node. You can increase the number of slots on a BE node by increasing the value of the FE dynamic parameter `tablet_sched_slot_num_per_path`.
+
+StarRocks also allocates two slots specifically for tablet balancing tasks. This prevents a situation in which a high-load BE node cannot release disk space through tablet balancing because tablet repair tasks constantly occupy all the slots.
+
+## Shared-data
+
+From v4.1 onwards, StarRocks supports repairing data replicas of cloud-native tables in shared-data clusters.
+
+### Overview
+
+In a shared-data architecture, data is stored in single-replica mode on remote storage systems such as object storage or HDFS in order to reduce storage costs. Unlike traditional shared-nothing architectures, this design cannot rely on multiple replicas to automatically recover data when files are lost.
+
+As a result, if the effective metadata version maintained by the FE references metadata or data files that no longer exist in remote storage, data ingestion and query operations will fail with "File not found" errors, potentially rendering the service unavailable.
+
+Such file loss may occur under the following circumstances:
+
+- **Accidental deletion**: Object storage files are mistakenly removed due to operational errors.
+- **Consistency issues**: In extreme cases, the storage system experiences delayed consistency or metadata loss.
+- **Software defects**: System bugs cause files to be cleaned up prematurely.
+
+Traditional snapshot-based restore mechanisms are often time-consuming and costly, making them unsuitable for fast recovery in production environments.
+
+To address this issue, StarRocks provides a low-cost, second-level recovery mechanism. By scanning historical metadata versions, the system identifies the most recent healthy version in which all required files are present, and rolls back the Tablet metadata to that version. This approach sacrifices a small amount of recent data in exchange for rapid restoration of table availability.
+
+#### Mechanism
+
+This feature is an extension of the ADMIN REPAIR TABLE statement for cloud-native tables in shared-data clusters.
+
+It operates through the following mechanisms:
+
+1. **Automated Detection**
+
+   The FE coordinates Compute Nodes (CNs) to probe historical Tablet metadata versions in reverse chronological order and in batches.
+
+2. **Deterministic Path Derivation**
+
+   Metadata file paths are derived deterministically, allowing direct probing without performing expensive and inefficient object storage List Objects operations.
+
+3. **Multi-Strategy Recovery Decisions**
+
+   Two recovery strategies, Strict Consistency and Maximum Availability, are supported to accommodate different business requirements.
+
+4. **Metadata Reset Capability**
+
+   When metadata is completely unavailable, the system can create empty Tablets (Empty Tablet Recovery) to prevent a small number of corrupted Tablets from blocking the availability of the entire table.
+
+#### Key Benefits
+
+- **Ultra-fast recovery**
+
+  Only metadata (KB/MB scale) is modified. No data movement is required, enabling second-level recovery even for PB-scale tables.
+
+- **Low operational cost**
+
+  No additional replicas or expensive object storage API calls are needed.
+
+### Usage
+
+#### Syntax
+
+Use the ADMIN REPAIR TABLE statement with PROPERTIES to control recovery behavior.
+
+```SQL
+ADMIN REPAIR TABLE <table_name>
+[PARTITION (<partition_name>, ...)]
+PROPERTIES (
+    'enforce_consistent_version' = 'true',
+    'allow_empty_tablet_recovery' = 'false'
+);
+```
+
+**Properties**
+
+| Property | Type | Default | Description |
+| --------------------------- | ------- | ------- | ------------------------------------------------------------------------------------- |
+| enforce_consistent_version | Boolean | `true` | Whether to force all tablets in a partition to roll back to a consistent version. If this item is set to `true`, the system searches the history for a consistent version that is valid for all tablets and rolls back to it, ensuring data version alignment across the partition. If it is set to `false`, each tablet in the partition is allowed to roll back to its latest available valid version. The versions of different tablets may be inconsistent, but this maximizes data preservation. |
+| allow_empty_tablet_recovery | Boolean | `false` | Whether to allow recovery by creating empty tablets. This item takes effect only when `enforce_consistent_version` is `false`. If this item is set to `true`, when metadata is missing for all versions of some tablets but valid metadata exists for at least one tablet, the system attempts to create empty tablets to fill the missing versions. If metadata for all versions of all tablets is lost, recovery is impossible. |
+
+### Examples
+
+#### Example 1: Strict Consistency Recovery (Recommended for Strong Consistency)
+
+Restore all Tablets in partition `p20250101` to the most recent uniform and complete version.
+
+```SQL
+ADMIN REPAIR TABLE my_cloud_table PARTITION (p20250101);
+```
+
+#### Example 2: Maximum Availability Recovery
+
+Restore each Tablet in partition `p20250101` to its own most recent valid version, allowing version divergence.
+
+```SQL
+ADMIN REPAIR TABLE my_cloud_table PARTITION (p20250101)
+PROPERTIES (
+    'enforce_consistent_version' = 'false'
+);
+```
+
+#### Example 3: Allow Empty Tablet Recovery
+
+Allow Tablets with completely missing metadata to be recovered as empty Tablets.
+
+```SQL
+ADMIN REPAIR TABLE my_cloud_table PARTITION (p20250101)
+PROPERTIES (
+    'enforce_consistent_version' = 'false',
+    'allow_empty_tablet_recovery' = 'true'
+);
+```
+
+### Limitations and Recommendations
+
+- Repairing replicas is supported for shared-data clusters from v4.1 onwards.
+- Currently, setting `PROPERTIES` is applicable only to shared-data tables.
+- Take the following points into consideration when performing materialized view recovery:
+  - Asynchronous materialized views require a manual refresh after recovery.
+  - Synchronous materialized views must use Strict Consistency Recovery to ensure consistency between base tables and rollup data.
+- Recovery is best-effort. A small amount of data loss is expected, depending on the available historical versions and the selected recovery strategy.
diff --git a/docs/en/administration/management/resource_management/be_label.md b/docs/en/administration/management/resource_management/be_label.md
new file mode 100644
index 0000000..afcd9e9
--- /dev/null
+++ b/docs/en/administration/management/resource_management/be_label.md
@@ -0,0 +1,128 @@
+---
+displayed_sidebar: docs
+sidebar_position: 80
+---
+
+# Add labels on BEs
+
+Since v3.2.8, StarRocks supports adding labels on BEs. When creating a table or an asynchronous materialized view, you can specify the label of a certain group of BE nodes. This ensures that data replicas are distributed only on the BE nodes associated with that label. Data replicas are evenly distributed among nodes with the same label, enhancing data high availability and resource isolation.
+
+## Usage
+
+### Add labels on BEs
+
+Suppose a StarRocks cluster includes six BEs that are evenly distributed across three racks. You can add labels on BEs based on the racks where the BEs are located.
+
+```SQL
+ALTER SYSTEM MODIFY BACKEND "172.xx.xx.46:9050" SET ("labels.location" = "rack:rack1");
+ALTER SYSTEM MODIFY BACKEND "172.xx.xx.47:9050" SET ("labels.location" = "rack:rack1");
+ALTER SYSTEM MODIFY BACKEND "172.xx.xx.48:9050" SET ("labels.location" = "rack:rack2");
+ALTER SYSTEM MODIFY BACKEND "172.xx.xx.49:9050" SET ("labels.location" = "rack:rack2");
+ALTER SYSTEM MODIFY BACKEND "172.xx.xx.50:9050" SET ("labels.location" = "rack:rack3");
+ALTER SYSTEM MODIFY BACKEND "172.xx.xx.51:9050" SET ("labels.location" = "rack:rack3");
+```
+
+After adding labels, you can execute `SHOW BACKENDS;` and view the labels of BEs in the `Location` field of the returned result.
+
+If you need to modify the labels of BEs, you can execute `ALTER SYSTEM MODIFY BACKEND "172.xx.xx.48:9050" SET ("labels.location" = "rack:xxx");`.
+
+### Use labels to specify table data distribution on BE nodes
+
+If you need to specify the locations to which a table's data is distributed, for example, distributing a table's data across two racks, rack1 and rack2, you can add labels to the table.
+
+After labels are added, all the replicas of the same tablet in the table are distributed across labels in a Round-Robin approach. Moreover, if multiple replicas of the same tablet exist within the same label, these replicas are distributed as evenly as possible across different BEs in that label.
+
+:::note
+
+- If the total number of BE nodes associated with the labels is fewer than the number of replicas, the system preferentially ensures that there are enough replicas. In this case, replicas may not be distributed as the labels specify.
+- The label to be associated with a table must already exist. Otherwise, an error `Getting analyzing error. Detail message: Cannot find any backend with location: rack:xxx` will occur.
+
+:::
+
+#### At table creation
+
+You can use the property `"labels.location"` to distribute the table's data across rack 1 and rack 2 at table creation:
+
+```SQL
+CREATE TABLE example_table (
+    order_id bigint NOT NULL,
+    dt date NOT NULL,
+    user_id INT NOT NULL,
+    good_id INT NOT NULL,
+    cnt int NOT NULL,
+    revenue int NOT NULL
+)
+PROPERTIES
+("labels.location" = "rack:rack1,rack:rack2");
+```
+
+For newly created tables, the default value of the table property `labels.location` is `*`, indicating that replicas are evenly distributed across all labels. If the data distribution of a newly created table does not need to be aware of the geographical locations of servers in the cluster, you can manually set the table property `"labels.location" = ""`.
+
+#### After table creation
+
+If you need to modify the data distribution location of the table after table creation, for example, modify the location to rack 1, rack 2, and rack 3, you can execute the following statement:
+
+```SQL
+ALTER TABLE example_table
+    SET ("labels.location" = "rack:rack1,rack:rack2,rack:rack3");
+```
+
+:::note
+
+If you have upgraded StarRocks to version 3.2.8 or later, data of historical tables created before the upgrade is not distributed based on labels by default. If you need to distribute the data of a historical table based on labels, you can execute the following statement to add labels to the historical table:
+
+```SQL
+ALTER TABLE example_table1
+    SET ("labels.location" = "rack:rack1,rack:rack2");
+```
+
+:::
+
+### Use labels to specify materialized view data distribution on BE nodes
+
+If you need to specify the locations to which an asynchronous materialized view's data is distributed, for example, distributing data across two racks, rack1 and rack2, you can add labels to the materialized view.
+
+After labels are added, all the replicas of the same tablet in the materialized view are distributed across labels in a Round-Robin approach. Moreover, if multiple replicas of the same tablet exist within the same label, these replicas are distributed as evenly as possible across different BEs in that label.
+
+:::note
+
+- If the total number of BE nodes associated with the labels is fewer than the number of replicas, the system preferentially ensures that there are enough replicas. In this case, replicas may not be distributed as the labels specify.
+- The labels to be associated with the materialized view must already exist. Otherwise, an error `Getting analyzing error. Detail message: Cannot find any backend with location: rack:xxx` will occur.
+
+:::
+
+#### At materialized view creation
+
+If you want to distribute the materialized view's data across rack 1 and rack 2 while creating it, you can execute the following statement:
+
+```SQL
+CREATE MATERIALIZED VIEW mv_example_mv
+DISTRIBUTED BY RANDOM
+PROPERTIES (
+"labels.location" = "rack:rack1,rack:rack2")
+as
+select order_id, dt from example_table;
+```
+
+For newly created materialized views, the default value of the property `labels.location` is `*`, indicating that replicas are evenly distributed across all labels. If the data distribution of a newly created materialized view does not need to be aware of the geographical locations of servers in the cluster, you can manually set the property `"labels.location" = ""`.
+
+#### After materialized view creation
+
+If you need to modify the data distribution location of the materialized view after it is created, for example, modify the location to rack 1, rack 2, and rack 3, you can execute the following statement:
+
+```SQL
+ALTER MATERIALIZED VIEW mv_example_mv
+    SET ("labels.location" = "rack:rack1,rack:rack2,rack:rack3");
+```
+
+:::note
+
+If you have upgraded StarRocks to version 3.2.8 or later, data of existing materialized views created before the upgrade is not distributed based on labels by default. If you need to distribute the data of an existing materialized view based on labels, you can execute the following statement to add labels to the materialized view:
+
+```SQL
+ALTER MATERIALIZED VIEW example_mv1
+    SET ("labels.location" = "rack:rack1,rack:rack2");
+```
+
+:::
+
diff --git a/docs/en/administration/management/resource_management/filemanager.md b/docs/en/administration/management/resource_management/filemanager.md
new file mode 100644
index 0000000..7ba94f9
--- /dev/null
+++ b/docs/en/administration/management/resource_management/filemanager.md
@@ -0,0 +1,42 @@
+---
+displayed_sidebar: docs
+sidebar_position: 90
+---
+
+# File manager
+
+With file manager, you can create, view, and delete files, such as the files that are used to access external data sources: public key files, private key files, and certificate files. You can reference or access the created files by using commands.
+
+## Basic concepts
+
+**File**: refers to the file that is created and saved in StarRocks. After a file is created and stored in StarRocks, StarRocks assigns a unique ID to the file. You can find a file based on the database name, catalog, and file name. In a database, only an admin user can create and delete files, and all users who have permissions to access a database can use the files that belong to the database.
+
+## Before you begin
+
+- Configure the following parameters for each FE.
+  - `small_file_dir`: the path in which the uploaded files are stored. The default path is `small_files/`, which is in the runtime directory of the FE. You need to specify this parameter in the **fe.conf** file and then restart the FE to allow the change to take effect.
+  - `max_small_file_size_bytes`: the maximum size of a single file. The default value of this parameter is 1 MB. If the size of a file exceeds the value of this parameter, the file cannot be created. You can specify this parameter by using the [ADMIN SET CONFIG](../../../sql-reference/sql-statements/cluster-management/config_vars/ADMIN_SET_CONFIG.md) statement.
+  - `max_small_file_number`: the maximum number of files that can be created within a cluster. The default value of this parameter is 100. If the number of files that you have created reaches this value, you cannot create more files. You can specify this parameter by using the ADMIN SET CONFIG statement.
+
+> Note: Increasing the values of the two parameters causes an increase in the memory usage of the FE. Therefore, we recommend that you do not increase these values unless necessary.
+
+- Configure the following parameters for each BE.
+
+`small_file_dir`: the path in which the downloaded files are stored. The default path is `lib/small_files/`, which is in the runtime directory of the BE. You can specify this parameter in the **be.conf** file.
+
+## Create a file
+
+You can execute the CREATE FILE statement to create a file. For more information, see [CREATE FILE](../../../sql-reference/sql-statements/cluster-management/file/CREATE_FILE.md). After a file is created, the file is uploaded and persisted in StarRocks.
+
+## View a file
+
+You can execute the SHOW FILE statement to view the information about a file stored in a database. For more information, see [SHOW FILE](../../../sql-reference/sql-statements/cluster-management/file/SHOW_FILE.md).
+
+## Delete a file
+
+You can execute the DROP FILE statement to delete a file. For more information, see [DROP FILE](../../../sql-reference/sql-statements/cluster-management/file/DROP_FILE.md).
+
+## How an FE and a BE use a file
+
+- **FE**: The SmallFileMgr class stores the data related to the file in the specified directory of the FE. Then the SmallFileMgr class returns a local file path for the FE to use the file.
+- **BE**: The BE calls the HTTP API **/api/get_small_file** to download the file to its specified directory and records the information of the file. When the BE requests to use the file, the BE checks whether the file has been downloaded and then verifies the file. If the file passes the verification, the path of the file is returned. If the file fails the verification, the file is deleted and re-downloaded from the FE. When a BE restarts, it preloads the downloaded files into its memory.
diff --git a/docs/en/administration/management/resource_management/query_queues.md b/docs/en/administration/management/resource_management/query_queues.md
new file mode 100644
index 0000000..af80c65
--- /dev/null
+++ b/docs/en/administration/management/resource_management/query_queues.md
@@ -0,0 +1,187 @@
+---
+displayed_sidebar: docs
+sidebar_position: 20
+---
+
+# Query queues
+
+This topic describes how to manage query queues in StarRocks.
+
+From v2.5, StarRocks supports query queues. With query queues enabled, StarRocks automatically queues incoming queries when the concurrency threshold or resource limit is reached, thereby preventing the overload from deteriorating further. Pending queries wait in a queue until enough compute resources become available for execution. From v3.1.4 onwards, StarRocks supports setting query queues on the resource group level.
+
+You can set thresholds on CPU usage, memory usage, and query concurrency to trigger query queues.
+
+**Roadmap**:
+
+| Version | Global query queue | Resource group-level query queue | Collective concurrency management | Dynamic concurrency adjustment |
+| ------ | ------------------ | -------------------------------- | --------------------------------- | ------------------------------- |
+| v2.5 | ✅ | ❌ | ❌ | ❌ |
+| v3.1.4 | ✅ | ✅ | ✅ | ✅ |
+
+## Enable query queues
+
+Query queues are disabled by default. You can enable global or resource group-level query queues for INSERT loading, SELECT queries, and statistics queries by setting the corresponding global session variables.
+
+### Enable global query queues
+
+- Enable query queues for loading tasks:
+
+```SQL
+SET GLOBAL enable_query_queue_load = true;
+```
+
+- Enable query queues for SELECT queries:
+
+```SQL
+SET GLOBAL enable_query_queue_select = true;
+```
+
+- Enable query queues for statistics queries:
+
+```SQL
+SET GLOBAL enable_query_queue_statistic = true;
+```
+
+### Enable resource group-level query queues
+
+From v3.1.4 onwards, StarRocks supports setting query queues on the resource group level.
+
+To enable resource group-level query queues, you also need to set `enable_group_level_query_queue` in addition to the global session variables mentioned above.
+
+```SQL
+SET GLOBAL enable_group_level_query_queue = true;
+```
+
+## Specify resource thresholds
+
+### Specify resource thresholds for global query queues
+
+You can set the thresholds that trigger query queues via the following global session variables:
+
+| **Variable** | **Default** | **Description** |
+| ----------------------------------- | ----------- | ------------------------------------------------------------ |
+| query_queue_concurrency_limit | 0 | The upper limit of concurrent queries on a BE. It takes effect only after being set greater than `0`. Setting it to `0` indicates no limit is imposed. |
+| query_queue_mem_used_pct_limit | 0 | The upper limit of memory usage percentage on a BE. It takes effect only after being set greater than `0`. Setting it to `0` indicates no limit is imposed. Range: [0, 1] |
+| query_queue_cpu_used_permille_limit | 0 | The upper limit of CPU usage permille (CPU usage * 1000) on a BE. It takes effect only after being set greater than `0`. Setting it to `0` indicates no limit is imposed. Range: [0, 1000] |
+
+> **NOTE**
+>
+> By default, BEs report resource usage to the FE at one-second intervals. You can change this interval by setting the BE configuration item `report_resource_usage_interval_ms`.
+
+### Specify resource thresholds for resource group-level query queues
+
+From v3.1.4 onwards, you can set individual concurrency limits (`concurrency_limit`) and CPU core limits (`max_cpu_cores`) when creating a resource group. When a query is initiated, if any resource consumption exceeds the threshold at either the global or resource group level, the query is placed in a queue until all resource consumption is within the thresholds.
+
+| **Variable** | **Default** | **Description** |
+| ------------------- | ----------- | ------------------------------------------------------------ |
+| concurrency_limit | 0 | The concurrency limit for the resource group on a single BE node. It takes effect only when it is set to greater than `0`. |
+| max_cpu_cores | 0 | The CPU core limit for this resource group on a single BE node. It takes effect only when it is set to greater than `0`. Range: [0, `avg_be_cpu_cores`], where `avg_be_cpu_cores` represents the average number of CPU cores across all BE nodes. |
+
+You can use SHOW USAGE RESOURCE GROUPS to view the resource usage information for each resource group on each BE node, as described in [View Resource Group Usage Information](./resource_group.md#view-resource-group-usage-information).
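+
+As an illustration, the following statements combine the three global thresholds described above. The values are hypothetical and should be tuned to your workload:
+
+```SQL
+-- Queue incoming queries on a BE once it runs 100 concurrent queries,
+-- its memory usage exceeds 90%, or its CPU usage exceeds 90% (900 permille).
+-- Illustrative values only, not recommendations.
+SET GLOBAL query_queue_concurrency_limit = 100;
+SET GLOBAL query_queue_mem_used_pct_limit = 0.9;
+SET GLOBAL query_queue_cpu_used_permille_limit = 900;
+```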
+
+### Manage query concurrency
+
+When the number of running queries (`num_running_queries`) exceeds the global or resource group's `concurrency_limit`, incoming queries are placed in the queue. The way to obtain `num_running_queries` differs between versions earlier than v3.1.4 and v3.1.4 onwards.
+
+- In versions earlier than v3.1.4, `num_running_queries` is reported by BEs at the interval specified in `report_resource_usage_interval_ms`. Therefore, there might be some delay in the identification of changes in `num_running_queries`. For example, if the `num_running_queries` reported by BEs at the moment does not exceed the global or resource group's `concurrency_limit`, but incoming queries arrive and exceed the `concurrency_limit` before the next report, these incoming queries will be executed without waiting in the queue.
+
+- In v3.1.4 and later versions, all running queries are collectively managed by the Leader FE. Each Follower FE notifies the Leader FE when initiating or finishing a query, allowing StarRocks to handle scenarios where there is a sudden increase in queries exceeding the `concurrency_limit`.
+
+## Configure query queues
+
+You can set the capacity of a query queue and the maximum timeout of queries in queues via the following global session variables:
+
+| **Variable** | **Default** | **Description** |
+| ---------------------------------- | ----------- | ------------------------------------------------------------ |
+| query_queue_max_queued_queries | 1024 | The upper limit of queries in a queue. When this threshold is reached, incoming queries are rejected. It takes effect only after being set greater than `0`. |
+| query_queue_pending_timeout_second | 300 | The maximum timeout of a pending query in a queue. When this threshold is reached, the corresponding query is rejected. Unit: second. |
+
+## Configure dynamic adjustment of query concurrency
+
+Starting from v3.1.4, for queries managed by the query queue and run by the Pipeline Engine, StarRocks can dynamically adjust the query concurrency `pipeline_dop` for incoming queries based on the current number of running queries `num_running_queries`, the number of fragments `num_fragments`, and the configured query concurrency `pipeline_dop`. This allows you to dynamically control query concurrency while minimizing scheduling overhead, ensuring optimal BE resource utilization. For more information about fragments and query concurrency `pipeline_dop`, see [Query Management - Adjusting Query Concurrency](./Query_management.md).
+
+For each query under a query queue, StarRocks maintains the concept of drivers, which represent the concurrent fragments of a query on a single BE. Their logical count `num_drivers`, which represents the total concurrency of all fragments of that query on a single BE, is equal to `num_fragments * pipeline_dop`. For example, a query with 3 fragments and a `pipeline_dop` of 8 corresponds to 24 drivers on each BE. When a new query arrives, StarRocks adjusts the query concurrency `pipeline_dop` based on the following rules:
+
+- The further the number of running drivers `num_drivers` exceeds the low water limit of concurrent drivers `query_queue_driver_low_water`, the lower the query concurrency `pipeline_dop` is adjusted.
+- StarRocks keeps the number of running drivers `num_drivers` below the high water limit of concurrent drivers `query_queue_driver_high_water`.
+ +You can configure the dynamic adjustment of query concurrency `pipeline_dop` using the following global session variables: + +| **Variable** | **Default** | **Description** | +| ----------------------------- | ----------- | ----------------------------------------------------------- | +| query_queue_driver_high_water | -1 | The high water limit of concurrent drivers for a query. It takes effect only when it is set to a non-negative value. When set to `0`, it is equivalent to `avg_be_cpu_cores * 16`, where `avg_be_cpu_cores` represents the average number of CPU cores across all BE nodes. When set to a value greater than `0`, that value is used directly. | +| query_queue_driver_low_water | -1 | The lower limit of concurrent drivers for queries. It takes effect only when it is set to a non-negative value. When set to `0`, it is equivalent to `avg_be_cpu_cores * 8`. When set to a value greater than `0`, that value is used directly. | + +## Monitor query queues + +You can view information related to query queues using the following methods. + +### SHOW PROC + +You can check the number of running queries, and memory and CPU usages in BE nodes using [SHOW PROC](../../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_PROC.md): + +```Plain +mysql> SHOW PROC '/backends'\G +*************************** 1. row *************************** +... + NumRunningQueries: 0 + MemUsedPct: 0.79 % + CpuUsedPct: 0.0 % +``` + +### SHOW PROCESSLIST + +You can check if a query is in a queue (when `IsPending` is `true`) using [SHOW PROCESSLIST](../../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_PROCESSLIST.md): + +```Plain +mysql> SHOW PROCESSLIST; ++------+------+---------------------+-------+---------+---------------------+------+-------+-------------------+-----------+ +| Id | User | Host | Db | Command | ConnectionStartTime | Time | State | Info | IsPending | ++------+------+---------------------+-------+---------+---------------------+------+-------+-------------------+-----------+ +| 2 | root | xxx.xx.xxx.xx:xxxxx | | Query | 2022-11-24 18:08:29 | 0 | OK | SHOW PROCESSLIST | false | ++------+------+---------------------+-------+---------+---------------------+------+-------+-------------------+-----------+ +``` + +### FE audit log + +You can check the FE audit log file **fe.audit.log**. The field `PendingTimeMs` indicates the time a query spent waiting in a queue, and its unit is milliseconds. + +### Monitoring metrics + +You can obtain metrics of query queues in StarRocks using the [Monitor and Alert](../monitoring/Monitor_and_Alert.md) feature. The following FE metrics are derived from the statistical data of each FE node. + +| Metric | Unit | Type | Description | +| ----------------------------------------------- | ---- | ------- | -------------------------------------------------------------- | +| starrocks_fe_query_queue_pending | Count | Instantaneous | The current number of queries in the queue. | +| starrocks_fe_query_queue_total | Count | Instantaneous | The total number of queries historically queued (including those currently running). | +| starrocks_fe_query_queue_timeout | Count | Instantaneous | The total number of queries that have timed out while in the queue. | +| starrocks_fe_resource_group_query_queue_total | Count | Instantaneous | The total number of queries historically queued in this resource group (including those currently running). The `name` label indicates the name of the resource group. This metric is supported from v3.1.4 onwards. 
| +| starrocks_fe_resource_group_query_queue_pending | Count | Instantaneous | The number of queries currently in the queue for this resource group. The `name` label indicates the name of the resource group. This metric is supported from v3.1.4 onwards. | +| starrocks_fe_resource_group_query_queue_timeout | Count | Instantaneous | The number of queries that have timed out while in the queue for this resource group. The `name` label indicates the name of the resource group. This metric is supported from v3.1.4 onwards. | + +### SHOW RUNNING QUERIES + +From v3.1.4 onwards, StarRocks supports the SQL statement `SHOW RUNNING QUERIES`, which is used to display queue information for each query. The meanings of each field are as follows: + +- `QueryId`: The ID of the query. +- `ResourceGroupId`: The ID of the resource group that the query hit. When there is no hit on a user-defined resource group, it will be displayed as "-". +- `StartTime`: The start time of the query. +- `PendingTimeout`: The time when the PENDING query will time out in the queue. +- `QueryTimeout`: The time when the query times out. +- `State`: The queue state of the query, where "PENDING" indicates it is in the queue, and "RUNNING" indicates it is currently executing. +- `Slots`: The logical resource quantity requested by the query, currently fixed at `1`. +- `Frontend`: The FE node that initiated the query. +- `FeStartTime`: The start time of the FE node that initiated the query. + +Example: + +```Plain +MySQL [(none)]> SHOW RUNNING QUERIES; ++--------------------------------------+-----------------+---------------------+---------------------+---------------------+-----------+-------+---------------------------------+---------------------+ +| QueryId | ResourceGroupId | StartTime | PendingTimeout | QueryTimeout | State | Slots | Frontend | FeStartTime | ++--------------------------------------+-----------------+---------------------+---------------------+---------------------+-----------+-------+---------------------------------+---------------------+ +| a46f68c6-3b49-11ee-8b43-00163e10863a | - | 2023-08-15 16:56:37 | 2023-08-15 17:01:37 | 2023-08-15 17:01:37 | RUNNING | 1 | 127.00.00.01_9010_1692069711535 | 2023-08-15 16:37:03 | +| a6935989-3b49-11ee-935a-00163e13bca3 | 12003 | 2023-08-15 16:56:40 | 2023-08-15 17:01:40 | 2023-08-15 17:01:40 | RUNNING | 1 | 127.00.00.02_9010_1692069658426 | 2023-08-15 16:37:03 | +| a7b5e137-3b49-11ee-8b43-00163e10863a | 12003 | 2023-08-15 16:56:42 | 2023-08-15 17:01:42 | 2023-08-15 17:01:42 | PENDING | 1 | 127.00.00.03_9010_1692069711535 | 2023-08-15 16:37:03 | ++--------------------------------------+-----------------+---------------------+---------------------+---------------------+-----------+-------+---------------------------------+---------------------+ +``` diff --git a/docs/en/administration/management/resource_management/resource_group.md b/docs/en/administration/management/resource_management/resource_group.md new file mode 100644 index 0000000..d5331a9 --- /dev/null +++ b/docs/en/administration/management/resource_management/resource_group.md @@ -0,0 +1,519 @@ +--- +displayed_sidebar: docs +sidebar_position: 10 +keywords: ['resource groups', 'isolation'] +--- + +# Resource group + +This topic describes the resource group feature of StarRocks. 
+
+![resource group](../../../_assets/resource_group.png)
+
+With this feature, you can simultaneously run multiple workloads in a single cluster, including short queries, ad-hoc queries, and ETL jobs, saving the extra cost of deploying multiple clusters. From a technical perspective, the execution engine schedules concurrent workloads according to your specifications and isolates the interference among them.
+
+The roadmap of Resource Group:
+
+- Since v2.2, StarRocks supports limiting resource consumption for queries and implementing isolation and efficient use of resources among tenants in the same cluster.
+- In StarRocks v2.3, you can further restrict the resource consumption for big queries, and prevent the cluster resources from getting exhausted by oversized query requests, to guarantee the system stability.
+- StarRocks v2.5 supports limiting computation resource consumption for data loading (INSERT).
+- From v3.3.5 onwards, StarRocks supports imposing hard limits on CPU resources.
+
+| | Internal Table | External Table | Big Query Restriction | INSERT INTO | Broker Load | Routine Load, Stream Load, Schema Change | CPU Hard Limit |
+| --------------- | -------------- | -------------- | --------------------- | ----------- | ----------- | ---------------------------------------- | -------------- |
+| 2.2 | √ | × | × | × | × | × | × |
+| 2.3 | √ | √ | √ | × | × | × | × |
+| 2.5 | √ | √ | √ | √ | × | × | × |
+| 3.1 & 3.2 | √ | √ | √ | √ | √ | × | × |
+| 3.3.5 and later | √ | √ | √ | √ | √ | × | √ |
+
+## Terms
+
+This section describes the terms that you must understand before you use the resource group feature.
+
+### resource group
+
+Each resource group is a set of computing resources from a specific BE. You can divide each BE of your cluster into multiple resource groups. When a query is assigned to a resource group, StarRocks allocates CPU and memory resources to the resource group based on the resource quotas that you specified for the resource group.
+
+You can specify CPU and memory resource quotas for a resource group on a BE by using the following parameters:
+
+| Parameter | Description | Value Range | Default |
+| -------------------------- | -------------------------------------------------------------- | -------------------------------------------------------------- | ------- |
+| cpu_weight | The CPU scheduling weight of this resource group on a BE node. | (0, `avg_be_cpu_cores`] (takes effect when greater than 0) | 0 |
+| exclusive_cpu_cores | CPU hard isolation parameter for this resource group. | (0, `min_be_cpu_cores - 1`] (takes effect when greater than 0) | 0 |
+| mem_limit | The percentage of memory available for queries by this resource group on the current BE node. | (0, 1] (required) | - |
+| mem_pool | Groups resource groups to share a memory limit. | String | default_mem_pool |
+| spill_mem_limit_threshold | Memory usage threshold that triggers spilling to disk. | (0, 1] | 1.0 |
+| concurrency_limit | Maximum number of concurrent queries in this resource group. | Integer (takes effect when greater than 0) | 0 |
+| big_query_cpu_second_limit | Maximum CPU time (in seconds) for big query tasks on each BE node. | Integer (takes effect when greater than 0) | 0 |
+| big_query_scan_rows_limit | Maximum number of rows big query tasks can scan on each BE node. | Integer (takes effect when greater than 0) | 0 |
+| big_query_mem_limit | Maximum memory big query tasks can use on each BE node. | Integer (takes effect when greater than 0) | 0 |
+
+#### CPU resource parameters
+
+##### `cpu_weight`
+
+This parameter specifies the CPU scheduling weight of a resource group on a single BE node, determining the relative share of CPU time allocated to tasks from this group. Before v3.3.5, this parameter was named `cpu_core_limit`.
+
+Its value range is (0, `avg_be_cpu_cores`], where `avg_be_cpu_cores` is the average number of CPU cores across all BE nodes. The parameter is effective only when it is set to greater than 0. Either `cpu_weight` or `exclusive_cpu_cores` must be greater than 0, but not both.
+
+> **NOTE**
+>
+> For example, suppose three resource groups, rg1, rg2, and rg3, have `cpu_weight` values of 2, 6, and 8, respectively. On a fully loaded BE node, these groups would receive 12.5%, 37.5%, and 50% of the CPU time. If the node is not fully loaded and rg1 and rg2 are under load while rg3 is idle, rg1 and rg2 would receive 25% and 75% of the CPU time, respectively.
+
+##### `exclusive_cpu_cores`
+
+This parameter defines a hard CPU limit for a resource group. It has two implications:
+
+- **Exclusive**: Reserves `exclusive_cpu_cores` CPU cores exclusively for this resource group, making them unavailable to other groups, even when idle.
+- **Quota**: Limits the resource group to only using these reserved CPU cores, preventing it from using available CPU resources from other groups.
+
+The value range is (0, `min_be_cpu_cores - 1`], where `min_be_cpu_cores` is the minimum number of CPU cores across all BE nodes. It takes effect only when greater than 0. Only one of `cpu_weight` or `exclusive_cpu_cores` can be set to greater than 0.
+
+- Resource groups with `exclusive_cpu_cores` greater than 0 are called Exclusive resource groups, and the CPU cores allocated to them are called Exclusive Cores. Other groups are called Shared resource groups and run on Shared Cores.
+- The total number of `exclusive_cpu_cores` across all resource groups cannot exceed `min_be_cpu_cores - 1`. The upper limit is set to leave at least one Shared Core available.
+
+The relationship between `exclusive_cpu_cores` and `cpu_weight`:
+
+Only one of `cpu_weight` or `exclusive_cpu_cores` can be active at a time. Exclusive resource groups operate on their own reserved Exclusive Cores without requiring a share of CPU time through `cpu_weight`.
+
+You can configure whether Shared resource groups can borrow Exclusive Cores from Exclusive resource groups using the BE configuration item `enable_resource_group_cpu_borrowing`. When set to `true` (default), Shared groups can borrow CPU resources when Exclusive groups are idle.
+
+To modify this configuration dynamically, use the following command:
+
+```SQL
+UPDATE information_schema.be_configs SET VALUE = "false" WHERE NAME = "enable_resource_group_cpu_borrowing";
+```
+
+#### Memory resource parameters
+
+##### `mem_limit`
+
+Specifies the percentage of memory (query pool) available to the resource group on the current BE node. The value range is (0,1].
+
+##### `mem_pool`
+
+Since v4.0, specifies a shared memory pool identifier. Resource groups with the same `mem_pool` identifier draw from a shared memory pool, collectively limited by the `mem_limit`. If not specified, the resource group is assigned to `default_mem_pool`, and its memory usage is limited solely by its own `mem_limit`.
+
+All resource groups sharing the same `mem_pool` must be configured with an identical `mem_limit`.
+
+For example, to limit two resource groups to 50% of memory collectively, you can define them as follows:
+
+```SQL
+CREATE RESOURCE GROUP rg1
+TO (db='db1')
+WITH (
+    'mem_limit' = '50%',
+    'mem_pool' = 'shared_pool'
+);
+
+CREATE RESOURCE GROUP rg2
+TO (db='db1')
+WITH (
+    'mem_limit' = '50%',
+    'mem_pool' = 'shared_pool'
+);
+```
+
+##### `spill_mem_limit_threshold`
+
+Defines the memory usage threshold that triggers spilling to disk. The value range is (0,1], with the default being 1 (inactive). Introduced in v3.1.7.
+
+- When automatic spilling is enabled (`spill_mode` set to `auto`) but resource groups are disabled, the system will spill intermediate results to disk when a query's memory usage exceeds 80% of `query_mem_limit`.
+- When resource groups are enabled, spilling will occur if:
+  - The total memory usage of all queries in the group exceeds `current BE memory limit * mem_limit * spill_mem_limit_threshold`, or
+  - The memory usage of the current query exceeds 80% of `query_mem_limit`.
+
+#### Query concurrency parameters
+
+##### `concurrency_limit`
+
+Defines the maximum number of concurrent queries in the resource group to prevent system overload. Effective only when greater than 0, with a default value of 0.
+
+#### Big query resource parameters
+
+You can configure resource limits specifically for large queries using the following parameters:
+
+##### `big_query_cpu_second_limit`
+
+Specifies the maximum CPU time (in seconds) that large query tasks can use on each BE node, summing the actual CPU time used by parallel tasks. Effective only when greater than 0, with a default value of 0.
+
+##### `big_query_scan_rows_limit`
+
+Sets a limit on the number of rows large query tasks can scan on each BE node. Effective only when greater than 0, with a default value of 0.
+
+##### `big_query_mem_limit`
+
+Defines the maximum memory large query tasks can use on each BE node, in bytes. Effective only when greater than 0, with a default value of 0.
+
+> **NOTE**
+>
+> When a query running in a resource group exceeds the above big query limits, the query is terminated with an error. You can also view error messages in the `ErrorCode` column of the FE node **fe.audit.log**.
+
+##### Type (Deprecated since v3.3.5)
+
+Before v3.3.5, StarRocks allowed setting the `type` of a resource group to `short_query`. This parameter has been deprecated and replaced by `exclusive_cpu_cores`. After upgrading to v3.3.5, the system automatically converts any existing resource group of this type to an Exclusive resource group whose `exclusive_cpu_cores` value equals its `cpu_weight`.
+
+#### System-defined resource groups
+
+There are two system-defined resource groups in each StarRocks instance: `default_wg` and `default_mv_wg`. You can modify the configuration of system-defined resource groups using the ALTER RESOURCE GROUP command, but you cannot define classifiers for them or delete system-defined resource groups.
+
+##### default_wg
+
+`default_wg` will be assigned to regular queries that are under the management of resource groups but don't match any classifier. The default resource limits of `default_wg` are as follows:
+
+- `cpu_weight`: The number of CPU cores of the BE.
+- `mem_limit`: 100%.
+- `concurrency_limit`: 0.
+- `big_query_cpu_second_limit`: 0.
+- `big_query_scan_rows_limit`: 0.
+- `big_query_mem_limit`: 0.
+- `spill_mem_limit_threshold`: 1.
+ +##### default_mv_wg + +`default_mv_wg` will be assigned to asynchronous materialized view refresh tasks if no resource group is allocated to the corresponding materialized view in the property `resource_group` during materialized view creation. The default resource limits of `default_mv_wg` are as follows: + +- `cpu_weight`: 1. +- `mem_limit`: 80%. +- `concurrency_limit`: 0. +- `spill_mem_limit_threshold`: 80%. + +### classifier + +Each classifier holds one or more conditions that can be matched to the properties of queries. StarRocks identifies the classifier that best matches each query based on the match conditions and assigns resources for running the query. + +Classifiers support the following conditions: + +- `user`: the name of the user. +- `role`: the role of the user. +- `query_type`: the type of the query. `SELECT` and `INSERT` (from v2.5) are supported. When INSERT INTO or BROKER LOAD tasks hit a resource group with `query_type` as `insert`, the BE node reserves the specified CPU resources for the tasks. +- `source_ip`: the CIDR block from which the query is initiated. +- `db`: the database which the query accesses. It can be specified by strings separated by commas `,`. +- `plan_cpu_cost_range`: The estimated CPU cost range of the query. The format is `(DOUBLE, DOUBLE]`. The default value is NULL, indicating no such restriction. The `PlanCpuCost` column in `fe.audit.log` represents the system's estimate of the CPU cost for the query. This parameter is supported from v3.1.4 onwards. +- `plan_mem_cost_range`: The system-estimated memory cost range of a query. The format is `(DOUBLE, DOUBLE]`. The default value is NULL, indicating no such restriction. The `PlanMemCost` column in `fe.audit.log` represents the system's estimate of the memory cost for the query. This parameter is supported from v3.1.4 onwards. + +A classifier matches a query only when one or all conditions of the classifier match the information about the query. If multiple classifiers match a query, StarRocks calculates the degree of matching between the query and each classifier and identifies the classifier with the highest degree of matching. + +> **NOTE** +> +> You can view the resource group to which a query belongs in the `ResourceGroup` column of the FE node **fe.audit.log** or by running `EXPLAIN VERBOSE `, as described in [View the resource group of a query](#view-the-resource-group-of-a-query). + +StarRocks calculates the degree of matching between a query and a classifier by using the following rules: + +- If the classifier has the same value of `user` as the query, the degree of matching of the classifier increases by 1. +- If the classifier has the same value of `role` as the query, the degree of matching of the classifier increases by 1. +- If the classifier has the same value of `query_type` as the query, the degree of matching of the classifier increases by 1 plus the number obtained from the following calculation: 1/Number of `query_type` fields in the classifier. +- If the classifier has the same value of `source_ip` as the query, the degree of matching of the classifier increases by 1 plus the number obtained from the following calculation: (32 - `cidr_prefix`)/64. +- If the classifier has the same value of `db` as the query, the degree of matching of the classifier increases by 10. +- If the query's CPU cost falls within the `plan_cpu_cost_range`, the degree of matching of the classifier increases by 1. 
- If the query's memory cost falls within the `plan_mem_cost_range`, the degree of matching of the classifier increases by 1.

If multiple classifiers match a query, the classifier with a larger number of conditions has a higher degree of matching.

```Plain
-- Classifier B has more conditions than Classifier A. Therefore, Classifier B has a higher degree of matching than Classifier A.
classifier A (user='Alice')
classifier B (user='Alice', source_ip = '192.168.1.0/24')
```

If multiple matching classifiers have the same number of conditions, the classifier whose conditions are described more precisely has a higher degree of matching.

```Plain
-- The CIDR block specified in Classifier B is smaller in range than that in Classifier A. Therefore, Classifier B has a higher degree of matching than Classifier A.
classifier A (user='Alice', source_ip = '192.168.1.0/16')
classifier B (user='Alice', source_ip = '192.168.1.0/24')

-- Classifier C has fewer query types specified than Classifier D. Therefore, Classifier C has a higher degree of matching than Classifier D.
classifier C (user='Alice', query_type in ('select'))
classifier D (user='Alice', query_type in ('insert','select'))
```

If multiple classifiers have the same degree of matching, one of them is randomly selected.

```Plain
-- If a query simultaneously queries both db1 and db2 and the classifiers E and F have the
-- highest degree of matching among the hit classifiers, one of E and F will be randomly selected.
classifier E (db='db1')
classifier F (db='db2')
```

## Isolate computing resources

You can isolate computing resources among queries by configuring resource groups and classifiers.

### Enable resource groups

To use resource groups, you must enable the Pipeline Engine for your StarRocks cluster:

```SQL
-- Enable Pipeline Engine in the current session.
SET enable_pipeline_engine = true;
-- Enable Pipeline Engine globally.
SET GLOBAL enable_pipeline_engine = true;
```

> **NOTE**
>
> From v3.1.0 onwards, Resource Group is enabled by default, and the session variable `enable_resource_group` is deprecated.

### Create resource groups and classifiers

Execute the following statement to create a resource group, associate the resource group with a classifier, and allocate computing resources to the resource group:

```SQL
CREATE RESOURCE GROUP <group_name>
TO (
    user='string',
    role='string',
    query_type in ('select'),
    source_ip='cidr'
) -- Create a classifier. If you create more than one classifier, separate the classifiers with commas (`,`).
WITH (
    "{ cpu_weight | exclusive_cpu_cores }" = "INT",
    "mem_limit" = "m%",
    "concurrency_limit" = "INT",
    "type" = "str" -- The type of the resource group. Set the value to normal.
);
```

Example:

```SQL
CREATE RESOURCE GROUP rg1
TO
    (user='rg1_user1', role='rg1_role1', query_type in ('select'), source_ip='192.168.x.x/24'),
    (user='rg1_user2', query_type in ('select'), source_ip='192.168.x.x/24'),
    (user='rg1_user3', source_ip='192.168.x.x/24'),
    (user='rg1_user4'),
    (db='db1')
WITH (
    'exclusive_cpu_cores' = '10',
    'mem_limit' = '20%',
    'big_query_cpu_second_limit' = '100',
    'big_query_scan_rows_limit' = '100000',
    'big_query_mem_limit' = '1073741824'
);
```

### Specify resource group (Optional)

You can directly specify a resource group for the current session, including `default_wg` and `default_mv_wg`.

```SQL
SET resource_group = 'group_name';
```

### View resource groups and classifiers

Execute the following statement to query all resource groups and classifiers:

```SQL
SHOW RESOURCE GROUPS ALL;
```

Execute the following statement to query the resource groups and classifiers of the logged-in user:

```SQL
SHOW RESOURCE GROUPS;
```

Execute the following statement to query a specified resource group and its classifiers:

```SQL
SHOW RESOURCE GROUP group_name;
```

Example:

```plain
mysql> SHOW RESOURCE GROUPS ALL;
+---------------+-------+------------+---------------------+-----------+----------------------------+---------------------------+---------------------+-------------------+---------------------------+----------------------------------------+
| name          | id    | cpu_weight | exclusive_cpu_cores | mem_limit | big_query_cpu_second_limit | big_query_scan_rows_limit | big_query_mem_limit | concurrency_limit | spill_mem_limit_threshold | classifiers                            |
+---------------+-------+------------+---------------------+-----------+----------------------------+---------------------------+---------------------+-------------------+---------------------------+----------------------------------------+
| default_mv_wg | 3     | 1          | 0                   | 80.0%     | 0                          | 0                         | 0                   | null              | 80%                       | (id=0, weight=0.0)                     |
| default_wg    | 2     | 1          | 0                   | 100.0%    | 0                          | 0                         | 0                   | null              | 100%                      | (id=0, weight=0.0)                     |
| rge1          | 15015 | 0          | 6                   | 90.0%     | 0                          | 0                         | 0                   | null              | 100%                      | (id=15016, weight=1.0, user=rg1_user)  |
| rgs1          | 15017 | 8          | 0                   | 90.0%     | 0                          | 0                         | 0                   | null              | 100%                      | (id=15018, weight=1.0, user=rgs1_user) |
| rgs2          | 15019 | 8          | 0                   | 90.0%     | 0                          | 0                         | 0                   | null              | 100%                      | (id=15020, weight=1.0, user=rgs2_user) |
+---------------+-------+------------+---------------------+-----------+----------------------------+---------------------------+---------------------+-------------------+---------------------------+----------------------------------------+
```

> **NOTE**
>
> In the preceding example, `weight` indicates the degree of matching.

To view all fields of a resource group, including deprecated ones such as `type` and `max_cpu_cores`, add the keyword `VERBOSE` to the three commands mentioned above:

```sql
SHOW VERBOSE RESOURCE GROUPS ALL;
SHOW VERBOSE RESOURCE GROUPS;
SHOW VERBOSE RESOURCE GROUP group_name;
```

### Manage resource groups and classifiers

You can modify the resource quotas of each resource group. You can also add or delete classifiers from resource groups.

Execute the following statement to modify the resource quotas of an existing resource group:

```SQL
ALTER RESOURCE GROUP group_name WITH (
    'cpu_core_limit' = 'INT',
    'mem_limit' = 'm%'
);
```

Execute the following statement to delete a resource group:

```SQL
DROP RESOURCE GROUP group_name;
```

Execute the following statement to add a classifier to a resource group:

```SQL
ALTER RESOURCE GROUP <group_name> ADD (user='string', role='string', query_type in ('select'), source_ip='cidr');
```

Execute the following statement to delete a classifier from a resource group:

```SQL
ALTER RESOURCE GROUP <group_name> DROP (CLASSIFIER_ID_1, CLASSIFIER_ID_2, ...);
```

Execute the following statement to delete all classifiers of a resource group:

```SQL
ALTER RESOURCE GROUP <group_name> DROP ALL;
```

## Observe resource groups

### View the resource group of a query

For queries that have not yet been executed, you can view the resource group matched by the query in the `RESOURCE GROUP` field returned by `EXPLAIN VERBOSE <query>`.

While a query is running, you can check which resource group the query has hit in the `ResourceGroup` field returned by `SHOW PROC '/current_queries'` and `SHOW PROC '/global_current_queries'`.

After a query has completed, you can view the resource group that the query matched by checking the `ResourceGroup` field in the **fe.audit.log** file on the FE node.

- If the query is not under the management of resource groups, the column value is an empty string `""`.
- If the query is under the management of resource groups but does not match any classifier, the column value is also an empty string `""`, and the query is assigned to the default resource group `default_wg`.

### Monitor resource groups

You can set up [monitoring and alerting](../monitoring/Monitor_and_Alert.md) for your resource groups.

Resource group-related FE and BE metrics are as follows. All the metrics below have a `name` label indicating their corresponding resource group.

### FE metrics

The following FE metrics only provide statistics within the current FE node:

| Metric | Unit | Type | Description |
| ----------------------------------------------- | ----- | ------------- | ------------------------------------------------------------------ |
| starrocks_fe_query_resource_group | Count | Instantaneous | The number of queries historically run in this resource group (including those currently running). |
| starrocks_fe_query_resource_group_latency | ms | Instantaneous | The query latency percentile for this resource group. The label `type` indicates specific percentiles, including `mean`, `75_quantile`, `95_quantile`, `98_quantile`, `99_quantile`, `999_quantile`. |
| starrocks_fe_query_resource_group_err | Count | Instantaneous | The number of queries in this resource group that encountered an error. |
| starrocks_fe_resource_group_query_queue_total | Count | Instantaneous | The total number of queries historically queued in this resource group (including those currently running). This metric is supported from v3.1.4 onwards. It is valid only when query queues are enabled. |
| starrocks_fe_resource_group_query_queue_pending | Count | Instantaneous | The number of queries currently in the queue of this resource group. This metric is supported from v3.1.4 onwards. It is valid only when query queues are enabled. |
| starrocks_fe_resource_group_query_queue_timeout | Count | Instantaneous | The number of queries in this resource group that have timed out while in the queue. This metric is supported from v3.1.4 onwards. It is valid only when query queues are enabled. |

### BE metrics

| Metric | Unit | Type | Description |
| ----------------------------------------- | ---------- | ------------- | ------------------------------------------------------------------ |
| resource_group_running_queries | Count | Instantaneous | The number of queries currently running in this resource group. |
| resource_group_total_queries | Count | Instantaneous | The number of queries historically run in this resource group (including those currently running). |
| resource_group_bigquery_count | Count | Instantaneous | The number of queries in this resource group that triggered the big query limit. |
| resource_group_concurrency_overflow_count | Count | Instantaneous | The number of queries in this resource group that triggered the `concurrency_limit` limit. |
| resource_group_mem_limit_bytes | Bytes | Instantaneous | The memory limit of this resource group. |
| resource_group_mem_inuse_bytes | Bytes | Instantaneous | The memory currently in use by this resource group. |
| resource_group_cpu_limit_ratio | Percentage | Instantaneous | The ratio of this resource group's `cpu_core_limit` to the total `cpu_core_limit` across all resource groups. |
| resource_group_inuse_cpu_cores | Count | Average | The estimated number of CPU cores in use by this resource group. This value is an approximate estimate. It represents the average value calculated based on the statistics from two consecutive metric collections. This metric is supported from v3.1.4 onwards. |
| resource_group_cpu_use_ratio | Percentage | Average | **Deprecated** The ratio of the Pipeline thread time slices used by this resource group to the total Pipeline thread time slices used by all resource groups. It represents the average value calculated based on the statistics from two consecutive metric collections. |
| resource_group_connector_scan_use_ratio | Percentage | Average | **Deprecated** The ratio of the external table Scan thread time slices used by this resource group to the total Pipeline thread time slices used by all resource groups. It represents the average value calculated based on the statistics from two consecutive metric collections. |
| resource_group_scan_use_ratio | Percentage | Average | **Deprecated** The ratio of the internal table Scan thread time slices used by this resource group to the total Pipeline thread time slices used by all resource groups. It represents the average value calculated based on the statistics from two consecutive metric collections. |

### View resource group usage information

From v3.1.4 onwards, StarRocks supports the SQL statement [SHOW USAGE RESOURCE GROUPS](../../../sql-reference/sql-statements/cluster-management/resource_group/SHOW_USAGE_RESOURCE_GROUPS.md), which displays usage information of each resource group across BEs. Each field is described as follows:

- `Name`: The name of the resource group.
- `Id`: The ID of the resource group.
- `Backend`: The BE's IP address or FQDN.
- `BEInUseCpuCores`: The number of CPU cores currently in use by this resource group on this BE. This value is an approximate estimate.
- `BEInUseMemBytes`: The number of memory bytes currently in use by this resource group on this BE.
- `BERunningQueries`: The number of queries from this resource group that are still running on this BE.

Please note:

- BEs periodically report this resource usage information to the Leader FE at the interval specified in `report_resource_usage_interval_ms`, which defaults to 1 second.
- The results only show rows where at least one of `BEInUseCpuCores`/`BEInUseMemBytes`/`BERunningQueries` is a positive number. In other words, the information is displayed only when a resource group is actively using some resources on a BE.

Example:

```Plain
MySQL [(none)]> SHOW USAGE RESOURCE GROUPS;
+------------+----+-----------+-----------------+-----------------+------------------+
| Name       | Id | Backend   | BEInUseCpuCores | BEInUseMemBytes | BERunningQueries |
+------------+----+-----------+-----------------+-----------------+------------------+
| default_wg | 0  | 127.0.0.1 | 0.100           | 1               | 5                |
| default_wg | 0  | 127.0.0.2 | 0.200           | 2               | 6                |
| wg1        | 0  | 127.0.0.1 | 0.300           | 3               | 7                |
| wg2        | 0  | 127.0.0.1 | 0.400           | 4               | 8                |
+------------+----+-----------+-----------------+-----------------+------------------+
```

### View thread information for Exclusive and Shared resource groups

Query execution mainly involves three thread pools: `pip_exec`, `pip_scan`, and `pip_con_scan`.

- Exclusive resource groups run in their dedicated thread pools and are bound to the Exclusive CPU cores allocated to them.
- Shared resource groups run in shared thread pools and are bound to the remaining Shared CPU cores.

The threads in these three pools follow the naming convention `{ pip_exec | pip_scan | pip_con_scan }_{ com | <group_id> }`, where `com` refers to the shared thread pool, and `<group_id>` refers to the ID of the Exclusive resource group.

You can view the CPU information bound to each BE thread through the system-defined view `information_schema.be_threads`. The fields `BE_ID`, `NAME`, and `BOUND_CPUS` represent the BE's ID, the name of the thread, and the number of CPU cores bound to that thread, respectively.

```sql
select * from information_schema.be_threads where name like '%pip_exec%';
select * from information_schema.be_threads where name like '%pip_scan%';
select * from information_schema.be_threads where name like '%pip_con_scan%';
```

Example:

```sql
select BE_ID, NAME, FINISHED_TASKS, BOUND_CPUS from information_schema.be_threads where name like '%pip_exec_com%' and be_id = 10223;
+-------+--------------+----------------+------------+
| BE_ID | NAME         | FINISHED_TASKS | BOUND_CPUS |
+-------+--------------+----------------+------------+
| 10223 | pip_exec_com | 2091295        | 10         |
| 10223 | pip_exec_com | 2088025        | 10         |
| 10223 | pip_exec_com | 1637603        | 6          |
| 10223 | pip_exec_com | 1641260        | 6          |
| 10223 | pip_exec_com | 1634197        | 6          |
| 10223 | pip_exec_com | 1633804        | 6          |
| 10223 | pip_exec_com | 1638184        | 6          |
| 10223 | pip_exec_com | 1636374        | 6          |
| 10223 | pip_exec_com | 2095951        | 10         |
| 10223 | pip_exec_com | 2095248        | 10         |
| 10223 | pip_exec_com | 2098745        | 10         |
| 10223 | pip_exec_com | 2085338        | 10         |
| 10223 | pip_exec_com | 2101221        | 10         |
| 10223 | pip_exec_com | 2093901        | 10         |
| 10223 | pip_exec_com | 2092364        | 10         |
| 10223 | pip_exec_com | 2091366        | 10         |
+-------+--------------+----------------+------------+
```
diff --git a/docs/en/administration/management/resource_management/spill_to_disk.md b/docs/en/administration/management/resource_management/spill_to_disk.md
new file mode 100644
index 0000000..a004cc7
--- /dev/null
+++ b/docs/en/administration/management/resource_management/spill_to_disk.md
@@ -0,0 +1,91 @@
---
displayed_sidebar: docs
sidebar_position: 50
---

import Beta from '../../../_assets/commonMarkdown/_beta.mdx'

# Spill to disk

<Beta />

This topic describes how to spill intermediate computation results of large operators to local disks and object storage.

## Overview

Database systems that rely on in-memory computing for query execution, such as StarRocks, can consume substantial memory resources when processing queries with aggregate, sort, and join operators on large datasets. When memory limits are reached, such queries are forcibly terminated with out-of-memory (OOM) errors.

However, you may sometimes want certain memory-intensive tasks to complete reliably even if performance is not your top priority, for example, building a materialized view or performing a lightweight ETL with INSERT INTO SELECT. These tasks can easily exhaust your memory resources and thereby block other queries running in your cluster. Usually, to address this issue, you can only fine-tune these tasks individually and rely on your resource isolation strategy to control query concurrency. This is inconvenient and likely to fail in extreme scenarios.

From v3.0.1 onwards, StarRocks supports spilling the intermediate results of some memory-intensive operators to disk. With this feature, you can trade a tolerable drop in performance for a significant reduction in memory usage, thereby improving system availability.

Currently, StarRocks' spilling feature supports the following operators:

- Aggregate operators
- Sort operators
- Hash join (LEFT JOIN, RIGHT JOIN, FULL JOIN, OUTER JOIN, SEMI JOIN, and INNER JOIN) operators
- CTE operators (supported from v3.3.4 onwards)

## Enable intermediate result spilling

Follow these steps to enable intermediate result spilling:

1. Specify the local spill directory `spill_local_storage_dir`, which stores the spilled intermediate results on the local disk, in the BE configuration file **be.conf** or the CN configuration file **cn.conf**, and restart the cluster for the modification to take effect.

    ```Properties
    spill_local_storage_dir=/<dir_1>[;/<dir_2>]
    ```

    > **NOTE**
    >
    > - You can specify multiple directories in `spill_local_storage_dir` by separating them with semicolons (`;`).
    > - In a production environment, we strongly recommend that you use different disks for data storage and spilling. When intermediate results are spilled to disk, there can be a significant increase in both write load and disk usage. If the same disk is used, this surge can impact other queries or tasks running in the cluster.

2. Execute the following statement to enable intermediate result spilling:

    ```SQL
    SET enable_spill = true;
    ```

3. Configure the mode of intermediate result spilling using the session variable `spill_mode`:

    ```SQL
    SET spill_mode = { "auto" | "force" };
    ```

    > **NOTE**
    >
    > Each time a query with spilling completes, StarRocks automatically clears the spilled data that the query produced. If a BE crashes before clearing the data, StarRocks clears it when the BE restarts.

    | **Variable** | **Default** | **Description** |
    | ------------ | ----------- | ------------------------------------------------------------ |
    | enable_spill | false | Whether to enable intermediate result spilling. If it is set to `true`, StarRocks spills the intermediate results to disk to reduce memory usage when processing aggregate, sort, or join operators in queries. |
    | spill_mode | auto | The execution mode of intermediate result spilling. Valid values:<br />• `auto`: Spilling is automatically triggered when the memory usage threshold is reached.<br />• `force`: StarRocks forcibly executes spilling for all relevant operators, regardless of memory usage.<br />This variable takes effect only when the variable `enable_spill` is set to `true`. |

## [Preview] Spill intermediate result to object storage

From v3.3.0 onwards, StarRocks supports spilling intermediate results to object storage.

:::tip
Before enabling spilling to object storage, you must create a storage volume to define the object storage you want to use. For detailed instructions on creating a storage volume, see [CREATE STORAGE VOLUME](../../../sql-reference/sql-statements/cluster-management/storage_volume/CREATE_STORAGE_VOLUME.md).
:::

After you have enabled spilling in the previous step, you can further set these system variables to allow intermediate results to be spilled to object storage:

```SQL
SET enable_spill_to_remote_storage = true;

-- Replace <storage_volume_name> with the name of the storage volume that you want to use.
SET spill_storage_volume = '<storage_volume_name>';
```

After spilling to object storage has been enabled, the intermediate results of queries that trigger spilling are first stored on the local disks of the BE or CN nodes, and then in object storage if the capacity limit of the local disks is reached.

Please note that if the storage volume you specify in `spill_storage_volume` does not exist, spilling to object storage will not be enabled.

## Limitations

- Not all OOM issues can be resolved by spilling. For example, StarRocks cannot release the memory used for expression evaluation.
- Queries that involve spilling usually see a tenfold increase in query latency. We recommend that you extend the query timeout for these queries by setting the session variable `query_timeout`.
- Spilling to object storage incurs a significant performance drop compared to spilling to local disks.
- The `spill_local_storage_dir` of each BE or CN node is shared among all queries running on the node. Currently, StarRocks does not support setting an individual limit on the size of data each query can spill to local disks. Therefore, concurrent queries that involve spilling may impact one another.
diff --git a/docs/en/administration/management/timezone.md b/docs/en/administration/management/timezone.md
new file mode 100644
index 0000000..7d23bf6
--- /dev/null
+++ b/docs/en/administration/management/timezone.md
@@ -0,0 +1,55 @@
---
displayed_sidebar: docs
---

# Configure a time zone

This topic describes how to configure a time zone and the impacts of time zone settings.

## Configure a session-level time zone or a global time zone

You can configure a session-level time zone or a global time zone for your StarRocks cluster using the `time_zone` parameter.

- To configure a session-level time zone, execute the command `SET time_zone = 'xxx';`. You can configure different time zones for different sessions. The time zone setting becomes invalid after you disconnect from the FEs.
- To configure a global time zone, execute the command `SET global time_zone = 'xxx';`. The time zone setting is persisted in the FEs and remains valid even after you disconnect from the FEs.

> **Note**
>
> Before you load data into StarRocks, set the global time zone of your StarRocks cluster to the same value as the `system_time_zone` parameter. Otherwise, data of the DATE type will be incorrect after loading. The `system_time_zone` parameter refers to the time zone of the machines that host the FEs. When the machines are started, their time zone is recorded as the value of this parameter. You cannot manually configure this parameter.
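For example, a minimal pre-load check and adjustment might look as follows. This is a sketch: `America/New_York` is a placeholder, so use the `system_time_zone` value reported by your own cluster.

```SQL
-- Check system_time_zone and the current time_zone settings.
SHOW VARIABLES LIKE '%time_zone%';

-- Align the global time zone with system_time_zone (placeholder value).
SET global time_zone = 'America/New_York';
```
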
### Time zone format

The value of the `time_zone` parameter is not case-sensitive. The value of the parameter can be in one of the following formats.

| **Format** | **Example** |
| -------------- | ------------------------------------------------------------ |
| UTC offset | `SET time_zone = '+10:00';` `SET global time_zone = '-6:00';` |
| Time zone name | `SET time_zone = 'Asia/Shanghai';` `SET global time_zone = 'America/Los_Angeles';` |

For more information about time zone formats, see [List of tz database time zones](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones).

> **Note**
>
> Time zone abbreviations are not supported, except for CST. If you set the value of `time_zone` to `CST`, StarRocks converts `CST` into `Asia/Shanghai`.

### Default time zone

The default value of the `time_zone` parameter is `Asia/Shanghai`.

## View time zone settings

To view the time zone settings, run the following command.

```plaintext
SHOW VARIABLES LIKE '%time_zone%';
```

## Impacts of time zone settings

- Time zone settings affect the time values returned by the SHOW LOAD and SHOW BACKENDS statements. However, the settings do not affect the value specified in the `LESS THAN` clause when the partitioning columns specified in the CREATE TABLE statement are of the DATE or DATETIME type. The settings also do not affect data of the DATE and DATETIME types.
- Time zone settings affect the display and storage of the results of the following functions:
  - **from_unixtime**: returns a date and time in your specified time zone based on a given UTC timestamp. For example, if the global time zone of your StarRocks cluster is `Asia/Shanghai`, `select FROM_UNIXTIME(0);` returns `1970-01-01 08:00:00`.
  - **unix_timestamp**: returns a UTC timestamp based on the date and time in your specified time zone. For example, if the global time zone of your StarRocks cluster is `Asia/Shanghai`, `select UNIX_TIMESTAMP('1970-01-01 08:00:00');` returns `0`.
  - **curtime**: returns the current time in your specified time zone. For example, if the current time in the specified time zone is 16:34:05, `select CURTIME();` returns `16:34:05`.
  - **now**: returns the current date and time in your specified time zone. For example, if the current date and time in the specified time zone is 2021-02-11 16:34:13, `select NOW();` returns `2021-02-11 16:34:13`.
  - **convert_tz**: converts a date and time from one time zone to another. For example, `select CONVERT_TZ('2021-08-01 11:11:11', 'Asia/Shanghai', 'America/Los_Angeles');` returns `2021-07-31 20:11:11`.
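The impacts above can be reproduced in a single session. The following sketch assumes the time zone is set to the default `Asia/Shanghai`; the expected results are the ones quoted in the list above.

```SQL
SET time_zone = 'Asia/Shanghai';

SELECT FROM_UNIXTIME(0);
-- 1970-01-01 08:00:00

SELECT UNIX_TIMESTAMP('1970-01-01 08:00:00');
-- 0

SELECT CONVERT_TZ('2021-08-01 11:11:11', 'Asia/Shanghai', 'America/Los_Angeles');
-- 2021-07-31 20:11:11
```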
From 905dd73bd1df4eceb99b488e05974a550e646cf7 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Wed, 11 Feb 2026 00:00:02 +0000 Subject: [PATCH 2/3] docs: automated translation via Gemini [skip ci] --- .../administration/management/BE_blacklist.md | 102 + .../management/BE_configuration.md | 3473 +++++++++++++ .../management/Backup_and_restore.md | 650 +++ .../management/FE_configuration.md | 167 + .../management/Scale_up_down.md | 99 + .../administration/management/audit_loader.md | 221 + .../administration/management/compaction.md | 302 ++ .../administration/management/BE_blacklist.md | 102 + .../management/BE_configuration.md | 3473 +++++++++++++ .../management/Backup_and_restore.md | 650 +++ .../management/FE_configuration.md | 4500 +++++++++++++++++ .../management/Scale_up_down.md | 99 + .../administration/management/audit_loader.md | 221 + .../administration/management/compaction.md | 303 ++ 14 files changed, 14362 insertions(+) create mode 100644 docs/ja/administration/management/BE_blacklist.md create mode 100644 docs/ja/administration/management/BE_configuration.md create mode 100644 docs/ja/administration/management/Backup_and_restore.md create mode 100644 docs/ja/administration/management/FE_configuration.md create mode 100644 docs/ja/administration/management/Scale_up_down.md create mode 100644 docs/ja/administration/management/audit_loader.md create mode 100644 docs/ja/administration/management/compaction.md create mode 100644 docs/zh/administration/management/BE_blacklist.md create mode 100644 docs/zh/administration/management/BE_configuration.md create mode 100644 docs/zh/administration/management/Backup_and_restore.md create mode 100644 docs/zh/administration/management/FE_configuration.md create mode 100644 docs/zh/administration/management/Scale_up_down.md create mode 100644 docs/zh/administration/management/audit_loader.md create mode 100644 docs/zh/administration/management/compaction.md diff --git a/docs/ja/administration/management/BE_blacklist.md b/docs/ja/administration/management/BE_blacklist.md new file mode 100644 index 0000000..d2637aa --- /dev/null +++ b/docs/ja/administration/management/BE_blacklist.md @@ -0,0 +1,102 @@ +--- +displayed_sidebar: docs +--- + +# BEおよびCNブラックリストの管理 + +v3.3.0以降、StarRocksはBEブラックリスト機能に対応しており、クエリ実行における特定のBEノードの使用を禁止することで、BEノードへの接続失敗に起因する頻繁なクエリ失敗やその他の予期せぬ動作を回避できます。1つまたは複数のBEへの接続を妨げるネットワークの問題は、ブラックリストを使用する例となるでしょう。 + +v4.0以降、StarRocksはCompute Nodes (CNs) をブラックリストに追加する機能に対応しています。 + +デフォルトでは、StarRocksはBEおよびCNブラックリストを自動的に管理し、接続が失われたBEまたはCNノードをブラックリストに追加し、接続が再確立されるとブラックリストから削除します。ただし、手動でブラックリストに追加されたノードは、StarRocksによってブラックリストから削除されません。 + +:::note + +- SYSTEM-level BLACKLIST権限を持つユーザーのみがこの機能を使用できます。 +- 各FEノードは独自のBEおよびCNブラックリストを保持し、他のFEノードと共有しません。 + +::: + +## BE/CNをブラックリストに追加 + +[ADD BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/ADD_BACKEND_BLACKLIST.md) を使用して、BE/CNノードを手動でブラックリストに追加できます。このステートメントでは、ブラックリストに追加するBE/CNノードのIDを指定する必要があります。[SHOW BACKENDS](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_BACKENDS.md) を実行してBE IDを、[SHOW COMPUTE NODES](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_COMPUTE_NODES.md) を実行してCN IDを取得できます。 + +例: + +```SQL +-- BE IDを取得します。 +SHOW BACKENDS\G +*************************** 1. row *************************** + BackendId: 10001 + IP: xxx.xx.xx.xxx + ... +-- BEをブラックリストに追加します。 +ADD BACKEND BLACKLIST 10001; + +-- CN IDを取得します。 +SHOW COMPUTE NODES\G +*************************** 1. 
row *************************** + ComputeNodeId: 10005 + IP: xxx.xx.xx.xxx + ... +-- CNをブラックリストに追加します。 +ADD COMPUTE NODE BLACKLIST 10005; +``` + +## ブラックリストからBE/CNを削除 + +[DELETE BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/DELETE_BACKEND_BLACKLIST.md) を使用して、BE/CNノードを手動でブラックリストから削除できます。このステートメントでは、BE/CNノードのIDも指定する必要があります。 + +例: + +```SQL +-- ブラックリストからBEを削除します。 +DELETE BACKEND BLACKLIST 10001; + +-- ブラックリストからCNを削除します。 +DELETE COMPUTE NODE BLACKLIST 10005; +``` + +## BE/CNブラックリストを表示 + +[SHOW BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_BACKEND_BLACKLIST.md) を使用して、ブラックリスト内のBE/CNノードを表示できます。 + +例: + +```SQL +-- BEブラックリストを表示します。 +SHOW BACKEND BLACKLIST; ++-----------+------------------+---------------------+------------------------------+--------------------+ +| BackendId | AddBlackListType | LostConnectionTime | LostConnectionNumberInPeriod | CheckTimePeriod(s) | ++-----------+------------------+---------------------+------------------------------+--------------------+ +| 10001 | MANUAL | 2024-04-28 11:52:09 | 0 | 5 | ++-----------+------------------+---------------------+------------------------------+--------------------+ + +-- CNブラックリストを表示します。 +SHOW COMPUTE NODE BLACKLIST; ++---------------+------------------+---------------------+------------------------------+--------------------+ +| ComputeNodeId | AddBlackListType | LostConnectionTime | LostConnectionNumberInPeriod | CheckTimePeriod(s) | ++---------------+------------------+---------------------+------------------------------+--------------------+ +| 10005 | MANUAL | 2025-08-18 10:47:51 | 0 | 5 | ++---------------+------------------+---------------------+------------------------------+--------------------+ +``` + +以下のフィールドが返されます。 + +- `AddBlackListType`: BE/CNノードがブラックリストに追加された方法を示します。`MANUAL` はユーザーによって手動でブラックリストに追加されたことを示します。`AUTO` はStarRocksによって自動的にブラックリストに追加されたことを示します。 +- `LostConnectionTime`: + - `MANUAL` タイプの場合、BE/CNノードが手動でブラックリストに追加された時刻を示します。 + - `AUTO` タイプの場合、最後の接続が成功した時刻を示します。 +- `LostConnectionNumberInPeriod`: `CheckTimePeriod(s)` の期間内に検出された切断の回数です。これは、StarRocksがブラックリスト内のBE/CNノードの接続状態をチェックする間隔です。 +- `CheckTimePeriod(s)`: StarRocksがブラックリスト内のBE/CNノードの接続状態をチェックする間隔です。その値は、FE構成項目 `black_host_history_sec` に指定された値として評価されます。単位:秒。 + +## BE/CNブラックリストの自動管理を設定 + +BE/CNノードがFEノードへの接続を失うか、BE/CNノードでのタイムアウトによりクエリが失敗するたびに、FEノードはBE/CNノードをBEおよびCNブラックリストに追加します。FEノードは、一定期間内の接続失敗数をカウントすることで、ブラックリスト内のBE/CNノードの接続性を常に評価します。StarRocksは、接続失敗数が事前に指定されたしきい値を下回る場合にのみ、ブラックリスト内のBE/CNノードを削除します。 + +以下の[FE設定](./FE_configuration.md)を使用して、BEおよびCNブラックリストの自動管理を設定できます。 + +- `black_host_history_sec`: ブラックリスト内のBE/CNノードの接続失敗履歴を保持する期間です。 +- `black_host_connect_failures_within_time`: ブラックリスト内のBE/CNノードに許容される接続失敗のしきい値です。 + +BE/CNノードが自動的にブラックリストに追加された場合、StarRocksはその接続性を評価し、ブラックリストから削除できるかどうかを判断します。`black_host_history_sec` の期間内に、ブラックリスト内のBE/CNノードの接続失敗が `black_host_connect_failures_within_time` で設定されたしきい値を下回る場合にのみ、ブラックリストから削除できます。 diff --git a/docs/ja/administration/management/BE_configuration.md b/docs/ja/administration/management/BE_configuration.md new file mode 100644 index 0000000..b330886 --- /dev/null +++ b/docs/ja/administration/management/BE_configuration.md @@ -0,0 +1,3473 @@ +--- +displayed_sidebar: docs +--- + +import BEConfigMethod from '../../_assets/commonMarkdown/BE_config_method.mdx' + +import CNConfigMethod from '../../_assets/commonMarkdown/CN_config_method.mdx' + +import PostBEConfig from '../../_assets/commonMarkdown/BE_dynamic_note.mdx' + 
+import StaticBEConfigNote from '../../_assets/commonMarkdown/StaticBE_config_note.mdx' + +# BE設定 + + + + + +## BE構成項目の表示 + +以下のコマンドを使用して、BE構成項目を表示できます。 + +```shell +curl http://:/varz +``` + +## BEパラメータの設定 + + + + + +## BEパラメータの理解 + +### ロギング + +##### diagnose_stack_trace_interval_ms + +- Default: 1800000 (30 minutes) +- Type: Int +- Unit: ミリ秒 +- Is mutable: はい +- Description: `STACK_TRACE` リクエストに対してDiagnoseDaemonが実行する連続したスタックトレース診断の最小時間間隔を制御します。診断リクエストが到着すると、前回の収集から `diagnose_stack_trace_interval_ms` ミリ秒未満の場合、デーモンはスタックトレースの収集とロギングをスキップします。この値を増やすと、頻繁なスタックダンプによるCPUオーバーヘッドとログ量を減らすことができます。値を減らすと、一時的な問題(たとえば、長い `TabletsChannel::add_chunk` ブロッキングのロードフェイルポイントシミュレーションなど)をデバッグするためにより頻繁なトレースをキャプチャできます。 +- Introduced in: v3.5.0 + +##### lake_replication_slow_log_ms + +- Default: 30000 +- Type: Int +- Unit: ミリ秒 +- Is mutable: はい +- Description: Lakeレプリケーション中にスローログエントリを出力するための閾値。各ファイルコピーの後、コードは経過時間をマイクロ秒で測定し、経過時間が `lake_replication_slow_log_ms * 1000` 以上の場合、その操作をスローとマークします。トリガーされると、StarRocksはそのレプリケートされたファイルのファイルサイズ、コスト、およびトレースメトリクスを含むINFOログを書き込みます。この値を増やすと、大規模/低速転送によるノイズの多いスローログを減らすことができます。値を減らすと、より小さな低速コピーイベントをより早く検出して表面化させることができます。 +- Introduced in: - + +##### load_rpc_slow_log_frequency_threshold_seconds + +- Default: 60 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: 設定されたRPCタイムアウトを超えるロードRPCのスローログエントリをシステムがどのくらいの頻度で出力するかを制御します。スローログにはロードチャネルのランタイムプロファイルも含まれます。この値を0に設定すると、実際にはタイムアウトごとにログが記録されます。 +- Introduced in: v3.4.3, v3.5.0 + +##### log_buffer_level + +- Default: Empty string +- Type: String +- Unit: - +- Is mutable: いいえ +- Description: ログをフラッシュする戦略。デフォルト値は、ログがメモリにバッファリングされることを示します。有効な値は `-1` と `0` です。`-1` は、ログがメモリにバッファリングされないことを示します。 +- Introduced in: - + +##### pprof_profile_dir + +- Default: `${STARROCKS_HOME}/log` +- Type: String +- Unit: - +- Is mutable: いいえ +- Description: StarRocksがpprofアーティファクト(Jemallocヒープスナップショットおよびgperftools CPUプロファイル)を書き込むディレクトリパス。 +- Introduced in: v3.2.0 + +##### sys_log_dir + +- Default: `${STARROCKS_HOME}/log` +- Type: String +- Unit: - +- Is mutable: いいえ +- Description: システムログ(INFO、WARNING、ERROR、FATALを含む)を保存するディレクトリ。 +- Introduced in: - + +##### sys_log_level + +- Default: INFO +- Type: String +- Unit: - +- Is mutable: はい (v3.3.0, v3.2.7, および v3.1.12から) +- Description: システムログエントリが分類される重大度レベル。有効な値:INFO、WARN、ERROR、FATAL。この項目はv3.3.0、v3.2.7、およびv3.1.12以降、動的構成に変更されました。 +- Introduced in: - + +##### sys_log_roll_mode + +- Default: SIZE-MB-1024 +- Type: String +- Unit: - +- Is mutable: いいえ +- Description: システムログがログロールに分割されるモード。有効な値には `TIME-DAY`、`TIME-HOUR`、`SIZE-MB-`サイズ が含まれます。デフォルト値は、ログが1GBのロールに分割されることを示します。 +- Introduced in: - + +##### sys_log_roll_num + +- Default: 10 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: 保持するログロールの数。 +- Introduced in: - + +##### sys_log_timezone + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: いいえ +- Description: ログプレフィックスにタイムゾーン情報を表示するかどうか。`true` はタイムゾーン情報を表示することを示し、`false` は表示しないことを示します。 +- Introduced in: - + +##### sys_log_verbose_level + +- Default: 10 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: 出力するログのレベル。この構成項目は、コード内のVLOGで開始されるログの出力を制御するために使用されます。 +- Introduced in: - + +##### sys_log_verbose_modules + +- Default: +- Type: Strings +- Unit: - +- Is mutable: いいえ +- Description: 出力するログのモジュール。たとえば、この構成項目をOLAPに設定すると、StarRocksはOLAPモジュールのログのみを出力します。有効な値はBEのネームスペースであり、`starrocks`、`starrocks::debug`、`starrocks::fs`、`starrocks::io`、`starrocks::lake`、`starrocks::pipeline`、`starrocks::query_cache`、`starrocks::stream`、および `starrocks::workgroup` が含まれます。 +- Introduced in: - + +### サーバー + +##### 
abort_on_large_memory_allocation + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: はい +- Description: 単一の割り当てリクエストが設定された大規模割り当て閾値(`g_large_memory_alloc_failure_threshold` > 0 かつリクエストサイズ > 閾値)を超えた場合、このフラグがプロセス応答を制御します。trueの場合、このような大規模割り当てが検出されると、StarRocksは直ちに `std::abort()` を呼び出します(ハードクラッシュ)。falseの場合、割り当てはブロックされ、アロケータは失敗(nullptrまたはENOMEM)を返すため、呼び出し元はエラーを処理できます。このチェックは、TRY_CATCH_BAD_ALLOCパスでラップされていない割り当てにのみ適用されます(bad-allocがキャッチされている場合、memフックは異なるフローを使用します)。予期しない巨大な割り当ての迅速なデバッグのために有効にします。プロダクション環境では、過大な割り当て試行で即座にプロセスを停止させたい場合を除き、無効にしてください。 +- Introduced in: v3.4.3, 3.5.0, 4.0.0 + +##### arrow_flight_port + +- Default: -1 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: BE Arrow Flight SQLサーバーのTCPポート。`-1` はArrow Flightサービスを無効にすることを示します。macOS以外のビルドでは、BEはこのポートでArrow Flight SQL Serverを起動時に呼び出します。ポートが利用できない場合、サーバーの起動は失敗し、BEプロセスは終了します。設定されたポートは、ハートビートペイロードでFEに報告されます。 +- Introduced in: v3.4.0, v3.5.0 + +##### be_exit_after_disk_write_hang_second + +- Default: 60 +- Type: Int +- Unit: 秒 +- Is mutable: いいえ +- Description: ディスクがハングした後にBEが終了するまで待機する時間。 +- Introduced in: - + +##### be_http_num_workers + +- Default: 48 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: HTTPサーバーが使用するスレッド数。 +- Introduced in: - + +##### be_http_port + +- Default: 8040 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: BE HTTPサーバーのポート。 +- Introduced in: - + +##### be_port + +- Default: 9060 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: FEからのリクエストを受信するために使用されるBE Thriftサーバーのポート。 +- Introduced in: - + +##### be_service_threads + +- Default: 64 +- Type: Int +- Unit: スレッド +- Is mutable: いいえ +- Description: BE ThriftサーバーがバックエンドのRPC/実行リクエストを処理するために使用するワーカースレッドの数。この値はBackendServiceの作成時にThriftServerに渡され、利用可能な同時リクエストハンドラの数を制御します。すべてのワーカースレッドがビジーの場合、リクエストはキューに入れられます。予想される同時RPC負荷と利用可能なCPU/メモリに基づいて調整してください。値を増やすと同時実行性が向上しますが、スレッドごとのメモリとコンテキスト切り替えのコストが増加します。値を減らすと並列処理が制限され、リクエストのレイテンシが増加する可能性があります。 +- Introduced in: v3.2.0 + +##### brpc_connection_type + +- Default: `"single"` +- Type: string +- Unit: - +- Is mutable: いいえ +- Description: bRPCチャネルの接続モード。有効な値: + - `"single"` (デフォルト):各チャネルに1つの永続的なTCP接続。 + - `"pooled"`:より高い同時実行性のために永続的な接続のプールを使用しますが、ソケット/ファイルディスクリプタのコストが増加します。 + - `"short"`:永続的なリソース使用量を減らすためにRPCごとに作成される短寿命の接続ですが、レイテンシが高くなります。 + 選択はソケットごとのバッファリング動作に影響し、未書き込みバイトがソケット制限を超える場合の `Socket.Write` の失敗(EOVERCROWDED)に影響を与える可能性があります。 +- Introduced in: v3.2.5 + +##### brpc_max_body_size + +- Default: 2147483648 +- Type: Int +- Unit: バイト +- Is mutable: いいえ +- Description: bRPCの最大ボディサイズ。 +- Introduced in: - + +##### brpc_max_connections_per_server + +- Default: 1 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: クライアントが各リモートサーバーエンドポイントに対して保持する永続的なbRPC接続の最大数。各エンドポイントについて、`BrpcStubCache` は `StubPool` を作成し、その `_stubs` ベクトルはこのサイズに予約されます。最初のアクセスでは、制限に達するまで新しいスタブが作成されます。その後、既存のスタブはラウンドロビン方式で返されます。この値を増やすと、エンドポイントごとの同時実行性が向上しますが(単一チャネルでの競合が減少)、ファイルディスクリプタ、メモリ、およびチャネルのコストが増加します。 +- Introduced in: v3.2.0 + +##### brpc_num_threads + +- Default: -1 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: bRPCのbthread数。値 `-1` はCPUスレッドと同じ数を示します。 +- Introduced in: - + +##### brpc_port + +- Default: 8060 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: bRPCのネットワーク統計を表示するために使用されるBE bRPCポート。 +- Introduced in: - + +##### brpc_socket_max_unwritten_bytes + +- Default: 1073741824 +- Type: Int +- Unit: バイト +- Is mutable: いいえ +- Description: bRPCサーバーにおける未書き込みの送信バイトのソケットごとの制限を設定します。ソケットにバッファリングされ、まだ書き込まれていないデータの量がこの制限に達すると、後続の `Socket.Write` 
呼び出しはEOVERCROWDEDで失敗します。これにより、接続ごとのメモリの無制限の増加が防止されますが、非常に大きなメッセージや低速なピアの場合にRPC送信の失敗を引き起こす可能性があります。単一メッセージのボディが許可される未書き込みバッファよりも大きくならないように、この値を `brpc_max_body_size` と一致させてください。値を増やすと、接続ごとのメモリ使用量が増加します。 +- Introduced in: v3.2.0 + +##### brpc_stub_expire_s + +- Default: 3600 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: bRPCスタブキャッシュの有効期限。デフォルト値は60分です。 +- Introduced in: - + +##### compress_rowbatches + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: いいえ +- Description: BE間のRPCでローバッチを圧縮するかどうかを制御するブール値。`true` はローバッチを圧縮することを示し、`false` は圧縮しないことを示します。 +- Introduced in: - + +##### consistency_max_memory_limit_percent + +- Default: 20 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: 一貫性関連タスクのメモリ予算を計算するために使用されるパーセンテージキャップ。BE起動時、最終的な一貫性制限は `consistency_max_memory_limit` (バイト) から解析された値と (`process_mem_limit * consistency_max_memory_limit_percent / 100`) の最小値として計算されます。`process_mem_limit` が未設定 (-1) の場合、一貫性メモリは無制限と見なされます。`consistency_max_memory_limit_percent` の場合、0未満または100より大きい値は100として扱われます。この値を調整すると、一貫性操作のために予約されるメモリが増減し、したがってクエリや他のサービスで利用可能なメモリに影響します。 +- Introduced in: v3.2.0 + +##### delete_worker_count_normal_priority + +- Default: 2 +- Type: Int +- Unit: スレッド +- Is mutable: いいえ +- Description: BEエージェントで削除(REALTIME_PUSH with DELETE)タスクを処理するために割り当てられた通常優先度のワーカースレッドの数。起動時にこの値は `delete_worker_count_high_priority` に追加され、`DeleteTaskWorkerPool` のサイズが決定されます(`agent_server.cpp` を参照)。プールは最初の `delete_worker_count_high_priority` スレッドをHIGH優先度として割り当て、残りをNORMAL優先度として割り当てます。通常優先度のスレッドは標準の削除タスクを処理し、全体的な削除スループットに貢献します。並列削除容量を増やすには(CPU/IO使用量の増加)、この値を増やします。リソース競合を減らすには、この値を減らします。 +- Introduced in: v3.2.0 + +##### disable_mem_pools + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: いいえ +- Description: MemPoolを無効にするかどうか。この項目が `true` に設定されている場合、MemPoolのチャンクプーリングは無効になり、各割り当ては再利用またはプールされたチャンクを増やす代わりに独自のサイズのチャンクを取得します。プーリングを無効にすると、より頻繁な割り当て、チャンク数の増加、およびスキップされた整合性チェック(チャンク数が多いため回避される)のコストで、長期間保持されるバッファメモリが削減されます。割り当ての再利用とシステム呼び出しの減少の恩恵を受けるために、`disable_mem_pools` を `false`(デフォルト)のままにしてください。大規模なプールされたメモリ保持を避けなければならない場合(たとえば、メモリの少ない環境や診断実行の場合)にのみ `true` に設定してください。 +- Introduced in: v3.2.0 + +##### enable_https + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: いいえ +- Description: この項目が `true` に設定されている場合、BEのbRPCサーバーはTLSを使用するように構成されます。`ServerOptions.ssl_options` は、BE起動時に `ssl_certificate_path` と `ssl_private_key_path` で指定された証明書と秘密鍵で設定されます。これにより、着信bRPC接続に対してHTTPS/TLSが有効になります。クライアントはTLSを使用して接続する必要があります。証明書と鍵ファイルが存在し、BEプロセスからアクセス可能であり、bRPC/SSLの期待に合致していることを確認してください。 +- Introduced in: v4.0.0 + +##### enable_jemalloc_memory_tracker + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: いいえ +- Description: この項目が `true` に設定されている場合、BEはバックグラウンドスレッド(jemalloc_tracker_daemon)を起動し、jemalloc統計を(1秒に1回)ポーリングし、GlobalEnv jemallocメタデータMemTrackerをjemalloc "stats.metadata" 値で更新します。これにより、jemallocメタデータ消費がStarRocksプロセスメモリ会計に含まれ、jemalloc内部で使用されるメモリの過少報告が防止されます。トラッカーはmacOS以外のビルドでのみコンパイル/起動され(#ifndef __APPLE__)、"jemalloc_tracker_daemon" という名前のデーモンスレッドとして実行されます。この設定は起動動作とMemTrackerの状態を維持するスレッドに影響するため、変更には再起動が必要です。jemallocが使用されていない場合、またはjemallocトラッキングが意図的に異なる方法で管理されている場合にのみ無効にしてください。それ以外の場合は、正確なメモリ会計と割り当ての保護を維持するために有効にしておいてください。 +- Introduced in: v3.2.12 + +##### enable_jvm_metrics + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: いいえ +- Description: 起動時にJVM固有のメトリクスを初期化および登録するかどうかを制御します。有効にすると、メトリクスサブシステムはJVM関連のコレクタ(例:ヒープ、GC、スレッドメトリクス)をエクスポート用に作成し、無効にすると、それらのコレクタは初期化されません。このパラメータは将来の互換性のために意図されており、将来のリリースで削除される可能性があります。システムレベルのメトリクス収集を制御するには `enable_system_metrics` を使用してください。 +- 
Introduced in: v4.0.0 + +##### get_pindex_worker_count + +- Default: 0 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: UpdateManagerの「get_pindex」スレッドプール(プライマリキーテーブルのrowsetを適用するときに使用される永続インデックスデータをロード/フェッチするために使用される)のワーカースレッド数を設定します。実行時には、設定更新によってプールの最大スレッドが調整されます。`>0` の場合、その値が適用されます。`0` の場合、ランタイムコールバックはCPUコア数(`CpuInfo::num_cores()`)を使用します。初期化時には、プールの最大スレッドはmax(`get_pindex_worker_count`, `max_apply_thread_cnt` * 2) として計算されます。ここで `max_apply_thread_cnt` はapply-threadプールの最大値です。pindexロードの並列性を高めるには値を増やし、同時実行性とメモリ/CPU使用量を減らすには値を減らします。 +- Introduced in: v3.2.0 + +##### heartbeat_service_port + +- Default: 9050 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: FEからのハートビートを受信するために使用されるBEハートビートサービスポート。 +- Introduced in: - + +##### heartbeat_service_thread_count + +- Default: 1 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: BEハートビートサービスのスレッド数。 +- Introduced in: - + +##### local_library_dir + +- Default: `${UDF_RUNTIME_DIR}` +- Type: string +- Unit: - +- Is mutable: いいえ +- Description: UDF(ユーザー定義関数)ライブラリがステージングされ、Python UDFワーカプロセスが動作するBE上のローカルディレクトリ。StarRocksはHDFSからこのパスにUDFライブラリをコピーし、`/pyworker_` にワーカごとのUnixドメインソケットを作成し、exec前にPythonワーカプロセスをこのディレクトリに変更します。ディレクトリは存在し、BEプロセスによって書き込み可能であり、Unixドメインソケットをサポートするファイルシステム(つまり、ローカルファイルシステム)上にある必要があります。この設定は実行時に変更できないため、起動前に設定し、各BEで適切な権限とディスクスペースを確保してください。 +- Introduced in: v3.2.0 + +##### max_transmit_batched_bytes + +- Default: 262144 +- Type: Int +- Unit: バイト +- Is mutable: いいえ +- Description: 単一の送信リクエストで蓄積される、ネットワークにフラッシュされる前のシリアライズ済みバイトの最大数。送信側の実装は、シリアライズされたChunkPBペイロードをPTransmitChunkParamsリクエストに追加し、蓄積されたバイトが `max_transmit_batched_bytes` を超えるかEOSに達した場合にリクエストを送信します。この値を増やすと、RPCの頻度を減らし、スループットを向上させることができますが、リクエストごとのレイテンシとメモリ使用量が増加します。この値を減らすと、レイテンシとメモリを削減できますが、RPCレートが増加します。 +- Introduced in: v3.2.0 + +##### mem_limit + +- Default: 90% +- Type: String +- Unit: - +- Is mutable: いいえ +- Description: BEプロセスのメモリ上限。パーセンテージ("80%")または物理的な制限("100G")として設定できます。デフォルトのハードリミットはサーバーのメモリサイズの90%で、ソフトリミットは80%です。StarRocksを他のメモリ集約的なサービスと同じサーバーにデプロイする場合、このパラメータを設定する必要があります。 +- Introduced in: - + +##### memory_max_alignment + +- Default: 16 +- Type: Int +- Unit: バイト +- Is mutable: いいえ +- Description: MemPoolが整列された割り当てに対して受け入れる最大バイトアラインメントを設定します。呼び出し元がより大きなアラインメント(SIMD、デバイスバッファ、またはABI制約のため)を必要とする場合にのみ、この値を増やしてください。大きな値は、割り当てごとのパディングと予約メモリの無駄を増やし、システムアロケータとプラットフォームがサポートする範囲内である必要があります。 +- Introduced in: v3.2.0 + +##### memory_urgent_level + +- Default: 85 +- Type: long +- Unit: パーセンテージ (0-100) +- Is mutable: はい +- Description: プロセスメモリ制限のパーセンテージとして表される緊急メモリウォーターレベル。プロセスメモリ消費が `(limit * memory_urgent_level / 100)` を超えると、BEは即座にメモリ再利用をトリガーし、データキャッシュの縮小、更新キャッシュの削除、永続/lake MemTableの「満杯」扱いを強制し、それらがすぐにフラッシュ/圧縮されるようにします。コードは、この設定が `memory_high_level` より大きく、`memory_high_level` が1以上かつ100以下でなければならないことを検証します。値が低いと、より積極的で早期の再利用が発生し、キャッシュの削除とフラッシュが頻繁になります。値が高いと、再利用が遅延し、100に近すぎるとOOMのリスクがあります。この項目は `memory_high_level` とデータキャッシュ関連の自動調整設定と合わせて調整してください。 +- Introduced in: v3.2.0 + +##### net_use_ipv6_when_priority_networks_empty + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: いいえ +- Description: `priority_networks` が指定されていない場合にIPv6アドレスを優先的に使用するかどうかを制御するブール値。`true` は、ノードをホストするサーバーがIPv4とIPv6アドレスの両方を持っており、`priority_networks` が指定されていない場合に、システムがIPv6アドレスを優先的に使用することを許可することを示します。 +- Introduced in: v3.3.0 + +##### num_cores + +- Default: 0 +- Type: Int +- Unit: コア +- Is mutable: いいえ +- Description: CPU認識の決定(例えば、スレッドプールサイジングやランタイムスケジューリング)にシステムが使用するCPUコア数を制御します。値が0の場合、自動検出が有効になります。システムは `/proc/cpuinfo` 
を読み取り、利用可能なすべてのコアを使用します。正の整数に設定された場合、その値は検出されたコア数を上書きし、実効コア数になります。コンテナ内で実行されている場合、cgroupのcpusetまたはcpuクォータ設定が使用可能なコアをさらに制限する可能性があります。`CpuInfo` もこれらのcgroup制限を尊重します。 +- Introduced in: v3.2.0 + +##### plugin_path + +- Default: `${STARROCKS_HOME}/plugin` +- Type: String +- Unit: - +- Is mutable: いいえ +- Description: StarRocksが外部プラグイン(動的ライブラリ、コネクタアーティファクト、UDFバイナリなど)をロードするファイルシステムディレクトリ。`plugin_path` はBEプロセスからアクセス可能なディレクトリ(読み取りおよび実行権限)を指し、プラグインがロードされる前に存在する必要があります。正しい所有権と、プラグインファイルがプラットフォームのネイティブバイナリ拡張子(例:Linuxでは.so)を使用していることを確認してください。 +- Introduced in: v3.2.0 + +##### priority_networks + +- Default: An empty string +- Type: String +- Unit: - +- Is mutable: いいえ +- Description: 複数のIPアドレスを持つサーバーの選択戦略を宣言します。このパラメータで指定されたリストに最大1つのIPアドレスが一致する必要があることに注意してください。このパラメータの値は、CIDR表記でセミコロン(;)で区切られたエントリのリストです(例:`10.10.10.0/24`)。このリストのエントリにIPアドレスが一致しない場合、サーバーの利用可能なIPアドレスがランダムに選択されます。v3.3.0以降、StarRocksはIPv6ベースのデプロイメントをサポートします。サーバーがIPv4とIPv6の両方のアドレスを持ち、このパラメータが指定されていない場合、システムはデフォルトでIPv4アドレスを使用します。この動作は `net_use_ipv6_when_priority_networks_empty` を `true` に設定することで変更できます。 +- Introduced in: - + +##### rpc_compress_ratio_threshold + +- Default: 1.1 +- Type: Double +- Unit: - +- Is mutable: はい +- Description: 圧縮形式でネットワーク経由でシリアライズされたローバッチを送信するかどうかを決定する際に使用される閾値 (uncompressed_size / compressed_size)。圧縮が試行されるとき (例: DataStreamSender、交換シンク、タブレットシンクインデックスチャネル、辞書キャッシュライター)、StarRocksはcompress_ratio = uncompressed_size / compressed_size を計算します。compress_ratioが `rpc_compress_ratio_threshold` より大きい場合にのみ圧縮ペイロードを使用します。デフォルトの1.1では、圧縮データは非圧縮データよりも少なくとも約9.1%小さくなければ使用されません。圧縮を優先するには値を下げます (帯域幅の節約が小さくてもCPU使用量が増加)。より大きなサイズ削減が得られない限り圧縮オーバーヘッドを避けるには値を上げます。注意: これはRPC/シャッフルシリアライズに適用され、ローバッチ圧縮が有効な場合 (compress_rowbatches) にのみ有効です。 +- Introduced in: v3.2.0 + +##### ssl_private_key_path + +- Default: An empty string +- Type: String +- Unit: - +- Is mutable: いいえ +- Description: BEのbRPCサーバーがデフォルト証明書の秘密鍵として使用するTLS/SSL秘密鍵(PEM)へのファイルシステムパス。`enable_https` が `true` に設定されている場合、システムはプロセス開始時に `brpc::ServerOptions::ssl_options().default_cert.private_key` をこのパスに設定します。ファイルはBEプロセスからアクセス可能であり、`ssl_certificate_path` で提供される証明書と一致する必要があります。この値が設定されていない場合、またはファイルが存在しないかアクセスできない場合、HTTPSは設定されず、bRPCサーバーの起動に失敗する可能性があります。このファイルを制限的なファイルシステム権限(例:600)で保護してください。 +- Introduced in: v4.0.0 + +##### thrift_client_retry_interval_ms + +- Default: 100 +- Type: Int +- Unit: ミリ秒 +- Is mutable: はい +- Description: Thriftクライアントが再試行する時間間隔。 +- Introduced in: - + +##### thrift_connect_timeout_seconds + +- Default: 3 +- Type: Int +- Unit: 秒 +- Is mutable: いいえ +- Description: Thriftクライアント作成時に使用される接続タイムアウト(秒単位)。`ClientCacheHelper::_create_client` はこの値を1000倍し、`ThriftClientImpl::set_conn_timeout()` に渡すため、BEクライアントキャッシュによって開かれる新しいThrift接続のTCP/接続ハンドシェイクタイムアウトを制御します。この設定は接続確立のみに影響します。送信/受信タイムアウトは別途設定されます。非常に小さい値は、高遅延ネットワークで偽の接続障害を引き起こす可能性があり、大きい値は到達不能なピアの検出を遅らせます。 +- Introduced in: v3.2.0 + +##### thrift_port + +- Default: 0 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: 内部のThriftベースのBackendServiceをエクスポートするために使用されるポート。プロセスがCompute Nodeとして実行され、この項目が0以外の値に設定されている場合、`be_port` を上書きし、Thriftサーバーはこの値にバインドします。それ以外の場合は `be_port` が使用されます。この設定は非推奨です。0以外の `thrift_port` を設定すると、`be_port` を使用するように助言する警告がログに記録されます。 +- Introduced in: v3.2.0 + +##### thrift_rpc_connection_max_valid_time_ms + +- Default: 5000 +- Type: Int +- Unit: ミリ秒 +- Is mutable: いいえ +- Description: Thrift RPC接続の最大有効時間。この値よりも長く接続プールに存在した接続は閉じられます。FE構成 `thrift_client_timeout_ms` と一致するように設定する必要があります。 +- Introduced in: - + +##### thrift_rpc_max_body_size + +- Default: 0 +- Type: Int +- Unit: +- Is mutable: いいえ +- Description: 
RPCの最大文字列ボディサイズ。`0` はサイズが無制限であることを示します。 +- Introduced in: - + +##### thrift_rpc_strict_mode + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: いいえ +- Description: Thriftの厳密な実行モードが有効になっているかどうか。Thriftの厳密なモードの詳細については、[Thrift Binary protocol encoding](https://github.com/apache/thrift/blob/master/doc/specs/thrift-binary-protocol.md) を参照してください。 +- Introduced in: - + +##### thrift_rpc_timeout_ms + +- Default: 5000 +- Type: Int +- Unit: ミリ秒 +- Is mutable: はい +- Description: Thrift RPCのタイムアウト。 +- Introduced in: - + +##### transaction_apply_thread_pool_num_min + +- Default: 0 +- Type: Int +- Unit: スレッド +- Is mutable: はい +- Description: BEのUpdateManager内の「update_apply」スレッドプール(プライマリキーテーブルのrowsetを適用するプール)の最小スレッド数を設定します。値が0の場合、固定された最小値は無効になります(下限は強制されません)。`transaction_apply_worker_count` も0の場合、プールの最大スレッドはCPUコア数にデフォルト設定されるため、実効ワーカ容量はCPUコア数に等しくなります。これを増やすと、トランザクション適用に対するベースラインの同時実行性を保証できます。高すぎるとCPUの競合が増加する可能性があります。変更は、update_config HTTPハンドラを介して実行時に適用されます(applyスレッドプールで `update_min_threads` を呼び出します)。 +- Introduced in: v3.2.11 + +##### transaction_publish_version_thread_pool_num_min + +- Default: 0 +- Type: Int +- Unit: スレッド +- Is mutable: はい +- Description: AgentServerの「publish_version」動的スレッドプール(トランザクションバージョンを公開する/ TTaskType::PUBLISH_VERSIONタスクを処理するために使用される)で予約される最小スレッド数を設定します。起動時にプールはmin = max(設定値, MIN_TRANSACTION_PUBLISH_WORKER_COUNT) (MIN_TRANSACTION_PUBLISH_WORKER_COUNT = 1) で作成されるため、デフォルトの0は最小1スレッドになります。実行時にこの値を変更すると、updateコールバックがThreadPool::update_min_threadsを呼び出し、プールの保証された最小値が増減します(ただし、強制された最小値の1を下回ることはありません)。`transaction_publish_version_worker_count` (最大スレッド) と `transaction_publish_version_thread_pool_idle_time_ms` (アイドルタイムアウト) と連携して調整してください。 +- Introduced in: v3.2.11 + +##### use_mmap_allocate_chunk + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: いいえ +- Description: この項目が `true` に設定されている場合、システムは匿名プライベートmmapマッピング(MAP_ANONYMOUS | MAP_PRIVATE)を使用してチャンクを割り当て、munmapで解放します。これを有効にすると、多数の仮想メモリマッピングが作成される可能性があるため、カーネル制限(rootユーザーとして `sysctl -w vm.max_map_count=262144` または `echo 262144 > /proc/sys/vm/max_map_count` を実行)を上げ、`chunk_reserved_bytes_limit` を比較的に大きな値に設定する必要があります。そうしないと、mmapを有効にすると、頻繁なマッピング/アンマッピングにより非常に低いパフォーマンスになる可能性があります。 +- Introduced in: v3.2.0 + +### メタデータとクラスター管理 + +##### cluster_id + +- Default: -1 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: このStarRocksバックエンドのグローバルクラスター識別子。起動時にStorageEngineは `config::cluster_id` を実効クラスターIDに読み込み、すべてのデータルートパスが同じクラスターIDを含んでいることを確認します(`StorageEngine::_check_all_root_path_cluster_id` を参照)。値 `-1` は「未設定」を意味します。エンジンは既存のデータディレクトリまたはマスターハートビートから実効IDを導出する可能性があります。非負のIDが構成されている場合、構成されたIDとデータディレクトリに保存されているIDとの不一致により、起動検証が失敗します(`Status::Corruption`)。一部のルートにIDがなく、エンジンがIDの書き込みを許可されている場合(`options.need_write_cluster_id`)、実効IDをそれらのルートに永続化します。 +- Introduced in: v3.2.0 + +##### consistency_max_memory_limit + +- Default: 10G +- Type: String +- Unit: - +- Is mutable: いいえ +- Description: CONSISTENCYメモリトラッカーのメモリサイズ指定。 +- Introduced in: v3.2.0 + +##### make_snapshot_rpc_timeout_ms + +- Default: 20000 +- Type: Int +- Unit: ミリ秒 +- Is mutable: いいえ +- Description: リモートBEでスナップショットを作成する際に使用されるThrift RPCタイムアウトをミリ秒単位で設定します。リモートスナップショットの作成がデフォルトのタイムアウトを定期的に超える場合はこの値を増やし、応答しないBEでより早く失敗するにはこの値を減らします。他のタイムアウトがエンドツーエンド操作に影響を与える可能性があることに注意してください(例えば、実効的なタブレットライターオープンタイムアウトは `tablet_writer_open_rpc_timeout_sec` や `load_timeout_sec` に関連する可能性があります)。 +- Introduced in: v3.2.0 + +##### metadata_cache_memory_limit_percent + +- Default: 30 +- Type: Int +- Unit: パーセント +- Is mutable: はい +- Description: 
メタデータLRUキャッシュサイズをプロセスメモリ制限のパーセンテージとして設定します。起動時にStarRocksはキャッシュバイト数を (process_mem_limit * metadata_cache_memory_limit_percent / 100) として計算し、それをメタデータキャッシュアロケータに渡します。キャッシュは非PRIMARY_KEYS rowsets(PKテーブルはサポートされていません)にのみ使用され、`metadata_cache_memory_limit_percent` > 0 の場合にのみ有効になります。メタデータキャッシュを無効にするには ≤ 0 に設定します。この値を増やすとメタデータキャッシュ容量が増加しますが、他のコンポーネントで利用可能なメモリが減少します。ワークロードとシステムメモリに基づいて調整してください。BE_TESTビルドではアクティブではありません。 +- Introduced in: v3.2.10 + +##### retry_apply_interval_second + +- Default: 30 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: 失敗したタブレット適用操作の再試行をスケジューリングする際に使用されるベース間隔(秒単位)。これは、送信失敗後の再試行のスケジューリングに直接使用され、バックオフのベース乗数としても使用されます。次の再試行遅延はmin(600, `retry_apply_interval_second` * failed_attempts)として計算されます。コードはまた、`retry_apply_interval_second` を使用して累積再試行期間(等差数列の合計)を計算し、`retry_apply_timeout_second` と比較して再試行を続けるかどうかを決定します。`enable_retry_apply` がtrueの場合にのみ有効です。この値を増やすと、個々の再試行遅延と再試行に費やされる累積時間の両方が長くなります。減らすと、再試行がより頻繁になり、`retry_apply_timeout_second` に達するまでの試行回数が増加する可能性があります。 +- Introduced in: v3.2.9 + +##### retry_apply_timeout_second + +- Default: 7200 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: 適用プロセスがあきらめ、タブレットがエラー状態に入る前に、保留中のバージョンの適用に許容される最大累積再試行時間(秒単位)。適用ロジックは、`retry_apply_interval_second` に基づいて指数関数的/バックオフ間隔を累積し、合計期間を `retry_apply_timeout_second` と比較します。`enable_retry_apply` がtrueであり、エラーが再試行可能と見なされる場合、累積バックオフが `retry_apply_timeout_second` を超えるまで適用試行は再スケジュールされます。その後、適用は停止し、タブレットはエラーに移行します。明示的に再試行不能なエラー(例:`Corruption`)は、この設定に関係なく再試行されません。この値を調整して、StarRocksが適用操作を再試行し続ける期間(デフォルト7200秒 = 2時間)を制御します。 +- Introduced in: v3.3.13, v3.4.3, v3.5.0 + +##### txn_commit_rpc_timeout_ms + +- Default: 60000 +- Type: Int +- Unit: ミリ秒 +- Is mutable: はい +- Description: BEストリームロードおよびトランザクションコミット呼び出しで使用されるThrift RPC接続の最大許容存続時間(ミリ秒単位)。StarRocksはこの値をFEに送信されるリクエスト(stream_load計画、loadTxnBegin/loadTxnPrepare/loadTxnCommit、getLoadTxnStatusで使用)の `thrift_rpc_timeout_ms` として設定します。接続がこの値よりも長くプールされている場合、接続は閉じられます。リクエストごとのタイムアウト(`ctx->timeout_second`)が提供されている場合、BEはRPCタイムアウトをrpc_timeout_ms = max(ctx*1000/4, min(ctx*1000/2, txn_commit_rpc_timeout_ms)) として計算するため、実効的なRPCタイムアウトはコンテキストとこの設定によって制限されます。不一致のタイムアウトを避けるために、これをFEの `thrift_client_timeout_ms` と一致させてください。 +- Introduced in: v3.2.0 + +##### txn_map_shard_size + +- Default: 128 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: トランザクションマネージャーがトランザクションロックを分割し、競合を減らすために使用するロックマップシャードの数。その値は2の累乗(2^n)であるべきです。値を増やすと、追加のメモリとわずかな簿記のオーバーヘッドのコストで、同時実行性が向上し、ロック競合が減少します。予想される同時トランザクションと利用可能なメモリに合わせてシャード数を設定してください。 +- Introduced in: v3.2.0 + +##### txn_shard_size + +- Default: 1024 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: トランザクションマネージャーが使用するロックシャードの数を制御します。この値はtxnロックのシャードサイズを決定します。2の累乗である必要があります。より大きな値に設定すると、ロックの競合が減少し、同時COMMIT/PUBLISHのスループットが向上しますが、追加のメモリとより詳細な内部的な簿記のコストが増加します。 +- Introduced in: v3.2.0 + +##### update_schema_worker_count + +- Default: 3 +- Type: Int +- Unit: スレッド +- Is mutable: いいえ +- Description: TTaskType::UPDATE_SCHEMAタスクを処理するバックエンドの「update_schema」動的ThreadPool内のワーカースレッドの最大数を設定します。ThreadPoolは起動時にagent_serverで最小0スレッド(アイドル時には0にスケールダウン可能)とこの設定に等しい最大値(プールはデフォルトのアイドルタイムアウトと実質的に無制限のキューを使用)で作成されます。この値を増やすと、より多くの同時スキーマ更新タスクを許可できます(CPUとメモリ使用量が増加)。値を減らすと、並列スキーマ操作が制限されます。 +- Introduced in: v3.2.3 + +##### update_tablet_meta_info_worker_count + +- Default: 1 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: タブレットメタデータ更新タスクを処理するバックエンドスレッドプールの最大ワーカースレッド数を設定します。スレッドプールはバックエンドの起動中に作成され、最小0スレッド(アイドル時には0にスケールダウン可能)とこの設定に等しい最大値(少なくとも1に制限される)を持ちます。実行時にこの値を更新すると、プールの最大スレッドが調整されます。同時メタデータ更新タスクを増やすには値を増やし、同時実行性を制限するには値を減らします。 +- Introduced 
+##### txn_map_shard_size
+
+- Default: 128
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of lock map shards the transaction manager uses to partition transaction locks and reduce contention. The value should be a power of 2 (2^n). Increasing it improves concurrency and reduces lock contention at the cost of additional memory and slight bookkeeping overhead. Size the shard count to match your expected concurrent transactions and available memory.
+- Introduced in: v3.2.0
+
+##### txn_shard_size
+
+- Default: 1024
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: Controls the number of lock shards used by the transaction manager. This value determines the shard size of txn locks and must be a power of 2. Setting a larger value reduces lock contention and improves concurrent COMMIT/PUBLISH throughput, at the cost of additional memory and more fine-grained internal bookkeeping.
+- Introduced in: v3.2.0
+
+##### update_schema_worker_count
+
+- Default: 3
+- Type: Int
+- Unit: Threads
+- Is mutable: No
+- Description: Sets the maximum number of worker threads in the backend's "update_schema" dynamic ThreadPool, which processes TTaskType::UPDATE_SCHEMA tasks. The ThreadPool is created at startup in agent_server with a minimum of 0 threads (it can scale down to 0 when idle) and a maximum equal to this setting (the pool uses the default idle timeout and an effectively unbounded queue). Increasing this value allows more concurrent schema update tasks (raising CPU and memory usage); decreasing it limits parallel schema operations.
+- Introduced in: v3.2.3
+
+##### update_tablet_meta_info_worker_count
+
+- Default: 1
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Sets the maximum number of worker threads in the backend thread pool that processes tablet metadata update tasks. The thread pool is created during backend startup with a minimum of 0 threads (it can scale down to 0 when idle) and a maximum equal to this setting (bounded below by 1). Updating this value at runtime adjusts the pool's maximum threads. Increase the value for more concurrent metadata update tasks; decrease it to limit concurrency.
+- Introduced in: v4.1.0, v4.0.6, v3.5.13
+
+### Users, Roles, and Privileges
+
+##### ssl_certificate_path
+
+- Default:
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The absolute path to the TLS/SSL certificate file (PEM) used by the BE's bRPC server when `enable_https` is true. At BE startup this value is copied into `brpc::ServerOptions::ssl_options().default_cert.certificate`. A matching private key must also be set in `ssl_private_key_path`. If required by your CA, provide the server certificate and all intermediate certificates in PEM format (a certificate chain). The file must be readable by the StarRocks BE process and is applied only at startup. If it is unset or invalid while `enable_https` is enabled, the bRPC TLS setup fails and the server may not start correctly.
+- Introduced in: v4.0.0
+
+### Query Engine
+
+##### clear_udf_cache_when_start
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: When enabled, the BE's UserFunctionCache clears all locally cached user function libraries at startup. During `UserFunctionCache::init`, the code calls `_reset_cache_dir()`, which removes UDF files from the configured UDF library directory (organized into `kLibShardNum` subdirectories) and deletes files with Java/Python UDF suffixes (.jar/.py). When disabled (the default), the BE loads existing cached UDF files instead of deleting them. Enabling this means UDF binaries are re-downloaded on first use after a restart (increasing network traffic and first-use latency).
+- Introduced in: v4.0.0
+
+##### dictionary_speculate_min_chunk_size
+
+- Default: 10000
+- Type: Int
+- Unit: Rows
+- Is mutable: No
+- Description: The minimum number of rows (chunk size) at which `StringColumnWriter` and `DictColumnWriter` trigger dictionary encoding speculation. If the size of the input column (or the accumulated buffer plus input rows) is greater than or equal to `dictionary_speculate_min_chunk_size`, the writer runs the speculation immediately and settles on an encoding (DICT, PLAIN, or BIT_SHUFFLE) instead of buffering more rows. The speculation uses `dictionary_encoding_ratio` for string columns and `dictionary_encoding_ratio_for_non_string_column` for numeric/non-string columns to decide whether dictionary encoding is beneficial. A large column `byte_size` (UINT32_MAX or more) also forces immediate speculation to avoid `BinaryColumn` overflow.
+- Introduced in: v3.2.0
+
+##### disable_storage_page_cache
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: A boolean value that controls whether to disable the PageCache.
+  - When the PageCache is enabled, StarRocks caches recently scanned data.
+  - The PageCache can significantly improve query performance when similar queries are repeated frequently.
+  - `true` indicates disabling the PageCache.
+  - Since StarRocks v2.4, the default value of this item has changed from `true` to `false`.
+- Introduced in: -
+
+##### enable_bitmap_index_memory_page_cache
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable the in-memory cache for bitmap indexes. An in-memory cache is recommended if you use bitmap indexes to accelerate point queries.
+- Introduced in: v3.1
+
+##### enable_compaction_flat_json
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable compaction for Flat JSON data.
+- Introduced in: v3.3.3
+
+##### enable_json_flat
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable the Flat JSON feature. After this feature is enabled, newly loaded JSON data is automatically flattened, which improves JSON query performance.
+- Introduced in: v3.3.0
+
+##### enable_lazy_dynamic_flat_json
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable Lazy Dynamic Flat JSON when a Flat JSON schema is not found in the read process. When this item is set to `true`, StarRocks defers Flat JSON operations to the compute process instead of the read process.
+- Introduced in: v3.3.3
+
+##### enable_ordinal_index_memory_page_cache
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable the in-memory cache for ordinal indexes. An ordinal index is a mapping from row IDs to data page positions, and it can be used to accelerate scans.
+- Introduced in: -
+
+##### enable_string_prefix_zonemap
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable ZoneMaps for string (CHAR/VARCHAR) columns using prefix-based min/max values. For non-key string columns, the min/max values are truncated to the fixed prefix length configured by `string_prefix_zonemap_prefix_len`.
+- Introduced in: -
+
+##### enable_zonemap_index_memory_page_cache
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable the in-memory cache for zonemap indexes. An in-memory cache is recommended if you use zonemap indexes to accelerate scans.
+- Introduced in: -
+
+##### exchg_node_buffer_size_bytes
+
+- Default: 10485760
+- Type: Int
+- Unit: Bytes
+- Is mutable: Yes
+- Description: The maximum buffer size on the receiver side of an exchange node for each query. This configuration item is a soft limit. Backpressure is triggered when data is sent to the receiver side at an excessive rate.
+- Introduced in: -
+
+##### file_descriptor_cache_capacity
+
+- Default: 16384
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of file descriptors that can be cached.
+- Introduced in: -
+
+##### flamegraph_tool_dir
+
+- Default: `${STARROCKS_HOME}/bin/flamegraph`
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The directory of the flamegraph tool. It must contain the pprof, stackcollapse-go.pl, and flamegraph.pl scripts used to generate flame graphs from profile data.
+- Introduced in: -
+
+##### fragment_pool_queue_size
+
+- Default: 2048
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The upper limit of the number of queries that can be processed on each BE node.
+- Introduced in: -
+
+##### fragment_pool_thread_num_max
+
+- Default: 4096
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum number of threads used for queries.
+- Introduced in: -
+
+##### fragment_pool_thread_num_min
+
+- Default: 64
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The minimum number of threads used for queries.
+- Introduced in: -
+
+##### hdfs_client_enable_hedged_read
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Specifies whether to enable the hedged read feature.
+- Introduced in: v3.0
+
+##### hdfs_client_hedged_read_threadpool_size
+
+- Default: 128
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: Specifies the size of the hedged read thread pool on the HDFS client. The thread pool size limits the number of threads dedicated to hedged reads in the HDFS client. It is equivalent to the `dfs.client.hedged.read.threadpool.size` parameter in the **hdfs-site.xml** file of the HDFS cluster.
+- Introduced in: v3.0
+
+##### hdfs_client_hedged_read_threshold_millis
+
+- Default: 2500
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: No
+- Description: Specifies the number of milliseconds to wait before starting a hedged read. For example, if this parameter is set to `30` and a read from a block has not returned within 30 milliseconds, the HDFS client immediately starts a new read against a different block replica. It is equivalent to the `dfs.client.hedged.read.threshold.millis` parameter in the **hdfs-site.xml** file of the HDFS cluster.
+- Introduced in: v3.0
+
+##### io_coalesce_adaptive_lazy_active
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Adaptively decides whether to coalesce the I/O of predicate columns and non-predicate columns based on predicate selectivity.
+- Introduced in: v3.2
+
+##### jit_lru_cache_size
+
+- Default: 0
+- Type: Int
+- Unit: Bytes
+- Is mutable: Yes
+- Description: The LRU cache size for JIT compilation. If set to a value greater than 0, it represents the actual cache size. If set to 0 or less, the system adaptively sizes the cache using the formula `jit_lru_cache_size = min(mem_limit*0.01, 1GB)` (the node's `mem_limit` must be 16 GB or more).
+- Introduced in: -
+
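+A minimal sketch of the adaptive rule above (Python; `mem_limit` is a hypothetical byte value, not the BE variable itself):
+
+```python
+GB = 1024**3
+
+def jit_cache_size(configured: int, mem_limit: int) -> int:
+    """Resolve the effective JIT LRU cache size in bytes."""
+    if configured > 0:
+        return configured                        # an explicit size wins
+    return min(int(mem_limit * 0.01), 1 * GB)    # adaptive: 1% of memory, capped at 1 GB
+
+# Example: with a 64 GiB mem_limit the adaptive size is ~0.64 GiB.
+print(jit_cache_size(0, 64 * GB))
+```
+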
+##### json_flat_column_max
+
+- Default: 100
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of subfields that can be extracted by Flat JSON. This parameter takes effect only when `enable_json_flat` is set to `true`.
+- Introduced in: v3.3.0
+
+##### json_flat_create_zonemap
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to create ZoneMaps for flattened JSON sub-columns during writes. This parameter takes effect only when `enable_json_flat` is set to `true`.
+- Introduced in: -
+
+##### json_flat_null_factor
+
+- Default: 0.3
+- Type: Double
+- Unit: -
+- Is mutable: Yes
+- Description: The proportion of NULL values in a column to extract for Flat JSON. A column is not extracted if its proportion of NULL values is higher than this threshold. This parameter takes effect only when `enable_json_flat` is set to `true`.
+- Introduced in: v3.3.0
+
+##### json_flat_sparsity_factor
+
+- Default: 0.3
+- Type: Double
+- Unit: -
+- Is mutable: Yes
+- Description: The proportion of columns with the same name for Flat JSON. Extraction is not performed if the proportion of columns with the same name is lower than this value. This parameter takes effect only when `enable_json_flat` is set to `true`.
+- Introduced in: v3.3.0
+
+##### lake_tablet_ignore_invalid_delete_predicate
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: A boolean value that controls whether to ignore invalid delete predicates in a tablet's rowset metadata, which may be introduced by logical deletes against Duplicate Key tables after a column is renamed.
+- Introduced in: v4.0
+
+##### late_materialization_ratio
+
+- Default: 10
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: An integer ratio in the range [0-1000] that controls the use of late materialization in the SegmentIterator (the vectorized query engine). A value of `0` (or ≤ 0) disables late materialization; `1000` (or ≥ 1000) forces late materialization for all reads. Values greater than 0 and less than 1000 enable a conditional strategy that chooses the behavior based on the predicate filter ratio (larger values favor late materialization). If a segment contains complex metric types, StarRocks uses `metric_late_materialization_ratio` instead. Late materialization is disabled when `lake_io_opts.cache_file_only` is set.
+- Introduced in: v3.2.0
+
+##### max_hdfs_file_handle
+
+- Default: 1000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of HDFS file descriptors that can be opened.
+- Introduced in: -
+
+##### max_memory_sink_batch_count
+
+- Default: 20
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of scan cache batches.
+- Introduced in: -
+
+##### max_pushdown_conditions_per_column
+
+- Default: 1024
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of conditions allowed to be pushed down per column. If the number of conditions exceeds this limit, the predicates are not pushed down to the storage layer.
+- Introduced in: -
+
+##### max_scan_key_num
+
+- Default: 1024
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of scan keys segmented per query.
+- Introduced in: -
+
+##### metric_late_materialization_ratio
+
+- Default: 1000
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: Controls when the late materialization row access strategy is used for reads that involve complex metric columns. Valid range: [0-1000]. `0` disables late materialization; `1000` forces it for all applicable reads. Values from 1 to 999 enable a conditional strategy in which both the late and early materialization contexts are prepared and the choice is made at runtime based on predicates/selectivity. When complex metric types are present, `metric_late_materialization_ratio` overrides the general `late_materialization_ratio`. Note: in the `cache_file_only` I/O mode, late materialization is disabled regardless of this setting.
+- Introduced in: v3.2.0
+
+##### min_file_descriptor_number
+
+- Default: 60000
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The minimum number of file descriptors for the BE process.
+- Introduced in: -
+
+##### object_storage_connect_timeout_ms
+
+- Default: -1
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: No
+- Description: The timeout duration for establishing socket connections with object storage. `-1` indicates using the default timeout duration of the SDK configuration.
+- Introduced in: v3.0.9
+
+##### object_storage_request_timeout_ms
+
+- Default: -1
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: No
+- Description: The timeout duration for establishing HTTP connections with object storage. `-1` indicates using the default timeout duration of the SDK configuration.
+- Introduced in: v3.0.9
+
+##### parquet_late_materialization_enable
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: A boolean value that controls whether to enable late materialization in the Parquet reader for better performance. `true` indicates enabling late materialization, and `false` indicates disabling it.
+- Introduced in: -
+
+##### parquet_page_index_enable
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: A boolean value that controls whether to enable the page index for Parquet files for better performance. `true` indicates enabling the page index, and `false` indicates disabling it.
+- Introduced in: v3.3
+
+##### parquet_reader_bloom_filter_enable
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: A boolean value that controls whether to enable bloom filters for Parquet files for better performance. `true` indicates enabling bloom filters, and `false` indicates disabling them. You can also control this behavior at the session level using the system variable `enable_parquet_reader_bloom_filter`. Bloom filters in Parquet are maintained **at the column level within each row group**. If a Parquet file contains bloom filters for certain columns, queries can use predicates on those columns to efficiently skip row groups.
+- Introduced in: v3.5
+
+##### path_gc_check_step
+
+- Default: 1000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of files that can be scanned continuously each time.
+- Introduced in: -
+
+##### path_gc_check_step_interval_ms
+
+- Default: 10
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: The time interval between file scans.
+- Introduced in: -
+
+##### path_scan_interval_second
+
+- Default: 86400
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval at which GC cleans up expired data.
+- Introduced in: -
+
+##### pipeline_connector_scan_thread_num_per_cpu
+
+- Default: 8
+- Type: Double
+- Unit: -
+- Is mutable: Yes
+- Description: The number of scan threads assigned to the Pipeline Connector per CPU core on the BE node. This setting has been changeable dynamically since v3.1.7.
+- Introduced in: -
+
+##### pipeline_poller_timeout_guard_ms
+
+- Default: -1
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: If this item is set to a value greater than `0`, when a driver in the poller takes longer than `pipeline_poller_timeout_guard_ms` for a single dispatch, the driver and operator information is printed.
+- Introduced in: -
+
+##### pipeline_prepare_thread_pool_queue_size
+
+- Default: 102400
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum queue length of the PREPARE fragment thread pool of the pipeline execution engine.
+- Introduced in: -
+
+##### pipeline_prepare_thread_pool_thread_num
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of threads in the PREPARE fragment thread pool of the pipeline execution engine. `0` indicates a value equal to the number of system VCPU cores.
+- Introduced in: -
+
+##### pipeline_prepare_timeout_guard_ms
+
+- Default: -1
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: If this item is set to a value greater than `0`, a stack trace of a plan fragment is printed if the fragment exceeds `pipeline_prepare_timeout_guard_ms` during the PREPARE process.
+- Introduced in: -
+
+##### pipeline_scan_thread_pool_queue_size
+
+- Default: 102400
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum task queue length of the SCAN thread pool of the pipeline execution engine.
+- Introduced in: -
+
+##### pk_index_parallel_get_threadpool_size
+
+- Default: 1048576
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Sets the maximum queue size (number of pending tasks) of the "cloud_native_pk_index_get" thread pool used for parallel PK index get operations in shared-data (cloud-native/lake) mode. The actual number of threads in this pool is controlled by `pk_index_parallel_get_threadpool_max_threads`; this setting only limits how many tasks can be queued awaiting execution. The very large default (2^20) makes the queue effectively unbounded. Reducing it prevents excessive memory growth from queued tasks but may block or fail task submissions when the queue fills up. Tune it together with `pk_index_parallel_get_threadpool_max_threads` based on your workload concurrency and memory constraints.
+- Introduced in: -
+
+##### priority_queue_remaining_tasks_increased_frequency
+
+- Default: 512
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Controls how often the `BlockingPriorityQueue` increases ("ages") the priority of all remaining tasks to prevent starvation. An internal `_upgrade_counter` is incremented on every successful get/pop. When `_upgrade_counter` exceeds `priority_queue_remaining_tasks_increased_frequency`, the queue increments the priority of every element, rebuilds the heap, and resets the counter. Lower values make priority aging more frequent (less starvation but more CPU cost from iteration and re-heapifying); higher values reduce the overhead but delay priority adjustments. This value is a simple operation-count threshold, not a time duration.
+- Introduced in: v3.2.0
+
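+The aging mechanism described above can be illustrated with a small self-contained sketch (Python; a simplified stand-in for the C++ `BlockingPriorityQueue`, omitting the blocking and locking details):
+
+```python
+import heapq
+import itertools
+
+class AgingPriorityQueue:
+    """Toy priority queue that ages all waiting items after N successful pops,
+    mimicking the upgrade-counter behavior described above."""
+
+    def __init__(self, upgrade_frequency: int = 512):
+        self._heap = []                      # entries: [neg_priority, seq, task]
+        self._seq = itertools.count()        # tie-breaker for equal priorities
+        self._upgrade_counter = 0
+        self._upgrade_frequency = upgrade_frequency
+
+    def put(self, priority: int, task) -> None:
+        heapq.heappush(self._heap, [-priority, next(self._seq), task])
+
+    def pop(self):
+        entry = heapq.heappop(self._heap)
+        self._upgrade_counter += 1
+        if self._upgrade_counter > self._upgrade_frequency:
+            for e in self._heap:
+                e[0] -= 1                    # raise every waiting task's priority
+            heapq.heapify(self._heap)        # restore the heap invariant
+            self._upgrade_counter = 0
+        return entry[2]
+```
+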
+##### query_cache_capacity
+
+- Default: 536870912
+- Type: Int
+- Unit: Bytes
+- Is mutable: No
+- Description: The size of the query cache on the BE. The default size is 512 MB. The size cannot be less than 4 MB. If the memory capacity of the BE is insufficient to provision the expected query cache size, you can increase the memory capacity of the BE.
+- Introduced in: -
+
+##### query_pool_spill_mem_limit_threshold
+
+- Default: 1.0
+- Type: Double
+- Unit: -
+- Is mutable: No
+- Description: When automatic spilling is enabled, spilling of intermediate results is triggered when the memory usage of all queries exceeds `query_pool memory limit * query_pool_spill_mem_limit_threshold`.
+- Introduced in: v3.2.7
+
+##### query_scratch_dirs
+
+- Default: `${STARROCKS_HOME}`
+- Type: string
+- Unit: -
+- Is mutable: No
+- Description: A semicolon (;)-separated list of writable scratch directories that query execution uses to spill intermediate data (for example, for external sorts, hash joins, and other operators). Specify one or more paths separated by semicolons, for example `/mnt/ssd1/tmp;/mnt/ssd2/tmp`. The directories must be accessible and writable by the BE process and have enough free space; StarRocks selects among them to spread spill I/O. A restart is required for changes to take effect. If a directory is missing, unwritable, or full, spills may fail or query performance may degrade.
+- Introduced in: v3.2.0
+
+##### result_buffer_cancelled_interval_time
+
+- Default: 300
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The wait time before the BufferControlBlock releases data.
+- Introduced in: -
+
+##### scan_context_gc_interval_min
+
+- Default: 5
+- Type: Int
+- Unit: Minutes
+- Is mutable: Yes
+- Description: The time interval at which scan contexts are cleaned up.
+- Introduced in: -
+
+##### scanner_row_num
+
+- Default: 16384
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of rows returned by each scan thread in a scan.
+- Introduced in: -
+
+##### scanner_thread_pool_queue_size
+
+- Default: 102400
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of scan tasks supported by the storage engine.
+- Introduced in: -
+
+##### scanner_thread_pool_thread_num
+
+- Default: 48
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The number of threads the storage engine uses for concurrent storage volume scans. All threads are managed in a thread pool.
+- Introduced in: -
+
+##### string_prefix_zonemap_prefix_len
+
+- Default: 16
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The prefix length used for the min/max values of string ZoneMaps when `enable_string_prefix_zonemap` is enabled.
+- Introduced in: -
+
+##### udf_thread_pool_size
+
+- Default: 1
+- Type: Int
+- Unit: Threads
+- Is mutable: No
+- Description: Sets the size of the UDF call PriorityThreadPool created in ExecEnv (used to run user-defined function/UDF-related tasks). The value is used as both the pool's thread count and queue capacity when constructing the thread pool (PriorityThreadPool("udf", thread_num, queue_size)). Increase this value for more concurrent UDF execution; keep it small to avoid excessive CPU and memory contention.
+- Introduced in: v3.2.0
+
+##### update_memory_limit_percent
+
+- Default: 60
+- Type: Int
+- Unit: Percent
+- Is mutable: No
+- Description: The percentage of BE process memory reserved for update-related memory and caches. During startup, `GlobalEnv` computes the update `MemTracker` as `process_mem_limit` * clamp(`update_memory_limit_percent`, 0, 100) / 100. The `UpdateManager` also uses this percentage to size the primary index/index cache capacity (index cache capacity = `GlobalEnv::process_mem_limit` * `update_memory_limit_percent` / 100). The HTTP config update logic registers a callback that calls `update_primary_index_memory_limit` on the update manager, which is applied to the update subsystem when the setting changes. Increasing this value allocates more memory to the update/primary index path (reducing memory available to other pools); decreasing it reduces update memory and cache capacity. The value is clamped to the range 0-100.
+- Introduced in: v3.2.0
+
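+A minimal sketch of the computation above (Python; `process_mem_limit` is a hypothetical byte value):
+
+```python
+def update_mem_limit(process_mem_limit: int, percent: int) -> int:
+    """Clamp the percentage to [0, 100] and apply it to the process memory limit."""
+    clamped = max(0, min(100, percent))
+    return process_mem_limit * clamped // 100
+
+# Example: 64 GiB * 60% ~= 38.4 GiB reserved for the update subsystem.
+print(update_mem_limit(64 * 1024**3, 60))
+```
+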
+##### vector_chunk_size
+
+- Default: 4096
+- Type: Int
+- Unit: Rows
+- Is mutable: No
+- Description: The number of rows per vectorized chunk (batch) used throughout the execution and storage code paths. This value controls the creation of `Chunk` and the `batch_size` of `RuntimeState`, and affects operator throughput, per-operator memory footprint, spill and sort buffer sizing, and I/O heuristics (for example, the natural write size of the ORC writer). Increasing it may improve CPU and I/O efficiency for wide or CPU-bound workloads but increases peak memory usage and may add latency for queries with small results. Adjust it only if profiling shows the batch size is a bottleneck; otherwise keep the default for balanced memory and performance.
+- Introduced in: v3.2.0
+
+### Loading
+
+##### clear_transaction_task_worker_count
+
+- Default: 1
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of threads used to clear transactions.
+- Introduced in: -
+
+##### column_mode_partial_update_insert_batch_size
+
+- Default: 4096
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The batch size for column-mode partial updates when processing inserted rows. If this item is set to a value of `0` or less, it is bounded to `1` to avoid an infinite loop. This item controls how many newly inserted rows are processed in each batch. Larger values improve write performance but consume more memory.
+- Introduced in: v3.5.10, v4.0.2
+
+##### enable_load_spill_parallel_merge
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Specifies whether to enable parallel spill merge within a single tablet. Enabling it can improve the performance of spill merges during data loading.
+- Introduced in: -
+
+##### enable_stream_load_verbose_log
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Specifies whether to log the HTTP requests and responses of Stream Load jobs.
+- Introduced in: v2.5.17, v3.0.9, v3.1.6, v3.2.1
+
+##### flush_thread_num_per_store
+
+- Default: 2
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The number of threads used to flush MemTables in each store.
+- Introduced in: -
+
+##### lake_flush_thread_num_per_store
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The number of threads used to flush MemTables in each store in a shared-data cluster.
+If this value is set to `0`, the system automatically uses twice the number of CPU cores.
+If this value is set to less than `0`, the system uses the product of its absolute value and the number of CPU cores.
+- Introduced in: v3.1.12, 3.2.7
+
+##### load_data_reserve_hours
+
+- Default: 4
+- Type: Int
+- Unit: Hours
+- Is mutable: No
+- Description: The reservation time for files produced by small loads.
+- Introduced in: -
+
+##### load_error_log_reserve_hours
+
+- Default: 48
+- Type: Int
+- Unit: Hours
+- Is mutable: Yes
+- Description: The time for which data load logs are reserved.
+- Introduced in: -
+
+##### load_process_max_memory_limit_bytes
+
+- Default: 107374182400
+- Type: Int
+- Unit: Bytes
+- Is mutable: No
+- Description: The maximum size limit of memory resources that all load processes on a BE node can occupy.
+- Introduced in: -
+
+##### load_spill_memory_usage_per_merge
+
+- Default: 1073741824
+- Type: Int
+- Unit: Bytes
+- Is mutable: Yes
+- Description: The maximum memory usage per merge operation during spill merges. The default is 1 GB (1073741824 bytes). This parameter controls the memory consumption of individual merge tasks during data load spill merges, preventing excessive memory usage.
+- Introduced in: -
+
+##### max_consumer_num_per_group
+
+- Default: 3
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of consumers in a consumer group for Routine Load.
+- Introduced in: -
+
+##### max_runnings_transactions_per_txn_map
+
+- Default: 100
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of transactions that can run concurrently in each partition.
+- Introduced in: -
+
+##### number_tablet_writer_threads
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The number of tablet writer threads used for ingestion such as Stream Load, Broker Load, and Insert. If this parameter is set to a value of 0 or less, the system uses half the number of CPU cores, with a minimum of 16. If it is set to a value greater than 0, the system uses that value. This setting has been changeable dynamically since v3.1.7.
+- Introduced in: -
+
+##### push_worker_count_high_priority
+
+- Default: 3
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of threads used to handle load tasks with HIGH priority.
+- Introduced in: -
+
+##### push_worker_count_normal_priority
+
+- Default: 3
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of threads used to handle load tasks with NORMAL priority.
+- Introduced in: -
+
+##### streaming_load_max_batch_size_mb
+
+- Default: 100
+- Type: Int
+- Unit: MB
+- Is mutable: Yes
+- Description: The maximum size of a JSON file that can be streamed into StarRocks.
+- Introduced in: -
+
+##### streaming_load_max_mb
+
+- Default: 102400
+- Type: Int
+- Unit: MB
+- Is mutable: Yes
+- Description: The maximum size of a file that can be streamed into StarRocks. Since v3.0, the default value has changed from `10240` to `102400`.
+- Introduced in: -
+
+##### streaming_load_rpc_max_alive_time_sec
+
+- Default: 1200
+- Type: Int
+- Unit: Seconds
+- Is mutable: No
+- Description: The RPC timeout for Stream Load.
+- Introduced in: -
+
+##### transaction_publish_version_thread_pool_idle_time_ms
+
+- Default: 60000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: No
+- Description: The idle time before a thread is reclaimed by the Publish Version thread pool.
+- Introduced in: -
+
+##### transaction_publish_version_worker_count
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of threads used to publish versions. If this value is set to `0` or less, the system uses the number of CPU cores as the value, in order to avoid running out of thread resources when import concurrency is high but only a fixed number of threads is used. Since v2.5, the default value has changed from `8` to `0`.
+- Introduced in: -
+
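+Several of the load thread pool settings above resolve non-positive values to CPU-derived defaults. The sketch below (Python; hypothetical helper names, following the rules stated for `transaction_publish_version_worker_count` just above and `lake_flush_thread_num_per_store` earlier in this section) makes those rules explicit:
+
+```python
+import os
+
+def publish_version_workers(configured: int) -> int:
+    """A value of 0 or less falls back to the number of CPU cores."""
+    cores = os.cpu_count() or 1
+    return configured if configured > 0 else cores
+
+def lake_flush_threads_per_store(configured: int) -> int:
+    """0 means 2x CPU cores; a negative value means |value| x CPU cores."""
+    cores = os.cpu_count() or 1
+    if configured > 0:
+        return configured
+    if configured == 0:
+        return 2 * cores
+    return abs(configured) * cores
+```
+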
+##### write_buffer_size
+
+- Default: 104857600
+- Type: Int
+- Unit: Bytes
+- Is mutable: Yes
+- Description: The buffer size of MemTables in memory. This configuration item is the threshold that triggers a flush.
+- Introduced in: -
+
+### Loading and Unloading
+
+##### broker_write_timeout_seconds
+
+- Default: 30
+- Type: int
+- Unit: Seconds
+- Is mutable: No
+- Description: The timeout, in seconds, that backend broker operations use for write/I/O RPCs. The value is multiplied by 1000 to produce a millisecond timeout and is passed as the default `timeout_ms` to `BrokerFileSystem` and `BrokerServiceConnection` instances (for example, file export and snapshot upload/download). Increase this value to avoid premature timeouts when the broker or network is slow or when transferring large files; decreasing it makes broker RPCs fail faster. The value is defined in `common/config` and is applied at process startup (it cannot be reloaded dynamically).
+- Introduced in: v3.2.0
+
+##### enable_load_channel_rpc_async
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When enabled, the processing of load channel open RPCs (for example, `PTabletWriterOpen`) is offloaded from BRPC workers to a dedicated thread pool. The request handler creates a `ChannelOpenTask` and submits it to the internal `_async_rpc_pool` instead of running `LoadChannelMgr::_open` inline. This reduces work and blocking in BRPC threads and allows tuning the concurrency via `load_channel_rpc_thread_pool_num` and `load_channel_rpc_thread_pool_queue_size`. If submission to the thread pool fails (the pool is full or shut down), the request is cancelled and an error status is returned. The pool is shut down in `LoadChannelMgr::close()`, so consider capacity and lifecycle when enabling this feature to avoid request rejections or delayed processing.
+- Introduced in: v3.5.0
+
+##### enable_load_diagnose
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When enabled, StarRocks attempts automatic load diagnosis from the BE OlapTableSink/NodeChannel after a brpc timeout matching "[E1008]Reached timeout". The code creates a `PLoadDiagnoseRequest` and sends an RPC to the remote LoadChannel to collect a profile and/or stack trace (controlled by `load_diagnose_rpc_timeout_profile_threshold_ms` and `load_diagnose_rpc_timeout_stack_trace_threshold_ms`). The diagnosis RPC uses `load_diagnose_send_rpc_timeout_ms` as its timeout. If a diagnosis request is already in flight, the diagnosis is skipped. Enabling this incurs additional RPC and profiling work on the target node; for highly sensitive production workloads, disable it to avoid the extra overhead.
+- Introduced in: v3.5.0
+
+##### enable_load_segment_parallel
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: When enabled, rowset segment loading and rowset-level reads run concurrently using the StarRocks background thread pools (ExecEnv::load_segment_thread_pool and ExecEnv::load_rowset_thread_pool). Rowset::load_segments and TabletReader::get_segment_iterators submit per-segment or per-rowset tasks to these pools and fall back to serial loading (logging a warning) if submission fails. Enabling this can reduce read/load latency for large rowsets, but increases CPU/I/O concurrency and memory pressure. Note: parallel loading can change the order in which segments finish loading and therefore prevents partial compaction (the code checks `_parallel_load` and disables partial compaction when it is enabled). Consider the impact on operations that depend on segment order.
+- Introduced in: v3.3.0, v3.4.0, v3.5.0
+
+##### enable_streaming_load_thread_pool
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Controls whether streaming load scanners are submitted to a dedicated streaming load thread pool. When enabled, and the query is a LOAD of `TLoadJobType::STREAM_LOAD`, the ConnectorScanNode submits scanner tasks to `streaming_load_thread_pool` (configured with INT32_MAX threads and queue size, effectively unbounded). When disabled, scanners use the general `thread_pool` and its `PriorityThreadPool` submission logic (priority calculation, try_offer/offer behavior). Enabling this isolates streaming load work from regular query execution to reduce interference; however, because the dedicated pool is effectively unbounded, enabling it may increase concurrent threads and resource usage under heavy streaming load traffic. This option is on by default and usually does not need to be changed.
+- Introduced in: v3.2.0
+
+##### es_http_timeout_ms
+
+- Default: 5000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: No
+- Description: The HTTP connection timeout, in milliseconds, used by the ES network client in ESScanReader for Elasticsearch scroll requests. The value is applied via `network_client.set_timeout_ms()` before each subsequent scroll POST is sent, and it controls how long the client waits for an ES response during scrolling. For slow networks or large queries, increase this value to avoid premature timeouts; decrease it to fail faster on unresponsive ES nodes. This setting complements `es_scroll_keepalive`, which controls the keep-alive duration of the scroll context.
+- Introduced in: v3.2.0
+
+##### es_index_max_result_window
+
+- Default: 10000
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: Limits the maximum number of documents StarRocks requests from Elasticsearch in a single batch. StarRocks sets the batch size of ES requests to min(`es_index_max_result_window`, `chunk_size`) when building the ES reader's `KEY_BATCH_SIZE`. If an ES request exceeds the Elasticsearch index setting `index.max_result_window`, Elasticsearch returns HTTP 400 (Bad Request). When scanning large indexes, adjust this value or increase `index.max_result_window` on the Elasticsearch side to allow larger single requests.
+- Introduced in: v3.2.0
+
+##### ignore_load_tablet_failure
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: If this item is set to `false` (the default), the system treats tablet header load failures (errors other than NotFound and AlreadyExist) as fatal: the code logs the error and calls `LOG(FATAL)` to stop the BE process. If set to `true`, the BE continues starting up despite such per-tablet load errors; the failing tablet IDs are recorded and skipped, and the successful tablets are loaded. Note that this parameter does not suppress fatal errors from the RocksDB meta scan itself, which always terminates the process.
+- Introduced in: v3.2.0
+
+##### load_channel_abort_clean_up_delay_seconds
+
+- Default: 600
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: Controls how long, in seconds, the system keeps the load IDs of aborted load channels in `_aborted_load_channels` before removing them. When a load job is cancelled or fails, its load ID remains recorded so that late-arriving load RPCs can be rejected immediately. Once the delay expires, entries are cleaned up during the periodic background sweep (the minimum sweep interval is 60 seconds). A delay that is too low risks accepting stray RPCs after an abort; one that is too high keeps state longer than necessary and consumes resources. Tune it to balance the correctness of late-request rejection for aborted loads against resource retention.
+- Introduced in: v3.5.11, v4.0.4
+
+##### load_channel_rpc_thread_pool_num
+
+- Default: -1
+- Type: Int
+- Unit: Threads
+- Is mutable: Yes
+- Description: The maximum number of threads in the load channel asynchronous RPC thread pool. If set to 0 or less (default `-1`), the pool is automatically sized to the number of CPU cores (`CpuInfo::num_cores()`). The configured value is used as the maximum threads of the ThreadPoolBuilder, and the pool's minimum threads are set to min(5, max_threads). The pool queue size is controlled separately by `load_channel_rpc_thread_pool_queue_size`. This setting was introduced to align the async RPC pool size with the brpc worker default (`brpc_num_threads`) and preserve behavioral compatibility after switching load RPC processing from synchronous to asynchronous. Changing this setting at runtime triggers `ExecEnv::GetInstance()->load_channel_mgr()->async_rpc_pool()->update_max_threads(...)`.
+- Introduced in: v3.5.0
+
+##### load_channel_rpc_thread_pool_queue_size
+
+- Default: 1024000
+- Type: int
+- Unit: Count
+- Is mutable: No
+- Description: Sets the maximum queue size of pending tasks for the load channel RPC thread pool created by LoadChannelMgr. This thread pool executes asynchronous `open` requests when `enable_load_channel_rpc_async` is enabled; the pool's size is paired with `load_channel_rpc_thread_pool_num`. The large default (1024000) matches the brpc worker default so that behavior is preserved after the switch from synchronous to asynchronous processing. When the queue is full, ThreadPool::submit() fails, the incoming `open` RPC is cancelled with an error, and the caller receives a rejection. Increasing this value buffers larger bursts of concurrent `open` requests; decreasing it tightens backpressure but may cause more rejections under load.
+- Introduced in: v3.5.0
+
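+A small sketch of the sizing rule for the async RPC pool above (Python; `CpuInfo::num_cores()` is approximated with `os.cpu_count()`):
+
+```python
+import os
+
+def load_channel_rpc_pool_limits(configured_max: int):
+    """Resolve (min_threads, max_threads) for the load channel RPC pool."""
+    cores = os.cpu_count() or 1
+    max_threads = configured_max if configured_max > 0 else cores
+    min_threads = min(5, max_threads)
+    return min_threads, max_threads
+
+# The default of -1 on a 16-core machine resolves to (5, 16).
+print(load_channel_rpc_pool_limits(-1))
+```
+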
+##### load_diagnose_rpc_timeout_profile_threshold_ms
+
+- Default: 60000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: When a load RPC times out (the error contains "[E1008]Reached timeout") and `enable_load_diagnose` is true, this threshold controls whether full profiling diagnosis is requested. If the request-level RPC timeout `_rpc_timeout_ms` is greater than `load_diagnose_rpc_timeout_profile_threshold_ms`, profiling is enabled for that diagnosis. For small `_rpc_timeout_ms` values, profiling is sampled once every 20 timeouts to avoid frequent heavy diagnosis for real-time/short-timeout loads. This value affects the `profile` flag of the `PLoadDiagnoseRequest` that is sent. Stack trace behavior is controlled separately by `load_diagnose_rpc_timeout_stack_trace_threshold_ms`, and the send timeout is controlled by `load_diagnose_send_rpc_timeout_ms`.
+- Introduced in: v3.5.0
+
+##### load_diagnose_rpc_timeout_stack_trace_threshold_ms
+
+- Default: 600000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: The threshold, in milliseconds, used to decide when to request a remote stack trace for long-running load RPCs. If a load RPC fails with a timeout error and the effective RPC timeout (_rpc_timeout_ms) exceeds this value, `OlapTableSink`/`NodeChannel` includes `stack_trace=true` when sending the `load_diagnose` RPC to the target BE, allowing the BE to return a stack trace for debugging. `LocalTabletsChannel::SecondaryReplicasWaiter` also triggers best-effort stack trace diagnosis from the primary when waiting on secondary replicas exceeds this interval. This behavior requires `enable_load_diagnose`, and `load_diagnose_send_rpc_timeout_ms` is used for the diagnosis RPC timeout. Profiling is controlled separately by `load_diagnose_rpc_timeout_profile_threshold_ms`. Lowering this value causes stack traces to be requested more aggressively.
+- Introduced in: v3.5.0
+
+##### load_diagnose_send_rpc_timeout_ms
+
+- Default: 2000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: The timeout, in milliseconds, applied to diagnosis-related brpc calls initiated by the BE load path. It is used to set the controller timeout for `load_diagnose` RPCs (sent by NodeChannel/OlapTableSink when a LoadChannel brpc call times out) and for replica status queries (used when SecondaryReplicasWaiter / LocalTabletsChannel checks primary replica state). Choose a value high enough for the remote side to respond with profile or stack trace data, but not so high that failure handling is delayed. This parameter works together with `enable_load_diagnose`, `load_diagnose_rpc_timeout_profile_threshold_ms`, and `load_diagnose_rpc_timeout_stack_trace_threshold_ms`, which control when and what diagnostic information is requested.
+- Introduced in: v3.5.0
+
+##### load_fp_brpc_timeout_ms
+
+- Default: -1
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: Overrides the per-channel brpc RPC timeout used by OlapTableSink when the `node_channel_set_brpc_timeout` failpoint is triggered. If set to a positive value, the NodeChannel sets its internal `_rpc_timeout_ms` to this value (in milliseconds) so that open/add-chunk/cancel RPCs use a shorter timeout. This allows simulating brpc timeouts that produce "[E1008]Reached timeout" errors. The default (`-1`) disables the override. Changing this value is intended for testing and fault injection; small values produce spurious timeouts and may trigger load diagnosis (see `enable_load_diagnose`, `load_diagnose_rpc_timeout_profile_threshold_ms`, `load_diagnose_rpc_timeout_stack_trace_threshold_ms`, and `load_diagnose_send_rpc_timeout_ms`).
+- Introduced in: v3.5.0
+
+##### load_fp_tablets_channel_add_chunk_block_ms
+
+- Default: -1
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: When enabled (set to a positive millisecond value), this failpoint setting makes `TabletsChannel::add_chunk` sleep for the specified duration during load processing. It is used to simulate BRPC timeout errors (for example, "[E1008]Reached timeout") and to emulate expensive `add_chunk` operations that increase load latency. A value of 0 or less (default `-1`) disables the injection. It is intended for testing failure handling, timeouts, and replica synchronization behavior. Do not enable it for normal production workloads: it delays write completion and may trigger upstream timeouts or replica aborts.
+- Introduced in: v3.5.0
+
+##### load_segment_thread_pool_num_max
+
+- Default: 128
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: Sets the maximum number of worker threads in the BE load-related thread pools. The value is used by the ThreadPoolBuilder to cap the threads of both `load_rowset_pool` and `load_segment_pool` in `exec_env.cpp`, controlling the concurrency of processing loaded rowsets and segments (for example, decoding, index building, writing) during streaming and batch loads. Increasing this value improves parallelism and may raise load throughput, but also increases CPU, memory usage, and potential contention. Decreasing it limits concurrent load processing and may lower throughput. Tune it together with `load_segment_thread_pool_queue_size` and `streaming_load_thread_pool_idle_time_ms`. Changes require a BE restart.
+- Introduced in: v3.3.0, v3.4.0, v3.5.0
+
+##### load_segment_thread_pool_queue_size
+
+- Default: 10240
+- Type: Int
+- Unit: Tasks
+- Is mutable: No
+- Description: Sets the maximum queue length (number of pending tasks) of the load-related thread pools created as "load_rowset_pool" and "load_segment_pool". These pools use `load_segment_thread_pool_num_max` as their maximum thread count, and this setting controls how many load segment/rowset tasks can be buffered before the ThreadPool's overflow policy takes effect (subsequent submissions may be rejected or blocked, depending on the ThreadPool implementation). Increase this value to allow more pending load work (raising memory usage and possibly latency); decrease it to limit buffered load concurrency and reduce memory usage.
+- Introduced in: v3.3.0, v3.4.0, v3.5.0
+
+##### max_pulsar_consumer_num_per_group
+
+- Default: 10
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Controls the maximum number of Pulsar consumers that can be created in a single data consumer group for routine load on a BE. Because cumulative acknowledgement is not supported across multiple topic subscriptions, each consumer subscribes to exactly one topic/partition. If the number of partitions in `pulsar_info->partitions` exceeds this value, group creation fails, and the error advises increasing `max_pulsar_consumer_num_per_group` on the BE or adding BEs. The limit is enforced when constructing a `PulsarDataConsumerGroup` and prevents a BE from hosting more consumers than this number for one routine load group. For Kafka routine load, `max_consumer_num_per_group` is used instead.
+- Introduced in: v3.2.0
+
+##### pull_load_task_dir
+
+- Default: `${STARROCKS_HOME}/var/pull_load`
+- Type: string
+- Unit: -
+- Is mutable: No
+- Description: The file system path where the BE stores data and working files for "pull load" tasks (downloaded source files, task state, temporary output, and so on). The directory must be writable by the BE process and have enough disk space for incoming loads. It defaults to a path relative to STARROCKS_HOME. Tests expect this directory to exist and create it (see the test configuration).
+- Introduced in: v3.2.0
+
+##### routine_load_kafka_timeout_second
+
+- Default: 10
+- Type: Int
+- Unit: Seconds
+- Is mutable: No
+- Description: The timeout, in seconds, used for Kafka-related routine load operations. If a client request does not specify a timeout, `routine_load_kafka_timeout_second` is used as the default RPC timeout for `get_info` (converted to milliseconds). It is also used as the per-call consumer polling timeout for the `librdkafka` consumer (converted to milliseconds and capped by the remaining runtime). Note: the internal `get_info` path reduces this value to 80% before passing it to `librdkafka`, to avoid timeout races with the FE side. Set this value to balance timely failure reporting with enough time for network/broker responses. The setting is not mutable, so changes require a restart.
+- Introduced in: v3.2.0
+
+##### routine_load_pulsar_timeout_second
+
+- Default: 10
+- Type: Int
+- Unit: Seconds
+- Is mutable: No
+- Description: The default timeout, in seconds, that the BE uses for Pulsar-related routine load operations when a request does not provide an explicit timeout. Specifically, `PInternalServiceImplBase::get_pulsar_info` multiplies this value by 1000 to form the millisecond timeout passed to the routine load task execution methods that fetch Pulsar partition metadata and backlog. Increase the value to tolerate slow Pulsar responses at the cost of slower failure detection; decrease it to fail faster on unresponsive brokers. Analogous to `routine_load_kafka_timeout_second` used for Kafka.
+- Introduced in: v3.2.0
+
+##### streaming_load_thread_pool_idle_time_ms
+
+- Default: 2000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: No
+- Description: Sets the thread idle timeout, in milliseconds, for streaming-load-related thread pools. The value is passed as the idle timeout to the ThreadPoolBuilder of the `stream_load_io` pool as well as `load_rowset_pool` and `load_segment_pool`; threads in these pools are reclaimed after being idle for this duration. Lower values reduce idle resource usage but increase thread creation overhead; higher values keep threads alive between short bursts at the cost of higher baseline resource usage. The `stream_load_io` pool is used when `enable_streaming_load_thread_pool` is enabled.
+- Introduced in: v3.2.0
+
+##### streaming_load_thread_pool_num_min
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The minimum number of threads in the streaming load IO thread pool ("stream_load_io") created during ExecEnv initialization. The pool is built with `set_max_threads(INT32_MAX)` and `set_max_queue_size(INT32_MAX)`, making it effectively unbounded to avoid deadlocks under concurrent streaming loads. With a value of 0, the pool starts with no threads and grows on demand; a positive value reserves that many threads at startup. The pool is used when `enable_streaming_load_thread_pool` is true, and its idle timeout is controlled by `streaming_load_thread_pool_idle_time_ms`. Overall concurrency is still constrained by `fragment_pool_thread_num_max` and `webserver_num_workers`. This value rarely needs changing; setting it too high can increase resource usage.
+- Introduced in: v3.2.0
+
+### Statistics Reporting
+
+##### enable_metric_calculator
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: When true, the BE process starts a background "metrics_daemon" thread (started in Daemon::init on non-Apple platforms) that calls `StarRocksMetrics::instance()->metrics()->trigger_hook()` roughly every 15 seconds to compute derived/system metrics (for example, push/query bytes per second, maximum disk I/O utilization, maximum network send/receive rates), log the memory breakdown, and clean up table metrics. When false, these hooks run synchronously inside `MetricRegistry::collect` at metrics collection time, which can add latency to metric scrapes. A process restart is required for changes to take effect.
+- Introduced in: v3.2.0
+
+##### enable_system_metrics
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: When true, StarRocks initializes system-level monitoring at startup: it discovers disk devices from the configured store paths, enumerates network interfaces, and passes this information to the metrics subsystem to enable collection of disk I/O, network traffic, and memory-related system metrics. If device or interface discovery fails, initialization logs a warning and aborts the system metrics setup. This flag only controls whether system metrics are initialized; the periodic metrics aggregation thread is controlled separately by `enable_metric_calculator`, and JVM metrics initialization is controlled by `enable_jvm_metrics`. Changing this value requires a restart.
+- Introduced in: v3.2.0
+
+##### profile_report_interval
+
+- Default: 30
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The interval, in seconds, that the ProfileReportWorker uses both to (1) decide when to report per-fragment profile information for LOAD queries and (2) sleep between reporting cycles. The worker compares the current time against each task's `last_report_time` using `(profile_report_interval * 1000) ms` to determine whether a profile should be re-reported, for both non-pipeline and pipeline load tasks. On each loop the worker reads the current value (it can be changed at runtime). If the configured value is 0 or less, the worker forces it to 1 and logs a warning. Changing this value affects the next reporting decision and sleep duration.
+- Introduced in: v3.2.0
+
+##### report_disk_state_interval_seconds
+
+- Default: 60
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval at which storage volume states are reported, including the size of data in the volume.
+- Introduced in: -
+
+##### report_resource_usage_interval_ms
+
+- Default: 1000
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: Yes
+- Description: The interval, in milliseconds, between periodic resource usage reports sent from the BE agent to the FE (master). The agent's worker thread collects a TResourceUsage (number of running queries, used/limit memory, CPU usage, resource group usage), calls `report_task`, and sleeps for this configured interval (see `task_worker_pool`). Lower values improve reporting timeliness but increase CPU, network, and master load; higher values reduce the overhead but make resource information staler. Reporting updates the related metrics (report_resource_usage_requests_total, report_resource_usage_requests_failed). Tune it for your cluster size and FE load.
+- Introduced in: v3.2.0
+
+##### report_tablet_interval_seconds
+
+- Default: 60
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval at which the latest versions of all tablets are reported.
+- Introduced in: -
+
+##### report_task_interval_seconds
+
+- Default: 10
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval at which task states are reported. Tasks include creating tables, dropping tables, loading data, and changing table schemas.
+- Introduced in: -
+
+##### report_workgroup_interval_seconds
+
+- Default: 5
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval at which the latest versions of all workgroups are reported.
+- Introduced in: -
+
+### Storage
+
+##### alter_tablet_worker_count
+
+- Default: 3
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The number of threads used for schema changes.
+- Introduced in: -
+
+##### avro_ignore_union_type_tag
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to remove the type tag from JSON strings serialized from the Avro Union data type.
+- Introduced in: v3.3.7, v3.4
+
+##### base_compaction_check_interval_seconds
+
+- Default: 60
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval of thread polling for Base Compaction.
+- Introduced in: -
+
+##### base_compaction_interval_seconds_since_last_operation
+
+- Default: 86400
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval since the last Base Compaction. This configuration item is one of the conditions that trigger a Base Compaction.
+- Introduced in: -
+
+##### base_compaction_num_threads_per_disk
+
+- Default: 1
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of threads used for Base Compaction on each storage volume.
+- Introduced in: -
+
+##### base_cumulative_delta_ratio
+
+- Default: 0.3
+- Type: Double
+- Unit: -
+- Is mutable: Yes
+- Description: The ratio of cumulative file size to base file size. Reaching this ratio is one of the conditions that trigger a Base Compaction.
+- Introduced in: -
+
+##### chaos_test_enable_random_compaction_strategy
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: If this item is set to `true`, `TabletUpdates::compaction()` uses a random compaction strategy (`compaction_random`) intended for chaos engineering tests. The flag takes precedence and forces compaction to follow a non-deterministic/random policy during tablet compaction selection instead of the normal strategy (for example, size-tiered compaction). It is intended for controlled testing only; enabling it can cause unpredictable compaction order, increased I/O/CPU, and test instability. Do not enable it in production environments. Use it only in fault injection or chaos engineering test scenarios.
+- Introduced in: v3.3.12, 3.4.2, 3.5.0, 4.0.0
+
+##### check_consistency_worker_count
+
+- Default: 1
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of threads used to check tablet consistency.
+- Introduced in: -
+
+##### clear_expired_replication_snapshots_interval_seconds
+
+- Default: 3600
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval at which the system clears expired snapshots left over by abnormal replications.
+- Introduced in: v3.3.5
+
+##### compact_threads
+
+- Default: 4
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of threads used for concurrent compaction tasks. This setting has been changeable dynamically since v3.1.7 and v3.2.2.
+- Introduced in: v3.0.0
+
+##### compaction_max_memory_limit
+
+- Default: -1
+- Type: Long
+- Unit: Bytes
+- Is mutable: No
+- Description: The global upper bound, in bytes, on the memory available to compaction tasks on this BE. During BE initialization, the final compaction memory limit is computed as min(`compaction_max_memory_limit`, process_mem_limit * `compaction_max_memory_limit_percent` / 100). If `compaction_max_memory_limit` is negative (default `-1`), it falls back to the BE process memory limit derived from `mem_limit`. The percent value is clamped to [0,100]. If the process memory limit is not set (negative), compaction memory remains unlimited (`-1`). The computed value is used to initialize `_compaction_mem_tracker`. See also `compaction_max_memory_limit_percent` and `compaction_memory_limit_per_worker`.
+- Introduced in: v3.2.0
+
+##### compaction_max_memory_limit_percent
+
+- Default: 100
+- Type: Int
+- Unit: Percent
+- Is mutable: No
+- Description: The percentage of BE process memory that can be used for compaction. The BE computes the compaction memory cap as the minimum of `compaction_max_memory_limit` and (process memory limit x this percentage / 100). If the value is less than 0 or greater than 100, it is treated as 100. If `compaction_max_memory_limit` < 0, the process memory limit is used instead. The computation also takes into account the BE process memory derived from `mem_limit`. Combined with `compaction_memory_limit_per_worker` (the per-worker cap), this setting controls the total compaction memory available, and therefore affects compaction concurrency and OOM risk.
+- Introduced in: v3.2.0
+
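+The interaction of the two limits above can be summarized in a short sketch (Python; simplified from the initialization logic described here):
+
+```python
+def compaction_mem_limit(abs_limit: int, percent: int, process_mem_limit: int) -> int:
+    """Return the compaction memory cap in bytes, or -1 for unlimited."""
+    if process_mem_limit < 0:
+        return -1                       # no process limit -> compaction unlimited
+    pct = max(0, min(100, percent))     # the percent value is clamped to [0, 100]
+    by_percent = process_mem_limit * pct // 100
+    if abs_limit < 0:                   # default -1 falls back to the process limit
+        return by_percent
+    return min(abs_limit, by_percent)
+
+# The defaults (-1, 100) on a 64 GiB process limit yield the full 64 GiB.
+print(compaction_mem_limit(-1, 100, 64 * 1024**3))
+```
+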
+##### compaction_memory_limit_per_worker
+
+- Default: 2147483648
+- Type: Int
+- Unit: Bytes
+- Is mutable: No
+- Description: The maximum memory size allowed for each compaction thread.
+- Introduced in: -
+
+##### compaction_trace_threshold
+
+- Default: 60
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time threshold for each compaction. If a compaction takes longer than this threshold, StarRocks prints the corresponding trace.
+- Introduced in: -
+
+##### create_tablet_worker_count
+
+- Default: 3
+- Type: Int
+- Unit: Threads
+- Is mutable: Yes
+- Description: Sets the maximum number of worker threads in the AgentServer thread pool that processes TTaskType::CREATE (tablet creation) tasks sent by the FE. At BE startup this value is used as the pool's maximum (the pool is created with a minimum of 1 thread and an unbounded queue); changing it at runtime triggers `ExecEnv::agent_server()->get_thread_pool(TTaskType::CREATE)->update_max_threads(...)`. Increase this value to improve concurrent tablet creation throughput (useful for heavy loads or partition creation); decrease it to throttle concurrent creation operations. Raising the value increases CPU, memory, and I/O concurrency and can cause contention. The thread pool enforces at least one thread, so values below 1 have no practical effect.
+- Introduced in: v3.2.0
+
+##### cumulative_compaction_check_interval_seconds
+
+- Default: 1
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval of thread polling for Cumulative Compaction.
+- Introduced in: -
+
+##### cumulative_compaction_num_threads_per_disk
+
+- Default: 1
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of Cumulative Compaction threads per disk.
+- Introduced in: -
+
+##### data_page_size
+
+- Default: 65536
+- Type: Int
+- Unit: Bytes
+- Is mutable: No
+- Description: The target uncompressed page size, in bytes, used when building column data and index pages. The value is copied into `ColumnWriterOptions.data_page_size` and `IndexedColumnWriterOptions.index_page_size` and consulted by the page builders (for example, `BinaryPlainPageBuilder::is_page_full` and the buffer reservation logic) to decide when a page is complete and how much memory to reserve. A value of 0 disables the builders' page size limit. Changing this value affects the number of pages, metadata overhead, memory reservation, and the I/O/compression trade-off (smaller pages mean more pages and metadata; larger pages mean fewer pages and potentially better compression, but possible memory spikes).
+- Introduced in: v3.2.4
+
+##### default_num_rows_per_column_file_block
+
+- Default: 1024
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of rows that can be stored in each row block.
+- Introduced in: -
+
+##### delete_worker_count_high_priority
+
+- Default: 1
+- Type: Int
+- Unit: Threads
+- Is mutable: No
+- Description: The number of worker threads in the DeleteTaskWorkerPool allocated as HIGH-priority delete threads. At startup, the AgentServer creates the delete pool with `total threads = delete_worker_count_normal_priority + delete_worker_count_high_priority`. The first `delete_worker_count_high_priority` threads are marked to exclusively try to pop `TPriority::HIGH` tasks (polling for high-priority delete tasks and sleeping/looping when none are available). Increasing this value improves concurrency for high-priority delete requests; decreasing it reduces the dedicated capacity and may increase the latency of high-priority deletes.
+- Introduced in: v3.2.0
+
+##### dictionary_encoding_ratio
+
+- Default: 0.7
+- Type: Double
+- Unit: -
+- Is mutable: No
+- Description: The fraction (0.0-1.0) used by `StringColumnWriter` during the encoding speculation phase to decide between dictionary (DICT_ENCODING) and plain (PLAIN_ENCODING) encoding for a chunk. The code computes `max_card = row_count * dictionary_encoding_ratio` and scans the chunk's distinct key count; if the distinct count exceeds `max_card`, the writer chooses `PLAIN_ENCODING`. The check only runs when the chunk size exceeds `dictionary_speculate_min_chunk_size` (and when `row_count > dictionary_min_rowcount`). Setting a higher value favors dictionary encoding (tolerating more distinct keys); a lower value falls back to plain encoding earlier. A value of 1.0 effectively forces dictionary encoding (the distinct count can never exceed `row_count`).
+- Introduced in: v3.2.0
+
+##### dictionary_encoding_ratio_for_non_string_column
+
+- Default: 0
+- Type: double
+- Unit: -
+- Is mutable: No
+- Description: The ratio threshold that decides whether to use dictionary encoding for non-string columns (numeric, date/time, and DECIMAL types). When effective (value > 0.0001), the writer computes `max_card = row_count * dictionary_encoding_ratio_for_non_string_column` and, for samples with `row_count > dictionary_min_rowcount`, chooses `DICT_ENCODING` only if `distinct_count ≤ max_card`; otherwise it falls back to `BIT_SHUFFLE`. A value of `0` (the default) disables dictionary encoding for non-string columns. This parameter is analogous to `dictionary_encoding_ratio` but applies to non-string columns. Use values in (0,1]; smaller values restrict dictionary encoding to low-cardinality columns, reducing dictionary memory/I/O overhead.
+- Introduced in: v3.3.0, v3.4.0, v3.5.0
+
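+A condensed sketch of the speculation rule for both column kinds (Python; `distinct_count` stands in for the scan the writer performs):
+
+```python
+def choose_encoding(row_count: int, distinct_count: int,
+                    ratio: float, is_string: bool) -> str:
+    """Pick an encoding the way the speculation described above does."""
+    if not is_string and ratio <= 0.0001:
+        return "BIT_SHUFFLE"            # non-string dictionary encoding disabled
+    max_card = row_count * ratio
+    if distinct_count <= max_card:
+        return "DICT_ENCODING"
+    return "PLAIN_ENCODING" if is_string else "BIT_SHUFFLE"
+
+# 10000 rows with 500 distinct values easily passes the default 0.7 ratio.
+print(choose_encoding(10000, 500, 0.7, is_string=True))
+```
+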
+##### dictionary_page_size
+
+- Default: 1048576
+- Type: Int
+- Unit: Bytes
+- Is mutable: No
+- Description: The size, in bytes, of dictionary pages used when building rowset segments. The value is read into `PageBuilderOptions::dict_page_size` in the BE rowset code and controls how many dictionary entries can be stored in a single dictionary page. Increasing it can improve the compression ratio of dictionary-encoded columns by allowing larger dictionaries, but larger pages consume more memory during writing/encoding and can increase I/O and latency when pages are read or materialized. For large-memory, write-heavy workloads, configure it conservatively and avoid excessively large values to prevent runtime performance degradation.
+- Introduced in: v3.3.0, v3.4.0, v3.5.0
+
+##### disk_stat_monitor_interval
+
+- Default: 5
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval at which disk health status is monitored.
+- Introduced in: -
+
+##### download_low_speed_limit_kbps
+
+- Default: 50
+- Type: Int
+- Unit: KB/s
+- Is mutable: Yes
+- Description: The lower limit of the download speed of each HTTP request. An HTTP request is aborted if it continuously runs at a speed lower than this value within the time span specified by `download_low_speed_time`.
+- Introduced in: -
+
+##### download_low_speed_time
+
+- Default: 300
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The maximum time that an HTTP request can run at a download speed lower than the limit. An HTTP request is aborted if it continuously runs at a speed lower than the value of `download_low_speed_limit_kbps` within the time span specified in this configuration item.
+- Introduced in: -
+
+##### download_worker_count
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of threads for download tasks of restore jobs on a BE node. `0` indicates setting the value to the number of CPU cores on the machine where the BE runs.
+- Introduced in: -
+
+##### drop_tablet_worker_count
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The number of threads used to drop tablets. `0` indicates half the number of CPU cores in the node.
+- Introduced in: -
+
+##### enable_check_string_lengths
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Whether to check data lengths during loading to solve compaction failures caused by out-of-range VARCHAR data.
+- Introduced in: -
+
+##### enable_event_based_compaction_framework
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Whether to enable the event-based compaction framework. `true` indicates the event-based compaction framework is enabled, and `false` indicates it is disabled. Enabling the event-based compaction framework can significantly reduce compaction overhead in scenarios with many tablets or where a single tablet holds a large amount of data.
+- Introduced in: -
+
+##### enable_lazy_delta_column_compaction
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When enabled, compaction prefers a "lazy" strategy for delta columns generated by partial column updates: StarRocks avoids aggressively merging delta column files into the main segment files in order to save compaction I/O. In practice, the compaction selection code checks for partial column update rowsets and multiple candidates; if they are found and this flag is true, the engine stops adding further inputs to the compaction or merges only empty rowsets (level -1), leaving the delta columns separate. This reduces immediate I/O and CPU during compaction at the cost of delayed consolidation (potentially more segments and temporary storage overhead). Correctness and query semantics are unchanged.
+- Introduced in: v3.2.3
+
+##### enable_new_load_on_memory_limit_exceeded
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to allow new load processes when the hard memory resource limit has been reached. `true` indicates new load processes are allowed, and `false` indicates they are rejected.
+- Introduced in: v3.3.2
+
+##### enable_pk_index_parallel_compaction
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable parallel compaction of the Primary Key index in shared-data clusters.
+- Introduced in: -
+
+##### enable_pk_index_parallel_execution
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable parallel execution of Primary Key index operations in shared-data clusters. When enabled, the system uses a thread pool to process segments in parallel during publish operations, significantly improving performance for large tablets.
+- Introduced in: -
+
+##### enable_pk_index_eager_build
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to eagerly build Primary Key index files during the data import and compaction phases. When enabled, the system generates persistent PK index files immediately while data is being written, improving subsequent query performance.
+- Introduced in: -
+
+##### enable_pk_size_tiered_compaction_strategy
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Whether to enable the size-tiered compaction policy for Primary Key tables. `true` indicates the size-tiered compaction strategy is enabled, and `false` indicates it is disabled.
+- Introduced in: This item took effect in shared-data clusters from v3.2.4 and v3.1.10 onwards, and in shared-nothing clusters from v3.2.5 and v3.1.10 onwards.
+
+##### enable_rowset_verify
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to verify the correctness of generated rowsets. When enabled, the correctness of rowsets generated after compaction and schema change is checked.
+- Introduced in: -
+
+##### enable_size_tiered_compaction_strategy
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Whether to enable the size-tiered compaction policy (excluding Primary Key tables). `true` indicates the size-tiered compaction strategy is enabled, and `false` indicates it is disabled.
+- Introduced in: -
+
+##### enable_strict_delvec_crc_check
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When `enable_strict_delvec_crc_check` is set to true, a strict CRC32 check is performed on delete vectors, and a failure is returned if a mismatch is detected.
+- Introduced in: -
+
+##### enable_transparent_data_encryption
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: When enabled, StarRocks creates encrypted on-disk artifacts for newly written storage objects (segment files, delete/update files, rowset segments, lake SSTs, persistent index files, and so on). Writers (RowsetWriter/SegmentWriter, lake UpdateManager/LakePersistentIndex, and related code paths) request encryption info from the KeyCache, attach `encryption_info` to writable files, and persist `encryption_meta` into rowset/segment/sstable metadata (`segment_encryption_metas`, delete/update encryption metadata). The encryption flags of the frontend and backend/CN must match; on mismatch, the BE aborts on heartbeat (`LOG(FATAL)`). This flag cannot be changed at runtime. Enable it before deployment and make sure key management (KEK) and the KeyCache are properly configured and synchronized across the cluster.
+- Introduced in: v3.3.1, 3.4.0, 3.5.0, 4.0.0
+
+##### enable_zero_copy_from_page_cache
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: When enabled, `FixedLengthColumnBase` can avoid byte copies when appending data that originates from buffers backed by the page cache. In `append_numbers`, if all conditions are met (the setting is true, the input resource is owned, the resource memory is aligned to the column's element type, the column is empty, and the resource length is a multiple of the element size), the code takes the input `ContainerResource` and sets the column's internal resource pointer (zero copy). Enabling this reduces CPU and memory copy overhead and can improve ingestion throughput. The downside is that the column's lifetime becomes coupled to the acquired buffer and depends on correct ownership/alignment. Disable it to force safe copies.
+- Introduced in: -
+
+##### file_descriptor_cache_clean_interval
+
+- Default: 3600
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The time interval at which file descriptors that have not been used for a certain period are cleaned up.
+- Introduced in: -
+
+##### ignore_broken_disk
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Controls startup behavior when a configured storage path fails the read/write check or fails to parse. When `false` (the default), the BE treats a broken entry in `storage_root_path` or `spill_local_storage_dir` as fatal and aborts startup. When `true`, StarRocks skips (logs a warning and removes) storage paths that fail `check_datapath_rw` or fail to parse, and the BE can continue starting with the remaining healthy paths. Note: the BE still exits if all configured paths are removed. Enabling this can mask misconfigurations or failed disks and make data on ignored paths unavailable; monitor logs and disk health accordingly.
+- Introduced in: v3.2.0
+
+##### inc_rowset_expired_sec
+
+- Default: 1800
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The expiration time of incoming data. This configuration item is used in incremental clone.
+- Introduced in: -
+
+##### load_process_max_memory_hard_limit_ratio
+
+- Default: 2
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The hard limit (ratio) of memory resources that all load processes on a BE node can occupy. When `enable_new_load_on_memory_limit_exceeded` is set to `false` and the memory consumption of all load processes exceeds `load_process_max_memory_limit_percent * load_process_max_memory_hard_limit_ratio`, new load processes are rejected.
+- Introduced in: v3.3.2
+
+##### load_process_max_memory_limit_percent
+
+- Default: 30
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The soft limit (percentage) of memory resources that all load processes on a BE node can occupy.
+- Introduced in: -
+
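+A minimal sketch of the admission rule implied by the two limits above (Python; `mem_limit_bytes` is a hypothetical stand-in for the BE process memory limit):
+
+```python
+def reject_new_load(load_mem_used: int, mem_limit_bytes: int,
+                    soft_limit_percent: int = 30, hard_limit_ratio: int = 2,
+                    allow_on_exceeded: bool = False) -> bool:
+    """Return True if a new load process should be rejected."""
+    if allow_on_exceeded:               # enable_new_load_on_memory_limit_exceeded
+        return False
+    hard_limit = mem_limit_bytes * soft_limit_percent // 100 * hard_limit_ratio
+    return load_mem_used > hard_limit
+
+# With a 64 GiB limit, loads are rejected above 64 GiB * 30% * 2 ~= 38.4 GiB.
+print(reject_new_load(40 * 1024**3, 64 * 1024**3))
+```
+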
+##### lz4_acceleration
+
+- Default: 1
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: Controls the LZ4 "acceleration" parameter used by the built-in LZ4 compressor (passed to `LZ4_compress_fast_continue`). Higher values favor compression speed at the expense of compression ratio; lower values (1) produce better compression but are slower. Valid range: MIN=1, MAX=65537. The setting affects all LZ4-based codecs in BlockCompression (for example, LZ4 and Hadoop-LZ4) and only changes how data is compressed; it does not change the LZ4 format or decompression compatibility. Tune it upward (for example, 4, 8, and so on) for CPU-bound or low-latency workloads where larger output is acceptable; keep it at 1 for storage- or I/O-sensitive workloads. The throughput-versus-size trade-off is highly data dependent, so test with representative data before changing it.
+- Introduced in: v3.4.1, 3.5.0, 4.0.0
+
+##### lz4_expected_compression_ratio
+
+- Default: 2.1
+- Type: double
+- Unit: Dimensionless (compression ratio)
+- Is mutable: Yes
+- Description: The threshold the serialization compression strategy uses when judging whether an observed LZ4 compression is "good". In `compress_strategy.cpp`, this value divides the observed `compress_ratio` when computing the reward metric together with `lz4_expected_compression_speed_mbps`. If the combined reward is greater than 1.0, the strategy records positive feedback. Increasing this value raises the expected compression ratio (making the condition harder to meet); decreasing it makes observed compression more easily considered satisfactory. Adjust it to match the compression ratio of your typical data. Valid range: MIN=1, MAX=65537.
+- Introduced in: v3.4.1, 3.5.0, 4.0.0
+
+##### lz4_expected_compression_speed_mbps
+
+- Default: 600
+- Type: double
+- Unit: MB/s
+- Is mutable: Yes
+- Description: The expected LZ4 compression throughput, in megabytes per second, used in the adaptive compression policy (CompressStrategy). The feedback routine computes `reward_ratio = (observed_compression_ratio / lz4_expected_compression_ratio) * (observed_speed / lz4_expected_compression_speed_mbps)`. If `reward_ratio` is greater than 1.0, a positive counter (alpha) is incremented; otherwise a negative counter (beta) is incremented, which influences whether future data is compressed. Tune this value to reflect the typical LZ4 throughput on your hardware. Raising it makes it harder for the policy to classify a run as "good" (requiring higher observed speed); lowering it makes classification easier. It must be a positive finite number.
+- Introduced in: v3.4.1, 3.5.0, 4.0.0
+
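+The reward computation spelled out above fits in a few lines (Python; the alpha/beta counters themselves are omitted, only the classification is shown):
+
+```python
+def lz4_feedback(observed_ratio: float, observed_speed_mbps: float,
+                 expected_ratio: float = 2.1, expected_speed_mbps: float = 600.0):
+    """Return (reward_ratio, positive) as described for CompressStrategy."""
+    reward_ratio = (observed_ratio / expected_ratio) * \
+                   (observed_speed_mbps / expected_speed_mbps)
+    return reward_ratio, reward_ratio > 1.0
+
+# A run compressing 2.5x at 700 MB/s scores (2.5/2.1)*(700/600) ~= 1.39 -> positive.
+print(lz4_feedback(2.5, 700.0))
+```
+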
+##### make_snapshot_worker_count
+
+- Default: 5
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of threads for snapshot creation tasks on a BE node.
+- Introduced in: -
+
+##### manual_compaction_threads
+
+- Default: 4
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of threads for Manual Compaction.
+- Introduced in: -
+
+##### max_base_compaction_num_singleton_deltas
+
+- Default: 100
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of segments that can be compacted in each Base Compaction.
+- Introduced in: -
+
+##### max_compaction_candidate_num
+
+- Default: 40960
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of candidate tablets for compaction. An excessively large value causes high memory usage and high CPU load.
+- Introduced in: -
+
+##### max_compaction_concurrency
+
+- Default: -1
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum concurrency of compactions (including both Base Compaction and Cumulative Compaction). A value of `-1` indicates no limit on concurrency, and `0` indicates disabling compaction. This parameter is mutable when the event-based compaction framework is enabled.
+- Introduced in: -
+
+##### max_cumulative_compaction_num_singleton_deltas
+
+- Default: 1000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of segments that can be merged in a single Cumulative Compaction. You can reduce this value if OOM occurs during compaction.
+- Introduced in: -
+
+##### max_download_speed_kbps
+
+- Default: 50000
+- Type: Int
+- Unit: KB/s
+- Is mutable: Yes
+- Description: The maximum download speed of each HTTP request. This value affects the performance of data replica synchronization between BE nodes.
+- Introduced in: -
+
+##### max_garbage_sweep_interval
+
+- Default: 3600
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The maximum time interval for garbage collection on storage volumes. This setting has been changeable dynamically since v3.0.
+- Introduced in: -
+
+##### max_percentage_of_error_disk
+
+- Default: 0
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum percentage of error disks tolerable on a storage volume before the corresponding BE node exits.
+- Introduced in: -
+
+##### max_queueing_memtable_per_tablet
+
+- Default: 2
+- Type: Long
+- Unit: Count
+- Is mutable: Yes
+- Description: Controls per-tablet backpressure on the write path. When the number of queueing (not yet flushed) memtables of a tablet reaches or exceeds `max_queueing_memtable_per_tablet`, the writers in `LocalTabletsChannel` and `LakeTabletsChannel` block (sleep/retry) before submitting more write work. This reduces the concurrency of simultaneous memtable flushes and peak memory usage, at the cost of potentially higher latency or RPC timeouts under heavy load. Set it higher to allow more concurrent memtables (more memory and I/O bursts); set it lower to limit memory pressure and increase write throttling.
+- Introduced in: v3.2.0
+
+##### max_row_source_mask_memory_bytes
+
+- Default: 209715200
+- Type: Int
+- Unit: Bytes
+- Is mutable: No
+- Description: The maximum memory size of the row source mask buffer. When the buffer is larger than this value, data is persisted to a temporary file on disk. This value should be set lower than the value of `compaction_memory_limit_per_worker`.
+- Introduced in: -
+
+##### max_tablet_write_chunk_bytes
+
+- Default: 536870912
+- Type: Int
+- Unit: Bytes
+- Is mutable: Yes
+- Description: The maximum memory, in bytes, allowed for the current in-memory tablet write chunk. Beyond this value, the chunk is considered full and is queued for sending. Increasing this value can reduce RPC frequency when loading wide tables (with many columns), which may improve throughput at the cost of larger memory usage and RPC payloads. Tune it to balance fewer RPCs against memory and serialization/BRPC limits.
+- Introduced in: v3.2.12
+
+##### max_update_compaction_num_singleton_deltas
+
+- Default: 1000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of rowsets that can be merged in a single compaction for Primary Key tables.
+- Introduced in: -
+
+##### memory_limitation_per_thread_for_schema_change
+
+- Default: 2
+- Type: Int
+- Unit: GB
+- Is mutable: Yes
+- Description: The maximum memory size allowed for each schema change task.
+- Introduced in: -
+
+##### memory_ratio_for_sorting_schema_change
+
+- Default: 0.8
+- Type: Double
+- Unit: - (unitless ratio)
+- Is mutable: Yes
+- Description: The fraction of the per-thread schema change memory limit used as the maximum memtable buffer size during sorting schema change operations. This ratio is multiplied by `memory_limitation_per_thread_for_schema_change` (configured in GB and converted to bytes) to compute `max_buffer_size`, and the result is capped at 4 GB. It is used when `SchemaChangeWithSorting` and `SortedSchemaChange` create a `MemTable/DeltaWriter`. Raising the ratio allows larger in-memory buffers (fewer flushes/merges) but increases the risk of memory pressure; lowering it causes more frequent flushes and higher I/O and merge overhead.
+- Introduced in: v3.2.0
+
+##### min_base_compaction_num_singleton_deltas
+
+- Default: 5
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The minimum number of segments that trigger a Base Compaction.
+- Introduced in: -
+
+##### min_compaction_failure_interval_sec
+
+- Default: 120
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The minimum time interval at which a tablet compaction can be scheduled after a previous compaction failure.
+- Introduced in: -
+
+##### min_cumulative_compaction_failure_interval_sec
+
+- Default: 30
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The minimum time interval at which Cumulative Compaction retries upon failure.
+- Introduced in: -
+
+##### min_cumulative_compaction_num_singleton_deltas
+
+- Default: 5
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The minimum number of segments that trigger a Cumulative Compaction.
+- Introduced in: -
+
+##### min_garbage_sweep_interval
+
+- Default: 180
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The minimum time interval for garbage collection on storage volumes. This setting has been changeable dynamically since v3.0.
+- Introduced in: -
+
+##### parallel_clone_task_per_path
+
+- Default: 8
+- Type: Int
+- Unit: Threads
+- Is mutable: Yes
+- Description: The number of parallel clone worker threads allocated per storage path on a BE. At BE startup, the maximum number of threads of the clone thread pool is computed as max(`number_of_store_paths` * `parallel_clone_task_per_path`, MIN_CLONE_TASK_THREADS_IN_POOL). For example, with 4 storage paths and the default of 8, the clone pool maximum is 32. This setting directly controls the concurrency of CLONE tasks (tablet replica copies) processed by the BE. Increasing it raises parallel clone throughput but also CPU, disk, and network contention; decreasing it limits concurrent clone tasks and may throttle FE-scheduled clone operations. The value applies to the dynamic clone thread pool and can be changed at runtime via the update-config HTTP action (which makes agent_server update the clone pool's maximum threads).
+- Introduced in: v3.2.0
+
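+A sketch of the clone pool sizing rule (Python; `MIN_CLONE_TASK_THREADS_IN_POOL` is named in the description above, but the concrete value used here is a placeholder assumption):
+
+```python
+MIN_CLONE_TASK_THREADS_IN_POOL = 2   # placeholder; the real constant lives in the BE source
+
+def clone_pool_max_threads(num_store_paths: int, per_path: int = 8) -> int:
+    """max(paths * per-path threads, enforced pool minimum)."""
+    return max(num_store_paths * per_path, MIN_CLONE_TASK_THREADS_IN_POOL)
+
+# 4 storage paths with the default of 8 threads per path -> 32 clone threads.
+print(clone_pool_max_threads(4))
+```
+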
`_path_scan_thread_callback`(`DataDir::perform_path_scan` と `perform_tmp_path_scan` を呼び出す)と `_path_gc_thread_callback`(`DataDir::perform_path_gc_by_tablet`、`DataDir::perform_path_gc_by_rowsetid`、`DataDir::perform_delta_column_files_gc`、および `DataDir::perform_crm_gc` を呼び出す)を生成します。スキャンとGCの間隔は `path_scan_interval_second` と `path_gc_check_interval_second` によって制御されます。CRMファイルのクリーンアップは `unused_crm_file_threshold_second` を使用します。自動パスレベルのクリーンアップを防ぐにはこれを無効にしてください(その場合、孤立した/一時ファイルを手動で管理する必要があります)。このフラグを変更するにはプロセスの再起動が必要です。 +- Introduced in: v3.2.0 + +##### path_gc_check_interval_second + +- Default: 86400 +- Type: Int +- Unit: 秒 +- Is mutable: いいえ +- Description: ストレージエンジンのパスガベージコレクションバックグラウンドスレッドの実行間隔(秒単位)。各ウェイクアップは、DataDirがタブレットごと、rowset IDごと、デルタ列ファイルGC、およびCRM GCによってパスGCを実行することをトリガーします(CRM GC呼び出しは `unused_crm_file_threshold_second` を使用します)。非正の値に設定されている場合、コードは強制的に間隔を1800秒(30分)に設定し、警告を出力します。オンディスクの一時ファイルまたはダウンロードされたファイルがスキャンおよび削除される頻度を制御するためにこれを調整してください。 +- Introduced in: v3.2.0 + +##### pending_data_expire_time_sec + +- Default: 1800 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: ストレージエンジン内の保留データの有効期限。 +- Introduced in: - + +##### pindex_major_compaction_limit_per_disk + +- Default: 1 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: ディスクごとのコンパクションの最大同時実行性。これにより、コンパクションによるディスク間のI/Oの不均一性の問題に対処します。この問題は、特定のディスクでI/Oが過度に高くなる原因となる可能性があります。 +- Introduced in: v3.0.9 + +##### pk_index_compaction_score_ratio + +- Default: 1.5 +- Type: Double +- Unit: - +- Is mutable: はい +- Description: 共有データクラスターのプライマリキーインデックスのコンパクションスコア比率。例えば、N個のファイルセットがある場合、コンパクションスコアは `N * pk_index_compaction_score_ratio` となります。 +- Introduced in: - + +##### pk_index_early_sst_compaction_threshold + +- Default: 5 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: 共有データクラスターのプライマリキーインデックスの早期SSTコンパクション閾値。 +- Introduced in: - + +##### pk_index_map_shard_size + +- Default: 4096 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: lake UpdateManagerのプライマリキーインデックスシャードマップで使用されるシャード数。UpdateManagerは、このサイズの `PkIndexShard` ベクトルを割り当て、ビットマスクを介してタブレットIDをシャードにマップします。この値を増やすと、そうでなければ同じシャードを共有するタブレット間のロック競合が減少しますが、その代償としてより多くのミューテックスオブジェクトとわずかに高いメモリ使用量が発生します。コードがビットマスクインデックスに依存しているため、値は2の累乗である必要があります。サイジングのガイダンスについては、`tablet_map_shard_size` ヒューリスティック `total_num_of_tablets_in_BE / 512` を参照してください。 +- Introduced in: v3.2.0 + +##### pk_index_memtable_flush_threadpool_max_threads + +- Default: 0 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: 共有データクラスターのプライマリキーインデックスMemTableフラッシュ用スレッドプールの最大スレッド数。`0` はCPUコア数の半分に自動設定されることを意味します。 +- Introduced in: - + +##### pk_index_memtable_flush_threadpool_size + +- Default: 1048576 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: 共有データ(クラウドネイティブ/lake)モードで使用されるプライマリキーインデックスmemtableフラッシュスレッドプールの最大キューサイズ(保留中のタスク数)を制御します。スレッドプールはExecEnvで「cloud_native_pk_index_flush」として作成されます。その最大スレッド数は `pk_index_memtable_flush_threadpool_max_threads` によって制御されます。この値を増やすと、実行前にmemtableフラッシュタスクをより多くバッファリングできます。これにより、即時のバックプレッシャは減少しますが、キューに入れられたタスクオブジェクトによって消費されるメモリが増加します。減らすと、バッファリングされたタスクが制限され、スレッドプール動作に応じて、より早いバックプレッシャまたはタスク拒否が発生する可能性があります。利用可能なメモリと予想される同時フラッシュワークロードに応じて調整してください。 +- Introduced in: - + +##### pk_index_memtable_max_count + +- Default: 2 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: 共有データクラスターのプライマリキーインデックスのMemTablesの最大数。 +- Introduced in: - + +##### pk_index_memtable_max_wait_flush_timeout_ms + +- Default: 30000 +- Type: Int +- Unit: ミリ秒 +- Is mutable: はい +- Description: 
共有データクラスターのプライマリキーインデックスMemTableフラッシュ完了待機時間の上限。すべてのMemTableを同期的にフラッシュする場合(例えば、SSTファイルの取り込み操作の前)、システムはこのタイムアウトまで待機します。デフォルトは30秒です。 +- Introduced in: - + +##### pk_index_parallel_compaction_task_split_threshold_bytes + +- Default: 33554432 +- Type: Int +- Unit: バイト +- Is mutable: はい +- Description: プライマリキーインデックスコンパクションタスクの分割閾値。タスクに関与するファイルの合計サイズがこの閾値よりも小さい場合、タスクは分割されません。 +- Introduced in: - + +##### pk_index_parallel_compaction_threadpool_max_threads + +- Default: 0 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: 共有データクラスターのクラウドネイティブプライマリキーインデックス並列コンパクション用スレッドプールの最大スレッド数。`0` はCPUコア数の半分に自動設定されることを意味します。 +- Introduced in: - + +##### pk_index_parallel_compaction_threadpool_size + +- Default: 1048576 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: 共有データモードのクラウドネイティブプライマリキーインデックス並列コンパクションで使用されるスレッドプールの最大キューサイズ(保留中のタスク数)。この設定は、スレッドプールが新しい送信を拒否するまでにキューに入れることができるコンパクションタスクの数を制御します。実効的な並列処理は `pk_index_parallel_compaction_threadpool_max_threads` によって制限されます。多くの同時コンパクションタスクが予想される場合にタスクの拒否を避けるにはこの値を増やしますが、キューが大きくなると、キューに入っている作業のメモリとレイテンシが増加することに注意してください。 +- Introduced in: - + +##### pk_index_parallel_execution_min_rows + +- Default: 16384 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: 共有データクラスターのプライマリキーインデックス操作で並列実行を有効にするための最小行閾値。 +- Introduced in: - + +##### pk_index_parallel_execution_threadpool_max_threads + +- Default: 0 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: 共有データクラスターのプライマリキーインデックス並列実行用スレッドプールの最大スレッド数。`0` はCPUコア数の半分に自動設定されることを意味します。 +- Introduced in: - + +##### pk_index_size_tiered_level_multiplier + +- Default: 10 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: プライマリキーインデックスのサイズ階層型コンパクション戦略のレベル乗数パラメータ。 +- Introduced in: - + +##### pk_index_size_tiered_max_level + +- Default: 5 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: プライマリキーインデックスのサイズ階層型コンパクション戦略の最大レベル。 +- Introduced in: - + +##### pk_index_size_tiered_min_level_size + +- Default: 131072 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: プライマリキーインデックスのサイズ階層型コンパクション戦略における最小レベルのデータサイズ。 +- Introduced in: - + +##### pk_index_sstable_sample_interval_bytes + +- Default: 16777216 +- Type: Int +- Unit: バイト +- Is mutable: はい +- Description: 共有データクラスターのSSTableファイルのサンプリング間隔サイズ。SSTableファイルのサイズがこの閾値を超えると、システムはこの間隔でSSTableからキーをサンプリングして、コンパクションタスクの境界パーティションを最適化します。この閾値よりも小さいSSTableの場合、開始キーのみが境界キーとして使用されます。デフォルトは16MBです。 +- Introduced in: - + +##### pk_index_target_file_size + +- Default: 67108864 +- Type: Int +- Unit: バイト +- Is mutable: はい +- Description: 共有データクラスターのプライマリキーインデックスのターゲットファイルサイズ。 +- Introduced in: - + +##### pk_index_eager_build_threshold_bytes + +- Default: 104857600 +- Type: Int +- Unit: バイト +- Is mutable: はい +- Description: `enable_pk_index_eager_build` がtrueに設定されている場合、インポートまたはコンパクション中に生成されたデータがこの閾値を超えた場合にのみ、システムはPKインデックスファイルを積極的に構築します。デフォルトは100MBです。 +- Introduced in: - + +##### primary_key_limit_size + +- Default: 128 +- Type: Int +- Unit: バイト +- Is mutable: はい +- Description: プライマリキーテーブルのキー列の最大サイズ。 +- Introduced in: v2.5 + +##### release_snapshot_worker_count + +- Default: 5 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: BEノード上のスナップショット解放タスクの最大スレッド数。 +- Introduced in: - + +##### repair_compaction_interval_seconds + +- Default: 600 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: Repair Compactionスレッドのポーリング時間間隔。 +- Introduced in: - + +##### replication_max_speed_limit_kbps + +- Default: 50000 +- Type: Int +- Unit: KB/秒 +- Is mutable: はい +- Description: 各レプリケーションスレッドの最大速度。 +- Introduced in: v3.3.5 + +##### replication_min_speed_limit_kbps + +- Default: 50 +- 
Type: Int +- Unit: KB/秒 +- Is mutable: はい +- Description: 各レプリケーションスレッドの最小速度。 +- Introduced in: v3.3.5 + +##### replication_min_speed_time_seconds + +- Default: 300 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: レプリケーションスレッドが最小速度を下回ることを許容される時間。実際の速度が `replication_min_speed_limit_kbps` より低い時間がこの値を超えると、レプリケーションは失敗します。 +- Introduced in: v3.3.5 + +##### replication_threads + +- Default: 0 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: レプリケーションに使用される最大スレッド数。`0` はスレッド数をBE CPUコア数の4倍に設定することを示します。 +- Introduced in: v3.3.5 + +##### size_tiered_level_multiple + +- Default: 5 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: サイズ階層型コンパクションポリシーにおける2つの連続するレベル間のデータサイズの倍数。 +- Introduced in: - + +##### size_tiered_level_multiple_dupkey + +- Default: 10 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: サイズ階層型コンパクションポリシーにおける、Duplicate Keyテーブルの2つの隣接するレベル間のデータ量の差の倍数。 +- Introduced in: - + +##### size_tiered_level_num + +- Default: 7 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: サイズ階層型コンパクションポリシーのレベル数。各レベルには最大1つのrowsetが予約されます。したがって、安定した状態では、この構成項目で指定されたレベル数と同じ数のrowsetが最大で存在します。 +- Introduced in: - + +##### size_tiered_max_compaction_level + +- Default: 3 +- Type: Int +- Unit: レベル +- Is mutable: はい +- Description: 単一のプライマリキーリアルタイムコンパクションタスクにマージできるサイズ階層レベルの数を制限します。PKサイズ階層型コンパクションの選択中、StarRocksはサイズによってrowsetの順序付けられた「レベル」を構築し、この制限に達するまで連続するレベルを選択されたコンパクション入力に追加します(コードは `compaction_level <= size_tiered_max_compaction_level` を使用します)。この値は含まれ、マージされた異なるサイズ階層の数をカウントします(最上位レベルは1としてカウントされます)。PKサイズ階層型コンパクション戦略が有効な場合にのみ有効です。これを上げると、コンパクションタスクにさらに多くのレベルを含めることができます(より大きく、I/OとCPUを多用するマージ、潜在的により高い書き込み増幅)。一方、下げるとマージが制限され、タスクサイズとリソース使用量が削減されます。 +- Introduced in: v4.0.0 + +##### size_tiered_min_level_size + +- Default: 131072 +- Type: Int +- Unit: バイト +- Is mutable: はい +- Description: サイズ階層型コンパクションポリシーにおける最小レベルのデータサイズ。この値よりも小さいrowsetは、直ちにデータコンパクションをトリガーします。 +- Introduced in: - + +##### small_dictionary_page_size + +- Default: 4096 +- Type: Int +- Unit: バイト +- Is mutable: いいえ +- Description: `BinaryPlainPageDecoder` が辞書(バイナリ/プレーン)ページを積極的に解析するかどうかを決定する閾値(バイト単位)。ページのエンコードサイズが `small_dictionary_page_size` 未満の場合、デコーダーはすべての文字列エントリをインメモリベクトル(`_parsed_datas`)に事前に解析し、ランダムアクセスとバッチ読み取りを高速化します。この値を上げると、より多くのページが事前に解析されます(アクセスごとのデコードオーバーヘッドを削減し、より大きな辞書の実効圧縮率を向上させる可能性があります)が、メモリ使用量と解析に費やされるCPUが増加します。過度に大きな値は全体的なパフォーマンスを低下させる可能性があります。メモリとアクセスレイテンシのトレードオフを測定した後でのみ調整してください。 +- Introduced in: v3.4.1, v3.5.0 + +##### snapshot_expire_time_sec + +- Default: 172800 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: スナップショットファイルの有効期限。 +- Introduced in: - + +##### stale_memtable_flush_time_sec + +- Default: 0 +- Type: long +- Unit: 秒 +- Is mutable: はい +- Description: 送信ジョブのメモリ使用量が高い場合、`stale_memtable_flush_time_sec` 秒よりも長く更新されていないメンテーブルは、メモリ負荷を軽減するためにフラッシュされます。この動作は、メモリ制限が近づいている場合(`limit_exceeded_by_ratio(70)` 以上)にのみ考慮されます。`LocalTabletsChannel` では、非常に高いメモリ使用量(`limit_exceeded_by_ratio(95)`)の場合、追加パスでサイズが `write_buffer_size / 4` を超えるメンテーブルをフラッシュする可能性があります。値 `0` は、この年齢ベースの古いメンテーブルフラッシングを無効にします(不変パーティションのメンテーブルは、アイドル状態または高メモリ時にすぐにフラッシュされます)。 +- Introduced in: v3.2.0 + +##### storage_flood_stage_left_capacity_bytes + +- Default: 107374182400 +- Type: Int +- Unit: バイト +- Is mutable: はい +- Description: すべてのBEディレクトリに残っているストレージスペースのハードリミット。BEストレージディレクトリの残りのストレージスペースがこの値よりも少なく、ストレージ使用量(パーセンテージ)が `storage_flood_stage_usage_percent` を超えている場合、ロードジョブとリストアジョブは拒否されます。構成を有効にするには、FE構成項目 `storage_usage_hard_limit_reserve_bytes` と一緒にこの項目を設定する必要があります。 +- Introduced in: - + +##### storage_flood_stage_usage_percent + +- 
Default: 95 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: すべてのBEディレクトリのストレージ使用率(パーセンテージ)のハードリミット。BEストレージディレクトリのストレージ使用率(パーセンテージ)がこの値を超え、残りのストレージスペースが `storage_flood_stage_left_capacity_bytes` 未満の場合、ロードジョブとリストアジョブは拒否されます。構成を有効にするには、FE構成項目 `storage_usage_hard_limit_percent` と一緒にこの項目を設定する必要があります。 +- Introduced in: - + +##### storage_high_usage_disk_protect_ratio + +- Default: 0.1 +- Type: double +- Unit: - +- Is mutable: はい +- Description: タブレット作成のストレージルートを選択する際、`StorageEngine` は候補ディスクを `disk_usage(0)` でソートし、平均使用量を計算します。使用量が (平均使用量 + `storage_high_usage_disk_protect_ratio`) より大きいディスクは、優先選択プールから除外されます(ランダム化された優先シャッフルには参加せず、したがって初期選択から延期されます)。この保護を無効にするには0に設定します。値は分数です(一般的な範囲は0.0〜1.0)。値が大きいほど、スケジューラは平均よりも高いディスクに対して寛容になります。 +- Introduced in: v3.2.0 + +##### storage_medium_migrate_count + +- Default: 3 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: ストレージメディアの移行(SATAからSSDへ)に使用されるスレッド数。 +- Introduced in: - + +##### storage_root_path + +- Default: `${STARROCKS_HOME}/storage` +- Type: String +- Unit: - +- Is mutable: いいえ +- Description: ストレージボリュームのディレクトリとメディア。例:`/data1,medium:hdd;/data2,medium:ssd`。 + - 複数のボリュームはセミコロン(`;`)で区切られます。 + - ストレージメディアがSSDの場合、ディレクトリの最後に `,medium:ssd` を追加します。 + - ストレージメディアがHDDの場合、ディレクトリの最後に `,medium:hdd` を追加します。 +- Introduced in: - + +##### sync_tablet_meta + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: はい +- Description: タブレットメタデータの同期を有効にするかどうかを制御するブール値。`true` は同期を有効にすることを示し、`false` は無効にすることを示します。 +- Introduced in: - + +##### tablet_map_shard_size + +- Default: 1024 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: タブレットマップのシャードサイズ。値は2の累乗である必要があります。 +- Introduced in: - + +##### tablet_max_pending_versions + +- Default: 1000 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: プライマリキータブレットで許容される保留バージョンの最大数。保留バージョンとは、コミットされたがまだ適用されていないバージョンを指します。 +- Introduced in: - + +##### tablet_max_versions + +- Default: 1000 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: タブレットに許可される最大バージョン数。バージョン数がこの値を超えると、新しい書き込みリクエストは失敗します。 +- Introduced in: - + +##### tablet_meta_checkpoint_min_interval_secs + +- Default: 600 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: TabletMetaチェックポイントのスレッドポーリング時間間隔。 +- Introduced in: - + +##### tablet_meta_checkpoint_min_new_rowsets_num + +- Default: 10 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: 最後のTabletMetaチェックポイント以降に作成する最小rowset数。 +- Introduced in: - + +##### tablet_rowset_stale_sweep_time_sec + +- Default: 1800 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: タブレット内の古いrowsetをスイープする時間間隔。 +- Introduced in: - + +##### tablet_stat_cache_update_interval_second + +- Default: 300 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: Tablet Stat Cacheが更新される時間間隔。 +- Introduced in: - + +##### tablet_writer_open_rpc_timeout_sec + +- Default: 300 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: リモートBEでタブレットライターを開くRPCのタイムアウト(秒単位)。値はミリ秒に変換され、オープン呼び出しを発行する際のリクエストタイムアウトとbrpcコントロールタイムアウトの両方に適用されます。ランタイムは、実効タイムアウトを `tablet_writer_open_rpc_timeout_sec` と全体的なロードタイムアウトの半分(つまり、min(`tablet_writer_open_rpc_timeout_sec`, `load_timeout_sec` / 2))の最小値として使用します。この値を、タイムリーな障害検出(小さすぎると早期のオープン失敗を引き起こす可能性あり)とBEがライターを初期化するのに十分な時間を与える(大きすぎるとエラー処理が遅れる)バランスをとるように設定してください。 +- Introduced in: v3.2.0 + +##### transaction_apply_worker_count + +- Default: 0 +- Type: Int +- Unit: スレッド +- Is mutable: はい +- Description: UpdateManagerの「update_apply」スレッドプール(トランザクションのrowsetを適用するプール、特にプライマリキーテーブルの場合)が使用するワーカースレッドの最大数を制御します。`>0` 
の値は固定された最大スレッド数を設定します。`0`(デフォルト)はプールのサイズがCPUコア数に等しいことを意味します。設定された値は起動時(`UpdateManager::init`)に適用され、`update-config` HTTPアクションを介して実行時に変更でき、プールの最大スレッドを更新します。これを調整して、適用同時実行性(スループット)を向上させるか、CPU/メモリの競合を制限してください。最小スレッド数とアイドルタイムアウトはそれぞれ `transaction_apply_thread_pool_num_min` と `transaction_apply_worker_idle_time_ms` によって制御されます。 +- Introduced in: v3.2.0 + +##### transaction_apply_worker_idle_time_ms + +- Default: 500 +- Type: int +- Unit: ミリ秒 +- Is mutable: いいえ +- Description: トランザクション/更新を適用するために使用されるUpdateManagerの「update_apply」スレッドプールのアイドルタイムアウト(ミリ秒単位)を設定します。この値は `MonoDelta::FromMilliseconds` を介して `ThreadPoolBuilder::set_idle_timeout` に渡されるため、このタイムアウトよりも長くアイドル状態が続くワーカースレッドは終了する可能性があります(プールの設定された最小スレッド数と最大スレッド数に従います)。値が低いほどリソースを早く解放しますが、バースト負荷の下ではスレッド作成/破棄のオーバーヘッドが増加します。値が高いほど、ベースラインのリソース使用量が増加するコストで、短時間のバーストの間はワーカーをウォームに保ちます。 +- Introduced in: v3.2.11 + +##### trash_file_expire_time_sec + +- Default: 86400 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: ゴミファイルをクリーンアップする時間間隔。v2.5.17、v3.0.9、v3.1.6以降、デフォルト値は259,200から86,400に変更されました。 +- Introduced in: - + +##### unused_rowset_monitor_interval + +- Default: 30 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: 期限切れのrowsetをクリーンアップする時間間隔。 +- Introduced in: - + +##### update_cache_expire_sec + +- Default: 360 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: Update Cacheの有効期限。 +- Introduced in: - + +##### update_compaction_check_interval_seconds + +- Default: 10 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: プライマリキーテーブルのコンパクションをチェックする時間間隔。 +- Introduced in: - + +##### update_compaction_delvec_file_io_amp_ratio + +- Default: 2 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: プライマリキーテーブルのDelvecファイルを含むrowsetのコンパクション優先度を制御するために使用されます。値が大きいほど優先度が高くなります。 +- Introduced in: - + +##### update_compaction_num_threads_per_disk + +- Default: 1 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: プライマリキーテーブルのディスクごとのコンパクションスレッド数。 +- Introduced in: - + +##### update_compaction_per_tablet_min_interval_seconds + +- Default: 120 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: プライマリキーテーブルの各タブレットでコンパクションがトリガーされる最小時間間隔。 +- Introduced in: - + +##### update_compaction_ratio_threshold + +- Default: 0.5 +- Type: Double +- Unit: - +- Is mutable: はい +- Description: 共有データクラスターのプライマリキーテーブルのコンパクションがマージできるデータの最大割合。単一のタブレットが過度に大きくなる場合、この値を縮小することを推奨します。 +- Introduced in: v3.1.5 + +##### update_compaction_result_bytes + +- Default: 1073741824 +- Type: Int +- Unit: バイト +- Is mutable: はい +- Description: プライマリキーテーブルの単一コンパクションの最大結果サイズ。 +- Introduced in: - + +##### update_compaction_size_threshold + +- Default: 268435456 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: プライマリキーテーブルのコンパクションスコアはファイルサイズに基づいて計算され、他のテーブルタイプとは異なります。このパラメータは、プライマリキーテーブルのコンパクションスコアを他のテーブルタイプと同様にし、ユーザーが理解しやすくするために使用できます。 +- Introduced in: - + +##### upload_worker_count + +- Default: 0 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: BEノード上のバックアップジョブのアップロードタスクの最大スレッド数。`0` は、BEが稼働しているマシンのCPUコア数に値を設定することを示します。 +- Introduced in: - + +##### vertical_compaction_max_columns_per_group + +- Default: 5 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: 垂直コンパクションのグループあたりの最大列数。 +- Introduced in: - + +### 共有データ + +##### download_buffer_size + +- Default: 4194304 +- Type: Int +- Unit: バイト +- Is mutable: はい +- Description: スナップショットファイルをダウンロードする際に使用されるインメモリコピーバッファのサイズ(バイト単位)。`SnapshotLoader::download` はこの値を `fs::copy` 
に転送ごとのチャンクサイズとして渡し、リモートのシーケンシャルファイルからローカルの書き込み可能ファイルに読み込む際に使用します。値が大きいほど、システムコール/I/Oオーバーヘッドが減少するため、高帯域幅リンクでのスループットが向上する可能性があります。値が小さいほど、アクティブな転送ごとのピークメモリ使用量が減少します。注:このパラメータはストリームごとのバッファサイズを制御し、ダウンロードスレッドの数は制御しません。総メモリ消費量 = `download_buffer_size` * `number_of_concurrent_downloads`。 +- Introduced in: v3.2.13 + +##### graceful_exit_wait_for_frontend_heartbeat + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: はい +- Description: グレースフルシャットダウンを完了する前に、SHUTDOWNステータスを示すフロントエンドハートビート応答を少なくとも1つ待機するかどうかを決定します。有効にすると、グレースフルシャットダウンプロセスは、ハートビートRPCを介してSHUTDOWN確認が応答されるまでアクティブなままになり、フロントエンドが2つの通常のハートビート間隔の間で終了状態を検出するのに十分な時間を確保します。 +- Introduced in: v3.4.5 + +##### lake_compaction_stream_buffer_size_bytes + +- Default: 1048576 +- Type: Int +- Unit: バイト +- Is mutable: はい +- Description: 共有データクラスターのクラウドネイティブテーブルコンパクション用のリーダーのリモートI/Oバッファサイズ。デフォルト値は1MBです。この値を増やすとコンパクションプロセスを高速化できます。 +- Introduced in: v3.2.3 + +##### lake_pk_compaction_max_input_rowsets + +- Default: 500 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: 共有データクラスターのプライマリキーテーブルのコンパクションタスクで許可される入力rowsetの最大数。このパラメータのデフォルト値は、v3.2.4およびv3.1.10以降 `5` から `1000` に、v3.3.1およびv3.2.9以降 `500` に変更されました。プライマリキーテーブルのサイズ階層型コンパクションポリシーが有効になった後(`enable_pk_size_tiered_compaction_strategy` を `true` に設定することで)、StarRocksは書き込み増幅を減らすために各コンパクションのrowset数を制限する必要がありません。したがって、このパラメータのデフォルト値は増加しています。 +- Introduced in: v3.1.8, v3.2.3 + +##### loop_count_wait_fragments_finish + +- Default: 2 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: BE/CNプロセスが終了する際に待機するループの数。各ループは固定された10秒間隔です。ループ待機を無効にするには `0` に設定できます。v3.4以降、この項目は変更可能になり、デフォルト値は `0` から `2` に変更されました。 +- Introduced in: v2.5 + +##### max_client_cache_size_per_host + +- Default: 10 +- Type: Int +- Unit: エントリ (キャッシュされたクライアントインスタンス)/ホスト +- Is mutable: いいえ +- Description: BE全体のクライアントキャッシュによって各リモートホストに対して保持されるキャッシュされたクライアントインスタンスの最大数。この単一の設定は、ExecEnv初期化中にBackendServiceClientCache、FrontendServiceClientCache、およびBrokerServiceClientCacheを作成する際に使用されるため、これらのキャッシュ全体でホストごとに保持されるクライアントスタブ/接続の数を制限します。この値を上げると、再接続とスタブ作成のオーバーヘッドが減少しますが、メモリとファイルディスクリプタの使用量が増加します。減らすとリソースは節約されますが、接続のチャーンが増加する可能性があります。値は起動時に読み取られ、実行時に変更することはできません。現在、1つの共有設定ですべてのクライアントキャッシュタイプを制御しています。後でキャッシュごとの個別の設定が導入される可能性があります。 +- Introduced in: v3.2.0 + +##### starlet_filesystem_instance_cache_capacity + +- Default: 10000 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: Starletファイルシステムインスタンスのキャッシュ容量。 +- Introduced in: v3.2.16, v3.3.11, v3.4.1 + +##### starlet_filesystem_instance_cache_ttl_sec + +- Default: 86400 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: Starletファイルシステムインスタンスのキャッシュ有効期限。 +- Introduced in: v3.3.15, 3.4.5 + +##### starlet_port + +- Default: 9070 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: BEおよびCN用の追加のエージェントサービスポート。 +- Introduced in: - + +##### starlet_star_cache_disk_size_percent + +- Default: 80 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: 共有データクラスターでData Cacheが使用できるディスク容量の最大割合。 +- Introduced in: v3.1 + +##### starlet_use_star_cache + +- Default: false in v3.1 and true from v3.2.3 +- Type: Boolean +- Unit: - +- Is mutable: はい +- Description: 共有データクラスターでData Cacheを有効にするかどうか。`true` はこの機能を有効にし、`false` は無効にすることを示します。v3.2.3以降、デフォルト値は `false` から `true` に変更されました。 +- Introduced in: v3.1 + +##### starlet_write_file_with_tag + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: はい +- Description: 共有データクラスターにおいて、オブジェクトストレージに書き込まれるファイルにオブジェクトストレージタグを付けて、カスタムファイル管理を便利にするかどうか。 +- Introduced in: v3.5.3 + +##### table_schema_service_max_retries + +- Default: 3 +- Type: Int +- Unit: - +- 
Is mutable: はい +- Description: Table Schema Serviceリクエストの最大再試行回数。 +- Introduced in: v4.1 + +### データレイク + +##### datacache_block_buffer_enable + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: いいえ +- Description: Data Cache効率を最適化するためにBlock Bufferを有効にするかどうか。Block Bufferが有効になっている場合、システムはData CacheからBlockデータを読み取り、一時バッファにキャッシュします。これにより、頻繁なキャッシュ読み取りによって引き起こされる余分なオーバーヘッドが削減されます。 +- Introduced in: v3.2.0 + +##### datacache_disk_adjust_interval_seconds + +- Default: 10 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: Data Cacheの自動容量スケーリングの間隔。定期的に、システムはキャッシュディスク使用量をチェックし、必要に応じて自動スケーリングをトリガーします。 +- Introduced in: v3.3.0 + +##### datacache_disk_idle_seconds_for_expansion + +- Default: 7200 +- Type: Int +- Unit: 秒 +- Is mutable: はい +- Description: Data Cacheの自動拡張の最小待機時間。ディスク使用量が `datacache_disk_low_level` をこの期間よりも長く下回っている場合にのみ、自動スケーリングアップがトリガーされます。 +- Introduced in: v3.3.0 + +##### datacache_disk_size + +- Default: 0 +- Type: String +- Unit: - +- Is mutable: はい +- Description: 単一ディスクにキャッシュできるデータの最大量。パーセンテージ(例:`80%`)または物理的な制限(例:`2T`、`500G`)として設定できます。たとえば、2つのディスクを使用し、`datacache_disk_size` パラメータの値を `21474836480`(20 GB)に設定した場合、これら2つのディスクに最大40 GBのデータをキャッシュできます。デフォルト値は `0` で、メモリのみを使用してデータをキャッシュすることを示します。 +- Introduced in: - + +##### datacache_enable + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: いいえ +- Description: Data Cacheを有効にするかどうか。`true` はData Cacheが有効であることを示し、`false` はData Cacheが無効であることを示します。v3.3以降、デフォルト値は `true` に変更されました。 +- Introduced in: - + +##### datacache_eviction_policy + +- Default: slru +- Type: String +- Unit: - +- Is mutable: いいえ +- Description: Data Cacheの退去ポリシー。有効な値:`lru`(最小最近使用)および `slru`(セグメント化LRU)。 +- Introduced in: v3.4.0 + +##### datacache_inline_item_count_limit + +- Default: 130172 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: Data Cacheのインラインキャッシュ項目の最大数。特に小さなキャッシュブロックの場合、Data Cacheはそれらを `inline` モードで格納し、ブロックデータとメタデータを一緒にメモリにキャッシュします。 +- Introduced in: v3.4.0 + +##### datacache_mem_size + +- Default: 0 +- Type: String +- Unit: - +- Is mutable: はい +- Description: メモリにキャッシュできるデータの最大量。パーセンテージ(例:`10%`)または物理的な制限(例:`10G`、`21474836480`)として設定できます。 +- Introduced in: - + +##### datacache_min_disk_quota_for_adjustment + +- Default: 10737418240 +- Type: Int +- Unit: バイト +- Is mutable: はい +- Description: Data Cacheの自動スケーリングのための最小有効容量。システムがキャッシュ容量をこの値よりも小さく調整しようとすると、キャッシュ容量は直接 `0` に設定され、キャッシュ容量が不足することによる頻繁なキャッシュの満杯と退去による最適ではないパフォーマンスが防止されます。 +- Introduced in: v3.3.0 + +##### disk_high_level + +- Default: 90 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: キャッシュ容量の自動スケーリングアップをトリガーするディスク使用率の上限(パーセンテージ)。ディスク使用率がこの値を超えると、システムは自動的にData Cacheからキャッシュデータを削除します。v3.4.0以降、デフォルト値は `80` から `90` に変更されました。この項目はv4.0以降、`datacache_disk_high_level` から `disk_high_level` に名称変更されました。 +- Introduced in: v3.3.0 + +##### disk_low_level + +- Default: 60 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: キャッシュ容量の自動スケーリングダウンをトリガーするディスク使用率の下限(パーセンテージ)。ディスク使用率が `datacache_disk_idle_seconds_for_expansion` で指定された期間、この値を下回ったままになり、Data Cacheに割り当てられたスペースが完全に利用されている場合、システムは自動的に上限を増やすことでキャッシュ容量を拡張します。この項目はv4.0以降、`datacache_disk_low_level` から `disk_low_level` に名称変更されました。 +- Introduced in: v3.3.0 + +##### disk_safe_level + +- Default: 80 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: Data Cacheのディスク使用率の安全レベル(パーセンテージ)。Data Cacheが自動スケーリングを実行する際、システムはディスク使用率がこの値にできるだけ近づくようにキャッシュ容量を調整します。v3.4.0以降、デフォルト値は `70` から `80` に変更されました。この項目はv4.0以降、`datacache_disk_safe_level` から `disk_safe_level` に名称変更されました。 +- Introduced in: v3.3.0 + +##### enable_connector_sink_spill + +- 
Default: true +- Type: Boolean +- Unit: - +- Is mutable: はい +- Description: 外部テーブルへの書き込みでスピリングを有効にするかどうか。この機能を有効にすると、メモリ不足時に外部テーブルへの書き込みによって多数の小さなファイルが生成されるのを防ぐことができます。現在、この機能はIcebergテーブルへの書き込みのみをサポートしています。 +- Introduced in: v4.0.0 + +##### enable_datacache_disk_auto_adjust + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: はい +- Description: Data Cacheディスク容量の自動スケーリングを有効にするかどうか。有効にすると、システムは現在のディスク使用率に基づいてキャッシュ容量を動的に調整します。この項目はv4.0以降、`datacache_auto_adjust_enable` から `enable_datacache_disk_auto_adjust` に名称変更されました。 +- Introduced in: v3.3.0 + +##### jdbc_connection_idle_timeout_ms + +- Default: 600000 +- Type: Int +- Unit: ミリ秒 +- Is mutable: いいえ +- Description: JDBC接続プールでアイドル状態の接続が期限切れになるまでの時間。JDBC接続プールで接続のアイドル時間がこの値を超えると、接続プールは構成項目 `jdbc_minimum_idle_connections` で指定された数を超えるアイドル接続を閉じます。 +- Introduced in: - + +##### jdbc_connection_pool_size + +- Default: 8 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: JDBC接続プールのサイズ。各BEノードでは、同じ `jdbc_url` を持つ外部テーブルにアクセスするクエリは同じ接続プールを共有します。 +- Introduced in: - + +##### jdbc_minimum_idle_connections + +- Default: 1 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: JDBC接続プール内のアイドル接続の最小数。 +- Introduced in: - + +##### lake_clear_corrupted_cache_data + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: はい +- Description: 共有データクラスターで破損したデータキャッシュをシステムがクリアすることを許可するかどうか。 +- Introduced in: v3.4 + +##### lake_clear_corrupted_cache_meta + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: はい +- Description: 共有データクラスターで破損したメタデータキャッシュをシステムがクリアすることを許可するかどうか。 +- Introduced in: v3.3 + +##### lake_enable_vertical_compaction_fill_data_cache + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: はい +- Description: 垂直コンパクションタスクが共有データクラスターのローカルディスクにデータをキャッシュすることを許可するかどうか。 +- Introduced in: v3.1.7, v3.2.3 + +##### lake_replication_read_buffer_size + +- Default: 16777216 +- Type: Long +- Unit: バイト +- Is mutable: はい +- Description: Lakeレプリケーション中にlakeセグメントファイルをダウンロードする際に使用される読み取りバッファサイズ。この値はリモートファイルを読み取るための読み取りごとの割り当てを決定します。実装では、この設定と1MBの最小値のうち大きい方が使用されます。値が大きいほど読み取り呼び出しの回数が減り、スループットが向上する可能性がありますが、同時ダウンロードごとに使用されるメモリが増加します。値が小さいほどメモリ使用量は減少しますが、I/O呼び出しのコストが増加します。ネットワーク帯域幅、ストレージI/O特性、および並列レプリケーションスレッド数に応じて調整してください。 +- Introduced in: - + +##### lake_service_max_concurrency + +- Default: 0 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: 共有データクラスターのRPCリクエストの最大同時実行性。この閾値に達すると、着信リクエストは拒否されます。この項目が `0` に設定されている場合、同時実行性に制限はありません。 +- Introduced in: - + +##### max_hdfs_scanner_num + +- Default: 50 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: `ConnectorScanNode` が持つことができる同時実行コネクタ(HDFS/リモート)スキャナーの最大数を制限します。スキャンの起動中、ノードは推定同時実行性(メモリ、チャンクサイズ、`scanner_row_num` に基づく)を計算し、この値で上限を設定して、予約するスキャナーとチャンクの数、および起動するスキャナースレッドの数を決定します。また、実行時に保留中のスキャナーをスケジュールする際(過剰なサブスクリプションを避けるため)、およびファイルハンドル制限を考慮して再送信できる保留中のスキャナーの数を決定する際にも参照されます。これを減らすと、スレッド、メモリ、およびオープンファイルの負荷が減少しますが、スループットが低下する可能性があります。増やすと、同時実行性とリソース使用量が増加します。 +- Introduced in: v3.2.0 + +##### query_max_memory_limit_percent + +- Default: 90 +- Type: Int +- Unit: - +- Is mutable: いいえ +- Description: Query Poolが使用できる最大メモリ。Processメモリ制限のパーセンテージとして表されます。 +- Introduced in: v3.1.0 + +##### rocksdb_max_write_buffer_memory_bytes + +- Default: 1073741824 +- Type: Int64 +- Unit: - +- Is mutable: いいえ +- Description: RocksDBのメタの書き込みバッファの最大サイズです。デフォルトは1GBです。 +- Introduced in: v3.5.0 + +##### rocksdb_write_buffer_memory_percent + +- Default: 5 +- Type: Int64 +- Unit: - +- Is mutable: いいえ +- Description: 
RocksDBのメタの書き込みバッファのメモリ割合です。デフォルトはシステムメモリの5%です。ただし、最終的に計算される書き込みバッファのサイズは、64MBを下回ることも、1GB(`rocksdb_max_write_buffer_memory_bytes`)を超えることもありません。 +- Introduced in: v3.5.0 + +### その他 + +##### default_mv_resource_group_concurrency_limit + +- Default: 0 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: リソースグループ `default_mv_wg` 内のマテリアライズドビューの更新タスクの最大同時実行性(BEノードごと)。デフォルト値 `0` は制限がないことを示します。 +- Introduced in: v3.1 + +##### default_mv_resource_group_cpu_limit + +- Default: 1 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: リソースグループ `default_mv_wg` 内のマテリアライズドビューの更新タスクが使用できる最大CPUコア数(BEノードごと)。 +- Introduced in: v3.1 + +##### default_mv_resource_group_memory_limit + +- Default: 0.8 +- Type: Double +- Unit: - +- Is mutable: はい +- Description: リソースグループ `default_mv_wg` 内のマテリアライズドビューの更新タスクが使用できる最大メモリ割合(BEノードごと)。デフォルト値はメモリの80%を示します。 +- Introduced in: v3.1 + +##### default_mv_resource_group_spill_mem_limit_threshold + +- Default: 0.8 +- Type: Double +- Unit: - +- Is mutable: はい +- Description: リソースグループ `default_mv_wg` 内のマテリアライズドビューの更新タスクが中間結果のスピリングをトリガーする前のメモリ使用量閾値。デフォルト値はメモリの80%を示します。 +- Introduced in: v3.1 + +##### enable_resolve_hostname_to_ip_in_load_error_url + +- Default: false +- Type: Boolean +- Unit: - +- Is mutable: はい +- Description: ロードジョブのエラーURL(`error_urls`)に含まれるホスト名をIPアドレスに解決するかどうか。デバッグの際に、FEハートビートから取得した元のホスト名をそのまま使用するか、環境の要件に応じてIPアドレスへの解決を強制するかを選択できます。 + - `true`: ホスト名をIPに解決します。 + - `false` (デフォルト): エラーURLに元のホスト名を保持します。 +- Introduced in: v4.0.1 + +##### enable_retry_apply + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: はい +- Description: 有効にすると、再試行可能に分類されたタブレット適用失敗(例えば、一時的なメモリ制限エラー)は、タブレットを即座にエラーとマークする代わりに再試行のために再スケジュールされます。`TabletUpdates` の再試行パスは、`retry_apply_interval_second` に現在の失敗回数を乗算した値(最大600秒にクランプ)を使用して次の試行をスケジュールするため、バックオフは連続する失敗とともに増加します。明示的に再試行不能なエラー(例えば、破損)は再試行を迂回し、適用プロセスを即座にエラー状態に移行させます。再試行は、全体的なタイムアウト/最終条件に達するまで続き、その後、適用はエラー状態に入ります。これをオフにすると、失敗した適用タスクの自動再スケジュールが無効になり、失敗した適用は再試行なしでエラー状態に移行します。 +- Introduced in: v3.2.9 + +##### enable_token_check + +- Default: true +- Type: Boolean +- Unit: - +- Is mutable: はい +- Description: トークンチェックを有効にするかどうかを制御するブール値。`true` はトークンチェックを有効にすることを示し、`false` は無効にすることを示します。 +- Introduced in: - + +##### es_scroll_keepalive + +- Default: 5m +- Type: String +- Unit: 分 (サフィックス付き文字列、例: "5m") +- Is mutable: いいえ +- Description: スクロール検索コンテキストのためにElasticsearchに送信されるキープアライブ期間。この値は、初期スクロールURL(`?scroll=`)の構築時および後続のスクロールリクエストの送信時(`ESScrollQueryBuilder` 経由)にそのまま使用されます(例:「5m」)。これは、ES側でES検索コンテキストがガベージコレクションされるまでの時間を制御します。長く設定するとスクロールコンテキストがより長くアクティブに保たれますが、ESクラスターのリソース使用期間が長くなります。この値はESスキャンリーダーによって起動時に読み取られ、実行時に変更することはできません。 +- Introduced in: v3.2.0 + +##### load_replica_status_check_interval_ms_on_failure + +- Default: 2000 +- Type: Int +- Unit: ミリ秒 +- Is mutable: はい +- Description: 以前のチェックRPCが失敗した場合に、セカンダリレプリカがプライマリレプリカのステータスをチェックする間隔。 +- Introduced in: v3.5.1 + +##### load_replica_status_check_interval_ms_on_success + +- Default: 15000 +- Type: Int +- Unit: ミリ秒 +- Is mutable: はい +- Description: 以前のチェックRPCが成功した場合に、セカンダリレプリカがプライマリレプリカのステータスをチェックする間隔。 +- Introduced in: v3.5.1 + +##### max_length_for_bitmap_function + +- Default: 1000000 +- Type: Int +- Unit: バイト +- Is mutable: いいえ +- Description: ビットマップ関数の入力値の最大長。 +- Introduced in: - + +##### max_length_for_to_base64 + +- Default: 200000 +- Type: Int +- Unit: バイト +- Is mutable: いいえ +- Description: to_base64() 関数の入力値の最大長。 +- Introduced in: - + +##### memory_high_level + +- Default: 75 +- Type: Long +- Unit: パーセント +- Is mutable: はい +- Description: 
プロセスメモリ制限のパーセンテージとして表される高水域メモリ閾値。総メモリ消費がこのパーセンテージを超えると、BEは徐々にメモリを解放し始め(現在はデータキャッシュと更新キャッシュを削除することで)、負荷を軽減します。モニターはこの値を使用して `memory_high = mem_limit * memory_high_level / 100` を計算し、消費が `memory_high` を超えた場合、GCアドバイザによってガイドされた制御された削除を実行します。消費が `memory_urgent_level`(別の設定)を超えた場合、より積極的な即時削減が行われます。この値は、閾値を超えた場合に特定のメモリ集約型操作(例えば、プライマリキーのプリロード)を無効にするためにも参照されます。`memory_urgent_level` との検証(`memory_urgent_level` > `memory_high_level`、`memory_high_level` >= 1、`memory_urgent_level` <= 100)を満たす必要があります。 +- Introduced in: v3.2.0 + +##### report_exec_rpc_request_retry_num + +- Default: 10 +- Type: Int +- Unit: - +- Is mutable: はい +- Description: FEに実行RPCリクエストを報告するためのRPCリクエストの再試行回数。デフォルト値は10で、これはRPCリクエストが失敗した場合、そのフラグメントインスタンスがRPCを完了する限り、10回再試行されることを意味します。実行RPCリクエストの報告はロードジョブにとって重要であり、あるフラグメントインスタンスの完了報告が失敗した場合、ロードジョブはタイムアウトするまでハングします。 +- Introduced in: - + +##### sleep_one_second + +- Default: 1 +- Type: Int +- Unit: 秒 +- Is mutable: いいえ +- Description: BEエージェントワーカースレッドが、マスターアドレス/ハートビートがまだ利用できない場合や、短時間の再試行/バックオフが必要な場合に、1秒間の一時停止として使用する小さなグローバルスリープ間隔(秒単位)。コードベースでは、複数のレポートワーカープール(例:ReportDiskStateTaskWorkerPool、ReportOlapTableTaskWorkerPool、ReportWorkgroupTaskWorkerPool)によって参照され、ビジーウェイトを回避し、再試行中のCPU消費を削減します。この値を増やすと、再試行頻度とマスター可用性への応答性が低下します。減らすと、ポーリングレートとCPU使用量が増加します。応答性とリソース使用量のトレードオフを意識してのみ調整してください。 +- Introduced in: v3.2.0 + +##### small_file_dir + +- Default: `${STARROCKS_HOME}/lib/small_file/` +- Type: String +- Unit: - +- Is mutable: いいえ +- Description: ファイルマネージャーによってダウンロードされたファイルを保存するために使用されるディレクトリ。 +- Introduced in: - + +##### upload_buffer_size + +- Default: 4194304 +- Type: Int +- Unit: バイト +- Is mutable: はい +- Description: スナップショットファイルをリモートストレージ(ブローカーまたは直接FileSystem)にアップロードする際のファイルコピー操作で使用されるバッファサイズ(バイト単位)。アップロードパス(`snapshot_loader.cpp`)では、この値が `fs::copy` に各アップロードストリームの読み取り/書き込みチャンクサイズとして渡されます。デフォルトは4MiBです。この値を増やすと、高遅延または高帯域幅リンクでのスループットが向上する可能性がありますが、同時アップロードごとのメモリ使用量が増加します。減らすと、ストリームごとのメモリは減少しますが、転送効率が低下する可能性があります。`upload_worker_count` および利用可能な総メモリと合わせて調整してください。 +- Introduced in: v3.2.13 + +##### user_function_dir + +- Default: `${STARROCKS_HOME}/lib/udf` +- Type: String +- Unit: - +- Is mutable: いいえ +- Description: ユーザー定義関数(UDF)を保存するために使用されるディレクトリ。 +- Introduced in: - + +##### web_log_bytes + +- Default: 1048576 (1 MB) +- Type: long +- Unit: バイト +- Is mutable: いいえ +- Description: INFOログファイルから読み取り、BEデバッグウェブサーバーのログページに表示する最大バイト数。ハンドラはこの値を使用してシークオフセットを計算し(最後のNバイトを表示)、非常に大きなログファイルの読み取りまたは提供を回避します。ログファイルがこの値よりも小さい場合、ファイル全体が表示されます。注:現在の実装では、INFOログを読み取って提供するコードはコメントアウトされており、ハンドラはINFOログファイルを開けないと報告するため、ログ提供コードが有効になっていない限り、このパラメータは効果がない可能性があります。 +- Introduced in: v3.2.0 + +### 削除されたパラメータ + +##### enable_bit_unpack_simd + +- Status: 削除済み +- Description: このパラメータは削除されました。ビットアンパックSIMD選択は、現在コンパイル時(AVX2/BMI2)に処理され、デフォルトの実装に自動的にフォールバックします。 +- Removed in: - diff --git a/docs/ja/administration/management/Backup_and_restore.md b/docs/ja/administration/management/Backup_and_restore.md new file mode 100644 index 0000000..17f70c1 --- /dev/null +++ b/docs/ja/administration/management/Backup_and_restore.md @@ -0,0 +1,650 @@ +--- +displayed_sidebar: docs +--- + +# データのバックアップとリストア + +このトピックでは、StarRocksでのデータのバックアップとリストア、または新しいStarRocksクラスターへのデータ移行について説明します。 + +StarRocksは、データをスナップショットとしてリモートストレージシステムにバックアップし、そのデータを任意のStarRocksクラスターにリストアすることをサポートしています。 + +v3.4.0以降、StarRocksはより多くのオブジェクトをサポートし、柔軟性を向上させるために構文をリファクタリングすることにより、BACKUPおよびRESTOREの機能を強化しました。 + +StarRocksは以下のリモートストレージシステムをサポートしています。 + +- Apache™ Hadoop® (HDFS) クラスター +- AWS S3 +- Google GCS +- MinIO + +StarRocksは以下のオブジェクトのバックアップをサポートしています。 + +- 
内部データベース、テーブル(全てのタイプとパーティショニング戦略)、およびパーティション +- 外部カタログのメタデータ(v3.4.0以降でサポート) +- 同期マテリアライズドビューと非同期マテリアライズドビュー +- 論理ビュー(v3.4.0以降でサポート) +- ユーザー定義関数 (UDF)(v3.4.0以降でサポート) + +> **NOTE** +> +> Shared-data StarRocksクラスターはデータのBACKUPとRESTOREをサポートしていません。 + +## リポジトリの作成 + +データをバックアップする前に、リポジトリを作成する必要があります。これはリモートストレージシステムにデータスナップショットを保存するために使用されます。StarRocksクラスター内に複数のリポジトリを作成できます。詳細な手順については、[CREATE REPOSITORY](../../sql-reference/sql-statements/backup_restore/CREATE_REPOSITORY.md)を参照してください。 + +- HDFSにリポジトリを作成する + +以下の例は、HDFSクラスターに`test_repo`という名前のリポジトリを作成します。 + +```SQL +CREATE REPOSITORY test_repo +WITH BROKER +ON LOCATION "hdfs://:/repo_dir/backup" +PROPERTIES( + "username" = "", + "password" = "" +); +``` + +- AWS S3にリポジトリを作成する + + AWS S3へのアクセス認証方法として、IAMユーザーベースの認証情報(Access KeyとSecret Key)、Instance Profile、またはAssumed Roleを選択できます。 + + - 以下の例は、IAMユーザーベースの認証情報を認証方法として使用し、AWS S3バケット`bucket_s3`に`test_repo`という名前のリポジトリを作成します。 + + ```SQL + CREATE REPOSITORY test_repo + WITH BROKER + ON LOCATION "s3a://bucket_s3/backup" + PROPERTIES( + "aws.s3.access_key" = "XXXXXXXXXXXXXXXXX", + "aws.s3.secret_key" = "yyyyyyyyyyyyyyyyyyyyyyyy", + "aws.s3.region" = "us-east-1" + ); + ``` + + - 以下の例は、Instance Profileを認証方法として使用し、AWS S3バケット`bucket_s3`に`test_repo`という名前のリポジトリを作成します。 + + ```SQL + CREATE REPOSITORY test_repo + WITH BROKER + ON LOCATION "s3a://bucket_s3/backup" + PROPERTIES( + "aws.s3.use_instance_profile" = "true", + "aws.s3.region" = "us-east-1" + ); + ``` + + - 以下の例は、Assumed Roleを認証方法として使用し、AWS S3バケット`bucket_s3`に`test_repo`という名前のリポジトリを作成します。 + + ```SQL + CREATE REPOSITORY test_repo + WITH BROKER + ON LOCATION "s3a://bucket_s3/backup" + PROPERTIES( + "aws.s3.use_instance_profile" = "true", + "aws.s3.iam_role_arn" = "arn:aws:iam::xxxxxxxxxx:role/yyyyyyyy", + "aws.s3.region" = "us-east-1" + ); + ``` + +> **NOTE** +> +> StarRocksは、S3Aプロトコルにのみ従ってAWS S3にリポジトリを作成することをサポートしています。したがって、AWS S3にリポジトリを作成する際は、`ON LOCATION`でリポジトリのロケーションとして渡すS3 URIの`s3://`を`s3a://`に置き換える必要があります。 + +- Google GCSにリポジトリを作成する + +以下の例は、Google GCSバケット`bucket_gcs`に`test_repo`という名前のリポジトリを作成します。 + +```SQL +CREATE REPOSITORY test_repo +WITH BROKER +ON LOCATION "s3a://bucket_gcs/backup" +PROPERTIES( + "fs.s3a.access.key" = "xxxxxxxxxxxxxxxxxxxx", + "fs.s3a.secret.key" = "yyyyyyyyyyyyyyyyyyyy", + "fs.s3a.endpoint" = "storage.googleapis.com" +); +``` + +> **NOTE** +> +> - StarRocksは、S3Aプロトコルにのみ従ってGoogle GCSにリポジトリを作成することをサポートしています。したがって、Google GCSにリポジトリを作成する際は、`ON LOCATION`でリポジトリのロケーションとして渡すGCS URIのプレフィックスを`s3a://`に置き換える必要があります。 +> - エンドポイントアドレスに`https`を指定しないでください。 + +- MinIOにリポジトリを作成する + +以下の例は、MinIOバケット`bucket_minio`に`test_repo`という名前のリポジトリを作成します。 + +```SQL +CREATE REPOSITORY test_repo +WITH BROKER +ON LOCATION "s3://bucket_minio/backup" +PROPERTIES( + "aws.s3.access_key" = "XXXXXXXXXXXXXXXXX", + "aws.s3.secret_key" = "yyyyyyyyyyyyyyyyy", + "aws.s3.endpoint" = "http://minio:9000" +); +``` + +リポジトリが作成された後、[SHOW REPOSITORIES](../../sql-reference/sql-statements/backup_restore/SHOW_REPOSITORIES.md)を使用してリポジトリを確認できます。データをリストアした後、[DROP REPOSITORY](../../sql-reference/sql-statements/backup_restore/DROP_REPOSITORY.md)を使用してStarRocks内のリポジトリを削除できます。ただし、リモートストレージシステムにバックアップされたデータスナップショットはStarRocks経由で削除することはできません。リモートストレージシステムで手動で削除する必要があります。 + +## データのバックアップ + +リポジトリが作成されたら、データスナップショットを作成し、リモートリポジトリにバックアップする必要があります。詳細な手順については、[BACKUP](../../sql-reference/sql-statements/backup_restore/BACKUP.md)を参照してください。BACKUPは非同期操作です。[SHOW BACKUP](../../sql-reference/sql-statements/backup_restore/SHOW_BACKUP.md)を使用してBACKUPジョブのステータスを確認したり、[CANCEL 
BACKUP](../../sql-reference/sql-statements/backup_restore/CANCEL_BACKUP.md)を使用してBACKUPジョブをキャンセルしたりできます。 + +StarRocksは、データベース、テーブル、またはパーティションの粒度でのFULLバックアップをサポートしています。 + +テーブルに大量のデータを保存している場合、パーティションごとにデータをバックアップおよびリストアすることをお勧めします。これにより、ジョブの失敗時の再試行コストを削減できます。定期的に増分データをバックアップする必要がある場合は、テーブルに[パーティショニングプラン](../../table_design/data_distribution/Data_distribution.md#partitioning)を設定し、毎回新しいパーティションのみをバックアップできます。 + +### データベースのバックアップ + +データベースに対して完全なBACKUPを実行すると、そのデータベース内のすべてのテーブル、同期および非同期マテリアライズドビュー、論理ビュー、およびUDFがバックアップされます。 + +以下の例は、データベース`sr_hub`をスナップショット`sr_hub_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 + +```SQL +-- v3.4.0以降でサポート。 +BACKUP DATABASE sr_hub SNAPSHOT sr_hub_backup +TO test_repo; + +-- 以前のバージョンでの構文と互換性があります。 +BACKUP SNAPSHOT sr_hub.sr_hub_backup +TO test_repo; +``` + +### テーブルのバックアップ + +StarRocksは、全てのタイプとパーティショニング戦略のテーブルのバックアップとリストアをサポートしています。テーブルに対して完全なBACKUPを実行すると、そのテーブルと、その上に構築された同期マテリアライズドビューがバックアップされます。 + +以下の例は、データベース`sr_hub`からテーブル`sr_member`をスナップショット`sr_member_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 + +```SQL +-- v3.4.0以降でサポート。 +BACKUP DATABASE sr_hub SNAPSHOT sr_member_backup +TO test_repo +ON (TABLE sr_member); + +-- 以前のバージョンでの構文と互換性があります。 +BACKUP SNAPSHOT sr_hub.sr_member_backup +TO test_repo +ON (sr_member); +``` + +以下の例は、データベース`sr_hub`から2つのテーブル`sr_member`と`sr_pmc`をスナップショット`sr_core_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_core_backup +TO test_repo +ON (TABLE sr_member, TABLE sr_pmc); +``` + +以下の例は、データベース`sr_hub`からすべてのテーブルをスナップショット`sr_all_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_all_backup +TO test_repo +ON (ALL TABLES); +``` + +### パーティションのバックアップ + +以下の例は、データベース`sr_hub`のテーブル`sr_member`のパーティション`p1`をスナップショット`sr_par_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 + +```SQL +-- v3.4.0以降でサポート。 +BACKUP DATABASE sr_hub SNAPSHOT sr_par_backup +TO test_repo +ON (TABLE sr_member PARTITION (p1)); + +-- 以前のバージョンでの構文と互換性があります。 +BACKUP SNAPSHOT sr_hub.sr_par_backup +TO test_repo +ON (sr_member PARTITION (p1)); +``` + +複数のパーティション名をコンマ (`,`) で区切って指定することで、パーティションを一括でバックアップできます。 + +### マテリアライズドビューのバックアップ + +同期マテリアライズドビューは、ベーステーブルのBACKUP操作と同時にバックアップされるため、手動でバックアップする必要はありません。 + +非同期マテリアライズドビューは、それが属するデータベースのBACKUP操作と同時にバックアップできます。手動でバックアップすることもできます。 + +以下の例は、データベース`sr_hub`からマテリアライズドビュー`sr_mv1`をスナップショット`sr_mv1_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_mv1_backup +TO test_repo +ON (MATERIALIZED VIEW sr_mv1); +``` + +以下の例は、データベース`sr_hub`から2つのマテリアライズドビュー`sr_mv1`と`sr_mv2`をスナップショット`sr_mv2_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_mv2_backup +TO test_repo +ON (MATERIALIZED VIEW sr_mv1, MATERIALIZED VIEW sr_mv2); +``` + +以下の例は、データベース`sr_hub`からすべてのマテリアライズドビューをスナップショット`sr_mv3_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_mv3_backup +TO test_repo +ON (ALL MATERIALIZED VIEWS); +``` + +### 論理ビューのバックアップ + +以下の例は、データベース`sr_hub`から論理ビュー`sr_view1`をスナップショット`sr_view1_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_view1_backup +TO test_repo +ON (VIEW sr_view1); +``` + +以下の例は、データベース`sr_hub`から2つの論理ビュー`sr_view1`と`sr_view2`をスナップショット`sr_view2_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_view2_backup +TO test_repo +ON (VIEW sr_view1, VIEW sr_view2); +``` + 
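+前述のとおり、BACKUPは非同期操作です。ジョブの完了を待ってから次の操作に進みたい場合は、以下のように [SHOW BACKUP](../../sql-reference/sql-statements/backup_restore/SHOW_BACKUP.md) でステータスを確認できます(一般に、`State` 列が `FINISHED` になればジョブは完了です)。 + +```SQL +-- データベース sr_hub で実行中または直近の BACKUP ジョブのステータスを確認します。 +SHOW BACKUP FROM sr_hub; +``` +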
+以下の例は、データベース`sr_hub`からすべての論理ビューをスナップショット`sr_view3_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_view3_backup +TO test_repo +ON (ALL VIEWS); +``` + +### UDFのバックアップ + +以下の例は、データベース`sr_hub`からUDF`sr_udf1`をスナップショット`sr_udf1_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_udf1_backup +TO test_repo +ON (FUNCTION sr_udf1); +``` + +以下の例は、データベース`sr_hub`から2つのUDF`sr_udf1`と`sr_udf2`をスナップショット`sr_udf2_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_udf2_backup +TO test_repo +ON (FUNCTION sr_udf1, FUNCTION sr_udf2); +``` + +以下の例は、データベース`sr_hub`からすべてのUDFをスナップショット`sr_udf3_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_udf3_backup +TO test_repo +ON (ALL FUNCTIONS); +``` + +### 外部カタログのメタデータのバックアップ + +以下の例は、外部カタログ`iceberg`のメタデータをスナップショット`iceberg_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 + +```SQL +BACKUP EXTERNAL CATALOG (iceberg) SNAPSHOT iceberg_backup +TO test_repo; +``` + +以下の例は、2つの外部カタログ`iceberg`と`hive`のメタデータをスナップショット`iceberg_hive_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 + +```SQL +BACKUP EXTERNAL CATALOGS (iceberg, hive) SNAPSHOT iceberg_hive_backup +TO test_repo; +``` + +以下の例は、すべての外部カタログのメタデータをスナップショット`all_catalog_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 + +```SQL +BACKUP ALL EXTERNAL CATALOGS SNAPSHOT all_catalog_backup +TO test_repo; +``` + +外部カタログに対するBACKUP操作をキャンセルするには、次のステートメントを実行します。 + +```SQL +CANCEL BACKUP FOR EXTERNAL CATALOG; +``` + +## データのリストア + +リモートストレージシステムにバックアップされたデータスナップショットを、現在のStarRocksクラスターまたは他のStarRocksクラスターにリストアして、データを回復または移行できます。 + +**スナップショットからオブジェクトをリストアする際は、スナップショットのタイムスタンプを指定する必要があります。** + +リモートストレージシステム内のデータスナップショットをリストアするには、[RESTORE](../../sql-reference/sql-statements/backup_restore/RESTORE.md)ステートメントを使用します。 + +RESTOREは非同期操作です。[SHOW RESTORE](../../sql-reference/sql-statements/backup_restore/SHOW_RESTORE.md)を使用してRESTOREジョブのステータスを確認したり、[CANCEL RESTORE](../../sql-reference/sql-statements/backup_restore/CANCEL_RESTORE.md)を使用してRESTOREジョブをキャンセルしたりできます。 + +### (オプション)新しいクラスターにリポジトリを作成する + +データを別のStarRocksクラスターに移行するには、ターゲットクラスターで同じ**リポジトリ名**と**ロケーション**を持つリポジトリを作成する必要があります。そうしないと、以前にバックアップされたデータスナップショットを表示できません。詳細については、[リポジトリの作成](#create-a-repository)を参照してください。 + +### スナップショットタイムスタンプの取得 + +データをリストアする前に、[SHOW SNAPSHOT](../../sql-reference/sql-statements/backup_restore/SHOW_SNAPSHOT.md)を使用してリポジトリ内のスナップショットを確認し、タイムスタンプを取得できます。 + +以下の例は、`test_repo`内のスナップショット情報を確認します。 + +```Plain +mysql> SHOW SNAPSHOT ON test_repo; ++------------------+-------------------------+--------+ +| Snapshot | Timestamp | Status | ++------------------+-------------------------+--------+ +| sr_member_backup | 2023-02-07-14-45-53-143 | OK | ++------------------+-------------------------+--------+ +1 row in set (1.16 sec) +``` + +### データベースのリストア + +以下の例は、スナップショット`sr_hub_backup`内のデータベース`sr_hub`をターゲットクラスター内のデータベース`sr_hub`にリストアします。スナップショットにデータベースが存在しない場合、システムはエラーを返します。ターゲットクラスターにデータベースが存在しない場合、システムは自動的に作成します。 + +```SQL +-- v3.4.0以降でサポート。 +RESTORE SNAPSHOT sr_hub_backup +FROM test_repo +DATABASE sr_hub +PROPERTIES("backup_timestamp" = "2024-12-09-10-25-58-842"); + +-- 以前のバージョンでの構文と互換性があります。 +RESTORE SNAPSHOT sr_hub.sr_hub_backup +FROM `test_repo` +PROPERTIES("backup_timestamp" = "2024-12-09-10-25-58-842"); +``` + 
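+RESTOREも同様に非同期操作です。たとえば上記のジョブの進行状況は、以下のように [SHOW RESTORE](../../sql-reference/sql-statements/backup_restore/SHOW_RESTORE.md) で確認できます(こちらも、一般に `State` 列が `FINISHED` になればリストアは完了です)。 + +```SQL +-- データベース sr_hub に対する RESTORE ジョブのステータスを確認します。 +SHOW RESTORE FROM sr_hub; +``` +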
+以下の例は、スナップショット`sr_hub_backup`内のデータベース`sr_hub`をターゲットクラスター内のデータベース`sr_hub_new`にリストアします。スナップショットにデータベース`sr_hub`が存在しない場合、システムはエラーを返します。ターゲットクラスターにデータベース`sr_hub_new`が存在しない場合、システムは自動的に作成します。 + +```SQL +-- v3.4.0以降でサポート。 +RESTORE SNAPSHOT sr_hub_backup +FROM test_repo +DATABASE sr_hub AS sr_hub_new +PROPERTIES("backup_timestamp" = "2024-12-09-10-25-58-842"); +``` + +### テーブルのリストア + +以下の例は、スナップショット`sr_member_backup`内のデータベース`sr_hub`のテーブル`sr_member`を、ターゲットクラスター内のデータベース`sr_hub`のテーブル`sr_member`にリストアします。 + +```SQL +-- v3.4.0以降でサポート。 +RESTORE SNAPSHOT sr_member_backup +FROM test_repo +DATABASE sr_hub +ON (TABLE sr_member) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); + +-- 以前のバージョンでの構文と互換性があります。 +RESTORE SNAPSHOT sr_hub.sr_member_backup +FROM test_repo +ON (sr_member) +PROPERTIES ("backup_timestamp"="2024-12-09-10-52-10-940"); +``` + +以下の例は、スナップショット`sr_member_backup`内のデータベース`sr_hub`のテーブル`sr_member`を、ターゲットクラスター内のデータベース`sr_hub_new`のテーブル`sr_member_new`にリストアします。 + +```SQL +RESTORE SNAPSHOT sr_member_backup +FROM test_repo +DATABASE sr_hub AS sr_hub_new +ON (TABLE sr_member AS sr_member_new) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下の例は、スナップショット`sr_core_backup`内のデータベース`sr_hub`の2つのテーブル`sr_member`と`sr_pmc`を、ターゲットクラスター内のデータベース`sr_hub`の2つのテーブル`sr_member`と`sr_pmc`にリストアします。 + +```SQL +RESTORE SNAPSHOT sr_core_backup +FROM test_repo +DATABASE sr_hub +ON (TABLE sr_member, TABLE sr_pmc) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下の例は、スナップショット`sr_all_backup`内のデータベース`sr_hub`からすべてのテーブルをリストアします。 + +```SQL +RESTORE SNAPSHOT sr_all_backup +FROM test_repo +DATABASE sr_hub +ON (ALL TABLES); +``` + +以下の例は、スナップショット`sr_all_backup`内のデータベース`sr_hub`からすべてのテーブルのうちの1つをリストアします。 + +```SQL +RESTORE SNAPSHOT sr_all_backup +FROM test_repo +DATABASE sr_hub +ON (TABLE sr_member) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +### パーティションのリストア + +以下の例は、スナップショット`sr_par_backup`内のテーブル`sr_member`のパーティション`p1`を、ターゲットクラスター内のテーブル`sr_member`のパーティション`p1`にリストアします。 + +```SQL +-- v3.4.0以降でサポート。 +RESTORE SNAPSHOT sr_par_backup +FROM test_repo +DATABASE sr_hub +ON (TABLE sr_member PARTITION (p1)) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); + +-- 以前のバージョンでの構文と互換性があります。 +RESTORE SNAPSHOT sr_hub.sr_par_backup +FROM test_repo +ON (sr_member PARTITION (p1)) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +複数のパーティション名をコンマ (`,`) で区切って指定することで、パーティションを一括でリストアできます。 + +### マテリアライズドビューのリストア + +以下の例は、スナップショット`sr_mv1_backup`内のデータベース`sr_hub`からマテリアライズドビュー`sr_mv1`をターゲットクラスターにリストアします。 + +```SQL +RESTORE SNAPSHOT sr_mv1_backup +FROM test_repo +DATABASE sr_hub +ON (MATERIALIZED VIEW sr_mv1) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下の例は、スナップショット`sr_mv2_backup`内のデータベース`sr_hub`から2つのマテリアライズドビュー`sr_mv1`と`sr_mv2`をターゲットクラスターにリストアします。 + +```SQL +RESTORE SNAPSHOT sr_mv2_backup +FROM test_repo +DATABASE sr_hub +ON (MATERIALIZED VIEW sr_mv1, MATERIALIZED VIEW sr_mv2) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下の例は、スナップショット`sr_mv3_backup`内のデータベース`sr_hub`からすべてのマテリアライズドビューをターゲットクラスターにリストアします。 + +```SQL +RESTORE SNAPSHOT sr_mv3_backup +FROM test_repo +DATABASE sr_hub +ON (ALL MATERIALIZED VIEWS) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下の例は、スナップショット`sr_mv3_backup`内のデータベース`sr_hub`からすべてのマテリアライズドビューのうちの1つをターゲットクラスターにリストアします。 + +```SQL +RESTORE SNAPSHOT sr_mv3_backup +FROM test_repo +DATABASE sr_hub +ON (MATERIALIZED VIEW sr_mv1) +PROPERTIES ("backup_timestamp" = 
"2024-12-09-10-52-10-940"); +``` + +:::info + +RESTORE後、[SHOW MATERIALIZED VIEWS](../../sql-reference/sql-statements/materialized_view/SHOW_MATERIALIZED_VIEW.md)を使用してマテリアライズドビューのステータスを確認できます。 + +- マテリアライズドビューがアクティブな場合は、直接使用できます。 +- マテリアライズドビューが非アクティブな場合は、そのベーステーブルがリストアされていないためである可能性があります。すべてのベーステーブルがリストアされた後、[ALTER MATERIALIZED VIEW](../../sql-reference/sql-statements/materialized_view/ALTER_MATERIALIZED_VIEW.md)を使用してマテリアライズドビューを再アクティブ化できます。 + +::: + +### 論理ビューのリストア + +以下の例は、スナップショット`sr_view1_backup`内のデータベース`sr_hub`から論理ビュー`sr_view1`をターゲットクラスターにリストアします。 + +```SQL +RESTORE SNAPSHOT sr_view1_backup +FROM test_repo +DATABASE sr_hub +ON (VIEW sr_view1) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下の例は、スナップショット`sr_view2_backup`内のデータベース`sr_hub`から2つの論理ビュー`sr_view1`と`sr_view2`をターゲットクラスターにリストアします。 + +```SQL +RESTORE SNAPSHOT sr_view2_backup +FROM test_repo +DATABASE sr_hub +ON (VIEW sr_view1, VIEW sr_view2) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下の例は、スナップショット`sr_view3_backup`内のデータベース`sr_hub`からすべての論理ビューをターゲットクラスターにリストアします。 + +```SQL +RESTORE SNAPSHOT sr_view3_backup +FROM test_repo +DATABASE sr_hub +ON (ALL VIEWS) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下の例は、スナップショット`sr_view3_backup`内のデータベース`sr_hub`からすべての論理ビューのうちの1つをターゲットクラスターにリストアします。 + +```SQL +RESTORE SNAPSHOT sr_view3_backup +FROM test_repo +DATABASE sr_hub +ON (VIEW sr_view1) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +### UDFのリストア + +以下の例は、スナップショット`sr_udf1_backup`内のデータベース`sr_hub`からUDF`sr_udf1`をターゲットクラスターにリストアします。 + +```SQL +RESTORE SNAPSHOT sr_udf1_backup +FROM test_repo +DATABASE sr_hub +ON (FUNCTION sr_udf1) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下の例は、スナップショット`sr_udf2_backup`内のデータベース`sr_hub`から2つのUDF`sr_udf1`と`sr_udf2`をターゲットクラスターにリストアします。 + +```SQL +RESTORE SNAPSHOT sr_udf2_backup +FROM test_repo +DATABASE sr_hub +ON (FUNCTION sr_udf1, FUNCTION sr_udf2) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下の例は、スナップショット`sr_udf3_backup`内のデータベース`sr_hub`からすべてのUDFをターゲットクラスターにリストアします。 + +```SQL +RESTORE SNAPSHOT sr_udf3_backup +FROM test_repo +DATABASE sr_hub +ON (ALL FUNCTIONS) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下の例は、スナップショット`sr_udf3_backup`内のデータベース`sr_hub`からすべてのUDFのうちの1つをターゲットクラスターにリストアします。 + +```SQL +RESTORE SNAPSHOT sr_udf3_backup +FROM test_repo +DATABASE sr_hub +ON (FUNCTION sr_udf1) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +### 外部カタログのメタデータのリストア + +以下の例は、スナップショット`iceberg_backup`内の外部カタログ`iceberg`のメタデータをターゲットクラスターにリストアし、`iceberg_new`として名前を変更します。 + +```SQL +RESTORE SNAPSHOT iceberg_backup +FROM test_repo +EXTERNAL CATALOG (iceberg AS iceberg_new) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下の例は、スナップショット`iceberg_hive_backup`内の2つの外部カタログ`iceberg`と`hive`のメタデータをターゲットクラスターにリストアします。 + +```SQL +RESTORE SNAPSHOT iceberg_hive_backup +FROM test_repo +EXTERNAL CATALOGS (iceberg, hive) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下の例は、スナップショット`all_catalog_backup`内のすべての外部カタログのメタデータをターゲットクラスターにリストアします。 + +```SQL +RESTORE SNAPSHOT all_catalog_backup +FROM test_repo +ALL EXTERNAL CATALOGS +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +外部カタログに対するRESTORE操作をキャンセルするには、次のステートメントを実行します。 + +```SQL +CANCEL RESTORE FOR EXTERNAL CATALOG; +``` + +## BACKUPまたはRESTOREジョブの構成 + +BE構成ファイル**be.conf**で以下の構成項目を修正することで、BACKUPまたはRESTOREジョブのパフォーマンスを最適化できます。 + +| 構成項目 | 説明 | +| 
----------------------- | -------------------------------------------------------------------------------------------------------------------------------- | +| `make_snapshot_worker_count` | BEノード上のBACKUPジョブのスナップショット作成タスクのスレッドの最大数。デフォルト: `5`。この構成項目の値を増やすと、スナップショット作成タスクの並行度が高まります。 | +| `release_snapshot_worker_count` | BEノード上の失敗したBACKUPジョブのスナップショット解放タスクのスレッドの最大数。デフォルト: `5`。この構成項目の値を増やすと、スナップショット解放タスクの並行度が高まります。 | +| `upload_worker_count` | BEノード上のBACKUPジョブのアップロードタスクのスレッドの最大数。デフォルト: `0`。`0`は、BEが存在するマシンのCPUコア数に値を設定することを示します。この構成項目の値を増やすと、アップロードタスクの並行度が高まります。 | +| `download_worker_count` | BEノード上のRESTOREジョブのダウンロードタスクのスレッドの最大数。デフォルト: `0`。`0`は、BEが存在するマシンのCPUコア数に値を設定することを示します。この構成項目の値を増やすと、ダウンロードタスクの並行度が高まります。 | + +## 使用上の注意 + +- グローバル、データベース、テーブル、パーティションレベルでのバックアップおよびリストア操作には、異なる権限が必要です。詳細については、[シナリオに応じたロールのカスタマイズ](../user_privs/authorization/User_privilege.md#customize-roles-based-on-scenarios)を参照してください。 +- 各データベースでは、同時に実行できるBACKUPまたはRESTOREジョブは1つだけです。そうでない場合、StarRocksはエラーを返します。 +- BACKUPおよびRESTOREジョブはStarRocksクラスターの多くのリソースを占有するため、StarRocksクラスターの負荷が低いときにデータをバックアップおよびリストアすることをお勧めします。 +- StarRocksは、データバックアップのためのデータ圧縮アルゴリズムの指定をサポートしていません。 +- データはスナップショットとしてバックアップされるため、スナップショット生成時にロードされたデータはスナップショットに含まれません。したがって、スナップショットが生成された後、かつRESTOREジョブが完了する前に古いクラスターにデータをロードした場合、そのデータをリストア先のクラスターにもロードする必要があります。データ移行が完了した後、一定期間両方のクラスターにデータを並行してロードし、データとサービスの正確性を検証した上で、アプリケーションを新しいクラスターに移行することをお勧めします。 +- RESTOREジョブが完了するまで、リストア対象のテーブルを操作することはできません。 +- Primary Keyテーブルは、v2.5より前のStarRocksクラスターにリストアすることはできません。 +- リストアするテーブルは、リストア前に新しいクラスターで作成する必要はありません。RESTOREジョブが自動的に作成します。 +- リストア対象のテーブルと同じ名前の既存テーブルがある場合、StarRocksはまず既存テーブルのスキーマがリストア対象テーブルのスキーマと一致するかどうかを確認します。スキーマが一致する場合、StarRocksは既存テーブルをスナップショット内のデータで上書きします。スキーマが一致しない場合、RESTOREジョブは失敗します。キーワード`AS`を使用してリストア対象のテーブルの名前を変更するか、データをリストアする前に既存のテーブルを削除することができます。 +- RESTOREジョブが既存のデータベース、テーブル、またはパーティションを上書きする場合、ジョブがCOMMITフェーズに入った後、上書きされたデータを元に戻すことはできません。この時点でRESTOREジョブが失敗またはキャンセルされた場合、データが破損し、アクセスできなくなる可能性があります。この場合、RESTORE操作を再度実行し、ジョブが完了するのを待つしかありません。したがって、現在のデータがもう使用されていないことを確認できない限り、上書きによるデータのリストアは推奨されません。上書き操作は、まずスナップショットと既存のデータベース、テーブル、またはパーティション間のメタデータの整合性をチェックします。不整合が検出された場合、RESTORE操作は実行できません。 +- 現在、StarRocksはユーザーアカウント、権限、およびリソースグループに関連する構成データのバックアップとリストアをサポートしていません。 +- 現在、StarRocksはテーブル間のColocate Join関係のバックアップとリストアをサポートしていません。 diff --git a/docs/ja/administration/management/FE_configuration.md b/docs/ja/administration/management/FE_configuration.md new file mode 100644 index 0000000..a84e416 --- /dev/null +++ b/docs/ja/administration/management/FE_configuration.md @@ -0,0 +1,167 @@ +--- +displayed_sidebar: docs +--- + +import FEConfigMethod from '../../_assets/commonMarkdown/FE_config_method.mdx' + +import AdminSetFrontendNote from '../../_assets/commonMarkdown/FE_config_note.mdx' + +import StaticFEConfigNote from '../../_assets/commonMarkdown/StaticFE_config_note.mdx' + +import EditionSpecificFEItem from '../../_assets/commonMarkdown/Edition_Specific_FE_Item.mdx' + +# FE 設定 + + + +## FE 設定項目の表示 + +FE の起動後、MySQL クライアントで `ADMIN SHOW FRONTEND CONFIG` コマンドを実行して、パラメーター設定を確認できます。特定のパラメーターの設定をクエリするには、次のコマンドを実行します。 + +```SQL +ADMIN SHOW FRONTEND CONFIG [LIKE "pattern"]; +``` + +返されるフィールドの詳細については、[ADMIN SHOW CONFIG](../../sql-reference/sql-statements/cluster-management/config_vars/ADMIN_SHOW_CONFIG.md) を参照してください。 + +:::note +クラスター管理関連コマンドを実行するには、管理者権限が必要です。 +::: + +## FE パラメーターの設定 + +### FE 動的パラメーターの設定 + +[ADMIN SET FRONTEND CONFIG](../../sql-reference/sql-statements/cluster-management/config_vars/ADMIN_SET_CONFIG.md) を使用して、FE 動的パラメーターの設定を構成または変更できます。 + +```SQL +ADMIN SET 
FRONTEND CONFIG ("key" = "value"); +``` + + + +### FE 静的パラメーターの設定 + + + +## FE パラメーターについて + +### ロギング + +##### audit_log_delete_age + +- デフォルト: 30d +- 型: String +- 単位: - +- 変更可能: いいえ +- 説明: 監査ログファイルの保持期間。デフォルト値の `30d` は、各監査ログファイルが 30 日間保持されることを指定します。StarRocks は各監査ログファイルをチェックし、30 日以上前に生成されたものを削除します。 +- 導入バージョン: - + +##### audit_log_dir + +- デフォルト: StarRocksFE.STARROCKS_HOME_DIR + "/log" +- 型: String +- 単位: - +- 変更可能: いいえ +- 説明: 監査ログファイルが保存されるディレクトリ。 +- 導入バージョン: - + +##### audit_log_enable_compress + +- デフォルト: false +- 型: Boolean +- 単位: N/A +- 変更可能: いいえ +- 説明: true の場合、生成された Log4j2 設定は、ローテーションされた監査ログファイル名 (fe.audit.log.*) に ".gz" 接尾辞を追加し、Log4j2 がロールオーバー時に圧縮された (.gz) アーカイブ監査ログファイルを生成するようにします。この設定は、FE 起動時に Log4jConfig.initLogging で読み込まれ、監査ログ用の RollingFile アペンダーに適用されます。アクティブな監査ログには影響せず、ローテーション/アーカイブされたファイルにのみ影響します。値は起動時に初期化されるため、変更を有効にするには FE の再起動が必要です。監査ログのローテーション設定 (audit_log_dir, audit_log_roll_interval, audit_roll_maxsize, audit_log_roll_num) とともに使用します。 +- 導入バージョン: 3.2.12 + +##### audit_log_json_format + +- デフォルト: false +- 型: Boolean +- 単位: N/A +- 変更可能: はい +- 説明: true の場合、FE 監査イベントは、デフォルトのパイプ区切り「key=value」文字列ではなく、構造化された JSON (AuditEvent フィールドの Map をシリアル化する Jackson ObjectMapper) として出力されます。この設定は、AuditLogBuilder で処理されるすべての組み込み監査シンクに影響します。接続監査、クエリ監査、大規模クエリ監査 (イベントが条件を満たす場合、大規模クエリのしきい値フィールドが JSON に追加されます)、および低速監査出力です。大規模クエリのしきい値と「features」フィールドの注釈が付けられたフィールドは特別に扱われます (通常の監査エントリから除外され、適用可能な場合は大規模クエリまたは機能ログに含まれます)。ログコレクターまたは SIEM のためにログを機械で解析できるようにするには、これを有効にします。ただし、ログ形式が変更され、従来のパイプ区切り形式を想定する既存のパーサーを更新する必要がある場合があります。 +- 導入バージョン: 3.2.7 + +##### audit_log_modules + +- デフォルト: slow_query, query +- 型: String[] +- 単位: - +- 変更可能: いいえ +- 説明: StarRocks が監査ログエントリを生成するモジュール。デフォルトでは、StarRocks は `slow_query` モジュールと `query` モジュールの監査ログを生成します。`connection` モジュールは v3.0 からサポートされています。モジュール名をコンマ (,) とスペースで区切ります。 +- 導入バージョン: - + +##### audit_log_roll_interval + +- デフォルト: DAY +- 型: String +- 単位: - +- 変更可能: いいえ +- 説明: StarRocks が監査ログエントリをローテーションする時間間隔。有効な値: `DAY` と `HOUR`。 + - このパラメーターを `DAY` に設定すると、監査ログファイル名に `yyyyMMdd` 形式のサフィックスが追加されます。 + - このパラメーターを `HOUR` に設定すると、監査ログファイル名に `yyyyMMddHH` 形式のサフィックスが追加されます。 +- 導入バージョン: - + +##### audit_log_roll_num + +- デフォルト: 90 +- 型: Int +- 単位: - +- 変更可能: いいえ +- 説明: `audit_log_roll_interval` パラメーターで指定された各保持期間内に保持できる監査ログファイルの最大数。 +- 導入バージョン: - + +##### bdbje_log_level + +- デフォルト: INFO +- 型: String +- 単位: - +- 変更可能: いいえ +- 説明: StarRocks で Berkeley DB Java Edition (BDB JE) が使用するログレベルを制御します。BDB 環境初期化 BDBEnvironment.initConfigs() 中に、この値を `com.sleepycat.je` パッケージの Java ロガーと BDB JE 環境ファイルロギングレベル (EnvironmentConfig.FILE_LOGGING_LEVEL) に適用します。SEVERE、WARNING、INFO、CONFIG、FINE、FINER、FINEST、ALL、OFF などの標準的な java.util.logging.Level 名を受け入れます。ALL に設定すると、すべてのログメッセージが有効になります。詳細度を上げるとログ量が増加し、ディスク I/O とパフォーマンスに影響を与える可能性があります。値は BDB 環境が初期化されるときに読み取られるため、環境の (再) 初期化後にのみ有効になります。 +- 導入バージョン: v3.2.0 + +##### big_query_log_delete_age + +- デフォルト: 7d +- 型: String +- 単位: - +- 変更可能: いいえ +- 説明: FE 大規模クエリログファイル (`fe.big_query.log.*`) が自動削除されるまでの保持期間を制御します。この値は、Log4j の削除ポリシーに IfLastModified の age として渡されます。最終更新時刻がこの値よりも古いローテーションされた大規模クエリログは削除されます。`d` (日)、`h` (時間)、`m` (分)、`s` (秒) のサフィックスをサポートします。例: `7d` (7 日間)、`10h` (10 時間)、`60m` (60 分)、`120s` (120 秒)。この項目は、`big_query_log_roll_interval` および `big_query_log_roll_num` と連携して、どのファイルを保持またはパージするかを決定します。 +- 導入バージョン: v3.2.0 + +##### big_query_log_dir + +- デフォルト: `Config.STARROCKS_HOME_DIR + "/log"` +- 型: String +- 単位: - +- 変更可能: いいえ +- 説明: FE が大規模クエリダンプログ (`fe.big_query.log.*`) を書き込むディレクトリ。Log4j 設定は、このパスを使用して `fe.big_query.log` およびそのローテーションされたファイル用の RollingFile 
アペンダーを作成します。ローテーションと保持は、`big_query_log_roll_interval` (時間ベースのサフィックス)、`log_roll_size_mb` (サイズトリガー)、`big_query_log_roll_num` (最大ファイル数)、および `big_query_log_delete_age` (時間ベースの削除) によって管理されます。大規模クエリレコードは、`big_query_log_cpu_second_threshold`、`big_query_log_scan_rows_threshold`、または `big_query_log_scan_bytes_threshold` などのユーザー定義のしきい値を超えるクエリに対してログに記録されます。`big_query_log_modules` を使用して、どのモジュールがこのファイルにログを記録するかを制御します。
+- 導入バージョン: v3.2.0
+
+##### big_query_log_modules
+
+- デフォルト: `{"query"}`
+- 型: String[]
+- 単位: -
+- 変更可能: いいえ
+- 説明: モジュールごとの大規模クエリロギングを有効にするモジュール名サフィックスのリスト。一般的な値は論理コンポーネント名です。例えば、デフォルトの `query` は `big_query.query` を生成します。
+- 導入バージョン: v3.2.0
+
+##### big_query_log_roll_interval
+
+- デフォルト: `"DAY"`
+- 型: String
+- 単位: -
+- 変更可能: いいえ
+- 説明: `big_query` ログアペンダーのローリングファイル名の日付コンポーネントを構築するために使用される時間間隔を指定します。有効な値 (大文字と小文字を区別しない) は `DAY` (デフォルト) と `HOUR` です。`DAY` は日次パターン (`"%d{yyyyMMdd}"`) を生成し、`HOUR` は時間パターン (`"%d{yyyyMMddHH}"`) を生成します。この値は、サイズベースのロールオーバー (`big_query_roll_maxsize`) とインデックスベースのロールオーバー (`big_query_log_roll_num`) と組み合わされて、RollingFile の filePattern を形成します。無効な値は、ログ設定の生成を失敗させ (IOException)、ログの初期化または再設定を妨げる可能性があります。`big_query_log_dir`、`big_query_roll_maxsize`、`big_query_log_roll_num`、および `big_query_log_delete_age` とともに使用します。
+- 導入バージョン: v3.2.0
+
+##### big_query_log_roll_num
+
+- デフォルト: 10
+- 型: Int
+- 単位: -
+- 変更可能: いいえ
+- 説明: `big_query_log_roll_interval` ごとに保持するローテーションされた FE 大規模クエリログファイルの最大数。この値は、`fe.big_query.log` の RollingFile アペンダーの DefaultRolloverStrategy `max` 属性にバインドされます。ログが (時間または `log_roll_size_mb` によって) ロールオーバーされると、StarRocks は最大 `big_query_log_roll_num` 個のインデックス付きファイルを保持します。
diff --git a/docs/ja/administration/management/Scale_up_down.md b/docs/ja/administration/management/Scale_up_down.md new file mode 100644 index 0000000..08e5a68 --- /dev/null +++ b/docs/ja/administration/management/Scale_up_down.md @@ -0,0 +1,99 @@
+---
+displayed_sidebar: docs
+---
+
+# スケールインとスケールアウト
+
+このトピックでは、StarRocksのノードをスケールインおよびスケールアウトする方法について説明します。
+
+## FEのスケールインとスケールアウト
+
+StarRocksには、FollowerとObserverの2種類のFEノードがあります。Followerは選挙の投票と書き込みに関与します。Observerはログの同期と読み取りパフォーマンスの拡張にのみ使用されます。
+
+> * Follower FEの数(リーダーを含む)は奇数でなければならず、高可用性(HA)モードを形成するために3つをデプロイすることが推奨されます。
+> * FEが高可用性デプロイメント(リーダー1、Follower 2)である場合、読み取りパフォーマンスを向上させるためにObserver FEを追加することが推奨されます。
+
+### FEのスケールアウト
+
+FEノードをデプロイし、サービスを開始した後、以下のコマンドを実行してFEをスケールアウトします。
+
+~~~sql
+alter system add follower "fe_host:edit_log_port";
+alter system add observer "fe_host:edit_log_port";
+~~~
+
+### FEのスケールイン
+
+FEのスケールインはスケールアウトと似ています。以下のコマンドを実行してFEをスケールインします。
+
+~~~sql
+alter system drop follower "fe_host:edit_log_port";
+alter system drop observer "fe_host:edit_log_port";
+~~~
+
+スケールアウトとスケールインの後、`show proc '/frontends';` を実行してノード情報を確認できます。
+
+## BEのスケールインとスケールアウト
+
+StarRocksは、BEがスケールインまたはスケールアウトされた後、全体のパフォーマンスに影響を与えることなく、自動的にロードバランシングを実行します。
+
+新しいBEノードを追加すると、システムのTablet Schedulerが新しいノードとその低い負荷を検出し、高負荷のBEノードから新しい低負荷のBEノードへタブレットの移動を開始し、クラスター全体でのデータと負荷の均等な分散を保証します。
+
+バランシングプロセスは、各BEに対して計算されるloadScoreに基づいており、ディスク使用率とレプリカ数の両方を考慮します。システムは、loadScoreが高いノードからloadScoreが低いノードへタブレットを移動させることを目指します。
+
+FE構成パラメータ`tablet_sched_disable_balance`を確認して、自動バランシングが無効になっていないことを確認できます(このパラメータはデフォルトでfalseであり、タブレットバランシングがデフォルトで有効であることを意味します)。詳細については、[レプリカ管理ドキュメント](./resource_management/Replica.md)を参照してください。
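+
+たとえば、以下のステートメントで現在の設定値を確認できます(`tablet_sched_disable_balance` は動的に変更可能なFEパラメータです)。
+
+~~~sql
+ADMIN SHOW FRONTEND CONFIG LIKE 'tablet_sched_disable_balance';
+~~~
+
+### BEのスケールアウト
+
+以下のコマンドを実行してBEをスケールアウトします。
+
+~~~sql
+alter system add backend 'be_host:be_heartbeat_service_port';
+~~~
+
+以下のコマンドを実行してBEのステータスを確認します。
+
+~~~sql
+show proc '/backends';
+~~~
+
+### BEのスケールイン
+
+BEノードをスケールインする方法には、`DROP` と `DECOMMISSION` の2つがあります。
+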
+`DROP`はBEノードを即座に削除し、失われた複製はFEのスケジューリングによって補われます。`DECOMMISSION`はまず複製が補われることを確認してからBEノードを削除します。`DECOMMISSION`の方が少し扱いやすく、BEのスケールインには推奨されます。 + +両方のメソッドのコマンドは似ています: + +* `alter system decommission backend "be_host:be_heartbeat_service_port";` +* `alter system drop backend "be_host:be_heartbeat_service_port";` + +バックエンドのドロップは危険な操作であるため、実行する前に二重に確認する必要があります。 + +* `alter system drop backend "be_host:be_heartbeat_service_port";` + +## CNのスケールインとスケールアウト + +### CNのスケールアウト + +以下のコマンドを実行してCNをスケールアウトします。 + +~~~sql +ALTER SYSTEM ADD COMPUTE NODE "cn_host:cn_heartbeat_service_port"; +~~~ + +以下のコマンドを実行してCNのステータスを確認します。 + +~~~sql +SHOW PROC '/compute_nodes'; +~~~ + +### CNのスケールイン + +CNのスケールインはスケールアウトと似ています。以下のコマンドを実行してCNをスケールインします。 + +~~~sql +ALTER SYSTEM DROP COMPUTE NODE "cn_host:cn_heartbeat_service_port"; +~~~ + +`SHOW PROC '/compute_nodes';` を実行してノード情報を確認できます。 diff --git a/docs/ja/administration/management/audit_loader.md b/docs/ja/administration/management/audit_loader.md new file mode 100644 index 0000000..2038c1a --- /dev/null +++ b/docs/ja/administration/management/audit_loader.md @@ -0,0 +1,221 @@ +--- +displayed_sidebar: docs +--- + +# AuditLoader を介した StarRocks 内での監査ログの管理 + +このトピックでは、プラグイン AuditLoader を介して、テーブル内で StarRocks の監査ログを管理する方法について説明します。 + +StarRocks は、監査ログを内部データベースではなく、ローカルファイル **fe/log/fe.audit.log** に保存します。プラグイン AuditLoader を使用すると、クラスター内で直接監査ログを管理できます。インストールされると、AuditLoader はファイルからログを読み取り、HTTP PUT を介して StarRocks にロードします。その後、SQL ステートメントを使用して StarRocks で監査ログをクエリできます。 + +## 監査ログを保存するテーブルを作成する + +StarRocks クラスターにデータベースとテーブルを作成して、監査ログを保存します。詳細な手順については、[CREATE DATABASE](../../sql-reference/sql-statements/Database/CREATE_DATABASE.md) および [CREATE TABLE](../../sql-reference/sql-statements/table_bucket_part_index/CREATE_TABLE.md) を参照してください。 + +監査ログのフィールドは StarRocks のバージョンによって異なるため、アップグレード時の互換性の問題を避けるために、以下に記載されている推奨事項に従うことが重要です。 + +> **注意** +> +> - すべての新しいフィールドは `NULL` とマークする必要があります。 +> - フィールドの名前を変更してはいけません。ユーザーがそれらに依存している可能性があるためです。 +> - フィールドタイプには、`VARCHAR(32)` -> `VARCHAR(64)` のように、後方互換性のある変更のみを適用して、挿入時のエラーを回避する必要があります。 +> - `AuditEvent` フィールドは名前のみで解決されます。テーブル内の列の順序は重要ではなく、ユーザーはいつでも変更できます。 +> - テーブルに存在しない `AuditEvent` フィールドは無視されるため、ユーザーは不要な列を削除できます。 + +```SQL +CREATE DATABASE starrocks_audit_db__; + +CREATE TABLE starrocks_audit_db__.starrocks_audit_tbl__ ( + `queryId` VARCHAR(64) COMMENT "クエリの一意のID", + `timestamp` DATETIME NOT NULL COMMENT "クエリ開始時刻", + `queryType` VARCHAR(12) COMMENT "クエリタイプ (query, slow_query, connection)", + `clientIp` VARCHAR(32) COMMENT "クライアントIP", + `user` VARCHAR(64) COMMENT "クエリユーザー名", + `authorizedUser` VARCHAR(64) COMMENT "ユーザーの一意の識別子 (user_identity)", + `resourceGroup` VARCHAR(64) COMMENT "リソースグループ名", + `catalog` VARCHAR(32) COMMENT "カタログ名", + `db` VARCHAR(96) COMMENT "クエリが実行されるデータベース", + `state` VARCHAR(8) COMMENT "クエリの状態 (EOF, ERR, OK)", + `errorCode` VARCHAR(512) COMMENT "エラーコード", + `queryTime` BIGINT COMMENT "クエリ実行時間 (ミリ秒)", + `scanBytes` BIGINT COMMENT "クエリによってスキャンされたバイト数", + `scanRows` BIGINT COMMENT "クエリによってスキャンされた行数", + `returnRows` BIGINT COMMENT "クエリによって返された行数", + `cpuCostNs` BIGINT COMMENT "クエリによって消費されたCPU時間 (ナノ秒)", + `memCostBytes` BIGINT COMMENT "クエリによって消費されたメモリ (バイト)", + `stmtId` INT COMMENT "SQLステートメントのインクリメンタルID", + `isQuery` TINYINT COMMENT "SQLがクエリであるかどうか (1または0)", + `feIp` VARCHAR(128) COMMENT "ステートメントを実行したFEのIP", + `stmt` VARCHAR(1048576) COMMENT "元のSQLステートメント", + `digest` VARCHAR(32) COMMENT "遅いSQLのフィンガープリント", + `planCpuCosts` DOUBLE COMMENT "クエリ計画中のCPU使用率 (ナノ秒)", + `planMemCosts` DOUBLE COMMENT "クエリ計画中のメモリ使用量 (バイト)", + `pendingTimeMs` 
BIGINT COMMENT "クエリがキューで待機した時間 (ミリ秒)",
+    `candidateMVs` VARCHAR(65533) NULL COMMENT "候補となるマテリアライズドビューのリスト",
+    `hitMvs` VARCHAR(65533) NULL COMMENT "一致したマテリアライズドビューのリスト",
+    `warehouse` VARCHAR(32) NULL COMMENT "ウェアハウス名"
+) ENGINE = OLAP
+DUPLICATE KEY (`queryId`, `timestamp`, `queryType`)
+COMMENT "監査ログテーブル"
+PARTITION BY date_trunc('day', `timestamp`)
+PROPERTIES (
+  "replication_num" = "1",
+  "partition_live_number" = "30"
+);
+```
+
+`starrocks_audit_tbl__` は動的パーティションで作成されます。デフォルトでは、テーブルが作成されてから10分後に最初の動的パーティションが作成されます。その後、監査ログをテーブルにロードできます。次のステートメントを使用して、テーブル内のパーティションを確認できます。
+
+```SQL
+SHOW PARTITIONS FROM starrocks_audit_db__.starrocks_audit_tbl__;
+```
+
+パーティションが作成されたら、次のステップに進むことができます。
+
+## AuditLoader をダウンロードして設定する
+
+1. [AuditLoader](https://releases.starrocks.io/resources/auditloader.zip) インストールパッケージをダウンロードします。このパッケージは、利用可能なすべてのバージョンの StarRocks と互換性があります。
+
+2. インストールパッケージを解凍します。
+
+   ```shell
+   unzip auditloader.zip
+   ```
+
+   以下のファイルが解凍されます。
+
+   - **auditloader.jar**: AuditLoader の JAR ファイル。
+   - **plugin.properties**: AuditLoader のプロパティファイル。このファイルを変更する必要はありません。
+   - **plugin.conf**: AuditLoader の設定ファイル。ほとんどの場合、`user` および `password` フィールドのみを変更する必要があります。
+
+3. AuditLoader を設定するために **plugin.conf** を変更します。AuditLoader が正しく機能するように、以下の項目を設定する必要があります。
+
+   - `frontend_host_port`: FE の IP アドレスと HTTP ポート。`<fe_ip>:<fe_http_port>` の形式です。デフォルト値 `127.0.0.1:8030` に設定することをお勧めします。StarRocks の各 FE は独自の監査ログを独立して管理し、プラグインをインストールすると、各 FE は独自のバックグラウンドスレッドを開始して監査ログをフェッチおよび保存し、Stream Load を介してそれらを書き込みます。`frontend_host_port` 設定項目は、プラグインのバックグラウンド Stream Load タスクに HTTP プロトコルの IP とポートを提供するために使用され、このパラメータは複数の値をサポートしません。パラメータの IP 部分はクラスター内の任意の FE の IP を使用できますが、対応する FE がクラッシュした場合、他の FE のバックグラウンドにある監査ログ書き込みタスクも通信障害のために失敗するため、推奨されません。デフォルト値 `127.0.0.1:8030` に設定することをお勧めします。これにより、各 FE が自身の HTTP ポートを使用して通信し、他の FE の例外が発生した場合の通信への影響を回避できます(すべての書き込みタスクは最終的に FE Leader ノードに転送されて実行されます)。
+   - `database`: 監査ログをホストするために作成したデータベースの名前。
+   - `table`: 監査ログをホストするために作成したテーブルの名前。
+   - `user`: クラスターのユーザー名。テーブルにデータをロードする権限(LOAD_PRIV)を持っている必要があります。
+   - `password`: ユーザーのパスワード。
+   - `secret_key`: パスワードを暗号化するために使用されるキー(文字列、16バイト以下である必要があります)。このパラメータが設定されていない場合、**plugin.conf** のパスワードは暗号化されず、`password` に平文のパスワードを指定するだけでよいことを示します。このパラメータが指定されている場合、パスワードはこのキーによって暗号化され、`password` に暗号化された文字列を指定する必要があります。暗号化されたパスワードは、StarRocks で `AES_ENCRYPT` 関数を使用して生成できます: `SELECT TO_BASE64(AES_ENCRYPT('password','secret_key'));`。
+   - `filter`: 監査ログロードのフィルター条件。このパラメータは、Stream Load の [WHERE パラメータ](../../sql-reference/sql-statements/loading_unloading/STREAM_LOAD.md#opt_properties) に基づいており、つまり `-H "where: <condition>"` で、デフォルトは空の文字列です。例: `filter=isQuery=1 and clientIp like '127.0.0.1%' and user='root'`。
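+
+   以下は **plugin.conf** の記述例です(値は説明用の一例であり、お使いの環境に合わせて変更してください。`secret_key` を設定しない場合、`password` には平文のパスワードを指定します。ロードの間隔は `max_batch_interval_sec` 項目で制御されます)。
+
+   ```Plain
+   # 説明用の設定例
+   frontend_host_port=127.0.0.1:8030
+   database=starrocks_audit_db__
+   table=starrocks_audit_tbl__
+   user=root
+   password=
+   secret_key=
+   max_batch_interval_sec=60
+   filter=
+   ```
+
+4. ファイルをパッケージに戻して zip 圧縮します。
+
+   ```shell
+   zip -q -m -r auditloader.zip auditloader.jar plugin.conf plugin.properties
+   ```
+
+5. 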
パッケージをすべての FE ノードをホストするマシンに配布します。すべてのパッケージが同一のパスに保存されていることを確認してください。そうでない場合、インストールは失敗します。パッケージを配布した後、パッケージへの絶対パスをコピーすることを忘れないでください。
+
+   > **注**
+   >
+   > **auditloader.zip** をすべての FE がアクセスできる HTTP サービス(例: `httpd` や `nginx`)に配布し、ネットワーク経由でインストールすることもできます。どちらの場合も、インストールが実行された後、**auditloader.zip** はパスに永続化される必要があり、インストール後にソースファイルを削除してはならないことに注意してください。
+
+## AuditLoader をインストールする
+
+コピーしたパスとともに次のステートメントを実行して、AuditLoader を StarRocks のプラグインとしてインストールします。
+
+```SQL
+INSTALL PLUGIN FROM "<absolute_path_of_auditloader.zip>";
+```
+
+ローカルパッケージからのインストールの例:
+
+```SQL
+INSTALL PLUGIN FROM "/x/xx/xxx/xxxxx/auditloader.zip";
+```
+
+ネットワークパス経由でプラグインをインストールする場合は、INSTALL ステートメントのプロパティでパッケージの md5 を提供する必要があります。
+
+例:
+
+```sql
+INSTALL PLUGIN FROM "http://xx.xx.xxx.xxx/extra/auditloader.zip" PROPERTIES("md5sum" = "3975F7B880C9490FE95F42E2B2A28E2D");
+```
+
+詳細な手順については、[INSTALL PLUGIN](../../sql-reference/sql-statements/cluster-management/plugin/INSTALL_PLUGIN.md) を参照してください。
+
+## インストールの確認と監査ログのクエリ
+
+1. [SHOW PLUGINS](../../sql-reference/sql-statements/cluster-management/plugin/SHOW_PLUGINS.md) を介して、インストールが成功したかどうかを確認できます。
+
+   次の例では、プラグイン `AuditLoader` の `Status` が `INSTALLED` であり、インストールが成功したことを意味します。
+
+   ```Plain
+   mysql> SHOW PLUGINS\G
+   *************************** 1. 
row *************************** + queryId: 01975a33-4129-7520-97a2-05e641cec6c9 + timestamp: 2025-06-10 14:16:37 + queryType: query + clientIp: xxx.xx.xxx.xx:65283 + user: root + authorizedUser: 'root'@'%' + resourceGroup: default_wg + catalog: default_catalog + db: + state: EOF + errorCode: + queryTime: 3 + scanBytes: 0 + scanRows: 0 + returnRows: 1 + cpuCostNs: 33711 + memCostBytes: 4200 + stmtId: 102 + isQuery: 1 + feIp: xxx.xx.xxx.xx + stmt: SELECT * FROM starrocks_audit_db__.starrocks_audit_tbl__ + digest: + planCpuCosts: 908 + planMemCosts: 0 + pendingTimeMs: -1 + candidateMvs: null + hitMVs: null + ………… + ``` + +## トラブルシューティング + +動的パーティションが作成され、プラグインがインストールされた後もテーブルに監査ログがロードされない場合は、**plugin.conf** が正しく設定されているかどうかを確認できます。変更するには、まずプラグインをアンインストールする必要があります。 + +```SQL +UNINSTALL PLUGIN AuditLoader; +``` + +AuditLoader のログは **fe.log** に出力されます。**fe.log** でキーワード `audit` を検索して取得できます。すべての設定が正しく行われたら、上記の手順に従って AuditLoader を再度インストールできます。 diff --git a/docs/ja/administration/management/compaction.md b/docs/ja/administration/management/compaction.md new file mode 100644 index 0000000..c61008c --- /dev/null +++ b/docs/ja/administration/management/compaction.md @@ -0,0 +1,302 @@ +--- +displayed_sidebar: docs +--- + +# 共有データクラスターのCompaction + +このトピックでは、StarRocksの共有データクラスターでCompactionを管理する方法について説明します。 + +## 概要 + +StarRocksでの各データロード操作は、データファイルの新しいバージョンを生成します。Compactionは、異なるバージョンのデータファイルをより大きなファイルにマージし、小さなファイルの数を減らしてクエリ効率を向上させます。 + +## Compaction Score + +### 概要 + +*Compaction Score* は、パーティション内のデータファイルのマージ状況を反映します。スコアが高いほどマージの進行度が低いことを示し、パーティションに未マージのデータファイルバージョンがより多く存在することを意味します。FEは、各パーティションのCompaction Score情報を維持しており、これにはMax Compaction Score(パーティション内のすべてのTabletで最も高いスコア)が含まれます。 + +パーティションのMax Compaction ScoreがFEパラメーター `lake_compaction_score_selector_min_score` (デフォルト: 10) を下回る場合、そのパーティションのCompactionは完了していると見なされます。Max Compaction Scoreが100を超える場合は、Compactionが不健全な状態であることを示します。スコアがFEパラメーター `lake_ingest_slowdown_threshold` (デフォルト: 100) を超えると、システムはそのパーティションのデータロードトランザクションコミットを減速させます。 `lake_compaction_score_upper_bound` (デフォルト: 2000) を超えた場合、システムはそのパーティションのインポートトランザクションを拒否します。 + +### 計算ルール + +通常、各データファイルはCompaction Scoreに1貢献します。たとえば、パーティションに1つのTabletがあり、最初のロード操作で生成されたデータファイルが10個ある場合、パーティションのMax Compaction Scoreは10です。Tablet内でトランザクションによって生成されたすべてのデータファイルは、Rowsetとしてグループ化されます。 + +スコア計算中、TabletのRowsetはサイズ別にグループ化され、ファイル数が最も多いグループがTabletのCompaction Scoreを決定します。 + +たとえば、Tabletが7回のロード操作を経て、100 MB、100 MB、100 MB、10 MB、10 MB、10 MB、10 MBというサイズのRowsetを生成します。計算中、システムは3つの100 MBのRowsetを1つのグループにし、4つの10 MBのRowsetを別のグループにします。Compaction Scoreは、より多くのファイルを持つグループに基づいて計算されます。この場合、2番目のグループの方がCompaction Scoreが高くなります。Compactionはスコアが高いグループを優先するため、最初のCompaction後、Rowsetの分布は100 MB、100 MB、100 MB、および40 MBになります。 + +## Compactionワークフロー + +共有データクラスターの場合、StarRocksはFEによって制御される新しいCompactionメカニズムを導入しています。 + +1. **スコア計算**: Leader FEノードは、トランザクションの公開結果に基づいて、パーティションのCompaction Scoreを計算し保存します。 +2. **候補選択**: FEは、最も高いMax Compaction Scoreを持つパーティションをCompaction候補として選択します。 +3. **タスク生成**: FEは、選択されたパーティションに対してCompactionトランザクションを開始し、Tabletレベルのサブタスクを生成し、FEパラメーター `lake_compaction_max_tasks` で設定された制限に達するまでCompute Nodes (CNs) にディスパッチします。 +4. **サブタスク実行**: CNsはバックグラウンドでCompactionサブタスクを実行します。CNごとの同時サブタスクの数は、CNパラメーター `compact_threads` によって制御されます。 +5. **結果収集**: FEはサブタスクの結果を集計し、Compactionトランザクションをコミットします。 +6. 
**公開**: FEは、正常にコミットされたCompactionトランザクションを公開します。
+
+## Compactionの管理
+
+### Compaction Scoreの表示
+
+- `SHOW PROC` ステートメントを使用して、特定のテーブルのパーティションのCompaction Scoreを表示できます。通常、`MaxCS` フィールドにのみ注目すれば十分です。`MaxCS` が10未満の場合、Compactionは完了していると見なされます。`MaxCS` が100を超える場合、Compaction Scoreは比較的高くなります。`MaxCS` が500を超える場合、Compaction Scoreは非常に高く、手動での介入が必要になる場合があります。
+
+  ```Plain
+  SHOW PARTITIONS FROM <table_name>
+  SHOW PROC '/dbs/<db_name>/<table_name>/partitions'
+  ```
+
+  例:
+
+  ```Plain
+  mysql> SHOW PROC '/dbs/load_benchmark/store_sales/partitions';
+  +-------------+---------------+----------------+----------------+-------------+--------+--------------+-------+------------------------------+---------+----------+-----------+----------+------------+-------+-------+-------+
+  | PartitionId | PartitionName | CompactVersion | VisibleVersion | NextVersion | State | PartitionKey | Range | DistributionKey | Buckets | DataSize | RowCount | CacheTTL | AsyncWrite | AvgCS | P50CS | MaxCS |
+  +-------------+---------------+----------------+----------------+-------------+--------+--------------+-------+------------------------------+---------+----------+-----------+----------+------------+-------+-------+-------+
+  | 38028 | store_sales | 913 | 921 | 923 | NORMAL | | | ss_item_sk, ss_ticket_number | 64 | 15.6GB | 273857126 | 2592000 | false | 10.00 | 10.00 | 10.00 |
+  +-------------+---------------+----------------+----------------+-------------+--------+--------------+-------+------------------------------+---------+----------+-----------+----------+------------+-------+-------+-------+
+  1 row in set (0.20 sec)
+  ```
+
+- システム定義ビュー `information_schema.partitions_meta` をクエリして、パーティションのCompaction Scoreを表示することもできます。
+
+  例:
+
+  ```Plain
+  mysql> SELECT * FROM information_schema.partitions_meta ORDER BY Max_CS LIMIT 10;
+  +--------------+----------------------------+----------------------------+--------------+-----------------+-----------------+----------------------+--------------+---------------+-----------------+-----------------------------------------+---------+-----------------+----------------+---------------------+-----------------------------+--------------+---------+-----------+------------+------------------+----------+--------+--------+-------------------------------------------------------------------+
+  | DB_NAME | TABLE_NAME | PARTITION_NAME | PARTITION_ID | COMPACT_VERSION | VISIBLE_VERSION | VISIBLE_VERSION_TIME | NEXT_VERSION | PARTITION_KEY | PARTITION_VALUE | DISTRIBUTION_KEY | BUCKETS | REPLICATION_NUM | STORAGE_MEDIUM | COOLDOWN_TIME | LAST_CONSISTENCY_CHECK_TIME | IS_IN_MEMORY | IS_TEMP | DATA_SIZE | ROW_COUNT | ENABLE_DATACACHE | AVG_CS | P50_CS | MAX_CS | STORAGE_PATH |
+  +--------------+----------------------------+----------------------------+--------------+-----------------+-----------------+----------------------+--------------+---------------+-----------------+-----------------------------------------+---------+-----------------+----------------+---------------------+-----------------------------+--------------+---------+-----------+------------+------------------+----------+--------+--------+-------------------------------------------------------------------+
+  | tpcds_1t | call_center | call_center | 11905 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | cc_call_center_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 12.3KB | 42 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11906/11905 |
+  | tpcds_1t | web_returns | web_returns | 12030 | 3 | 3 | 2024-03-17 08:40:48 | 4 | | | wr_item_sk, wr_order_number 
| 16 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 3.5GB | 71997522 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/12031/12030 | + | tpcds_1t | warehouse | warehouse | 11847 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | w_warehouse_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 4.2KB | 20 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11848/11847 | + | tpcds_1t | ship_mode | ship_mode | 11851 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | sm_ship_mode_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 1.7KB | 20 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11852/11851 | + | tpcds_1t | customer_address | customer_address | 11790 | 0 | 2 | 2024-03-17 08:32:19 | 3 | | | ca_address_sk | 16 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 120.9MB | 6000000 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11791/11790 | + | tpcds_1t | time_dim | time_dim | 11855 | 0 | 2 | 2024-03-17 08:30:48 | 3 | | | t_time_sk | 16 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 864.7KB | 86400 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11856/11855 | + | tpcds_1t | web_sales | web_sales | 12049 | 3 | 3 | 2024-03-17 10:14:20 | 4 | | | ws_item_sk, ws_order_number | 128 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 47.7GB | 720000376 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/12050/12049 | + | tpcds_1t | store | store | 11901 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | s_store_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 95.6KB | 1002 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11902/11901 | + | tpcds_1t | web_site | web_site | 11928 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | web_site_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 13.4KB | 54 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11929/11928 | + | tpcds_1t | household_demographics | household_demographics | 11932 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | hd_demo_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 2.1KB | 7200 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11933/11932 | + +--------------+----------------------------+----------------------------+--------------+-----------------+-----------------+----------------------+--------------+---------------+-----------------+-----------------------------------------+---------+-----------------+----------------+---------------------+-----------------------------+--------------+---------+-----------+------------+------------------+----------+--------+--------+-------------------------------------------------------------------+ + ``` + +### Compactionタスクの表示 + +システムに新しいデータがロードされると、FEは異なるCNノードで実行されるCompactionタスクを常にスケジューリングします。最初にFEでCompactionタスクの一般的なステータスを表示し、次にCNで各タスクの実行詳細を表示できます。 + +#### Compactionタスクの一般的なステータスを表示する + +`SHOW PROC` ステートメントを使用して、Compactionタスクの一般的なステータスを表示できます。 + +```SQL +SHOW PROC '/compactions'; +``` + +例: + +```Plain +mysql> SHOW PROC '/compactions'; ++---------------------+-------+---------------------+---------------------+---------------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Partition | TxnID | StartTime | CommitTime | FinishTime | Error | Profile | 
++---------------------+-------+---------------------+---------------------+---------------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| ssb.lineorder.10081 | 15 | 2026-01-10 03:29:07 | 2026-01-10 03:29:11 | 2026-01-10 03:29:12 | NULL | {"sub_task_count":12,"read_local_sec":0,"read_local_mb":218,"read_remote_sec":0,"read_remote_mb":0,"read_segment_count":120,"write_segment_count":12,"write_segment_mb":219,"write_remote_sec":4,"in_queue_sec":18} |
+| ssb.lineorder.10068 | 16 | 2026-01-10 03:29:07 | 2026-01-10 03:29:13 | 2026-01-10 03:29:14 | NULL | {"sub_task_count":12,"read_local_sec":0,"read_local_mb":218,"read_remote_sec":0,"read_remote_mb":0,"read_segment_count":120,"write_segment_count":12,"write_segment_mb":218,"write_remote_sec":4,"in_queue_sec":38} |
+| ssb.lineorder.10055 | 20 | 2026-01-10 03:29:11 | 2026-01-10 03:29:15 | 2026-01-10 03:29:17 | NULL | {"sub_task_count":12,"read_local_sec":0,"read_local_mb":218,"read_remote_sec":0,"read_remote_mb":0,"read_segment_count":120,"write_segment_count":12,"write_segment_mb":218,"write_remote_sec":4,"in_queue_sec":23} |
++---------------------+-------+---------------------+---------------------+---------------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+```
+
+以下のフィールドが返されます。
+
+- `Partition`: Compactionタスクが属するパーティション。
+- `TxnID`: Compactionタスクに割り当てられたトランザクションID。
+- `StartTime`: Compactionタスクが開始された時刻。`NULL` は、タスクがまだ開始されていないことを示します。
+- `CommitTime`: Compactionタスクがデータをコミットした時刻。`NULL` は、データがまだコミットされていないことを示します。
+- `FinishTime`: Compactionタスクがデータを公開した時刻。`NULL` は、データがまだ公開されていないことを示します。
+- `Error`: Compactionタスクのエラーメッセージ(もしあれば)。
+- `Profile`: (v3.2.12およびv3.3.4以降でサポート) 完了後のCompactionタスクのProfile。
+  - `sub_task_count`: パーティション内のサブタスク(Tabletに相当)の数。
+  - `read_local_sec`: すべてのサブタスクがローカルキャッシュからデータを読み取るのにかかった合計時間。単位: 秒。
+  - `read_local_mb`: すべてのサブタスクがローカルキャッシュから読み取ったデータの合計サイズ。単位: MB。
+  - `read_remote_sec`: すべてのサブタスクがリモートストレージからデータを読み取るのにかかった合計時間。単位: 秒。
+  - `read_remote_mb`: すべてのサブタスクがリモートストレージから読み取ったデータの合計サイズ。単位: MB。
+  - `read_segment_count`: すべてのサブタスクによって読み取られたファイルの総数。
+  - `write_segment_count`: すべてのサブタスクによって生成された新しいファイルの総数。
+  - `write_segment_mb`: すべてのサブタスクによって生成された新しいファイルの合計サイズ。単位: MB。
+  - `write_remote_sec`: すべてのサブタスクがリモートストレージにデータを書き込むのにかかった合計時間。単位: 秒。
+  - `in_queue_sec`: すべてのサブタスクがキューに滞留した合計時間。単位: 秒。
+
+#### Compactionタスクの実行詳細を表示する
+
+各Compactionタスクは複数のサブタスクに分割され、それぞれがTabletに対応します。システム定義ビュー `information_schema.be_cloud_native_compactions` をクエリして、各サブタスクの実行詳細を表示できます。
+
+例:
+
+```Plain
+mysql> SELECT * FROM information_schema.be_cloud_native_compactions;
++-------+--------+-----------+---------+---------+------+---------------------+-------------+----------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| BE_ID | TXN_ID | TABLET_ID | VERSION | SKIPPED | RUNS | START_TIME | FINISH_TIME | PROGRESS | STATUS | PROFILE | 
++-------+--------+-----------+---------+---------+------+---------------------+-------------+----------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+| 10001 | 51047 | 43034 | 12 | 0 | 1 | 2024-09-24 19:15:15 | NULL | 82 | | {"read_local_sec":0,"read_local_mb":31,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":1900,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} |
+| 10001 | 51048 | 43032 | 12 | 0 | 1 | 2024-09-24 19:15:15 | NULL | 82 | | {"read_local_sec":0,"read_local_mb":32,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":1900,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} |
+| 10001 | 51049 | 43033 | 12 | 0 | 1 | 2024-09-24 19:15:15 | NULL | 82 | | {"read_local_sec":0,"read_local_mb":31,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":1900,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} |
+| 10001 | 51051 | 43038 | 9 | 0 | 1 | 2024-09-24 19:15:15 | NULL | 84 | | {"read_local_sec":0,"read_local_mb":31,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":1900,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} |
+| 10001 | 51052 | 43036 | 12 | 0 | 0 | NULL | NULL | 0 | | |
+| 10001 | 51053 | 43035 | 12 | 0 | 1 | 2024-09-24 19:15:16 | NULL | 2 | | {"read_local_sec":0,"read_local_mb":1,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":100,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} |
++-------+--------+-----------+---------+---------+------+---------------------+-------------+----------+--------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+```
+
+以下のフィールドが返されます。
+
+- `BE_ID`: CNのID。
+- `TXN_ID`: サブタスクが属するトランザクションのID。
+- `TABLET_ID`: サブタスクが属するTabletのID。
+- `VERSION`: Tabletのバージョン。
+- `RUNS`: サブタスクが実行された回数。
+- `START_TIME`: サブタスクが開始された時刻。
+- `FINISH_TIME`: サブタスクが完了した時刻。
+- `PROGRESS`: TabletのCompaction進捗状況(パーセンテージ)。
+- `STATUS`: サブタスクのステータス。エラーがある場合は、このフィールドにエラーメッセージが返されます。
+- `PROFILE`: (v3.2.12およびv3.3.4以降でサポート) サブタスクのランタイムプロファイル。
+  - `read_local_sec`: サブタスクがローカルキャッシュからデータを読み取るのにかかった時間。単位: 秒。
+  - `read_local_mb`: サブタスクがローカルキャッシュから読み取ったデータのサイズ。単位: MB。
+  - `read_remote_sec`: サブタスクがリモートストレージからデータを読み取るのにかかった時間。単位: 秒。
+  - `read_remote_mb`: サブタスクがリモートストレージから読み取ったデータのサイズ。単位: MB。
+  - `read_local_count`: サブタスクがローカルキャッシュからデータを読み取った回数。
+  - `read_remote_count`: サブタスクがリモートストレージからデータを読み取った回数。
+  - `in_queue_sec`: サブタスクがキューに滞留した時間。単位: 秒。
+
+### Compactionタスクの設定
+
+これらのFEおよびCN (BE) パラメーターを使用してCompactionタスクを設定できます。
+
+#### FEパラメーター
+
+以下のFEパラメーターは動的に設定できます。
+
+```SQL
+ADMIN SET FRONTEND CONFIG ("lake_compaction_max_tasks" = "-1");
+```
+
+##### lake_compaction_max_tasks
+
+- デフォルト: -1
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: 共有データクラスターで許可される同時Compactionタスクの最大数。この項目を`-1`に設定すると、生存しているCNノードの数に16を掛けた値として、同時タスク数を適応的に計算することを示します。この値を`0`に設定すると、Compactionが無効になります。
+- 導入バージョン: v3.1.0
+
+```SQL
+ADMIN SET FRONTEND CONFIG ("lake_compaction_disable_tables" = "11111;22222");
+```
+
+##### lake_compaction_disable_tables
+
+- デフォルト: ""
+- タイプ: String
+- 単位: -
+- 変更可能: はい
+- 説明: 特定のテーブルのCompactionを無効にします。これは開始済みのCompactionには影響しません。この項目の値はテーブルIDです。複数の値は`;`で区切られます。
+- 導入バージョン: 
v3.2.7
+
+#### CNパラメーター
+
+以下のCNパラメーターは動的に設定できます。
+
+```SQL
+UPDATE information_schema.be_configs SET VALUE = 8
+WHERE name = "compact_threads";
+```
+
+##### compact_threads
+
+- デフォルト: 4
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: 同時Compactionタスクに使用されるスレッドの最大数。この設定は、v3.1.7およびv3.2.2以降で動的に変更可能になりました。
+- 導入バージョン: v3.0.0
+
+> **注記**
+>
+> 本番環境では、`compact_threads` をBE/CNのCPUコア数の25%に設定することをお勧めします。
+
+##### max_cumulative_compaction_num_singleton_deltas
+
+- デフォルト: 500
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: 単一のCumulative Compactionでマージできるセグメントの最大数。Compaction中にOOMが発生する場合、この値を減らすことができます。
+- 導入バージョン: -
+
+> **注記**
+>
+> 本番環境では、Compactionタスクを高速化し、リソース消費を削減するために、`max_cumulative_compaction_num_singleton_deltas` を `100` に設定することをお勧めします。
+
+##### lake_pk_compaction_max_input_rowsets
+
+- デフォルト: 500
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: 共有データクラスターのPrimary KeyテーブルのCompactionタスクで許可される入力Rowsetの最大数。このパラメーターのデフォルト値は、v3.2.4およびv3.1.10以降で`5`から`1000`に、v3.3.1およびv3.2.9以降で`500`に変更されました。Primary Keyテーブルでサイズ階層型Compactionポリシーが有効化された後 (`enable_pk_size_tiered_compaction_strategy` を `true` に設定することで)、StarRocksは書き込み増幅を減らすために各CompactionのRowset数を制限する必要がなくなりました。したがって、このパラメーターのデフォルト値は増加しています。
+- 導入バージョン: v3.1.8, v3.2.3
+
+### 手動でCompactionタスクをトリガーする
+
+```SQL
+-- テーブル全体に対してCompactionをトリガーします。
+ALTER TABLE <table_name> COMPACT;
+-- 特定のパーティションに対してCompactionをトリガーします。
+ALTER TABLE <table_name> COMPACT <partition_name>;
+-- 複数のパーティションに対してCompactionをトリガーします。
+ALTER TABLE <table_name> COMPACT (<partition_name1>, <partition_name2>, ...);
+```
+
+### Compactionタスクのキャンセル
+
+タスクのトランザクションIDを使用して、Compactionタスクを手動でキャンセルできます。
+
+```SQL
+CANCEL COMPACTION WHERE TXN_ID = <txn_id>;
+```
+
+> **注記**
+>
+> - `CANCEL COMPACTION` ステートメントはLeader FEノードから送信する必要があります。
+> - `CANCEL COMPACTION` ステートメントは、まだコミットされていないトランザクション、つまり `SHOW PROC '/compactions'` の戻り値で `CommitTime` がNULLであるトランザクションにのみ適用されます。
+> - `CANCEL COMPACTION` は非同期プロセスです。タスクがキャンセルされたかどうかは、`SHOW PROC '/compactions'` を実行して確認できます。
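+
+たとえば、パーティション単位で手動Compactionをトリガーし、その進行状況を確認する一連の流れは次のとおりです(テーブル名 `lineorder` とパーティション名 `p202601` は説明用の例です)。
+
+```SQL
+-- 例として、パーティション p202601 のCompactionをトリガーします。
+ALTER TABLE lineorder COMPACT p202601;
+-- トリガーしたタスクのステータス (TxnID、CommitTime など) を確認します。
+SHOW PROC '/compactions';
+```
+
+## ベストプラクティス
+
+Compactionはクエリパフォーマンスにとって非常に重要であるため、テーブルとパーティションのデータマージ状況を定期的に監視することをお勧めします。以下にいくつかのベストプラクティスとガイドラインを示します。
+
+- ロード間の時間間隔を長くし(10秒未満の間隔のシナリオは避ける)、ロードあたりのバッチサイズを大きくするよう努めます(100行未満のデータバッチは避ける)。
+- CN上の並列Compactionワーカースレッドの数を調整して、タスクの実行を高速化します。本番環境では、`compact_threads` をBE/CNのCPUコア数の25%に設定することをお勧めします。
+- `show proc '/compactions'` および `select * from information_schema.be_cloud_native_compactions;` を使用してCompactionタスクのステータスを監視します。
+- Compaction Scoreを監視し、それに基づいてアラートを設定します。StarRocksの組み込みGrafana監視テンプレートには、このメトリックが含まれています。
+- Compaction中のリソース消費、特にメモリ使用量に注意してください。Grafana監視テンプレートには、このメトリックも含まれています。
+
+## トラブルシューティング
+
+### 遅いクエリ
+
+適時でないCompactionによって引き起こされる遅いクエリを特定するには、SQL Profileで、単一のFragment内の `SegmentsReadCount` を `TabletCount` で割った値を確認できます。その値が数十以上のような大きな値である場合、Compactionが適時でないことが遅いクエリの原因である可能性があります。
+
+### クラスター内の高いMax Compaction Score
+
+1. `ADMIN SHOW FRONTEND CONFIG LIKE "%lake_compaction%"` および `SELECT * FROM information_schema.be_configs WHERE name = "compact_threads"` を使用して、Compaction関連のパラメーターが適切な範囲内にあるかどうかを確認します。
+2. `SHOW PROC '/compactions'` を使用してCompactionがスタックしているかどうかを確認します。
+   - `CommitTime` がNULLのままである場合、システムビュー `information_schema.be_cloud_native_compactions` を調べてCompactionがスタックしている理由を確認します。
+   - `FinishTime` がNULLのままである場合、Leader FEログで `TxnID` を使用して公開失敗の理由を検索します。
+3. 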
`SHOW PROC '/compactions'` を使用してCompactionが遅く実行されているかどうかを確認します。
+   - `sub_task_count` が大きすぎる場合(`SHOW PARTITIONS` を使用してこのパーティション内の各Tabletのサイズを確認)、テーブルが不適切に作成されている可能性があります。
+   - `read_remote_mb` が大きすぎる場合(読み取りデータの合計の30%を超える場合)、サーバーのディスクサイズを確認し、`SHOW BACKENDS` で `DataCacheMetrics` フィールドを通じてキャッシュクォータも確認します。
+   - `write_remote_sec` が大きすぎる場合(Compactionの合計時間の90%を超える場合)、リモートストレージへの書き込みが遅すぎる可能性があります。これは、キーワード `single upload latency` および `multi upload latency` を含む共有データ固有の監視メトリックを確認することで検証できます。
+   - `in_queue_sec` が大きすぎる場合(Tabletあたりの平均待機時間が60秒を超える場合)、パラメーター設定が不適切であるか、他の実行中のCompactionが遅すぎる可能性があります。
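+
+Compaction Scoreが高いパーティションを特定するには、たとえば次のクエリを使用できます(前述のシステム定義ビュー `partitions_meta` の `MAX_CS` フィールドを利用します)。
+
+```SQL
+-- Max Compaction Score が高い上位10パーティションを表示します。
+SELECT DB_NAME, TABLE_NAME, PARTITION_NAME, MAX_CS
+FROM information_schema.partitions_meta
+ORDER BY MAX_CS DESC LIMIT 10;
+```
+---
diff --git a/docs/zh/administration/management/BE_blacklist.md b/docs/zh/administration/management/BE_blacklist.md new file mode 100644 index 0000000..ae39359 --- /dev/null +++ b/docs/zh/administration/management/BE_blacklist.md @@ -0,0 +1,102 @@
+---
+displayed_sidebar: docs
+---
+
+# 管理 BE 和 CN 黑名单
+
+从 v3.3.0 版本开始,StarRocks 支持 BE 黑名单功能,该功能允许您禁止在查询执行中使用某些 BE 节点,从而避免因 BE 节点连接失败而导致的频繁查询失败或其他意外行为。例如,当网络问题阻止连接到一个或多个 BE 时,就可以使用黑名单。
+
+从 v4.0 版本开始,StarRocks 支持将 Compute Node (CN) 添加到黑名单。
+
+默认情况下,StarRocks 可以自动管理 BE 和 CN 黑名单,将失去连接的 BE 或 CN 节点添加到黑名单中,并在连接重新建立时将其从黑名单中移除。但是,如果节点是手动加入黑名单的,StarRocks 不会将其从黑名单中移除。
+
+:::note
+
+- 只有拥有 SYSTEM-level BLACKLIST 权限的用户才能使用此功能。
+- 每个 FE 节点都维护自己的 BE 和 CN 黑名单,并且不会与其他 FE 节点共享。
+
+:::
+
+## 将 BE/CN 添加到黑名单
+
+您可以使用 [ADD BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/ADD_BACKEND_BLACKLIST.md) 手动将 BE/CN 节点添加到黑名单中。在此语句中,您必须指定要加入黑名单的 BE/CN 节点的 ID。您可以通过执行 [SHOW BACKENDS](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_BACKENDS.md) 获取 BE ID,通过执行 [SHOW COMPUTE NODES](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_COMPUTE_NODES.md) 获取 CN ID。
+
+示例:
+
+```SQL
+-- 获取 BE ID。
+SHOW BACKENDS\G
+*************************** 1. row ***************************
+    BackendId: 10001
+           IP: xxx.xx.xx.xxx
+          ...
+-- 将 BE 添加到黑名单。
+ADD BACKEND BLACKLIST 10001;
+
+-- 获取 CN ID。
+SHOW COMPUTE NODES\G
+*************************** 1. row ***************************
+   ComputeNodeId: 10005
+              IP: xxx.xx.xx.xxx
+             ... 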
+-- 将 CN 添加到黑名单。
+ADD COMPUTE NODE BLACKLIST 10005;
+```
+
+## 将 BE/CN 从黑名单中移除
+
+您可以使用 [DELETE BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/DELETE_BACKEND_BLACKLIST.md) 手动将 BE/CN 节点从黑名单中移除。在此语句中,您也必须指定 BE/CN 节点的 ID。
+
+示例:
+
+```SQL
+-- 将 BE 从黑名单中移除。
+DELETE BACKEND BLACKLIST 10001;
+
+-- 将 CN 从黑名单中移除。
+DELETE COMPUTE NODE BLACKLIST 10005;
+```
+
+## 查看 BE/CN 黑名单
+
+您可以使用 [SHOW BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_BACKEND_BLACKLIST.md) 查看黑名单中的 BE/CN 节点。
+
+示例:
+
+```SQL
+-- 查看 BE 黑名单。
+SHOW BACKEND BLACKLIST;
++-----------+------------------+---------------------+------------------------------+--------------------+
+| BackendId | AddBlackListType | LostConnectionTime | LostConnectionNumberInPeriod | CheckTimePeriod(s) |
++-----------+------------------+---------------------+------------------------------+--------------------+
+| 10001 | MANUAL | 2024-04-28 11:52:09 | 0 | 5 |
++-----------+------------------+---------------------+------------------------------+--------------------+
+
+-- 查看 CN 黑名单。
+SHOW COMPUTE NODE BLACKLIST;
++---------------+------------------+---------------------+------------------------------+--------------------+
+| ComputeNodeId | AddBlackListType | LostConnectionTime | LostConnectionNumberInPeriod | CheckTimePeriod(s) |
++---------------+------------------+---------------------+------------------------------+--------------------+
+| 10005 | MANUAL | 2025-08-18 10:47:51 | 0 | 5 |
++---------------+------------------+---------------------+------------------------------+--------------------+
+```
+
+返回以下字段:
+
+- `AddBlackListType`: BE/CN 节点是如何被添加到黑名单的。`MANUAL` 表示用户手动将其加入黑名单。`AUTO` 表示 StarRocks 自动将其加入黑名单。
+- `LostConnectionTime`:
+  - 对于 `MANUAL` 类型,表示 BE/CN 节点被手动添加到黑名单的时间。
+  - 对于 `AUTO` 类型,表示上次成功建立连接的时间。
+- `LostConnectionNumberInPeriod`: 在 `CheckTimePeriod(s)` 内检测到的断开连接次数,`CheckTimePeriod(s)` 是 StarRocks 检查黑名单中 BE/CN 节点连接状态的间隔。
+- `CheckTimePeriod(s)`: StarRocks 检查黑名单中 BE/CN 节点连接状态的间隔。其值等于您为 FE 配置项 `black_host_history_sec` 指定的值。单位:秒。
+
+## 配置 BE/CN 黑名单的自动管理
+
+每当 BE/CN 节点失去与 FE 节点的连接,或者由于 BE/CN 节点上的查询超时而失败时,FE 节点都会将该 BE/CN 节点添加到其 BE 和 CN 黑名单中。FE 节点将通过计算其在特定时间段内的连接失败次数,持续评估黑名单中 BE/CN 节点的连接性。仅当 BE/CN 节点的连接失败次数低于预设阈值时,StarRocks 才会将其从黑名单中移除。
+
+您可以使用以下 [FE 配置](./FE_configuration.md) 配置 BE 和 CN 黑名单的自动管理:
+
+- `black_host_history_sec`: 黑名单中保留 BE/CN 节点历史连接失败记录的时间长度。
+- `black_host_connect_failures_within_time`: 允许黑名单中的 BE/CN 节点发生连接失败的阈值。
+
+如果 BE/CN 节点是自动添加到黑名单的,StarRocks 将评估其连接性并判断是否可以将其从黑名单中移除。在 `black_host_history_sec` 期间,只有当黑名单中的 BE/CN 节点的连接失败次数少于 `black_host_connect_failures_within_time` 中设置的阈值时,才能将其从黑名单中移除。
diff --git a/docs/zh/administration/management/BE_configuration.md b/docs/zh/administration/management/BE_configuration.md new file mode 100644 index 0000000..b0ff5c3 --- /dev/null +++ b/docs/zh/administration/management/BE_configuration.md @@ -0,0 +1,3473 @@
+---
+displayed_sidebar: docs
+---
+
+import BEConfigMethod from '../../_assets/commonMarkdown/BE_config_method.mdx'
+
+import CNConfigMethod from '../../_assets/commonMarkdown/CN_config_method.mdx'
+
+import PostBEConfig from '../../_assets/commonMarkdown/BE_dynamic_note.mdx'
+
+import StaticBEConfigNote from '../../_assets/commonMarkdown/StaticBE_config_note.mdx'
+
+# BE 配置
+
+
+
+
+
+## 查看 BE 配置项
+
+您可以使用以下命令查看 BE 配置项:
+
+```shell
+curl http://<BE_IP>:<BE_HTTP_PORT>/varz
+```
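+
+例如,假设直接在 BE 节点本机上执行,且 BE 使用默认 HTTP 端口 `8040`(即下文 `be_http_port` 的默认值):
+
+```shell
+curl http://127.0.0.1:8040/varz
+```
+
+## 配置 BE 参数
+
+
+
+
+
+## 理解 BE 参数
+
+### 日志
+
+##### diagnose_stack_trace_interval_ms
+
+- 默认值: 1800000 (30 分钟)
+- 类型: Int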
+- 单位: 毫秒
+- 可变: 是
+- 描述: 控制 DiagnoseDaemon 对 `STACK_TRACE` 请求执行连续堆栈跟踪诊断之间的最短时间间隔。当诊断请求到达时,如果上次收集堆栈跟踪的时间距今不到 `diagnose_stack_trace_interval_ms` 毫秒,则守护进程会跳过收集和记录堆栈跟踪。增加此值可减少频繁堆栈转储带来的 CPU 开销和日志量;减少此值可捕获更频繁的跟踪以调试瞬时问题(例如,在长时间 `TabletsChannel::add_chunk` 阻塞的负载故障点模拟中)。
+- 引入版本: v3.5.0
+
+##### lake_replication_slow_log_ms
+
+- 默认值: 30000
+- 类型: Int
+- 单位: 毫秒
+- 可变: 是
+- 描述: Lake Replication 期间发出慢日志条目的阈值。每次文件复制后,代码会测量以微秒为单位的经过时间,并在经过时间大于或等于 `lake_replication_slow_log_ms * 1000` 时将操作标记为慢速。触发时,StarRocks 会写入一条 INFO 日志,其中包含该复制文件的文件大小、成本和跟踪指标。增加此值可减少大型/慢速传输中嘈杂的慢日志;减少此值可更快地检测并发现较小的慢速复制事件。
+- 引入版本: -
+
+##### load_rpc_slow_log_frequency_threshold_seconds
+
+- 默认值: 60
+- 类型: Int
+- 单位: 秒
+- 可变: 是
+- 描述: 控制系统打印超出其配置的 RPC 超时的加载 RPC 慢日志条目的频率。慢日志还包括加载通道运行时配置文件。将此值设置为 0 实际上会导致每个超时都进行日志记录。
+- 引入版本: v3.4.3, v3.5.0
+
+##### log_buffer_level
+
+- 默认值: 空字符串
+- 类型: String
+- 单位: -
+- 可变: 否
+- 描述: 刷新日志的策略。默认值表示日志在内存中缓冲。有效值为 `-1` 和 `0`。`-1` 表示日志不在内存中缓冲。
+- 引入版本: -
+
+##### pprof_profile_dir
+
+- 默认值: `${STARROCKS_HOME}/log`
+- 类型: String
+- 单位: -
+- 可变: 否
+- 描述: StarRocks 写入 pprof 构件(Jemalloc 堆快照和 gperftools CPU 配置文件)的目录路径。
+- 引入版本: v3.2.0
+
+##### sys_log_dir
+
+- 默认值: `${STARROCKS_HOME}/log`
+- 类型: String
+- 单位: -
+- 可变: 否
+- 描述: 存储系统日志(包括 INFO、WARNING、ERROR 和 FATAL)的目录。
+- 引入版本: -
+
+##### sys_log_level
+
+- 默认值: INFO
+- 类型: String
+- 单位: -
+- 可变: 是 (从 v3.3.0, v3.2.7 和 v3.1.12 开始)
+- 描述: 系统日志条目的严重性级别。有效值:INFO、WARN、ERROR 和 FATAL。此项从 v3.3.0、v3.2.7 和 v3.1.12 版本开始更改为动态配置。
+- 引入版本: -
+
+##### sys_log_roll_mode
+
+- 默认值: SIZE-MB-1024
+- 类型: String
+- 单位: -
+- 可变: 否
+- 描述: 系统日志分段为日志卷的模式。有效值包括 `TIME-DAY`、`TIME-HOUR` 和 `SIZE-MB-<size>`。默认值表示日志被分段为大小为 1 GB 的日志卷。
+- 引入版本: -
+
+##### sys_log_roll_num
+
+- 默认值: 10
+- 类型: Int
+- 单位: -
+- 可变: 否
+- 描述: 保留的日志卷数量。
+- 引入版本: -
+
+##### sys_log_timezone
+
+- 默认值: false
+- 类型: Boolean
+- 单位: -
+- 可变: 否
+- 描述: 是否在日志前缀中显示时区信息。`true` 表示显示时区信息,`false` 表示不显示。
+- 引入版本: -
+
+##### sys_log_verbose_level
+
+- 默认值: 10
+- 类型: Int
+- 单位: -
+- 可变: 否
+- 描述: 要打印的日志级别。此配置项用于控制代码中以 VLOG 启动的日志的输出。
+- 引入版本: -
+
+##### sys_log_verbose_modules
+
+- 默认值:
+- 类型: Strings
+- 单位: -
+- 可变: 否
+- 描述: 要打印的日志模块。例如,如果将此配置项设置为 OLAP,StarRocks 只打印 OLAP 模块的日志。有效值为 BE 中的命名空间,包括 `starrocks`、`starrocks::debug`、`starrocks::fs`、`starrocks::io`、`starrocks::lake`、`starrocks::pipeline`、`starrocks::query_cache`、`starrocks::stream` 和 `starrocks::workgroup`。
+- 引入版本: -
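+
+对于上文标记为“可变: 是”的配置项(以 `sys_log_level` 为例),可以参考如下方式在运行时动态修改,此处假设 BE HTTP 端口为默认的 `8040`;标记为“可变: 否”的静态项则需修改 **be.conf** 并重启 BE:
+
+```shell
+curl -XPOST http://127.0.0.1:8040/api/update_config?sys_log_level=WARN
+```
+
+### 服务器
+
+##### abort_on_large_memory_allocation
+
+- 默认值: false
+- 类型: Boolean
+- 单位: -
+- 可变: 是
+- 描述: 当单个分配请求超过配置的大分配阈值(`g_large_memory_alloc_failure_threshold > 0` 且请求大小 `>` 阈值)时,此标志控制进程的响应方式。如果为 true,当检测到此类大分配时,StarRocks 会立即调用 `std::abort()`(硬崩溃)。如果为 false,则分配被阻塞,分配器返回失败(nullptr 或 ENOMEM),以便调用者可以处理错误。此检查仅对未通过 `TRY_CATCH_BAD_ALLOC` 路径包装的分配生效(当捕获到 bad-alloc 时,内存 hook 使用不同的流程)。启用此项可用于对意外的巨大分配进行快速失败调试;除非您希望在尝试进行超大分配时立即中止进程,否则在生产环境中请保持禁用状态。
+- 引入版本: v3.4.3, 3.5.0, 4.0.0
+
+##### arrow_flight_port
+
+- 默认值: -1
+- 类型: Int
+- 单位: -
+- 可变: 否
+- 描述: BE Arrow Flight SQL 服务器的 TCP 端口。`-1` 表示禁用 Arrow Flight 服务。在非 macOS 构建中,BE 在启动期间会使用此端口调用 Arrow Flight SQL 服务器;如果端口不可用,服务器启动将失败,BE 进程会退出。配置的端口会在心跳载荷中报告给 FE。
+- 引入版本: v3.4.0, v3.5.0
+
+##### be_exit_after_disk_write_hang_second
+
+- 默认值: 60
+- 类型: Int
+- 单位: 秒
+- 可变: 否
+- 描述: 磁盘挂起后 BE 等待退出的时间长度。
+- 引入版本: -
+
+##### be_http_num_workers
+
+- 默认值: 48
+- 类型: Int
+- 单位: -
+- 可变: 否
+- 描述: HTTP 服务器使用的线程数。
+- 引入版本: -
+
+##### be_http_port
+
+- 默认值: 8040
+- 类型: Int
+- 单位: -
+- 可变: 否
+- 描述: BE HTTP 服务器端口。
+- 引入版本: -
+
+##### be_port
+
+- 默认值: 9060
+- 类型: Int
+- 单位: -
+- 可变: 否
+- 描述: BE Thrift 服务器端口,用于接收来自 FE 的请求。
+- 引入版本: -
+
+##### 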
be_service_threads + +- 默认值: 64 +- 类型: Int +- 单位: 线程 +- 可变: 否 +- 描述: BE Thrift 服务器用于处理后端 RPC/执行请求的工作线程数。此值在创建 BackendService 时传递给 ThriftServer,并控制有多少并发请求处理器可用;当所有工作线程都忙时,请求会排队。根据预期的并发 RPC 负载和可用的 CPU/内存进行调整:增加它会提高并发性,但也会增加每个线程的内存和上下文切换成本;减少它会限制并行处理,并可能增加请求延迟。 +- 引入版本: v3.2.0 + +##### brpc_connection_type + +- 默认值: `"single"` +- 类型: string +- 单位: - +- 可变: 否 +- 描述: bRPC 通道连接模式。有效值: + - `"single"` (默认): 每个通道一个持久 TCP 连接。 + - `"pooled"`: 一个持久连接池,以更高的并发性为代价,但会使用更多套接字/文件描述符。 + - `"short"`: 每个 RPC 创建的短生命周期连接,以减少持久资源使用但延迟较高。 + 选择会影响每个套接字的缓冲行为,并可能在未写入字节超过套接字限制时影响 `Socket.Write` 失败 (EOVERCROWDED)。 +- 引入版本: v3.2.5 + +##### brpc_max_body_size + +- 默认值: 2147483648 +- 类型: Int +- 单位: 字节 +- 可变: 否 +- 描述: bRPC 的最大主体大小。 +- 引入版本: - + +##### brpc_max_connections_per_server + +- 默认值: 1 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 客户端为每个远程服务器端点保留的最大持久 bRPC 连接数。对于每个端点,`BrpcStubCache` 创建一个 `StubPool`,其 `_stubs` 向量被保留为此大小。首次访问时,会创建新的存根,直到达到限制。之后,现有存根会以轮询方式返回。增加此值会提高每个端点的并发性(减少单个通道上的争用),但会增加文件描述符、内存和通道的成本。 +- 引入版本: v3.2.0 + +##### brpc_num_threads + +- 默认值: -1 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: bRPC 的 bthread 数量。值 `-1` 表示与 CPU 线程数相同。 +- 引入版本: - + +##### brpc_port + +- 默认值: 8060 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: BE bRPC 端口,用于查看 bRPC 的网络统计信息。 +- 引入版本: - + +##### brpc_socket_max_unwritten_bytes + +- 默认值: 1073741824 +- 类型: Int +- 单位: 字节 +- 可变: 否 +- 描述: 设置 bRPC 服务器中每个套接字未写入的出站字节的限制。当套接字上缓冲的、尚未写入的数据量达到此限制时,后续的 `Socket.Write` 调用将以 EOVERCROWDED 失败。这可以防止每个连接的内存无限制增长,但可能会导致非常大的消息或慢速对等体的 RPC 发送失败。将此值与 `brpc_max_body_size` 对齐,以确保单个消息体不会大于允许的未写入缓冲区。增加此值会增加每个连接的内存使用量。 +- 引入版本: v3.2.0 + +##### brpc_stub_expire_s + +- 默认值: 3600 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: bRPC 存根缓存的过期时间。默认值为 60 分钟。 +- 引入版本: - + +##### compress_rowbatches + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 一个布尔值,控制是否在 BE 之间的 RPC 中压缩 Row Batch。`true` 表示压缩 Row Batch,`false` 表示不压缩。 +- 引入版本: - + +##### consistency_max_memory_limit_percent + +- 默认值: 20 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 用于计算与一致性相关任务的内存预算的百分比上限。在 BE 启动期间,最终的一致性限制被计算为从 `consistency_max_memory_limit`(字节)解析的值与 (`process_mem_limit * consistency_max_memory_limit_percent / 100`) 中的最小值。如果 `process_mem_limit` 未设置 (-1),则一致性内存被视为无限制。对于 `consistency_max_memory_limit_percent`,小于 0 或大于 100 的值被视为 100。调整此值会增加或减少为一致性操作保留的内存,因此会影响查询和其他服务可用的内存。 +- 引入版本: v3.2.0 + +##### delete_worker_count_normal_priority + +- 默认值: 2 +- 类型: Int +- 单位: 线程 +- 可变: 否 +- 描述: 专门用于处理 BE Agent 上删除 (REALTIME_PUSH with DELETE) 任务的普通优先级工作线程数。在启动时,此值会添加到 `delete_worker_count_high_priority` 以确定 DeleteTaskWorkerPool 的大小(参见 agent_server.cpp)。该池将前 `delete_worker_count_high_priority` 个线程分配为 HIGH 优先级,其余为 NORMAL;普通优先级线程处理标准删除任务并有助于提高总体删除吞吐量。增加此值可提高并发删除容量(更高的 CPU/IO 使用率);减少此值可降低资源争用。 +- 引入版本: v3.2.0 + +##### disable_mem_pools + +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 是否禁用 MemPool。当此项设置为 `true` 时,MemPool 块池化被禁用,因此每个分配都会获得其自己的大小块,而不是重用或增加池化块。禁用池化会减少长期保留的缓冲区内存,但会增加分配频率、块数量增加并跳过完整性检查(由于块数量过多而避免)。保持 `disable_mem_pools` 为 `false`(默认)以受益于分配重用和更少的系统调用。仅当您必须避免大量池化内存保留(例如,低内存环境或诊断运行)时才将其设置为 `true`。 +- 引入版本: v3.2.0 + +##### enable_https + +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 当此项设置为 `true` 时,BE 的 bRPC 服务器被配置为使用 TLS:`ServerOptions.ssl_options` 将在 BE 启动时填充 `ssl_certificate_path` 和 `ssl_private_key_path` 指定的证书和私钥。这为传入的 bRPC 连接启用 HTTPS/TLS;客户端必须使用 TLS 连接。确保证书和密钥文件存在,BE 进程可访问,并且符合 bRPC/SSL 期望。 +- 引入版本: v4.0.0 + +##### enable_jemalloc_memory_tracker + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 当此项设置为 `true` 时,BE 会启动一个后台线程 (jemalloc_tracker_daemon),该线程每秒轮询 Jemalloc 统计信息一次,并使用 Jemalloc "stats.metadata" 值更新 GlobalEnv Jemalloc 元数据 
MemTracker。这确保了 Jemalloc 元数据消耗被纳入 StarRocks 进程内存核算,并防止了 Jemalloc 内部使用的内存报告不足。该跟踪器仅在非 macOS 构建中编译/启动 (#ifndef __APPLE__) 并作为名为 "jemalloc_tracker_daemon" 的守护线程运行。由于此设置影响启动行为和维护 MemTracker 状态的线程,因此更改它需要重新启动。仅当未使用 Jemalloc 或 Jemalloc 跟踪以不同方式有意管理时才禁用;否则保持启用以维护准确的内存核算和分配安全措施。 +- 引入版本: v3.2.12 + +##### enable_jvm_metrics + +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 控制系统是否在启动时初始化和注册 JVM 特定的指标。启用时,指标子系统将创建 JVM 相关的收集器(例如,堆、GC 和线程指标)用于导出,禁用时,这些收集器不会初始化。此参数旨在向前兼容,并可能在未来版本中删除。使用 `enable_system_metrics` 控制系统级指标收集。 +- 引入版本: v4.0.0 + +##### get_pindex_worker_count + +- 默认值: 0 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 设置 UpdateManager 中 "get_pindex" 线程池的工作线程数,该线程池用于加载/获取持久化索引数据(在应用主键表的 Rowset 时使用)。在运行时,配置更新将调整池的最大线程数:如果 `>0`,则应用该值;如果为 0,则运行时回调使用 CPU 核心数 (CpuInfo::num_cores())。在初始化时,池的最大线程数计算为 max(get_pindex_worker_count, max_apply_thread_cnt * 2),其中 `max_apply_thread_cnt` 是 apply 线程池的最大值。增加此值可提高 pindex 加载的并行度;降低此值可减少并发性并降低内存/CPU 使用率。 +- 引入版本: v3.2.0 + +##### heartbeat_service_port + +- 默认值: 9050 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: BE 心跳服务端口,用于接收来自 FE 的心跳。 +- 引入版本: - + +##### heartbeat_service_thread_count + +- 默认值: 1 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: BE 心跳服务的线程数。 +- 引入版本: - + +##### local_library_dir + +- 默认值: `${UDF_RUNTIME_DIR}` +- 类型: string +- 单位: - +- 可变: 否 +- 描述: BE 上的本地目录,用于暂存 UDF(用户定义函数)库以及 Python UDF 工作进程操作的目录。StarRocks 将 UDF 库从 HDFS 复制到此路径,在 `/pyworker_` 创建每个工作进程的 Unix 域套接字,并在执行前将 Python 工作进程的当前目录更改为该目录。该目录必须存在,BE 进程可写入,并位于支持 Unix 域套接字(即本地文件系统)的文件系统上。由于此配置在运行时不可变,请在启动前设置,并确保每个 BE 上具有足够的权限和磁盘空间。 +- 引入版本: v3.2.0 + +##### max_transmit_batched_bytes + +- 默认值: 262144 +- 类型: Int +- 单位: 字节 +- 可变: 否 +- 描述: 在刷新到网络之前,单个传输请求中要累积的最大序列化字节数。发送方实现将序列化的 ChunkPB 载荷添加到 PTransmitChunkParams 请求中,并在累积字节数超过 `max_transmit_batched_bytes` 或达到 EOS 时发送请求。增加此值可减少 RPC 频率并提高吞吐量,但会以更高的每个请求延迟和内存使用为代价;减少此值可降低延迟和内存,但会增加 RPC 速率。 +- 引入版本: v3.2.0 + +##### mem_limit + +- 默认值: 90% +- 类型: String +- 单位: - +- 可变: 否 +- 描述: BE 进程内存上限。可以将其设置为百分比("80%")或物理限制("100G")。默认硬限制是服务器内存大小的 90%,软限制是 80%。如果要在同一服务器上部署 StarRocks 和其他内存密集型服务,则需要配置此参数。 +- 引入版本: - + +##### memory_max_alignment + +- 默认值: 16 +- 类型: Int +- 单位: 字节 +- 可变: 否 +- 描述: 设置 MemPool 将接受的对齐分配的最大字节对齐。仅当调用者需要更大的对齐时(例如,用于 SIMD、设备缓冲区或 ABI 约束)才增加此值。较大的值会增加每个分配的填充和保留内存浪费,并且必须保持在系统分配器和平台支持的范围内。 +- 引入版本: v3.2.0 + +##### memory_urgent_level + +- 默认值: 85 +- 类型: long +- 单位: 百分比 (0-100) +- 可变: 是 +- 描述: 以进程内存限制的百分比表示的紧急内存水位。当进程内存消耗超过 `(limit * memory_urgent_level / 100)` 时,BE 会触发即时内存回收,这将强制数据缓存收缩,逐出更新缓存,并导致持久/Lake MemTable 被视为“满”,以便它们将很快被刷新/压缩。代码验证此设置必须大于 `memory_high_level`,并且 `memory_high_level` 必须大于或等于 `1` 且小于或等于 `100`。较低的值会导致更激进、更早的回收,即更频繁的缓存逐出和刷新。较高的值会延迟回收,如果太接近 100,则存在 OOM 风险。将此项与 `memory_high_level` 和数据缓存相关自动调整设置一起调整。 +- 引入版本: v3.2.0 + +##### net_use_ipv6_when_priority_networks_empty + +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 一个布尔值,控制当 `priority_networks` 未指定时是否优先使用 IPv6 地址。`true` 表示当托管节点的服务器同时具有 IPv4 和 IPv6 地址且 `priority_networks` 未指定时,允许系统优先使用 IPv6 地址。 +- 引入版本: v3.3.0 + +##### num_cores + +- 默认值: 0 +- 类型: Int +- 单位: 核心 +- 可变: 否 +- 描述: 控制系统用于 CPU 敏感决策(例如,线程池大小调整和运行时调度)的 CPU 核心数。值为 0 启用自动检测:系统读取 `/proc/cpuinfo` 并使用所有可用核心。如果设置为正整数,则该值将覆盖检测到的核心数并成为有效的核心数。在容器内运行时,cgroup cpuset 或 cpu quota 设置可以进一步限制可用的核心;`CpuInfo` 也尊重这些 cgroup 限制。 +- 引入版本: v3.2.0 + +##### plugin_path + +- 默认值: `${STARROCKS_HOME}/plugin` +- 类型: String +- 单位: - +- 可变: 否 +- 描述: StarRocks 加载外部插件(动态库、连接器构件、UDF 二进制文件等)的文件系统目录。`plugin_path` 应该指向 BE 进程可访问的目录(读取和执行权限),并且在加载插件之前必须存在。确保正确的权限,并且插件文件使用平台的本地二进制扩展名(例如,Linux 上的 .so)。 +- 引入版本: v3.2.0 + +##### priority_networks + +- 默认值: 空字符串 +- 类型: String +- 单位: - +- 可变: 否 +- 
描述: 声明服务器有多个 IP 地址时的选择策略。请注意,最多一个 IP 地址必须与此参数指定的列表匹配。此参数的值是一个列表,由 CIDR 表示法中以分号 (;) 分隔的条目组成,例如 `10.10.10.0/24`。如果没有 IP 地址与此列表中的条目匹配,将随机选择服务器的一个可用 IP 地址。从 v3.3.0 开始,StarRocks 支持基于 IPv6 的部署。如果服务器同时具有 IPv4 和 IPv6 地址,并且未指定此参数,系统默认使用 IPv4 地址。可以通过将 `net_use_ipv6_when_priority_networks_empty` 设置为 `true` 来更改此行为。 +- 引入版本: - + +##### rpc_compress_ratio_threshold + +- 默认值: 1.1 +- 类型: Double +- 单位: - +- 可变: 是 +- 描述: 决定是否以压缩形式通过网络发送序列化 Row Batch 时使用的阈值 (uncompressed_size / compressed_size)。当尝试压缩时(例如,在 DataStreamSender、exchange sink、tablet sink index channel、dictionary cache writer 中),StarRocks 计算 compress_ratio = uncompressed_size / compressed_size;仅当 compress_ratio `>` rpc_compress_ratio_threshold 时才使用压缩载荷。默认值为 1.1,压缩数据必须至少比未压缩数据小约 9.1% 才能被使用。降低此值以优先使用压缩(更多 CPU 用于更小的带宽节省);提高此值以避免压缩开销,除非它能产生更大的尺寸减小。注意:这适用于 RPC/shuffle 序列化,并且仅当 Row Batch 压缩启用时 (compress_rowbatches) 才有效。 +- 引入版本: v3.2.0 + +##### ssl_private_key_path + +- 默认值: 空字符串 +- 类型: String +- 单位: - +- 可变: 否 +- 描述: BE 的 bRPC 服务器用作默认证书私钥的 TLS/SSL 私钥 (PEM) 文件系统路径。当 `enable_https` 设置为 `true` 时,系统在进程启动时将 `brpc::ServerOptions::ssl_options().default_cert.private_key` 设置为此路径。文件必须可由 BE 进程访问,并且必须与 `ssl_certificate_path` 提供的证书匹配。如果未设置此值或文件丢失或无法访问,HTTPS 将无法配置,bRPC 服务器可能无法启动。使用限制性的文件系统权限(例如 600)保护此文件。 +- 引入版本: v4.0.0 + +##### thrift_client_retry_interval_ms + +- 默认值: 100 +- 类型: Int +- 单位: 毫秒 +- 可变: 是 +- 描述: Thrift 客户端重试的时间间隔。 +- 引入版本: - + +##### thrift_connect_timeout_seconds + +- 默认值: 3 +- 类型: Int +- 单位: 秒 +- 可变: 否 +- 描述: 创建 Thrift 客户端时使用的连接超时(秒)。ClientCacheHelper::_create_client 将此值乘以 1000 并将其传递给 ThriftClientImpl::set_conn_timeout(),因此它控制 BE 客户端缓存打开的新 Thrift 连接的 TCP/连接握手超时。此设置仅影响连接建立;发送/接收超时是单独配置的。非常小的值可能会在高延迟网络上导致虚假连接失败,而大值会延迟检测不可达的对等体。 +- 引入版本: v3.2.0 + +##### thrift_port + +- 默认值: 0 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 用于导出内部基于 Thrift 的 BackendService 的端口。当进程作为 Compute Node 运行且此项设置为非零值时,它会覆盖 `be_port`,Thrift 服务器绑定到此值;否则使用 `be_port`。此配置已弃用 — 设置非零 `thrift_port` 会记录警告,建议改用 `be_port`。 +- 引入版本: v3.2.0 + +##### thrift_rpc_connection_max_valid_time_ms + +- 默认值: 5000 +- 类型: Int +- 单位: 毫秒 +- 可变: 否 +- 描述: Thrift RPC 连接的最大有效时间。如果连接在连接池中存在时间超过此值,则连接将被关闭。它必须与 FE 配置 `thrift_client_timeout_ms` 保持一致。 +- 引入版本: - + +##### thrift_rpc_max_body_size + +- 默认值: 0 +- 类型: Int +- 单位: +- 可变: 否 +- 描述: RPC 的最大字符串主体大小。`0` 表示大小不受限制。 +- 引入版本: - + +##### thrift_rpc_strict_mode + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 是否启用 Thrift 的严格执行模式。有关 Thrift 严格模式的更多信息,请参阅 [Thrift Binary protocol encoding](https://github.com/apache/thrift/blob/master/doc/specs/thrift-binary-protocol.md)。 +- 引入版本: - + +##### thrift_rpc_timeout_ms + +- 默认值: 5000 +- 类型: Int +- 单位: 毫秒 +- 可变: 是 +- 描述: Thrift RPC 的超时时间。 +- 引入版本: - + +##### transaction_apply_thread_pool_num_min + +- 默认值: 0 +- 类型: Int +- 单位: 线程 +- 可变: 是 +- 描述: 设置 BE UpdateManager 中 "update_apply" 线程池的最小线程数 — 该线程池用于应用主键表的 Rowset。值为 0 禁用固定最小值(无强制下限);当 `transaction_apply_worker_count` 也为 0 时,池的最大线程数默认为 CPU 核心数,因此有效的工作容量等于 CPU 核心数。您可以提高此值以保证应用事务的基线并发性;设置过高可能会增加 CPU 争用。更改通过 `update_config` HTTP 处理程序在运行时应用(它在 apply 线程池上调用 `update_min_threads`)。 +- 引入版本: v3.2.11 + +##### transaction_publish_version_thread_pool_num_min + +- 默认值: 0 +- 类型: Int +- 单位: 线程 +- 可变: 是 +- 描述: 设置 AgentServer "publish_version" 动态线程池中保留的最小线程数(用于发布事务版本/处理 `TTaskType::PUBLISH_VERSION` 任务)。在启动时,池以 min = max(config value, MIN_TRANSACTION_PUBLISH_WORKER_COUNT) (MIN_TRANSACTION_PUBLISH_WORKER_COUNT = 1) 创建,因此默认 0 会导致最小 1 个线程。在运行时更改此值会调用更新回调以调用 ThreadPool::update_min_threads,提高或降低池的保证最小值(但不低于强制最小值 1)。与 `transaction_publish_version_worker_count`(最大线程数)和 
`transaction_publish_version_thread_pool_idle_time_ms`(空闲超时)协调。 +- 引入版本: v3.2.11 + +##### use_mmap_allocate_chunk + +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 当此项设置为 `true` 时,系统使用匿名私有 mmap 映射 (`MAP_ANONYMOUS | MAP_PRIVATE`) 分配块,并使用 munmap 释放它们。启用此功能可能会创建许多虚拟内存映射,因此您必须提高内核限制(作为 root 用户,运行 `sysctl -w vm.max_map_count=262144` 或 `echo 262144 > /proc/sys/vm/max_map_count`),并将 `chunk_reserved_bytes_limit` 设置为相对较大的值。否则,启用 mmap 可能会由于频繁的映射/取消映射而导致非常差的性能。 +- 引入版本: v3.2.0 + +### 元数据和集群管理 + +##### cluster_id + +- 默认值: -1 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 此 StarRocks 后端的全局集群标识符。在启动时,StorageEngine 读取 config::cluster_id 到其有效的集群 ID,并验证所有数据根路径是否包含相同的集群 ID(参见 StorageEngine::_check_all_root_path_cluster_id)。值 -1 表示“未设置”——引擎可以从现有数据目录或从主心跳派生有效的 ID。如果配置了非负 ID,则配置的 ID 与数据目录中存储的 ID 之间的任何不匹配都会导致启动验证失败 (Status::Corruption)。当某些根缺少 ID 且引擎被允许写入 ID (options.need_write_cluster_id) 时,它会将有效的 ID 持久化到这些根中。 +- 引入版本: v3.2.0 + +##### consistency_max_memory_limit + +- 默认值: 10G +- 类型: String +- 单位: - +- 可变: 否 +- 描述: CONSISTENCY 内存跟踪器的内存大小规范。 +- 引入版本: v3.2.0 + +##### make_snapshot_rpc_timeout_ms + +- 默认值: 20000 +- 类型: Int +- 单位: 毫秒 +- 可变: 否 +- 描述: 设置在远程 BE 上创建快照时使用的 Thrift RPC 超时(毫秒)。当远程快照创建经常超出默认超时时,增加此值;减少此值以在无响应的 BE 上更快失败。请注意,其他超时可能会影响端到端操作(例如,有效的 Tablet 写入器打开超时可能与 `tablet_writer_open_rpc_timeout_sec` 和 `load_timeout_sec` 相关)。 +- 引入版本: v3.2.0 + +##### metadata_cache_memory_limit_percent + +- 默认值: 30 +- 类型: Int +- 单位: 百分比 +- 可变: 是 +- 描述: 设置元数据 LRU 缓存大小为进程内存限制的百分比。在启动时,StarRocks 将缓存字节计算为 (`process_mem_limit * metadata_cache_memory_limit_percent / 100`) 并将其传递给元数据缓存分配器。缓存仅用于非 PRIMARY_KEYS Rowset(不支持 PK 表),并且仅当 `metadata_cache_memory_limit_percent > 0` 时才启用;将其设置为 `<= 0` 以禁用元数据缓存。增加此值会提高元数据缓存容量,但会减少其他组件可用的内存;根据工作负载和系统内存进行调整。在 BE_TEST 构建中不活动。 +- 引入版本: v3.2.10 + +##### retry_apply_interval_second + +- 默认值: 30 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 用于调度失败的 Tablet 应用操作重试的基本间隔(秒)。它直接用于在提交失败后调度重试,并作为退避的基本乘数:下一个重试延迟计算为 min(600, `retry_apply_interval_second` * failed_attempts)。代码还使用 `retry_apply_interval_second` 计算累积重试持续时间(等差数列和),并将其与 `retry_apply_timeout_second` 进行比较,以决定是否继续重试。仅当 `enable_retry_apply` 为 true 时有效。增加此值会延长单个重试延迟和累积重试时间;减少此值会使重试更频繁,并可能增加达到 `retry_apply_timeout_second` 之前的尝试次数。 +- 引入版本: v3.2.9 + +##### retry_apply_timeout_second + +- 默认值: 7200 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 允许应用挂起版本之前积累的最大重试时间(秒),在此时间之后,应用进程将放弃并使 Tablet 进入错误状态。应用逻辑根据 `retry_apply_interval_second` 累积指数/退避间隔,并将总持续时间与 `retry_apply_timeout_second` 进行比较。如果 `enable_retry_apply` 为 true 且错误被认为是可重试的,则应用尝试将重新调度,直到累积退避超过 `retry_apply_timeout_second`;然后应用停止,Tablet 转换为错误状态。明确不可重试的错误(例如 Corruption)无论此设置如何都不会重试。调整此值以控制 StarRocks 将继续重试应用操作多长时间(默认 7200s = 2 小时)。 +- 引入版本: v3.3.13, v3.4.3, v3.5.0 + +##### txn_commit_rpc_timeout_ms + +- 默认值: 60000 +- 类型: Int +- 单位: 毫秒 +- 可变: 是 +- 描述: BE 流式加载和事务提交调用使用的 Thrift RPC 连接的最大允许生命周期(毫秒)。StarRocks 将此值设置为发送到 FE 的请求的 `thrift_rpc_timeout_ms`(用于 stream_load 规划、loadTxnBegin/loadTxnPrepare/loadTxnCommit 和 getLoadTxnStatus)。如果连接在池中停留时间超过此值,它将被关闭。当提供了每个请求的超时 (`ctx->timeout_second`) 时,BE 将 RPC 超时计算为 rpc_timeout_ms = max(ctx*1000/4, min(ctx*1000/2, txn_commit_rpc_timeout_ms)),因此有效的 RPC 超时受上下文和此配置的限制。请确保此值与 FE 的 `thrift_client_timeout_ms` 保持一致,以避免超时不匹配。 +- 引入版本: v3.2.0 + +##### txn_map_shard_size + +- 默认值: 128 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 事务管理器用于分区事务锁和减少争用的锁映射分片数量。其值应为 2 的幂 (2^n);增加它会增加并发性并减少锁争用,但会增加额外的内存和少量簿记开销。根据预期的并发事务和可用内存选择分片数量。 +- 引入版本: v3.2.0 + +##### txn_shard_size + +- 默认值: 1024 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 控制事务管理器使用的锁分片数量。此值决定了事务锁的分片大小。它必须是 2 的幂;将其设置为更大的值会减少锁争用并提高并发 COMMIT/PUBLISH 吞吐量,但会增加额外的内存和更细粒度的内部簿记开销。 +- 
引入版本: v3.2.0 + +##### update_schema_worker_count + +- 默认值: 3 +- 类型: Int +- 单位: 线程 +- 可变: 否 +- 描述: 设置后端“update_schema”动态 ThreadPool 中处理 `TTaskType::UPDATE_SCHEMA` 任务的最大工作线程数。ThreadPool 在启动期间在 AgentServer 中创建,最小线程数为 0(空闲时可以缩减到零),最大线程数等于此设置;该池使用默认的空闲超时和实际上无限的队列。增加此值以允许更多并发的模式更新任务(更高的 CPU 和内存使用率),或降低此值以限制并行模式操作。 +- 引入版本: v3.2.3 + +##### update_tablet_meta_info_worker_count + +- 默认值: 1 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 设置后端线程池中处理 Tablet 元数据更新任务的最大工作线程数。线程池在后端启动期间创建,最小线程数为 0(空闲时可以缩减到零),最大线程数等于此设置(限制为至少 1)。在运行时更新此值会调整池的最大线程数。增加它以允许更多并发的元数据更新任务,或降低它以限制并发性。 +- 引入版本: v4.1.0, v4.0.6, v3.5.13 + +### 用户、角色和权限 + +##### ssl_certificate_path + +- 默认值: +- 类型: String +- 单位: - +- 可变: 否 +- 描述: BE 的 bRPC 服务器在 `enable_https` 为 true 时将使用的 TLS/SSL 证书文件 (PEM) 的绝对路径。在 BE 启动时,此值将复制到 `brpc::ServerOptions::ssl_options().default_cert.certificate`;您还必须将 `ssl_private_key_path` 设置为匹配的私钥。如果您的 CA 要求,请以 PEM 格式提供服务器证书和任何中间证书(证书链)。该文件必须可由 StarRocks BE 进程读取,并且仅在启动时应用。如果未设置或无效而 `enable_https` 已启用,bRPC TLS 设置可能会失败并阻止服务器正常启动。 +- 引入版本: v4.0.0 + +### 查询引擎 + +##### clear_udf_cache_when_start + +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 启用时,BE 的 UserFunctionCache 将在启动时清除所有本地缓存的用户函数库。在 UserFunctionCache::init 期间,代码会调用 _reset_cache_dir(),该函数会从配置的 UDF 库目录(组织成 kLibShardNum 子目录)中删除 UDF 文件,并删除带有 Java/Python UDF 后缀(.jar/.py)的文件。禁用时(默认),BE 会加载现有缓存的 UDF 文件而不是删除它们。启用此功能会强制在重启后首次使用时重新下载 UDF 二进制文件(增加网络流量和首次使用的延迟)。 +- 引入版本: v4.0.0 + +##### dictionary_speculate_min_chunk_size + +- 默认值: 10000 +- 类型: Int +- 单位: 行 +- 可变: 否 +- 描述: StringColumnWriter 和 DictColumnWriter 用于触发字典编码推测的最小行数(块大小)。如果传入列(或累积缓冲区加上传入行)的大小大于或等于 `dictionary_speculate_min_chunk_size`,则写入器将立即运行推测并设置编码(DICT、PLAIN 或 BIT_SHUFFLE),而不是缓冲更多行。推测使用 `dictionary_encoding_ratio` 用于字符串列,使用 `dictionary_encoding_ratio_for_non_string_column` 用于数值/非字符串列来决定字典编码是否有利。此外,大的列 byte_size(大于或等于 UINT32_MAX)会强制立即推测以避免 `BinaryColumn` 溢出。 +- 引入版本: v3.2.0 + +##### disable_storage_page_cache + +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 一个布尔值,控制是否禁用 PageCache。 + - 当启用 PageCache 时,StarRocks 会缓存最近扫描的数据。 + - 当频繁重复相似查询时,PageCache 可以显著提高查询性能。 + - `true` 表示禁用 PageCache。 + - 从 StarRocks v2.4 开始,此项的默认值已从 `true` 更改为 `false`。 +- 引入版本: - + +##### enable_bitmap_index_memory_page_cache + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 是否为 Bitmap 索引启用内存缓存。如果想使用 Bitmap 索引加速点查询,建议启用内存缓存。 +- 引入版本: v3.1 + +##### enable_compaction_flat_json + +- 默认值: True +- 类型: Boolean +- 单位: +- 可变: 是 +- 描述: 是否为 Flat JSON 数据启用 Compaction。 +- 引入版本: v3.3.3 + +##### enable_json_flat + +- 默认值: false +- 类型: Boolean +- 单位: +- 可变: 是 +- 描述: 是否启用 Flat JSON 功能。启用此功能后,新加载的 JSON 数据将自动展平,从而提高 JSON 查询性能。 +- 引入版本: v3.3.0 + +##### enable_lazy_dynamic_flat_json + +- 默认值: True +- 类型: Boolean +- 单位: +- 可变: 是 +- 描述: 当查询在读取过程中缺少 Flat JSON 模式时,是否启用 Lazy Dynamic Flat JSON。当此项设置为 `true` 时,StarRocks 会将 Flat JSON 操作推迟到计算过程而不是读取过程。 +- 引入版本: v3.3.3 + +##### enable_ordinal_index_memory_page_cache + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 是否为 Ordinal 索引启用内存缓存。Ordinal 索引是行 ID 到数据页位置的映射,可用于加速扫描。 +- 引入版本: - + +##### enable_string_prefix_zonemap + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 是否为使用前缀 Min/Max 的字符串 (CHAR/VARCHAR) 列启用 ZoneMap。对于非关键字符串列,Min/Max 值会被截断为 `string_prefix_zonemap_prefix_len` 配置的固定前缀长度。 +- 引入版本: - + +##### enable_zonemap_index_memory_page_cache + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 是否为 Zonemap 索引启用内存缓存。如果想使用 Zonemap 索引加速扫描,建议启用内存缓存。 +- 引入版本: - + +##### exchg_node_buffer_size_bytes + +- 默认值: 10485760 +- 类型: Int +- 单位: 字节 +- 可变: 是 +- 描述: 每个查询的交换节点接收端的最大缓冲区大小。此配置项是一个软限制。当数据以过快的速度发送到接收端时,会触发反压。 +- 
引入版本: -
+
+##### file_descriptor_cache_capacity
+
+- Default: 16384
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The number of file descriptors that can be cached.
+- Introduced in: -
+
+##### flamegraph_tool_dir
+
+- Default: `${STARROCKS_HOME}/bin/flamegraph`
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The directory of the flame graph tools, which is expected to contain the pprof, stackcollapse-go.pl, and flamegraph.pl scripts used to generate flame graphs from profiling data.
+- Introduced in: -
+
+##### fragment_pool_queue_size
+
+- Default: 2048
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The upper limit of the number of queries that can be processed on each BE node.
+- Introduced in: -
+
+##### fragment_pool_thread_num_max
+
+- Default: 4096
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum number of threads used for queries.
+- Introduced in: -
+
+##### fragment_pool_thread_num_min
+
+- Default: 64
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The minimum number of threads used for queries.
+- Introduced in: -
+
+##### hdfs_client_enable_hedged_read
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Specifies whether to enable the hedged read feature.
+- Introduced in: v3.0
+
+##### hdfs_client_hedged_read_threadpool_size
+
+- Default: 128
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: Specifies the size of the hedged read thread pool on your HDFS client. The thread pool size limits the number of threads that are dedicated to running hedged reads in your HDFS client. It is equivalent to the `dfs.client.hedged.read.threadpool.size` parameter in the **hdfs-site.xml** file of your HDFS cluster.
+- Introduced in: v3.0
+
+##### hdfs_client_hedged_read_threshold_millis
+
+- Default: 2500
+- Type: Int
+- Unit: Milliseconds
+- Is mutable: No
+- Description: Specifies the number of milliseconds to wait before starting a hedged read. For example, suppose that you have set this parameter to `30`. In this situation, if a read from a block has not returned within 30 milliseconds, your HDFS client immediately starts a new read against a different block replica. It is equivalent to the `dfs.client.hedged.read.threshold.millis` parameter in the **hdfs-site.xml** file of your HDFS cluster.
+- Introduced in: v3.0
+
+##### io_coalesce_adaptive_lazy_active
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Adaptively determines whether to coalesce the I/O of predicate columns and non-predicate columns based on the selectivity of the predicates.
+- Introduced in: v3.2
+
+##### jit_lru_cache_size
+
+- Default: 0
+- Type: Int
+- Unit: Bytes
+- Is mutable: Yes
+- Description: The size of the LRU cache for JIT compilation. A value greater than 0 indicates the actual cache size. A value less than or equal to 0 makes the system set the cache size adaptively using the formula `jit_lru_cache_size = min(mem_limit*0.01, 1GB)` (the `mem_limit` of the node must be greater than or equal to 16 GB).
+- Introduced in: -
+
+##### json_flat_column_max
+
+- Default: 100
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of sub-fields that can be extracted by Flat JSON. This parameter takes effect only when `enable_json_flat` is set to `true`.
+- Introduced in: v3.3.0
+
+##### json_flat_create_zonemap
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to create ZoneMaps for flattened JSON sub-columns at write time. This parameter takes effect only when `enable_json_flat` is set to `true`.
+- Introduced in: -
+
+##### json_flat_null_factor
+
+- Default: 0.3
+- Type: Double
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum proportion of NULL values allowed in a column for Flat JSON extraction. A column is not extracted if its proportion of NULL values is higher than this threshold. This parameter takes effect only when `enable_json_flat` is set to `true`.
+- Introduced in: v3.3.0
+
+##### json_flat_sparsity_factor
+
+- Default: 0.3
+- Type: Double
+- Unit: -
+- Is mutable: Yes
+- Description: The minimum proportion of columns with the same name required for Flat JSON extraction. Extraction is not performed if the proportion of columns with the same name is lower than this value. This parameter takes effect only when `enable_json_flat` is set to `true`.
+- Introduced in: v3.3.0
+
+##### lake_tablet_ignore_invalid_delete_predicate
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: A boolean value that controls whether to ignore invalid delete predicates in tablet rowset metadata, which may be introduced by logical deletions against Duplicate Key tables after a column is renamed.
+- Introduced in: v4.0
+
+##### late_materialization_ratio
+
+- Default: 10
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: An integer ratio in the range [0-1000] that controls the use of late materialization in SegmentIterator (the vectorized query engine). A value of `0` (or <= 0) disables late materialization; `1000` (or >= 1000) forces late materialization for all reads. A value > 0 and < 1000 enables a conditional strategy in which both late and early materialization contexts are prepared, and the iterator chooses its behavior based on the predicate filter ratio (higher values favor late materialization). When a segment contains complex metric types, StarRocks uses `metric_late_materialization_ratio` instead. Late materialization is disabled if `lake_io_opts.cache_file_only` is set to true.
+- Introduced in: v3.2.0
+
+##### max_hdfs_file_handle
+
+- Default: 1000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of HDFS file descriptors that can be opened.
+- Introduced in: -
+
+##### max_memory_sink_batch_count
+
+- Default: 20
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of Scan Cache batches.
+- Introduced in: -
+
+##### max_pushdown_conditions_per_column
+
+- Default: 1024
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of conditions that can be pushed down per column. If the number of conditions exceeds this limit, the predicates are not pushed down to the storage layer.
+- Introduced in: -
+
+##### max_scan_key_num
+
+- Default: 1024
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of scan keys segmented by each query.
+- Introduced in: -
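+
+For the mutable items above (for example, `max_scan_key_num`), the value can also be changed at runtime through the BE HTTP interface instead of editing **be.conf**. A minimal sketch, assuming a BE whose HTTP service listens at `be1.example.com:8040` (the host, port, and new value are placeholders):
+
+```bash
+# update_config applies only to items marked "Is mutable: Yes"; the change
+# is in-memory only and reverts to the be.conf value after a restart.
+curl -XPOST "http://be1.example.com:8040/api/update_config?max_scan_key_num=2048"
+```
+
+Static items such as `hdfs_client_enable_hedged_read` must be set in **be.conf** and take effect only after the BE restarts.
+
+##### 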
metric_late_materialization_ratio + +- 默认值: 1000 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 控制包含复杂指标列的读取何时使用延迟物化行访问策略。有效范围: [0-1000]。`0` 禁用延迟物化;`1000` 强制所有适用读取使用延迟物化。值 1-999 启用条件策略,其中准备了延迟和早期物化上下文,并根据谓词/选择性在运行时选择。当存在复杂指标类型时,`metric_late_materialization_ratio` 会覆盖通用的 `late_materialization_ratio`。注意:`cache_file_only` I/O 模式将导致延迟物化被禁用,无论此设置如何。 +- 引入版本: v3.2.0 + +##### min_file_descriptor_number + +- 默认值: 60000 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: BE 进程中文件描述符的最小数量。 +- 引入版本: - + +##### object_storage_connect_timeout_ms + +- 默认值: -1 +- 类型: Int +- 单位: 毫秒 +- 可变: 否 +- 描述: 与对象存储建立套接字连接的超时时长。`-1` 表示使用 SDK 配置的默认超时时长。 +- 引入版本: v3.0.9 + +##### object_storage_request_timeout_ms + +- 默认值: -1 +- 类型: Int +- 单位: 毫秒 +- 可变: 否 +- 描述: 与对象存储建立 HTTP 连接的超时时长。`-1` 表示使用 SDK 配置的默认超时时长。 +- 引入版本: v3.0.9 + +##### parquet_late_materialization_enable + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 一个布尔值,控制是否启用 Parquet 读取器的延迟物化以提高性能。`true` 表示启用延迟物化,`false` 表示禁用。 +- 引入版本: - + +##### parquet_page_index_enable + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 一个布尔值,控制是否启用 Parquet 文件的 Page Index 以提高性能。`true` 表示启用 Page Index,`false` 表示禁用。 +- 引入版本: v3.3 + +##### parquet_reader_bloom_filter_enable + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 一个布尔值,控制是否启用 Parquet 文件的布隆过滤器以提高性能。`true` 表示启用布隆过滤器,`false` 表示禁用。您也可以通过系统变量 `enable_parquet_reader_bloom_filter` 在会话级别控制此行为。Parquet 中的布隆过滤器在**每个行组内以列级别**维护。如果 Parquet 文件包含某些列的布隆过滤器,查询可以使用这些列上的谓词有效地跳过行组。 +- 引入版本: v3.5 + +##### path_gc_check_step + +- 默认值: 1000 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 每次可连续扫描的最大文件数。 +- 引入版本: - + +##### path_gc_check_step_interval_ms + +- 默认值: 10 +- 类型: Int +- 单位: 毫秒 +- 可变: 是 +- 描述: 文件扫描之间的时间间隔。 +- 引入版本: - + +##### path_scan_interval_second + +- 默认值: 86400 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: GC 清理过期数据的时间间隔。 +- 引入版本: - + +##### pipeline_connector_scan_thread_num_per_cpu + +- 默认值: 8 +- 类型: Double +- 单位: - +- 可变: 是 +- 描述: BE 节点中 Pipeline Connector 每 CPU 核心分配的扫描线程数。此配置从 v3.1.7 版本开始变为动态。 +- 引入版本: - + +##### pipeline_poller_timeout_guard_ms + +- 默认值: -1 +- 类型: Int +- 单位: 毫秒 +- 可变: 是 +- 描述: 当此项设置为大于 `0` 时,如果驱动程序在 Poller 中单次调度花费的时间超过 `pipeline_poller_timeout_guard_ms`,则会打印驱动程序和操作员的信息。 +- 引入版本: - + +##### pipeline_prepare_thread_pool_queue_size + +- 默认值: 102400 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: Pipeline 执行引擎 PREPARE Fragment 线程池的最大队列长度。 +- 引入版本: - + +##### pipeline_prepare_thread_pool_thread_num + +- 默认值: 0 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: Pipeline 执行引擎 PREPARE Fragment 线程池中的线程数。`0` 表示该值等于系统 VCPU 核心数。 +- 引入版本: - + +##### pipeline_prepare_timeout_guard_ms + +- 默认值: -1 +- 类型: Int +- 单位: 毫秒 +- 可变: 是 +- 描述: 当此项设置为大于 `0` 时,如果计划片段在 PREPARE 过程中超过 `pipeline_prepare_timeout_guard_ms`,则会打印计划片段的堆栈跟踪。 +- 引入版本: - + +##### pipeline_scan_thread_pool_queue_size + +- 默认值: 102400 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: Pipeline 执行引擎 SCAN 线程池的最大任务队列长度。 +- 引入版本: - + +##### pk_index_parallel_get_threadpool_size + +- 默认值: 1048576 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 设置共享数据(云原生/Lake)模式下 PK 索引并行获取操作使用的 "cloud_native_pk_index_get" 线程池的最大队列大小(挂起任务数)。该池的实际线程数由 `pk_index_parallel_get_threadpool_max_threads` 控制;此设置仅限制有多少任务可以排队等待执行。非常大的默认值 (2^20) 实际上使队列无限制;降低它可防止排队任务导致过度内存增长,但当队列已满时可能导致任务提交阻塞或失败。根据工作负载并发性和内存限制与 `pk_index_parallel_get_threadpool_max_threads` 一起调整。 +- 引入版本: - + +##### priority_queue_remaining_tasks_increased_frequency + +- 默认值: 512 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 控制 BlockingPriorityQueue 增加所有剩余任务优先级(“老化”)的频率,以避免饥饿。每次成功 get/pop 都会增加一个内部 `_upgrade_counter`;当 `_upgrade_counter` 超过 `priority_queue_remaining_tasks_increased_frequency` 
时,队列会增加每个元素的优先级,重建堆,并重置计数器。较小的值会导致更频繁的优先级老化(减少饥饿,但由于迭代和重新堆化而增加 CPU 成本);较大的值会减少该开销但会延迟优先级调整。该值是一个简单的操作计数阈值,而不是时间持续时间。 +- 引入版本: v3.2.0 + +##### query_cache_capacity + +- 默认值: 536870912 +- 类型: Int +- 单位: 字节 +- 可变: 否 +- 描述: BE 中查询缓存的大小。默认大小为 512 MB。大小不能小于 4 MB。如果 BE 的内存容量不足以提供预期的查询缓存大小,您可以增加 BE 的内存容量。 +- 引入版本: - + +##### query_pool_spill_mem_limit_threshold + +- 默认值: 1.0 +- 类型: Double +- 单位: - +- 可变: 否 +- 描述: 如果启用自动溢出,当所有查询的内存使用量超过 `query_pool memory limit * query_pool_spill_mem_limit_threshold` 时,将触发中间结果溢出。 +- 引入版本: v3.2.7 + +##### query_scratch_dirs + +- 默认值: `${STARROCKS_HOME}` +- 类型: string +- 单位: - +- 可变: 否 +- 描述: 逗号分隔的可写临时目录列表,由查询执行用于溢出中间数据(例如,外部排序、哈希连接和其他操作符)。指定一个或多个路径,用 `;` 分隔(例如 `/mnt/ssd1/tmp;/mnt/ssd2/tmp`)。目录必须可由 BE 进程访问和写入,并有足够的可用空间;StarRocks 将在其中选择以分配溢出 I/O。更改需要重新启动才能生效。如果目录缺失、不可写或已满,溢出可能会失败或降低查询性能。 +- 引入版本: v3.2.0 + +##### result_buffer_cancelled_interval_time + +- 默认值: 300 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: BufferControlBlock 释放数据前的等待时间。 +- 引入版本: - + +##### scan_context_gc_interval_min + +- 默认值: 5 +- 类型: Int +- 单位: 分钟 +- 可变: 是 +- 描述: 清理 Scan Context 的时间间隔。 +- 引入版本: - + +##### scanner_row_num + +- 默认值: 16384 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 扫描中每个扫描线程返回的最大行数。 +- 引入版本: - + +##### scanner_thread_pool_queue_size + +- 默认值: 102400 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 存储引擎支持的扫描任务数量。 +- 引入版本: - + +##### scanner_thread_pool_thread_num + +- 默认值: 48 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 存储引擎用于并发存储卷扫描的线程数。所有线程都在线程池中管理。 +- 引入版本: - + +##### string_prefix_zonemap_prefix_len + +- 默认值: 16 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 当 `enable_string_prefix_zonemap` 启用时,用于字符串 ZoneMap Min/Max 的前缀长度。 +- 引入版本: - + +##### udf_thread_pool_size + +- 默认值: 1 +- 类型: Int +- 单位: 线程 +- 可变: 否 +- 描述: 设置在 ExecEnv 中创建的 UDF 调用 PriorityThreadPool 的大小(用于执行用户定义函数/UDF 相关任务)。该值用作线程池线程计数,也用作构建线程池 (PriorityThreadPool("udf", thread_num, queue_size)) 时的池队列容量。增加此值以允许更多并发 UDF 执行;保持较小以避免过度的 CPU 和内存争用。 +- 引入版本: v3.2.0 + +##### update_memory_limit_percent + +- 默认值: 60 +- 类型: Int +- 单位: 百分比 +- 可变: 否 +- 描述: 为更新相关内存和缓存保留的 BE 进程内存的比例。在启动期间,`GlobalEnv` 计算更新的 `MemTracker` 为 `process_mem_limit * clamp(update_memory_limit_percent, 0, 100) / 100`。`UpdateManager` 也使用此百分比来确定其主索引/索引缓存容量(索引缓存容量 = `GlobalEnv::process_mem_limit * update_memory_limit_percent / 100`)。HTTP 配置更新逻辑注册一个回调,该回调在更新管理器上调用 `update_primary_index_memory_limit`,因此如果配置更改,更改将应用于更新子系统。增加此值会为更新/主索引路径分配更多内存(减少其他池可用的内存);减少此值会减少更新内存和缓存容量。值被限制在 0-100 范围内。 +- 引入版本: v3.2.0 + +##### vector_chunk_size + +- 默认值: 4096 +- 类型: Int +- 单位: 行 +- 可变: 否 +- 描述: 在整个执行和存储代码路径中使用的每个向量化 Chunk(批次)的行数。此值控制 Chunk 和 RuntimeState 的 batch_size 创建,影响操作符吞吐量、每个操作符的内存占用、溢出和排序缓冲区大小调整以及 I/O 启发式(例如,ORC 写入器的自然写入大小)。增加它可以在宽/CPU 密集型工作负载中提高 CPU 和 I/O 效率,但会提高峰值内存使用率,并可能增加小结果查询的延迟。仅当分析显示批次大小是瓶颈时才进行调整;否则保持默认值以平衡内存和性能。 +- 引入版本: v3.2.0 + +### 加载 + +##### clear_transaction_task_worker_count + +- 默认值: 1 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 用于清除事务的线程数。 +- 引入版本: - + +##### column_mode_partial_update_insert_batch_size + +- 默认值: 4096 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 列模式部分更新处理插入行时的批处理大小。如果此项设置为 `0` 或负数,它将被限制为 `1` 以避免无限循环。此项控制每个批次处理的新插入行数。较大的值可以提高写入性能,但会消耗更多内存。 +- 引入版本: v3.5.10, v4.0.2 + +##### enable_load_spill_parallel_merge + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 指定是否在单个 Tablet 中启用并行溢出合并。启用此功能可以提高数据加载期间溢出合并的性能。 +- 引入版本: - + +##### enable_stream_load_verbose_log + +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 指定是否记录 Stream Load 作业的 HTTP 请求和响应。 +- 引入版本: v2.5.17, v3.0.9, v3.1.6, v3.2.1 + +##### flush_thread_num_per_store + +- 默认值: 2 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 每个存储用于刷新 MemTable 的线程数。 +- 引入版本: - + 
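+As a quick example of tuning the loading items above at runtime, `flush_thread_num_per_store` is mutable and can be raised on a busy BE during heavy loading. A minimal sketch, assuming a BE whose HTTP service listens at `be1.example.com:8040` (the host, port, and value are placeholders to adapt):
+
+```bash
+# Double the per-store MemTable flush threads on one BE at runtime;
+# runtime changes are in-memory only and revert to be.conf after a restart.
+curl -XPOST "http://be1.example.com:8040/api/update_config?flush_thread_num_per_store=4"
+```
+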
+##### lake_flush_thread_num_per_store + +- 默认值: 0 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 共享数据集群中每个存储用于刷新 MemTable 的线程数。 +当此值设置为 `0` 时,系统使用 CPU 核心数的两倍作为值。 +当此值设置为小于 `0` 时,系统使用其绝对值与 CPU 核心数的乘积作为值。 +- 引入版本: v3.1.12, 3.2.7 + +##### load_data_reserve_hours + +- 默认值: 4 +- 类型: Int +- 单位: 小时 +- 可变: 否 +- 描述: 小型加载产生文件的保留时间。 +- 引入版本: - + +##### load_error_log_reserve_hours + +- 默认值: 48 +- 类型: Int +- 单位: 小时 +- 可变: 是 +- 描述: 数据加载日志的保留时间。 +- 引入版本: - + +##### load_process_max_memory_limit_bytes + +- 默认值: 107374182400 +- 类型: Int +- 单位: 字节 +- 可变: 否 +- 描述: BE 节点上所有加载进程可占用的内存资源的最大大小限制。 +- 引入版本: - + +##### load_spill_memory_usage_per_merge + +- 默认值: 1073741824 +- 类型: Int +- 单位: 字节 +- 可变: 是 +- 描述: 溢出合并期间每次合并操作的最大内存使用量。默认值为 1 GB (1073741824 字节)。此参数控制数据加载溢出合并期间单个合并任务的内存消耗,以防止内存使用量过高。 +- 引入版本: - + +##### max_consumer_num_per_group + +- 默认值: 3 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: Routine Load 消费者组中的最大消费者数量。 +- 引入版本: - + +##### max_runnings_transactions_per_txn_map + +- 默认值: 100 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 每个分区中可同时运行的最大事务数。 +- 引入版本: - + +##### number_tablet_writer_threads + +- 默认值: 0 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 摄取(例如 Stream Load、Broker Load 和 Insert)中使用的 Tablet 写入线程数。当参数设置为小于或等于 0 时,系统使用 CPU 核心数的一半,最小值为 16。当参数设置为大于 0 时,系统使用该值。此配置从 v3.1.7 版本开始变为动态。 +- 引入版本: - + +##### push_worker_count_high_priority + +- 默认值: 3 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 用于处理高优先级加载任务的线程数。 +- 引入版本: - + +##### push_worker_count_normal_priority + +- 默认值: 3 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 用于处理普通优先级加载任务的线程数。 +- 引入版本: - + +##### streaming_load_max_batch_size_mb + +- 默认值: 100 +- 类型: Int +- 单位: MB +- 可变: 是 +- 描述: 可流式传输到 StarRocks 的 JSON 文件的最大大小。 +- 引入版本: - + +##### streaming_load_max_mb + +- 默认值: 102400 +- 类型: Int +- 单位: MB +- 可变: 是 +- 描述: 可流式传输到 StarRocks 的文件的最大大小。从 v3.0 开始,默认值已从 `10240` 更改为 `102400`。 +- 引入版本: - + +##### streaming_load_rpc_max_alive_time_sec + +- 默认值: 1200 +- 类型: Int +- 单位: 秒 +- 可变: 否 +- 描述: Stream Load 的 RPC 超时时间。 +- 引入版本: - + +##### transaction_publish_version_thread_pool_idle_time_ms + +- 默认值: 60000 +- 类型: Int +- 单位: 毫秒 +- 可变: 否 +- 描述: Publish Version 线程池回收线程前的空闲时间。 +- 引入版本: - + +##### transaction_publish_version_worker_count + +- 默认值: 0 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 用于发布版本的最大线程数。当此值设置为小于或等于 `0` 时,系统使用 CPU 核心数作为值,以避免在高并发导入时线程资源不足而仅使用固定数量的线程。从 v2.5 开始,默认值已从 `8` 更改为 `0`。 +- 引入版本: - + +##### write_buffer_size + +- 默认值: 104857600 +- 类型: Int +- 单位: 字节 +- 可变: 是 +- 描述: 内存中 MemTable 的缓冲区大小。此配置项是触发刷新的阈值。 +- 引入版本: - + +### 加载和卸载 + +##### broker_write_timeout_seconds + +- 默认值: 30 +- 类型: int +- 单位: 秒 +- 可变: 否 +- 描述: 后端 Broker 操作用于写入/IO RPC 的超时(秒)。该值乘以 1000 以产生毫秒超时,并作为默认的 `timeout_ms` 传递给 BrokerFileSystem 和 BrokerServiceConnection 实例(例如文件导出和快照上传/下载)。当 Broker 或网络较慢或传输大文件时,增加此值可避免过早超时;减少此值可能会导致 Broker RPC 更早失败。此值在 common/config 中定义,并在进程启动时应用(不可动态重新加载)。 +- 引入版本: v3.2.0 + +##### enable_load_channel_rpc_async + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 启用时,加载通道打开 RPC(例如 `PTabletWriterOpen`)的处理将从 BRPC 工作线程卸载到专门的线程池:请求处理程序创建一个 `ChannelOpenTask` 并将其提交到内部 `_async_rpc_pool`,而不是内联运行 `LoadChannelMgr::_open`。这减少了 BRPC 线程内部的工作和阻塞,并允许通过 `load_channel_rpc_thread_pool_num` 和 `load_channel_rpc_thread_pool_queue_size` 调整并发性。如果线程池提交失败(当池满或关闭时),请求将被取消并返回错误状态。池在 `LoadChannelMgr::close()` 时关闭,因此在启用此功能时要考虑容量和生命周期,以避免请求被拒绝或处理延迟。 +- 引入版本: v3.5.0 + +##### enable_load_diagnose + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 启用时,StarRocks 将在 bRPC 超时匹配 "[E1008]Reached timeout" 后尝试从 BE OlapTableSink/NodeChannel 进行自动加载诊断。代码会创建一个 `PLoadDiagnoseRequest` 并向远程 LoadChannel 发送 RPC 以收集配置文件和/或堆栈跟踪(由 
`load_diagnose_rpc_timeout_profile_threshold_ms` 和 `load_diagnose_rpc_timeout_stack_trace_threshold_ms` 控制)。诊断 RPC 使用 `load_diagnose_send_rpc_timeout_ms` 作为其超时。如果诊断请求已在进行中,则跳过诊断。启用此功能会在目标节点上产生额外的 RPC 和分析工作;在敏感的生产工作负载上禁用以避免额外开销。 +- 引入版本: v3.5.0 + +##### enable_load_segment_parallel + +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 启用时,Rowset Segment 加载和 Rowset 级别读取会使用 StarRocks 后台线程池(ExecEnv::load_segment_thread_pool 和 ExecEnv::load_rowset_thread_pool)并发执行。Rowset::load_segments 和 TabletReader::get_segment_iterators 将每个 Segment 或每个 Rowset 任务提交到这些池中,如果提交失败则回退到串行加载并记录警告。启用此功能可降低大型 Rowset 的读取/加载延迟,但会增加 CPU/IO 并发性和内存压力。注意:并行加载可能会改变 Segment 的加载完成顺序,因此会阻止部分 Compaction(代码会检查 `_parallel_load` 并在启用时禁用部分 Compaction);考虑对依赖 Segment 顺序的操作的影响。 +- 引入版本: v3.3.0, v3.4.0, v3.5.0 + +##### enable_streaming_load_thread_pool + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 控制流式加载扫描器是否提交到专用的流式加载线程池。启用时,如果查询是 `TLoadJobType::STREAM_LOAD` 的 LOAD,ConnectorScanNode 会将扫描器任务提交到 `streaming_load_thread_pool`(该池配置有 INT32_MAX 线程和队列大小,即实际上无限制)。禁用时,扫描器使用通用的 `thread_pool` 及其 PriorityThreadPool 提交逻辑(优先级计算、try_offer/offer 行为)。启用此功能可将流式加载工作与常规查询执行隔离,以减少干扰;但是,由于专用池实际上是无限制的,启用此功能可能会在重度流式加载流量下增加并发线程和资源使用。此选项默认开启,通常不需要修改。 +- 引入版本: v3.2.0 + +##### es_http_timeout_ms + +- 默认值: 5000 +- 类型: Int +- 单位: 毫秒 +- 可变: 否 +- 描述: ESScanReader 中 ES 网络客户端用于 Elasticsearch scroll 请求的 HTTP 连接超时(毫秒)。此值通过 `network_client.set_timeout_ms()` 应用,然后发送后续的 scroll POST 请求,并控制客户端在滚动过程中等待 ES 响应的时间。对于慢速网络或大型查询,增加此值可避免过早超时;减少此值可更快地在无响应的 ES 节点上失败。此设置补充了 `es_scroll_keepalive`,后者控制 scroll 上下文的保持活动持续时间。 +- 引入版本: v3.2.0 + +##### es_index_max_result_window + +- 默认值: 10000 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 限制 StarRocks 在单个批次中将从 Elasticsearch 请求的最大文档数。StarRocks 在为 ES 读取器构建 `KEY_BATCH_SIZE` 时,将 ES 请求批处理大小设置为 min(`es_index_max_result_window`, `chunk_size`)。如果 ES 请求超过 Elasticsearch 索引设置 `index.max_result_window`,Elasticsearch 将返回 HTTP 400 (Bad Request)。在扫描大型索引时调整此值,或在 Elasticsearch 端增加 ES `index.max_result_window` 以允许更大的单个请求。 +- 引入版本: v3.2.0 + +##### ignore_load_tablet_failure + +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 当此项设置为 `false` 时,系统将把任何 Tablet Header 加载失败(非 NotFound 和非 AlreadyExist 错误)视为致命错误:代码将记录错误并调用 LOG(FATAL) 以停止 BE 进程。当设置为 `true` 时,BE 将继续启动,即使存在此类每个 Tablet 的加载错误——失败的 Tablet ID 会被记录并跳过,而成功的 Tablet 仍会加载。请注意,此参数**不**会抑制 RocksDB 元数据扫描本身的致命错误,这些错误总是会导致进程退出。 +- 引入版本: v3.2.0 + +##### load_channel_abort_clean_up_delay_seconds + +- 默认值: 600 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 控制系统在从 `_aborted_load_channels` 中删除中止的加载通道的加载 ID 之前保留多长时间(秒)。当加载作业被取消或失败时,加载 ID 会被记录,以便任何迟到的加载 RPC 可以立即被拒绝;一旦延迟过期,该条目将在定期后台清理期间被清除(最小清理间隔为 60 秒)。将延迟设置过低可能会在中止后接受离散的 RPC,而设置过高可能会保留状态并消耗资源比必要时间更长。调整此值以平衡晚期请求拒绝的正确性和中止加载的资源保留。 +- 引入版本: v3.5.11, v4.0.4 + +##### load_channel_rpc_thread_pool_num + +- 默认值: -1 +- 类型: Int +- 单位: 线程 +- 可变: 是 +- 描述: 加载通道异步 RPC 线程池的最大线程数。当设置为小于或等于 0(默认 `-1`)时,池大小会自动设置为 CPU 核心数 (`CpuInfo::num_cores()`)。配置的值用作 ThreadPoolBuilder 的最大线程数,池的最小线程数设置为 min(5, max_threads)。池队列大小由 `load_channel_rpc_thread_pool_queue_size` 单独控制。引入此设置是为了使异步 RPC 池大小与 bRPC 工作线程的默认值 (`brpc_num_threads`) 对齐,以便在将加载 RPC 处理从同步切换到异步后行为保持兼容。在运行时更改此配置会触发 `ExecEnv::GetInstance()->load_channel_mgr()->async_rpc_pool()->update_max_threads(...)`。 +- 引入版本: v3.5.0 + +##### load_channel_rpc_thread_pool_queue_size + +- 默认值: 1024000 +- 类型: int +- 单位: 数量 +- 可变: 否 +- 描述: 设置 LoadChannelMgr 创建的加载通道 RPC 线程池的最大待处理任务队列大小。当 `enable_load_channel_rpc_async` 启用时,此线程池执行异步 `open` 请求;池大小与 `load_channel_rpc_thread_pool_num` 配对。大型默认值 (1024000) 与 bRPC 工作线程的默认值对齐,以在从同步处理切换到异步处理后保留行为。如果队列已满,ThreadPool::submit() 将失败,并且传入的 
`open` RPC 将因错误而被取消,导致调用者收到拒绝。增加此值可缓冲更大的并发 `open` 请求突发;减少它会收紧反压,但在负载下可能导致更多拒绝。 +- 引入版本: v3.5.0 + +##### load_diagnose_rpc_timeout_profile_threshold_ms + +- 默认值: 60000 +- 类型: Int +- 单位: 毫秒 +- 可变: 是 +- 描述: 当加载 RPC 超时(错误包含 "[E1008]Reached timeout")且 `enable_load_diagnose` 为 true 时,此阈值控制是否请求完整的分析诊断。如果请求级别的 RPC 超时 `_rpc_timeout_ms` 大于 `load_diagnose_rpc_timeout_profile_threshold_ms`,则为该诊断启用分析。对于较小的 `_rpc_timeout_ms` 值,分析每 20 次超时采样一次,以避免对实时/短超时加载进行频繁的重诊断。此值影响发送的 `PLoadDiagnoseRequest` 中的 `profile` 标志;堆栈跟踪行为由 `load_diagnose_rpc_timeout_stack_trace_threshold_ms` 单独控制,发送超时由 `load_diagnose_send_rpc_timeout_ms` 控制。 +- 引入版本: v3.5.0 + +##### load_diagnose_rpc_timeout_stack_trace_threshold_ms + +- 默认值: 600000 +- 类型: Int +- 单位: 毫秒 +- 可变: 是 +- 描述: 用于决定何时请求远程堆栈跟踪以用于长时间运行的加载 RPC 的阈值(毫秒)。当加载 RPC 因超时错误而超时且有效 RPC 超时 (`_rpc_timeout_ms`) 超过此值时,`OlapTableSink`/`NodeChannel` 将在发送到目标 BE 的 `load_diagnose` RPC 中包含 `stack_trace=true`,以便 BE 可以返回堆栈跟踪进行调试。`LocalTabletsChannel::SecondaryReplicasWaiter` 也会在等待辅助副本超过此间隔时从主副本触发尽力而为的堆栈跟踪诊断。此行为需要 `enable_load_diagnose`,并使用 `load_diagnose_send_rpc_timeout_ms` 作为诊断 RPC 超时;分析由 `load_diagnose_rpc_timeout_profile_threshold_ms` 单独控制。降低此值会增加请求堆栈跟踪的积极性。 +- 引入版本: v3.5.0 + +##### load_diagnose_send_rpc_timeout_ms + +- 默认值: 2000 +- 类型: Int +- 单位: 毫秒 +- 可变: 是 +- 描述: 应用于 BE 加载路径发起的诊断相关 bRPC 调用的超时(毫秒)。它用于设置 `load_diagnose` RPC(当 LoadChannel bRPC 调用超时时由 NodeChannel/OlapTableSink 发送)和副本状态查询(当 SecondaryReplicasWaiter / LocalTabletsChannel 检查主副本状态时使用)的控制器超时。选择一个足够高的值以允许远程端响应配置文件或堆栈跟踪数据,但不要太高以至于延迟故障处理。此参数与 `enable_load_diagnose`、`load_diagnose_rpc_timeout_profile_threshold_ms` 和 `load_diagnose_rpc_timeout_stack_trace_threshold_ms` 协同工作,它们控制何时以及请求哪些诊断信息。 +- 引入版本: v3.5.0 + +##### load_fp_brpc_timeout_ms + +- 默认值: -1 +- 类型: Int +- 单位: 毫秒 +- 可变: 是 +- 描述: 当 `node_channel_set_brpc_timeout` 故障点触发时,覆盖 OlapTableSink 使用的每个通道 bRPC RPC 超时。如果设置为正值,NodeChannel 将其内部 `_rpc_timeout_ms` 设置为该值(毫秒),导致 open/add-chunk/cancel RPC 使用较短的超时,并启用模拟产生 "[E1008]Reached timeout" 错误的 bRPC 超时。默认值 (`-1`) 禁用覆盖。更改此值旨在用于测试和故障注入;小值可能会产生错误的超时并触发加载诊断(参见 `enable_load_diagnose`、`load_diagnose_rpc_timeout_profile_threshold_ms`、`load_diagnose_rpc_timeout_stack_trace_threshold_ms` 和 `load_diagnose_send_rpc_timeout_ms`)。 +- 引入版本: v3.5.0 + +##### load_fp_tablets_channel_add_chunk_block_ms + +- 默认值: -1 +- 类型: Int +- 单位: 毫秒 +- 可变: 是 +- 描述: 启用时(设置为正毫秒值),此故障点配置使 TabletsChannel::add_chunk 在加载处理期间休眠指定时间。它用于模拟 BRPC 超时错误(例如 "[E1008]Reached timeout")并模拟昂贵的 add_chunk 操作,从而增加加载延迟。小于或等于 0 的值(默认 `-1`)禁用注入。旨在用于测试故障处理、超时和副本同步行为——不要在正常的生产工作负载中启用,因为它会延迟写入完成并可能触发上游超时或副本中止。 +- 引入版本: v3.5.0 + +##### load_segment_thread_pool_num_max + +- 默认值: 128 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 设置 BE 加载相关线程池的最大工作线程数。此值由 ThreadPoolBuilder 用于限制 exec_env.cpp 中 `load_rowset_pool` 和 `load_segment_pool` 的线程数,控制流式加载和批量加载期间处理已加载 Rowset 和 Segment(例如解码、索引、写入)的并发性。增加此值会提高并行度并可以改善加载吞吐量,但也会增加 CPU、内存使用率和潜在的争用;减少此值会限制并发加载处理并可能降低吞吐量。与 `load_segment_thread_pool_queue_size` 和 `streaming_load_thread_pool_idle_time_ms` 一起调整。更改需要 BE 重启。 +- 引入版本: v3.3.0, v3.4.0, v3.5.0 + +##### load_segment_thread_pool_queue_size + +- 默认值: 10240 +- 类型: Int +- 单位: 任务 +- 可变: 否 +- 描述: 设置作为 "load_rowset_pool" 和 "load_segment_pool" 创建的加载相关线程池的最大队列长度(待处理任务数)。这些池使用 `load_segment_thread_pool_num_max` 作为其最大线程数,此配置控制在 ThreadPool 的溢出策略生效之前可以缓冲多少加载 Segment/Rowset 任务(根据 ThreadPool 实现,进一步的提交可能会被拒绝或阻塞)。增加此值以允许更多待处理的加载工作(使用更多内存并可能增加延迟);减少此值以限制缓冲加载并发性并减少内存使用量。 +- 引入版本: v3.3.0, v3.4.0, v3.5.0 + +##### max_pulsar_consumer_num_per_group + +- 默认值: 10 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 控制 BE 上 Routine Load 的单个数据消费者组中可创建的最大 Pulsar 
消费者数量。由于多主题订阅不支持累积确认,每个消费者只订阅一个主题/分区;如果 `pulsar_info->partitions` 中的分区数量超过此值,则组创建将失败并返回错误,建议增加 BE 上的 `max_pulsar_consumer_num_per_group` 或添加更多 BE。此限制在构建 PulsarDataConsumerGroup 时强制执行,并防止 BE 为一个 Routine Load 组托管超过此数量的消费者。对于 Kafka Routine Load,则使用 `max_consumer_num_per_group`。 +- 引入版本: v3.2.0 + +##### pull_load_task_dir + +- 默认值: `${STARROCKS_HOME}/var/pull_load` +- 类型: string +- 单位: - +- 可变: 否 +- 描述: BE 存储“拉取加载”任务(下载的源文件、任务状态、临时输出等)数据和工作文件的文件系统路径。目录必须可由 BE 进程写入,并有足够的磁盘空间用于传入加载。默认值是相对于 STARROCKS_HOME 的;测试创建并期望此目录存在(参见测试配置)。 +- 引入版本: v3.2.0 + +##### routine_load_kafka_timeout_second + +- 默认值: 10 +- 类型: Int +- 单位: 秒 +- 可变: 否 +- 描述: 用于 Kafka 相关 Routine Load 操作的超时时间(秒)。当客户端请求未指定超时时,`routine_load_kafka_timeout_second` 用作 `get_info` 的默认 RPC 超时(转换为毫秒)。它还用作 librdkafka 消费者的每次调用消费轮询超时(转换为毫秒并受剩余运行时限制)。注意:内部 `get_info` 路径在将其传递给 librdkafka 之前将此值减小到 80% 以避免 FE 端超时竞争。将此值设置为平衡及时故障报告和网络/Broker 响应足够时间的值;更改需要重新启动,因为该设置不可变。 +- 引入版本: v3.2.0 + +##### routine_load_pulsar_timeout_second + +- 默认值: 10 +- 类型: Int +- 单位: 秒 +- 可变: 否 +- 描述: 当请求未提供显式超时时,BE 用于 Pulsar 相关 Routine Load 操作的默认超时(秒)。具体来说,`PInternalServiceImplBase::get_pulsar_info` 将此值乘以 1000 以形成传递给获取 Pulsar 分区元数据和积压的 Routine Load 任务执行器方法的毫秒超时。增加此值以允许较慢的 Pulsar 响应,但代价是更长的故障检测时间;减少此值以在慢速 Broker 上更快失败。类似于用于 Kafka 的 `routine_load_kafka_timeout_second`。 +- 引入版本: v3.2.0 + +##### streaming_load_thread_pool_idle_time_ms + +- 默认值: 2000 +- 类型: Int +- 单位: 毫秒 +- 可变: 否 +- 描述: 设置流式加载相关线程池的线程空闲超时(毫秒)。该值用作 `stream_load_io` 池以及 `load_rowset_pool` 和 `load_segment_pool` 的 ThreadPoolBuilder 的空闲超时。这些池中的线程在此持续时间空闲时将被回收;较小的值可更快释放资源,但会增加突发负载下的线程创建开销,而较大的值可使线程在短时突发期间保持活动状态。当 `enable_streaming_load_thread_pool` 启用时,使用 `stream_load_io` 池。 +- 引入版本: v3.2.0 + +##### streaming_load_thread_pool_num_min + +- 默认值: 0 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 在 ExecEnv 初始化期间创建的流式加载 IO 线程池 ("stream_load_io") 的最小线程数。该池以 `set_max_threads(INT32_MAX)` 和 `set_max_queue_size(INT32_MAX)` 构建,因此它实际上是无限制的,以避免并发流式加载的死锁。值为 0 允许池以零线程启动并按需增长;设置正值会在启动时保留那么多线程。当 `enable_streaming_load_thread_pool` 为 true 时使用此池,其空闲超时由 `streaming_load_thread_pool_idle_time_ms` 控制。总体并发性仍然受 `fragment_pool_thread_num_max` 和 `webserver_num_workers` 的限制;更改此值很少必要,如果设置过高可能会增加资源使用。 +- 引入版本: v3.2.0 + +### 统计报告 + +##### enable_metric_calculator + +- 默认值: true +- 类型: boolean +- 单位: - +- 可变: 否 +- 描述: 当为 true 时,BE 进程会启动一个后台 "metrics_daemon" 线程(在非 Apple 平台上在 Daemon::init 中启动),该线程大约每 15 秒运行一次,以调用 `StarRocksMetrics::instance()->metrics()->trigger_hook()` 并计算派生/系统指标(例如,推送/查询字节/秒、最大磁盘 I/O 利用率、最大网络发送/接收速率),记录内存细分并运行表指标清理。当为 false 时,这些 Hook 在指标收集时在 MetricRegistry::collect 内部同步执行,这可能会增加指标抓取延迟。需要进程重新启动才能生效。 +- 引入版本: v3.2.0 + +##### enable_system_metrics + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 当为 true 时,StarRocks 在启动时初始化系统级监控:它从配置的存储路径发现磁盘设备并枚举网络接口,然后将此信息传递到指标子系统,以启用磁盘 I/O、网络流量和内存相关系统指标的收集。如果设备或接口发现失败,初始化会记录警告并中止系统指标设置。此标志仅控制是否初始化系统指标;定期指标聚合线程由 `enable_metric_calculator` 单独控制,JVM 指标初始化由 `enable_jvm_metrics` 控制。更改此值需要重新启动。 +- 引入版本: v3.2.0 + +##### profile_report_interval + +- 默认值: 30 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: ProfileReportWorker 用于 (1) 决定何时报告 LOAD 查询的每个 Fragment 配置文件信息以及 (2) 在报告周期之间休眠的间隔(秒)。工作线程使用 (profile_report_interval * 1000) 毫秒将当前时间与每个任务的 `last_report_time` 进行比较,以确定是否应为非 Pipeline 和 Pipeline 加载任务重新报告配置文件。在每次循环中,工作线程读取当前值(运行时可变);如果配置值小于或等于 0,工作线程会强制将其设置为 1 并发出警告。更改此值会影响下一个报告决策和休眠持续时间。 +- 引入版本: v3.2.0 + +##### report_disk_state_interval_seconds + +- 默认值: 60 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 报告存储卷状态(包括卷内数据大小)的时间间隔。 +- 引入版本: - + +##### report_resource_usage_interval_ms + +- 默认值: 1000 +- 类型: Int +- 单位: 毫秒 +- 可变: 是 +- 描述: BE 代理向 
FE(主节点)发送定期资源使用报告的时间间隔(毫秒)。代理工作线程收集 TResourceUsage(运行中的查询数量、已用/限制内存、已用 CPU 千分比和资源组使用情况)并调用 report_task,然后休眠此配置的间隔(参见 task_worker_pool)。较小的值可提高报告的及时性,但会增加 CPU、网络和主节点负载;较大的值可减少开销,但会使资源信息更新不及时。报告会更新相关指标(report_resource_usage_requests_total、report_resource_usage_requests_failed)。根据集群规模和 FE 负载进行调整。 +- 引入版本: v3.2.0 + +##### report_tablet_interval_seconds + +- 默认值: 60 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 报告所有 Tablet 最新版本的时间间隔。 +- 引入版本: - + +##### report_task_interval_seconds + +- 默认值: 10 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 报告任务状态的时间间隔。任务可以是创建表、删除表、加载数据或更改表模式。 +- 引入版本: - + +##### report_workgroup_interval_seconds + +- 默认值: 5 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 报告所有工作组最新版本的时间间隔。 +- 引入版本: - + +### 存储 + +##### alter_tablet_worker_count + +- 默认值: 3 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 用于 Schema Change 的线程数。 +- 引入版本: - + +##### avro_ignore_union_type_tag + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 是否从 Avro Union 数据类型序列化的 JSON 字符串中剥离类型标签。 +- 引入版本: v3.3.7, v3.4 + +##### base_compaction_check_interval_seconds + +- 默认值: 60 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: Base Compaction 的线程轮询时间间隔。 +- 引入版本: - + +##### base_compaction_interval_seconds_since_last_operation + +- 默认值: 86400 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 自上次 Base Compaction 以来的时间间隔。此配置项是触发 Base Compaction 的条件之一。 +- 引入版本: - + +##### base_compaction_num_threads_per_disk + +- 默认值: 1 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 每个存储卷用于 Base Compaction 的线程数。 +- 引入版本: - + +##### base_cumulative_delta_ratio + +- 默认值: 0.3 +- 类型: Double +- 单位: - +- 可变: 是 +- 描述: 累积文件大小与基文件大小之比。达到此比率是触发 Base Compaction 的条件之一。 +- 引入版本: - + +##### chaos_test_enable_random_compaction_strategy + +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 当此项设置为 `true` 时,TabletUpdates::compaction() 使用为混沌工程测试设计的随机 Compaction 策略 (compaction_random)。此标志强制 Compaction 遵循非确定性/随机策略,而不是正常策略(例如,Size-tiered Compaction),并且在 Tablet 的 Compaction 选择期间具有优先权。它仅用于受控测试:启用它可能会导致不可预测的 Compaction 顺序、增加 I/O/CPU 和测试不稳定性。请勿在生产环境中启用;仅用于故障注入或混沌测试场景。 +- 引入版本: v3.3.12, 3.4.2, 3.5.0, 4.0.0 + +##### check_consistency_worker_count + +- 默认值: 1 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 用于检查 Tablet 一致性的线程数。 +- 引入版本: - + +##### clear_expired_replication_snapshots_interval_seconds + +- 默认值: 3600 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 系统清除异常复制留下的过期快照的时间间隔。 +- 引入版本: v3.3.5 + +##### compact_threads + +- 默认值: 4 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 用于并发 Compaction 任务的最大线程数。此配置从 v3.1.7 和 v3.2.2 版本开始变为动态。 +- 引入版本: v3.0.0 + +##### compaction_max_memory_limit + +- 默认值: -1 +- 类型: Long +- 单位: 字节 +- 可变: 否 +- 描述: 此 BE 上 Compaction 任务可用内存的全局上限(字节)。在 BE 初始化期间,最终的 Compaction 内存限制计算为 min(`compaction_max_memory_limit`, `process_mem_limit * compaction_max_memory_limit_percent / 100`)。如果 `compaction_max_memory_limit` 为负数(默认 `-1`),则回退到从 `mem_limit` 派生的 BE 进程内存限制。百分比值被限制在 [0,100] 之间。如果进程内存限制未设置(负数),则 Compaction 内存保持无限制 (`-1`)。此计算值用于初始化 `_compaction_mem_tracker`。另请参见 `compaction_max_memory_limit_percent` 和 `compaction_memory_limit_per_worker`。 +- 引入版本: v3.2.0 + +##### compaction_max_memory_limit_percent + +- 默认值: 100 +- 类型: Int +- 单位: 百分比 +- 可变: 否 +- 描述: Compaction 可能使用的 BE 进程内存百分比。BE 将 Compaction 内存上限计算为 `compaction_max_memory_limit` 和 (进程内存限制 × 此百分比 / 100) 的最小值。如果此值 < 0 或 > 100,则将其视为 100。如果 `compaction_max_memory_limit` < 0,则使用进程内存限制。计算还会考虑从 `mem_limit` 派生的 BE 进程内存。结合 `compaction_memory_limit_per_worker`(每个 worker 的上限),此设置控制可用的总 Compaction 内存,因此会影响 Compaction 并发性和 OOM 风险。 +- 引入版本: v3.2.0 + +##### compaction_memory_limit_per_worker + +- 默认值: 2147483648 +- 类型: Int +- 单位: 字节 +- 可变: 否 +- 描述: 每个 Compaction 线程允许的最大内存大小。 +- 引入版本: - + +##### 
compaction_trace_threshold + +- 默认值: 60 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 每次 Compaction 的时间阈值。如果一次 Compaction 的时间超过此阈值,StarRocks 会打印相应的跟踪。 +- 引入版本: - + +##### create_tablet_worker_count + +- 默认值: 3 +- 类型: Int +- 单位: 线程 +- 可变: 是 +- 描述: 设置 AgentServer 线程池中处理 FE 提交的 `TTaskType::CREATE`(创建 Tablet)任务的最大工作线程数。在 BE 启动时,此值用作线程池的最大值(池以最小线程数 = 1 和最大队列大小 = 无限制创建),在运行时更改此值会触发 `ExecEnv::agent_server()->get_thread_pool(TTaskType::CREATE)->update_max_threads(...)`。增加此值可提高并发 Tablet 创建吞吐量(在批量加载或分区创建期间有用);减少此值会限制并发创建操作。提高此值会增加 CPU、内存和 I/O 并发性,并可能导致争用;线程池强制至少一个线程,因此小于 1 的值没有实际效果。 +- 引入版本: v3.2.0 + +##### cumulative_compaction_check_interval_seconds + +- 默认值: 1 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 累积 Compaction 的线程轮询时间间隔。 +- 引入版本: - + +##### cumulative_compaction_num_threads_per_disk + +- 默认值: 1 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 每个磁盘的累积 Compaction 线程数。 +- 引入版本: - + +##### data_page_size + +- 默认值: 65536 +- 类型: Int +- 单位: 字节 +- 可变: 否 +- 描述: 构建列数据和索引页时使用的目标未压缩页面大小(字节)。此值复制到 ColumnWriterOptions.data_page_size 和 IndexedColumnWriterOptions.index_page_size,并由页面构建器(例如 BinaryPlainPageBuilder::is_page_full 和缓冲区保留逻辑)查阅,以决定何时完成页面以及保留多少内存。值为 0 会禁用构建器中的页面大小限制。更改此值会影响页面计数、元数据开销、内存保留和 I/O/压缩权衡(较小的页面→更多页面和元数据;较大的页面→较少页面,可能更好的压缩但更大的内存峰值)。 +- 引入版本: v3.2.4 + +##### default_num_rows_per_column_file_block + +- 默认值: 1024 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 每个行块中可存储的最大行数。 +- 引入版本: - + +##### delete_worker_count_high_priority + +- 默认值: 1 +- 类型: Int +- 单位: 线程 +- 可变: 否 +- 描述: DeleteTaskWorkerPool 中被分配为 HIGH 优先级删除线程的工作线程数。在启动时,AgentServer 以总线程数 = `delete_worker_count_normal_priority + delete_worker_count_high_priority` 创建删除池;前 `delete_worker_count_high_priority` 个线程被标记为专门尝试弹出 TPriority::HIGH 任务(它们轮询高优先级删除任务,如果没有可用则休眠/循环)。增加此值可增加高优先级删除请求的并发性;减少此值会降低专用容量并可能增加高优先级删除的延迟。 +- 引入版本: v3.2.0 + +##### dictionary_encoding_ratio + +- 默认值: 0.7 +- 类型: Double +- 单位: - +- 可变: 否 +- 描述: StringColumnWriter 在编码推测阶段用于决定 Chunk 的字典 (DICT_ENCODING) 编码和纯 (PLAIN_ENCODING) 编码之间的分数 (0.0–1.0)。代码计算 max_card = row_count * `dictionary_encoding_ratio` 并扫描 Chunk 的不同键计数;如果不同计数超过 max_card,写入器会选择 PLAIN_ENCODING。仅当 Chunk 大小通过 `dictionary_speculate_min_chunk_size`(且当 row_count > dictionary_min_rowcount)时才执行检查。将值设置得更高有利于字典编码(容忍更多不同键);设置得更低会导致更早回退到纯编码。值为 1.0 实际上强制字典编码(不同计数永远不会超过行数)。 +- 引入版本: v3.2.0 + +##### dictionary_encoding_ratio_for_non_string_column + +- 默认值: 0 +- 类型: double +- 单位: - +- 可变: 否 +- 描述: 用于决定是否对非字符串列(数值、日期/时间、Decimal 类型)使用字典编码的比例阈值。启用时(值 > 0.0001),写入器计算 `max_card = row_count * dictionary_encoding_ratio_for_non_string_column`,对于 `row_count > dictionary_min_rowcount` 的样本,仅当 `distinct_count <= max_card` 时才选择 DICT_ENCODING;否则回退到 BIT_SHUFFLE。值为 `0`(默认)禁用非字符串字典编码。此参数类似于 `dictionary_encoding_ratio`,但适用于非字符串列。使用 (0,1] 范围内的值 — 较小的值将字典编码限制为基数较低的列,并减少字典内存/IO 开销。 +- 引入版本: v3.3.0, v3.4.0, v3.5.0 + +##### dictionary_page_size + +- 默认值: 1048576 +- 类型: Int +- 单位: 字节 +- 可变: 否 +- 描述: 构建 Rowset Segment 时使用的字典页大小(字节)。此值在 BE Rowset 代码中读入 `PageBuilderOptions::dict_page_size`,并控制单个字典页中可存储的字典条目数。增加此值可以通过允许更大的字典来提高字典编码列的压缩比,但更大的页面在写入/编码期间会消耗更多内存,并在读取或物化页面时增加 I/O 和延迟。对于大内存、写密集型工作负载保守设置,并避免过大的值以防止运行时性能下降。 +- 引入版本: v3.3.0, v3.4.0, v3.5.0 + +##### disk_stat_monitor_interval + +- 默认值: 5 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 监视磁盘健康状态的时间间隔。 +- 引入版本: - + +##### download_low_speed_limit_kbps + +- 默认值: 50 +- 类型: Int +- 单位: KB/秒 +- 可变: 是 +- 描述: 每个 HTTP 请求的下载速度下限。当 HTTP 请求在配置项 `download_low_speed_time` 指定的时间跨度内持续以低于此值的速度运行时,该请求将中止。 +- 引入版本: - + +##### download_low_speed_time + +- 默认值: 300 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: HTTP 请求以低于限制的下载速度运行的最大时间。当 HTTP 请求在此配置项指定的时间跨度内持续以低于 
`download_low_speed_limit_kbps` 值时,该请求将中止。 +- 引入版本: - + +##### download_worker_count + +- 默认值: 0 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: BE 节点上 restore 作业下载任务的最大线程数。`0` 表示将该值设置为 BE 所在机器的 CPU 核心数。 +- 引入版本: - + +##### drop_tablet_worker_count + +- 默认值: 0 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 用于删除 Tablet 的线程数。`0` 表示节点中 CPU 核心数的一半。 +- 引入版本: - + +##### enable_check_string_lengths + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 是否在加载期间检查数据长度,以解决因 VARCHAR 数据超出边界而导致的 Compaction 失败。 +- 引入版本: - + +##### enable_event_based_compaction_framework + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 是否启用事件驱动的 Compaction 框架。`true` 表示启用事件驱动的 Compaction 框架,`false` 表示禁用。在 Tablet 数量众多或单个 Tablet 数据量大的场景中,启用事件驱动的 Compaction 框架可以大大减少 Compaction 的开销。 +- 引入版本: - + +##### enable_lazy_delta_column_compaction + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 启用时,Compaction 将优先对部分列更新产生的 Delta 列采用“惰性”策略:StarRocks 将避免急于将 Delta 列文件合并回其主 Segment 文件,以节省 Compaction I/O。实际上,Compaction 选择代码会检查部分列更新 Rowset 和多个候选项;如果找到且此标志为 true,引擎将停止向 Compaction 添加更多输入或仅合并空 Rowset(级别 -1),使 Delta 列保持独立。这会减少 Compaction 期间的即时 I/O 和 CPU,但会以延迟合并为代价(可能导致更多 Segment 和临时存储开销)。正确性和查询语义保持不变。 +- 引入版本: v3.2.3 + +##### enable_new_load_on_memory_limit_exceeded + +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 当达到硬内存资源限制时,是否允许新的加载进程。`true` 表示允许新的加载进程,`false` 表示拒绝。 +- 引入版本: v3.3.2 + +##### enable_pk_index_parallel_compaction + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 共享数据集群中是否启用主键索引的并行 Compaction。 +- 引入版本: - + +##### enable_pk_index_parallel_execution + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 共享数据集群中是否为主键索引操作启用并行执行。启用后,系统会在发布操作期间使用线程池并发处理 Segment,显著提高大型 Tablet 的性能。 +- 引入版本: - + +##### enable_pk_index_eager_build + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 是否在数据导入和 Compaction 阶段急切构建主键索引文件。启用后,系统会在数据写入期间立即生成持久化 PK 索引文件,从而提高后续查询性能。 +- 引入版本: - + +##### enable_pk_size_tiered_compaction_strategy + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 是否为主键表启用 Size-tiered Compaction 策略。`true` 表示启用 Size-tiered Compaction 策略,`false` 表示禁用。 +- 引入版本: 此项从 v3.2.4 和 v3.1.10 开始对共享数据集群生效,从 v3.2.5 和 v3.1.10 开始对共享无数据集群生效。 + +##### enable_rowset_verify + +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 是否验证生成的 Rowset 的正确性。启用后,将在 Compaction 和 Schema Change 后检查生成的 Rowset 的正确性。 +- 引入版本: - + +##### enable_size_tiered_compaction_strategy + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 是否启用 Size-tiered Compaction 策略(不包括主键表)。`true` 表示启用 Size-tiered Compaction 策略,`false` 表示禁用。 +- 引入版本: - + +##### enable_strict_delvec_crc_check + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 当 `enable_strict_delvec_crc_check` 设置为 true 时,我们将对 Delete Vector 执行严格的 CRC32 检查,如果检测到不匹配,将返回失败。 +- 引入版本: - + +##### enable_transparent_data_encryption + +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 启用时,StarRocks 将为新写入的存储对象(Segment 文件、删除/更新文件、Rowset Segment、Lake SST、持久索引文件等)创建加密的磁盘工件。写入器 (RowsetWriter/SegmentWriter、Lake UpdateManager/LakePersistentIndex 和相关代码路径) 将从 KeyCache 请求加密信息,将 `encryption_info` 附加到可写文件,并将 `encryption_meta` 持久化到 Rowset/Segment/SSTable 元数据中 (segment_encryption_metas、delete/update encryption metadata)。Frontend 和 Backend/CN 加密标志必须匹配 — 不匹配会导致 BE 在心跳时中止 (LOG(FATAL))。此标志在运行时不可变;在部署前启用它,并确保密钥管理 (KEK) 和 KeyCache 在整个集群中正确配置和同步。 +- 引入版本: v3.3.1, 3.4.0, 3.5.0, 4.0.0 + +##### enable_zero_copy_from_page_cache + +- 默认值: true +- 类型: boolean +- 单位: - +- 可变: 是 +- 描述: 启用时,FixedLengthColumnBase 在追加来自 Page Cache 支持的缓冲区的数据时,可能会避免复制字节。在 `append_numbers` 中,如果满足所有条件,代码将获取传入的 ContainerResource 并设置列的内部资源指针(零拷贝):配置为 
true,传入资源被拥有,资源内存与列元素类型对齐,列为空,并且资源长度是元素大小的倍数。启用此功能可减少 CPU 和内存拷贝开销,并可提高摄取/扫描吞吐量。缺点:它将列的生命周期与获取的缓冲区耦合,并依赖于正确的拥有权/对齐;禁用以强制安全复制。 +- 引入版本: - + +##### file_descriptor_cache_clean_interval + +- 默认值: 3600 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 清理一定时间内未使用的文件描述符的时间间隔。 +- 引入版本: - + +##### ignore_broken_disk + +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: 否 +- 描述: 控制当配置的存储路径读/写检查失败或解析失败时的启动行为。当 `false`(默认)时,BE 将 `storage_root_path` 或 `spill_local_storage_dir` 中的任何损坏条目视为致命错误,并将中止启动。当 `true` 时,StarRocks 将跳过(记录警告并移除)任何 `check_datapath_rw` 失败或解析失败的存储路径,以便 BE 可以继续使用剩余的健康路径启动。注意:如果所有配置的路径都被移除,BE 仍将退出。启用此功能可能会掩盖配置错误或失败的磁盘,并导致被忽略路径上的数据不可用;相应地监控日志和磁盘健康状况。 +- 引入版本: v3.2.0 + +##### inc_rowset_expired_sec + +- 默认值: 1800 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 传入数据的过期时间。此配置项用于增量克隆。 +- 引入版本: - + +##### load_process_max_memory_hard_limit_ratio + +- 默认值: 2 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: BE 节点上所有加载进程可占用的内存资源的硬限制(比例)。当 `enable_new_load_on_memory_limit_exceeded` 设置为 `false`,且所有加载进程的内存消耗超过 `load_process_max_memory_limit_percent * load_process_max_memory_hard_limit_ratio` 时,将拒绝新的加载进程。 +- 引入版本: v3.3.2 + +##### load_process_max_memory_limit_percent + +- 默认值: 30 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: BE 节点上所有加载进程可占用的内存资源的软限制(百分比)。 +- 引入版本: - + +##### lz4_acceleration + +- 默认值: 1 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 控制内置 LZ4 压缩器使用的 LZ4“加速”参数(传递给 LZ4_compress_fast_continue)。较高的值优先考虑压缩速度,但会牺牲压缩比;较低的值 (1) 会产生更好的压缩,但速度较慢。有效范围:MIN=1,MAX=65537。此设置会影响 BlockCompression 中所有基于 LZ4 的编解码器(例如 LZ4 和 Hadoop-LZ4),并且只改变压缩的执行方式 — 它不改变 LZ4 格式或解压缩兼容性。对于 CPU 密集型或低延迟工作负载,可接受较大的输出时,向上调整(例如 4、8、...);对于存储或 IO 敏感型工作负载,保持为 1。在更改前使用代表性数据进行测试,因为吞吐量与大小的权衡高度依赖于数据。 +- 引入版本: v3.4.1, 3.5.0, 4.0.0 + +##### lz4_expected_compression_ratio + +- 默认值: 2.1 +- 类型: double +- 单位: 无量纲 (压缩比) +- 可变: 是 +- 描述: 序列化压缩策略用于判断观察到的 LZ4 压缩是否“良好”的阈值 (uncompressed_size / compressed_size)。在 compress_strategy.cpp 中,此值将观察到的 compress_ratio 与 `lz4_expected_compression_speed_mbps` 一起计算奖励指标;如果组合奖励 `>` 1.0,策略会记录正反馈。增加此值会提高预期的压缩比(使条件更难满足),而降低此值会使观察到的压缩更容易被认为是令人满意的。调整以匹配典型数据可压缩性。有效范围:MIN=1,MAX=65537。 +- 引入版本: v3.4.1, 3.5.0, 4.0.0 + +##### lz4_expected_compression_speed_mbps + +- 默认值: 600 +- 类型: double +- 单位: MB/秒 +- 可变: 是 +- 描述: 自适应压缩策略 (CompressStrategy) 使用的预期 LZ4 压缩吞吐量(兆字节/秒)。反馈例程计算 `reward_ratio = (observed_compression_ratio / lz4_expected_compression_ratio) * (observed_speed / lz4_expected_compression_speed_mbps)`。如果 `reward_ratio > 1.0`,则增加正计数器 (alpha),否则增加负计数器 (beta);这会影响未来数据是否会被压缩。调整此值以反映硬件上典型的 LZ4 吞吐量——提高它会使策略更难将运行分类为“良好”(需要更高的观察速度),降低它会使分类更容易。必须是正有限数。 +- 引入版本: v3.4.1, 3.5.0, 4.0.0 + +##### make_snapshot_worker_count + +- 默认值: 5 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: BE 节点上 make snapshot 任务的最大线程数。 +- 引入版本: - + +##### manual_compaction_threads + +- 默认值: 4 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 手动 Compaction 的线程数。 +- 引入版本: - + +##### max_base_compaction_num_singleton_deltas + +- 默认值: 100 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 每次 Base Compaction 中可压缩的最大 Segment 数量。 +- 引入版本: - + +##### max_compaction_candidate_num + +- 默认值: 40960 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 候选 Compaction Tablet 的最大数量。如果值过大,会导致高内存使用和高 CPU 负载。 +- 引入版本: - + +##### max_compaction_concurrency + +- 默认值: -1 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: Compaction(包括 Base Compaction 和 Cumulative Compaction)的最大并发数。值 `-1` 表示不对并发数施加限制。`0` 表示禁用 Compaction。当事件驱动的 Compaction 框架启用时,此参数是可变的。 +- 引入版本: - + +##### max_cumulative_compaction_num_singleton_deltas + +- 默认值: 1000 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 单次 Cumulative Compaction 中可合并的最大 Segment 数量。如果 Compaction 期间发生 OOM,可以减小此值。 +- 引入版本: - + +##### max_download_speed_kbps + +- 默认值: 50000 +- 类型: Int +- 
单位: KB/秒 +- 可变: 是 +- 描述: 每个 HTTP 请求的最大下载速度。此值影响 BE 节点间数据副本同步的性能。 +- 引入版本: - + +##### max_garbage_sweep_interval + +- 默认值: 3600 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 存储卷垃圾回收的最大时间间隔。此配置从 v3.0 开始变为动态。 +- 引入版本: - + +##### max_percentage_of_error_disk + +- 默认值: 0 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 在相应的 BE 节点退出之前,存储卷中可容忍的最大错误百分比。 +- 引入版本: - + +##### max_queueing_memtable_per_tablet + +- 默认值: 2 +- 类型: Long +- 单位: 数量 +- 可变: 是 +- 描述: 控制写入路径的每个 Tablet 反压:当 Tablet 的排队(尚未刷新)MemTable 数量达到或超过 `max_queueing_memtable_per_tablet` 时,LocalTabletsChannel 和 LakeTabletsChannel 中的写入器将阻塞(休眠/重试),然后提交更多写入工作。这以增加高负载下的延迟或 RPC 超时为代价,减少了同时的 MemTable 刷新并发性和峰值内存使用。设置更高以允许更多并发 MemTable(更多内存和 I/O 突发);设置更低以限制内存压力并增加写入限流。 +- 引入版本: v3.2.0 + +##### max_row_source_mask_memory_bytes + +- 默认值: 209715200 +- 类型: Int +- 单位: 字节 +- 可变: 否 +- 描述: 行源掩码缓冲区的最大内存大小。当缓冲区大于此值时,数据将持久化到磁盘上的临时文件。此值应设置为小于 `compaction_memory_limit_per_worker` 的值。 +- 引入版本: - + +##### max_tablet_write_chunk_bytes + +- 默认值: 536870912 +- 类型: Int +- 单位: 字节 +- 可变: 是 +- 描述: 当前内存中 Tablet 写入 Chunk 的最大允许内存(字节),在此之前它被视为已满并排队等待发送。增加此值可减少加载宽表(多列)时的 RPC 频率,这可以提高吞吐量,但会增加内存使用和更大的 RPC 有效载荷。调整以平衡更少的 RPC 与内存和序列化/BRPC 限制。 +- 引入版本: v3.2.12 + +##### max_update_compaction_num_singleton_deltas + +- 默认值: 1000 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 主键表单次 Compaction 中可合并的最大 Rowset 数量。 +- 引入版本: - + +##### memory_limitation_per_thread_for_schema_change + +- 默认值: 2 +- 类型: Int +- 单位: GB +- 可变: 是 +- 描述: 每个 Schema Change 任务允许的最大内存大小。 +- 引入版本: - + +##### memory_ratio_for_sorting_schema_change + +- 默认值: 0.8 +- 类型: Double +- 单位: - (无单位比率) +- 可变: 是 +- 描述: 每个线程 Schema Change 内存限制的比例,用作排序 Schema Change 操作期间 MemTable 最大缓冲区大小。该比率乘以 `memory_limitation_per_thread_for_schema_change`(以 GB 配置并转换为字节)以计算 `max_buffer_size`,结果上限为 4GB。由 SchemaChangeWithSorting 和 SortedSchemaChange 在创建 MemTable/DeltaWriter 时使用。增加此比率允许更大的内存缓冲区(更少的刷新/合并),但会增加内存压力的风险;减少此比率会导致更频繁的刷新和更高的 I/O/合并开销。 +- 引入版本: v3.2.0 + +##### min_base_compaction_num_singleton_deltas + +- 默认值: 5 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 触发 Base Compaction 的最小 Segment 数量。 +- 引入版本: - + +##### min_compaction_failure_interval_sec + +- 默认值: 120 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 自上次 Compaction 失败以来,可调度 Tablet Compaction 的最短时间间隔。 +- 引入版本: - + +##### min_cumulative_compaction_failure_interval_sec + +- 默认值: 30 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: Cumulative Compaction 失败后重试的最短时间间隔。 +- 引入版本: - + +##### min_cumulative_compaction_num_singleton_deltas + +- 默认值: 5 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 触发 Cumulative Compaction 的最小 Segment 数量。 +- 引入版本: - + +##### min_garbage_sweep_interval + +- 默认值: 180 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 存储卷垃圾回收的最小时间间隔。此配置从 v3.0 开始变为动态。 +- 引入版本: - + +##### parallel_clone_task_per_path + +- 默认值: 8 +- 类型: Int +- 单位: 线程 +- 可变: 是 +- 描述: BE 上每个存储路径分配的并行克隆工作线程数。在 BE 启动时,克隆线程池的最大线程数计算为 max(number_of_store_paths * parallel_clone_task_per_path, MIN_CLONE_TASK_THREADS_IN_POOL)。例如,对于 4 个存储路径和默认值 8,克隆池最大线程数 = 32。此设置直接控制 BE 处理 CLONE 任务(Tablet 副本拷贝)的并发性:增加它会提高并行克隆吞吐量,但也会增加 CPU、磁盘和网络争用;减少它会限制同时克隆任务并可能限制 FE 调度的克隆操作。该值应用于动态克隆线程池,可以通过 update-config HTTP 操作在运行时更改(导致 agent_server 更新克隆池的最大线程数)。 +- 引入版本: v3.2.0 + +##### partial_update_memory_limit_per_worker + +- 默认值: 2147483648 +- 类型: long +- 单位: 字节 +- 可变: 是 +- 描述: 单个 Worker 在执行部分列更新时(用于 Compaction / Rowset 更新处理)用于组装源 Chunk 的最大内存(字节)。读取器估计每行更新内存 (`total_update_row_size / num_rows_upt`) 并将其乘以读取的行数;当该乘积超过此限制时,当前 Chunk 将被刷新和处理,以避免额外的内存增长。设置此值以匹配每个更新 Worker 的可用内存——过低会增加 I/O/处理开销(许多小 Chunk);过高会增加内存压力或 OOM 风险。如果每行估计值为零(旧版 Rowset),则此配置不施加基于字节的限制(仅适用 INT32_MAX 行数限制)。 +- 引入版本: v3.2.10 + +##### path_gc_check + +- 默认值: true +- 类型: 
Boolean +- 单位: - +- 可变: 否 +- 描述: 启用时,StorageEngine 会启动每个数据目录的后台线程,执行定期路径扫描和垃圾回收。在启动时,`start_bg_threads()` 会生成 `_path_scan_thread_callback`(调用 `DataDir::perform_path_scan` 和 `perform_tmp_path_scan`)以及 `_path_gc_thread_callback`(调用 `DataDir::perform_path_gc_by_tablet`、`DataDir::perform_path_gc_by_rowsetid`、`DataDir::perform_delta_column_files_gc` 和 `DataDir::perform_crm_gc`)。扫描和 GC 间隔由 `path_scan_interval_second` 和 `path_gc_check_interval_second` 控制;CRM 文件清理使用 `unused_crm_file_threshold_second`。禁用此功能可防止自动路径级清理(您必须手动管理孤立/临时文件)。更改此标志需要重新启动进程。 +- 引入版本: v3.2.0 + +##### path_gc_check_interval_second + +- 默认值: 86400 +- 类型: Int +- 单位: 秒 +- 可变: 否 +- 描述: 存储引擎路径垃圾回收后台线程运行之间的时间间隔(秒)。每次唤醒都会触发 DataDir 按 Tablet、按 Rowset ID、Delta Column 文件 GC 和 CRM GC 执行路径 GC(CRM GC 调用使用 `unused_crm_file_threshold_second`)。如果设置为非正值,代码会强制将间隔设置为 1800 秒(半小时)并发出警告。调整此值以控制扫描和删除磁盘上的临时或下载文件的频率。 +- 引入版本: v3.2.0 + +##### pending_data_expire_time_sec + +- 默认值: 1800 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 存储引擎中待处理数据的过期时间。 +- 引入版本: - + +##### pindex_major_compaction_limit_per_disk + +- 默认值: 1 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 每个磁盘上 Compaction 的最大并发数。这解决了因 Compaction 导致的磁盘 I/O 不均衡问题。该问题可能导致某些磁盘的 I/O 过高。 +- 引入版本: v3.0.9 + +##### pk_index_compaction_score_ratio + +- 默认值: 1.5 +- 类型: Double +- 单位: - +- 可变: 是 +- 描述: 共享数据集群中主键索引的 Compaction 得分比。例如,如果有 N 个文件集,Compaction 得分将为 `N * pk_index_compaction_score_ratio`。 +- 引入版本: - + +##### pk_index_early_sst_compaction_threshold + +- 默认值: 5 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 共享数据集群中主键索引的早期 SST Compaction 阈值。 +- 引入版本: - + +##### pk_index_map_shard_size + +- 默认值: 4096 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: Lake UpdateManager 中主键索引分片映射使用的分片数量。UpdateManager 分配一个此大小的 `PkIndexShard` 向量,并通过位掩码将 Tablet ID 映射到分片。增加此值可减少否则会共享同一分片的 Tablet 之间的锁争用,但代价是更多的互斥对象和稍高的内存使用。该值必须是 2 的幂,因为代码依赖于位掩码索引。有关大小调整指导,请参见 `tablet_map_shard_size` 启发式方法:`total_num_of_tablets_in_BE / 512`。 +- 引入版本: v3.2.0 + +##### pk_index_memtable_flush_threadpool_max_threads + +- 默认值: 0 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 共享数据集群中主键索引 MemTable 刷新的线程池的最大线程数。`0` 表示自动设置为 CPU 核心数的一半。 +- 引入版本: - + +##### pk_index_memtable_flush_threadpool_size + +- 默认值: 1048576 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 控制共享数据(云原生/Lake)模式下主键索引 MemTable 刷新线程池的最大队列大小(挂起任务数)。线程池在 ExecEnv 中创建为 "cloud_native_pk_index_flush";其最大线程数由 `pk_index_memtable_flush_threadpool_max_threads` 管理。增加此值允许更多 MemTable 刷新任务在执行前缓冲,这可以减少即时反压,但会增加排队任务对象消耗的内存。减少此值会限制缓冲任务,并可能根据线程池行为导致更早的反压或任务拒绝。根据可用内存和预期并发刷新工作负载进行调整。 +- 引入版本: - + +##### pk_index_memtable_max_count + +- 默认值: 2 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 共享数据集群中主键索引 MemTable 的最大数量。 +- 引入版本: - + +##### pk_index_memtable_max_wait_flush_timeout_ms + +- 默认值: 30000 +- 类型: Int +- 单位: 毫秒 +- 可变: 是 +- 描述: 共享数据集群中等待主键索引 MemTable 刷新完成的最大超时时间。当同步刷新所有 MemTable 时(例如,在摄取 SST 操作之前),系统会等待直到此超时。默认值为 30 秒。 +- 引入版本: - + +##### pk_index_parallel_compaction_task_split_threshold_bytes + +- 默认值: 33554432 +- 类型: Int +- 单位: 字节 +- 可变: 是 +- 描述: 主键索引 Compaction 任务的拆分阈值。当任务涉及的文件总大小小于此阈值时,任务将不会被拆分。 +- 引入版本: - + +##### pk_index_parallel_compaction_threadpool_max_threads + +- 默认值: 0 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 共享数据集群中云原生主键索引并行 Compaction 线程池的最大线程数。`0` 表示自动设置为 CPU 核心数的一半。 +- 引入版本: - + +##### pk_index_parallel_compaction_threadpool_size + +- 默认值: 1048576 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 共享数据模式下云原生主键索引并行 Compaction 使用的线程池的最大队列大小(待处理任务数)。此设置控制在线程池拒绝新提交之前可以排队多少 Compaction 任务。有效并行度受 `pk_index_parallel_compaction_threadpool_max_threads` 限制;当您预期有许多并发 Compaction 任务时,增加此值以避免任务拒绝,但请注意,较大的队列可能会增加排队工作的内存和延迟。 +- 引入版本: - + +##### pk_index_parallel_execution_min_rows + +- 默认值: 16384 +- 类型: 
Int +- 单位: - +- 可变: 是 +- 描述: 共享数据集群中启用主键索引操作并行执行的最小行数阈值。 +- 引入版本: - + +##### pk_index_parallel_execution_threadpool_max_threads + +- 默认值: 0 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 共享数据集群中主键索引并行执行线程池的最大线程数。`0` 表示自动设置为 CPU 核心数的一半。 +- 引入版本: - + +##### pk_index_size_tiered_level_multiplier + +- 默认值: 10 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 主键索引 Size-tiered Compaction 策略的级别乘数参数。 +- 引入版本: - + +##### pk_index_size_tiered_max_level + +- 默认值: 5 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 主键索引 Size-tiered Compaction 策略的最大级别。 +- 引入版本: - + +##### pk_index_size_tiered_min_level_size + +- 默认值: 131072 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 主键索引 Size-tiered Compaction 策略的最小级别。 +- 引入版本: - + +##### pk_index_sstable_sample_interval_bytes + +- 默认值: 16777216 +- 类型: Int +- 单位: 字节 +- 可变: 是 +- 描述: 共享数据集群中 SSTable 文件的采样间隔大小。当 SSTable 文件大小超过此阈值时,系统会以该间隔从 SSTable 中采样键,以优化 Compaction 任务的边界分区。对于小于此阈值的 SSTable,仅使用起始键作为边界键。默认值为 16 MB。 +- 引入版本: - + +##### pk_index_target_file_size + +- 默认值: 67108864 +- 类型: Int +- 单位: 字节 +- 可变: 是 +- 描述: 共享数据集群中主键索引的目标文件大小。 +- 引入版本: - + +##### pk_index_eager_build_threshold_bytes + +- 默认值: 104857600 +- 类型: Int +- 单位: 字节 +- 可变: 是 +- 描述: 当 `enable_pk_index_eager_build` 设置为 true 时,系统仅在导入或 Compaction 期间生成的数据超过此阈值时才会急切构建 PK 索引文件。默认值为 100MB。 +- 引入版本: - + +##### primary_key_limit_size + +- 默认值: 128 +- 类型: Int +- 单位: 字节 +- 可变: 是 +- 描述: 主键表中键列的最大大小。 +- 引入版本: v2.5 + +##### release_snapshot_worker_count + +- 默认值: 5 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: BE 节点上 release snapshot 任务的最大线程数。 +- 引入版本: - + +##### repair_compaction_interval_seconds + +- 默认值: 600 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 轮询 Repair Compaction 线程的时间间隔。 +- 引入版本: - + +##### replication_max_speed_limit_kbps + +- 默认值: 50000 +- 类型: Int +- 单位: KB/秒 +- 可变: 是 +- 描述: 每个复制线程的最大速度。 +- 引入版本: v3.3.5 + +##### replication_min_speed_limit_kbps + +- 默认值: 50 +- 类型: Int +- 单位: KB/秒 +- 可变: 是 +- 描述: 每个复制线程的最小速度。 +- 引入版本: v3.3.5 + +##### replication_min_speed_time_seconds + +- 默认值: 300 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 复制线程低于最小速度允许持续的时间。如果实际速度低于 `replication_min_speed_limit_kbps` 的时间超过此值,复制将失败。 +- 引入版本: v3.3.5 + +##### replication_threads + +- 默认值: 0 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 用于复制的最大线程数。`0` 表示将线程数设置为 BE CPU 核心数的四倍。 +- 引入版本: v3.3.5 + +##### size_tiered_level_multiple + +- 默认值: 5 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: Size-tiered Compaction 策略中两个连续级别之间的数据大小倍数。 +- 引入版本: - + +##### size_tiered_level_multiple_dupkey + +- 默认值: 10 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 在 Size-tiered Compaction 策略中,两个相邻级别之间数据量差异的倍数,用于 Duplicate Key 表。 +- 引入版本: - + +##### size_tiered_level_num + +- 默认值: 7 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: Size-tiered Compaction 策略的级别数。每个级别最多保留一个 Rowset。因此,在稳定条件下,Rowset 的数量最多与此配置项中指定的级别数相同。 +- 引入版本: - + +##### size_tiered_max_compaction_level + +- 默认值: 3 +- 类型: Int +- 单位: 级别 +- 可变: 是 +- 描述: 限制在一个主键实时 Compaction 任务中可以合并的 Size-tiered 级别数量。在 PK Size-tiered Compaction 选择期间,StarRocks 会按大小构建有序的 Rowset “级别”,并将连续的级别添加到选定的 Compaction 输入中,直到达到此限制(代码使用 `compaction_level <= size_tiered_max_compaction_level`)。该值是包含性的,并计算合并的不同 Size-tiered 数量(最高级别计为 1)。仅当 PK Size-tiered Compaction 策略启用时有效;提高它允许 Compaction 任务包含更多级别(更大、I/O 和 CPU 密集型合并,潜在更高的写入放大),而降低它会限制合并并减少任务大小和资源使用。 +- 引入版本: v4.0.0 + +##### size_tiered_min_level_size + +- 默认值: 131072 +- 类型: Int +- 单位: 字节 +- 可变: 是 +- 描述: Size-tiered Compaction 策略中最小级别的数据大小。小于此值的 Rowset 将立即触发数据 Compaction。 +- 引入版本: - + +##### small_dictionary_page_size + +- 默认值: 4096 +- 类型: Int +- 单位: 字节 +- 可变: 否 +- 描述: BinaryPlainPageDecoder 用于决定是否急切解析字典(二进制/纯文本)页面的阈值(字节)。如果页面编码大小小于 `small_dictionary_page_size`,解码器会预解析所有字符串条目到内存向量 (`_parsed_datas`) 
中,以加速随机访问和批量读取。增加此值会导致更多页面被预解析(这可以减少每次访问的解码开销,并可能提高较大字典的有效压缩),但会增加内存使用和解析所花费的 CPU;过大的值可能会降低整体性能。仅在测量内存和访问延迟权衡后才进行调整。 +- 引入版本: v3.4.1, v3.5.0 + +##### snapshot_expire_time_sec + +- 默认值: 172800 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 快照文件的过期时间。 +- 引入版本: - + +##### stale_memtable_flush_time_sec + +- 默认值: 0 +- 类型: long +- 单位: 秒 +- 可变: 是 +- 描述: 当发送者作业的内存使用量很高时,如果 MemTable 超过 `stale_memtable_flush_time_sec` 秒未更新,将被刷新以减少内存压力。此行为仅在内存限制接近时(`limit_exceeded_by_ratio(70)` 或更高)才被考虑。在 LocalTabletsChannel 中,当内存使用量非常高时(`limit_exceeded_by_ratio(95)`),可能会额外刷新大小超过 `write_buffer_size / 4` 的 MemTable。值为 `0` 会禁用这种基于年龄的过期 MemTable 刷新(不可变分区 MemTable 在空闲或内存高时仍会立即刷新)。 +- 引入版本: v3.2.0 + +##### storage_flood_stage_left_capacity_bytes + +- 默认值: 107374182400 +- 类型: Int +- 单位: 字节 +- 可变: 是 +- 描述: 所有 BE 目录中剩余存储空间的硬限制。如果 BE 存储目录的剩余存储空间小于此值且存储使用率(百分比)超过 `storage_flood_stage_usage_percent`,则拒绝 Load 和 Restore 作业。您需要将此项与 FE 配置项 `storage_usage_hard_limit_reserve_bytes` 一起设置才能使配置生效。 +- 引入版本: - + +##### storage_flood_stage_usage_percent + +- 默认值: 95 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 所有 BE 目录中存储使用率的硬限制(百分比)。如果 BE 存储目录的存储使用率(百分比)超过此值且剩余存储空间小于 `storage_flood_stage_left_capacity_bytes`,则拒绝 Load 和 Restore 作业。您需要将此项与 FE 配置项 `storage_usage_hard_limit_percent` 一起设置才能使配置生效。 +- 引入版本: - + +##### storage_high_usage_disk_protect_ratio + +- 默认值: 0.1 +- 类型: double +- 单位: - +- 可变: 是 +- 描述: 在选择用于 Tablet 创建的存储根目录时,StorageEngine 按 `disk_usage(0)` 对候选磁盘进行排序并计算平均使用率。任何使用率大于(平均使用率 + `storage_high_usage_disk_protect_ratio`)的磁盘都会被排除在优先选择池之外(它将不参与随机优先混洗,因此最初被选择的机会被推迟)。设置为 0 可禁用此保护。值是小数(典型范围 0.0–1.0);较大的值使调度程序对高于平均水平的磁盘更具容忍度。 +- 引入版本: v3.2.0 + +##### storage_medium_migrate_count + +- 默认值: 3 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 用于存储介质迁移(从 SATA 到 SSD)的线程数。 +- 引入版本: - + +##### storage_root_path + +- 默认值: `${STARROCKS_HOME}/storage` +- 类型: String +- 单位: - +- 可变: 否 +- 描述: 存储卷的目录和介质。示例:`/data1,medium:hdd;/data2,medium:ssd`。 + - 多个卷用分号 (`;`) 分隔。 + - 如果存储介质为 SSD,在目录末尾添加 `,medium:ssd`。 + - 如果存储介质为 HDD,在目录末尾添加 `,medium:hdd`。 +- 引入版本: - + +##### sync_tablet_meta + +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 一个布尔值,控制是否启用 Tablet 元数据的同步。`true` 表示启用同步,`false` 表示禁用。 +- 引入版本: - + +##### tablet_map_shard_size + +- 默认值: 1024 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: Tablet Map 的分片大小。该值必须是 2 的幂。 +- 引入版本: - + +##### tablet_max_pending_versions + +- 默认值: 1000 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 主键 Tablet 上可容忍的最大待处理版本数。待处理版本指的是已提交但尚未应用的版本。 +- 引入版本: - + +##### tablet_max_versions + +- 默认值: 1000 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: Tablet 上允许的最大版本数。如果版本数超过此值,新的写入请求将失败。 +- 引入版本: - + +##### tablet_meta_checkpoint_min_interval_secs + +- 默认值: 600 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: TabletMeta Checkpoint 的线程轮询时间间隔。 +- 引入版本: - + +##### tablet_meta_checkpoint_min_new_rowsets_num + +- 默认值: 10 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 自上次 TabletMeta Checkpoint 以来创建的最小 Rowset 数量。 +- 引入版本: - + +##### tablet_rowset_stale_sweep_time_sec + +- 默认值: 1800 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 清理 Tablet 中过期 Rowset 的时间间隔。 +- 引入版本: - + +##### tablet_stat_cache_update_interval_second + +- 默认值: 300 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: Tablet Stat Cache 的更新时间间隔。 +- 引入版本: - + +##### tablet_writer_open_rpc_timeout_sec + +- 默认值: 300 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 在远程 BE 上打开 Tablet Writer 的 RPC 超时(秒)。该值转换为毫秒,并应用于发出打开调用时的请求超时和 bRPC 控制超时。运行时使用 `tablet_writer_open_rpc_timeout_sec` 和总体加载超时的一半(即 min(`tablet_writer_open_rpc_timeout_sec`, `load_timeout_sec` / 2))中的最小值作为有效超时。设置此值以平衡及时故障检测(过小可能导致过早的打开失败)和为 BE 提供足够时间初始化写入器(过大延迟错误处理)。 +- 引入版本: v3.2.0 + +##### transaction_apply_worker_count + +- 默认值: 0 +- 类型: Int +- 单位: 
线程 +- 可变: 是 +- 描述: 控制 UpdateManager 中 "update_apply" 线程池使用的最大工作线程数 — 该线程池用于应用事务的 Rowset(特别是对于主键表)。值 `>0` 设置固定的最大线程数;0(默认值)使池大小等于 CPU 核心数。配置的值在启动时应用 (UpdateManager::init),可以通过 update-config HTTP 操作在运行时更改,该操作会更新池的最大线程数。调整此值以增加应用并发性(吞吐量)或限制 CPU/内存争用;最小线程数和空闲超时分别由 `transaction_apply_thread_pool_num_min` 和 `transaction_apply_worker_idle_time_ms` 控制。 +- 引入版本: v3.2.0 + +##### transaction_apply_worker_idle_time_ms + +- 默认值: 500 +- 类型: int +- 单位: 毫秒 +- 可变: 否 +- 描述: 设置 UpdateManager 中用于应用事务/更新的 "update_apply" 线程池的空闲超时(毫秒)。该值通过 MonoDelta::FromMilliseconds 传递给 ThreadPoolBuilder::set_idle_timeout,因此空闲时间超过此超时的 Worker 线程可能会被终止(受池配置的最小线程数和最大线程数限制)。较小的值可更快释放资源,但会增加突发负载下的线程创建/销毁开销;较大的值可使 Worker 在短时突发期间保持“热”状态,但代价是更高的基线资源使用。 +- 引入版本: v3.2.11 + +##### trash_file_expire_time_sec + +- 默认值: 86400 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 清理垃圾文件的时间间隔。默认值已从 v2.5.17、v3.0.9 和 v3.1.6 开始从 259,200 更改为 86,400。 +- 引入版本: - + +##### unused_rowset_monitor_interval + +- 默认值: 30 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 清理过期 Rowset 的时间间隔。 +- 引入版本: - + +##### update_cache_expire_sec + +- 默认值: 360 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: Update Cache 的过期时间。 +- 引入版本: - + +##### update_compaction_check_interval_seconds + +- 默认值: 10 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 检查主键表 Compaction 的时间间隔。 +- 引入版本: - + +##### update_compaction_delvec_file_io_amp_ratio + +- 默认值: 2 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 用于控制主键表中包含 Delvec 文件的 Rowset 的 Compaction 优先级。值越大,优先级越高。 +- 引入版本: - + +##### update_compaction_num_threads_per_disk + +- 默认值: 1 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 主键表每个磁盘的 Compaction 线程数。 +- 引入版本: - + +##### update_compaction_per_tablet_min_interval_seconds + +- 默认值: 120 +- 类型: Int +- 单位: 秒 +- 可变: 是 +- 描述: 主键表中每个 Tablet 触发 Compaction 的最小时间间隔。 +- 引入版本: - + +##### update_compaction_ratio_threshold + +- 默认值: 0.5 +- 类型: Double +- 单位: - +- 可变: 是 +- 描述: 共享数据集群中主键表 Compaction 可合并的最大数据比例。如果单个 Tablet 过大,建议减小此值。 +- 引入版本: v3.1.5 + +##### update_compaction_result_bytes + +- 默认值: 1073741824 +- 类型: Int +- 单位: 字节 +- 可变: 是 +- 描述: 主键表单次 Compaction 的最大结果大小。 +- 引入版本: - + +##### update_compaction_size_threshold + +- 默认值: 268435456 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 主键表的 Compaction 得分是根据文件大小计算的,这与其他表类型不同。此参数可用于使主键表的 Compaction 得分与其他表类型相似,从而更易于用户理解。 +- 引入版本: - + +##### upload_worker_count + +- 默认值: 0 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: BE 节点上备份作业上传任务的最大线程数。`0` 表示将该值设置为 BE 所在机器的 CPU 核心数。 +- 引入版本: - + +##### vertical_compaction_max_columns_per_group + +- 默认值: 5 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 垂直 Compaction 中每组的最大列数。 +- 引入版本: - + +### 共享数据 + +##### download_buffer_size + +- 默认值: 4194304 +- 类型: Int +- 单位: 字节 +- 可变: 是 +- 描述: 下载快照文件时使用的内存中拷贝缓冲区大小(字节)。SnapshotLoader::download 将此值传递给 fs::copy 作为每传输块大小,用于从远程顺序文件读取到本地可写文件。较大的值可以通过减少系统调用/IO 开销来提高高带宽链接的吞吐量;较小的值会减少每个活动传输的峰值内存使用。注意:此参数控制每个流的缓冲区大小,而不是下载线程数——总内存消耗 = `download_buffer_size * number_of_concurrent_downloads`。 +- 引入版本: v3.2.13 + +##### graceful_exit_wait_for_frontend_heartbeat + +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 确定是否等待至少一个 Frontend 心跳响应指示 SHUTDOWN 状态,然后才完成优雅退出。启用时,优雅关闭过程保持活动状态,直到通过心跳 RPC 响应 SHUTDOWN 确认,确保 Frontend 在两个常规心跳间隔之间有足够的时间检测终止状态。 +- 引入版本: v3.4.5 + +##### lake_compaction_stream_buffer_size_bytes + +- 默认值: 1048576 +- 类型: Int +- 单位: 字节 +- 可变: 是 +- 描述: 共享数据集群中云原生表 Compaction 的读取器远程 I/O 缓冲区大小。默认值为 1MB。您可以增加此值以加速 Compaction 进程。 +- 引入版本: v3.2.3 + +##### lake_pk_compaction_max_input_rowsets + +- 默认值: 500 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 共享数据集群中主键表 Compaction 任务允许的最大输入 Rowset 数量。此参数的默认值从 v3.2.4 和 v3.1.10 开始从 `5` 更改为 `1000`,从 v3.3.1 和 v3.2.9 开始更改为 `500`。在为主键表启用 Size-tiered Compaction 策略(通过将 
`enable_pk_size_tiered_compaction_strategy` 设置为 `true`)后,StarRocks 不再需要限制每次 Compaction 的 Rowset 数量来减少写入放大,因此相应提高了此参数的默认值。
+- 引入版本: v3.1.8, v3.2.3
+
+##### loop_count_wait_fragments_finish
+
+- Default: 2
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The number of loops the BE/CN process waits for before exiting. Each loop is a fixed 10-second interval. You can set it to `0` to disable the loop wait. This item became mutable from v3.4 onwards, and its default value was changed from `0` to `2`.
+- Introduced in: v2.5
+
+##### max_client_cache_size_per_host
+
+- Default: 10
+- Type: Int
+- Unit: Entries per host (cached client instances)
+- Is mutable: No
+- Description: The maximum number of cached client instances the BE-wide client caches keep for each remote host. This single setting is used when the BackendServiceClientCache, FrontendServiceClientCache, and BrokerServiceClientCache are created during ExecEnv initialization, so it caps the number of client stubs/connections kept per host in each of these caches. Increasing it reduces reconnect and stub-creation overhead at the cost of more memory and file descriptors; decreasing it saves resources but may increase connection churn. The value is read at startup and cannot be changed at runtime. Currently one shared setting controls all client cache types; separate per-cache configurations may be introduced later.
+- Introduced in: v3.2.0
+
+##### starlet_filesystem_instance_cache_capacity
+
+- Default: 10000
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The cache capacity for starlet filesystem instances.
+- Introduced in: v3.2.16, v3.3.11, v3.4.1
+
+##### starlet_filesystem_instance_cache_ttl_sec
+
+- Default: 86400
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The cache expiration time for starlet filesystem instances.
+- Introduced in: v3.3.15, v3.4.5
+
+##### starlet_port
+
+- Default: 9070
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: An extra agent service port for BE and CN.
+- Introduced in: -
+
+##### starlet_star_cache_disk_size_percent
+
+- Default: 80
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum percentage of disk capacity that Data Cache can use in a shared-data cluster.
+- Introduced in: v3.1
+
+##### starlet_use_star_cache
+
+- Default: false in v3.1, true since v3.2.3
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: Whether to enable Data Cache in a shared-data cluster. `true` indicates enabling this feature and `false` indicates disabling it. The default value was changed from `false` to `true` from v3.2.3 onwards.
+- Introduced in: v3.1
+
+##### starlet_write_file_with_tag
+
+- Default: false
+- Type: Boolean
+- Unit: -
+- Is mutable: Yes
+- Description: In a shared-data cluster, whether to add object storage tags to files written to object storage, to facilitate custom file management.
+- Introduced in: v3.5.3
+
+##### table_schema_service_max_retries
+
+- Default: 3
+- Type: Int
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum number of retries for table schema service requests.
+- Introduced in: v4.1
+
+### Data Lake
+
+##### datacache_block_buffer_enable
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Whether to enable Block Buffer to optimize Data Cache efficiency. When Block Buffer is enabled, the system reads block data from the Data Cache and caches it in a temporary buffer, reducing the extra overhead caused by frequent cache reads.
+- Introduced in: v3.2.0
+
+##### datacache_disk_adjust_interval_seconds
+
+- Default: 10
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The interval of automatic capacity scaling for the Data Cache. The system periodically checks cache disk usage and triggers automatic scaling when necessary.
+- Introduced in: v3.3.0
+
+##### datacache_disk_idle_seconds_for_expansion
+
+- Default: 7200
+- Type: Int
+- Unit: Seconds
+- Is mutable: Yes
+- Description: The minimum waiting time before automatic Data Cache expansion. Automatic expansion is triggered only when the disk usage stays below `datacache_disk_low_level` for longer than this value.
+- Introduced in: v3.3.0
+
+##### datacache_disk_size
+
+- Default: 0
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum amount of data that can be cached on a single disk. You can set it either as a percentage (for example, `80%`) or as a physical limit (for example, `2T`, `500G`). For example, if you use two disks and set the value of the `datacache_disk_size` parameter as `21474836480` (20 GB), a maximum of 40 GB data can be cached on these two disks. The default value is `0`, which indicates that only memory is used to cache data.
+- Introduced in: -
+
+##### datacache_enable
+
+- Default: true
+- Type: Boolean
+- Unit: -
+- Is mutable: No
+- Description: Whether to enable Data Cache. `true` indicates enabling Data Cache and `false` indicates disabling it. The default value was changed to `true` from v3.3 onwards.
+- Introduced in: -
+
+##### datacache_eviction_policy
+
+- Default: slru
+- Type: String
+- Unit: -
+- Is mutable: No
+- Description: The eviction policy of the Data Cache. Valid values: `lru` (least recently used) and `slru` (segmented LRU).
+- Introduced in: v3.4.0
+
+##### datacache_inline_item_count_limit
+
+- Default: 130172
+- Type: Int
+- Unit: -
+- Is mutable: No
+- Description: The maximum number of inline cache items in the Data Cache. For some particularly small cache blocks, Data Cache stores them in `inline` mode, which caches the block data and its metadata together in memory.
+- Introduced in: v3.4.0
+
+##### datacache_mem_size
+
+- Default: 0
+- Type: String
+- Unit: -
+- Is mutable: Yes
+- Description: The maximum amount of data that can be cached in memory. You can set it either as a percentage (for example, `10%`) or as a physical limit (for example, `10G`, `21474836480`).
+- Introduced in: -
+
+##### datacache_min_disk_quota_for_adjustment
+
+- Default: 10737418240
+- Type: Int
+- Unit: Bytes
+- Is mutable: Yes
+- Description: The minimum effective capacity for Data Cache automatic scaling. If the system tries to adjust the cache capacity to a value smaller than this, the cache capacity is set directly to `0` to prevent the suboptimal performance caused by frequent cache fills and evictions when the cache capacity is insufficient.
+- Introduced in: v3.3.0
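+
+The Data Cache sizing items above are commonly set together in **be.conf**. A minimal sketch, assuming a BE deployed under `/opt/starrocks/be` (the path and quota values are placeholders, not recommendations):
+
+```bash
+# Enable Data Cache with a 10% memory quota and an 80% disk quota.
+# Static items require a BE restart; mutable ones (datacache_mem_size,
+# datacache_disk_size) can also be changed via the /api/update_config API.
+cat >> /opt/starrocks/be/conf/be.conf <<'EOF'
+datacache_enable = true
+datacache_mem_size = 10%
+datacache_disk_size = 80%
+EOF
+```
+
+##### 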
disk_high_level + +- 默认值: 90 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 触发缓存容量自动缩容的磁盘使用率上限(百分比)。当磁盘使用率超过此值时,系统会自动从 Data Cache 中逐出缓存数据。从 v3.4.0 开始,默认值从 `80` 更改为 `90`。从 v4.0 开始,此项从 `datacache_disk_high_level` 重命名为 `disk_high_level`。 +- 引入版本: v3.3.0 + +##### disk_low_level + +- 默认值: 60 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 触发缓存容量自动扩容的磁盘使用率下限(百分比)。当磁盘使用率在此值以下持续时间超过 `datacache_disk_idle_seconds_for_expansion` 且为 Data Cache 分配的空间被完全利用时,系统将通过增加上限自动扩展缓存容量。从 v4.0 开始,此项从 `datacache_disk_low_level` 重命名为 `disk_low_level`。 +- 引入版本: v3.3.0 + +##### disk_safe_level + +- 默认值: 80 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: Data Cache 的磁盘使用率安全级别(百分比)。当 Data Cache 执行自动伸缩时,系统会调整缓存容量,目标是使磁盘使用率尽可能接近此值。从 v3.4.0 开始,默认值从 `70` 更改为 `80`。从 v4.0 开始,此项从 `datacache_disk_safe_level` 重命名为 `disk_safe_level`。 +- 引入版本: v3.3.0 + +##### enable_connector_sink_spill + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 是否为写入外部表启用溢出 (Spilling)。启用此功能可以防止在内存不足时写入外部表导致生成大量小文件。目前,此功能仅支持写入 Iceberg 表。 +- 引入版本: v4.0.0 + +##### enable_datacache_disk_auto_adjust + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 是否启用 Data Cache 磁盘容量的自动伸缩。启用后,系统会根据当前磁盘使用率动态调整缓存容量。从 v4.0 开始,此项从 `datacache_auto_adjust_enable` 重命名为 `enable_datacache_disk_auto_adjust`。 +- 引入版本: v3.3.0 + +##### jdbc_connection_idle_timeout_ms + +- 默认值: 600000 +- 类型: Int +- 单位: 毫秒 +- 可变: 否 +- 描述: JDBC 连接池中空闲连接过期的时间长度。如果 JDBC 连接池中的连接空闲时间超过此值,连接池将关闭超过配置项 `jdbc_minimum_idle_connections` 指定数量的空闲连接。 +- 引入版本: - + +##### jdbc_connection_pool_size + +- 默认值: 8 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: JDBC 连接池大小。在每个 BE 节点上,使用相同 `jdbc_url` 访问外部表的查询共享同一个连接池。 +- 引入版本: - + +##### jdbc_minimum_idle_connections + +- 默认值: 1 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: JDBC 连接池中的最小空闲连接数。 +- 引入版本: - + +##### lake_clear_corrupted_cache_data + +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 共享数据集群中是否允许系统清除损坏的数据缓存。 +- 引入版本: v3.4 + +##### lake_clear_corrupted_cache_meta + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 共享数据集群中是否允许系统清除损坏的元数据缓存。 +- 引入版本: v3.3 + +##### lake_enable_vertical_compaction_fill_data_cache + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 共享数据集群中是否允许垂直 Compaction 任务将数据缓存到本地磁盘。 +- 引入版本: v3.1.7, v3.2.3 + +##### lake_replication_read_buffer_size + +- 默认值: 16777216 +- 类型: Long +- 单位: 字节 +- 可变: 是 +- 描述: 在 Lake Replication 期间下载 Lake Segment 文件时使用的读取缓冲区大小。此值确定读取远程文件的每个读取分配;实现使用此设置和 1 MB 最小值的较大者。较大的值会减少读取调用次数并可以提高吞吐量,但会增加每个并发下载使用的内存;较小的值会降低内存使用,但会增加 I/O 调用次数。根据网络带宽、存储 I/O 特性以及并行复制线程数进行调整。 +- 引入版本: - + +##### lake_service_max_concurrency + +- 默认值: 0 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 共享数据集群中 RPC 请求的最大并发数。当达到此阈值时,传入请求将被拒绝。当此项设置为 `0` 时,不限制并发数。 +- 引入版本: - + +##### max_hdfs_scanner_num + +- 默认值: 50 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: 限制 ConnectorScanNode 可以同时运行的最大连接器 (HDFS/远程) 扫描器数量。在扫描启动期间,节点计算估计并发性(基于内存、块大小和 scanner_row_num),然后使用此值作为上限,以确定要保留的扫描器和块数量以及要启动的扫描器线程数量。在运行时调度待处理扫描器时也会参考此值(以避免过度订阅),以及在考虑文件句柄限制时决定可以重新提交多少待处理扫描器。降低此值可减少线程、内存和打开文件压力,但可能影响吞吐量;增加此值会提高并发性和资源使用率。 +- 引入版本: v3.2.0 + +##### query_max_memory_limit_percent + +- 默认值: 90 +- 类型: Int +- 单位: - +- 可变: 否 +- 描述: Query Pool 可使用的最大内存。它表示为进程内存限制的百分比。 +- 引入版本: v3.1.0 + +##### rocksdb_max_write_buffer_memory_bytes + +- 默认值: 1073741824 +- 类型: Int64 +- 单位: - +- 可变: 否 +- 描述: 这是 RocksDB 中元数据写入缓冲区的最大大小。默认值为 1GB。 +- 引入版本: v3.5.0 + +##### rocksdb_write_buffer_memory_percent + +- 默认值: 5 +- 类型: Int64 +- 单位: - +- 可变: 否 +- 描述: 这是 RocksDB 中元数据写入缓冲区的内存百分比。默认值为系统内存的 5%。但最终计算出的写入缓冲区内存大小不会小于 64MB,也不会超过 1GB (`rocksdb_max_write_buffer_memory_bytes`)。 +- 引入版本: v3.5.0 + +### 其他 + +##### default_mv_resource_group_concurrency_limit + +- 默认值: 0 +- 类型: Int +- 单位: 
- +- 可变: 是 +- 描述: 资源组 `default_mv_wg` 中物化视图刷新任务的最大并发数(每个 BE 节点)。默认值 `0` 表示没有限制。 +- 引入版本: v3.1 + +##### default_mv_resource_group_cpu_limit + +- 默认值: 1 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 资源组 `default_mv_wg` 中物化视图刷新任务可使用的最大 CPU 核心数(每个 BE 节点)。 +- 引入版本: v3.1 + +##### default_mv_resource_group_memory_limit + +- 默认值: 0.8 +- 类型: Double +- 单位: - +- 可变: 是 +- 描述: 资源组 `default_mv_wg` 中物化视图刷新任务可使用的最大内存比例(每个 BE 节点)。默认值表示内存的 80%。 +- 引入版本: v3.1 + +##### default_mv_resource_group_spill_mem_limit_threshold + +- 默认值: 0.8 +- 类型: Double +- 单位: - +- 可变: 是 +- 描述: 资源组 `default_mv_wg` 中物化视图刷新任务触发中间结果溢出前的内存使用阈值。默认值表示内存的 80%。 +- 引入版本: v3.1 + +##### enable_resolve_hostname_to_ip_in_load_error_url + +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 用于控制 `error_urls` 中主机名的处理方式:管理员可根据环境需求选择使用 FE 心跳中的原始主机名,或强制将其解析为 IP 地址。 + - `true`: 将主机名解析为 IP。 + - `false` (默认): 在错误 URL 中保留原始主机名。 +- 引入版本: v4.0.1 + +##### enable_retry_apply + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 启用时,被归类为可重试的 Tablet 应用失败(例如瞬态内存限制错误)将被重新调度重试,而不是立即将 Tablet 标记为错误。TabletUpdates 中的重试路径使用 `retry_apply_interval_second` 乘以当前失败计数并限制在 600 秒最大值来调度下一次尝试,因此退避随着连续失败而增长。明确不可重试的错误(例如损坏)会绕过重试,并导致应用进程立即进入错误状态。重试将持续到达到整体超时/终止条件,之后应用将进入错误状态。关闭此功能会禁用失败应用任务的自动重新调度,并导致失败的应用在没有重试的情况下转换为错误状态。 +- 引入版本: v3.2.9 + +##### enable_token_check + +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: 是 +- 描述: 一个布尔值,控制是否启用令牌检查。`true` 表示启用令牌检查,`false` 表示禁用。 +- 引入版本: - + +##### es_scroll_keepalive + +- 默认值: 5m +- 类型: String +- 单位: 分钟(带后缀的字符串,例如 "5m") +- 可变: 否 +- 描述: 发送到 Elasticsearch 以用于滚动搜索上下文的保持活动时长。该值在构建初始滚动 URL (`?scroll=`) 和发送后续滚动请求(通过 ESScrollQueryBuilder)时按原样使用(例如 "5m")。这控制了 ES 搜索上下文在 ES 端进行垃圾回收之前保留多长时间;设置更长会使滚动上下文保持活动更长时间,但会延长 ES 集群上的资源使用。该值在启动时由 ES 扫描读取器读取,不能在运行时更改。 +- 引入版本: v3.2.0 + +##### load_replica_status_check_interval_ms_on_failure + +- 默认值: 2000 +- 类型: Int +- 单位: 毫秒 +- 可变: 是 +- 描述: 如果上次检查 RPC 失败,辅助副本检查主副本状态的间隔。 +- 引入版本: v3.5.1 + +##### load_replica_status_check_interval_ms_on_success + +- 默认值: 15000 +- 类型: Int +- 单位: 毫秒 +- 可变: 是 +- 描述: 如果上次检查 RPC 成功,辅助副本检查主副本状态的间隔。 +- 引入版本: v3.5.1 + +##### max_length_for_bitmap_function + +- 默认值: 1000000 +- 类型: Int +- 单位: 字节 +- 可变: 否 +- 描述: Bitmap 函数输入值的最大长度。 +- 引入版本: - + +##### max_length_for_to_base64 + +- 默认值: 200000 +- 类型: Int +- 单位: 字节 +- 可变: 否 +- 描述: `to_base64()` 函数输入值的最大长度。 +- 引入版本: - + +##### memory_high_level + +- 默认值: 75 +- 类型: Long +- 单位: 百分比 +- 可变: 是 +- 描述: 高水位内存阈值,表示为进程内存限制的百分比。当总内存消耗超过此百分比时,BE 开始逐渐释放内存(目前通过逐出数据缓存和更新缓存)以缓解压力。监视器使用此值计算 `memory_high = mem_limit * memory_high_level / 100`,如果消耗 `>` `memory_high`,则在 GC 顾问的指导下执行受控逐出;如果消耗超过 `memory_urgent_level`(一个单独的配置),则会发生更积极的即时减少。此值还会用于在超过阈值时禁用某些内存密集型操作(例如主键预加载)。必须满足与 `memory_urgent_level` 的验证(`memory_urgent_level > memory_high_level`,`memory_high_level >= 1`,`memory_urgent_level <= 100`)。 +- 引入版本: v3.2.0 + +##### report_exec_rpc_request_retry_num + +- 默认值: 10 +- 类型: Int +- 单位: - +- 可变: 是 +- 描述: 向 FE 报告 exec RPC 请求时的重试次数。默认值为 10,表示当 RPC 请求失败时,仅对 Fragment Instance 完成(finish)类型的 RPC 最多重试 10 次。报告 exec RPC 请求对于加载作业很重要,如果一个 Fragment Instance 的完成报告失败,加载作业将挂起直到超时。 +- 引入版本: - + +##### sleep_one_second + +- 默认值: 1 +- 类型: Int +- 单位: 秒 +- 可变: 否 +- 描述: BE 代理工作线程使用的一个小的全局休眠间隔(秒),当主地址/心跳尚不可用或需要短时间重试/退避时作为一秒的暂停。在代码库中,它被几个报告工作池(例如 ReportDiskStateTaskWorkerPool、ReportOlapTableTaskWorkerPool、ReportWorkgroupTaskWorkerPool)引用,以避免忙等待并减少重试时的 CPU 消耗。增加此值会降低重试频率和对主可用性的响应速度;减少它会增加轮询速率和 CPU 使用率。仅在了解响应性与资源使用权衡的情况下进行调整。 +- 引入版本: v3.2.0 + +##### small_file_dir + +- 默认值: `${STARROCKS_HOME}/lib/small_file/` +- 类型: String +- 单位: - +- 可变: 否 +- 描述: 用于存储文件管理器下载的文件的目录。 +- 引入版本: - + +##### upload_buffer_size + +- 
默认值: 4194304 +- 类型: Int +- 单位: 字节 +- 可变: 是 +- 描述: 文件拷贝操作在将快照文件上传到远程存储(Broker 或直接 FileSystem)时使用的缓冲区大小(字节)。在上传路径 (snapshot_loader.cpp) 中,此值作为每个上传流的读/写块大小传递给 fs::copy。默认值为 4 MiB。增加此值可以提高高延迟或高带宽链接的吞吐量,但会增加每个并发上传的内存使用;减少此值可降低每个流的内存,但可能降低传输效率。与 `upload_worker_count` 和总体可用内存一起调整。 +- 引入版本: v3.2.13 + +##### user_function_dir + +- 默认值: `${STARROCKS_HOME}/lib/udf` +- 类型: String +- 单位: - +- 可变: 否 +- 描述: 用于存储用户定义函数 (UDF) 的目录。 +- 引入版本: - + +##### web_log_bytes + +- 默认值: 1048576 (1 MB) +- 类型: Long +- 单位: 字节 +- 可变: 否 +- 描述: 从 INFO 日志文件读取并在 BE 调试 Web 服务器的日志页面上显示的最大字节数。处理程序使用此值计算查找偏移量(显示最后 N 字节),以避免读取或提供非常大的日志文件。如果日志文件小于此值,则显示整个文件。注意:在当前实现中,读取和提供 INFO 日志的代码被注释掉了,处理程序报告 INFO 日志文件无法打开,因此除非启用日志提供代码,否则此参数可能无效。 +- 引入版本: v3.2.0 + +### 已删除参数 + +##### enable_bit_unpack_simd + +- 状态: 已删除 +- 描述: 此参数已被删除。Bit-unpack SIMD 选择现在在编译时处理 (AVX2/BMI2),并自动回退到默认实现。 +- 删除版本: - diff --git a/docs/zh/administration/management/Backup_and_restore.md b/docs/zh/administration/management/Backup_and_restore.md new file mode 100644 index 0000000..5fc65b6 --- /dev/null +++ b/docs/zh/administration/management/Backup_and_restore.md @@ -0,0 +1,650 @@ +--- +displayed_sidebar: docs +--- + +# 备份与恢复数据 + +本文介绍如何在 StarRocks 中备份和恢复数据,或将数据迁移到新的 StarRocks 集群。 + +StarRocks 支持将数据备份为快照并存储到远端存储系统,然后将数据恢复到任意 StarRocks 集群。 + +从 v3.4.0 版本起,StarRocks 增强了 BACKUP 和 RESTORE 的功能,支持更多对象并重构了语法以提高灵活性。 + +StarRocks 支持以下远端存储系统: + +- Apache™ Hadoop® (HDFS) 集群 +- AWS S3 +- Google GCS +- MinIO + +StarRocks 支持备份以下对象: + +- 内部数据库、表(所有类型和分区策略)以及分区 +- 外部 Catalog 的元数据(v3.4.0 版本起支持) +- 同步物化视图和异步物化视图 +- 逻辑视图(v3.4.0 版本起支持) +- 用户定义函数 UDFs(v3.4.0 版本起支持) + +> **NOTE** +> +> 共享数据集群模式的 StarRocks 集群不支持数据 BACKUP 和 RESTORE。 + +## 创建仓库 + +在备份数据之前,您需要创建一个仓库,用于在远端存储系统中存储数据快照。您可以在 StarRocks 集群中创建多个仓库。有关详细说明,请参阅 [CREATE REPOSITORY](../../sql-reference/sql-statements/backup_restore/CREATE_REPOSITORY.md)。 + +- 在 HDFS 中创建仓库 + +以下示例在 HDFS 集群中创建名为 `test_repo` 的仓库。 + +```SQL +CREATE REPOSITORY test_repo +WITH BROKER +ON LOCATION "hdfs://<hdfs_host>:<hdfs_port>/repo_dir/backup" +PROPERTIES( + "username" = "<hdfs_username>", + "password" = "<hdfs_password>" +); +``` + +- 在 AWS S3 中创建仓库 + + 您可以选择 IAM user-based credential (Access Key and Secret Key)、Instance Profile 或 Assumed Role 作为访问 AWS S3 的凭证方法。 + + - 以下示例使用 IAM user-based credentials 凭证方法,在 AWS S3 存储桶 `bucket_s3` 中创建名为 `test_repo` 的仓库。 + + ```SQL + CREATE REPOSITORY test_repo + WITH BROKER + ON LOCATION "s3a://bucket_s3/backup" + PROPERTIES( + "aws.s3.access_key" = "XXXXXXXXXXXXXXXXX", + "aws.s3.secret_key" = "yyyyyyyyyyyyyyyyyyyyyyyy", + "aws.s3.region" = "us-east-1" + ); + ``` + + - 以下示例使用 Instance Profile 凭证方法,在 AWS S3 存储桶 `bucket_s3` 中创建名为 `test_repo` 的仓库。 + + ```SQL + CREATE REPOSITORY test_repo + WITH BROKER + ON LOCATION "s3a://bucket_s3/backup" + PROPERTIES( + "aws.s3.use_instance_profile" = "true", + "aws.s3.region" = "us-east-1" + ); + ``` + + - 以下示例使用 Assumed Role 凭证方法,在 AWS S3 存储桶 `bucket_s3` 中创建名为 `test_repo` 的仓库。 + + ```SQL + CREATE REPOSITORY test_repo + WITH BROKER + ON LOCATION "s3a://bucket_s3/backup" + PROPERTIES( + "aws.s3.use_instance_profile" = "true", + "aws.s3.iam_role_arn" = "arn:aws:iam::xxxxxxxxxx:role/yyyyyyyy", + "aws.s3.region" = "us-east-1" + ); + ``` + +> **NOTE** +> +> StarRocks 仅支持根据 S3A 协议在 AWS S3 中创建仓库。因此,当您在 AWS S3 中创建仓库时,必须将 `ON LOCATION` 中作为仓库位置传入的 S3 URI 中的 `s3://` 替换为 `s3a://`。 + +- 在 Google GCS 中创建仓库 + +以下示例在 Google GCS 存储桶 `bucket_gcs` 中创建名为 `test_repo` 的仓库。 + +```SQL +CREATE REPOSITORY test_repo +WITH BROKER +ON LOCATION "s3a://bucket_gcs/backup" +PROPERTIES( + "fs.s3a.access.key" = "xxxxxxxxxxxxxxxxxxxx", + "fs.s3a.secret.key" = "yyyyyyyyyyyyyyyyyyyy",
+ "fs.s3a.endpoint" = "storage.googleapis.com" +); +``` + +> **NOTE** +> +> - StarRocks 仅支持根据 S3A 协议在 Google GCS 中创建仓库。因此,当您在 Google GCS 中创建仓库时,必须将 `ON LOCATION` 中作为仓库位置传入的 GCS URI 中的前缀替换为 `s3a://`。 +> - 端点地址中不要指定 `https`。 + +- 在 MinIO 中创建仓库 + +以下示例在 MinIO 存储桶 `bucket_minio` 中创建名为 `test_repo` 的仓库。 + +```SQL +CREATE REPOSITORY test_repo +WITH BROKER +ON LOCATION "s3://bucket_minio/backup" +PROPERTIES( + "aws.s3.access_key" = "XXXXXXXXXXXXXXXXX", + "aws.s3.secret_key" = "yyyyyyyyyyyyyyyyy", + "aws.s3.endpoint" = "http://minio:9000" +); +``` + +仓库创建完成后,您可以通过 [SHOW REPOSITORIES](../../sql-reference/sql-statements/backup_restore/SHOW_REPOSITORIES.md) 命令查看仓库。数据恢复完成后,您可以使用 [DROP REPOSITORY](../../sql-reference/sql-statements/backup_restore/DROP_REPOSITORY.md) 命令删除 StarRocks 中的仓库。但是,存储在远端存储系统中的数据快照无法通过 StarRocks 删除。您需要手动在远端存储系统中删除它们。 + +## 备份数据 + +仓库创建完成后,您需要创建一个数据快照并将其备份到远端仓库。有关详细说明,请参阅 [BACKUP](../../sql-reference/sql-statements/backup_restore/BACKUP.md)。BACKUP 是一种异步操作。您可以使用 [SHOW BACKUP](../../sql-reference/sql-statements/backup_restore/SHOW_BACKUP.md) 命令检查 BACKUP 作业的状态,或使用 [CANCEL BACKUP](../../sql-reference/sql-statements/backup_restore/CANCEL_BACKUP.md) 命令取消 BACKUP 作业。 + +StarRocks 支持在数据库、表或分区级别进行 FULL 备份。 + +如果您的表存储了大量数据,建议您按分区备份和恢复数据。这样,您可以减少作业失败时的重试成本。如果您需要定期备份增量数据,可以为表配置一个 [分区方案](../../table_design/data_distribution/Data_distribution.md#partitioning),每次只备份新分区。 + +### 备份数据库 + +对数据库执行完整 BACKUP 操作将备份数据库中的所有表、同步和异步物化视图、逻辑视图和 UDF。 + +以下示例将数据库 `sr_hub` 备份到快照 `sr_hub_backup` 中,并将快照上传到仓库 `test_repo`。 + +```SQL +-- Supported from v3.4.0 onwards. (v3.4.0 版本起支持) +BACKUP DATABASE sr_hub SNAPSHOT sr_hub_backup +TO test_repo; + +-- Compatible with the syntax in earlier versions. (兼容早期版本语法) +BACKUP SNAPSHOT sr_hub.sr_hub_backup +TO test_repo; +``` + +### 备份表 + +StarRocks 支持备份和恢复所有类型和分区策略的表。对表执行完整 BACKUP 操作将备份该表及其上构建的同步物化视图。 + +以下示例将数据库 `sr_hub` 中的表 `sr_member` 备份到快照 `sr_member_backup` 中,并将快照上传到仓库 `test_repo`。 + +```SQL +-- Supported from v3.4.0 onwards. (v3.4.0 版本起支持) +BACKUP DATABASE sr_hub SNAPSHOT sr_member_backup +TO test_repo +ON (TABLE sr_member); + +-- Compatible with the syntax in earlier versions. (兼容早期版本语法) +BACKUP SNAPSHOT sr_hub.sr_member_backup +TO test_repo +ON (sr_member); +``` + +以下示例将数据库 `sr_hub` 中的两张表 `sr_member` 和 `sr_pmc` 备份到快照 `sr_core_backup` 中,并将快照上传到仓库 `test_repo`。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_core_backup +TO test_repo +ON (TABLE sr_member, TABLE sr_pmc); +``` + +以下示例将数据库 `sr_hub` 中的所有表备份到快照 `sr_all_backup` 中,并将快照上传到仓库 `test_repo`。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_all_backup +TO test_repo +ON (ALL TABLES); +``` + +### 备份分区 + +以下示例将数据库 `sr_hub` 中表 `sr_member` 的分区 `p1` 备份到快照 `sr_par_backup` 中,并将快照上传到仓库 `test_repo`。 + +```SQL +-- Supported from v3.4.0 onwards. (v3.4.0 版本起支持) +BACKUP DATABASE sr_hub SNAPSHOT sr_par_backup +TO test_repo +ON (TABLE sr_member PARTITION (p1)); + +-- Compatible with the syntax in earlier versions. 
(兼容早期版本语法) +BACKUP SNAPSHOT sr_hub.sr_par_backup +TO test_repo +ON (sr_member PARTITION (p1)); +``` + +您可以指定多个分区名称,用逗号 (`,`) 分隔,以批量备份分区。 + +### 备份物化视图 + +您无需手动备份同步物化视图,因为它们会随同基表的 BACKUP 操作一起备份。 + +异步物化视图可以随同其所属数据库的 BACKUP 操作一起备份。您也可以手动备份它们。 + +以下示例将数据库 `sr_hub` 中的物化视图 `sr_mv1` 备份到快照 `sr_mv1_backup` 中,并将快照上传到仓库 `test_repo`。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_mv1_backup +TO test_repo +ON (MATERIALIZED VIEW sr_mv1); +``` + +以下示例将数据库 `sr_hub` 中的两个物化视图 `sr_mv1` 和 `sr_mv2` 备份到快照 `sr_mv2_backup` 中,并将快照上传到仓库 `test_repo`。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_mv2_backup +TO test_repo +ON (MATERIALIZED VIEW sr_mv1, MATERIALIZED VIEW sr_mv2); +``` + +以下示例将数据库 `sr_hub` 中的所有物化视图备份到快照 `sr_mv3_backup` 中,并将快照上传到仓库 `test_repo`。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_mv3_backup +TO test_repo +ON (ALL MATERIALIZED VIEWS); +``` + +### 备份逻辑视图 + +以下示例将数据库 `sr_hub` 中的逻辑视图 `sr_view1` 备份到快照 `sr_view1_backup` 中,并将快照上传到仓库 `test_repo`。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_view1_backup +TO test_repo +ON (VIEW sr_view1); +``` + +以下示例将数据库 `sr_hub` 中的两个逻辑视图 `sr_view1` 和 `sr_view2` 备份到快照 `sr_view2_backup` 中,并将快照上传到仓库 `test_repo`。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_view2_backup +TO test_repo +ON (VIEW sr_view1, VIEW sr_view2); +``` + +以下示例将数据库 `sr_hub` 中的所有逻辑视图备份到快照 `sr_view3_backup` 中,并将快照上传到仓库 `test_repo`。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_view3_backup +TO test_repo +ON (ALL VIEWS); +``` + +### 备份 UDF + +以下示例将数据库 `sr_hub` 中的 UDF `sr_udf1` 备份到快照 `sr_udf1_backup` 中,并将快照上传到仓库 `test_repo`。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_udf1_backup +TO test_repo +ON (FUNCTION sr_udf1); +``` + +以下示例将数据库 `sr_hub` 中的两个 UDF `sr_udf1` 和 `sr_udf2` 备份到快照 `sr_udf2_backup` 中,并将快照上传到仓库 `test_repo`。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_udf2_backup +TO test_repo +ON (FUNCTION sr_udf1, FUNCTION sr_udf2); +``` + +以下示例将数据库 `sr_hub` 中的所有 UDF 备份到快照 `sr_udf3_backup` 中,并将快照上传到仓库 `test_repo`。 + +```SQL +BACKUP DATABASE sr_hub SNAPSHOT sr_udf3_backup +TO test_repo +ON (ALL FUNCTIONS); +``` + +### 备份外部 Catalog 的元数据 + +以下示例将外部 Catalog `iceberg` 的元数据备份到快照 `iceberg_backup` 中,并将快照上传到仓库 `test_repo`。 + +```SQL +BACKUP EXTERNAL CATALOG (iceberg) SNAPSHOT iceberg_backup +TO test_repo; +``` + +以下示例将两个外部 Catalog `iceberg` 和 `hive` 的元数据备份到快照 `iceberg_hive_backup` 中,并将快照上传到仓库 `test_repo`。 + +```SQL +BACKUP EXTERNAL CATALOGS (iceberg, hive) SNAPSHOT iceberg_hive_backup +TO test_repo; +``` + +以下示例将所有外部 Catalog 的元数据备份到快照 `all_catalog_backup` 中,并将快照上传到仓库 `test_repo`。 + +```SQL +BACKUP ALL EXTERNAL CATALOGS SNAPSHOT all_catalog_backup +TO test_repo; +``` + +要取消对外部 Catalog 的 BACKUP 操作,请执行以下语句: + +```SQL +CANCEL BACKUP FOR EXTERNAL CATALOG; +``` + +## 恢复数据 + +您可以将备份在远端存储系统中的数据快照恢复到当前或其他的 StarRocks 集群,以实现数据恢复或数据迁移。 + +**当您从快照恢复对象时,必须指定快照的时间戳。** + +使用 [RESTORE](../../sql-reference/sql-statements/backup_restore/RESTORE.md) 语句来恢复远端存储系统中的数据快照。 + +RESTORE 是一种异步操作。您可以使用 [SHOW RESTORE](../../sql-reference/sql-statements/backup_restore/SHOW_RESTORE.md) 命令检查 RESTORE 作业的状态,或使用 [CANCEL RESTORE](../../sql-reference/sql-statements/backup_restore/CANCEL_RESTORE.md) 命令取消 RESTORE 作业。 + +### (可选) 在新集群中创建仓库 + +要将数据迁移到另一个 StarRocks 集群,您需要在目标集群中创建具有相同**仓库名称**和**位置**的仓库,否则将无法查看之前备份的数据快照。有关详细信息,请参阅 [创建仓库](#create-a-repository)。 + +### 获取快照时间戳 + +在恢复数据之前,您可以使用 [SHOW SNAPSHOT](../../sql-reference/sql-statements/backup_restore/SHOW_SNAPSHOT.md) 命令查看仓库中的快照信息以获取时间戳。 + +以下示例检查 `test_repo` 中的快照信息。 + +```Plain +mysql> SHOW SNAPSHOT ON test_repo; ++------------------+-------------------------+--------+ +| Snapshot | Timestamp | 
Status | ++------------------+-------------------------+--------+ +| sr_member_backup | 2023-02-07-14-45-53-143 | OK | ++------------------+-------------------------+--------+ +1 row in set (1.16 sec) +``` + +### 恢复数据库 + +以下示例将快照 `sr_hub_backup` 中的数据库 `sr_hub` 恢复到目标集群中的数据库 `sr_hub`。如果快照中不存在该数据库,系统将返回错误。如果目标集群中不存在该数据库,系统将自动创建。 + +```SQL +-- Supported from v3.4.0 onwards. (v3.4.0 版本起支持) +RESTORE SNAPSHOT sr_hub_backup +FROM test_repo +DATABASE sr_hub +PROPERTIES("backup_timestamp" = "2024-12-09-10-25-58-842"); + +-- Compatible with the syntax in earlier versions. (兼容早期版本语法) +RESTORE SNAPSHOT sr_hub.sr_hub_backup +FROM `test_repo` +PROPERTIES("backup_timestamp" = "2024-12-09-10-25-58-842"); +``` + +以下示例将快照 `sr_hub_backup` 中的数据库 `sr_hub` 恢复到目标集群中的数据库 `sr_hub_new`。如果快照中不存在数据库 `sr_hub`,系统将返回错误。如果目标集群中不存在数据库 `sr_hub_new`,系统将自动创建。 + +```SQL +-- Supported from v3.4.0 onwards. (v3.4.0 版本起支持) +RESTORE SNAPSHOT sr_hub_backup +FROM test_repo +DATABASE sr_hub AS sr_hub_new +PROPERTIES("backup_timestamp" = "2024-12-09-10-25-58-842"); +``` + +### 恢复表 + +以下示例将快照 `sr_member_backup` 中数据库 `sr_hub` 的表 `sr_member` 恢复到目标集群中数据库 `sr_hub` 的表 `sr_member`。 + +```SQL +-- Supported from v3.4.0 onwards. (v3.4.0 版本起支持) +RESTORE SNAPSHOT sr_member_backup +FROM test_repo +DATABASE sr_hub +ON (TABLE sr_member) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); + +-- Compatible with the syntax in earlier versions. (兼容早期版本语法) +RESTORE SNAPSHOT sr_hub.sr_member_backup +FROM test_repo +ON (sr_member) +PROPERTIES ("backup_timestamp"="2024-12-09-10-52-10-940"); +``` + +以下示例将快照 `sr_member_backup` 中数据库 `sr_hub` 的表 `sr_member` 恢复到目标集群中数据库 `sr_hub_new` 的表 `sr_member_new`。 + +```SQL +RESTORE SNAPSHOT sr_member_backup +FROM test_repo +DATABASE sr_hub AS sr_hub_new +ON (TABLE sr_member AS sr_member_new) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下示例将快照 `sr_core_backup` 中数据库 `sr_hub` 的两张表 `sr_member` 和 `sr_pmc` 恢复到目标集群中数据库 `sr_hub` 的两张表 `sr_member` 和 `sr_pmc`。 + +```SQL +RESTORE SNAPSHOT sr_core_backup +FROM test_repo +DATABASE sr_hub +ON (TABLE sr_member, TABLE sr_pmc) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下示例恢复快照 `sr_all_backup` 中数据库 `sr_hub` 的所有表。 + +```SQL +RESTORE SNAPSHOT sr_all_backup +FROM test_repo +DATABASE sr_hub +ON (ALL TABLES); +``` + +以下示例恢复快照 `sr_all_backup` 中数据库 `sr_hub` 的其中一张表。 + +```SQL +RESTORE SNAPSHOT sr_all_backup +FROM test_repo +DATABASE sr_hub +ON (TABLE sr_member) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +### 恢复分区 + +以下示例将快照 `sr_par_backup` 中表 `sr_member` 的分区 `p1` 恢复到目标集群中表 `sr_member` 的分区 `p1`。 + +```SQL +-- Supported from v3.4.0 onwards. (v3.4.0 版本起支持) +RESTORE SNAPSHOT sr_par_backup +FROM test_repo +DATABASE sr_hub +ON (TABLE sr_member PARTITION (p1)) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); + +-- Compatible with the syntax in earlier versions. 
(兼容早期版本语法) +RESTORE SNAPSHOT sr_hub.sr_par_backup +FROM test_repo +ON (sr_member PARTITION (p1)) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +您可以指定多个分区名称,用逗号 (`,`) 分隔,以批量恢复分区。 + +### 恢复物化视图 + +以下示例将快照 `sr_mv1_backup` 中数据库 `sr_hub` 的物化视图 `sr_mv1` 恢复到目标集群。 + +```SQL +RESTORE SNAPSHOT sr_mv1_backup +FROM test_repo +DATABASE sr_hub +ON (MATERIALIZED VIEW sr_mv1) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下示例将快照 `sr_mv2_backup` 中数据库 `sr_hub` 的两个物化视图 `sr_mv1` 和 `sr_mv2` 恢复到目标集群。 + +```SQL +RESTORE SNAPSHOT sr_mv2_backup +FROM test_repo +DATABASE sr_hub +ON (MATERIALIZED VIEW sr_mv1, MATERIALIZED VIEW sr_mv2) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下示例恢复快照 `sr_mv3_backup` 中数据库 `sr_hub` 的所有物化视图到目标集群。 + +```SQL +RESTORE SNAPSHOT sr_mv3_backup +FROM test_repo +DATABASE sr_hub +ON (ALL MATERIALIZED VIEWS) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下示例恢复快照 `sr_mv3_backup` 中数据库 `sr_hub` 的其中一个物化视图到目标集群。 + +```SQL +RESTORE SNAPSHOT sr_mv3_backup +FROM test_repo +DATABASE sr_hub +ON (MATERIALIZED VIEW sr_mv1) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +:::info + +RESTORE 之后,您可以使用 [SHOW MATERIALIZED VIEWS](../../sql-reference/sql-statements/materialized_view/SHOW_MATERIALIZED_VIEW.md) 查看物化视图的状态。 + +- 如果物化视图是 Active 状态,则可以直接使用。 +- 如果物化视图是 Inactive 状态,可能是因为其基表未恢复。所有基表恢复后,您可以使用 [ALTER MATERIALIZED VIEW](../../sql-reference/sql-statements/materialized_view/ALTER_MATERIALIZED_VIEW.md) 重新激活物化视图。 + +::: + +### 恢复逻辑视图 + +以下示例将快照 `sr_view1_backup` 中数据库 `sr_hub` 的逻辑视图 `sr_view1` 恢复到目标集群。 + +```SQL +RESTORE SNAPSHOT sr_view1_backup +FROM test_repo +DATABASE sr_hub +ON (VIEW sr_view1) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下示例将快照 `sr_view2_backup` 中数据库 `sr_hub` 的两个逻辑视图 `sr_view1` 和 `sr_view2` 恢复到目标集群。 + +```SQL +RESTORE SNAPSHOT sr_view2_backup +FROM test_repo +DATABASE sr_hub +ON (VIEW sr_view1, VIEW sr_view2) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下示例恢复快照 `sr_view3_backup` 中数据库 `sr_hub` 的所有逻辑视图到目标集群。 + +```SQL +RESTORE SNAPSHOT sr_view3_backup +FROM test_repo +DATABASE sr_hub +ON (ALL VIEWS) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下示例恢复快照 `sr_view3_backup` 中数据库 `sr_hub` 的其中一个逻辑视图到目标集群。 + +```SQL +RESTORE SNAPSHOT sr_view3_backup +FROM test_repo +DATABASE sr_hub +ON (VIEW sr_view1) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +### 恢复 UDF + +以下示例将快照 `sr_udf1_backup` 中数据库 `sr_hub` 的 UDF `sr_udf1` 恢复到目标集群。 + +```SQL +RESTORE SNAPSHOT sr_udf1_backup +FROM test_repo +DATABASE sr_hub +ON (FUNCTION sr_udf1) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下示例将快照 `sr_udf2_backup` 中数据库 `sr_hub` 的两个 UDF `sr_udf1` 和 `sr_udf2` 恢复到目标集群。 + +```SQL +RESTORE SNAPSHOT sr_udf2_backup +FROM test_repo +DATABASE sr_hub +ON (FUNCTION sr_udf1, FUNCTION sr_udf2) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下示例恢复快照 `sr_udf3_backup` 中数据库 `sr_hub` 的所有 UDF 到目标集群。 + +```SQL +RESTORE SNAPSHOT sr_udf3_backup +FROM test_repo +DATABASE sr_hub +ON (ALL FUNCTIONS) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下示例恢复快照 `sr_udf3_backup` 中数据库 `sr_hub` 的其中一个 UDF 到目标集群。 + +```SQL +RESTORE SNAPSHOT sr_udf3_backup +FROM test_repo +DATABASE sr_hub +ON (FUNCTION sr_udf1) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +### 恢复外部 Catalog 的元数据 + +以下示例将快照 `iceberg_backup` 中外部 Catalog `iceberg` 的元数据恢复到目标集群,并将其重命名为 
`iceberg_new`。 + +```SQL +RESTORE SNAPSHOT iceberg_backup +FROM test_repo +EXTERNAL CATALOG (iceberg AS iceberg_new) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下示例恢复快照 `iceberg_hive_backup` 中两个外部 Catalog `iceberg` 和 `hive` 的元数据到目标集群。 + +```SQL +RESTORE SNAPSHOT iceberg_hive_backup +FROM test_repo +EXTERNAL CATALOGS (iceberg, hive) +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +以下示例恢复快照 `all_catalog_backup` 中所有外部 Catalog 的元数据到目标集群。 + +```SQL +RESTORE SNAPSHOT all_catalog_backup +FROM test_repo +ALL EXTERNAL CATALOGS +PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); +``` + +要取消对外部 Catalog 的 RESTORE 操作,请执行以下语句: + +```SQL +CANCEL RESTORE FOR EXTERNAL CATALOG; +``` + +## 配置 BACKUP 或 RESTORE 作业 + +您可以通过修改 BE 配置文件 **be.conf** 中的以下配置项来优化 BACKUP 或 RESTORE 作业的性能: + +| 配置项 | 描述 | +| ---------------------------- | ---------------------------------------------------------------------------------------------------------- | +| make_snapshot_worker_count | BE 节点上 BACKUP 作业创建快照任务的最大线程数。默认值:`5`。增加此配置项的值以提高创建快照任务的并发性。 | +| release_snapshot_worker_count | BE 节点上失败的 BACKUP 作业释放快照任务的最大线程数。默认值:`5`。增加此配置项的值以提高释放快照任务的并发性。 | +| upload_worker_count | BE 节点上 BACKUP 作业上传任务的最大线程数。默认值:`0`。`0` 表示设置为 BE 所在机器的 CPU 核心数。增加此配置项的值以提高上传任务的并发性。 | +| download_worker_count | BE 节点上 RESTORE 作业下载任务的最大线程数。默认值:`0`。`0` 表示设置为 BE 所在机器的 CPU 核心数。增加此配置项的值以提高下载任务的并发性。 | + +## 使用须知 + +- 在 GLOBAL、DATABASE、TABLE 和 PARTITION 级别执行备份和恢复操作需要不同的权限。有关详细信息,请参阅 [根据场景定制角色](../user_privs/authorization/User_privilege.md#customize-roles-based-on-scenarios)。 +- 每个数据库每次只允许运行一个 BACKUP 或 RESTORE 作业。否则,StarRocks 将返回错误。 +- 由于 BACKUP 和 RESTORE 作业会占用 StarRocks 集群的许多资源,因此您可以在 StarRocks 集群负载不高时备份和恢复数据。 +- StarRocks 不支持为数据备份指定数据压缩算法。 +- 由于数据是作为快照备份的,因此在快照生成时加载的数据不包含在快照中。因此,如果在快照生成后和 RESTORE 作业完成之前将数据加载到旧集群中,您还需要将数据加载到恢复数据的集群中。建议您在数据迁移完成后的一段时间内并行将数据加载到两个集群中,并在验证数据和服务正确性后将应用程序迁移到新集群。 +- 在 RESTORE 作业完成之前,您无法操作要恢复的表。 +- Primary Key 表不能恢复到 v2.5 之前的 StarRocks 集群。 +- 在恢复表之前,您无需在新集群中创建要恢复的表。RESTORE 作业会自动创建。 +- 如果存在一个与要恢复的表同名的表,StarRocks 首先会检查现有表的 Schema 是否与要恢复的表的 Schema 匹配。如果 Schema 匹配,StarRocks 将用快照中的数据覆盖现有表。如果 Schema 不匹配,RESTORE 作业将失败。您可以选择使用 `AS` 关键字重命名要恢复的表,或在恢复数据之前删除现有表。 +- 如果 RESTORE 作业覆盖了现有的数据库、表或分区,则在作业进入 COMMIT 阶段后,被覆盖的数据无法恢复。如果 RESTORE 作业在此点失败或被取消,数据可能会损坏且无法访问。在这种情况下,您只能再次执行 RESTORE 操作并等待作业完成。因此,除非您确定当前数据不再使用,否则我们建议您不要通过覆盖方式恢复数据。覆盖操作首先检查快照与现有数据库、表或分区之间的元数据一致性。如果检测到不一致,则无法执行 RESTORE 操作。 +- 当前 StarRocks 不支持备份和恢复与用户账户、权限和资源组相关的配置数据。 +- 当前 StarRocks 不支持备份和恢复表之间的 Colocate Join 关系。 diff --git a/docs/zh/administration/management/FE_configuration.md b/docs/zh/administration/management/FE_configuration.md new file mode 100644 index 0000000..fbdf15b --- /dev/null +++ b/docs/zh/administration/management/FE_configuration.md @@ -0,0 +1,4500 @@ +--- +displayed_sidebar: docs +--- + +import FEConfigMethod from '../../_assets/commonMarkdown/FE_config_method.mdx' + +import AdminSetFrontendNote from '../../_assets/commonMarkdown/FE_config_note.mdx' + +import StaticFEConfigNote from '../../_assets/commonMarkdown/StaticFE_config_note.mdx' + +import EditionSpecificFEItem from '../../_assets/commonMarkdown/Edition_Specific_FE_Item.mdx' + +# FE 配置 + + + +## 查看 FE 配置项 + +FE 启动后,您可以在 MySQL 客户端上运行 ADMIN SHOW FRONTEND CONFIG 命令查看参数配置。如果要查询特定参数的配置,请运行以下命令: + +```SQL +ADMIN SHOW FRONTEND CONFIG [LIKE "pattern"]; +``` + +有关返回字段的详细说明,请参阅 [ADMIN SHOW CONFIG](../../sql-reference/sql-statements/cluster-management/config_vars/ADMIN_SHOW_CONFIG.md)。 + +:::note +您必须具备管理员权限才能运行集群管理相关命令。 +::: + +## 配置 FE 参数 + +### 配置 FE 动态参数 + +您可以使用 [ADMIN 
SET FRONTEND CONFIG](../../sql-reference/sql-statements/cluster-management/config_vars/ADMIN_SET_CONFIG.md) 配置或修改 FE 动态参数的设置。 + +```SQL +ADMIN SET FRONTEND CONFIG ("key" = "value"); +``` + + + +### 配置 FE 静态参数 + + + +## 理解 FE 参数 + +### 日志 + +##### audit_log_delete_age + +- 默认值:30d +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:审计日志文件的保留期限。默认值 `30d` 指定每个审计日志文件可以保留 30 天。StarRocks 会检查每个审计日志文件并删除 30 天前生成的那些文件。 +- 引入版本:- + +##### audit_log_dir + +- 默认值:StarRocksFE.STARROCKS_HOME_DIR + "/log" +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:存储审计日志文件的目录。 +- 引入版本:- + +##### audit_log_enable_compress + +- 默认值:false +- 类型:Boolean +- 单位:N/A +- 是否可变:否 +- 描述:当为 true 时,生成的 Log4j2 配置会为轮转的审计日志文件名 (fe.audit.log.*) 附加 ".gz" 后缀,以便 Log4j2 在轮转时生成压缩的 (.gz) 归档审计日志文件。该设置在 FE 启动期间在 Log4jConfig.initLogging 中读取,并应用于审计日志的 RollingFile appender;它只影响轮转/归档文件,不影响活动的审计日志。由于该值在启动时初始化,因此更改它需要重启 FE 才能生效。与审计日志轮转设置 (audit_log_dir, audit_log_roll_interval, audit_roll_maxsize, audit_log_roll_num) 一起使用。 +- 引入版本:3.2.12 + +##### audit_log_json_format + +- 默认值:false +- 类型:Boolean +- 单位:N/A +- 是否可变:是 +- 描述:当为 true 时,FE 审计事件以结构化 JSON 格式(Jackson ObjectMapper 序列化带注释的 AuditEvent 字段的 Map)发出,而不是默认的管道分隔 "key=value" 字符串。该设置影响 AuditLogBuilder 处理的所有内置审计接收器:连接审计、查询审计、大查询审计(当事件符合条件时,大查询阈值字段会添加到 JSON 中)和慢审计输出。大查询阈值和 "features" 字段的带注释字段会进行特殊处理(从普通审计条目中排除;根据适用情况包含在大查询或功能日志中)。启用此功能可使日志对于日志收集器或 SIEM 而言可机器解析;请注意,它会更改日志格式,并且可能需要更新任何期望旧版管道分隔格式的现有解析器。 +- 引入版本:3.2.7 + +##### audit_log_modules + +- 默认值:slow_query, query +- 类型:String[] +- 单位:- +- 是否可变:否 +- 描述:StarRocks 为其生成审计日志条目的模块。默认情况下,StarRocks 为 `slow_query` 模块和 `query` 模块生成审计日志。从 v3.0 开始支持 `connection` 模块。模块名称之间用逗号 (,) 和空格分隔。 +- 引入版本:- + +##### audit_log_roll_interval + +- 默认值:DAY +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:StarRocks 轮转审计日志条目的时间间隔。有效值:`DAY` 和 `HOUR`。 + - 如果此参数设置为 `DAY`,则在审计日志文件名称中添加 `yyyyMMdd` 格式的后缀。 + - 如果此参数设置为 `HOUR`,则在审计日志文件名称中添加 `yyyyMMddHH` 格式的后缀。 +- 引入版本:- + +##### audit_log_roll_num + +- 默认值:90 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:在 `audit_log_roll_interval` 参数指定的每个保留期内可以保留的审计日志文件的最大数量。 +- 引入版本:- + +##### bdbje_log_level + +- 默认值:INFO +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:控制 StarRocks 中 Berkeley DB Java Edition (BDB JE) 使用的日志级别。在 BDB 环境初始化期间,BDBEnvironment.initConfigs() 将此值应用于 `com.sleepycat.je` 包的 Java 日志记录器和 BDB JE 环境文件日志级别 (EnvironmentConfig.FILE_LOGGING_LEVEL)。接受标准 `java.util.logging.Level` 名称,例如 SEVERE、WARNING、INFO、CONFIG、FINE、FINER、FINEST、ALL、OFF。设置为 ALL 将启用所有日志消息。增加详细程度将提高日志量,并可能影响磁盘 I/O 和性能;该值在 BDB 环境初始化时读取,因此仅在环境(重新)初始化后生效。 +- 引入版本:v3.2.0 + +##### big_query_log_delete_age + +- 默认值:7d +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:控制 FE 大查询日志文件 (`fe.big_query.log.*`) 在自动删除之前保留多长时间。该值作为 IfLastModified age 传递给 Log4j 的删除策略 — 任何最后修改时间早于此值的轮转大查询日志都将被删除。支持的后缀包括 `d`(天)、`h`(小时)、`m`(分钟)和 `s`(秒)。示例:`7d`(7 天)、`10h`(10 小时)、`60m`(60 分钟)和 `120s`(120 秒)。此项与 `big_query_log_roll_interval` 和 `big_query_log_roll_num` 一起确定哪些文件被保留或清除。 +- 引入版本:v3.2.0 + +##### big_query_log_dir + +- 默认值:`Config.STARROCKS_HOME_DIR + "/log"` +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:FE 写入大查询转储日志 (`fe.big_query.log.*`) 的目录。Log4j 配置使用此路径为 `fe.big_query.log` 及其轮转文件创建 RollingFile appender。这些文件的轮转和保留由 `big_query_log_roll_interval`(基于时间的后缀)、`log_roll_size_mb`(大小触发器)、`big_query_log_roll_num`(最大文件数)和 `big_query_log_delete_age`(基于年龄的删除)控制。对于超过用户定义阈值(如 `big_query_log_cpu_second_threshold`、`big_query_log_scan_rows_threshold` 或 `big_query_log_scan_bytes_threshold`)的查询,会记录大查询记录。使用 `big_query_log_modules` 控制哪些模块记录到此文件。 +- 引入版本:v3.2.0 + +##### big_query_log_modules + +- 默认值:`{"query"}` +- 类型:String[] +- 单位:- +- 是否可变:否 +- 描述:启用按模块大查询日志记录的模块名称后缀列表。典型值是逻辑组件名称。例如,默认的 `query` 
会生成 `big_query.query`。 +- 引入版本:v3.2.0 + +##### big_query_log_roll_interval + +- 默认值:`"DAY"` +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:指定用于构建 `big_query` 日志 appender 的滚动文件名称的日期组件的时间间隔。有效值(不区分大小写)为 `DAY`(默认)和 `HOUR`。`DAY` 生成每日模式 (`"%d{yyyyMMdd}"`),`HOUR` 生成每小时模式 (`"%d{yyyyMMddHH}"`)。该值与基于大小的滚动 (`big_query_roll_maxsize`) 和基于索引的滚动 (`big_query_log_roll_num`) 结合,形成 RollingFile 文件模式。无效值会导致日志配置生成失败 (IOException),并可能阻止日志初始化或重新配置。与 `big_query_log_dir`、`big_query_roll_maxsize`、`big_query_log_roll_num` 和 `big_query_log_delete_age` 一起使用。 +- 引入版本:v3.2.0 + +##### big_query_log_roll_num + +- 默认值:10 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:每个 `big_query_log_roll_interval` 要保留的轮转 FE 大查询日志文件的最大数量。此值绑定到 RollingFile appender 的 DefaultRolloverStrategy `max` 属性,用于 `fe.big_query.log`;当日志滚动时(按时间或按 `log_roll_size_mb`),StarRocks 最多保留 `big_query_log_roll_num` 个索引文件(文件模式使用时间后缀加索引)。早于此计数的文件可能会因滚动而删除,`big_query_log_delete_age` 还可以根据最后修改时间删除文件。 +- 引入版本:v3.2.0 + +##### dump_log_delete_age + +- 默认值:7d +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:转储日志文件的保留期限。默认值 `7d` 指定每个转储日志文件可以保留 7 天。StarRocks 会检查每个转储日志文件并删除 7 天前生成的那些文件。 +- 引入版本:- + +##### dump_log_dir + +- 默认值:StarRocksFE.STARROCKS_HOME_DIR + "/log" +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:存储转储日志文件的目录。 +- 引入版本:- + +##### dump_log_modules + +- 默认值:query +- 类型:String[] +- 单位:- +- 是否可变:否 +- 描述:StarRocks 为其生成转储日志条目的模块。默认情况下,StarRocks 为 query 模块生成转储日志。模块名称之间用逗号 (,) 和空格分隔。 +- 引入版本:- + +##### dump_log_roll_interval + +- 默认值:DAY +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:StarRocks 轮转转储日志条目的时间间隔。有效值:`DAY` 和 `HOUR`。 + - 如果此参数设置为 `DAY`,则在转储日志文件名称中添加 `yyyyMMdd` 格式的后缀。 + - 如果此参数设置为 `HOUR`,则在转储日志文件名称中添加 `yyyyMMddHH` 格式的后缀。 +- 引入版本:- + +##### dump_log_roll_num + +- 默认值:10 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:在 `dump_log_roll_interval` 参数指定的每个保留期内可以保留的转储日志文件的最大数量。 +- 引入版本:- + +##### edit_log_write_slow_log_threshold_ms + +- 默认值:2000 +- 类型:Int +- 单位:毫秒 +- 是否可变:是 +- 描述:JournalWriter 用于检测和记录慢编辑日志批次写入的阈值(毫秒)。在批次提交后,如果批次持续时间超过此值,JournalWriter 将发出 WARN 消息,其中包含批次大小、持续时间和当前 Journal 队列大小(速率限制为大约每 2 秒一次)。此设置仅控制 FE Leader 上潜在 IO 或复制延迟的日志记录/警报;它不改变提交或滚动行为(请参阅 `edit_log_roll_num` 和与提交相关的设置)。无论此阈值如何,度量更新仍会发生。 +- 引入版本:v3.2.3 + +##### enable_audit_sql + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:当此项设置为 `true` 时,FE 审计子系统会将语句的 SQL 文本记录到 ConnectProcessor 处理的 FE 审计日志 (`fe.audit.log`) 中。存储的语句遵循其他控制:加密语句会被 redacted (`AuditEncryptionChecker`),如果 `enable_sql_desensitize_in_log` 设置为 true,敏感凭据可能会被 redacted 或 desensitized,并且摘要记录由 `enable_sql_digest` 控制。当设置为 `false` 时,ConnectProcessor 会在审计事件中将语句文本替换为 "?" 
— 其他审计字段(用户、主机、持续时间、状态、通过 `qe_slow_log_ms` 进行的慢查询检测以及指标)仍会被记录。启用 SQL 审计会增加取证和故障排除的可见性,但可能会暴露敏感的 SQL 内容并增加日志量和 I/O;禁用它以牺牲审计日志中失去完整语句可见性为代价来提高隐私性。 +- 引入版本:- + +##### enable_profile_log + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:是否启用 Profile 日志记录。当启用此功能时,FE 会将每个查询的 Profile 日志(ProfileManager 生成的序列化 queryDetail JSON)写入 Profile 日志接收器。仅当 `enable_collect_query_detail_info` 也启用时才会执行此日志记录;当 `enable_profile_log_compress` 启用时,JSON 可能会在日志记录前进行 gzip 压缩。Profile 日志文件由 `profile_log_dir`、`profile_log_roll_num`、`profile_log_roll_interval` 管理,并根据 `profile_log_delete_age`(支持 `7d`、`10h`、`60m`、`120s` 等格式)进行轮转/删除。禁用此功能将停止写入 Profile 日志(减少磁盘 I/O、压缩 CPU 和存储使用)。 +- 引入版本:v3.2.5 + +##### enable_qe_slow_log + +- 默认值:true +- 类型:Boolean +- 单位:N/A +- 是否可变:是 +- 描述:启用时,FE 内置审计插件 (AuditLogBuilder) 会将执行时间("Time" 字段)超过 `qe_slow_log_ms` 配置阈值的查询事件写入慢查询审计日志 (AuditLog.getSlowAudit)。如果禁用,这些慢查询条目将被抑制(常规查询和连接审计日志不受影响)。慢审计条目遵循全局 `audit_log_json_format` 设置(JSON 与纯字符串)。使用此标志可独立于常规审计日志记录控制慢查询审计量;当 `qe_slow_log_ms` 较低或工作负载产生许多长时间运行的查询时,关闭此功能可能会减少日志 I/O。 +- 引入版本:3.2.11 + +##### enable_sql_desensitize_in_log + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:当此项设置为 `true` 时,系统会在将敏感 SQL 内容写入日志和查询详细信息记录之前对其进行替换或隐藏。遵循此配置的代码路径包括 ConnectProcessor.formatStmt(审计日志)、StmtExecutor.addRunningQueryDetail(查询详细信息)和 SimpleExecutor.formatSQL(内部执行器日志)。启用此功能后,无效的 SQL 可能会被替换为固定的脱敏消息,凭据(用户/密码)将被隐藏,并且 SQL 格式化程序需要生成一个经过清理的表示(它还可以启用摘要式输出)。这减少了审计/内部日志中敏感文字和凭据的泄露,但也意味着日志和查询详细信息不再包含原始完整 SQL 文本(这会影响重放或调试)。 +- 引入版本:- + +##### internal_log_delete_age + +- 默认值:7d +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:指定 FE 内部日志文件(写入 `internal_log_dir`)的保留期限。该值是一个持续时间字符串。支持的后缀:`d`(天)、`h`(小时)、`m`(分钟)、`s`(秒)。示例:`7d`(7 天)、`10h`(10 小时)、`60m`(60 分钟)、`120s`(120 秒)。此项作为 `` 谓词替换到 Log4j 配置中,由 RollingFile 删除策略使用。在日志滚动期间,最后修改时间早于此持续时间的文件将被删除。增加此值可更快释放磁盘空间,或减少此值以更长时间保留内部物化视图或统计日志。 +- 引入版本:v3.2.4 + +##### internal_log_dir + +- 默认值:`Config.STARROCKS_HOME_DIR + "/log"` +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:FE 日志子系统用于存储内部日志 (`fe.internal.log`) 的目录。此配置将替换到 Log4j 配置中,并确定 InternalFile appender 写入内部/物化视图/统计日志的位置,以及 `internal.` 下的每个模块日志记录器放置其文件的位置。确保目录存在、可写且有足够的磁盘空间。此目录中文件的日志轮转和保留由 `log_roll_size_mb`、`internal_log_roll_num`、`internal_log_delete_age` 和 `internal_log_roll_interval` 控制。如果 `sys_log_to_console` 已启用,内部日志可能会写入控制台而不是此目录。 +- 引入版本:v3.2.4 + +##### internal_log_json_format + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:当此项设置为 `true` 时,内部统计/审计条目将作为紧凑的 JSON 对象写入统计审计日志记录器。JSON 包含键 "executeType" (InternalType: QUERY 或 DML)、"queryId"、"sql" 和 "time"(经过的毫秒)。当设置为 `false` 时,相同的信息将作为单个格式化文本行记录 ("statistic execute: ... | QueryId: [...] 
| SQL: ...")。启用 JSON 可改善机器解析和与日志处理器的集成,但也会导致原始 SQL 文本包含在日志中,这可能会暴露敏感信息并增加日志大小。 +- 引入版本:- + +##### internal_log_modules + +- 默认值:`{"base", "statistic"}` +- 类型:String[] +- 单位:- +- 是否可变:否 +- 描述:将接收专用内部日志记录的模块标识符列表。对于每个条目 X,Log4j 会创建一个名为 `internal.` 的日志记录器,其级别为 INFO,additivity="false"。这些日志记录器被路由到内部 appender(写入 `fe.internal.log`)或在 `sys_log_to_console` 启用时路由到控制台。根据需要使用短名称或包片段 — 确切的日志记录器名称变为 `internal.` + 配置的字符串。内部日志文件轮转和保留遵循 `internal_log_dir`、`internal_log_roll_num`、`internal_log_delete_age`、`internal_log_roll_interval` 和 `log_roll_size_mb`。添加模块会导致其运行时消息分离到内部日志流中,以便于调试和审计。 +- 引入版本:v3.2.4 + +##### internal_log_roll_interval + +- 默认值:DAY +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:控制 FE 内部日志 appender 的基于时间的滚动间隔。接受的值(不区分大小写)为 `HOUR` 和 `DAY`。`HOUR` 生成每小时文件模式 (`"%d{yyyyMMddHH}"`),`DAY` 生成每日文件模式 (`"%d{yyyyMMdd}"`),这些模式由 RollingFile TimeBasedTriggeringPolicy 用于命名轮转的 `fe.internal.log` 文件。无效值会导致初始化失败(在构建活动 Log4j 配置时抛出 IOException)。滚动行为还取决于相关设置,如 `internal_log_dir`、`internal_roll_maxsize`、`internal_log_roll_num` 和 `internal_log_delete_age`。 +- 引入版本:v3.2.4 + +##### internal_log_roll_num + +- 默认值:90 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:为内部 appender (`fe.internal.log`) 保留的轮转内部 FE 日志文件的最大数量。此值用作 Log4j DefaultRolloverStrategy `max` 属性;当发生轮转时,StarRocks 最多保留 `internal_log_roll_num` 个归档文件并删除旧文件(也受 `internal_log_delete_age` 控制)。较低的值会减少磁盘使用,但缩短日志历史记录;较高的值会保留更多的历史内部日志。此项与 `internal_log_dir`、`internal_log_roll_interval` 和 `internal_roll_maxsize` 协同工作。 +- 引入版本:v3.2.4 + +##### log_cleaner_audit_log_min_retention_days + +- 默认值:3 +- 类型:Int +- 单位:天 +- 是否可变:是 +- 描述:审计日志文件的最小保留天数。早于此天数的审计日志文件即使磁盘使用率高也不会被删除。这确保了审计日志为合规性和故障排除目的而保留。 +- 引入版本:- + +##### log_cleaner_check_interval_second + +- 默认值:300 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:检查磁盘使用情况和清理日志的时间间隔(秒)。清理程序定期检查每个日志目录的磁盘使用情况,并在必要时触发清理。默认值为 300 秒(5 分钟)。 +- 引入版本:- + +##### log_cleaner_disk_usage_target + +- 默认值:60 +- 类型:Int +- 单位:百分比 +- 是否可变:是 +- 描述:日志清理后的目标磁盘使用率(百分比)。日志清理将持续进行,直到磁盘使用率降至此阈值以下。清理程序会逐个删除最旧的日志文件,直到达到目标。 +- 引入版本:- + +##### log_cleaner_disk_usage_threshold + +- 默认值:80 +- 类型:Int +- 单位:百分比 +- 是否可变:是 +- 描述:触发日志清理的磁盘使用率阈值(百分比)。当磁盘使用率超过此阈值时,日志清理将开始。清理程序独立检查每个配置的日志目录,并处理超过此阈值的目录。 +- 引入版本:- + +##### log_cleaner_disk_util_based_enable + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:启用基于磁盘使用率的自动日志清理。启用后,当磁盘使用率超过阈值时,将清理日志。日志清理程序作为 FE 节点上的后台守护进程运行,有助于防止日志文件堆积耗尽磁盘空间。 +- 引入版本:- + +##### log_plan_cancelled_by_crash_be + +- 默认值:true +- 类型:boolean +- 单位:- +- 是否可变:是 +- 描述:当查询因 BE 崩溃或 RPC 异常而被取消时,是否启用查询执行计划日志记录。启用此功能后,当查询因 BE 崩溃或 `RpcException` 而被取消时,StarRocks 会将查询执行计划(在 `TExplainLevel.COSTS` 级别)记录为 WARN 条目。日志条目包括 QueryId、SQL 和 COSTS 计划;在 ExecuteExceptionHandler 路径中,异常堆栈跟踪也会被记录。当 `enable_collect_query_detail_info` 启用时(计划存储在查询详细信息中),日志记录会被跳过 —— 在代码路径中,通过验证查询详细信息是否为 null 来执行检查。请注意,在 ExecuteExceptionHandler 中,计划仅在第一次重试 (`retryTime == 0`) 时记录。启用此功能可能会增加日志量,因为完整的 COSTS 计划可能很大。 +- 引入版本:v3.2.0 + +##### log_register_and_unregister_query_id + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否允许 FE 记录来自 QeProcessorImpl 的查询注册和注销消息(例如,`"register query id = {}"` 和 `"deregister query id = {}"`)。仅当查询具有非空 ConnectContext 且命令不是 `COM_STMT_EXECUTE` 或会话变量 `isAuditExecuteStmt()` 为 true 时才发出日志。由于这些消息是针对每个查询生命周期事件写入的,因此启用此功能可能会产生大量日志并成为高并发环境中的吞吐量瓶颈。启用它用于调试或审计;禁用它以减少日志记录开销并提高性能。 +- 引入版本:v3.3.0, v3.4.0, v3.5.0 + +##### log_roll_size_mb + +- 默认值:1024 +- 类型:Int +- 单位:MB +- 是否可变:否 +- 描述:系统日志文件或审计日志文件的最大大小。 +- 引入版本:- + +##### proc_profile_file_retained_days + +- 默认值:1 +- 类型:Int +- 单位:天 +- 是否可变:是 +- 描述:保留在 `sys_log_dir/proc_profile` 下生成的进程分析文件(CPU 和内存)的天数。ProcProfileCollector 通过从当前时间(格式为 yyyyMMdd-HHmmss)减去 
`proc_profile_file_retained_days` 天来计算截止日期,并删除时间戳部分在字典序上早于该截止日期的分析文件(即 `timePart.compareTo(timeToDelete) < 0`)。文件删除还遵循由 `proc_profile_file_retained_size_bytes` 控制的基于大小的截止日期。分析文件使用 `cpu-profile-` 和 `mem-profile-` 前缀,并在收集后进行压缩。 +- 引入版本:v3.2.12 + +##### proc_profile_file_retained_size_bytes + +- 默认值:2L * 1024 * 1024 * 1024 (2147483648) +- 类型:Long +- 单位:字节 +- 是否可变:是 +- 描述:在分析目录下保留的已收集 CPU 和内存分析文件(文件名前缀为 `cpu-profile-` 和 `mem-profile-`)的最大总字节数。当有效分析文件的总大小超过 `proc_profile_file_retained_size_bytes` 时,收集器会删除最旧的分析文件,直到剩余总大小小于或等于 `proc_profile_file_retained_size_bytes`。早于 `proc_profile_file_retained_days` 的文件也会被删除,无论大小如何。此设置控制分析归档的磁盘使用情况,并与 `proc_profile_file_retained_days` 交互以确定删除顺序和保留。 +- 引入版本:v3.2.12 + +##### profile_log_delete_age + +- 默认值:1d +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:控制 FE profile 日志文件在可删除之前保留多长时间。该值被注入到 Log4j 的 `` 策略(通过 `Log4jConfig`)中,并与 `profile_log_roll_interval` 和 `profile_log_roll_num` 等轮转设置一起应用。支持的后缀:`d`(天)、`h`(小时)、`m`(分钟)、`s`(秒)。例如:`7d`(7 天)、`10h`(10 小时)、`60m`(60 分钟)、`120s`(120 秒)。 +- 引入版本:v3.2.5 + +##### profile_log_dir + +- 默认值:`Config.STARROCKS_HOME_DIR + "/log"` +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:FE Profile 日志写入的目录。Log4jConfig 使用此值来放置与 Profile 相关的 appender(在此目录下创建 `fe.profile.log` 和 `fe.features.log` 等文件)。这些文件的轮转和保留由 `profile_log_roll_size_mb`、`profile_log_roll_num` 和 `profile_log_delete_age` 控制;时间戳后缀格式由 `profile_log_roll_interval` 控制(支持 DAY 或 HOUR)。由于默认目录位于 `STARROCKS_HOME_DIR` 下,请确保 FE 进程对此目录具有写入和轮转/删除权限。 +- 引入版本:v3.2.5 + +##### profile_log_roll_interval + +- 默认值:DAY +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:控制用于生成 Profile 日志文件名日期部分的时间粒度。有效值(不区分大小写)为 `HOUR` 和 `DAY`。`HOUR` 生成 `"%d{yyyyMMddHH}"` 模式(每小时时间桶),`DAY` 生成 `"%d{yyyyMMdd}"` 模式(每日时间桶)。此值在 Log4j 配置中计算 `profile_file_pattern` 时使用,并且仅影响滚动文件名称的基于时间的组件;基于大小的滚动仍由 `profile_log_roll_size_mb` 控制,保留由 `profile_log_roll_num` / `profile_log_delete_age` 控制。无效值会导致日志初始化期间抛出 IOException(错误消息:`"profile_log_roll_interval config error: "`)。对于高容量 Profile,选择 `HOUR` 以限制每小时的文件大小;对于每日聚合,选择 `DAY`。 +- 引入版本:v3.2.5 + +##### profile_log_roll_num + +- 默认值:5 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:指定 Log4j 的 DefaultRolloverStrategy 为 Profile 日志记录器保留的轮转 Profile 日志文件的最大数量。此值注入到日志 XML 中,作为 `${profile_log_roll_num}`(例如 ``)。轮转由 `profile_log_roll_size_mb` 或 `profile_log_roll_interval` 触发;当发生轮转时,Log4j 最多保留这些索引文件,旧的索引文件将有资格被删除。磁盘上的实际保留也受 `profile_log_delete_age` 和 `profile_log_dir` 位置的影响。较低的值会减少磁盘使用,但限制保留历史记录;较高的值会保留更多的历史 Profile 日志。 +- 引入版本:v3.2.5 + +##### profile_log_roll_size_mb + +- 默认值:1024 +- 类型:Int +- 单位:MB +- 是否可变:否 +- 描述:设置触发 FE Profile 日志文件基于大小轮转的大小阈值(以兆字节为单位)。此值由 Log4j RollingFile SizeBasedTriggeringPolicy 用于 `ProfileFile` appender;当 Profile 日志超过 `profile_log_roll_size_mb` 时,它将被轮转。当达到 `profile_log_roll_interval` 时,也可以按时间进行轮转——任一条件都将触发轮转。结合 `profile_log_roll_num` 和 `profile_log_delete_age`,此项控制保留多少历史 Profile 文件以及何时删除旧文件。轮转文件的压缩由 `enable_profile_log_compress` 控制。 +- 引入版本:v3.2.5 + +##### qe_slow_log_ms + +- 默认值:5000 +- 类型:Long +- 单位:毫秒 +- 是否可变:是 +- 描述:用于确定查询是否为慢查询的阈值。如果查询的响应时间超过此阈值,则会在 **fe.audit.log** 中记录为慢查询。 +- 引入版本:- + +##### slow_lock_log_every_ms + +- 默认值:3000L +- 类型:Long +- 单位:毫秒 +- 是否可变:是 +- 描述:在为同一个 SlowLockLogStats 实例发出另一个“慢锁”警告之前等待的最小间隔(毫秒)。LockUtils 在锁等待超过 slow_lock_threshold_ms 后检查此值,并且会抑制额外的警告,直到自上次记录的慢锁事件以来经过了 slow_lock_log_every_ms 毫秒。使用较大的值可减少长时间争用期间的日志量,或使用较小的值可获得更频繁的诊断。更改在运行时对后续检查生效。 +- 引入版本:v3.2.0 + +##### slow_lock_print_stack + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否允许 LockManager 在 `logSlowLockTrace` 发出的慢锁警告的 JSON 有效负载中包含拥有线程的完整堆栈跟踪("stack" 数组通过 `LogUtil.getStackTraceToJsonArray` 使用 `start=0` 和 `max=Short.MAX_VALUE` 
填充)。此配置仅控制当锁获取超过 `slow_lock_threshold_ms` 配置的阈值时显示的锁拥有者的额外堆栈信息。启用此功能通过提供精确的持有锁的线程堆栈来帮助调试;禁用此功能可减少日志量和高并发环境中捕获和序列化堆栈跟踪导致的 CPU/内存开销。 +- 引入版本:v3.3.16, v3.4.5, v3.5.1 + +##### slow_lock_threshold_ms + +- 默认值:3000L +- 类型:long +- 单位:毫秒 +- 是否可变:是 +- 描述:用于将锁操作或持有的锁归类为“慢”的阈值(毫秒)。当锁的等待或持有时间超过此值时,StarRocks 将(根据上下文)发出诊断日志、包含堆栈跟踪或等待者/拥有者信息,并且——在 LockManager 中——在此延迟后开始死锁检测。它被 LockUtils(慢锁日志记录)、QueryableReentrantReadWriteLock(过滤慢读器)、LockManager(死锁检测延迟和慢锁跟踪)、LockChecker(周期性慢锁检测)和其他调用者(例如 DiskAndTabletLoadReBalancer 日志记录)使用。降低该值会增加敏感度和日志/诊断开销;将其设置为 0 或负值会禁用初始基于等待的死锁检测延迟行为。与 slow_lock_log_every_ms、slow_lock_print_stack 和 slow_lock_stack_trace_reserve_levels 一起调整。 +- 引入版本:3.2.0 + +##### sys_log_delete_age + +- 默认值:7d +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:系统日志文件的保留期限。默认值 `7d` 指定每个系统日志文件可以保留 7 天。StarRocks 会检查每个系统日志文件并删除 7 天前生成的那些文件。 +- 引入版本:- + +##### sys_log_dir + +- 默认值:StarRocksFE.STARROCKS_HOME_DIR + "/log" +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:存储系统日志文件的目录。 +- 引入版本:- + +##### sys_log_enable_compress + +- 默认值:false +- 类型:boolean +- 单位:- +- 是否可变:否 +- 描述:当此项设置为 `true` 时,系统会将 ".gz" 后缀附加到轮转的系统日志文件名,以便 Log4j 生成 gzip 压缩的轮转 FE 系统日志(例如,fe.log.*)。此值在 Log4j 配置生成期间读取 (Log4jConfig.initLogging / generateActiveLog4jXmlConfig),并控制 RollingFile filePattern 中使用的 `sys_file_postfix` 属性。启用此功能可减少保留日志的磁盘使用,但会增加轮转期间的 CPU 和 I/O,并更改日志文件名,因此读取日志的工具或脚本必须能够处理 .gz 文件。请注意,审计日志使用单独的压缩配置,即 `audit_log_enable_compress`。 +- 引入版本:v3.2.12 + +##### sys_log_format + +- 默认值:"plaintext" +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:选择用于 FE 日志的 Log4j 布局。有效值:`"plaintext"`(默认)和 `"json"`。值不区分大小写。`"plaintext"` 配置 PatternLayout,具有人类可读的时间戳、级别、线程、类.方法:行以及 WARN/ERROR 的堆栈跟踪。`"json"` 配置 JsonTemplateLayout,并发出结构化 JSON 事件(UTC 时间戳、级别、线程 ID/名称、源文件/方法/行、消息、异常堆栈跟踪),适用于日志聚合器 (ELK, Splunk)。JSON 输出遵循 `sys_log_json_max_string_length` 和 `sys_log_json_profile_max_string_length` 的最大字符串长度。 +- 引入版本:v3.2.10 + +##### sys_log_json_max_string_length + +- 默认值:1048576 +- 类型:Int +- 单位:字节 +- 是否可变:否 +- 描述:设置用于 JSON 格式系统日志的 JsonTemplateLayout "maxStringLength" 值。当 `sys_log_format` 设置为 `"json"` 时,如果字符串值字段(例如 "message" 和字符串化的异常堆栈跟踪)的长度超过此限制,则会被截断。该值注入到 `Log4jConfig.generateActiveLog4jXmlConfig()` 中生成的 Log4j XML 中,并应用于默认、警告、审计、转储和大查询布局。Profile 布局使用单独的配置 (`sys_log_json_profile_max_string_length`)。降低此值可减少日志大小,但可能会截断有用信息。 +- 引入版本:3.2.11 + +##### sys_log_json_profile_max_string_length + +- 默认值:104857600 (100 MB) +- 类型:Int +- 单位:字节 +- 是否可变:否 +- 描述:当 `sys_log_format` 为 "json" 时,设置 Profile(及相关功能)日志 appender 的 JsonTemplateLayout 的 maxStringLength。JSON 格式 Profile 日志中的字符串字段值将被截断到此字节长度;非字符串字段不受影响。此项应用于 Log4jConfig `JsonTemplateLayout maxStringLength`,并在使用 `plaintext` 日志记录时忽略。将该值设置得足够大以容纳所需的完整消息,但请注意,较大的值会增加日志大小和 I/O。 +- 引入版本:v3.2.11 + +##### sys_log_level + +- 默认值:INFO +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:系统日志条目分类的严重性级别。有效值:`INFO`、`WARN`、`ERROR` 和 `FATAL`。 +- 引入版本:- + +##### sys_log_roll_interval + +- 默认值:DAY +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:StarRocks 轮转系统日志条目的时间间隔。有效值:`DAY` 和 `HOUR`。 + - 如果此参数设置为 `DAY`,则在系统日志文件名称中添加 `yyyyMMdd` 格式的后缀。 + - 如果此参数设置为 `HOUR`,则在系统日志文件名称中添加 `yyyyMMddHH` 格式的后缀。 +- 引入版本:- + +##### sys_log_roll_num + +- 默认值:10 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:在 `sys_log_roll_interval` 参数指定的每个保留期内可以保留的系统日志文件的最大数量。 +- 引入版本:- + +##### sys_log_to_console + +- 默认值:false(除非环境变量 `SYS_LOG_TO_CONSOLE` 设置为 "1") +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:当此项设置为 `true` 时,系统会将 Log4j 配置为将所有日志发送到控制台 (ConsoleErr appender),而不是基于文件的 appender。此值在生成活动 Log4j XML 配置时读取(这会影响根日志记录器和每个模块日志记录器 appender 的选择)。其值在进程启动时从 `SYS_LOG_TO_CONSOLE` 环境变量中捕获。在运行时更改它无效。此配置通常用于容器化或 CI 环境中,其中首选 stdout/stderr 日志收集而不是写入日志文件。 +- 引入版本:v3.2.0 + 
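+以下给出一个示意性示例(假设已通过 MySQL 客户端连接 FE,且当前用户具备管理员权限),演示如何查看上述日志相关配置,并在运行时调整其中标记为“是否可变:是”的参数;标记为“是否可变:否”的参数(如 `sys_log_dir`)仍需修改 **fe.conf** 并重启 FE 才能生效。示例中的 `1000` 仅为演示值,请按实际需求设置。
+
+```SQL
+-- 查看所有系统日志相关配置项的当前取值
+ADMIN SHOW FRONTEND CONFIG LIKE "sys_log%";
+
+-- 运行时调低慢锁日志阈值(动态参数,立即生效),以便更敏感地捕获锁争用
+ADMIN SET FRONTEND CONFIG ("slow_lock_threshold_ms" = "1000");
+```
+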
+##### sys_log_verbose_modules + +- 默认值:空字符串 +- 类型:String[] +- 单位:- +- 是否可变:否 +- 描述:StarRocks 为其生成系统日志的模块。如果此参数设置为 `org.apache.starrocks.catalog`,StarRocks 将仅为 catalog 模块生成系统日志。模块名称之间用逗号 (,) 和空格分隔。 +- 引入版本:- + +##### sys_log_warn_modules + +- 默认值:{} +- 类型:String[] +- 单位:- +- 是否可变:否 +- 描述:在启动时系统将配置为 WARN 级别日志记录器并路由到警告 appender (SysWF) —— `fe.warn.log` 文件的日志记录器名称或包前缀列表。条目插入到生成的 Log4j 配置中(与 org.apache.kafka、org.apache.hudi 和 org.apache.hadoop.io.compress 等内置警告模块一起),并生成 `` 这样的日志记录器元素。建议使用完全限定的包和类前缀(例如,"com.example.lib"),以抑制 INFO/DEBUG 输出到常规日志中,并允许单独捕获警告。 +- 引入版本:v3.2.13 + +### 服务器 + +##### brpc_idle_wait_max_time + +- 默认值:10000 +- 类型:Int +- 单位:毫秒 +- 是否可变:否 +- 描述:bRPC 客户端在空闲状态下等待的最长时间。 +- 引入版本:- + +##### brpc_inner_reuse_pool + +- 默认值:true +- 类型:boolean +- 单位:- +- 是否可变:否 +- 描述:控制底层 BRPC 客户端是否为连接/通道使用内部共享重用池。StarRocks 在 BrpcProxy 中构建 RpcClientOptions 时读取 `brpc_inner_reuse_pool`(通过 `rpcOptions.setInnerResuePool(...)`)。启用时(true),RPC 客户端重用内部池以减少每次调用的连接创建,降低 FE 到 BE / LakeService RPC 的连接流失、内存和文件描述符使用。禁用时(false),客户端可能会创建更多隔离池(以更高的资源使用为代价增加并发隔离)。更改此值需要重启进程才能生效。 +- 引入版本:v3.3.11, v3.4.1, v3.5.0 + +##### brpc_min_evictable_idle_time_ms + +- 默认值:120000 +- 类型:Int +- 单位:毫秒 +- 是否可变:否 +- 描述:空闲 BRPC 连接在连接池中必须保持空闲状态的毫秒数,然后才能被驱逐。应用于 `BrpcProxy` 使用的 RpcClientOptions(通过 RpcClientOptions.setMinEvictableIdleTime)。增加此值可使空闲连接保持更长时间(减少重新连接的流失);降低此值可更快释放未使用的套接字(减少资源使用)。与 `brpc_connection_pool_size` 和 `brpc_idle_wait_max_time` 一起调整,以平衡连接重用、池增长和驱逐行为。 +- 引入版本:v3.3.11, v3.4.1, v3.5.0 + +##### brpc_reuse_addr + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:当为 true 时,StarRocks 会设置套接字选项,允许 brpc RpcClient 创建的客户端套接字重用本地地址(通过 RpcClientOptions.setReuseAddress)。启用此功能可减少绑定失败,并允许在套接字关闭后更快地重新绑定本地端口,这对于高速率连接流失或快速重启很有帮助。当为 false 时,禁用地址/端口重用,这可以减少意外端口共享的可能性,但可能会增加瞬时绑定错误。此选项与 `brpc_connection_pool_size` 和 `brpc_short_connection` 配置的连接行为交互,因为它会影响客户端套接字重新绑定和重用的速度。 +- 引入版本:v3.3.11, v3.4.1, v3.5.0 + +##### cluster_name + +- 默认值:StarRocks Cluster +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:FE 所属的 StarRocks 集群的名称。集群名称显示在网页的 `Title` 上。 +- 引入版本:- + +##### dns_cache_ttl_seconds + +- 默认值:60 +- 类型:Int +- 单位:秒 +- 是否可变:否 +- 描述:成功 DNS 查询的 DNS 缓存 TTL(生存时间,秒)。此项设置 Java 安全属性 `networkaddress.cache.ttl`,它控制 JVM 缓存成功 DNS 查询的时长。将此项设置为 `-1` 以允许系统始终缓存信息,或设置为 `0` 以禁用缓存。这在 IP 地址频繁变化的环境中特别有用,例如 Kubernetes 部署或使用动态 DNS 时。 +- 引入版本:v3.5.11, v4.0.4 + +##### enable_http_async_handler + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否允许系统异步处理 HTTP 请求。如果启用此功能,Netty worker 线程收到的 HTTP 请求将被提交到单独的线程池进行服务逻辑处理,以避免阻塞 HTTP 服务器。如果禁用,Netty worker 将处理服务逻辑。 +- 引入版本:4.0.0 + +##### enable_http_validate_headers + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:控制 Netty 的 HttpServerCodec 是否执行严格的 HTTP 头部验证。该值在 HttpServer 中初始化 HTTP 管道时传递给 HttpServerCodec(参见 UseLocations)。默认值为 false 以实现向后兼容性,因为较新的 Netty 版本强制执行更严格的头部规则 (https://github.com/netty/netty/pull/12760)。设置为 true 以强制执行符合 RFC 的头部检查;这样做可能会导致来自旧客户端或代理的格式错误或不符合规范的请求被拒绝。更改需要重启 HTTP 服务器才能生效。 +- 引入版本:v3.3.0, v3.4.0, v3.5.0 + +##### enable_https + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:FE 节点是否启用 HTTPS 服务器以及 HTTP 服务器。 +- 引入版本:v4.0 + +##### frontend_address + +- 默认值:0.0.0.0 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:FE 节点的 IP 地址。 +- 引入版本:- + +##### http_async_threads_num + +- 默认值:4096 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:异步 HTTP 请求处理线程池的大小。别名是 `max_http_sql_service_task_threads_num`。 +- 引入版本:4.0.0 + +##### http_backlog_num + +- 默认值:1024 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:FE 节点中 HTTP 服务器持有的积压队列的长度。 +- 引入版本:- + +##### http_max_chunk_size + +- 默认值:8192 +- 类型:Int +- 单位:字节 +- 是否可变:否 +- 描述:设置 FE HTTP 服务器中 Netty 的 HttpServerCodec 处理的单个 HTTP 
块的最大允许大小(以字节为单位)。它作为第三个参数传递给 HttpServerCodec,并限制分块传输或流式请求/响应期间块的长度。如果传入块超过此值,Netty 将引发帧过大错误(例如 TooLongFrameException),并且请求可能会被拒绝。对于合法的分块上传,请增加此值;保持较小以减少内存压力和 DoS 攻击的表面积。此设置与 `http_max_initial_line_length`、`http_max_header_size` 和 `enable_http_validate_headers` 一起使用。 +- 引入版本:v3.2.0 + +##### http_max_header_size + +- 默认值:32768 +- 类型:Int +- 单位:字节 +- 是否可变:否 +- 描述:Netty 的 `HttpServerCodec` 解析的 HTTP 请求头块的最大允许大小(以字节为单位)。StarRocks 将此值传递给 `HttpServerCodec`(作为 `Config.http_max_header_size`);如果传入请求的头部(名称和值组合)超过此限制,编解码器将拒绝请求(解码器异常),并且连接/请求将失败。仅当客户端合法发送非常大的头部(大型 Cookie 或许多自定义头部)时才增加此值;较大的值会增加每个连接的内存使用。与 `http_max_initial_line_length` 和 `http_max_chunk_size` 结合调整。更改需要 FE 重启。 +- 引入版本:v3.2.0 + +##### http_max_initial_line_length + +- 默认值:4096 +- 类型:Int +- 单位:字节 +- 是否可变:否 +- 描述:设置 HttpServer 中使用的 Netty `HttpServerCodec` 接受的 HTTP 初始请求行(方法 + 请求目标 + HTTP 版本)的最大允许长度(以字节为单位)。该值传递给 Netty 的解码器,初始行长于此值的请求将被拒绝 (TooLongFrameException)。仅当您必须支持非常长的请求 URI 时才增加此值;较大的值会增加内存使用,并可能增加遭受格式错误/请求滥用的风险。与 `http_max_header_size` 和 `http_max_chunk_size` 一起调整。 +- 引入版本:v3.2.0 + +##### http_port + +- 默认值:8030 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:FE 节点中 HTTP 服务器监听的端口。 +- 引入版本:- + +##### http_web_page_display_hardware + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:当为 true 时,HTTP 索引页面 (/index) 将包含通过 oshi 库(CPU、内存、进程、磁盘、文件系统、网络等)填充的硬件信息部分。oshi 可能会间接调用系统实用程序或读取系统文件(例如,它可以执行 `getent passwd` 等命令),这可能会暴露敏感的系统数据。如果您需要更严格的安全性或希望避免在主机上执行这些间接命令,请将此配置设置为 false,以禁用在 Web UI 上收集和显示硬件详细信息。 +- 引入版本:v3.2.0 + +##### http_worker_threads_num + +- 默认值:0 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:HTTP 服务器处理 HTTP 请求的工作线程数。对于负值或 0 值,线程数将是 CPU 核心数的两倍。 +- 引入版本:v2.5.18, v3.0.10, v3.1.7, v3.2.2 + +##### https_port + +- 默认值:8443 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:FE 节点中 HTTPS 服务器监听的端口。 +- 引入版本:v4.0 + +##### max_mysql_service_task_threads_num + +- 默认值:4096 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:FE 节点中 MySQL 服务器可以运行的最大任务处理线程数。 +- 引入版本:- + +##### max_task_runs_threads_num + +- 默认值:512 +- 类型:Int +- 单位:线程 +- 是否可变:否 +- 描述:控制任务运行执行器线程池中的最大线程数。此值是并发任务运行执行的上限;增加它会提高并行度,但也会增加 CPU、内存和网络使用率,而减少它可能导致任务运行积压和更高的延迟。根据预期的并发调度作业和可用的系统资源调整此值。 +- 引入版本:v3.2.0 + +##### memory_tracker_enable + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:启用 FE 内存跟踪子系统。当 `memory_tracker_enable` 设置为 `true` 时,`MemoryUsageTracker` 定期扫描注册的元数据模块,更新内存中的 `MemoryUsageTracker.MEMORY_USAGE` map,记录总数,并导致 `MetricRepo` 在指标输出中暴露内存使用和对象计数 gauge。使用 `memory_tracker_interval_seconds` 控制采样间隔。启用此功能有助于监控和调试内存消耗,但会引入 CPU 和 I/O 开销以及额外的指标基数。 +- 引入版本:v3.2.4 + +##### memory_tracker_interval_seconds + +- 默认值:60 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:FE `MemoryUsageTracker` 守护进程轮询并记录 FE 进程和注册的 `MemoryTrackable` 模块内存使用情况的时间间隔(秒)。当 `memory_tracker_enable` 设置为 `true` 时,跟踪器按此频率运行,更新 `MEMORY_USAGE`,并记录聚合的 JVM 和跟踪模块使用情况。 +- 引入版本:v3.2.4 + +##### mysql_nio_backlog_num + +- 默认值:1024 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:FE 节点中 MySQL 服务器持有的积压队列的长度。 +- 引入版本:- + +##### mysql_server_version + +- 默认值:8.0.33 +- 类型:String +- 单位:- +- 是否可变:是 +- 描述:返回给客户端的 MySQL 服务器版本。修改此参数将影响以下情况的版本信息: + 1. `select version();` + 2. 握手包版本 + 3. 
全局变量 `version` 的值 (`show variables like 'version';`) +- 引入版本:- + +##### mysql_service_io_threads_num + +- 默认值:4 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:FE 节点中 MySQL 服务器可以运行的最大 I/O 事件处理线程数。 +- 引入版本:- + +##### mysql_service_kill_after_disconnect + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:控制当检测到 MySQL TCP 连接关闭(读取时 EOF)时,服务器如何处理会话。如果设置为 `true`,服务器会立即终止该连接的任何正在运行的查询并立即执行清理。如果设置为 `false`,服务器在断开连接时不会终止正在运行的查询,并且仅在没有待处理请求任务时才执行清理,允许长时间运行的查询在客户端断开连接后继续执行。注意:尽管有一个简短的注释建议 TCP Keep-Alive,但此参数专门控制断开连接后的终止行为,应根据您是否希望终止孤立查询(在不可靠/负载均衡客户端后面推荐)或允许其完成进行设置。 +- 引入版本:- + +##### mysql_service_nio_enable_keep_alive + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:为 MySQL 连接启用 TCP Keep-Alive。对于负载均衡器后的长时间空闲连接很有用。 +- 引入版本:- + +##### net_use_ipv6_when_priority_networks_empty + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:一个布尔值,用于控制当未指定 `priority_networks` 时是否优先使用 IPv6 地址。`true` 表示当托管节点的服务器同时具有 IPv4 和 IPv6 地址且未指定 `priority_networks` 时,允许系统优先使用 IPv6 地址。 +- 引入版本:v3.3.0 + +##### priority_networks + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:声明一种用于具有多个 IP 地址的服务器的选择策略。请注意,最多只有一个 IP 地址必须与此参数指定的列表匹配。此参数的值是一个列表,由以分号 (;) 分隔的 CIDR 表示法条目组成,例如 10.10.10.0/24。如果没有 IP 地址与此列表中的条目匹配,则将随机选择服务器的可用 IP 地址。从 v3.3.0 开始,StarRocks 支持基于 IPv6 的部署。如果服务器同时具有 IPv4 和 IPv6 地址,并且未指定此参数,则系统默认使用 IPv4 地址。您可以通过将 `net_use_ipv6_when_priority_networks_empty` 设置为 `true` 来更改此行为。 +- 引入版本:- + +##### proc_profile_cpu_enable + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:当此项设置为 `true` 时,后台 `ProcProfileCollector` 将使用 `AsyncProfiler` 收集 CPU Profile,并将 HTML 报告写入 `sys_log_dir/proc_profile` 下。每次收集运行都会在 `proc_profile_collect_time_s` 配置的持续时间内记录 CPU 堆栈,并使用 `proc_profile_jstack_depth` 作为 Java 堆栈深度。生成的 Profile 会根据 `proc_profile_file_retained_days` 和 `proc_profile_file_retained_size_bytes` 进行压缩和删除。`AsyncProfiler` 需要本机库 (`libasyncProfiler.so`);`one.profiler.extractPath` 设置为 `STARROCKS_HOME_DIR/bin` 以避免 `/tmp` 上的 noexec 问题。 +- 引入版本:v3.2.12 + +##### qe_max_connection + +- 默认值:4096 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:所有用户可以与 FE 节点建立的最大连接数。从 v3.1.12 和 v3.2.7 开始,默认值已从 `1024` 更改为 `4096`。 +- 引入版本:- + +##### query_port + +- 默认值:9030 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:FE 节点中 MySQL 服务器监听的端口。 +- 引入版本:- + +##### rpc_port + +- 默认值:9020 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:FE 节点中 Thrift 服务器监听的端口。 +- 引入版本:- + +##### slow_lock_stack_trace_reserve_levels + +- 默认值:15 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:控制当 StarRocks 转储慢锁或持有锁的调试信息时捕获和发出的堆栈跟踪帧数。此值由 `QueryableReentrantReadWriteLock` 传递给 `LogUtil.getStackTraceToJsonArray`,用于生成独占锁所有者、当前线程和最旧/共享读取器的 JSON。增加此值可提供更多上下文以诊断慢锁或死锁问题,但会以更大的 JSON 有效负载和捕获堆栈的 CPU/内存略高为代价;减少此值可降低开销。注意:当仅记录慢锁时,读取器条目可以由 `slow_lock_threshold_ms` 过滤。 +- 引入版本:v3.4.0, v3.5.0 + +##### ssl_cipher_blacklist + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:以逗号分隔的列表,支持正则表达式,用于通过 IANA 名称列出 SSL 密码套件黑名单。如果同时设置了白名单和黑名单,则黑名单优先。 +- 引入版本:v4.0 + +##### ssl_cipher_whitelist + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:以逗号分隔的列表,支持正则表达式,用于通过 IANA 名称列出 SSL 密码套件白名单。如果同时设置了白名单和黑名单,则黑名单优先。 +- 引入版本:v4.0 + +##### task_runs_concurrency + +- 默认值:4 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:并发运行 TaskRun 实例的全局限制。当当前运行计数大于或等于 `task_runs_concurrency` 时,`TaskRunScheduler` 会停止调度新的运行,因此此值限制了跨调度器的并行 TaskRun 执行。它还用于 `MVPCTRefreshPartitioner` 计算每个 TaskRun 的分区刷新粒度。增加该值会提高并行度并增加资源使用;减少它会降低并发度并使每个运行的分区刷新更大。除非有意禁用调度,否则不要设置为 0 或负值:0(或负值)将有效地阻止 `TaskRunScheduler` 调度新的 TaskRun。 +- 引入版本:v3.2.0 + +##### task_runs_queue_length + +- 默认值:500 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:限制待处理队列中保留的最大待处理 TaskRun 项数。`TaskRunManager` 检查当前待处理计数,并且当有效待处理 TaskRun 计数大于或等于 `task_runs_queue_length` 时拒绝新的提交。在添加合并/接受的 TaskRun 
之前,会重新检查相同的限制。调整此值以平衡内存和调度积压:对于大量突发工作负载,设置为较高值以避免拒绝;对于限制内存和减少待处理积压,设置为较低值。 +- 引入版本:v3.2.0 + +##### thrift_backlog_num + +- 默认值:1024 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:FE 节点中 Thrift 服务器持有的积压队列的长度。 +- 引入版本:- + +##### thrift_client_timeout_ms + +- 默认值:5000 +- 类型:Int +- 单位:毫秒 +- 是否可变:否 +- 描述:空闲客户端连接超时的时间长度。 +- 引入版本:- + +##### thrift_rpc_max_body_size + +- 默认值:-1 +- 类型:Int +- 单位:字节 +- 是否可变:否 +- 描述:控制构建服务器的 Thrift 协议时允许的最大 Thrift RPC 消息体大小(以字节为单位)(传递给 `ThriftServer` 中的 TBinaryProtocol.Factory)。值为 `-1` 表示禁用限制(无界)。设置正值会强制执行上限,以便大于此值的消息被 Thrift 层拒绝,这有助于限制内存使用并缓解超大请求或 DoS 风险。将此设置为足够大的值,以适应预期负载(大型结构或批量数据),以避免拒绝合法请求。 +- 引入版本:v3.2.0 + +##### thrift_server_max_worker_threads + +- 默认值:4096 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:FE 节点中 Thrift 服务器支持的最大工作线程数。 +- 引入版本:- + +##### thrift_server_queue_size + +- 默认值:4096 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:请求待处理队列的长度。如果 Thrift 服务器中正在处理的线程数超过 `thrift_server_max_worker_threads` 中指定的值,则新请求将添加到待处理队列中。 +- 引入版本:- + +### 元数据和集群管理 + +##### alter_max_worker_queue_size + +- 默认值:4096 +- 类型:Int +- 单位:任务 +- 是否可变:否 +- 描述:控制 alter 子系统使用的内部 worker 线程池队列的容量。它与 `alter_max_worker_threads` 一起传递给 `AlterHandler` 中的 `ThreadPoolManager.newDaemonCacheThreadPool`。当待处理的 alter 任务数量超过 `alter_max_worker_queue_size` 时,新的提交将被拒绝,并可能抛出 `RejectedExecutionException`(参见 `AlterHandler.handleFinishAlterTask`)。调整此值以平衡内存使用和您允许的并发 alter 任务的积压量。 +- 引入版本:v3.2.0 + +##### alter_max_worker_threads + +- 默认值:4 +- 类型:Int +- 单位:线程 +- 是否可变:否 +- 描述:设置 AlterHandler 线程池中 worker 线程的最大数量。AlterHandler 使用此值构建执行器来运行和完成与 alter 相关的任务(例如,通过 handleFinishAlterTask 提交 AlterReplicaTask)。此值限制并发执行 alter 操作;增加它会提高并行度并增加资源使用,降低它会限制并发 alter 并可能成为瓶颈。执行器与 `alter_max_worker_queue_size` 一起创建,并且处理程序调度使用 `alter_scheduler_interval_millisecond`。 +- 引入版本:v3.2.0 + +##### automated_cluster_snapshot_interval_seconds + +- 默认值:600 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:触发自动集群快照任务的时间间隔。 +- 引入版本:v3.4.2 + +##### background_refresh_metadata_interval_millis + +- 默认值:600000 +- 类型:Int +- 单位:毫秒 +- 是否可变:是 +- 描述:两次连续 Hive 元数据缓存刷新之间的时间间隔。 +- 引入版本:v2.5.5 + +##### background_refresh_metadata_time_secs_since_last_access_secs + +- 默认值:3600 * 24 +- 类型:Long +- 单位:秒 +- 是否可变:是 +- 描述:Hive 元数据刷新任务的过期时间。对于已访问的 Hive Catalog,如果超过指定时间未访问,StarRocks 将停止刷新其缓存的元数据。对于未访问的 Hive Catalog,StarRocks 将不会刷新其缓存的元数据。 +- 引入版本:v2.5.5 + +##### bdbje_cleaner_threads + +- 默认值:1 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:StarRocks 日志使用的 Berkeley DB Java Edition (JE) 环境的后台清理线程数。此值在 `BDBEnvironment.initConfigs` 中的环境初始化期间读取,并使用 `Config.bdbje_cleaner_threads` 应用于 `EnvironmentConfig.CLEANER_THREADS`。它控制 JE 日志清理和空间回收的并行度;增加它可以加速清理,但会以额外的 CPU 和 I/O 干扰前台操作为代价。更改仅在 BDB 环境(重新)初始化时生效,因此需要重启前端才能应用新值。 +- 引入版本:v3.2.0 + +##### bdbje_heartbeat_timeout_second + +- 默认值:30 +- 类型:Int +- 单位:秒 +- 是否可变:否 +- 描述:StarRocks 集群中 Leader、Follower 和 Observer FE 之间心跳超时的时间量。 +- 引入版本:- + +##### bdbje_lock_timeout_second + +- 默认值:1 +- 类型:Int +- 单位:秒 +- 是否可变:否 +- 描述:基于 BDB JE 的 FE 中的锁超时的时间量。 +- 引入版本:- + +##### bdbje_replay_cost_percent + +- 默认值:150 +- 类型:Int +- 单位:百分比 +- 是否可变:否 +- 描述:设置从 BDB JE 日志重放事务相对于通过网络恢复获取相同数据的相对成本(百分比)。该值提供给底层 JE 复制参数 REPLAY_COST_PERCENT,通常 `>100` 表示重放通常比网络恢复更昂贵。当决定是否保留清理的日志文件以进行潜在重放时,系统将重放成本乘以日志大小与网络恢复的成本进行比较;如果网络恢复被判断为更有效,则文件将被删除。值为 0 会禁用基于此成本比较的保留。在 `REP_STREAM_TIMEOUT` 内或任何活动复制所需的日志文件始终保留。 +- 引入版本:v3.2.0 + +##### bdbje_replica_ack_timeout_second + +- 默认值:10 +- 类型:Int +- 单位:秒 +- 是否可变:否 +- 描述:当元数据从 Leader FE 写入 Follower FE 时,Leader FE 可以等待指定数量的 Follower FE 返回 ACK 消息的最长时间。单位:秒。如果正在写入大量元数据,Follower FE 需要很长时间才能向 Leader FE 返回 ACK 消息,从而导致 ACK 超时。在这种情况下,元数据写入失败,FE 进程退出。我们建议您增加此参数的值以防止这种情况。 +- 引入版本:- + +##### 
bdbje_reserved_disk_size + +- 默认值:512 * 1024 * 1024 (536870912) +- 类型:Long +- 单位:字节 +- 是否可变:否 +- 描述:限制 Berkeley DB JE 将保留为“未保护”(可删除)日志/数据文件的字节数。StarRocks 通过 `EnvironmentConfig.RESERVED_DISK` 在 BDBEnvironment 中将此值传递给 JE;JE 的内置默认值为 0(无限制)。StarRocks 的默认值(512 MiB)可防止 JE 为未保护文件保留过多磁盘空间,同时允许安全清理过时文件。在磁盘受限的系统上调整此值:减少它可以让 JE 更快地释放更多文件,增加它可以让 JE 保留更多保留空间。更改需要重启进程才能生效。 +- 引入版本:v3.2.0 + +##### bdbje_reset_election_group + +- 默认值:false +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:是否重置 BDBJE 复制组。如果此参数设置为 `TRUE`,FE 将重置 BDBJE 复制组(即删除所有可选举 FE 节点的信息),并作为 Leader FE 启动。重置后,此 FE 将是集群中唯一的成员,其他 FE 可以通过使用 `ALTER SYSTEM ADD/DROP FOLLOWER/OBSERVER 'xxx'` 重新加入此集群。仅当由于大多数 Follower FE 的数据已损坏而无法选举 Leader FE 时才使用此设置。`reset_election_group` 用于替换 `metadata_failure_recovery`。 +- 引入版本:- + +##### black_host_connect_failures_within_time + +- 默认值:5 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:黑名单 BE 节点允许的连接失败阈值。如果 BE 节点被自动添加到 BE 黑名单,StarRocks 将评估其连接性并判断是否可以将其从 BE 黑名单中移除。在 `black_host_history_sec` 内,只有当黑名单 BE 节点的连接失败次数少于 `black_host_connect_failures_within_time` 中设置的阈值时,才能将其从 BE 黑名单中移除。 +- 引入版本:v3.3.0 + +##### black_host_history_sec + +- 默认值:2 * 60 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:保留 BE 黑名单中 BE 节点历史连接失败的时间长度。如果 BE 节点被自动添加到 BE 黑名单,StarRocks 将评估其连接性并判断是否可以将其从 BE 黑名单中移除。在 `black_host_history_sec` 内,只有当黑名单 BE 节点的连接失败次数少于 `black_host_connect_failures_within_time` 中设置的阈值时,才能将其从 BE 黑名单中移除。 +- 引入版本:v3.3.0 + +##### brpc_connection_pool_size + +- 默认值:16 +- 类型:Int +- 单位:连接 +- 是否可变:否 +- 描述:FE 的 BrpcProxy 每个端点使用的最大 BRPC 连接池数量。此值通过 `setMaxTotoal` 和 `setMaxIdleSize` 应用于 RpcClientOptions,因此它直接限制并发传出的 BRPC 请求,因为每个请求都必须从池中借用一个连接。在高并发场景中,增加此值以避免请求排队;增加它会提高套接字和内存使用率,并可能增加远程服务器负载。调整时,请考虑相关设置,如 `brpc_idle_wait_max_time`、`brpc_short_connection`、`brpc_inner_reuse_pool`、`brpc_reuse_addr` 和 `brpc_min_evictable_idle_time_ms`。更改此值不可热重载,需要重启。 +- 引入版本:v3.2.0 + +##### brpc_short_connection + +- 默认值:false +- 类型:boolean +- 单位:- +- 是否可变:否 +- 描述:控制底层 brpc RpcClient 是否使用短生命周期连接。启用时 (`true`),RpcClientOptions.setShortConnection 将被设置,并且连接在请求完成后关闭,从而以更高的连接设置开销和增加的延迟为代价减少长生命周期套接字的数量。禁用时 (`false`,默认值) 使用持久连接和连接池。启用此选项会影响连接池行为,应与 `brpc_connection_pool_size`、`brpc_idle_wait_max_time`、`brpc_min_evictable_idle_time_ms`、`brpc_reuse_addr` 和 `brpc_inner_reuse_pool` 一起考虑。对于典型的高吞吐量部署,请保持禁用状态;仅在需要限制套接字生命周期或网络策略要求短连接时才启用。 +- 引入版本:v3.3.11, v3.4.1, v3.5.0 + +##### catalog_try_lock_timeout_ms + +- 默认值:5000 +- 类型:Long +- 单位:毫秒 +- 是否可变:是 +- 描述:获取全局锁的超时时长。 +- 引入版本:- + +##### checkpoint_only_on_leader + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:当 `true` 时,CheckpointController 将只选择 Leader FE 作为检查点 worker;当 `false` 时,控制器可能会选择任何前端并优先选择堆使用率较低的节点。当 `false` 时,worker 按最近的失败时间排序,并按 `heapUsedPercent` 排序(Leader 被视为具有无限使用率以避免选择它)。对于需要集群快照元数据的操作,控制器已经强制选择 Leader,无论此标志如何。启用 `true` 会将检查点工作集中到 Leader 上(更简单,但会增加 Leader 的 CPU/内存和网络负载);保持 `false` 会将检查点负载分配给负载较低的 FE。此设置影响 worker 选择以及与超时(如 `checkpoint_timeout_seconds`)和 RPC 设置(如 `thrift_rpc_timeout_ms`)的交互。 +- 引入版本:v3.4.0, v3.5.0 + +##### checkpoint_timeout_seconds + +- 默认值:24 * 3600 +- 类型:Long +- 单位:秒 +- 是否可变:是 +- 描述:Leader 的 CheckpointController 等待检查点 worker 完成检查点的最长时间(秒)。控制器将此值转换为纳秒并轮询 worker 结果队列;如果在超时时间内未收到成功完成,则检查点被视为失败,并且 createImage 返回失败。增加此值可适应长时间运行的检查点,但会延迟故障检测和后续镜像传播;减少此值会导致更快的故障转移/重试,但可能会对慢 worker 产生错误超时。此设置仅控制 `CheckpointController` 在检查点创建期间的等待时间,不改变 worker 的内部检查点行为。 +- 引入版本:v3.4.0, v3.5.0 + +##### db_used_data_quota_update_interval_secs + +- 默认值:300 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:数据库已用数据配额更新的时间间隔。StarRocks 定期更新所有数据库的已用数据配额,以跟踪存储消耗。此值用于配额强制和指标收集。允许的最小间隔为 30 秒,以防止系统负载过高。小于 30 的值将被拒绝。 +- 引入版本:- + +##### drop_backend_after_decommission + +- 默认值:true +- 
类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:BE 退役后是否删除该 BE。`TRUE` 表示 BE 退役后立即删除该 BE。`FALSE` 表示 BE 退役后不删除该 BE。 +- 引入版本:- + +##### edit_log_port + +- 默认值:9010 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:集群中 Leader、Follower 和 Observer FE 之间通信使用的端口。 +- 引入版本:- + +##### edit_log_roll_num + +- 默认值:50000 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:在为日志条目创建日志文件之前可以写入的最大元数据日志条目数。此参数用于控制日志文件的大小。新日志文件写入 BDBJE 数据库。 +- 引入版本:- + +##### edit_log_type + +- 默认值:BDB +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:可以生成的编辑日志类型。将值设置为 `BDB`。 +- 引入版本:- + +##### enable_background_refresh_connector_metadata + +- 默认值:v3.0 及更高版本中为 true,v2.5 中为 false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否启用周期性 Hive 元数据缓存刷新。启用后,StarRocks 会轮询 Hive 集群的 Metastore(Hive Metastore 或 AWS Glue),并刷新常用 Hive Catalog 的缓存元数据,以感知数据变化。`true` 表示启用 Hive 元数据缓存刷新,`false` 表示禁用。 +- 引入版本:v2.5.5 + +##### enable_collect_query_detail_info + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否收集查询的 Profile。如果此参数设置为 `TRUE`,系统将收集查询的 Profile。如果此参数设置为 `FALSE`,系统将不收集查询的 Profile。 +- 引入版本:- + +##### enable_create_partial_partition_in_batch + +- 默认值:false +- 类型:boolean +- 单位:- +- 是否可变:是 +- 描述:当此项设置为 `false`(默认)时,StarRocks 强制批处理创建的范围分区与标准时间单位边界对齐。它将拒绝非对齐范围以避免创建空洞。将此项设置为 `true` 会禁用该对齐检查,并允许在批处理中创建部分(非标准)分区,这可能产生间隙或错位的分区范围。仅当您有意需要部分批处理分区并接受相关风险时才应将其设置为 `true`。 +- 引入版本:v3.2.0 + +##### enable_internal_sql + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:当此项设置为 `true` 时,内部组件(例如 SimpleExecutor)执行的内部 SQL 语句将被保留并写入内部审计或日志消息(如果 `enable_sql_desensitize_in_log` 设置为 true,还可以进一步脱敏)。当设置为 `false` 时,内部 SQL 文本将被抑制:格式化代码 (SimpleExecutor.formatSQL) 返回 "?",并且实际语句不会发出到内部审计或日志消息中。此配置不改变内部语句的执行语义 — 它仅控制内部 SQL 的日志记录和可见性,以用于隐私或安全目的。 +- 引入版本:- + +##### enable_legacy_compatibility_for_replication + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否启用复制的旧版兼容性。StarRocks 在旧版本和新版本之间可能表现不同,导致跨集群数据迁移出现问题。因此,在数据迁移之前,必须为目标集群启用旧版兼容性,并在数据迁移完成后禁用它。`true` 表示启用此模式。 +- 引入版本:v3.1.10, v3.2.6 + +##### enable_show_materialized_views_include_all_task_runs + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:控制 SHOW MATERIALIZED VIEWS 命令返回 TaskRun 的方式。当此项设置为 `false` 时,StarRocks 仅返回每个任务的最新 TaskRun(为兼容性而保留的旧版行为)。当设置为 `true`(默认值)时,`TaskManager` 可能会仅在 TaskRun 共享相同的起始 TaskRun ID(例如,属于同一个作业)时包含同一任务的额外 TaskRun,从而防止不相关的重复运行出现,同时允许显示与一个作业相关的多个状态。将此项设置为 `false` 以恢复单次运行输出,或显示多运行作业历史记录以进行调试和监控。 +- 引入版本:v3.3.0, v3.4.0, v3.5.0 + +##### enable_statistics_collect_profile + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否为统计查询生成 Profile。您可以将此项设置为 `true`,以允许 StarRocks 为系统统计查询生成查询 Profile。 +- 引入版本:v3.1.5 + +##### enable_table_name_case_insensitive + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:是否对 Catalog 名称、数据库名称、表名称、视图名称和物化视图名称启用不区分大小写处理。目前,表名称默认区分大小写。 + - 启用此功能后,所有相关名称将以小写形式存储,所有包含这些名称的 SQL 命令将自动将其转换为小写。 + - 仅在创建集群时才能启用此功能。**集群启动后,此配置的值不能以任何方式修改**。任何修改尝试都会导致错误。当 FE 检测到此配置项的值与集群首次启动时不一致时,FE 将无法启动。 + - 目前,此功能不支持 JDBC Catalog 和表名称。如果您想对 JDBC 或 ODBC 数据源执行不区分大小写处理,请不要启用此功能。 +- 引入版本:v4.0 + +##### enable_task_history_archive + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:启用时,完成的任务运行记录将被存档到持久化任务运行历史表,并记录到编辑日志中,以便查找(例如 `lookupHistory`、`lookupHistoryByTaskNames`、`lookupLastJobOfTasks`)包含存档结果。存档由 FE Leader 执行,并在单元测试期间跳过 (`FeConstants.runningUnitTest`)。启用时,内存中过期和强制 GC 路径将被绕过(代码从 `removeExpiredRuns` 和 `forceGC` 提前返回),因此保留/驱逐由持久存档处理,而不是 `task_runs_ttl_second` 和 `task_runs_max_history_number`。禁用时,历史记录保留在内存中,并由这些配置进行修剪。 +- 引入版本:v3.3.1, v3.4.0, v3.5.0 + +##### enable_task_run_fe_evaluation + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:启用时,FE 将对 `TaskRunsSystemTable.supportFeEvaluation` 中的系统表 `task_runs` 执行本地评估。FE 端评估仅允许对将列与常量进行比较的合取相等谓词,并且仅限于 `QUERY_ID` 和 
`TASK_NAME` 列。启用此功能可通过避免更广泛的扫描或额外的远程处理来提高目标查找的性能;禁用此功能会强制规划器跳过 `task_runs` 的 FE 评估,这可能会减少谓词修剪并影响这些过滤器的查询延迟。 +- 引入版本:v3.3.13, v3.4.3, v3.5.0 + +##### heartbeat_mgr_blocking_queue_size + +- 默认值:1024 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:Heartbeat Manager 运行心跳任务的阻塞队列的大小。 +- 引入版本:- + +##### heartbeat_mgr_threads_num + +- 默认值:8 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:Heartbeat Manager 运行心跳任务的线程数。 +- 引入版本:- + +##### ignore_materialized_view_error + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:FE 是否忽略物化视图错误导致的元数据异常。如果 FE 因物化视图错误导致的元数据异常而无法启动,您可以将此参数设置为 `true`,以允许 FE 忽略此异常。 +- 引入版本:v2.5.10 + +##### ignore_meta_check + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:非 Leader FE 是否忽略 Leader FE 的元数据差异。如果值为 TRUE,非 Leader FE 将忽略 Leader FE 的元数据差异,并继续提供数据读取服务。此参数可确保即使您长时间停止 Leader FE,也能持续提供数据读取服务。如果值为 FALSE,非 Leader FE 不会忽略 Leader FE 的元数据差异,并停止提供数据读取服务。 +- 引入版本:- + +##### ignore_task_run_history_replay_error + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:当 StarRocks 为 `information_schema.task_runs` 反序列化 TaskRun 历史行时,损坏或无效的 JSON 行通常会导致反序列化记录警告并抛出 RuntimeException。如果此项设置为 `true`,系统将捕获反序列化错误,跳过格式错误的记录,并继续处理剩余行,而不是使查询失败。这将使 `information_schema.task_runs` 查询能够容忍 `_statistics_.task_run_history` 表中的错误条目。请注意,启用它将静默删除损坏的历史记录(潜在数据丢失),而不是显式报错。 +- 引入版本:v3.3.3, v3.4.0, v3.5.0 + +##### lock_checker_interval_second + +- 默认值:30 +- 类型:long +- 单位:秒 +- 是否可变:是 +- 描述:LockChecker 前端守护进程(名为 "deadlock-checker")执行之间的时间间隔(秒)。该守护进程执行死锁检测和慢锁扫描;配置的值乘以 1000 以毫秒为单位设置计时器。减小此值可降低检测延迟,但会增加调度和 CPU 开销;增加此值可降低开销,但会延迟检测和慢锁报告。由于守护进程每次运行都会重置其间隔,因此更改在运行时生效。此设置与 `lock_checker_enable_deadlock_check`(启用死锁检查)和 `slow_lock_threshold_ms`(定义什么是慢锁)交互。 +- 引入版本:v3.2.0 + +##### master_sync_policy + +- 默认值:SYNC +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:Leader FE 将日志刷新到磁盘的策略。此参数仅在当前 FE 为 Leader FE 时有效。有效值: + - `SYNC`:事务提交时,日志条目同时生成并刷新到磁盘。 + - `NO_SYNC`:事务提交时,日志条目的生成和刷新不同时发生。 + - `WRITE_NO_SYNC`:事务提交时,日志条目同时生成但不刷新到磁盘。 + + 如果您只部署了一个 Follower FE,建议将此参数设置为 `SYNC`。如果您部署了三个或更多 Follower FE,建议将此参数和 `replica_sync_policy` 都设置为 `WRITE_NO_SYNC`。 + +- 引入版本:- + +##### max_bdbje_clock_delta_ms + +- 默认值:5000 +- 类型:Long +- 单位:毫秒 +- 是否可变:否 +- 描述:StarRocks 集群中 Leader FE 与 Follower 或 Observer FE 之间允许的最大时钟偏移。 +- 引入版本:- + +##### meta_delay_toleration_second + +- 默认值:300 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:Follower 和 Observer FE 上的元数据与 Leader FE 上的元数据之间允许的最大延迟持续时间。单位:秒。如果超过此持续时间,非 Leader FE 将停止提供服务。 +- 引入版本:- + +##### meta_dir + +- 默认值:StarRocksFE.STARROCKS_HOME_DIR + "/meta" +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:存储元数据的目录。 +- 引入版本:- + +##### metadata_ignore_unknown_operation_type + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否忽略未知日志 ID。当 FE 回滚时,早期版本的 FE 可能无法识别某些日志 ID。如果值为 `TRUE`,FE 将忽略未知日志 ID。如果值为 `FALSE`,FE 将退出。 +- 引入版本:- + +##### profile_info_format + +- 默认值:default +- 类型:String +- 单位:- +- 是否可变:是 +- 描述:系统输出的 Profile 格式。有效值:`default` 和 `json`。当设置为 `default` 时,Profile 为默认格式。当设置为 `json` 时,系统以 JSON 格式输出 Profile。 +- 引入版本:v2.5 + +##### replica_ack_policy + +- 默认值:SIMPLE_MAJORITY +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:日志条目被视为有效的策略。默认值 `SIMPLE_MAJORITY` 指定如果大多数 Follower FE 返回 ACK 消息,则日志条目被视为有效。 +- 引入版本:- + +##### replica_sync_policy + +- 默认值:SYNC +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:Follower FE 将日志刷新到磁盘的策略。此参数仅在当前 FE 为 Follower FE 时有效。有效值: + - `SYNC`:事务提交时,日志条目同时生成并刷新到磁盘。 + - `NO_SYNC`:事务提交时,日志条目的生成和刷新不同时发生。 + - `WRITE_NO_SYNC`:事务提交时,日志条目同时生成但不刷新到磁盘。 +- 引入版本:- + +##### start_with_incomplete_meta + +- 默认值:false +- 类型:boolean +- 单位:- +- 是否可变:否 +- 描述:当为 true 时,FE 将允许在镜像数据存在但 Berkeley DB JE (BDB) 日志文件丢失或损坏时启动。`MetaHelper.checkMetaDir()` 使用此标志绕过安全检查,否则会阻止在没有相应 BDB 
日志的情况下从镜像启动;以这种方式启动可能会产生陈旧或不一致的元数据,并且仅应用于紧急恢复。`RestoreClusterSnapshotMgr` 在恢复集群快照时暂时将此标志设置为 true,然后回滚;该组件在恢复期间也会切换 `bdbje_reset_election_group`。在正常操作中不要启用 — 仅在从损坏的 BDB 数据恢复或显式恢复基于镜像的快照时才启用。 +- 引入版本:v3.2.0 + +##### table_keeper_interval_second + +- 默认值:30 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:TableKeeper 守护进程执行之间的时间间隔(秒)。TableKeeperDaemon 使用此值(乘以 1000)设置其内部计时器,并定期运行 keeper 任务,以确保历史表存在,正确的表属性(复制数量)并更新分区 TTL。守护进程仅在 Leader 节点上执行工作,并在 `table_keeper_interval_second` 更改时通过 setInterval 更新其运行时间隔。增加以减少调度频率和负载;减少以更快地响应缺失或陈旧的历史表。 +- 引入版本:v3.3.1, v3.4.0, v3.5.0 + +##### task_runs_ttl_second + +- 默认值:7 * 24 * 3600 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:控制任务运行历史记录的生存时间 (TTL)。降低此值可缩短历史保留时间并减少内存/磁盘使用;提高它可使历史记录保留更长时间,但会增加资源使用。与 `task_runs_max_history_number` 和 `enable_task_history_archive` 一起调整,以实现可预测的保留和存储行为。 +- 引入版本:v3.2.0 + +##### task_ttl_second + +- 默认值:24 * 3600 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:任务的生存时间 (TTL)。对于手动任务(未设置计划时),TaskBuilder 使用此值计算任务的 `expireTime` (`expireTime = now + task_ttl_second * 1000L`)。TaskRun 还使用此值作为计算运行执行超时的上限 — 有效执行超时是 `min(task_runs_timeout_second, task_runs_ttl_second, task_ttl_second)`。调整此值会改变手动创建任务的有效时间长度,并可能间接限制任务运行的最大允许执行时间。 +- 引入版本:v3.2.0 + +##### thrift_rpc_retry_times + +- 默认值:3 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:控制 Thrift RPC 调用将进行的尝试总数。此值由 `ThriftRPCRequestExecutor`(以及 `NodeMgr` 和 `VariableMgr` 等调用者)用作重试的循环计数 — 即,值 3 允许最多三次尝试,包括初始尝试。在 `TTransportException` 上,执行器将尝试重新打开连接并重试此计数;当原因是 `SocketTimeoutException` 或重新打开失败时,它不会重试。每次尝试都受 `thrift_rpc_timeout_ms` 配置的每次尝试超时限制。增加此值可提高对瞬态连接故障的弹性,但可能会增加整体 RPC 延迟和资源使用。 +- 引入版本:v3.2.0 + +##### thrift_rpc_strict_mode + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:控制 Thrift 服务器使用的 TBinaryProtocol "strict read" 模式。此值作为第一个参数传递给 Thrift 服务器堆栈中的 `org.apache.thrift.protocol.TBinaryProtocol.Factory`,并影响如何解析和验证传入的 Thrift 消息。当 `true`(默认值)时,服务器强制执行严格的 Thrift 编码/版本检查并遵守配置的 `thrift_rpc_max_body_size` 限制;当 `false` 时,服务器接受非严格(旧版/宽松)消息格式,这可以提高与旧客户端的兼容性,但可能会绕过某些协议验证。在运行中的集群上更改此值时请谨慎,因为它不可变且会影响互操作性和解析安全性。 +- 引入版本:v3.2.0 + +##### thrift_rpc_timeout_ms + +- 默认值:10000 +- 类型:Int +- 单位:毫秒 +- 是否可变:是 +- 描述:用作 Thrift RPC 调用的默认网络/套接字超时的毫秒数。在 `ThriftConnectionPool` 中创建 Thrift 客户端时(由前端和后端池使用)传递给 TSocket,并且在计算 RPC 调用超时时(例如 `ConfigBase`、`LeaderOpExecutor`、`GlobalStateMgr`、`NodeMgr`、`VariableMgr` 和 `CheckpointWorker` 等地方)也添加到操作的执行超时(例如 ExecTimeout*1000 + `thrift_rpc_timeout_ms`)。增加此值可使 RPC 调用容忍更长的网络或远程处理延迟;降低此值可在慢速网络上更快地进行故障转移。更改此值会影响执行 Thrift RPC 的 FE 代码路径中的连接创建和请求截止日期。 +- 引入版本:v3.2.0 + +##### txn_latency_metric_report_groups + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:是 +- 描述:以逗号分隔的事务延迟指标组列表,用于报告。加载类型被分类为逻辑组以进行监控。启用某个组时,其名称将作为“type”标签添加到事务指标中。有效值:`stream_load`、`routine_load`、`broker_load`、`insert` 和 `compaction`(仅适用于共享数据集群)。示例:`"stream_load,routine_load"`。 +- 引入版本:v4.0 + +##### txn_rollback_limit + +- 默认值:100 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:可以回滚的最大事务数。 +- 引入版本:- + +### 用户、角色和权限 + +##### enable_task_info_mask_credential + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:当为 true 时,StarRocks 在将任务 SQL 定义返回到 `information_schema.tasks` 和 `information_schema.task_runs` 之前,通过对 DEFINITION 列应用 `SqlCredentialRedactor.redact` 来编辑凭据。在 `information_schema.task_runs` 中,无论定义来自任务运行状态,还是在为空时来自任务定义查找,都应用相同的编辑。当为 false 时,返回原始任务定义(可能会暴露凭据)。掩码是 CPU/字符串处理工作,当任务或 `task_runs` 数量很大时可能会非常耗时;仅当您需要未编辑的定义并接受安全风险时才禁用。 +- 引入版本:v3.5.6 + +##### privilege_max_role_depth + +- 默认值:16 +- 类型:Int +- 单位: +- 是否可变:是 +- 描述:角色的最大角色深度(继承级别)。 +- 引入版本:v3.0.0 + +##### privilege_max_total_roles_per_user + +- 默认值:64 +- 类型:Int +- 单位: +- 是否可变:是 +- 描述:一个用户可以拥有的最大角色数。 +- 引入版本:v3.0.0 + +### 查询引擎 + +##### 
brpc_send_plan_fragment_timeout_ms + +- 默认值:60000 +- 类型:Int +- 单位:毫秒 +- 是否可变:是 +- 描述:在发送计划片段之前应用于 BRPC TalkTimeoutController 的超时(毫秒)。`BackendServiceClient.sendPlanFragmentAsync` 在调用后端 `execPlanFragmentAsync` 之前设置此值。它控制 BRPC 在从连接池借用空闲连接以及执行发送时将等待多长时间;如果超出,RPC 将失败并可能触发方法的重试逻辑。在争用情况下,将其设置得较低可快速失败;将其设置得较高可容忍瞬态池耗尽或慢速网络。请谨慎:非常大的值可能会延迟故障检测并阻塞请求线程。 +- 引入版本:v3.3.11, v3.4.1, v3.5.0 + +##### connector_table_query_trigger_analyze_large_table_interval + +- 默认值:12 * 3600 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:大表的查询触发 ANALYZE 任务的时间间隔。 +- 引入版本:v3.4.0 + +##### connector_table_query_trigger_analyze_max_pending_task_num + +- 默认值:100 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:FE 上处于 Pending 状态的查询触发 ANALYZE 任务的最大数量。 +- 引入版本:v3.4.0 + +##### connector_table_query_trigger_analyze_max_running_task_num + +- 默认值:2 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:FE 上处于 Running 状态的查询触发 ANALYZE 任务的最大数量。 +- 引入版本:v3.4.0 + +##### connector_table_query_trigger_analyze_small_table_interval + +- 默认值:2 * 3600 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:小表的查询触发 ANALYZE 任务的时间间隔。 +- 引入版本:v3.4.0 + +##### connector_table_query_trigger_analyze_small_table_rows + +- 默认值:10000000 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:查询触发 ANALYZE 任务判断表是否为小表的阈值。 +- 引入版本:v3.4.0 + +##### connector_table_query_trigger_task_schedule_interval + +- 默认值:30 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:调度器线程调度查询触发后台任务的时间间隔。此项取代了 v3.4.0 中引入的 `connector_table_query_trigger_analyze_schedule_interval`。此处,后台任务指 v3.4 中的 `ANALYZE` 任务,以及 v3.4 之后版本中低基数列字典的收集任务。 +- 引入版本:v3.4.2 + +##### create_table_max_serial_replicas + +- 默认值:128 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:串行创建副本的最大数量。如果实际副本数量超过此值,则将并发创建副本。如果表创建时间过长,请尝试减小此值。 +- 引入版本:- + +##### default_mv_partition_refresh_number + +- 默认值:1 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:当物化视图刷新涉及多个分区时,此参数默认控制在单个批次中刷新多少个分区。 +从 v3.3.0 版本开始,系统默认一次刷新一个分区,以避免潜在的 OOM(内存溢出)问题。在早期版本中,默认一次刷新所有分区,这可能导致内存耗尽和任务失败。但是,请注意,当物化视图刷新涉及大量分区时,一次只刷新一个分区可能会导致过多的调度开销、更长的总体刷新时间以及大量的刷新记录。在这种情况下,建议适当调整此参数以提高刷新效率并降低调度成本。 +- 引入版本:v3.3.0 + +##### default_mv_refresh_immediate + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否在创建异步物化视图后立即刷新。当此项设置为 `true` 时,新创建的物化视图将立即刷新。 +- 引入版本:v3.2.3 + +##### dynamic_partition_check_interval_seconds + +- 默认值:600 +- 类型:Long +- 单位:秒 +- 是否可变:是 +- 描述:检查新数据的时间间隔。如果检测到新数据,StarRocks 会自动为数据创建分区。 +- 引入版本:- + +##### dynamic_partition_enable + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否启用动态分区功能。启用此功能后,StarRocks 会动态为新数据创建分区,并自动删除过期分区以确保数据的时效性。 +- 引入版本:- + +##### enable_active_materialized_view_schema_strict_check + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:在激活非活动物化视图时,是否严格检查数据类型长度一致性。当此项设置为 `false` 时,如果基表中的数据类型长度已更改,则物化视图的激活不受影响。 +- 引入版本:v3.3.4 + +##### enable_auto_collect_array_ndv + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否启用 ARRAY 类型 NDV 信息的自动收集。 +- 引入版本:v4.0 + +##### enable_backup_materialized_view + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:在备份或恢复特定数据库时,是否启用异步物化视图的 BACKUP 和 RESTORE。如果此项设置为 `false`,StarRocks 将跳过备份异步物化视图。 +- 引入版本:v3.2.0 + +##### enable_collect_full_statistic + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否启用自动全量统计信息收集。此功能默认启用。 +- 引入版本:- + +##### enable_colocate_mv_index + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:创建同步物化视图时,是否支持将同步物化视图索引与基表进行 Colocate。如果此项设置为 `true`,Tablet sink 将加速同步物化视图的写入性能。 +- 引入版本:v3.2.0 + +##### enable_decimal_v3 + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否支持 DECIMAL V3 数据类型。 +- 引入版本:- + +##### enable_experimental_mv + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否启用异步物化视图功能。TRUE 表示启用此功能。从 v2.5.2 版本开始,此功能默认启用。对于 v2.5.2 之前的版本,此功能默认禁用。 +- 引入版本:v2.4 + +##### enable_local_replica_selection + +- 
默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:查询是否选择本地副本。本地副本可降低网络传输成本。如果此参数设置为 TRUE,CBO 优先选择与当前 FE 具有相同 IP 地址的 BE 上的 Tablet 副本。如果此参数设置为 `FALSE`,则可以选择本地副本和非本地副本。 +- 引入版本:- + +##### enable_manual_collect_array_ndv + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否启用 ARRAY 类型 NDV 信息的手动收集。 +- 引入版本:v4.0 + +##### enable_materialized_view + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否启用物化视图创建。 +- 引入版本:- + +##### enable_materialized_view_external_table_precise_refresh + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:将此项设置为 `true` 以启用物化视图刷新时的内部优化,当基表是外部(非云原生)表时。启用后,物化视图刷新处理器会计算候选分区并仅刷新受影响的基表分区,而不是所有分区,从而减少 I/O 和刷新成本。将其设置为 `false` 以强制对外部表进行全分区刷新。 +- 引入版本:v3.2.9 + +##### enable_materialized_view_metrics_collect + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否默认收集异步物化视图的监控指标。 +- 引入版本:v3.1.11, v3.2.5 + +##### enable_materialized_view_spill + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否启用物化视图刷新任务的中间结果溢出。 +- 引入版本:v3.1.1 + +##### enable_materialized_view_text_based_rewrite + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否默认启用基于文本的查询重写。如果此项设置为 `true`,系统将在创建异步物化视图时构建抽象语法树。 +- 引入版本:v3.2.5 + +##### enable_mv_automatic_active_check + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否启用系统自动检查并重新激活因其基表(视图)发生 Schema 变更或被删除和重新创建而设置为非活动的异步物化视图。请注意,此功能不会重新激活用户手动设置为非活动的物化视图。 +- 引入版本:v3.1.6 + +##### enable_mv_automatic_repairing_for_broken_base_tables + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:当此项设置为 `true` 时,StarRocks 将尝试在基外部表被删除并重新创建或其表标识符更改时自动修复物化视图基表元数据。修复流程可以更新物化视图的基表信息,收集外部表分区的分区级修复信息,并驱动异步自动刷新物化视图的分区刷新决策,同时遵守 `autoRefreshPartitionsLimit`。目前,自动修复支持 Hive 外部表;不支持的表类型将导致物化视图设置为非活动状态并引发修复异常。分区信息收集是非阻塞的,并且会记录失败。 +- 引入版本:v3.3.19, v3.4.8, v3.5.6 + +##### enable_predicate_columns_collection + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否启用谓词列收集。如果禁用,谓词列在查询优化期间将不会被记录。 +- 引入版本:- + +##### enable_query_queue_v2 + +- 默认值:true +- 类型:boolean +- 单位:- +- 是否可变:否 +- 描述:当为 true 时,将 FE 基于 Slot 的查询调度器切换到 Query Queue V2。此标志由 Slot 管理器和跟踪器(例如 `BaseSlotManager.isEnableQueryQueueV2` 和 `SlotTracker#createSlotSelectionStrategy`)读取,以选择 `SlotSelectionStrategyV2` 而不是旧版策略。`query_queue_v2_xxx` 配置选项和 `QueryQueueOptions` 仅在此标志启用时生效。从 v4.1 开始,默认值从 `false` 更改为 `true`。 +- 引入版本:v3.3.4, v3.4.0, v3.5.0 + +##### enable_sql_blacklist + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否启用 SQL 查询黑名单检查。启用此功能后,黑名单中的查询无法执行。 +- 引入版本:- + +##### enable_statistic_collect + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否为 CBO 收集统计信息。此功能默认启用。 +- 引入版本:- + +##### enable_statistic_collect_on_first_load + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:控制数据加载操作触发的自动统计信息收集和维护。这包括: + - 数据首次加载到分区时(分区版本等于 2)的统计信息收集。 + - 数据加载到多分区表的空分区时的统计信息收集。 + - INSERT OVERWRITE 操作的统计信息复制和更新。 + + **统计信息收集类型决策策略:** + + - 对于 INSERT OVERWRITE:`deltaRatio = |targetRows - sourceRows| / (sourceRows + 1)` + - 如果 `deltaRatio < statistic_sample_collect_ratio_threshold_of_first_load`(默认值:0.1),则不执行统计信息收集。仅复制现有统计信息。 + - 否则,如果 `targetRows > statistic_sample_collect_rows`(默认值:200000),则使用 SAMPLE 统计信息收集。 + - 否则,使用 FULL 统计信息收集。 + + - 对于首次加载:`deltaRatio = loadRows / (totalRows + 1)` + - 如果 `deltaRatio < statistic_sample_collect_ratio_threshold_of_first_load`(默认值:0.1),则不执行统计信息收集。 + - 否则,如果 `loadRows > statistic_sample_collect_rows`(默认值:200000),则使用 SAMPLE 统计信息收集。 + - 否则,使用 FULL 统计信息收集。 + + **同步行为:** + + - 对于 DML 语句 (INSERT INTO/INSERT OVERWRITE):同步模式,带表锁。加载操作等待统计信息收集完成(最长 `semi_sync_collect_statistic_await_seconds`)。 + - 对于 Stream Load 和 Broker Load:异步模式,不带锁。统计信息收集在后台运行,不阻塞加载操作。 + + :::note + 禁用此配置将阻止所有加载触发的统计信息操作,包括 INSERT OVERWRITE 
的统计信息维护,这可能导致表缺乏统计信息。如果频繁创建新表并频繁加载数据,启用此功能将增加内存和 CPU 开销。 + ::: + +- 引入版本:v3.1 + +##### enable_statistic_collect_on_update + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:控制 UPDATE 语句是否可以触发自动统计信息收集。启用后,修改表数据的 UPDATE 操作可能会通过与 `enable_statistic_collect_on_first_load` 控制的基于摄取数据的统计信息框架调度统计信息收集。禁用此配置将跳过 UPDATE 语句的统计信息收集,同时保持加载触发的统计信息收集行为不变。 +- 引入版本:v3.5.11, v4.0.4 + +##### enable_udf + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:是否启用 UDF。 +- 引入版本:- + +##### expr_children_limit + +- 默认值:10000 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:一个表达式中允许的最大子表达式数量。 +- 引入版本:- + +##### histogram_buckets_size + +- 默认值:64 +- 类型:Long +- 单位:- +- 是否可变:是 +- 描述:直方图的默认桶数。 +- 引入版本:- + +##### histogram_max_sample_row_count + +- 默认值:10000000 +- 类型:Long +- 单位:- +- 是否可变:是 +- 描述:为直方图收集的最大行数。 +- 引入版本:- + +##### histogram_mcv_size + +- 默认值:100 +- 类型:Long +- 单位:- +- 是否可变:是 +- 描述:直方图的最常见值 (MCV) 数量。 +- 引入版本:- + +##### histogram_sample_ratio + +- 默认值:0.1 +- 类型:Double +- 单位:- +- 是否可变:是 +- 描述:直方图的采样率。 +- 引入版本:- + +##### http_slow_request_threshold_ms + +- 默认值:5000 +- 类型:Int +- 单位:毫秒 +- 是否可变:是 +- 描述:如果 HTTP 请求的响应时间超过此参数指定的值,则生成日志以跟踪此请求。 +- 引入版本:v2.5.15, v3.1.5 + +##### lock_checker_enable_deadlock_check + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:启用后,LockChecker 线程使用 ThreadMXBean.findDeadlockedThreads() 执行 JVM 级死锁检测,并记录违规线程的堆栈跟踪。该检查在 LockChecker 守护进程(其频率由 `lock_checker_interval_second` 控制)内部运行,并将详细的堆栈信息写入日志,这可能需要大量的 CPU 和 I/O。仅在故障排除活动或可重现的死锁问题时才启用此选项;在正常操作中保持启用状态可能会增加开销和日志量。 +- 引入版本:v3.2.0 + +##### low_cardinality_threshold + +- 默认值:255 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:低基数字典的阈值。 +- 引入版本:v3.5.0 + +##### materialized_view_min_refresh_interval + +- 默认值:60 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:ASYNC 物化视图调度的最小允许刷新间隔(秒)。当使用基于时间的间隔创建物化视图时,间隔会转换为秒,并且不得小于此值;否则 CREATE/ALTER 操作将因 DDL 错误而失败。如果此值大于 0,则强制执行检查;将其设置为 0 或负值以禁用限制,这可以防止过度的 TaskManager 调度和过高频率刷新导致的 FE 内存/CPU 使用。此项不适用于 EVENT_TRIGGERED 刷新。 +- 引入版本:v3.3.0, v3.4.0, v3.5.0 + +##### materialized_view_refresh_ascending + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:当此项设置为 `true` 时,物化视图分区刷新将按分区键升序(从旧到新)迭代分区。当设置为 `false`(默认值)时,系统将按降序(从新到旧)迭代。StarRocks 在列表分区和范围分区物化视图刷新逻辑中都使用此项,以选择在应用分区刷新限制时要处理的分区,并计算后续 TaskRun 执行的下一个开始/结束分区边界。更改此项会改变哪些分区首先刷新以及如何派生下一个分区范围;对于范围分区物化视图,调度器会验证新的开始/结束,如果更改会创建重复边界(死循环),则会引发错误,因此请谨慎设置此项。 +- 引入版本:v3.3.1, v3.4.0, v3.5.0 + +##### max_allowed_in_element_num_of_delete + +- 默认值:10000 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:DELETE 语句中 IN 谓词允许的最大元素数量。 +- 引入版本:- + +##### max_create_table_timeout_second + +- 默认值:600 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:创建表的最大超时持续时间。 +- 引入版本:- + +##### max_distribution_pruner_recursion_depth + +- 默认值:100 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:分区修剪器允许的最大递归深度。增加递归深度可以修剪更多元素,但也会增加 CPU 消耗。 +- 引入版本:- + +##### max_partitions_in_one_batch + +- 默认值:4096 +- 类型:Long +- 单位:- +- 是否可变:是 +- 描述:批量创建分区时可以创建的最大分区数。 +- 引入版本:- + +##### max_planner_scalar_rewrite_num + +- 默认值:100000 +- 类型:Long +- 单位:- +- 是否可变:是 +- 描述:优化器可以重写标量操作符的最大次数。 +- 引入版本:- + +##### max_query_queue_history_slots_number + +- 默认值:0 +- 类型:Int +- 单位:Slot +- 是否可变:是 +- 描述:控制每个查询队列保留多少最近释放的(历史)已分配 Slot 以进行监控和可观测性。当 `max_query_queue_history_slots_number` 设置为大于 `0` 的值时,BaseSlotTracker 会在内存队列中保留最多指定数量的最新释放的 LogicalSlot 条目,并在超出限制时驱逐最旧的条目。启用此功能会导致 getSlots() 包含这些历史条目(最新的在前),允许 BaseSlotTracker 尝试使用 ConnectContext 注册 Slot 以获取更丰富的 ExtraMessage 数据,并允许 LogicalSlot.ConnectContextListener 将查询完成元数据附加到历史 Slot。当 `max_query_queue_history_slots_number` `<= 0` 时,历史机制被禁用(不使用额外内存)。使用合理的值来平衡可观测性和内存开销。 +- 引入版本:v3.5.0 + +##### max_query_retry_time + +- 默认值:2 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:FE 上查询的最大重试次数。 +- 引入版本:- + +##### 
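+
+以下示例演示如何动态调整查询在 FE 上的最大重试次数(取值 `3` 仅为演示用的假设值):
+
+```SQL
+ADMIN SET FRONTEND CONFIG ("max_query_retry_time" = "3");
+```
+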
+##### max_running_rollup_job_num_per_table
+
+- 默认值:1
+- 类型:Int
+- 单位:-
+- 是否可变:是
+- 描述:一张表可以并行运行的 Rollup 作业的最大数量。
+- 引入版本:-
+
+##### max_scalar_operator_flat_children
+
+- 默认值:10000
+- 类型:Int
+- 单位:-
+- 是否可变:是
+- 描述:ScalarOperator 的最大扁平子节点数。您可以设置此限制以防止优化器使用过多内存。
+- 引入版本:-
+
+##### max_scalar_operator_optimize_depth
+
+- 默认值:256
+- 类型:Int
+- 单位:-
+- 是否可变:是
+- 描述:ScalarOperator 优化可以应用的最大深度。
+- 引入版本:-
+
+##### mv_active_checker_interval_seconds
+
+- 默认值:60
+- 类型:Long
+- 单位:秒
+- 是否可变:是
+- 描述:当后台 active_checker 线程启用时,系统将定期检测并自动重新激活因其基表(或视图)的 Schema 变更或重建而变为非活动的物化视图。此参数控制检查器线程的调度间隔(秒)。
+- 引入版本:v3.1.6
+
+##### mv_rewrite_consider_data_layout_mode
+
+- 默认值:`enable`
+- 类型:String
+- 单位:-
+- 是否可变:是
+- 描述:控制物化视图重写在选择最佳物化视图时是否应考虑基表数据布局。有效值:
+  - `disable`:在选择候选物化视图时,从不使用数据布局标准。
+  - `enable`:仅当查询被识别为对布局敏感时才使用数据布局标准。
+  - `force`:在选择最佳物化视图时,始终应用数据布局标准。
+  更改此项会影响 `BestMvSelector` 的行为,并可能基于物理布局对计划正确性或性能的影响,改进或扩大重写的适用范围。
+- 引入版本:-
+
+##### publish_version_interval_ms
+
+- 默认值:10
+- 类型:Int
+- 单位:毫秒
+- 是否可变:否
+- 描述:发布版本 (Publish Version) 任务下发的时间间隔。
+- 引入版本:-
+
+##### query_queue_slots_estimator_strategy
+
+- 默认值:MAX
+- 类型:String
+- 单位:-
+- 是否可变:是
+- 描述:当 `enable_query_queue_v2` 为 true 时,选择用于基于队列的查询的 Slot 估算策略。有效值:MBE(基于内存)、PBE(基于并行度)、MAX(取 MBE 和 PBE 的最大值)和 MIN(取 MBE 和 PBE 的最小值)。MBE 根据预测内存或计划成本除以每个 Slot 的内存目标来估算 Slot 数,并以 `totalSlots` 为上限。PBE 根据片段并行度(扫描范围计数或基数 / 每 Slot 行数)和基于 CPU 成本的计算(使用每个 Slot 的 CPU 成本)派生 Slot 数,然后将结果限制在 [numSlots/2, numSlots] 范围内。MAX 和 MIN 通过取二者的最大值或最小值来组合 MBE 和 PBE。如果配置值无效,则使用默认值 (`MAX`)。
+- 引入版本:v3.5.0
+
+##### query_queue_v2_concurrency_level
+
+- 默认值:4
+- 类型:Int
+- 单位:-
+- 是否可变:是
+- 描述:控制计算系统总查询 Slot 时使用的逻辑并发“层”数。在 Shared-Nothing 模式下,总 Slot = `query_queue_v2_concurrency_level` * BE 数量 * 每个 BE 的核心数(从 BackendResourceStat 派生)。在多仓库模式下,有效并发会缩减到 max(1, `query_queue_v2_concurrency_level` / 4)。如果配置值为非正数,则将其视为 `4`。更改此值会增加或减少总 Slot(以及因此的并发查询容量),并影响每个 Slot 的资源:memBytesPerSlot 通过将每个 worker 内存除以(每个 worker 的核心数 * 并发)得出,CPU 核算使用 `query_queue_v2_cpu_costs_per_slot`。请将其设置为与集群规模成比例的值;过大的值可能会减少每个 Slot 的内存并导致资源碎片化。
+- 引入版本:v3.3.4, v3.4.0, v3.5.0
+
+##### query_queue_v2_cpu_costs_per_slot
+
+- 默认值:1000000000
+- 类型:Long
+- 单位:规划器 CPU 成本单位
+- 是否可变:是
+- 描述:每个 Slot 的 CPU 成本阈值,用于根据查询的规划器 CPU 成本估算查询所需的 Slot 数量。调度器将 Slot 数计算为整数 (plan_cpu_costs / `query_queue_v2_cpu_costs_per_slot`),然后将结果限制在 [1, totalSlots] 范围内(totalSlots 由查询队列 V2 的相关参数派生)。系统会将非正值规范化为 `1`(即 Math.max(1, value))。增加此值可减少每个查询分配的 Slot(倾向于更少、更大的 Slot);减少此值会增加每个查询的 Slot 数。请与 `query_queue_v2_num_rows_per_slot` 和并发设置一起调整,以控制并行度与资源粒度。
+- 引入版本:v3.3.4, v3.4.0, v3.5.0
+
+##### query_queue_v2_num_rows_per_slot
+
+- 默认值:4096
+- 类型:Int
+- 单位:行
+- 是否可变:是
+- 描述:估算每个查询的 Slot 数时,分配给单个调度 Slot 的目标源行数。StarRocks 计算 estimated_slots = (Source Node 的基数) / `query_queue_v2_num_rows_per_slot`,然后将结果限制在 [1, totalSlots] 范围内,如果计算值为非正数,则强制最小值为 1。totalSlots 从可用资源(大约 DOP * `query_queue_v2_concurrency_level` * worker/BE 数量)派生,因此取决于集群/核心数量。增加此值可减少 Slot 数(每个 Slot 处理更多行)并降低调度开销;减少它可增加并行度(更多、更小的 Slot),直到达到资源限制。
+- 引入版本:v3.3.4, v3.4.0, v3.5.0
+
+##### query_queue_v2_schedule_strategy
+
+- 默认值:SWRR
+- 类型:String
+- 单位:-
+- 是否可变:是
+- 描述:选择 Query Queue V2 用于排序待处理查询的调度策略。支持的值(不区分大小写)为 `SWRR`(Smooth Weighted Round Robin,默认值,适用于需要公平加权共享的混合工作负载)和 `SJF`(Short Job First + Aging,优先处理短作业,同时使用老化机制避免饥饿)。该值通过不区分大小写的枚举查找进行解析;无法识别的值将记录为错误并回退到默认策略。此配置仅在 Query Queue V2 启用时生效,并与 V2 的容量设置(例如 `query_queue_v2_concurrency_level`)交互。
+- 引入版本:v3.3.12, v3.4.2, v3.5.0
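+
+以下示例演示如何确认并调整 Query Queue V2 的调度行为(取值仅为演示用的假设值):
+
+```SQL
+-- 确认 Query Queue V2 已启用(该项不可动态修改,需在 fe.conf 中设置)
+ADMIN SHOW FRONTEND CONFIG LIKE "enable_query_queue_v2";
+-- 动态调整并发层数与调度策略(假设值,仅作演示)
+ADMIN SET FRONTEND CONFIG ("query_queue_v2_concurrency_level" = "8");
+ADMIN SET FRONTEND CONFIG ("query_queue_v2_schedule_strategy" = "SJF");
+```
+
+##### semi_sync_collect_statistic_await_seconds
+
+- 默认值:30
+- 类型:Int
+- 单位:秒
+- 是否可变:是
+- 描述:DML 操作(INSERT INTO 和 INSERT OVERWRITE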
语句)期间半同步统计信息收集的最大等待时间。Stream Load 和 Broker Load 使用异步模式,不受此配置影响。如果统计信息收集时间超过此值,加载操作将继续而不等待收集完成。此配置与 `enable_statistic_collect_on_first_load` 协同工作。 +- 引入版本:v3.1 + +##### slow_query_analyze_threshold + +- 默认值:5 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:查询执行时间阈值,用于触发查询反馈分析。 +- 引入版本:v3.4.0 + +##### statistic_analyze_status_keep_second + +- 默认值:3 * 24 * 3600 +- 类型:Long +- 单位:秒 +- 是否可变:是 +- 描述:保留收集任务历史记录的持续时间。默认值为 3 天。 +- 引入版本:- + +##### statistic_auto_analyze_end_time + +- 默认值:23:59:59 +- 类型:String +- 单位:- +- 是否可变:是 +- 描述:自动收集的结束时间。取值范围:`00:00:00` - `23:59:59`。 +- 引入版本:- + +##### statistic_auto_analyze_start_time + +- 默认值:00:00:00 +- 类型:String +- 单位:- +- 是否可变:是 +- 描述:自动收集的开始时间。取值范围:`00:00:00` - `23:59:59`。 +- 引入版本:- + +##### statistic_auto_collect_ratio + +- 默认值:0.8 +- 类型:Double +- 单位:- +- 是否可变:是 +- 描述:判断自动收集统计信息是否健康的阈值。如果统计信息健康度低于此阈值,则触发自动收集。 +- 引入版本:- + +##### statistic_auto_collect_small_table_rows + +- 默认值:10000000 +- 类型:Long +- 单位:- +- 是否可变:是 +- 描述:自动收集期间判断外部数据源(Hive、Iceberg、Hudi)中的表是否为小表的阈值。如果表的行数小于此值,则认为该表为小表。 +- 引入版本:v3.2 + +##### statistic_cache_columns + +- 默认值:100000 +- 类型:Long +- 单位:- +- 是否可变:否 +- 描述:统计信息表可以缓存的行数。 +- 引入版本:- + +##### statistic_cache_thread_pool_size + +- 默认值:10 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:用于刷新统计信息缓存的线程池大小。 +- 引入版本:- + +##### statistic_collect_interval_sec + +- 默认值:5 * 60 +- 类型:Long +- 单位:秒 +- 是否可变:是 +- 描述:自动收集期间检查数据更新的间隔。 +- 引入版本:- + +##### statistic_max_full_collect_data_size + +- 默认值:100 * 1024 * 1024 * 1024 +- 类型:Long +- 单位:字节 +- 是否可变:是 +- 描述:统计信息自动收集的数据大小阈值。如果总大小超过此值,则执行采样收集而不是全量收集。 +- 引入版本:- + +##### statistic_sample_collect_rows + +- 默认值:200000 +- 类型:Long +- 单位:- +- 是否可变:是 +- 描述:在加载触发的统计信息操作期间,用于决定 SAMPLE 和 FULL 统计信息收集的行数阈值。如果加载或更改的行数超过此阈值(默认 200,000),则使用 SAMPLE 统计信息收集;否则,使用 FULL 统计信息收集。此设置与 `enable_statistic_collect_on_first_load` 和 `statistic_sample_collect_ratio_threshold_of_first_load` 协同工作。 +- 引入版本:- + +##### statistic_update_interval_sec + +- 默认值:24 * 60 * 60 +- 类型:Long +- 单位:秒 +- 是否可变:是 +- 描述:统计信息缓存的更新间隔。 +- 引入版本:- + +##### task_check_interval_second + +- 默认值:60 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:任务后台作业执行之间的时间间隔。GlobalStateMgr 使用此值调度 TaskCleaner FrontendDaemon,该守护进程调用 `doTaskBackgroundJob()`;该值乘以 1000 以毫秒为单位设置守护进程间隔。减小此值可使后台维护(任务清理、检查)运行更频繁并更快响应,但会增加 CPU/IO 开销;增加它可减少开销,但会延迟清理和陈旧任务的检测。调整此值以平衡维护响应速度和资源使用。 +- 引入版本:v3.2.0 + +##### task_min_schedule_interval_s + +- 默认值:10 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:SQL 层检查任务调度的最小允许调度间隔(秒)。提交任务时,TaskAnalyzer 将调度周期转换为秒,如果周期小于 `task_min_schedule_interval_s`,则拒绝提交并显示 ERR_INVALID_PARAMETER 错误。这可以防止创建运行过于频繁的任务,并保护调度器免受高频任务的影响。如果调度没有明确的开始时间,TaskAnalyzer 会将开始时间设置为当前纪元秒。 +- 引入版本:v3.3.0, v3.4.0, v3.5.0 + +##### task_runs_timeout_second + +- 默认值:4 * 3600 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:TaskRun 的默认执行超时(秒)。此项用作 TaskRun 执行的基线超时。如果任务运行的属性包含 `query_timeout` 或 `insert_timeout` 具有正整数值的会话变量,则运行时将使用该会话超时与 `task_runs_timeout_second` 之间的较大值。然后,有效超时将被限制为不超过配置的 `task_runs_ttl_second` 和 `task_ttl_second`。设置此项以限制任务运行可能执行的时间长度。非常大的值可能会被任务/任务运行 TTL 设置截断。 +- 引入版本:- + +### 加载和卸载 + +##### broker_load_default_timeout_second + +- 默认值:14400 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:Broker Load 作业的超时持续时间。 +- 引入版本:- + +##### desired_max_waiting_jobs + +- 默认值:1024 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:FE 中待处理作业的最大数量。该数量指所有作业,例如表创建、加载和 Schema Change 作业。如果 FE 中待处理作业的数量达到此值,FE 将拒绝新的加载请求。此参数仅对异步加载有效。从 v2.5 开始,默认值从 100 更改为 1024。 +- 引入版本:- + +##### disable_load_job + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:当集群遇到错误时是否禁用加载。这可以防止因集群错误造成的任何损失。默认值为 `FALSE`,表示不禁用加载。`TRUE` 表示禁用加载,集群处于只读状态。 +- 引入版本:- + +##### empty_load_as_error + +- 默认值:true +- 类型:Boolean +- 单位:- +- 
是否可变:是 +- 描述:如果没有加载数据,是否返回错误消息 "all partitions have no load data"。有效值: + - `true`:如果没有加载数据,系统将显示失败消息并返回错误 "all partitions have no load data"。 + - `false`:如果没有加载数据,系统将显示成功消息并返回 OK,而不是错误。 +- 引入版本:- + +##### enable_file_bundling + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否为云原生表启用文件捆绑优化。当此功能启用(设置为 `true`)时,系统会自动捆绑加载、Compaction 或 Publish 操作生成的数据文件,从而减少高频访问外部存储系统造成的 API 成本。您还可以使用 CREATE TABLE 属性 `file_bundling` 在表级别控制此行为。有关详细说明,请参阅 [CREATE TABLE](../../sql-reference/sql-statements/table_bucket_part_index/CREATE_TABLE.md)。 +- 引入版本:v4.0 + +##### enable_routine_load_lag_metrics + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否收集 Routine Load Kafka 分区偏移量滞后指标。请注意,将此项设置为 `true` 将调用 Kafka API 获取分区的最新偏移量。 +- 引入版本:- + +##### enable_sync_publish + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否在加载事务的发布阶段同步执行应用任务。此参数仅适用于主键表。有效值: + - `TRUE`(默认值):应用任务在加载事务的发布阶段同步执行。这意味着只有在应用任务完成后,加载事务才会被报告为成功,并且加载的数据才能真正被查询。当任务一次加载大量数据或频繁加载数据时,将此参数设置为 `true` 可以提高查询性能和稳定性,但可能会增加加载延迟。 + - `FALSE`:应用任务在加载事务的发布阶段异步执行。这意味着加载事务在应用任务提交后就被报告为成功,但加载的数据不能立即被查询。在这种情况下,并发查询需要等待应用任务完成或超时才能继续。当任务一次加载大量数据或频繁加载数据时,将此参数设置为 `false` 可能会影响查询性能和稳定性。 +- 引入版本:v3.2.0 + +##### export_checker_interval_second + +- 默认值:5 +- 类型:Int +- 单位:秒 +- 是否可变:否 +- 描述:加载作业的调度时间间隔。 +- 引入版本:- + +##### export_max_bytes_per_be_per_task + +- 默认值:268435456 +- 类型:Long +- 单位:字节 +- 是否可变:是 +- 描述:单个数据卸载任务从单个 BE 导出的最大数据量。 +- 引入版本:- + +##### export_running_job_num_limit + +- 默认值:5 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:可以并行运行的数据导出任务的最大数量。 +- 引入版本:- + +##### export_task_default_timeout_second + +- 默认值:2 * 3600 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:数据导出任务的超时持续时间。 +- 引入版本:- + +##### export_task_pool_size + +- 默认值:5 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:卸载任务线程池的大小。 +- 引入版本:- + +##### external_table_commit_timeout_ms + +- 默认值:10000 +- 类型:Int +- 单位:毫秒 +- 是否可变:是 +- 描述:将写入事务提交(发布)到 StarRocks 外部表的超时持续时间。默认值 `10000` 表示 10 秒的超时持续时间。 +- 引入版本:- + +##### finish_transaction_default_lock_timeout_ms + +- 默认值:1000 +- 类型:Int +- 单位:毫秒 +- 是否可变:是 +- 描述:完成事务期间获取数据库和表锁的默认超时时间。 +- 引入版本:v4.0.0, v3.5.8 + +##### history_job_keep_max_second + +- 默认值:7 * 24 * 3600 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:历史作业(例如 Schema Change 作业)可以保留的最长持续时间。 +- 引入版本:- + +##### insert_load_default_timeout_second + +- 默认值:3600 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:用于加载数据的 INSERT INTO 语句的超时持续时间。 +- 引入版本:- + +##### label_clean_interval_second + +- 默认值:4 * 3600 +- 类型:Int +- 单位:秒 +- 是否可变:否 +- 描述:标签清理的时间间隔。单位:秒。建议指定较短的时间间隔,以确保可以及时清理历史标签。 +- 引入版本:- + +##### label_keep_max_num + +- 默认值:1000 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:一段时间内可以保留的加载作业的最大数量。如果超过此数量,历史作业的信息将被删除。 +- 引入版本:- + +##### label_keep_max_second + +- 默认值:3 * 24 * 3600 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:已完成且处于 FINISHED 或 CANCELLED 状态的加载作业的标签最长保留时间(秒)。默认值为 3 天。超过此持续时间后,标签将被删除。此参数适用于所有类型的加载作业。值过大会消耗大量内存。 +- 引入版本:- + +##### load_checker_interval_second + +- 默认值:5 +- 类型:Int +- 单位:秒 +- 是否可变:否 +- 描述:加载作业循环处理的时间间隔。 +- 引入版本:- + +##### load_parallel_instance_num + +- 默认值:1 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:控制为 Broker Load 和 Stream Load 在单个主机上创建的并行加载片段实例的数量。LoadPlanner 使用此值作为每主机并行度,除非会话启用自适应 Sink DOP;如果会话变量 `enable_adaptive_sink_dop` 为 true,则会话的 `sink_degree_of_parallelism` 将覆盖此配置。当需要 Shuffle 时,此值应用于片段并行执行(扫描片段和 Sink 片段并行执行实例)。当不需要 Shuffle 时,它用作 Sink Pipeline DOP。注意:从本地文件加载被强制为单个实例(Pipeline DOP = 1,并行执行 = 1),以避免本地磁盘争用。增加此值可提高每主机并发性和吞吐量,但可能会增加 CPU、内存和 I/O 争用。 +- 引入版本:v3.2.0 + +##### load_straggler_wait_second + +- 默认值:300 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:BE 副本可以容忍的最大加载延迟。如果超过此值,则执行克隆以从其他副本克隆数据。 +- 引入版本:- + +##### loads_history_retained_days + +- 默认值:30 +- 类型:Int +- 单位:天 +- 是否可变:是 +- 描述:内部 
`_statistics_.loads_history` 表中保留加载历史记录的天数。此值用于表创建以设置表属性 `partition_live_number`,并传递给 `TableKeeper`(限制最小值为 1)以确定要保留多少每日分区。增加或减少此值可调整完成的加载作业在每日分区中保留的时间;它会影响新表创建和 keeper 的修剪行为,但不会自动重新创建过去的分区。`LoadsHistorySyncer` 依赖此保留在管理加载历史生命周期时;其同步频率由 `loads_history_sync_interval_second` 控制。 +- 引入版本:v3.3.6, v3.4.0, v3.5.0 + +##### loads_history_sync_interval_second + +- 默认值:60 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:LoadsHistorySyncer 用于调度定期将已完成的加载作业从 `information_schema.loads` 同步到内部 `_statistics_.loads_history` 表的时间间隔(秒)。该值在构造函数中乘以 1000 以设置 FrontendDaemon 间隔。同步器跳过第一次运行(以允许表创建)并且仅导入一分钟前完成的加载;较小的值会增加 DML 和执行器负载,而较大的值会延迟历史加载记录的可用性。有关目标表的保留/分区行为,请参见 `loads_history_retained_days`。 +- 引入版本:v3.3.6, v3.4.0, v3.5.0 + +##### max_broker_load_job_concurrency + +- 默认值:5 +- 别名:async_load_task_pool_size +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:StarRocks 集群中允许的最大并发 Broker Load 作业数。此参数仅对 Broker Load 有效。此参数的值必须小于 `max_running_txn_num_per_db` 的值。从 v2.5 开始,默认值从 `10` 更改为 `5`。 +- 引入版本:- + +##### max_load_timeout_second + +- 默认值:259200 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:加载作业允许的最大超时持续时间。如果超过此限制,加载作业将失败。此限制适用于所有类型的加载作业。 +- 引入版本:- + +##### max_routine_load_batch_size + +- 默认值:4294967296 +- 类型:Long +- 单位:字节 +- 是否可变:是 +- 描述:Routine Load 任务可以加载的最大数据量。 +- 引入版本:- + +##### max_routine_load_task_concurrent_num + +- 默认值:5 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:每个 Routine Load 作业的最大并发任务数。 +- 引入版本:- + +##### max_routine_load_task_num_per_be + +- 默认值:16 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:每个 BE 上最大并发 Routine Load 任务数。从 v3.1.0 起,此参数的默认值从 5 增加到 16,并且不再需要小于或等于 BE 静态参数 `routine_load_thread_pool_size` (已弃用) 的值。 +- 引入版本:- + +##### max_running_txn_num_per_db + +- 默认值:1000 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:StarRocks 集群中每个数据库允许运行的最大加载事务数。默认值为 `1000`。从 v3.1 开始,默认值从 `100` 更改为 `1000`。当数据库正在运行的加载事务的实际数量超过此参数的值时,将不处理新的加载请求。同步加载作业的新请求将被拒绝,异步加载作业的新请求将被放入队列。不建议增加此参数的值,因为这会增加系统负载。 +- 引入版本:- + +##### max_stream_load_timeout_second + +- 默认值:259200 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:Stream Load 作业允许的最大超时持续时间。 +- 引入版本:- + +##### max_tolerable_backend_down_num + +- 默认值:0 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:允许的最大故障 BE 节点数。如果超过此数量,Routine Load 作业将无法自动恢复。 +- 引入版本:- + +##### min_bytes_per_broker_scanner + +- 默认值:67108864 +- 类型:Long +- 单位:字节 +- 是否可变:是 +- 描述:Broker Load 实例可以处理的最小数据量。 +- 引入版本:- + +##### min_load_timeout_second + +- 默认值:1 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:加载作业允许的最小超时持续时间。此限制适用于所有类型的加载作业。 +- 引入版本:- + +##### min_routine_load_lag_for_metrics + +- 默认值:10000 +- 类型:INT +- 单位:- +- 是否可变:是 +- 描述:Routine Load 作业在监控指标中显示的最小偏移量滞后。偏移量滞后大于此值的 Routine Load 作业将显示在指标中。 +- 引入版本:- + +##### period_of_auto_resume_min + +- 默认值:5 +- 类型:Int +- 单位:分钟 +- 是否可变:是 +- 描述:Routine Load 作业自动恢复的时间间隔。 +- 引入版本:- + +##### prepared_transaction_default_timeout_second + +- 默认值:86400 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:准备事务的默认超时持续时间。 +- 引入版本:- + +##### routine_load_task_consume_second + +- 默认值:15 +- 类型:Long +- 单位:秒 +- 是否可变:是 +- 描述:集群中每个 Routine Load 任务消耗数据的最长时间。自 v3.1.0 起,Routine Load 作业在 [job_properties](../../sql-reference/sql-statements/loading_unloading/routine_load/CREATE_ROUTINE_LOAD.md#job_properties) 中支持新参数 `task_consume_second`。此参数适用于 Routine Load 作业中的单个加载任务,更具灵活性。 +- 引入版本:- + +##### routine_load_task_timeout_second + +- 默认值:60 +- 类型:Long +- 单位:秒 +- 是否可变:是 +- 描述:集群中每个 Routine Load 任务的超时持续时间。自 v3.1.0 起,Routine Load 作业在 [job_properties](../../sql-reference/sql-statements/loading_unloading/routine_load/CREATE_ROUTINE_LOAD.md#job_properties) 中支持新参数 `task_timeout_second`。此参数适用于 Routine Load 作业中的单个加载任务,更具灵活性。 +- 引入版本:- + +##### routine_load_unstable_threshold_second + +- 默认值:3600 +- 类型:Long +- 单位:秒 +- 是否可变:是 +- 描述:如果 
Routine Load 作业中的任何任务滞后,则该作业被设置为 UNSTABLE 状态。具体而言,如果正在消费的消息的时间戳与当前时间之差超过此阈值,并且数据源中存在未消费的消息。 +- 引入版本:- + +##### spark_dpp_version + +- 默认值:1.0.0 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:使用的 Spark 动态分区修剪 (DPP) 版本。 +- 引入版本:- + +##### spark_home_default_dir + +- 默认值:StarRocksFE.STARROCKS_HOME_DIR + "/lib/spark2x" +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:Spark 客户端的根目录。 +- 引入版本:- + +##### spark_launcher_log_dir + +- 默认值:sys_log_dir + "/spark_launcher_log" +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:存储 Spark 日志文件的目录。 +- 引入版本:- + +##### spark_load_default_timeout_second + +- 默认值:86400 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:每个 Spark Load 作业的超时持续时间。 +- 引入版本:- + +##### spark_load_submit_timeout_second + +- 默认值:300 +- 类型:long +- 单位:秒 +- 是否可变:否 +- 描述:提交 Spark 应用程序后等待 YARN 响应的最长时间(秒)。`SparkLauncherMonitor.LogMonitor` 将此值转换为毫秒,如果作业在 UNKNOWN/CONNECTED/SUBMITTED 状态停留时间超过此超时,它将停止监控并强制终止 Spark 启动器进程。`SparkLoadJob` 将此配置作为默认值读取,并允许通过 `LoadStmt.SPARK_LOAD_SUBMIT_TIMEOUT` 属性进行每个加载的覆盖。将其设置得足够高以适应 YARN 排队延迟;设置过低可能会中止合法排队的作业,而设置过高可能会延迟故障处理和资源清理。 +- 引入版本:v3.2.0 + +##### spark_resource_path + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:Spark 依赖包的根目录。 +- 引入版本:- + +##### stream_load_default_timeout_second + +- 默认值:600 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:每个 Stream Load 作业的默认超时持续时间。 +- 引入版本:- + +##### stream_load_max_txn_num_per_be + +- 默认值:-1 +- 类型:Int +- 单位:事务 +- 是否可变:是 +- 描述:限制从单个 BE(后端)主机接受的并发 Stream Load 事务的数量。当设置为非负整数时,FrontendServiceImpl 检查 BE(按客户端 IP)的当前事务计数,如果计数 `>=` 此限制,则拒绝新的 Stream Load 开始请求。值 `< 0` 禁用限制(无限制)。此检查在 Stream Load 开始时发生,并且在超出时可能会导致 `streamload txn num per be exceeds limit` 错误。相关的运行时行为使用 `stream_load_default_timeout_second` 作为请求超时的回退。 +- 引入版本:v3.3.0, v3.4.0, v3.5.0 + +##### stream_load_task_keep_max_num + +- 默认值:1000 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:StreamLoadMgr 在内存中保留的 Stream Load 任务的最大数量(全局适用于所有数据库)。当跟踪任务 (`idToStreamLoadTask`) 的数量超过此阈值时,StreamLoadMgr 首先调用 `cleanSyncStreamLoadTasks()` 删除已完成的同步 Stream Load 任务;如果大小仍然大于此阈值的一半,它将调用 `cleanOldStreamLoadTasks(true)` 强制删除旧任务或已完成的任务。增加此值可在内存中保留更多任务历史记录;减小它可减少内存使用并使清理更具侵略性。此值仅控制内存中的保留,不影响持久化/重放的任务。 +- 引入版本:v3.2.0 + +##### stream_load_task_keep_max_second + +- 默认值:3 * 24 * 3600 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:已完成或已取消的 Stream Load 任务的保留窗口。当任务达到最终状态且其结束时间戳早于此阈值 (`currentMs - endTimeMs > stream_load_task_keep_max_second * 1000`) 时,它将有资格由 `StreamLoadMgr.cleanOldStreamLoadTasks` 删除,并在加载持久化状态时丢弃。适用于 `StreamLoadTask` 和 `StreamLoadMultiStmtTask`。如果总任务计数超过 `stream_load_task_keep_max_num`,清理可能会更早触发(同步任务由 `cleanSyncStreamLoadTasks` 优先处理)。设置此值以平衡历史/可调试性与内存使用。 +- 引入版本:v3.2.0 + +##### transaction_clean_interval_second + +- 默认值:30 +- 类型:Int +- 单位:秒 +- 是否可变:否 +- 描述:已完成事务的清理时间间隔。单位:秒。建议指定较短的时间间隔,以确保可以及时清理已完成的事务。 +- 引入版本:- + +##### transaction_stream_load_coordinator_cache_capacity + +- 默认值:4096 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:存储从事务标签到协调器节点映射的缓存容量。 +- 引入版本:- + +##### transaction_stream_load_coordinator_cache_expire_seconds + +- 默认值:900 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:协调器映射在缓存中保留直到被驱逐(TTL)的时间。 +- 引入版本:- + +##### yarn_client_path + +- 默认值:StarRocksFE.STARROCKS_HOME_DIR + "/lib/yarn-client/hadoop/bin/yarn" +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:Yarn 客户端包的根目录。 +- 引入版本:- + +##### yarn_config_dir + +- 默认值:StarRocksFE.STARROCKS_HOME_DIR + "/lib/yarn-config" +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:存储 Yarn 配置文件的目录。 +- 引入版本:- + +### 统计报告 + +##### enable_collect_warehouse_metrics + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:当此项设置为 `true` 时,系统将收集并导出每个仓库的指标。启用它会将仓库级别的指标(Slot/使用率/可用性)添加到指标输出中,并增加指标基数和收集开销。禁用它将省略特定于仓库的指标,并减少 CPU/网络和监控存储成本。 +- 引入版本:v3.5.0 + +##### enable_http_detail_metrics + +- 
默认值:false +- 类型:boolean +- 单位:- +- 是否可变:是 +- 描述:当为 true 时,HTTP 服务器计算并暴露详细的 HTTP worker 指标(特别是 `HTTP_WORKER_PENDING_TASKS_NUM` gauge)。启用此功能会导致服务器迭代 Netty worker 执行器并在每个 `NioEventLoop` 上调用 `pendingTasks()` 以汇总待处理任务计数;禁用时,gauge 返回 0 以避免该成本。此额外收集可能对 CPU 和延迟敏感——仅在调试或详细调查时才启用。 +- 引入版本:v3.2.3 + +##### proc_profile_collect_time_s + +- 默认值:120 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:单个进程 Profile 收集的持续时间(秒)。当 `proc_profile_cpu_enable` 或 `proc_profile_mem_enable` 设置为 `true` 时,`AsyncProfiler` 会启动,收集器线程会睡眠此持续时间,然后 Profiler 停止并写入 Profile。较大的值会增加采样覆盖率和文件大小,但会延长 Profiler 运行时并延迟后续收集;较小的值会减少开销,但可能生成不足的样本。确保此值与 `proc_profile_file_retained_days` 和 `proc_profile_file_retained_size_bytes` 等保留设置对齐。 +- 引入版本:v3.2.12 + +### 存储 + +##### alter_table_timeout_second + +- 默认值:86400 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:Schema Change 操作 (ALTER TABLE) 的超时持续时间。 +- 引入版本:- + +##### capacity_used_percent_high_water + +- 默认值:0.75 +- 类型:double +- 单位:分数 (0.0–1.0) +- 是否可变:是 +- 描述:计算后端负载分数时使用的磁盘容量使用百分比的高水位阈值(总容量的一部分)。`BackendLoadStatistic.calcSore` 使用 `capacity_used_percent_high_water` 设置 `LoadScore.capacityCoefficient`:如果后端使用百分比小于 0.5,则系数等于 0.5;如果使用百分比 `>` `capacity_used_percent_high_water`,则系数 = 1.0;否则系数通过 (2 * usedPercent - 0.5) 线性变化。当系数为 1.0 时,负载分数完全由容量比例驱动;较低的值会增加副本计数的权重。调整此值会改变平衡器惩罚磁盘利用率高的后端的积极程度。 +- 引入版本:v3.2.0 + +##### catalog_trash_expire_second + +- 默认值:86400 +- 类型:Long +- 单位:秒 +- 是否可变:是 +- 描述:数据库、表或分区删除后元数据可以保留的最长持续时间。如果此持续时间过期,数据将被删除,并且无法通过 [RECOVER](../../sql-reference/sql-statements/backup_restore/RECOVER.md) 命令恢复。 +- 引入版本:- + +##### check_consistency_default_timeout_second + +- 默认值:600 +- 类型:Long +- 单位:秒 +- 是否可变:是 +- 描述:副本一致性检查的超时持续时间。您可以根据 Tablet 的大小设置此参数。 +- 引入版本:- + +##### consistency_check_cooldown_time_second + +- 默认值:24 * 3600 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:控制同一 Tablet 之间一致性检查所需的最小间隔(秒)。在 Tablet 选择期间,Tablet 仅在 `tablet.getLastCheckTime()` 小于 `(currentTimeMillis - consistency_check_cooldown_time_second * 1000)` 时才被视为合格。默认值 (24 * 3600) 强制每个 Tablet 大约每天检查一次,以减少后端磁盘 I/O。降低此值会增加检查频率和资源使用;提高此值会以较慢检测不一致性为代价减少 I/O。该值在从索引的 Tablet 列表中过滤冷却的 Tablet 时全局应用。 +- 引入版本:v3.5.5 + +##### consistency_check_end_time + +- 默认值:"4" +- 类型:String +- 单位:一天中的小时 (0-23) +- 是否可变:否 +- 描述:指定 ConsistencyChecker 工作窗口的结束小时(一天中的小时)。该值在系统时区中使用 SimpleDateFormat("HH") 解析,并接受 0–23(一位或两位数字)。StarRocks 将其与 `consistency_check_start_time` 一起使用,以决定何时调度和添加一致性检查作业。当 `consistency_check_start_time` 大于 `consistency_check_end_time` 时,窗口跨越午夜(例如,默认值是 `consistency_check_start_time` = "23" 到 `consistency_check_end_time` = "4")。当 `consistency_check_start_time` 等于 `consistency_check_end_time` 时,检查器永不运行。解析失败将导致 FE 启动时记录错误并退出,因此请提供有效的小时字符串。 +- 引入版本:v3.2.0 + +##### consistency_check_start_time + +- 默认值:"23" +- 类型:String +- 单位:一天中的小时 (00-23) +- 是否可变:否 +- 描述:指定 ConsistencyChecker 工作窗口的开始小时(一天中的小时)。该值在系统时区中使用 SimpleDateFormat("HH") 解析,并接受 0–23(一位或两位数字)。StarRocks 将其与 `consistency_check_end_time` 一起使用,以决定何时调度和添加一致性检查作业。当 `consistency_check_start_time` 大于 `consistency_check_end_time` 时,窗口跨越午夜(例如,默认值是 `consistency_check_start_time` = "23" 到 `consistency_check_end_time` = "4")。当 `consistency_check_start_time` 等于 `consistency_check_end_time` 时,检查器永不运行。解析失败将导致 FE 启动时记录错误并退出,因此请提供有效的小时字符串。 +- 引入版本:v3.2.0 + +##### consistency_tablet_meta_check_interval_ms + +- 默认值:2 * 3600 * 1000 +- 类型:Int +- 单位:毫秒 +- 是否可变:是 +- 描述:ConsistencyChecker 用于在 `TabletInvertedIndex` 和 `LocalMetastore` 之间运行完整 Tablet 元数据一致性扫描的时间间隔。`runAfterCatalogReady` 中的守护进程在 `current time - lastTabletMetaCheckTime` 超过此值时触发 checkTabletMetaConsistency。当第一次检测到无效 Tablet 时,其 `toBeCleanedTime` 设置为 `now + 
(consistency_tablet_meta_check_interval_ms / 2)`,因此实际删除会延迟到后续扫描。增加此值可减少扫描频率和负载(清理速度变慢);减少此值可更快检测和删除陈旧 Tablet(开销更高)。
+- 引入版本:v3.2.0
+
+##### default_replication_num
+
+- 默认值:3
+- 类型:Short
+- 单位:-
+- 是否可变:是
+- 描述:设置在 StarRocks 中创建表时每个数据分区的默认副本数。此设置可以在创建表时通过在 CREATE TABLE DDL 中指定 `replication_num=x` 进行覆盖。
+- 引入版本:-
+
+##### enable_auto_tablet_distribution
+
+- 默认值:true
+- 类型:Boolean
+- 单位:-
+- 是否可变:是
+- 描述:是否自动设置桶的数量。
+  - 如果此参数设置为 `TRUE`,则在创建表或添加分区时无需指定桶的数量。StarRocks 会自动确定桶的数量。
+  - 如果此参数设置为 `FALSE`,则在创建表或添加分区时需要手动指定桶的数量。如果为表添加新分区时未指定桶数,则新分区将继承表创建时设置的桶数。但是,您也可以手动为新分区指定桶的数量。
+- 引入版本:v2.5.7
+
+##### enable_experimental_rowstore
+
+- 默认值:false
+- 类型:Boolean
+- 单位:-
+- 是否可变:是
+- 描述:是否启用 [混合行-列存储](../../table_design/hybrid_table.md) 功能。
+- 引入版本:v3.2.3
+
+##### enable_fast_schema_evolution
+
+- 默认值:true
+- 类型:Boolean
+- 单位:-
+- 是否可变:是
+- 描述:是否为 StarRocks 集群中的所有表启用快速 Schema 演进。有效值为 `TRUE`(默认值)和 `FALSE`。启用快速 Schema 演进可以提高 Schema 变更的速度,并减少添加或删除列时的资源使用。
+- 引入版本:v3.2.0
+
+> **注意**
+>
+> - StarRocks 共享数据集群从 v3.3.0 开始支持此参数。
+> - 如果您需要为特定表配置快速 Schema 演进,例如禁用特定表的快速 Schema 演进,您可以在表创建时设置表属性 [`fast_schema_evolution`](../../sql-reference/sql-statements/table_bucket_part_index/CREATE_TABLE.md#set-fast-schema-evolution)。
+
+##### enable_online_optimize_table
+
+- 默认值:true
+- 类型:Boolean
+- 单位:-
+- 是否可变:是
+- 描述:控制 StarRocks 在创建优化作业时是否使用非阻塞的在线优化路径。当 `enable_online_optimize_table` 为 true 且目标表满足兼容性检查(无分区/键/排序规范,分布不是 `RandomDistributionDesc`,存储类型不是 `COLUMN_WITH_ROW`,复制存储已启用,并且表不是云原生表或物化视图)时,规划器创建 `OnlineOptimizeJobV2` 以执行优化而不阻塞写入。如果为 false 或任何兼容性条件不满足,StarRocks 回退到 `OptimizeJobV2`,这可能会在优化期间阻塞写入操作。
+- 引入版本:v3.3.3, v3.4.0, v3.5.0
+
+##### enable_strict_storage_medium_check
+
+- 默认值:false
+- 类型:Boolean
+- 单位:-
+- 是否可变:是
+- 描述:FE 在用户创建表时是否严格检查 BE 的存储介质。如果此参数设置为 `TRUE`,FE 在用户创建表时会检查 BE 的存储介质,如果 BE 的存储介质与 CREATE TABLE 语句中指定的 `storage_medium` 参数不同,则返回错误。例如,CREATE TABLE 语句中指定的存储介质是 SSD,但 BE 的实际存储介质是 HDD,则表创建失败。如果此参数为 `FALSE`,FE 在用户创建表时不会检查 BE 的存储介质。
+- 引入版本:-
+
+##### max_bucket_number_per_partition
+
+- 默认值:1024
+- 类型:Int
+- 单位:-
+- 是否可变:是
+- 描述:一个分区中可以创建的最大桶数。
+- 引入版本:v3.3.2
+
+##### max_column_number_per_table
+
+- 默认值:10000
+- 类型:Int
+- 单位:-
+- 是否可变:是
+- 描述:一个表中可以创建的最大列数。
+- 引入版本:v3.3.2
+
+##### max_dynamic_partition_num
+
+- 默认值:500
+- 类型:Int
+- 单位:-
+- 是否可变:是
+- 描述:在分析或创建动态分区表时,一次可以创建的最大分区数。在动态分区属性校验期间,系统会计算预期创建的分区总数(结束偏移量对应的分区数加上历史分区数),如果总数超过 `max_dynamic_partition_num`,则抛出 DDL 错误。仅当您确有合法的大分区范围需求时才增加此值;增加它允许创建更多分区,但可能会增加元数据大小、调度工作和运维复杂性。
+- 引入版本:v3.2.0
+
+##### max_partition_number_per_table
+
+- 默认值:100000
+- 类型:Int
+- 单位:-
+- 是否可变:是
+- 描述:一个表中可以创建的最大分区数。
+- 引入版本:v3.3.2
+
+##### max_task_consecutive_fail_count
+
+- 默认值:10
+- 类型:Int
+- 单位:-
+- 是否可变:是
+- 描述:任务在被调度器自动暂停之前允许连续失败的最大次数。当 `TaskSource.MV.equals(task.getSource())` 且 `max_task_consecutive_fail_count` 大于 0 时,如果任务的连续失败计数达到或超过 `max_task_consecutive_fail_count`,则任务通过 TaskManager 暂停;对于物化视图任务,物化视图将被置为非活动状态。系统将抛出异常,指示任务已暂停以及如何重新激活(例如 `ALTER MATERIALIZED VIEW ACTIVE`)。将此项设置为 0 或负值以禁用自动暂停。
+- 引入版本:-
+
+##### partition_recycle_retention_period_secs
+
+- 默认值:1800
+- 类型:Long
+- 单位:秒
+- 是否可变:是
+- 描述:通过 INSERT OVERWRITE 或物化视图刷新操作删除的分区的元数据保留时间。请注意,此类元数据无法通过执行 [RECOVER](../../sql-reference/sql-statements/backup_restore/RECOVER.md) 恢复。
+- 引入版本:v3.5.9
+
+##### recover_with_empty_tablet
+
+- 默认值:false
+- 类型:Boolean
+- 单位:-
+- 是否可变:是
+- 描述:是否用空副本替换丢失或损坏的 Tablet 副本。如果 Tablet 副本丢失或损坏,对此 Tablet 或其他健康 Tablet 的数据查询可能会失败。用空 Tablet 替换丢失或损坏的 Tablet 副本可确保查询仍能执行。但是,结果可能不正确,因为数据已丢失。默认值为 `FALSE`,表示不替换丢失或损坏的 Tablet 副本,并且查询失败。
+- 引入版本:-
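+
+对于本节中「是否可变:是」的配置(例如 `max_dynamic_partition_num`),可以按如下方式临时调高阈值(取值 `1000` 仅为演示用的假设值):
+
+```SQL
+ADMIN SET FRONTEND CONFIG ("max_dynamic_partition_num" = "1000");
+```
+
+##### storage_usage_hard_limit_percent
+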
+- 默认值:95 +- 别名:storage_flood_stage_usage_percent +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:BE 目录存储使用百分比的硬限制。如果 BE 存储目录的存储使用率(百分比)超过此值且剩余存储空间小于 `storage_usage_hard_limit_reserve_bytes`,则加载和恢复作业将被拒绝。您需要将此项与 BE 配置项 `storage_flood_stage_usage_percent` 一起设置才能使配置生效。 +- 引入版本:- + +##### storage_usage_hard_limit_reserve_bytes + +- 默认值:100 * 1024 * 1024 * 1024 +- 别名:storage_flood_stage_left_capacity_bytes +- 类型:Long +- 单位:字节 +- 是否可变:是 +- 描述:BE 目录剩余存储空间的硬限制。如果 BE 存储目录的剩余存储空间小于此值且存储使用率(百分比)超过 `storage_usage_hard_limit_percent`,则加载和恢复作业将被拒绝。您需要将此项与 BE 配置项 `storage_flood_stage_left_capacity_bytes` 一起设置才能使配置生效。 +- 引入版本:- + +##### storage_usage_soft_limit_percent + +- 默认值:90 +- 别名:storage_high_watermark_usage_percent +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:BE 目录存储使用百分比的软限制。如果 BE 存储目录的存储使用率(百分比)超过此值且剩余存储空间小于 `storage_usage_soft_limit_reserve_bytes`,则 Tablet 无法克隆到此目录。 +- 引入版本:- + +##### storage_usage_soft_limit_reserve_bytes + +- 默认值:200 * 1024 * 1024 * 1024 +- 别名:storage_min_left_capacity_bytes +- 类型:Long +- 单位:字节 +- 是否可变:是 +- 描述:BE 目录剩余存储空间的软限制。如果 BE 存储目录的剩余存储空间小于此值且存储使用率(百分比)超过 `storage_usage_soft_limit_percent`,则 Tablet 无法克隆到此目录。 +- 引入版本:- + +##### tablet_checker_lock_time_per_cycle_ms + +- 默认值:1000 +- 类型:Int +- 单位:毫秒 +- 是否可变:是 +- 描述:Tablet 检查器在释放并重新获取表锁之前,每个周期最大锁持有时间。小于 100 的值将被视为 100。 +- 引入版本:v3.5.9, v4.0.2 + +##### tablet_create_timeout_second + +- 默认值:10 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:创建 Tablet 的超时持续时间。从 v3.1 开始,默认值从 1 更改为 10。 +- 引入版本:- + +##### tablet_delete_timeout_second + +- 默认值:2 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:删除 Tablet 的超时持续时间。 +- 引入版本:- + +##### tablet_sched_balance_load_disk_safe_threshold + +- 默认值:0.5 +- 别名:balance_load_disk_safe_threshold +- 类型:Double +- 单位:- +- 是否可变:是 +- 描述:用于判断 BE 磁盘使用率是否均衡的百分比阈值。如果所有 BE 的磁盘使用率均低于此值,则认为均衡。如果磁盘使用率大于此值且最高与最低 BE 磁盘使用率之差大于 10%,则认为磁盘使用率不均衡并触发 Tablet 重均衡。 +- 引入版本:- + +##### tablet_sched_balance_load_score_threshold + +- 默认值:0.1 +- 别名:balance_load_score_threshold +- 类型:Double +- 单位:- +- 是否可变:是 +- 描述:用于判断 BE 负载是否均衡的百分比阈值。如果 BE 的负载低于所有 BE 的平均负载且差值大于此值,则此 BE 处于低负载状态。反之,如果 BE 的负载高于平均负载且差值大于此值,则此 BE 处于高负载状态。 +- 引入版本:- + +##### tablet_sched_be_down_tolerate_time_s + +- 默认值:900 +- 类型:Long +- 单位:秒 +- 是否可变:是 +- 描述:调度器允许 BE 节点保持非活动状态的最长持续时间。达到时间阈值后,该 BE 节点上的 Tablet 将迁移到其他活动 BE 节点。 +- 引入版本:v2.5.7 + +##### tablet_sched_disable_balance + +- 默认值:false +- 别名:disable_balance +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否禁用 Tablet 均衡。`TRUE` 表示禁用 Tablet 均衡。`FALSE` 表示启用 Tablet 均衡。 +- 引入版本:- + +##### tablet_sched_disable_colocate_balance + +- 默认值:false +- 别名:disable_colocate_balance +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否禁用 Colocate 表的副本均衡。`TRUE` 表示禁用副本均衡。`FALSE` 表示启用副本均衡。 +- 引入版本:- + +##### tablet_sched_max_balancing_tablets + +- 默认值:500 +- 别名:max_balancing_tablets +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:可以同时均衡的 Tablet 的最大数量。如果超过此值,则跳过 Tablet 重均衡。 +- 引入版本:- + +##### tablet_sched_max_clone_task_timeout_sec + +- 默认值:2 * 60 * 60 +- 别名:max_clone_task_timeout_sec +- 类型:Long +- 单位:秒 +- 是否可变:是 +- 描述:克隆 Tablet 的最大超时持续时间。 +- 引入版本:- + +##### tablet_sched_max_not_being_scheduled_interval_ms + +- 默认值:15 * 60 * 1000 +- 类型:Long +- 单位:毫秒 +- 是否可变:是 +- 描述:当正在调度 Tablet 克隆任务时,如果 Tablet 在此参数指定的时间内未被调度,StarRocks 会给予它更高的优先级以尽快调度。 +- 引入版本:- + +##### tablet_sched_max_scheduling_tablets + +- 默认值:10000 +- 别名:max_scheduling_tablets +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:可以同时调度的 Tablet 的最大数量。如果超过此值,则跳过 Tablet 均衡和修复检查。 +- 引入版本:- + +##### tablet_sched_min_clone_task_timeout_sec + +- 默认值:3 * 60 +- 别名:min_clone_task_timeout_sec +- 类型:Long +- 单位:秒 +- 是否可变:是 +- 描述:克隆 Tablet 的最小超时持续时间。 +- 引入版本:- + +##### 
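+
+在运维窗口(例如批量下线 BE 节点)中,可以临时关闭 Tablet 均衡,结束后再恢复(仅作演示):
+
+```SQL
+-- 临时禁用 Tablet 均衡
+ADMIN SET FRONTEND CONFIG ("tablet_sched_disable_balance" = "true");
+-- 运维完成后恢复
+ADMIN SET FRONTEND CONFIG ("tablet_sched_disable_balance" = "false");
+```
+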
tablet_sched_num_based_balance_threshold_ratio + +- 默认值:0.5 +- 别名:- +- 类型:Double +- 单位:- +- 是否可变:是 +- 描述:基于数量的均衡可能会破坏磁盘大小均衡,但磁盘之间的最大差距不能超过 tablet_sched_num_based_balance_threshold_ratio * tablet_sched_balance_load_score_threshold。如果集群中有 Tablet 不断从 A 均衡到 B,B 均衡到 A,请减小此值。如果您希望 Tablet 分布更均衡,请增加此值。 +- 引入版本:- 3.1 + +##### tablet_sched_repair_delay_factor_second + +- 默认值:60 +- 别名:tablet_repair_delay_factor_second +- 类型:Long +- 单位:秒 +- 是否可变:是 +- 描述:副本修复的时间间隔,以秒为单位。 +- 引入版本:- + +##### tablet_sched_slot_num_per_path + +- 默认值:8 +- 别名:schedule_slot_num_per_path +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:单个 BE 存储目录中可以并发运行的 Tablet 相关任务的最大数量。从 v2.5 开始,此参数的默认值从 `4` 更改为 `8`。 +- 引入版本:- + +##### tablet_sched_storage_cooldown_second + +- 默认值:-1 +- 别名:storage_cooldown_second +- 类型:Long +- 单位:秒 +- 是否可变:是 +- 描述:从表创建时间开始自动冷却的延迟。默认值 `-1` 指定禁用自动冷却。如果您想启用自动冷却,请将此参数设置为大于 `-1` 的值。 +- 引入版本:- + +##### tablet_stat_update_interval_second + +- 默认值:300 +- 类型:Int +- 单位:秒 +- 是否可变:否 +- 描述:FE 从每个 BE 检索 Tablet 统计信息的时间间隔。 +- 引入版本:- + +### 共享数据 + +##### aws_s3_access_key + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于访问 S3 存储桶的 Access Key ID。 +- 引入版本:v3.0 + +##### aws_s3_endpoint + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于访问 S3 存储桶的 Endpoint,例如 `https://s3.us-west-2.amazonaws.com`。 +- 引入版本:v3.0 + +##### aws_s3_external_id + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于跨账号访问 S3 存储桶的 AWS 账户的 External ID。 +- 引入版本:v3.0 + +##### aws_s3_iam_role_arn + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:在存储数据文件的 S3 存储桶上具有权限的 IAM 角色的 ARN。 +- 引入版本:v3.0 + +##### aws_s3_path + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于存储数据的 S3 路径。它由 S3 存储桶的名称及其下的子路径(如果有)组成,例如 `testbucket/subpath`。 +- 引入版本:v3.0 + +##### aws_s3_region + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:S3 存储桶所在的区域,例如 `us-west-2`。 +- 引入版本:v3.0 + +##### aws_s3_secret_key + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于访问 S3 存储桶的 Secret Access Key。 +- 引入版本:v3.0 + +##### aws_s3_use_aws_sdk_default_behavior + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:是否使用 AWS SDK 的默认认证凭证。有效值:true 和 false(默认)。 +- 引入版本:v3.0 + +##### aws_s3_use_instance_profile + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:是否使用 Instance Profile 和 Assumed Role 作为访问 S3 的凭证方法。有效值:true 和 false(默认)。 + - 如果您使用基于 IAM 用户的凭证(Access Key 和 Secret Key)访问 S3,则必须将此项指定为 `false`,并指定 `aws_s3_access_key` 和 `aws_s3_secret_key`。 + - 如果您使用 Instance Profile 访问 S3,则必须将此项指定为 `true`。 + - 如果您使用 Assumed Role 访问 S3,则必须将此项指定为 `true`,并指定 `aws_s3_iam_role_arn`。 + - 如果您使用外部 AWS 账户,则还必须指定 `aws_s3_external_id`。 +- 引入版本:v3.0 + +##### azure_adls2_endpoint + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:Azure Data Lake Storage Gen2 账户的 Endpoint,例如 `https://test.dfs.core.windows.net`。 +- 引入版本:v3.4.1 + +##### azure_adls2_oauth2_client_id + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于授权 Azure Data Lake Storage Gen2 请求的托管标识的 Client ID。 +- 引入版本:v3.4.4 + +##### azure_adls2_oauth2_tenant_id + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于授权 Azure Data Lake Storage Gen2 请求的托管标识的 Tenant ID。 +- 引入版本:v3.4.4 + +##### azure_adls2_oauth2_use_managed_identity + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:是否使用托管标识授权 Azure Data Lake Storage Gen2 请求。 +- 引入版本:v3.4.4 + +##### azure_adls2_path + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于存储数据的 Azure Data Lake Storage Gen2 路径。它由文件系统名称和目录名称组成,例如 `testfilesystem/starrocks`。 +- 引入版本:v3.4.1 + +##### azure_adls2_sas_token + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于授权 Azure Data Lake Storage Gen2 请求的共享访问签名 (SAS)。 +- 引入版本:v3.4.1 + +##### 
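+
+以 `aws_s3`、`azure_` 等为前缀的存储参数均为静态配置,需在共享数据集群部署时写入 **fe.conf**。以下示例演示如何在线核对这些取值(仅作演示):
+
+```SQL
+ADMIN SHOW FRONTEND CONFIG LIKE "aws_s3_%";
+```
+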
azure_adls2_shared_key + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于授权 Azure Data Lake Storage Gen2 请求的 Shared Key。 +- 引入版本:v3.4.1 + +##### azure_blob_endpoint + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:Azure Blob Storage 账户的 Endpoint,例如 `https://test.blob.core.windows.net`。 +- 引入版本:v3.1 + +##### azure_blob_path + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于存储数据的 Azure Blob Storage 路径。它由存储账户中容器的名称及其下的子路径(如果有)组成,例如 `testcontainer/subpath`。 +- 引入版本:v3.1 + +##### azure_blob_sas_token + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于授权 Azure Blob Storage 请求的共享访问签名 (SAS)。 +- 引入版本:v3.1 + +##### azure_blob_shared_key + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于授权 Azure Blob Storage 请求的 Shared Key。 +- 引入版本:v3.1 + +##### azure_use_native_sdk + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否使用原生 SDK 访问 Azure Blob Storage,从而允许使用托管标识和服务主体进行身份验证。如果此项设置为 `false`,则仅允许使用 Shared Key 和 SAS Token 进行身份验证。 +- 引入版本:v3.4.4 + +##### cloud_native_hdfs_url + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:HDFS 存储的 URL,例如 `hdfs://127.0.0.1:9000/user/xxx/starrocks/`。 +- 引入版本:- + +##### cloud_native_meta_port + +- 默认值:6090 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:FE 云原生元数据服务器 RPC 监听端口。 +- 引入版本:- + +##### cloud_native_storage_type + +- 默认值:S3 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:您使用的对象存储类型。在共享数据模式下,StarRocks 支持将数据存储在 HDFS、Azure Blob(从 v3.1.1 开始支持)、Azure Data Lake Storage Gen2(从 v3.4.1 开始支持)、Google Storage(使用原生 SDK,从 v3.5.1 开始支持)以及与 S3 协议兼容的对象存储系统(如 AWS S3 和 MinIO)。有效值:`S3`(默认)、`HDFS`、`AZBLOB`、`ADLS2` 和 `GS`。如果将此参数指定为 `S3`,则必须添加以 `aws_s3` 为前缀的参数。如果将此参数指定为 `AZBLOB`,则必须添加以 `azure_blob` 为前缀的参数。如果将此参数指定为 `ADLS2`,则必须添加以 `azure_adls2` 为前缀的参数。如果将此参数指定为 `GS`,则必须添加以 `gcp_gcs` 为前缀的参数。如果将此参数指定为 `HDFS`,则只需指定 `cloud_native_hdfs_url`。 +- 引入版本:- + +##### enable_load_volume_from_conf + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:是否允许 StarRocks 使用 FE 配置文件中指定的对象存储相关属性创建内置存储卷。默认值从 v3.4.1 开始从 `true` 更改为 `false`。 +- 引入版本:v3.1.0 + +##### gcp_gcs_impersonation_service_account + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:如果您使用基于模拟的身份验证访问 Google Storage,您想要模拟的服务账户。 +- 引入版本:v3.5.1 + +##### gcp_gcs_path + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于存储数据的 Google Cloud 路径。它由 Google Cloud 存储桶的名称及其下的子路径(如果有)组成,例如 `testbucket/subpath`。 +- 引入版本:v3.5.1 + +##### gcp_gcs_service_account_email + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:在创建服务账户时生成的 JSON 文件中的电子邮件地址,例如 `user@hello.iam.gserviceaccount.com`。 +- 引入版本:v3.5.1 + +##### gcp_gcs_service_account_private_key + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:在创建服务账户时生成的 JSON 文件中的私钥,例如 `-----BEGIN PRIVATE KEY----xxxx-----END PRIVATE KEY-----\n`。 +- 引入版本:v3.5.1 + +##### gcp_gcs_service_account_private_key_id + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:在创建服务账户时生成的 JSON 文件中的私钥 ID。 +- 引入版本:v3.5.1 + +##### gcp_gcs_use_compute_engine_service_account + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:是否使用绑定到 Compute Engine 的服务账户。 +- 引入版本:v3.5.1 + +##### hdfs_file_system_expire_seconds + +- 默认值:300 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:HdfsFsManager 管理的未使用缓存 HDFS/ObjectStore FileSystem 的生存时间(秒)。FileSystemExpirationChecker(每 60 秒运行一次)使用此值调用每个 HdfsFs.isExpired(...);当过期时,管理器关闭底层 FileSystem 并将其从缓存中删除。访问器方法(例如 `HdfsFs.getDFSFileSystem`、`getUserName`、`getConfiguration`)会更新最后访问时间戳,因此过期是基于不活动状态。较小的值会减少空闲资源占用,但会增加重新打开的开销;较大的值会使句柄保留更长时间,并可能消耗更多资源。 +- 引入版本:v3.2.0 + +##### lake_autovacuum_grace_period_minutes + +- 默认值:30 +- 类型:Long +- 单位:分钟 +- 是否可变:是 +- 描述:共享数据集群中保留历史数据版本的时间范围。在此时间范围内的历史数据版本在 Compaction 后不会通过 AutoVacuum 
自动清理。您需要将此值设置得大于最大查询时间,以避免正在运行的查询访问的数据在查询完成前被删除。默认值从 v3.3.0、v3.2.5 和 v3.1.10 开始已从 `5` 更改为 `30`。 +- 引入版本:v3.1.0 + +##### lake_autovacuum_parallel_partitions + +- 默认值:8 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:共享数据集群中可以同时进行 AutoVacuum 的最大分区数。AutoVacuum 是 Compaction 后的垃圾回收。 +- 引入版本:v3.1.0 + +##### lake_autovacuum_partition_naptime_seconds + +- 默认值:180 +- 类型:Long +- 单位:秒 +- 是否可变:是 +- 描述:共享数据集群中同一分区上 AutoVacuum 操作之间的最小间隔。 +- 引入版本:v3.1.0 + +##### lake_autovacuum_stale_partition_threshold + +- 默认值:12 +- 类型:Long +- 单位:小时 +- 是否可变:是 +- 描述:如果分区在此时间范围内没有更新(加载、DELETE 或 Compaction),系统将不会对此分区执行 AutoVacuum。 +- 引入版本:v3.1.0 + +##### lake_compaction_allow_partial_success + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:如果此项设置为 `true`,系统将认为共享数据集群中的 Compaction 操作在其中一个子任务成功时即为成功。 +- 引入版本:v3.5.2 + +##### lake_compaction_disable_ids + +- 默认值:"" +- 类型:String +- 单位:- +- 是否可变:是 +- 描述:在共享数据模式下禁用 Compaction 的表或分区列表。格式为 `tableId1;partitionId2`,以分号分隔,例如 `12345;98765`。 +- 引入版本:v3.4.4 + +##### lake_compaction_history_size + +- 默认值:20 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:共享数据集群中 Leader FE 节点内存中保留的最近成功 Compaction 任务记录的数量。您可以使用 `SHOW PROC '/compactions'` 命令查看最近成功的 Compaction 任务记录。请注意,Compaction 历史记录存储在 FE 进程内存中,如果 FE 进程重启,它将丢失。 +- 引入版本:v3.1.0 + +##### lake_compaction_max_tasks + +- 默认值:-1 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:共享数据集群中允许的最大并发 Compaction 任务数。将此项设置为 `-1` 表示以自适应方式计算并发任务数。将此值设置为 `0` 将禁用 Compaction。 +- 引入版本:v3.1.0 + +##### lake_compaction_score_selector_min_score + +- 默认值:10.0 +- 类型:Double +- 单位:- +- 是否可变:是 +- 描述:触发共享数据集群中 Compaction 操作的 Compaction Score 阈值。当分区的 Compaction Score 大于或等于此值时,系统会对该分区执行 Compaction。 +- 引入版本:v3.1.0 + +##### lake_compaction_score_upper_bound + +- 默认值:2000 +- 类型:Long +- 单位:- +- 是否可变:是 +- 描述:共享数据集群中分区 Compaction Score 的上限。`0` 表示无上限。此项仅在 `lake_enable_ingest_slowdown` 设置为 `true` 时生效。当分区的 Compaction Score 达到或超过此上限时,传入的加载任务将被拒绝。从 v3.3.6 开始,默认值从 `0` 更改为 `2000`。 +- 引入版本:v3.2.0 + +##### lake_enable_balance_tablets_between_workers + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:在共享数据集群中云原生表的 Tablet 迁移期间,是否在 Compute 节点之间平衡 Tablet 的数量。`true` 表示在 Compute 节点之间平衡 Tablet,`false` 表示禁用此功能。 +- 引入版本:v3.3.4 + +##### lake_enable_ingest_slowdown + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否在共享数据集群中启用数据摄取减速。当数据摄取减速启用时,如果分区的 Compaction Score 超过 `lake_ingest_slowdown_threshold`,则该分区上的加载任务将被限流。此配置仅在 `run_mode` 设置为 `shared_data` 时生效。从 v3.3.6 开始,默认值从 `false` 更改为 `true`。 +- 引入版本:v3.2.0 + +##### lake_ingest_slowdown_threshold + +- 默认值:100 +- 类型:Long +- 单位:- +- 是否可变:是 +- 描述:触发共享数据集群中数据摄取减速的 Compaction Score 阈值。此配置仅在 `lake_enable_ingest_slowdown` 设置为 `true` 时生效。 +- 引入版本:v3.2.0 + +##### lake_publish_version_max_threads + +- 默认值:512 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:共享数据集群中版本发布任务的最大线程数。 +- 引入版本:v3.2.0 + +##### meta_sync_force_delete_shard_meta + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否允许直接删除共享数据集群的元数据,绕过清理远程存储文件。建议仅在需要清理的 shard 过多,导致 FE JVM 内存压力过大时,才将此项设置为 `true`。请注意,启用此功能后,属于 shard 或 Tablet 的数据文件无法自动清理。 +- 引入版本:v3.2.10, v3.3.3 + +##### run_mode + +- 默认值:shared_nothing +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:StarRocks 集群的运行模式。有效值:`shared_data` 和 `shared_nothing`(默认)。 + - `shared_data` 表示以共享数据模式运行 StarRocks。 + - `shared_nothing` 表示以 Shared-Nothing 模式运行 StarRocks。 + + > **注意** + > + > - StarRocks 集群不能同时采用 `shared_data` 和 `shared_nothing` 模式。不支持混合部署。 + > - 集群部署后,请勿更改 `run_mode`。否则,集群将无法重启。不支持从 Shared-Nothing 集群转换为共享数据集群,反之亦然。 + +- 引入版本:- + +##### shard_group_clean_threshold_sec + +- 默认值:3600 +- 类型:Long +- 单位:秒 +- 是否可变:是 +- 描述:FE 清理共享数据集群中未使用的 Tablet 和 shard 组的时间。在此阈值内创建的 Tablet 和 shard 组将不会被清理。 +- 引入版本:- + +##### 
star_mgr_meta_sync_interval_sec + +- 默认值:600 +- 类型:Long +- 单位:秒 +- 是否可变:否 +- 描述:FE 与共享数据集群中的 StarMgr 运行周期性元数据同步的时间间隔。 +- 引入版本:- + +##### starmgr_grpc_server_max_worker_threads + +- 默认值:1024 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:FE starmgr 模块中 grpc 服务器使用的最大工作线程数。 +- 引入版本:v4.0.0, v3.5.8 + +##### starmgr_grpc_timeout_seconds + +- 默认值:5 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:FE starmgr 模块中 gRPC 请求的超时时间。 +- 引入版本:- + +### 数据湖 + +##### files_enable_insert_push_down_schema + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:启用时,分析器将尝试将目标表 Schema 推入 `files()` 表函数,用于 INSERT ... FROM files() 操作。这仅适用于源是 FileTableFunctionRelation、目标是原生表且 SELECT 列表中包含相应的 slot-ref 列(或 *)的情况。分析器会将选择列与目标列匹配(计数必须匹配),短暂锁定目标表,并用非复杂类型(Parquet JSON `->` `array` 等复杂类型将被跳过)的深拷贝目标列类型替换文件列类型。保留原始文件表的列名。这减少了摄取期间由于文件类型推断导致的类型不匹配和宽松性。 +- 引入版本:v3.4.0, v3.5.0 + +##### hdfs_read_buffer_size_kb + +- 默认值:8192 +- 类型:Int +- 单位:千字节 +- 是否可变:是 +- 描述:HDFS 读取缓冲区的大小(千字节)。StarRocks 将此值转换为字节 (`<< 10`),并将其用于在 `HdfsFsManager` 中初始化 HDFS 读取缓冲区,并在不使用 Broker 访问时填充发送到 BE 任务的 thrift 字段 `hdfs_read_buffer_size_kb`(例如 `TBrokerScanRangeParams`、`TDownloadReq`)。增加 `hdfs_read_buffer_size_kb` 可以提高顺序读取吞吐量并减少系统调用开销,但会以更高的每流内存使用为代价;减少它会减少内存占用,但可能会降低 I/O 效率。调整时考虑工作负载(许多小流与少量大顺序读取)。 +- 引入版本:v3.2.0 + +##### hdfs_write_buffer_size_kb + +- 默认值:1024 +- 类型:Int +- 单位:千字节 +- 是否可变:是 +- 描述:设置用于直接写入 HDFS 或对象存储时(不使用 Broker)的 HDFS 写入缓冲区大小(KB)。FE 将此值转换为字节 (`<< 10`),并在 HdfsFsManager 中初始化本地写入缓冲区,并将其传播到 Thrift 请求中(例如 TUploadReq、TExportSink、Sink Options),以便后端/代理使用相同的缓冲区大小。增加此值可以提高大型顺序写入的吞吐量,但会以每个写入器更多内存为代价;减少它会减少每个流的内存使用,并可能降低小型写入的延迟。与 `hdfs_read_buffer_size_kb` 一起调整,并考虑可用内存和并发写入器。 +- 引入版本:v3.2.0 + +##### lake_batch_publish_max_version_num + +- 默认值:10 +- 类型:Int +- 单位:计数 +- 是否可变:是 +- 描述:设置构建用于湖表(云原生表)的发布批次时,可能分组在一起的连续事务版本的上限。该值传递给事务图批处理例程(参见 getReadyToPublishTxnListBatch),并与 `lake_batch_publish_min_version_num` 协同工作,以确定 TransactionStateBatch 的候选范围大小。较大的值可以通过批处理更多提交来提高发布吞吐量,但会增加原子发布范围(更长的可见性延迟和更大的回滚表面),并且当版本不连续时可能会在运行时受到限制。根据工作负载和可见性/延迟要求进行调整。 +- 引入版本:v3.2.0 + +##### lake_batch_publish_min_version_num + +- 默认值:1 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:设置形成湖表发布批次所需的最小连续事务版本数。DatabaseTransactionMgr.getReadyToPublishTxnListBatch 将此值与 `lake_batch_publish_max_version_num` 一起传递给 transactionGraph.getTxnsWithTxnDependencyBatch 以选择依赖事务。值为 `1` 允许单事务发布(不批处理)。值 `>1` 要求至少有那么多连续版本、单表、非复制事务可用;如果版本不连续、出现复制事务或 Schema Change 消耗了一个版本,则批处理将中止。增加此值可以通过分组提交来提高发布吞吐量,但可能会在等待足够多的连续事务时延迟发布。 +- 引入版本:v3.2.0 + +##### lake_enable_batch_publish_version + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:启用后,PublishVersionDaemon 将同一湖(共享数据)表/分区的就绪事务进行批处理,并将其版本一起发布,而不是发布每个事务。在 Shared-Data 模式下,守护进程调用 getReadyPublishTransactionsBatch() 并使用 publishVersionForLakeTableBatch(...) 执行分组发布操作(减少 RPC 并提高吞吐量)。禁用时,守护进程回退到通过 publishVersionForLakeTable(...) 
进行每个事务发布。该实现使用内部集合协调正在进行的工作,以避免在切换开关时重复发布,并受 `lake_publish_version_max_threads` 的线程池大小影响。 +- 引入版本:v3.2.0 + +##### lake_enable_tablet_creation_optimization + +- 默认值:false +- 类型:boolean +- 单位:- +- 是否可变:是 +- 描述:启用后,StarRocks 在共享数据模式下为云原生表和物化视图优化 Tablet 创建,方法是为物理分区下的所有 Tablet 创建单个共享 Tablet 元数据,而不是为每个 Tablet 创建不同的元数据。这减少了表创建、Rollup 和 Schema Change 作业期间创建的 Tablet 任务和元数据/文件数量。此优化仅适用于云原生表/物化视图,并与 `file_bundling` 结合使用(后者重用相同的优化逻辑)。注意:Schema Change 和 Rollup 作业明确禁用使用 `file_bundling` 的表的优化,以避免覆盖同名文件。谨慎启用——它改变了创建的 Tablet 元数据的粒度,并可能影响副本创建和文件命名行为。 +- 引入版本:v3.3.1, v3.4.0, v3.5.0 + +##### lake_use_combined_txn_log + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:当此项设置为 `true` 时,系统允许湖表对相关事务使用组合事务日志路径。仅适用于共享数据集群。 +- 引入版本:v3.3.7, v3.4.0, v3.5.0 + +##### enable_iceberg_commit_queue + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否启用 Iceberg 表的提交队列以避免并发提交冲突。Iceberg 使用乐观并发控制(OCC)进行元数据提交。当多个线程并发提交到同一表时,可能会出现冲突,例如:“无法提交:基础元数据位置与当前表元数据位置不同”。启用后,每个 Iceberg 表都有自己的单线程执行器用于提交操作,确保对同一表的提交是串行化的,从而防止 OCC 冲突。不同的表可以并发提交,从而保持整体吞吐量。这是一个系统级优化,旨在提高可靠性,应默认启用。如果禁用,并发提交可能会由于乐观锁定冲突而失败。 +- 引入版本:v4.1.0 + +##### iceberg_commit_queue_timeout_seconds + +- 默认值:300 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:等待 Iceberg 提交操作完成的超时时间(秒)。当使用提交队列 (`enable_iceberg_commit_queue=true`) 时,每个提交操作必须在此超时时间内完成。如果提交时间超过此超时,它将被取消并引发错误。影响提交时间的因素包括:提交的数据文件数量、表的元数据大小、底层存储(例如 S3、HDFS)的性能。 +- 引入版本:v4.1.0 + +##### iceberg_commit_queue_max_size + +- 默认值:1000 +- 类型:Int +- 单位:计数 +- 是否可变:否 +- 描述:每个 Iceberg 表待处理提交操作的最大数量。当使用提交队列 (`enable_iceberg_commit_queue=true`) 时,这限制了单个表可以排队的提交操作数量。当达到限制时,额外的提交操作将在调用线程中执行(阻塞直到容量可用)。此配置在 FE 启动时读取并应用于新创建的表执行器。需要重启 FE 才能生效。如果您预期对同一表进行许多并发提交,请增加此值。如果此值过低,在高并发期间提交可能会在调用线程中阻塞。 +- 引入版本:v4.1.0 + +### 其他 + +##### agent_task_resend_wait_time_ms + +- 默认值:5000 +- 类型:Long +- 单位:毫秒 +- 是否可变:是 +- 描述:FE 重新发送 Agent 任务之前必须等待的持续时间。仅当任务创建时间与当前时间之间的间隔超过此参数的值时,才能重新发送 Agent 任务。此参数用于防止重复发送 Agent 任务。 +- 引入版本:- + +##### allow_system_reserved_names + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否允许用户创建名称以 `__op` 和 `__row` 开头的列。要启用此功能,请将此参数设置为 `TRUE`。请注意,这些名称格式在 StarRocks 中保留用于特殊目的,创建此类列可能会导致未定义的行为。因此,此功能默认禁用。 +- 引入版本:v3.2.0 + +##### auth_token + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于 StarRocks 集群(FE 所属集群)内部身份验证的 Token。如果未指定此参数,StarRocks 会在集群的 Leader FE 首次启动时为集群生成一个随机 Token。 +- 引入版本:- + +##### authentication_ldap_simple_bind_base_dn + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:是 +- 描述:基本 DN,即 LDAP 服务器开始搜索用户身份验证信息的起点。 +- 引入版本:- + +##### authentication_ldap_simple_bind_root_dn + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:是 +- 描述:用于搜索用户身份验证信息的管理员 DN。 +- 引入版本:- + +##### authentication_ldap_simple_bind_root_pwd + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:是 +- 描述:用于搜索用户身份验证信息的管理员密码。 +- 引入版本:- + +##### authentication_ldap_simple_server_host + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:是 +- 描述:LDAP 服务器运行的主机。 +- 引入版本:- + +##### authentication_ldap_simple_server_port + +- 默认值:389 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:LDAP 服务器的端口。 +- 引入版本:- + +##### authentication_ldap_simple_user_search_attr + +- 默认值:uid +- 类型:String +- 单位:- +- 是否可变:是 +- 描述:用于在 LDAP 对象中识别用户的属性名称。 +- 引入版本:- + +##### backup_job_default_timeout_ms + +- 默认值:86400 * 1000 +- 类型:Int +- 单位:毫秒 +- 是否可变:是 +- 描述:备份作业的超时持续时间。如果超过此值,备份作业将失败。 +- 引入版本:- + +##### enable_collect_tablet_num_in_show_proc_backend_disk_path + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否在 `SHOW PROC /BACKENDS/{id}` 命令中启用每个磁盘的 Tablet 数量收集。 +- 引入版本:v4.0.1, v3.5.8 + +##### enable_colocate_restore + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否启用 Colocate 表的备份和恢复。`true` 表示启用 Colocate 表的备份和恢复,`false` 表示禁用。 +- 
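引入版本:v3.2.10, v3.3.3 + +以上配置项中标记为「是否可变:是」的参数支持在运行时动态修改。例如,下面的语句(仅为示意)在线开启 Colocate 表的备份恢复功能,并确认修改结果: + +```SQL +ADMIN SET FRONTEND CONFIG ("enable_colocate_restore" = "true"); +ADMIN SHOW FRONTEND CONFIG LIKE "enable_colocate_restore"; +``` + +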
##### enable_materialized_view_concurrent_prepare + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否并发准备物化视图以提高性能。 +- 引入版本:v3.4.4 + +##### enable_metric_calculator + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:否 +- 描述:指定是否启用用于定期收集指标的功能。有效值:`TRUE` 和 `FALSE`。`TRUE` 指定启用此功能,`FALSE` 指定禁用此功能。 +- 引入版本:- + +##### enable_table_metrics_collect + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否在 FE 中导出表级别指标。禁用时,FE 将跳过导出表指标(例如表扫描/加载计数器和表大小指标),但仍将计数器记录在内存中。 +- 引入版本:- + +##### enable_mv_post_image_reload_cache + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:FE 加载镜像后是否执行重新加载标志检查。如果对基本物化视图执行检查,则无需对与之相关的其他物化视图执行检查。 +- 引入版本:v3.5.0 + +##### enable_mv_query_context_cache + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否启用查询级别的物化视图重写缓存以提高查询重写性能。 +- 引入版本:v3.3 + +##### enable_mv_refresh_collect_profile + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否默认在刷新物化视图时为所有物化视图启用 Profile。 +- 引入版本:v3.3.0 + +##### enable_mv_refresh_extra_prefix_logging + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否在日志中启用物化视图名称前缀以更好地进行调试。 +- 引入版本:v3.4.0 + +##### enable_mv_refresh_query_rewrite + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否在物化视图刷新期间启用重写查询,以便查询可以直接使用重写的物化视图而不是基表,以提高查询性能。 +- 引入版本:v3.3 + +##### enable_trace_historical_node + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否允许系统跟踪历史节点。通过将此项设置为 `true`,您可以启用缓存共享功能,并允许系统在弹性扩展期间选择正确的缓存节点。 +- 引入版本:v3.5.1 + +##### es_state_sync_interval_second + +- 默认值:10 +- 类型:Long +- 单位:秒 +- 是否可变:否 +- 描述:FE 获取 Elasticsearch 索引并同步 StarRocks 外部表元数据的时间间隔。 +- 引入版本:- + +##### hive_meta_cache_refresh_interval_s + +- 默认值:3600 * 2 +- 类型:Long +- 单位:秒 +- 是否可变:否 +- 描述:Hive 外部表缓存元数据的更新时间间隔。 +- 引入版本:- + +##### hive_meta_store_timeout_s + +- 默认值:10 +- 类型:Long +- 单位:秒 +- 是否可变:否 +- 描述:与 Hive Metastore 连接超时的时间量。 +- 引入版本:- + +##### jdbc_connection_idle_timeout_ms + +- 默认值:600000 +- 类型:Int +- 单位:毫秒 +- 是否可变:否 +- 描述:访问 JDBC Catalog 的连接在超时前允许保持空闲的最长时间。超过该时长的连接将被视为空闲连接。 +- 引入版本:- + +##### jdbc_connection_timeout_ms + +- 默认值:10000 +- 类型:Long +- 单位:毫秒 +- 是否可变:否 +- 描述:HikariCP 连接池获取连接的超时时间(毫秒)。如果在此时间内无法从池中获取连接,操作将失败。 +- 引入版本:v3.5.13 + +##### jdbc_query_timeout_ms + +- 默认值:30000 +- 类型:Long +- 单位:毫秒 +- 是否可变:是 +- 描述:JDBC 语句查询执行的超时时间(毫秒)。此超时应用于通过 JDBC Catalog 执行的所有 SQL 查询(例如分区元数据查询)。该值在传递给 JDBC 驱动程序时转换为秒。 +- 引入版本:v3.5.13 + +##### jdbc_network_timeout_ms + +- 默认值:30000 +- 类型:Long +- 单位:毫秒 +- 是否可变:是 +- 描述:JDBC 网络操作(套接字读取)的超时时间(毫秒)。此超时适用于数据库元数据调用(例如 getSchemas()、getTables()、getColumns()),以防止在外部数据库无响应时无限期阻塞。 +- 引入版本:v3.5.13 + +##### jdbc_connection_pool_size + +- 默认值:8 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:用于访问 JDBC Catalog 的 JDBC 连接池的最大容量。 +- 引入版本:- + +##### jdbc_meta_default_cache_enable + +- 默认值:false +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:JDBC Catalog 元数据缓存是否启用的默认值。设置为 True 时,新创建的 JDBC Catalog 将默认启用元数据缓存。 +- 引入版本:- + +##### jdbc_meta_default_cache_expire_sec + +- 默认值:600 +- 类型:Long +- 单位:秒 +- 是否可变:是 +- 描述:JDBC Catalog 元数据缓存的默认过期时间。当 `jdbc_meta_default_cache_enable` 设置为 true 时,新创建的 JDBC Catalog 将默认设置元数据缓存的过期时间。 +- 引入版本:- + +##### jdbc_minimum_idle_connections + +- 默认值:1 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:用于访问 JDBC Catalog 的 JDBC 连接池中的最小空闲连接数。 +- 引入版本:- + +##### jwt_jwks_url + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:JSON Web Key Set (JWKS) 服务的 URL 或 `fe/conf` 目录下公共密钥本地文件的路径。 +- 引入版本:v3.5.0 + +##### jwt_principal_field + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于标识 JWT 中表示主题 (`sub`) 字段的字符串。默认值为 `sub`。此字段的值必须与登录 StarRocks 的用户名相同。 +- 引入版本:v3.5.0 + +##### jwt_required_audience + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于标识 JWT 中受众 (`aud`) 字段的字符串列表。仅当列表中一个值与 JWT 
受众匹配时,JWT 才被视为有效。 +- 引入版本:v3.5.0 + +##### jwt_required_issuer + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于标识 JWT 中颁发者 (`iss`) 字段的字符串列表。仅当列表中一个值与 JWT 颁发者匹配时,JWT 才被视为有效。 +- 引入版本:v3.5.0 + +##### locale + +- 默认值:zh_CN.UTF-8 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:FE 使用的字符集。 +- 引入版本:- + +##### max_agent_task_threads_num + +- 默认值:4096 +- 类型:Int +- 单位:- +- 是否可变:否 +- 描述:Agent 任务线程池中允许的最大线程数。 +- 引入版本:- + +##### max_download_task_per_be + +- 默认值:0 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:在每次 RESTORE 操作中,StarRocks 分配给 BE 节点的最大下载任务数。当此项设置为小于或等于 0 时,不对任务数量施加限制。 +- 引入版本:v3.1.0 + +##### max_mv_check_base_table_change_retry_times + +- 默认值:10 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:刷新物化视图时检测基表更改的最大重试次数。 +- 引入版本:v3.3.0 + +##### max_mv_refresh_failure_retry_times + +- 默认值:1 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:物化视图刷新失败时的最大重试次数。 +- 引入版本:v3.3.0 + +##### max_mv_refresh_try_lock_failure_retry_times + +- 默认值:3 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:物化视图刷新失败时尝试锁定的最大重试次数。 +- 引入版本:v3.3.0 + +##### max_small_file_number + +- 默认值:100 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:FE 目录中可以存储的小文件最大数量。 +- 引入版本:- + +##### max_small_file_size_bytes + +- 默认值:1024 * 1024 +- 类型:Int +- 单位:字节 +- 是否可变:是 +- 描述:小文件的最大大小。 +- 引入版本:- + +##### max_upload_task_per_be + +- 默认值:0 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:在每次 BACKUP 操作中,StarRocks 分配给 BE 节点的最大上传任务数。当此项设置为小于或等于 0 时,不对任务数量施加限制。 +- 引入版本:v3.1.0 + +##### mv_create_partition_batch_interval_ms + +- 默认值:1000 +- 类型:Int +- 单位:毫秒 +- 是否可变:是 +- 描述:物化视图刷新期间,如果需要批量创建多个分区,系统会将其分成每批 64 个分区。为了降低频繁创建分区导致的失败风险,每个批次之间设置了默认间隔(毫秒)以控制创建频率。 +- 引入版本:v3.3 + +##### mv_plan_cache_max_size + +- 默认值:1000 +- 类型:Long +- 单位:- +- 是否可变:是 +- 描述:物化视图计划缓存(用于物化视图重写)的最大大小。如果有很多物化视图用于透明查询重写,您可以增加此值。 +- 引入版本:v3.2 + +##### mv_plan_cache_thread_pool_size + +- 默认值:3 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:物化视图计划缓存(用于物化视图重写)的默认线程池大小。 +- 引入版本:v3.2 + +##### mv_refresh_default_planner_optimize_timeout + +- 默认值:30000 +- 类型:Int +- 单位:毫秒 +- 是否可变:是 +- 描述:刷新物化视图时优化器规划阶段的默认超时。 +- 引入版本:v3.3.0 + +##### mv_refresh_fail_on_filter_data + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:如果在刷新期间有过滤数据,物化视图刷新失败,默认为 true,否则通过忽略过滤数据返回成功。 +- 引入版本:- + +##### mv_refresh_try_lock_timeout_ms + +- 默认值:30000 +- 类型:Int +- 单位:毫秒 +- 是否可变:是 +- 描述:物化视图刷新尝试其基表/物化视图的 DB 锁的默认尝试锁定超时。 +- 引入版本:v3.3.0 + +##### oauth2_auth_server_url + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:授权 URL。用户浏览器将被重定向到此 URL 以开始 OAuth 2.0 授权过程。 +- 引入版本:v3.5.0 + +##### oauth2_client_id + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:StarRocks 客户端的公共标识符。 +- 引入版本:v3.5.0 + +##### oauth2_client_secret + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于授权 StarRocks 客户端与授权服务器通信的 Secret。 +- 引入版本:v3.5.0 + +##### oauth2_jwks_url + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:JSON Web Key Set (JWKS) 服务的 URL 或 `conf` 目录下本地文件的路径。 +- 引入版本:v3.5.0 + +##### oauth2_principal_field + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于标识 JWT 中表示主题 (`sub`) 字段的字符串。默认值为 `sub`。此字段的值必须与登录 StarRocks 的用户名相同。 +- 引入版本:v3.5.0 + +##### oauth2_redirect_url + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:OAuth 2.0 身份验证成功后,用户浏览器将被重定向到的 URL。授权码将发送到此 URL。在大多数情况下,需要将其配置为 `http://<fe_host>:<fe_http_port>/api/oauth2`。 +- 引入版本:v3.5.0 + +##### oauth2_required_audience + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于标识 JWT 中受众 (`aud`) 字段的字符串列表。仅当列表中一个值与 JWT 受众匹配时,JWT 才被视为有效。 +- 引入版本:v3.5.0 + +##### oauth2_required_issuer + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:用于标识 JWT 中颁发者 (`iss`) 字段的字符串列表。仅当列表中一个值与 JWT 颁发者匹配时,JWT 才被视为有效。 +- 引入版本:v3.5.0 + +##### oauth2_token_server_url + +- 默认值:空字符串 +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:StarRocks 从授权服务器获取访问 Token 的端点 URL。 +- 
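引入版本:v3.5.0 + +下面给出一个启用 OAuth 2.0 认证时 **fe.conf** 的配置示意。其中授权服务器地址、Client ID/Secret 等均为假设的示例值,实际取值请以您的身份提供商 (IdP) 为准: + +```Plain +oauth2_auth_server_url = https://idp.example.com/oauth2/authorize +oauth2_token_server_url = https://idp.example.com/oauth2/token +oauth2_client_id = starrocks-client +oauth2_client_secret = xxxxxxxx +oauth2_redirect_url = http://<fe_host>:<fe_http_port>/api/oauth2 +oauth2_jwks_url = https://idp.example.com/oauth2/jwks +oauth2_principal_field = sub +``` + +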
##### plugin_dir + +- 默认值:System.getenv("STARROCKS_HOME") + "/plugins" +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:存储插件安装包的目录。 +- 引入版本:- + +##### plugin_enable + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否可以在 FE 上安装插件。插件只能在 Leader FE 上安装或卸载。 +- 引入版本:- + +##### proc_profile_jstack_depth + +- 默认值:128 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:系统收集 CPU 和内存 Profile 时 Java 堆栈的最大深度。此值控制每个采样堆栈捕获的 Java 堆栈帧数:较大的值会增加跟踪详细信息和输出大小,并可能增加 Profiling 开销,而较小的值会减少详细信息。此设置在 CPU 和内存 Profiling 启动时都使用,因此请根据诊断需求和性能影响进行调整。 +- 引入版本:- + +##### proc_profile_mem_enable + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:是否启用进程内存分配 Profile 收集。当此项设置为 `true` 时,系统会在 `sys_log_dir/proc_profile` 下生成名为 `mem-profile-<timestamp>.html` 的 HTML Profile,然后在采样期间睡眠 `proc_profile_collect_time_s` 秒,并使用 `proc_profile_jstack_depth` 作为 Java 堆栈深度。生成的文件根据 `proc_profile_file_retained_days` 和 `proc_profile_file_retained_size_bytes` 进行压缩和清除。原生提取路径使用 `STARROCKS_HOME_DIR` 以避免 `/tmp` noexec 问题。此项旨在解决内存分配热点问题。启用它会增加 CPU、I/O 和磁盘使用,并可能产生大文件。 +- 引入版本:v3.2.12 + +##### query_detail_explain_level + +- 默认值:COSTS +- 类型:String +- 单位:- +- 是否可变:是 +- 描述:EXPLAIN 语句返回的查询计划的详细级别。有效值:COSTS, NORMAL, VERBOSE。 +- 引入版本:v3.2.12, v3.3.5 + +##### replication_interval_ms + +- 默认值:100 +- 类型:Int +- 单位:毫秒 +- 是否可变:否 +- 描述:调度复制任务的最小时间间隔。 +- 引入版本:v3.3.5 + +##### replication_max_parallel_data_size_mb + +- 默认值:1048576 +- 类型:Int +- 单位:MB +- 是否可变:是 +- 描述:并发同步允许的最大数据大小。 +- 引入版本:v3.3.5 + +##### replication_max_parallel_replica_count + +- 默认值:10240 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:并发同步允许的最大 Tablet 副本数。 +- 引入版本:v3.3.5 + +##### replication_max_parallel_table_count + +- 默认值:100 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:允许的最大并发数据同步任务数。StarRocks 为每张表创建一个同步任务。 +- 引入版本:v3.3.5 + +##### replication_transaction_timeout_sec + +- 默认值:86400 +- 类型:Int +- 单位:秒 +- 是否可变:是 +- 描述:同步任务的超时持续时间。 +- 引入版本:v3.3.5 + +##### skip_whole_phase_lock_mv_limit + +- 默认值:5 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:控制 StarRocks 何时对具有相关物化视图的表应用“无锁”优化。当此项设置为小于 0 时,系统始终应用无锁优化,并且不为查询复制相关物化视图(FE 内存使用和元数据复制/锁争用减少,但元数据并发问题风险可能增加)。当设置为 0 时,禁用无锁优化(系统始终使用安全的复制和锁定路径)。当设置为大于 0 时,仅当相关物化视图的数量小于或等于配置的阈值时才应用无锁优化。此外,当值大于等于 0 时,规划器将查询 OLAP 表记录到优化器上下文中以启用与物化视图相关的重写路径;当小于 0 时,此步骤被跳过。 +- 引入版本:v3.2.1 + +##### small_file_dir + +- 默认值:StarRocksFE.STARROCKS_HOME_DIR + "/small_files" +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:小文件的根目录。 +- 引入版本:- + +##### task_runs_max_history_number + +- 默认值:10000 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:内存中保留的任务运行记录的最大数量,并用作查询存档任务运行历史记录时的默认 LIMIT。当 `enable_task_history_archive` 为 false 时,此值限制内存中的历史记录:强制 GC 修剪旧条目,因此只保留最新的 `task_runs_max_history_number`。当查询存档历史记录时(且未提供显式 LIMIT),如果此值大于 0,`TaskRunHistoryTable.lookup` 将使用 `"ORDER BY create_time DESC LIMIT <此值>"`。注意:将其设置为 0 会禁用查询端的 LIMIT(无上限),但会导致内存中的历史记录截断为零(除非启用了存档)。 +- 引入版本:v3.2.0 + +##### tmp_dir + +- 默认值:StarRocksFE.STARROCKS_HOME_DIR + "/temp_dir" +- 类型:String +- 单位:- +- 是否可变:否 +- 描述:存储临时文件的目录,例如备份和恢复过程中生成的文件。这些过程完成后,生成的临时文件将被删除。 +- 引入版本:- + +##### transform_type_prefer_string_for_varchar + +- 默认值:true +- 类型:Boolean +- 单位:- +- 是否可变:是 +- 描述:在物化视图创建和 CTAS 操作中,是否更喜欢对固定长度 VARCHAR 列使用 STRING 类型。 +- 引入版本:v4.0.0 + + diff --git a/docs/zh/administration/management/Scale_up_down.md b/docs/zh/administration/management/Scale_up_down.md new file mode 100644 index 0000000..3c26128 --- /dev/null +++ b/docs/zh/administration/management/Scale_up_down.md @@ -0,0 +1,99 @@ +--- +displayed_sidebar: docs +--- + +# 扩缩容 + +本主题描述了如何对 StarRocks 节点进行扩缩容。 + +## FE 节点扩缩容 + +StarRocks 有两种类型的 FE 节点:Follower 和 Observer。Follower 参与选举投票和写入。Observer 仅用于同步日志和扩展读取性能。 + +> * Follower FE(包括 Leader)的数量必须是奇数,建议部署 3 个以形成高可用 (HA) 模式。 +> * 
当 FE 以高可用模式部署(1 个 Leader,2 个 Follower)时,建议添加 Observer FE 以获得更好的读取性能。 + +### FE 扩容 + +部署 FE 节点并启动服务后,运行以下命令进行 FE 扩容。 + +~~~sql +alter system add follower "fe_host:edit_log_port"; +alter system add observer "fe_host:edit_log_port"; +~~~ + +### FE 缩容 + +FE 缩容与扩容类似。运行以下命令进行 FE 缩容。 + +~~~sql +alter system drop follower "fe_host:edit_log_port"; +alter system drop observer "fe_host:edit_log_port"; +~~~ + +扩缩容完成后,可以通过运行 `show proc '/frontends';` 查看节点信息。 + +## BE 节点扩缩容 + +StarRocks 会在 BE 节点扩缩容后自动执行负载均衡,而不会影响整体性能。 + +当您添加新的 BE 节点时,系统的 Tablet Scheduler 会检测到新节点及其低负载。然后,它将开始把 Tablet 从高负载的 BE 节点移动到新的、低负载的 BE 节点,以确保数据和负载在整个集群中均匀分布。 + +负载均衡过程基于为每个 BE 计算的 loadScore,该 loadScore 考虑了磁盘利用率和副本数量。系统旨在将 Tablet 从具有较高 loadScore 的节点移动到具有较低 loadScore 的节点。 + +您可以检查 FE 配置参数 `tablet_sched_disable_balance` 以确保自动均衡未被禁用(该参数默认为 false,这意味着 Tablet 均衡默认是启用的)。更多详细信息请参阅 [manage replica docs](./resource_management/Replica.md)。 + +### BE 扩容 + +运行以下命令进行 BE 扩容。 + +~~~sql +alter system add backend 'be_host:be_heartbeat_service_port'; +~~~ + +运行以下命令查看 BE 状态。 + +~~~sql +show proc '/backends'; +~~~ + +### BE 缩容 + +缩容 BE 节点有两种方式:`DROP` 和 `DECOMMISSION`。 + +`DROP` 将立即删除 BE 节点,丢失的副本将由 FE 调度补齐。`DECOMMISSION` 将首先确保副本已补齐,然后再删除 BE 节点。`DECOMMISSION` 更加友好,推荐用于 BE 缩容。 + +两种方法的命令类似: + +* `alter system decommission backend "be_host:be_heartbeat_service_port";` +* `alter system drop backend "be_host:be_heartbeat_service_port";` + +DROP BE 是一种危险操作,执行前需要二次确认。 + +* `alter system drop backend "be_host:be_heartbeat_service_port";` + +## CN 节点扩缩容 + +### CN 扩容 + +运行以下命令进行 CN 扩容。 + +~~~sql +ALTER SYSTEM ADD COMPUTE NODE "cn_host:cn_heartbeat_service_port"; +~~~ + +运行以下命令查看 CN 状态。 + +~~~sql +SHOW PROC '/compute_nodes'; +~~~ + +### CN 缩容 + +CN 缩容与扩容类似。运行以下命令进行 CN 缩容。 + +~~~sql +ALTER SYSTEM DROP COMPUTE NODE "cn_host:cn_heartbeat_service_port"; +~~~ + +您可以通过运行 `SHOW PROC '/compute_nodes';` 查看节点信息。 diff --git a/docs/zh/administration/management/audit_loader.md b/docs/zh/administration/management/audit_loader.md new file mode 100644 index 0000000..86b6229 --- /dev/null +++ b/docs/zh/administration/management/audit_loader.md @@ -0,0 +1,221 @@ +--- +displayed_sidebar: docs +--- + +# 通过 AuditLoader 在 StarRocks 中管理审计日志 + +本主题介绍如何通过插件 AuditLoader 在表中管理 StarRocks 审计日志。 + +StarRocks 将其审计日志存储在本地文件 **fe/log/fe.audit.log** 中,而不是内部数据库中。插件 AuditLoader 允许您直接在集群中管理审计日志。安装后,AuditLoader 从文件读取日志,并通过 HTTP PUT 将其加载到 StarRocks 中。然后您可以使用 SQL 语句在 StarRocks 中查询审计日志。 + +## 创建表以存储审计日志 + +在 StarRocks 集群中创建数据库和表以存储审计日志。有关详细说明,请参阅 [CREATE DATABASE](../../sql-reference/sql-statements/Database/CREATE_DATABASE.md) 和 [CREATE TABLE](../../sql-reference/sql-statements/table_bucket_part_index/CREATE_TABLE.md)。 + +由于审计日志的字段在不同 StarRocks 版本之间有所差异,因此务必遵循以下建议,以避免在升级过程中出现兼容性问题: + +> **注意** +> +> - 所有新字段都应标记为 `NULL`。 +> - 字段不应重命名,因为用户可能依赖它们。 +> - 字段类型只能应用向后兼容的更改,例如 `VARCHAR(32)` -> `VARCHAR(64)`,以避免插入时出错。 +> - `AuditEvent` 字段仅通过名称解析。表中列的顺序无关紧要,并且用户可以随时更改。 +> - 表中不存在的 `AuditEvent` 字段将被忽略,因此用户可以删除不需要的列。 + +```SQL +CREATE DATABASE starrocks_audit_db__; + +CREATE TABLE starrocks_audit_db__.starrocks_audit_tbl__ ( + `queryId` VARCHAR(64) COMMENT "查询的唯一ID", + `timestamp` DATETIME NOT NULL COMMENT "查询开始时间", + `queryType` VARCHAR(12) COMMENT "查询类型 (query, slow_query, connection)", + `clientIp` VARCHAR(32) COMMENT "客户端IP", + `user` VARCHAR(64) COMMENT "查询用户名", + `authorizedUser` VARCHAR(64) COMMENT "用户的唯一标识,即user_identity", + `resourceGroup` VARCHAR(64) COMMENT "资源组名称", + `catalog` VARCHAR(32) COMMENT "Catalog 名称", + `db` VARCHAR(96) COMMENT "查询运行的数据库", + `state` VARCHAR(8) COMMENT "查询状态 (EOF, ERR, OK)", + 
`errorCode` VARCHAR(512) COMMENT "错误码", + `queryTime` BIGINT COMMENT "查询执行时间(毫秒)", + `scanBytes` BIGINT COMMENT "查询扫描的字节数", + `scanRows` BIGINT COMMENT "查询扫描的行数", + `returnRows` BIGINT COMMENT "查询返回的行数", + `cpuCostNs` BIGINT COMMENT "查询消耗的CPU时间(纳秒)", + `memCostBytes` BIGINT COMMENT "查询消耗的内存(字节)", + `stmtId` INT COMMENT "SQL语句的增量ID", + `isQuery` TINYINT COMMENT "SQL是否为查询语句(1或0)", + `feIp` VARCHAR(128) COMMENT "执行该语句的FE IP", + `stmt` VARCHAR(1048576) COMMENT "原始SQL语句", + `digest` VARCHAR(32) COMMENT "慢SQL的指纹", + `planCpuCosts` DOUBLE COMMENT "查询规划阶段的CPU使用量(纳秒)", + `planMemCosts` DOUBLE COMMENT "查询规划阶段的内存使用量(字节)", + `pendingTimeMs` BIGINT COMMENT "查询在队列中等待的时间(毫秒)", + `candidateMVs` VARCHAR(65533) NULL COMMENT "候选物化视图列表", + `hitMvs` VARCHAR(65533) NULL COMMENT "匹配的物化视图列表", + `warehouse` VARCHAR(32) NULL COMMENT "Warehouse 名称" +) ENGINE = OLAP +DUPLICATE KEY (`queryId`, `timestamp`, `queryType`) +COMMENT "审计日志表" +PARTITION BY date_trunc('day', `timestamp`) +PROPERTIES ( + "replication_num" = "1", + "partition_live_number" = "30" +); +``` + +`starrocks_audit_tbl__` 是使用动态分区创建的。默认情况下,第一个动态分区在表创建后 10 分钟创建。然后可以将审计日志加载到表中。您可以使用以下语句检查表中的分区: + +```SQL +SHOW PARTITIONS FROM starrocks_audit_db__.starrocks_audit_tbl__; +``` + +分区创建后,您可以继续下一步。 + +## 下载和配置 AuditLoader + +1. [下载](https://releases.starrocks.io/resources/auditloader.zip) AuditLoader 安装包。该软件包与所有可用版本的 StarRocks 兼容。 + +2. 解压安装包。 + + ```shell + unzip auditloader.zip + ``` + + 解压以下文件: + + - **auditloader.jar**:AuditLoader 的 JAR 文件。 + - **plugin.properties**:AuditLoader 的属性文件。您无需修改此文件。 + - **plugin.conf**:AuditLoader 的配置文件。在大多数情况下,您只需修改文件中的 `user` 和 `password` 字段。 + +3. 修改 **plugin.conf** 以配置 AuditLoader。您必须配置以下项以确保 AuditLoader 正常工作: + + - `frontend_host_port`:FE IP 地址和 HTTP 端口,格式为 `<fe_ip>:<fe_http_port>`。建议将其设置为默认值 `127.0.0.1:8030`。StarRocks 中的每个 FE 独立管理自己的审计日志,安装插件后,每个 FE 将启动自己的后台线程来获取并保存审计日志,并通过 Stream Load 将其写入。`frontend_host_port` 配置项用于为插件的后台 Stream Load 任务提供 HTTP 协议的 IP 和端口,此参数不支持多个值。参数的 IP 部分可以使用集群中任何 FE 的 IP,但不推荐这样做,因为如果相应的 FE 崩溃,其他 FE 后台的审计日志写入任务也会因为通信失败而失败。建议将其设置为默认值 `127.0.0.1:8030`,这样每个 FE 使用自己的 HTTP 端口进行通信,从而避免在其他 FE 出现异常时影响通信(所有写入任务最终都将转发到 FE Leader 节点执行)。 + - `database`:您为存储审计日志而创建的数据库名称。 + - `table`:您为存储审计日志而创建的表名称。 + - `user`:您的集群用户名。您必须具有向表中加载数据(LOAD_PRIV)的权限。 + - `password`:您的用户密码。 + - `secret_key`:用于加密密码的密钥(字符串,长度不得超过 16 字节)。如果未设置此参数,则表示 **plugin.conf** 中的密码不会被加密,您只需在 `password` 中指定明文密码。如果指定此参数,则表示密码由该密钥加密,您需要在 `password` 中指定加密字符串。加密密码可以在 StarRocks 中使用 `AES_ENCRYPT` 函数生成:`SELECT TO_BASE64(AES_ENCRYPT('password','secret_key'));`。 + - `filter`:审计日志加载的过滤条件。此参数基于 Stream Load 中的 [WHERE 参数](../../sql-reference/sql-statements/loading_unloading/STREAM_LOAD.md#opt_properties),即 `-H “where: ”`,默认为空字符串。示例:`filter=isQuery=1 and clientIp like '127.0.0.1%' and user='root'`。 + +4. 将文件重新打包。 + + ```shell + zip -q -m -r auditloader.zip auditloader.jar plugin.conf plugin.properties + ``` + +5. 
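将软件包分发到所有托管 FE 节点的机器上。确保所有软件包都存储在相同的路径中。否则,安装将失败。分发软件包后,请记住复制软件包的绝对路径。 + + > **注意** + > + > 您也可以将 **auditloader.zip** 分发到所有 FE 可访问的 HTTP 服务(例如,`httpd` 或 `nginx`),并通过网络安装。请注意,在两种情况下,**auditloader.zip** 在安装执行后都需要持久化在路径中,并且源文件在安装后不应被删除。 + +完成上述配置后,一份可供参考的 **plugin.conf** 内容示意如下(其中用户名、密码等均为假设值,请按实际环境填写): + +```Plain +frontend_host_port=127.0.0.1:8030 +database=starrocks_audit_db__ +table=starrocks_audit_tbl__ +user=admin_user +password=admin_password +secret_key= +filter= +max_batch_interval_sec=60 +``` + +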
## 安装 AuditLoader + +执行以下语句以及您复制的路径,将 AuditLoader 作为插件安装到 StarRocks 中: + +```SQL +INSTALL PLUGIN FROM "<auditloader.zip 的绝对路径>"; +``` + +从本地软件包安装的示例: + +```SQL +INSTALL PLUGIN FROM "<本地路径>/auditloader.zip"; +``` + +如果您想通过网络路径安装插件,您需要在 INSTALL 语句的 properties 中提供软件包的 md5。 + +示例: + +```sql +INSTALL PLUGIN FROM "http://xx.xx.xxx.xxx/extra/auditloader.zip" PROPERTIES("md5sum" = "3975F7B880C9490FE95F42E2B2A28E2D"); +``` + +有关详细说明,请参阅 [INSTALL PLUGIN](../../sql-reference/sql-statements/cluster-management/plugin/INSTALL_PLUGIN.md)。 + +## 验证安装并查询审计日志 + +1. 您可以通过 [SHOW PLUGINS](../../sql-reference/sql-statements/cluster-management/plugin/SHOW_PLUGINS.md) 检查安装是否成功。 + + 在以下示例中,插件 `AuditLoader` 的 `Status` 为 `INSTALLED`,表示安装成功。 + + ```Plain + mysql> SHOW PLUGINS\G + *************************** 1. row *************************** + Name: __builtin_AuditLogBuilder + Type: AUDIT + Description: builtin audit logger + Version: 0.12.0 + JavaVersion: 1.8.31 + ClassName: com.starrocks.qe.AuditLogBuilder + SoName: NULL + Sources: Builtin + Status: INSTALLED + Properties: {} + *************************** 2. row *************************** + Name: AuditLoader + Type: AUDIT + Description: 适用于 3.3.11+ 版本。将审计日志加载到 StarRocks,用户可以查看查询的统计信息 + Version: 5.0.0 + JavaVersion: 11 + ClassName: com.starrocks.plugin.audit.AuditLoaderPlugin + SoName: NULL + Sources: /x/xx/xxx/xxxxx/auditloader.zip + Status: INSTALLED + Properties: {} + 2 rows in set (0.01 sec) + ``` + +2. 执行一些随机 SQL 语句以生成审计日志,并等待 60 秒(或您在配置 AuditLoader 时在 `max_batch_interval_sec` 项中指定的时间)以允许 AuditLoader 将审计日志加载到 StarRocks 中。 + +3. 通过查询表检查审计日志。 + + ```SQL + SELECT * FROM starrocks_audit_db__.starrocks_audit_tbl__; + ``` + + 以下示例显示审计日志成功加载到表中: + + ```Plain + mysql> SELECT * FROM starrocks_audit_db__.starrocks_audit_tbl__\G + *************************** 1. 
row *************************** + queryId: 01975a33-4129-7520-97a2-05e641cec6c9 + timestamp: 2025-06-10 14:16:37 + queryType: query + clientIp: xxx.xx.xxx.xx:65283 + user: root + authorizedUser: 'root'@'%' + resourceGroup: default_wg + catalog: default_catalog + db: + state: EOF + errorCode: + queryTime: 3 + scanBytes: 0 + scanRows: 0 + returnRows: 1 + cpuCostNs: 33711 + memCostBytes: 4200 + stmtId: 102 + isQuery: 1 + feIp: xxx.xx.xxx.xx + stmt: SELECT * FROM starrocks_audit_db__.starrocks_audit_tbl__ + digest: + planCpuCosts: 908 + planMemCosts: 0 + pendingTimeMs: -1 + candidateMvs: null + hitMVs: null + ………… + ``` + +## 故障排除 + +如果在创建动态分区并安装插件后没有审计日志加载到表中,您可以检查 **plugin.conf** 是否配置正确。要修改它,您必须首先卸载插件: + +```SQL +UNINSTALL PLUGIN AuditLoader; +``` + +AuditLoader 的日志打印在 **fe.log** 中,您可以通过在 **fe.log** 中搜索关键字 `audit` 来检索它们。所有配置都正确设置后,您可以按照上述步骤再次安装 AuditLoader。 diff --git a/docs/zh/administration/management/compaction.md b/docs/zh/administration/management/compaction.md new file mode 100644 index 0000000..b1a8f14 --- /dev/null +++ b/docs/zh/administration/management/compaction.md @@ -0,0 +1,303 @@ +--- +displayed_sidebar: docs +--- + +# 共享数据集群的 Compaction + +本主题介绍如何在 StarRocks 共享数据集群中管理 Compaction。 + +## 概述 + +StarRocks 中的每次数据加载操作都会生成新版本的数据文件。Compaction 将不同版本的数据文件合并成更大的文件,从而减少小文件的数量并提高查询效率。 + +## Compaction 分数 + +### 概述 + +*Compaction 分数*反映了分区中数据文件的合并状态。分数越高,表示合并进度越低,这意味着该分区有更多未合并的数据文件版本。FE 维护每个分区的 Compaction 分数信息,包括最大 Compaction 分数(分区中所有 Tablet 的最高分数)。 + +如果一个分区的最大 Compaction 分数低于 FE 参数 `lake_compaction_score_selector_min_score`(默认值:10),则该分区的 Compaction 被视为完成。最大 Compaction 分数超过 100 表示 Compaction 状态不健康。当分数超过 FE 参数 `lake_ingest_slowdown_threshold`(默认值:100)时,系统会减慢该分区的数据加载事务提交速度。如果它超过 `lake_compaction_score_upper_bound`(默认值:2000),系统会拒绝该分区的导入事务。 + +### 计算规则 + +通常,每个数据文件对 Compaction 分数的贡献为 1。例如,如果一个分区有一个 Tablet,并且从第一次加载操作中生成了 10 个数据文件,则该分区的最大 Compaction 分数为 10。一个 Tablet 内由事务生成的所有数据文件都作为 Rowset 进行分组。 + +在分数计算期间,Tablet 的 Rowset 会按大小分组,文件数量最多的组决定了该 Tablet 的 Compaction 分数。 + +例如,一个 Tablet 经历了 7 次加载操作,生成了大小分别为:100 MB、100 MB、100 MB、10 MB、10 MB、10 MB 和 10 MB 的 Rowset。在计算过程中,系统会将三个 100 MB 的 Rowset 组成一组,将四个 10 MB 的 Rowset 组成另一组。Compaction 分数是根据文件数量较多的组计算的。在这种情况下,第二组的 Compaction 分数更大。Compaction 优先处理分数较高的组,因此在第一次 Compaction 后,Rowset 的分布将是:100 MB、100 MB、100 MB 和 40 MB。 + +## Compaction 工作流程 + +对于共享数据集群,StarRocks 引入了一种新的 FE 控制的 Compaction 机制: + +1. 分数计算:Leader FE 节点根据事务发布结果计算并存储分区的 Compaction 分数。 +2. 候选选择:FE 选择具有最高最大 Compaction 分数的分区作为 Compaction 候选。 +3. 任务生成:FE 为选定的分区启动 Compaction 事务,生成 Tablet 级别的子任务,并将其分派给计算节点 (CN),直到达到 FE 参数 `lake_compaction_max_tasks` 设置的限制。 +4. 子任务执行:CN 在后台执行 Compaction 子任务。每个 CN 的并发子任务数量由 CN 参数 `compact_threads` 控制。 +5. 结果收集:FE 聚合子任务结果并提交 Compaction 事务。 +6. 
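发布:FE 发布成功提交的 Compaction 事务。 + +在按照下文的方式管理和排查 Compaction 之前,您可以先用如下语句(示意)确认当前集群中与 Compaction 相关的 FE 参数取值: + +```SQL +ADMIN SHOW FRONTEND CONFIG LIKE "%lake_compaction%"; +``` + +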
## 管理 Compaction + +### 查看 Compaction 分数 + +- 您可以使用 SHOW PROC 语句查看特定表中分区的 Compaction 分数。通常,您只需关注 `MaxCS` 字段。如果 `MaxCS` 低于 10,则认为 Compaction 已完成。如果 `MaxCS` 高于 100,则 Compaction 分数相对较高。如果 `MaxCS` 超过 500,则 Compaction 分数非常高,可能需要手动干预。 + + ```Plain + SHOW PARTITIONS FROM <table_name> + SHOW PROC '/dbs/<db_name>/<table_name>/partitions' + ``` + + 示例: + + ```Plain + mysql> SHOW PROC '/dbs/load_benchmark/store_sales/partitions'; + +-------------+---------------+----------------+----------------+-------------+--------+--------------+-------+------------------------------+---------+----------+-----------+----------+------------+-------+-------+-------+ + | PartitionId | PartitionName | CompactVersion | VisibleVersion | NextVersion | State | PartitionKey | Range | DistributionKey | Buckets | DataSize | RowCount | CacheTTL | AsyncWrite | AvgCS | P50CS | MaxCS | + +-------------+---------------+----------------+----------------+-------------+--------+--------------+-------+------------------------------+---------+----------+-----------+----------+------------+-------+-------+-------+ + | 38028 | store_sales | 913 | 921 | 923 | NORMAL | | | ss_item_sk, ss_ticket_number | 64 | 15.6GB | 273857126 | 2592000 | false | 10.00 | 10.00 | 10.00 | + +-------------+---------------+----------------+----------------+-------------+--------+--------------+-------+------------------------------+---------+----------+-----------+----------+------------+-------+-------+-------+ + 1 row in set (0.20 sec) + ``` + +- 您还可以通过查询系统定义的视图 `information_schema.partitions_meta` 来查看分区 Compaction 分数。 + + 示例: + + ```Plain + mysql> SELECT * FROM information_schema.partitions_meta ORDER BY Max_CS LIMIT 10; + +--------------+----------------------------+----------------------------+--------------+-----------------+-----------------+----------------------+--------------+---------------+-----------------+-----------------------------------------+---------+-----------------+----------------+---------------------+-----------------------------+--------------+---------+-----------+------------+------------------+----------+--------+--------+-------------------------------------------------------------------+ + | DB_NAME | TABLE_NAME | PARTITION_NAME | PARTITION_ID | COMPACT_VERSION | VISIBLE_VERSION | VISIBLE_VERSION_TIME | NEXT_VERSION | PARTITION_KEY | PARTITION_VALUE | DISTRIBUTION_KEY | BUCKETS | REPLICATION_NUM | STORAGE_MEDIUM | COOLDOWN_TIME | LAST_CONSISTENCY_CHECK_TIME | IS_IN_MEMORY | IS_TEMP | DATA_SIZE | ROW_COUNT | ENABLE_DATACACHE | AVG_CS | P50_CS | MAX_CS | STORAGE_PATH | + +--------------+----------------------------+----------------------------+--------------+-----------------+-----------------+----------------------+--------------+---------------+-----------------+-----------------------------------------+---------+-----------------+----------------+---------------------+-----------------------------+--------------+---------+-----------+------------+------------------+----------+--------+--------+-------------------------------------------------------------------+ + | tpcds_1t | call_center | call_center | 11905 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | cc_call_center_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 12.3KB | 42 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11906/11905 | + | tpcds_1t | web_returns | web_returns | 12030 | 3 | 3 | 2024-03-17 08:40:48 | 4 | | | wr_item_sk, wr_order_number | 16 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 3.5GB | 71997522 | 0 | 0 | 0 | 0 | 
s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/12031/12030 | + | tpcds_1t | warehouse | warehouse | 11847 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | w_warehouse_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 4.2KB | 20 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11848/11847 | + | tpcds_1t | ship_mode | ship_mode | 11851 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | sm_ship_mode_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 1.7KB | 20 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11852/11851 | + | tpcds_1t | customer_address | customer_address | 11790 | 0 | 2 | 2024-03-17 08:32:19 | 3 | | | ca_address_sk | 16 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 120.9MB | 6000000 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11791/11790 | + | tpcds_1t | time_dim | time_dim | 11855 | 0 | 2 | 2024-03-17 08:30:48 | 3 | | | t_time_sk | 16 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 864.7KB | 86400 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11856/11855 | + | tpcds_1t | web_sales | web_sales | 12049 | 3 | 3 | 2024-03-17 10:14:20 | 4 | | | ws_item_sk, ws_order_number | 128 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 47.7GB | 720000376 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/12050/12049 | + | tpcds_1t | store | store | 11901 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | s_store_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 95.6KB | 1002 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11902/11901 | + | tpcds_1t | web_site | web_site | 11928 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | web_site_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 13.4KB | 54 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11929/11928 | + | tpcds_1t | household_demographics | household_demographics | 11932 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | hd_demo_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 2.1KB | 7200 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11933/11932 | + +--------------+----------------------------+----------------------------+--------------+-----------------+-----------------+----------------------+--------------+---------------+-----------------+-----------------------------------------+---------+-----------------+----------------+---------------------+-----------------------------+--------------+---------+-----------+------------+------------------+----------+--------+--------+-------------------------------------------------------------------+ + ``` + +### 查看 Compaction 任务 + +当新数据加载到系统时,FE 会不断调度 Compaction 任务在不同的 CN 节点上执行。您可以先在 FE 上查看 Compaction 任务的总体状态,然后查看每个任务在 CN 上的执行详情。 + +#### 查看 Compaction 任务的总体状态 + +您可以使用 SHOW PROC 语句查看 Compaction 任务的总体状态。 + +```SQL +SHOW PROC '/compactions'; +``` + +示例: + +```Plain +mysql> SHOW PROC '/compactions'; ++---------------------+-------+---------------------+---------------------+---------------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Partition | TxnID | StartTime | CommitTime | FinishTime | Error | Profile | 
++---------------------+-------+---------------------+---------------------+---------------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| ssb.lineorder.10081 | 15 | 2026-01-10 03:29:07 | 2026-01-10 03:29:11 | 2026-01-10 03:29:12 | NULL | {"sub_task_count":12,"read_local_sec":0,"read_local_mb":218,"read_remote_sec":0,"read_remote_mb":0,"read_segment_count":120,"write_segment_count":12,"write_segment_mb":219,"write_remote_sec":4,"in_queue_sec":18} | +| ssb.lineorder.10068 | 16 | 2026-01-10 03:29:07 | 2026-01-10 03:29:13 | 2026-01-10 03:29:14 | NULL | {"sub_task_count":12,"read_local_sec":0,"read_local_mb":218,"read_remote_sec":0,"read_remote_mb":0,"read_segment_count":120,"write_segment_count":12,"write_segment_mb":218,"write_remote_sec":4,"in_queue_sec":38} | +| ssb.lineorder.10055 | 20 | 2026-01-10 03:29:11 | 2026-01-10 03:29:15 | 2026-01-10 03:29:17 | NULL | {"sub_task_count":12,"read_local_sec":0,"read_local_mb":218,"read_remote_sec":0,"read_remote_mb":0,"read_segment_count":120,"write_segment_count":12,"write_segment_mb":218,"write_remote_sec":4,"in_queue_sec":23} | ++---------------------+-------+---------------------+---------------------+---------------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +``` + +返回以下字段: + +- `Partition`: `Compaction` 任务所属的分区。 +- `TxnID`: 分配给 `Compaction` 任务的事务 ID。 +- `StartTime`: `Compaction` 任务开始的时间。`NULL` 表示任务尚未启动。 +- `CommitTime`: `Compaction` 任务提交数据的时间。`NULL` 表示数据尚未提交。 +- `FinishTime`: `Compaction` 任务发布数据的时间。`NULL` 表示数据尚未发布。 +- `Error`: `Compaction` 任务的错误信息(如果有)。 +- `Profile`: (从 v3.2.12 和 v3.3.4 开始支持)`Compaction` 任务完成后的 Profile。 + - `sub_task_count`: 分区中的子任务(相当于 Tablet)数量。 + - `read_local_sec`: 所有子任务从本地缓存读取数据的总耗时。单位:秒。 + - `read_local_mb`: 所有子任务从本地缓存读取数据的总大小。单位:MB。 + - `read_remote_sec`: 所有子任务从远端存储读取数据的总耗时。单位:秒。 + - `read_remote_mb`: 所有子任务从远端存储读取数据的总大小。单位:MB。 + - `read_segment_count`: 所有子任务读取的文件总数。 + - `write_segment_count`: 所有子任务生成的新文件总数。 + - `write_segment_mb`: 所有子任务生成的新文件总大小。单位:MB。 + - `write_remote_sec`: 所有子任务写入远端存储的总耗时。单位:秒。 + - `in_queue_sec`: 所有子任务在队列中停留的总时间。单位:秒。 + +#### 查看 Compaction 任务的执行详情 + +每个 Compaction 任务都分为多个子任务,每个子任务对应一个 Tablet。您可以通过查询系统定义的视图 `information_schema.be_cloud_native_compactions` 来查看每个子任务的执行详情。 + +示例: + +```Plain +mysql> SELECT * FROM information_schema.be_cloud_native_compactions; ++-------+--------+-----------+---------+---------+------+---------------------+-------------+----------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| BE_ID | TXN_ID | TABLET_ID | VERSION | SKIPPED | RUNS | START_TIME | FINISH_TIME | PROGRESS | STATUS | PROFILE | ++-------+--------+-----------+---------+---------+------+---------------------+-------------+----------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| 10001 | 51047 | 43034 | 12 | 0 | 1 | 2024-09-24 19:15:15 | NULL | 82 | | 
{"read_local_sec":0,"read_local_mb":31,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":1900,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} | +| 10001 | 51048 | 43032 | 12 | 0 | 1 | 2024-09-24 19:15:15 | NULL | 82 | | {"read_local_sec":0,"read_local_mb":32,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":1900,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} | +| 10001 | 51049 | 43033 | 12 | 0 | 1 | 2024-09-24 19:15:15 | NULL | 82 | | {"read_local_sec":0,"read_local_mb":31,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":1900,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} | +| 10001 | 51051 | 43038 | 9 | 0 | 1 | 2024-09-24 19:15:15 | NULL | 84 | | {"read_local_sec":0,"read_local_mb":31,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":1900,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} | +| 10001 | 51052 | 43036 | 12 | 0 | 0 | NULL | NULL | 0 | | | +| 10001 | 51053 | 43035 | 12 | 0 | 1 | 2024-09-24 19:15:16 | NULL | 2 | | {"read_local_sec":0,"read_local_mb":1,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":100,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} | ++-------+--------+-----------+---------+---------+------+---------------------+-------------+----------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +``` + +返回以下字段: + +- `BE_ID`: CN 的 ID。 +- `TXN_ID`: 子任务所属事务的 ID。 +- `TABLET_ID`: 子任务所属 Tablet 的 ID。 +- `VERSION`: Tablet 的版本。 +- `RUNS`: 子任务已执行的次数。 +- `START_TIME`: 子任务开始的时间。 +- `FINISH_TIME`: 子任务完成的时间。 +- `PROGRESS`: Tablet 的 Compaction 进度百分比。 +- `STATUS`: 子任务的状态。如果出现错误,此字段将返回错误消息。 +- `PROFILE`: (从 v3.2.12 和 v3.3.4 开始支持)子任务的运行时 Profile。 + - `read_local_sec`: 子任务从本地缓存读取数据的耗时。单位:秒。 + - `read_local_mb`: 子任务从本地缓存读取数据的大小。单位:MB。 + - `read_remote_sec`: 子任务从远端存储读取数据的耗时。单位:秒。 + - `read_remote_mb`: 子任务从远端存储读取数据的大小。单位:MB。 + - `read_local_count`: 子任务从本地缓存读取数据的次数。 + - `read_remote_count`: 子任务从远端存储读取数据的次数。 + - `in_queue_sec`: 子任务在队列中停留的时间。单位:秒。 + +### 配置 Compaction 任务 + +您可以使用这些 FE 和 CN (BE) 参数配置 Compaction 任务。 + +#### FE 参数 + +您可以动态配置以下 FE 参数。 + +```SQL +ADMIN SET FRONTEND CONFIG ("lake_compaction_max_tasks" = "-1"); +``` + +##### lake_compaction_max_tasks + +- 默认值: -1 +- 类型: Int +- 单位: - +- 是否可变: Yes +- 描述: 共享数据集群中允许的最大并发 Compaction 任务数。将此项设置为 `-1` 表示以自适应方式计算并发任务数,即存活 CN 节点数乘以 16。将此值设置为 `0` 将禁用 Compaction。 +- 引入版本: v3.1.0 + +```SQL +ADMIN SET FRONTEND CONFIG ("lake_compaction_disable_tables" = "11111;22222"); +``` + +##### lake_compaction_disable_tables + +- 默认值:"" +- 类型:String +- 单位:- +- 是否可变:Yes +- 描述:禁用某些表的 Compaction。这不会影响已开始的 Compaction。此项的值是表 ID。多个值用 ';' 分隔。 +- 引入版本:v3.2.7 + +#### CN 参数 + +您可以动态配置以下 CN 参数。 + +```SQL +UPDATE information_schema.be_configs SET VALUE = 8 +WHERE name = "compact_threads"; +``` + +##### compact_threads + +- 默认值: 4 +- 类型: Int +- 单位: - +- 是否可变: Yes +- 描述: 用于并发 Compaction 任务的最大线程数。此配置从 v3.1.7 和 v3.2.2 起变为动态。 +- 引入版本: v3.0.0 + +> **注意** +> +> 在生产环境中,建议将 `compact_threads` 设置为 BE/CN CPU 核心数的 25%。 + +##### max_cumulative_compaction_num_singleton_deltas + +- 默认值: 500 +- 类型: Int +- 单位: - +- 是否可变: Yes +- 描述: 单个 Cumulative Compaction 中可以合并的最大 Segment 数。如果 Compaction 期间发生 OOM,您可以减小此值。 +- 引入版本: - + +> **注意** +> +> 在生产环境中,建议将 
##### lake_pk_compaction_max_input_rowsets + +- 默认值: 500 +- 类型: Int +- 单位: - +- 是否可变: Yes +- 描述: 共享数据集群中 Primary Key 表 Compaction 任务允许的最大输入 Rowset 数。此参数的默认值已从 v3.2.4 和 v3.1.10 起从 `5` 更改为 `1000`,并从 v3.3.1 和 v3.2.9 起更改为 `500`。为 Primary Key 表启用分级 Compaction 策略(通过将 `enable_pk_size_tiered_compaction_strategy` 设置为 `true`)后,StarRocks 无需限制每次 Compaction 的 Rowset 数量以减少写入放大。因此,此参数的默认值已增加。 +- 引入版本: v3.1.8, v3.2.3 + +### 手动触发 Compaction 任务 + +```SQL +-- Trigger compaction for the whole table. +ALTER TABLE <table_name> COMPACT; +-- Trigger compaction for a specific partition. +ALTER TABLE <table_name> COMPACT <partition_name>; +-- Trigger compaction for multiple partitions. +ALTER TABLE <table_name> COMPACT (<partition_name1>, <partition_name2>, ...); +``` + +### 取消 Compaction 任务 + +您可以使用任务的事务 ID 手动取消 Compaction 任务。 + +```SQL +CANCEL COMPACTION WHERE TXN_ID = <txn_id>; +``` + +> **注意** +> +> - CANCEL COMPACTION 语句必须从 Leader FE 节点提交。 +> - CANCEL COMPACTION 语句仅适用于尚未提交的事务,即 `SHOW PROC '/compactions'` 返回结果中 `CommitTime` 为 NULL 的事务。 +> - CANCEL COMPACTION 是一个异步过程。您可以通过执行 `SHOW PROC '/compactions'` 来检查任务是否已取消。 + +## 最佳实践 + +由于 Compaction 对查询性能至关重要,建议定期监控表和分区的数据合并状态。以下是一些最佳实践和指导: + +- 尝试增加加载之间的时间间隔(避免间隔小于 10 秒的场景),并增加每次加载的批次大小(避免批次大小小于 100 行数据)。 +- 调整 CN 上并行 Compaction worker 线程的数量以加速任务执行。在生产环境中,建议将 `compact_threads` 设置为 BE/CN CPU 核心数的 25%。 +- 使用 `show proc '/compactions'` 和 `select * from information_schema.be_cloud_native_compactions;` 监控 Compaction 任务状态。 +- 监控 Compaction 分数,并根据其配置警报。StarRocks 内置的 Grafana 监控模板包含此指标。 +- 关注 Compaction 期间的资源消耗,尤其是内存使用。Grafana 监控模板也包含此指标。 + +## 故障排除 + +### 慢查询 + +要识别由 Compaction 不及时导致的慢查询,您可以在 SQL Profile 中检查单个 Fragment 内 `SegmentsReadCount` 除以 `TabletCount` 的值。如果该值很大,例如几十或更多,则 Compaction 不及时可能是慢查询的原因。 + +### 集群中最大 Compaction 分数过高 + +1. 使用 `ADMIN SHOW FRONTEND CONFIG LIKE "%lake_compaction%"` 和 `SELECT * FROM information_schema.be_configs WHERE name = "compact_threads"` 检查 Compaction 相关参数是否在合理范围内。 +2. 使用 `SHOW PROC '/compactions'` 检查 Compaction 是否卡住: + - 如果 `CommitTime` 保持为 NULL,请检查系统视图 `information_schema.be_cloud_native_compactions` 以查找 Compaction 卡住的原因。 + - 如果 `FinishTime` 保持为 NULL,请使用 `TxnID` 在 Leader FE 日志中搜索发布失败的原因。 +3. 
使用 `SHOW PROC '/compactions'` 检查 Compaction 是否运行缓慢: + - 如果 `sub_task_count` 过大(使用 `SHOW PARTITIONS` 检查此分区中每个 Tablet 的大小),则表可能创建不当。 + - 如果 `read_remote_mb` 过大(超过总读取数据的 30%),请检查服务器磁盘大小,并通过 `SHOW BACKENDS` 的 `DataCacheMetrics` 字段检查缓存配额。 + - 如果 `write_remote_sec` 过大(超过总 Compaction 时间的 90%),则写入远端存储可能过慢。这可以通过检查带有关键词 `single upload latency` 和 `multi upload latency` 的共享数据特定监控指标来验证。 + - 如果 `in_queue_sec` 过大(每个 Tablet 的平均等待时间超过 60 秒),则参数设置可能不合理或其它正在运行的 Compaction 任务过慢。 From c142bccab42575f6d8bdea5a41cd5754c860644e Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Wed, 11 Feb 2026 01:09:27 +0000 Subject: [PATCH 3/3] docs: automated translation via Gemini [skip ci] --- .../administration/management/BE_blacklist.md | 68 +- .../management/BE_configuration.md | 4571 ++++++------- .../management/Backup_and_restore.md | 256 +- .../management/FE_configuration.md | 156 +- .../management/Scale_up_down.md | 46 +- .../administration/management/audit_loader.md | 88 +- .../administration/management/compaction.md | 194 +- .../management/configuration.mdx | 11 + .../administration/management/enable_fqdn.md | 163 + .../management/graceful_exit.md | 276 + .../administration/management/BE_blacklist.md | 34 +- .../management/BE_configuration.md | 1372 ++-- .../management/Backup_and_restore.md | 226 +- .../management/FE_configuration.md | 5860 ++++++++--------- .../management/Scale_up_down.md | 36 +- .../administration/management/audit_loader.md | 85 +- .../administration/management/compaction.md | 308 +- .../management/configuration.mdx | 11 + .../administration/management/enable_fqdn.md | 163 + .../management/graceful_exit.md | 276 + 20 files changed, 7565 insertions(+), 6635 deletions(-) create mode 100644 docs/ja/administration/management/configuration.mdx create mode 100644 docs/ja/administration/management/enable_fqdn.md create mode 100644 docs/ja/administration/management/graceful_exit.md create mode 100644 docs/zh/administration/management/configuration.mdx create mode 100644 docs/zh/administration/management/enable_fqdn.md create mode 100644 docs/zh/administration/management/graceful_exit.md diff --git a/docs/ja/administration/management/BE_blacklist.md b/docs/ja/administration/management/BE_blacklist.md index d2637aa..289d7ff 100644 --- a/docs/ja/administration/management/BE_blacklist.md +++ b/docs/ja/administration/management/BE_blacklist.md @@ -4,67 +4,67 @@ displayed_sidebar: docs # BEおよびCNブラックリストの管理 -v3.3.0以降、StarRocksはBEブラックリスト機能に対応しており、クエリ実行における特定のBEノードの使用を禁止することで、BEノードへの接続失敗に起因する頻繁なクエリ失敗やその他の予期せぬ動作を回避できます。1つまたは複数のBEへの接続を妨げるネットワークの問題は、ブラックリストを使用する例となるでしょう。 +v3.3.0以降、StarRocksはBEブラックリスト機能をサポートしており、クエリ実行における特定のBEノードの使用を禁止することで、BEノードへの接続失敗によって引き起こされる頻繁なクエリ失敗やその他の予期せぬ動作を回避できます。1つ以上のBEへの接続を妨げるネットワーク問題が、ブラックリストを使用する例となるでしょう。 -v4.0以降、StarRocksはCompute Nodes (CNs) をブラックリストに追加する機能に対応しています。 +v4.0以降、StarRocksはCompute Node(CN)をブラックリストに追加することをサポートしています。 -デフォルトでは、StarRocksはBEおよびCNブラックリストを自動的に管理し、接続が失われたBEまたはCNノードをブラックリストに追加し、接続が再確立されるとブラックリストから削除します。ただし、手動でブラックリストに追加されたノードは、StarRocksによってブラックリストから削除されません。 +デフォルトでは、StarRocksはBEおよびCNブラックリストを自動的に管理し、接続が失われたBEまたはCNノードをブラックリストに追加し、接続が再確立されたときにブラックリストから削除します。ただし、手動でブラックリストに追加されたノードは、StarRocksによってブラックリストから削除されることはありません。 :::note -- SYSTEM-level BLACKLIST権限を持つユーザーのみがこの機能を使用できます。 -- 各FEノードは独自のBEおよびCNブラックリストを保持し、他のFEノードと共有しません。 +- SYSTEMレベルのBLACKLIST権限を持つユーザーのみがこの機能を使用できます。 +- 各FEノードは独自のBEおよびCNブラックリストを保持し、他のFEノードとは共有しません。 ::: -## BE/CNをブラックリストに追加 +## BE/CNをブラックリストに追加する -[ADD BACKEND/COMPUTE NODE 
BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/ADD_BACKEND_BLACKLIST.md) を使用して、BE/CNノードを手動でブラックリストに追加できます。このステートメントでは、ブラックリストに追加するBE/CNノードのIDを指定する必要があります。[SHOW BACKENDS](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_BACKENDS.md) を実行してBE IDを、[SHOW COMPUTE NODES](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_COMPUTE_NODES.md) を実行してCN IDを取得できます。 +[ADD BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/ADD_BACKEND_BLACKLIST.md)を使用して、BE/CNノードを手動でブラックリストに追加できます。このステートメントでは、ブラックリストに追加するBE/CNノードのIDを指定する必要があります。[SHOW BACKENDS](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_BACKENDS.md)を実行してBE IDを、[SHOW COMPUTE NODES](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_COMPUTE_NODES.md)を実行してCN IDを取得できます。 -例: +例: ```SQL --- BE IDを取得します。 +-- BE IDを取得する。 SHOW BACKENDS\G *************************** 1. row *************************** BackendId: 10001 IP: xxx.xx.xx.xxx ... --- BEをブラックリストに追加します。 +-- BEをブラックリストに追加する。 ADD BACKEND BLACKLIST 10001; --- CN IDを取得します。 +-- CN IDを取得する。 SHOW COMPUTE NODES\G *************************** 1. row *************************** ComputeNodeId: 10005 IP: xxx.xx.xx.xxx ... --- CNをブラックリストに追加します。 +-- CNをブラックリストに追加する。 ADD COMPUTE NODE BLACKLIST 10005; ``` -## ブラックリストからBE/CNを削除 +## ブラックリストからBE/CNを削除する -[DELETE BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/DELETE_BACKEND_BLACKLIST.md) を使用して、BE/CNノードを手動でブラックリストから削除できます。このステートメントでは、BE/CNノードのIDも指定する必要があります。 +[DELETE BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/DELETE_BACKEND_BLACKLIST.md)を使用して、BE/CNノードを手動でブラックリストから削除できます。このステートメントでは、BE/CNノードのIDも指定する必要があります。 -例: +例: ```SQL --- ブラックリストからBEを削除します。 +-- ブラックリストからBEを削除する。 DELETE BACKEND BLACKLIST 10001; --- ブラックリストからCNを削除します。 +-- ブラックリストからCNを削除する。 DELETE COMPUTE NODE BLACKLIST 10005; ``` -## BE/CNブラックリストを表示 +## BE/CNブラックリストを表示する -[SHOW BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_BACKEND_BLACKLIST.md) を使用して、ブラックリスト内のBE/CNノードを表示できます。 +[SHOW BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_BACKEND_BLACKLIST.md)を使用して、ブラックリスト内のBE/CNノードを表示できます。 -例: +例: ```SQL --- BEブラックリストを表示します。 +-- BEブラックリストを表示する。 SHOW BACKEND BLACKLIST; +-----------+------------------+---------------------+------------------------------+--------------------+ | BackendId | AddBlackListType | LostConnectionTime | LostConnectionNumberInPeriod | CheckTimePeriod(s) | @@ -72,7 +72,7 @@ SHOW BACKEND BLACKLIST; | 10001 | MANUAL | 2024-04-28 11:52:09 | 0 | 5 | +-----------+------------------+---------------------+------------------------------+--------------------+ --- CNブラックリストを表示します。 +-- CNブラックリストを表示する。 SHOW COMPUTE NODE BLACKLIST; +---------------+------------------+---------------------+------------------------------+--------------------+ | ComputeNodeId | AddBlackListType | LostConnectionTime | LostConnectionNumberInPeriod | CheckTimePeriod(s) | @@ -81,22 +81,22 @@ SHOW COMPUTE NODE BLACKLIST; +---------------+------------------+---------------------+------------------------------+--------------------+ ``` -以下のフィールドが返されます。 +以下のフィールドが返されます: -- `AddBlackListType`: BE/CNノードがブラックリストに追加された方法を示します。`MANUAL` はユーザーによって手動でブラックリストに追加されたことを示します。`AUTO` はStarRocksによって自動的にブラックリストに追加されたことを示します。 +- `AddBlackListType`: 
BE/CNノードがブラックリストに追加された方法。`MANUAL`はユーザーによって手動でブラックリストに追加されたことを示します。`AUTO`はStarRocksによって自動的にブラックリストに追加されたことを示します。 - `LostConnectionTime`: - - `MANUAL` タイプの場合、BE/CNノードが手動でブラックリストに追加された時刻を示します。 - - `AUTO` タイプの場合、最後の接続が成功した時刻を示します。 -- `LostConnectionNumberInPeriod`: `CheckTimePeriod(s)` の期間内に検出された切断の回数です。これは、StarRocksがブラックリスト内のBE/CNノードの接続状態をチェックする間隔です。 -- `CheckTimePeriod(s)`: StarRocksがブラックリスト内のBE/CNノードの接続状態をチェックする間隔です。その値は、FE構成項目 `black_host_history_sec` に指定された値として評価されます。単位:秒。 + - `MANUAL`タイプの場合、BE/CNノードが手動でブラックリストに追加された時間を示します。 + - `AUTO`タイプの場合、最後に正常な接続が確立された時間を示します。 +- `LostConnectionNumberInPeriod`: `CheckTimePeriod(s)`内に検出された切断の数。これは、StarRocksがブラックリスト内のBE/CNノードの接続状態をチェックする間隔です。 +- `CheckTimePeriod(s)`: StarRocksがブラックリストに登録されたBE/CNノードの接続状態をチェックする間隔。その値は、FE構成項目`black_host_history_sec`に指定した値に評価されます。単位: 秒。 -## BE/CNブラックリストの自動管理を設定 +## BE/CNブラックリストの自動管理を設定する -BE/CNノードがFEノードへの接続を失うか、BE/CNノードでのタイムアウトによりクエリが失敗するたびに、FEノードはBE/CNノードをBEおよびCNブラックリストに追加します。FEノードは、一定期間内の接続失敗数をカウントすることで、ブラックリスト内のBE/CNノードの接続性を常に評価します。StarRocksは、接続失敗数が事前に指定されたしきい値を下回る場合にのみ、ブラックリスト内のBE/CNノードを削除します。 +BE/CNノードがFEノードとの接続を失うたび、またはBE/CNノードでのタイムアウトによりクエリが失敗するたびに、FEノードはそのBE/CNノードを独自のBEおよびCNブラックリストに追加します。FEノードは、一定期間内の接続失敗数を数えることにより、ブラックリスト内のBE/CNノードの接続性を常に評価します。StarRocksは、接続失敗数が事前に指定されたしきい値を下回る場合にのみ、ブラックリストに登録されたBE/CNノードを削除します。 -以下の[FE設定](./FE_configuration.md)を使用して、BEおよびCNブラックリストの自動管理を設定できます。 +以下の[FE構成](./FE_configuration.md)を使用して、BEおよびCNブラックリストの自動管理を設定できます。 -- `black_host_history_sec`: ブラックリスト内のBE/CNノードの接続失敗履歴を保持する期間です。 -- `black_host_connect_failures_within_time`: ブラックリスト内のBE/CNノードに許容される接続失敗のしきい値です。 +- `black_host_history_sec`: ブラックリスト内のBE/CNノードの過去の接続失敗を保持する期間。 +- `black_host_connect_failures_within_time`: ブラックリストに登録されたBE/CNノードに許容される接続失敗のしきい値。 -BE/CNノードが自動的にブラックリストに追加された場合、StarRocksはその接続性を評価し、ブラックリストから削除できるかどうかを判断します。`black_host_history_sec` の期間内に、ブラックリスト内のBE/CNノードの接続失敗が `black_host_connect_failures_within_time` で設定されたしきい値を下回る場合にのみ、ブラックリストから削除できます。 +BE/CNノードが自動的にブラックリストに追加された場合、StarRocksはその接続性を評価し、ブラックリストから削除できるかどうかを判断します。`black_host_history_sec`の期間内で、ブラックリストに登録されたBE/CNノードの接続失敗数が`black_host_connect_failures_within_time`で設定されたしきい値よりも少ない場合にのみ、ブラックリストから削除できます。 diff --git a/docs/ja/administration/management/BE_configuration.md b/docs/ja/administration/management/BE_configuration.md index b330886..c87567e 100644 --- a/docs/ja/administration/management/BE_configuration.md +++ b/docs/ja/administration/management/BE_configuration.md @@ -16,3458 +16,3459 @@ import StaticBEConfigNote from '../../_assets/commonMarkdown/StaticBE_config_not -## BE構成項目の表示 -以下のコマンドを使用して、BE構成項目を表示できます。 +## BE設定項目の表示 + +以下のコマンドを使用して、BEの設定項目を表示できます。 ```shell curl http://:/varz ``` -## BEパラメータの設定 +## BEパラメーターの設定 -## BEパラメータの理解 +## BEパラメーターの理解 ### ロギング ##### diagnose_stack_trace_interval_ms -- Default: 1800000 (30 minutes) -- Type: Int -- Unit: ミリ秒 -- Is mutable: はい -- Description: `STACK_TRACE` リクエストに対してDiagnoseDaemonが実行する連続したスタックトレース診断の最小時間間隔を制御します。診断リクエストが到着すると、前回の収集から `diagnose_stack_trace_interval_ms` ミリ秒未満の場合、デーモンはスタックトレースの収集とロギングをスキップします。この値を増やすと、頻繁なスタックダンプによるCPUオーバーヘッドとログ量を減らすことができます。値を減らすと、一時的な問題(たとえば、長い `TabletsChannel::add_chunk` ブロッキングのロードフェイルポイントシミュレーションなど)をデバッグするためにより頻繁なトレースをキャプチャできます。 -- Introduced in: v3.5.0 +- デフォルト: 1800000 (30分) +- タイプ: Int +- 単位: ミリ秒 +- 変更可能: はい +- 説明: 
DiagnoseDaemonが`STACK_TRACE`リクエストに対して実行する連続したスタックトレース診断間の最小時間間隔を制御します。診断リクエストが到着すると、前回の収集から`diagnose_stack_trace_interval_ms`ミリ秒が経過していない場合、デーモンはスタックトレースの収集とロギングをスキップします。この値を増やすと、頻繁なスタックダンプによるCPUオーバーヘッドとログ量を削減できます。この値を減らすと、一時的な問題(例えば、長時間ブロックする`TabletsChannel::add_chunk`の負荷障害点シミュレーションなど)をデバッグするためにより頻繁なトレースをキャプチャできます。 +- 導入バージョン: v3.5.0 ##### lake_replication_slow_log_ms -- Default: 30000 -- Type: Int -- Unit: ミリ秒 -- Is mutable: はい -- Description: Lakeレプリケーション中にスローログエントリを出力するための閾値。各ファイルコピーの後、コードは経過時間をマイクロ秒で測定し、経過時間が `lake_replication_slow_log_ms * 1000` 以上の場合、その操作をスローとマークします。トリガーされると、StarRocksはそのレプリケートされたファイルのファイルサイズ、コスト、およびトレースメトリクスを含むINFOログを書き込みます。この値を増やすと、大規模/低速転送によるノイズの多いスローログを減らすことができます。値を減らすと、より小さな低速コピーイベントをより早く検出して表面化させることができます。 -- Introduced in: - +- デフォルト: 30000 +- タイプ: Int +- 単位: ミリ秒 +- 変更可能: はい +- 説明: Lakeのレプリケーション中にスローログエントリを出力するための閾値。各ファイルコピー後、コードは経過時間をマイクロ秒単位で測定し、経過時間が`lake_replication_slow_log_ms * 1000`以上の場合、操作をスローとマークします。トリガーされると、StarRocksはファイルサイズ、コスト、トレースメトリクスを含むINFOログをレプリケートされたファイルに対して書き込みます。大きな/遅い転送でノイズの多いスローログを減らすにはこの値を増やし、より小さな遅いコピーイベントをより早く検出して表面化するにはこの値を減らします。 +- 導入バージョン: - ##### load_rpc_slow_log_frequency_threshold_seconds -- Default: 60 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: 設定されたRPCタイムアウトを超えるロードRPCのスローログエントリをシステムがどのくらいの頻度で出力するかを制御します。スローログにはロードチャネルのランタイムプロファイルも含まれます。この値を0に設定すると、実際にはタイムアウトごとにログが記録されます。 -- Introduced in: v3.4.3, v3.5.0 +- デフォルト: 60 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: 設定されたRPCタイムアウトを超えるロードRPCのスローログエントリがシステムによってどれくらいの頻度で出力されるかを制御します。スローログにはロードチャネルのランタイムプロファイルも含まれます。この値を0に設定すると、実際にはタイムアウトごとのロギングが行われます。 +- 導入バージョン: v3.4.3, v3.5.0 ##### log_buffer_level -- Default: Empty string -- Type: String -- Unit: - -- Is mutable: いいえ -- Description: ログをフラッシュする戦略。デフォルト値は、ログがメモリにバッファリングされることを示します。有効な値は `-1` と `0` です。`-1` は、ログがメモリにバッファリングされないことを示します。 -- Introduced in: - +- デフォルト: 空文字列 +- タイプ: String +- 単位: - +- 変更可能: いいえ +- 説明: ログをフラッシュする戦略。デフォルト値は、ログがメモリにバッファリングされることを示します。有効な値は`-1`と`0`です。`-1`は、ログがメモリにバッファリングされないことを示します。 +- 導入バージョン: - ##### pprof_profile_dir -- Default: `${STARROCKS_HOME}/log` -- Type: String -- Unit: - -- Is mutable: いいえ -- Description: StarRocksがpprofアーティファクト(Jemallocヒープスナップショットおよびgperftools CPUプロファイル)を書き込むディレクトリパス。 -- Introduced in: v3.2.0 +- デフォルト: `${STARROCKS_HOME}/log` +- タイプ: String +- 単位: - +- 変更可能: いいえ +- 説明: StarRocksがpprofアーティファクト(Jemallocヒープスナップショットとgperftools CPUプロファイル)を書き込むディレクトリパス。 +- 導入バージョン: v3.2.0 ##### sys_log_dir -- Default: `${STARROCKS_HOME}/log` -- Type: String -- Unit: - -- Is mutable: いいえ -- Description: システムログ(INFO、WARNING、ERROR、FATALを含む)を保存するディレクトリ。 -- Introduced in: - +- デフォルト: `${STARROCKS_HOME}/log` +- タイプ: String +- 単位: - +- 変更可能: いいえ +- 説明: システムログ(INFO、WARNING、ERROR、FATALを含む)を格納するディレクトリ。 +- 導入バージョン: - ##### sys_log_level -- Default: INFO -- Type: String -- Unit: - -- Is mutable: はい (v3.3.0, v3.2.7, および v3.1.12から) -- Description: システムログエントリが分類される重大度レベル。有効な値:INFO、WARN、ERROR、FATAL。この項目はv3.3.0、v3.2.7、およびv3.1.12以降、動的構成に変更されました。 -- Introduced in: - +- デフォルト: INFO +- タイプ: String +- 単位: - +- 変更可能: はい (v3.3.0, v3.2.7, v3.1.12以降) +- 説明: システムログエントリが分類される重大度レベル。有効な値: INFO、WARN、ERROR、FATAL。この項目はv3.3.0、v3.2.7、v3.1.12以降、動的設定に変更されました。 +- 導入バージョン: - ##### sys_log_roll_mode -- Default: SIZE-MB-1024 -- Type: String -- Unit: - -- Is mutable: いいえ -- Description: システムログがログロールに分割されるモード。有効な値には `TIME-DAY`、`TIME-HOUR`、`SIZE-MB-`サイズ が含まれます。デフォルト値は、ログが1GBのロールに分割されることを示します。 -- Introduced in: - +- デフォルト: SIZE-MB-1024 +- タイプ: String +- 単位: - +- 変更可能: いいえ +- 説明: 
システムログがログロールに分割されるモード。有効な値には`TIME-DAY`、`TIME-HOUR`、`SIZE-MB-`サイズが含まれます。デフォルト値は、ログがそれぞれ1 GBのロールに分割されることを示します。 +- 導入バージョン: - ##### sys_log_roll_num -- Default: 10 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: 保持するログロールの数。 -- Introduced in: - +- デフォルト: 10 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: 予約するログロールの数。 +- 導入バージョン: - ##### sys_log_timezone -- Default: false -- Type: Boolean -- Unit: - -- Is mutable: いいえ -- Description: ログプレフィックスにタイムゾーン情報を表示するかどうか。`true` はタイムゾーン情報を表示することを示し、`false` は表示しないことを示します。 -- Introduced in: - +- デフォルト: false +- タイプ: Boolean +- 単位: - +- 変更可能: いいえ +- 説明: ログプレフィックスにタイムゾーン情報を表示するかどうか。`true`はタイムゾーン情報を表示することを示し、`false`は表示しないことを示します。 +- 導入バージョン: - ##### sys_log_verbose_level -- Default: 10 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: 出力するログのレベル。この構成項目は、コード内のVLOGで開始されるログの出力を制御するために使用されます。 -- Introduced in: - +- デフォルト: 10 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: 出力するログのレベル。この設定項目は、コード内のVLOGで開始されるログの出力を制御するために使用されます。 +- 導入バージョン: - ##### sys_log_verbose_modules -- Default: -- Type: Strings -- Unit: - -- Is mutable: いいえ -- Description: 出力するログのモジュール。たとえば、この構成項目をOLAPに設定すると、StarRocksはOLAPモジュールのログのみを出力します。有効な値はBEのネームスペースであり、`starrocks`、`starrocks::debug`、`starrocks::fs`、`starrocks::io`、`starrocks::lake`、`starrocks::pipeline`、`starrocks::query_cache`、`starrocks::stream`、および `starrocks::workgroup` が含まれます。 -- Introduced in: - +- デフォルト: +- タイプ: Strings +- 単位: - +- 変更可能: いいえ +- 説明: 出力するログのモジュール。例えば、この設定項目をOLAPに設定すると、StarRocksはOLAPモジュールのログのみを出力します。有効な値は、`starrocks`、`starrocks::debug`、`starrocks::fs`、`starrocks::io`、`starrocks::lake`、`starrocks::pipeline`、`starrocks::query_cache`、`starrocks::stream`、`starrocks::workgroup`を含むBEの名前空間です。 +- 導入バージョン: - ### サーバー ##### abort_on_large_memory_allocation -- Default: false -- Type: Boolean -- Unit: - -- Is mutable: はい -- Description: 単一の割り当てリクエストが設定された大規模割り当て閾値(`g_large_memory_alloc_failure_threshold` > 0 かつリクエストサイズ > 閾値)を超えた場合、このフラグがプロセス応答を制御します。trueの場合、このような大規模割り当てが検出されると、StarRocksは直ちに `std::abort()` を呼び出します(ハードクラッシュ)。falseの場合、割り当てはブロックされ、アロケータは失敗(nullptrまたはENOMEM)を返すため、呼び出し元はエラーを処理できます。このチェックは、TRY_CATCH_BAD_ALLOCパスでラップされていない割り当てにのみ適用されます(bad-allocがキャッチされている場合、memフックは異なるフローを使用します)。予期しない巨大な割り当ての迅速なデバッグのために有効にします。プロダクション環境では、過大な割り当て試行で即座にプロセスを停止させたい場合を除き、無効にしてください。 -- Introduced in: v3.4.3, 3.5.0, 4.0.0 +- デフォルト: false +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: 単一の割り当てリクエストが設定された大規模割り当て閾値(`g_large_memory_alloc_failure_threshold > 0`かつリクエストサイズが閾値を超える)を超えた場合、このフラグはプロセスの応答を制御します。trueの場合、このような大規模割り当てが検出されると、StarRocksは直ちに`std::abort()`を呼び出します(ハードクラッシュ)。falseの場合、割り当てはブロックされ、アロケータは失敗(nullptrまたはENOMEM)を返すため、呼び出し元はエラーを処理できます。このチェックは、TRY_CATCH_BAD_ALLOCパスでラップされていない割り当てにのみ適用されます(bad-allocが捕捉されている場合、メモリフックは異なるフローを使用します)。予期しない巨大な割り当てを高速にデバッグするために有効にしますが、過大な割り当て試行時に即座にプロセスを中止したい場合を除き、本番環境では無効にしておきます。 +- 導入バージョン: v3.4.3, 3.5.0, 4.0.0 ##### arrow_flight_port -- Default: -1 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: BE Arrow Flight SQLサーバーのTCPポート。`-1` はArrow Flightサービスを無効にすることを示します。macOS以外のビルドでは、BEはこのポートでArrow Flight SQL Serverを起動時に呼び出します。ポートが利用できない場合、サーバーの起動は失敗し、BEプロセスは終了します。設定されたポートは、ハートビートペイロードでFEに報告されます。 -- Introduced in: v3.4.0, v3.5.0 +- デフォルト: -1 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: BE Arrow Flight SQLサーバー用のTCPポート。`-1`はArrow Flightサービスを無効にすることを示します。macOS以外のビルドでは、BEは起動時にこのポートでArrow Flight SQL Serverを呼び出します。ポートが利用できない場合、サーバーの起動は失敗し、BEプロセスは終了します。設定されたポートは、ハートビートペイロードでFEに報告されます。 +- 導入バージョン: v3.4.0, v3.5.0 ##### be_exit_after_disk_write_hang_second -- Default: 60 -- Type: Int -- Unit: 秒 -- Is mutable: いいえ -- 
Description: ディスクがハングした後にBEが終了するまで待機する時間。 -- Introduced in: - +- デフォルト: 60 +- タイプ: Int +- 単位: 秒 +- 変更可能: いいえ +- 説明: ディスクがハングアップした後、BEが終了するまでの待機時間。 +- 導入バージョン: - ##### be_http_num_workers -- Default: 48 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: HTTPサーバーが使用するスレッド数。 -- Introduced in: - +- デフォルト: 48 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: HTTPサーバーが使用するスレッド数。 +- 導入バージョン: - ##### be_http_port -- Default: 8040 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: BE HTTPサーバーのポート。 -- Introduced in: - +- デフォルト: 8040 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: BE HTTPサーバーのポート。 +- 導入バージョン: - ##### be_port -- Default: 9060 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: FEからのリクエストを受信するために使用されるBE Thriftサーバーのポート。 -- Introduced in: - +- デフォルト: 9060 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: FEからのリクエストを受信するために使用されるBE Thriftサーバーのポート。 +- 導入バージョン: - ##### be_service_threads -- Default: 64 -- Type: Int -- Unit: スレッド -- Is mutable: いいえ -- Description: BE ThriftサーバーがバックエンドのRPC/実行リクエストを処理するために使用するワーカースレッドの数。この値はBackendServiceの作成時にThriftServerに渡され、利用可能な同時リクエストハンドラの数を制御します。すべてのワーカースレッドがビジーの場合、リクエストはキューに入れられます。予想される同時RPC負荷と利用可能なCPU/メモリに基づいて調整してください。値を増やすと同時実行性が向上しますが、スレッドごとのメモリとコンテキスト切り替えのコストが増加します。値を減らすと並列処理が制限され、リクエストのレイテンシが増加する可能性があります。 -- Introduced in: v3.2.0 +- デフォルト: 64 +- タイプ: Int +- 単位: スレッド +- 変更可能: いいえ +- 説明: BE ThriftサーバーがバックエンドRPC/実行リクエストを処理するために使用するワーカー・スレッドの数。この値はBackendServiceの作成時にThriftServerに渡され、利用可能な同時リクエストハンドラーの数を制御します。すべてのワーカー・スレッドがビジーの場合、リクエストはキューに入れられます。予想される同時RPC負荷と利用可能なCPU/メモリに基づいて調整します。これを増やすと並行処理が増加しますが、スレッドごとのメモリとコンテキスト切り替えのコストも増加します。これを減らすと並行処理が制限され、リクエストのレイテンシが増加する可能性があります。 +- 導入バージョン: v3.2.0 ##### brpc_connection_type -- Default: `"single"` -- Type: string -- Unit: - -- Is mutable: いいえ -- Description: bRPCチャネルの接続モード。有効な値: - - `"single"` (デフォルト):各チャネルに1つの永続的なTCP接続。 - - `"pooled"`:より高い同時実行性のために永続的な接続のプールを使用しますが、ソケット/ファイルディスクリプタのコストが増加します。 - - `"short"`:永続的なリソース使用量を減らすためにRPCごとに作成される短寿命の接続ですが、レイテンシが高くなります。 - 選択はソケットごとのバッファリング動作に影響し、未書き込みバイトがソケット制限を超える場合の `Socket.Write` の失敗(EOVERCROWDED)に影響を与える可能性があります。 -- Introduced in: v3.2.5 +- デフォルト: `"single"` +- タイプ: string +- 単位: - +- 変更可能: いいえ +- 説明: bRPCチャネルの接続モード。有効な値: + - `"single"` (デフォルト): 各チャネルに1つの永続的なTCP接続。 + - `"pooled"`: より高い並行処理のための永続接続のプール(ソケット/ファイルディスクリプタのコストは増加)。 + - `"short"`: 永続的なリソース使用量を削減するが、レイテンシが高いRPCごとに作成される短寿命の接続。 + 選択はソケットごとのバッファリング動作に影響し、未書き込みバイトがソケットの制限を超えた場合の`Socket.Write`エラー(EOVERCROWDED)に影響を与える可能性があります。 +- 導入バージョン: v3.2.5 ##### brpc_max_body_size -- Default: 2147483648 -- Type: Int -- Unit: バイト -- Is mutable: いいえ -- Description: bRPCの最大ボディサイズ。 -- Introduced in: - +- デフォルト: 2147483648 +- タイプ: Int +- 単位: バイト +- 変更可能: いいえ +- 説明: bRPCの最大ボディサイズ。 +- 導入バージョン: - ##### brpc_max_connections_per_server -- Default: 1 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: クライアントが各リモートサーバーエンドポイントに対して保持する永続的なbRPC接続の最大数。各エンドポイントについて、`BrpcStubCache` は `StubPool` を作成し、その `_stubs` ベクトルはこのサイズに予約されます。最初のアクセスでは、制限に達するまで新しいスタブが作成されます。その後、既存のスタブはラウンドロビン方式で返されます。この値を増やすと、エンドポイントごとの同時実行性が向上しますが(単一チャネルでの競合が減少)、ファイルディスクリプタ、メモリ、およびチャネルのコストが増加します。 -- Introduced in: v3.2.0 +- デフォルト: 1 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: クライアントが各リモートサーバーエンドポイントに対して保持する永続的なbRPC接続の最大数。各エンドポイントに対して`BrpcStubCache`は`StubPool`を作成し、その`_stubs`ベクターはこのサイズに予約されます。最初のアクセスでは、制限に達するまで新しいスタブが作成されます。その後、既存のスタブがラウンドロビン方式で返されます。この値を増やすと、エンドポイントごとの並行処理が増加し(単一チャネルでの競合が減少)、ファイルディスクリプタ、メモリ、チャネルのコストが増加します。 +- 導入バージョン: v3.2.0 ##### brpc_num_threads -- Default: -1 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: bRPCのbthread数。値 `-1` 
はCPUスレッドと同じ数を示します。 -- Introduced in: - +- デフォルト: -1 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: bRPCのbスレッドの数。値`-1`はCPUスレッドと同じ数であることを示します。 +- 導入バージョン: - ##### brpc_port -- Default: 8060 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: bRPCのネットワーク統計を表示するために使用されるBE bRPCポート。 -- Introduced in: - +- デフォルト: 8060 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: bRPCのネットワーク統計を表示するために使用されるBE bRPCポート。 +- 導入バージョン: - ##### brpc_socket_max_unwritten_bytes -- Default: 1073741824 -- Type: Int -- Unit: バイト -- Is mutable: いいえ -- Description: bRPCサーバーにおける未書き込みの送信バイトのソケットごとの制限を設定します。ソケットにバッファリングされ、まだ書き込まれていないデータの量がこの制限に達すると、後続の `Socket.Write` 呼び出しはEOVERCROWDEDで失敗します。これにより、接続ごとのメモリの無制限の増加が防止されますが、非常に大きなメッセージや低速なピアの場合にRPC送信の失敗を引き起こす可能性があります。単一メッセージのボディが許可される未書き込みバッファよりも大きくならないように、この値を `brpc_max_body_size` と一致させてください。値を増やすと、接続ごとのメモリ使用量が増加します。 -- Introduced in: v3.2.0 +- デフォルト: 1073741824 +- タイプ: Int +- 単位: バイト +- 変更可能: いいえ +- 説明: bRPCサーバーにおけるソケットあたりの未書き込み送信バイトの制限を設定します。ソケット上のバッファリングされた未書き込みデータの量がこの制限に達すると、以降の`Socket.Write`呼び出しはEOVERCROWDEDで失敗します。これにより、接続あたりのメモリの無限の増加を防ぎますが、非常に大きなメッセージや遅いピアの場合にRPC送信エラーが発生する可能性があります。この値を`brpc_max_body_size`に合わせて、単一メッセージのボディが許可される未書き込みバッファよりも大きくならないようにしてください。値を増やすと、接続あたりのメモリ使用量が増加します。 +- 導入バージョン: v3.2.0 ##### brpc_stub_expire_s -- Default: 3600 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: bRPCスタブキャッシュの有効期限。デフォルト値は60分です。 -- Introduced in: - +- デフォルト: 3600 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: bRPCスタブキャッシュの有効期限。デフォルト値は60分です。 +- 導入バージョン: - ##### compress_rowbatches -- Default: true -- Type: Boolean -- Unit: - -- Is mutable: いいえ -- Description: BE間のRPCでローバッチを圧縮するかどうかを制御するブール値。`true` はローバッチを圧縮することを示し、`false` は圧縮しないことを示します。 -- Introduced in: - +- デフォルト: true +- タイプ: Boolean +- 単位: - +- 変更可能: いいえ +- 説明: BE間のRPCでロウバッチを圧縮するかどうかを制御するブール値。`true`はロウバッチを圧縮することを示し、`false`は圧縮しないことを示します。 +- 導入バージョン: - ##### consistency_max_memory_limit_percent -- Default: 20 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: 一貫性関連タスクのメモリ予算を計算するために使用されるパーセンテージキャップ。BE起動時、最終的な一貫性制限は `consistency_max_memory_limit` (バイト) から解析された値と (`process_mem_limit * consistency_max_memory_limit_percent / 100`) の最小値として計算されます。`process_mem_limit` が未設定 (-1) の場合、一貫性メモリは無制限と見なされます。`consistency_max_memory_limit_percent` の場合、0未満または100より大きい値は100として扱われます。この値を調整すると、一貫性操作のために予約されるメモリが増減し、したがってクエリや他のサービスで利用可能なメモリに影響します。 -- Introduced in: v3.2.0 +- デフォルト: 20 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: 一貫性関連タスクのメモリ予算を計算するために使用されるパーセンテージキャップ。BEの起動中、最終的な一貫性制限は、`consistency_max_memory_limit`(バイト)から解析された値と(`process_mem_limit * consistency_max_memory_limit_percent / 100`)の最小値として計算されます。`process_mem_limit`が設定されていない場合(-1)、一貫性メモリは無制限と見なされます。`consistency_max_memory_limit_percent`の場合、0未満または100を超える値は100として扱われます。この値を調整すると、一貫性操作のために予約されるメモリが増減し、クエリやその他のサービスで利用可能なメモリに影響します。 +- 導入バージョン: v3.2.0 ##### delete_worker_count_normal_priority -- Default: 2 -- Type: Int -- Unit: スレッド -- Is mutable: いいえ -- Description: BEエージェントで削除(REALTIME_PUSH with DELETE)タスクを処理するために割り当てられた通常優先度のワーカースレッドの数。起動時にこの値は `delete_worker_count_high_priority` に追加され、`DeleteTaskWorkerPool` のサイズが決定されます(`agent_server.cpp` を参照)。プールは最初の `delete_worker_count_high_priority` スレッドをHIGH優先度として割り当て、残りをNORMAL優先度として割り当てます。通常優先度のスレッドは標準の削除タスクを処理し、全体的な削除スループットに貢献します。並列削除容量を増やすには(CPU/IO使用量の増加)、この値を増やします。リソース競合を減らすには、この値を減らします。 -- Introduced in: v3.2.0 +- デフォルト: 2 +- タイプ: Int +- 単位: スレッド +- 変更可能: いいえ +- 説明: 
BEエージェント上で削除(DELETEを含むREALTIME_PUSH)タスクを処理するために割り当てられた通常優先度ワーカー・スレッドの数。起動時、この値は`delete_worker_count_high_priority`に追加され、`DeleteTaskWorkerPool`のサイズが決定されます(agent_server.cppを参照)。プールは最初の`delete_worker_count_high_priority`スレッドをHIGH優先度として割り当て、残りをNORMAL優先度として割り当てます。通常優先度スレッドは標準の削除タスクを処理し、全体的な削除スループットに貢献します。並行削除容量を増やすにはこの値を増やし(CPU/IO使用量が増加)、リソース競合を減らすにはこの値を減らします。 +- 導入バージョン: v3.2.0 ##### disable_mem_pools -- Default: false -- Type: Boolean -- Unit: - -- Is mutable: いいえ -- Description: MemPoolを無効にするかどうか。この項目が `true` に設定されている場合、MemPoolのチャンクプーリングは無効になり、各割り当ては再利用またはプールされたチャンクを増やす代わりに独自のサイズのチャンクを取得します。プーリングを無効にすると、より頻繁な割り当て、チャンク数の増加、およびスキップされた整合性チェック(チャンク数が多いため回避される)のコストで、長期間保持されるバッファメモリが削減されます。割り当ての再利用とシステム呼び出しの減少の恩恵を受けるために、`disable_mem_pools` を `false`(デフォルト)のままにしてください。大規模なプールされたメモリ保持を避けなければならない場合(たとえば、メモリの少ない環境や診断実行の場合)にのみ `true` に設定してください。 -- Introduced in: v3.2.0 +- デフォルト: false +- タイプ: Boolean +- 単位: - +- 変更可能: いいえ +- 説明: MemPoolを無効にするかどうか。この項目が`true`に設定されている場合、MemPoolのチャンクプーリングは無効になり、各割り当ては再利用またはプールされたチャンクを増やすのではなく、独自のサイズのチャンクを取得します。プーリングを無効にすると、より頻繁な割り当て、チャンク数の増加、およびスキップされる整合性チェック(チャンク数が多いため回避される)のコストで、長期間保持されるバッファメモリが削減されます。割り当ての再利用とシステム呼び出しの削減の恩恵を受けるために、`disable_mem_pools`は`false`(デフォルト)のままにしてください。大規模なプールされたメモリの保持を避けたい場合(例えば、低メモリ環境や診断実行など)にのみ`true`に設定してください。 +- 導入バージョン: v3.2.0 ##### enable_https -- Default: false -- Type: Boolean -- Unit: - -- Is mutable: いいえ -- Description: この項目が `true` に設定されている場合、BEのbRPCサーバーはTLSを使用するように構成されます。`ServerOptions.ssl_options` は、BE起動時に `ssl_certificate_path` と `ssl_private_key_path` で指定された証明書と秘密鍵で設定されます。これにより、着信bRPC接続に対してHTTPS/TLSが有効になります。クライアントはTLSを使用して接続する必要があります。証明書と鍵ファイルが存在し、BEプロセスからアクセス可能であり、bRPC/SSLの期待に合致していることを確認してください。 -- Introduced in: v4.0.0 +- デフォルト: false +- タイプ: Boolean +- 単位: - +- 変更可能: いいえ +- 説明: この項目が`true`に設定されている場合、BEのbRPCサーバーはTLSを使用するように構成されます。BEの起動時に、`ServerOptions.ssl_options`には`ssl_certificate_path`と`ssl_private_key_path`で指定された証明書と秘密鍵が設定されます。これにより、受信bRPC接続に対してHTTPS/TLSが有効になります。クライアントはTLSを使用して接続する必要があります。証明書と鍵ファイルが存在し、BEプロセスからアクセス可能であり、bRPC/SSLの要件を満たしていることを確認してください。 +- 導入バージョン: v4.0.0 ##### enable_jemalloc_memory_tracker -- Default: true -- Type: Boolean -- Unit: - -- Is mutable: いいえ -- Description: この項目が `true` に設定されている場合、BEはバックグラウンドスレッド(jemalloc_tracker_daemon)を起動し、jemalloc統計を(1秒に1回)ポーリングし、GlobalEnv jemallocメタデータMemTrackerをjemalloc "stats.metadata" 値で更新します。これにより、jemallocメタデータ消費がStarRocksプロセスメモリ会計に含まれ、jemalloc内部で使用されるメモリの過少報告が防止されます。トラッカーはmacOS以外のビルドでのみコンパイル/起動され(#ifndef __APPLE__)、"jemalloc_tracker_daemon" という名前のデーモンスレッドとして実行されます。この設定は起動動作とMemTrackerの状態を維持するスレッドに影響するため、変更には再起動が必要です。jemallocが使用されていない場合、またはjemallocトラッキングが意図的に異なる方法で管理されている場合にのみ無効にしてください。それ以外の場合は、正確なメモリ会計と割り当ての保護を維持するために有効にしておいてください。 -- Introduced in: v3.2.12 +- デフォルト: true +- タイプ: Boolean +- 単位: - +- 変更可能: いいえ +- 説明: この項目が`true`に設定されている場合、BEはバックグラウンドスレッド(jemalloc_tracker_daemon)を開始し、jemalloc統計を(1秒ごとに)ポーリングし、GlobalEnv jemallocメタデータMemTrackerをjemallocの"stats.metadata"値で更新します。これにより、jemallocメタデータ消費がStarRocksプロセスメモリアカウンティングに含まれ、jemalloc内部で使用されるメモリの過少報告が防止されます。トラッカーはmacOS以外のビルドでのみコンパイル/開始され(#ifndef __APPLE__)、"jemalloc_tracker_daemon"という名前のデーモンスレッドとして実行されます。この設定は起動動作とMemTrackerの状態を維持するスレッドに影響するため、変更には再起動が必要です。jemallocが使用されていない場合、またはjemallocトラッキングが意図的に異なる方法で管理されている場合にのみ無効にしてください。それ以外の場合は、正確なメモリアカウンティングと割り当て保護を維持するために有効にしておいてください。 +- 導入バージョン: v3.2.12 ##### enable_jvm_metrics -- Default: false -- Type: Boolean -- Unit: - -- Is mutable: いいえ -- Description: 
起動時にJVM固有のメトリクスを初期化および登録するかどうかを制御します。有効にすると、メトリクスサブシステムはJVM関連のコレクタ(例:ヒープ、GC、スレッドメトリクス)をエクスポート用に作成し、無効にすると、それらのコレクタは初期化されません。このパラメータは将来の互換性のために意図されており、将来のリリースで削除される可能性があります。システムレベルのメトリクス収集を制御するには `enable_system_metrics` を使用してください。 -- Introduced in: v4.0.0 +- デフォルト: false +- タイプ: Boolean +- 単位: - +- 変更可能: いいえ +- 説明: 起動時にJVM固有のメトリクスを初期化および登録するかどうかを制御します。有効にすると、メトリクスサブシステムはJVM関連のコレクタ(例:ヒープ、GC、スレッドメトリクス)をエクスポート用に作成し、無効にすると、それらのコレクタは初期化されません。このパラメータは将来の互換性のために意図されており、将来のリリースで削除される可能性があります。システムレベルのメトリクス収集を制御するには、`enable_system_metrics`を使用してください。 +- 導入バージョン: v4.0.0 ##### get_pindex_worker_count -- Default: 0 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: UpdateManagerの「get_pindex」スレッドプール(プライマリキーテーブルのrowsetを適用するときに使用される永続インデックスデータをロード/フェッチするために使用される)のワーカースレッド数を設定します。実行時には、設定更新によってプールの最大スレッドが調整されます。`>0` の場合、その値が適用されます。`0` の場合、ランタイムコールバックはCPUコア数(`CpuInfo::num_cores()`)を使用します。初期化時には、プールの最大スレッドはmax(`get_pindex_worker_count`, `max_apply_thread_cnt` * 2) として計算されます。ここで `max_apply_thread_cnt` はapply-threadプールの最大値です。pindexロードの並列性を高めるには値を増やし、同時実行性とメモリ/CPU使用量を減らすには値を減らします。 -- Introduced in: v3.2.0 +- デフォルト: 0 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: `UpdateManager`の"get_pindex"スレッドプール(プライマリキーテーブルのロウセット適用時に使用される永続インデックスデータをロード/フェッチするために使用される)のワーカー・スレッド数を設定します。実行時には、設定の更新によってプールの最大スレッドが調整されます。`>0`の場合、その値が適用されます。`0`の場合、ランタイムコールバックはCPUコア数(`CpuInfo::num_cores()`)を使用します。初期化時には、プールの最大スレッドは`max(get_pindex_worker_count, max_apply_thread_cnt * 2)`として計算されます(ここで`max_apply_thread_cnt`は適用スレッドプールの最大値)。pindexロードの並列処理を増やすにはこの値を増やし、並列処理とメモリ/CPU使用量を減らすにはこの値を減らします。 +- 導入バージョン: v3.2.0 ##### heartbeat_service_port -- Default: 9050 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: FEからのハートビートを受信するために使用されるBEハートビートサービスポート。 -- Introduced in: - +- デフォルト: 9050 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: FEからのハートビートを受信するために使用されるBEハートビートサービスのポート。 +- 導入バージョン: - ##### heartbeat_service_thread_count -- Default: 1 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: BEハートビートサービスのスレッド数。 -- Introduced in: - +- デフォルト: 1 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: BEハートビートサービスのスレッド数。 +- 導入バージョン: - ##### local_library_dir -- Default: `${UDF_RUNTIME_DIR}` -- Type: string -- Unit: - -- Is mutable: いいえ -- Description: UDF(ユーザー定義関数)ライブラリがステージングされ、Python UDFワーカプロセスが動作するBE上のローカルディレクトリ。StarRocksはHDFSからこのパスにUDFライブラリをコピーし、`/pyworker_` にワーカごとのUnixドメインソケットを作成し、exec前にPythonワーカプロセスをこのディレクトリに変更します。ディレクトリは存在し、BEプロセスによって書き込み可能であり、Unixドメインソケットをサポートするファイルシステム(つまり、ローカルファイルシステム)上にある必要があります。この設定は実行時に変更できないため、起動前に設定し、各BEで適切な権限とディスクスペースを確保してください。 -- Introduced in: v3.2.0 +- デフォルト: `${UDF_RUNTIME_DIR}` +- タイプ: string +- 単位: - +- 変更可能: いいえ +- 説明: UDF(ユーザー定義関数)ライブラリがステージングされ、Python UDFワーカープロセスが動作するBE上のローカルディレクトリ。StarRocksはHDFSからこのパスにUDFライブラリをコピーし、`/pyworker_`にワーカーごとのUnixドメインソケットを作成し、exec前にPythonワーカープロセスをこのディレクトリに変更します。ディレクトリは存在し、BEプロセスから書き込み可能で、Unixドメインソケット(つまりローカルファイルシステム)をサポートするファイルシステム上に存在する必要があります。この設定は実行時に変更できないため、起動前に設定し、各BEで適切な権限とディスク容量を確保してください。 +- 導入バージョン: v3.2.0 ##### max_transmit_batched_bytes -- Default: 262144 -- Type: Int -- Unit: バイト -- Is mutable: いいえ -- Description: 単一の送信リクエストで蓄積される、ネットワークにフラッシュされる前のシリアライズ済みバイトの最大数。送信側の実装は、シリアライズされたChunkPBペイロードをPTransmitChunkParamsリクエストに追加し、蓄積されたバイトが `max_transmit_batched_bytes` を超えるかEOSに達した場合にリクエストを送信します。この値を増やすと、RPCの頻度を減らし、スループットを向上させることができますが、リクエストごとのレイテンシとメモリ使用量が増加します。この値を減らすと、レイテンシとメモリを削減できますが、RPCレートが増加します。 -- Introduced in: v3.2.0 +- デフォルト: 262144 +- タイプ: Int +- 単位: バイト +- 変更可能: いいえ +- 説明: 
ネットワークにフラッシュされるまでに単一の送信リクエストで蓄積されるシリアル化バイトの最大数。送信者実装は、シリアル化されたChunkPBペイロードをPTransmitChunkParamsリクエストに追加し、蓄積されたバイトが`max_transmit_batched_bytes`を超えるか、EOSに達した場合にリクエストを送信します。この値を増やすと、RPCの頻度が減り、スループットが向上しますが、リクエストごとのレイテンシとメモリ使用量が増加します。この値を減らすと、レイテンシとメモリが減少しますが、RPCレートが増加します。 +- 導入バージョン: v3.2.0 ##### mem_limit -- Default: 90% -- Type: String -- Unit: - -- Is mutable: いいえ -- Description: BEプロセスのメモリ上限。パーセンテージ("80%")または物理的な制限("100G")として設定できます。デフォルトのハードリミットはサーバーのメモリサイズの90%で、ソフトリミットは80%です。StarRocksを他のメモリ集約的なサービスと同じサーバーにデプロイする場合、このパラメータを設定する必要があります。 -- Introduced in: - +- デフォルト: 90% +- タイプ: String +- 単位: - +- 変更可能: いいえ +- 説明: BEプロセスメモリの上限。パーセンテージ("80%")または物理的な制限("100G")として設定できます。デフォルトのハードリミットはサーバーメモリサイズの90%で、ソフトリミットは80%です。同じサーバーにStarRocksを他のメモリを大量に消費するサービスと一緒にデプロイする場合は、このパラメータを設定する必要があります。 +- 導入バージョン: - ##### memory_max_alignment -- Default: 16 -- Type: Int -- Unit: バイト -- Is mutable: いいえ -- Description: MemPoolが整列された割り当てに対して受け入れる最大バイトアラインメントを設定します。呼び出し元がより大きなアラインメント(SIMD、デバイスバッファ、またはABI制約のため)を必要とする場合にのみ、この値を増やしてください。大きな値は、割り当てごとのパディングと予約メモリの無駄を増やし、システムアロケータとプラットフォームがサポートする範囲内である必要があります。 -- Introduced in: v3.2.0 +- デフォルト: 16 +- タイプ: Int +- 単位: バイト +- 変更可能: いいえ +- 説明: MemPoolが整列割り当てに対して受け入れる最大バイトアラインメントを設定します。この値は、呼び出し元がより大きなアラインメントを必要とする場合(SIMD、デバイスバッファ、またはABI制約のため)にのみ増やしてください。値が大きいと、割り当てごとのパディングと予約メモリの無駄が増加し、システムアロケータとプラットフォームがサポートする範囲内に維持する必要があります。 +- 導入バージョン: v3.2.0 ##### memory_urgent_level -- Default: 85 -- Type: long -- Unit: パーセンテージ (0-100) -- Is mutable: はい -- Description: プロセスメモリ制限のパーセンテージとして表される緊急メモリウォーターレベル。プロセスメモリ消費が `(limit * memory_urgent_level / 100)` を超えると、BEは即座にメモリ再利用をトリガーし、データキャッシュの縮小、更新キャッシュの削除、永続/lake MemTableの「満杯」扱いを強制し、それらがすぐにフラッシュ/圧縮されるようにします。コードは、この設定が `memory_high_level` より大きく、`memory_high_level` が1以上かつ100以下でなければならないことを検証します。値が低いと、より積極的で早期の再利用が発生し、キャッシュの削除とフラッシュが頻繁になります。値が高いと、再利用が遅延し、100に近すぎるとOOMのリスクがあります。この項目は `memory_high_level` とデータキャッシュ関連の自動調整設定と合わせて調整してください。 -- Introduced in: v3.2.0 +- デフォルト: 85 +- タイプ: long +- 単位: パーセンテージ (0-100) +- 変更可能: はい +- 説明: プロセスメモリ制限のパーセンテージとして表される緊急メモリウォーターレベル。プロセスメモリ消費が`(limit * memory_urgent_level / 100)`を超えると、BEは即座のメモリ再利用をトリガーし、データキャッシュの縮小、更新キャッシュの退去、永続/Lake MemTableの「フル」扱いを強制して、すぐにフラッシュ/圧縮されるようにします。コードは、この設定が`memory_high_level`よりも大きく、`memory_high_level`が`1`以上`100`以下であることを検証します。値が低いほど、より積極的な早期再利用、つまりより頻繁なキャッシュ退去とフラッシュが発生します。値が高いほど再利用が遅れ、100に近づきすぎるとOOMのリスクがあります。この項目は`memory_high_level`とデータキャッシュ関連の自動調整設定と合わせて調整してください。 +- 導入バージョン: v3.2.0 ##### net_use_ipv6_when_priority_networks_empty -- Default: false -- Type: Boolean -- Unit: - -- Is mutable: いいえ -- Description: `priority_networks` が指定されていない場合にIPv6アドレスを優先的に使用するかどうかを制御するブール値。`true` は、ノードをホストするサーバーがIPv4とIPv6アドレスの両方を持っており、`priority_networks` が指定されていない場合に、システムがIPv6アドレスを優先的に使用することを許可することを示します。 -- Introduced in: v3.3.0 +- デフォルト: false +- タイプ: Boolean +- 単位: - +- 変更可能: いいえ +- 説明: `priority_networks`が指定されていない場合にIPv6アドレスを優先的に使用するかどうかを制御するブール値。`true`は、ノードをホストするサーバーがIPv4とIPv6の両方のアドレスを持ち、`priority_networks`が指定されていない場合に、システムがIPv6アドレスを優先的に使用することを許可することを示します。 +- 導入バージョン: v3.3.0 ##### num_cores -- Default: 0 -- Type: Int -- Unit: コア -- Is mutable: いいえ -- Description: CPU認識の決定(例えば、スレッドプールサイジングやランタイムスケジューリング)にシステムが使用するCPUコア数を制御します。値が0の場合、自動検出が有効になります。システムは `/proc/cpuinfo` を読み取り、利用可能なすべてのコアを使用します。正の整数に設定された場合、その値は検出されたコア数を上書きし、実効コア数になります。コンテナ内で実行されている場合、cgroupのcpusetまたはcpuクォータ設定が使用可能なコアをさらに制限する可能性があります。`CpuInfo` もこれらのcgroup制限を尊重します。 -- Introduced in: v3.2.0 +- デフォルト: 0 +- タイプ: Int +- 単位: コア +- 変更可能: いいえ +- 説明: 
CPU認識決定(例えば、スレッドプールのサイズ設定やランタイムスケジューリング)のためにシステムが使用するCPUコア数を制御します。値が0の場合、自動検出が有効になります。システムは`/proc/cpuinfo`を読み取り、利用可能なすべてのコアを使用します。正の整数に設定された場合、その値が検出されたコア数をオーバーライドし、実効コア数になります。コンテナ内で実行されている場合、cgroupのcpusetまたはcpuクォータ設定が使用可能なコアをさらに制限することがあります。`CpuInfo`もこれらのcgroup制限を尊重します。 +- 導入バージョン: v3.2.0 ##### plugin_path -- Default: `${STARROCKS_HOME}/plugin` -- Type: String -- Unit: - -- Is mutable: いいえ -- Description: StarRocksが外部プラグイン(動的ライブラリ、コネクタアーティファクト、UDFバイナリなど)をロードするファイルシステムディレクトリ。`plugin_path` はBEプロセスからアクセス可能なディレクトリ(読み取りおよび実行権限)を指し、プラグインがロードされる前に存在する必要があります。正しい所有権と、プラグインファイルがプラットフォームのネイティブバイナリ拡張子(例:Linuxでは.so)を使用していることを確認してください。 -- Introduced in: v3.2.0 +- デフォルト: `${STARROCKS_HOME}/plugin` +- タイプ: String +- 単位: - +- 変更可能: いいえ +- 説明: StarRocksが外部プラグイン(ダイナミックライブラリ、コネクタアーティファクト、UDFバイナリなど)をロードするファイルシステムディレクトリ。`plugin_path`は、BEプロセスがアクセス可能(読み取りおよび実行権限)で、プラグインがロードされる前に存在しているディレクトリを指す必要があります。正しい所有権と、プラグインファイルがプラットフォームのネイティブバイナリ拡張子(例えばLinuxでは.so)を使用していることを確認してください。 +- 導入バージョン: v3.2.0 ##### priority_networks -- Default: An empty string -- Type: String -- Unit: - -- Is mutable: いいえ -- Description: 複数のIPアドレスを持つサーバーの選択戦略を宣言します。このパラメータで指定されたリストに最大1つのIPアドレスが一致する必要があることに注意してください。このパラメータの値は、CIDR表記でセミコロン(;)で区切られたエントリのリストです(例:`10.10.10.0/24`)。このリストのエントリにIPアドレスが一致しない場合、サーバーの利用可能なIPアドレスがランダムに選択されます。v3.3.0以降、StarRocksはIPv6ベースのデプロイメントをサポートします。サーバーがIPv4とIPv6の両方のアドレスを持ち、このパラメータが指定されていない場合、システムはデフォルトでIPv4アドレスを使用します。この動作は `net_use_ipv6_when_priority_networks_empty` を `true` に設定することで変更できます。 -- Introduced in: - +- デフォルト: 空文字列 +- タイプ: String +- 単位: - +- 変更可能: いいえ +- 説明: 複数のIPアドレスを持つサーバーの選択戦略を宣言します。最大で1つのIPアドレスがこのパラメーターで指定されたリストと一致する必要があることに注意してください。このパラメーターの値は、CIDR表記でセミコロン(;)で区切られたエントリのリストです。例えば、`10.10.10.0/24`。このリストのエントリと一致するIPアドレスがない場合、サーバーの利用可能なIPアドレスがランダムに選択されます。v3.3.0以降、StarRocksはIPv6ベースのデプロイメントをサポートしています。サーバーがIPv4とIPv6の両方のアドレスを持ち、このパラメーターが指定されていない場合、システムはデフォルトでIPv4アドレスを使用します。`net_use_ipv6_when_priority_networks_empty`を`true`に設定することで、この動作を変更できます。 +- 導入バージョン: - ##### rpc_compress_ratio_threshold -- Default: 1.1 -- Type: Double -- Unit: - -- Is mutable: はい -- Description: 圧縮形式でネットワーク経由でシリアライズされたローバッチを送信するかどうかを決定する際に使用される閾値 (uncompressed_size / compressed_size)。圧縮が試行されるとき (例: DataStreamSender、交換シンク、タブレットシンクインデックスチャネル、辞書キャッシュライター)、StarRocksはcompress_ratio = uncompressed_size / compressed_size を計算します。compress_ratioが `rpc_compress_ratio_threshold` より大きい場合にのみ圧縮ペイロードを使用します。デフォルトの1.1では、圧縮データは非圧縮データよりも少なくとも約9.1%小さくなければ使用されません。圧縮を優先するには値を下げます (帯域幅の節約が小さくてもCPU使用量が増加)。より大きなサイズ削減が得られない限り圧縮オーバーヘッドを避けるには値を上げます。注意: これはRPC/シャッフルシリアライズに適用され、ローバッチ圧縮が有効な場合 (compress_rowbatches) にのみ有効です。 -- Introduced in: v3.2.0 +- デフォルト: 1.1 +- タイプ: Double +- 単位: - +- 変更可能: はい +- 説明: シリアル化されたロウバッチを圧縮形式でネットワーク経由で送信するかどうかを決定する際に使用される閾値(`非圧縮サイズ / 圧縮サイズ`)。圧縮が試行される場合(例:DataStreamSender、exchange sink、tablet sink index channel、dictionary cache writer)、StarRocksは`compress_ratio = 非圧縮サイズ / 圧縮サイズ`を計算します。`compress_ratio > rpc_compress_ratio_threshold`の場合にのみ圧縮ペイロードを使用します。デフォルトの1.1では、圧縮データが非圧縮データよりも少なくとも約9.1%小さくなければ使用されません。圧縮を優先するには(より小さい帯域幅節約のためにCPUを多く使用)、値を減らしてください。より大きなサイズ削減が得られない限り圧縮オーバーヘッドを避けるには、値を増やしてください。注:これはRPC/シャッフルシリアル化に適用され、ロウバッチ圧縮が有効な場合にのみ有効です(`compress_rowbatches`)。 +- 導入バージョン: v3.2.0 ##### ssl_private_key_path -- Default: An empty string -- Type: String -- Unit: - -- Is mutable: いいえ -- Description: BEのbRPCサーバーがデフォルト証明書の秘密鍵として使用するTLS/SSL秘密鍵(PEM)へのファイルシステムパス。`enable_https` が `true` に設定されている場合、システムはプロセス開始時に `brpc::ServerOptions::ssl_options().default_cert.private_key` をこのパスに設定します。ファイルはBEプロセスからアクセス可能であり、`ssl_certificate_path` 
で提供される証明書と一致する必要があります。この値が設定されていない場合、またはファイルが存在しないかアクセスできない場合、HTTPSは設定されず、bRPCサーバーの起動に失敗する可能性があります。このファイルを制限的なファイルシステム権限(例:600)で保護してください。 -- Introduced in: v4.0.0 +- デフォルト: 空文字列 +- タイプ: String +- 単位: - +- 変更可能: いいえ +- 説明: BEのbRPCサーバーがデフォルト証明書の秘密鍵として使用するTLS/SSL秘密鍵(PEM)へのファイルシステムパス。`enable_https`が`true`に設定されている場合、プロセス開始時にこの値が`brpc::ServerOptions::ssl_options().default_cert.private_key`にコピーされます。`ssl_certificate_path`で提供される証明書と一致する秘密鍵も設定する必要があります。この値が設定されていないか、ファイルが見つからないかアクセスできない場合、HTTPSは設定されず、bRPCサーバーが起動に失敗する可能性があります。このファイルは、制限的なファイルシステム権限(例:600)で保護してください。 +- 導入バージョン: v4.0.0 ##### thrift_client_retry_interval_ms -- Default: 100 -- Type: Int -- Unit: ミリ秒 -- Is mutable: はい -- Description: Thriftクライアントが再試行する時間間隔。 -- Introduced in: - +- デフォルト: 100 +- タイプ: Int +- 単位: ミリ秒 +- 変更可能: はい +- 説明: Thriftクライアントが再試行する時間間隔。 +- 導入バージョン: - ##### thrift_connect_timeout_seconds -- Default: 3 -- Type: Int -- Unit: 秒 -- Is mutable: いいえ -- Description: Thriftクライアント作成時に使用される接続タイムアウト(秒単位)。`ClientCacheHelper::_create_client` はこの値を1000倍し、`ThriftClientImpl::set_conn_timeout()` に渡すため、BEクライアントキャッシュによって開かれる新しいThrift接続のTCP/接続ハンドシェイクタイムアウトを制御します。この設定は接続確立のみに影響します。送信/受信タイムアウトは別途設定されます。非常に小さい値は、高遅延ネットワークで偽の接続障害を引き起こす可能性があり、大きい値は到達不能なピアの検出を遅らせます。 -- Introduced in: v3.2.0 +- デフォルト: 3 +- タイプ: Int +- 単位: 秒 +- 変更可能: いいえ +- 説明: Thriftクライアント作成時に使用される接続タイムアウト(秒単位)。`ClientCacheHelper::_create_client`はこの値を1000倍し、`ThriftClientImpl::set_conn_timeout()`に渡すため、BEクライアントキャッシュによって開かれる新しいThrift接続のTCP/接続ハンドシェイクタイムアウトを制御します。この設定は接続確立のみに影響し、送受信タイムアウトは別途設定されます。非常に小さい値は、高レイテンシネットワークで誤った接続失敗を引き起こす可能性があり、大きい値は到達不能なピアの検出を遅らせます。 +- 導入バージョン: v3.2.0 ##### thrift_port -- Default: 0 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: 内部のThriftベースのBackendServiceをエクスポートするために使用されるポート。プロセスがCompute Nodeとして実行され、この項目が0以外の値に設定されている場合、`be_port` を上書きし、Thriftサーバーはこの値にバインドします。それ以外の場合は `be_port` が使用されます。この設定は非推奨です。0以外の `thrift_port` を設定すると、`be_port` を使用するように助言する警告がログに記録されます。 -- Introduced in: v3.2.0 +- デフォルト: 0 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: 内部ThriftベースのBackendServiceをエクスポートするために使用されるポート。プロセスがCompute Nodeとして実行され、この項目が0以外の値に設定されている場合、`be_port`をオーバーライドし、Thriftサーバーはこの値にバインドされます。それ以外の場合は`be_port`が使用されます。この設定は非推奨です。0以外の`thrift_port`を設定すると、`be_port`を使用するよう警告ログが出力されます。 +- 導入バージョン: v3.2.0 ##### thrift_rpc_connection_max_valid_time_ms -- Default: 5000 -- Type: Int -- Unit: ミリ秒 -- Is mutable: いいえ -- Description: Thrift RPC接続の最大有効時間。この値よりも長く接続プールに存在した接続は閉じられます。FE構成 `thrift_client_timeout_ms` と一致するように設定する必要があります。 -- Introduced in: - +- デフォルト: 5000 +- タイプ: Int +- 単位: ミリ秒 +- 変更可能: いいえ +- 説明: Thrift RPC接続の最大有効時間。この値よりも長く接続プールに存在した接続は閉じられます。FEの設定`thrift_client_timeout_ms`と一致させる必要があります。 +- 導入バージョン: - ##### thrift_rpc_max_body_size -- Default: 0 -- Type: Int -- Unit: -- Is mutable: いいえ -- Description: RPCの最大文字列ボディサイズ。`0` はサイズが無制限であることを示します。 -- Introduced in: - +- デフォルト: 0 +- タイプ: Int +- 単位: +- 変更可能: いいえ +- 説明: RPCの最大文字列ボディサイズ。`0`はサイズが無制限であることを示します。 +- 導入バージョン: - ##### thrift_rpc_strict_mode -- Default: true -- Type: Boolean -- Unit: - -- Is mutable: いいえ -- Description: Thriftの厳密な実行モードが有効になっているかどうか。Thriftの厳密なモードの詳細については、[Thrift Binary protocol encoding](https://github.com/apache/thrift/blob/master/doc/specs/thrift-binary-protocol.md) を参照してください。 -- Introduced in: - +- デフォルト: true +- タイプ: Boolean +- 単位: - +- 変更可能: いいえ +- 説明: Thriftの厳格な実行モードが有効になっているかどうか。Thrift厳格モードの詳細については、[Thrift Binary protocol encoding](https://github.com/apache/thrift/blob/master/doc/specs/thrift-binary-protocol.md)を参照してください。 +- 導入バージョン: - ##### thrift_rpc_timeout_ms -- Default: 5000 -- Type: Int -- 
Unit: ミリ秒 -- Is mutable: はい -- Description: Thrift RPCのタイムアウト。 -- Introduced in: - +- デフォルト: 5000 +- タイプ: Int +- 単位: ミリ秒 +- 変更可能: はい +- 説明: Thrift RPCのタイムアウト。 +- 導入バージョン: - ##### transaction_apply_thread_pool_num_min -- Default: 0 -- Type: Int -- Unit: スレッド -- Is mutable: はい -- Description: BEのUpdateManager内の「update_apply」スレッドプール(プライマリキーテーブルのrowsetを適用するプール)の最小スレッド数を設定します。値が0の場合、固定された最小値は無効になります(下限は強制されません)。`transaction_apply_worker_count` も0の場合、プールの最大スレッドはCPUコア数にデフォルト設定されるため、実効ワーカ容量はCPUコア数に等しくなります。これを増やすと、トランザクション適用に対するベースラインの同時実行性を保証できます。高すぎるとCPUの競合が増加する可能性があります。変更は、update_config HTTPハンドラを介して実行時に適用されます(applyスレッドプールで `update_min_threads` を呼び出します)。 -- Introduced in: v3.2.11 +- デフォルト: 0 +- タイプ: Int +- 単位: スレッド +- 変更可能: はい +- 説明: BEのUpdateManager内の"update_apply"スレッドプール(プライマリキーテーブルのロウセットを適用するために使用されるプール)の最小スレッド数を設定します。0の値は固定の最小値が無効であることを示します(強制的な下限なし)。`transaction_apply_worker_count`も0の場合、プールの最大スレッドはCPUコア数にデフォルト設定されるため、実効ワーカー容量はCPUコア数に等しくなります。これを増やすと、トランザクション適用に対する基本の並行処理を保証できます。高すぎるとCPUの競合が増加する可能性があります。変更はHTTPハンドラー`update_config`を介して実行時に適用されます(適用スレッドプールで`update_min_threads`が呼び出されます)。 +- 導入バージョン: v3.2.11 ##### transaction_publish_version_thread_pool_num_min -- Default: 0 -- Type: Int -- Unit: スレッド -- Is mutable: はい -- Description: AgentServerの「publish_version」動的スレッドプール(トランザクションバージョンを公開する/ TTaskType::PUBLISH_VERSIONタスクを処理するために使用される)で予約される最小スレッド数を設定します。起動時にプールはmin = max(設定値, MIN_TRANSACTION_PUBLISH_WORKER_COUNT) (MIN_TRANSACTION_PUBLISH_WORKER_COUNT = 1) で作成されるため、デフォルトの0は最小1スレッドになります。実行時にこの値を変更すると、updateコールバックがThreadPool::update_min_threadsを呼び出し、プールの保証された最小値が増減します(ただし、強制された最小値の1を下回ることはありません)。`transaction_publish_version_worker_count` (最大スレッド) と `transaction_publish_version_thread_pool_idle_time_ms` (アイドルタイムアウト) と連携して調整してください。 -- Introduced in: v3.2.11 +- デフォルト: 0 +- タイプ: Int +- 単位: スレッド +- 変更可能: はい +- 説明: AgentServerの"publish_version"動的スレッドプール(トランザクションバージョンをパブリッシュする/`TTaskType::PUBLISH_VERSION`タスクを処理するために使用される)に予約される最小スレッド数を設定します。起動時、プールは`min = max(設定値, MIN_TRANSACTION_PUBLISH_WORKER_COUNT)`(`MIN_TRANSACTION_PUBLISH_WORKER_COUNT = 1`)で作成されるため、デフォルトの0は最小1スレッドになります。実行時にこの値を変更すると、更新コールバックが`ThreadPool::update_min_threads`を呼び出し、プールの保証された最小値を増減します(ただし、強制される最小値1を下回ることはありません)。`transaction_publish_version_worker_count`(最大スレッド数)および`transaction_publish_version_thread_pool_idle_time_ms`(アイドルタイムアウト)と連携して調整してください。 +- 導入バージョン: v3.2.11 ##### use_mmap_allocate_chunk -- Default: false -- Type: Boolean -- Unit: - -- Is mutable: いいえ -- Description: この項目が `true` に設定されている場合、システムは匿名プライベートmmapマッピング(MAP_ANONYMOUS | MAP_PRIVATE)を使用してチャンクを割り当て、munmapで解放します。これを有効にすると、多数の仮想メモリマッピングが作成される可能性があるため、カーネル制限(rootユーザーとして `sysctl -w vm.max_map_count=262144` または `echo 262144 > /proc/sys/vm/max_map_count` を実行)を上げ、`chunk_reserved_bytes_limit` を比較的に大きな値に設定する必要があります。そうしないと、mmapを有効にすると、頻繁なマッピング/アンマッピングにより非常に低いパフォーマンスになる可能性があります。 -- Introduced in: v3.2.0 +- デフォルト: false +- タイプ: Boolean +- 単位: - +- 変更可能: いいえ +- 説明: この項目が`true`に設定されている場合、システムは匿名プライベートmmapマッピング(MAP_ANONYMOUS | MAP_PRIVATE)を使用してチャンクを割り当て、munmapでそれらを解放します。これを有効にすると、多数の仮想メモリマッピングが作成される可能性があるため、カーネル制限を上げる必要があります(rootユーザーとして`sysctl -w vm.max_map_count=262144`または`echo 262144 > /proc/sys/vm/max_map_count`を実行)。また、`chunk_reserved_bytes_limit`を比較的高く設定する必要があります。そうしないと、mmapを有効にすると、頻繁なマッピング/アンマッピングによりパフォーマンスが非常に低下する可能性があります。 +- 導入バージョン: v3.2.0 -### メタデータとクラスター管理 +### メタデータおよびクラスター管理 ##### cluster_id -- Default: -1 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: このStarRocksバックエンドのグローバルクラスター識別子。起動時にStorageEngineは `config::cluster_id` 
を実効クラスターIDに読み込み、すべてのデータルートパスが同じクラスターIDを含んでいることを確認します(`StorageEngine::_check_all_root_path_cluster_id` を参照)。値 `-1` は「未設定」を意味します。エンジンは既存のデータディレクトリまたはマスターハートビートから実効IDを導出する可能性があります。非負のIDが構成されている場合、構成されたIDとデータディレクトリに保存されているIDとの不一致により、起動検証が失敗します(`Status::Corruption`)。一部のルートにIDがなく、エンジンがIDの書き込みを許可されている場合(`options.need_write_cluster_id`)、実効IDをそれらのルートに永続化します。 -- Introduced in: v3.2.0 +- デフォルト: -1 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: このStarRocksバックエンドのグローバルクラスター識別子。起動時、StorageEngineは`config::cluster_id`をその実効クラスターIDに読み込み、すべてのデータルートパスが同じクラスターIDを含んでいることを検証します(`StorageEngine::_check_all_root_path_cluster_id`を参照)。-1の値は「未設定」を意味します。エンジンは既存のデータディレクトリまたはマスターハートビートから実効IDを導出する場合があります。負でないIDが設定されている場合、設定されたIDとデータディレクトリに格納されているIDとの間に不一致があると、起動検証は失敗します(Status::Corruption)。一部のルートにIDがなく、エンジンがIDの書き込みを許可されている場合(options.need_write_cluster_id)、実効IDをそれらのルートに永続化します。 +- 導入バージョン: v3.2.0 ##### consistency_max_memory_limit -- Default: 10G -- Type: String -- Unit: - -- Is mutable: いいえ -- Description: CONSISTENCYメモリトラッカーのメモリサイズ指定。 -- Introduced in: v3.2.0 +- デフォルト: 10G +- タイプ: String +- 単位: - +- 変更可能: いいえ +- 説明: CONSISTENCYメモリトラッカーのメモリサイズ指定。 +- 導入バージョン: v3.2.0 ##### make_snapshot_rpc_timeout_ms -- Default: 20000 -- Type: Int -- Unit: ミリ秒 -- Is mutable: いいえ -- Description: リモートBEでスナップショットを作成する際に使用されるThrift RPCタイムアウトをミリ秒単位で設定します。リモートスナップショットの作成がデフォルトのタイムアウトを定期的に超える場合はこの値を増やし、応答しないBEでより早く失敗するにはこの値を減らします。他のタイムアウトがエンドツーエンド操作に影響を与える可能性があることに注意してください(例えば、実効的なタブレットライターオープンタイムアウトは `tablet_writer_open_rpc_timeout_sec` や `load_timeout_sec` に関連する可能性があります)。 -- Introduced in: v3.2.0 +- デフォルト: 20000 +- タイプ: Int +- 単位: ミリ秒 +- 変更可能: いいえ +- 説明: リモートBEでスナップショットを作成する際に使用されるThrift RPCタイムアウトをミリ秒単位で設定します。リモートスナップショットの作成がデフォルトのタイムアウトを定期的に超える場合はこの値を増やし、応答しないBEでより早く失敗するにはこの値を減らします。他のタイムアウトがエンドツーエンドの操作に影響を与える可能性があることに注意してください(例えば、実効タブレットライターオープンタイムアウトは`tablet_writer_open_rpc_timeout_sec`や`load_timeout_sec`に関連する場合があります)。 +- 導入バージョン: v3.2.0 ##### metadata_cache_memory_limit_percent -- Default: 30 -- Type: Int -- Unit: パーセント -- Is mutable: はい -- Description: メタデータLRUキャッシュサイズをプロセスメモリ制限のパーセンテージとして設定します。起動時にStarRocksはキャッシュバイト数を (process_mem_limit * metadata_cache_memory_limit_percent / 100) として計算し、それをメタデータキャッシュアロケータに渡します。キャッシュは非PRIMARY_KEYS rowsets(PKテーブルはサポートされていません)にのみ使用され、`metadata_cache_memory_limit_percent` > 0 の場合にのみ有効になります。メタデータキャッシュを無効にするには ≤ 0 に設定します。この値を増やすとメタデータキャッシュ容量が増加しますが、他のコンポーネントで利用可能なメモリが減少します。ワークロードとシステムメモリに基づいて調整してください。BE_TESTビルドではアクティブではありません。 -- Introduced in: v3.2.10 +- デフォルト: 30 +- タイプ: Int +- 単位: パーセント +- 変更可能: はい +- 説明: メタデータLRUキャッシュサイズをプロセスメモリ制限のパーセンテージとして設定します。起動時、StarRocksはキャッシュバイトを(`process_mem_limit * metadata_cache_memory_limit_percent / 100`)として計算し、それをメタデータキャッシュアロケータに渡します。キャッシュは非PRIMARY_KEYSのロウセット(PKテーブルはサポートされていません)にのみ使用され、`metadata_cache_memory_limit_percent > 0`の場合にのみ有効になります。メタデータキャッシュを無効にするには`<= 0`に設定してください。この値を増やすとメタデータキャッシュ容量が増加しますが、他のコンポーネントで利用可能なメモリが減少します。ワークロードとシステムメモリに基づいて調整してください。BE_TESTビルドではアクティブではありません。 +- 導入バージョン: v3.2.10 ##### retry_apply_interval_second -- Default: 30 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: 失敗したタブレット適用操作の再試行をスケジューリングする際に使用されるベース間隔(秒単位)。これは、送信失敗後の再試行のスケジューリングに直接使用され、バックオフのベース乗数としても使用されます。次の再試行遅延はmin(600, `retry_apply_interval_second` * failed_attempts)として計算されます。コードはまた、`retry_apply_interval_second` を使用して累積再試行期間(等差数列の合計)を計算し、`retry_apply_timeout_second` と比較して再試行を続けるかどうかを決定します。`enable_retry_apply` がtrueの場合にのみ有効です。この値を増やすと、個々の再試行遅延と再試行に費やされる累積時間の両方が長くなります。減らすと、再試行がより頻繁になり、`retry_apply_timeout_second` に達するまでの試行回数が増加する可能性があります。 -- Introduced in: v3.2.9 +- デフォルト: 30 +- タイプ: Int +- 
単位: 秒 +- 変更可能: はい +- 説明: 失敗したタブレット適用操作の再試行をスケジュールする際に使用される基本間隔(秒単位)。これは、送信失敗後に再試行をスケジュールするために直接使用され、バックオフの基本乗数としても使用されます。次の再試行遅延は`min(600, retry_apply_interval_second * 失敗回数)`として計算されます。コードはまた、累積再試行期間(等差数列の合計)を計算するために`retry_apply_interval_second`を使用し、それを`retry_apply_timeout_second`と比較して再試行を継続するかどうかを決定します。`enable_retry_apply`がtrueの場合にのみ有効です。この値を増やすと、個々の再試行遅延と累積再試行時間が長くなります。減らすと、再試行頻度が高くなり、`retry_apply_timeout_second`に達するまでの試行回数が増加する可能性があります。 +- 導入バージョン: v3.2.9 ##### retry_apply_timeout_second -- Default: 7200 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: 適用プロセスがあきらめ、タブレットがエラー状態に入る前に、保留中のバージョンの適用に許容される最大累積再試行時間(秒単位)。適用ロジックは、`retry_apply_interval_second` に基づいて指数関数的/バックオフ間隔を累積し、合計期間を `retry_apply_timeout_second` と比較します。`enable_retry_apply` がtrueであり、エラーが再試行可能と見なされる場合、累積バックオフが `retry_apply_timeout_second` を超えるまで適用試行は再スケジュールされます。その後、適用は停止し、タブレットはエラーに移行します。明示的に再試行不能なエラー(例:`Corruption`)は、この設定に関係なく再試行されません。この値を調整して、StarRocksが適用操作を再試行し続ける期間(デフォルト7200秒 = 2時間)を制御します。 -- Introduced in: v3.3.13, v3.4.3, v3.5.0 +- デフォルト: 7200 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: 適用プロセスが諦めてタブレットがエラー状態に入るまでの、保留中のバージョンを適用するために許可される累積再試行時間の最大値(秒単位)。適用ロジックは、`retry_apply_interval_second`に基づいて指数関数的な/バックオフ間隔を累積し、合計期間を`retry_apply_timeout_second`と比較します。`enable_retry_apply`がtrueで、エラーが再試行可能と見なされる場合、累積バックオフが`retry_apply_timeout_second`を超えるまで適用試行は再スケジュールされます。その後、適用は停止し、タブレットはエラーに移行します。明示的に再試行できないエラー(例えば、Corruption)は、この設定に関わらず再試行されません。StarRocksが適用操作を再試行し続ける期間を制御するためにこの値を調整します(デフォルト7200秒 = 2時間)。 +- 導入バージョン: v3.3.13, v3.4.3, v3.5.0 ##### txn_commit_rpc_timeout_ms -- Default: 60000 -- Type: Int -- Unit: ミリ秒 -- Is mutable: はい -- Description: BEストリームロードおよびトランザクションコミット呼び出しで使用されるThrift RPC接続の最大許容存続時間(ミリ秒単位)。StarRocksはこの値をFEに送信されるリクエスト(stream_load計画、loadTxnBegin/loadTxnPrepare/loadTxnCommit、getLoadTxnStatusで使用)の `thrift_rpc_timeout_ms` として設定します。接続がこの値よりも長くプールされている場合、接続は閉じられます。リクエストごとのタイムアウト(`ctx->timeout_second`)が提供されている場合、BEはRPCタイムアウトをrpc_timeout_ms = max(ctx*1000/4, min(ctx*1000/2, txn_commit_rpc_timeout_ms)) として計算するため、実効的なRPCタイムアウトはコンテキストとこの設定によって制限されます。不一致のタイムアウトを避けるために、これをFEの `thrift_client_timeout_ms` と一致させてください。 -- Introduced in: v3.2.0 +- デフォルト: 60000 +- タイプ: Int +- 単位: ミリ秒 +- 変更可能: はい +- 説明: BEのストリームロードおよびトランザクションコミット呼び出しで使用されるThrift RPC接続の最大許容存続時間(ミリ秒)。StarRocksはこの値をFEに送信されるリクエストの`thrift_rpc_timeout_ms`として設定します(stream_load計画、loadTxnBegin/loadTxnPrepare/loadTxnCommit、getLoadTxnStatusで使用)。接続がこの値よりも長くプールされている場合、閉じられます。リクエストごとのタイムアウト(`ctx->timeout_second`)が提供される場合、BEはRPCタイムアウトを`rpc_timeout_ms = max(ctx*1000/4, min(ctx*1000/2, txn_commit_rpc_timeout_ms))`として計算するため、実効RPCタイムアウトはコンテキストとこの設定によって制限されます。不一致のタイムアウトを避けるために、これをFEの`thrift_client_timeout_ms`と一致させてください。 +- 導入バージョン: v3.2.0 ##### txn_map_shard_size -- Default: 128 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: トランザクションマネージャーがトランザクションロックを分割し、競合を減らすために使用するロックマップシャードの数。その値は2の累乗(2^n)であるべきです。値を増やすと、追加のメモリとわずかな簿記のオーバーヘッドのコストで、同時実行性が向上し、ロック競合が減少します。予想される同時トランザクションと利用可能なメモリに合わせてシャード数を設定してください。 -- Introduced in: v3.2.0 +- デフォルト: 128 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: トランザクションマネージャーがトランザクションロックをパーティション化し、競合を減らすために使用するロックマップシャードの数。その値は2の累乗(2^n)であるべきです。これを増やすと、並行処理が増加し、ロックの競合が減少しますが、追加のメモリとわずかな簿記のオーバーヘッドが発生します。予想される同時トランザクションと利用可能なメモリに合わせてシャード数を選択してください。 +- 導入バージョン: v3.2.0 ##### txn_shard_size -- Default: 1024 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: トランザクションマネージャーが使用するロックシャードの数を制御します。この値はtxnロックのシャードサイズを決定します。2の累乗である必要があります。より大きな値に設定すると、ロックの競合が減少し、同時COMMIT/PUBLISHのスループットが向上しますが、追加のメモリとより詳細な内部的な簿記のコストが増加します。 -- Introduced in: v3.2.0 +- デフォルト: 1024 
+- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: トランザクションマネージャーが使用するロックシャードの数を制御します。この値はtxnロックのシャードサイズを決定します。これは2の累乗でなければなりません。これを大きな値に設定すると、ロックの競合が減少し、同時COMMIT/PUBLISHのスループットが向上しますが、追加のメモリとよりきめ細かい内部の簿記のコストがかかります。 +- 導入バージョン: v3.2.0 ##### update_schema_worker_count -- Default: 3 -- Type: Int -- Unit: スレッド -- Is mutable: いいえ -- Description: TTaskType::UPDATE_SCHEMAタスクを処理するバックエンドの「update_schema」動的ThreadPool内のワーカースレッドの最大数を設定します。ThreadPoolは起動時にagent_serverで最小0スレッド(アイドル時には0にスケールダウン可能)とこの設定に等しい最大値(プールはデフォルトのアイドルタイムアウトと実質的に無制限のキューを使用)で作成されます。この値を増やすと、より多くの同時スキーマ更新タスクを許可できます(CPUとメモリ使用量が増加)。値を減らすと、並列スキーマ操作が制限されます。 -- Introduced in: v3.2.3 +- デフォルト: 3 +- タイプ: Int +- 単位: スレッド +- 変更可能: いいえ +- 説明: バックエンドの"update_schema"動的ThreadPoolで`TTaskType::UPDATE_SCHEMA`タスクを処理するワーカー・スレッドの最大数を設定します。ThreadPoolは起動時にagent_server内で最小0スレッド(アイドル時にゼロにスケールダウン可能)とこの設定に等しい最大値で作成されます。プールはデフォルトのアイドルタイムアウトと実質的に無制限のキューを使用します。この値を増やすと、より多くの同時スキーマ更新タスクを許可し(CPUとメモリ使用量が増加)、減らすと並行スキーマ操作を制限します。 +- 導入バージョン: v3.2.3 ##### update_tablet_meta_info_worker_count -- Default: 1 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: タブレットメタデータ更新タスクを処理するバックエンドスレッドプールの最大ワーカースレッド数を設定します。スレッドプールはバックエンドの起動中に作成され、最小0スレッド(アイドル時には0にスケールダウン可能)とこの設定に等しい最大値(少なくとも1に制限される)を持ちます。実行時にこの値を更新すると、プールの最大スレッドが調整されます。同時メタデータ更新タスクを増やすには値を増やし、同時実行性を制限するには値を減らします。 -- Introduced in: v4.1.0, v4.0.6, v3.5.13 +- デフォルト: 1 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: タブレットメタデータ更新タスクを処理するバックエンドスレッドプールの最大ワーカー・スレッド数を設定します。スレッドプールはバックエンドの起動中に、最小0スレッド(アイドル時にゼロにスケールダウン可能)とこの設定に等しい最大値(最低1に制限される)で作成されます。実行時にこの値を更新すると、プールの最大スレッドが調整されます。同時メタデータ更新タスクを増やすにはこの値を増やし、並行処理を制限するにはこの値を減らします。 +- 導入バージョン: v4.1.0, v4.0.6, v3.5.13 -### ユーザー、ロール、および権限 +### ユーザー、ロール、権限 ##### ssl_certificate_path -- Default: -- Type: String -- Unit: - -- Is mutable: いいえ -- Description: `enable_https` がtrueの場合にBEのbRPCサーバーが使用するTLS/SSL証明書ファイル(PEM)への絶対パス。BE起動時にこの値は `brpc::ServerOptions::ssl_options().default_cert.certificate` にコピーされます。一致する秘密鍵も `ssl_private_key_path` に設定する必要があります。CAで必要とされる場合、サーバー証明書とすべての中間証明書をPEM形式(証明書チェーン)で提供してください。ファイルはStarRocks BEプロセスから読み取り可能であり、起動時にのみ適用されます。`enable_https` が有効なときに設定されていないか無効な場合、bRPC TLSのセットアップが失敗し、サーバーが正しく起動できない可能性があります。 -- Introduced in: v4.0.0 +- デフォルト: +- タイプ: String +- 単位: - +- 変更可能: いいえ +- 説明: `enable_https`がtrueの場合にBEのbRPCサーバーが使用するTLS/SSL証明書ファイル(PEM)への絶対パス。BEの起動時に、この値は`brpc::ServerOptions::ssl_options().default_cert.certificate`にコピーされます。一致する秘密鍵も`ssl_private_key_path`に設定する必要があります。CAが必要とする場合は、PEM形式のサーバー証明書と中間証明書(証明書チェーン)を提供してください。ファイルはStarRocks BEプロセスから読み取り可能である必要があり、起動時にのみ適用されます。`enable_https`が有効な状態で設定されていないか無効な場合、bRPC TLSセットアップが失敗し、サーバーが正しく起動できない可能性があります。 +- 導入バージョン: v4.0.0 ### クエリエンジン ##### clear_udf_cache_when_start -- Default: false -- Type: Boolean -- Unit: - -- Is mutable: いいえ -- Description: 有効にすると、BEのUserFunctionCacheは起動時にローカルにキャッシュされているすべてのユーザー関数ライブラリをクリアします。`UserFunctionCache::init` 中に、コードは `_reset_cache_dir()` を呼び出し、設定されたUDFライブラリディレクトリ(`kLibShardNum` サブディレクトリに整理されている)からUDFファイルを削除し、Java/Python UDFのサフィックス(.jar/.py)を持つファイルを削除します。無効にすると(デフォルト)、BEは既存のキャッシュされたUDFファイルを削除する代わりにロードします。これを有効にすると、再起動後の最初の使用時にUDFバイナリが再ダウンロードされることになります(ネットワークトラフィックと初回使用のレイテンシが増加します)。 -- Introduced in: v4.0.0 +- デフォルト: false +- タイプ: Boolean +- 単位: - +- 変更可能: いいえ +- 説明: 有効になっている場合、BEのUserFunctionCacheは起動時にローカルにキャッシュされているすべてのユーザー関数ライブラリをクリアします。`UserFunctionCache::init`中に、コードは`_reset_cache_dir()`を呼び出し、設定されたUDFライブラリディレクトリ(`kLibShardNum`サブディレクトリに整理されている)からUDFファイルを削除し、Java/Python 
UDFのサフィックス(.jar/.py)を持つファイルを削除します。無効になっている場合(デフォルト)、BEは既存のキャッシュされたUDFファイルを削除せずにロードします。これを有効にすると、再起動後の最初の使用時にUDFバイナリが再ダウンロードされることになります(ネットワークトラフィックと最初の使用時のレイテンシが増加します)。 +- 導入バージョン: v4.0.0 ##### dictionary_speculate_min_chunk_size -- Default: 10000 -- Type: Int -- Unit: 行 -- Is mutable: いいえ -- Description: `StringColumnWriter` および `DictColumnWriter` が辞書エンコーディング推測をトリガーするために使用する最小行数(チャンクサイズ)。入力列(または蓄積されたバッファと入力行)のサイズが `dictionary_speculate_min_chunk_size` 以上の場合、ライターは直ちに推測を実行し、より多くの行をバッファリングする代わりにエンコーディング(DICT、PLAINまたはBIT_SHUFFLE)を設定します。推測は文字列列の場合は `dictionary_encoding_ratio` を、数値/非文字列列の場合は `dictionary_encoding_ratio_for_non_string_column` を使用して、辞書エンコーディングが有益かどうかを決定します。また、大きな列の `byte_size`(UINT32_MAX以上)は、`BinaryColumn` のオーバーフローを避けるために即座の推測を強制します。 -- Introduced in: v3.2.0 +- デフォルト: 10000 +- タイプ: Int +- 単位: 行 +- 変更可能: いいえ +- 説明: StringColumnWriterとDictColumnWriterが辞書エンコーディングの推測をトリガーするために使用する最小行数(チャンクサイズ)。受信列(または累積バッファと受信行)のサイズが`dictionary_speculate_min_chunk_size`以上の場合、ライターは直ちに推測を実行し、より多くの行をバッファリングするのではなく、エンコーディング(DICT、PLAIN、またはBIT_SHUFFLE)を設定します。推測は、文字列列には`dictionary_encoding_ratio`を、数値/非文字列列には`dictionary_encoding_ratio_for_non_string_column`を使用して、辞書エンコーディングが有益かどうかを決定します。また、大きな列の`byte_size`(UINT32_MAX以上)は、`BinaryColumn`のオーバーフローを避けるために即座の推測を強制します。 +- 導入バージョン: v3.2.0 ##### disable_storage_page_cache -- Default: false -- Type: Boolean -- Unit: - -- Is mutable: はい -- Description: PageCacheを無効にするかどうかを制御するブール値。 - - PageCacheが有効になっている場合、StarRocksは最近スキャンされたデータをキャッシュします。 - - PageCacheは、類似のクエリが頻繁に繰り返される場合にクエリパフォーマンスを大幅に向上させることができます。 - - `true` はPageCacheを無効にすることを示します。 - - StarRocks v2.4以降、この項目のデフォルト値は `true` から `false` に変更されました。 -- Introduced in: - +- デフォルト: false +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: PageCacheを無効にするかどうかを制御するブール値。 + - PageCacheが有効な場合、StarRocksは最近スキャンされたデータをキャッシュします。 + - 同様のクエリが頻繁に繰り返される場合、PageCacheはクエリパフォーマンスを大幅に向上させることができます。 + - `true`はPageCacheを無効にすることを示します。 + - この項目のデフォルト値は、StarRocks v2.4以降`true`から`false`に変更されました。 +- 導入バージョン: - ##### enable_bitmap_index_memory_page_cache -- Default: true -- Type: Boolean -- Unit: - -- Is mutable: はい -- Description: Bitmapインデックスのメモリキャッシュを有効にするかどうか。ポイントクエリを高速化するためにBitmapインデックスを使用する場合は、メモリキャッシュを推奨します。 -- Introduced in: v3.1 +- デフォルト: true +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: Bitmapインデックスのメモリキャッシュを有効にするかどうか。ポイントクエリを高速化するためにBitmapインデックスを使用する場合は、メモリキャッシュを推奨します。 +- 導入バージョン: v3.1 ##### enable_compaction_flat_json -- Default: True -- Type: Boolean -- Unit: -- Is mutable: はい -- Description: Flat JSONデータのコンパクションを有効にするかどうか。 -- Introduced in: v3.3.3 +- デフォルト: True +- タイプ: Boolean +- 単位: +- 変更可能: はい +- 説明: Flat JSONデータのコンパクションを有効にするかどうか。 +- 導入バージョン: v3.3.3 ##### enable_json_flat -- Default: false -- Type: Boolean -- Unit: -- Is mutable: はい -- Description: Flat JSON機能を有効にするかどうか。この機能が有効になった後、新しくロードされたJSONデータは自動的にフラット化され、JSONクエリのパフォーマンスが向上します。 -- Introduced in: v3.3.0 +- デフォルト: false +- タイプ: Boolean +- 単位: +- 変更可能: はい +- 説明: Flat JSON機能を有効にするかどうか。この機能を有効にすると、新しくロードされたJSONデータは自動的にフラット化され、JSONクエリのパフォーマンスが向上します。 +- 導入バージョン: v3.3.0 ##### enable_lazy_dynamic_flat_json -- Default: True -- Type: Boolean -- Unit: -- Is mutable: はい -- Description: 読み込みプロセスでFlat JSONスキーマが見つからない場合に、Lazy Dyamic Flat JSONを有効にするかどうか。この項目が `true` に設定されている場合、StarRocksはFlat JSON操作を読み込みプロセスではなく計算プロセスに延期します。 -- Introduced in: v3.3.3 +- デフォルト: True +- タイプ: Boolean +- 単位: +- 変更可能: はい +- 説明: 読み取りプロセスでFlat JSONスキーマが見つからない場合に、Lazy Dynamic Flat JSONを有効にするかどうか。この項目が`true`に設定されている場合、StarRocksはFlat JSON操作を読み取りプロセスではなく計算プロセスに延期します。 +- 導入バージョン: v3.3.3 ##### enable_ordinal_index_memory_page_cache -- 
Default: true -- Type: Boolean -- Unit: - -- Is mutable: はい -- Description: 順序インデックスのメモリキャッシュを有効にするかどうか。順序インデックスは行IDからデータページ位置へのマッピングであり、スキャンを高速化するために使用できます。 -- Introduced in: - +- デフォルト: true +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: 順序インデックスのメモリキャッシュを有効にするかどうか。順序インデックスは行IDからデータページ位置へのマッピングであり、スキャンを高速化するために使用できます。 +- 導入バージョン: - ##### enable_string_prefix_zonemap -- Default: true -- Type: Boolean -- Unit: - -- Is mutable: はい -- Description: プレフィックスベースの最小/最大値を使用して、文字列(CHAR/VARCHAR)列のZoneMapを有効にするかどうか。非キー文字列列の場合、最小/最大値は `string_prefix_zonemap_prefix_len` で設定された固定プレフィックス長に切り捨てられます。 -- Introduced in: - +- デフォルト: true +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: プレフィックスベースのmin/maxを使用して文字列(CHAR/VARCHAR)列のZoneMapを有効にするかどうか。非キー文字列列の場合、min/max値は`string_prefix_zonemap_prefix_len`で設定された固定プレフィックス長に切り詰められます。 +- 導入バージョン: - ##### enable_zonemap_index_memory_page_cache -- Default: true -- Type: Boolean -- Unit: - -- Is mutable: はい -- Description: zonemapインデックスのメモリキャッシュを有効にするかどうか。zonemapインデックスを使用してスキャンを高速化する場合は、メモリキャッシュを推奨します。 -- Introduced in: - +- デフォルト: true +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: ZoneMapインデックスのメモリキャッシュを有効にするかどうか。ZoneMapインデックスを使用してスキャンを高速化する場合は、メモリキャッシュを推奨します。 +- 導入バージョン: - ##### exchg_node_buffer_size_bytes -- Default: 10485760 -- Type: Int -- Unit: バイト -- Is mutable: はい -- Description: 各クエリの交換ノードのレシーバー側の最大バッファサイズ。この構成項目はソフトリミットです。データが過剰な速度でレシーバー側に送信されたときにバックプレッシャがトリガーされます。 -- Introduced in: - +- デフォルト: 10485760 +- タイプ: Int +- 単位: バイト +- 変更可能: はい +- 説明: 各クエリの交換ノードのレシーバー側の最大バッファサイズ。この設定項目はソフトリミットです。データが過剰な速度でレシーバー側に送信されると、バックプレッシャがトリガーされます。 +- 導入バージョン: - ##### file_descriptor_cache_capacity -- Default: 16384 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: キャッシュできるファイルディスクリプタの数。 -- Introduced in: - +- デフォルト: 16384 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: キャッシュできるファイルディスクリプタの数。 +- 導入バージョン: - ##### flamegraph_tool_dir -- Default: `${STARROCKS_HOME}/bin/flamegraph` -- Type: String -- Unit: - -- Is mutable: いいえ -- Description: flamegraphツールのディレクトリ。プロファイルデータからフレームグラフを生成するためのpprof、stackcollapse-go.pl、およびflamegraph.plスクリプトが含まれている必要があります。 -- Introduced in: - +- デフォルト: `${STARROCKS_HOME}/bin/flamegraph` +- タイプ: String +- 単位: - +- 変更可能: いいえ +- 説明: フレームグラフツールが格納されているディレクトリ。プロファイルデータからフレームグラフを生成するためのpprof、stackcollapse-go.pl、flamegraph.plスクリプトが含まれている必要があります。 +- 導入バージョン: - ##### fragment_pool_queue_size -- Default: 2048 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: 各BEノードで処理できるクエリ数の上限。 -- Introduced in: - +- デフォルト: 2048 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: 各BEノードで処理できるクエリ数の上限。 +- 導入バージョン: - ##### fragment_pool_thread_num_max -- Default: 4096 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: クエリに使用される最大スレッド数。 -- Introduced in: - +- デフォルト: 4096 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: クエリに使用される最大スレッド数。 +- 導入バージョン: - ##### fragment_pool_thread_num_min -- Default: 64 -- Type: Int -- Unit: 分 - -- Is mutable: いいえ -- Description: クエリに使用される最小スレッド数。 -- Introduced in: - +- デフォルト: 64 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: クエリに使用される最小スレッド数。 +- 導入バージョン: - ##### hdfs_client_enable_hedged_read -- Default: false -- Type: Boolean -- Unit: - -- Is mutable: いいえ -- Description: ヘッジリード機能を有効にするかどうかを指定します。 -- Introduced in: v3.0 +- デフォルト: false +- タイプ: Boolean +- 単位: - +- 変更可能: いいえ +- 説明: ヘッジドリード機能を有効にするかどうかを指定します。 +- 導入バージョン: v3.0 ##### hdfs_client_hedged_read_threadpool_size -- Default: 128 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: HDFSクライアントのヘッジリードスレッドプールのサイズを指定します。スレッドプールサイズは、HDFSクライアントでヘッジリードを実行するために割り当てるスレッド数を制限します。HDFSクラスターの **hdfs-site.xml** ファイルにある 
`dfs.client.hedged.read.threadpool.size` パラメータと同等です。 -- Introduced in: v3.0 +- デフォルト: 128 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: HDFSクライアントのヘッジドリードスレッドプールのサイズを指定します。スレッドプールのサイズは、HDFSクライアントでヘッジドリードの実行に専念するスレッドの数を制限します。HDFSクラスターの**hdfs-site.xml**ファイル内の`dfs.client.hedged.read.threadpool.size`パラメーターと同等です。 +- 導入バージョン: v3.0 ##### hdfs_client_hedged_read_threshold_millis -- Default: 2500 -- Type: Int -- Unit: ミリ秒 -- Is mutable: いいえ -- Description: ヘッジリードを開始する前に待機するミリ秒数を指定します。たとえば、このパラメータを `30` に設定した場合、この状況で、ブロックからの読み取りが30ミリ秒以内に返されない場合、HDFSクライアントは直ちに異なるブロックレプリカに対して新しい読み取りを開始します。HDFSクラスターの **hdfs-site.xml** ファイルにある `dfs.client.hedged.read.threshold.millis` パラメータと同等です。 -- Introduced in: v3.0 +- デフォルト: 2500 +- タイプ: Int +- 単位: ミリ秒 +- 変更可能: いいえ +- 説明: ヘッジドリードを開始するまでの待機ミリ秒数を指定します。たとえば、このパラメーターを`30`に設定した場合、ブロックからの読み取りが30ミリ秒以内に返されない場合、HDFSクライアントは直ちに異なるブロックレプリカに対して新しい読み取りを開始します。HDFSクラスターの**hdfs-site.xml**ファイル内の`dfs.client.hedged.read.threshold.millis`パラメーターと同等です。 +- 導入バージョン: v3.0 ##### io_coalesce_adaptive_lazy_active -- Default: true -- Type: Boolean -- Unit: - -- Is mutable: はい -- Description: 述語の選択性に基づいて、述語列と非述語列のI/Oを結合するかどうかを適応的に決定します。 -- Introduced in: v3.2 +- デフォルト: true +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: 述語の選択性に基づいて、述語列と非述語列のI/Oを結合するかどうかを適応的に決定します。 +- 導入バージョン: v3.2 ##### jit_lru_cache_size -- Default: 0 -- Type: Int -- Unit: バイト -- Is mutable: はい -- Description: JITコンパイル用のLRUキャッシュサイズ。0より大きい値に設定された場合、キャッシュの実際のサイズを表します。0以下に設定された場合、システムは式 `jit_lru_cache_size = min(mem_limit*0.01, 1GB)` を使用してキャッシュを適応的に設定します(ノードの `mem_limit` は16GB以上である必要があります)。 -- Introduced in: - +- デフォルト: 0 +- タイプ: Int +- 単位: バイト +- 変更可能: はい +- 説明: JITコンパイルのためのLRUキャッシュサイズ。0より大きい値に設定されている場合、キャッシュの実際のサイズを表します。0以下に設定されている場合、システムは`jit_lru_cache_size = min(mem_limit*0.01, 1GB)`の式を使用してキャッシュを適応的に設定します(ただし、ノードの`mem_limit`は16 GB以上である必要があります)。 +- 導入バージョン: - ##### json_flat_column_max -- Default: 100 -- Type: Int -- Unit: -- Is mutable: はい -- Description: Flat JSONで抽出できるサブフィールドの最大数。このパラメータは、`enable_json_flat` が `true` に設定されている場合にのみ有効です。 -- Introduced in: v3.3.0 +- デフォルト: 100 +- タイプ: Int +- 単位: +- 変更可能: はい +- 説明: Flat JSONで抽出できるサブフィールドの最大数。このパラメーターは、`enable_json_flat`が`true`に設定されている場合にのみ有効になります。 +- 導入バージョン: v3.3.0 ##### json_flat_create_zonemap -- Default: true -- Type: Boolean -- Unit: -- Is mutable: はい -- Description: 書き込み中にフラット化されたJSONサブ列のZoneMapを作成するかどうか。このパラメータは、`enable_json_flat` が `true` に設定されている場合にのみ有効です。 -- Introduced in: - +- デフォルト: true +- タイプ: Boolean +- 単位: +- 変更可能: はい +- 説明: 書き込み時にフラット化されたJSONサブカラムのZoneMapを作成するかどうか。このパラメーターは、`enable_json_flat`が`true`に設定されている場合にのみ有効になります。 +- 導入バージョン: - ##### json_flat_null_factor -- Default: 0.3 -- Type: Double -- Unit: -- Is mutable: はい -- Description: Flat JSONで抽出する列のNULL値の割合。NULL値の割合がこの閾値よりも高い場合、列は抽出されません。このパラメータは、`enable_json_flat` が `true` に設定されている場合にのみ有効です。 -- Introduced in: v3.3.0 +- デフォルト: 0.3 +- タイプ: Double +- 単位: +- 変更可能: はい +- 説明: Flat JSONで抽出するカラム内のNULL値の割合。NULL値の割合がこの閾値より高い場合、カラムは抽出されません。このパラメーターは、`enable_json_flat`が`true`に設定されている場合にのみ有効になります。 +- 導入バージョン: v3.3.0 ##### json_flat_sparsity_factor -- Default: 0.3 -- Type: Double -- Unit: -- Is mutable: はい -- Description: Flat JSONの同じ名前を持つ列の割合。同じ名前を持つ列の割合がこの値よりも低い場合、抽出は実行されません。このパラメータは、`enable_json_flat` が `true` に設定されている場合にのみ有効です。 -- Introduced in: v3.3.0 +- デフォルト: 0.3 +- タイプ: Double +- 単位: +- 変更可能: はい +- 説明: Flat JSONの同名カラムの割合。同名カラムの割合がこの値よりも低い場合、抽出は実行されません。このパラメーターは、`enable_json_flat`が`true`に設定されている場合にのみ有効になります。 +- 導入バージョン: v3.3.0 ##### lake_tablet_ignore_invalid_delete_predicate -- Default: false -- Type: Boolean -- Unit: - -- 
Is mutable: はい -- Description: 列名が変更された後、重複キーテーブルへの論理削除によって導入された可能性のあるタブレットのrowsetメタデータ内の無効な削除述語を無視するかどうかを制御するブール値。 -- Introduced in: v4.0 +- デフォルト: false +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: カラム名変更後に重複キーテーブルに対する論理削除によって導入された、タブレットロウセットメタデータ内の無効な削除述語を無視するかどうかを制御するブール値。 +- 導入バージョン: v4.0 ##### late_materialization_ratio -- Default: 10 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: SegmentIterator(ベクトルクエリエンジン)での遅延マテリアライズの使用を制御する範囲[0-1000]の整数比。値 `0`(または ≤ 0)は遅延マテリアライズを無効にします。`1000`(または ≥ 1000)はすべての読み取りに対して遅延マテリアライズを強制します。0より大きく1000未満の値は、述語フィルタ比率に基づいて動作を選択する条件付き戦略を有効にします(値が大きいほど遅延マテリアライズが優先されます)。セグメントに複雑なメトリックタイプが含まれている場合、StarRocksは代わりに `metric_late_materialization_ratio` を使用します。`lake_io_opts.cache_file_only` が設定されている場合、遅延マテリアライズは無効になります。 -- Introduced in: v3.2.0 +- デフォルト: 10 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: SegmentIterator(ベクタークエリエンジン)での遅延具現化の使用を制御する[0-1000]の範囲の整数比率。`0`(または`<= 0`)の値は遅延具現化を無効にし、`1000`(または`>= 1000`)はすべての読み取りで遅延具現化を強制します。`> 0`かつ`< 1000`の値は、述語フィルター比率に基づいてイテレータが動作を選択する条件付き戦略を有効にします(値が高いほど遅延具現化を優先)。セグメントに複雑なメトリックタイプが含まれている場合、StarRocksは代わりに`metric_late_materialization_ratio`を使用します。`lake_io_opts.cache_file_only`が設定されている場合、遅延具現化は無効になります。 +- 導入バージョン: v3.2.0 ##### max_hdfs_file_handle -- Default: 1000 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: 開くことができるHDFSファイルディスクリプタの最大数。 -- Introduced in: - +- デフォルト: 1000 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: 開くことができるHDFSファイルディスクリプタの最大数。 +- 導入バージョン: - ##### max_memory_sink_batch_count -- Default: 20 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: スキャンキャッシュバッチの最大数。 -- Introduced in: - +- デフォルト: 20 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: Scan Cacheバッチの最大数。 +- 導入バージョン: - ##### max_pushdown_conditions_per_column -- Default: 1024 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: 各列でプッシュダウンを許可する条件の最大数。条件数がこの制限を超えると、述語はストレージ層にプッシュダウンされません。 -- Introduced in: - +- デフォルト: 1024 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: 各カラムでプッシュダウンを許可する条件の最大数。条件数がこの制限を超えると、述語はストレージ層にプッシュダウンされません。 +- 導入バージョン: - ##### max_scan_key_num -- Default: 1024 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: 各クエリでセグメント化されるスキャンキーの最大数。 -- Introduced in: - +- デフォルト: 1024 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: 各クエリによってセグメント化されるスキャンキーの最大数。 +- 導入バージョン: - ##### metric_late_materialization_ratio -- Default: 1000 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: 複雑なメトリック列を含む読み取りに対して、遅延マテリアライズ行アクセス戦略がいつ使用されるかを制御します。有効範囲:[0-1000]。`0` は遅延マテリアライズを無効にします。`1000` はすべての適用可能な読み取りに対して遅延マテリアライズを強制します。1〜999の値は、遅延マテリアライズと早期マテリアライズの両方のコンテキストが準備され、述語/選択性に基づいて実行時に選択される条件付き戦略を有効にします。複雑なメトリックタイプが存在する場合、`metric_late_materialization_ratio` は一般的な `late_materialization_ratio` を上書きします。注:`cache_file_only` I/Oモードでは、この設定に関係なく遅延マテリアライズが無効になります。 -- Introduced in: v3.2.0 +- デフォルト: 1000 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: 複雑なメトリックカラムを含む読み取りに対して遅延具現化行アクセス戦略がいつ使用されるかを制御します。有効範囲: [0-1000]。`0`は遅延具現化を無効にし、`1000`はすべての適用可能な読み取りで遅延具現化を強制します。1-999の値は、述語/選択性に基づいて実行時に遅延具現化コンテキストと早期具現化コンテキストの両方が準備され、選択される条件付き戦略を有効にします。複雑なメトリックタイプが存在する場合、`metric_late_materialization_ratio`は一般的な`late_materialization_ratio`をオーバーライドします。注意: `cache_file_only` I/Oモードでは、この設定に関わらず遅延具現化が無効になります。 +- 導入バージョン: v3.2.0 ##### min_file_descriptor_number -- Default: 60000 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: BEプロセスのファイルディスクリプタの最小数。 -- Introduced in: - +- デフォルト: 60000 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: BEプロセス内のファイルディスクリプタの最小数。 +- 導入バージョン: - ##### object_storage_connect_timeout_ms -- Default: -1 -- Type: Int -- Unit: ミリ秒 -- Is mutable: いいえ -- Description: 
##### object_storage_connect_timeout_ms

- Default: -1
- Type: Int
- Unit: ms
- Is mutable: No
- Description: Timeout duration to establish socket connections with object storage. `-1` indicates to use the default timeout duration of the SDK configurations.
- Introduced in: v3.0.9

##### object_storage_request_timeout_ms

- Default: -1
- Type: Int
- Unit: ms
- Is mutable: No
- Description: Timeout duration to establish HTTP connections with object storage. `-1` indicates to use the default timeout duration of the SDK configurations.
- Introduced in: v3.0.9

##### parquet_late_materialization_enable

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: No
- Description: A boolean value to control whether to enable late materialization in the Parquet reader to improve performance. `true` indicates enabling late materialization and `false` indicates disabling it.
- Introduced in: -

##### parquet_page_index_enable

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: No
- Description: A boolean value to control whether to enable the Page Index of Parquet files to improve performance. `true` indicates enabling the Page Index and `false` indicates disabling it.
- Introduced in: v3.3

##### parquet_reader_bloom_filter_enable

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: A boolean value to control whether to enable Bloom filters for Parquet files to improve performance. `true` indicates enabling Bloom filters and `false` indicates disabling them. You can also control this behavior at the session level with the system variable `enable_parquet_reader_bloom_filter`. Bloom filters in Parquet are maintained **at the column level within each row group**. If a Parquet file contains Bloom filters for certain columns, queries can use predicates on those columns to skip row groups efficiently. See the example query after this group for checking the value in effect.
- Introduced in: v3.5

##### path_gc_check_step

- Default: 1000
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of files that can be scanned consecutively each time.
- Introduced in: -

##### path_gc_check_step_interval_ms

- Default: 10
- Type: Int
- Unit: ms
- Is mutable: Yes
- Description: The time interval between file scans.
- Introduced in: -

##### path_scan_interval_second

- Default: 86400
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The time interval at which GC cleans up expired data.
- Introduced in: -

##### pipeline_connector_scan_thread_num_per_cpu

- Default: 8
- Type: Double
- Unit: -
- Is mutable: Yes
- Description: The number of scan threads assigned to the Pipeline Connector per CPU core on a BE node. This configuration has been dynamically mutable since v3.1.7.
- Introduced in: -
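Several items in this group (for example, `parquet_reader_bloom_filter_enable` and `path_gc_check_step`) are mutable. As a quick way to inspect the values currently in effect, the following query is a sketch that assumes your StarRocks version exposes the `information_schema.be_configs` view; the exact column set may vary between versions.

```SQL
-- Check the current value of mutable BE configuration items on every BE node.
-- Assumes the information_schema.be_configs view is available in your version.
SELECT BE_ID, NAME, VALUE
FROM information_schema.be_configs
WHERE NAME IN ('parquet_reader_bloom_filter_enable', 'path_gc_check_step');
```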
##### pipeline_poller_timeout_guard_ms

- Default: -1
- Type: Int
- Unit: ms
- Is mutable: Yes
- Description: If this item is set to a value greater than `0`, the driver and operator information is printed whenever a driver in the poller takes longer than `pipeline_poller_timeout_guard_ms` on a single dispatch.
- Introduced in: -

##### pipeline_prepare_thread_pool_queue_size

- Default: 102400
- Type: Int
- Unit: -
- Is mutable: No
- Description: The maximum queue length of the PREPARE fragment thread pool of the Pipeline execution engine.
- Introduced in: -

##### pipeline_prepare_thread_pool_thread_num

- Default: 0
- Type: Int
- Unit: -
- Is mutable: No
- Description: The number of threads in the PREPARE fragment thread pool of the Pipeline execution engine. `0` indicates a value equal to the number of vCPU cores in the system.
- Introduced in: -

##### pipeline_prepare_timeout_guard_ms

- Default: -1
- Type: Int
- Unit: ms
- Is mutable: Yes
- Description: If this item is set to a value greater than `0`, the stack trace of a plan fragment is printed whenever the fragment exceeds `pipeline_prepare_timeout_guard_ms` during the PREPARE process.
- Introduced in: -

##### pipeline_scan_thread_pool_queue_size

- Default: 102400
- Type: Int
- Unit: -
- Is mutable: No
- Description: The maximum task queue length of the SCAN thread pool of the Pipeline execution engine.
- Introduced in: -

##### pk_index_parallel_get_threadpool_size

- Default: 1048576
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: Sets the maximum queue size (number of pending tasks) of the "cloud_native_pk_index_get" thread pool used by parallel PK index get operations in shared-data (cloud-native/lake) mode. The actual number of threads in this pool is controlled by `pk_index_parallel_get_threadpool_max_threads`; this setting only limits how many tasks can be queued waiting for execution. The very large default (2^20) makes the queue effectively unbounded. Reducing it prevents excessive memory growth from queued tasks but may cause task submissions to block or fail when the queue is full. Tune it together with `pk_index_parallel_get_threadpool_max_threads` based on workload concurrency and memory constraints.
- Introduced in: -
##### priority_queue_remaining_tasks_increased_frequency

- Default: 512
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: Controls how often `BlockingPriorityQueue` increases ("ages") the priority of all remaining tasks to prevent starvation. Each successful get/pop increments an internal `_upgrade_counter`. When `_upgrade_counter` exceeds `priority_queue_remaining_tasks_increased_frequency`, the queue increments the priority of every element, rebuilds the heap, and resets the counter. Lower values cause more frequent priority aging (less starvation, but more CPU cost from iterating and re-heapifying); higher values reduce the overhead but delay priority adjustment. The value is a simple operation-count threshold, not a time duration.
- Introduced in: v3.2.0

##### query_cache_capacity

- Default: 536870912
- Type: Int
- Unit: Bytes
- Is mutable: No
- Description: The size of the query cache in the BE. The default size is 512 MB. The size cannot be less than 4 MB. If the memory capacity of the BE is insufficient to provision your expected query cache size, you can increase the memory capacity of the BE.
- Introduced in: -

##### query_pool_spill_mem_limit_threshold

- Default: 1.0
- Type: Double
- Unit: -
- Is mutable: No
- Description: If automatic spilling is enabled, spilling of intermediate results is triggered when the memory usage of all queries exceeds `query_pool memory limit * query_pool_spill_mem_limit_threshold`. A worked example appears after this group of items.
- Introduced in: v3.2.7

##### query_scratch_dirs

- Default: `${STARROCKS_HOME}`
- Type: string
- Unit: -
- Is mutable: No
- Description: A list of writable scratch directories used by query execution to spill intermediate data (for example, for external sort, hash join, and other operators). Specify one or more paths separated by semicolons (`;`), for example, `/mnt/ssd1/tmp;/mnt/ssd2/tmp`. The directories must be accessible and writable by the BE process and have enough free space; StarRocks chooses among them to spread spill I/O. Changes require a restart to take effect. If a directory is missing, not writable, or full, spills may fail or query performance may degrade.
- Introduced in: v3.2.0

##### result_buffer_cancelled_interval_time

- Default: 300
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The wait time before BufferControlBlock releases data.
- Introduced in: -

##### scan_context_gc_interval_min

- Default: 5
- Type: Int
- Unit: Minutes
- Is mutable: Yes
- Description: The time interval at which scan contexts are cleaned up.
- Introduced in: -

##### scanner_row_num

- Default: 16384
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of rows returned by each scan thread in a scan.
- Introduced in: -

##### scanner_thread_pool_queue_size

- Default: 102400
- Type: Int
- Unit: -
- Is mutable: No
- Description: The number of scan tasks supported by the storage engine.
- Introduced in: -

##### scanner_thread_pool_thread_num

- Default: 48
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The number of threads the storage engine uses for concurrent storage volume scanning. All threads are managed in a thread pool.
- Introduced in: -
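As a worked example for `query_pool_spill_mem_limit_threshold` above, the sketch below computes the spill trigger point under the assumption of a hypothetical 32 GB query pool memory limit and the default threshold of 1.0.

```SQL
-- Hypothetical: query_pool memory limit = 32 GB, threshold = 1.0 (default).
-- Spilling of intermediate results triggers once total query memory usage
-- exceeds query_pool_limit * query_pool_spill_mem_limit_threshold.
SELECT 32 * 1024 * 1024 * 1024 * 1.0 AS spill_trigger_bytes;
```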
##### string_prefix_zonemap_prefix_len

- Default: 16
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The prefix length used for the min/max values of string ZoneMaps when `enable_string_prefix_zonemap` is enabled.
- Introduced in: -

##### udf_thread_pool_size

- Default: 1
- Type: Int
- Unit: Threads
- Is mutable: No
- Description: Sets the size of the UDF-call `PriorityThreadPool` created in ExecEnv (used to run user-defined function/UDF-related tasks). The value is used as both the pool thread count and the pool queue capacity when the thread pool is constructed (`PriorityThreadPool("udf", thread_num, queue_size)`). Increase it for more concurrent UDF execution; keep it small to avoid excessive CPU and memory contention.
- Introduced in: v3.2.0

##### update_memory_limit_percent

- Default: 60
- Type: Int
- Unit: Percent
- Is mutable: No
- Description: The percentage of BE process memory reserved for update-related memory and caches. During startup, `GlobalEnv` computes the update `MemTracker` as `process_mem_limit * clamp(update_memory_limit_percent, 0, 100) / 100`. `UpdateManager` also uses this percentage to size the primary index/index cache capacity (index cache capacity = `GlobalEnv::process_mem_limit * update_memory_limit_percent / 100`). The HTTP config-update logic registers a callback that calls `update_primary_index_memory_limit` on the update manager, so changes to the setting are applied to the update subsystem. Increasing the value gives more memory to the update/primary-index paths (reducing the memory available to other pools); decreasing it reduces update memory and cache capacity. The value is clamped to the range 0-100. A worked example follows this group of items.
- Introduced in: v3.2.0

##### vector_chunk_size

- Default: 4096
- Type: Int
- Unit: Rows
- Is mutable: No
- Description: The number of rows per vectorized chunk (batch) used across execution and storage code paths. The value governs the creation of `Chunk` and the `batch_size` of `RuntimeState`, and affects operator throughput, per-operator memory footprint, spill and sort buffer sizing, and I/O heuristics (for example, the natural write size of the ORC writer). Increasing it may improve CPU and I/O efficiency for wide or CPU-bound workloads, but raises peak memory usage and can increase latency for queries with small results. Tune it only if profiling shows the batch size is a bottleneck; otherwise keep the default for balanced memory and performance.
- Introduced in: v3.2.0
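As a companion to the `update_memory_limit_percent` entry above, this sketch works through the update `MemTracker` computation with a hypothetical 100 GB process memory limit and the default percentage of 60; the clamp mirrors the 0-100 range described above.

```SQL
-- Hypothetical: process_mem_limit = 100 GB, update_memory_limit_percent = 60.
-- The percentage is clamped to [0, 100] before being applied.
SELECT 100 * 1024 * 1024 * 1024
       * LEAST(GREATEST(60, 0), 100) / 100 AS update_mem_tracker_bytes;
```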
### Loading

##### clear_transaction_task_worker_count

- Default: 1
- Type: Int
- Unit: -
- Is mutable: No
- Description: The number of threads used for clearing transactions.
- Introduced in: -

##### column_mode_partial_update_insert_batch_size

- Default: 4096
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The batch size for column-mode partial updates when processing inserted rows. If this item is set to `0` or a negative value, it is clamped to `1` to avoid an infinite loop. It controls how many newly inserted rows are processed in each batch. Larger values improve write performance but consume more memory.
- Introduced in: v3.5.10, v4.0.2

##### enable_load_spill_parallel_merge

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Specifies whether to enable parallel spill merging within a single tablet. Enabling it can improve spill merge performance during data loading.
- Introduced in: -

##### enable_stream_load_verbose_log

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Specifies whether to log the HTTP requests and responses for Stream Load jobs.
- Introduced in: v2.5.17, v3.0.9, v3.1.6, v3.2.1

##### flush_thread_num_per_store

- Default: 2
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The number of threads used to flush MemTables in each store.
- Introduced in: -

##### lake_flush_thread_num_per_store

- Default: 0
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The number of threads used to flush MemTables in each store in a shared-data cluster. When this value is set to `0`, the system automatically uses twice the number of CPU cores. When this value is set to a value less than `0`, the system uses the product of its absolute value and the number of CPU cores.
- Introduced in: v3.1.12, v3.2.7
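The thread count derivation for `lake_flush_thread_num_per_store` above can be made concrete with a small sketch; the 16-core count is a hypothetical example value.

```SQL
-- Illustrative only: flush threads per store in a shared-data cluster,
-- assuming a hypothetical BE with 16 CPU cores.
SELECT cfg,
       CASE
           WHEN cfg = 0 THEN 2 * 16          -- 0: twice the CPU core count
           WHEN cfg < 0 THEN ABS(cfg) * 16   -- negative: |value| * core count
           ELSE cfg                          -- positive: used as-is
       END AS flush_threads
FROM (SELECT 0 AS cfg UNION ALL SELECT -1 UNION ALL SELECT 4) t;
```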
##### load_data_reserve_hours

- Default: 4
- Type: Int
- Unit: Hours
- Is mutable: No
- Description: The reservation time for files generated by small-scale loading.
- Introduced in: -

##### load_error_log_reserve_hours

- Default: 48
- Type: Int
- Unit: Hours
- Is mutable: Yes
- Description: The time for which data loading logs are reserved.
- Introduced in: -

##### load_process_max_memory_limit_bytes

- Default: 107374182400
- Type: Int
- Unit: Bytes
- Is mutable: No
- Description: The maximum size limit of memory resources that all load processes on a BE node can occupy.
- Introduced in: -

##### load_spill_memory_usage_per_merge

- Default: 1073741824
- Type: Int
- Unit: Bytes
- Is mutable: Yes
- Description: The maximum memory usage per merge operation during spill merging. The default is 1 GB (1073741824 bytes). This parameter controls the memory consumption of individual merge tasks during data load spill merging to prevent excessive memory usage.
- Introduced in: -

##### max_consumer_num_per_group

- Default: 3
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of consumers in a consumer group of Routine Load.
- Introduced in: -

##### max_runnings_transactions_per_txn_map

- Default: 100
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of transactions that can run concurrently in each partition.
- Introduced in: -

##### number_tablet_writer_threads

- Default: 0
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The number of tablet writer threads used for ingestion such as Stream Load, Broker Load, and Insert. If the parameter is set to a value less than or equal to 0, the system uses half the number of CPU cores, with a minimum of 16. If it is set to a value greater than 0, the system uses that value. This configuration has been dynamically mutable since v3.1.7.
- Introduced in: -
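As a companion to the `number_tablet_writer_threads` entry above, the following sketch spells out the fallback rule for non-positive values, assuming a hypothetical BE with 16 CPU cores.

```SQL
-- Illustrative only: tablet writer threads, assuming 16 CPU cores.
-- A value <= 0 falls back to max(core_count / 2, 16).
SELECT cfg,
       CASE
           WHEN cfg <= 0 THEN GREATEST(16 / 2, 16)
           ELSE cfg
       END AS tablet_writer_threads
FROM (SELECT 0 AS cfg UNION ALL SELECT 24) t;
```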
##### push_worker_count_high_priority

- Default: 3
- Type: Int
- Unit: -
- Is mutable: No
- Description: The number of threads used to handle load tasks with HIGH priority.
- Introduced in: -

##### push_worker_count_normal_priority

- Default: 3
- Type: Int
- Unit: -
- Is mutable: No
- Description: The number of threads used to handle load tasks with NORMAL priority.
- Introduced in: -

##### streaming_load_max_batch_size_mb

- Default: 100
- Type: Int
- Unit: MB
- Is mutable: Yes
- Description: The maximum size of a JSON file that can be streamed into StarRocks.
- Introduced in: -

##### streaming_load_max_mb

- Default: 102400
- Type: Int
- Unit: MB
- Is mutable: Yes
- Description: The maximum size of a file that can be streamed into StarRocks. From v3.0 onwards, the default value has changed from `10240` to `102400`.
- Introduced in: -

##### streaming_load_rpc_max_alive_time_sec

- Default: 1200
- Type: Int
- Unit: Seconds
- Is mutable: No
- Description: The RPC timeout for Stream Load.
- Introduced in: -

##### transaction_publish_version_thread_pool_idle_time_ms

- Default: 60000
- Type: Int
- Unit: ms
- Is mutable: No
- Description: The idle time before a thread is reclaimed by the Publish Version thread pool.
- Introduced in: -

##### transaction_publish_version_worker_count

- Default: 0
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of threads used to publish versions. If this value is set to `0` or less, the system uses the number of CPU cores as the value, in order to avoid running out of thread resources when import concurrency is high but only a fixed number of threads is used. From v2.5 onwards, the default value has changed from `8` to `0`.
- Introduced in: -

##### write_buffer_size

- Default: 104857600
- Type: Int
- Unit: Bytes
- Is mutable: Yes
- Description: The buffer size of MemTables in memory. This configuration item is the threshold that triggers a flush.
- Introduced in: -

### Loading and unloading

##### broker_write_timeout_seconds

- Default: 30
- Type: int
- Unit: Seconds
- Is mutable: No
- Description: Timeout (in seconds) used by backend broker operations for write/I/O RPCs. The value is multiplied by 1000 to produce a millisecond timeout and passed as the default `timeout_ms` to `BrokerFileSystem` and `BrokerServiceConnection` instances (for example, for file export and snapshot upload/download). Increase it to avoid premature timeouts when brokers or networks are slow or large files are transferred; decreasing it can make broker RPCs fail faster. The value is defined in `common/config` and applied at process start (it is not dynamically reloadable).
- Introduced in: v3.2.0

##### enable_load_channel_rpc_async

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: When enabled, the processing of load channel open RPCs (for example, `PTabletWriterOpen`) is offloaded from BRPC workers to a dedicated thread pool. The request handler creates a `ChannelOpenTask` and submits it to the internal `_async_rpc_pool` instead of executing `LoadChannelMgr::_open` inline. This reduces work and blocking inside BRPC threads and allows concurrency to be tuned via `load_channel_rpc_thread_pool_num` and `load_channel_rpc_thread_pool_queue_size`. If the thread pool submission fails (pool full or shut down), the request is cancelled and an error status is returned. The pool is shut down in `LoadChannelMgr::close()`, so consider capacity and lifecycle when enabling this feature to avoid request rejections or delayed processing.
- Introduced in: v3.5.0

##### enable_load_diagnose

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: When enabled, StarRocks attempts automatic load diagnostics from the BE OlapTableSink/NodeChannel after a brpc timeout matching `"[E1008]Reached timeout"`. The code creates a `PLoadDiagnoseRequest` and sends an RPC to the remote LoadChannel to collect a profile and/or stack trace (controlled by `load_diagnose_rpc_timeout_profile_threshold_ms` and `load_diagnose_rpc_timeout_stack_trace_threshold_ms`). The diagnose RPC uses `load_diagnose_send_rpc_timeout_ms` as its timeout. If a diagnose request is already in progress, diagnosis is skipped. Enabling this causes extra RPC and profiling work on the target node; for sensitive production workloads, disable it to avoid the extra overhead.
- Introduced in: v3.5.0
##### enable_load_segment_parallel

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: No
- Description: When enabled, rowset segment loading and rowset-level reads run concurrently using the StarRocks background thread pools (`ExecEnv::load_segment_thread_pool` and `ExecEnv::load_rowset_thread_pool`). `Rowset::load_segments` and `TabletReader::get_segment_iterators` submit per-segment or per-rowset tasks to these pools, and fall back to serial loading with a logged warning if submission fails. Enable it to reduce read/load latency for large rowsets, at the cost of increased CPU/I/O concurrency and memory pressure. Note: parallel loading can change the order in which segments finish loading, so it prevents partial compaction (the code checks `_parallel_load` and disables partial compaction when it is enabled). Consider the impact on operations that rely on segment order.
- Introduced in: v3.3.0, v3.4.0, v3.5.0

##### enable_streaming_load_thread_pool

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Controls whether streaming load scanners are submitted to a dedicated streaming load thread pool. When enabled and the query is a LOAD of `TLoadJobType::STREAM_LOAD`, `ConnectorScanNode` submits scanner tasks to `streaming_load_thread_pool` (configured with INT32_MAX threads and queue size, that is, effectively unbounded). When disabled, scanners use the general `thread_pool` and its `PriorityThreadPool` submission logic (priority computation, `try_offer`/`offer` behavior). Enabling it isolates streaming load work from regular query execution and reduces interference; however, because the dedicated pool is effectively unbounded, enabling it can increase concurrent threads and resource usage under heavy streaming load traffic. This option is on by default and usually does not need to be changed.
- Introduced in: v3.2.0

##### es_http_timeout_ms

- Default: 5000
- Type: Int
- Unit: ms
- Is mutable: No
- Description: HTTP connection timeout (in milliseconds) used by the ES network client in `ESScanReader` for Elasticsearch scroll requests. The value is applied via `network_client.set_timeout_ms()` before sending subsequent scroll POSTs and controls how long the client waits for ES responses during scrolling. Increase it to avoid premature timeouts for slow networks or large queries; decrease it to fail faster on unresponsive ES nodes. This setting complements `es_scroll_keepalive`, which controls the keep-alive duration of the scroll context.
- Introduced in: v3.2.0

##### es_index_max_result_window

- Default: 10000
- Type: Int
- Unit: -
- Is mutable: No
- Description: Limits the maximum number of documents StarRocks requests from Elasticsearch in a single batch. StarRocks sets the ES request batch size to `min(es_index_max_result_window, chunk_size)` when constructing `KEY_BATCH_SIZE` for the ES reader. If an ES request exceeds the Elasticsearch index setting `index.max_result_window`, Elasticsearch returns HTTP 400 (Bad Request). When scanning large indexes, adjust this value or increase `index.max_result_window` on the Elasticsearch side to allow larger single requests.
- Introduced in: v3.2.0
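To illustrate the batch size rule in the `es_index_max_result_window` entry above, the sketch below uses the default window of 10000 together with a hypothetical `chunk_size` of 4096.

```SQL
-- Hypothetical: es_index_max_result_window = 10000, chunk_size = 4096.
-- The ES reader batch size is the smaller of the two values.
SELECT LEAST(10000, 4096) AS es_reader_batch_size;
```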
##### ignore_load_tablet_failure

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: No
- Description: If this item is set to `false` (the default), the system treats tablet header load failures (errors other than NotFound and AlreadyExist) as fatal: the code logs the error and calls `LOG(FATAL)` to stop the BE process. If it is set to `true`, the BE continues starting up despite such per-tablet load errors; failed tablet IDs are recorded and skipped, and successful tablets are loaded. Note that this parameter does not suppress fatal errors from the RocksDB meta scan itself, which always terminates the process.
- Introduced in: v3.2.0

##### load_channel_abort_clean_up_delay_seconds

- Default: 600
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: Controls how long (in seconds) the system keeps the load IDs of aborted load channels before removing them from `_aborted_load_channels`. When a load job is cancelled or fails, its load ID remains recorded so that late-arriving load RPCs can be rejected immediately. Once the delay expires, the entries are cleaned up during a periodic background sweep (the minimum sweep interval is 60 seconds). Setting the delay too low risks accepting stray RPCs after an abort; setting it too high keeps state longer than necessary and consumes resources. Tune it to balance correct rejection of late requests against resource retention for aborted loads.
- Introduced in: v3.5.11, v4.0.4

##### load_channel_rpc_thread_pool_num

- Default: -1
- Type: Int
- Unit: Threads
- Is mutable: Yes
- Description: The maximum number of threads in the asynchronous RPC thread pool for load channels. If it is set to 0 or less (default `-1`), the pool size is automatically set to the number of CPU cores (`CpuInfo::num_cores()`). The configured value is used as the maximum thread count of the `ThreadPoolBuilder`, and the pool minimum is set to `min(5, max_threads)`. The pool queue size is controlled separately by `load_channel_rpc_thread_pool_queue_size`. This setting was introduced to align the async RPC pool size with the brpc worker default (`brpc_num_threads`) and keep behavior compatible after load RPC handling switched from synchronous to asynchronous. Changing it at runtime triggers `ExecEnv::GetInstance()->load_channel_mgr()->async_rpc_pool()->update_max_threads(...)`.
- Introduced in: v3.5.0
##### load_channel_rpc_thread_pool_queue_size

- Default: 1024000
- Type: int
- Unit: Count
- Is mutable: No
- Description: Sets the maximum pending-task queue size of the load channel RPC thread pool created by `LoadChannelMgr`. This thread pool executes asynchronous `open` requests when `enable_load_channel_rpc_async` is enabled; the pool size is paired with `load_channel_rpc_thread_pool_num`. The large default (1024000) matches the brpc worker default to preserve behavior after the switch from synchronous to asynchronous processing. When the queue is full, `ThreadPool::submit()` fails, the incoming `open` RPC is cancelled with an error, and the caller receives a rejection. Increase the value to buffer larger bursts of concurrent `open` requests; decrease it for stricter backpressure, at the risk of more rejections under load.
- Introduced in: v3.5.0

##### load_diagnose_rpc_timeout_profile_threshold_ms

- Default: 60000
- Type: Int
- Unit: ms
- Is mutable: Yes
- Description: When a load RPC times out (the error contains `"[E1008]Reached timeout"`) and `enable_load_diagnose` is true, this threshold controls whether a full profiling diagnosis is requested. If the request-level RPC timeout `_rpc_timeout_ms` is greater than `load_diagnose_rpc_timeout_profile_threshold_ms`, profiling is enabled for that diagnosis. For smaller `_rpc_timeout_ms` values, profiling is sampled once every 20 timeouts to avoid frequent heavy diagnostics for real-time/short-timeout loads. The value affects the `profile` flag in the `PLoadDiagnoseRequest` that is sent. Stack trace behavior is controlled separately by `load_diagnose_rpc_timeout_stack_trace_threshold_ms`, and the send timeout by `load_diagnose_send_rpc_timeout_ms`.
- Introduced in: v3.5.0

##### load_diagnose_rpc_timeout_stack_trace_threshold_ms

- Default: 600000
- Type: Int
- Unit: ms
- Is mutable: Yes
- Description: Threshold (in milliseconds) used to decide when to request remote stack traces for long-running load RPCs. If a load RPC fails with a timeout error and the effective RPC timeout (`_rpc_timeout_ms`) exceeds this value, `OlapTableSink`/`NodeChannel` includes `stack_trace=true` when sending the `load_diagnose` RPC to the target BE, which then returns a stack trace for debugging. `LocalTabletsChannel::SecondaryReplicasWaiter` also triggers a best-effort stack trace diagnosis from the primary if waiting on secondary replicas exceeds this interval. This behavior requires `enable_load_diagnose` and uses `load_diagnose_send_rpc_timeout_ms` as the diagnose RPC timeout. Profiling is controlled separately by `load_diagnose_rpc_timeout_profile_threshold_ms`. Lowering the value makes stack traces be requested more aggressively.
- Introduced in: v3.5.0
##### load_diagnose_send_rpc_timeout_ms

- Default: 2000
- Type: Int
- Unit: ms
- Is mutable: Yes
- Description: Timeout (in milliseconds) applied to diagnostics-related brpc calls initiated by the BE load path. It is used to set the controller timeout for `load_diagnose` RPCs (sent by `NodeChannel`/`OlapTableSink` when a LoadChannel brpc call times out) and for replica status queries (used when `SecondaryReplicasWaiter`/`LocalTabletsChannel` checks the primary replica state). Choose a value high enough for the remote side to respond with profile or stack trace data, but not so high that failure handling is delayed. This parameter works together with `enable_load_diagnose`, `load_diagnose_rpc_timeout_profile_threshold_ms`, and `load_diagnose_rpc_timeout_stack_trace_threshold_ms`, which control when and what diagnostic information is requested.
- Introduced in: v3.5.0

##### load_fp_brpc_timeout_ms

- Default: -1
- Type: Int
- Unit: ms
- Is mutable: Yes
- Description: Overrides the per-channel brpc RPC timeout used by `OlapTableSink` when the `node_channel_set_brpc_timeout` failpoint is triggered. If set to a positive value, `NodeChannel` sets its internal `_rpc_timeout_ms` to this value (in milliseconds) so that open/add-chunk/cancel RPCs use the shorter timeout, enabling simulation of brpc timeouts that produce `"[E1008]Reached timeout"` errors. The default (`-1`) disables the override. Changing the value is intended for testing and fault injection; small values can produce spurious timeouts and trigger load diagnostics (see `enable_load_diagnose`, `load_diagnose_rpc_timeout_profile_threshold_ms`, `load_diagnose_rpc_timeout_stack_trace_threshold_ms`, and `load_diagnose_send_rpc_timeout_ms`).
- Introduced in: v3.5.0

##### load_fp_tablets_channel_add_chunk_block_ms

- Default: -1
- Type: Int
- Unit: ms
- Is mutable: Yes
- Description: When enabled (set to a positive millisecond value), this failpoint setting makes `TabletsChannel::add_chunk` sleep for the given duration during load processing. It is used to simulate BRPC timeout errors (for example, `"[E1008]Reached timeout"`) and to emulate costly `add_chunk` operations that increase load latency. A value of 0 or less (default `-1`) disables the injection. It is intended for testing failure handling, timeouts, and replica synchronization behavior; do not enable it for normal production workloads, as it delays write completion and can trigger upstream timeouts or replica aborts.
- Introduced in: v3.5.0

##### load_segment_thread_pool_num_max

- Default: 128
- Type: Int
- Unit: -
- Is mutable: No
- Description: Sets the maximum number of worker threads for the BE load-related thread pools. The value is used by `ThreadPoolBuilder` to cap the threads of both `load_rowset_pool` and `load_segment_pool` in `exec_env.cpp`, and controls the concurrency of processing loaded rowsets and segments (for example, decoding, index building, and writing) during streaming and batch loads. Increasing it can raise parallelism and load throughput, but also increases CPU, memory usage, and potential contention; decreasing it limits concurrent load processing and may reduce throughput. Tune it together with `load_segment_thread_pool_queue_size` and `streaming_load_thread_pool_idle_time_ms`. Changes require a BE restart.
- Introduced in: v3.3.0, v3.4.0, v3.5.0
##### load_segment_thread_pool_queue_size

- Default: 10240
- Type: Int
- Unit: Tasks
- Is mutable: No
- Description: Sets the maximum queue length (number of pending tasks) for the load-related thread pools created as "load_rowset_pool" and "load_segment_pool". These pools use `load_segment_thread_pool_num_max` as their maximum thread count, and this setting controls how many load segment/rowset tasks can be buffered before the `ThreadPool` overflow policy takes effect (further submissions may be rejected or blocked, depending on the `ThreadPool` implementation). Increase the value to allow more pending load work (at the cost of more memory and potentially higher latency); decrease it to limit buffered load concurrency and reduce memory usage.
- Introduced in: v3.3.0, v3.4.0, v3.5.0

##### max_pulsar_consumer_num_per_group

- Default: 10
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: Controls the maximum number of Pulsar consumers that can be created in a single data consumer group for Routine Load on a BE. Because cumulative acknowledgement is not supported for multi-topic subscriptions, each consumer subscribes to exactly one topic/partition. If the number of partitions in `pulsar_info->partitions` exceeds this value, group creation fails and the error advises increasing `max_pulsar_consumer_num_per_group` on the BE or adding BEs. The limit is enforced when a `PulsarDataConsumerGroup` is constructed and prevents a BE from hosting more consumers than this number for one Routine Load group. For Kafka Routine Load, `max_consumer_num_per_group` is used instead.
- Introduced in: v3.2.0

##### pull_load_task_dir

- Default: `${STARROCKS_HOME}/var/pull_load`
- Type: string
- Unit: -
- Is mutable: No
- Description: Filesystem path where the BE stores data and working files for "pull load" tasks (for example, downloaded source files, task state, and temporary output). The directory must be writable by the BE process and have enough disk space for incoming loads. The default is relative to STARROCKS_HOME. Tests expect this directory to exist and create it (see the test configuration).
- Introduced in: v3.2.0
##### routine_load_kafka_timeout_second

- Default: 10
- Type: Int
- Unit: Seconds
- Is mutable: No
- Description: Timeout (in seconds) used for Kafka-related Routine Load operations. When a client request does not specify a timeout, `routine_load_kafka_timeout_second` is used as the default RPC timeout for `get_info` (converted to milliseconds). It is also used as the per-call consume polling timeout for the librdkafka consumer (converted to milliseconds and capped by the remaining runtime). Note: the internal `get_info` path reduces this value to 80% before passing it to librdkafka, to avoid timeout races with the FE side. Set it to a value that balances timely failure reporting against enough time for network/broker responses. The setting is not mutable, so changes require a restart.
- Introduced in: v3.2.0

##### routine_load_pulsar_timeout_second

- Default: 10
- Type: Int
- Unit: Seconds
- Is mutable: No
- Description: The default timeout (in seconds) the BE uses for Pulsar-related Routine Load operations when a request does not provide an explicit timeout. Specifically, `PInternalServiceImplBase::get_pulsar_info` multiplies this value by 1000 to form the millisecond timeout passed to the Routine Load task execution methods that fetch Pulsar partition metadata and backlog. Increase it to tolerate slow Pulsar responses, at the cost of slower failure detection; decrease it to fail faster on unresponsive brokers. It is analogous to `routine_load_kafka_timeout_second`, which is used for Kafka.
- Introduced in: v3.2.0

##### streaming_load_thread_pool_idle_time_ms

- Default: 2000
- Type: Int
- Unit: ms
- Is mutable: No
- Description: Sets the thread idle timeout (in milliseconds) for streaming-load-related thread pools. The value is passed as the idle timeout to the `ThreadPoolBuilder` of the `stream_load_io` pool and of `load_rowset_pool` and `load_segment_pool`. Threads in these pools are reclaimed after staying idle for this duration. Lower values reduce idle resource usage but increase thread creation overhead; higher values keep threads alive longer across short bursts, at the cost of higher baseline resource usage. The `stream_load_io` pool is used when `enable_streaming_load_thread_pool` is enabled.
- Introduced in: v3.2.0
##### streaming_load_thread_pool_num_min

- Default: 0
- Type: Int
- Unit: -
- Is mutable: No
- Description: The minimum number of threads in the streaming load I/O thread pool ("stream_load_io") created during ExecEnv initialization. The pool is built with `set_max_threads(INT32_MAX)` and `set_max_queue_size(INT32_MAX)`, so it is effectively unbounded to avoid deadlocks under concurrent streaming loads. A value of 0 means the pool starts with no threads and grows on demand; a positive value reserves that many threads at startup. The pool is used when `enable_streaming_load_thread_pool` is true, and its idle timeout is governed by `streaming_load_thread_pool_idle_time_ms`. Overall concurrency is still constrained by `fragment_pool_thread_num_max` and `webserver_num_workers`. This value rarely needs changing; setting it too high can increase resource usage.
- Introduced in: v3.2.0

### Statistic report

##### enable_metric_calculator

- Default: true
- Type: boolean
- Unit: -
- Is mutable: No
- Description: If true, the BE process starts a background "metrics_daemon" thread (started in `Daemon::init` on non-Apple platforms) that calls `StarRocksMetrics::instance()->metrics()->trigger_hook()` about every 15 seconds to compute derived/system metrics (for example, push/query bytes per second, maximum disk I/O utilization, and maximum network send/receive rates), log a memory breakdown, and clean up table metrics. If false, these hooks run synchronously inside `MetricRegistry::collect` at metrics collection time, which can increase metrics scrape latency. A process restart is required for changes to take effect.
- Introduced in: v3.2.0

##### enable_system_metrics

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: No
- Description: If true, StarRocks initializes system-level monitoring during startup: it detects disk devices from the configured store paths, enumerates network interfaces, and passes this information to the metrics subsystem to enable collection of disk I/O, network traffic, and memory-related system metrics. If device or interface detection fails, initialization logs a warning and aborts the system metrics setup. This flag only controls whether system metrics are initialized; the periodic metrics aggregation thread is controlled separately by `enable_metric_calculator`, and JVM metrics initialization by `enable_jvm_metrics`. Changing this value requires a restart.
- Introduced in: v3.2.0

##### profile_report_interval

- Default: 30
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The interval in seconds that `ProfileReportWorker` uses to decide (1) when to report per-fragment profile information for LOAD queries and (2) how long to sleep between reporting cycles. The worker compares the current time with each task's `last_report_time` using `(profile_report_interval * 1000)` milliseconds to decide whether profiles should be re-reported for both non-pipeline and pipeline load tasks. On each loop the worker reads the current value (mutable at runtime). If the configured value is 0 or less, the worker forces it to 1 and prints a warning. Changing the value affects the next reporting decision and sleep duration.
- Introduced in: v3.2.0
##### report_disk_state_interval_seconds

- Default: 60
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The time interval at which the state of storage volumes (including the size of data in the volumes) is reported.
- Introduced in: -

##### report_resource_usage_interval_ms

- Default: 1000
- Type: Int
- Unit: ms
- Is mutable: Yes
- Description: The interval (in milliseconds) between periodic resource usage reports sent from the BE agent to the FE (master). The agent worker thread collects a `TResourceUsage` (number of running queries, used/limit memory, CPU used permille, resource group usage), calls `report_task`, and sleeps for this configured interval (see `task_worker_pool`). Lower values improve reporting timeliness but increase CPU, network, and master load; higher values reduce the overhead but make resource information staler. Reporting updates the related metrics (`report_resource_usage_requests_total`, `report_resource_usage_requests_failed`). Tune it to match cluster scale and FE load.
- Introduced in: v3.2.0

##### report_tablet_interval_seconds

- Default: 60
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The time interval at which the most recent version of all tablets is reported.
- Introduced in: -

##### report_task_interval_seconds

- Default: 10
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The time interval at which the state of tasks is reported. A task can be creating a table, dropping a table, loading data, or changing a table schema.
- Introduced in: -

##### report_workgroup_interval_seconds

- Default: 5
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The time interval at which the most recent version of all workgroups is reported.
- Introduced in: -

### Storage

##### alter_tablet_worker_count

- Default: 3
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The number of threads used for schema changes.
- Introduced in: -

##### avro_ignore_union_type_tag

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to remove the type tag from JSON strings serialized from the Avro Union data type.
- Introduced in: v3.3.7, v3.4

##### base_compaction_check_interval_seconds

- Default: 60
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The time interval of thread polling for Base Compaction.
- Introduced in: -

##### base_compaction_interval_seconds_since_last_operation

- Default: 86400
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The time interval since the last Base Compaction. This configuration item is one of the conditions that trigger a Base Compaction.
- Introduced in: -

##### base_compaction_num_threads_per_disk

- Default: 1
- Type: Int
- Unit: -
- Is mutable: No
- Description: The number of threads used for Base Compaction on each storage volume.
- Introduced in: -

##### base_cumulative_delta_ratio

- Default: 0.3
- Type: Double
- Unit: -
- Is mutable: Yes
- Description: The ratio of cumulative file size to base file size. Reaching this ratio is one of the conditions that trigger a Base Compaction.
- Introduced in: -
##### chaos_test_enable_random_compaction_strategy

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: If this item is set to `true`, `TabletUpdates::compaction()` uses a random compaction strategy (`compaction_random`) intended for chaos engineering tests. The flag takes precedence and forces compaction to follow a non-deterministic/random policy instead of the normal strategy (for example, size-tiered compaction) during tablet compaction selection. It is intended only for controlled testing; enabling it can cause unpredictable compaction order, increased I/O and CPU usage, and test instability. Do not enable it in production; use it only in fault injection or chaos testing scenarios.
- Introduced in: v3.3.12, v3.4.2, v3.5.0, v4.0.0

##### check_consistency_worker_count

- Default: 1
- Type: Int
- Unit: -
- Is mutable: No
- Description: The number of threads used for checking the consistency of tablets.
- Introduced in: -

##### clear_expired_replication_snapshots_interval_seconds

- Default: 3600
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The time interval at which the system clears expired snapshots left by abnormal replications.
- Introduced in: v3.3.5

##### compact_threads

- Default: 4
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of threads used for concurrent compaction tasks. This configuration has been dynamically mutable since v3.1.7 and v3.2.2.
- Introduced in: v3.0.0

##### compaction_max_memory_limit

- Default: -1
- Type: Long
- Unit: Bytes
- Is mutable: No
- Description: Global upper bound (in bytes) of memory available to compaction tasks on this BE. During BE initialization, the final compaction memory limit is computed as `min(compaction_max_memory_limit, process_mem_limit * compaction_max_memory_limit_percent / 100)`. If `compaction_max_memory_limit` is negative (default `-1`), it falls back to the BE process memory limit derived from `mem_limit`. The percent value is clamped to [0, 100]. If the process memory limit is not set (negative), compaction memory remains unlimited (`-1`). The computed value is used to initialize `_compaction_mem_tracker`. See also `compaction_max_memory_limit_percent` and `compaction_memory_limit_per_worker`.
- Introduced in: v3.2.0

##### compaction_max_memory_limit_percent

- Default: 100
- Type: Int
- Unit: Percent
- Is mutable: No
- Description: The percentage of BE process memory that can be used for compaction. The BE computes the compaction memory cap as the minimum of `compaction_max_memory_limit` and (process memory limit × this percentage / 100). A value less than 0 or greater than 100 is treated as 100. If `compaction_max_memory_limit` is less than 0, the process memory limit is used instead. The computation also considers the BE process memory derived from `mem_limit`. Combined with `compaction_memory_limit_per_worker` (the per-worker cap), this setting controls the total compaction memory available and therefore affects compaction concurrency and OOM risk.
- Introduced in: v3.2.0
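The cap computation shared by `compaction_max_memory_limit` and `compaction_max_memory_limit_percent` above can be sketched as follows, assuming a hypothetical 64 GB process memory limit and the default settings.

```SQL
-- Hypothetical: process memory limit = 64 GB,
-- compaction_max_memory_limit = -1, compaction_max_memory_limit_percent = 100.
-- A negative byte limit falls back to the process memory limit, so the
-- effective cap is min(process_limit, process_limit * percent / 100).
SELECT LEAST(64 * 1024 * 1024 * 1024,
             64 * 1024 * 1024 * 1024 * 100 / 100) AS compaction_mem_cap_bytes;
```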
##### compaction_memory_limit_per_worker

- Default: 2147483648
- Type: Int
- Unit: Bytes
- Is mutable: No
- Description: The maximum memory size allowed for each compaction thread.
- Introduced in: -

##### compaction_trace_threshold

- Default: 60
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The time threshold for each compaction. If a compaction takes longer than this threshold, StarRocks prints the corresponding trace.
- Introduced in: -

##### create_tablet_worker_count

- Default: 3
- Type: Int
- Unit: Threads
- Is mutable: Yes
- Description: Sets the maximum number of worker threads in the AgentServer thread pool that handles `TTaskType::CREATE` (tablet creation) tasks submitted by the FE. At BE startup the value is used as the thread pool maximum (the pool is created with minimum threads = 1 and an unlimited maximum queue size), and changing it at runtime triggers `ExecEnv::agent_server()->get_thread_pool(TTaskType::CREATE)->update_max_threads(...)`. Increase it to raise concurrent tablet creation throughput (useful during bulk loads or partition creation); decrease it to throttle concurrent create operations. Raising the value increases CPU, memory, and I/O concurrency and can cause contention. The thread pool enforces at least one thread, so values below 1 have no practical effect.
- Introduced in: v3.2.0

##### cumulative_compaction_check_interval_seconds

- Default: 1
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The time interval of thread polling for Cumulative Compaction.
- Introduced in: -

##### cumulative_compaction_num_threads_per_disk

- Default: 1
- Type: Int
- Unit: -
- Is mutable: No
- Description: The number of Cumulative Compaction threads per disk.
- Introduced in: -

##### data_page_size

- Default: 65536
- Type: Int
- Unit: Bytes
- Is mutable: No
- Description: Target uncompressed page size (in bytes) used when building column data and index pages. The value is copied into `ColumnWriterOptions.data_page_size` and `IndexedColumnWriterOptions.index_page_size` and consulted by page builders (for example, `BinaryPlainPageBuilder::is_page_full` and buffer reservation logic) to decide when pages are complete and how much memory to reserve. A value of 0 disables the builders' page size limit. Changing it affects page counts, metadata overhead, memory reservations, and I/O/compression trade-offs (smaller pages mean more pages and metadata; larger pages mean fewer pages and possibly better compression, but potential memory spikes).
- Introduced in: v3.2.4
カラムデータとインデックスページの構築時に使用されるターゲット非圧縮ページサイズ(バイト単位)。この値は`ColumnWriterOptions.data_page_size`および`IndexedColumnWriterOptions.index_page_size`にコピーされ、ページビルダー(例:`BinaryPlainPageBuilder::is_page_full`およびバッファ予約ロジック)によって、ページの完了時期と予約するメモリ量を決定するために参照されます。値が0の場合、ビルダーのページサイズ制限は無効になります。この値を変更すると、ページ数、メタデータオーバーヘッド、メモリ予約、I/O/圧縮のトレードオフ(ページが小さいほどページ数とメタデータが増加し、ページが大きいほどページ数が減少し、圧縮は向上する可能性がありますが、メモリの急増が発生する可能性があります)に影響します。 +- 導入バージョン: v3.2.4 ##### default_num_rows_per_column_file_block -- Default: 1024 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: 各行ブロックに格納できる最大行数。 -- Introduced in: - +- デフォルト: 1024 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: 各行ブロックに格納できる最大行数。 +- 導入バージョン: - ##### delete_worker_count_high_priority -- Default: 1 -- Type: Int -- Unit: スレッド -- Is mutable: いいえ -- Description: DeleteTaskWorkerPool内のワーカースレッドのうち、HIGH優先度削除スレッドとして割り当てられるスレッド数。起動時にAgentServerは、`total threads = delete_worker_count_normal_priority + delete_worker_count_high_priority` で削除プールを作成します。最初の `delete_worker_count_high_priority` スレッドは、`TPriority::HIGH` タスクを排他的にポップしようとするようにマークされます(高優先度削除タスクをポーリングし、利用可能なものがない場合はスリープ/ループします)。この値を増やすと、高優先度削除リクエストの同時実行性が向上します。減らすと、専用容量が減少し、高優先度削除のレイテンシが増加する可能性があります。 -- Introduced in: v3.2.0 +- デフォルト: 1 +- タイプ: Int +- 単位: スレッド +- 変更可能: いいえ +- 説明: DeleteTaskWorkerPool内のワーカー・スレッドのうち、HIGH優先度削除スレッドとして割り当てられるスレッド数。起動時、AgentServerは削除プールを`delete_worker_count_normal_priority + delete_worker_count_high_priority`の合計スレッド数で作成します。最初の`delete_worker_count_high_priority`スレッドは、`TPriority::HIGH`タスクを排他的にポップしようとするようにマークされます(高優先度削除タスクをポーリングし、利用可能なタスクがない場合はスリープ/ループします)。この値を増やすと、高優先度削除リクエストの並行処理が増加します。減らすと、専用容量が減少し、高優先度削除のレイテンシが増加する可能性があります。 +- 導入バージョン: v3.2.0 ##### dictionary_encoding_ratio -- Default: 0.7 -- Type: Double -- Unit: - -- Is mutable: いいえ -- Description: `StringColumnWriter` がチャンクの辞書(DICT_ENCODING)とプレーン(PLAIN_ENCODING)エンコーディングを決定するエンコード推測フェーズで使用する割合(0.0〜1.0)。コードは `max_card = row_count * dictionary_encoding_ratio` を計算し、チャンクの異なるキー数をスキャンします。異なるキー数が `max_card` を超える場合、ライターは `PLAIN_ENCODING` を選択します。このチェックは、チャンクサイズが `dictionary_speculate_min_chunk_size` を超えた場合(および `row_count > dictionary_min_rowcount` の場合)にのみ実行されます。値を高く設定すると辞書エンコーディングが優先されます(より多くの異なるキーを許容します)。値を低く設定すると、より早くプレーンエンコーディングにフォールバックします。値1.0は事実上辞書エンコーディングを強制します(異なるキー数が `row_count` を超えることはありません)。 -- Introduced in: v3.2.0 +- デフォルト: 0.7 +- タイプ: Double +- 単位: - +- 変更可能: いいえ +- 説明: `StringColumnWriter`がエンコード推測フェーズ中に、チャンクの辞書(DICT_ENCODING)とプレーン(PLAIN_ENCODING)エンコーディングを決定するために使用する割合(0.0-1.0)。コードは`max_card = row_count * dictionary_encoding_ratio`を計算し、チャンクの異なるキーカウントをスキャンします。異なるカウントが`max_card`を超えると、ライターは`PLAIN_ENCODING`を選択します。このチェックは、チャンクサイズが`dictionary_speculate_min_chunk_size`を超えた場合(かつ`row_count > dictionary_min_rowcount`の場合)にのみ実行されます。値を高く設定すると辞書エンコーディングが優先され(より多くの異なるキーを許容)、値を低く設定するとプレーンエンコーディングへのフォールバックが早まります。1.0の値は、実質的に辞書エンコーディングを強制します(異なるカウントが行カウントを超えることはありません)。 +- 導入バージョン: v3.2.0 ##### dictionary_encoding_ratio_for_non_string_column -- Default: 0 -- Type: double -- Unit: - -- Is mutable: いいえ -- Description: 非文字列列(数値、日付/時刻、DECIMAL型)に辞書エンコーディングを使用するかどうかを決定する比率閾値。有効な場合(値 > 0.0001)、ライターは `max_card = row_count * dictionary_encoding_ratio_for_non_string_column` を計算し、`row_count > dictionary_min_rowcount` のサンプルでは、`distinct_count ≤ max_card` の場合にのみ `DICT_ENCODING` を選択します。それ以外の場合は `BIT_SHUFFLE` にフォールバックします。値 `0`(デフォルト)は非文字列の辞書エンコーディングを無効にします。このパラメータは `dictionary_encoding_ratio` と類似していますが、非文字列列に適用されます。(0,1] の値を使用してください。値が小さいほど辞書エンコーディングはカーディナリティの低い列に制限され、辞書メモリ/I/Oオーバーヘッドが減少します。 -- Introduced in: v3.3.0, v3.4.0, v3.5.0 +- デフォルト: 0 +- タイプ: double +- 単位: - +- 変更可能: いいえ +- 
説明: 非文字列カラム(数値、日付/時刻、decimalタイプ)に辞書エンコーディングを使用するかどうかを決定するための比率閾値。有効(値が`0.0001`より大きい)の場合、ライターは`max_card = row_count * dictionary_encoding_ratio_for_non_string_column`を計算し、`row_count > dictionary_min_rowcount`のサンプルでは、`distinct_count ≤ max_card`の場合にのみ`DICT_ENCODING`を選択します。それ以外の場合は`BIT_SHUFFLE`にフォールバックします。`0`(デフォルト)の値は非文字列辞書エンコーディングを無効にします。このパラメーターは`dictionary_encoding_ratio`に類似していますが、非文字列カラムに適用されます。`(0,1]`の範囲の値を使用してください。値が小さいほど、辞書エンコーディングをカーディナリティの低いカラムに制限し、辞書のメモリ/IOオーバーヘッドを削減します。 +- 導入バージョン: v3.3.0, v3.4.0, v3.5.0 ##### dictionary_page_size -- Default: 1048576 -- Type: Int -- Unit: バイト -- Is mutable: いいえ -- Description: rowsetセグメントを構築する際に使用される辞書ページのバイト単位のサイズ。この値はBE rowsetコードの `PageBuilderOptions::dict_page_size` に読み込まれ、単一の辞書ページに格納できる辞書エントリの数を制御します。この値を増やすと、より大きな辞書を許可することで辞書エンコードされた列の圧縮率が向上する可能性がありますが、ページが大きくなると書き込み/エンコード中に消費されるメモリが増加し、ページの読み取りやマテリアライズ時にI/Oとレイテンシが増加する可能性があります。大規模メモリ、書き込み頻度の高いワークロードの場合、保守的に設定し、実行時パフォーマンスの低下を防ぐために過度に大きな値を避けてください。 -- Introduced in: v3.3.0, v3.4.0, v3.5.0 +- デフォルト: 1048576 +- タイプ: Int +- 単位: バイト +- 変更可能: いいえ +- 説明: ロウセットセグメント構築時に使用される辞書ページのサイズ(バイト単位)。この値はBEロウセットコードの`PageBuilderOptions::dict_page_size`に読み込まれ、1つの辞書ページに格納できる辞書エントリの数を制御します。この値を増やすと、より大きな辞書を許可することで辞書エンコードされたカラムの圧縮率が向上する可能性がありますが、ページが大きいと書き込み/エンコード中のメモリ消費が増加し、ページ読み取りまたは具現化時のI/Oとレイテンシが増加する可能性があります。大規模メモリ、書き込み負荷の高いワークロードでは控えめに設定し、実行時パフォーマンスの低下を防ぐために過度に大きな値は避けてください。 +- 導入バージョン: v3.3.0, v3.4.0, v3.5.0 ##### disk_stat_monitor_interval -- Default: 5 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: ディスクの健全性ステータスを監視する時間間隔。 -- Introduced in: - +- デフォルト: 5 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: ディスクのヘルス状態を監視する時間間隔。 +- 導入バージョン: - ##### download_low_speed_limit_kbps -- Default: 50 -- Type: Int -- Unit: KB/秒 -- Is mutable: はい -- Description: 各HTTPリクエストのダウンロード速度の下限。HTTPリクエストは、`download_low_speed_time` で指定された時間内にこの値よりも低い速度で継続的に実行された場合、中断されます。 -- Introduced in: - +- デフォルト: 50 +- タイプ: Int +- 単位: KB/秒 +- 変更可能: はい +- 説明: 各HTTPリクエストのダウンロード速度下限。`download_low_speed_time`で指定された期間内に、この値よりも低い速度でHTTPリクエストが継続的に実行された場合、HTTPリクエストは中止されます。 +- 導入バージョン: - ##### download_low_speed_time -- Default: 300 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: HTTPリクエストが制限よりも低いダウンロード速度で実行できる最大時間。HTTPリクエストは、この構成項目で指定された時間内に `download_low_speed_limit_kbps` の値よりも低い速度で継続的に実行された場合、中断されます。 -- Introduced in: - +- デフォルト: 300 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: HTTPリクエストが制限よりも低いダウンロード速度で実行できる最大時間。この設定項目で指定された期間内に、`download_low_speed_limit_kbps`の値よりも低い速度でHTTPリクエストが継続的に実行された場合、HTTPリクエストは中止されます。 +- 導入バージョン: - ##### download_worker_count -- Default: 0 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: BEノード上のリストアジョブのダウンロードタスクの最大スレッド数。`0` は、BEが稼働しているマシンのCPUコア数に値を設定することを示します。 -- Introduced in: - +- デフォルト: 0 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: BEノードでのリストアジョブのダウンロードタスクの最大スレッド数。`0`は、BEが稼働しているマシン上のCPUコア数に設定することを示します。 +- 導入バージョン: - ##### drop_tablet_worker_count -- Default: 0 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: タブレットを削除するために使用されるスレッド数。`0` はノード内のCPUコアの半分を示します。 -- Introduced in: - +- デフォルト: 0 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: タブレットを削除するために使用されるスレッド数。`0`はノード内のCPUコア数の半分を示します。 +- 導入バージョン: - ##### enable_check_string_lengths -- Default: true -- Type: Boolean -- Unit: - -- Is mutable: いいえ -- Description: ロード中にデータ長をチェックして、VARCHARデータが範囲外であることによる圧縮失敗を解決するかどうか。 -- Introduced in: - +- デフォルト: true +- タイプ: Boolean +- 単位: - +- 変更可能: いいえ +- 説明: ロード中にデータ長をチェックして、範囲外のVARCHARデータによるコンパクション失敗を解決するかどうか。 +- 導入バージョン: - ##### enable_event_based_compaction_framework -- Default: true -- Type: Boolean -- Unit: - -- Is mutable: いいえ 
-- Description: イベントベースのコンパクションフレームワークを有効にするかどうか。`true` はイベントベースのコンパクションフレームワークが有効であることを示し、`false` は無効であることを示します。イベントベースのコンパクションフレームワークを有効にすると、多くのタブレットがあるシナリオや単一タブレットが大量のデータを持つシナリオで、コンパクションのオーバーヘッドを大幅に削減できます。 -- Introduced in: - +- デフォルト: true +- タイプ: Boolean +- 単位: - +- 変更可能: いいえ +- 説明: イベントベースのコンパクションフレームワークを有効にするかどうか。`true`はイベントベースのコンパクションフレームワークが有効になっていることを示し、`false`は無効になっていることを示します。イベントベースのコンパクションフレームワークを有効にすると、多くのタブレットがあるシナリオや単一タブレットに大量のデータがあるシナリオで、コンパクションのオーバーヘッドを大幅に削減できます。 +- 導入バージョン: - ##### enable_lazy_delta_column_compaction -- Default: true -- Type: Boolean -- Unit: - -- Is mutable: はい -- Description: 有効にすると、コンパクションは部分列更新によって生成されたデルタ列に対して「遅延」戦略を優先します。StarRocksは、コンパクションI/Oを節約するために、デルタ列ファイルをメインセグメントファイルに積極的にマージすることを避けます。実際には、コンパクション選択コードは部分列更新rowsetと複数の候補をチェックします。これらが見つかり、このフラグがtrueの場合、エンジンはコンパクションへの追加入力を停止するか、空のrowset(レベル-1)のみをマージし、デルタ列を分離したままにします。これにより、コンパクション中の即時I/OとCPUが削減されますが、統合が遅延する(セグメントと一時ストレージオーバーヘッドが増加する可能性)コストがかかります。正確性とクエリのセマンティクスは変更されません。 -- Introduced in: v3.2.3 +- デフォルト: true +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: 有効にすると、コンパクションは部分カラム更新によって生成されたデルタカラムに対して「遅延」戦略を優先します。StarRocksは、コンパクションI/Oを節約するために、デルタカラムファイルをメインセグメントファイルに積極的にマージすることを避けます。実際には、コンパクション選択コードは部分カラム更新ロウセットと複数の候補をチェックします。それらが見つかり、このフラグがtrueの場合、エンジンはコンパクションへのさらなる入力の追加を停止するか、空のロウセット(レベル-1)のみをマージし、デルタカラムを分離したままにします。これにより、コンパクション中の即時I/OとCPUは削減されますが、統合の遅延(潜在的に多くのセグメントと一時的なストレージオーバーヘッド)が発生するコストがかかります。正確性とクエリセマンティクスは変更されません。 +- 導入バージョン: v3.2.3 ##### enable_new_load_on_memory_limit_exceeded -- Default: false -- Type: Boolean -- Unit: - -- Is mutable: はい -- Description: ハードメモリリソース制限に達したときに新しいロードプロセスを許可するかどうか。`true` は新しいロードプロセスが許可されることを示し、`false` は拒否されることを示します。 -- Introduced in: v3.3.2 +- デフォルト: false +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: ハードメモリリソース制限に達した場合に、新しいロードプロセスを許可するかどうか。`true`は新しいロードプロセスが許可されることを示し、`false`は拒否されることを示します。 +- 導入バージョン: v3.3.2 ##### enable_pk_index_parallel_compaction -- Default: true -- Type: Boolean -- Unit: - -- Is mutable: はい -- Description: 共有データクラスターでプライマリキーインデックスの並列コンパクションを有効にするかどうか。 -- Introduced in: - +- デフォルト: true +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: 共有データクラスターでプライマリキーインデックスの並列コンパクションを有効にするかどうか。 +- 導入バージョン: - ##### enable_pk_index_parallel_execution -- Default: true -- Type: Boolean -- Unit: - -- Is mutable: はい -- Description: 共有データクラスターでプライマリキーインデックス操作の並列実行を有効にするかどうか。有効にすると、システムはスレッドプールを使用して公開操作中にセグメントを並行して処理し、大規模なタブレットのパフォーマンスを大幅に向上させます。 -- Introduced in: - +- デフォルト: true +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: 共有データクラスターでプライマリキーインデックス操作の並列実行を有効にするかどうか。有効にすると、システムはスレッドプールを使用してパブリッシュ操作中にセグメントを同時に処理し、大規模なタブレットのパフォーマンスを大幅に向上させます。 +- 導入バージョン: - ##### enable_pk_index_eager_build -- Default: true -- Type: Boolean -- Unit: - -- Is mutable: はい -- Description: データインポートおよびコンパクションフェーズ中にプライマリキーインデックスファイルを積極的に構築するかどうか。有効にすると、システムはデータ書き込み中に永続PKインデックスファイルを即座に生成し、その後のクエリパフォーマンスを向上させます。 -- Introduced in: - +- デフォルト: true +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: データインポートおよびコンパクションフェーズ中にプライマリキーインデックスファイルを積極的に構築するかどうか。有効にすると、システムはデータ書き込み中に永続PKインデックスファイルを直ちに生成し、その後のクエリパフォーマンスを向上させます。 +- 導入バージョン: - ##### enable_pk_size_tiered_compaction_strategy -- Default: true -- Type: Boolean -- Unit: - -- Is mutable: いいえ -- Description: プライマリキーテーブルのサイズ階層型コンパクションポリシーを有効にするかどうか。`true` はサイズ階層型コンパクション戦略が有効であることを示し、`false` は無効であることを示します。 -- Introduced in: この項目はv3.2.4およびv3.1.10以降の共有データクラスター、およびv3.2.5およびv3.1.10以降の共有なしクラスターで有効になります。 +- デフォルト: true +- タイプ: Boolean +- 単位: - +- 変更可能: いいえ +- 説明: プライマリキーテーブルのサイズ階層型コンパクションポリシーを有効にするかどうか。`true`はサイズ階層型コンパクション戦略が有効であることを示し、`false`は無効であることを示します。 +- 導入バージョン: 
この項目は、v3.2.4およびv3.1.10以降の共有データクラスターと、v3.2.5およびv3.1.10以降の共有なしクラスターで有効になります。

 ##### enable_rowset_verify

-- Default: false
-- Type: Boolean
-- Unit: -
-- Is mutable: はい
-- Description: 生成されたrowsetの正しさを検証するかどうか。有効にすると、コンパクションとスキーマ変更後に生成されたrowsetの正しさがチェックされます。
-- Introduced in: -
+- デフォルト: false
+- タイプ: Boolean
+- 単位: -
+- 変更可能: はい
+- 説明: 生成されたロウセットの正確性を検証するかどうか。有効にすると、コンパクションとスキーマ変更後に生成されたロウセットの正確性がチェックされます。
+- 導入バージョン: -

 ##### enable_size_tiered_compaction_strategy

-- Default: true
-- Type: Boolean
-- Unit: -
-- Is mutable: いいえ
-- Description: サイズ階層型コンパクションポリシー(プライマリキーテーブルを除く)を有効にするかどうか。`true` はサイズ階層型コンパクション戦略が有効であることを示し、`false` は無効であることを示します。
-- Introduced in: -
+- デフォルト: true
+- タイプ: Boolean
+- 単位: -
+- 変更可能: いいえ
+- 説明: サイズ階層型コンパクションポリシーを有効にするかどうか(プライマリキーテーブルを除く)。`true`はサイズ階層型コンパクション戦略が有効であることを示し、`false`は無効であることを示します。
+- 導入バージョン: -

 ##### enable_strict_delvec_crc_check

-- Default: true
-- Type: Boolean
-- Unit: -
-- Is mutable: はい
-- Description: `enable_strict_delvec_crc_check` がtrueに設定されている場合、削除ベクターに対して厳密なCRC32チェックを実行し、不一致が検出された場合は失敗を返します。
-- Introduced in: -
+- デフォルト: true
+- タイプ: Boolean
+- 単位: -
+- 変更可能: はい
+- 説明: `enable_strict_delvec_crc_check`がtrueに設定されている場合、削除ベクトルに対して厳格なCRC32チェックを実行し、不一致が検出された場合は失敗を返します。
+- 導入バージョン: -

 ##### enable_transparent_data_encryption

-- Default: false
-- Type: Boolean
-- Unit: -
-- Is mutable: いいえ
-- Description: 有効にすると、StarRocksは新しく書き込まれたストレージオブジェクト(セグメントファイル、削除/更新ファイル、rowsetセグメント、lake SSTs、永続インデックスファイルなど)に対して暗号化されたディスク上のアーティファクトを作成します。ライター(RowsetWriter/SegmentWriter、lake UpdateManager/LakePersistentIndexおよび関連コードパス)はKeyCacheから暗号化情報を要求し、`encryption_info` を書き込み可能ファイルにアタッチし、`encryption_meta` をrowset / セグメント / sstableメタデータ(`segment_encryption_metas`、削除/更新暗号化メタデータ)に永続化します。フロントエンドとバックエンド/CNの暗号化フラグは一致している必要があります。不一致の場合、BEはハートビートで中止します(`LOG(FATAL)`)。このフラグは実行時に変更できません。デプロイ前に有効にし、キー管理(KEK)とKeyCacheがクラスター全体で適切に構成され、同期されていることを確認してください。
-- Introduced in: v3.3.1, 3.4.0, 3.5.0, 4.0.0
+- デフォルト: false
+- タイプ: Boolean
+- 単位: -
+- 変更可能: いいえ
+- 説明: 有効にすると、StarRocksは新しく書き込まれるストレージオブジェクト(セグメントファイル、削除/更新ファイル、ロウセットセグメント、Lake SST、永続インデックスファイルなど)に対して暗号化されたオンディスクアーティファクトを作成します。ライター(RowsetWriter/SegmentWriter、Lake UpdateManager/LakePersistentIndexおよび関連コードパス)は、KeyCacheから暗号化情報を要求し、`encryption_info`を書き込み可能ファイルに添付し、`encryption_meta`をロウセット/セグメント/SSTableメタデータ(`segment_encryption_metas`、削除/更新暗号化メタデータ)に永続化します。FrontendとBackend/CNの暗号化フラグは一致している必要があります。不一致があると、BEはハートビートで中止されます(LOG(FATAL))。このフラグは実行時に変更可能ではありません。デプロイ前に有効にし、キー管理(KEK)とKeyCacheがクラスター全体で適切に構成され、同期されていることを確認してください。
+- 導入バージョン: v3.3.1, 3.4.0, 3.5.0, 4.0.0
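+
+:::note
+参考:「変更可能: はい」の項目は、BEの`update_config` HTTP APIで実行時に変更できます。「変更可能: いいえ」の項目(例:`enable_transparent_data_encryption`)は**be.conf**に記述し、BEの再起動後に有効になります。以下は操作の一例です(`be_host`、HTTPポート`8040`、設定値やパスはいずれも例示用で、環境に合わせて読み替えてください)。
+
+```Bash
+# 変更可能な項目を実行時に更新する(項目と値は例示用)
+curl -XPOST "http://be_host:8040/api/update_config?enable_strict_delvec_crc_check=false"
+
+# 変更不可の項目は be.conf に追記し、BE を再起動すると有効になる(パスは例示用)
+echo "enable_transparent_data_encryption = true" >> /path/to/be/conf/be.conf
+```
+:::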
 ##### enable_zero_copy_from_page_cache

-- Default: true
-- Type: boolean
-- Unit: -
-- Is mutable: はい
-- Description: 有効にすると、`FixedLengthColumnBase` はページキャッシュによってサポートされるバッファから発生するデータを追加する際にバイトコピーを避けることができます。`append_numbers` では、すべての条件が満たされている場合(設定がtrueである、入力リソースが所有されている、リソースメモリが列の要素タイプにアラインされている、列が空である、リソース長が要素サイズの倍数である)、コードは入力 `ContainerResource` を取得し、列の内部リソースポインタ(ゼロコピー)を設定します。これを有効にすると、CPUとメモリコピーのオーバーヘッドが削減され、取り込み/スループットが向上する可能性があります。欠点としては、列の寿命が取得されたバッファと結合され、正しい所有権/アラインメントに依存することです。安全なコピーを強制するには無効にしてください。
-- Introduced in: -
+- デフォルト: true
+- タイプ: boolean
+- 単位: -
+- 変更可能: はい
+- 説明: 有効にすると、`FixedLengthColumnBase`は、ページキャッシュを基盤とするバッファに由来するデータを追加する際に、バイトコピーを回避できる場合があります。`append_numbers`では、すべての条件が満たされている場合、コードは受信`ContainerResource`を取得し、カラムの内部リソースポインタ(ゼロコピー)を設定します。条件とは、設定がtrueであること、受信リソースが所有されていること、リソースメモリがカラム要素タイプに合わせてアラインされていること、カラムが空であること、リソース長が要素サイズの倍数であることです。これを有効にすると、CPUとメモリコピーのオーバーヘッドが減少し、取り込み/スループットが向上する可能性があります。欠点としては、カラムのライフタイムが取得されたバッファと結合され、正しい所有権/アラインメントに依存します。安全なコピーを強制するには無効にしてください。
+- 導入バージョン: -

 ##### file_descriptor_cache_clean_interval

-- Default: 3600
-- Type: Int
-- Unit: 秒
-- Is mutable: はい
-- Description: 一定期間使用されていないファイルディスクリプタをクリーンアップする時間間隔。
-- Introduced in: -
+- デフォルト: 3600
+- タイプ: Int
+- 単位: 秒
+- 変更可能: はい
+- 説明: 一定期間使用されていないファイルディスクリプタをクリーンアップする時間間隔。
+- 導入バージョン: -

 ##### ignore_broken_disk

-- Default: false
-- Type: Boolean
-- Unit: -
-- Is mutable: いいえ
-- Description: 設定されたストレージパスが読み取り/書き込みチェックに失敗したり、解析に失敗したりした場合の起動動作を制御します。`false`(デフォルト)の場合、BEは `storage_root_path` または `spill_local_storage_dir` 内の破損したエントリを致命的と見なし、起動を中止します。`true` の場合、StarRocksは `check_datapath_rw` に失敗したり、解析に失敗したりしたストレージパスをスキップ(警告をログに記録して削除)し、BEは残りの健全なパスで起動を続行できます。注:設定されたすべてのパスが削除された場合でも、BEは終了します。これを有効にすると、構成ミスまたは故障したディスクを隠蔽し、無視されたパス上のデータが利用できなくなる可能性があります。ログとディスクの状態を適切に監視してください。
-- Introduced in: v3.2.0
+- デフォルト: false
+- タイプ: Boolean
+- 単位: -
+- 変更可能: いいえ
+- 説明: 設定されたストレージパスが読み取り/書き込みチェックに失敗したり、解析に失敗したりした場合の起動動作を制御します。`false`(デフォルト)の場合、BEは`storage_root_path`または`spill_local_storage_dir`内の破損したエントリを致命的とみなし、起動を中止します。`true`の場合、StarRocksは`check_datapath_rw`に失敗した、または解析に失敗したストレージパスをスキップし(警告をログに記録して削除)、BEは残りの健全なパスで起動を続行できます。注:設定されたすべてのパスが削除された場合でも、BEは終了します。これを有効にすると、設定ミスやディスク障害を隠蔽し、無視されたパス上のデータが利用できなくなる可能性があります。ログとディスクの状態を適切に監視してください。
+- 導入バージョン: v3.2.0

 ##### inc_rowset_expired_sec

-- Default: 1800
-- Type: Int
-- Unit: 秒
-- Is mutable: はい
-- Description: 着信データの有効期限。この構成項目は増分クローンで使用されます。
-- Introduced in: -
+- デフォルト: 1800
+- タイプ: Int
+- 単位: 秒
+- 変更可能: はい
+- 説明: 受信データの有効期限。この設定項目は増分クローンで使用されます。
+- 導入バージョン: -

 ##### load_process_max_memory_hard_limit_ratio

-- Default: 2
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: BEノード上のすべてのロードプロセスが占有できるメモリリソースのハードリミット(比率)。`enable_new_load_on_memory_limit_exceeded` が `false` に設定されており、すべてのロードプロセスのメモリ消費が `load_process_max_memory_limit_percent * load_process_max_memory_hard_limit_ratio` を超えた場合、新しいロードプロセスは拒否されます。
-- Introduced in: v3.3.2
+- デフォルト: 2
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: BEノード上のすべてのロードプロセスが占有できるメモリリソースのハードリミット(比率)。`enable_new_load_on_memory_limit_exceeded`が`false`に設定されており、すべてのロードプロセスのメモリ消費が`load_process_max_memory_limit_percent * load_process_max_memory_hard_limit_ratio`を超えると、新しいロードプロセスは拒否されます。
+- 導入バージョン: v3.3.2

 ##### load_process_max_memory_limit_percent

-- Default: 30
-- Type: Int
-- Unit: -
-- Is mutable: いいえ
-- Description: BEノード上のすべてのロードプロセスが占有できるメモリリソースのソフトリミット(パーセンテージ)。
-- Introduced in: -
+- デフォルト: 30
+- タイプ: Int
+- 単位: -
+- 変更可能: いいえ
+- 説明: BEノード上のすべてのロードプロセスが占有できるメモリリソースのソフトリミット(パーセンテージ)。
+- 導入バージョン: -
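+
+上記2つのロードメモリ項目の関係は、数値を当てはめると把握しやすくなります。以下はBEプロセスメモリを100 GBと仮定した計算例のスケッチです(数値はすべて例示用)。
+
+```Bash
+# ソフトリミット = プロセスメモリ * load_process_max_memory_limit_percent / 100
+#               = 100 GB * 30 / 100 = 30 GB
+# ハードリミット = ソフトリミット * load_process_max_memory_hard_limit_ratio
+#               = 30 GB * 2 = 60 GB
+# enable_new_load_on_memory_limit_exceeded が false の場合、
+# ロード全体のメモリ消費が 60 GB を超えると新しいロードは拒否されます。
+```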
 ##### lz4_acceleration

-- Default: 1
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: 内蔵LZ4コンプレッサによって使用されるLZ4「アクセラレーション」パラメータを制御します(`LZ4_compress_fast_continue` に渡されます)。値が高いほど圧縮速度が優先され、圧縮率が犠牲になります。値が低いほど(1)より良い圧縮を生成しますが、遅くなります。有効範囲:MIN=1、MAX=65537。この設定は、BlockCompression内のすべてのLZ4ベースのコーデック(例:LZ4およびHadoop-LZ4)に影響し、圧縮方法のみを変更します。LZ4形式や解凍の互換性は変更しません。CPUバウンドまたは低遅延のワークロードで、より大きな出力が許容できる場合は上向きに(例:4、8など)調整してください。ストレージまたはI/Oに敏感なワークロードの場合は1に保ってください。スループット対サイズのトレードオフはデータに大きく依存するため、変更する前に代表的なデータでテストしてください。
-- Introduced in: v3.4.1, 3.5.0, 4.0.0
+- デフォルト: 1
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: 内蔵LZ4コンプレッサーで使用されるLZ4「アクセラレーション」パラメーターを制御します(`LZ4_compress_fast_continue`に渡されます)。値が高いほど圧縮速度を優先し、圧縮率を犠牲にします。値が低いほど(1)、より良い圧縮を生成しますが遅くなります。有効範囲:MIN=1、MAX=65537。この設定は`BlockCompression`内のすべてのLZ4ベースのコーデック(例:LZ4およびHadoop-LZ4)に影響し、圧縮の実行方法のみを変更します。LZ4形式や解凍互換性は変更しません。CPUバウンドまたは低レイテンシのワークロードで、より大きな出力が許容される場合は値を増やし(例:4、8、...)、ストレージまたはI/Oセンシティブなワークロードの場合は1のままにしてください。スループットとサイズのトレードオフはデータに大きく依存するため、変更する前に代表的なデータでテストしてください。
+- 導入バージョン: v3.4.1, 3.5.0, 4.0.0

 ##### lz4_expected_compression_ratio

-- Default: 2.1
-- Type: double
-- Unit: 無次元 (圧縮率)
-- Is mutable: はい
-- Description: シリアライゼーション圧縮戦略が観測されたLZ4圧縮が「良好」であるかどうかを判断する際に使用する閾値。`compress_strategy.cpp` では、この値が `lz4_expected_compression_speed_mbps` と共に報酬メトリクスを計算する際に観測された `compress_ratio` を分割します。結合された報酬が1.0より大きい場合、戦略は肯定的なフィードバックを記録します。この値を増やすと、期待される圧縮率が高くなり(条件を満たすのが難しくなる)、減らすと、観測された圧縮が満足できると見なされやすくなります。典型的なデータの圧縮率に合わせるように調整してください。有効範囲:MIN=1、MAX=65537。
-- Introduced in: v3.4.1, 3.5.0, 4.0.0
+- デフォルト: 2.1
+- タイプ: double
+- 単位: 無次元 (圧縮率)
+- 変更可能: はい
+- 説明: シリアル化圧縮戦略が観察されたLZ4圧縮が「良い」かどうかを判断するために使用する閾値。`compress_strategy.cpp`では、この値は`lz4_expected_compression_speed_mbps`とともに報酬メトリクスを計算する際に観察された圧縮率を分割します。結合された報酬が`> 1.0`の場合、戦略は正のフィードバックを記録します。この値を増やすと、予想される圧縮率が高くなり(条件を満たすのが難しくなる)、値を下げると、観察された圧縮が満足できると見なされやすくなります。一般的なデータの圧縮率に合わせて調整してください。有効範囲:MIN=1、MAX=65537。
+- 導入バージョン: v3.4.1, 3.5.0, 4.0.0

 ##### lz4_expected_compression_speed_mbps

-- Default: 600
-- Type: double
-- Unit: MB/秒
-- Is mutable: はい
-- Description: 適応圧縮ポリシー (CompressStrategy) で使用されるメガバイト/秒単位の期待されるLZ4圧縮スループット。フィードバックルーチンは `reward_ratio = (observed_compression_ratio / lz4_expected_compression_ratio) * (observed_speed / lz4_expected_compression_speed_mbps)` を計算します。`reward_ratio` が1.0より大きい場合、正のカウンタ (alpha) がインクリメントされ、そうでない場合は負のカウンタ (beta) がインクリメントされます。これは、将来のデータが圧縮されるかどうかに影響します。この値をハードウェアでの典型的なLZ4スループットを反映するように調整してください。値を上げると、ポリシーが実行を「良好」と分類するのが難しくなり (より高い観測速度が必要)、下げると分類が容易になります。正の有限数である必要があります。
-- Introduced in: v3.4.1, 3.5.0, 4.0.0
+- デフォルト: 600
+- タイプ: double
+- 単位: MB/秒
+- 変更可能: はい
+- 説明: 適応圧縮ポリシー(CompressStrategy)で使用される、1秒あたりのメガバイト単位の予想LZ4圧縮スループット。フィードバックルーチンは`reward_ratio = (observed_compression_ratio / lz4_expected_compression_ratio) * (observed_speed / lz4_expected_compression_speed_mbps)`を計算します。`reward_ratio > 1.0`の場合、正のカウンター(alpha)が増加し、それ以外の場合は負のカウンター(beta)が増加します。これは将来のデータが圧縮されるかどうかに影響します。この値は、ハードウェア上の一般的なLZ4スループットを反映するように調整してください。値を上げると、ポリシーが実行を「良い」と分類するのを難しくし(より高い観察速度が必要)、値を下げると分類が容易になります。正の有限数である必要があります。
+- 導入バージョン: v3.4.1, 3.5.0, 4.0.0
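+
+上記2つの期待値の働きは、数値を当てはめると分かりやすくなります。以下は仮の観測値を使った計算例と、低速な環境向けに期待値を下げる場合の操作例のスケッチです(ホスト名・ポート・値はすべて例示用)。
+
+```Bash
+# reward_ratio = (観測圧縮率 / lz4_expected_compression_ratio)
+#              * (観測速度 / lz4_expected_compression_speed_mbps)
+# 例: 観測圧縮率 2.5、観測速度 500 MB/秒 と仮定すると:
+#   (2.5 / 2.1) * (500 / 600) ≈ 0.99 → 1.0 を超えないため負のフィードバック
+# 期待スループットを下げると、同じ観測値でも正のフィードバックと判定されやすくなります。
+curl -XPOST "http://be_host:8040/api/update_config?lz4_expected_compression_speed_mbps=400"
+```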
 ##### make_snapshot_worker_count

-- Default: 5
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: BEノード上のスナップショット作成タスクの最大スレッド数。
-- Introduced in: -
+- デフォルト: 5
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: BEノードでのスナップショット作成タスクの最大スレッド数。
+- 導入バージョン: -

 ##### manual_compaction_threads

-- Default: 4
-- Type: Int
-- Unit: -
-- Is mutable: いいえ
-- Description: Manual Compactionのスレッド数。
-- Introduced in: -
+- デフォルト: 4
+- タイプ: Int
+- 単位: -
+- 変更可能: いいえ
+- 説明: 手動コンパクションのスレッド数。
+- 導入バージョン: -

 ##### max_base_compaction_num_singleton_deltas

-- Default: 100
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: 各Base Compactionで圧縮できるセグメントの最大数。
-- Introduced in: -
+- デフォルト: 100
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: 各ベースコンパクションで圧縮できるセグメントの最大数。
+- 導入バージョン: -

 ##### max_compaction_candidate_num

-- Default: 40960
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: コンパクションの候補タブレットの最大数。値が大きすぎると、高いメモリ使用量と高いCPU負荷を引き起こします。
-- Introduced in: -
+- デフォルト: 40960
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: コンパクションの候補となるタブレットの最大数。値が大きすぎると、高いメモリ使用量と高いCPU負荷を引き起こします。
+- 導入バージョン: -

 ##### max_compaction_concurrency

-- Default: -1
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: コンパクション(Base CompactionとCumulative Compactionの両方を含む)の最大同時実行性。値 `-1` は同時実行性に制限がないことを示します。`0` はコンパクションを無効にすることを示します。イベントベースのコンパクションフレームワークが有効な場合、このパラメータは変更可能です。
-- Introduced in: -
+- デフォルト: -1
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: コンパクションの最大並行処理数(ベースコンパクションと累積コンパクションの両方を含む)。値`-1`は並行処理に制限がないことを示します。`0`はコンパクションを無効にすることを示します。このパラメーターは、イベントベースのコンパクションフレームワークが有効な場合に変更可能です。
+- 導入バージョン: -

 ##### max_cumulative_compaction_num_singleton_deltas

-- Default: 1000
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: 単一のCumulative Compactionでマージできるセグメントの最大数。コンパクション中にOOMが発生した場合、この値を減らすことができます。
-- Introduced in: -
+- デフォルト: 1000
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: 単一の累積コンパクションでマージできるセグメントの最大数。コンパクション中にOOMが発生する場合は、この値を減らすことができます。
+- 導入バージョン: -

 ##### max_download_speed_kbps

-- Default: 50000
-- Type: Int
-- Unit: KB/秒
-- Is mutable: はい
-- Description: 各HTTPリクエストの最大ダウンロード速度。この値はBEノード間のデータレプリカ同期のパフォーマンスに影響します。
-- Introduced in: -
+- デフォルト: 50000
+- タイプ: Int
+- 単位: KB/秒
+- 変更可能: はい
+- 説明: 各HTTPリクエストの最大ダウンロード速度。この値はBEノード間のデータレプリカ同期のパフォーマンスに影響します。
+- 導入バージョン: -

 ##### max_garbage_sweep_interval

-- Default: 3600
-- Type: Int
-- Unit: 秒
-- Is mutable: はい
-- Description: ストレージボリュームのガベージコレクションの最大時間間隔。この設定はv3.0以降、動的に変更可能になりました。
-- Introduced in: -
+- デフォルト: 3600
+- タイプ: Int
+- 単位: 秒
+- 変更可能: はい
+- 説明: ストレージボリュームのガベージコレクションの最大時間間隔。この設定はv3.0以降動的に変更可能になりました。
+- 導入バージョン: -

 ##### max_percentage_of_error_disk

-- Default: 0
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: 対応するBEノードが終了する前に、ストレージボリュームで許容できるエラーディスクの最大割合。
-- Introduced in: -
+- デフォルト: 0
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: 対応するBEノードが終了する前に、ストレージボリュームで許容できるエラーディスクの最大割合。
+- 導入バージョン: -

 ##### max_queueing_memtable_per_tablet

-- Default: 2
-- Type: Long
-- Unit: 個
-- Is mutable: はい
-- Description: 書き込みパスのタブレットごとのバックプレッシャを制御します。タブレットのキューイング(まだフラッシュされていない)memtableの数が `max_queueing_memtable_per_tablet` に達するかそれを超えると、`LocalTabletsChannel` および `LakeTabletsChannel` のライターは、より多くの書き込み作業を送信する前にブロック(スリープ/再試行)します。これにより、同時memtableフラッシュの同時実行性とピークメモリ使用量が減少しますが、大量の負荷がかかるとレイテンシやRPCタイムアウトが増加するコストがかかります。より多くの同時memtableを許可するには(メモリとI/Oバーストが増加)、この値を高く設定します。メモリ負荷を制限し、書き込みスロットリングを増やすには、この値を低く設定します。
-- Introduced in: v3.2.0
+- デフォルト: 2
+- タイプ: Long
+- 単位: カウント
+- 変更可能: はい
+- 説明: 書き込みパスでのタブレットごとのバックプレッシャを制御します。タブレットのキューイング中(まだフラッシュされていない)のMemTableの数が`max_queueing_memtable_per_tablet`に達するか超えると、`LocalTabletsChannel`と`LakeTabletsChannel`のライターは、さらに書き込み作業をサブミットする前にブロックされます(スリープ/再試行)。これにより、同時MemTableフラッシュの並行処理とピークメモリ使用量が減少し、重い負荷時のレイテンシ増加やRPCタイムアウトが発生するコストがかかります。より多くの同時MemTable(より多くのメモリとI/Oバースト)を許可するには高く設定し、メモリ負荷を制限し、書き込みスロットリングを増やすには低く設定します。
+- 導入バージョン: v3.2.0

 ##### max_row_source_mask_memory_bytes

-- Default: 209715200
-- Type: Int
-- Unit: バイト
-- Is mutable: いいえ
-- Description: 行ソースマスクバッファの最大メモリサイズ。バッファがこの値より大きい場合、データはディスク上の一時ファイルに永続化されます。この値は `compaction_memory_limit_per_worker` の値よりも低く設定する必要があります。
-- Introduced in: -
+- デフォルト: 209715200
+- タイプ: Int
+- 単位: バイト
+- 変更可能: いいえ
+- 説明: 行ソースマスクバッファの最大メモリサイズ。バッファがこの値より大きい場合、データはディスク上の一時ファイルに永続化されます。この値は`compaction_memory_limit_per_worker`の値よりも小さく設定する必要があります。
+- 導入バージョン: -

 ##### max_tablet_write_chunk_bytes

-- Default: 536870912
-- Type: Int
-- Unit: バイト
-- Is mutable: はい
-- Description: 現在のインメモリタブレット書き込みチャンクの最大許容メモリ(バイト単位)。この値を超えると、チャンクは満杯と見なされ、送信キューに入れられます。この値を増やすと、ワイドテーブル(多くの列)をロードする際のRPCの頻度を減らすことができ、これによりスループットが向上する可能性がありますが、メモリ使用量とRPCペイロードが大きくなります。RPCの削減とメモリおよびシリアライズ/BRPCの制限のバランスをとるように調整してください。
-- Introduced in: v3.2.12
+- デフォルト: 
536870912 +- タイプ: Int +- 単位: バイト +- 変更可能: はい +- 説明: 現在のインメモリタブレット書き込みチャンクの最大許容メモリ(バイト単位)。この値を超えると、チャンクはフルと見なされ、送信キューに入れられます。この値を増やすと、ワイドテーブル(多くのカラム)をロードする際のRPCの頻度が減り、スループットが向上する可能性がありますが、メモリ使用量とRPCペイロードが大きくなります。RPCの削減とメモリおよびシリアル化/BRPC制限とのバランスを取るように調整してください。 +- 導入バージョン: v3.2.12 ##### max_update_compaction_num_singleton_deltas -- Default: 1000 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: プライマリキーテーブルの単一のコンパクションでマージできるrowsetの最大数。 -- Introduced in: - +- デフォルト: 1000 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: プライマリキーテーブルの単一のコンパクションでマージできるロウセットの最大数。 +- 導入バージョン: - ##### memory_limitation_per_thread_for_schema_change -- Default: 2 -- Type: Int -- Unit: GB -- Is mutable: はい -- Description: 各スキーマ変更タスクに許可される最大メモリサイズ。 -- Introduced in: - +- デフォルト: 2 +- タイプ: Int +- 単位: GB +- 変更可能: はい +- 説明: 各スキーマ変更タスクに許可される最大メモリサイズ。 +- 導入バージョン: - ##### memory_ratio_for_sorting_schema_change -- Default: 0.8 -- Type: Double -- Unit: - (単位なし比率) -- Is mutable: はい -- Description: スキーマ変更ソート操作中のメンテーブルの最大バッファサイズとして使用されるスレッドごとのスキーマ変更メモリ制限の割合。この比率は `memory_limitation_per_thread_for_schema_change` (GBで設定され、バイトに変換される) に乗算されて `max_buffer_size` が計算され、その結果は4GBで上限が設定されます。`SchemaChangeWithSorting` および `SortedSchemaChange` が `MemTable/DeltaWriter` を作成するときに使用されます。この比率を上げると、より大きなインメモリバッファが許可されます(フラッシュ/マージが減少)が、メモリ負荷のリスクが高まります。減らすと、より頻繁なフラッシュとより高いI/O/マージオーバーヘッドが発生します。 -- Introduced in: v3.2.0 +- デフォルト: 0.8 +- タイプ: Double +- 単位: - (単位なしの比率) +- 変更可能: はい +- 説明: スキーマ変更ソート操作中のMemTable最大バッファサイズとして使用される、スレッドごとのスキーマ変更メモリ制限の割合。この比率は`memory_limitation_per_thread_for_schema_change`(GB単位で設定され、バイトに変換される)と乗算されて`max_buffer_size`を計算し、その結果は4GBで制限されます。`SchemaChangeWithSorting`と`SortedSchemaChange`がMemTable/DeltaWriterを作成する際に使用されます。この比率を増やすと、より大きなインメモリバッファ(フラッシュ/マージの減少)が可能になりますが、メモリ負荷のリスクが高まります。減らすと、より頻繁なフラッシュと高いI/O/マージオーバーヘッドが発生します。 +- 導入バージョン: v3.2.0 ##### min_base_compaction_num_singleton_deltas -- Default: 5 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: Base Compactionをトリガーする最小セグメント数。 -- Introduced in: - +- デフォルト: 5 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: ベースコンパクションをトリガーするセグメントの最小数。 +- 導入バージョン: - ##### min_compaction_failure_interval_sec -- Default: 120 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: 以前のコンパクション失敗からタブレットコンパクションをスケジュールできる最小時間間隔。 -- Introduced in: - +- デフォルト: 120 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: 前回のコンパクション失敗からタブレットコンパクションをスケジュールできる最小時間間隔。 +- 導入バージョン: - ##### min_cumulative_compaction_failure_interval_sec -- Default: 30 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: 失敗時にCumulative Compactionが再試行する最小時間間隔。 -- Introduced in: - +- デフォルト: 30 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: 累積コンパクションが失敗した場合に再試行する最小時間間隔。 +- 導入バージョン: - ##### min_cumulative_compaction_num_singleton_deltas -- Default: 5 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: Cumulative Compactionをトリガーする最小セグメント数。 -- Introduced in: - +- デフォルト: 5 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: 累積コンパクションをトリガーするセグメントの最小数。 +- 導入バージョン: - ##### min_garbage_sweep_interval -- Default: 180 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: ストレージボリュームのガベージコレクションの最小時間間隔。この設定はv3.0以降、動的に変更可能になりました。 -- Introduced in: - +- デフォルト: 180 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: ストレージボリュームのガベージコレクションの最小時間間隔。この設定はv3.0以降動的に変更可能になりました。 +- 導入バージョン: - ##### parallel_clone_task_per_path -- Default: 8 -- Type: Int -- Unit: スレッド -- Is mutable: はい -- Description: BE上のストレージパスごとに割り当てられる並列クローンワーカースレッドの数。BE起動時、クローンスレッドプールの最大スレッド数はmax(`number_of_store_paths` * `parallel_clone_task_per_path`, MIN_CLONE_TASK_THREADS_IN_POOL) 
として計算されます。例えば、4つのストレージパスとデフォルト=8の場合、クローンプール最大値は32です。この設定は、BEによって処理されるCLONEタスク(タブレットレプリカコピー)の同時実行性を直接制御します。これを増やすと、並列クローンのスループットが向上しますが、CPU、ディスク、ネットワークの競合も増加します。減らすと、同時クローンタスクが制限され、FEスケジュールされたクローン操作が抑制される可能性があります。この値は動的クローンスレッドプールに適用され、update-config HTTPアクションを介して実行時に変更できます(agent_serverがクローンプールの最大スレッドを更新するようにします)。 -- Introduced in: v3.2.0 +- デフォルト: 8 +- タイプ: Int +- 単位: スレッド +- 変更可能: はい +- 説明: BE上のストレージパスごとに割り当てられる並列クローンワーカー・スレッドの数。BE起動時、クローンスレッドプールの最大スレッド数は`max(ストアパス数 * parallel_clone_task_per_path, MIN_CLONE_TASK_THREADS_IN_POOL)`として計算されます。例えば、4つのストレージパスとデフォルト=8の場合、クローンプールの最大値は32です。この設定は、BEによって処理されるCLONEタスク(タブレットレプリカコピー)の並行処理を直接制御します。これを増やすと並列クローンスループットが向上しますが、CPU、ディスク、ネットワークの競合も増加します。減らすと同時クローンタスクが制限され、FEでスケジュールされたクローン操作がスロットルされる可能性があります。この値は動的クローンスレッドプールに適用され、`update-config`パスを介して実行時に変更できます(agent_serverがクローンプールの最大スレッドを更新する原因となります)。 +- 導入バージョン: v3.2.0 ##### partial_update_memory_limit_per_worker -- Default: 2147483648 -- Type: long -- Unit: バイト -- Is mutable: はい -- Description: 部分列更新(コンパクション/rowset更新処理で使用)を実行する際に、単一のワーカーがソースチャンクを組み立てるために使用できる最大メモリ(バイト単位)。リーダーは行ごとの更新メモリ(total_update_row_size / num_rows_upt)を推定し、それを読み取られた行数に乗算します。その積がこの制限を超えると、現在のチャンクはフラッシュされ、追加のメモリ増加を避けるために処理されます。これを、更新ワーカーごとに利用可能なメモリに合わせて設定してください。低すぎるとI/O/処理オーバーヘッドが増加し(多くの小さなチャンク)、高すぎるとメモリ負荷やOOMのリスクが高まります。行ごとの推定値がゼロの場合(レガシーrowset)、この設定はバイトベースの制限を課しません(INT32_MAX行数制限のみが適用されます)。 -- Introduced in: v3.2.10 +- デフォルト: 2147483648 +- タイプ: long +- 単位: バイト +- 変更可能: はい +- 説明: 部分カラム更新を実行する際(コンパクション/ロウセット更新処理で使用)、単一のワーカーがソースチャンクの組み立てに使用できる最大メモリ(バイト単位)。リーダーは行ごとの更新メモリ(`total_update_row_size / num_rows_upt`)を見積もり、それを読み取られた行数で乗算します。その積がこの制限を超えると、現在のチャンクはフラッシュされ、追加のメモリ増加を避けるために処理されます。これを更新ワーカーあたりの利用可能なメモリに合わせて設定してください。低すぎるとI/O/処理オーバーヘッドが増加し(多くの小さなチャンク)、高すぎるとメモリ負荷やOOMのリスクがあります。行ごとの見積もりがゼロの場合(レガシーロウセット)、この設定はバイトベースの制限を課しません(`INT32_MAX`行数制限のみが適用されます)。 +- 導入バージョン: v3.2.10 ##### path_gc_check -- Default: true -- Type: Boolean -- Unit: - -- Is mutable: いいえ -- Description: 有効にすると、StorageEngineはデータディレクトリごとのバックグラウンドスレッドを起動し、定期的なパススキャンとガベージコレクションを実行します。起動時に `start_bg_threads()` は `_path_scan_thread_callback`(`DataDir::perform_path_scan` と `perform_tmp_path_scan` を呼び出す)と `_path_gc_thread_callback`(`DataDir::perform_path_gc_by_tablet`、`DataDir::perform_path_gc_by_rowsetid`、`DataDir::perform_delta_column_files_gc`、および `DataDir::perform_crm_gc` を呼び出す)を生成します。スキャンとGCの間隔は `path_scan_interval_second` と `path_gc_check_interval_second` によって制御されます。CRMファイルのクリーンアップは `unused_crm_file_threshold_second` を使用します。自動パスレベルのクリーンアップを防ぐにはこれを無効にしてください(その場合、孤立した/一時ファイルを手動で管理する必要があります)。このフラグを変更するにはプロセスの再起動が必要です。 -- Introduced in: v3.2.0 +- デフォルト: true +- タイプ: Boolean +- 単位: - +- 変更可能: いいえ +- 説明: 有効にすると、`StorageEngine`はデータディレクトリごとのバックグラウンドスレッドを開始し、定期的なパススキャンとガベージコレクションを実行します。起動時に`start_bg_threads()`は`_path_scan_thread_callback`(`DataDir::perform_path_scan`および`perform_tmp_path_scan`を呼び出す)と`_path_gc_thread_callback`(`DataDir::perform_path_gc_by_tablet`、`DataDir::perform_path_gc_by_rowsetid`、`DataDir::perform_delta_column_files_gc`、および`DataDir::perform_crm_gc`を呼び出す)を生成します。スキャンとGCの間隔は`path_scan_interval_second`と`path_gc_check_interval_second`によって制御されます。CRMファイルクリーンアップは`unused_crm_file_threshold_second`を使用します。これを無効にすると、パスレベルの自動クリーンアップが防止されます(その後、孤立ファイル/一時ファイルを手動で管理する必要があります)。このフラグを変更するには、プロセスの再起動が必要です。 +- 導入バージョン: v3.2.0 ##### path_gc_check_interval_second -- Default: 86400 -- Type: Int -- Unit: 秒 -- Is mutable: いいえ -- Description: ストレージエンジンのパスガベージコレクションバックグラウンドスレッドの実行間隔(秒単位)。各ウェイクアップは、DataDirがタブレットごと、rowset IDごと、デルタ列ファイルGC、およびCRM GCによってパスGCを実行することをトリガーします(CRM GC呼び出しは 
`unused_crm_file_threshold_second` を使用します)。非正の値に設定されている場合、コードは強制的に間隔を1800秒(30分)に設定し、警告を出力します。オンディスクの一時ファイルまたはダウンロードされたファイルがスキャンおよび削除される頻度を制御するためにこれを調整してください。 -- Introduced in: v3.2.0 +- デフォルト: 86400 +- タイプ: Int +- 単位: 秒 +- 変更可能: いいえ +- 説明: ストレージエンジンのパスガベージコレクションバックグラウンドスレッドの実行間隔(秒単位)。各ウェイクで`DataDir`はタブレット、ロウセットID、デルタカラムファイルGC、およびCRM GCによってパスGCを実行します(CRM GC呼び出しは`unused_crm_file_threshold_second`を使用します)。非正の値に設定されている場合、コードは間隔を1800秒(30分)に強制し、警告を出力します。この値を調整して、オンディスクの一時ファイルまたはダウンロードされたファイルがスキャンおよび削除される頻度を制御します。 +- 導入バージョン: v3.2.0 ##### pending_data_expire_time_sec -- Default: 1800 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: ストレージエンジン内の保留データの有効期限。 -- Introduced in: - +- デフォルト: 1800 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: ストレージエンジン内の保留中データの有効期限。 +- 導入バージョン: - ##### pindex_major_compaction_limit_per_disk -- Default: 1 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: ディスクごとのコンパクションの最大同時実行性。これにより、コンパクションによるディスク間のI/Oの不均一性の問題に対処します。この問題は、特定のディスクでI/Oが過度に高くなる原因となる可能性があります。 -- Introduced in: v3.0.9 +- デフォルト: 1 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: ディスクごとのコンパクションの最大並行処理数。これは、コンパクションによるディスク間のI/Oの不均一性の問題を解決します。この問題は、特定のディスクで過度に高いI/Oを引き起こす可能性があります。 +- 導入バージョン: v3.0.9 ##### pk_index_compaction_score_ratio -- Default: 1.5 -- Type: Double -- Unit: - -- Is mutable: はい -- Description: 共有データクラスターのプライマリキーインデックスのコンパクションスコア比率。例えば、N個のファイルセットがある場合、コンパクションスコアは `N * pk_index_compaction_score_ratio` となります。 -- Introduced in: - +- デフォルト: 1.5 +- タイプ: Double +- 単位: - +- 変更可能: はい +- 説明: 共有データクラスターのプライマリキーインデックスのコンパクションスコア比率。例えば、N個のファイルセットがある場合、コンパクションスコアは`N * pk_index_compaction_score_ratio`になります。 +- 導入バージョン: - ##### pk_index_early_sst_compaction_threshold -- Default: 5 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: 共有データクラスターのプライマリキーインデックスの早期SSTコンパクション閾値。 -- Introduced in: - +- デフォルト: 5 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: 共有データクラスターのプライマリキーインデックスの早期SSTコンパクション閾値。 +- 導入バージョン: - ##### pk_index_map_shard_size -- Default: 4096 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: lake UpdateManagerのプライマリキーインデックスシャードマップで使用されるシャード数。UpdateManagerは、このサイズの `PkIndexShard` ベクトルを割り当て、ビットマスクを介してタブレットIDをシャードにマップします。この値を増やすと、そうでなければ同じシャードを共有するタブレット間のロック競合が減少しますが、その代償としてより多くのミューテックスオブジェクトとわずかに高いメモリ使用量が発生します。コードがビットマスクインデックスに依存しているため、値は2の累乗である必要があります。サイジングのガイダンスについては、`tablet_map_shard_size` ヒューリスティック `total_num_of_tablets_in_BE / 512` を参照してください。 -- Introduced in: v3.2.0 +- デフォルト: 4096 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: Lake UpdateManagerのプライマリキーインデックスシャードマップで使用されるシャード数。UpdateManagerはこのサイズの`PkIndexShard`のベクターを割り当て、タブレットIDをビットマスクを介してシャードにマッピングします。この値を増やすと、同じシャードを共有するであろうタブレット間のロック競合が減少しますが、ミューテックスオブジェクトの増加とわずかなメモリ使用量の増加を伴います。コードはビットマスクインデックスに依存するため、値は2の累乗でなければなりません。サイズ設定のガイダンスについては、`tablet_map_shard_size`ヒューリスティック:`total_num_of_tablets_in_BE / 512`を参照してください。 +- 導入バージョン: v3.2.0 ##### pk_index_memtable_flush_threadpool_max_threads -- Default: 0 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: 共有データクラスターのプライマリキーインデックスMemTableフラッシュ用スレッドプールの最大スレッド数。`0` はCPUコア数の半分に自動設定されることを意味します。 -- Introduced in: - +- デフォルト: 0 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: 共有データクラスターのプライマリキーインデックスMemTableフラッシュ用スレッドプールの最大スレッド数。`0`はCPUコア数の半分に自動設定されることを意味します。 +- 導入バージョン: - ##### pk_index_memtable_flush_threadpool_size -- Default: 1048576 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: 共有データ(クラウドネイティブ/lake)モードで使用されるプライマリキーインデックスmemtableフラッシュスレッドプールの最大キューサイズ(保留中のタスク数)を制御します。スレッドプールはExecEnvで「cloud_native_pk_index_flush」として作成されます。その最大スレッド数は `pk_index_memtable_flush_threadpool_max_threads` 
によって制御されます。この値を増やすと、実行前にmemtableフラッシュタスクをより多くバッファリングできます。これにより、即時のバックプレッシャは減少しますが、キューに入れられたタスクオブジェクトによって消費されるメモリが増加します。減らすと、バッファリングされたタスクが制限され、スレッドプール動作に応じて、より早いバックプレッシャまたはタスク拒否が発生する可能性があります。利用可能なメモリと予想される同時フラッシュワークロードに応じて調整してください。 -- Introduced in: - +- デフォルト: 1048576 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: 共有データ(クラウドネイティブ/Lake)モードで使用されるプライマリキーインデックスMemTableフラッシュスレッドプールの最大キューサイズ(保留中のタスク数)を制御します。スレッドプールはExecEnvで"cloud_native_pk_index_flush"として作成され、その最大スレッド数は`pk_index_memtable_flush_threadpool_max_threads`によって管理されます。この値を増やすと、実行前にMemTableフラッシュタスクをより多くバッファリングでき、即時のバックプレッシャを軽減できますが、キューに入れられたタスクオブジェクトによって消費されるメモリが増加します。減らすと、バッファリングされるタスクが制限され、スレッドプールの動作に応じてより早期のバックプレッシャまたはタスク拒否が発生する可能性があります。利用可能なメモリと予想される同時フラッシュワークロードに応じて調整してください。 +- 導入バージョン: - ##### pk_index_memtable_max_count -- Default: 2 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: 共有データクラスターのプライマリキーインデックスのMemTablesの最大数。 -- Introduced in: - +- デフォルト: 2 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: 共有データクラスターのプライマリキーインデックスのMemTableの最大数。 +- 導入バージョン: - ##### pk_index_memtable_max_wait_flush_timeout_ms -- Default: 30000 -- Type: Int -- Unit: ミリ秒 -- Is mutable: はい -- Description: 共有データクラスターのプライマリキーインデックスMemTableフラッシュ完了待機時間の上限。すべてのMemTablesを同期的にフラッシュする場合(例えば、SST操作の取り込み前)、システムはこのタイムアウトまで待機します。デフォルトは30秒です。 -- Introduced in: - +- デフォルト: 30000 +- タイプ: Int +- 単位: ミリ秒 +- 変更可能: はい +- 説明: 共有データクラスターでプライマリキーインデックスMemTableフラッシュの完了を待機する最大タイムアウト。すべてのMemTableを同期的にフラッシュする際(例えば、Ingest SST操作の前など)、システムはこのタイムアウトまで待機します。デフォルトは30秒です。 +- 導入バージョン: - ##### pk_index_parallel_compaction_task_split_threshold_bytes -- Default: 33554432 -- Type: Int -- Unit: バイト -- Is mutable: はい -- Description: プライマリキーインデックスコンパクションタスクの分割閾値。タスクに関与するファイルの合計サイズがこの閾値よりも小さい場合、タスクは分割されません。 -- Introduced in: - +- デフォルト: 33554432 +- タイプ: Int +- 単位: バイト +- 変更可能: はい +- 説明: プライマリキーインデックスコンパクションタスクの分割閾値。タスクに関与するファイルの合計サイズがこの閾値より小さい場合、タスクは分割されません。 +- 導入バージョン: - ##### pk_index_parallel_compaction_threadpool_max_threads -- Default: 0 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: 共有データクラスターのクラウドネイティブプライマリキーインデックス並列コンパクション用スレッドプールの最大スレッド数。`0` はCPUコア数の半分に自動設定されることを意味します。 -- Introduced in: - +- デフォルト: 0 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: 共有データクラスターのクラウドネイティブプライマリキーインデックス並列コンパクション用スレッドプールの最大スレッド数。`0`はCPUコア数の半分に自動設定されることを意味します。 +- 導入バージョン: - ##### pk_index_parallel_compaction_threadpool_size -- Default: 1048576 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: 共有データモードのクラウドネイティブプライマリキーインデックス並列コンパクションで使用されるスレッドプールの最大キューサイズ(保留中のタスク数)。この設定は、スレッドプールが新しい送信を拒否するまでにキューに入れることができるコンパクションタスクの数を制御します。実効的な並列処理は `pk_index_parallel_compaction_threadpool_max_threads` によって制限されます。多くの同時コンパクションタスクが予想される場合にタスクの拒否を避けるにはこの値を増やしますが、キューが大きくなると、キューに入っている作業のメモリとレイテンシが増加することに注意してください。 -- Introduced in: - +- デフォルト: 1048576 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: 共有データモードのクラウドネイティブプライマリキーインデックス並列コンパクションで使用されるスレッドプールの最大キューサイズ(保留中タスク数)。この設定は、スレッドプールが新しいサブミッションを拒否するまでにキューに入れられるコンパクションタスクの数を制御します。実効並列処理は`pk_index_parallel_compaction_threadpool_max_threads`によって制限されます。多くの同時コンパクションタスクが予想される場合に、タスク拒否を避けるためにこの値を増やしますが、キューが大きくなると、キューに入れられた作業のメモリとレイテンシが増加することに注意してください。 +- 導入バージョン: - ##### pk_index_parallel_execution_min_rows -- Default: 16384 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: 共有データクラスターのプライマリキーインデックス操作で並列実行を有効にするための最小行閾値。 -- Introduced in: - +- デフォルト: 16384 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: 共有データクラスターのプライマリキーインデックス操作で並列実行を有効にする最小行数閾値。 +- 導入バージョン: - ##### pk_index_parallel_execution_threadpool_max_threads -- Default: 0 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: 
共有データクラスターのプライマリキーインデックス並列実行用スレッドプールの最大スレッド数。`0` はCPUコア数の半分に自動設定されることを意味します。
-- Introduced in: -
+- デフォルト: 0
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: 共有データクラスターのプライマリキーインデックス並列実行用スレッドプールの最大スレッド数。`0`はCPUコア数の半分に自動設定されることを意味します。
+- 導入バージョン: -

 ##### pk_index_size_tiered_level_multiplier

-- Default: 10
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: プライマリキーインデックスのサイズ階層型コンパクション戦略のレベル乗数パラメータ。
-- Introduced in: -
+- デフォルト: 10
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: プライマリキーインデックスサイズ階層型コンパクション戦略のレベル乗数パラメーター。
+- 導入バージョン: -

 ##### pk_index_size_tiered_max_level

-- Default: 5
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: プライマリキーインデックスのサイズ階層型コンパクション戦略の最大レベル。
-- Introduced in: -
+- デフォルト: 5
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: プライマリキーインデックスサイズ階層型コンパクション戦略の最大レベル。
+- 導入バージョン: -

 ##### pk_index_size_tiered_min_level_size

-- Default: 131072
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: プライマリキーインデックスのサイズ階層型コンパクション戦略の最小レベル。
-- Introduced in: -
+- デフォルト: 131072
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: プライマリキーインデックスサイズ階層型コンパクション戦略の最小レベル。
+- 導入バージョン: -

 ##### pk_index_sstable_sample_interval_bytes

-- Default: 16777216
-- Type: Int
-- Unit: バイト
-- Is mutable: はい
-- Description: 共有データクラスターのSSTableファイルのサンプリング間隔サイズ。SSTableファイルのサイズがこの閾値を超えると、システムはこの間隔でSSTableからキーをサンプリングして、コンパクションタスクの境界パーティションを最適化します。この閾値よりも小さいSSTableの場合、開始キーのみが境界キーとして使用されます。デフォルトは16MBです。
-- Introduced in: -
+- デフォルト: 16777216
+- タイプ: Int
+- 単位: バイト
+- 変更可能: はい
+- 説明: 共有データクラスター内のSSTableファイルのサンプリング間隔サイズ。SSTableファイルのサイズがこの閾値を超えると、システムはコンパクションタスクの境界パーティショニングを最適化するために、この間隔でSSTableからキーをサンプリングします。この閾値より小さいSSTableの場合、開始キーのみが境界キーとして使用されます。デフォルトは16 MBです。
+- 導入バージョン: -

 ##### pk_index_target_file_size

-- Default: 67108864
-- Type: Int
-- Unit: バイト
-- Is mutable: はい
-- Description: 共有データクラスターのプライマリキーインデックスのターゲットファイルサイズ。
-- Introduced in: -
+- デフォルト: 67108864
+- タイプ: Int
+- 単位: バイト
+- 変更可能: はい
+- 説明: 共有データクラスターのプライマリキーインデックスのターゲットファイルサイズ。
+- 導入バージョン: -

 ##### pk_index_eager_build_threshold_bytes

-- Default: 104857600
-- Type: Int
-- Unit: バイト
-- Is mutable: はい
-- Description: `enable_pk_index_eager_build` がtrueに設定されている場合、インポートまたはコンパクション中に生成されたデータがこの閾値を超えた場合にのみ、システムはPKインデックスファイルを積極的に構築します。デフォルトは100MBです。
-- Introduced in: -
+- デフォルト: 104857600
+- タイプ: Int
+- 単位: バイト
+- 変更可能: はい
+- 説明: `enable_pk_index_eager_build`がtrueに設定されている場合、インポートまたはコンパクション中に生成されたデータがこの閾値を超えた場合にのみ、システムは積極的にPKインデックスファイルを構築します。デフォルトは100MBです。
+- 導入バージョン: -

 ##### primary_key_limit_size

-- Default: 128
-- Type: Int
-- Unit: バイト
-- Is mutable: はい
-- Description: プライマリキーテーブルのキー列の最大サイズ。
-- Introduced in: v2.5
+- デフォルト: 128
+- タイプ: Int
+- 単位: バイト
+- 変更可能: はい
+- 説明: プライマリキーテーブルのキーカラムの最大サイズ。
+- 導入バージョン: v2.5

 ##### release_snapshot_worker_count

-- Default: 5
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: BEノード上のスナップショット解放タスクの最大スレッド数。
-- Introduced in: -
+- デフォルト: 5
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: BEノードでのスナップショット解放タスクの最大スレッド数。
+- 導入バージョン: -

 ##### repair_compaction_interval_seconds

-- Default: 600
-- Type: Int
-- Unit: 秒
-- Is mutable: はい
-- Description: Repair Compactionスレッドのポーリング時間間隔。
-- Introduced in: -
+- デフォルト: 600
+- タイプ: Int
+- 単位: 秒
+- 変更可能: はい
+- 説明: リペアコンパクションスレッドのポーリング時間間隔。
+- 導入バージョン: -

 ##### replication_max_speed_limit_kbps

-- Default: 50000
-- Type: Int
-- Unit: KB/秒
-- Is mutable: はい
-- Description: 各レプリケーションスレッドの最大速度。
-- Introduced in: v3.3.5
+- デフォルト: 50000
+- タイプ: Int
+- 単位: KB/秒
+- 変更可能: はい
+- 説明: 各レプリケーションスレッドの最大速度。
+- 導入バージョン: v3.3.5
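+
+帯域が限られた環境では、これらのレプリケーション速度の項目を実行時に調整できます。以下は一例のスケッチです(ホスト名・ポート・値は例示用)。
+
+```Bash
+# レプリケーションの最大速度を約 20 MB/秒 に抑える
+curl -XPOST "http://be_host:8040/api/update_config?replication_max_speed_limit_kbps=20000"
+```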
 ##### replication_min_speed_limit_kbps

-- Default: 50
-- Type: Int
-- Unit: KB/秒
-- Is mutable: はい
-- Description: 各レプリケーションスレッドの最小速度。
-- Introduced in: v3.3.5
+- デフォルト: 50
+- タイプ: Int
+- 単位: KB/秒
+- 変更可能: はい
+- 説明: 各レプリケーションスレッドの最小速度。
+- 導入バージョン: v3.3.5

 ##### replication_min_speed_time_seconds

-- Default: 300
-- Type: Int
-- Unit: 秒
-- Is mutable: はい
-- Description: レプリケーションスレッドが最小速度を下回ることを許容される時間。実際の速度が `replication_min_speed_limit_kbps` より低い時間がこの値を超えると、レプリケーションは失敗します。
-- Introduced in: v3.3.5
+- デフォルト: 300
+- タイプ: Int
+- 単位: 秒
+- 変更可能: はい
+- 説明: レプリケーションスレッドが最小速度を下回っても許容される時間。実際の速度が`replication_min_speed_limit_kbps`を下回る時間がこの値を超えると、レプリケーションは失敗します。
+- 導入バージョン: v3.3.5

 ##### replication_threads

-- Default: 0
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: レプリケーションに使用される最大スレッド数。`0` はスレッド数をBE CPUコア数の4倍に設定することを示します。
-- Introduced in: v3.3.5
+- デフォルト: 0
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: レプリケーションに使用される最大スレッド数。`0`は、BEのCPUコア数の4倍のスレッド数に設定することを示します。
+- 導入バージョン: v3.3.5

 ##### size_tiered_level_multiple

-- Default: 5
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: サイズ階層型コンパクションポリシーにおける2つの連続するレベル間のデータサイズの倍数。
-- Introduced in: -
+- デフォルト: 5
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: サイズ階層型コンパクションポリシーにおける2つの連続するレベル間のデータサイズの倍数。
+- 導入バージョン: -

 ##### size_tiered_level_multiple_dupkey

-- Default: 10
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: サイズ階層型コンパクションポリシーにおける、Duplicate Keyテーブルの2つの隣接するレベル間のデータ量の差の倍数。
-- Introduced in: -
+- デフォルト: 10
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: サイズ階層型コンパクションポリシーにおいて、重複キーテーブルの2つの隣接するレベル間のデータ量差の倍数。
+- 導入バージョン: -

 ##### size_tiered_level_num

-- Default: 7
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: サイズ階層型コンパクションポリシーのレベル数。各レベルには最大1つのrowsetが予約されます。したがって、安定した状態では、この構成項目で指定されたレベル数と同じ数のrowsetが最大で存在します。
-- Introduced in: -
+- デフォルト: 7
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: サイズ階層型コンパクションポリシーのレベル数。各レベルに最大1つのロウセットが予約されます。したがって、安定した状態では、この設定項目で指定されたレベル数と同じ数のロウセットが最大で存在します。
+- 導入バージョン: -

 ##### size_tiered_max_compaction_level

-- Default: 3
-- Type: Int
-- Unit: レベル
-- Is mutable: はい
-- Description: 単一のプライマリキーリアルタイムコンパクションタスクにマージできるサイズ階層レベルの数を制限します。PKサイズ階層型コンパクションの選択中、StarRocksはサイズによってrowsetの順序付けられた「レベル」を構築し、この制限に達するまで連続するレベルを選択されたコンパクション入力に追加します(コードは `compaction_level <= size_tiered_max_compaction_level` を使用します)。この値は含まれ、マージされた異なるサイズ階層の数をカウントします(最上位レベルは1としてカウントされます)。PKサイズ階層型コンパクション戦略が有効な場合にのみ有効です。これを上げると、コンパクションタスクにさらに多くのレベルを含めることができます(より大きく、I/OとCPUを多用するマージ、潜在的により高い書き込み増幅)。一方、下げるとマージが制限され、タスクサイズとリソース使用量が削減されます。
-- Introduced in: v4.0.0
+- デフォルト: 3
+- タイプ: Int
+- 単位: レベル
+- 変更可能: はい
+- 説明: 単一のプライマリキーリアルタイムコンパクションタスクにマージできるサイズ階層レベルの数を制限します。PKサイズ階層型コンパクション選択中、StarRocksはサイズ順にロウセットの「レベル」を構築し、この制限に達するまで連続するレベルを選択されたコンパクション入力に追加します(コードは`compaction_level <= size_tiered_max_compaction_level`を使用)。値は包括的であり、マージされた異なるサイズ層の数を数えます(最上位レベルは1とカウントされます)。PKサイズ階層型コンパクション戦略が有効な場合にのみ有効です。値を上げると、コンパクションタスクにより多くのレベルを含めることができ(より大きく、I/OとCPUを多く消費するマージ、潜在的に高い書き込み増幅)、減らすとマージを制限し、タスクサイズとリソース使用量を削減します。
+- 導入バージョン: v4.0.0

 ##### size_tiered_min_level_size

-- Default: 131072
-- Type: Int
-- Unit: バイト
-- Is mutable: はい
-- Description: サイズ階層型コンパクションポリシーにおける最小レベルのデータサイズ。この値よりも小さいrowsetは、直ちにデータコンパクションをトリガーします。
-- Introduced in: -
+- デフォルト: 131072
+- タイプ: Int
+- 単位: バイト
+- 変更可能: はい
+- 説明: サイズ階層型コンパクションポリシーにおける最小レベルのデータサイズ。この値よりも小さいロウセットは、直ちにデータコンパクションをトリガーします。
+- 導入バージョン: -

 ##### small_dictionary_page_size

-- Default: 4096
-- Type: Int
-- Unit: バイト
-- Is mutable: いいえ
-- Description: `BinaryPlainPageDecoder` が辞書(バイナリ/プレーン)ページを積極的に解析するかどうかを決定する閾値(バイト単位)。ページのエンコードサイズが `small_dictionary_page_size` 
未満の場合、デコーダーはすべての文字列エントリをインメモリベクトル(`_parsed_datas`)に事前に解析し、ランダムアクセスとバッチ読み取りを高速化します。この値を上げると、より多くのページが事前に解析されます(アクセスごとのデコードオーバーヘッドを削減し、より大きな辞書の実効圧縮率を向上させる可能性があります)が、メモリ使用量と解析に費やされるCPUが増加します。過度に大きな値は全体的なパフォーマンスを低下させる可能性があります。メモリとアクセスレイテンシのトレードオフを測定した後でのみ調整してください。 -- Introduced in: v3.4.1, v3.5.0 +- デフォルト: 4096 +- タイプ: Int +- 単位: バイト +- 変更可能: いいえ +- 説明: `BinaryPlainPageDecoder`が辞書(バイナリ/プレーン)ページを積極的に解析するかどうかを決定するために使用する閾値(バイト単位)。ページのエンコードサイズが`small_dictionary_page_size`未満の場合、デコーダーはすべての文字列エントリをインメモリベクトル(`_parsed_datas`)に事前に解析し、ランダムアクセスとバッチ読み取りを高速化します。この値を上げると、より多くのページが事前解析されます(アクセスあたりのデコードオーバーヘッドを削減し、より大きな辞書で実効圧縮を向上させる可能性があります)が、メモリ使用量と解析に費やされるCPUが増加します。過度に大きな値は全体的なパフォーマンスを低下させる可能性があります。メモリとアクセスレイテンシのトレードオフを測定した後で調整してください。 +- 導入バージョン: v3.4.1, v3.5.0 ##### snapshot_expire_time_sec -- Default: 172800 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: スナップショットファイルの有効期限。 -- Introduced in: - +- デフォルト: 172800 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: スナップショットファイルの有効期限。 +- 導入バージョン: - ##### stale_memtable_flush_time_sec -- Default: 0 -- Type: long -- Unit: 秒 -- Is mutable: はい -- Description: 送信ジョブのメモリ使用量が高い場合、`stale_memtable_flush_time_sec` 秒よりも長く更新されていないメンテーブルは、メモリ負荷を軽減するためにフラッシュされます。この動作は、メモリ制限が近づいている場合(`limit_exceeded_by_ratio(70)` 以上)にのみ考慮されます。`LocalTabletsChannel` では、非常に高いメモリ使用量(`limit_exceeded_by_ratio(95)`)の場合、追加パスでサイズが `write_buffer_size / 4` を超えるメンテーブルをフラッシュする可能性があります。値 `0` は、この年齢ベースの古いメンテーブルフラッシングを無効にします(不変パーティションのメンテーブルは、アイドル状態または高メモリ時にすぐにフラッシュされます)。 -- Introduced in: v3.2.0 +- デフォルト: 0 +- タイプ: long +- 単位: 秒 +- 変更可能: はい +- 説明: 送信側ジョブのメモリ使用量が高い場合、`stale_memtable_flush_time_sec`秒以上更新されていないMemTableは、メモリ負荷を軽減するためにフラッシュされます。この動作は、メモリ制限が近づいている場合(`limit_exceeded_by_ratio(70)`以上)にのみ考慮されます。`LocalTabletsChannel`では、非常に高いメモリ使用量(`limit_exceeded_by_ratio(95)`)の場合に、サイズが`write_buffer_size / 4`を超えるMemTableがフラッシュされる追加のパスが存在する場合があります。`0`の値は、この期間ベースの古いMemTableのフラッシュを無効にします(不変パーティションのMemTableは、アイドル時または高メモリ時でも直ちにフラッシュされます)。 +- 導入バージョン: v3.2.0 ##### storage_flood_stage_left_capacity_bytes -- Default: 107374182400 -- Type: Int -- Unit: バイト -- Is mutable: はい -- Description: すべてのBEディレクトリに残っているストレージスペースのハードリミット。BEストレージディレクトリの残りのストレージスペースがこの値よりも少なく、ストレージ使用量(パーセンテージ)が `storage_flood_stage_usage_percent` を超えている場合、ロードジョブとリストアジョブは拒否されます。構成を有効にするには、FE構成項目 `storage_usage_hard_limit_reserve_bytes` と一緒にこの項目を設定する必要があります。 -- Introduced in: - +- デフォルト: 107374182400 +- タイプ: Int +- 単位: バイト +- 変更可能: はい +- 説明: 全BEディレクトリの残りのストレージスペースのハードリミット。BEストレージディレクトリの残りのストレージスペースがこの値よりも小さく、かつストレージ使用率(パーセンテージ)が`storage_flood_stage_usage_percent`を超えた場合、LoadおよびRestoreジョブは拒否されます。この項目をFE設定項目`storage_usage_hard_limit_reserve_bytes`と一緒に設定することで、設定が有効になります。 +- 導入バージョン: - ##### storage_flood_stage_usage_percent -- Default: 95 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: すべてのBEディレクトリのストレージ使用率(パーセンテージ)のハードリミット。BEストレージディレクトリのストレージ使用率(パーセンテージ)がこの値を超え、残りのストレージスペースが `storage_flood_stage_left_capacity_bytes` 未満の場合、ロードジョブとリストアジョブは拒否されます。構成を有効にするには、FE構成項目 `storage_usage_hard_limit_percent` と一緒にこの項目を設定する必要があります。 -- Introduced in: - +- デフォルト: 95 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: 全BEディレクトリのストレージ使用率(パーセンテージ)のハードリミット。BEストレージディレクトリのストレージ使用率(パーセンテージ)がこの値を超え、かつ残りのストレージスペースが`storage_flood_stage_left_capacity_bytes`よりも小さい場合、LoadおよびRestoreジョブは拒否されます。この項目をFE設定項目`storage_usage_hard_limit_percent`と一緒に設定することで、設定が有効になります。 +- 導入バージョン: - ##### storage_high_usage_disk_protect_ratio -- Default: 0.1 -- Type: double -- Unit: - -- Is mutable: はい -- Description: タブレット作成のストレージルートを選択する際、`StorageEngine` は候補ディスクを `disk_usage(0)` でソートし、平均使用量を計算します。使用量が (平均使用量 + 
`storage_high_usage_disk_protect_ratio`) より大きいディスクは、優先選択プールから除外されます(ランダム化された優先シャッフルには参加せず、したがって初期選択から延期されます)。この保護を無効にするには0に設定します。値は分数です(一般的な範囲は0.0〜1.0)。値が大きいほど、スケジューラは平均よりも高いディスクに対して寛容になります。 -- Introduced in: v3.2.0 +- デフォルト: 0.1 +- タイプ: double +- 単位: - +- 変更可能: はい +- 説明: タブレット作成のためにストレージルートを選択する際、StorageEngineは候補ディスクを`disk_usage(0)`でソートし、平均使用量を計算します。使用量が(平均使用量 + `storage_high_usage_disk_protect_ratio`)より大きいディスクは、優先選択プールから除外されます(ランダムな優先シャッフルに参加せず、したがって初期選択から除外されます)。この保護を無効にするには0に設定します。値は小数です(一般的な範囲は0.0-1.0)。値が大きいほど、スケジューラは平均よりも高いディスクに対してより寛容になります。 +- 導入バージョン: v3.2.0 ##### storage_medium_migrate_count -- Default: 3 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: ストレージメディアの移行(SATAからSSDへ)に使用されるスレッド数。 -- Introduced in: - +- デフォルト: 3 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: ストレージメディアの移行(SATAからSSDへ)に使用されるスレッド数。 +- 導入バージョン: - ##### storage_root_path -- Default: `${STARROCKS_HOME}/storage` -- Type: String -- Unit: - -- Is mutable: いいえ -- Description: ストレージボリュームのディレクトリとメディア。例:`/data1,medium:hdd;/data2,medium:ssd`。 +- デフォルト: `${STARROCKS_HOME}/storage` +- タイプ: String +- 単位: - +- 変更可能: いいえ +- 説明: ストレージボリュームのディレクトリとメディア。例: `/data1,medium:hdd;/data2,medium:ssd`。 - 複数のボリュームはセミコロン(`;`)で区切られます。 - - ストレージメディアがSSDの場合、ディレクトリの最後に `,medium:ssd` を追加します。 - - ストレージメディアがHDDの場合、ディレクトリの最後に `,medium:hdd` を追加します。 -- Introduced in: - + - ストレージメディアがSSDの場合は、ディレクトリの末尾に`,medium:ssd`を追加します。 + - ストレージメディアがHDDの場合は、ディレクトリの末尾に`,medium:hdd`を追加します。 +- 導入バージョン: - ##### sync_tablet_meta -- Default: false -- Type: Boolean -- Unit: - -- Is mutable: はい -- Description: タブレットメタデータの同期を有効にするかどうかを制御するブール値。`true` は同期を有効にすることを示し、`false` は無効にすることを示します。 -- Introduced in: - +- デフォルト: false +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: タブレットメタデータの同期を有効にするかどうかを制御するブール値。`true`は同期を有効にすることを示し、`false`は無効にすることを示します。 +- 導入バージョン: - ##### tablet_map_shard_size -- Default: 1024 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: タブレットマップのシャードサイズ。値は2の累乗である必要があります。 -- Introduced in: - +- デフォルト: 1024 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: タブレットマップのシャードサイズ。値は2の累乗でなければなりません。 +- 導入バージョン: - ##### tablet_max_pending_versions -- Default: 1000 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: プライマリキータブレットで許容される保留バージョンの最大数。保留バージョンとは、コミットされたがまだ適用されていないバージョンを指します。 -- Introduced in: - +- デフォルト: 1000 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: プライマリキータブレットで許容できる保留中バージョンの最大数。保留中バージョンとは、コミットされたがまだ適用されていないバージョンを指します。 +- 導入バージョン: - ##### tablet_max_versions -- Default: 1000 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: タブレットに許可される最大バージョン数。バージョン数がこの値を超えると、新しい書き込みリクエストは失敗します。 -- Introduced in: - +- デフォルト: 1000 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: タブレットで許可される最大バージョン数。バージョン数がこの値を超えると、新しい書き込みリクエストは失敗します。 +- 導入バージョン: - ##### tablet_meta_checkpoint_min_interval_secs -- Default: 600 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: TabletMetaチェックポイントのスレッドポーリング時間間隔。 -- Introduced in: - +- デフォルト: 600 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: TabletMetaチェックポイントのポーリングスレッドの時間間隔。 +- 導入バージョン: - ##### tablet_meta_checkpoint_min_new_rowsets_num -- Default: 10 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: 最後のTabletMetaチェックポイント以降に作成する最小rowset数。 -- Introduced in: - +- デフォルト: 10 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: 前回のTabletMetaチェックポイント以降に作成するロウセットの最小数。 +- 導入バージョン: - ##### tablet_rowset_stale_sweep_time_sec -- Default: 1800 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: タブレット内の古いrowsetをスイープする時間間隔。 -- Introduced in: - +- デフォルト: 1800 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: タブレット内の古いロウセットをスイープする時間間隔。 +- 導入バージョン: - ##### 
tablet_stat_cache_update_interval_second -- Default: 300 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: Tablet Stat Cacheが更新される時間間隔。 -- Introduced in: - +- デフォルト: 300 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: Tablet Stat Cacheが更新される時間間隔。 +- 導入バージョン: - ##### tablet_writer_open_rpc_timeout_sec -- Default: 300 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: リモートBEでタブレットライターを開くRPCのタイムアウト(秒単位)。値はミリ秒に変換され、オープン呼び出しを発行する際のリクエストタイムアウトとbrpcコントロールタイムアウトの両方に適用されます。ランタイムは、実効タイムアウトを `tablet_writer_open_rpc_timeout_sec` と全体的なロードタイムアウトの半分(つまり、min(`tablet_writer_open_rpc_timeout_sec`, `load_timeout_sec` / 2))の最小値として使用します。この値を、タイムリーな障害検出(小さすぎると早期のオープン失敗を引き起こす可能性あり)とBEがライターを初期化するのに十分な時間を与える(大きすぎるとエラー処理が遅れる)バランスをとるように設定してください。 -- Introduced in: v3.2.0 +- デフォルト: 300 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: リモートBEでタブレットライターを開くRPCのタイムアウト(秒単位)。値はミリ秒に変換され、オープン呼び出しを発行する際にリクエストタイムアウトとbrpc制御タイムアウトの両方に適用されます。ランタイムは、実効タイムアウトとして`tablet_writer_open_rpc_timeout_sec`と全体のロードタイムアウトの半分(つまり、`min(tablet_writer_open_rpc_timeout_sec, load_timeout_sec / 2)`)の最小値を使用します。これを、適時の障害検出(小さすぎると時期尚早なオープン失敗を引き起こす可能性があります)と、BEがライターを初期化するのに十分な時間を与える(大きすぎるとエラー処理が遅延する)とのバランスを取るように設定してください。 +- 導入バージョン: v3.2.0 ##### transaction_apply_worker_count -- Default: 0 -- Type: Int -- Unit: スレッド -- Is mutable: はい -- Description: UpdateManagerの「update_apply」スレッドプール(トランザクションのrowsetを適用するプール、特にプライマリキーテーブルの場合)が使用するワーカースレッドの最大数を制御します。`>0` の値は固定された最大スレッド数を設定します。`0`(デフォルト)はプールのサイズがCPUコア数に等しいことを意味します。設定された値は起動時(`UpdateManager::init`)に適用され、`update-config` HTTPアクションを介して実行時に変更でき、プールの最大スレッドを更新します。これを調整して、適用同時実行性(スループット)を向上させるか、CPU/メモリの競合を制限してください。最小スレッド数とアイドルタイムアウトはそれぞれ `transaction_apply_thread_pool_num_min` と `transaction_apply_worker_idle_time_ms` によって制御されます。 -- Introduced in: v3.2.0 +- デフォルト: 0 +- タイプ: Int +- 単位: スレッド +- 変更可能: はい +- 説明: `UpdateManager`の"update_apply"スレッドプール(トランザクションのロウセットを適用するプール、特にプライマリキーテーブルの場合)が使用するワーカー・スレッドの最大数を制御します。`>0`の値は固定の最大スレッド数を設定し、0(デフォルト)はプールのサイズをCPUコア数に等しくします。設定された値は起動時(`UpdateManager::init`)に適用され、`update-config` HTTPアクションを介して実行時に変更できます(プールの最大スレッドが更新されます)。適用並行処理(スループット)を増やすか、CPU/メモリの競合を制限するためにこれを調整してください。最小スレッド数とアイドルタイムアウトは、それぞれ`transaction_apply_thread_pool_num_min`と`transaction_apply_worker_idle_time_ms`によって管理されます。 +- 導入バージョン: v3.2.0 ##### transaction_apply_worker_idle_time_ms -- Default: 500 -- Type: int -- Unit: ミリ秒 -- Is mutable: いいえ -- Description: トランザクション/更新を適用するために使用されるUpdateManagerの「update_apply」スレッドプールのアイドルタイムアウト(ミリ秒単位)を設定します。この値は `MonoDelta::FromMilliseconds` を介して `ThreadPoolBuilder::set_idle_timeout` に渡されるため、このタイムアウトよりも長くアイドル状態が続くワーカースレッドは終了する可能性があります(プールの設定された最小スレッド数と最大スレッド数に従います)。値が低いほどリソースを早く解放しますが、バースト負荷の下ではスレッド作成/破棄のオーバーヘッドが増加します。値が高いほど、ベースラインのリソース使用量が増加するコストで、短時間のバーストの間はワーカーをウォームに保ちます。 -- Introduced in: v3.2.11 +- デフォルト: 500 +- タイプ: int +- 単位: ミリ秒 +- 変更可能: いいえ +- 説明: トランザクション/更新の適用に使用される`UpdateManager`の"update_apply"スレッドプールのアイドルタイムアウト(ミリ秒単位)を設定します。この値は`MonoDelta::FromMilliseconds`を介して`ThreadPoolBuilder::set_idle_timeout`に渡されるため、このタイムアウトよりも長くアイドル状態のスレッドは終了される場合があります(プールの設定された最小スレッド数と最大スレッド数による)。値が低いほどリソースの解放が速くなりますが、バースト負荷下でのスレッド作成/破棄オーバーヘッドが増加します。値が高いほど、短時間のバーストに対してワーカーはアクティブな状態を維持しますが、ベースラインのリソース使用量が増加します。 +- 導入バージョン: v3.2.11 ##### trash_file_expire_time_sec -- Default: 86400 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: ゴミファイルをクリーンアップする時間間隔。v2.5.17、v3.0.9、v3.1.6以降、デフォルト値は259,200から86,400に変更されました。 -- Introduced in: - +- デフォルト: 86400 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: ゴミファイルをクリーンアップする時間間隔。v2.5.17、v3.0.9、v3.1.6以降、デフォルト値は259,200から86,400に変更されました。 +- 導入バージョン: - ##### 
unused_rowset_monitor_interval -- Default: 30 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: 期限切れのrowsetをクリーンアップする時間間隔。 -- Introduced in: - +- デフォルト: 30 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: 期限切れのロウセットをクリーンアップする時間間隔。 +- 導入バージョン: - ##### update_cache_expire_sec -- Default: 360 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: Update Cacheの有効期限。 -- Introduced in: - +- デフォルト: 360 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: Update Cacheの有効期限。 +- 導入バージョン: - ##### update_compaction_check_interval_seconds -- Default: 10 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: プライマリキーテーブルのコンパクションをチェックする時間間隔。 -- Introduced in: - +- デフォルト: 10 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: プライマリキーテーブルのコンパクションをチェックする時間間隔。 +- 導入バージョン: - ##### update_compaction_delvec_file_io_amp_ratio -- Default: 2 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: プライマリキーテーブルのDelvecファイルを含むrowsetのコンパクション優先度を制御するために使用されます。値が大きいほど優先度が高くなります。 -- Introduced in: - +- デフォルト: 2 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: プライマリキーテーブルのDelvecファイルを含むロウセットのコンパクション優先度を制御するために使用されます。値が大きいほど優先度が高くなります。 +- 導入バージョン: - ##### update_compaction_num_threads_per_disk -- Default: 1 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: プライマリキーテーブルのディスクごとのコンパクションスレッド数。 -- Introduced in: - +- デフォルト: 1 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: プライマリキーテーブルのディスクごとのコンパクションスレッド数。 +- 導入バージョン: - ##### update_compaction_per_tablet_min_interval_seconds -- Default: 120 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: プライマリキーテーブルの各タブレットでコンパクションがトリガーされる最小時間間隔。 -- Introduced in: - +- デフォルト: 120 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: プライマリキーテーブルの各タブレットでコンパクションがトリガーされる最小時間間隔。 +- 導入バージョン: - ##### update_compaction_ratio_threshold -- Default: 0.5 -- Type: Double -- Unit: - -- Is mutable: はい -- Description: 共有データクラスターのプライマリキーテーブルのコンパクションがマージできるデータの最大割合。単一のタブレットが過度に大きくなる場合、この値を縮小することを推奨します。 -- Introduced in: v3.1.5 +- デフォルト: 0.5 +- タイプ: Double +- 単位: - +- 変更可能: はい +- 説明: 共有データクラスター内のプライマリキーテーブルのコンパクションがマージできるデータの最大割合。単一のタブレットが過度に大きくなる場合は、この値を縮小することを推奨します。 +- 導入バージョン: v3.1.5 ##### update_compaction_result_bytes -- Default: 1073741824 -- Type: Int -- Unit: バイト -- Is mutable: はい -- Description: プライマリキーテーブルの単一コンパクションの最大結果サイズ。 -- Introduced in: - +- デフォルト: 1073741824 +- タイプ: Int +- 単位: バイト +- 変更可能: はい +- 説明: プライマリキーテーブルの単一コンパクションの最大結果サイズ。 +- 導入バージョン: - ##### update_compaction_size_threshold -- Default: 268435456 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: プライマリキーテーブルのコンパクションスコアはファイルサイズに基づいて計算され、他のテーブルタイプとは異なります。このパラメータは、プライマリキーテーブルのコンパクションスコアを他のテーブルタイプと同様にし、ユーザーが理解しやすくするために使用できます。 -- Introduced in: - +- デフォルト: 268435456 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: プライマリキーテーブルのコンパクションスコアはファイルサイズに基づいて計算され、他のテーブルタイプとは異なります。このパラメーターを使用すると、プライマリキーテーブルのコンパクションスコアを他のテーブルタイプと同様にし、ユーザーが理解しやすくすることができます。 +- 導入バージョン: - ##### upload_worker_count -- Default: 0 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: BEノード上のバックアップジョブのアップロードタスクの最大スレッド数。`0` は、BEが稼働しているマシンのCPUコア数に値を設定することを示します。 -- Introduced in: - +- デフォルト: 0 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: BEノードでのバックアップジョブのアップロードタスクの最大スレッド数。`0`は、BEが稼働しているマシン上のCPUコア数に設定することを示します。 +- 導入バージョン: - ##### vertical_compaction_max_columns_per_group -- Default: 5 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: 垂直コンパクションのグループあたりの最大列数。 -- Introduced in: - +- デフォルト: 5 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: 垂直コンパクションのグループあたりの最大カラム数。 +- 導入バージョン: - ### 共有データ ##### download_buffer_size -- Default: 4194304 -- Type: Int -- Unit: バイト -- Is mutable: はい -- Description: 
スナップショットファイルをダウンロードする際に使用されるインメモリコピーバッファのサイズ(バイト単位)。`SnapshotLoader::download` はこの値を `fs::copy` に転送ごとのチャンクサイズとして渡し、リモートのシーケンシャルファイルからローカルの書き込み可能ファイルに読み込む際に使用します。値が大きいほど、システムコール/I/Oオーバーヘッドが減少するため、高帯域幅リンクでのスループットが向上する可能性があります。値が小さいほど、アクティブな転送ごとのピークメモリ使用量が減少します。注:このパラメータはストリームごとのバッファサイズを制御し、ダウンロードスレッドの数は制御しません。総メモリ消費量 = `download_buffer_size` * `number_of_concurrent_downloads`。 -- Introduced in: v3.2.13 +- デフォルト: 4194304 +- タイプ: Int +- 単位: バイト +- 変更可能: はい +- 説明: スナップショットファイルのダウンロード時に使用されるインメモリコピーバッファのサイズ(バイト単位)。`SnapshotLoader::download`は、リモートシーケンシャルファイルからローカル書き込み可能ファイルに読み込む際に、この値を`fs::copy`に転送ごとのチャンクサイズとして渡します。値が大きいほど、システムコール/I/Oオーバーヘッドを削減することで高帯域幅リンクのスループットが向上する可能性があります。値が小さいほど、アクティブな転送ごとのピークメモリ使用量が減少します。注:このパラメーターはストリームごとのバッファサイズを制御し、ダウンロードスレッド数は制御しません。総メモリ消費量 = `download_buffer_size * 同時ダウンロード数`。 +- 導入バージョン: v3.2.13 ##### graceful_exit_wait_for_frontend_heartbeat -- Default: false -- Type: Boolean -- Unit: - -- Is mutable: はい -- Description: グレースフルシャットダウンを完了する前に、SHUTDOWNステータスを示すフロントエンドハートビート応答を少なくとも1つ待機するかどうかを決定します。有効にすると、グレースフルシャットダウンプロセスは、ハートビートRPCを介してSHUTDOWN確認が応答されるまでアクティブなままになり、フロントエンドが2つの通常のハートビート間隔の間で終了状態を検出するのに十分な時間を確保します。 -- Introduced in: v3.4.5 +- デフォルト: false +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: グレースフル終了を完了する前に、SHUTDOWNステータスを示すフロントエンドハートビート応答を少なくとも1つ待機するかどうかを決定します。有効にすると、グレースフルシャットダウンプロセスは、ハートビートRPCを介してSHUTDOWN確認が応答されるまでアクティブなままになり、フロントエンドが2つの通常のハートビート間隔で終了状態を検出するのに十分な時間を確保します。 +- 導入バージョン: v3.4.5 ##### lake_compaction_stream_buffer_size_bytes -- Default: 1048576 -- Type: Int -- Unit: バイト -- Is mutable: はい -- Description: 共有データクラスターのクラウドネイティブテーブルコンパクション用のリーダーのリモートI/Oバッファサイズ。デフォルト値は1MBです。この値を増やすとコンパクションプロセスを高速化できます。 -- Introduced in: v3.2.3 +- デフォルト: 1048576 +- タイプ: Int +- 単位: バイト +- 変更可能: はい +- 説明: 共有データクラスター内のクラウドネイティブテーブルコンパクション用のリーダーのリモートI/Oバッファサイズ。デフォルト値は1MBです。この値を増やすことでコンパクションプロセスを高速化できます。 +- 導入バージョン: v3.2.3 ##### lake_pk_compaction_max_input_rowsets -- Default: 500 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: 共有データクラスターのプライマリキーテーブルのコンパクションタスクで許可される入力rowsetの最大数。このパラメータのデフォルト値は、v3.2.4およびv3.1.10以降 `5` から `1000` に、v3.3.1およびv3.2.9以降 `500` に変更されました。プライマリキーテーブルのサイズ階層型コンパクションポリシーが有効になった後(`enable_pk_size_tiered_compaction_strategy` を `true` に設定することで)、StarRocksは書き込み増幅を減らすために各コンパクションのrowset数を制限する必要がありません。したがって、このパラメータのデフォルト値は増加しています。 -- Introduced in: v3.1.8, v3.2.3 +- デフォルト: 500 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: 共有データクラスター内のプライマリキーテーブルのコンパクションタスクで許可される入力ロウセットの最大数。このパラメーターのデフォルト値は、v3.2.4およびv3.1.10以降`5`から`1000`に、v3.3.1およびv3.2.9以降`500`に変更されました。プライマリキーテーブルのSized-tiered Compactionポリシーが有効(`enable_pk_size_tiered_compaction_strategy`を`true`に設定)になると、書き込み増幅を減らすために各コンパクションのロウセット数を制限する必要がなくなります。したがって、このパラメーターのデフォルト値は増加しています。 +- 導入バージョン: v3.1.8, v3.2.3 ##### loop_count_wait_fragments_finish -- Default: 2 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: BE/CNプロセスが終了する際に待機するループの数。各ループは固定された10秒間隔です。ループ待機を無効にするには `0` に設定できます。v3.4以降、この項目は変更可能になり、デフォルト値は `0` から `2` に変更されました。 -- Introduced in: v2.5 +- デフォルト: 2 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: BE/CNプロセスが終了する際に待機するループの回数。各ループは10秒の固定間隔です。ループ待機を無効にするには`0`に設定できます。v3.4以降、この項目は変更可能になり、デフォルト値が`0`から`2`に変更されました。 +- 導入バージョン: v2.5 ##### max_client_cache_size_per_host -- Default: 10 -- Type: Int -- Unit: エントリ (キャッシュされたクライアントインスタンス)/ホスト -- Is mutable: いいえ -- Description: 
BE全体のクライアントキャッシュによって各リモートホストに対して保持されるキャッシュされたクライアントインスタンスの最大数。この単一の設定は、ExecEnv初期化中にBackendServiceClientCache、FrontendServiceClientCache、およびBrokerServiceClientCacheを作成する際に使用されるため、これらのキャッシュ全体でホストごとに保持されるクライアントスタブ/接続の数を制限します。この値を上げると、再接続とスタブ作成のオーバーヘッドが減少しますが、メモリとファイルディスクリプタの使用量が増加します。減らすとリソースは節約されますが、接続のチャーンが増加する可能性があります。値は起動時に読み取られ、実行時に変更することはできません。現在、1つの共有設定ですべてのクライアントキャッシュタイプを制御しています。後でキャッシュごとの個別の設定が導入される可能性があります。 -- Introduced in: v3.2.0 +- デフォルト: 10 +- タイプ: Int +- 単位: ホストあたりエントリ (キャッシュされたクライアントインスタンス) +- 変更可能: いいえ +- 説明: BE全体のクライアントキャッシュによって各リモートホストに対して保持されるキャッシュされたクライアントインスタンスの最大数。この単一の設定は、`ExecEnv`の初期化中に`BackendServiceClientCache`、`FrontendServiceClientCache`、`BrokerServiceClientCache`を作成する際に使用されるため、これらのキャッシュ全体でホストごとに保持されるクライアントスタブ/接続の数を制限します。この値を上げると、再接続とスタブ作成のオーバーヘッドが減少しますが、メモリとファイルディスクリプタの使用量が増加します。減らすとリソースが節約されますが、接続のチャーンが増加する可能性があります。値は起動時に読み取られ、実行時に変更することはできません。現在、1つの共有設定がすべてのクライアントキャッシュタイプを制御しています。後でキャッシュごとの個別の設定が導入される可能性があります。 +- 導入バージョン: v3.2.0 ##### starlet_filesystem_instance_cache_capacity -- Default: 10000 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: Starletファイルシステムインスタンスのキャッシュ容量。 -- Introduced in: v3.2.16, v3.3.11, v3.4.1 +- デフォルト: 10000 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: Starletファイルシステムインスタンスのキャッシュ容量。 +- 導入バージョン: v3.2.16, v3.3.11, v3.4.1 ##### starlet_filesystem_instance_cache_ttl_sec -- Default: 86400 -- Type: Int -- Unit: 秒 -- Is mutable: はい -- Description: Starletファイルシステムインスタンスのキャッシュ有効期限。 -- Introduced in: v3.3.15, 3.4.5 +- デフォルト: 86400 +- タイプ: Int +- 単位: 秒 +- 変更可能: はい +- 説明: Starletファイルシステムインスタンスのキャッシュ有効期限。 +- 導入バージョン: v3.3.15, 3.4.5 ##### starlet_port -- Default: 9070 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: BEおよびCN用の追加のエージェントサービスポート。 -- Introduced in: - +- デフォルト: 9070 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: BEおよびCN用の追加エージェントサービスポート。 +- 導入バージョン: - ##### starlet_star_cache_disk_size_percent -- Default: 80 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: 共有データクラスターでData Cacheが使用できるディスク容量の最大割合。 -- Introduced in: v3.1 +- デフォルト: 80 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: 共有データクラスターでData Cacheが最大で使用できるディスク容量の割合。 +- 導入バージョン: v3.1 ##### starlet_use_star_cache -- Default: false in v3.1 and true from v3.2.3 -- Type: Boolean -- Unit: - -- Is mutable: はい -- Description: 共有データクラスターでData Cacheを有効にするかどうか。`true` はこの機能を有効にし、`false` は無効にすることを示します。v3.2.3以降、デフォルト値は `false` から `true` に変更されました。 -- Introduced in: v3.1 +- デフォルト: v3.1ではfalse、v3.2.3以降はtrue +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: 共有データクラスターでData Cacheを有効にするかどうか。`true`はこの機能を有効にすることを示し、`false`は無効にすることを示します。デフォルト値はv3.2.3以降`false`から`true`に変更されました。 +- 導入バージョン: v3.1 ##### starlet_write_file_with_tag -- Default: false -- Type: Boolean -- Unit: - -- Is mutable: はい -- Description: 共有データクラスターにおいて、オブジェクトストレージに書き込まれるファイルにオブジェクトストレージタグを付けて、カスタムファイル管理を便利にするかどうか。 -- Introduced in: v3.5.3 +- デフォルト: false +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: 共有データクラスターで、オブジェクトストレージに書き込まれるファイルにオブジェクトストレージタグを付けて、カスタムファイル管理を便利にするかどうか。 +- 導入バージョン: v3.5.3 ##### table_schema_service_max_retries -- Default: 3 -- Type: Int -- Unit: - -- Is mutable: はい -- Description: Table Schema Serviceリクエストの最大再試行回数。 -- Introduced in: v4.1 +- デフォルト: 3 +- タイプ: Int +- 単位: - +- 変更可能: はい +- 説明: テーブルスキーマサービスリクエストの最大再試行回数。 +- 導入バージョン: v4.1 ### データレイク ##### datacache_block_buffer_enable -- Default: true -- Type: Boolean -- Unit: - -- Is mutable: いいえ -- Description: Data Cache効率を最適化するためにBlock Bufferを有効にするかどうか。Block Bufferが有効になっている場合、システムはData CacheからBlockデータを読み取り、一時バッファにキャッシュします。これにより、頻繁なキャッシュ読み取りによって引き起こされる余分なオーバーヘッドが削減されます。 -- 
Introduced in: v3.2.0
+- デフォルト: true
+- タイプ: Boolean
+- 単位: -
+- 変更可能: いいえ
+- 説明: Data Cacheの効率を最適化するためにBlock Bufferを有効にするかどうか。Block Bufferが有効な場合、システムはData CacheからBlockデータを読み取り、一時バッファにキャッシュすることで、頻繁なキャッシュ読み取りによる余分なオーバーヘッドを削減します。
+- 導入バージョン: v3.2.0

##### datacache_disk_adjust_interval_seconds

-- Default: 10
-- Type: Int
-- Unit: 秒
-- Is mutable: はい
-- Description: Data Cacheの自動容量スケーリングの間隔。定期的に、システムはキャッシュディスク使用量をチェックし、必要に応じて自動スケーリングをトリガーします。
-- Introduced in: v3.3.0
+- デフォルト: 10
+- タイプ: Int
+- 単位: 秒
+- 変更可能: はい
+- 説明: Data Cacheの自動容量スケーリングの間隔。定期的に、システムはキャッシュディスク使用量をチェックし、必要に応じて自動スケーリングをトリガーします。
+- 導入バージョン: v3.3.0

##### datacache_disk_idle_seconds_for_expansion

-- Default: 7200
-- Type: Int
-- Unit: 秒
-- Is mutable: はい
-- Description: Data Cacheの自動拡張の最小待機時間。ディスク使用量が `datacache_disk_low_level` をこの期間よりも長く下回っている場合にのみ、自動スケーリングアップがトリガーされます。
-- Introduced in: v3.3.0
+- デフォルト: 7200
+- タイプ: Int
+- 単位: 秒
+- 変更可能: はい
+- 説明: Data Cacheの自動拡張の最小待機時間。ディスク使用率が`datacache_disk_low_level`をこの期間以上下回っている場合にのみ、自動スケールアップがトリガーされます。
+- 導入バージョン: v3.3.0

##### datacache_disk_size

-- Default: 0
-- Type: String
-- Unit: -
-- Is mutable: はい
-- Description: 単一ディスクにキャッシュできるデータの最大量。パーセンテージ(例:`80%`)または物理的な制限(例:`2T`、`500G`)として設定できます。たとえば、2つのディスクを使用し、`datacache_disk_size` パラメータの値を `21474836480`(20 GB)に設定した場合、これら2つのディスクに最大40 GBのデータをキャッシュできます。デフォルト値は `0` で、メモリのみを使用してデータをキャッシュすることを示します。
-- Introduced in: -
+- デフォルト: 0
+- タイプ: String
+- 単位: -
+- 変更可能: はい
+- 説明: 単一ディスクにキャッシュできるデータの最大量。パーセンテージ(例:`80%`)または物理的な制限(例:`2T`、`500G`)として設定できます。たとえば、2つのディスクを使用し、`datacache_disk_size`パラメーターの値を`21474836480`(20 GB)に設定した場合、これらの2つのディスクに最大40 GBのデータをキャッシュできます。デフォルト値は`0`で、メモリのみを使用してデータをキャッシュすることを示します。
+- 導入バージョン: -

##### datacache_enable

-- Default: true
-- Type: Boolean
-- Unit: -
-- Is mutable: いいえ
-- Description: Data Cacheを有効にするかどうか。`true` はData Cacheが有効であることを示し、`false` はData Cacheが無効であることを示します。v3.3以降、デフォルト値は `true` に変更されました。
-- Introduced in: -
+- デフォルト: true
+- タイプ: Boolean
+- 単位: -
+- 変更可能: いいえ
+- 説明: Data Cacheを有効にするかどうか。`true`はData Cacheが有効であることを示し、`false`はData Cacheが無効であることを示します。デフォルト値はv3.3以降`true`に変更されました。
+- 導入バージョン: -

##### datacache_eviction_policy

-- Default: slru
-- Type: String
-- Unit: -
-- Is mutable: いいえ
-- Description: Data Cacheの退去ポリシー。有効な値:`lru`(最小最近使用)および `slru`(セグメント化LRU)。
-- Introduced in: v3.4.0
+- デフォルト: slru
+- タイプ: String
+- 単位: -
+- 変更可能: いいえ
+- 説明: Data Cacheの退去ポリシー。有効な値: `lru`(Least Recently Used)および`slru`(セグメント化LRU)。
+- 導入バージョン: v3.4.0

##### datacache_inline_item_count_limit

-- Default: 130172
-- Type: Int
-- Unit: -
-- Is mutable: いいえ
-- Description: Data Cacheのインラインキャッシュ項目の最大数。特に小さなキャッシュブロックの場合、Data Cacheはそれらを `inline` モードで格納し、ブロックデータとメタデータを一緒にメモリにキャッシュします。
-- Introduced in: v3.4.0
+- デフォルト: 130172
+- タイプ: Int
+- 単位: -
+- 変更可能: いいえ
+- 説明: Data Cacheのインラインキャッシュアイテムの最大数。特に小さなキャッシュブロックの場合、Data Cacheはそれらを`inline`モードで格納し、ブロックデータとメタデータを一緒にメモリにキャッシュします。
+- 導入バージョン: v3.4.0

##### datacache_mem_size

-- Default: 0
-- Type: String
-- Unit: -
-- Is mutable: はい
-- Description: メモリにキャッシュできるデータの最大量。パーセンテージ(例:`10%`)または物理的な制限(例:`10G`、`21474836480`)として設定できます。
-- Introduced in: -
+- デフォルト: 0
+- タイプ: String
+- 単位: -
+- 変更可能: はい
+- 説明: メモリにキャッシュできるデータの最大量。パーセンテージ(例:`10%`)または物理的な制限(例:`10G`、`21474836480`)として設定できます。
+- 導入バージョン: -
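+A quick way to confirm how the cache sizing above is applied at runtime is to query each BE's live values. This is a minimal sketch that assumes the `information_schema.be_configs` view available in recent StarRocks releases (column names may vary slightly by version):
+
+```SQL
+-- Inspect the current Data Cache sizing on every BE.
+SELECT BE_ID, NAME, VALUE
+FROM information_schema.be_configs
+WHERE NAME IN ('datacache_enable', 'datacache_mem_size', 'datacache_disk_size');
+```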
##### datacache_min_disk_quota_for_adjustment

-- Default: 10737418240
-- Type: Int
-- Unit: バイト
-- Is mutable: はい
-- Description: Data Cacheの自動スケーリングのための最小有効容量。システムがキャッシュ容量をこの値よりも小さく調整しようとすると、キャッシュ容量は直接 `0` に設定され、キャッシュ容量が不足することによる頻繁なキャッシュの満杯と退去による最適ではないパフォーマンスが防止されます。
-- Introduced in: v3.3.0
+- デフォルト: 10737418240
+- タイプ: Int
+- 単位: バイト
+- 変更可能: はい
+- 説明: Data Cache自動スケーリングの最小有効容量。システムがキャッシュ容量をこの値よりも少なく調整しようとすると、キャッシュ容量は直接`0`に設定され、キャッシュ容量不足による頻繁なキャッシュの満杯と退去に起因する最適でないパフォーマンスを防ぎます。
+- 導入バージョン: v3.3.0

##### disk_high_level

-- Default: 90
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: キャッシュ容量の自動スケーリングアップをトリガーするディスク使用率の上限(パーセンテージ)。ディスク使用率がこの値を超えると、システムは自動的にData Cacheからキャッシュデータを削除します。v3.4.0以降、デフォルト値は `80` から `90` に変更されました。この項目はv4.0以降、`datacache_disk_high_level` から `disk_high_level` に名称変更されました。
-- Introduced in: v3.3.0
+- デフォルト: 90
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: キャッシュ容量の自動スケールアップをトリガーするディスク使用量の上限(パーセンテージ)。ディスク使用量がこの値を超えると、システムはData Cacheからキャッシュデータを自動的に退去させます。v3.4.0以降、デフォルト値は`80`から`90`に変更されました。この項目はv4.0以降`datacache_disk_high_level`から`disk_high_level`に名称変更されました。
+- 導入バージョン: v3.3.0

##### disk_low_level

-- Default: 60
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: キャッシュ容量の自動スケーリングダウンをトリガーするディスク使用率の下限(パーセンテージ)。ディスク使用率が `datacache_disk_idle_seconds_for_expansion` で指定された期間、この値を下回ったままになり、Data Cacheに割り当てられたスペースが完全に利用されている場合、システムは自動的に上限を増やすことでキャッシュ容量を拡張します。この項目はv4.0以降、`datacache_disk_low_level` から `disk_low_level` に名称変更されました。
-- Introduced in: v3.3.0
+- デフォルト: 60
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: キャッシュ容量の自動スケールダウンをトリガーするディスク使用量の下限(パーセンテージ)。ディスク使用量がこの値を`datacache_disk_idle_seconds_for_expansion`で指定された期間以上下回っている間、Data Cacheに割り当てられたスペースが完全に利用されている場合、システムは上限を増やすことでキャッシュ容量を自動的に拡張します。この項目はv4.0以降`datacache_disk_low_level`から`disk_low_level`に名称変更されました。
+- 導入バージョン: v3.3.0

##### disk_safe_level

-- Default: 80
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: Data Cacheのディスク使用率の安全レベル(パーセンテージ)。Data Cacheが自動スケーリングを実行する際、システムはディスク使用率がこの値にできるだけ近づくようにキャッシュ容量を調整します。v3.4.0以降、デフォルト値は `70` から `80` に変更されました。この項目はv4.0以降、`datacache_disk_safe_level` から `disk_safe_level` に名称変更されました。
-- Introduced in: v3.3.0
+- デフォルト: 80
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: Data Cacheのディスク使用量の安全レベル(パーセンテージ)。Data Cacheが自動スケーリングを実行する際、システムはディスク使用量がこの値にできるだけ近くなるようにキャッシュ容量を調整します。v3.4.0以降、デフォルト値は`70`から`80`に変更されました。この項目はv4.0以降`datacache_disk_safe_level`から`disk_safe_level`に名称変更されました。
+- 導入バージョン: v3.3.0

##### enable_connector_sink_spill

-- Default: true
-- Type: Boolean
-- Unit: -
-- Is mutable: はい
-- Description: 外部テーブルへの書き込みでスピリングを有効にするかどうか。この機能を有効にすると、メモリ不足時に外部テーブルへの書き込みによって多数の小さなファイルが生成されるのを防ぐことができます。現在、この機能はIcebergテーブルへの書き込みのみをサポートしています。
-- Introduced in: v4.0.0
+- デフォルト: true
+- タイプ: Boolean
+- 単位: -
+- 変更可能: はい
+- 説明: 外部テーブルへの書き込みでSpillingを有効にするかどうか。この機能を有効にすると、メモリ不足時に外部テーブルへの書き込みの結果として多数の小さなファイルが生成されるのを防ぐことができます。現在、この機能はIcebergテーブルへの書き込みのみをサポートしています。
+- 導入バージョン: v4.0.0

##### enable_datacache_disk_auto_adjust

-- Default: true
-- Type: Boolean
-- Unit: -
-- Is mutable: はい
-- Description: Data Cacheディスク容量の自動スケーリングを有効にするかどうか。有効にすると、システムは現在のディスク使用率に基づいてキャッシュ容量を動的に調整します。この項目はv4.0以降、`datacache_auto_adjust_enable` から `enable_datacache_disk_auto_adjust` に名称変更されました。
-- Introduced in: v3.3.0
+- デフォルト: true
+- タイプ: Boolean
+- 単位: -
+- 変更可能: はい
+- 説明: Data Cacheディスク容量の自動スケーリングを有効にするかどうか。有効にすると、システムは現在のディスク使用率に基づいてキャッシュ容量を動的に調整します。この項目はv4.0以降`datacache_auto_adjust_enable`から`enable_datacache_disk_auto_adjust`に名称変更されました。
+- 導入バージョン: v3.3.0

##### jdbc_connection_idle_timeout_ms

-- Default: 600000
-- Type: Int
-- Unit: ミリ秒
-- Is mutable: いいえ
-- Description: JDBC接続プールでアイドル状態の接続が期限切れになるまでの時間。JDBC接続プールで接続のアイドル時間がこの値を超えると、接続プールは構成項目 `jdbc_minimum_idle_connections` で指定された数を超えるアイドル接続を閉じます。
-- Introduced in: -
+- デフォルト: 600000
+- タイプ: Int
+- 単位: ミリ秒
+- 変更可能: いいえ
+- 説明: 
JDBC接続プール内のアイドル接続が期限切れになるまでの時間。JDBC接続プール内の接続アイドル時間がこの値を超えると、接続プールは`jdbc_minimum_idle_connections`で指定された数を超えるアイドル接続を閉じます。 +- 導入バージョン: - ##### jdbc_connection_pool_size -- Default: 8 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: JDBC接続プールのサイズ。各BEノードでは、同じ `jdbc_url` を持つ外部テーブルにアクセスするクエリは同じ接続プールを共有します。 -- Introduced in: - +- デフォルト: 8 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: JDBC接続プールサイズ。各BEノードで、同じ`jdbc_url`で外部テーブルにアクセスするクエリは同じ接続プールを共有します。 +- 導入バージョン: - ##### jdbc_minimum_idle_connections -- Default: 1 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: JDBC接続プール内のアイドル接続の最小数。 -- Introduced in: - +- デフォルト: 1 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: JDBC接続プール内のアイドル接続の最小数。 +- 導入バージョン: - ##### lake_clear_corrupted_cache_data -- Default: false -- Type: Boolean -- Unit: - -- Is mutable: はい -- Description: 共有データクラスターで破損したデータキャッシュをシステムがクリアすることを許可するかどうか。 -- Introduced in: v3.4 +- デフォルト: false +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: 共有データクラスターで、システムが破損したデータキャッシュをクリアすることを許可するかどうか。 +- 導入バージョン: v3.4 ##### lake_clear_corrupted_cache_meta -- Default: true -- Type: Boolean -- Unit: - -- Is mutable: はい -- Description: 共有データクラスターで破損したメタデータキャッシュをシステムがクリアすることを許可するかどうか。 -- Introduced in: v3.3 +- デフォルト: true +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: 共有データクラスターで、システムが破損したメタデータキャッシュをクリアすることを許可するかどうか。 +- 導入バージョン: v3.3 ##### lake_enable_vertical_compaction_fill_data_cache -- Default: true -- Type: Boolean -- Unit: - -- Is mutable: はい -- Description: 垂直コンパクションタスクが共有データクラスターのローカルディスクにデータをキャッシュすることを許可するかどうか。 -- Introduced in: v3.1.7, v3.2.3 +- デフォルト: true +- タイプ: Boolean +- 単位: - +- 変更可能: はい +- 説明: 共有データクラスターで、垂直コンパクションタスクがローカルディスクにデータをキャッシュすることを許可するかどうか。 +- 導入バージョン: v3.1.7, v3.2.3 ##### lake_replication_read_buffer_size -- Default: 16777216 -- Type: Long -- Unit: バイト -- Is mutable: はい -- Description: Lakeレプリケーション中にlakeセグメントファイルをダウンロードする際に使用される読み取りバッファサイズ。この値はリモートファイルを読み取るための読み取りごとの割り当てを決定します。実装では、この設定と1MBの最小値のうち大きい方が使用されます。値が大きいほど読み取り呼び出しの回数が減り、スループットが向上する可能性がありますが、同時ダウンロードごとに使用されるメモリが増加します。値が小さいほどメモリ使用量は減少しますが、I/O呼び出しのコストが増加します。ネットワーク帯域幅、ストレージI/O特性、および並列レプリケーションスレッド数に応じて調整してください。 -- Introduced in: - +- デフォルト: 16777216 +- タイプ: Long +- 単位: バイト +- 変更可能: はい +- 説明: Lakeレプリケーション中にLakeセグメントファイルをダウンロードする際に使用される読み取りバッファサイズ。この値はリモートファイルを読み取るための読み取りごとの割り当てを決定します。実装では、この設定と最小1MBの大きい方が使用されます。値が大きいほど、読み取り呼び出しの数が減り、スループットが向上する可能性がありますが、同時ダウンロードごとのメモリ使用量が増加します。値が小さいほどメモリ使用量が減少しますが、I/O呼び出しが増加するコストがかかります。ネットワーク帯域幅、ストレージI/O特性、および並列レプリケーションスレッド数に応じて調整してください。 +- 導入バージョン: - ##### lake_service_max_concurrency -- Default: 0 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: 共有データクラスターのRPCリクエストの最大同時実行性。この閾値に達すると、着信リクエストは拒否されます。この項目が `0` に設定されている場合、同時実行性に制限はありません。 -- Introduced in: - +- デフォルト: 0 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: 共有データクラスターでのRPCリクエストの最大並行処理数。この閾値に達すると、受信リクエストは拒否されます。この項目が`0`に設定されている場合、並行処理に制限はありません。 +- 導入バージョン: - ##### max_hdfs_scanner_num -- Default: 50 -- Type: Int -- Unit: - -- Is mutable: いいえ -- Description: `ConnectorScanNode` が持つことができる同時実行コネクタ(HDFS/リモート)スキャナーの最大数を制限します。スキャンの起動中、ノードは推定同時実行性(メモリ、チャンクサイズ、`scanner_row_num` に基づく)を計算し、この値で上限を設定して、予約するスキャナーとチャンクの数、および起動するスキャナースレッドの数を決定します。また、実行時に保留中のスキャナーをスケジュールする際(過剰なサブスクリプションを避けるため)、およびファイルハンドル制限を考慮して再送信できる保留中のスキャナーの数を決定する際にも参照されます。これを減らすと、スレッド、メモリ、およびオープンファイルの負荷が減少しますが、スループットが低下する可能性があります。増やすと、同時実行性とリソース使用量が増加します。 -- Introduced in: v3.2.0 +- デフォルト: 50 +- タイプ: Int +- 単位: - +- 変更可能: いいえ +- 説明: 
`ConnectorScanNode`が持つことができる同時実行コネクタ(HDFS/リモート)スキャナの最大数を制限します。スキャン開始時に、ノードは推定並行処理(メモリ、チャンクサイズ、`scanner_row_num`に基づく)を計算し、この値で上限を設定して、予約するスキャナとチャンクの数、および起動するスキャナースレッドの数を決定します。また、ファイルハンドル制限を考慮して、実行時に保留中のスキャナをスケジュールする際や、再送信できる保留中のスキャナの数を決定する際にも参照されます。これを減らすと、スレッド数、メモリ、オープンファイルの負荷が減りますが、スループットが低下する可能性があります。増やすと、並行処理とリソース使用量が増加します。
+- 導入バージョン: v3.2.0

##### query_max_memory_limit_percent

-- Default: 90
-- Type: Int
-- Unit: -
-- Is mutable: いいえ
-- Description: Query Poolが使用できる最大メモリ。Processメモリ制限のパーセンテージとして表されます。
-- Introduced in: v3.1.0
+- デフォルト: 90
+- タイプ: Int
+- 単位: -
+- 変更可能: いいえ
+- 説明: Query Poolが使用できる最大メモリ。プロセスメモリ制限のパーセンテージとして表されます。
+- 導入バージョン: v3.1.0

##### rocksdb_max_write_buffer_memory_bytes

-- Default: 1073741824
-- Type: Int64
-- Unit: -
-- Is mutable: いいえ
-- Description: RocksDBのメタの書き込みバッファの最大サイズです。デフォルトは1GBです。
-- Introduced in: v3.5.0
+- デフォルト: 1073741824
+- タイプ: Int64
+- 単位: -
+- 変更可能: いいえ
+- 説明: RocksDBのメタ用書き込みバッファの最大サイズです。デフォルトは1GBです。
+- 導入バージョン: v3.5.0

##### rocksdb_write_buffer_memory_percent

-- Default: 5
-- Type: Int64
-- Unit: -
-- Is mutable: いいえ
-- Description: RocksDBのメタの書き込みバッファのメモリ割合です。デフォルトはシステムメモリの5%です。ただし、これとは別に、書き込みバッファメモリの最終的な計算サイズは64MB未満でも1GB(`rocksdb_max_write_buffer_memory_bytes`)を超過することもありません。
-- Introduced in: v3.5.0
+- デフォルト: 5
+- タイプ: Int64
+- 単位: -
+- 変更可能: いいえ
+- 説明: RocksDBのメタ用書き込みバッファメモリの割合です。デフォルトはシステムメモリの5%です。ただし、これとは別に、書き込みバッファメモリの最終的な計算サイズは64MBを下回ることも1GB(`rocksdb_max_write_buffer_memory_bytes`)を超えることもありません。
+- 導入バージョン: v3.5.0

### その他

##### default_mv_resource_group_concurrency_limit

-- Default: 0
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: リソースグループ `default_mv_wg` 内のマテリアライズドビューの更新タスクの最大同時実行性(BEノードごと)。デフォルト値 `0` は制限がないことを示します。
-- Introduced in: v3.1
+- デフォルト: 0
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: リソースグループ`default_mv_wg`内のマテリアライズドビューリフレッシュタスクの最大並行処理数(BEノードあたり)。デフォルト値`0`は制限がないことを示します。
+- 導入バージョン: v3.1

##### default_mv_resource_group_cpu_limit

-- Default: 1
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: リソースグループ `default_mv_wg` 内のマテリアライズドビューの更新タスクが使用できる最大CPUコア数(BEノードごと)。
-- Introduced in: v3.1
+- デフォルト: 1
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: リソースグループ`default_mv_wg`内のマテリアライズドビューリフレッシュタスクが使用できる最大CPUコア数(BEノードあたり)。
+- 導入バージョン: v3.1

##### default_mv_resource_group_memory_limit

-- Default: 0.8
-- Type: Double
-- Unit:
-- Is mutable: はい
-- Description: リソースグループ `default_mv_wg` 内のマテリアライズドビューの更新タスクが使用できる最大メモリ割合(BEノードごと)。デフォルト値はメモリの80%を示します。
-- Introduced in: v3.1
+- デフォルト: 0.8
+- タイプ: Double
+- 単位: -
+- 変更可能: はい
+- 説明: リソースグループ`default_mv_wg`内のマテリアライズドビューリフレッシュタスクが使用できる最大メモリ割合(BEノードあたり)。デフォルト値はメモリの80%を示します。
+- 導入バージョン: v3.1

##### default_mv_resource_group_spill_mem_limit_threshold

-- Default: 0.8
-- Type: Double
-- Unit: -
-- Is mutable: はい
-- Description: リソースグループ `default_mv_wg` 内のマテリアライズドビューの更新タスクが中間結果のスピリングをトリガーする前のメモリ使用量閾値。デフォルト値はメモリの80%を示します。
-- Introduced in: v3.1
+- デフォルト: 0.8
+- タイプ: Double
+- 単位: -
+- 変更可能: はい
+- 説明: リソースグループ`default_mv_wg`内のマテリアライズドビューリフレッシュタスクが中間結果のスピルをトリガーする前のメモリ使用量閾値。デフォルト値はメモリの80%を示します。
+- 導入バージョン: v3.1

##### enable_resolve_hostname_to_ip_in_load_error_url

-- Default: false
-- Type: Boolean
-- Unit: -
-- Is mutable: はい
-- Description: `error_urls` のデバッグのために、オペレーターがFEハートビートから元のホスト名を使用するか、環境のニーズに基づいてIPアドレスへの解決を強制するかを選択できるかどうか。
- - `true`: ホスト名をIPに解決します。
+- デフォルト: false
+- タイプ: Boolean
+- 単位: -
+- 変更可能: はい
+- 説明: `error_urls`のデバッグのため、オペレーターがFEハートビートからの元のホスト名を使用するか、環境のニーズに基づいてIPアドレスへの解決を強制するかを選択できるかどうか。
+ - `true`: ホスト名をIPアドレスに解決します。
 - `false` (デフォルト): エラーURLに元のホスト名を保持します。
-- Introduced in: v4.0.1
+- 導入バージョン: v4.0.1
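+The `default_mv_resource_group_*` items above are per-BE knobs for the `default_mv_wg` resource group. As a sketch for checking what each BE currently uses (again assuming the `information_schema.be_configs` view available in recent releases):
+
+```SQL
+-- Per-BE limits applied to materialized view refresh tasks.
+SELECT BE_ID, NAME, VALUE
+FROM information_schema.be_configs
+WHERE NAME LIKE 'default_mv_resource_group%';
+```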
##### enable_retry_apply

-- Default: true
-- Type: Boolean
-- Unit: -
-- Is mutable: はい
-- Description: 有効にすると、再試行可能に分類されたタブレット適用失敗(例えば、一時的なメモリ制限エラー)は、タブレットを即座にエラーとマークする代わりに再試行のために再スケジュールされます。`TabletUpdates` の再試行パスは、現在の失敗回数に乗算し、600秒の最大値にクランプされた `retry_apply_interval_second` を使用して次の試行をスケジュールするため、バックオフは連続する失敗とともに増加します。明示的に再試行不能なエラー(例えば、破損)は再試行を迂回し、適用プロセスを即座にエラー状態に移行させます。再試行は、全体的なタイムアウト/最終条件に達するまで続き、その後、適用はエラー状態に入ります。これをオフにすると、失敗した適用タスクの自動再スケジュールが無効になり、失敗した適用は再試行なしでエラー状態に移行します。
-- Introduced in: v3.2.9
+- デフォルト: true
+- タイプ: Boolean
+- 単位: -
+- 変更可能: はい
+- 説明: 有効にすると、再試行可能に分類されたタブレット適用失敗(例えば、一時的なメモリ制限エラー)は、タブレットを直ちにエラーとマークするのではなく、再試行のために再スケジュールされます。`TabletUpdates`の再試行パスは、現在の失敗数で`retry_apply_interval_second`を乗算し、600秒の最大値にクランプして、次の試行をスケジュールします。したがって、バックオフは連続する失敗とともに増加します。明示的に再試行できないエラー(例えば破損)は再試行を迂回し、適用プロセスは直ちにエラー状態に入ります。再試行は、全体的なタイムアウト/最終条件に達するまで継続され、その後、適用はエラー状態に入ります。これをオフにすると、失敗した適用タスクの自動再スケジュールが無効になり、失敗した適用は再試行せずにエラー状態に移行します。
+- 導入バージョン: v3.2.9

##### enable_token_check

-- Default: true
-- Type: Boolean
-- Unit: -
-- Is mutable: はい
-- Description: トークンチェックを有効にするかどうかを制御するブール値。`true` はトークンチェックを有効にすることを示し、`false` は無効にすることを示します。
-- Introduced in: -
+- デフォルト: true
+- タイプ: Boolean
+- 単位: -
+- 変更可能: はい
+- 説明: トークンチェックを有効にするかどうかを制御するブール値。`true`はトークンチェックを有効にすることを示し、`false`は無効にすることを示します。
+- 導入バージョン: -

##### es_scroll_keepalive

-- Default: 5m
-- Type: String
-- Unit: 分 (サフィックス付き文字列、例: "5m")
-- Is mutable: いいえ
-- Description: スクロール検索コンテキストのためにElasticsearchに送信されるキープアライブ期間。この値は、初期スクロールURL(`?scroll=`)の構築時および後続のスクロールリクエストの送信時(`ESScrollQueryBuilder` 経由)にそのまま使用されます(例:「5m」)。これは、ES側でES検索コンテキストがガベージコレクションされるまでの時間を制御します。長く設定するとスクロールコンテキストがより長くアクティブに保たれますが、ESクラスターのリソース使用期間が長くなります。この値はESスキャンリーダーによって起動時に読み取られ、実行時に変更することはできません。
-- Introduced in: v3.2.0
+- デフォルト: 5m
+- タイプ: String
+- 単位: 分 (サフィックス付き文字列、例: "5m")
+- 変更可能: いいえ
+- 説明: スクロール検索コンテキストのためにElasticsearchに送信されるキープアライブ期間。この値は、初期スクロールURL(`?scroll=`)の構築時および後続のスクロールリクエスト(`ESScrollQueryBuilder`経由)の送信時に(例えば"5m"のように)そのまま使用されます。これは、ES側でES検索コンテキストがガベージコレクションされるまでの時間を制御します。長く設定するとスクロールコンテキストはより長くアクティブな状態を保ちますが、ESクラスターのリソース使用期間が長くなります。この値は起動時にESスキャンリーダーによって読み取られ、実行時に変更することはできません。
+- 導入バージョン: v3.2.0

##### load_replica_status_check_interval_ms_on_failure

-- Default: 2000
-- Type: Int
-- Unit: ミリ秒
-- Is mutable: はい
-- Description: 以前のチェックRPCが失敗した場合に、セカンダリレプリカがプライマリレプリカのステータスをチェックする間隔。
-- Introduced in: v3.5.1
+- デフォルト: 2000
+- タイプ: Int
+- 単位: ミリ秒
+- 変更可能: はい
+- 説明: 前回のチェックRPCが失敗した場合に、セカンダリレプリカがプライマリレプリカのステータスをチェックする間隔。
+- 導入バージョン: v3.5.1

##### load_replica_status_check_interval_ms_on_success

-- Default: 15000
-- Type: Int
-- Unit: ミリ秒
-- Is mutable: はい
-- Description: 以前のチェックRPCが成功した場合に、セカンダリレプリカがプライマリレプリカのステータスをチェックする間隔。
-- Introduced in: v3.5.1
+- デフォルト: 15000
+- タイプ: Int
+- 単位: ミリ秒
+- 変更可能: はい
+- 説明: 前回のチェックRPCが成功した場合に、セカンダリレプリカがプライマリレプリカのステータスをチェックする間隔。
+- 導入バージョン: v3.5.1

##### max_length_for_bitmap_function

-- Default: 1000000
-- Type: Int
-- Unit: バイト
-- Is mutable: いいえ
-- Description: ビットマップ関数の入力値の最大長。
-- Introduced in: -
+- デフォルト: 1000000
+- タイプ: Int
+- 単位: バイト
+- 変更可能: いいえ
+- 説明: ビットマップ関数の入力値の最大長。
+- 導入バージョン: -

##### max_length_for_to_base64

-- Default: 200000
-- Type: Int
-- Unit: バイト
-- Is mutable: いいえ
-- Description: to_base64() 関数の入力値の最大長。
-- Introduced in: -
+- デフォルト: 200000
+- タイプ: Int
+- 単位: バイト
+- 変更可能: いいえ
+- 説明: `to_base64()`関数の入力値の最大長。
+- 導入バージョン: -

##### memory_high_level

-- Default: 75
-- Type: Long
-- Unit: パーセント
-- Is mutable: はい
-- Description: プロセスメモリ制限のパーセンテージとして表される高水域メモリ閾値。総メモリ消費がこのパーセンテージを超えると、BEは徐々にメモリを解放し始め(現在はデータキャッシュと更新キャッシュを削除することで)、負荷を軽減します。モニターはこの値を使用して `memory_high = mem_limit * memory_high_level / 100` を計算し、消費が `memory_high` を超えた場合、GCアドバイザによってガイドされた制御された削除を実行します。消費が `memory_urgent_level`(別の設定)を超えた場合、より積極的な即時削減が行われます。この値は、閾値を超えた場合に特定のメモリ集約型操作(例えば、プライマリキーのプリロード)を無効にするためにも参照されます。`memory_urgent_level` との検証(`memory_urgent_level` > `memory_high_level`、`memory_high_level` >= 1、`memory_urgent_level` <= 100)を満たす必要があります。
-- Introduced in: v3.2.0
+- デフォルト: 75
+- タイプ: Long
+- 単位: パーセント
+- 変更可能: はい
+- 説明: プロセスメモリ制限のパーセンテージとして表される高水位メモリ閾値。総メモリ消費量がこのパーセンテージを超えると、BEはメモリ負荷を軽減するために徐々にメモリを解放し始めます(現在はデータキャッシュと更新キャッシュを退去させることによって)。モニターはこの値を使用して`memory_high = mem_limit * memory_high_level / 100`を計算し、消費量が`memory_high`を超えた場合、GCアドバイザによって導かれた制御された退去を実行します。消費量が`memory_urgent_level`(別の設定)を超えた場合、より積極的な即時削減が発生します。この値は、閾値を超えた場合に特定のメモリ集約型操作(例えばプライマリキーのプリロード)を無効にするためにも参照されます。`memory_urgent_level`との検証を満たす必要があります(`memory_urgent_level > memory_high_level`、`memory_high_level >= 1`、`memory_urgent_level <= 100`)。
+- 導入バージョン: v3.2.0

##### report_exec_rpc_request_retry_num

-- Default: 10
-- Type: Int
-- Unit: -
-- Is mutable: はい
-- Description: FEに実行RPCリクエストを報告するためのRPCリクエストの再試行回数。デフォルト値は10で、これはRPCリクエストが失敗した場合、そのフラグメントインスタンスがRPCを完了する限り、10回再試行されることを意味します。実行RPCリクエストの報告はロードジョブにとって重要であり、あるフラグメントインスタンスの完了報告が失敗した場合、ロードジョブはタイムアウトするまでハングします。
-- Introduced in: -
+- デフォルト: 10
+- タイプ: Int
+- 単位: -
+- 変更可能: はい
+- 説明: FEに実行RPCリクエストを報告するためのRPCリクエストの再試行回数。デフォルト値は10で、フラグメントインスタンスの完了を報告するRPCリクエストが失敗した場合、最大10回再試行されることを意味します。実行RPCリクエストの報告はロードジョブにとって重要であり、1つのフラグメントインスタンスの完了報告が失敗すると、ロードジョブはタイムアウトまでハングします。
+- 導入バージョン: -

##### sleep_one_second

-- Default: 1
-- Type: Int
-- Unit: 秒
-- Is mutable: いいえ
-- Description: BEエージェントワーカースレッドが、マスターアドレス/ハートビートがまだ利用できない場合や、短時間の再試行/バックオフが必要な場合に、1秒間の一時停止として使用する小さなグローバルスリープ間隔(秒単位)。コードベースでは、複数のレポートワーカープール(例:ReportDiskStateTaskWorkerPool、ReportOlapTableTaskWorkerPool、ReportWorkgroupTaskWorkerPool)によって参照され、ビジーウェイトを回避し、再試行中のCPU消費を削減します。この値を増やすと、再試行頻度とマスター可用性への応答性が低下します。減らすと、ポーリングレートとCPU使用量が増加します。応答性とリソース使用量のトレードオフを意識してのみ調整してください。
-- Introduced in: v3.2.0
+- デフォルト: 1
+- タイプ: Int
+- 単位: 秒
+- 変更可能: いいえ
+- 説明: BEエージェントワーカースレッドが、マスターアドレス/ハートビートがまだ利用できない場合、または短い再試行/バックオフが必要な場合に、1秒間の一時停止として使用する小さなグローバルなスリープ間隔(秒単位)。コードベースでは、複数のレポートワーカープール(例:`ReportDiskStateTaskWorkerPool`、`ReportOlapTableTaskWorkerPool`、`ReportWorkgroupTaskWorkerPool`)によって参照され、再試行中のビジーウェイトを回避し、CPU消費を削減します。この値を増やすと、再試行頻度とマスター可用性への応答性が低下します。減らすと、ポーリングレートとCPU使用量が増加します。応答性とリソース使用量のトレードオフを意識して調整してください。
+- 導入バージョン: v3.2.0

##### small_file_dir

-- Default: `${STARROCKS_HOME}/lib/small_file/`
-- Type: String
-- Unit: -
-- Is mutable: いいえ
-- Description: ファイルマネージャーによってダウンロードされたファイルを保存するために使用されるディレクトリ。
-- Introduced in: -
+- デフォルト: `${STARROCKS_HOME}/lib/small_file/`
+- タイプ: String
+- 単位: -
+- 変更可能: いいえ
+- 説明: ファイルマネージャーによってダウンロードされたファイルを格納するために使用されるディレクトリ。
+- 導入バージョン: -

##### upload_buffer_size

-- Default: 4194304
-- Type: Int
-- Unit: バイト
-- Is mutable: はい
-- Description: スナップショットファイルをリモートストレージ(ブローカーまたは直接FileSystem)にアップロードする際のファイルコピー操作で使用されるバッファサイズ(バイト単位)。アップロードパス(`snapshot_loader.cpp`)では、この値が `fs::copy` に各アップロードストリームの読み取り/書き込みチャンクサイズとして渡されます。デフォルトは4MiBです。この値を増やすと、高遅延または高帯域幅リンクでのスループットが向上する可能性がありますが、同時アップロードごとのメモリ使用量が増加します。減らすと、ストリームごとのメモリは減少しますが、転送効率が低下する可能性があります。`upload_worker_count` および利用可能な総メモリと合わせて調整してください。
-- Introduced in: v3.2.13
+- デフォルト: 4194304
+- タイプ: Int
+- 単位: バイト
+- 変更可能: はい
+- 説明: 
スナップショットファイルをリモートストレージ(ブローカーまたは直接FileSystem)にアップロードする際のファイルコピー操作で使用されるバッファサイズ(バイト単位)。アップロードパス(snapshot_loader.cpp)では、この値が各アップロードストリームの読み取り/書き込みチャンクサイズとして`fs::copy`に渡されます。デフォルトは4 MiBです。この値を増やすと、高レイテンシまたは高帯域幅リンクでのスループットが向上する可能性がありますが、同時アップロードごとのメモリ使用量が増加します。減らすと、ストリームごとのメモリ使用量は減少しますが、転送効率が低下する可能性があります。`upload_worker_count`および利用可能な総メモリと合わせて調整してください。 +- 導入バージョン: v3.2.13 ##### user_function_dir -- Default: `${STARROCKS_HOME}/lib/udf` -- Type: String -- Unit: - -- Is mutable: いいえ -- Description: ユーザー定義関数(UDF)を保存するために使用されるディレクトリ。 -- Introduced in: - +- デフォルト: `${STARROCKS_HOME}/lib/udf` +- タイプ: String +- 単位: - +- 変更可能: いいえ +- 説明: ユーザー定義関数(UDF)を格納するために使用されるディレクトリ。 +- 導入バージョン: - ##### web_log_bytes -- Default: 1048576 (1 MB) -- Type: long -- Unit: バイト -- Is mutable: いいえ -- Description: INFOログファイルから読み取り、BEデバッグウェブサーバーのログページに表示する最大バイト数。ハンドラはこの値を使用してシークオフセットを計算し(最後のNバイトを表示)、非常に大きなログファイルの読み取りまたは提供を回避します。ログファイルがこの値よりも小さい場合、ファイル全体が表示されます。注:現在の実装では、INFOログを読み取って提供するコードはコメントアウトされており、ハンドラはINFOログファイルを開けないと報告するため、ログ提供コードが有効になっていない限り、このパラメータは効果がない可能性があります。 -- Introduced in: v3.2.0 +- デフォルト: 1048576 (1 MB) +- タイプ: long +- 単位: バイト +- 変更可能: いいえ +- 説明: INFOログファイルから読み取り、BEデバッグウェブサーバーのログページに表示する最大バイト数。ハンドラーはこの値を使用してシークオフセットを計算し(最後のNバイトを表示)、非常に大きなログファイルを読み取ったり提供したりするのを避けます。ログファイルがこの値よりも小さい場合、ファイル全体が表示されます。注:現在の実装では、INFOログを読み取り提供するコードはコメントアウトされており、ハンドラーはINFOログファイルを開けなかったと報告するため、ログ提供コードが有効になっていない限り、このパラメーターは効果がない可能性があります。 +- 導入バージョン: v3.2.0 -### 削除されたパラメータ +### 削除されたパラメーター ##### enable_bit_unpack_simd -- Status: 削除済み -- Description: このパラメータは削除されました。ビットアンパックSIMD選択は、現在コンパイル時(AVX2/BMI2)に処理され、デフォルトの実装に自動的にフォールバックします。 -- Removed in: - +- ステータス: 削除済み +- 説明: このパラメーターは削除されました。Bit-unpack SIMDの選択は、コンパイル時に(AVX2/BMI2)自動的に行われ、デフォルトの実装にフォールバックします。 +- 削除バージョン: - diff --git a/docs/ja/administration/management/Backup_and_restore.md b/docs/ja/administration/management/Backup_and_restore.md index 17f70c1..fd28bce 100644 --- a/docs/ja/administration/management/Backup_and_restore.md +++ b/docs/ja/administration/management/Backup_and_restore.md @@ -2,13 +2,13 @@ displayed_sidebar: docs --- -# データのバックアップとリストア +# データのバックアップと復元 -このトピックでは、StarRocksでのデータのバックアップとリストア、または新しいStarRocksクラスターへのデータ移行について説明します。 +このトピックでは、StarRocksでのデータのバックアップと復元、または新しいStarRocksクラスターへのデータ移行について説明します。 -StarRocksは、データをスナップショットとしてリモートストレージシステムにバックアップし、そのデータを任意のStarRocksクラスターにリストアすることをサポートしています。 +StarRocksは、データをスナップショットとしてリモートストレージシステムにバックアップし、そのデータを任意のStarRocksクラスターに復元することをサポートしています。 -v3.4.0以降、StarRocksはより多くのオブジェクトをサポートし、柔軟性を向上させるために構文をリファクタリングすることにより、BACKUPおよびRESTOREの機能を強化しました。 +v3.4.0以降、StarRocksは、より多くのオブジェクトをサポートし、より柔軟な構文にリファクタリングすることで、BACKUPおよびRESTOREの機能を強化しました。 StarRocksは以下のリモートストレージシステムをサポートしています。 @@ -19,21 +19,21 @@ StarRocksは以下のリモートストレージシステムをサポートし StarRocksは以下のオブジェクトのバックアップをサポートしています。 -- 内部データベース、テーブル(全てのタイプとパーティショニング戦略)、およびパーティション +- 内部データベース、テーブル(すべてのタイプとパーティショニング戦略)、およびパーティション - 外部カタログのメタデータ(v3.4.0以降でサポート) - 同期マテリアライズドビューと非同期マテリアライズドビュー - 論理ビュー(v3.4.0以降でサポート) -- ユーザー定義関数 (UDF)(v3.4.0以降でサポート) +- ユーザー定義関数(UDF)(v3.4.0以降でサポート) -> **NOTE** +> **注** > -> Shared-data StarRocksクラスターはデータのBACKUPとRESTOREをサポートしていません。 +> 共有データStarRocksクラスターは、データのBACKUPおよびRESTOREをサポートしていません。 ## リポジトリの作成 -データをバックアップする前に、リポジトリを作成する必要があります。これはリモートストレージシステムにデータスナップショットを保存するために使用されます。StarRocksクラスター内に複数のリポジトリを作成できます。詳細な手順については、[CREATE REPOSITORY](../../sql-reference/sql-statements/backup_restore/CREATE_REPOSITORY.md)を参照してください。 +データをバックアップする前に、リモートストレージシステムにデータスナップショットを保存するために使用するリポジトリを作成する必要があります。StarRocksクラスターには複数のリポジトリを作成できます。詳細な手順については、[CREATE 
REPOSITORY](../../sql-reference/sql-statements/backup_restore/CREATE_REPOSITORY.md)を参照してください。 -- HDFSにリポジトリを作成する +- HDFSでのリポジトリ作成 以下の例は、HDFSクラスターに`test_repo`という名前のリポジトリを作成します。 @@ -47,9 +47,9 @@ PROPERTIES( ); ``` -- AWS S3にリポジトリを作成する +- AWS S3でのリポジトリ作成 - AWS S3へのアクセス認証方法として、IAMユーザーベースの認証情報(Access KeyとSecret Key)、Instance Profile、またはAssumed Roleを選択できます。 + AWS S3にアクセスするための認証方法として、IAMユーザーベースの認証情報(アクセスキーとシークレットキー)、Instance Profile、またはAssumed Roleを選択できます。 - 以下の例は、IAMユーザーベースの認証情報を認証方法として使用し、AWS S3バケット`bucket_s3`に`test_repo`という名前のリポジトリを作成します。 @@ -89,11 +89,11 @@ PROPERTIES( ); ``` -> **NOTE** +> **注** > -> StarRocksは、S3Aプロトコルにのみ従ってAWS S3にリポジトリを作成することをサポートしています。したがって、AWS S3にリポジトリを作成する際は、`ON LOCATION`でリポジトリのロケーションとして渡すS3 URIの`s3://`を`s3a://`に置き換える必要があります。 +> StarRocksは、S3Aプロトコルに準拠してのみAWS S3にリポジトリを作成することをサポートしています。したがって、AWS S3にリポジトリを作成する際には、`ON LOCATION`でリポジトリロケーションとして渡すS3 URIの`s3://`を`s3a://`に置き換える必要があります。 -- Google GCSにリポジトリを作成する +- Google GCSでのリポジトリ作成 以下の例は、Google GCSバケット`bucket_gcs`に`test_repo`という名前のリポジトリを作成します。 @@ -108,12 +108,12 @@ PROPERTIES( ); ``` -> **NOTE** +> **注** > -> - StarRocksは、S3Aプロトコルにのみ従ってGoogle GCSにリポジトリを作成することをサポートしています。したがって、Google GCSにリポジトリを作成する際は、`ON LOCATION`でリポジトリのロケーションとして渡すGCS URIのプレフィックスを`s3a://`に置き換える必要があります。 +> - StarRocksは、S3Aプロトコルに準拠してのみGoogle GCSにリポジトリを作成することをサポートしています。したがって、Google GCSにリポジトリを作成する際には、`ON LOCATION`でリポジトリロケーションとして渡すGCS URIのプレフィックスを`s3a://`に置き換える必要があります。 > - エンドポイントアドレスに`https`を指定しないでください。 -- MinIOにリポジトリを作成する +- MinIOでのリポジトリ作成 以下の例は、MinIOバケット`bucket_minio`に`test_repo`という名前のリポジトリを作成します。 @@ -128,15 +128,15 @@ PROPERTIES( ); ``` -リポジトリが作成された後、[SHOW REPOSITORIES](../../sql-reference/sql-statements/backup_restore/SHOW_REPOSITORIES.md)を使用してリポジトリを確認できます。データをリストアした後、[DROP REPOSITORY](../../sql-reference/sql-statements/backup_restore/DROP_REPOSITORY.md)を使用してStarRocks内のリポジトリを削除できます。ただし、リモートストレージシステムにバックアップされたデータスナップショットはStarRocks経由で削除することはできません。リモートストレージシステムで手動で削除する必要があります。 +リポジトリ作成後、[SHOW REPOSITORIES](../../sql-reference/sql-statements/backup_restore/SHOW_REPOSITORIES.md)でリポジトリを確認できます。データ復元後、[DROP REPOSITORY](../../sql-reference/sql-statements/backup_restore/DROP_REPOSITORY.md)を使用してStarRocks内のリポジトリを削除できます。ただし、リモートストレージシステムにバックアップされたデータスナップショットはStarRocksを通じて削除することはできません。リモートストレージシステムで手動で削除する必要があります。 ## データのバックアップ リポジトリが作成されたら、データスナップショットを作成し、リモートリポジトリにバックアップする必要があります。詳細な手順については、[BACKUP](../../sql-reference/sql-statements/backup_restore/BACKUP.md)を参照してください。BACKUPは非同期操作です。[SHOW BACKUP](../../sql-reference/sql-statements/backup_restore/SHOW_BACKUP.md)を使用してBACKUPジョブのステータスを確認したり、[CANCEL BACKUP](../../sql-reference/sql-statements/backup_restore/CANCEL_BACKUP.md)を使用してBACKUPジョブをキャンセルしたりできます。 -StarRocksは、データベース、テーブル、またはパーティションの粒度でのFULLバックアップをサポートしています。 +StarRocksは、データベース、テーブル、またはパーティションの粒度レベルでのFULLバックアップをサポートしています。 -テーブルに大量のデータを保存している場合、パーティションごとにデータをバックアップおよびリストアすることをお勧めします。これにより、ジョブの失敗時の再試行コストを削減できます。定期的に増分データをバックアップする必要がある場合は、テーブルに[パーティショニングプラン](../../table_design/data_distribution/Data_distribution.md#partitioning)を設定し、毎回新しいパーティションのみをバックアップできます。 +テーブルに大量のデータを保存している場合、パーティション単位でデータをバックアップおよび復元することをお勧めします。これにより、ジョブの失敗時の再試行コストを削減できます。定期的に増分データをバックアップする必要がある場合は、テーブルの[パーティショニング計画](../../table_design/data_distribution/Data_distribution.md#partitioning)を構成し、毎回新しいパーティションのみをバックアップできます。 ### データベースのバックアップ @@ -145,28 +145,28 @@ StarRocksは、データベース、テーブル、またはパーティショ 以下の例は、データベース`sr_hub`をスナップショット`sr_hub_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 ```SQL --- v3.4.0以降でサポート。 +-- v3.4.0以降でサポートされています。 BACKUP DATABASE sr_hub SNAPSHOT sr_hub_backup TO test_repo; --- 
以前のバージョンでの構文と互換性があります。 +-- 以前のバージョンの構文と互換性があります。 BACKUP SNAPSHOT sr_hub.sr_hub_backup TO test_repo; ``` ### テーブルのバックアップ -StarRocksは、全てのタイプとパーティショニング戦略のテーブルのバックアップとリストアをサポートしています。テーブルに対して完全なBACKUPを実行すると、そのテーブルと、その上に構築された同期マテリアライズドビューがバックアップされます。 +StarRocksは、すべてのタイプおよびパーティショニング戦略のテーブルのバックアップと復元をサポートしています。テーブルに対して完全なBACKUPを実行すると、そのテーブルと、その上に構築された同期マテリアライズドビューがバックアップされます。 以下の例は、データベース`sr_hub`からテーブル`sr_member`をスナップショット`sr_member_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 ```SQL --- v3.4.0以降でサポート。 +-- v3.4.0以降でサポートされています。 BACKUP DATABASE sr_hub SNAPSHOT sr_member_backup TO test_repo ON (TABLE sr_member); --- 以前のバージョンでの構文と互換性があります。 +-- 以前のバージョンの構文と互換性があります。 BACKUP SNAPSHOT sr_hub.sr_member_backup TO test_repo ON (sr_member); @@ -190,27 +190,27 @@ ON (ALL TABLES); ### パーティションのバックアップ -以下の例は、データベース`sr_hub`のテーブル`sr_member`のパーティション`p1`をスナップショット`sr_par_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 +以下の例は、データベース`sr_hub`からテーブル`sr_member`のパーティション`p1`をスナップショット`sr_par_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 ```SQL --- v3.4.0以降でサポート。 +-- v3.4.0以降でサポートされています。 BACKUP DATABASE sr_hub SNAPSHOT sr_par_backup TO test_repo ON (TABLE sr_member PARTITION (p1)); --- 以前のバージョンでの構文と互換性があります。 +-- 以前のバージョンの構文と互換性があります。 BACKUP SNAPSHOT sr_hub.sr_par_backup TO test_repo ON (sr_member PARTITION (p1)); ``` -複数のパーティション名をコンマ (`,`) で区切って指定することで、パーティションを一括でバックアップできます。 +複数のパーティション名をカンマ (`,`) で区切って指定することで、パーティションを一括でバックアップできます。 ### マテリアライズドビューのバックアップ -同期マテリアライズドビューは、ベーステーブルのBACKUP操作と同時にバックアップされるため、手動でバックアップする必要はありません。 +同期マテリアライズドビューはベーステーブルのBACKUP操作と共にバックアップされるため、手動でバックアップする必要はありません。 -非同期マテリアライズドビューは、それが属するデータベースのBACKUP操作と同時にバックアップできます。手動でバックアップすることもできます。 +非同期マテリアライズドビューは、それが属するデータベースのBACKUP操作と共にバックアップできます。また、手動でバックアップすることも可能です。 以下の例は、データベース`sr_hub`からマテリアライズドビュー`sr_mv1`をスナップショット`sr_mv1_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 @@ -264,7 +264,7 @@ ON (ALL VIEWS); ### UDFのバックアップ -以下の例は、データベース`sr_hub`からUDF`sr_udf1`をスナップショット`sr_udf1_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 +以下の例は、データベース`sr_hub`からUDF `sr_udf1`をスナップショット`sr_udf1_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 ```SQL BACKUP DATABASE sr_hub SNAPSHOT sr_udf1_backup @@ -272,7 +272,7 @@ TO test_repo ON (FUNCTION sr_udf1); ``` -以下の例は、データベース`sr_hub`から2つのUDF`sr_udf1`と`sr_udf2`をスナップショット`sr_udf2_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 +以下の例は、データベース`sr_hub`から2つのUDF `sr_udf1`と`sr_udf2`をスナップショット`sr_udf2_backup`としてバックアップし、そのスナップショットをリポジトリ`test_repo`にアップロードします。 ```SQL BACKUP DATABASE sr_hub SNAPSHOT sr_udf2_backup @@ -311,29 +311,29 @@ BACKUP ALL EXTERNAL CATALOGS SNAPSHOT all_catalog_backup TO test_repo; ``` -外部カタログに対するBACKUP操作をキャンセルするには、次のステートメントを実行します。 +外部カタログに対するBACKUP操作をキャンセルするには、以下のステートメントを実行します。 ```SQL CANCEL BACKUP FOR EXTERNAL CATALOG; ``` -## データのリストア +## データの復元 -リモートストレージシステムにバックアップされたデータスナップショットを、現在のStarRocksクラスターまたは他のStarRocksクラスターにリストアして、データを回復または移行できます。 +リモートストレージシステムにバックアップされたデータスナップショットを、現在のStarRocksクラスターまたは他のStarRocksクラスターに復元することで、データを復旧または移行できます。 -**スナップショットからオブジェクトをリストアする際は、スナップショットのタイムスタンプを指定する必要があります。** +**スナップショットからオブジェクトを復元する際には、そのスナップショットのタイムスタンプを指定する必要があります。** -リモートストレージシステム内のデータスナップショットをリストアするには、[RESTORE](../../sql-reference/sql-statements/backup_restore/RESTORE.md)ステートメントを使用します。 +[RESTORE](../../sql-reference/sql-statements/backup_restore/RESTORE.md)ステートメントを使用して、リモートストレージシステム内のデータスナップショットを復元します。 RESTOREは非同期操作です。[SHOW RESTORE](../../sql-reference/sql-statements/backup_restore/SHOW_RESTORE.md)を使用してRESTOREジョブのステータスを確認したり、[CANCEL 
RESTORE](../../sql-reference/sql-statements/backup_restore/CANCEL_RESTORE.md)を使用してRESTOREジョブをキャンセルしたりできます。 -### (オプション)新しいクラスターにリポジトリを作成する +### (オプション)新しいクラスターでのリポジトリ作成 -データを別のStarRocksクラスターに移行するには、ターゲットクラスターで同じ**リポジトリ名**と**ロケーション**を持つリポジトリを作成する必要があります。そうしないと、以前にバックアップされたデータスナップショットを表示できません。詳細については、[リポジトリの作成](#create-a-repository)を参照してください。 +データを別のStarRocksクラスターに移行するには、ターゲットクラスターに同じ**リポジトリ名**と**ロケーション**を持つリポジトリを作成する必要があります。そうしないと、以前にバックアップされたデータスナップショットを表示できません。詳細については、「[リポジトリの作成](#create-a-repository)」を参照してください。 -### スナップショットタイムスタンプの取得 +### スナップショットのタイムスタンプの取得 -データをリストアする前に、[SHOW SNAPSHOT](../../sql-reference/sql-statements/backup_restore/SHOW_SNAPSHOT.md)を使用してリポジトリ内のスナップショットを確認し、タイムスタンプを取得できます。 +データを復元する前に、[SHOW SNAPSHOT](../../sql-reference/sql-statements/backup_restore/SHOW_SNAPSHOT.md)を使用してリポジトリ内のスナップショットを確認し、タイムスタンプを取得できます。 以下の例は、`test_repo`内のスナップショット情報を確認します。 @@ -347,73 +347,73 @@ mysql> SHOW SNAPSHOT ON test_repo; 1 row in set (1.16 sec) ``` -### データベースのリストア +### データベースの復元 -以下の例は、スナップショット`sr_hub_backup`内のデータベース`sr_hub`をターゲットクラスター内のデータベース`sr_hub`にリストアします。スナップショットにデータベースが存在しない場合、システムはエラーを返します。ターゲットクラスターにデータベースが存在しない場合、システムは自動的に作成します。 +以下の例は、スナップショット`sr_hub_backup`内のデータベース`sr_hub`を、ターゲットクラスターのデータベース`sr_hub`に復元します。データベースがスナップショットに存在しない場合、システムはエラーを返します。データベースがターゲットクラスターに存在しない場合、システムは自動的に作成します。 ```SQL --- v3.4.0以降でサポート。 +-- v3.4.0以降でサポートされています。 RESTORE SNAPSHOT sr_hub_backup FROM test_repo DATABASE sr_hub PROPERTIES("backup_timestamp" = "2024-12-09-10-25-58-842"); --- 以前のバージョンでの構文と互換性があります。 +-- 以前のバージョンの構文と互換性があります。 RESTORE SNAPSHOT sr_hub.sr_hub_backup -FROM `test_repo` +FROM `test_repo` PROPERTIES("backup_timestamp" = "2024-12-09-10-25-58-842"); ``` -以下の例は、スナップショット`sr_hub_backup`内のデータベース`sr_hub`をターゲットクラスター内のデータベース`sr_hub_new`にリストアします。スナップショットにデータベース`sr_hub`が存在しない場合、システムはエラーを返します。ターゲットクラスターにデータベース`sr_hub_new`が存在しない場合、システムは自動的に作成します。 +以下の例は、スナップショット`sr_hub_backup`内のデータベース`sr_hub`を、ターゲットクラスターのデータベース`sr_hub_new`に復元します。データベース`sr_hub`がスナップショットに存在しない場合、システムはエラーを返します。データベース`sr_hub_new`がターゲットクラスターに存在しない場合、システムは自動的に作成します。 ```SQL --- v3.4.0以降でサポート。 +-- v3.4.0以降でサポートされています。 RESTORE SNAPSHOT sr_hub_backup FROM test_repo DATABASE sr_hub AS sr_hub_new PROPERTIES("backup_timestamp" = "2024-12-09-10-25-58-842"); ``` -### テーブルのリストア +### テーブルの復元 -以下の例は、スナップショット`sr_member_backup`内のデータベース`sr_hub`のテーブル`sr_member`を、ターゲットクラスター内のデータベース`sr_hub`のテーブル`sr_member`にリストアします。 +以下の例は、スナップショット`sr_member_backup`内のデータベース`sr_hub`のテーブル`sr_member`を、ターゲットクラスターのデータベース`sr_hub`のテーブル`sr_member`に復元します。 ```SQL --- v3.4.0以降でサポート。 +-- v3.4.0以降でサポートされています。 RESTORE SNAPSHOT sr_member_backup -FROM test_repo -DATABASE sr_hub -ON (TABLE sr_member) +FROM test_repo +DATABASE sr_hub +ON (TABLE sr_member) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); --- 以前のバージョンでの構文と互換性があります。 +-- 以前のバージョンの構文と互換性があります。 RESTORE SNAPSHOT sr_hub.sr_member_backup FROM test_repo ON (sr_member) PROPERTIES ("backup_timestamp"="2024-12-09-10-52-10-940"); ``` -以下の例は、スナップショット`sr_member_backup`内のデータベース`sr_hub`のテーブル`sr_member`を、ターゲットクラスター内のデータベース`sr_hub_new`のテーブル`sr_member_new`にリストアします。 +以下の例は、スナップショット`sr_member_backup`内のデータベース`sr_hub`のテーブル`sr_member`を、ターゲットクラスターのデータベース`sr_hub_new`のテーブル`sr_member_new`に復元します。 ```SQL RESTORE SNAPSHOT sr_member_backup -FROM test_repo +FROM test_repo DATABASE sr_hub AS sr_hub_new -ON (TABLE sr_member AS sr_member_new) +ON (TABLE sr_member AS sr_member_new) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` 
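+A RESTORE job runs asynchronously, so after submitting it you can poll its progress. A minimal sketch, assuming the rename-restore job above that targets the database `sr_hub_new`:
+
+```SQL
+-- Check the status of the RESTORE job submitted above.
+SHOW RESTORE FROM sr_hub_new;
+```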
-以下の例は、スナップショット`sr_core_backup`内のデータベース`sr_hub`の2つのテーブル`sr_member`と`sr_pmc`を、ターゲットクラスター内のデータベース`sr_hub`の2つのテーブル`sr_member`と`sr_pmc`にリストアします。 +以下の例は、スナップショット`sr_core_backup`内のデータベース`sr_hub`の2つのテーブル`sr_member`と`sr_pmc`を、ターゲットクラスターのデータベース`sr_hub`の2つのテーブル`sr_member`と`sr_pmc`に復元します。 ```SQL RESTORE SNAPSHOT sr_core_backup -FROM test_repo +FROM test_repo DATABASE sr_hub -ON (TABLE sr_member, TABLE sr_pmc) +ON (TABLE sr_member, TABLE sr_pmc) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下の例は、スナップショット`sr_all_backup`内のデータベース`sr_hub`からすべてのテーブルをリストアします。 +以下の例は、スナップショット`sr_all_backup`内のデータベース`sr_hub`からすべてのテーブルを復元します。 ```SQL RESTORE SNAPSHOT sr_all_backup @@ -422,76 +422,76 @@ DATABASE sr_hub ON (ALL TABLES); ``` -以下の例は、スナップショット`sr_all_backup`内のデータベース`sr_hub`からすべてのテーブルのうちの1つをリストアします。 +以下の例は、スナップショット`sr_all_backup`内のデータベース`sr_hub`からすべてのテーブルのうちの1つを復元します。 ```SQL RESTORE SNAPSHOT sr_all_backup FROM test_repo DATABASE sr_hub -ON (TABLE sr_member) +ON (TABLE sr_member) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -### パーティションのリストア +### パーティションの復元 -以下の例は、スナップショット`sr_par_backup`内のテーブル`sr_member`のパーティション`p1`を、ターゲットクラスター内のテーブル`sr_member`のパーティション`p1`にリストアします。 +以下の例は、スナップショット`sr_par_backup`内のテーブル`sr_member`のパーティション`p1`を、ターゲットクラスターのテーブル`sr_member`のパーティション`p1`に復元します。 ```SQL --- v3.4.0以降でサポート。 +-- v3.4.0以降でサポートされています。 RESTORE SNAPSHOT sr_par_backup FROM test_repo DATABASE sr_hub -ON (TABLE sr_member PARTITION (p1)) +ON (TABLE sr_member PARTITION (p1)) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); --- 以前のバージョンでの構文と互換性があります。 +-- 以前のバージョンの構文と互換性があります。 RESTORE SNAPSHOT sr_hub.sr_par_backup FROM test_repo -ON (sr_member PARTITION (p1)) +ON (sr_member PARTITION (p1)) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -複数のパーティション名をコンマ (`,`) で区切って指定することで、パーティションを一括でリストアできます。 +複数のパーティション名をカンマ (`,`) で区切って指定することで、パーティションを一括で復元できます。 -### マテリアライズドビューのリストア +### マテリアライズドビューの復元 -以下の例は、スナップショット`sr_mv1_backup`内のデータベース`sr_hub`からマテリアライズドビュー`sr_mv1`をターゲットクラスターにリストアします。 +以下の例は、スナップショット`sr_mv1_backup`内のデータベース`sr_hub`からマテリアライズドビュー`sr_mv1`をターゲットクラスターに復元します。 ```SQL RESTORE SNAPSHOT sr_mv1_backup FROM test_repo DATABASE sr_hub -ON (MATERIALIZED VIEW sr_mv1) +ON (MATERIALIZED VIEW sr_mv1) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下の例は、スナップショット`sr_mv2_backup`内のデータベース`sr_hub`から2つのマテリアライズドビュー`sr_mv1`と`sr_mv2`をターゲットクラスターにリストアします。 +以下の例は、スナップショット`sr_mv2_backup`内のデータベース`sr_hub`から2つのマテリアライズドビュー`sr_mv1`と`sr_mv2`をターゲットクラスターに復元します。 ```SQL RESTORE SNAPSHOT sr_mv2_backup FROM test_repo DATABASE sr_hub -ON (MATERIALIZED VIEW sr_mv1, MATERIALIZED VIEW sr_mv2) +ON (MATERIALIZED VIEW sr_mv1, MATERIALIZED VIEW sr_mv2) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下の例は、スナップショット`sr_mv3_backup`内のデータベース`sr_hub`からすべてのマテリアライズドビューをターゲットクラスターにリストアします。 +以下の例は、スナップショット`sr_mv3_backup`内のデータベース`sr_hub`からすべてのマテリアライズドビューをターゲットクラスターに復元します。 ```SQL RESTORE SNAPSHOT sr_mv3_backup FROM test_repo DATABASE sr_hub -ON (ALL MATERIALIZED VIEWS) +ON (ALL MATERIALIZED VIEWS) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下の例は、スナップショット`sr_mv3_backup`内のデータベース`sr_hub`からすべてのマテリアライズドビューのうちの1つをターゲットクラスターにリストアします。 +以下の例は、スナップショット`sr_mv3_backup`内のデータベース`sr_hub`からすべてのマテリアライズドビューのうちの1つをターゲットクラスターに復元します。 ```SQL RESTORE SNAPSHOT sr_mv3_backup FROM test_repo DATABASE sr_hub -ON (MATERIALIZED VIEW sr_mv1) +ON (MATERIALIZED VIEW sr_mv1) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` @@ -499,125 +499,125 @@ PROPERTIES ("backup_timestamp" = 
"2024-12-09-10-52-10-940"); RESTORE後、[SHOW MATERIALIZED VIEWS](../../sql-reference/sql-statements/materialized_view/SHOW_MATERIALIZED_VIEW.md)を使用してマテリアライズドビューのステータスを確認できます。 -- マテリアライズドビューがアクティブな場合は、直接使用できます。 -- マテリアライズドビューが非アクティブな場合は、そのベーステーブルがリストアされていないためである可能性があります。すべてのベーステーブルがリストアされた後、[ALTER MATERIALIZED VIEW](../../sql-reference/sql-statements/materialized_view/ALTER_MATERIALIZED_VIEW.md)を使用してマテリアライズドビューを再アクティブ化できます。 +- マテリアライズドビューがアクティブな場合、直接使用できます。 +- マテリアライズドビューが非アクティブな場合、ベーステーブルが復元されていない可能性があります。すべてのベーステーブルが復元された後、[ALTER MATERIALIZED VIEW](../../sql-reference/sql-statements/materialized_view/ALTER_MATERIALIZED_VIEW.md)を使用してマテリアライズドビューを再度アクティブ化できます。 ::: -### 論理ビューのリストア +### 論理ビューの復元 -以下の例は、スナップショット`sr_view1_backup`内のデータベース`sr_hub`から論理ビュー`sr_view1`をターゲットクラスターにリストアします。 +以下の例は、スナップショット`sr_view1_backup`内のデータベース`sr_hub`から論理ビュー`sr_view1`をターゲットクラスターに復元します。 ```SQL RESTORE SNAPSHOT sr_view1_backup FROM test_repo DATABASE sr_hub -ON (VIEW sr_view1) +ON (VIEW sr_view1) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下の例は、スナップショット`sr_view2_backup`内のデータベース`sr_hub`から2つの論理ビュー`sr_view1`と`sr_view2`をターゲットクラスターにリストアします。 +以下の例は、スナップショット`sr_view2_backup`内のデータベース`sr_hub`から2つの論理ビュー`sr_view1`と`sr_view2`をターゲットクラスターに復元します。 ```SQL RESTORE SNAPSHOT sr_view2_backup FROM test_repo DATABASE sr_hub -ON (VIEW sr_view1, VIEW sr_view2) +ON (VIEW sr_view1, VIEW sr_view2) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下の例は、スナップショット`sr_view3_backup`内のデータベース`sr_hub`からすべての論理ビューをターゲットクラスターにリストアします。 +以下の例は、スナップショット`sr_view3_backup`内のデータベース`sr_hub`からすべての論理ビューをターゲットクラスターに復元します。 ```SQL RESTORE SNAPSHOT sr_view3_backup FROM test_repo DATABASE sr_hub -ON (ALL VIEWS) +ON (ALL VIEWS) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下の例は、スナップショット`sr_view3_backup`内のデータベース`sr_hub`からすべての論理ビューのうちの1つをターゲットクラスターにリストアします。 +以下の例は、スナップショット`sr_view3_backup`内のデータベース`sr_hub`からすべての論理ビューのうちの1つをターゲットクラスターに復元します。 ```SQL RESTORE SNAPSHOT sr_view3_backup FROM test_repo DATABASE sr_hub -ON (VIEW sr_view1) +ON (VIEW sr_view1) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -### UDFのリストア +### UDFの復元 -以下の例は、スナップショット`sr_udf1_backup`内のデータベース`sr_hub`からUDF`sr_udf1`をターゲットクラスターにリストアします。 +以下の例は、スナップショット`sr_udf1_backup`内のデータベース`sr_hub`からUDF `sr_udf1`をターゲットクラスターに復元します。 ```SQL RESTORE SNAPSHOT sr_udf1_backup FROM test_repo DATABASE sr_hub -ON (FUNCTION sr_udf1) +ON (FUNCTION sr_udf1) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下の例は、スナップショット`sr_udf2_backup`内のデータベース`sr_hub`から2つのUDF`sr_udf1`と`sr_udf2`をターゲットクラスターにリストアします。 +以下の例は、スナップショット`sr_udf2_backup`内のデータベース`sr_hub`から2つのUDF `sr_udf1`と`sr_udf2`をターゲットクラスターに復元します。 ```SQL RESTORE SNAPSHOT sr_udf2_backup FROM test_repo DATABASE sr_hub -ON (FUNCTION sr_udf1, FUNCTION sr_udf2) +ON (FUNCTION sr_udf1, FUNCTION sr_udf2) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下の例は、スナップショット`sr_udf3_backup`内のデータベース`sr_hub`からすべてのUDFをターゲットクラスターにリストアします。 +以下の例は、スナップショット`sr_udf3_backup`内のデータベース`sr_hub`からすべてのUDFをターゲットクラスターに復元します。 ```SQL RESTORE SNAPSHOT sr_udf3_backup FROM test_repo DATABASE sr_hub -ON (ALL FUNCTIONS) +ON (ALL FUNCTIONS) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下の例は、スナップショット`sr_udf3_backup`内のデータベース`sr_hub`からすべてのUDFのうちの1つをターゲットクラスターにリストアします。 +以下の例は、スナップショット`sr_udf3_backup`内のデータベース`sr_hub`からすべてのUDFのうちの1つをターゲットクラスターに復元します。 ```SQL RESTORE SNAPSHOT sr_udf3_backup FROM test_repo DATABASE sr_hub -ON (FUNCTION sr_udf1) +ON (FUNCTION sr_udf1) PROPERTIES ("backup_timestamp" = 
"2024-12-09-10-52-10-940"); ``` -### 外部カタログのメタデータのリストア +### 外部カタログのメタデータの復元 -以下の例は、スナップショット`iceberg_backup`内の外部カタログ`iceberg`のメタデータをターゲットクラスターにリストアし、`iceberg_new`として名前を変更します。 +以下の例は、スナップショット`iceberg_backup`内の外部カタログ`iceberg`のメタデータをターゲットクラスターに復元し、`iceberg_new`として名前を変更します。 ```SQL RESTORE SNAPSHOT iceberg_backup FROM test_repo -EXTERNAL CATALOG (iceberg AS iceberg_new) +EXTERNAL CATALOG (iceberg AS iceberg_new) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下の例は、スナップショット`iceberg_hive_backup`内の2つの外部カタログ`iceberg`と`hive`のメタデータをターゲットクラスターにリストアします。 +以下の例は、スナップショット`iceberg_hive_backup`内の2つの外部カタログ`iceberg`と`hive`のメタデータをターゲットクラスターに復元します。 ```SQL RESTORE SNAPSHOT iceberg_hive_backup -FROM test_repo +FROM test_repo EXTERNAL CATALOGS (iceberg, hive) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下の例は、スナップショット`all_catalog_backup`内のすべての外部カタログのメタデータをターゲットクラスターにリストアします。 +以下の例は、スナップショット`all_catalog_backup`内のすべての外部カタログのメタデータをターゲットクラスターに復元します。 ```SQL RESTORE SNAPSHOT all_catalog_backup -FROM test_repo +FROM test_repo ALL EXTERNAL CATALOGS PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -外部カタログに対するRESTORE操作をキャンセルするには、次のステートメントを実行します。 +外部カタログに対するRESTORE操作をキャンセルするには、以下のステートメントを実行します。 ```SQL CANCEL RESTORE FOR EXTERNAL CATALOG; @@ -625,26 +625,26 @@ CANCEL RESTORE FOR EXTERNAL CATALOG; ## BACKUPまたはRESTOREジョブの構成 -BE構成ファイル**be.conf**で以下の構成項目を修正することで、BACKUPまたはRESTOREジョブのパフォーマンスを最適化できます。 +BE構成ファイル**be.conf**で以下の構成項目を変更することで、BACKUPまたはRESTOREジョブのパフォーマンスを最適化できます。 -| 構成項目 | 説明 | -| ----------------------- | -------------------------------------------------------------------------------------------------------------------------------- | -| `make_snapshot_worker_count` | BEノード上のBACKUPジョブのスナップショット作成タスクのスレッドの最大数。デフォルト: `5`。この構成項目の値を増やすと、スナップショット作成タスクの並行度が高まります。 | -| `release_snapshot_worker_count` | BEノード上の失敗したBACKUPジョブのスナップショット解放タスクのスレッドの最大数。デフォルト: `5`。この構成項目の値を増やすと、スナップショット解放タスクの並行度が高まります。 | -| `upload_worker_count` | BEノード上のBACKUPジョブのアップロードタスクのスレッドの最大数。デフォルト: `0`。`0`は、BEが存在するマシンのCPUコア数に値を設定することを示します。この構成項目の値を増やすと、アップロードタスクの並行度が高まります。 | -| `download_worker_count` | BEノード上のRESTOREジョブのダウンロードタスクのスレッドの最大数。デフォルト: `0`。`0`は、BEが存在するマシンのCPUコア数に値を設定することを示します。この構成項目の値を増やすと、ダウンロードタスクの並行度が高まります。 | +| 設定項目 | 説明 | +|---|---| +| make_snapshot_worker_count | BEノード上のBACKUPジョブのスナップショット作成タスクの最大スレッド数。デフォルト: `5`。この設定項目の値を増やすと、スナップショット作成タスクの並行性が向上します。 | +| release_snapshot_worker_count | BEノード上の失敗したBACKUPジョブのスナップショット解放タスクの最大スレッド数。デフォルト: `5`。この設定項目の値を増やすと、スナップショット解放タスクの並行性が向上します。 | +| upload_worker_count | BEノード上のBACKUPジョブのアップロードタスクの最大スレッド数。デフォルト: `0`。`0`は、BEが存在するマシンのCPUコア数に値を設定することを示します。この設定項目の値を増やすと、アップロードタスクの並行性が向上します。 | +| download_worker_count | BEノード上のRESTOREジョブのダウンロードタスクの最大スレッド数。デフォルト: `0`。`0`は、BEが存在するマシンのCPUコア数に値を設定することを示します。この設定項目の値を増やすと、ダウンロードタスクの並行性が向上します。 | ## 使用上の注意 -- グローバル、データベース、テーブル、パーティションレベルでのバックアップおよびリストア操作には、異なる権限が必要です。詳細については、[シナリオに応じたロールのカスタマイズ](../user_privs/authorization/User_privilege.md#customize-roles-based-on-scenarios)を参照してください。 -- 各データベースでは、同時に実行できるBACKUPまたはRESTOREジョブは1つだけです。そうでない場合、StarRocksはエラーを返します。 -- BACKUPおよびRESTOREジョブはStarRocksクラスターの多くのリソースを占有するため、StarRocksクラスターの負荷が低いときにデータをバックアップおよびリストアすることをお勧めします。 +- グローバル、データベース、テーブル、パーティションレベルでのバックアップおよび復元操作には、異なる権限が必要です。詳細については、「[シナリオに基づくロールのカスタマイズ](../user_privs/authorization/User_privilege.md#customize-roles-based-on-scenarios)」を参照してください。 +- 各データベースでは、一度に1つのBACKUPまたはRESTOREジョブのみが許可されます。そうでない場合、StarRocksはエラーを返します。 +- BACKUPおよびRESTOREジョブはStarRocksクラスターの多くのリソースを占有するため、StarRocksクラスターに大きな負荷がかかっていない間にデータをバックアップおよび復元できます。 - 
## 使用上の注意

-- グローバル、データベース、テーブル、パーティションレベルでのバックアップおよびリストア操作には、異なる権限が必要です。詳細については、[シナリオに応じたロールのカスタマイズ](../user_privs/authorization/User_privilege.md#customize-roles-based-on-scenarios)を参照してください。
-- 各データベースでは、同時に実行できるBACKUPまたはRESTOREジョブは1つだけです。そうでない場合、StarRocksはエラーを返します。
-- BACKUPおよびRESTOREジョブはStarRocksクラスターの多くのリソースを占有するため、StarRocksクラスターの負荷が低いときにデータをバックアップおよびリストアすることをお勧めします。
+- グローバル、データベース、テーブル、パーティションレベルでのバックアップおよび復元操作には、異なる権限が必要です。詳細については、「[シナリオに基づくロールのカスタマイズ](../user_privs/authorization/User_privilege.md#customize-roles-based-on-scenarios)」を参照してください。
+- 各データベースでは、一度に1つのBACKUPまたはRESTOREジョブのみが許可されます。そうでない場合、StarRocksはエラーを返します。
+- BACKUPおよびRESTOREジョブはStarRocksクラスターの多くのリソースを占有するため、StarRocksクラスターに大きな負荷がかかっていない間にデータをバックアップおよび復元することをお勧めします。
- StarRocksは、データバックアップのためのデータ圧縮アルゴリズムの指定をサポートしていません。
-データはスナップショットとしてバックアップされるため、スナップショット生成時にロードされたデータはスナップショットに含まれません。したがって、スナップショットが生成された後、かつRESTOREジョブが完了する前に古いクラスターにデータをロードした場合、そのデータをリストア先のクラスターにもロードする必要があります。データ移行が完了した後、一定期間両方のクラスターにデータを並行してロードし、データとサービスの正確性を検証した上で、アプリケーションを新しいクラスターに移行することをお勧めします。
-RESTOREジョブが完了するまで、リストア対象のテーブルを操作することはできません。
-Primary Keyテーブルは、v2.5より前のStarRocksクラスターにリストアすることはできません。
-リストアするテーブルは、リストア前に新しいクラスターで作成する必要はありません。RESTOREジョブが自動的に作成します。
-リストア対象のテーブルと同じ名前の既存テーブルがある場合、StarRocksはまず既存テーブルのスキーマがリストア対象テーブルのスキーマと一致するかどうかを確認します。スキーマが一致する場合、StarRocksは既存テーブルをスナップショット内のデータで上書きします。スキーマが一致しない場合、RESTOREジョブは失敗します。キーワード`AS`を使用してリストア対象のテーブルの名前を変更するか、データをリストアする前に既存のテーブルを削除することができます。
-RESTOREジョブが既存のデータベース、テーブル、またはパーティションを上書きする場合、ジョブがCOMMITフェーズに入った後、上書きされたデータを元に戻すことはできません。この時点でRESTOREジョブが失敗またはキャンセルされた場合、データが破損し、アクセスできなくなる可能性があります。この場合、RESTORE操作を再度実行し、ジョブが完了するのを待つしかありません。したがって、現在のデータがもう使用されていないことを確認できない限り、上書きによるデータのリストアは推奨されません。上書き操作は、まずスナップショットと既存のデータベース、テーブル、またはパーティション間のメタデータの整合性をチェックします。不整合が検出された場合、RESTORE操作は実行できません。
-現在、StarRocksはユーザーアカウント、権限、およびリソースグループに関連する構成データのバックアップとリストアをサポートしていません。
-現在、StarRocksはテーブル間のColocate Join関係のバックアップとリストアをサポートしていません。
+データはスナップショットとしてバックアップされるため、スナップショット生成時にロードされたデータはスナップショットに含まれません。したがって、スナップショット生成後からRESTOREジョブが完了するまでの間に古いクラスターにデータをロードした場合、そのデータを復元先のクラスターにもロードする必要があります。データ移行が完了した後、しばらくの間両方のクラスターに並行してデータをロードし、データの正確性とサービスを確認した後にアプリケーションを新しいクラスターに移行することをお勧めします。
+RESTOREジョブが完了するまで、復元対象のテーブルを操作することはできません。
+Primary Keyテーブルは、v2.5より前のStarRocksクラスターには復元できません。
+復元する前に、新しいクラスターで復元対象のテーブルを作成する必要はありません。RESTOREジョブが自動的に作成します。
+復元対象のテーブルと重複する名前の既存テーブルがある場合、StarRocksはまず既存テーブルのスキーマが復元対象テーブルのスキーマと一致するかどうかを確認します。スキーマが一致する場合、StarRocksは既存テーブルをスナップショット内のデータで上書きします。スキーマが一致しない場合、RESTOREジョブは失敗します。この場合、キーワード`AS`を使用して復元対象のテーブルの名前を変更するか、データを復元する前に既存のテーブルを削除することができます。
+RESTOREジョブが既存のデータベース、テーブル、またはパーティションを上書きする場合、ジョブがCOMMITフェーズに入った後、上書きされたデータは復元できません。この時点でRESTOREジョブが失敗またはキャンセルされた場合、データが破損してアクセス不能になる可能性があります。この場合、RESTORE操作を再度実行し、ジョブの完了を待つしかありません。したがって、現在のデータがもう使用されていないと確信している場合を除き、上書きによってデータを復元することはお勧めしません。上書き操作は、最初にスナップショットと既存のデータベース、テーブル、またはパーティション間のメタデータの一貫性をチェックします。不整合が検出された場合、RESTORE操作は実行できません。
+現在、StarRocksは、ユーザーアカウント、権限、およびリソースグループに関連する構成データのバックアップと復元をサポートしていません。
+現在、StarRocksは、テーブル間のColocate Join関係のバックアップと復元をサポートしていません。

diff --git a/docs/ja/administration/management/FE_configuration.md b/docs/ja/administration/management/FE_configuration.md
index a84e416..99bac8f 100644
--- a/docs/ja/administration/management/FE_configuration.md
+++ b/docs/ja/administration/management/FE_configuration.md
@@ -10,29 +10,39 @@ import StaticFEConfigNote from '../../_assets/commonMarkdown/StaticFE_config_not
 import EditionSpecificFEItem from '../../_assets/commonMarkdown/Edition_Specific_FE_Item.mdx'

# FE 設定

## FE 設定項目の表示

-FE の起動後、MySQL クライアントで `ADMIN SHOW FRONTEND CONFIG` コマンドを実行して、パラメーター設定を確認できます。特定のパラメーターの設定をクエリするには、次のコマンドを実行します。
+FEが起動した後、MySQLクライアントで`ADMIN SHOW FRONTEND CONFIG`コマンドを実行して、パラメーター設定を確認できます。特定のパラメーターの設定をクエリしたい場合は、次のコマンドを実行します。
```SQL
ADMIN SHOW FRONTEND CONFIG [LIKE "pattern"];
```

-返されるフィールドの詳細については、[ADMIN SHOW CONFIG](../../sql-reference/sql-statements/cluster-management/config_vars/ADMIN_SHOW_CONFIG.md) を参照してください。
+返されるフィールドの詳細については、[ADMIN SHOW CONFIG](../../sql-reference/sql-statements/cluster-management/config_vars/ADMIN_SHOW_CONFIG.md)を参照してください。

:::note
クラスター管理関連コマンドを実行するには、管理者権限が必要です。
:::

## FE パラメーターの設定

### FE 動的パラメーターの設定

-[ADMIN SET FRONTEND CONFIG](../../sql-reference/sql-statements/cluster-management/config_vars/ADMIN_SET_CONFIG.md) を使用して、FE 動的パラメーターの設定を構成または変更できます。
+[ADMIN SET FRONTEND CONFIG](../../sql-reference/sql-statements/cluster-management/config_vars/ADMIN_SET_CONFIG.md)を使用して、FE動的パラメーターの設定を構成または変更できます。

```SQL
ADMIN SET FRONTEND CONFIG ("key" = "value");
```

### FE 静的パラメーターの設定

## FE パラメーターについて

### ロギング

##### audit_log_delete_age

- デフォルト: 30d
- 型: String
- 単位: -
- 変更可能: いいえ
-- 説明: 監査ログファイルの保持期間。デフォルト値の `30d` は、各監査ログファイルが 30 日間保持されることを指定します。StarRocks は各監査ログファイルをチェックし、30 日以上前に生成されたものを削除します。
+- 説明: 監査ログファイルの保持期間。デフォルト値`30d`は、各監査ログファイルが30日間保持できることを指定します。StarRocksは各監査ログファイルをチェックし、30日前に生成されたファイルを削除します。
- 導入バージョン: -

##### audit_log_dir

- デフォルト: StarRocksFE.STARROCKS_HOME_DIR + "/log"
- 型: String
- 単位: -
- 変更可能: いいえ
-- 説明: 監査ログファイルが保存されるディレクトリ。
+- 説明: 監査ログファイルを保存するディレクトリ。
- 導入バージョン: -
+- Introduced in: 3.2.12 - デフォルト: false - 型: Boolean - 単位: N/A - 変更可能: いいえ -- 説明: true の場合、生成された Log4j2 設定は、ローテーションされた監査ログファイル名 (fe.audit.log.*) に ".gz" 接尾辞を追加し、Log4j2 がロールオーバー時に圧縮された (.gz) アーカイブ監査ログファイルを生成するようにします。この設定は、FE 起動時に Log4jConfig.initLogging で読み込まれ、監査ログ用の RollingFile アペンダーに適用されます。アクティブな監査ログには影響せず、ローテーション/アーカイブされたファイルにのみ影響します。値は起動時に初期化されるため、変更を有効にするには FE の再起動が必要です。監査ログのローテーション設定 (audit_log_dir, audit_log_roll_interval, audit_roll_maxsize, audit_log_roll_num) とともに使用します。 +- 説明: trueの場合、生成されたLog4j2設定は、ロールオーバー時にLog4j2が圧縮された(.gz)アーカイブ監査ログファイルを生成するように、ローテーションされた監査ログファイル名(fe.audit.log.*)に".gz"の接尾辞を追加します。この設定は、FE起動時にLog4jConfig.initLoggingで読み取られ、監査ログ用のRollingFileアペンダーに適用されます。アクティブな監査ログではなく、ローテーション/アーカイブされたファイルにのみ影響します。値は起動時に初期化されるため、変更を有効にするにはFEを再起動する必要があります。監査ログのローテーション設定(audit_log_dir, audit_log_roll_interval, audit_roll_maxsize, audit_log_roll_num)と併用してください。 - 導入バージョン: 3.2.12 +##### audit_log_json_format ##### audit_log_json_format +- Default: false +- Type: Boolean +- Unit: N/A +- Is mutable: Yes +- Description: When true, FE audit events are emitted as structured JSON (Jackson ObjectMapper serializing a Map of annotated AuditEvent fields) instead of the default pipe-separated "key=value" string. The setting affects all built-in audit sinks handled by AuditLogBuilder: connection audit, query audit, big-query audit (big-query threshold fields are added to the JSON when the event qualifies), and slow-audit output. Fields annotated for big-query thresholds and the "features" field are treated specially (excluded from normal audit entries; included in big-query or feature logs as applicable). Enable this to make logs machine-parsable for log collectors or SIEMs; note it changes the log format and may require updating any existing parsers that expect the legacy pipe-separated format. +- Introduced in: 3.2.7 - デフォルト: false - 型: Boolean - 単位: N/A - 変更可能: はい -- 説明: true の場合、FE 監査イベントは、デフォルトのパイプ区切り「key=value」文字列ではなく、構造化された JSON (AuditEvent フィールドの Map をシリアル化する Jackson ObjectMapper) として出力されます。この設定は、AuditLogBuilder で処理されるすべての組み込み監査シンクに影響します。接続監査、クエリ監査、大規模クエリ監査 (イベントが条件を満たす場合、大規模クエリのしきい値フィールドが JSON に追加されます)、および低速監査出力です。大規模クエリのしきい値と「features」フィールドの注釈が付けられたフィールドは特別に扱われます (通常の監査エントリから除外され、適用可能な場合は大規模クエリまたは機能ログに含まれます)。ログコレクターまたは SIEM のためにログを機械で解析できるようにするには、これを有効にします。ただし、ログ形式が変更され、従来のパイプ区切り形式を想定する既存のパーサーを更新する必要がある場合があります。 +- 説明: trueの場合、FE監査イベントは、デフォルトのパイプ区切り「key=value」文字列ではなく、構造化されたJSON(Jackson ObjectMapperがアノテーション付きAuditEventフィールドのMapをシリアル化)として出力されます。この設定は、AuditLogBuilderによって処理されるすべての組み込み監査シンクに影響します。接続監査、クエリ監査、ビッグクエリ監査(イベントが条件を満たす場合、ビッグクエリしきい値フィールドがJSONに追加されます)、およびスロー監査出力です。ビッグクエリしきい値と「features」フィールドにアノテーションが付けられたフィールドは特別に処理されます(通常の監査エントリから除外され、該当する場合にビッグクエリまたは機能ログに含まれます)。ログをログコレクタまたはSIEMが機械的に解析できるようにするには、これを有効にします。ログ形式が変更され、従来のパイプ区切り形式を想定する既存のパーサーを更新する必要がある場合があることに注意してください。 - 導入バージョン: 3.2.7 +##### audit_log_modules ##### audit_log_modules +- Default: slow_query, query +- Type: String[] +- Unit: - +- Is mutable: No +- Description: The modules for which StarRocks generates audit log entries. By default, StarRocks generates audit logs for the `slow_query` module and the `query` module. The `connection` module is supported from v3.0. Separate the module names with a comma (,) and a space. 
+- Introduced in: - - デフォルト: slow_query, query - 型: String[] - 単位: - - 変更可能: いいえ -- 説明: StarRocks が監査ログエントリを生成するモジュール。デフォルトでは、StarRocks は `slow_query` モジュールと `query` モジュールの監査ログを生成します。`connection` モジュールは v3.0 からサポートされています。モジュール名をコンマ (,) とスペースで区切ります。 +- 説明: StarRocksが監査ログエントリを生成するモジュール。デフォルトでは、StarRocksは`slow_query`モジュールと`query`モジュールの監査ログを生成します。`connection`モジュールはv3.0以降でサポートされています。モジュール名をコンマ(,)とスペースで区切ります。 - 導入バージョン: - +##### audit_log_roll_interval ##### audit_log_roll_interval +- Default: DAY +- Type: String +- Unit: - +- Is mutable: No +- Description: The time interval at which StarRocks rotates audit log entries. Valid values: `DAY` and `HOUR`. + - If this parameter is set to `DAY`, a suffix in the `yyyyMMdd` format is added to the names of audit log files. + - If this parameter is set to `HOUR`, a suffix in the `yyyyMMddHH` format is added to the names of audit log files. +- Introduced in: - - デフォルト: DAY - 型: String - 単位: - - 変更可能: いいえ -- 説明: StarRocks が監査ログエントリをローテーションする時間間隔。有効な値: `DAY` と `HOUR`。 - - このパラメーターを `DAY` に設定すると、監査ログファイル名に `yyyyMMdd` 形式のサフィックスが追加されます。 - - このパラメーターを `HOUR` に設定すると、監査ログファイル名に `yyyyMMddHH` 形式のサフィックスが追加されます。 +- 説明: StarRocksが監査ログエントリをローテーションする時間間隔。有効な値:`DAY`と`HOUR`。 + - このパラメーターが`DAY`に設定されている場合、監査ログファイル名に`yyyyMMdd`形式のサフィックスが追加されます。 + - このパラメーターが`HOUR`に設定されている場合、監査ログファイル名に`yyyyMMddHH`形式のサフィックスが追加されます。 - 導入バージョン: - +##### audit_log_roll_num ##### audit_log_roll_num +- Default: 90 +- Type: Int +- Unit: - +- Is mutable: No +- Description: The maximum number of audit log files that can be retained within each retention period specified by the `audit_log_roll_interval` parameter. +- Introduced in: - - デフォルト: 90 - 型: Int - 単位: - - 変更可能: いいえ -- 説明: `audit_log_roll_interval` パラメーターで指定された各保持期間内に保持できる監査ログファイルの最大数。 +- 説明: `audit_log_roll_interval`パラメーターで指定された各保持期間内に保持できる監査ログファイルの最大数。 - 導入バージョン: - +##### bdbje_log_level ##### bdbje_log_level +- Default: INFO +- Type: String +- Unit: - +- Is mutable: No +- Description: Controls the logging level used by Berkeley DB Java Edition (BDB JE) in StarRocks. During BDB environment initialization BDBEnvironment.initConfigs() applies this value to the Java logger for the `com.sleepycat.je` package and to the BDB JE environment file logging level (EnvironmentConfig.FILE_LOGGING_LEVEL). Accepts standard java.util.logging.Level names such as SEVERE, WARNING, INFO, CONFIG, FINE, FINER, FINEST, ALL, OFF. Setting to ALL enables all log messages. Increasing verbosity will raise log volume and may impact disk I/O and performance; the value is read when the BDB environment is initialized, so it takes effect only after environment (re)initialization. 
+- Introduced in: v3.2.0 - デフォルト: INFO - 型: String - 単位: - - 変更可能: いいえ -- 説明: StarRocks で Berkeley DB Java Edition (BDB JE) が使用するログレベルを制御します。BDB 環境初期化 BDBEnvironment.initConfigs() 中に、この値を `com.sleepycat.je` パッケージの Java ロガーと BDB JE 環境ファイルロギングレベル (EnvironmentConfig.FILE_LOGGING_LEVEL) に適用します。SEVERE、WARNING、INFO、CONFIG、FINE、FINER、FINEST、ALL、OFF などの標準的な java.util.logging.Level 名を受け入れます。ALL に設定すると、すべてのログメッセージが有効になります。詳細度を上げるとログ量が増加し、ディスク I/O とパフォーマンスに影響を与える可能性があります。値は BDB 環境が初期化されるときに読み取られるため、環境の (再) 初期化後にのみ有効になります。 -- 導入バージョン: v3.2.0 - -##### big_query_log_delete_age - -- デフォルト: 7d -- 型: String -- 単位: - -- 変更可能: いいえ -- 説明: FE 大規模クエリログファイル (`fe.big_query.log.*`) が自動削除されるまでの保持期間を制御します。この値は、Log4j の削除ポリシーに IfLastModified の age として渡されます。最終更新時刻がこの値よりも古いローテーションされた大規模クエリログは削除されます。`d` (日)、`h` (時間)、`m` (分)、`s` (秒) のサフィックスをサポートします。例: `7d` (7 日間)、`10h` (10 時間)、`60m` (60 分)、`120s` (120 秒)。この項目は、`big_query_log_roll_interval` および `big_query_log_roll_num` と連携して、どのファイルを保持またはパージするかを決定します。 -- 導入バージョン: v3.2.0 - -##### big_query_log_dir - -- デフォルト: `Config.STARROCKS_HOME_DIR + "/log"` -- 型: String -- 単位: - -- 変更可能: いいえ -- 説明: FE が大規模クエリダンプログ (`fe.big_query.log.*`) を書き込むディレクトリ。Log4j 設定は、このパスを使用して `fe.big_query.log` およびそのローテーションされたファイル用の RollingFile アペンダーを作成します。ローテーションと保持は、`big_query_log_roll_interval` (時間ベースのサフィックス)、`log_roll_size_mb` (サイズトリガー)、`big_query_log_roll_num` (最大ファイル数)、および `big_query_log_delete_age` (時間ベースの削除) によって管理されます。大規模クエリレコードは、`big_query_log_cpu_second_threshold`、`big_query_log_scan_rows_threshold`、または `big_query_log_scan_bytes_threshold` などのユーザー定義のしきい値を超えるクエリに対してログに記録されます。`big_query_log_modules` を使用して、どのモジュールがこのファイルにログを記録するかを制御します。 -- 導入バージョン: v3.2.0 - -##### big_query_log_modules - -- デフォルト: `{"query"}` -- 型: String[] -- 単位: - -- 変更可能: いいえ -- 説明: モジュールごとの大規模クエリロギングを有効にするモジュール名サフィックスのリスト。一般的な値は論理コンポーネント名です。例えば、デフォルトの `query` は `big_query.query` を生成します。 -- 導入バージョン: v3.2.0 - -##### big_query_log_roll_interval - -- デフォルト: `"DAY"` -- 型: String -- 単位: - -- 変更可能: いいえ -- 説明: `big_query` ログアペンダーのローリングファイル名の日付コンポーネントを構築するために使用される時間間隔を指定します。有効な値 (大文字と小文字を区別しない) は `DAY` (デフォルト) と `HOUR` です。`DAY` は日次パターン (`"%d{yyyyMMdd}"`) を生成し、`HOUR` は時間パターン (`"%d{yyyyMMddHH}"`) を生成します。この値は、サイズベースのロールオーバー (`big_query_roll_maxsize`) とインデックスベースのロールオーバー (`big_query_log_roll_num`) と組み合わされて、RollingFile の filePattern を形成します。無効な値は、ログ設定の生成を失敗させ (IOException)、ログの初期化または再設定を妨げる可能性があります。`big_query_log_dir`、`big_query_roll_maxsize`、`big_query_log_roll_num`、および `big_query_log_delete_age` とともに使用します。 -- 導入バージョン: v3.2.0 - -##### big_query_log_roll_num - -- デフォルト: 10 -- 型: Int -- 単位: - -- 変更可能: いいえ -- 説明: `big_query_log_roll_interval` ごとに保持するローテーションされた FE 大規模クエリログファイルの最大数。この値は、`fe.big_query.log` の RollingFile アペンダーの DefaultRolloverStrategy `max` 属性にバインドされます。ログが (時間または `log_roll_size_mb` によって) ロールオーバーされると、StarRocks は最大 `big_query_log_roll_num` 個のインデックス付きファイル +- 説明: StarRocksにおけるBerkeley DB Java Edition (BDB JE) が使用するログレベルを制御します。BDB環境の初期化中、`BDBEnvironment.initConfigs()`は、この値を` diff --git a/docs/ja/administration/management/Scale_up_down.md b/docs/ja/administration/management/Scale_up_down.md index 08e5a68..e2f5c95 100644 --- a/docs/ja/administration/management/Scale_up_down.md +++ b/docs/ja/administration/management/Scale_up_down.md @@ -2,20 +2,20 @@ displayed_sidebar: docs --- -# スケールインとスケールアウト +# スケールイン・アウト -このトピックでは、StarRocksのノードをスケールインおよびスケールアウトする方法について説明します。 +このトピックでは、StarRocksのノードをスケールイン・アウトする方法について説明します。 -## FEのスケールインとスケールアウト +## FEのスケールイン・アウト 
StarRocksには、FollowerとObserverの2種類のFEノードがあります。Followerは選挙の投票と書き込みに関与します。Observerはログの同期と読み取りパフォーマンスの拡張にのみ使用されます。 -> * Follower FEの数(リーダーを含む)は奇数でなければならず、高可用性(HA)モードを形成するために3つをデプロイすることが推奨されます。 -> * FEが高可用性デプロイメント(リーダー1、Follower 2)である場合、読み取りパフォーマンスを向上させるためにObserver FEを追加することが推奨されます。 +> * フォロワーFE(リーダーを含む)の数は奇数である必要があり、高可用性(HA)モードを形成するために3つデプロイすることが推奨されます。 +> * FEが高可用性デプロイメント(1リーダー、2フォロワー)の場合、読み取りパフォーマンスを向上させるためにObserver FEを追加することが推奨されます。 ### FEのスケールアウト -FEノードをデプロイし、サービスを開始した後、以下のコマンドを実行してFEをスケールアウトします。 +FEノードをデプロイし、サービスを開始した後、FEをスケールアウトするために以下のコマンドを実行します。 ~~~sql alter system add follower "fe_host:edit_log_port"; @@ -24,34 +24,34 @@ alter system add observer "fe_host:edit_log_port"; ### FEのスケールイン -FEのスケールインはスケールアウトと似ています。以下のコマンドを実行してFEをスケールインします。 +FEのスケールインはスケールアウトと似ています。FEをスケールインするために以下のコマンドを実行します。 ~~~sql alter system drop follower "fe_host:edit_log_port"; alter system drop observer "fe_host:edit_log_port"; ~~~ -拡張と縮小の後、`show proc '/frontends';` を実行してノード情報を確認できます。 +拡張および縮小後、`show proc '/frontends';`を実行してノード情報を確認できます。 -## BEのスケールインとスケールアウト +## BEのスケールイン・アウト -StarRocksは、BEがスケールインまたはスケールアウトされた後、全体のパフォーマンスに影響を与えることなく、自動的にロードバランシングを実行します。 +StarRocksは、BEのスケールイン・アウト後も、全体のパフォーマンスに影響を与えることなく、自動的にロードバランシングを実行します。 -新しいBEノードを追加すると、システムのTablet Schedulerが新しいノードとその低い負荷を検出し、高負荷のBEノードから新しい低負荷のBEノードへタブレットの移動を開始し、クラスター全体でのデータと負荷の均等な分散を保証します。 +新しいBEノードを追加すると、システムのTablet Schedulerが新しいノードとその低い負荷を検出し、高負荷のBEノードから新しい低負荷のBEノードへタブレットの移動を開始し、クラスター全体でデータと負荷が均等に分散されるようにします。 バランシングプロセスは、各BEに対して計算されるloadScoreに基づいており、ディスク使用率とレプリカ数の両方を考慮します。システムは、loadScoreが高いノードからloadScoreが低いノードへタブレットを移動させることを目指します。 -FE構成パラメータ`tablet_sched_disable_balance`を確認して、自動バランシングが無効になっていないことを確認できます(このパラメータはデフォルトでfalseであり、タブレットバランシングがデフォルトで有効であることを意味します)。詳細については、[レプリカ管理ドキュメント](./resource_management/Replica.md)を参照してください。 +FE設定パラメータ`tablet_sched_disable_balance`を確認して、自動バランシングが無効になっていないことを確認できます(このパラメータはデフォルトでfalseであり、タブレットバランシングがデフォルトで有効であることを意味します)。詳細については、[レプリカ管理ドキュメント](./resource_management/Replica.md)を参照してください。 ### BEのスケールアウト -以下のコマンドを実行してBEをスケールアウトします。 +BEをスケールアウトするために以下のコマンドを実行します。 ~~~sql alter system add backend 'be_host:be_heartbeat_service_port'; ~~~ -以下のコマンドを実行してBEのステータスを確認します。 +BEのステータスを確認するために以下のコマンドを実行します。 ~~~sql show proc '/backends'; @@ -59,30 +59,30 @@ show proc '/backends'; ### BEのスケールイン -BEノードをスケールインする方法には、`DROP` と `DECOMMISSION` の2つがあります。 +BEノードをスケールインする方法には、`DROP`と`DECOMMISSION`の2つがあります。 -`DROP`はBEノードを即座に削除し、失われた複製はFEのスケジューリングによって補われます。`DECOMMISSION`はまず複製が補われることを確認してからBEノードを削除します。`DECOMMISSION`の方が少し扱いやすく、BEのスケールインには推奨されます。 +`DROP`はBEノードを直ちに削除し、失われた複製はFEスケジューリングによって補充されます。`DECOMMISSION`は、まず複製が補充されていることを確認してからBEノードを削除します。`DECOMMISSION`の方がより安全であり、BEのスケールインには推奨されます。 -両方のメソッドのコマンドは似ています: +両方の方法のコマンドは似ています: * `alter system decommission backend "be_host:be_heartbeat_service_port";` * `alter system drop backend "be_host:be_heartbeat_service_port";` -バックエンドのドロップは危険な操作であるため、実行する前に二重に確認する必要があります。 +バックエンドのドロップは危険な操作であるため、実行する前に二度確認する必要があります。 * `alter system drop backend "be_host:be_heartbeat_service_port";` -## CNのスケールインとスケールアウト +## CNのスケールイン・アウト ### CNのスケールアウト -以下のコマンドを実行してCNをスケールアウトします。 +CNをスケールアウトするために以下のコマンドを実行します。 ~~~sql ALTER SYSTEM ADD COMPUTE NODE "cn_host:cn_heartbeat_service_port"; ~~~ -以下のコマンドを実行してCNのステータスを確認します。 +CNのステータスを確認するために以下のコマンドを実行します。 ~~~sql SHOW PROC '/compute_nodes'; @@ -90,10 +90,10 @@ SHOW PROC '/compute_nodes'; ### CNのスケールイン -CNのスケールインはスケールアウトと似ています。以下のコマンドを実行してCNをスケールインします。 +CNのスケールインはスケールアウトと似ています。CNをスケールインするために以下のコマンドを実行します。 ~~~sql ALTER SYSTEM DROP COMPUTE NODE "cn_host:cn_heartbeat_service_port"; ~~~ -`SHOW PROC 
'/compute_nodes';` を実行してノード情報を確認できます。 +`SHOW PROC '/compute_nodes';`を実行してノード情報を確認できます。 diff --git a/docs/ja/administration/management/audit_loader.md b/docs/ja/administration/management/audit_loader.md index 2038c1a..b325935 100644 --- a/docs/ja/administration/management/audit_loader.md +++ b/docs/ja/administration/management/audit_loader.md @@ -4,38 +4,38 @@ displayed_sidebar: docs # AuditLoader を介した StarRocks 内での監査ログの管理 -このトピックでは、プラグイン AuditLoader を介して、テーブル内で StarRocks の監査ログを管理する方法について説明します。 +このトピックでは、AuditLoader プラグインを介して、テーブル内で StarRocks 監査ログを管理する方法について説明します。 -StarRocks は、監査ログを内部データベースではなく、ローカルファイル **fe/log/fe.audit.log** に保存します。プラグイン AuditLoader を使用すると、クラスター内で直接監査ログを管理できます。インストールされると、AuditLoader はファイルからログを読み取り、HTTP PUT を介して StarRocks にロードします。その後、SQL ステートメントを使用して StarRocks で監査ログをクエリできます。 +StarRocks は、監査ログを内部データベースではなく、ローカルファイル **fe/log/fe.audit.log** に保存します。AuditLoader プラグインを使用すると、クラスター内で直接監査ログを管理できます。インストールされると、AuditLoader はファイルからログを読み取り、HTTP PUT を介して StarRocks にロードします。その後、SQL ステートメントを使用して StarRocks の監査ログをクエリできます。 -## 監査ログを保存するテーブルを作成する +## 監査ログを保存するテーブルの作成 -StarRocks クラスターにデータベースとテーブルを作成して、監査ログを保存します。詳細な手順については、[CREATE DATABASE](../../sql-reference/sql-statements/Database/CREATE_DATABASE.md) および [CREATE TABLE](../../sql-reference/sql-statements/table_bucket_part_index/CREATE_TABLE.md) を参照してください。 +StarRocks クラスターにデータベースとテーブルを作成し、監査ログを保存します。詳細な手順については、[CREATE DATABASE](../../sql-reference/sql-statements/Database/CREATE_DATABASE.md) および [CREATE TABLE](../../sql-reference/sql-statements/table_bucket_part_index/CREATE_TABLE.md) を参照してください。 -監査ログのフィールドは StarRocks のバージョンによって異なるため、アップグレード時の互換性の問題を避けるために、以下に記載されている推奨事項に従うことが重要です。 +StarRocks のバージョンによって監査ログのフィールドが異なるため、アップグレード時の互換性の問題を避けるために、以下の推奨事項に従うことが重要です。 -> **注意** +> **CAUTION** > > - すべての新しいフィールドは `NULL` とマークする必要があります。 -> - フィールドの名前を変更してはいけません。ユーザーがそれらに依存している可能性があるためです。 -> - フィールドタイプには、`VARCHAR(32)` -> `VARCHAR(64)` のように、後方互換性のある変更のみを適用して、挿入時のエラーを回避する必要があります。 -> - `AuditEvent` フィールドは名前のみで解決されます。テーブル内の列の順序は重要ではなく、ユーザーはいつでも変更できます。 -> - テーブルに存在しない `AuditEvent` フィールドは無視されるため、ユーザーは不要な列を削除できます。 +> - フィールドは、ユーザーが依存している可能性があるため、名前を変更してはなりません。 +> - 挿入時のエラーを避けるために、フィールド型には下位互換性のある変更のみを適用する必要があります。例: `VARCHAR(32)` -> `VARCHAR(64)`。 +> - `AuditEvent` フィールドは名前のみで解決されます。テーブル内のカラムの順序は重要ではなく、ユーザーはいつでも変更できます。 +> - テーブルに存在しない `AuditEvent` フィールドは無視されるため、ユーザーは不要なカラムを削除できます。 ```SQL CREATE DATABASE starrocks_audit_db__; CREATE TABLE starrocks_audit_db__.starrocks_audit_tbl__ ( - `queryId` VARCHAR(64) COMMENT "クエリの一意のID", - `timestamp` DATETIME NOT NULL COMMENT "クエリ開始時刻", + `queryId` VARCHAR(64) COMMENT "クエリの一意なID", + `timestamp` DATETIME NOT NULL COMMENT "クエリ開始時間", `queryType` VARCHAR(12) COMMENT "クエリタイプ (query, slow_query, connection)", `clientIp` VARCHAR(32) COMMENT "クライアントIP", `user` VARCHAR(64) COMMENT "クエリユーザー名", - `authorizedUser` VARCHAR(64) COMMENT "ユーザーの一意の識別子 (user_identity)", + `authorizedUser` VARCHAR(64) COMMENT "ユーザーの一意な識別子 (user_identity)", `resourceGroup` VARCHAR(64) COMMENT "リソースグループ名", `catalog` VARCHAR(32) COMMENT "カタログ名", `db` VARCHAR(96) COMMENT "クエリが実行されるデータベース", - `state` VARCHAR(8) COMMENT "クエリの状態 (EOF, ERR, OK)", + `state` VARCHAR(8) COMMENT "クエリ状態 (EOF, ERR, OK)", `errorCode` VARCHAR(512) COMMENT "エラーコード", `queryTime` BIGINT COMMENT "クエリ実行時間 (ミリ秒)", `scanBytes` BIGINT COMMENT "クエリによってスキャンされたバイト数", @@ -45,13 +45,13 @@ CREATE TABLE starrocks_audit_db__.starrocks_audit_tbl__ ( `memCostBytes` BIGINT COMMENT "クエリによって消費されたメモリ (バイト)", `stmtId` INT COMMENT "SQLステートメントのインクリメンタルID", `isQuery` TINYINT COMMENT "SQLがクエリであるかどうか (1または0)", - `feIp` 
VARCHAR(128) COMMENT "ステートメントを実行したFEのIP", + `feIp` VARCHAR(128) COMMENT "ステートメントを実行したFE IP", `stmt` VARCHAR(1048576) COMMENT "元のSQLステートメント", - `digest` VARCHAR(32) COMMENT "遅いSQLのフィンガープリント", - `planCpuCosts` DOUBLE COMMENT "クエリ計画中のCPU使用率 (ナノ秒)", - `planMemCosts` DOUBLE COMMENT "クエリ計画中のメモリ使用量 (バイト)", + `digest` VARCHAR(32) COMMENT "低速SQLのフィンガープリント", + `planCpuCosts` DOUBLE COMMENT "クエリプランニング中のCPU使用率 (ナノ秒)", + `planMemCosts` DOUBLE COMMENT "クエリプランニング中のメモリ使用率 (バイト)", `pendingTimeMs` BIGINT COMMENT "クエリがキューで待機した時間 (ミリ秒)", - `candidateMVs` VARCHAR(65533) NULL COMMENT "候補となるマテリアライズドビューのリスト", + `candidateMVs` VARCHAR(65533) NULL COMMENT "候補のマテリアライズドビューのリスト", `hitMvs` VARCHAR(65533) NULL COMMENT "一致したマテリアライズドビューのリスト", `warehouse` VARCHAR(32) NULL COMMENT "ウェアハウス名" ) ENGINE = OLAP @@ -64,7 +64,7 @@ PROPERTIES ( ); ``` -`starrocks_audit_tbl__` は動的パーティションで作成されます。デフォルトでは、テーブルが作成されてから10分後に最初の動的パーティションが作成されます。その後、監査ログをテーブルにロードできます。次のステートメントを使用して、テーブル内のパーティションを確認できます。 +`starrocks_audit_tbl__` は動的パーティションで作成されます。デフォルトでは、最初の動的パーティションはテーブル作成後10分で作成されます。その後、監査ログをテーブルにロードできます。次のステートメントを使用して、テーブル内のパーティションを確認できます。 ```SQL SHOW PARTITIONS FROM starrocks_audit_db__.starrocks_audit_tbl__; @@ -72,9 +72,9 @@ SHOW PARTITIONS FROM starrocks_audit_db__.starrocks_audit_tbl__; パーティションが作成されたら、次のステップに進むことができます。 -## AuditLoader をダウンロードして設定する +## AuditLoader のダウンロードと設定 -1. [AuditLoader](https://releases.starrocks.io/resources/auditloader.zip) インストールパッケージをダウンロードします。このパッケージは、利用可能なすべてのバージョンの StarRocks と互換性があります。 +1. [AuditLoader インストールパッケージ](https://releases.starrocks.io/resources/auditloader.zip) をダウンロードします。このパッケージは、利用可能なすべての StarRocks バージョンと互換性があります。 2. インストールパッケージを解凍します。 @@ -82,35 +82,35 @@ SHOW PARTITIONS FROM starrocks_audit_db__.starrocks_audit_tbl__; unzip auditloader.zip ``` - 以下のファイルが解凍されます。 + 以下のファイルが展開されます。 - **auditloader.jar**: AuditLoader の JAR ファイル。 - **plugin.properties**: AuditLoader のプロパティファイル。このファイルを変更する必要はありません。 - - **plugin.conf**: AuditLoader の設定ファイル。ほとんどの場合、`user` および `password` フィールドのみを変更する必要があります。 + - **plugin.conf**: AuditLoader の設定ファイル。ほとんどの場合、このファイルの `user` および `password` フィールドのみを変更する必要があります。 -3. AuditLoader を設定するために **plugin.conf** を変更します。AuditLoader が正しく機能するように、以下の項目を設定する必要があります。 +3. 
AuditLoader を設定するために **plugin.conf** を変更します。AuditLoader が正しく動作するために、以下の項目を設定する必要があります。 - - `frontend_host_port`: FE の IP アドレスと HTTP ポート。`:` の形式です。デフォルト値 `127.0.0.1:8030` に設定することをお勧めします。StarRocks の各 FE は独自の監査ログを独立して管理し、プラグインをインストールすると、各 FE は独自のバックグラウンドスレッドを開始して監査ログをフェッチおよび保存し、Stream Load を介してそれらを書き込みます。`frontend_host_port` 設定項目は、プラグインのバックグラウンド Stream Load タスクに HTTP プロトコルの IP とポートを提供するために使用され、このパラメータは複数の値をサポートしません。パラメータの IP 部分はクラスター内の任意の FE の IP を使用できますが、対応する FE がクラッシュした場合、他の FE のバックグラウンドにある監査ログ書き込みタスクも通信障害のために失敗するため、推奨されません。デフォルト値 `127.0.0.1:8030` に設定することをお勧めします。これにより、各 FE が自身の HTTP ポートを使用して通信し、他の FE の例外が発生した場合の通信への影響を回避できます(すべての書き込みタスクは最終的に FE Leader ノードに転送されて実行されます)。 + - `frontend_host_port`: FE の IP アドレスと HTTP ポート。形式は `:` です。デフォルト値 `127.0.0.1:8030` に設定することをお勧めします。StarRocks の各 FE は独自の監査ログを独立して管理しており、プラグインをインストールすると、各 FE は独自のバックグラウンドスレッドを開始して監査ログをフェッチおよび保存し、Stream Load を介して書き込みます。`frontend_host_port` 設定項目は、プラグインのバックグラウンド Stream Load タスクの HTTP プロトコルの IP とポートを提供するために使用され、このパラメータは複数の値をサポートしません。パラメータの IP 部分はクラスター内の任意の FE の IP を使用できますが、対応する FE がクラッシュした場合、通信の失敗により他の FE のバックグラウンドでの監査ログ書き込みタスクも失敗するため、お勧めしません。デフォルト値 `127.0.0.1:8030` に設定することをお勧めします。これにより、各 FE は独自の HTTP ポートを使用して通信し、他の FE の例外が発生した場合の通信への影響を回避できます(すべての書き込みタスクは最終的に FE Leader ノードに転送されて実行されます)。 - `database`: 監査ログをホストするために作成したデータベースの名前。 - `table`: 監査ログをホストするために作成したテーブルの名前。 - - `user`: クラスターのユーザー名。テーブルにデータをロードする権限(LOAD_PRIV)を持っている必要があります。 + - `user`: クラスターのユーザー名。テーブルにデータをロードする権限 (LOAD_PRIV) を持っている必要があります。 - `password`: ユーザーのパスワード。 - - `secret_key`: パスワードを暗号化するために使用されるキー(文字列、16バイト以下である必要があります)。このパラメータが設定されていない場合、**plugin.conf** のパスワードは暗号化されず、`password` に平文のパスワードを指定するだけでよいことを示します。このパラメータが指定されている場合、パスワードはこのキーによって暗号化され、`password` に暗号化された文字列を指定する必要があります。暗号化されたパスワードは、StarRocks で `AES_ENCRYPT` 関数を使用して生成できます: `SELECT TO_BASE64(AES_ENCRYPT('password','secret_key'));`。 - - `filter`: 監査ログロードのフィルター条件。このパラメータは、Stream Load の [WHERE パラメータ](../../sql-reference/sql-statements/loading_unloading/STREAM_LOAD.md#opt_properties) に基づいており、つまり `-H “where: ”` で、デフォルトは空の文字列です。例: `filter=isQuery=1 and clientIp like '127.0.0.1%' and user='root'`。 + - `secret_key`: パスワードの暗号化に使用されるキー(文字列、16バイト以下である必要があります)。このパラメータが設定されていない場合、**plugin.conf** のパスワードは暗号化されず、`password` に平文のパスワードを指定するだけでよいことを示します。このパラメータが指定されている場合、パスワードはこのキーによって暗号化されており、`password` に暗号化された文字列を指定する必要があることを示します。暗号化されたパスワードは、StarRocks で `AES_ENCRYPT` 関数を使用して生成できます: `SELECT TO_BASE64(AES_ENCRYPT('password','secret_key'));`。 + - `filter`: 監査ログロードのフィルター条件。このパラメータは、Stream Load の [WHERE パラメータ](../../sql-reference/sql-statements/loading_unloading/STREAM_LOAD.md#opt_properties) に基づいており、`-H “where: ”` と同義で、デフォルトは空の文字列です。例: `filter=isQuery=1 and clientIp like '127.0.0.1%' and user='root'`。 -4. ファイルをパッケージに戻して zip 圧縮します。 +4. ファイルをパッケージに再圧縮します。 ```shell zip -q -m -r auditloader.zip auditloader.jar plugin.conf plugin.properties ``` -5. パッケージをすべての FE ノードをホストするマシンに配布します。すべてのパッケージが同一のパスに保存されていることを確認してください。そうでない場合、インストールは失敗します。パッケージを配布した後、パッケージへの絶対パスをコピーすることを忘れないでください。 +5. 
パッケージを FE ノードをホストするすべてのマシンに配布します。すべてのパッケージが同一のパスに保存されていることを確認してください。そうでない場合、インストールは失敗します。パッケージを配布した後、パッケージへの絶対パスをコピーすることを忘れないでください。 - > **注** + > **NOTE** > - > **auditloader.zip** をすべての FE がアクセスできる HTTP サービス(例: `httpd` や `nginx`)に配布し、ネットワーク経由でインストールすることもできます。どちらの場合も、インストールが実行された後、**auditloader.zip** はパスに永続化される必要があり、インストール後にソースファイルを削除してはならないことに注意してください。 + > **auditloader.zip** をすべての FE がアクセスできる HTTP サービス (例えば、`httpd` や `nginx`) に配布し、ネットワーク経由でインストールすることもできます。どちらの場合も、インストールが実行された後、**auditloader.zip** はパスに永続化され、インストール後にソースファイルを削除してはならないことに注意してください。 -## AuditLoader をインストールする +## AuditLoader のインストール コピーしたパスとともに次のステートメントを実行して、AuditLoader を StarRocks のプラグインとしてインストールします。 @@ -124,9 +124,9 @@ INSTALL PLUGIN FROM ""; INSTALL PLUGIN FROM ""; ``` -ネットワークパス経由でプラグインをインストールする場合は、INSTALL ステートメントのプロパティでパッケージの md5 を提供する必要があります。 +ネットワークパスを介してプラグインをインストールする場合は、INSTALL ステートメントのプロパティでパッケージの md5 を提供する必要があります。 -例: +例: ```sql INSTALL PLUGIN FROM "http://xx.xx.xxx.xxx/extra/auditloader.zip" PROPERTIES("md5sum" = "3975F7B880C9490FE95F42E2B2A28E2D"); @@ -134,18 +134,18 @@ INSTALL PLUGIN FROM "http://xx.xx.xxx.xxx/extra/auditloader.zip" PROPERTIES("md5 詳細な手順については、[INSTALL PLUGIN](../../sql-reference/sql-statements/cluster-management/plugin/INSTALL_PLUGIN.md) を参照してください。 -## インストールの確認と監査ログのクエリ +## インストールの検証と監査ログのクエリ 1. [SHOW PLUGINS](../../sql-reference/sql-statements/cluster-management/plugin/SHOW_PLUGINS.md) を介して、インストールが成功したかどうかを確認できます。 - 次の例では、プラグイン `AuditLoader` の `Status` が `INSTALLED` であり、インストールが成功したことを意味します。 + 以下の例では、プラグイン `AuditLoader` の `Status` が `INSTALLED` であり、インストールが成功したことを意味します。 ```Plain mysql> SHOW PLUGINS\G *************************** 1. row *************************** Name: __builtin_AuditLogBuilder Type: AUDIT - Description: 組み込み監査ロガー + Description: builtin audit logger Version: 0.12.0 JavaVersion: 1.8.31 ClassName: com.starrocks.qe.AuditLogBuilder @@ -156,7 +156,7 @@ INSTALL PLUGIN FROM "http://xx.xx.xxx.xxx/extra/auditloader.zip" PROPERTIES("md5 *************************** 2. row *************************** Name: AuditLoader Type: AUDIT - Description: バージョン3.3.11以降で利用可能。監査ログをStarRocksにロードし、ユーザーはクエリの統計を表示できます + Description: Available for versions 3.3.11+. Load audit log to starrocks, and user can view the statistic of queries Version: 5.0.0 JavaVersion: 11 ClassName: com.starrocks.plugin.audit.AuditLoaderPlugin @@ -167,7 +167,7 @@ INSTALL PLUGIN FROM "http://xx.xx.xxx.xxx/extra/auditloader.zip" PROPERTIES("md5 2 rows in set (0.01 sec) ``` -2. いくつかのランダムな SQL を実行して監査ログを生成し、60秒間(または AuditLoader を設定したときに `max_batch_interval_sec` アイテムで指定した時間)待って、AuditLoader が監査ログを StarRocks にロードできるようにします。 +2. いくつかのランダムな SQL を実行して監査ログを生成し、60 秒(または AuditLoader を設定したときに `max_batch_interval_sec` 項目で指定した時間)待って、AuditLoader が監査ログを StarRocks にロードできるようにします。 3. 
テーブルをクエリして監査ログを確認します。 @@ -175,7 +175,7 @@ INSTALL PLUGIN FROM "http://xx.xx.xxx.xxx/extra/auditloader.zip" PROPERTIES("md5 SELECT * FROM starrocks_audit_db__.starrocks_audit_tbl__; ``` - 次の例は、監査ログがテーブルに正常にロードされたことを示しています。 + 以下の例は、監査ログがテーブルに正常にロードされたことを示しています。 ```Plain mysql> SELECT * FROM starrocks_audit_db__.starrocks_audit_tbl__\G @@ -212,10 +212,10 @@ INSTALL PLUGIN FROM "http://xx.xx.xxx.xxx/extra/auditloader.zip" PROPERTIES("md5 ## トラブルシューティング -動的パーティションが作成され、プラグインがインストールされた後もテーブルに監査ログがロードされない場合は、**plugin.conf** が正しく設定されているかどうかを確認できます。変更するには、まずプラグインをアンインストールする必要があります。 +動的パーティションが作成され、プラグインがインストールされた後も監査ログがテーブルにロードされない場合は、**plugin.conf** が適切に設定されているかどうかを確認できます。変更するには、まずプラグインをアンインストールする必要があります。 ```SQL UNINSTALL PLUGIN AuditLoader; ``` -AuditLoader のログは **fe.log** に出力されます。**fe.log** でキーワード `audit` を検索して取得できます。すべての設定が正しく行われたら、上記の手順に従って AuditLoader を再度インストールできます。 +AuditLoader のログは **fe.log** に出力されます。**fe.log** でキーワード `audit` を検索することで取得できます。すべての設定が正しく行われた後、上記の手順に従って AuditLoader を再度インストールできます。 diff --git a/docs/ja/administration/management/compaction.md b/docs/ja/administration/management/compaction.md index c61008c..75483e4 100644 --- a/docs/ja/administration/management/compaction.md +++ b/docs/ja/administration/management/compaction.md @@ -2,46 +2,46 @@ displayed_sidebar: docs --- -# 共有データクラスターのCompaction +# 共有データクラスターのコンパクション -このトピックでは、StarRocksの共有データクラスターでCompactionを管理する方法について説明します。 +このトピックでは、StarRocks の共有データクラスターにおけるコンパクションの管理方法について説明します。 ## 概要 -StarRocksでの各データロード操作は、データファイルの新しいバージョンを生成します。Compactionは、異なるバージョンのデータファイルをより大きなファイルにマージし、小さなファイルの数を減らしてクエリ効率を向上させます。 +StarRocks の各データロード操作は、データファイルの新しいバージョンを生成します。コンパクションは、異なるバージョンのデータファイルをより大きなファイルにマージし、小さなファイルの数を減らしてクエリ効率を向上させます。 -## Compaction Score +## コンパクションスコア ### 概要 -*Compaction Score* は、パーティション内のデータファイルのマージ状況を反映します。スコアが高いほどマージの進行度が低いことを示し、パーティションに未マージのデータファイルバージョンがより多く存在することを意味します。FEは、各パーティションのCompaction Score情報を維持しており、これにはMax Compaction Score(パーティション内のすべてのTabletで最も高いスコア)が含まれます。 +*コンパクションスコア*は、パーティション内のデータファイルのマージ状況を反映します。スコアが高いほどマージの進行度が低いことを示し、パーティションに未マージのデータファイルバージョンがより多く存在することを意味します。FE は各パーティションのコンパクションスコア情報を維持しており、これには Max Compaction Score (パーティション内のすべてのタブレットの中で最も高いスコア) が含まれます。 -パーティションのMax Compaction ScoreがFEパラメーター `lake_compaction_score_selector_min_score` (デフォルト: 10) を下回る場合、そのパーティションのCompactionは完了していると見なされます。Max Compaction Scoreが100を超える場合は、Compactionが不健全な状態であることを示します。スコアがFEパラメーター `lake_ingest_slowdown_threshold` (デフォルト: 100) を超えると、システムはそのパーティションのデータロードトランザクションコミットを減速させます。 `lake_compaction_score_upper_bound` (デフォルト: 2000) を超えた場合、システムはそのパーティションのインポートトランザクションを拒否します。 +パーティションの Max Compaction Score が FE パラメータ `lake_compaction_score_selector_min_score` (デフォルト: 10) を下回る場合、そのパーティションのコンパクションは完了したと見なされます。Max Compaction Score が 100 を超える場合は、不健全なコンパクション状態を示します。スコアが FE パラメータ `lake_ingest_slowdown_threshold` (デフォルト: 100) を超える場合、システムはそのパーティションのデータロードトランザクションコミットを遅延させます。`lake_compaction_score_upper_bound` (デフォルト: 2000) を超える場合、システムはそのパーティションのインポートトランザクションを拒否します。 ### 計算ルール -通常、各データファイルはCompaction Scoreに1貢献します。たとえば、パーティションに1つのTabletがあり、最初のロード操作で生成されたデータファイルが10個ある場合、パーティションのMax Compaction Scoreは10です。Tablet内でトランザクションによって生成されたすべてのデータファイルは、Rowsetとしてグループ化されます。 +通常、各データファイルはコンパクションスコアに 1 を寄与します。例えば、パーティションに 1 つのタブレットと最初のロード操作で生成された 10 個のデータファイルがある場合、パーティションの Max Compaction Score は 10 です。タブレット内のトランザクションによって生成されたすべてのデータファイルは Rowset としてグループ化されます。 -スコア計算中、TabletのRowsetはサイズ別にグループ化され、ファイル数が最も多いグループがTabletのCompaction Scoreを決定します。 +スコア計算中、タブレットの Rowset はサイズ別にグループ化され、ファイル数が最も多いグループがタブレットのコンパクションスコアを決定します。 -たとえば、Tabletが7回のロード操作を経て、100 MB、100 MB、100 MB、10 MB、10 
MB、10 MB、10 MBというサイズのRowsetを生成します。計算中、システムは3つの100 MBのRowsetを1つのグループにし、4つの10 MBのRowsetを別のグループにします。Compaction Scoreは、より多くのファイルを持つグループに基づいて計算されます。この場合、2番目のグループの方がCompaction Scoreが高くなります。Compactionはスコアが高いグループを優先するため、最初のCompaction後、Rowsetの分布は100 MB、100 MB、100 MB、および40 MBになります。 +例えば、タブレットが 7 回のロード操作を受け、サイズが 100 MB、100 MB、100 MB、10 MB、10 MB、10 MB、10 MB の Rowset を生成したとします。計算中、システムは 3 つの 100 MB Rowset を 1 つのグループに、4 つの 10 MB Rowset を別のグループにします。コンパクションスコアは、ファイル数が多いグループに基づいて計算されます。この場合、2 番目のグループの方がコンパクションスコアが高くなります。コンパクションはスコアが高いグループを優先するため、最初のコンパクションの後、Rowset の分布は 100 MB、100 MB、100 MB、および 40 MB となります。 -## Compactionワークフロー +## コンパクションワークフロー -共有データクラスターの場合、StarRocksはFEによって制御される新しいCompactionメカニズムを導入しています。 +共有データクラスターの場合、StarRocks は新しい FE 制御のコンパクションメカニズムを導入しています。 -1. **スコア計算**: Leader FEノードは、トランザクションの公開結果に基づいて、パーティションのCompaction Scoreを計算し保存します。 -2. **候補選択**: FEは、最も高いMax Compaction Scoreを持つパーティションをCompaction候補として選択します。 -3. **タスク生成**: FEは、選択されたパーティションに対してCompactionトランザクションを開始し、Tabletレベルのサブタスクを生成し、FEパラメーター `lake_compaction_max_tasks` で設定された制限に達するまでCompute Nodes (CNs) にディスパッチします。 -4. **サブタスク実行**: CNsはバックグラウンドでCompactionサブタスクを実行します。CNごとの同時サブタスクの数は、CNパラメーター `compact_threads` によって制御されます。 -5. **結果収集**: FEはサブタスクの結果を集計し、Compactionトランザクションをコミットします。 -6. **公開**: FEは、正常にコミットされたCompactionトランザクションを公開します。 +1. **スコア計算**: リーダー FE ノードは、トランザクション発行結果に基づいてパーティションのコンパクションスコアを計算し、保存します。 +2. **候補選択**: FE は、Max Compaction Score が最も高いパーティションをコンパクション候補として選択します。 +3. **タスク生成**: FE は、選択されたパーティションに対してコンパクショントランザクションを開始し、タブレットレベルのサブタスクを生成し、FE パラメータ `lake_compaction_max_tasks` で設定された制限に達するまで、それらを Compute Nodes (CN) にディスパッチします。 +4. **サブタスク実行**: CN はバックグラウンドでコンパクションサブタスクを実行します。CN ごとの同時実行サブタスクの数は、CN パラメータ `compact_threads` によって制御されます。 +5. **結果収集**: FE はサブタスクの結果を集約し、コンパクショントランザクションをコミットします。 +6. **発行**: FE は、正常にコミットされたコンパクショントランザクションを発行します。 -## Compactionの管理 +## コンパクションの管理 -### Compaction Scoreの表示 +### コンパクションスコアの表示 -- `SHOW PROC` ステートメントを使用して、特定のテーブルのパーティションのCompaction Scoreを表示できます。通常、`MaxCS` フィールドにのみ注目すれば十分です。`MaxCS` が10未満の場合、Compactionは完了していると見なされます。`MaxCS` が100を超える場合、Compaction Scoreは比較的高くなります。`MaxCS` が500を超える場合、Compaction Scoreは非常に高く、手動での介入が必要になる場合があります。 +- SHOW PROC ステートメントを使用して、特定のテーブルのパーティションのコンパクションスコアを表示できます。通常、`MaxCS` フィールドにのみ注目すれば十分です。`MaxCS` が 10 未満の場合、コンパクションは完了したと見なされます。`MaxCS` が 100 を超える場合、コンパクションスコアは比較的高くなっています。`MaxCS` が 500 を超える場合、コンパクションスコアは非常に高く、手動での介入が必要になる場合があります。 ```Plain SHOW PARTITIONS FROM @@ -60,7 +60,7 @@ StarRocksでの各データロード操作は、データファイルの新し 1 row in set (0.20 sec) ``` -- システム定義ビュー `information_schema.partitions_meta` をクエリして、パーティションのCompaction Scoreを表示することもできます。 +- システム定義ビュー `information_schema.partitions_meta` をクエリすることで、パーティションのコンパクションスコアを表示することもできます。 例: @@ -82,13 +82,13 @@ StarRocksでの各データロード操作は、データファイルの新し +--------------+----------------------------+----------------------------+--------------+-----------------+-----------------+----------------------+--------------+---------------+-----------------+-----------------------------------------+---------+-----------------+----------------+---------------------+-----------------------------+--------------+---------+-----------+------------+------------------+----------+--------+--------+-------------------------------------------------------------------+ ``` -### Compactionタスクの表示 +### コンパクションタスクの表示 -システムに新しいデータがロードされると、FEは異なるCNノードで実行されるCompactionタスクを常にスケジューリングします。最初にFEでCompactionタスクの一般的なステータスを表示し、次にCNで各タスクの実行詳細を表示できます。 +新しいデータがシステムにロードされると、FE はコンパクションタスクを絶えずスケジュールし、さまざまな CN ノードで実行します。最初に FE でコンパクションタスクの全体的なステータスを表示し、次に CN で各タスクの実行詳細を表示できます。 -#### 
Compactionタスクの一般的なステータスを表示する +#### コンパクションタスクの全体的なステータスの表示 -`SHOW PROC` ステートメントを使用して、Compactionタスクの一般的なステータスを表示できます。 +SHOW PROC ステートメントを使用して、コンパクションタスクの全体的なステータスを表示できます。 ```SQL SHOW PROC '/compactions'; @@ -105,30 +105,31 @@ mysql> SHOW PROC '/compactions'; | ssb.lineorder.10068 | 16 | 2026-01-10 03:29:07 | 2026-01-10 03:29:13 | 2026-01-10 03:29:14 | NULL | {"sub_task_count":12,"read_local_sec":0,"read_local_mb":218,"read_remote_sec":0,"read_remote_mb":0,"read_segment_count":120,"write_segment_count":12,"write_segment_mb":218,"write_remote_sec":4,"in_queue_sec":38} | | ssb.lineorder.10055 | 20 | 2026-01-10 03:29:11 | 2026-01-10 03:29:15 | 2026-01-10 03:29:17 | NULL | {"sub_task_count":12,"read_local_sec":0,"read_local_mb":218,"read_remote_sec":0,"read_remote_mb":0,"read_segment_count":120,"write_segment_count":12,"write_segment_mb":218,"write_remote_sec":4,"in_queue_sec":23} | +---------------------+-------+---------------------+---------------------+---------------------+-------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +``` 以下のフィールドが返されます。 -- `Partition`: Compactionタスクが属するパーティション。 -- `TxnID`: Compactionタスクに割り当てられたトランザクションID。 -- `StartTime`: Compactionタスクが開始された時刻。`NULL` は、タスクがまだ開始されていないことを示します。 -- `CommitTime`: Compactionタスクがデータをコミットした時刻。`NULL` は、データがまだコミットされていないことを示します。 -- `FinishTime`: Compactionタスクがデータを公開した時刻。`NULL` は、データがまだ公開されていないことを示します。 -- `Error`: Compactionタスクのエラーメッセージ(もしあれば)。 -- `Profile`: (v3.2.12およびv3.3.4以降でサポート) 完了後のCompactionタスクのProfile。 - - `sub_task_count`: パーティション内のサブタスク(Tabletに相当)の数。 +- `Partition`: コンパクションタスクが属するパーティション。 +- `TxnID`: コンパクションタスクに割り当てられたトランザクション ID。 +- `StartTime`: コンパクションタスクが開始された時刻。`NULL` はタスクがまだ開始されていないことを示します。 +- `CommitTime`: コンパクションタスクがデータをコミットした時刻。`NULL` はデータがまだコミットされていないことを示します。 +- `FinishTime`: コンパクションタスクがデータを発行した時刻。`NULL` はデータがまだ発行されていないことを示します。 +- `Error`: コンパクションタスクのエラーメッセージ (存在する場合)。 +- `Profile`: (v3.2.12 および v3.3.4 以降でサポート) 完了したコンパクションタスクのプロファイル。 + - `sub_task_count`: パーティション内のサブタスク (タブレットと同等) の数。 - `read_local_sec`: すべてのサブタスクがローカルキャッシュからデータを読み取るのにかかった合計時間。単位: 秒。 - `read_local_mb`: すべてのサブタスクがローカルキャッシュから読み取ったデータの合計サイズ。単位: MB。 - `read_remote_sec`: すべてのサブタスクがリモートストレージからデータを読み取るのにかかった合計時間。単位: 秒。 - `read_remote_mb`: すべてのサブタスクがリモートストレージから読み取ったデータの合計サイズ。単位: MB。 - - `read_segment_count`: すべてのサブタスクによって読み取られたファイルの総数。 - - `write_segment_count`: すべてのサブタスクによって生成された新しいファイルの総数。 - - `write_segment_mb`: すべてのサブタスクによって生成された新しいファイルの合計サイズ。単位: MB。 + - `read_segment_count`: すべてのサブタスクが読み取ったファイルの合計数。 + - `write_segment_count`: すべてのサブタスクが生成した新しいファイルの合計数。 + - `write_segment_mb`: すべてのサブタスクが生成した新しいファイルの合計サイズ。単位: MB。 - `write_remote_sec`: すべてのサブタスクがリモートストレージにデータを書き込むのにかかった合計時間。単位: 秒。 - - `in_queue_sec`: すべてのサブタスクがキューに滞留した合計時間。単位: 秒。 + - `in_queue_sec`: すべてのサブタスクがキューに滞留していた合計時間。単位: 秒。 -#### Compactionタスクの実行詳細を表示する +#### コンパクションタスクの実行詳細の表示 -各Compactionタスクは複数のサブタスクに分割され、それぞれがTabletに対応します。システム定義ビュー `information_schema.be_cloud_native_compactions` をクエリして、各サブタスクの実行詳細を表示できます。 +各コンパクションタスクは複数のサブタスクに分割され、それぞれがタブレットに対応します。システム定義ビュー `information_schema.be_cloud_native_compactions` をクエリすることで、各サブタスクの実行詳細を表示できます。 例: @@ -144,34 +145,35 @@ mysql> SELECT * FROM information_schema.be_cloud_native_compactions; | 10001 | 51052 | 43036 | 12 | 0 | 0 | NULL | NULL | 0 | | | | 10001 | 51053 | 43035 | 12 | 0 | 1 | 2024-09-24 19:15:16 | NULL | 2 | | 
{"read_local_sec":0,"read_local_mb":1,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":100,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} | +-------+--------+-----------+---------+---------+------+---------------------+-------------+----------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +``` 以下のフィールドが返されます。 -- `BE_ID`: CNのID。 -- `TXN_ID`: サブタスクが属するトランザクションのID。 -- `TABLET_ID`: サブタスクが属するTabletのID。 -- `VERSION`: Tabletのバージョン。 +- `BE_ID`: CN の ID。 +- `TXN_ID`: サブタスクが属するトランザクションの ID。 +- `TABLET_ID`: サブタスクが属するタブレットの ID。 +- `VERSION`: タブレットのバージョン。 - `RUNS`: サブタスクが実行された回数。 - `START_TIME`: サブタスクが開始された時刻。 - `FINISH_TIME`: サブタスクが完了した時刻。 -- `PROGRESS`: TabletのCompaction進捗状況(パーセンテージ)。 -- `STATUS`: サブタスクのステータス。エラーがある場合は、このフィールドにエラーメッセージが返されます。 -- `PROFILE`: (v3.2.12およびv3.3.4以降でサポート) サブタスクのランタイムプロファイル。 +- `PROGRESS`: タブレットのコンパクションの進行状況(パーセンテージ)。 +- `STATUS`: サブタスクのステータス。エラーがある場合はこのフィールドにエラーメッセージが返されます。 +- `PROFILE`: (v3.2.12 および v3.3.4 以降でサポート) サブタスクのランタイムプロファイル。 - `read_local_sec`: サブタスクがローカルキャッシュからデータを読み取るのにかかった時間。単位: 秒。 - `read_local_mb`: サブタスクがローカルキャッシュから読み取ったデータのサイズ。単位: MB。 - `read_remote_sec`: サブタスクがリモートストレージからデータを読み取るのにかかった時間。単位: 秒。 - `read_remote_mb`: サブタスクがリモートストレージから読み取ったデータのサイズ。単位: MB。 - `read_local_count`: サブタスクがローカルキャッシュからデータを読み取った回数。 - `read_remote_count`: サブタスクがリモートストレージからデータを読み取った回数。 - - `in_queue_sec`: サブタスクがキューに滞留した時間。単位: 秒。 + - `in_queue_sec`: サブタスクがキューに滞留していた時間。単位: 秒。 -### Compactionタスクの設定 +### コンパクションタスクの構成 -これらのFEおよびCN (BE) パラメーターを使用してCompactionタスクを設定できます。 +これらの FE および CN (BE) パラメータを使用して、コンパクションタスクを構成できます。 -#### FEパラメーター +#### FE パラメータ -以下のFEパラメーターは動的に設定できます。 +以下の FE パラメータを動的に構成できます。 ```SQL ADMIN SET FRONTEND CONFIG ("lake_compaction_max_tasks" = "-1"); @@ -182,8 +184,8 @@ ADMIN SET FRONTEND CONFIG ("lake_compaction_max_tasks" = "-1"); - デフォルト: -1 - タイプ: Int - 単位: - -- 変更可能: はい -- 説明: 共有データクラスターで許可される同時Compactionタスクの最大数。この項目を`-1`に設定すると、生存しているCNノードの数に16を掛けた値として、同時タスク数を適応的に計算することを示します。この値を`0`に設定すると、Compactionが無効になります。 +- 変更可能: Yes +- 説明: 共有データクラスターで許可される同時コンパクションタスクの最大数。この項目を `-1` に設定すると、同時タスク数が適応的に計算されます。すなわち、稼働中の CN ノード数に 16 を乗じた値になります。この値を `0` に設定すると、コンパクションが無効になります。 - 導入バージョン: v3.1.0 ```SQL @@ -195,13 +197,13 @@ ADMIN SET FRONTEND CONFIG ("lake_compaction_disable_tables" = "11111;22222"); - デフォルト: "" - タイプ: String - 単位: - -- 変更可能: はい -- 説明: 特定のテーブルのCompactionを無効にします。これは開始済みのCompactionには影響しません。この項目の値はテーブルIDです。複数の値は`;`で区切られます。 +- 変更可能: Yes +- 説明: 特定のテーブルのコンパクションを無効にします。これは、既に開始されているコンパクションには影響しません。この項目の値はテーブル ID です。複数の値はセミコロン (`;`) で区切られます。 - 導入バージョン: v3.2.7 -#### CNパラメーター +#### CN パラメータ -以下のCNパラメーターは動的に設定できます。 +以下の CN パラメータを動的に構成できます。 ```SQL UPDATE information_schema.be_configs SET VALUE = 8 @@ -213,90 +215,90 @@ WHERE name = "compact_threads"; - デフォルト: 4 - タイプ: Int - 単位: - -- 変更可能: はい -- 説明: 同時Compactionタスクに使用されるスレッドの最大数。この設定は、v3.1.7およびv3.2.2以降で動的に変更可能になりました。 +- 変更可能: Yes +- 説明: 同時コンパクションタスクに使用されるスレッドの最大数。この設定は、v3.1.7 および v3.2.2 以降で動的に変更可能になりました。 - 導入バージョン: v3.0.0 -> **注記** +> **NOTE** > -> 本番環境では、`compact_threads` をBE/CNのCPUコア数の25%に設定することをお勧めします。 +> 本番環境では、`compact_threads` を BE/CN の CPU コア数の 25% に設定することをお勧めします。 ##### max_cumulative_compaction_num_singleton_deltas - デフォルト: 500 - タイプ: Int - 単位: - -- 変更可能: はい -- 説明: 単一のCumulative Compactionでマージできるセグメントの最大数。Compaction中にOOMが発生する場合、この値を減らすことができます。 +- 変更可能: Yes +- 説明: 単一の累積コンパクションでマージできるセグメントの最大数。コンパクション中に OOM 
が発生する場合、この値を減らすことができます。
- 導入バージョン: -

-> **注記**
+> **NOTE**
>
-> 本番環境では、Compactionタスクを高速化し、リソース消費を削減するために、`max_cumulative_compaction_num_singleton_deltas` を `100` に設定することをお勧めします。
+> 本番環境では、コンパクションタスクを高速化し、そのリソース消費を削減するために、`max_cumulative_compaction_num_singleton_deltas` を `100` に設定することをお勧めします。

##### lake_pk_compaction_max_input_rowsets

- デフォルト: 500
- タイプ: Int
- 単位: -
-- 変更可能: はい
-- 説明: 共有データクラスターのPrimary KeyテーブルのCompactionタスクで許可される入力Rowsetの最大数。このパラメーターのデフォルト値は、v3.2.4およびv3.1.10以降で`5`から`1000`に、v3.3.1およびv3.2.9以降で`500`に変更されました。Primary Keyテーブルでサイズ階層型Compactionポリシーが有効化された後 (`enable_pk_size_tiered_compaction_strategy` を `true` に設定することで)、StarRocksは書き込み増幅を減らすために各CompactionのRowset数を制限する必要がなくなりました。したがって、このパラメーターのデフォルト値は増加しています。
+- 変更可能: Yes
+- 説明: 共有データクラスターの主キーテーブルコンパクションタスクで許可される入力 Rowset の最大数。このパラメータのデフォルト値は、v3.2.4 および v3.1.10 以降で `5` から `1000` に、v3.3.1 および v3.2.9 以降で `500` に変更されました。主キーテーブルでサイズ階層型コンパクションポリシー (`enable_pk_size_tiered_compaction_strategy` を `true` に設定) が有効になった後、StarRocks は書き込み増幅を減らすために各コンパクションの Rowset 数を制限する必要がありません。したがって、このパラメータのデフォルト値は増加しています。
- 導入バージョン: v3.1.8, v3.2.3

-> **注記**
+> **NOTE**
>
-> 本番環境では、Compactionタスクを高速化し、リソース消費を削減するために、`max_cumulative_compaction_num_singleton_deltas` を `100` に設定することをお勧めします。
+> 本番環境では、コンパクションタスクを高速化し、そのリソース消費を削減するために、`max_cumulative_compaction_num_singleton_deltas` を `100` に設定することをお勧めします。

-### 手動でCompactionタスクをトリガーする
+### コンパクションタスクの手動トリガー

```SQL
--- テーブル全体に対してCompactionをトリガーします。
+-- テーブル全体のコンパクションをトリガーします。
ALTER TABLE <table_name> COMPACT;

--- 特定のパーティションに対してCompactionをトリガーします。
+-- 特定のパーティションのコンパクションをトリガーします。
ALTER TABLE <table_name> COMPACT <partition_name>;

--- 複数のパーティションに対してCompactionをトリガーします。
+-- 複数のパーティションのコンパクションをトリガーします。
ALTER TABLE <table_name> COMPACT (<partition_name1>, <partition_name2>, ...);
```

-### Compactionタスクのキャンセル
+### コンパクションタスクのキャンセル

-タスクのトランザクションIDを使用して、Compactionタスクを手動でキャンセルできます。
+タスクのトランザクション ID を使用して、コンパクションタスクを手動でキャンセルできます。

```SQL
CANCEL COMPACTION WHERE TXN_ID = <txn_id>;
```

-> **注記**
+> **NOTE**
>
-> - `CANCEL COMPACTION` ステートメントはLeader FEノードから送信する必要があります。
-> - `CANCEL COMPACTION` ステートメントは、まだコミットされていないトランザクション、つまり `SHOW PROC '/compactions'` の戻り値で `CommitTime` がNULLであるトランザクションにのみ適用されます。
-> - `CANCEL COMPACTION` は非同期プロセスです。タスクがキャンセルされたかどうかは、`SHOW PROC '/compactions'` を実行して確認できます。
+> - CANCEL COMPACTION ステートメントは、リーダー FE ノードから送信する必要があります。
+> - CANCEL COMPACTION ステートメントは、コミットされていないトランザクション、つまり `SHOW PROC '/compactions'` の戻り値で `CommitTime` が NULL であるトランザクションにのみ適用されます。
+> - CANCEL COMPACTION は非同期プロセスです。`SHOW PROC '/compactions'` を実行してタスクがキャンセルされたかどうかを確認できます。

## ベストプラクティス

-Compactionはクエリパフォーマンスにとって非常に重要であるため、テーブルとパーティションのデータマージ状況を定期的に監視することをお勧めします。以下にいくつかのベストプラクティスとガイドラインを示します。
+コンパクションはクエリパフォーマンスにとって非常に重要であるため、テーブルとパーティションのデータマージステータスを定期的に監視することをお勧めします。以下にいくつかのベストプラクティスとガイドラインを示します。

-- ロード間の時間間隔を長くし(10秒未満の間隔のシナリオは避ける)、ロードあたりのバッチサイズを大きくするよう努めます(100行未満のデータバッチは避ける)。
-- CN上の並列Compactionワーカー スレッドの数を調整して、タスクの実行を高速化します。本番環境では、`compact_threads` をBE/CNのCPUコア数の25%に設定することをお勧めします。
-- `show proc '/compactions'` および `select * from information_schema.be_cloud_native_compactions;` を使用してCompactionタスクのステータスを監視します。
-- Compaction Scoreを監視し、それに基づいてアラートを設定します。StarRocksの組み込みGrafana監視テンプレートには、このメトリックが含まれています。
-- Compaction中のリソース消費、特にメモリ使用量に注意してください。Grafana監視テンプレートには、このメトリックも含まれています。
+- ロード間の時間間隔を長くし(10 秒未満の間隔は避ける)、ロードごとのバッチサイズを増やします(100 行未満のバッチサイズは避ける)。
+- CN 上の並列コンパクションワーカースレッド数を調整して、タスク実行を高速化します。本番環境では、`compact_threads` を BE/CN の CPU コア数の 25% に設定することをお勧めします。
+- `show proc '/compactions'` と `select * from information_schema.be_cloud_native_compactions;` を使用して、コンパクションタスクのステータスを監視します。
+- コンパクションスコアを監視し、それに基づいてアラートを設定します(この一覧の直後のクエリ例を参照)。StarRocks の組み込み Grafana 監視テンプレートにはこのメトリックが含まれています。
+- コンパクション中のリソース消費、特にメモリ使用量に注意してください。Grafana 監視テンプレートにはこのメトリックも含まれています。
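+
+コンパクションスコアの監視には、本ドキュメントで紹介したシステムビュー `information_schema.partitions_meta` が利用できます。以下は、Max Compaction Score が `100`(前述の「比較的高い」水準)を超えるパーティションを抽出するクエリの一例です。しきい値 `100` は例示であり、列名(ここでは `MAX_CS` を想定)はバージョンによって異なる可能性があるため、事前に `DESC information_schema.partitions_meta` で確認してください。
+
+```SQL
+-- Max Compaction Score の高いパーティションを抽出するスケッチ(しきい値 100 は例示)。
+SELECT DB_NAME, TABLE_NAME, PARTITION_NAME, MAX_CS
+FROM information_schema.partitions_meta
+WHERE MAX_CS > 100
+ORDER BY MAX_CS DESC
+LIMIT 10;
+```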

 ## トラブルシューティング

 ### 遅いクエリ

-適時でないCompactionによって引き起こされる遅いクエリを特定するには、SQL Profileで、単一のFragment内の `SegmentsReadCount` を `TabletCount` で割った値を確認できます。その値が数十以上のような大きな値である場合、Compactionが適時でないことが遅いクエリの原因である可能性があります。
+タイムリーでないコンパクションによって引き起こされる遅いクエリを特定するには、SQL プロファイルで、単一のフラグメント内の `SegmentsReadCount` を `TabletCount` で割った値を確認します。この値が数十以上といった大きな値である場合、タイムリーでないコンパクションが遅いクエリの原因である可能性があります。

-### クラスター内の高いMax Compaction Score
+### クラスター内の Max Compaction Score が高い

-1. `ADMIN SHOW FRONTEND CONFIG LIKE "%lake_compaction%"` および `SELECT * FROM information_schema.be_configs WHERE name = "compact_threads"` を使用して、Compaction関連のパラメーターが適切な範囲内にあるかどうかを確認します。
-2. `SHOW PROC '/compactions'` を使用してCompactionがスタックしているかどうかを確認します。
-   - `CommitTime` がNULLのままである場合、システムビュー `information_schema.be_cloud_native_compactions` を調べてCompactionがスタックしている理由を確認します。
-   - `FinishTime` がNULLのままである場合、Leader FEログで `TxnID` を使用して公開失敗の理由を検索します。
-3. `SHOW PROC '/compactions'` を使用してCompactionが遅く実行されているかどうかを確認します。
-   - `sub_task_count` が大きすぎる場合(`SHOW PARTITIONS` を使用してこのパーティション内の各Tabletのサイズを確認)、テーブルが不適切に作成されている可能性があります。
-   - `read_remote_mb` が大きすぎる場合(読み取りデータの合計の30%を超える場合)、サーバーのディスクサイズを確認し、`SHOW BACKENDS` で `DataCacheMetrics` フィールドを通じてキャッシュクォータも確認します。
-   - `write_remote_sec` が大きすぎる場合(Compactionの合計時間の90%を超える場合)、リモートストレージへの書き込みが遅すぎる可能性があります。これは、キーワード `single upload latency` および `multi upload latency` を含む共有データ固有の監視メトリックを確認することで検証できます。
-   - `in_queue_sec` が大きすぎる場合(Tabletあたりの平均待機時間が60秒を超える場合)、パラメーター設定が不適切であるか、他の実行中のCompactionが遅すぎる可能性があります。
+1. `ADMIN SHOW FRONTEND CONFIG LIKE "%lake_compaction%"` と `SELECT * FROM information_schema.be_configs WHERE name = "compact_threads"` を使用して、コンパクション関連のパラメータが適切な範囲内にあるかどうかを確認します。
+2. `SHOW PROC '/compactions'` を使用して、コンパクションがスタックしているかどうかを確認します。
+   - `CommitTime` が NULL のままの場合、`information_schema.be_cloud_native_compactions` システムビューをチェックして、コンパクションがスタックしている理由を探します。
+   - `FinishTime` が NULL のままの場合、リーダー FE ログで `TxnID` を使用して発行失敗の理由を検索します。
+3. 
`SHOW PROC '/compactions'` を使用して、コンパクションの実行が遅いかどうかを確認します。
+   - `sub_task_count` が大きすぎる場合(`SHOW PARTITIONS` を使用してこのパーティション内の各タブレットのサイズをチェック)、テーブルが不適切に作成されている可能性があります。
+   - `read_remote_mb` が大きすぎる場合(総読み取りデータの 30% を超える)、サーバーのディスクサイズをチェックし、`SHOW BACKENDS` の `DataCacheMetrics` フィールドでキャッシュクォータもチェックします。
+   - `write_remote_sec` が大きすぎる場合(総コンパクション時間の 90% を超える)、リモートストレージへの書き込みが遅すぎる可能性があります。これは、`single upload latency` および `multi upload latency` というキーワードを持つ共有データ固有の監視メトリックをチェックすることで検証できます。
+   - `in_queue_sec` が大きすぎる場合(タブレットあたりの平均待機時間が 60 秒を超える)、パラメータ設定が不適切であるか、他の実行中のコンパクションが遅すぎる可能性があります。

---
diff --git a/docs/ja/administration/management/configuration.mdx b/docs/ja/administration/management/configuration.mdx
new file mode 100644
index 0000000..142ea23
--- /dev/null
+++ b/docs/ja/administration/management/configuration.mdx
@@ -0,0 +1,11 @@
+---
+displayed_sidebar: docs
+---
+
+# 設定
+
+FEノードとBEノードの設定パラメーター。
+
+import DocCardList from '@theme/DocCardList';
+
+<DocCardList />
diff --git a/docs/ja/administration/management/enable_fqdn.md b/docs/ja/administration/management/enable_fqdn.md
new file mode 100644
index 0000000..254e42d
--- /dev/null
+++ b/docs/ja/administration/management/enable_fqdn.md
@@ -0,0 +1,163 @@
+---
+displayed_sidebar: docs
+---
+
+# FQDNアクセスを有効にする
+
+このトピックでは、完全修飾ドメイン名(FQDN)を使用してクラスターアクセスを有効にする方法について説明します。FQDNは、インターネット経由でアクセスできる特定のエンティティに対する**完全なドメイン名**です。FQDNは、ホスト名とドメイン名の2つの部分で構成されます。
+
+2.4より前は、StarRocksはIPアドレスによるFEおよびBEへのアクセスのみをサポートしていました。FQDNを使用してノードをクラスターに追加した場合でも、最終的にはIPアドレスに変換されていました。これは、StarRocksクラスター内の特定のノードのIPアドレスを変更すると、ノードへのアクセス障害につながる可能性があるため、DBAにとって大きな不便を引き起こしていました。バージョン2.4では、StarRocksは各ノードをそのIPアドレスから分離しました。これにより、StarRocksのノードをFQDNのみで管理できるようになりました。
+
+## 前提条件
+
+StarRocksクラスターでFQDNアクセスを有効にするには、以下の要件が満たされていることを確認してください。
+
+- クラスター内の各マシンにはホスト名が必要です。
+
+- 各マシンのファイル **/etc/hosts** に、クラスター内の他のマシンの対応するIPアドレスとFQDNを指定する必要があります。
+
+- **/etc/hosts** ファイル内のIPアドレスは一意である必要があります。
+
+## FQDNアクセスで新しいクラスターをセットアップする
+
+デフォルトでは、新しいクラスターのFEノードはIPアドレスアクセス経由で起動されます。FQDNアクセスで新しいクラスターを起動するには、**クラスターを初めて起動するとき**に、以下のコマンドを実行してFEノードを起動する必要があります。
+
+```Shell
+./bin/start_fe.sh --host_type FQDN --daemon
+```
+
+プロパティ `--host_type` は、ノードの起動に使用されるアクセス方法を指定します。有効な値には `FQDN` と `IP` が含まれます。このプロパティは、ノードを初めて起動するときに一度だけ指定する必要があります。
+
+各BEノードは、FEメタデータで定義された `BE Address` で自身を識別します。そのため、BEノードを起動する際に `--host_type` を指定する必要はありません。`BE Address` がFQDNを持つBEノードを定義している場合、BEノードはこのFQDNで自身を識別します。
+
+## 既存のクラスターでFQDNアクセスを有効にする
+
+以前にIPアドレス経由で起動された既存のクラスターでFQDNアクセスを有効にするには、まずStarRocksをバージョン2.4.0以降に**アップグレード**する必要があります。
+
+### FEノードのFQDNアクセスを有効にする
+
+Leader FEノードのFQDNアクセスを有効にする前に、すべての非Leader Follower FEノードのFQDNアクセスを有効にする必要があります。
+
+> **注意**
+>
+> FEノードのFQDNアクセスを有効にする前に、クラスターに少なくとも3つのFollower FEノードがあることを確認してください。
+
+#### 非Leader Follower FEノードのFQDNアクセスを有効にする
+
+1. FEノードのデプロイディレクトリに移動し、以下のコマンドを実行してFEノードを停止します。
+
+   ```Shell
+   ./bin/stop_fe.sh
+   ```
+
+2. MySQLクライアント経由で以下のステートメントを実行し、停止したFEノードの `Alive` ステータスを確認します。`Alive` ステータスが `false` になるまで待ちます。
+
+   ```SQL
+   SHOW PROC '/frontends'\G
+   ```
+
+3. 以下のステートメントを実行して、IPアドレスをFQDNに置き換えます(この手順一覧の直後にある具体例も参照してください)。
+
+   ```SQL
+   ALTER SYSTEM MODIFY FRONTEND HOST "<fe_ip>" TO "<fe_hostname>";
+   ```
+
+4. 以下のコマンドを実行して、FQDNアクセスでFEノードを起動します。
+
+   ```Shell
+   ./bin/start_fe.sh --host_type FQDN --daemon
+   ```
+
+   プロパティ `--host_type` は、ノードの起動に使用されるアクセス方法を指定します。有効な値には `FQDN` と `IP` が含まれます。ノードを変更した後にノードを再起動するときに、このプロパティを一度だけ指定する必要があります。
+
+5. FEノードの `Alive` ステータスを確認します。`Alive` ステータスが `true` になるまで待ちます。
+
+   ```SQL
+   SHOW PROC '/frontends'\G
+   ```
+
+6. 現在のFEノードの `Alive` ステータスが `true` になった後、上記の手順を繰り返して、他の非Leader Follower FEノードのFQDNアクセスを順次有効にします。
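+
+たとえば、IPアドレス `192.168.1.10` で登録されているFollower FEをホスト名 `fe1.example.com` に切り替える場合、手順3のステートメントは次のようになります(IPアドレスとホスト名は説明用の架空の値です):
+
+```SQL
+ALTER SYSTEM MODIFY FRONTEND HOST "192.168.1.10" TO "fe1.example.com";
+```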
+
+#### Leader FEノードのFQDNアクセスを有効にする
+
+すべての非Leader FEノードが正常に変更され再起動された後、Leader FEノードのFQDNアクセスを有効にできます。
+
+> **注記**
+>
+> Leader FEノードがFQDNアクセスで有効になる前は、ノードをクラスターに追加するために使用されるFQDNは、対応するIPアドレスに変換されます。FQDNアクセスが有効になったLeader FEノードがクラスターのために選出された後、FQDNはIPアドレスに変換されなくなります。
+
+1. Leader FEノードのデプロイディレクトリに移動し、以下のコマンドを実行してLeader FEノードを停止します。
+
+   ```Shell
+   ./bin/stop_fe.sh
+   ```
+
+2. MySQLクライアント経由で以下のステートメントを実行し、新しいLeader FEノードがクラスターのために選出されたかどうかを確認します。
+
+   ```SQL
+   SHOW PROC '/frontends'\G
+   ```
+
+   `Alive` ステータスが `true` で、`isMaster` が `true` のFEノードは、実行中のLeader FEです。
+
+3. 以下のステートメントを実行して、IPアドレスをFQDNに置き換えます。
+
+   ```SQL
+   ALTER SYSTEM MODIFY FRONTEND HOST "<fe_ip>" TO "<fe_hostname>";
+   ```
+
+4. 以下のコマンドを実行して、FQDNアクセスでFEノードを起動します。
+
+   ```Shell
+   ./bin/start_fe.sh --host_type FQDN --daemon
+   ```
+
+   プロパティ `--host_type` は、ノードの起動に使用されるアクセス方法を指定します。有効な値には `FQDN` と `IP` が含まれます。ノードを変更した後にノードを再起動するときに、このプロパティを一度だけ指定する必要があります。
+
+5. FEノードの `Alive` ステータスを確認します。
+
+   ```Plain
+   SHOW PROC '/frontends'\G
+   ```
+
+   `Alive` ステータスが `true` になると、FEノードは正常に変更され、Follower FEノードとしてクラスターに追加されます。
+
+### BEノードのFQDNアクセスを有効にする
+
+MySQLクライアント経由で以下のステートメントを実行し、IPアドレスをFQDNに置き換えて、BEノードのFQDNアクセスを有効にします。
+
+```SQL
+ALTER SYSTEM MODIFY BACKEND HOST "<be_ip>" TO "<be_hostname>";
+```
+
+> **注記**
+>
+> FQDNアクセスが有効になった後、BEノードを再起動する必要はありません。
+
+## ロールバック
+
+FQDNアクセスが有効になったStarRocksクラスターを、FQDNアクセスをサポートしない以前のバージョンにロールバックするには、まずクラスター内のすべてのノードのIPアドレスアクセスを有効にする必要があります。[既存のクラスターでFQDNアクセスを有効にする](#enable-fqdn-access-in-an-existing-cluster) を一般的なガイダンスとして参照できますが、SQLコマンドを以下のように変更する必要があります。
+
+- FEノードのIPアドレスアクセスを有効にする:
+
+```SQL
+ALTER SYSTEM MODIFY FRONTEND HOST "<fe_hostname>" TO "<fe_ip>";
+```
+
+- BEノードのIPアドレスアクセスを有効にする:
+
+```SQL
+ALTER SYSTEM MODIFY BACKEND HOST "<be_hostname>" TO "<be_ip>";
+```
+
+変更は、クラスターが正常に再起動された後に有効になります。
+
+## FAQ
+
+**Q: FEノードのFQDNアクセスを有効にする際に「required 1 replica. 
But none were active with this master」というエラーが発生します。どうすればよいですか?** + +A: FEノードのFQDNアクセスを有効にする前に、クラスターに少なくとも3つのFollower FEノードがあることを確認してください。 + +**Q: FQDNアクセスが有効なクラスターにIPアドレスを使用して新しいノードを追加できますか?** + +A: はい、できます。 diff --git a/docs/ja/administration/management/graceful_exit.md b/docs/ja/administration/management/graceful_exit.md new file mode 100644 index 0000000..fdf6f09 --- /dev/null +++ b/docs/ja/administration/management/graceful_exit.md @@ -0,0 +1,276 @@ +--- +displayed_sidebar: docs +--- + +# Graceful Exit + +v3.3以降、StarRocksはGraceful Exitをサポートしています。 + +## 概要 + +Graceful Exitは、StarRocks FE、BE、CNノードの**無停止アップグレードと再起動**をサポートするために設計されたメカニズムです。その主な目的は、ノードの再起動、ローリングアップグレード、クラスタのスケーリングなどのメンテナンス操作中に、実行中のクエリやデータ取り込みタスクへの影響を最小限に抑えることです。 + +Graceful Exitは以下を保証します。 + +- 終了が開始されると、ノードは**新しいタスクの受け入れを停止**します。 +- 既存のクエリとロードジョブは、制御された時間枠内で**完了することが許可されます**。 +- システムコンポーネント (FE/BE/CN) が**ステータス変更を調整**し、クラスタがトラフィックを正しく再ルーティングできるようにします。 + +Graceful Exitメカニズムは、以下に説明するように、FEとBE/CNノードで異なります。 + +### FE Graceful Exitメカニズム + +#### トリガーシグナル + +FE Graceful Exitは以下によって開始されます。 + +```bash +stop_fe.sh -g +``` + +これは`SIGUSR1`シグナルを送信し、デフォルトの終了(`-g`オプションなし)は`SIGTERM`シグナルを送信します。 + +#### ロードバランサー認識 + +シグナルを受信すると: + +- FEは`/api/health`エンドポイントで直ちに**HTTP 500**を返します。 +- ロードバランサーは約15秒以内に劣化状態を検出し、このFEへの新しい接続のルーティングを停止します。 + +#### コネクションドレインとシャットダウンロジック + +**Follower FE** + +- 読み取り専用クエリを処理します。 +- FEノードにアクティブなセッションがない場合、接続は直ちに閉じられます。 +- SQLが実行中の場合、FEノードはセッションを閉じる前に実行が終了するのを待ちます。 + +**Leader FE** + +- 読み取りリクエストの処理はFollower FEと同じです。 +- 書き込みリクエストの処理には以下が必要です。 + + - BDBJEを閉じます。 + - 新しいLeader選出が完了するのを待ちます。 + - 後続の書き込みを新しく選出されたLeaderにリダイレクトします。 + +#### タイムアウト制御 + +クエリが長時間実行されすぎる場合、FEは**60秒**後に強制的に終了します(`--timeout`オプションで設定可能)。 + +### BE/CN Graceful Exitメカニズム + +#### トリガーシグナル + +BE Graceful Exitは以下によって開始されます。 + +```bash +stop_be.sh -g +``` + +CN Graceful Exitは以下によって開始されます。 + +```bash +stop_cn.sh -g +``` + +これは`SIGTERM`シグナルを送信し、デフォルトの終了(オプションなし)は`SIGKILL`シグナルを送信します。 + +#### 状態遷移 + +シグナルを受信した後: + +- BE/CNノードは自身を**exiting**とマークします。 +- **新しいクエリフラグメント**を`INTERNAL_ERROR`を返すことで拒否します。 +- 既存のフラグメントの処理は継続します。 + +#### 実行中クエリの待機ループ + +BE/CNが既存のフラグメントの終了を待機する動作は、BE/CN構成`loop_count_wait_fragments_finish`(デフォルト:2)によって制御されます。実際の待機時間は`loop_count_wait_fragments_finish × 10 seconds`(つまり、デフォルトで20秒)に等しくなります。タイムアウト後にフラグメントが残っている場合、BE/CNは通常のシャットダウン(スレッド、ネットワーク、その他のプロセスのクローズ)に進みます。 + +#### FE認識の改善 + +v3.4以降、FEはハートビート障害に基づいてBE/CNを`DEAD`とマークしなくなりました。FEはBE/CNの「exiting」状態を正しく認識し、フラグメントが完了するためのGraceful Exitウィンドウを大幅に長くすることができます。 + +## 設定 + +### FE設定 + +#### `stop_fe.sh -g --timeout` + +- 説明:FEが強制終了されるまでの最大待機時間。 +- デフォルト:60 (秒) +- 適用方法:スクリプトコマンドで指定します。例:`--timeout 120`。 + +#### *最小LB検出時間* + +- 説明:LBがヘルス状態の劣化を検出するには、最低15秒が必要です。 +- デフォルト:15 (秒) +- 適用方法:固定値 + +### BE/CN設定 + +#### `loop_count_wait_fragments_finish` + +- 説明:既存のフラグメントに対するBE/CNの待機時間。この値に10秒を掛けます。 +- デフォルト:2 +- 適用方法:BE/CN設定ファイルを変更するか、動的に更新します。 + +#### `graceful_exit_wait_for_frontend_heartbeat` + +- 説明:BE/CNがFEからのハートビートを通じて**SHUTDOWN**を確認するまで待機するかどうか。v3.4.5以降。 +- デフォルト:false +- 適用方法:BE/CN設定ファイルを変更するか、動的に更新します。 + +#### `stop_be.sh -g --timeout`, `stop_cn.sh -g --timeout` + +- 説明:BE/CNが強制終了されるまでの最大待機時間。BE/CNの待機時間に達する前に終了するのを防ぐため、`loop_count_wait_fragments_finish` * 10より大きい値を設定してください。 +- デフォルト:false +- 適用方法:スクリプトコマンドで指定します。例:`--timeout 30`。 + +### グローバルスイッチ + +Graceful Exitはv3.4以降**デフォルトで有効**です。一時的に無効にするには、BE/CN設定`loop_count_wait_fragments_finish`を`0`に設定します。 + +## Graceful Exit中の期待される動作 + +### クエリワークロード + +| クエリタイプ | 期待される動作 | +| ----------------------------------- | 
----------------------------------------------------------------------------------- |
+| **短い (20秒未満)** | BE/CNは十分長く待機するため、クエリは通常通り完了します。 |
+| **中程度 (20〜60秒)** | BE/CNの待機時間内に完了したクエリは成功を返します。それ以外の場合、クエリはキャンセルされ、手動での再試行が必要です。 |
+| **長い (60秒以上)** | クエリはタイムアウトによりFE/BE/CNによって終了される可能性が高く、手動での再試行が必要です。 |
+
+### データ取り込みタスク
+
+- **FlinkまたはKafka Connectorsを介したロードタスク**は自動的に再試行され、ユーザーに目に見える中断はありません。
+- **Stream Load (非フレームワーク)、Broker Load、Routine Loadタスク**は、接続が切断された場合に失敗する可能性があります。手動での再試行が必要です。
+- **バックグラウンドタスク**は、FEの再試行メカニズムによって自動的に再スケジュールされ、実行されます。
+
+### アップグレードおよび再起動操作
+
+Graceful Exitは以下を保証します。
+
+- クラスタ全体のダウンタイムなし。
+- ノードを1つずつドレインすることによる安全なローリングアップグレード。
+
+## 制限事項とバージョンによる違い
+
+### バージョンの動作の違い
+
+| バージョン | 動作 |
+| --------- | ------------------------------------------------------------------------------------------------------------------------- |
+| **v3.3** | BE Graceful Exitに欠陥あり:FEがBE/CNを`DEAD`と早期にマークし、クエリがキャンセルされる可能性があります。実質的な待機時間は制限されます(デフォルト15秒)。 |
+| **v3.4+** | 拡張された待機時間を完全にサポート。FEはBE/CNの「exiting」状態を正しく識別します。本番環境での使用を推奨します。 |
+
+### 運用上の制限
+
+- 極端なケース(例えば、BE/CNがハングする)では、Graceful Exitが失敗する可能性があります。プロセスを終了するには`kill -9`が必要となり、部分的なデータ永続性(スナップショットで回復可能)のリスクがあります。
+
+## 使用方法
+
+### 前提条件
+
+**StarRocksバージョン**:
+
+- **v3.3+**:Graceful Exitの基本的なサポート。
+- **v3.4+**:強化されたステータス管理、より長い待機時間(数分まで)。
+
+**構成**:
+
+- `loop_count_wait_fragments_finish`が正の整数に設定されていることを確認してください。
+- `graceful_exit_wait_for_frontend_heartbeat`を`true`に設定し、FEがBEの「EXITING」状態を検出できるようにします。
+
+### FE Graceful Exitの実行
+
+```bash
+./bin/stop_fe.sh -g --timeout 60
+```
+
+パラメーター:
+
+- `--timeout`:FEノードが強制終了されるまでの最大待機時間。
+
+動作:
+
+- システムは最初に`SIGUSR1`シグナルを送信します。
+- タイムアウト後、`SIGKILL`にフォールバックします。
+
+#### FEステータスの検証
+
+以下のAPIを通じてFEのヘルス状態を確認できます。
+
+```
+http://<fe_host>:8030/api/health
+```
+
+LBは、連続して200以外の応答を受信した後、ノードを削除します。
+
+### BE/CN Graceful Exitの実行
+
+- **v3.3の場合**:
+
+  - BE:
+
+    ```bash
+    ./be/bin/stop_be.sh -g
+    ```
+
+  - CN:
+
+    ```bash
+    ./be/bin/stop_cn.sh -g
+    ```
+
+- **v3.4+の場合**:
+
+  - BE:
+
+    ```bash
+    ./bin/stop_be.sh -g --timeout 600
+    ```
+
+  - CN:
+
+    ```bash
+    ./bin/stop_cn.sh -g --timeout 600
+    ```
+
+フラグメントが残っていない場合、BE/CNは直ちに終了します。
+
+#### BE/CNステータスの検証
+
+FEで実行:
+
+```sql
+SHOW BACKENDS;
+```
+
+`StatusCode`:
+
+- `SHUTDOWN`:BE/CN Graceful Exitが進行中。
+- `DISCONNECTED`:BE/CNノードが完全に終了しました。
+
+## ローリングアップグレードワークフロー
+
+### 手順
+
+1. ノード`A`でGraceful Exitを実行します(必要に応じて、この手順一覧の直後にある待機時間延長のスケッチも参照してください)。
+2. ノード`A`がFE側から`DISCONNECTED`として表示されていることを確認します。
+3. ノード`A`をアップグレードして再起動します。
+4. 残りのノードについても上記を繰り返します。
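+
+各ノードをドレインする前に待機時間を延ばしたい場合は、`loop_count_wait_fragments_finish` を更新できます(前述のとおり動的更新に対応している前提です)。以下は、本ドキュメントのコンパクション設定と同様に `information_schema.be_configs` 経由で値を `6`(約 6 × 10 秒 = 60 秒の待機)に更新する例示的なスケッチです:
+
+```SQL
+-- 待機ループを約 60 秒に延長する例(値 6 は例示)。
+UPDATE information_schema.be_configs SET VALUE = 6
+WHERE name = "loop_count_wait_fragments_finish";
+```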
+
+### Graceful Exitの監視
+
+FEログ`fe.log`、BEログ`be.log`、またはCNログ`cn.log`をチェックして、終了中にタスクがあったかどうかを確認します。
+
+## トラブルシューティング
+
+### BE/CNがタイムアウトで終了する
+
+タスクがGraceful Exit期間内に完了しなかった場合、BE/CNは強制終了(`SIGKILL`)をトリガーします。これがタスクの実行時間が長すぎるためか、または設定が不適切(例えば、`--timeout`の値が小さすぎる)であるためかを検証してください。
+
+### ノードステータスがSHUTDOWNではない
+
+ノードステータスが`SHUTDOWN`ではない場合、`loop_count_wait_fragments_finish`が正の整数に設定されているか、またはBE/CNが終了前にハートビートを報告したかどうか(報告していない場合は`graceful_exit_wait_for_frontend_heartbeat`を`true`に設定)を検証してください。
diff --git a/docs/zh/administration/management/BE_blacklist.md b/docs/zh/administration/management/BE_blacklist.md
index ae39359..0c03a3f 100644
--- a/docs/zh/administration/management/BE_blacklist.md
+++ b/docs/zh/administration/management/BE_blacklist.md
@@ -4,9 +4,9 @@ displayed_sidebar: docs

 # 管理 BE 和 CN 黑名单

-从 v3.3.0 版本开始,StarRocks 支持 BE 黑名单功能,该功能允许您禁止在查询执行中使用某些 BE 节点,从而避免因 BE 节点连接失败而导致的频繁查询失败或其他意外行为。例如,当网络问题阻止连接到一个或多个 BE 时,就可以使用黑名单。
+从 v3.3.0 版本起,StarRocks 支持 BE 黑名单功能,该功能允许您禁止在查询执行中使用某些 BE 节点,从而避免因 BE 节点连接失败而导致的频繁查询失败或其他意外行为。一个阻止连接到一个或多个 BE 的网络问题就是使用黑名单的一个示例。

-从 v4.0 版本开始,StarRocks 支持将 Compute Node (CN) 添加到黑名单。
+从 v4.0 版本起,StarRocks 支持将 Compute Node (CN) 添加到黑名单。

 默认情况下,StarRocks 可以自动管理 BE 和 CN 黑名单,将失去连接的 BE 或 CN 节点添加到黑名单中,并在连接重新建立时将其从黑名单中移除。但是,如果节点是手动加入黑名单的,StarRocks 不会将其从黑名单中移除。

@@ -19,7 +19,7 @@ displayed_sidebar: docs

 ## 将 BE/CN 添加到黑名单

-您可以使用 [ADD BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/ADD_BACKEND_BLACKLIST.md) 手动将 BE/CN 节点添加到黑名单中。在此语句中,您必须指定要加入黑名单的 BE/CN 节点的 ID。您可以通过执行 [SHOW BACKENDS](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_BACKENDS.md) 获取 BE ID,通过执行 [SHOW COMPUTE NODES](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_COMPUTE_NODES.md) 获取 CN ID。
+您可以使用 [ADD BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/ADD_BACKEND_BLACKLIST.md) 手动将 BE/CN 节点添加到黑名单。在该语句中,您必须指定要加入黑名单的 BE/CN 节点的 ID。您可以通过执行 [SHOW BACKENDS](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_BACKENDS.md) 获取 BE ID,并通过执行 [SHOW COMPUTE NODES](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_COMPUTE_NODES.md) 获取 CN ID。

 示例:

@@ -43,17 +43,17 @@ SHOW COMPUTE NODES\G
 ADD COMPUTE NODE BLACKLIST 10005;
 ```

-## 将 BE/CN 从黑名单中移除
+## 从黑名单中移除 BE/CN

-您可以使用 [DELETE BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/DELETE_BACKEND_BLACKLIST.md) 手动将 BE/CN 节点从黑名单中移除。在此语句中,您也必须指定 BE/CN 节点的 ID。
+您可以使用 [DELETE BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/DELETE_BACKEND_BLACKLIST.md) 手动将 BE/CN 节点从黑名单中移除。在该语句中,您也必须指定 BE/CN 节点的 ID。

 示例:

 ```SQL
--- 将 BE 从黑名单中移除。
+-- 从黑名单中移除 BE。
 DELETE BACKEND BLACKLIST 10001;

--- 将 CN 从黑名单中移除。
+-- 从黑名单中移除 CN。
 DELETE COMPUTE NODE BLACKLIST 10005;
 ```

@@ -83,20 +83,20 @@ SHOW COMPUTE NODE BLACKLIST;

 返回以下字段:

-- `AddBlackListType`: BE/CN 节点是如何被添加到黑名单的。`MANUAL` 表示用户手动将其加入黑名单。`AUTO` 表示 StarRocks 自动将其加入黑名单。
-- `LostConnectionTime`:
-  - 对于 `MANUAL` 类型,表示 BE/CN 节点被手动添加到黑名单的时间。
-  - 对于 `AUTO` 类型,表示上次成功建立连接的时间。
-- `LostConnectionNumberInPeriod`: 在 `CheckTimePeriod(s)` 内检测到的断开连接次数,`CheckTimePeriod(s)` 是 StarRocks 检查黑名单中 BE/CN 节点连接状态的间隔。
-- `CheckTimePeriod(s)`: StarRocks 检查黑名单中 BE/CN 节点连接状态的间隔。其值等于您为 FE 配置项 `black_host_history_sec` 
+### Graceful Exitの監視 + +FEログ`fe.log`、BEログ`be.log`、またはCNログ`cn.log`をチェックして、終了中にタスクがあったかどうかを確認します。 + +## トラブルシューティング + +### BE/CNがタイムアウトで終了する + +タスクがGraceful Exit期間内に完了しなかった場合、BE/CNは強制終了(`SIGKILL`)をトリガーします。これがタスクの実行時間が長すぎるためか、または設定が不適切(例えば、`--timeout`の値が小さすぎる)であるためかを検証してください。 + +### ノードステータスがSHUTDOWNではない + +ノードステータスが`SHUTDOWN`ではない場合、`loop_count_wait_fragments_finish`が正の整数に設定されているか、またはBE/CNが終了前にハートビートを報告したかどうか(報告していない場合は`graceful_exit_wait_for_frontend_heartbeat`を`true`に設定)を検証してください。 diff --git a/docs/zh/administration/management/BE_blacklist.md b/docs/zh/administration/management/BE_blacklist.md index ae39359..0c03a3f 100644 --- a/docs/zh/administration/management/BE_blacklist.md +++ b/docs/zh/administration/management/BE_blacklist.md @@ -4,9 +4,9 @@ displayed_sidebar: docs # 管理 BE 和 CN 黑名单 -从 v3.3.0 版本开始,StarRocks 支持 BE 黑名单功能,该功能允许您禁止在查询执行中使用某些 BE 节点,从而避免因 BE 节点连接失败而导致的频繁查询失败或其他意外行为。例如,当网络问题阻止连接到一个或多个 BE 时,就可以使用黑名单。 +从 v3.3.0 版本起,StarRocks 支持 BE 黑名单功能,该功能允许您禁止在查询执行中使用某些 BE 节点,从而避免因 BE 节点连接失败而导致的频繁查询失败或其他意外行为。例如,当网络问题导致无法连接到一个或多个 BE 时,就可以使用黑名单。 -从 v4.0 版本开始,StarRocks 支持将 Compute Node (CN) 添加到黑名单。 +从 v4.0 版本起,StarRocks 支持将 Compute Node (CN) 添加到黑名单。 默认情况下,StarRocks 可以自动管理 BE 和 CN 黑名单,将失去连接的 BE 或 CN 节点添加到黑名单中,并在连接重新建立时将其从黑名单中移除。但是,如果节点是手动加入黑名单的,StarRocks 不会将其从黑名单中移除。 @@ -19,7 +19,7 @@ displayed_sidebar: docs ## 将 BE/CN 添加到黑名单 -您可以使用 [ADD BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/ADD_BACKEND_BLACKLIST.md) 手动将 BE/CN 节点添加到黑名单中。在此语句中,您必须指定要加入黑名单的 BE/CN 节点的 ID。您可以通过执行 [SHOW BACKENDS](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_BACKENDS.md) 获取 BE ID,通过执行 [SHOW COMPUTE NODES](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_COMPUTE_NODES.md) 获取 CN ID。 +您可以使用 [ADD BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/ADD_BACKEND_BLACKLIST.md) 手动将 BE/CN 节点添加到黑名单。在该语句中,您必须指定要加入黑名单的 BE/CN 节点的 ID。您可以通过执行 [SHOW BACKENDS](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_BACKENDS.md) 获取 BE ID,并通过执行 [SHOW COMPUTE NODES](../../sql-reference/sql-statements/cluster-management/nodes_processes/SHOW_COMPUTE_NODES.md) 获取 CN ID。 示例: @@ -43,17 +43,17 @@ SHOW COMPUTE NODES\G ADD COMPUTE NODE BLACKLIST 10005; ``` -## 将 BE/CN 从黑名单中移除 +## 从黑名单中移除 BE/CN -您可以使用 [DELETE BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/DELETE_BACKEND_BLACKLIST.md) 手动将 BE/CN 节点从黑名单中移除。在此语句中,您也必须指定 BE/CN 节点的 ID。 +您可以使用 [DELETE BACKEND/COMPUTE NODE BLACKLIST](../../sql-reference/sql-statements/cluster-management/nodes_processes/DELETE_BACKEND_BLACKLIST.md) 手动将 BE/CN 节点从黑名单中移除。在该语句中,您也必须指定 BE/CN 节点的 ID。 示例: ```SQL --- 将 BE 从黑名单中移除。 +-- 从黑名单中移除 BE。 DELETE BACKEND BLACKLIST 10001; --- 将 CN 从黑名单中移除。 +-- 从黑名单中移除 CN。 DELETE COMPUTE NODE BLACKLIST 10005; ``` @@ -83,20 +83,20 @@ SHOW COMPUTE NODE BLACKLIST; 返回以下字段: -- `AddBlackListType`: BE/CN 节点是如何被添加到黑名单的。`MANUAL` 表示用户手动将其加入黑名单。`AUTO` 表示 StarRocks 自动将其加入黑名单。 -- `LostConnectionTime`: - - 对于 `MANUAL` 类型,表示 BE/CN 节点被手动添加到黑名单的时间。 +- `AddBlackListType`:BE/CN 节点是如何添加到黑名单的。`MANUAL` 表示用户手动将其加入黑名单。`AUTO` 表示 StarRocks 自动将其加入黑名单。 +- `LostConnectionTime`: + - 对于 `MANUAL` 类型,表示 BE/CN 节点手动添加到黑名单的时间。 - 对于 `AUTO` 类型,表示上次成功建立连接的时间。 -- `LostConnectionNumberInPeriod`: 在 `CheckTimePeriod(s)` 内检测到的断开连接次数,`CheckTimePeriod(s)` 是 StarRocks 检查黑名单中 BE/CN 节点连接状态的间隔。 -- `CheckTimePeriod(s)`: StarRocks 检查黑名单中 BE/CN 节点连接状态的间隔。其值等于您为 FE 配置项 `black_host_history_sec` 指定的值。单位:秒。 +- `LostConnectionNumberInPeriod`:在 `CheckTimePeriod(s)`(StarRocks 检查黑名单中 BE/CN 节点连接状态的间隔)内检测到的断开连接次数。 +- `CheckTimePeriod(s)`:StarRocks 检查黑名单中 BE/CN 节点连接状态的间隔。其值等于您为 FE 配置项 `black_host_history_sec` 指定的值。单位:秒。 ## 配置 BE/CN 黑名单的自动管理 -每当 BE/CN 节点失去与 FE 节点的连接,或者由于 BE/CN 节点上的查询超时而失败时,FE 节点都会将该 BE/CN 节点添加到其 BE 和 CN 黑名单中。FE 节点将通过计算其在特定时间段内的连接失败次数,持续评估黑名单中 BE/CN 节点的连接性。仅当 BE/CN 节点的连接失败次数低于预设阈值时,StarRocks 才会将其从黑名单中移除。 +每当 BE/CN 节点与 FE 节点失去连接,或者由于 BE/CN 节点超时导致查询失败时,FE 节点都会将该 BE/CN 节点添加到其 BE 和 CN 黑名单中。FE 节点将通过计算 BE/CN 节点在特定时间段内的连接失败次数,持续评估黑名单中 BE/CN 节点的连通性。StarRocks 仅在 BE/CN 节点的连接失败次数低于预设阈值时,才会将其从黑名单中移除。 -您可以使用以下 [FE 配置](./FE_configuration.md) 配置 BE 和 CN 黑名单的自动管理: +您可以使用以下 [FE 配置项](./FE_configuration.md) 配置 BE 和 CN 黑名单的自动管理: -- `black_host_history_sec`: 黑名单中保留 BE/CN 节点历史连接失败记录的时间长度。 -- `black_host_connect_failures_within_time`: 允许黑名单中的 BE/CN 节点发生连接失败的阈值。 +- `black_host_history_sec`:黑名单中 BE/CN 节点历史连接失败的保留时间。 +- `black_host_connect_failures_within_time`:黑名单中 BE/CN 节点允许的连接失败阈值。 -如果 BE/CN 节点是自动添加到黑名单的,StarRocks 将评估其连接性并判断是否可以将其从黑名单中移除。在 `black_host_history_sec` 期间,只有当黑名单中的 BE/CN 节点的连接失败次数少于 `black_host_connect_failures_within_time` 中设置的阈值时,才能将其从黑名单中移除。 +如果 BE/CN 节点自动添加到黑名单中,StarRocks 将评估其连通性并判断是否可以将其从黑名单中移除。在 `black_host_history_sec` 期间,只有当黑名单中的 BE/CN 节点的连接失败次数少于 `black_host_connect_failures_within_time` 中设置的阈值时,才能将其从黑名单中移除。
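+
+例如,您可以通过 `ADMIN SET FRONTEND CONFIG` 在运行时调整这两个配置项(以下取值仅为示例,表示保留 10 分钟的失败记录、最多允许 3 次连接失败,请根据集群的网络状况调整)。注意,该语句仅对当前连接的 FE 生效,且不会持久化:
+
+```SQL
+ADMIN SET FRONTEND CONFIG ("black_host_history_sec" = "600");
+ADMIN SET FRONTEND CONFIG ("black_host_connect_failures_within_time" = "3");
+```
+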
diff --git a/docs/zh/administration/management/BE_configuration.md b/docs/zh/administration/management/BE_configuration.md index b0ff5c3..d514e3f 100644 --- a/docs/zh/administration/management/BE_configuration.md +++ b/docs/zh/administration/management/BE_configuration.md @@ -39,8 +39,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 1800000 (30 分钟) - 类型: Int - 单位: 毫秒 -- 可变: 是 -- 描述: 控制 DiagnoseDaemon 对 `STACK_TRACE` 请求执行连续堆栈跟踪诊断之间的最短时间间隔。当诊断请求到达时,如果上次收集堆栈跟踪的时间距今不到 `diagnose_stack_trace_interval_ms` 毫秒,则守护进程会跳过收集和记录堆栈跟踪。增加此值可减少频繁堆栈转储带来的 CPU 开销和日志量;减少此值可捕获更频繁的跟踪以调试瞬时问题(例如,在长时间 `TabletsChannel::add_chunk` 阻塞的负载故障点模拟中)。 +- 是否可变: 是 +- 描述: 控制 DiagnoseDaemon 处理 `STACK_TRACE` 请求时,连续栈追踪诊断之间的最小时间间隔。当诊断请求到达时,如果上次收集的栈追踪发生在 `diagnose_stack_trace_interval_ms` 毫秒内,Daemon 将跳过收集和记录栈追踪。增加此值可以减少因频繁栈转储造成的 CPU 开销和日志量;减少此值可以捕获更频繁的追踪,以调试瞬态问题(例如,在长时间 `TabletsChannel::add_chunk` 阻塞的负载故障点模拟中)。 - 引入版本: v3.5.0 ##### lake_replication_slow_log_ms @@ -48,8 +48,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 30000 - 类型: Int - 单位: 毫秒 -- 可变: 是 -- 描述: Lake Replication 期间发出慢日志条目的阈值。每次文件复制后,代码会测量以微秒为单位的经过时间,并在经过时间大于或等于 `lake_replication_slow_log_ms * 1000` 时将操作标记为慢速。触发时,StarRocks 会写入一条 INFO 日志,其中包含该复制文件的文件大小、成本和跟踪指标。增加此值可减少大型/慢速传输中嘈杂的慢日志;减少此值可更快地检测并发现较小的慢速复制事件。 +- 是否可变: 是 +- 描述: Lake Replication 期间发出慢日志条目的阈值。每次文件复制后,代码会测量经过的时间(微秒),如果经过的时间大于或等于 `lake_replication_slow_log_ms * 1000`,则将该操作标记为慢速。触发时,StarRocks 会写入一个 INFO 级别日志,其中包含该复制文件的大小、成本和追踪指标。增加此值可减少大/慢速传输造成的嘈杂慢日志;减少此值可更早地检测并发现较小的慢速复制事件。 - 引入版本: - ##### load_rpc_slow_log_frequency_threshold_seconds @@ -57,8 +57,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 60 - 类型: Int - 单位: 秒 -- 可变: 是 -- 描述: 控制系统打印超出其配置的 RPC 超时的加载 RPC 慢日志条目的频率。慢日志还包括加载通道运行时配置文件。将此值设置为 0 实际上会导致每个超时都进行日志记录。 +- 是否可变: 是 +- 描述: 控制系统打印超出其配置的 RPC 超时的负载 RPC 慢日志条目的频率。慢日志还包括负载通道运行时配置文件。将此值设置为 0 实际上会导致每次超时都进行日志记录。 - 引入版本: v3.4.3, v3.5.0 ##### log_buffer_level @@ -66,8 +66,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 空字符串 - 类型: String - 单位: - -- 可变: 否 -- 描述: 刷新日志的策略。默认值表示日志在内存中缓冲。有效值为 `-1` 和 `0`。`-1` 表示日志不在内存中缓冲。 +- 是否可变: 否 +- 描述: 日志刷新策略。默认值表示日志在内存中缓冲。有效值为 `-1` 和 `0`。`-1` 表示日志不缓冲在内存中。 - 引入版本: - ##### pprof_profile_dir @@ -75,8 +75,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: `${STARROCKS_HOME}/log` - 类型: String - 单位: - -- 可变: 否 -- 描述: StarRocks 写入 pprof 构件(Jemalloc 堆快照和 gperftools CPU 配置文件)的目录路径。 +- 是否可变: 否 +- 描述: StarRocks 写入 pprof 制品 (Jemalloc 堆快照和 gperftools CPU 配置文件) 的目录路径。 - 引入版本: v3.2.0 ##### sys_log_dir @@ -84,8 +84,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: `${STARROCKS_HOME}/log` - 类型: String - 单位: - -- 可变: 否 -- 描述: 存储系统日志(包括 INFO、WARNING、ERROR 和 FATAL)的目录。 +- 是否可变: 否 +- 描述: 存储系统日志 (包括 INFO, WARNING, ERROR 和 FATAL) 的目录。 - 引入版本: - ##### sys_log_level @@ -93,8 +93,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: INFO - 类型: String - 单位: - -- 可变: 是 (从 v3.3.0, v3.2.7 和 v3.1.12 开始) -- 描述: 系统日志条目的严重性级别。有效值:INFO、WARN、ERROR 和 FATAL。此项从 v3.3.0、v3.2.7 和 v3.1.12 版本开始更改为动态配置。 +- 是否可变: 是 (从 v3.3.0, v3.2.7 和 v3.1.12 开始) +- 描述: 系统日志条目的严重性级别。有效值: INFO, WARN, ERROR, FATAL。此项从 v3.3.0, v3.2.7 和 v3.1.12 开始变为动态配置。 - 引入版本: - ##### sys_log_roll_mode @@ -102,8 +102,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: SIZE-MB-1024 - 类型: String - 单位: - -- 可变: 否 -- 描述: 系统日志分段为日志卷的模式。有效值包括 `TIME-DAY`、`TIME-HOUR` 和 `SIZE-MB-`size。默认值表示日志被分段为大小为 1 GB 的日志卷。 +- 是否可变: 否 +- 描述: 系统日志分段滚动的模式。有效值包括 `TIME-DAY`、`TIME-HOUR` 和 `SIZE-MB-`size。默认值表示日志分段滚动,每个段 1 GB。 - 引入版本: - ##### sys_log_roll_num @@ -111,8 +111,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 10 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 保留的日志卷数量。 +- 是否可变: 否 +- 描述: 保留的日志段数量。 - 引入版本: - ##### sys_log_timezone @@ -120,7 +120,7 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: 是否在日志前缀中显示时区信息。`true` 表示显示时区信息,`false` 表示不显示。 - 引入版本: - @@ -129,17 +129,17 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 10 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 要打印的日志级别。此配置项用于控制代码中以 VLOG 启动的日志的输出。 +- 是否可变: 否 +- 描述: 要打印的日志级别。此配置项用于控制代码中以 VLOG 开头的日志输出。 - 引入版本: - ##### sys_log_verbose_modules -- 默认值: +- 默认值: - 类型: Strings - 单位: - -- 可变: 否 -- 描述: 要打印的日志模块。例如,如果将此配置项设置为 OLAP,StarRocks 只打印 OLAP 模块的日志。有效值为 BE 中的命名空间,包括 `starrocks`、`starrocks::debug`、`starrocks::fs`、`starrocks::io`、`starrocks::lake`、`starrocks::pipeline`、`starrocks::query_cache`、`starrocks::stream` 和 `starrocks::workgroup`。 +- 是否可变: 否 +- 描述: 要打印的日志模块。例如,如果您将此配置项设置为 OLAP,StarRocks 只打印 OLAP 模块的日志。有效值是 BE 中的命名空间,包括 `starrocks`、`starrocks::debug`、`starrocks::fs`、`starrocks::io`、`starrocks::lake`、`starrocks::pipeline`、`starrocks::query_cache`、`starrocks::stream` 和 `starrocks::workgroup`。 - 引入版本: - ### 服务器 ##### abort_on_large_memory_allocation @@ -149,8 +149,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 当单个分配请求超过配置的大分配阈值(`g_large_memory_alloc_failure_threshold > 0` 且请求大小 `>` 阈值)时,此标志控制进程的响应方式。如果为 true,当检测到此类大分配时,StarRocks 会立即调用 `std::abort()`(硬崩溃)。如果为 false,则分配被阻塞,分配器返回失败(nullptr 或 ENOMEM),以便调用者可以处理错误。此检查仅对未通过 `TRY_CATCH_BAD_ALLOC` 路径包装的分配生效(当捕获到 bad-alloc 时,内存 hook 使用不同的流程)。启用此项可用于对意外的巨大分配进行快速失败调试;除非您希望在尝试进行超大分配时立即中止进程,否则在生产环境中请保持禁用状态。 +- 是否可变: 是 +- 描述: 当单个分配请求超过配置的大分配阈值(`g_large_memory_alloc_failure_threshold` > 0 且请求大小 > 阈值)时,此标志控制进程的响应方式。如果为 true,当检测到此类大分配时,StarRocks 会立即调用 `std::abort()`(硬崩溃)。如果为 false,则分配被阻塞,分配器返回失败(nullptr 或 ENOMEM),以便调用者可以处理错误。此检查仅对未通过 TRY_CATCH_BAD_ALLOC 路径包装的分配生效(当捕获 bad-alloc 时,内存 hook 使用不同的流程)。启用此功能可快速调试意外的巨大分配;在生产环境中禁用此功能,除非您希望在尝试进行超大分配时立即中止进程。 - 引入版本: v3.4.3, 3.5.0, 4.0.0 ##### arrow_flight_port @@ -158,8 +158,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: -1 - 类型: Int - 单位: - -- 可变: 否 -- 描述: BE Arrow Flight SQL 服务器的 TCP 端口。`-1` 表示禁用 Arrow Flight 服务。在非 macOS 构建中,BE 在启动期间会使用此端口调用 Arrow Flight SQL 服务器;如果端口不可用,服务器启动将失败,BE 进程会退出。配置的端口会在心跳载荷中报告给 FE。 +- 是否可变: 否 +- 描述: BE Arrow Flight SQL 服务器的 TCP 端口。`-1` 表示禁用 Arrow Flight 服务。在非 macOS 构建中,BE 在启动期间使用此端口调用 Arrow Flight SQL 服务器;如果端口不可用,服务器启动失败且 BE 进程退出。配置的端口在心跳负载中报告给 FE。 - 引入版本: v3.4.0, v3.5.0 ##### be_exit_after_disk_write_hang_second @@ -167,7 +167,7 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 60 - 类型: Int - 单位: 秒 -- 可变: 否 +- 是否可变:
否 - 描述: 磁盘挂起后 BE 等待退出的时间长度。 - 引入版本: - @@ -176,7 +176,7 @@ curl http://:/varz - 默认值: 48 - 类型: Int - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: HTTP 服务器使用的线程数。 - 引入版本: - @@ -185,7 +185,7 @@ curl http://:/varz - 默认值: 8040 - 类型: Int - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: BE HTTP 服务器端口。 - 引入版本: - @@ -194,7 +194,7 @@ curl http://:/varz - 默认值: 9060 - 类型: Int - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: BE Thrift 服务器端口,用于接收来自 FE 的请求。 - 引入版本: - @@ -203,8 +203,8 @@ curl http://:/varz - 默认值: 64 - 类型: Int - 单位: 线程 -- 可变: 否 -- 描述: BE Thrift 服务器用于处理后端 RPC/执行请求的工作线程数。此值在创建 BackendService 时传递给 ThriftServer,并控制有多少并发请求处理器可用;当所有工作线程都忙时,请求会排队。根据预期的并发 RPC 负载和可用的 CPU/内存进行调整:增加它会提高并发性,但也会增加每个线程的内存和上下文切换成本;减少它会限制并行处理,并可能增加请求延迟。 +- 是否可变: 否 +- 描述: BE Thrift 服务器用于处理后端 RPC/执行请求的工作线程数。此值在创建 BackendService 时传递给 ThriftServer,并控制可用的并发请求处理程序数量;当所有工作线程都忙时,请求将被排队。根据预期的并发 RPC 负载和可用的 CPU/内存进行调整:增加它会提高并发性,但也会增加每个线程的内存和上下文切换成本;减少它会限制并行处理并可能增加请求延迟。 - 引入版本: v3.2.0 ##### brpc_connection_type @@ -212,12 +212,12 @@ curl http://:/varz - 默认值: `"single"` - 类型: string - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: bRPC 通道连接模式。有效值: - - `"single"` (默认): 每个通道一个持久 TCP 连接。 - - `"pooled"`: 一个持久连接池,以更高的并发性为代价,但会使用更多套接字/文件描述符。 - - `"short"`: 每个 RPC 创建的短生命周期连接,以减少持久资源使用但延迟较高。 - 选择会影响每个套接字的缓冲行为,并可能在未写入字节超过套接字限制时影响 `Socket.Write` 失败 (EOVERCROWDED)。 + - `"single"` (默认值): 每个通道一个持久 TCP 连接。 + - `"pooled"`: 一个持久连接池,用于提高并发性,但会消耗更多 socket/文件描述符。 + - `"short"`: 每个 RPC 创建的短生命周期连接,以减少持久资源使用,但延迟较高。 + 此选择会影响每个 socket 的缓冲行为,并可能影响当未写入字节超过 socket 限制时的 `Socket.Write` 失败 (EOVERCROWDED)。 - 引入版本: v3.2.5 ##### brpc_max_body_size @@ -225,7 +225,7 @@ curl http://:/varz - 默认值: 2147483648 - 类型: Int - 单位: 字节 -- 可变: 否 +- 是否可变: 否 - 描述: bRPC 的最大主体大小。 - 引入版本: - @@ -234,8 +234,8 @@ curl http://:/varz - 默认值: 1 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 客户端为每个远程服务器端点保留的最大持久 bRPC 连接数。对于每个端点,`BrpcStubCache` 创建一个 `StubPool`,其 `_stubs` 向量被保留为此大小。首次访问时,会创建新的存根,直到达到限制。之后,现有存根会以轮询方式返回。增加此值会提高每个端点的并发性(减少单个通道上的争用),但会增加文件描述符、内存和通道的成本。 +- 是否可变: 否 +- 描述: 客户端为每个远程服务器端点维护的最大持久 bRPC 连接数。对于每个端点,`BrpcStubCache` 会创建一个 `StubPool`,其 `_stubs` 向量被预留为该大小。在首次访问时,会创建新的 stub,直到达到限制。此后,现有 stub 会以循环方式返回。增加此值会提高每个端点的并发性(减少单个通道上的争用),但会增加文件描述符、内存和通道的成本。 - 引入版本: v3.2.0 ##### brpc_num_threads @@ -243,7 +243,7 @@ curl http://:/varz - 默认值: -1 - 类型: Int - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: bRPC 的 bthread 数量。值 `-1` 表示与 CPU 线程数相同。 - 引入版本: - @@ -252,7 +252,7 @@ curl http://:/varz - 默认值: 8060 - 类型: Int - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: BE bRPC 端口,用于查看 bRPC 的网络统计信息。 - 引入版本: - @@ -261,8 +261,8 @@ curl http://:/varz - 默认值: 1073741824 - 类型: Int - 单位: 字节 -- 可变: 否 -- 描述: 设置 bRPC 服务器中每个套接字未写入的出站字节的限制。当套接字上缓冲的、尚未写入的数据量达到此限制时,后续的 `Socket.Write` 调用将以 EOVERCROWDED 失败。这可以防止每个连接的内存无限制增长,但可能会导致非常大的消息或慢速对等体的 RPC 发送失败。将此值与 `brpc_max_body_size` 对齐,以确保单个消息体不会大于允许的未写入缓冲区。增加此值会增加每个连接的内存使用量。 +- 是否可变: 否 +- 描述: 设置 bRPC 服务器中每个 socket 未写入出站字节的限制。当 socket 上缓冲的未写入数据量达到此限制时,后续的 `Socket.Write` 调用将以 EOVERCROWDED 失败。这可以防止每个连接的内存无限制增长,但可能导致非常大的消息或慢速对等体发送 RPC 失败。将此值与 `brpc_max_body_size` 对齐,以确保单个消息体不大于允许的未写入缓冲区。增加此值会提高每个连接的内存使用量。 - 引入版本: v3.2.0 ##### brpc_stub_expire_s @@ -270,8 +270,8 @@ curl http://:/varz - 默认值: 3600 - 类型: Int - 单位: 秒 -- 可变: 是 -- 描述: bRPC 存根缓存的过期时间。默认值为 60 分钟。 +- 是否可变: 是 +- 描述: bRPC stub 缓存的过期时间。默认值是 60 分钟。 - 引入版本: - ##### compress_rowbatches @@ -279,8 +279,8 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 否 -- 描述: 一个布尔值,控制是否在 BE 之间的 RPC 中压缩 Row Batch。`true` 表示压缩 Row Batch,`false` 表示不压缩。 +- 是否可变: 否 +- 描述: 一个布尔值,控制 BE 之间 RPC 是否压缩行批次。`true` 表示压缩行批次,`false` 表示不压缩。 - 引入版本: - ##### consistency_max_memory_limit_percent @@ -288,8 +288,8 @@ curl http://:/varz - 默认值: 20 - 类型: 
Int - 单位: - -- 可变: 否 -- 描述: 用于计算与一致性相关任务的内存预算的百分比上限。在 BE 启动期间,最终的一致性限制被计算为从 `consistency_max_memory_limit`(字节)解析的值与 (`process_mem_limit * consistency_max_memory_limit_percent / 100`) 中的最小值。如果 `process_mem_limit` 未设置 (-1),则一致性内存被视为无限制。对于 `consistency_max_memory_limit_percent`,小于 0 或大于 100 的值被视为 100。调整此值会增加或减少为一致性操作保留的内存,因此会影响查询和其他服务可用的内存。 +- 是否可变: 否 +- 描述: 用于计算一致性相关任务内存预算的百分比上限。在 BE 启动期间,最终的一致性限制计算为从 `consistency_max_memory_limit`(字节)解析的值和 (`process_mem_limit * consistency_max_memory_limit_percent / 100`) 中的最小值。如果 `process_mem_limit` 未设置(-1),则一致性内存被视为无限。对于 `consistency_max_memory_limit_percent`,小于 0 或大于 100 的值被视为 100。调整此值会增加或减少为一致性操作保留的内存,因此会影响查询和其他服务可用的内存。 - 引入版本: v3.2.0 ##### delete_worker_count_normal_priority @@ -297,8 +297,8 @@ curl http://:/varz - 默认值: 2 - 类型: Int - 单位: 线程 -- 可变: 否 -- 描述: 专门用于处理 BE Agent 上删除 (REALTIME_PUSH with DELETE) 任务的普通优先级工作线程数。在启动时,此值会添加到 `delete_worker_count_high_priority` 以确定 DeleteTaskWorkerPool 的大小(参见 agent_server.cpp)。该池将前 `delete_worker_count_high_priority` 个线程分配为 HIGH 优先级,其余为 NORMAL;普通优先级线程处理标准删除任务并有助于提高总体删除吞吐量。增加此值可提高并发删除容量(更高的 CPU/IO 使用率);减少此值可降低资源争用。 +- 是否可变: 否 +- 描述: 分配给 BE 代理上处理删除(带有 DELETE 的 REALTIME_PUSH)任务的普通优先级工作线程数。在启动时,此值会添加到 `delete_worker_count_high_priority` 中,以确定 DeleteTaskWorkerPool 的大小(参见 agent_server.cpp)。池将前 `delete_worker_count_high_priority` 个线程分配为 HIGH 优先级,其余为 NORMAL;普通优先级线程处理标准删除任务并有助于整体删除吞吐量。增加此值可提高并发删除容量(更高的 CPU/IO 使用率);减少此值可降低资源争用。 - 引入版本: v3.2.0 ##### disable_mem_pools @@ -306,8 +306,8 @@ curl http://:/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 否 -- 描述: 是否禁用 MemPool。当此项设置为 `true` 时,MemPool 块池化被禁用,因此每个分配都会获得其自己的大小块,而不是重用或增加池化块。禁用池化会减少长期保留的缓冲区内存,但会增加分配频率、块数量增加并跳过完整性检查(由于块数量过多而避免)。保持 `disable_mem_pools` 为 `false`(默认)以受益于分配重用和更少的系统调用。仅当您必须避免大量池化内存保留(例如,低内存环境或诊断运行)时才将其设置为 `true`。 +- 是否可变: 否 +- 描述: 是否禁用 MemPool。当此项设置为 `true` 时,MemPool 块池化被禁用,因此每个分配都会获得其自己的大小块,而不是重用或增加池化块。禁用池化会减少长期保留的缓冲内存,但代价是分配更频繁、块数增加,并跳过完整性检查(由于块数大而避免)。保持 `disable_mem_pools` 为 `false`(默认值)以受益于分配重用和更少的系统调用。仅当必须避免大量池化内存保留(例如,低内存环境或诊断运行)时才将其设置为 `true`。 - 引入版本: v3.2.0 ##### enable_https @@ -315,8 +315,8 @@ curl http://:/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 否 -- 描述: 当此项设置为 `true` 时,BE 的 bRPC 服务器被配置为使用 TLS:`ServerOptions.ssl_options` 将在 BE 启动时填充 `ssl_certificate_path` 和 `ssl_private_key_path` 指定的证书和私钥。这为传入的 bRPC 连接启用 HTTPS/TLS;客户端必须使用 TLS 连接。确保证书和密钥文件存在,BE 进程可访问,并且符合 bRPC/SSL 期望。 +- 是否可变: 否 +- 描述: 当此项设置为 `true` 时,BE 的 bRPC 服务器将配置为使用 TLS:`ServerOptions.ssl_options` 将在 BE 启动时填充 `ssl_certificate_path` 和 `ssl_private_key_path` 指定的证书和私钥。这为传入的 bRPC 连接启用 HTTPS/TLS;客户端必须使用 TLS 连接。确保证书和密钥文件存在,BE 进程可访问,并符合 bRPC/SSL 预期。 - 引入版本: v4.0.0 ##### enable_jemalloc_memory_tracker @@ -324,8 +324,8 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 否 -- 描述: 当此项设置为 `true` 时,BE 会启动一个后台线程 (jemalloc_tracker_daemon),该线程每秒轮询 Jemalloc 统计信息一次,并使用 Jemalloc "stats.metadata" 值更新 GlobalEnv Jemalloc 元数据 MemTracker。这确保了 Jemalloc 元数据消耗被纳入 StarRocks 进程内存核算,并防止了 Jemalloc 内部使用的内存报告不足。该跟踪器仅在非 macOS 构建中编译/启动 (#ifndef __APPLE__) 并作为名为 "jemalloc_tracker_daemon" 的守护线程运行。由于此设置影响启动行为和维护 MemTracker 状态的线程,因此更改它需要重新启动。仅当未使用 Jemalloc 或 Jemalloc 跟踪以不同方式有意管理时才禁用;否则保持启用以维护准确的内存核算和分配安全措施。 +- 是否可变: 否 +- 描述: 当此项设置为 `true` 时,BE 会启动一个后台线程(jemalloc_tracker_daemon),该线程每秒轮询 jemalloc 统计信息一次,并使用 jemalloc 的 "stats.metadata" 值更新 GlobalEnv jemalloc 元数据 MemTracker。这确保了 jemalloc 元数据消耗被包含在 StarRocks 进程内存核算中,并防止低估 jemalloc 内部使用的内存。跟踪器仅在非 macOS 构建上编译/启动(#ifndef __APPLE__),并作为名为 "jemalloc_tracker_daemon" 的守护线程运行。由于此设置会影响启动行为和维护 MemTracker 状态的线程,更改它需要重新启动。仅当不使用 jemalloc 或 jemalloc 跟踪有意以不同方式管理时才禁用;否则保持启用以维护准确的内存核算和分配保护。 
- 引入版本: v3.2.12 ##### enable_jvm_metrics @@ -333,8 +333,8 @@ curl http://:/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 否 -- 描述: 控制系统是否在启动时初始化和注册 JVM 特定的指标。启用时,指标子系统将创建 JVM 相关的收集器(例如,堆、GC 和线程指标)用于导出,禁用时,这些收集器不会初始化。此参数旨在向前兼容,并可能在未来版本中删除。使用 `enable_system_metrics` 控制系统级指标收集。 +- 是否可变: 否 +- 描述: 控制系统是否在启动时初始化和注册 JVM 特定的指标。启用时,指标子系统将创建 JVM 相关的收集器(例如,堆、GC 和线程指标)以供导出;禁用时,这些收集器不会被初始化。此参数用于向前兼容,并可能在未来版本中移除。使用 `enable_system_metrics` 控制系统级指标收集。 - 引入版本: v4.0.0 ##### get_pindex_worker_count @@ -342,8 +342,8 @@ curl http://:/varz - 默认值: 0 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 设置 UpdateManager 中 "get_pindex" 线程池的工作线程数,该线程池用于加载/获取持久化索引数据(在应用主键表的 Rowset 时使用)。在运行时,配置更新将调整池的最大线程数:如果 `>0`,则应用该值;如果为 0,则运行时回调使用 CPU 核心数 (CpuInfo::num_cores())。在初始化时,池的最大线程数计算为 max(get_pindex_worker_count, max_apply_thread_cnt * 2),其中 `max_apply_thread_cnt` 是 apply 线程池的最大值。增加此值可提高 pindex 加载的并行度;降低此值可减少并发性并降低内存/CPU 使用率。 +- 是否可变: 是 +- 描述: 设置 UpdateManager 中 "get_pindex" 线程池的工作线程数,该线程池用于加载/获取持久索引数据(用于为主键表应用 RowSet)。在运行时,配置更新将调整池的最大线程数:如果 `>0`,则应用该值;如果为 0,则运行时回调使用 CPU 核心数 (CpuInfo::num_cores())。在初始化时,池的最大线程数计算为 max(get_pindex_worker_count, max_apply_thread_cnt * 2),其中 max_apply_thread_cnt 是 apply 线程池的最大值。增加此值可提高 pindex 加载的并行性;降低此值可减少并发性和内存/CPU 使用率。 - 引入版本: v3.2.0 ##### heartbeat_service_port @@ -351,7 +351,7 @@ curl http://:/varz - 默认值: 9050 - 类型: Int - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: BE 心跳服务端口,用于接收来自 FE 的心跳。 - 引入版本: - @@ -360,7 +360,7 @@ curl http://:/varz - 默认值: 1 - 类型: Int - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: BE 心跳服务的线程数。 - 引入版本: - @@ -369,8 +369,8 @@ curl http://:/varz - 默认值: `${UDF_RUNTIME_DIR}` - 类型: string - 单位: - -- 可变: 否 -- 描述: BE 上的本地目录,用于暂存 UDF(用户定义函数)库以及 Python UDF 工作进程操作的目录。StarRocks 将 UDF 库从 HDFS 复制到此路径,在 `/pyworker_` 创建每个工作进程的 Unix 域套接字,并在执行前将 Python 工作进程的当前目录更改为该目录。该目录必须存在,BE 进程可写入,并位于支持 Unix 域套接字(即本地文件系统)的文件系统上。由于此配置在运行时不可变,请在启动前设置,并确保每个 BE 上具有足够的权限和磁盘空间。 +- 是否可变: 否 +- 描述: BE 上的本地目录,用于暂存 UDF(用户定义函数)库和 Python UDF 工作进程操作。StarRocks 会将 UDF 库从 HDFS 复制到此路径,在 `/pyworker_` 创建每个 worker 的 Unix 域 socket,并在执行前将 Python worker 进程切换到此目录。该目录必须存在,BE 进程可写入,并驻留在支持 Unix 域 socket 的文件系统上(即本地文件系统)。由于此配置在运行时不可变,请在启动前设置它,并确保每个 BE 上都有足够的权限和磁盘空间。 - 引入版本: v3.2.0 ##### max_transmit_batched_bytes @@ -378,8 +378,8 @@ curl http://:/varz - 默认值: 262144 - 类型: Int - 单位: 字节 -- 可变: 否 -- 描述: 在刷新到网络之前,单个传输请求中要累积的最大序列化字节数。发送方实现将序列化的 ChunkPB 载荷添加到 PTransmitChunkParams 请求中,并在累积字节数超过 `max_transmit_batched_bytes` 或达到 EOS 时发送请求。增加此值可减少 RPC 频率并提高吞吐量,但会以更高的每个请求延迟和内存使用为代价;减少此值可降低延迟和内存,但会增加 RPC 速率。 +- 是否可变: 否 +- 描述: 在刷新到网络之前,单个传输请求中要累积的最大序列化字节数。发送方实现将序列化后的 ChunkPB 负载添加到 PTransmitChunkParams 请求中,并在累积字节数超过 `max_transmit_batched_bytes` 或达到 EOS 时发送请求。增加此值可减少 RPC 频率并提高吞吐量,但会增加每个请求的延迟和内存使用;减少此值可降低延迟和内存,但会增加 RPC 速率。 - 引入版本: v3.2.0 ##### mem_limit @@ -387,8 +387,8 @@ curl http://:/varz - 默认值: 90% - 类型: String - 单位: - -- 可变: 否 -- 描述: BE 进程内存上限。可以将其设置为百分比("80%")或物理限制("100G")。默认硬限制是服务器内存大小的 90%,软限制是 80%。如果要在同一服务器上部署 StarRocks 和其他内存密集型服务,则需要配置此参数。 +- 是否可变: 否 +- 描述: BE 进程内存上限。您可以将其设置为百分比(“80%”)或物理限制(“100G”)。默认硬限制是服务器内存大小的 90%,软限制是 80%。如果要在同一服务器上部署 StarRocks 和其他内存密集型服务,则需要配置此参数。 - 引入版本: - ##### memory_max_alignment @@ -396,8 +396,8 @@ curl http://:/varz - 默认值: 16 - 类型: Int - 单位: 字节 -- 可变: 否 -- 描述: 设置 MemPool 将接受的对齐分配的最大字节对齐。仅当调用者需要更大的对齐时(例如,用于 SIMD、设备缓冲区或 ABI 约束)才增加此值。较大的值会增加每个分配的填充和保留内存浪费,并且必须保持在系统分配器和平台支持的范围内。 +- 是否可变: 否 +- 描述: 设置 MemPool 将接受的对齐分配的最大字节对齐。仅当调用者需要更大对齐时(用于 SIMD、设备缓冲区或 ABI 约束)才增加此值。较大的值会增加每个分配的填充和保留内存浪费,并且必须保持在系统分配器和平台支持的范围内。 - 引入版本: v3.2.0 ##### memory_urgent_level @@ -405,8 +405,8 @@ curl http://:/varz - 默认值: 85 - 类型: long - 单位: 百分比 (0-100) -- 可变: 是 -- 描述: 
以进程内存限制的百分比表示的紧急内存水位。当进程内存消耗超过 `(limit * memory_urgent_level / 100)` 时,BE 会触发即时内存回收,这将强制数据缓存收缩,逐出更新缓存,并导致持久/Lake MemTable 被视为“满”,以便它们将很快被刷新/压缩。代码验证此设置必须大于 `memory_high_level`,并且 `memory_high_level` 必须大于或等于 `1` 且小于或等于 `100`。较低的值会导致更激进、更早的回收,即更频繁的缓存逐出和刷新。较高的值会延迟回收,如果太接近 100,则存在 OOM 风险。将此项与 `memory_high_level` 和数据缓存相关自动调整设置一起调整。 +- 是否可变: 是 +- 描述: 紧急内存水位线,表示为进程内存限制的百分比。当进程内存消耗超过 `(limit * memory_urgent_level / 100)` 时,BE 会触发即时内存回收,这会强制数据缓存收缩、驱逐更新缓存,并使持久/Lake MemTable 被视为“已满”,因此它们将很快被刷新/压缩。代码验证此设置必须大于 `memory_high_level`,并且 `memory_high_level` 必须大于或等于 `1`,且小于或等于 `100`。较低的值会导致更激进、更早的回收,即更频繁的缓存逐出和刷新。较高的值会延迟回收,如果太接近 100,则有 OOM 风险。将此项与 `memory_high_level` 和数据缓存相关的自动调整设置一起调整。 - 引入版本: v3.2.0 ##### net_use_ipv6_when_priority_networks_empty @@ -414,7 +414,7 @@ curl http://:/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: 一个布尔值,控制当 `priority_networks` 未指定时是否优先使用 IPv6 地址。`true` 表示当托管节点的服务器同时具有 IPv4 和 IPv6 地址且 `priority_networks` 未指定时,允许系统优先使用 IPv6 地址。 - 引入版本: v3.3.0 @@ -423,8 +423,8 @@ curl http://:/varz - 默认值: 0 - 类型: Int - 单位: 核心 -- 可变: 否 -- 描述: 控制系统用于 CPU 敏感决策(例如,线程池大小调整和运行时调度)的 CPU 核心数。值为 0 启用自动检测:系统读取 `/proc/cpuinfo` 并使用所有可用核心。如果设置为正整数,则该值将覆盖检测到的核心数并成为有效的核心数。在容器内运行时,cgroup cpuset 或 cpu quota 设置可以进一步限制可用的核心;`CpuInfo` 也尊重这些 cgroup 限制。 +- 是否可变: 否 +- 描述: 控制系统用于 CPU 感知决策(例如,线程池大小调整和运行时调度)的 CPU 核心数。值为 0 启用自动检测:系统读取 `/proc/cpuinfo` 并使用所有可用核心。如果设置为正整数,该值将覆盖检测到的核心数并成为实际核心数。在容器内运行时,cgroup cpuset 或 cpu quota 设置可以进一步限制可用核心;`CpuInfo` 也遵守这些 cgroup 限制。 - 引入版本: v3.2.0 ##### plugin_path @@ -432,8 +432,8 @@ curl http://:/varz - 默认值: `${STARROCKS_HOME}/plugin` - 类型: String - 单位: - -- 可变: 否 -- 描述: StarRocks 加载外部插件(动态库、连接器构件、UDF 二进制文件等)的文件系统目录。`plugin_path` 应该指向 BE 进程可访问的目录(读取和执行权限),并且在加载插件之前必须存在。确保正确的权限,并且插件文件使用平台的本地二进制扩展名(例如,Linux 上的 .so)。 +- 是否可变: 否 +- 描述: StarRocks 加载外部插件(动态库、连接器制品、UDF 二进制文件等)的文件系统目录。`plugin_path` 应指向 BE 进程可访问的目录(读写执行权限),并且在加载插件之前必须存在。确保正确的所有权,并且插件文件使用平台的原生二进制扩展名(例如,Linux 上的 .so)。 - 引入版本: v3.2.0 ##### priority_networks @@ -441,8 +441,8 @@ curl http://:/varz - 默认值: 空字符串 - 类型: String - 单位: - -- 可变: 否 -- 描述: 声明服务器有多个 IP 地址时的选择策略。请注意,最多一个 IP 地址必须与此参数指定的列表匹配。此参数的值是一个列表,由 CIDR 表示法中以分号 (;) 分隔的条目组成,例如 `10.10.10.0/24`。如果没有 IP 地址与此列表中的条目匹配,将随机选择服务器的一个可用 IP 地址。从 v3.3.0 开始,StarRocks 支持基于 IPv6 的部署。如果服务器同时具有 IPv4 和 IPv6 地址,并且未指定此参数,系统默认使用 IPv4 地址。可以通过将 `net_use_ipv6_when_priority_networks_empty` 设置为 `true` 来更改此行为。 +- 是否可变: 否 +- 描述: 声明具有多个 IP 地址的服务器的选择策略。请注意,最多只有一个 IP 地址必须与此参数指定列表中的条目匹配。此参数的值是一个列表,由以分号 (;) 分隔的 CIDR 格式条目组成,例如 `10.10.10.0/24`。如果没有 IP 地址与此列表中的条目匹配,则将随机选择服务器的可用 IP 地址。从 v3.3.0 开始,StarRocks 支持基于 IPv6 的部署。如果服务器同时具有 IPv4 和 IPv6 地址,并且未指定此参数,则系统默认使用 IPv4 地址。您可以通过将 `net_use_ipv6_when_priority_networks_empty` 设置为 `true` 来更改此行为。 - 引入版本: - ##### rpc_compress_ratio_threshold @@ -450,8 +450,8 @@ curl http://:/varz - 默认值: 1.1 - 类型: Double - 单位: - -- 可变: 是 -- 描述: 决定是否以压缩形式通过网络发送序列化 Row Batch 时使用的阈值 (uncompressed_size / compressed_size)。当尝试压缩时(例如,在 DataStreamSender、exchange sink、tablet sink index channel、dictionary cache writer 中),StarRocks 计算 compress_ratio = uncompressed_size / compressed_size;仅当 compress_ratio `>` rpc_compress_ratio_threshold 时才使用压缩载荷。默认值为 1.1,压缩数据必须至少比未压缩数据小约 9.1% 才能被使用。降低此值以优先使用压缩(更多 CPU 用于更小的带宽节省);提高此值以避免压缩开销,除非它能产生更大的尺寸减小。注意:这适用于 RPC/shuffle 序列化,并且仅当 Row Batch 压缩启用时 (compress_rowbatches) 才有效。 +- 是否可变: 是 +- 描述: 用于决定是否以压缩形式通过网络发送序列化行批次的阈值(uncompressed_size / compressed_size)。当尝试压缩时(例如,在 DataStreamSender、exchange sink、tablet sink 索引通道、字典缓存写入器中),StarRocks 会计算 compress_ratio = uncompressed_size / compressed_size;仅当 compress_ratio `>` rpc_compress_ratio_threshold 时才使用压缩负载。默认值为 
1.1,表示压缩数据必须比未压缩数据小至少约 9.1% 才能使用。降低此值以优先压缩(更多 CPU 消耗以获得较小的带宽节省);提高此值以避免压缩开销,除非它产生更大的尺寸缩减。注意:这适用于 RPC/shuffle 序列化,并且仅当启用行批次压缩时 (compress_rowbatches) 才有效。 - 引入版本: v3.2.0 ##### ssl_private_key_path @@ -459,8 +459,8 @@ curl http://:/varz - 默认值: 空字符串 - 类型: String - 单位: - -- 可变: 否 -- 描述: BE 的 bRPC 服务器用作默认证书私钥的 TLS/SSL 私钥 (PEM) 文件系统路径。当 `enable_https` 设置为 `true` 时,系统在进程启动时将 `brpc::ServerOptions::ssl_options().default_cert.private_key` 设置为此路径。文件必须可由 BE 进程访问,并且必须与 `ssl_certificate_path` 提供的证书匹配。如果未设置此值或文件丢失或无法访问,HTTPS 将无法配置,bRPC 服务器可能无法启动。使用限制性的文件系统权限(例如 600)保护此文件。 +- 是否可变: 否 +- 描述: TLS/SSL 私钥 (PEM) 的文件系统路径,BE 的 bRPC 服务器将其用作默认证书的私钥。当 `enable_https` 设置为 `true` 时,系统在进程启动时将 `brpc::ServerOptions::ssl_options().default_cert.private_key` 设置为此路径。文件必须可由 BE 进程访问,并且必须与 `ssl_certificate_path` 提供的证书匹配。如果未设置此值或文件丢失或不可访问,HTTPS 将不会被配置,bRPC 服务器可能无法启动。使用限制性文件系统权限(例如,600)保护此文件。 - 引入版本: v4.0.0 ##### thrift_client_retry_interval_ms @@ -468,7 +468,7 @@ curl http://:/varz - 默认值: 100 - 类型: Int - 单位: 毫秒 -- 可变: 是 +- 是否可变: 是 - 描述: Thrift 客户端重试的时间间隔。 - 引入版本: - @@ -477,8 +477,8 @@ curl http://:/varz - 默认值: 3 - 类型: Int - 单位: 秒 -- 可变: 否 -- 描述: 创建 Thrift 客户端时使用的连接超时(秒)。ClientCacheHelper::_create_client 将此值乘以 1000 并将其传递给 ThriftClientImpl::set_conn_timeout(),因此它控制 BE 客户端缓存打开的新 Thrift 连接的 TCP/连接握手超时。此设置仅影响连接建立;发送/接收超时是单独配置的。非常小的值可能会在高延迟网络上导致虚假连接失败,而大值会延迟检测不可达的对等体。 +- 是否可变: 否 +- 描述: 创建 Thrift 客户端时使用的连接超时(秒)。ClientCacheHelper::_create_client 将此值乘以 1000 并将其传递给 ThriftClientImpl::set_conn_timeout(),因此它控制由 BE 客户端缓存打开的新 Thrift 连接的 TCP/连接握手超时。此设置仅影响连接建立;发送/接收超时是单独配置的。非常小的值可能导致高延迟网络上的虚假连接失败,而大值则会延迟无法访问的对等体的检测。 - 引入版本: v3.2.0 ##### thrift_port @@ -486,8 +486,8 @@ curl http://:/varz - 默认值: 0 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 用于导出内部基于 Thrift 的 BackendService 的端口。当进程作为 Compute Node 运行且此项设置为非零值时,它会覆盖 `be_port`,Thrift 服务器绑定到此值;否则使用 `be_port`。此配置已弃用 — 设置非零 `thrift_port` 会记录警告,建议改用 `be_port`。 +- 是否可变: 否 +- 描述: 用于导出内部基于 Thrift 的 BackendService 的端口。当进程作为 Compute Node 运行且此项设置为非零值时,它将覆盖 `be_port` 并且 Thrift 服务器绑定到此值;否则使用 `be_port`。此配置已弃用——设置非零 `thrift_port` 会记录警告,建议改用 `be_port`。 - 引入版本: v3.2.0 ##### thrift_rpc_connection_max_valid_time_ms @@ -495,8 +495,8 @@ curl http://:/varz - 默认值: 5000 - 类型: Int - 单位: 毫秒 -- 可变: 否 -- 描述: Thrift RPC 连接的最大有效时间。如果连接在连接池中存在时间超过此值,则连接将被关闭。它必须与 FE 配置 `thrift_client_timeout_ms` 保持一致。 +- 是否可变: 否 +- 描述: Thrift RPC 连接的最大有效时间。如果连接在连接池中存在的时间超过此值,它将被关闭。必须与 FE 配置 `thrift_client_timeout_ms` 保持一致。 - 引入版本: - ##### thrift_rpc_max_body_size @@ -504,8 +504,8 @@ curl http://:/varz - 默认值: 0 - 类型: Int - 单位: -- 可变: 否 -- 描述: RPC 的最大字符串主体大小。`0` 表示大小不受限制。 +- 是否可变: 否 +- 描述: RPC 的最大字符串主体大小。`0` 表示大小无限制。 - 引入版本: - ##### thrift_rpc_strict_mode @@ -513,7 +513,7 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: 是否启用 Thrift 的严格执行模式。有关 Thrift 严格模式的更多信息,请参阅 [Thrift Binary protocol encoding](https://github.com/apache/thrift/blob/master/doc/specs/thrift-binary-protocol.md)。 - 引入版本: - @@ -522,7 +522,7 @@ curl http://:/varz - 默认值: 5000 - 类型: Int - 单位: 毫秒 -- 可变: 是 +- 是否可变: 是 - 描述: Thrift RPC 的超时时间。 - 引入版本: - @@ -531,8 +531,8 @@ curl http://:/varz - 默认值: 0 - 类型: Int - 单位: 线程 -- 可变: 是 -- 描述: 设置 BE UpdateManager 中 "update_apply" 线程池的最小线程数 — 该线程池用于应用主键表的 Rowset。值为 0 禁用固定最小值(无强制下限);当 `transaction_apply_worker_count` 也为 0 时,池的最大线程数默认为 CPU 核心数,因此有效的工作容量等于 CPU 核心数。您可以提高此值以保证应用事务的基线并发性;设置过高可能会增加 CPU 争用。更改通过 `update_config` HTTP 处理程序在运行时应用(它在 apply 线程池上调用 `update_min_threads`)。 +- 是否可变: 是 +- 描述: 设置 BE 的 UpdateManager 中 "update_apply" 线程池的最小线程数——该线程池用于应用主键表的 RowSet。值为 0 表示禁用固定最小值(不强制下限);当 `transaction_apply_worker_count` 也为 0 时,池的最大线程数默认为 
CPU 核心数,因此有效 worker 容量等于 CPU 核心数。您可以提高此值以保证应用事务的基线并发性;设置过高可能会增加 CPU 争用。更改通过 `update_config` HTTP 处理程序在运行时应用(它调用 apply 线程池上的 `update_min_threads`)。 - 引入版本: v3.2.11 ##### transaction_publish_version_thread_pool_num_min @@ -540,7 +540,7 @@ curl http://:/varz - 默认值: 0 - 类型: Int - 单位: 线程 -- 可变: 是 +- 是否可变: 是 - 描述: 设置 AgentServer "publish_version" 动态线程池中保留的最小线程数(用于发布事务版本/处理 `TTaskType::PUBLISH_VERSION` 任务)。在启动时,池以 min = max(config value, MIN_TRANSACTION_PUBLISH_WORKER_COUNT) (MIN_TRANSACTION_PUBLISH_WORKER_COUNT = 1) 创建,因此默认 0 会导致最小 1 个线程。在运行时更改此值会调用更新回调以调用 ThreadPool::update_min_threads,提高或降低池的保证最小值(但不低于强制最小值 1)。与 `transaction_publish_version_worker_count`(最大线程数)和 `transaction_publish_version_thread_pool_idle_time_ms`(空闲超时)协调。 - 引入版本: v3.2.11 @@ -549,8 +549,8 @@ curl http://:/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 否 -- 描述: 当此项设置为 `true` 时,系统使用匿名私有 mmap 映射 (`MAP_ANONYMOUS | MAP_PRIVATE`) 分配块,并使用 munmap 释放它们。启用此功能可能会创建许多虚拟内存映射,因此您必须提高内核限制(作为 root 用户,运行 `sysctl -w vm.max_map_count=262144` 或 `echo 262144 > /proc/sys/vm/max_map_count`),并将 `chunk_reserved_bytes_limit` 设置为相对较大的值。否则,启用 mmap 可能会由于频繁的映射/取消映射而导致非常差的性能。 +- 是否可变: 否 +- 描述: 当此项设置为 `true` 时,系统使用匿名私有 mmap 映射 (MAP_ANONYMOUS | MAP_PRIVATE) 分配 chunk,并使用 munmap 释放它们。启用此功能可能会创建许多虚拟内存映射,因此您必须提高内核限制(作为 root 用户,运行 `sysctl -w vm.max_map_count=262144` 或 `echo 262144 > /proc/sys/vm/max_map_count`),并将 `chunk_reserved_bytes_limit` 设置为相对较大的值。否则,启用 mmap 可能会由于频繁映射/取消映射而导致性能非常差。 - 引入版本: v3.2.0 ### 元数据和集群管理 @@ -560,8 +560,8 @@ curl http://:/varz - 默认值: -1 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 此 StarRocks 后端的全局集群标识符。在启动时,StorageEngine 读取 config::cluster_id 到其有效的集群 ID,并验证所有数据根路径是否包含相同的集群 ID(参见 StorageEngine::_check_all_root_path_cluster_id)。值 -1 表示“未设置”——引擎可以从现有数据目录或从主心跳派生有效的 ID。如果配置了非负 ID,则配置的 ID 与数据目录中存储的 ID 之间的任何不匹配都会导致启动验证失败 (Status::Corruption)。当某些根缺少 ID 且引擎被允许写入 ID (options.need_write_cluster_id) 时,它会将有效的 ID 持久化到这些根中。 +- 是否可变: 否 +- 描述: 此 StarRocks 后端的全局集群标识符。在启动时,StorageEngine 将 config::cluster_id 读取到其实际集群 ID,并验证所有数据根路径都包含相同的集群 ID(参见 StorageEngine::_check_all_root_path_cluster_id)。值为 -1 表示“未设置”——引擎可以从现有数据目录或主节点心跳中派生实际 ID。如果配置了非负 ID,则配置 ID 与数据目录中存储的 ID 之间的任何不匹配都将导致启动验证失败 (Status::Corruption)。当某些根缺少 ID 且引擎被允许写入 ID (options.need_write_cluster_id) 时,它会将实际 ID 持久化到这些根中。 - 引入版本: v3.2.0 ##### consistency_max_memory_limit @@ -569,8 +569,8 @@ curl http://:/varz - 默认值: 10G - 类型: String - 单位: - -- 可变: 否 -- 描述: CONSISTENCY 内存跟踪器的内存大小规范。 +- 是否可变: 否 +- 描述: CONSISTENCY 内存 Tracker 的内存大小规范。 - 引入版本: v3.2.0 ##### make_snapshot_rpc_timeout_ms @@ -578,8 +578,8 @@ curl http://:/varz - 默认值: 20000 - 类型: Int - 单位: 毫秒 -- 可变: 否 -- 描述: 设置在远程 BE 上创建快照时使用的 Thrift RPC 超时(毫秒)。当远程快照创建经常超出默认超时时,增加此值;减少此值以在无响应的 BE 上更快失败。请注意,其他超时可能会影响端到端操作(例如,有效的 Tablet 写入器打开超时可能与 `tablet_writer_open_rpc_timeout_sec` 和 `load_timeout_sec` 相关)。 +- 是否可变: 否 +- 描述: 设置在远程 BE 上创建快照时使用的 Thrift RPC 超时(毫秒)。当远程快照创建经常超出默认超时时,增加此值;降低此值可更快地对无响应的 BE 失败。请注意,其他超时可能会影响端到端操作(例如,有效的 Tablet 写入器打开超时可能与 `tablet_writer_open_rpc_timeout_sec` 和 `load_timeout_sec` 相关)。 - 引入版本: v3.2.0 ##### metadata_cache_memory_limit_percent @@ -587,8 +587,8 @@ curl http://:/varz - 默认值: 30 - 类型: Int - 单位: 百分比 -- 可变: 是 -- 描述: 设置元数据 LRU 缓存大小为进程内存限制的百分比。在启动时,StarRocks 将缓存字节计算为 (`process_mem_limit * metadata_cache_memory_limit_percent / 100`) 并将其传递给元数据缓存分配器。缓存仅用于非 PRIMARY_KEYS Rowset(不支持 PK 表),并且仅当 `metadata_cache_memory_limit_percent > 0` 时才启用;将其设置为 `<= 0` 以禁用元数据缓存。增加此值会提高元数据缓存容量,但会减少其他组件可用的内存;根据工作负载和系统内存进行调整。在 BE_TEST 构建中不活动。 +- 是否可变: 是 +- 描述: 将元数据 LRU 缓存大小设置为进程内存限制的百分比。在启动时,StarRocks 计算缓存字节数,公式为 (process_mem_limit * metadata_cache_memory_limit_percent 
/ 100),并将其传递给元数据缓存分配器。该缓存仅用于非 PRIMARY_KEYS RowSet(不支持 PK 表),并且仅在 metadata_cache_memory_limit_percent > 0 时启用;将其设置为 <= 0 以禁用元数据缓存。增加此值会提高元数据缓存容量,但会减少其他组件可用的内存;根据工作负载和系统内存进行调整。在 BE_TEST 构建中不活跃。 - 引入版本: v3.2.10 ##### retry_apply_interval_second @@ -596,8 +596,8 @@ curl http://:/varz - 默认值: 30 - 类型: Int - 单位: 秒 -- 可变: 是 -- 描述: 用于调度失败的 Tablet 应用操作重试的基本间隔(秒)。它直接用于在提交失败后调度重试,并作为退避的基本乘数:下一个重试延迟计算为 min(600, `retry_apply_interval_second` * failed_attempts)。代码还使用 `retry_apply_interval_second` 计算累积重试持续时间(等差数列和),并将其与 `retry_apply_timeout_second` 进行比较,以决定是否继续重试。仅当 `enable_retry_apply` 为 true 时有效。增加此值会延长单个重试延迟和累积重试时间;减少此值会使重试更频繁,并可能增加达到 `retry_apply_timeout_second` 之前的尝试次数。 +- 是否可变: 是 +- 描述: 调度失败的 Tablet apply 操作重试时使用的基本间隔(秒)。它直接用于在提交失败后调度重试,并作为回退的基本乘数:下一次重试延迟计算为 min(600, `retry_apply_interval_second` * failed_attempts)。代码还使用 `retry_apply_interval_second` 计算累积重试持续时间(等差数列和),并将其与 `retry_apply_timeout_second` 进行比较,以决定是否继续重试。仅当 `enable_retry_apply` 为 true 时有效。增加此值会延长单个重试延迟和累积重试时间;减少此值会使重试更频繁,并可能在达到 `retry_apply_timeout_second` 之前增加尝试次数。 - 引入版本: v3.2.9 ##### retry_apply_timeout_second @@ -605,8 +605,8 @@ curl http://:/varz - 默认值: 7200 - 类型: Int - 单位: 秒 -- 可变: 是 -- 描述: 允许应用挂起版本之前积累的最大重试时间(秒),在此时间之后,应用进程将放弃并使 Tablet 进入错误状态。应用逻辑根据 `retry_apply_interval_second` 累积指数/退避间隔,并将总持续时间与 `retry_apply_timeout_second` 进行比较。如果 `enable_retry_apply` 为 true 且错误被认为是可重试的,则应用尝试将重新调度,直到累积退避超过 `retry_apply_timeout_second`;然后应用停止,Tablet 转换为错误状态。明确不可重试的错误(例如 Corruption)无论此设置如何都不会重试。调整此值以控制 StarRocks 将继续重试应用操作多长时间(默认 7200s = 2 小时)。 +- 是否可变: 是 +- 描述: 允许应用挂起版本之前允许的最大累积重试时间(秒),超过此时间后,apply 进程将放弃,Tablet 进入错误状态。apply 逻辑根据 `retry_apply_interval_second` 累积指数/退避间隔,并将总持续时间与 `retry_apply_timeout_second` 进行比较。如果 `enable_retry_apply` 为 true 且错误被认为是可重试的,则会重新安排 apply 尝试,直到累积退避超过 `retry_apply_timeout_second`;然后 apply 停止,Tablet 转换为错误状态。明确不可重试的错误(例如 Corruption)将不重试,无论此设置如何。调整此值以控制 StarRocks 将继续重试 apply 操作的时间(默认 7200 秒 = 2 小时)。 - 引入版本: v3.3.13, v3.4.3, v3.5.0 ##### txn_commit_rpc_timeout_ms @@ -614,8 +614,8 @@ curl http://:/varz - 默认值: 60000 - 类型: Int - 单位: 毫秒 -- 可变: 是 -- 描述: BE 流式加载和事务提交调用使用的 Thrift RPC 连接的最大允许生命周期(毫秒)。StarRocks 将此值设置为发送到 FE 的请求的 `thrift_rpc_timeout_ms`(用于 stream_load 规划、loadTxnBegin/loadTxnPrepare/loadTxnCommit 和 getLoadTxnStatus)。如果连接在池中停留时间超过此值,它将被关闭。当提供了每个请求的超时 (`ctx->timeout_second`) 时,BE 将 RPC 超时计算为 rpc_timeout_ms = max(ctx*1000/4, min(ctx*1000/2, txn_commit_rpc_timeout_ms)),因此有效的 RPC 超时受上下文和此配置的限制。请确保此值与 FE 的 `thrift_client_timeout_ms` 保持一致,以避免超时不匹配。 +- 是否可变: 是 +- 描述: BE stream-load 和事务提交调用使用的 Thrift RPC 连接的最大允许生命周期(毫秒)。StarRocks 将此值设置为发送到 FE 的请求的 `thrift_rpc_timeout_ms`(用于 stream_load 规划、loadTxnBegin/loadTxnPrepare/loadTxnCommit 和 getLoadTxnStatus)。如果连接在池中时间超过此值,它将被关闭。当提供了每个请求超时 (`ctx->timeout_second`) 时,BE 将 RPC 超时计算为 rpc_timeout_ms = max(ctx*1000/4, min(ctx*1000/2, txn_commit_rpc_timeout_ms)),因此有效 RPC 超时受限于上下文和此配置。请保持此值与 FE 的 `thrift_client_timeout_ms` 一致,以避免超时不匹配。 - 引入版本: v3.2.0 ##### txn_map_shard_size @@ -623,8 +623,8 @@ curl http://:/varz - 默认值: 128 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 事务管理器用于分区事务锁和减少争用的锁映射分片数量。其值应为 2 的幂 (2^n);增加它会增加并发性并减少锁争用,但会增加额外的内存和少量簿记开销。根据预期的并发事务和可用内存选择分片数量。 +- 是否可变: 否 +- 描述: 事务管理器用于分区事务锁和减少争用的锁映射分片数。其值应为 2 的幂 (2^n);增加它会增加并发性并减少锁争用,但会增加额外的内存和少量簿记开销。根据预期的并发事务和可用内存选择分片计数。 - 引入版本: v3.2.0 ##### txn_shard_size @@ -632,8 +632,8 @@ curl http://:/varz - 默认值: 1024 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 控制事务管理器使用的锁分片数量。此值决定了事务锁的分片大小。它必须是 2 的幂;将其设置为更大的值会减少锁争用并提高并发 COMMIT/PUBLISH 吞吐量,但会增加额外的内存和更细粒度的内部簿记开销。 +- 是否可变: 否 +- 描述: 控制事务管理器使用的锁分片数量。此值决定了事务锁的分片大小。它必须是 2 的幂;将其设置为较大值可以减少锁争用并提高并发 COMMIT/PUBLISH 
吞吐量,但代价是额外的内存和更细粒度的内部簿记。 - 引入版本: v3.2.0 ##### update_schema_worker_count @@ -641,8 +641,8 @@ curl http://:/varz - 默认值: 3 - 类型: Int - 单位: 线程 -- 可变: 否 -- 描述: 设置后端“update_schema”动态 ThreadPool 中处理 `TTaskType::UPDATE_SCHEMA` 任务的最大工作线程数。ThreadPool 在启动期间在 AgentServer 中创建,最小线程数为 0(空闲时可以缩减到零),最大线程数等于此设置;该池使用默认的空闲超时和实际上无限的队列。增加此值以允许更多并发的模式更新任务(更高的 CPU 和内存使用率),或降低此值以限制并行模式操作。 +- 是否可变: 否 +- 描述: 设置后端“update_schema”动态 ThreadPool 中处理 TTaskType::UPDATE_SCHEMA 任务的最大工作线程数。ThreadPool 在启动时在 `agent_server` 中创建,最小线程数为 0(空闲时可缩减到零),最大线程数等于此设置;该池使用默认的空闲超时和实际上无限的队列。增加此值以允许更多并发的模式更新任务(更高的 CPU 和内存使用),或降低此值以限制并行模式操作。 - 引入版本: v3.2.3 ##### update_tablet_meta_info_worker_count @@ -650,19 +650,19 @@ curl http://:/varz - 默认值: 1 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 设置后端线程池中处理 Tablet 元数据更新任务的最大工作线程数。线程池在后端启动期间创建,最小线程数为 0(空闲时可以缩减到零),最大线程数等于此设置(限制为至少 1)。在运行时更新此值会调整池的最大线程数。增加它以允许更多并发的元数据更新任务,或降低它以限制并发性。 +- 是否可变: 是 +- 描述: 设置后端线程池中处理 Tablet 元数据更新任务的最大工作线程数。线程池在后端启动期间创建,最小线程数为 0(空闲时可缩减到零),最大线程数等于此设置(限制为至少 1)。在运行时更新此值将调整池的最大线程数。增加它以允许更多并发的元数据更新任务,或降低它以限制并发性。 - 引入版本: v4.1.0, v4.0.6, v3.5.13 ### 用户、角色和权限 ##### ssl_certificate_path -- 默认值: +- 默认值: - 类型: String - 单位: - -- 可变: 否 -- 描述: BE 的 bRPC 服务器在 `enable_https` 为 true 时将使用的 TLS/SSL 证书文件 (PEM) 的绝对路径。在 BE 启动时,此值将复制到 `brpc::ServerOptions::ssl_options().default_cert.certificate`;您还必须将 `ssl_private_key_path` 设置为匹配的私钥。如果您的 CA 要求,请以 PEM 格式提供服务器证书和任何中间证书(证书链)。该文件必须可由 StarRocks BE 进程读取,并且仅在启动时应用。如果未设置或无效而 `enable_https` 已启用,bRPC TLS 设置可能会失败并阻止服务器正常启动。 +- 是否可变: 否 +- 描述: TLS/SSL 证书文件 (PEM) 的绝对路径,当 `enable_https` 为 true 时,BE 的 bRPC 服务器将使用此证书作为默认证书。在 BE 启动时,此值将复制到 `brpc::ServerOptions::ssl_options().default_cert.certificate`;您还必须将 `ssl_private_key_path` 设置为匹配的私钥。如果您的 CA 要求,请以 PEM 格式提供服务器证书和任何中间证书(证书链)。该文件必须可由 StarRocks BE 进程读取,并且仅在启动时应用。如果未设置或无效,而 `enable_https` 已启用,则 bRPC TLS 设置可能会失败并阻止服务器正常启动。 - 引入版本: v4.0.0 ### 查询引擎 @@ -672,8 +672,8 @@ curl http://:/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 否 -- 描述: 启用时,BE 的 UserFunctionCache 将在启动时清除所有本地缓存的用户函数库。在 UserFunctionCache::init 期间,代码会调用 _reset_cache_dir(),该函数会从配置的 UDF 库目录(组织成 kLibShardNum 子目录)中删除 UDF 文件,并删除带有 Java/Python UDF 后缀(.jar/.py)的文件。禁用时(默认),BE 会加载现有缓存的 UDF 文件而不是删除它们。启用此功能会强制在重启后首次使用时重新下载 UDF 二进制文件(增加网络流量和首次使用的延迟)。 +- 是否可变: 否 +- 描述: 启用时,BE 的 UserFunctionCache 将在启动时清除所有本地缓存的用户函数库。在 UserFunctionCache::init 期间,代码会调用 _reset_cache_dir(),该函数会从配置的 UDF 库目录(组织成 kLibShardNum 子目录)中删除 UDF 文件,并删除具有 Java/Python UDF 后缀(.jar/.py)的文件。禁用时(默认),BE 会加载现有缓存的 UDF 文件而不是删除它们。启用此功能会强制 UDF 二进制文件在重启后首次使用时重新下载(增加网络流量和首次使用延迟)。 - 引入版本: v4.0.0 ##### dictionary_speculate_min_chunk_size @@ -681,8 +681,8 @@ curl http://:/varz - 默认值: 10000 - 类型: Int - 单位: 行 -- 可变: 否 -- 描述: StringColumnWriter 和 DictColumnWriter 用于触发字典编码推测的最小行数(块大小)。如果传入列(或累积缓冲区加上传入行)的大小大于或等于 `dictionary_speculate_min_chunk_size`,则写入器将立即运行推测并设置编码(DICT、PLAIN 或 BIT_SHUFFLE),而不是缓冲更多行。推测使用 `dictionary_encoding_ratio` 用于字符串列,使用 `dictionary_encoding_ratio_for_non_string_column` 用于数值/非字符串列来决定字典编码是否有利。此外,大的列 byte_size(大于或等于 UINT32_MAX)会强制立即推测以避免 `BinaryColumn` 溢出。 +- 是否可变: 否 +- 描述: StringColumnWriter 和 DictColumnWriter 用于触发字典编码推测的最小行数(块大小)。如果传入的列(或累积缓冲区加上传入行)的大小大于或等于 `dictionary_speculate_min_chunk_size`,写入器将立即运行推测并设置编码(DICT、PLAIN 或 BIT_SHUFFLE),而不是缓冲更多行。推测使用 `dictionary_encoding_ratio` 用于字符串列,`dictionary_encoding_ratio_for_non_string_column` 用于数值/非字符串列,以决定字典编码是否有利。此外,较大的列字节大小(大于或等于 UINT32_MAX)会强制立即推测以避免 `BinaryColumn` 溢出。 - 引入版本: v3.2.0 ##### disable_storage_page_cache @@ -690,21 +690,21 @@ curl http://:/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: 一个布尔值,控制是否禁用 PageCache。 - - 当启用 PageCache 
时,StarRocks 会缓存最近扫描的数据。 - - 当频繁重复相似查询时,PageCache 可以显著提高查询性能。 + - 启用 PageCache 后,StarRocks 会缓存最近扫描的数据。 + - 当重复执行类似查询时,PageCache 可以显著提高查询性能。 - `true` 表示禁用 PageCache。 - 从 StarRocks v2.4 开始,此项的默认值已从 `true` 更改为 `false`。 - 引入版本: - ##### enable_bitmap_index_memory_page_cache -- 默认值: true +- 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 是否为 Bitmap 索引启用内存缓存。如果想使用 Bitmap 索引加速点查询,建议启用内存缓存。 +- 是否可变: 是 +- 描述: 是否为 Bitmap 索引启用内存缓存。如果您想使用 Bitmap 索引加速点查询,建议使用内存缓存。 - 引入版本: v3.1 ##### enable_compaction_flat_json @@ -712,7 +712,7 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: True - 类型: Boolean - 单位: -- 可变: 是 +- 是否可变: 是 - 描述: 是否为 Flat JSON 数据启用 Compaction。 - 引入版本: v3.3.3 ##### enable_flat_json @@ -721,7 +721,7 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: false - 类型: Boolean - 单位: -- 可变: 是 +- 是否可变: 是 - 描述: 是否启用 Flat JSON 功能。启用此功能后,新加载的 JSON 数据将自动展平,从而提高 JSON 查询性能。 - 引入版本: v3.3.0 ##### enable_lazy_dynamic_flat_json @@ -730,8 +730,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: True - 类型: Boolean - 单位: -- 可变: 是 -- 描述: 当查询在读取过程中缺少 Flat JSON 模式时,是否启用 Lazy Dynamic Flat JSON。当此项设置为 `true` 时,StarRocks 会将 Flat JSON 操作推迟到计算过程而不是读取过程。 +- 是否可变: 是 +- 描述: 当查询在读取过程中缺少 Flat JSON 模式时,是否启用 Lazy Dynamic Flat JSON。当此项设置为 `true` 时,StarRocks 会将 Flat JSON 操作推迟到计算过程,而不是读取过程。 - 引入版本: v3.3.3 ##### enable_ordinal_index_memory_page_cache @@ -739,8 +739,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 是否为 Ordinal 索引启用内存缓存。Ordinal 索引是行 ID 到数据页位置的映射,可用于加速扫描。 +- 是否可变: 是 +- 描述: 是否为序数索引启用内存缓存。序数索引是将行 ID 映射到数据页位置的映射,可用于加速扫描。 - 引入版本: - ##### enable_string_prefix_zonemap @@ -748,8 +748,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 是否为使用前缀 Min/Max 的字符串 (CHAR/VARCHAR) 列启用 ZoneMap。对于非关键字符串列,Min/Max 值会被截断为 `string_prefix_zonemap_prefix_len` 配置的固定前缀长度。 +- 是否可变: 是 +- 描述: 是否为使用前缀 Min/Max 的字符串 (CHAR/VARCHAR) 列启用 ZoneMap。对于非键字符串列,Min/Max 值被截断为由 `string_prefix_zonemap_prefix_len` 配置的固定前缀长度。 - 引入版本: - ##### enable_zonemap_index_memory_page_cache @@ -757,8 +757,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 是否为 Zonemap 索引启用内存缓存。如果想使用 Zonemap 索引加速扫描,建议启用内存缓存。 +- 是否可变: 是 +- 描述: 是否为 ZoneMap 索引启用内存缓存。如果您想使用 ZoneMap 索引加速扫描,建议使用内存缓存。 - 引入版本: - ##### exchg_node_buffer_size_bytes @@ -766,8 +766,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 10485760 - 类型: Int - 单位: 字节 -- 可变: 是 -- 描述: 每个查询的交换节点接收端的最大缓冲区大小。此配置项是一个软限制。当数据以过快的速度发送到接收端时,会触发反压。 +- 是否可变: 是 +- 描述: 每个查询的交换节点接收端最大缓冲区大小。此配置项是软限制。当数据以过快的速度发送到接收端时,会触发反压。 - 引入版本: - ##### file_descriptor_cache_capacity @@ -775,8 +775,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 16384 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 可以缓存的文件描述符数量。 +- 是否可变: 否 +- 描述: 可缓存的文件描述符数量。 - 引入版本: - ##### flamegraph_tool_dir @@ -784,8 +784,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: `${STARROCKS_HOME}/bin/flamegraph` - 类型: String - 单位: - -- 可变: 否 -- 描述: Flamegraph 工具的目录,其中应包含 pprof、stackcollapse-go.pl 和 flamegraph.pl 脚本,用于从配置文件数据生成火焰图。 +- 是否可变: 否 +- 描述: Flamegraph 工具的目录,应包含 pprof、stackcollapse-go.pl 和 flamegraph.pl 脚本,用于从配置文件数据生成火焰图。 - 引入版本: - ##### fragment_pool_queue_size @@ -793,8 +793,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 2048 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 每个 BE 节点上可以处理的查询数量上限。 +- 是否可变: 否 +- 描述: 每个 BE 节点可处理的查询数量上限。 - 引入版本: - ##### fragment_pool_thread_num_max @@ -802,7 +802,7 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 4096 - 类型: Int - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: 用于查询的最大线程数。 - 引入版本: - ##### fragment_pool_thread_num_min @@ -811,7 +811,7 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 64 - 类型: Int - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: 用于查询的最小线程数。 - 引入版本: - ##### hdfs_client_enable_hedged_read @@ -820,7 +820,7 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: 指定是否启用对冲读取功能。 - 引入版本: v3.0
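+
+对冲读取需要在 **be.conf** 中静态配置。结合下文的 `hdfs_client_hedged_read_threadpool_size` 和 `hdfs_client_hedged_read_threshold_millis`,一个最小的配置示例如下(取值即各项默认值,仅供参考,请根据集群规模调整):
+
+```
+# be.conf 片段:启用对冲读取,并显式设置线程池大小与触发阈值。
+hdfs_client_enable_hedged_read = true
+hdfs_client_hedged_read_threadpool_size = 128
+hdfs_client_hedged_read_threshold_millis = 2500
+```
+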
##### hdfs_client_hedged_read_threadpool_size @@ -829,8 +829,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 128 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 指定 HDFS 客户端上对冲读取线程池的大小。线程池大小限制了 HDFS 客户端中专门用于运行对冲读取的线程数。它等效于 HDFS 集群 **hdfs-site.xml** 文件中的 `dfs.client.hedged.read.threadpool.size` 参数。 +- 是否可变: 否 +- 描述: 指定 HDFS 客户端上的对冲读取线程池大小。线程池大小限制了 HDFS 客户端中专用于运行对冲读取的线程数。它等同于 HDFS 集群 **hdfs-site.xml** 文件中的 `dfs.client.hedged.read.threadpool.size` 参数。 - 引入版本: v3.0 ##### hdfs_client_hedged_read_threshold_millis @@ -838,8 +838,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 2500 - 类型: Int - 单位: 毫秒 -- 可变: 否 -- 描述: 指定在启动对冲读取之前等待的毫秒数。例如,您已将此参数设置为 `30`。在这种情况下,如果从块读取在 30 毫秒内未返回,您的 HDFS 客户端会立即启动对不同块副本的新读取。它等效于 HDFS 集群 **hdfs-site.xml** 文件中的 `dfs.client.hedged.read.threshold.millis` 参数。 +- 是否可变: 否 +- 描述: 指定在启动对冲读取之前等待的毫秒数。例如,您已将此参数设置为 `30`。在这种情况下,如果从块读取在 30 毫秒内未返回,HDFS 客户端会立即对不同的块副本启动新的读取。它等同于 HDFS 集群 **hdfs-site.xml** 文件中的 `dfs.client.hedged.read.threshold.millis` 参数。 - 引入版本: v3.0 ##### io_coalesce_adaptive_lazy_active @@ -847,8 +847,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 根据谓词的选择性,自适应地确定是否合并谓词列和非谓词列的 I/O。 +- 是否可变: 是 +- 描述: 基于谓词的选择性,自适应地确定是否合并谓词列和非谓词列的 I/O。 - 引入版本: v3.2 ##### jit_lru_cache_size @@ -856,8 +856,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 0 - 类型: Int - 单位: 字节 -- 可变: 是 -- 描述: JIT 编译的 LRU 缓存大小。如果它设置为大于 0,则表示缓存的实际大小。如果设置为小于或等于 0,系统将使用公式 `jit_lru_cache_size = min(mem_limit*0.01, 1GB)` 自适应地设置缓存(同时节点的 `mem_limit` 必须大于或等于 16 GB)。 +- 是否可变: 是 +- 描述: JIT 编译的 LRU 缓存大小。如果设置为大于 0,则表示缓存的实际大小。如果设置为小于或等于 0,系统将使用公式 `jit_lru_cache_size = min(mem_limit*0.01, 1GB)` 自适应地设置缓存(节点 `mem_limit` 必须大于或等于 16 GB)。 - 引入版本: - ##### json_flat_column_max @@ -865,7 +865,7 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 100 - 类型: Int - 单位: -- 可变: 是 +- 是否可变: 是 - 描述: Flat JSON 可提取的最大子字段数量。此参数仅在 `enable_json_flat` 设置为 `true` 时生效。 - 引入版本: v3.3.0 ##### json_flat_create_zonemap @@ -874,8 +874,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: true - 类型: Boolean - 单位: -- 可变: 是 -- 描述: 是否在写入时为扁平化 JSON 子列创建 ZoneMaps。此参数仅在 `enable_json_flat` 设置为 `true` 时生效。 +- 是否可变: 是 +- 描述: 在写入时是否为展平的 JSON 子列创建 ZoneMap。此参数仅在 `enable_json_flat` 设置为 `true` 时生效。 - 引入版本: - ##### json_flat_null_factor @@ -883,8 +883,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 0.3 - 类型: Double - 单位: -- 可变: 是 -- 描述: Flat JSON 提取的列中 NULL 值的比例。如果列中 NULL 值的比例高于此阈值,则不提取该列。此参数仅在 `enable_json_flat` 设置为 `true` 时生效。 +- 是否可变: 是 +- 描述: Flat JSON 提取的列中 NULL 值的比例。如果列中 NULL 值的比例高于此阈值,则不进行提取。此参数仅在 `enable_json_flat` 设置为 `true` 时生效。 - 引入版本: v3.3.0 ##### json_flat_sparsity_factor @@ -892,8 +892,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 0.3 - 类型: Double - 单位: -- 可变: 是 -- 描述: Flat JSON 同名列的比例。如果同名列的比例低于此值,则不执行提取。此参数仅在 `enable_json_flat` 设置为 `true` 时生效。 +- 是否可变: 是 +- 描述: Flat JSON 中同名列的比例。如果同名列的比例低于此值,则不进行提取。此参数仅在 `enable_json_flat` 设置为 `true` 时生效。 - 引入版本: v3.3.0 ##### lake_tablet_ignore_invalid_delete_predicate @@ -901,8 +901,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 一个布尔值,控制是否忽略 Tablet Rowset 元数据中可能由于列名重命名后对重复键表进行逻辑删除而引入的无效删除谓词。 +- 是否可变: 是 +- 描述: 一个布尔值,控制是否忽略 Tablet RowSet 元数据中可能因逻辑删除导致的无效删除谓词,该谓词在列名重命名后引入到 Duplicate Key 表中。 - 引入版本: v4.0 ##### late_materialization_ratio @@ -910,8 +910,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 10 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 范围 [0-1000] 内的整数比率,控制 SegmentIterator(向量查询引擎)中使用延迟物化。值为 `0`(或 <= 0)禁用延迟物化;`1000`(或 >= 1000)强制所有读取使用延迟物化。值 > 0 且 < 1000 启用条件策略,其中准备了延迟和早期物化上下文,并且迭代器根据谓词过滤器比率选择行为(值越高越有利于延迟物化)。当 Segment 包含复杂指标类型时,StarRocks 改用 `metric_late_materialization_ratio`。如果 `lake_io_opts.cache_file_only` 设置为 true,则禁用延迟物化。 +- 是否可变: 否 +- 描述: SegmentIterator(向量查询引擎)中控制晚期物化使用的整数比例,范围 [0-1000]。值为 `0`(或 <= 0)禁用晚期物化;`1000`(或 >= 1000)强制所有读取使用晚期物化。值 > 0 且 < 1000 启用条件策略,其中晚期和早期物化上下文都已准备好,迭代器根据谓词过滤比例选择行为(值越高越倾向于晚期物化)。当 Segment 包含复杂指标类型时,StarRocks 改用 `metric_late_materialization_ratio`。如果 `lake_io_opts.cache_file_only` 设置为
true,则禁用晚期物化。 - 引入版本: v3.2.0 ##### max_hdfs_file_handle @@ -919,8 +919,8 @@ curl http://:/varz - 默认值: 1000 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 可打开的最大 HDFS 文件描述符数量。 +- 是否可变: 是 +- 描述: 可打开的 HDFS 文件描述符的最大数量。 - 引入版本: - ##### max_memory_sink_batch_count @@ -928,26 +928,26 @@ curl http://:/varz - 默认值: 20 - 类型: Int - 单位: - -- 可变: 是 -- 描述: Scan Cache 的最大批次数量。 +- 是否可变: 是 +- 描述: Scan Cache 批次的最大数量。 - 引入版本: - ##### max_pushdown_conditions_per_column -- Default: 1024 -- Type: Int -- Unit: - -- Is mutable: Yes -- Description: 每个列允许下推的最大条件数。如果条件数超过此限制,谓词将不会下推到存储层。 -- Introduced in: - +- 默认值: 1024 +- 类型: Int +- 单位: - +- 是否可变: 是 +- 描述: 每列允许下推的最大条件数。如果条件数超过此限制,谓词将不会下推到存储层。 +- 引入版本: - ##### max_scan_key_num - 默认值: 1024 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 每个查询分段的最大扫描键数量。 +- 是否可变: 是 +- 描述: 每个查询分割的最大扫描键数量。 - 引入版本: - ##### metric_late_materialization_ratio @@ -955,8 +955,8 @@ curl http://:/varz - 默认值: 1000 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 控制包含复杂指标列的读取何时使用延迟物化行访问策略。有效范围: [0-1000]。`0` 禁用延迟物化;`1000` 强制所有适用读取使用延迟物化。值 1-999 启用条件策略,其中准备了延迟和早期物化上下文,并根据谓词/选择性在运行时选择。当存在复杂指标类型时,`metric_late_materialization_ratio` 会覆盖通用的 `late_materialization_ratio`。注意:`cache_file_only` I/O 模式将导致延迟物化被禁用,无论此设置如何。 +- 是否可变: 否 +- 描述: 控制包含复杂指标列的读取何时使用晚期物化行访问策略。有效范围:[0-1000]。`0` 禁用晚期物化;`1000` 强制所有适用的读取使用晚期物化。值 1-999 启用条件策略,其中晚期和早期物化上下文都已准备好,并根据谓词/选择性在运行时选择。当存在复杂指标类型时,`metric_late_materialization_ratio` 会覆盖通用的 `late_materialization_ratio`。注意:`cache_file_only` I/O 模式将导致禁用晚期物化,无论此设置如何。 - 引入版本: v3.2.0 ##### min_file_descriptor_number @@ -964,8 +964,8 @@ curl http://:/varz - 默认值: 60000 - 类型: Int - 单位: - -- 可变: 否 -- 描述: BE 进程中文件描述符的最小数量。 +- 是否可变: 否 +- 描述: BE 进程中的最小文件描述符数量。 - 引入版本: - ##### object_storage_connect_timeout_ms @@ -973,8 +973,8 @@ curl http://:/varz - 默认值: -1 - 类型: Int - 单位: 毫秒 -- 可变: 否 -- 描述: 与对象存储建立套接字连接的超时时长。`-1` 表示使用 SDK 配置的默认超时时长。 +- 是否可变: 否 +- 描述: 与对象存储建立 socket 连接的超时持续时间。`-1` 表示使用 SDK 配置的默认超时持续时间。 - 引入版本: v3.0.9 ##### object_storage_request_timeout_ms @@ -982,8 +982,8 @@ curl http://:/varz - 默认值: -1 - 类型: Int - 单位: 毫秒 -- 可变: 否 -- 描述: 与对象存储建立 HTTP 连接的超时时长。`-1` 表示使用 SDK 配置的默认超时时长。 +- 是否可变: 否 +- 描述: 与对象存储建立 HTTP 连接的超时持续时间。`-1` 表示使用 SDK 配置的默认超时持续时间。 - 引入版本: v3.0.9 ##### parquet_late_materialization_enable @@ -991,8 +991,8 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 否 -- 描述: 一个布尔值,控制是否启用 Parquet 读取器的延迟物化以提高性能。`true` 表示启用延迟物化,`false` 表示禁用。 +- 是否可变: 否 +- 描述: 一个布尔值,控制是否启用 Parquet 读取器的晚期物化以提高性能。`true` 表示启用晚期物化,`false` 表示禁用。 - 引入版本: - ##### parquet_page_index_enable @@ -1000,8 +1000,8 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 否 -- 描述: 一个布尔值,控制是否启用 Parquet 文件的 Page Index 以提高性能。`true` 表示启用 Page Index,`false` 表示禁用。 +- 是否可变: 否 +- 描述: 一个布尔值,控制是否启用 Parquet 文件的 pageindex 以提高性能。`true` 表示启用 pageindex,`false` 表示禁用。 - 引入版本: v3.3 ##### parquet_reader_bloom_filter_enable @@ -1009,8 +1009,8 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 一个布尔值,控制是否启用 Parquet 文件的布隆过滤器以提高性能。`true` 表示启用布隆过滤器,`false` 表示禁用。您也可以通过系统变量 `enable_parquet_reader_bloom_filter` 在会话级别控制此行为。Parquet 中的布隆过滤器在**每个行组内以列级别**维护。如果 Parquet 文件包含某些列的布隆过滤器,查询可以使用这些列上的谓词有效地跳过行组。 +- 是否可变: 是 +- 描述: 一个布尔值,控制是否启用 Parquet 文件的 Bloom Filter 以提高性能。`true` 表示启用 Bloom Filter,`false` 表示禁用。您也可以通过会话变量 `enable_parquet_reader_bloom_filter` 在会话级别控制此行为。Parquet 中的 Bloom Filter **在每个行组的列级别维护**。如果 Parquet 文件包含某些列的 Bloom Filter,查询可以使用这些列上的谓词高效地跳过行组。 - 引入版本: v3.5 ##### path_gc_check_step @@ -1018,7 +1018,7 @@ curl http://:/varz - 默认值: 1000 - 类型: Int - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: 每次可连续扫描的最大文件数。 - 引入版本: - @@ -1027,7 +1027,7 @@ curl http://:/varz - 默认值: 10 - 
类型: Int - 单位: 毫秒 -- 可变: 是 +- 是否可变: 是 - 描述: 文件扫描之间的时间间隔。 - 引入版本: - @@ -1036,7 +1036,7 @@ curl http://:/varz - 默认值: 86400 - 类型: Int - 单位: 秒 -- 可变: 是 +- 是否可变: 是 - 描述: GC 清理过期数据的时间间隔。 - 引入版本: - @@ -1045,8 +1045,8 @@ curl http://:/varz - 默认值: 8 - 类型: Double - 单位: - -- 可变: 是 -- 描述: BE 节点中 Pipeline Connector 每 CPU 核心分配的扫描线程数。此配置从 v3.1.7 版本开始变为动态。 +- 是否可变: 是 +- 描述: 每个 BE 节点中 Pipeline Connector 分配给每个 CPU 核心的扫描线程数。此配置项从 v3.1.7 开始更改为动态。 - 引入版本: - ##### pipeline_poller_timeout_guard_ms @@ -1054,8 +1054,8 @@ curl http://:/varz - 默认值: -1 - 类型: Int - 单位: 毫秒 -- 可变: 是 -- 描述: 当此项设置为大于 `0` 时,如果驱动程序在 Poller 中单次调度花费的时间超过 `pipeline_poller_timeout_guard_ms`,则会打印驱动程序和操作员的信息。 +- 是否可变: 是 +- 描述: 当此项设置为大于 `0` 时,如果一个 driver 在 poller 中单次调度耗时超过 `pipeline_poller_timeout_guard_ms`,则会打印该 driver 和 operator 的信息。 - 引入版本: - ##### pipeline_prepare_thread_pool_queue_size @@ -1063,8 +1063,8 @@ curl http://:/varz - 默认值: 102400 - 类型: Int - 单位: - -- 可变: 否 -- 描述: Pipeline 执行引擎 PREPARE Fragment 线程池的最大队列长度。 +- 是否可变: 否 +- 描述: Pipeline 执行引擎的 PREPARE 片段线程池的最大队列长度。 - 引入版本: - ##### pipeline_prepare_thread_pool_thread_num @@ -1072,8 +1072,8 @@ curl http://:/varz - 默认值: 0 - 类型: Int - 单位: - -- 可变: 否 -- 描述: Pipeline 执行引擎 PREPARE Fragment 线程池中的线程数。`0` 表示该值等于系统 VCPU 核心数。 +- 是否可变: 否 +- 描述: Pipeline 执行引擎 PREPARE 片段线程池中的线程数。`0` 表示该值等于系统 VCPU 核心数。 - 引入版本: - ##### pipeline_prepare_timeout_guard_ms @@ -1081,8 +1081,8 @@ curl http://:/varz - 默认值: -1 - 类型: Int - 单位: 毫秒 -- 可变: 是 -- 描述: 当此项设置为大于 `0` 时,如果计划片段在 PREPARE 过程中超过 `pipeline_prepare_timeout_guard_ms`,则会打印计划片段的堆栈跟踪。 +- 是否可变: 是 +- 描述: 当此项设置为大于 `0` 时,如果一个计划片段在 PREPARE 过程中超过 `pipeline_prepare_timeout_guard_ms`,则会打印该计划片段的栈追踪。 - 引入版本: - ##### pipeline_scan_thread_pool_queue_size @@ -1090,8 +1090,8 @@ curl http://:/varz - 默认值: 102400 - 类型: Int - 单位: - -- 可变: 否 -- 描述: Pipeline 执行引擎 SCAN 线程池的最大任务队列长度。 +- 是否可变: 否 +- 描述: Pipeline 执行引擎的 SCAN 线程池的最大任务队列长度。 - 引入版本: - ##### pk_index_parallel_get_threadpool_size @@ -1099,8 +1099,8 @@ curl http://:/varz - 默认值: 1048576 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 设置共享数据(云原生/Lake)模式下 PK 索引并行获取操作使用的 "cloud_native_pk_index_get" 线程池的最大队列大小(挂起任务数)。该池的实际线程数由 `pk_index_parallel_get_threadpool_max_threads` 控制;此设置仅限制有多少任务可以排队等待执行。非常大的默认值 (2^20) 实际上使队列无限制;降低它可防止排队任务导致过度内存增长,但当队列已满时可能导致任务提交阻塞或失败。根据工作负载并发性和内存限制与 `pk_index_parallel_get_threadpool_max_threads` 一起调整。 +- 是否可变: 是 +- 描述: 设置共享数据(云原生/lake)模式下 PK 索引并行获取操作使用的 "cloud_native_pk_index_get" 线程池的最大队列大小(挂起任务数)。该池的实际线程数由 `pk_index_parallel_get_threadpool_max_threads` 控制;此设置仅限制可能排队等待执行的任务数量。非常大的默认值 (2^20) 实际上使队列无界;降低它可防止排队任务导致过度内存增长,但可能在队列满时导致任务提交阻塞或失败。根据工作负载并发性和内存约束,与 `pk_index_parallel_get_threadpool_max_threads` 一起调整。 - 引入版本: - ##### priority_queue_remaining_tasks_increased_frequency @@ -1108,8 +1108,8 @@ curl http://:/varz - 默认值: 512 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 控制 BlockingPriorityQueue 增加所有剩余任务优先级(“老化”)的频率,以避免饥饿。每次成功 get/pop 都会增加一个内部 `_upgrade_counter`;当 `_upgrade_counter` 超过 `priority_queue_remaining_tasks_increased_frequency` 时,队列会增加每个元素的优先级,重建堆,并重置计数器。较小的值会导致更频繁的优先级老化(减少饥饿,但由于迭代和重新堆化而增加 CPU 成本);较大的值会减少该开销但会延迟优先级调整。该值是一个简单的操作计数阈值,而不是时间持续时间。 +- 是否可变: 是 +- 描述: 控制 BlockingPriorityQueue 提高所有剩余任务优先级(“老化”)以避免饥饿的频率。每次成功 get/pop 都会增加内部 `_upgrade_counter`;当 `_upgrade_counter` 超过 `priority_queue_remaining_tasks_increased_frequency` 时,队列会增加每个元素的优先级,重建堆,并重置计数器。较小的值会导致更频繁的优先级老化(减少饥饿但增加迭代和重新堆化的 CPU 成本);较大的值会减少该开销但延迟优先级调整。该值是简单的操作计数阈值,而不是时间持续时间。 - 引入版本: v3.2.0 ##### query_cache_capacity @@ -1117,7 +1117,7 @@ curl http://:/varz - 默认值: 536870912 - 类型: Int - 单位: 字节 -- 可变: 否 +- 是否可变: 否 - 描述: BE 中查询缓存的大小。默认大小为 512 MB。大小不能小于 4 MB。如果 BE 
的内存容量不足以提供预期的查询缓存大小,您可以增加 BE 的内存容量。 - 引入版本: - ##### query_pool_spill_mem_limit_threshold @@ -1126,7 +1126,7 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 1.0 - 类型: Double - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: 如果启用自动溢出,当所有查询的内存使用量超过 `query_pool memory limit * query_pool_spill_mem_limit_threshold` 时,将触发中间结果溢出。 - 引入版本: v3.2.7 ##### query_scratch_dirs @@ -1135,8 +1135,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: `${STARROCKS_HOME}` - 类型: string - 单位: - -- 可变: 否 -- 描述: 逗号分隔的可写临时目录列表,由查询执行用于溢出中间数据(例如,外部排序、哈希连接和其他操作符)。指定一个或多个路径,用 `;` 分隔(例如 `/mnt/ssd1/tmp;/mnt/ssd2/tmp`)。目录必须可由 BE 进程访问和写入,并有足够的可用空间;StarRocks 将在其中选择以分配溢出 I/O。更改需要重新启动才能生效。如果目录缺失、不可写或已满,溢出可能会失败或降低查询性能。 +- 是否可变: 否 +- 描述: 以分号 (;) 分隔的可写临时目录列表,由查询执行用于溢出中间数据(例如,外部排序、哈希连接和其他操作)。指定一个或多个以 `;` 分隔的路径(例如,`/mnt/ssd1/tmp;/mnt/ssd2/tmp`)。目录应可由 BE 进程访问和写入,并有足够的可用空间;StarRocks 将从中选择以分配溢出 I/O。更改需要重新启动才能生效。如果目录缺失、不可写入或已满,溢出可能会失败或降低查询性能。 - 引入版本: v3.2.0 ##### result_buffer_cancelled_interval_time @@ -1144,7 +1144,7 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 300 - 类型: Int - 单位: 秒 -- 可变: 是 +- 是否可变: 是 - 描述: BufferControlBlock 释放数据前的等待时间。 - 引入版本: - ##### scan_context_gc_interval_min @@ -1153,7 +1153,7 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 5 - 类型: Int - 单位: 分钟 -- 可变: 是 +- 是否可变: 是 - 描述: 清理 Scan Context 的时间间隔。 - 引入版本: - ##### scanner_row_num @@ -1162,8 +1162,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 16384 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 扫描中每个扫描线程返回的最大行数。 +- 是否可变: 是 +- 描述: 每次扫描中每个扫描线程返回的最大行数。 - 引入版本: - ##### scanner_thread_pool_queue_size @@ -1171,7 +1171,7 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 102400 - 类型: Int - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: 存储引擎支持的扫描任务数量。 - 引入版本: - ##### scanner_thread_pool_thread_num @@ -1180,7 +1180,7 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 48 - 类型: Int - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: 存储引擎用于并发存储卷扫描的线程数。所有线程都在线程池中管理。 - 引入版本: - ##### string_prefix_zonemap_prefix_len @@ -1189,7 +1189,7 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 16 - 类型: Int - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: 当 `enable_string_prefix_zonemap` 启用时,用于字符串 ZoneMap Min/Max 的前缀长度。 - 引入版本: - ##### udf_thread_pool_size @@ -1198,8 +1198,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 1 - 类型: Int - 单位: 线程 -- 可变: 否 -- 描述: 设置在 ExecEnv 中创建的 UDF 调用 PriorityThreadPool 的大小(用于执行用户定义函数/UDF 相关任务)。该值用作线程池线程计数,也用作构建线程池 (PriorityThreadPool("udf", thread_num, queue_size)) 时的池队列容量。增加此值以允许更多并发 UDF 执行;保持较小以避免过度的 CPU 和内存争用。 +- 是否可变: 否 +- 描述: 设置 ExecEnv 中创建的 UDF 调用 PriorityThreadPool 的大小(用于执行用户定义函数/UDF 相关任务)。该值用作线程池的线程计数,也用作构建线程池(PriorityThreadPool("udf", thread_num, queue_size))时的队列容量。增加此值以允许更多并发 UDF 执行;保持较小值以避免过多的 CPU 和内存争用。 - 引入版本: v3.2.0 ##### update_memory_limit_percent @@ -1207,8 +1207,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 60 - 类型: Int - 单位: 百分比 -- 可变: 否 -- 描述: 为更新相关内存和缓存保留的 BE 进程内存的比例。在启动期间,`GlobalEnv` 计算更新的 `MemTracker` 为 `process_mem_limit * clamp(update_memory_limit_percent, 0, 100) / 100`。`UpdateManager` 也使用此百分比来确定其主索引/索引缓存容量(索引缓存容量 = `GlobalEnv::process_mem_limit * update_memory_limit_percent / 100`)。HTTP 配置更新逻辑注册一个回调,该回调在更新管理器上调用 `update_primary_index_memory_limit`,因此如果配置更改,更改将应用于更新子系统。增加此值会为更新/主索引路径分配更多内存(减少其他池可用的内存);减少此值会减少更新内存和缓存容量。值被限制在 0-100 范围内。 +- 是否可变: 否 +- 描述: 预留给更新相关内存和缓存的 BE 进程内存的百分比。在启动期间,`GlobalEnv` 计算更新的 `MemTracker` 为 `process_mem_limit * clamp(update_memory_limit_percent, 0, 100) / 100`。`UpdateManager` 也使用此百分比来调整其主索引/索引缓存容量(索引缓存容量 = `GlobalEnv::process_mem_limit * update_memory_limit_percent / 100`)。HTTP 配置更新逻辑注册了一个回调,该回调在更新管理器上调用 `update_primary_index_memory_limit`,因此如果配置更改,更改将应用于更新子系统。增加此值会为更新/主索引路径提供更多内存(减少其他池可用的内存);减少它会减少更新内存和缓存容量。值被限制在 0-100 范围内。 - 引入版本: v3.2.0 ##### vector_chunk_size @@ -1216,8 +1216,8 @@ curl http://<BE_IP>:<BE_HTTP_PORT>/varz - 默认值: 4096 - 类型: Int - 单位: 行 -- 可变: 否 -- 描述: 在整个执行和存储代码路径中使用的每个向量化 Chunk(批次)的行数。此值控制 Chunk 和 RuntimeState 的 batch_size 创建,影响操作符吞吐量、每个操作符的内存占用、溢出和排序缓冲区大小调整以及 I/O 启发式(例如,ORC 写入器的自然写入大小)。增加它可以在宽/CPU 密集型工作负载中提高 CPU 和 I/O 效率,但会提高峰值内存使用率,并可能增加小结果查询的延迟。仅当分析显示批次大小是瓶颈时才进行调整;否则保持默认值以平衡内存和性能。 +- 是否可变: 否 +-
描述: 在整个执行和存储代码路径中使用的每个向量化 Chunk(批次)的行数。此值控制 Chunk 和 RuntimeState 的 `batch_size` 创建,影响操作符吞吐量、每个操作符的内存占用、溢出和排序缓冲区大小,以及 I/O 启发式(例如,ORC 写入器自然写入大小)。增加它可以在宽/CPU 密集型工作负载中提高 CPU 和 I/O 效率,但会提高峰值内存使用,并可能增加小结果查询的延迟。仅当分析显示批次大小是瓶颈时才进行调整;否则保持默认值以平衡内存和性能。 - 引入版本: v3.2.0 ### 加载 @@ -1227,7 +1227,7 @@ curl http://:/varz - 默认值: 1 - 类型: Int - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: 用于清除事务的线程数。 - 引入版本: - @@ -1236,8 +1236,8 @@ curl http://:/varz - 默认值: 4096 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 列模式部分更新处理插入行时的批处理大小。如果此项设置为 `0` 或负数,它将被限制为 `1` 以避免无限循环。此项控制每个批次处理的新插入行数。较大的值可以提高写入性能,但会消耗更多内存。 +- 是否可变: 是 +- 描述: 列模式部分更新处理插入行时的批次大小。如果此项设置为 `0` 或负数,它将被限制为 `1` 以避免无限循环。此项控制每个批次处理的新插入行数。较大的值可以提高写入性能,但会消耗更多内存。 - 引入版本: v3.5.10, v4.0.2 ##### enable_load_spill_parallel_merge @@ -1245,7 +1245,7 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: 指定是否在单个 Tablet 中启用并行溢出合并。启用此功能可以提高数据加载期间溢出合并的性能。 - 引入版本: - @@ -1254,7 +1254,7 @@ curl http://:/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: 指定是否记录 Stream Load 作业的 HTTP 请求和响应。 - 引入版本: v2.5.17, v3.0.9, v3.1.6, v3.2.1 @@ -1263,8 +1263,8 @@ curl http://:/varz - 默认值: 2 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 每个存储用于刷新 MemTable 的线程数。 +- 是否可变: 是 +- 描述: 每个存储中用于刷新 MemTable 的线程数。 - 引入版本: - ##### lake_flush_thread_num_per_store @@ -1272,10 +1272,10 @@ curl http://:/varz - 默认值: 0 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 共享数据集群中每个存储用于刷新 MemTable 的线程数。 -当此值设置为 `0` 时,系统使用 CPU 核心数的两倍作为值。 -当此值设置为小于 `0` 时,系统使用其绝对值与 CPU 核心数的乘积作为值。 +- 是否可变: 是 +- 描述: 共享数据集群中每个存储中用于刷新 MemTable 的线程数。 +当此值设置为 `0` 时,系统将使用 CPU 核心数的两倍作为该值。 +当此值设置为小于 `0` 时,系统将使用其绝对值与 CPU 核心数的乘积作为该值。 - 引入版本: v3.1.12, 3.2.7 ##### load_data_reserve_hours @@ -1283,8 +1283,8 @@ curl http://:/varz - 默认值: 4 - 类型: Int - 单位: 小时 -- 可变: 否 -- 描述: 小型加载产生文件的保留时间。 +- 是否可变: 否 +- 描述: 小型加载生成文件的保留时间。 - 引入版本: - ##### load_error_log_reserve_hours @@ -1292,7 +1292,7 @@ curl http://:/varz - 默认值: 48 - 类型: Int - 单位: 小时 -- 可变: 是 +- 是否可变: 是 - 描述: 数据加载日志的保留时间。 - 引入版本: - @@ -1301,7 +1301,7 @@ curl http://:/varz - 默认值: 107374182400 - 类型: Int - 单位: 字节 -- 可变: 否 +- 是否可变: 否 - 描述: BE 节点上所有加载进程可占用的内存资源的最大大小限制。 - 引入版本: - @@ -1310,8 +1310,8 @@ curl http://:/varz - 默认值: 1073741824 - 类型: Int - 单位: 字节 -- 可变: 是 -- 描述: 溢出合并期间每次合并操作的最大内存使用量。默认值为 1 GB (1073741824 字节)。此参数控制数据加载溢出合并期间单个合并任务的内存消耗,以防止内存使用量过高。 +- 是否可变: 是 +- 描述: 溢出合并期间每个合并操作的最大内存使用量。默认值为 1 GB (1073741824 字节)。此参数控制数据加载溢出合并期间单个合并任务的内存消耗,以防止内存使用量过大。 - 引入版本: - ##### max_consumer_num_per_group @@ -1319,8 +1319,8 @@ curl http://:/varz - 默认值: 3 - 类型: Int - 单位: - -- 可变: 是 -- 描述: Routine Load 消费者组中的最大消费者数量。 +- 是否可变: 是 +- 描述: Routine Load 消费组中消费者的最大数量。 - 引入版本: - ##### max_runnings_transactions_per_txn_map @@ -1328,8 +1328,8 @@ curl http://:/varz - 默认值: 100 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 每个分区中可同时运行的最大事务数。 +- 是否可变: 是 +- 描述: 每个分区中可并发运行的最大事务数量。 - 引入版本: - ##### number_tablet_writer_threads @@ -1337,8 +1337,8 @@ curl http://:/varz - 默认值: 0 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 摄取(例如 Stream Load、Broker Load 和 Insert)中使用的 Tablet 写入线程数。当参数设置为小于或等于 0 时,系统使用 CPU 核心数的一半,最小值为 16。当参数设置为大于 0 时,系统使用该值。此配置从 v3.1.7 版本开始变为动态。 +- 是否可变: 是 +- 描述: 摄取(如 Stream Load、Broker Load 和 Insert)中使用的 Tablet 写入器线程数。当参数设置为小于或等于 0 时,系统使用 CPU 核心数的一半,最小值为 16。当参数设置为大于 0 时,系统使用该值。此配置项从 v3.1.7 开始更改为动态。 - 引入版本: - ##### push_worker_count_high_priority @@ -1346,8 +1346,8 @@ curl http://:/varz - 默认值: 3 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 用于处理高优先级加载任务的线程数。 +- 是否可变: 否 +- 描述: 用于处理 HIGH 优先级加载任务的线程数。 - 引入版本: - ##### push_worker_count_normal_priority @@ -1355,8 +1355,8 @@ curl http://:/varz - 默认值: 3 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 用于处理普通优先级加载任务的线程数。 +- 是否可变: 否 +- 描述: 
用于处理 NORMAL 优先级加载任务的线程数。 - 引入版本: - ##### streaming_load_max_batch_size_mb @@ -1364,7 +1364,7 @@ curl http://:/varz - 默认值: 100 - 类型: Int - 单位: MB -- 可变: 是 +- 是否可变: 是 - 描述: 可流式传输到 StarRocks 的 JSON 文件的最大大小。 - 引入版本: - @@ -1373,7 +1373,7 @@ curl http://:/varz - 默认值: 102400 - 类型: Int - 单位: MB -- 可变: 是 +- 是否可变: 是 - 描述: 可流式传输到 StarRocks 的文件的最大大小。从 v3.0 开始,默认值已从 `10240` 更改为 `102400`。 - 引入版本: - @@ -1382,8 +1382,8 @@ curl http://:/varz - 默认值: 1200 - 类型: Int - 单位: 秒 -- 可变: 否 -- 描述: Stream Load 的 RPC 超时时间。 +- 是否可变: 否 +- 描述: Stream Load 的 RPC 超时。 - 引入版本: - ##### transaction_publish_version_thread_pool_idle_time_ms @@ -1391,8 +1391,8 @@ curl http://:/varz - 默认值: 60000 - 类型: Int - 单位: 毫秒 -- 可变: 否 -- 描述: Publish Version 线程池回收线程前的空闲时间。 +- 是否可变: 否 +- 描述: 线程被 Publish Version 线程池回收前的空闲时间。 - 引入版本: - ##### transaction_publish_version_worker_count @@ -1400,8 +1400,8 @@ curl http://:/varz - 默认值: 0 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 用于发布版本的最大线程数。当此值设置为小于或等于 `0` 时,系统使用 CPU 核心数作为值,以避免在高并发导入时线程资源不足而仅使用固定数量的线程。从 v2.5 开始,默认值已从 `8` 更改为 `0`。 +- 是否可变: 是 +- 描述: 用于发布版本的最大线程数。当此值设置为小于或等于 `0` 时,系统将使用 CPU 核心数作为该值,以避免在高导入并发但仅使用固定线程数时出现线程资源不足的问题。从 v2.5 开始,默认值已从 `8` 更改为 `0`。 - 引入版本: - ##### write_buffer_size @@ -1409,7 +1409,7 @@ curl http://:/varz - 默认值: 104857600 - 类型: Int - 单位: 字节 -- 可变: 是 +- 是否可变: 是 - 描述: 内存中 MemTable 的缓冲区大小。此配置项是触发刷新的阈值。 - 引入版本: - @@ -1420,8 +1420,8 @@ curl http://:/varz - 默认值: 30 - 类型: int - 单位: 秒 -- 可变: 否 -- 描述: 后端 Broker 操作用于写入/IO RPC 的超时(秒)。该值乘以 1000 以产生毫秒超时,并作为默认的 `timeout_ms` 传递给 BrokerFileSystem 和 BrokerServiceConnection 实例(例如文件导出和快照上传/下载)。当 Broker 或网络较慢或传输大文件时,增加此值可避免过早超时;减少此值可能会导致 Broker RPC 更早失败。此值在 common/config 中定义,并在进程启动时应用(不可动态重新加载)。 +- 是否可变: 否 +- 描述: 后端 Broker 操作用于写入/IO RPC 的超时(秒)。该值乘以 1000 以生成毫秒超时,并作为默认的 timeout_ms 传递给 BrokerFileSystem 和 BrokerServiceConnection 实例(例如,文件导出和快照上传/下载)。当 Broker 或网络缓慢或传输大文件时,增加此值以避免过早超时;减少此值可能会导致 Broker RPC 失败更早。此值在 common/config 中定义,并在进程启动时应用(不可动态重新加载)。 - 引入版本: v3.2.0 ##### enable_load_channel_rpc_async @@ -1429,8 +1429,8 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 启用时,加载通道打开 RPC(例如 `PTabletWriterOpen`)的处理将从 BRPC 工作线程卸载到专门的线程池:请求处理程序创建一个 `ChannelOpenTask` 并将其提交到内部 `_async_rpc_pool`,而不是内联运行 `LoadChannelMgr::_open`。这减少了 BRPC 线程内部的工作和阻塞,并允许通过 `load_channel_rpc_thread_pool_num` 和 `load_channel_rpc_thread_pool_queue_size` 调整并发性。如果线程池提交失败(当池满或关闭时),请求将被取消并返回错误状态。池在 `LoadChannelMgr::close()` 时关闭,因此在启用此功能时要考虑容量和生命周期,以避免请求被拒绝或处理延迟。 +- 是否可变: 是 +- 描述: 启用时,加载通道打开 RPC(例如,`PTabletWriterOpen`)的处理将从 BRPC Worker 卸载到专用线程池:请求处理程序创建一个 `ChannelOpenTask` 并将其提交到内部 `_async_rpc_pool`,而不是在行内运行 `LoadChannelMgr::_open`。这减少了 BRPC 线程中的工作和阻塞,并允许通过 `load_channel_rpc_thread_pool_num` 和 `load_channel_rpc_thread_pool_queue_size` 调整并发性。如果线程池提交失败(当池已满或已关闭时),请求将被取消并返回错误状态。池在 `LoadChannelMgr::close()` 时关闭,因此在启用此功能时请考虑容量和生命周期,以避免请求被拒绝或处理延迟。 - 引入版本: v3.5.0 ##### enable_load_diagnose @@ -1438,8 +1438,8 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 启用时,StarRocks 将在 bRPC 超时匹配 "[E1008]Reached timeout" 后尝试从 BE OlapTableSink/NodeChannel 进行自动加载诊断。代码会创建一个 `PLoadDiagnoseRequest` 并向远程 LoadChannel 发送 RPC 以收集配置文件和/或堆栈跟踪(由 `load_diagnose_rpc_timeout_profile_threshold_ms` 和 `load_diagnose_rpc_timeout_stack_trace_threshold_ms` 控制)。诊断 RPC 使用 `load_diagnose_send_rpc_timeout_ms` 作为其超时。如果诊断请求已在进行中,则跳过诊断。启用此功能会在目标节点上产生额外的 RPC 和分析工作;在敏感的生产工作负载上禁用以避免额外开销。 +- 是否可变: 是 +- 描述: 启用时,StarRocks 会在 bRPC 超时匹配 "[E1008]Reached timeout" 后,尝试从 BE OlapTableSink/NodeChannel 进行自动化负载诊断。代码会创建一个 `PLoadDiagnoseRequest` 并向远程 LoadChannel 发送 RPC,以收集配置文件和/或栈追踪(由 
`load_diagnose_rpc_timeout_profile_threshold_ms` 和 `load_diagnose_rpc_timeout_stack_trace_threshold_ms` 控制)。诊断 RPC 使用 `load_diagnose_send_rpc_timeout_ms` 作为其超时。如果诊断请求已在进行中,则跳过诊断。启用此功能会在目标节点上产生额外的 RPC 和分析工作;在敏感的生产工作负载上禁用此功能以避免额外开销。 - 引入版本: v3.5.0 ##### enable_load_segment_parallel @@ -1447,8 +1447,8 @@ curl http://:/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 否 -- 描述: 启用时,Rowset Segment 加载和 Rowset 级别读取会使用 StarRocks 后台线程池(ExecEnv::load_segment_thread_pool 和 ExecEnv::load_rowset_thread_pool)并发执行。Rowset::load_segments 和 TabletReader::get_segment_iterators 将每个 Segment 或每个 Rowset 任务提交到这些池中,如果提交失败则回退到串行加载并记录警告。启用此功能可降低大型 Rowset 的读取/加载延迟,但会增加 CPU/IO 并发性和内存压力。注意:并行加载可能会改变 Segment 的加载完成顺序,因此会阻止部分 Compaction(代码会检查 `_parallel_load` 并在启用时禁用部分 Compaction);考虑对依赖 Segment 顺序的操作的影响。 +- 是否可变: 否 +- 描述: 启用时,RowSet Segment 加载和 RowSet 级别读取将使用 StarRocks 后台线程池(ExecEnv::load_segment_thread_pool 和 ExecEnv::load_rowset_thread_pool)并发执行。Rowset::load_segments 和 TabletReader::get_segment_iterators 将每个 Segment 或每个 RowSet 的任务提交到这些池中,如果提交失败,则回退到串行加载并记录警告。启用此功能可以降低大型 RowSet 的读取/加载延迟,但会增加 CPU/IO 并发性和内存压力。注意:并行加载可能会改变 Segment 的加载完成顺序,从而阻止部分 Compaction(代码会检查 `_parallel_load` 并在启用时禁用部分 Compaction);请考虑依赖 Segment 顺序的操作的影响。 - 引入版本: v3.3.0, v3.4.0, v3.5.0 ##### enable_streaming_load_thread_pool @@ -1456,8 +1456,8 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 控制流式加载扫描器是否提交到专用的流式加载线程池。启用时,如果查询是 `TLoadJobType::STREAM_LOAD` 的 LOAD,ConnectorScanNode 会将扫描器任务提交到 `streaming_load_thread_pool`(该池配置有 INT32_MAX 线程和队列大小,即实际上无限制)。禁用时,扫描器使用通用的 `thread_pool` 及其 PriorityThreadPool 提交逻辑(优先级计算、try_offer/offer 行为)。启用此功能可将流式加载工作与常规查询执行隔离,以减少干扰;但是,由于专用池实际上是无限制的,启用此功能可能会在重度流式加载流量下增加并发线程和资源使用。此选项默认开启,通常不需要修改。 +- 是否可变: 是 +- 描述: 控制 Streaming Load 扫描器是否提交到专用的 Streaming Load 线程池。当启用且查询是带有 `TLoadJobType::STREAM_LOAD` 的 LOAD 时,ConnectorScanNode 会将扫描器任务提交到 `streaming_load_thread_pool`(配置有 INT32_MAX 个线程和队列大小,即实际上无界)。禁用时,扫描器使用通用 `thread_pool` 及其 `PriorityThreadPool` 提交逻辑(优先级计算、try_offer/offer 行为)。启用此功能可将 Streaming Load 工作与常规查询执行隔离,以减少干扰;但是,由于专用池实际上是无界的,启用此功能可能会在重度 Streaming Load 流量下增加并发线程和资源使用。此选项默认开启,通常不需要修改。 - 引入版本: v3.2.0 ##### es_http_timeout_ms @@ -1465,8 +1465,8 @@ curl http://:/varz - 默认值: 5000 - 类型: Int - 单位: 毫秒 -- 可变: 否 -- 描述: ESScanReader 中 ES 网络客户端用于 Elasticsearch scroll 请求的 HTTP 连接超时(毫秒)。此值通过 `network_client.set_timeout_ms()` 应用,然后发送后续的 scroll POST 请求,并控制客户端在滚动过程中等待 ES 响应的时间。对于慢速网络或大型查询,增加此值可避免过早超时;减少此值可更快地在无响应的 ES 节点上失败。此设置补充了 `es_scroll_keepalive`,后者控制 scroll 上下文的保持活动持续时间。 +- 是否可变: 否 +- 描述: ESScanReader 中 ES 网络客户端用于 Elasticsearch 滚动请求的 HTTP 连接超时(毫秒)。此值通过 `network_client.set_timeout_ms()` 应用,然后发送后续的滚动 POST,并控制客户端在滚动期间等待 ES 响应的时间。对于慢速网络或大型查询,增加此值以避免过早超时;减少此值以更快地对无响应的 ES 节点失败。此设置补充了 `es_scroll_keepalive`,后者控制滚动上下文的保活持续时间。 - 引入版本: v3.2.0 ##### es_index_max_result_window @@ -1474,8 +1474,8 @@ curl http://:/varz - 默认值: 10000 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 限制 StarRocks 在单个批次中将从 Elasticsearch 请求的最大文档数。StarRocks 在为 ES 读取器构建 `KEY_BATCH_SIZE` 时,将 ES 请求批处理大小设置为 min(`es_index_max_result_window`, `chunk_size`)。如果 ES 请求超过 Elasticsearch 索引设置 `index.max_result_window`,Elasticsearch 将返回 HTTP 400 (Bad Request)。在扫描大型索引时调整此值,或在 Elasticsearch 端增加 ES `index.max_result_window` 以允许更大的单个请求。 +- 是否可变: 否 +- 描述: 限制 StarRocks 在单个批次中从 Elasticsearch 请求的最大文档数。StarRocks 在为 ES 读取器构建 `KEY_BATCH_SIZE` 时,将 ES 请求批次大小设置为 min(`es_index_max_result_window`, `chunk_size`)。如果 ES 请求超过 Elasticsearch 索引设置 `index.max_result_window`,Elasticsearch 将返回 HTTP 400 (Bad Request)。在扫描大型索引时调整此值,或在 Elasticsearch 端增加 ES `index.max_result_window` 以允许更大的单个请求。 - 引入版本: v3.2.0 ##### 
ignore_load_tablet_failure
@@ -1483,8 +1483,8 @@ curl http://:/varz
- 默认值: false
- 类型: Boolean
- 单位: -
-- 可变: 否
-- 描述: 当此项设置为 `false` 时,系统将把任何 Tablet Header 加载失败(非 NotFound 和非 AlreadyExist 错误)视为致命错误:代码将记录错误并调用 LOG(FATAL) 以停止 BE 进程。当设置为 `true` 时,BE 将继续启动,即使存在此类每个 Tablet 的加载错误——失败的 Tablet ID 会被记录并跳过,而成功的 Tablet 仍会加载。请注意,此参数**不**会抑制 RocksDB 元数据扫描本身的致命错误,这些错误总是会导致进程退出。
+- 是否可变: 否
+- 描述: 当此项设置为 `false` 时,系统将把任何 Tablet Header 加载失败(非 NotFound 和非 AlreadyExist 错误)视为致命错误:代码会记录错误并调用 LOG(FATAL) 以停止 BE 进程。当设置为 `true` 时,即使出现此类针对单个 Tablet 的加载错误,BE 也会继续启动——失败的 Tablet ID 会被记录并跳过,而成功的 Tablet 仍然会被加载。请注意,此参数**不**会抑制 RocksDB 元数据扫描本身导致的致命错误,这些错误始终会导致进程退出。
- 引入版本: v3.2.0

##### load_channel_abort_clean_up_delay_seconds
@@ -1492,8 +1492,8 @@ curl http://:/varz
- 默认值: 600
- 类型: Int
- 单位: 秒
-- 可变: 是
-- 描述: 控制系统在从 `_aborted_load_channels` 中删除中止的加载通道的加载 ID 之前保留多长时间(秒)。当加载作业被取消或失败时,加载 ID 会被记录,以便任何迟到的加载 RPC 可以立即被拒绝;一旦延迟过期,该条目将在定期后台清理期间被清除(最小清理间隔为 60 秒)。将延迟设置过低可能会在中止后接受离散的 RPC,而设置过高可能会保留状态并消耗资源比必要时间更长。调整此值以平衡晚期请求拒绝的正确性和中止加载的资源保留。
+- 是否可变: 是
+- 描述: 控制系统在从 `_aborted_load_channels` 中移除中止的加载通道的加载 ID 之前保留多长时间(秒)。当加载作业被取消或失败时,加载 ID 会被记录下来,以便任何迟到的加载 RPC 可以立即被拒绝;一旦延迟过期,该条目将在定期后台扫描(最小扫描间隔为 60 秒)期间被清除。将延迟设置得过低,可能导致中止后迟到的 RPC 仍被接受;设置得过高,则会使状态保留并占用资源超过必要的时间。调整此值以在迟到请求的正确拒绝与中止加载的资源保留之间取得平衡。
- 引入版本: v3.5.11, v4.0.4

##### load_channel_rpc_thread_pool_num
@@ -1501,17 +1501,17 @@ curl http://:/varz
- 默认值: -1
- 类型: Int
- 单位: 线程
-- 可变: 是
-- 描述: 加载通道异步 RPC 线程池的最大线程数。当设置为小于或等于 0(默认 `-1`)时,池大小会自动设置为 CPU 核心数 (`CpuInfo::num_cores()`)。配置的值用作 ThreadPoolBuilder 的最大线程数,池的最小线程数设置为 min(5, max_threads)。池队列大小由 `load_channel_rpc_thread_pool_queue_size` 单独控制。引入此设置是为了使异步 RPC 池大小与 bRPC 工作线程的默认值 (`brpc_num_threads`) 对齐,以便在将加载 RPC 处理从同步切换到异步后行为保持兼容。在运行时更改此配置会触发 `ExecEnv::GetInstance()->load_channel_mgr()->async_rpc_pool()->update_max_threads(...)`。
+- 是否可变: 是
+- 描述: 加载通道异步 RPC 线程池的最大线程数。当设置为小于或等于 0(默认 `-1`)时,池大小会自动设置为 CPU 核心数 (`CpuInfo::num_cores()`)。配置的值用作 ThreadPoolBuilder 的最大线程数,池的最小线程数设置为 min(5, max_threads)。池队列大小由 `load_channel_rpc_thread_pool_queue_size` 单独控制。引入此设置是为了使异步 RPC 池大小与 bRPC worker 的默认值 (`brpc_num_threads`) 对齐,以便在加载 RPC 处理从同步切换到异步后行为保持兼容。在运行时更改此配置会触发 `ExecEnv::GetInstance()->load_channel_mgr()->async_rpc_pool()->update_max_threads(...)`。(运行时调整示例见本节下文。)
- 引入版本: v3.5.0

##### load_channel_rpc_thread_pool_queue_size

- 默认值: 1024000
- 类型: int
-- 单位: 数量
-- 可变: 否
-- 描述: 设置 LoadChannelMgr 创建的加载通道 RPC 线程池的最大待处理任务队列大小。当 `enable_load_channel_rpc_async` 启用时,此线程池执行异步 `open` 请求;池大小与 `load_channel_rpc_thread_pool_num` 配对。大型默认值 (1024000) 与 bRPC 工作线程的默认值对齐,以在从同步处理切换到异步处理后保留行为。如果队列已满,ThreadPool::submit() 将失败,并且传入的 `open` RPC 将因错误而被取消,导致调用者收到拒绝。增加此值可缓冲更大的并发 `open` 请求突发;减少它会收紧反压,但在负载下可能导致更多拒绝。
+- 单位: 计数
+- 是否可变: 否
+- 描述: 设置由 LoadChannelMgr 创建的加载通道 RPC 线程池的最大挂起任务队列大小。当 `enable_load_channel_rpc_async` 启用时,此线程池执行异步 `open` 请求;池大小与 `load_channel_rpc_thread_pool_num` 配对。较大的默认值 (1024000) 与 bRPC worker 的默认值对齐,以在从同步处理切换到异步处理后保持行为兼容。如果队列已满,ThreadPool::submit() 将失败,并且传入的 `open` RPC 将被取消并返回错误,导致调用方收到拒绝。增加此值可缓冲更大的并发 `open` 请求突发;减少它会收紧反压,但可能会在负载下导致更多拒绝。
- 引入版本: v3.5.0
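+对于上文这类标记为“是否可变: 是”的配置项,可以通过 BE 的 `update_config` HTTP 接口在运行时调整,无需重启进程。以下是一个简化示意(假设 BE 的 IP 为 `127.0.0.1`、HTTP 端口为 `8040`,均为示例值,请按实际部署替换):
+
+```Bash
+# 运行时将加载通道异步 RPC 线程池的最大线程数调整为 16(示例值)
+curl -XPOST "http://127.0.0.1:8040/api/update_config?load_channel_rpc_thread_pool_num=16"
+
+# 通过 varz 接口确认当前生效的配置值
+curl -s "http://127.0.0.1:8040/varz" | grep load_channel_rpc_thread_pool_num
+```
+

##### load_diagnose_rpc_timeout_profile_threshold_ms
@@ -1519,8 +1519,8 @@ curl http://:/varz
- 默认值: 60000
- 类型: Int
- 单位: 毫秒
-- 可变: 是
-- 描述: 当加载 RPC 超时(错误包含 "[E1008]Reached timeout")且 `enable_load_diagnose` 为 true 时,此阈值控制是否请求完整的分析诊断。如果请求级别的 RPC 超时 `_rpc_timeout_ms` 大于 `load_diagnose_rpc_timeout_profile_threshold_ms`,则为该诊断启用分析。对于较小的 `_rpc_timeout_ms` 值,分析每 20 次超时采样一次,以避免对实时/短超时加载进行频繁的重诊断。此值影响发送的 `PLoadDiagnoseRequest` 中的 `profile` 标志;堆栈跟踪行为由 `load_diagnose_rpc_timeout_stack_trace_threshold_ms` 单独控制,发送超时由 `load_diagnose_send_rpc_timeout_ms` 控制。
+- 是否可变: 是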
+- 描述: 当加载 RPC 超时(错误包含 "[E1008]Reached timeout")且 `enable_load_diagnose` 为 true 时,此阈值控制是否请求完整的分析诊断。如果请求级别的 RPC 超时 `_rpc_timeout_ms` 大于 `load_diagnose_rpc_timeout_profile_threshold_ms`,则为该诊断启用分析。对于较小的 `_rpc_timeout_ms` 值,分析每 20 次超时采样一次,以避免对实时/短超时加载进行频繁的重复诊断。此值影响发送的 `PLoadDiagnoseRequest` 中的 `profile` 标志;栈追踪行为由 `load_diagnose_rpc_timeout_stack_trace_threshold_ms` 单独控制,发送超时由 `load_diagnose_send_rpc_timeout_ms` 控制。
- 引入版本: v3.5.0

##### load_diagnose_rpc_timeout_stack_trace_threshold_ms
@@ -1528,8 +1528,8 @@ curl http://:/varz
- 默认值: 600000
- 类型: Int
- 单位: 毫秒
-- 可变: 是
-- 描述: 用于决定何时请求远程堆栈跟踪以用于长时间运行的加载 RPC 的阈值(毫秒)。当加载 RPC 因超时错误而超时且有效 RPC 超时 (`_rpc_timeout_ms`) 超过此值时,`OlapTableSink`/`NodeChannel` 将在发送到目标 BE 的 `load_diagnose` RPC 中包含 `stack_trace=true`,以便 BE 可以返回堆栈跟踪进行调试。`LocalTabletsChannel::SecondaryReplicasWaiter` 也会在等待辅助副本超过此间隔时从主副本触发尽力而为的堆栈跟踪诊断。此行为需要 `enable_load_diagnose`,并使用 `load_diagnose_send_rpc_timeout_ms` 作为诊断 RPC 超时;分析由 `load_diagnose_rpc_timeout_profile_threshold_ms` 单独控制。降低此值会增加请求堆栈跟踪的积极性。
+- 是否可变: 是
+- 描述: 用于决定何时为长时间运行的加载 RPC 请求远程栈追踪的阈值(毫秒)。当加载 RPC 因超时错误而超时,并且有效 RPC 超时 (`_rpc_timeout_ms`) 超过此值时,`OlapTableSink`/`NodeChannel` 将在发送到目标 BE 的 `load_diagnose` RPC 中包含 `stack_trace=true`,以便 BE 可以返回栈追踪进行调试。`LocalTabletsChannel::SecondaryReplicasWaiter` 也会在等待从副本超过此间隔时,从主副本触发尽力而为的栈追踪诊断。此行为需要 `enable_load_diagnose`,并使用 `load_diagnose_send_rpc_timeout_ms` 作为诊断 RPC 超时;分析由 `load_diagnose_rpc_timeout_profile_threshold_ms` 单独控制。降低此值会增加请求栈追踪的积极性。
- 引入版本: v3.5.0

##### load_diagnose_send_rpc_timeout_ms
@@ -1537,8 +1537,8 @@ curl http://:/varz
- 默认值: 2000
- 类型: Int
- 单位: 毫秒
-- 可变: 是
-- 描述: 应用于 BE 加载路径发起的诊断相关 bRPC 调用的超时(毫秒)。它用于设置 `load_diagnose` RPC(当 LoadChannel bRPC 调用超时时由 NodeChannel/OlapTableSink 发送)和副本状态查询(当 SecondaryReplicasWaiter / LocalTabletsChannel 检查主副本状态时使用)的控制器超时。选择一个足够高的值以允许远程端响应配置文件或堆栈跟踪数据,但不要太高以至于延迟故障处理。此参数与 `enable_load_diagnose`、`load_diagnose_rpc_timeout_profile_threshold_ms` 和 `load_diagnose_rpc_timeout_stack_trace_threshold_ms` 协同工作,它们控制何时以及请求哪些诊断信息。
+- 是否可变: 是
+- 描述: 应用于 BE 加载路径发起的诊断相关 bRPC 调用的超时(毫秒)。它用于为 `load_diagnose` RPC(当 LoadChannel bRPC 调用超时时由 NodeChannel/OlapTableSink 发送)和副本状态查询(当 SecondaryReplicasWaiter / LocalTabletsChannel 检查主副本状态时使用)设置控制器超时。选择足够高的值,以允许远程端返回配置文件或栈追踪数据,但不要太高,以免延迟故障处理。此参数与 `enable_load_diagnose`、`load_diagnose_rpc_timeout_profile_threshold_ms` 和 `load_diagnose_rpc_timeout_stack_trace_threshold_ms` 协同工作,它们共同控制何时请求以及请求何种诊断信息。
- 引入版本: v3.5.0

##### load_fp_brpc_timeout_ms
@@ -1546,8 +1546,8 @@ curl http://:/varz
- 默认值: -1
- 类型: Int
- 单位: 毫秒
-- 可变: 是
-- 描述: 当 `node_channel_set_brpc_timeout` 故障点触发时,覆盖 OlapTableSink 使用的每个通道 bRPC RPC 超时。如果设置为正值,NodeChannel 将其内部 `_rpc_timeout_ms` 设置为该值(毫秒),导致 open/add-chunk/cancel RPC 使用较短的超时,并启用模拟产生 "[E1008]Reached timeout" 错误的 bRPC 超时。默认值 (`-1`) 禁用覆盖。更改此值旨在用于测试和故障注入;小值可能会产生错误的超时并触发加载诊断(参见 `enable_load_diagnose`、`load_diagnose_rpc_timeout_profile_threshold_ms`、`load_diagnose_rpc_timeout_stack_trace_threshold_ms` 和 `load_diagnose_send_rpc_timeout_ms`)。
+- 是否可变: 是
+- 描述: 当 `node_channel_set_brpc_timeout` 故障点触发时,覆盖 OlapTableSink 使用的每个通道 bRPC RPC 超时。如果设置为正值,NodeChannel 将其内部 `_rpc_timeout_ms` 设置为该值(毫秒),导致 open/add-chunk/cancel RPC 使用较短的超时,从而可以模拟产生 "[E1008]Reached timeout" 错误的 bRPC 超时。默认值 (`-1`) 禁用此覆盖。更改此值仅用于测试和故障注入;过小的值可能会产生虚假超时并触发加载诊断(参见 `enable_load_diagnose`、`load_diagnose_rpc_timeout_profile_threshold_ms`、`load_diagnose_rpc_timeout_stack_trace_threshold_ms` 和 `load_diagnose_send_rpc_timeout_ms`)。(示例见下文。)
- 引入版本: v3.5.0
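+`load_fp_*` 系列故障点配置均为可变项,便于在测试中随时开关。以下为一个仅适用于测试环境的简化示意(假设 BE 为 `127.0.0.1:8040`,均为示例值):
+
+```Bash
+# 仅限测试/故障注入:让 NodeChannel 使用 5 秒的 bRPC 超时上限(示例值),
+# 以便模拟 "[E1008]Reached timeout" 错误并验证加载诊断链路
+curl -XPOST "http://127.0.0.1:8040/api/update_config?load_fp_brpc_timeout_ms=5000"
+
+# 测试结束后恢复默认值 -1,关闭该覆盖
+curl -XPOST "http://127.0.0.1:8040/api/update_config?load_fp_brpc_timeout_ms=-1"
+```
+

##### load_fp_tablets_channel_add_chunk_block_ms
@@ -1555,8 +1555,8 @@ curl http://:/varz
- 默认值: -1
- 类型: Int
- 单位: 毫秒
-- 可变: 是
--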
描述: 启用时(设置为正毫秒值),此故障点配置使 TabletsChannel::add_chunk 在加载处理期间休眠指定时间。它用于模拟 BRPC 超时错误(例如 "[E1008]Reached timeout")并模拟昂贵的 add_chunk 操作,从而增加加载延迟。小于或等于 0 的值(默认 `-1`)禁用注入。旨在用于测试故障处理、超时和副本同步行为——不要在正常的生产工作负载中启用,因为它会延迟写入完成并可能触发上游超时或副本中止。 +- 是否可变: 是 +- 描述: 启用时(设置为正毫秒值),此故障点配置会使 TabletsChannel::add_chunk 在加载处理期间休眠指定时间。它用于模拟 BRPC 超时错误(例如,"[E1008]Reached timeout")并模拟耗时的 add_chunk 操作,从而增加加载延迟。小于或等于 0 的值(默认 `-1`)禁用注入。用于测试故障处理、超时和副本同步行为——请勿在正常生产工作负载中启用,因为它会延迟写入完成并可能触发上游超时或副本中止。 - 引入版本: v3.5.0 ##### load_segment_thread_pool_num_max @@ -1564,8 +1564,8 @@ curl http://:/varz - 默认值: 128 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 设置 BE 加载相关线程池的最大工作线程数。此值由 ThreadPoolBuilder 用于限制 exec_env.cpp 中 `load_rowset_pool` 和 `load_segment_pool` 的线程数,控制流式加载和批量加载期间处理已加载 Rowset 和 Segment(例如解码、索引、写入)的并发性。增加此值会提高并行度并可以改善加载吞吐量,但也会增加 CPU、内存使用率和潜在的争用;减少此值会限制并发加载处理并可能降低吞吐量。与 `load_segment_thread_pool_queue_size` 和 `streaming_load_thread_pool_idle_time_ms` 一起调整。更改需要 BE 重启。 +- 是否可变: 否 +- 描述: 设置 BE 加载相关线程池的最大工作线程数。此值由 ThreadPoolBuilder 用于限制 `exec_env.cpp` 中的 `load_rowset_pool` 和 `load_segment_pool` 的线程,控制 Streaming 和批处理加载期间处理已加载 RowSet 和 Segment(例如,解码、索引、写入)的并发性。增加此值可提高并行性并可提高加载吞吐量,但也会增加 CPU、内存使用和潜在的争用;减少此值会限制并发加载处理并可能降低吞吐量。与 `load_segment_thread_pool_queue_size` 和 `streaming_load_thread_pool_idle_time_ms` 一起调整。更改需要 BE 重启。 - 引入版本: v3.3.0, v3.4.0, v3.5.0 ##### load_segment_thread_pool_queue_size @@ -1573,8 +1573,8 @@ curl http://:/varz - 默认值: 10240 - 类型: Int - 单位: 任务 -- 可变: 否 -- 描述: 设置作为 "load_rowset_pool" 和 "load_segment_pool" 创建的加载相关线程池的最大队列长度(待处理任务数)。这些池使用 `load_segment_thread_pool_num_max` 作为其最大线程数,此配置控制在 ThreadPool 的溢出策略生效之前可以缓冲多少加载 Segment/Rowset 任务(根据 ThreadPool 实现,进一步的提交可能会被拒绝或阻塞)。增加此值以允许更多待处理的加载工作(使用更多内存并可能增加延迟);减少此值以限制缓冲加载并发性并减少内存使用量。 +- 是否可变: 否 +- 描述: 设置创建为 "load_rowset_pool" 和 "load_segment_pool" 的加载相关线程池的最大队列长度(挂起任务数)。这些池使用 `load_segment_thread_pool_num_max` 作为其最大线程数,此配置控制在 ThreadPool 的溢出策略生效之前可以缓冲多少加载 Segment/RowSet 任务(进一步的提交可能会根据 ThreadPool 的实现被拒绝或阻塞)。增加此值以允许更多挂起的加载工作(使用更多内存并可能增加延迟);减少它以限制缓冲的加载并发性并减少内存使用。 - 引入版本: v3.3.0, v3.4.0, v3.5.0 ##### max_pulsar_consumer_num_per_group @@ -1582,8 +1582,8 @@ curl http://:/varz - 默认值: 10 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 控制 BE 上 Routine Load 的单个数据消费者组中可创建的最大 Pulsar 消费者数量。由于多主题订阅不支持累积确认,每个消费者只订阅一个主题/分区;如果 `pulsar_info->partitions` 中的分区数量超过此值,则组创建将失败并返回错误,建议增加 BE 上的 `max_pulsar_consumer_num_per_group` 或添加更多 BE。此限制在构建 PulsarDataConsumerGroup 时强制执行,并防止 BE 为一个 Routine Load 组托管超过此数量的消费者。对于 Kafka Routine Load,则使用 `max_consumer_num_per_group`。 +- 是否可变: 是 +- 描述: 控制 BE 上 Routine Load 的单个数据消费组中可以创建的最大 Pulsar 消费者数量。由于多主题订阅不支持累积确认,每个消费者精确订阅一个主题/分区;如果 `pulsar_info->partitions` 中的分区数量超过此值,则组创建将失败,并提示增加 BE 上的 `max_pulsar_consumer_num_per_group` 或添加更多 BE。此限制在构建 PulsarDataConsumerGroup 时强制执行,并防止 BE 为一个 Routine Load 组托管超过此数量的消费者。对于 Kafka Routine Load,则使用 `max_consumer_num_per_group`。 - 引入版本: v3.2.0 ##### pull_load_task_dir @@ -1591,8 +1591,8 @@ curl http://:/varz - 默认值: `${STARROCKS_HOME}/var/pull_load` - 类型: string - 单位: - -- 可变: 否 -- 描述: BE 存储“拉取加载”任务(下载的源文件、任务状态、临时输出等)数据和工作文件的文件系统路径。目录必须可由 BE 进程写入,并有足够的磁盘空间用于传入加载。默认值是相对于 STARROCKS_HOME 的;测试创建并期望此目录存在(参见测试配置)。 +- 是否可变: 否 +- 描述: 文件系统路径,BE 在此存储“拉取加载”任务的数据和工作文件(下载的源文件、任务状态、临时输出等)。该目录必须可由 BE 进程写入,并有足够的磁盘空间用于传入加载。默认值相对于 STARROCKS_HOME;测试创建并期望此目录存在(参见测试配置)。 - 引入版本: v3.2.0 ##### routine_load_kafka_timeout_second @@ -1600,8 +1600,8 @@ curl http://:/varz - 默认值: 10 - 类型: Int - 单位: 秒 -- 可变: 否 -- 描述: 用于 Kafka 相关 Routine Load 操作的超时时间(秒)。当客户端请求未指定超时时,`routine_load_kafka_timeout_second` 用作 `get_info` 的默认 RPC 超时(转换为毫秒)。它还用作 librdkafka 消费者的每次调用消费轮询超时(转换为毫秒并受剩余运行时限制)。注意:内部 
`get_info` 路径在将其传递给 librdkafka 之前将此值减小到 80% 以避免 FE 端超时竞争。将此值设置为平衡及时故障报告和网络/Broker 响应足够时间的值;更改需要重新启动,因为该设置不可变。
+- 是否可变: 否
+- 描述: Kafka 相关 Routine Load 操作使用的超时(秒)。当客户端请求未指定超时时,`routine_load_kafka_timeout_second` 用作 `get_info` 的默认 RPC 超时(转换为毫秒)。它也用作 librdkafka 消费者的每次调用消费轮询超时(转换为毫秒并受限于剩余运行时)。注意:内部 `get_info` 路径在将其传递给 librdkafka 之前会将此值减小到原值的 80%,以避免 FE 端超时竞争。请将此值设置为能够在及时报告故障与为网络/Broker 留出足够响应时间之间取得平衡的值;由于此设置不可变,更改后需要重启才能生效。
- 引入版本: v3.2.0

##### routine_load_pulsar_timeout_second
@@ -1609,8 +1609,8 @@ curl http://:/varz
- 默认值: 10
- 类型: Int
- 单位: 秒
-- 可变: 否
-- 描述: 当请求未提供显式超时时,BE 用于 Pulsar 相关 Routine Load 操作的默认超时(秒)。具体来说,`PInternalServiceImplBase::get_pulsar_info` 将此值乘以 1000 以形成传递给获取 Pulsar 分区元数据和积压的 Routine Load 任务执行器方法的毫秒超时。增加此值以允许较慢的 Pulsar 响应,但代价是更长的故障检测时间;减少此值以在慢速 Broker 上更快失败。类似于用于 Kafka 的 `routine_load_kafka_timeout_second`。
+- 是否可变: 否
+- 描述: BE 在请求未提供明确超时时,用于 Pulsar 相关 Routine Load 操作的默认超时(秒)。具体来说,`PInternalServiceImplBase::get_pulsar_info` 将此值乘以 1000 以形成传递给获取 Pulsar 分区元数据和积压的 Routine Load 任务执行器方法的毫秒超时。增加此值以允许较慢的 Pulsar 响应,但会增加故障检测时间;减少此值则可在 Broker 响应缓慢时更快失败。类似于用于 Kafka 的 `routine_load_kafka_timeout_second`。
- 引入版本: v3.2.0

##### streaming_load_thread_pool_idle_time_ms
@@ -1618,8 +1618,8 @@ curl http://:/varz
- 默认值: 2000
- 类型: Int
- 单位: 毫秒
-- 可变: 否
-- 描述: 设置流式加载相关线程池的线程空闲超时(毫秒)。该值用作 `stream_load_io` 池以及 `load_rowset_pool` 和 `load_segment_pool` 的 ThreadPoolBuilder 的空闲超时。这些池中的线程在此持续时间空闲时将被回收;较小的值可更快释放资源,但会增加突发负载下的线程创建开销,而较大的值可使线程在短时突发期间保持活动状态。当 `enable_streaming_load_thread_pool` 启用时,使用 `stream_load_io` 池。
+- 是否可变: 否
+- 描述: 设置 Streaming Load 相关线程池的线程空闲超时(毫秒)。该值用作传递给 ThreadPoolBuilder 的 `stream_load_io` 池以及 `load_rowset_pool` 和 `load_segment_pool` 的空闲超时。这些池中的线程在此持续时间内空闲时将被回收;较小的值可更快释放资源但增加线程创建开销,而较大的值可让线程在短时突发之间保持存活。`stream_load_io` 池在 `enable_streaming_load_thread_pool` 启用时使用。
- 引入版本: v3.2.0

##### streaming_load_thread_pool_num_min
@@ -1627,8 +1627,8 @@ curl http://:/varz
- 默认值: 0
- 类型: Int
- 单位: -
-- 可变: 否
-- 描述: 在 ExecEnv 初始化期间创建的流式加载 IO 线程池 ("stream_load_io") 的最小线程数。该池以 `set_max_threads(INT32_MAX)` 和 `set_max_queue_size(INT32_MAX)` 构建,因此它实际上是无限制的,以避免并发流式加载的死锁。值为 0 允许池以零线程启动并按需增长;设置正值会在启动时保留那么多线程。当 `enable_streaming_load_thread_pool` 为 true 时使用此池,其空闲超时由 `streaming_load_thread_pool_idle_time_ms` 控制。总体并发性仍然受 `fragment_pool_thread_num_max` 和 `webserver_num_workers` 的限制;更改此值很少必要,如果设置过高可能会增加资源使用。
+- 是否可变: 否
+- 描述: ExecEnv 初始化期间创建的 Streaming Load IO 线程池 ("stream_load_io") 的最小线程数。该池以 `set_max_threads(INT32_MAX)` 和 `set_max_queue_size(INT32_MAX)` 构建,因此它实际上是无界的,以避免并发 Streaming Load 时的死锁。值为 0 允许池在启动时没有线程并按需增长;设置正值会在启动时保留那么多线程。此池在 `enable_streaming_load_thread_pool` 为 true 时使用,其空闲超时由 `streaming_load_thread_pool_idle_time_ms` 控制。总体并发性仍受 `fragment_pool_thread_num_max` 和 `webserver_num_workers` 限制;更改此值很少有必要,如果设置过高可能会增加资源使用。
- 引入版本: v3.2.0

### 统计报告

@@ -1638,8 +1638,8 @@ curl http://:/varz
- 默认值: true
- 类型: boolean
- 单位: -
-- 可变: 否
-- 描述: 当为 true 时,BE 进程会启动一个后台 "metrics_daemon" 线程(在非 Apple 平台上在 Daemon::init 中启动),该线程大约每 15 秒运行一次,以调用 `StarRocksMetrics::instance()->metrics()->trigger_hook()` 并计算派生/系统指标(例如,推送/查询字节/秒、最大磁盘 I/O 利用率、最大网络发送/接收速率),记录内存细分并运行表指标清理。当为 false 时,这些 Hook 在指标收集时在 MetricRegistry::collect 内部同步执行,这可能会增加指标抓取延迟。需要进程重新启动才能生效。
+- 是否可变: 否
+- 描述: 当为 true 时,BE 进程会启动一个后台 "metrics_daemon" 线程(在非 Apple 平台上通过 Daemon::init 启动),该线程约每 15 秒运行一次,调用 `StarRocksMetrics::instance()->metrics()->trigger_hook()` 并计算派生/系统指标(例如,push/query 字节/秒、最大磁盘 I/O 利用率、最大网络发送/接收速率),记录内存细分并运行表指标清理。当为 false 时,这些 hook 在指标收集时在 `MetricRegistry::collect` 内部同步执行,这可能会增加指标抓取延迟。需要重新启动进程才能生效。(指标抓取示例见下文。)
- 引入版本: v3.2.0
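+启用 `enable_metric_calculator` 后,可以直接从 BE 的指标接口观察这些派生指标。以下是一个简化示意(假设 BE 的 IP 为 `127.0.0.1`、HTTP 端口为 `8040`,均为示例值):
+
+```Bash
+# 抓取 BE 指标并粗略查看(BE 指标以 starrocks_be 前缀暴露)
+curl -s "http://127.0.0.1:8040/metrics" | grep "starrocks_be" | head -n 20
+```
+

##### enable_system_metrics
@@ -1647,8 +1647,8 @@ curl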
http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 否 -- 描述: 当为 true 时,StarRocks 在启动时初始化系统级监控:它从配置的存储路径发现磁盘设备并枚举网络接口,然后将此信息传递到指标子系统,以启用磁盘 I/O、网络流量和内存相关系统指标的收集。如果设备或接口发现失败,初始化会记录警告并中止系统指标设置。此标志仅控制是否初始化系统指标;定期指标聚合线程由 `enable_metric_calculator` 单独控制,JVM 指标初始化由 `enable_jvm_metrics` 控制。更改此值需要重新启动。 +- 是否可变: 否 +- 描述: 当为 true 时,StarRocks 在启动期间初始化系统级监控:它从配置的存储路径发现磁盘设备并枚举网络接口,然后将此信息传递到指标子系统,以启用磁盘 I/O、网络流量和内存相关系统指标的收集。如果设备或接口发现失败,初始化会记录警告并中止系统指标设置。此标志仅控制是否初始化系统指标;周期性指标聚合线程由 `enable_metric_calculator` 单独控制,JVM 指标初始化由 `enable_jvm_metrics` 控制。更改此值需要重新启动。 - 引入版本: v3.2.0 ##### profile_report_interval @@ -1656,8 +1656,8 @@ curl http://:/varz - 默认值: 30 - 类型: Int - 单位: 秒 -- 可变: 是 -- 描述: ProfileReportWorker 用于 (1) 决定何时报告 LOAD 查询的每个 Fragment 配置文件信息以及 (2) 在报告周期之间休眠的间隔(秒)。工作线程使用 (profile_report_interval * 1000) 毫秒将当前时间与每个任务的 `last_report_time` 进行比较,以确定是否应为非 Pipeline 和 Pipeline 加载任务重新报告配置文件。在每次循环中,工作线程读取当前值(运行时可变);如果配置值小于或等于 0,工作线程会强制将其设置为 1 并发出警告。更改此值会影响下一个报告决策和休眠持续时间。 +- 是否可变: 是 +- 描述: ProfileReportWorker 用于 (1) 决定何时报告 LOAD 查询的每个片段配置文件信息,以及 (2) 在报告周期之间睡眠的时间间隔(秒)。Worker 将当前时间与每个任务的 last_report_time 使用 (profile_report_interval * 1000) ms 进行比较,以确定是否应为非 Pipeline 和 Pipeline 加载任务重新报告配置文件。在每个循环中,Worker 读取当前值(运行时可变);如果配置值小于或等于 0,Worker 会强制将其设置为 1 并发出警告。更改此值会影响下一次报告决策和睡眠持续时间。 - 引入版本: v3.2.0 ##### report_disk_state_interval_seconds @@ -1665,8 +1665,8 @@ curl http://:/varz - 默认值: 60 - 类型: Int - 单位: 秒 -- 可变: 是 -- 描述: 报告存储卷状态(包括卷内数据大小)的时间间隔。 +- 是否可变: 是 +- 描述: 报告存储卷状态的时间间隔,包括卷内数据的大小。 - 引入版本: - ##### report_resource_usage_interval_ms @@ -1674,8 +1674,8 @@ curl http://:/varz - 默认值: 1000 - 类型: Int - 单位: 毫秒 -- 可变: 是 -- 描述: BE 代理向 FE(主节点)发送定期资源使用报告的时间间隔(毫秒)。代理工作线程收集 TResourceUsage(运行中的查询数量、已用/限制内存、已用 CPU 千分比和资源组使用情况)并调用 report_task,然后休眠此配置的间隔(参见 task_worker_pool)。较小的值可提高报告的及时性,但会增加 CPU、网络和主节点负载;较大的值可减少开销,但会使资源信息更新不及时。报告会更新相关指标(report_resource_usage_requests_total、report_resource_usage_requests_failed)。根据集群规模和 FE 负载进行调整。 +- 是否可变: 是 +- 描述: BE 代理向 FE (master) 发送周期性资源使用报告的间隔(毫秒)。代理工作线程收集 TResourceUsage(运行中的查询数量、已用/限制内存、已用 CPU 千分比和资源组使用情况)并调用 report_task,然后在此配置的间隔内休眠(参见 task_worker_pool)。较小的值可提高报告及时性,但会增加 CPU、网络和 master 负载;较大的值可减少开销,但会使资源信息更新不及时。报告更新相关指标(report_resource_usage_requests_total、report_resource_usage_requests_failed)。根据集群规模和 FE 负载进行调整。 - 引入版本: v3.2.0 ##### report_tablet_interval_seconds @@ -1683,7 +1683,7 @@ curl http://:/varz - 默认值: 60 - 类型: Int - 单位: 秒 -- 可变: 是 +- 是否可变: 是 - 描述: 报告所有 Tablet 最新版本的时间间隔。 - 引入版本: - @@ -1692,7 +1692,7 @@ curl http://:/varz - 默认值: 10 - 类型: Int - 单位: 秒 -- 可变: 是 +- 是否可变: 是 - 描述: 报告任务状态的时间间隔。任务可以是创建表、删除表、加载数据或更改表模式。 - 引入版本: - @@ -1701,7 +1701,7 @@ curl http://:/varz - 默认值: 5 - 类型: Int - 单位: 秒 -- 可变: 是 +- 是否可变: 是 - 描述: 报告所有工作组最新版本的时间间隔。 - 引入版本: - @@ -1712,7 +1712,7 @@ curl http://:/varz - 默认值: 3 - 类型: Int - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: 用于 Schema Change 的线程数。 - 引入版本: - @@ -1721,7 +1721,7 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: 是否从 Avro Union 数据类型序列化的 JSON 字符串中剥离类型标签。 - 引入版本: v3.3.7, v3.4 @@ -1730,7 +1730,7 @@ curl http://:/varz - 默认值: 60 - 类型: Int - 单位: 秒 -- 可变: 是 +- 是否可变: 是 - 描述: Base Compaction 的线程轮询时间间隔。 - 引入版本: - @@ -1739,8 +1739,8 @@ curl http://:/varz - 默认值: 86400 - 类型: Int - 单位: 秒 -- 可变: 是 -- 描述: 自上次 Base Compaction 以来的时间间隔。此配置项是触发 Base Compaction 的条件之一。 +- 是否可变: 是 +- 描述: 上次 Base Compaction 发生以来的时间间隔。此配置项是触发 Base Compaction 的条件之一。 - 引入版本: - ##### base_compaction_num_threads_per_disk @@ -1748,8 +1748,8 @@ curl http://:/varz - 默认值: 1 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 每个存储卷用于 Base Compaction 的线程数。 +- 是否可变: 否 +- 描述: 每个存储卷上用于 Base Compaction 的线程数。 - 
引入版本: -

##### base_cumulative_delta_ratio
@@ -1757,8 +1757,8 @@ curl http://:/varz
- 默认值: 0.3
- 类型: Double
- 单位: -
-- 可变: 是
-- 描述: 累积文件大小与基文件大小之比。达到此比率是触发 Base Compaction 的条件之一。
+- 是否可变: 是
+- 描述: 累积文件大小与 Base 文件大小之比。达到此比率是触发 Base Compaction 的条件之一。
- 引入版本: -

##### chaos_test_enable_random_compaction_strategy
@@ -1766,8 +1766,8 @@ curl http://:/varz
- 默认值: false
- 类型: Boolean
- 单位: -
-- 可变: 是
-- 描述: 当此项设置为 `true` 时,TabletUpdates::compaction() 使用为混沌工程测试设计的随机 Compaction 策略 (compaction_random)。此标志强制 Compaction 遵循非确定性/随机策略,而不是正常策略(例如,Size-tiered Compaction),并且在 Tablet 的 Compaction 选择期间具有优先权。它仅用于受控测试:启用它可能会导致不可预测的 Compaction 顺序、增加 I/O/CPU 和测试不稳定性。请勿在生产环境中启用;仅用于故障注入或混沌测试场景。
+- 是否可变: 是
+- 描述: 当此项设置为 `true` 时,TabletUpdates::compaction() 使用为混沌工程测试设计的随机 Compaction 策略 (compaction_random)。此标志强制 Compaction 遵循非确定性/随机策略而不是正常策略(例如,Size-tiered Compaction),并在 Tablet 的 Compaction 选择期间优先生效。它仅用于受控测试:启用它可能会产生不可预测的 Compaction 顺序、增加 I/O/CPU 开销和测试不稳定性。请勿在生产环境中启用;仅用于故障注入或混沌测试场景。
- 引入版本: v3.3.12, 3.4.2, 3.5.0, 4.0.0

##### check_consistency_worker_count
@@ -1775,7 +1775,7 @@ curl http://:/varz
- 默认值: 1
- 类型: Int
- 单位: -
-- 可变: 否
+- 是否可变: 否
- 描述: 用于检查 Tablet 一致性的线程数。
- 引入版本: -

@@ -1784,8 +1784,8 @@ curl http://:/varz
- 默认值: 3600
- 类型: Int
- 单位: 秒
-- 可变: 是
-- 描述: 系统清除异常复制留下的过期快照的时间间隔。
+- 是否可变: 是
+- 描述: 系统清除异常复制遗留的过期快照的时间间隔。
- 引入版本: v3.3.5

##### compact_threads
@@ -1793,8 +1793,8 @@ curl http://:/varz
- 默认值: 4
- 类型: Int
- 单位: -
-- 可变: 是
-- 描述: 用于并发 Compaction 任务的最大线程数。此配置从 v3.1.7 和 v3.2.2 版本开始变为动态。
+- 是否可变: 是
+- 描述: 用于并发 Compaction 任务的最大线程数。此配置项从 v3.1.7 和 v3.2.2 开始更改为动态。
- 引入版本: v3.0.0

##### compaction_max_memory_limit
@@ -1802,8 +1802,8 @@ curl http://:/varz
- 默认值: -1
- 类型: Long
- 单位: 字节
-- 可变: 否
-- 描述: 此 BE 上 Compaction 任务可用内存的全局上限(字节)。在 BE 初始化期间,最终的 Compaction 内存限制计算为 min(`compaction_max_memory_limit`, `process_mem_limit * compaction_max_memory_limit_percent / 100`)。如果 `compaction_max_memory_limit` 为负数(默认 `-1`),则回退到从 `mem_limit` 派生的 BE 进程内存限制。百分比值被限制在 [0,100] 之间。如果进程内存限制未设置(负数),则 Compaction 内存保持无限制 (`-1`)。此计算值用于初始化 `_compaction_mem_tracker`。另请参见 `compaction_max_memory_limit_percent` 和 `compaction_memory_limit_per_worker`。
+- 是否可变: 否
+- 描述: 此 BE 上 Compaction 任务可用内存的全局上限(字节)。在 BE 初始化期间,最终的 Compaction 内存限制计算为 min(`compaction_max_memory_limit`, process_mem_limit * `compaction_max_memory_limit_percent` / 100)。如果 `compaction_max_memory_limit` 为负数(默认 `-1`),则回退到从 `mem_limit` 派生的 BE 进程内存限制。百分比值被限制在 [0,100] 之间。如果进程内存限制未设置(负数),Compaction 内存保持无限制(`-1`)。此计算值用于初始化 `_compaction_mem_tracker`。另请参见 `compaction_max_memory_limit_percent` 和 `compaction_memory_limit_per_worker`。
- 引入版本: v3.2.0

##### compaction_max_memory_limit_percent
@@ -1811,8 +1811,8 @@ curl http://:/varz
- 默认值: 100
- 类型: Int
- 单位: 百分比
-- 可变: 否
-- 描述: Compaction 可能使用的 BE 进程内存百分比。BE 将 Compaction 内存上限计算为 `compaction_max_memory_limit` 和 (进程内存限制 × 此百分比 / 100) 的最小值。如果此值 < 0 或 > 100,则将其视为 100。如果 `compaction_max_memory_limit` < 0,则使用进程内存限制。计算还会考虑从 `mem_limit` 派生的 BE 进程内存。结合 `compaction_memory_limit_per_worker`(每个 worker 的上限),此设置控制可用的总 Compaction 内存,因此会影响 Compaction 并发性和 OOM 风险。
+- 是否可变: 否
+- 描述: 可用于 Compaction 的 BE 进程内存的百分比。BE 将 Compaction 内存上限计算为 `compaction_max_memory_limit` 和 (进程内存限制 × 此百分比 / 100) 中的最小值。如果此值 < 0 或 > 100,则将其视为 100。如果 `compaction_max_memory_limit` < 0,则改用进程内存限制。该计算还考虑了从 `mem_limit` 派生的 BE 进程内存。结合 `compaction_memory_limit_per_worker`(每个 Worker 的上限),此设置控制可用的总 Compaction 内存,从而影响 Compaction 并发性和 OOM 风险。(组合计算示例见下文。)
- 引入版本: v3.2.0
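+为便于理解上述两个配置项的组合方式,下面给出一个简化的计算示意(进程内存限制取 64 GB,仅为示例值):
+
+```Bash
+# 示意:按 min(compaction_max_memory_limit, 进程内存限制 * 百分比 / 100) 计算 Compaction 内存上限
+process_mem_limit=$((64 * 1024 * 1024 * 1024))   # 假设 BE 进程内存限制为 64 GB
+pct=100                                          # compaction_max_memory_limit_percent 默认值
+cap=$(( process_mem_limit * pct / 100 ))
+# compaction_max_memory_limit 为 -1(负值)时,生效上限即回退为 cap
+echo "$cap"
+```
+

##### compaction_memory_limit_per_worker
@@ -1820,7 +1820,7 @@ curl http://:/varz
- 默认值: 2147483648
- 类型: Int
- 单位: 字节
-- 可变: 否
+- 是否可变: 否
- 描述: 每个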
Compaction 线程允许的最大内存大小。 - 引入版本: - @@ -1829,8 +1829,8 @@ curl http://:/varz - 默认值: 60 - 类型: Int - 单位: 秒 -- 可变: 是 -- 描述: 每次 Compaction 的时间阈值。如果一次 Compaction 的时间超过此阈值,StarRocks 会打印相应的跟踪。 +- 是否可变: 是 +- 描述: 每次 Compaction 的时间阈值。如果 Compaction 耗时超过此阈值,StarRocks 将打印相应的追踪。 - 引入版本: - ##### create_tablet_worker_count @@ -1838,8 +1838,8 @@ curl http://:/varz - 默认值: 3 - 类型: Int - 单位: 线程 -- 可变: 是 -- 描述: 设置 AgentServer 线程池中处理 FE 提交的 `TTaskType::CREATE`(创建 Tablet)任务的最大工作线程数。在 BE 启动时,此值用作线程池的最大值(池以最小线程数 = 1 和最大队列大小 = 无限制创建),在运行时更改此值会触发 `ExecEnv::agent_server()->get_thread_pool(TTaskType::CREATE)->update_max_threads(...)`。增加此值可提高并发 Tablet 创建吞吐量(在批量加载或分区创建期间有用);减少此值会限制并发创建操作。提高此值会增加 CPU、内存和 I/O 并发性,并可能导致争用;线程池强制至少一个线程,因此小于 1 的值没有实际效果。 +- 是否可变: 是 +- 描述: 设置 AgentServer 线程池中处理 FE 提交的 `TTaskType::CREATE` (create-tablet) 任务的最大工作线程数。在 BE 启动时,此值用作线程池的最大值(池创建时最小线程数为 1,最大队列大小无限制),在运行时通过 `update-config` HTTP 操作更改此值会触发 `ExecEnv::agent_server()->get_thread_pool(TTaskType::CREATE)->update_max_threads(...)`。增加此值可提高并发 Tablet 创建吞吐量(在批量加载或分区创建期间很有用);减少它会限制并发创建操作。提高此值会增加 CPU、内存和 I/O 并发性,并可能导致争用;线程池强制至少一个线程,因此小于 1 的值没有实际效果。 - 引入版本: v3.2.0 ##### cumulative_compaction_check_interval_seconds @@ -1847,7 +1847,7 @@ curl http://:/varz - 默认值: 1 - 类型: Int - 单位: 秒 -- 可变: 是 +- 是否可变: 是 - 描述: 累积 Compaction 的线程轮询时间间隔。 - 引入版本: - @@ -1856,8 +1856,8 @@ curl http://:/varz - 默认值: 1 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 每个磁盘的累积 Compaction 线程数。 +- 是否可变: 否 +- 描述: 每块磁盘的累积 Compaction 线程数。 - 引入版本: - ##### data_page_size @@ -1865,8 +1865,8 @@ curl http://:/varz - 默认值: 65536 - 类型: Int - 单位: 字节 -- 可变: 否 -- 描述: 构建列数据和索引页时使用的目标未压缩页面大小(字节)。此值复制到 ColumnWriterOptions.data_page_size 和 IndexedColumnWriterOptions.index_page_size,并由页面构建器(例如 BinaryPlainPageBuilder::is_page_full 和缓冲区保留逻辑)查阅,以决定何时完成页面以及保留多少内存。值为 0 会禁用构建器中的页面大小限制。更改此值会影响页面计数、元数据开销、内存保留和 I/O/压缩权衡(较小的页面→更多页面和元数据;较大的页面→较少页面,可能更好的压缩但更大的内存峰值)。 +- 是否可变: 否 +- 描述: 构建列数据和索引页时使用的目标未压缩页大小(字节)。此值被复制到 ColumnWriterOptions.data_page_size 和 IndexedColumnWriterOptions.index_page_size,并由页构建器(例如 BinaryPlainPageBuilder::is_page_full 和缓冲区预留逻辑)查询,以决定何时完成页以及预留多少内存。值为 0 会禁用构建器中的页大小限制。更改此值会影响页计数、元数据开销、内存预留以及 I/O/压缩权衡(较小的页 → 更多页和元数据;较大的页 → 较少页,可能更好的压缩但更大的内存峰值)。 - 引入版本: v3.2.4 ##### default_num_rows_per_column_file_block @@ -1874,7 +1874,7 @@ curl http://:/varz - 默认值: 1024 - 类型: Int - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: 每个行块中可存储的最大行数。 - 引入版本: - @@ -1883,8 +1883,8 @@ curl http://:/varz - 默认值: 1 - 类型: Int - 单位: 线程 -- 可变: 否 -- 描述: DeleteTaskWorkerPool 中被分配为 HIGH 优先级删除线程的工作线程数。在启动时,AgentServer 以总线程数 = `delete_worker_count_normal_priority + delete_worker_count_high_priority` 创建删除池;前 `delete_worker_count_high_priority` 个线程被标记为专门尝试弹出 TPriority::HIGH 任务(它们轮询高优先级删除任务,如果没有可用则休眠/循环)。增加此值可增加高优先级删除请求的并发性;减少此值会降低专用容量并可能增加高优先级删除的延迟。 +- 是否可变: 否 +- 描述: DeleteTaskWorkerPool 中被分配为高优先级删除线程的工作线程数。启动时 AgentServer 创建删除池,总线程数 = delete_worker_count_normal_priority + delete_worker_count_high_priority;前 `delete_worker_count_high_priority` 个线程被标记为专门尝试弹出 TPriority::HIGH 任务(它们轮询高优先级删除任务,如果没有可用则休眠/循环)。增加此值会增加高优先级删除请求的并发性;减少此值会降低专用容量,并可能增加高优先级删除的延迟。 - 引入版本: v3.2.0 ##### dictionary_encoding_ratio @@ -1892,8 +1892,8 @@ curl http://:/varz - 默认值: 0.7 - 类型: Double - 单位: - -- 可变: 否 -- 描述: StringColumnWriter 在编码推测阶段用于决定 Chunk 的字典 (DICT_ENCODING) 编码和纯 (PLAIN_ENCODING) 编码之间的分数 (0.0–1.0)。代码计算 max_card = row_count * `dictionary_encoding_ratio` 并扫描 Chunk 的不同键计数;如果不同计数超过 max_card,写入器会选择 PLAIN_ENCODING。仅当 Chunk 大小通过 `dictionary_speculate_min_chunk_size`(且当 row_count > dictionary_min_rowcount)时才执行检查。将值设置得更高有利于字典编码(容忍更多不同键);设置得更低会导致更早回退到纯编码。值为 1.0 实际上强制字典编码(不同计数永远不会超过行数)。 +- 是否可变: 
否
+- 描述: StringColumnWriter 在编码推测阶段用于决定 Chunk 的字典(DICT_ENCODING)和普通(PLAIN_ENCODING)编码之间的分数(0.0–1.0)。代码计算 max_card = `row_count * dictionary_encoding_ratio` 并扫描 Chunk 的不同键计数;如果不同计数超过 max_card,写入器选择 PLAIN_ENCODING。仅当 Chunk 大小通过 `dictionary_speculate_min_chunk_size`(且 `row_count > dictionary_min_rowcount`)时才执行检查。设置较高的值有利于字典编码(容忍更多不同键);设置较低的值会导致更早回退到普通编码。值为 1.0 实际上强制使用字典编码(不同计数永远不会超过行数)。(阈值算例见本节下文。)
- 引入版本: v3.2.0

##### dictionary_encoding_ratio_for_non_string_column
@@ -1901,8 +1901,8 @@ curl http://:/varz
- 默认值: 0
- 类型: double
- 单位: -
-- 可变: 否
-- 描述: 用于决定是否对非字符串列(数值、日期/时间、Decimal 类型)使用字典编码的比例阈值。启用时(值 > 0.0001),写入器计算 `max_card = row_count * dictionary_encoding_ratio_for_non_string_column`,对于 `row_count > dictionary_min_rowcount` 的样本,仅当 `distinct_count <= max_card` 时才选择 DICT_ENCODING;否则回退到 BIT_SHUFFLE。值为 `0`(默认)禁用非字符串字典编码。此参数类似于 `dictionary_encoding_ratio`,但适用于非字符串列。使用 (0,1] 范围内的值 — 较小的值将字典编码限制为基数较低的列,并减少字典内存/IO 开销。
+- 是否可变: 否
+- 描述: 用于决定是否对非字符串列(数值、日期/时间、Decimal 类型)使用字典编码的比例阈值。启用时(值 > 0.0001),写入器计算 `max_card = row_count * dictionary_encoding_ratio_for_non_string_column`,对于 `row_count > dictionary_min_rowcount` 的样本,仅当 `distinct_count <= max_card` 时才选择 DICT_ENCODING;否则回退到 BIT_SHUFFLE。值为 `0`(默认)禁用非字符串字典编码。此参数类似于 `dictionary_encoding_ratio`,但适用于非字符串列。使用 (0,1] 范围内的值——较小的值将字典编码限制到基数较低的列,并减少字典内存/IO 开销。
- 引入版本: v3.3.0, v3.4.0, v3.5.0

##### dictionary_page_size
@@ -1910,8 +1910,8 @@ curl http://:/varz
- 默认值: 1048576
- 类型: Int
- 单位: 字节
-- 可变: 否
-- 描述: 构建 Rowset Segment 时使用的字典页大小(字节)。此值在 BE Rowset 代码中读入 `PageBuilderOptions::dict_page_size`,并控制单个字典页中可存储的字典条目数。增加此值可以通过允许更大的字典来提高字典编码列的压缩比,但更大的页面在写入/编码期间会消耗更多内存,并在读取或物化页面时增加 I/O 和延迟。对于大内存、写密集型工作负载保守设置,并避免过大的值以防止运行时性能下降。
+- 是否可变: 否
+- 描述: 构建 RowSet Segment 时使用的字典页大小(字节)。此值被读取到 BE RowSet 代码中的 `PageBuilderOptions::dict_page_size`,并控制单个字典页中可以存储的字典条目数。增加此值可以通过允许更大的字典来提高字典编码列的压缩比,但更大的页在写入/编码期间会消耗更多内存,并且在读取或物化页时可能会增加 I/O 和延迟。对于大内存、写入密集型工作负载保守设置,并避免过大的值以防止运行时性能下降。
- 引入版本: v3.3.0, v3.4.0, v3.5.0

##### disk_stat_monitor_interval
@@ -1919,7 +1919,7 @@ curl http://:/varz
- 默认值: 5
- 类型: Int
- 单位: 秒
-- 可变: 是
+- 是否可变: 是
- 描述: 监视磁盘健康状态的时间间隔。
- 引入版本: -

@@ -1928,8 +1928,8 @@ curl http://:/varz
- 默认值: 50
- 类型: Int
- 单位: KB/秒
-- 可变: 是
-- 描述: 每个 HTTP 请求的下载速度下限。当 HTTP 请求在配置项 `download_low_speed_time` 指定的时间跨度内持续以低于此值的速度运行时,该请求将中止。
+- 是否可变: 是
+- 描述: 每个 HTTP 请求的下载速度下限。当 HTTP 请求在 `download_low_speed_time` 配置项指定的时间跨度内持续以低于此值的速度运行时,请求将中止。
- 引入版本: -

##### download_low_speed_time
@@ -1937,8 +1937,8 @@ curl http://:/varz
- 默认值: 300
- 类型: Int
- 单位: 秒
-- 可变: 是
-- 描述: HTTP 请求以低于限制的下载速度运行的最大时间。当 HTTP 请求在此配置项指定的时间跨度内持续以低于 `download_low_speed_limit_kbps` 值时,该请求将中止。
+- 是否可变: 是
+- 描述: HTTP 请求以低于限制的下载速度运行的最大时间。当 HTTP 请求在此配置项指定的时间跨度内持续以低于 `download_low_speed_limit_kbps` 值指定的速度运行时,请求将中止。
- 引入版本: -

##### download_worker_count
@@ -1946,8 +1946,8 @@ curl http://:/varz
- 默认值: 0
- 类型: Int
- 单位: -
-- 可变: 是
-- 描述: BE 节点上 restore 作业下载任务的最大线程数。`0` 表示将该值设置为 BE 所在机器的 CPU 核心数。
+- 是否可变: 是
+- 描述: BE 节点上恢复作业下载任务的最大线程数。`0` 表示将该值设置为 BE 所在机器的 CPU 核心数。
- 引入版本: -

##### drop_tablet_worker_count
@@ -1955,7 +1955,7 @@ curl http://:/varz
- 默认值: 0
- 类型: Int
- 单位: -
-- 可变: 是
+- 是否可变: 是
- 描述: 用于删除 Tablet 的线程数。`0` 表示节点中 CPU 核心数的一半。
- 引入版本: -

@@ -1964,8 +1964,8 @@ curl http://:/varz
- 默认值: true
- 类型: Boolean
- 单位: -
-- 可变: 否
-- 描述: 是否在加载期间检查数据长度,以解决因 VARCHAR 数据超出边界而导致的 Compaction 失败。
+- 是否可变: 否
+- 描述: 加载期间是否检查数据长度,以解决因 VARCHAR 数据超出范围导致的 Compaction 失败。
- 引入版本: -
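+补充一个关于上文 `dictionary_encoding_ratio` 推测逻辑的简化算例(Chunk 行数取 4096,仅为示例值):
+
+```Bash
+# 示意:计算字典编码的基数上限 max_card = row_count * dictionary_encoding_ratio
+row_count=4096
+ratio="0.7"        # dictionary_encoding_ratio 默认值
+awk -v n="$row_count" -v r="$ratio" 'BEGIN { printf "max_card=%d\n", n * r }'
+# 若该 Chunk 的不同键数超过 max_card(此处为 2867),写入器将回退到 PLAIN_ENCODING
+```
+

##### enable_event_based_compaction_framework
@@ -1973,8 +1973,8 @@ curl http://:/varz
- 默认值: true
- 类型: Boolean
- 单位: -
-- 可变: 否
-- 描述: 是否启用事件驱动的 Compaction 框架。`true` 表示启用事件驱动的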
Compaction 框架,`false` 表示禁用。在 Tablet 数量众多或单个 Tablet 数据量大的场景中,启用事件驱动的 Compaction 框架可以大大减少 Compaction 的开销。 +- 是否可变: 否 +- 描述: 是否启用基于事件的 Compaction 框架。`true` 表示启用基于事件的 Compaction 框架,`false` 表示禁用。在 Tablet 数量多或单个 Tablet 数据量大的场景中,启用基于事件的 Compaction 框架可以大大降低 Compaction 开销。 - 引入版本: - ##### enable_lazy_delta_column_compaction @@ -1982,8 +1982,8 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 启用时,Compaction 将优先对部分列更新产生的 Delta 列采用“惰性”策略:StarRocks 将避免急于将 Delta 列文件合并回其主 Segment 文件,以节省 Compaction I/O。实际上,Compaction 选择代码会检查部分列更新 Rowset 和多个候选项;如果找到且此标志为 true,引擎将停止向 Compaction 添加更多输入或仅合并空 Rowset(级别 -1),使 Delta 列保持独立。这会减少 Compaction 期间的即时 I/O 和 CPU,但会以延迟合并为代价(可能导致更多 Segment 和临时存储开销)。正确性和查询语义保持不变。 +- 是否可变: 是 +- 描述: 启用时,Compaction 将优先对部分列更新产生的增量列采用“懒惰”策略:StarRocks 将避免急于将增量列文件合并回其主 Segment 文件以节省 Compaction I/O。实际上,Compaction 选择代码会检查部分列更新 RowSet 和多个候选;如果找到且此标志为 true,引擎将停止向 Compaction 添加更多输入,或者仅合并空 RowSet (level -1),将增量列分开。这减少了 Compaction 期间的即时 I/O 和 CPU,但代价是延迟合并(可能更多 Segment 和临时存储开销)。正确性和查询语义不变。 - 引入版本: v3.2.3 ##### enable_new_load_on_memory_limit_exceeded @@ -1991,8 +1991,8 @@ curl http://:/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 当达到硬内存资源限制时,是否允许新的加载进程。`true` 表示允许新的加载进程,`false` 表示拒绝。 +- 是否可变: 是 +- 描述: 当达到内存资源硬限制时,是否允许新的加载进程。`true` 表示允许新的加载进程,`false` 表示拒绝。 - 引入版本: v3.3.2 ##### enable_pk_index_parallel_compaction @@ -2000,8 +2000,8 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 共享数据集群中是否启用主键索引的并行 Compaction。 +- 是否可变: 是 +- 描述: 是否在共享数据集群中为主键索引启用并行 Compaction。 - 引入版本: - ##### enable_pk_index_parallel_execution @@ -2009,8 +2009,8 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 共享数据集群中是否为主键索引操作启用并行执行。启用后,系统会在发布操作期间使用线程池并发处理 Segment,显著提高大型 Tablet 的性能。 +- 是否可变: 是 +- 描述: 是否在共享数据集群中为主键索引操作启用并行执行。启用后,系统会使用线程池在发布操作期间并发处理 Segment,显著提高大型 Tablet 的性能。 - 引入版本: - ##### enable_pk_index_eager_build @@ -2018,8 +2018,8 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 是否在数据导入和 Compaction 阶段急切构建主键索引文件。启用后,系统会在数据写入期间立即生成持久化 PK 索引文件,从而提高后续查询性能。 +- 是否可变: 是 +- 描述: 在数据导入和 Compaction 阶段是否急切构建主键索引文件。启用后,系统会在数据写入期间立即生成持久化 PK 索引文件,从而提高后续查询性能。 - 引入版本: - ##### enable_pk_size_tiered_compaction_strategy @@ -2027,7 +2027,7 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: 是否为主键表启用 Size-tiered Compaction 策略。`true` 表示启用 Size-tiered Compaction 策略,`false` 表示禁用。 - 引入版本: 此项从 v3.2.4 和 v3.1.10 开始对共享数据集群生效,从 v3.2.5 和 v3.1.10 开始对共享无数据集群生效。 @@ -2036,8 +2036,8 @@ curl http://:/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 是否验证生成的 Rowset 的正确性。启用后,将在 Compaction 和 Schema Change 后检查生成的 Rowset 的正确性。 +- 是否可变: 是 +- 描述: 是否验证生成的 RowSet 的正确性。启用后,将在 Compaction 和 Schema Change 后检查生成的 RowSet 的正确性。 - 引入版本: - ##### enable_size_tiered_compaction_strategy @@ -2045,7 +2045,7 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: 是否启用 Size-tiered Compaction 策略(不包括主键表)。`true` 表示启用 Size-tiered Compaction 策略,`false` 表示禁用。 - 引入版本: - @@ -2054,8 +2054,8 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 当 `enable_strict_delvec_crc_check` 设置为 true 时,我们将对 Delete Vector 执行严格的 CRC32 检查,如果检测到不匹配,将返回失败。 +- 是否可变: 是 +- 描述: 当 `enable_strict_delvec_crc_check` 设置为 true 时,我们将对 delete vector 执行严格的 CRC32 检查,如果检测到不匹配,将返回失败。 - 引入版本: - ##### enable_transparent_data_encryption @@ -2063,8 +2063,8 @@ curl http://:/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 否 -- 描述: 启用时,StarRocks 将为新写入的存储对象(Segment 文件、删除/更新文件、Rowset Segment、Lake SST、持久索引文件等)创建加密的磁盘工件。写入器 (RowsetWriter/SegmentWriter、Lake 
UpdateManager/LakePersistentIndex 和相关代码路径) 将从 KeyCache 请求加密信息,将 `encryption_info` 附加到可写文件,并将 `encryption_meta` 持久化到 Rowset/Segment/SSTable 元数据中 (segment_encryption_metas、delete/update encryption metadata)。Frontend 和 Backend/CN 加密标志必须匹配 — 不匹配会导致 BE 在心跳时中止 (LOG(FATAL))。此标志在运行时不可变;在部署前启用它,并确保密钥管理 (KEK) 和 KeyCache 在整个集群中正确配置和同步。 +- 是否可变: 否 +- 描述: 启用时,StarRocks 将为新写入的存储对象(Segment 文件、删除/更新文件、RowSet Segment、Lake SST、持久索引文件等)创建加密的磁盘制品。写入器(RowsetWriter/SegmentWriter、Lake UpdateManager/LakePersistentIndex 和相关代码路径)将从 KeyCache 请求加密信息,将 encryption_info 附加到可写入文件,并将 encryption_meta 持久化到 RowSet / Segment / SSTable 元数据(segment_encryption_metas、delete/update encryption metadata)。Frontend 和 Backend/CN 的加密标志必须匹配——不匹配会导致 BE 在心跳时中止 (LOG(FATAL))。此标志在运行时不可变;在部署之前启用它,并确保密钥管理 (KEK) 和 KeyCache 在整个集群中正确配置和同步。 - 引入版本: v3.3.1, 3.4.0, 3.5.0, 4.0.0 ##### enable_zero_copy_from_page_cache @@ -2072,8 +2072,8 @@ curl http://:/varz - 默认值: true - 类型: boolean - 单位: - -- 可变: 是 -- 描述: 启用时,FixedLengthColumnBase 在追加来自 Page Cache 支持的缓冲区的数据时,可能会避免复制字节。在 `append_numbers` 中,如果满足所有条件,代码将获取传入的 ContainerResource 并设置列的内部资源指针(零拷贝):配置为 true,传入资源被拥有,资源内存与列元素类型对齐,列为空,并且资源长度是元素大小的倍数。启用此功能可减少 CPU 和内存拷贝开销,并可提高摄取/扫描吞吐量。缺点:它将列的生命周期与获取的缓冲区耦合,并依赖于正确的拥有权/对齐;禁用以强制安全复制。 +- 是否可变: 是 +- 描述: 启用时,FixedLengthColumnBase 在追加来自 PageCache 支持的缓冲区的数据时,可能会避免复制字节。在 `append_numbers` 中,如果所有条件都满足,代码将获取传入的 ContainerResource 并设置列的内部资源指针(零拷贝):配置为 true,传入资源已拥有,资源内存与列元素类型对齐,列为空,且资源长度是元素大小的倍数。启用此功能可减少 CPU 和内存复制开销,并可提高摄取/扫描吞吐量。缺点:它将列的生命周期与获取的缓冲区耦合,并依赖于正确的拥有权/对齐;禁用以强制安全复制。 - 引入版本: - ##### file_descriptor_cache_clean_interval @@ -2081,7 +2081,7 @@ curl http://:/varz - 默认值: 3600 - 类型: Int - 单位: 秒 -- 可变: 是 +- 是否可变: 是 - 描述: 清理一定时间内未使用的文件描述符的时间间隔。 - 引入版本: - @@ -2090,8 +2090,8 @@ curl http://:/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 否 -- 描述: 控制当配置的存储路径读/写检查失败或解析失败时的启动行为。当 `false`(默认)时,BE 将 `storage_root_path` 或 `spill_local_storage_dir` 中的任何损坏条目视为致命错误,并将中止启动。当 `true` 时,StarRocks 将跳过(记录警告并移除)任何 `check_datapath_rw` 失败或解析失败的存储路径,以便 BE 可以继续使用剩余的健康路径启动。注意:如果所有配置的路径都被移除,BE 仍将退出。启用此功能可能会掩盖配置错误或失败的磁盘,并导致被忽略路径上的数据不可用;相应地监控日志和磁盘健康状况。 +- 是否可变: 否 +- 描述: 控制配置的存储路径在读写检查失败或解析失败时的启动行为。当 `false`(默认)时,BE 将 `storage_root_path` 或 `spill_local_storage_dir` 中的任何损坏条目视为致命错误,并将中止启动。当 `true` 时,StarRocks 将跳过(记录警告并移除)任何 `check_datapath_rw` 失败或解析失败的存储路径,以便 BE 可以继续使用剩余的健康路径启动。注意:如果所有配置的路径都被移除,BE 仍然会退出。启用此功能可能会掩盖配置错误或故障的磁盘,并导致被忽略路径上的数据不可用;相应地监控日志和磁盘健康状况。 - 引入版本: v3.2.0 ##### inc_rowset_expired_sec @@ -2099,7 +2099,7 @@ curl http://:/varz - 默认值: 1800 - 类型: Int - 单位: 秒 -- 可变: 是 +- 是否可变: 是 - 描述: 传入数据的过期时间。此配置项用于增量克隆。 - 引入版本: - @@ -2108,7 +2108,7 @@ curl http://:/varz - 默认值: 2 - 类型: Int - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: BE 节点上所有加载进程可占用的内存资源的硬限制(比例)。当 `enable_new_load_on_memory_limit_exceeded` 设置为 `false`,且所有加载进程的内存消耗超过 `load_process_max_memory_limit_percent * load_process_max_memory_hard_limit_ratio` 时,将拒绝新的加载进程。 - 引入版本: v3.3.2 @@ -2117,7 +2117,7 @@ curl http://:/varz - 默认值: 30 - 类型: Int - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: BE 节点上所有加载进程可占用的内存资源的软限制(百分比)。 - 引入版本: - @@ -2126,8 +2126,8 @@ curl http://:/varz - 默认值: 1 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 控制内置 LZ4 压缩器使用的 LZ4“加速”参数(传递给 LZ4_compress_fast_continue)。较高的值优先考虑压缩速度,但会牺牲压缩比;较低的值 (1) 会产生更好的压缩,但速度较慢。有效范围:MIN=1,MAX=65537。此设置会影响 BlockCompression 中所有基于 LZ4 的编解码器(例如 LZ4 和 Hadoop-LZ4),并且只改变压缩的执行方式 — 它不改变 LZ4 格式或解压缩兼容性。对于 CPU 密集型或低延迟工作负载,可接受较大的输出时,向上调整(例如 4、8、...);对于存储或 IO 敏感型工作负载,保持为 1。在更改前使用代表性数据进行测试,因为吞吐量与大小的权衡高度依赖于数据。 +- 是否可变: 是 +- 描述: 控制内置 LZ4 压缩器使用的 LZ4 “加速”参数(传递给 LZ4_compress_fast_continue)。较高的值优先考虑压缩速度,但会牺牲压缩比;较低的值(1)会产生更好的压缩,但速度较慢。有效范围:MIN=1,MAX=65537。此设置会影响 
BlockCompression 中所有基于 LZ4 的编解码器(例如,LZ4 和 Hadoop-LZ4),并且只改变压缩的执行方式——它不改变 LZ4 格式或解压缩兼容性。向上调整(例如,4、8 等)适用于 CPU 密集型或低延迟工作负载,其中较大的输出是可以接受的;对于存储或 IO 敏感型工作负载,保持为 1。更改前请使用代表性数据进行测试,因为吞吐量与大小的权衡高度依赖于数据。 - 引入版本: v3.4.1, 3.5.0, 4.0.0 ##### lz4_expected_compression_ratio @@ -2135,17 +2135,17 @@ curl http://:/varz - 默认值: 2.1 - 类型: double - 单位: 无量纲 (压缩比) -- 可变: 是 -- 描述: 序列化压缩策略用于判断观察到的 LZ4 压缩是否“良好”的阈值 (uncompressed_size / compressed_size)。在 compress_strategy.cpp 中,此值将观察到的 compress_ratio 与 `lz4_expected_compression_speed_mbps` 一起计算奖励指标;如果组合奖励 `>` 1.0,策略会记录正反馈。增加此值会提高预期的压缩比(使条件更难满足),而降低此值会使观察到的压缩更容易被认为是令人满意的。调整以匹配典型数据可压缩性。有效范围:MIN=1,MAX=65537。 +- 是否可变: 是 +- 描述: 序列化压缩策略用于判断观察到的 LZ4 压缩是否“良好”的阈值(uncompressed_size / compressed_size)。在 compress_strategy.cpp 中,此值与 `lz4_expected_compression_speed_mbps` 一起用于计算奖励指标;如果组合奖励 > 1.0,策略会记录正反馈。增加此值会提高预期的压缩比(使条件更难满足),而降低此值会使观察到的压缩更容易被认为是令人满意的。调整以匹配典型数据可压缩性。有效范围:MIN=1,MAX=65537。 - 引入版本: v3.4.1, 3.5.0, 4.0.0 ##### lz4_expected_compression_speed_mbps - 默认值: 600 - 类型: double -- 单位: MB/秒 -- 可变: 是 -- 描述: 自适应压缩策略 (CompressStrategy) 使用的预期 LZ4 压缩吞吐量(兆字节/秒)。反馈例程计算 `reward_ratio = (observed_compression_ratio / lz4_expected_compression_ratio) * (observed_speed / lz4_expected_compression_speed_mbps)`。如果 `reward_ratio > 1.0`,则增加正计数器 (alpha),否则增加负计数器 (beta);这会影响未来数据是否会被压缩。调整此值以反映硬件上典型的 LZ4 吞吐量——提高它会使策略更难将运行分类为“良好”(需要更高的观察速度),降低它会使分类更容易。必须是正有限数。 +- 单位: MB/s +- 是否可变: 是 +- 描述: 自适应压缩策略 (CompressStrategy) 使用的预期 LZ4 压缩吞吐量(兆字节/秒)。反馈例程计算 reward_ratio = (observed_compression_ratio / lz4_expected_compression_ratio) * (observed_speed / lz4_expected_compression_speed_mbps)。如果 reward_ratio > 1.0,则正计数器 (alpha) 增加,否则负计数器 (beta) 增加;这会影响未来数据是否会被压缩。调整此值以反映您硬件上典型的 LZ4 吞吐量——提高它会使策略更难将运行分类为“良好”(需要更高的观察速度),降低它会使分类更容易。必须是正有限数。 - 引入版本: v3.4.1, 3.5.0, 4.0.0 ##### make_snapshot_worker_count @@ -2153,8 +2153,8 @@ curl http://:/varz - 默认值: 5 - 类型: Int - 单位: - -- 可变: 是 -- 描述: BE 节点上 make snapshot 任务的最大线程数。 +- 是否可变: 是 +- 描述: BE 节点上创建快照任务的最大线程数。 - 引入版本: - ##### manual_compaction_threads @@ -2162,7 +2162,7 @@ curl http://:/varz - 默认值: 4 - 类型: Int - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: 手动 Compaction 的线程数。 - 引入版本: - @@ -2171,7 +2171,7 @@ curl http://:/varz - 默认值: 100 - 类型: Int - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: 每次 Base Compaction 中可压缩的最大 Segment 数量。 - 引入版本: - @@ -2180,8 +2180,8 @@ curl http://:/varz - 默认值: 40960 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 候选 Compaction Tablet 的最大数量。如果值过大,会导致高内存使用和高 CPU 负载。 +- 是否可变: 是 +- 描述: 候选 Compaction Tablet 的最大数量。如果值太大,会导致高内存使用和高 CPU 负载。 - 引入版本: - ##### max_compaction_concurrency @@ -2189,8 +2189,8 @@ curl http://:/varz - 默认值: -1 - 类型: Int - 单位: - -- 可变: 是 -- 描述: Compaction(包括 Base Compaction 和 Cumulative Compaction)的最大并发数。值 `-1` 表示不对并发数施加限制。`0` 表示禁用 Compaction。当事件驱动的 Compaction 框架启用时,此参数是可变的。 +- 是否可变: 是 +- 描述: Compaction(包括 Base Compaction 和 Cumulative Compaction)的最大并发数。值 `-1` 表示不对并发施加限制。`0` 表示禁用 Compaction。当启用基于事件的 Compaction 框架时,此参数是可变的。 - 引入版本: - ##### max_cumulative_compaction_num_singleton_deltas @@ -2198,8 +2198,8 @@ curl http://:/varz - 默认值: 1000 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 单次 Cumulative Compaction 中可合并的最大 Segment 数量。如果 Compaction 期间发生 OOM,可以减小此值。 +- 是否可变: 是 +- 描述: 单次累积 Compaction 中可合并的最大 Segment 数量。如果在 Compaction 期间发生 OOM,您可以减小此值。 - 引入版本: - ##### max_download_speed_kbps @@ -2207,8 +2207,8 @@ curl http://:/varz - 默认值: 50000 - 类型: Int - 单位: KB/秒 -- 可变: 是 -- 描述: 每个 HTTP 请求的最大下载速度。此值影响 BE 节点间数据副本同步的性能。 +- 是否可变: 是 +- 描述: 每个 HTTP 请求的最大下载速度。此值会影响 BE 节点之间数据副本同步的性能。 - 引入版本: - ##### max_garbage_sweep_interval @@ -2216,8 +2216,8 @@ curl http://:/varz - 默认值: 3600 - 类型: Int - 单位: 秒 -- 可变: 是 -- 
描述: 存储卷垃圾回收的最大时间间隔。此配置从 v3.0 开始变为动态。
+- 是否可变: 是
+- 描述: 存储卷垃圾回收的最大时间间隔。此配置项从 v3.0 开始更改为动态。
- 引入版本: -

##### max_percentage_of_error_disk
@@ -2225,7 +2225,7 @@ curl http://:/varz
- 默认值: 0
- 类型: Int
- 单位: -
-- 可变: 是
+- 是否可变: 是
- 描述: 在相应的 BE 节点退出之前,存储卷中可容忍的最大错误百分比。
- 引入版本: -

@@ -2233,9 +2233,9 @@ curl http://:/varz
- 默认值: 2
- 类型: Long
-- 单位: 数量
-- 可变: 是
-- 描述: 控制写入路径的每个 Tablet 反压:当 Tablet 的排队(尚未刷新)MemTable 数量达到或超过 `max_queueing_memtable_per_tablet` 时,LocalTabletsChannel 和 LakeTabletsChannel 中的写入器将阻塞(休眠/重试),然后提交更多写入工作。这以增加高负载下的延迟或 RPC 超时为代价,减少了同时的 MemTable 刷新并发性和峰值内存使用。设置更高以允许更多并发 MemTable(更多内存和 I/O 突发);设置更低以限制内存压力并增加写入限流。
+- 单位: 计数
+- 是否可变: 是
+- 描述: 控制写入路径的每个 Tablet 反压:当 Tablet 的排队(尚未刷新)MemTable 数量达到或超过 `max_queueing_memtable_per_tablet` 时,LocalTabletsChannel 和 LakeTabletsChannel 中的写入器将先阻塞(休眠/重试),再提交更多写入工作。其代价是在高负载下增加延迟或引发 RPC 超时,但可以降低同时进行的 MemTable 刷新并发和峰值内存使用。设置较高的值以允许更多并发 MemTable(更多内存和 I/O 突发);设置较低的值以限制内存压力并增加写入节流。
- 引入版本: v3.2.0

##### max_row_source_mask_memory_bytes
@@ -2243,8 +2243,8 @@ curl http://:/varz
- 默认值: 209715200
- 类型: Int
- 单位: 字节
-- 可变: 否
-- 描述: 行源掩码缓冲区的最大内存大小。当缓冲区大于此值时,数据将持久化到磁盘上的临时文件。此值应设置为小于 `compaction_memory_limit_per_worker` 的值。
+- 是否可变: 否
+- 描述: 行源掩码缓冲区的最大内存大小。当缓冲区大于此值时,数据将持久化到磁盘上的临时文件。此值应设置为低于 `compaction_memory_limit_per_worker` 的值。
- 引入版本: -

##### max_tablet_write_chunk_bytes
@@ -2252,8 +2252,8 @@ curl http://:/varz
- 默认值: 536870912
- 类型: Int
- 单位: 字节
-- 可变: 是
-- 描述: 当前内存中 Tablet 写入 Chunk 的最大允许内存(字节),在此之前它被视为已满并排队等待发送。增加此值可减少加载宽表(多列)时的 RPC 频率,这可以提高吞吐量,但会增加内存使用和更大的 RPC 有效载荷。调整以平衡更少的 RPC 与内存和序列化/BRPC 限制。
+- 是否可变: 是
+- 描述: 当前内存中 Tablet 写入 Chunk 的最大允许内存(字节),超过该值后 Chunk 被视为已满并排队发送。增加此值可减少加载宽表(多列)时的 RPC 频率,这可以提高吞吐量,但会增加内存使用并产生更大的 RPC 有效负载。调整以在更少的 RPC 与内存和序列化/BRPC 限制之间取得平衡。
- 引入版本: v3.2.12

##### max_update_compaction_num_singleton_deltas
@@ -2261,8 +2261,8 @@ curl http://:/varz
- 默认值: 1000
- 类型: Int
- 单位: -
-- 可变: 是
-- 描述: 主键表单次 Compaction 中可合并的最大 Rowset 数量。
+- 是否可变: 是
+- 描述: 主键表单次 Compaction 中可合并的最大 RowSet 数量。
- 引入版本: -

##### memory_limitation_per_thread_for_schema_change
@@ -2270,17 +2270,17 @@ curl http://:/varz
- 默认值: 2
- 类型: Int
- 单位: GB
-- 可变: 是
+- 是否可变: 是
- 描述: 每个 Schema Change 任务允许的最大内存大小。
- 引入版本: -

##### memory_ratio_for_sorting_schema_change

- 默认值: 0.8
- 类型: Double
-- 单位: - (无单位比率)
-- 可变: 是
-- 描述: 每个线程 Schema Change 内存限制的比例,用作排序 Schema Change 操作期间 MemTable 最大缓冲区大小。该比率乘以 `memory_limitation_per_thread_for_schema_change`(以 GB 配置并转换为字节)以计算 `max_buffer_size`,结果上限为 4GB。由 SchemaChangeWithSorting 和 SortedSchemaChange 在创建 MemTable/DeltaWriter 时使用。增加此比率允许更大的内存缓冲区(更少的刷新/合并),但会增加内存压力的风险;减少此比率会导致更频繁的刷新和更高的 I/O/合并开销。
+- 单位: - (无单位比例)
+- 是否可变: 是
+- 描述: 每个线程的 Schema Change 内存限制的比例,用作排序 Schema Change 操作期间 MemTable 的最大缓冲区大小。该比例乘以 `memory_limitation_per_thread_for_schema_change`(以 GB 配置并转换为字节)以计算 `max_buffer_size`,结果上限为 4GB。由 SchemaChangeWithSorting 和 SortedSchemaChange 在创建 MemTable/DeltaWriter 时使用。增加此比例允许更大的内存缓冲区(更少的刷新/合并),但增加了内存压力风险;减少它会导致更频繁的刷新和更高的 I/O/合并开销。(缓冲区计算示意见本节下文。)
- 引入版本: v3.2.0

##### min_base_compaction_num_singleton_deltas
@@ -2288,7 +2288,7 @@ curl http://:/varz
- 默认值: 5
- 类型: Int
- 单位: -
-- 可变: 是
+- 是否可变: 是
- 描述: 触发 Base Compaction 的最小 Segment 数量。
- 引入版本: -

@@ -2297,8 +2297,8 @@ curl http://:/varz
- 默认值: 120
- 类型: Int
- 单位: 秒
-- 可变: 是
-- 描述: 自上次 Compaction 失败以来,可调度 Tablet Compaction 的最短时间间隔。
+- 是否可变: 是
+- 描述: 自上次 Compaction 失败以来,Tablet Compaction 可再次被调度的最小时间间隔。
- 引入版本: -
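+针对上文 `memory_ratio_for_sorting_schema_change` 的计算规则,下面给出一个简化算例(均取默认值,仅为示意):
+
+```Bash
+# 示意:max_buffer_size = min(memory_limitation_per_thread_for_schema_change(GB) * 比例, 4GB)
+limit_gb=2         # memory_limitation_per_thread_for_schema_change 默认值
+ratio="0.8"        # memory_ratio_for_sorting_schema_change 默认值
+awk -v g="$limit_gb" -v r="$ratio" 'BEGIN {
+  b = g * 1024 * 1024 * 1024 * r
+  cap = 4 * 1024 * 1024 * 1024
+  printf "max_buffer_size=%d\n", (b < cap ? b : cap)
+}'
+```
+

##### min_cumulative_compaction_failure_interval_sec
@@ -2306,8 +2306,8 @@ curl http://:/varz
- 默认值: 30
- 类型: Int
- 单位: 秒
-- 可变: 是
-- 描述: Cumulative Compaction 失败后重试的最短时间间隔。
+- 是否可变: 是
+- 描述: 累积 Compaction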
在失败后重试的最小时间间隔。 - 引入版本: - ##### min_cumulative_compaction_num_singleton_deltas @@ -2315,8 +2315,8 @@ curl http://:/varz - 默认值: 5 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 触发 Cumulative Compaction 的最小 Segment 数量。 +- 是否可变: 是 +- 描述: 触发累积 Compaction 的最小 Segment 数量。 - 引入版本: - ##### min_garbage_sweep_interval @@ -2324,8 +2324,8 @@ curl http://:/varz - 默认值: 180 - 类型: Int - 单位: 秒 -- 可变: 是 -- 描述: 存储卷垃圾回收的最小时间间隔。此配置从 v3.0 开始变为动态。 +- 是否可变: 是 +- 描述: 存储卷垃圾回收的最小时间间隔。此配置项从 v3.0 开始更改为动态。 - 引入版本: - ##### parallel_clone_task_per_path @@ -2333,8 +2333,8 @@ curl http://:/varz - 默认值: 8 - 类型: Int - 单位: 线程 -- 可变: 是 -- 描述: BE 上每个存储路径分配的并行克隆工作线程数。在 BE 启动时,克隆线程池的最大线程数计算为 max(number_of_store_paths * parallel_clone_task_per_path, MIN_CLONE_TASK_THREADS_IN_POOL)。例如,对于 4 个存储路径和默认值 8,克隆池最大线程数 = 32。此设置直接控制 BE 处理 CLONE 任务(Tablet 副本拷贝)的并发性:增加它会提高并行克隆吞吐量,但也会增加 CPU、磁盘和网络争用;减少它会限制同时克隆任务并可能限制 FE 调度的克隆操作。该值应用于动态克隆线程池,可以通过 update-config HTTP 操作在运行时更改(导致 agent_server 更新克隆池的最大线程数)。 +- 是否可变: 是 +- 描述: BE 上每个存储路径分配的并行克隆工作线程数。在 BE 启动时,克隆线程池的最大线程数计算为 max(number_of_store_paths * parallel_clone_task_per_path, MIN_CLONE_TASK_THREADS_IN_POOL)。例如,对于 4 个存储路径和默认值 8,克隆池最大线程数为 32。此设置直接控制 BE 处理 CLONE 任务(Tablet 副本复制)的并发性:增加它会提高并行克隆吞吐量,但也会增加 CPU、磁盘和网络争用;减少它会限制同时克隆任务并可能限制 FE 调度的克隆操作。该值应用于动态克隆线程池,可以在运行时通过 update-config 路径更改(导致 `agent_server` 更新克隆池的最大线程数)。 - 引入版本: v3.2.0 ##### partial_update_memory_limit_per_worker @@ -2342,8 +2342,8 @@ curl http://:/varz - 默认值: 2147483648 - 类型: long - 单位: 字节 -- 可变: 是 -- 描述: 单个 Worker 在执行部分列更新时(用于 Compaction / Rowset 更新处理)用于组装源 Chunk 的最大内存(字节)。读取器估计每行更新内存 (`total_update_row_size / num_rows_upt`) 并将其乘以读取的行数;当该乘积超过此限制时,当前 Chunk 将被刷新和处理,以避免额外的内存增长。设置此值以匹配每个更新 Worker 的可用内存——过低会增加 I/O/处理开销(许多小 Chunk);过高会增加内存压力或 OOM 风险。如果每行估计值为零(旧版 Rowset),则此配置不施加基于字节的限制(仅适用 INT32_MAX 行数限制)。 +- 是否可变: 是 +- 描述: 在执行部分列更新(用于 Compaction / RowSet 更新处理)时,单个 Worker 用于组装源 Chunk 的最大内存(字节)。读取器估计每行更新内存 (`total_update_row_size / num_rows_upt`) 并乘以读取的行数;当该乘积超过此限制时,当前 Chunk 将被刷新和处理以避免额外的内存增长。将其设置为匹配每个更新 Worker 的可用内存——过低会增加 I/O/处理开销(许多小 Chunk);过高会增加内存压力或 OOM 风险。如果每行估算值为零(旧版 RowSet),此配置不会施加基于字节的限制(仅适用 INT32_MAX 行数限制)。 - 引入版本: v3.2.10 ##### path_gc_check @@ -2351,8 +2351,8 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 否 -- 描述: 启用时,StorageEngine 会启动每个数据目录的后台线程,执行定期路径扫描和垃圾回收。在启动时,`start_bg_threads()` 会生成 `_path_scan_thread_callback`(调用 `DataDir::perform_path_scan` 和 `perform_tmp_path_scan`)以及 `_path_gc_thread_callback`(调用 `DataDir::perform_path_gc_by_tablet`、`DataDir::perform_path_gc_by_rowsetid`、`DataDir::perform_delta_column_files_gc` 和 `DataDir::perform_crm_gc`)。扫描和 GC 间隔由 `path_scan_interval_second` 和 `path_gc_check_interval_second` 控制;CRM 文件清理使用 `unused_crm_file_threshold_second`。禁用此功能可防止自动路径级清理(您必须手动管理孤立/临时文件)。更改此标志需要重新启动进程。 +- 是否可变: 否 +- 描述: 启用时,StorageEngine 会启动每个数据目录的后台线程,执行周期性路径扫描和垃圾回收。在启动时,`start_bg_threads()` 会生成 `_path_scan_thread_callback`(调用 `DataDir::perform_path_scan` 和 `perform_tmp_path_scan`)和 `_path_gc_thread_callback`(调用 `DataDir::perform_path_gc_by_tablet`、`DataDir::perform_path_gc_by_rowsetid`、`DataDir::perform_delta_column_files_gc` 和 `DataDir::perform_crm_gc`)。扫描和 GC 间隔由 `path_scan_interval_second` 和 `path_gc_check_interval_second` 控制;CRM 文件清理使用 `unused_crm_file_threshold_second`。禁用此功能可防止自动路径级清理(您必须手动管理孤立/临时文件)。更改此标志需要重新启动进程。 - 引入版本: v3.2.0 ##### path_gc_check_interval_second @@ -2360,8 +2360,8 @@ curl http://:/varz - 默认值: 86400 - 类型: Int - 单位: 秒 -- 可变: 否 -- 描述: 存储引擎路径垃圾回收后台线程运行之间的时间间隔(秒)。每次唤醒都会触发 DataDir 按 Tablet、按 Rowset ID、Delta Column 文件 GC 和 CRM GC 执行路径 GC(CRM GC 调用使用 
`unused_crm_file_threshold_second`)。如果设置为非正值,代码会强制将间隔设置为 1800 秒(半小时)并发出警告。调整此值以控制扫描和删除磁盘上的临时或下载文件的频率。 +- 是否可变: 否 +- 描述: 存储引擎的路径垃圾回收后台线程运行间隔(秒)。每次唤醒都会触发 DataDir 按 Tablet、按 RowSet ID、增量列文件 GC 和 CRM GC 执行路径 GC(CRM GC 调用使用 `unused_crm_file_threshold_second`)。如果设置为非正值,代码会强制将间隔设置为 1800 秒(半小时)并发出警告。调整此值以控制扫描和移除磁盘上的临时或下载文件的频率。 - 引入版本: v3.2.0 ##### pending_data_expire_time_sec @@ -2369,8 +2369,8 @@ curl http://:/varz - 默认值: 1800 - 类型: Int - 单位: 秒 -- 可变: 是 -- 描述: 存储引擎中待处理数据的过期时间。 +- 是否可变: 是 +- 描述: 存储引擎中挂起数据的过期时间。 - 引入版本: - ##### pindex_major_compaction_limit_per_disk @@ -2378,8 +2378,8 @@ curl http://:/varz - 默认值: 1 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 每个磁盘上 Compaction 的最大并发数。这解决了因 Compaction 导致的磁盘 I/O 不均衡问题。该问题可能导致某些磁盘的 I/O 过高。 +- 是否可变: 是 +- 描述: 每块磁盘的 Compaction 最大并发数。这解决了由于 Compaction 导致的磁盘 I/O 不均衡问题。此问题可能导致某些磁盘的 I/O 过高。 - 引入版本: v3.0.9 ##### pk_index_compaction_score_ratio @@ -2387,8 +2387,8 @@ curl http://:/varz - 默认值: 1.5 - 类型: Double - 单位: - -- 可变: 是 -- 描述: 共享数据集群中主键索引的 Compaction 得分比。例如,如果有 N 个文件集,Compaction 得分将为 `N * pk_index_compaction_score_ratio`。 +- 是否可变: 是 +- 描述: 共享数据集群中主键索引的 Compaction 分数比例。例如,如果有 N 个文件集,则 Compaction 分数将是 `N * pk_index_compaction_score_ratio`。 - 引入版本: - ##### pk_index_early_sst_compaction_threshold @@ -2396,7 +2396,7 @@ curl http://:/varz - 默认值: 5 - 类型: Int - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: 共享数据集群中主键索引的早期 SST Compaction 阈值。 - 引入版本: - @@ -2405,8 +2405,8 @@ curl http://:/varz - 默认值: 4096 - 类型: Int - 单位: - -- 可变: 否 -- 描述: Lake UpdateManager 中主键索引分片映射使用的分片数量。UpdateManager 分配一个此大小的 `PkIndexShard` 向量,并通过位掩码将 Tablet ID 映射到分片。增加此值可减少否则会共享同一分片的 Tablet 之间的锁争用,但代价是更多的互斥对象和稍高的内存使用。该值必须是 2 的幂,因为代码依赖于位掩码索引。有关大小调整指导,请参见 `tablet_map_shard_size` 启发式方法:`total_num_of_tablets_in_BE / 512`。 +- 是否可变: 否 +- 描述: Lake UpdateManager 中主键索引分片映射使用的分片数。UpdateManager 分配一个此大小的 `PkIndexShard` 向量,并通过位掩码将 Tablet ID 映射到分片。增加此值可减少 Tablet 之间因共享同一分片而导致的锁争用,但代价是增加互斥对象和略微增加内存使用。该值必须是 2 的幂,因为代码依赖于位掩码索引。有关大小调整指南,请参阅 `tablet_map_shard_size` 启发式:`total_num_of_tablets_in_BE / 512`。 - 引入版本: v3.2.0 ##### pk_index_memtable_flush_threadpool_max_threads @@ -2414,8 +2414,8 @@ curl http://:/varz - 默认值: 0 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 共享数据集群中主键索引 MemTable 刷新的线程池的最大线程数。`0` 表示自动设置为 CPU 核心数的一半。 +- 是否可变: 是 +- 描述: 共享数据集群中主键索引 MemTable 刷新线程池的最大线程数。`0` 表示自动设置为 CPU 核心数的一半。 - 引入版本: - ##### pk_index_memtable_flush_threadpool_size @@ -2423,8 +2423,8 @@ curl http://:/varz - 默认值: 1048576 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 控制共享数据(云原生/Lake)模式下主键索引 MemTable 刷新线程池的最大队列大小(挂起任务数)。线程池在 ExecEnv 中创建为 "cloud_native_pk_index_flush";其最大线程数由 `pk_index_memtable_flush_threadpool_max_threads` 管理。增加此值允许更多 MemTable 刷新任务在执行前缓冲,这可以减少即时反压,但会增加排队任务对象消耗的内存。减少此值会限制缓冲任务,并可能根据线程池行为导致更早的反压或任务拒绝。根据可用内存和预期并发刷新工作负载进行调整。 +- 是否可变: 是 +- 描述: 控制共享数据(云原生/lake)模式下主键索引 MemTable 刷新线程池的最大队列大小(挂起任务数)。线程池在 ExecEnv 中创建为 "cloud_native_pk_index_flush";其最大线程数由 `pk_index_memtable_flush_threadpool_max_threads` 控制。增加此值允许更多 MemTable 刷新任务在执行前缓冲,这可以减少即时反压,但会增加排队任务对象消耗的内存。减少它会限制缓冲任务,并可能根据线程池行为导致更早的反压或任务拒绝。根据可用内存和预期的并发刷新工作负载进行调整。 - 引入版本: - ##### pk_index_memtable_max_count @@ -2432,7 +2432,7 @@ curl http://:/varz - 默认值: 2 - 类型: Int - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: 共享数据集群中主键索引 MemTable 的最大数量。 - 引入版本: - @@ -2441,8 +2441,8 @@ curl http://:/varz - 默认值: 30000 - 类型: Int - 单位: 毫秒 -- 可变: 是 -- 描述: 共享数据集群中等待主键索引 MemTable 刷新完成的最大超时时间。当同步刷新所有 MemTable 时(例如,在摄取 SST 操作之前),系统会等待直到此超时。默认值为 30 秒。 +- 是否可变: 是 +- 描述: 共享数据集群中等待主键索引 MemTable 刷新完成的最大超时。当同步刷新所有 MemTable(例如,在摄取 SST 操作之前)时,系统最多等待此超时。默认值为 30 秒。 - 引入版本: - ##### pk_index_parallel_compaction_task_split_threshold_bytes @@ -2450,8 +2450,8 @@ curl 
http://:/varz
- 默认值: 33554432
- 类型: Int
- 单位: 字节
-- 可变: 是
-- 描述: 主键索引 Compaction 任务的拆分阈值。当任务涉及的文件总大小小于此阈值时,任务将不会被拆分。
+- 是否可变: 是
+- 描述: 主键索引 Compaction 任务的拆分阈值。当任务中涉及文件的总大小小于此阈值时,任务将不会被拆分。
- 引入版本: -

##### pk_index_parallel_compaction_threadpool_max_threads
@@ -2459,7 +2459,7 @@ curl http://:/varz
- 默认值: 0
- 类型: Int
- 单位: -
-- 可变: 是
+- 是否可变: 是
- 描述: 共享数据集群中云原生主键索引并行 Compaction 线程池的最大线程数。`0` 表示自动设置为 CPU 核心数的一半。
- 引入版本: -

@@ -2468,8 +2468,8 @@ curl http://:/varz
- 默认值: 1048576
- 类型: Int
- 单位: -
-- 可变: 是
-- 描述: 共享数据模式下云原生主键索引并行 Compaction 使用的线程池的最大队列大小(待处理任务数)。此设置控制在线程池拒绝新提交之前可以排队多少 Compaction 任务。有效并行度受 `pk_index_parallel_compaction_threadpool_max_threads` 限制;当您预期有许多并发 Compaction 任务时,增加此值以避免任务拒绝,但请注意,较大的队列可能会增加排队工作的内存和延迟。
+- 是否可变: 是
+- 描述: 共享数据模式下云原生主键索引并行 Compaction 使用的线程池的最大队列大小(挂起任务数)。此设置控制在线程池拒绝新提交之前,可以排队多少 Compaction 任务。有效并行性受 `pk_index_parallel_compaction_threadpool_max_threads` 限制;在预期有许多并发 Compaction 任务时,可增加此值以避免任务被拒绝,但请注意,较大的队列会增加排队任务的内存占用和延迟。
- 引入版本: -

##### pk_index_parallel_execution_min_rows
@@ -2477,7 +2477,7 @@ curl http://:/varz
- 默认值: 16384
- 类型: Int
- 单位: -
-- 可变: 是
+- 是否可变: 是
- 描述: 共享数据集群中启用主键索引操作并行执行的最小行数阈值。
- 引入版本: -

@@ -2486,7 +2486,7 @@ curl http://:/varz
- 默认值: 0
- 类型: Int
- 单位: -
-- 可变: 是
+- 是否可变: 是
- 描述: 共享数据集群中主键索引并行执行线程池的最大线程数。`0` 表示自动设置为 CPU 核心数的一半。
- 引入版本: -

@@ -2495,7 +2495,7 @@ curl http://:/varz
- 默认值: 10
- 类型: Int
- 单位: -
-- 可变: 是
+- 是否可变: 是
- 描述: 主键索引 Size-tiered Compaction 策略的级别乘数参数。
- 引入版本: -

@@ -2504,7 +2504,7 @@ curl http://:/varz
- 默认值: 5
- 类型: Int
- 单位: -
-- 可变: 是
+- 是否可变: 是
- 描述: 主键索引 Size-tiered Compaction 策略的最大级别。
- 引入版本: -

@@ -2513,7 +2513,7 @@ curl http://:/varz
- 默认值: 131072
- 类型: Int
- 单位: -
-- 可变: 是
+- 是否可变: 是
- 描述: 主键索引 Size-tiered Compaction 策略的最小级别。
- 引入版本: -

@@ -2522,7 +2522,7 @@ curl http://:/varz
- 默认值: 16777216
- 类型: Int
- 单位: 字节
-- 可变: 是
+- 是否可变: 是
- 描述: 共享数据集群中 SSTable 文件的采样间隔大小。当 SSTable 文件大小超过此阈值时,系统会以该间隔从 SSTable 中采样键,以优化 Compaction 任务的边界分区。对于小于此阈值的 SSTable,仅使用起始键作为边界键。默认值为 16 MB。
- 引入版本: -

@@ -2531,7 +2531,7 @@ curl http://:/varz
- 默认值: 67108864
- 类型: Int
- 单位: 字节
-- 可变: 是
+- 是否可变: 是
- 描述: 共享数据集群中主键索引的目标文件大小。
- 引入版本: -

@@ -2540,8 +2540,8 @@ curl http://:/varz
- 默认值: 104857600
- 类型: Int
- 单位: 字节
-- 可变: 是
-- 描述: 当 `enable_pk_index_eager_build` 设置为 true 时,系统仅在导入或 Compaction 期间生成的数据超过此阈值时才会急切构建 PK 索引文件。默认值为 100MB。
+- 是否可变: 是
+- 描述: 当 `enable_pk_index_eager_build` 设置为 true 时,仅当导入或 Compaction 期间生成的数据超过此阈值时,系统才会急切构建 PK 索引文件。默认值为 100MB。
- 引入版本: -

##### primary_key_limit_size
@@ -2549,17 +2549,17 @@ curl http://:/varz
- 默认值: 128
- 类型: Int
- 单位: 字节
-- 可变: 是
+- 是否可变: 是
- 描述: 主键表中键列的最大大小。
- 引入版本: v2.5

##### release_snapshot_worker_count

- 默认值: 5
- 类型: Int
- 单位: -
-- 可变: 是
-- 描述: BE 节点上 release snapshot 任务的最大线程数。
+- 是否可变: 是
+- 描述: BE 节点上释放快照任务的最大线程数。
- 引入版本: -

##### repair_compaction_interval_seconds
@@ -2567,16 +2567,16 @@ curl http://:/varz
- 默认值: 600
- 类型: Int
- 单位: 秒
-- 可变: 是
-- 描述: 轮询 Repair Compaction 线程的时间间隔。
+- 是否可变: 是
+- 描述: 轮询修复 Compaction 线程的时间间隔。
- 引入版本: -

##### replication_max_speed_limit_kbps

- 默认值: 50000
- 类型: Int
- 单位: KB/秒
-- 可变: 是
+- 是否可变: 是
- 描述: 每个复制线程的最大速度。
- 引入版本: v3.3.5

@@ -2584,8 +2584,8 @@ curl http://:/varz
- 默认值: 50
- 类型: Int
- 单位: KB/秒
-- 可变: 是
+- 是否可变: 是
- 描述: 每个复制线程的最小速度。
- 引入版本: v3.3.5

@@ -2594,8 +2594,8 @@ curl http://:/varz
- 默认值: 300
- 类型: Int
- 单位: 秒
-- 可变: 是
-- 描述: 复制线程低于最小速度允许持续的时间。如果实际速度低于 `replication_min_speed_limit_kbps` 的时间超过此值,复制将失败。
+- 是否可变: 是
+- 描述: 复制线程低于最小速度的允许持续时间。如果实际速度低于 `replication_min_speed_limit_kbps` 的时间超过此值,复制将失败。(运行时调整示例见下文。)
- 引入版本: v3.3.5
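+上述复制限速相关配置均为可变项,可在运行时动态调整。以下是一个简化示意(假设 BE 为 `127.0.0.1:8040`,数值均为示例,请按实际带宽情况取值):
+
+```Bash
+# 运行时将每个复制线程的最大速度临时下调为 20000 KB/秒(示例值)
+curl -XPOST "http://127.0.0.1:8040/api/update_config?replication_max_speed_limit_kbps=20000"
+
+# 通过 varz 接口确认生效
+curl -s "http://127.0.0.1:8040/varz" | grep replication_max_speed_limit_kbps
+```
+

##### replication_threads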
@@ -2603,7 +2603,7 @@ curl http://:/varz - 默认值: 0 - 类型: Int - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: 用于复制的最大线程数。`0` 表示将线程数设置为 BE CPU 核心数的四倍。 - 引入版本: v3.3.5 @@ -2612,7 +2612,7 @@ curl http://:/varz - 默认值: 5 - 类型: Int - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: Size-tiered Compaction 策略中两个连续级别之间的数据大小倍数。 - 引入版本: - @@ -2621,8 +2621,8 @@ curl http://:/varz - 默认值: 10 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 在 Size-tiered Compaction 策略中,两个相邻级别之间数据量差异的倍数,用于 Duplicate Key 表。 +- 是否可变: 是 +- 描述: 在 Size-tiered Compaction 策略中,Duplicate Key 表的两个相邻级别之间数据量差异的倍数。 - 引入版本: - ##### size_tiered_level_num @@ -2630,8 +2630,8 @@ curl http://:/varz - 默认值: 7 - 类型: Int - 单位: - -- 可变: 是 -- 描述: Size-tiered Compaction 策略的级别数。每个级别最多保留一个 Rowset。因此,在稳定条件下,Rowset 的数量最多与此配置项中指定的级别数相同。 +- 是否可变: 是 +- 描述: Size-tiered Compaction 策略的级别数。每个级别最多保留一个 RowSet。因此,在稳定条件下,RowSet 数量最多与此配置项中指定的级别数相同。 - 引入版本: - ##### size_tiered_max_compaction_level @@ -2639,8 +2639,8 @@ curl http://:/varz - 默认值: 3 - 类型: Int - 单位: 级别 -- 可变: 是 -- 描述: 限制在一个主键实时 Compaction 任务中可以合并的 Size-tiered 级别数量。在 PK Size-tiered Compaction 选择期间,StarRocks 会按大小构建有序的 Rowset “级别”,并将连续的级别添加到选定的 Compaction 输入中,直到达到此限制(代码使用 `compaction_level <= size_tiered_max_compaction_level`)。该值是包含性的,并计算合并的不同 Size-tiered 数量(最高级别计为 1)。仅当 PK Size-tiered Compaction 策略启用时有效;提高它允许 Compaction 任务包含更多级别(更大、I/O 和 CPU 密集型合并,潜在更高的写入放大),而降低它会限制合并并减少任务大小和资源使用。 +- 是否可变: 是 +- 描述: 限制可合并到单个主键实时 Compaction 任务中的 Size-tiered 级别数量。在 PK Size-tiered Compaction 选择期间,StarRocks 按大小构建有序的“级别”RowSet,并将连续级别添加到所选的 Compaction 输入中,直到达到此限制(代码使用 `compaction_level <= size_tiered_max_compaction_level`)。该值是包含性的,并计算合并的不同 Size 层数(顶层计为 1)。仅当 PK Size-tiered Compaction 策略启用时才有效;提高它允许 Compaction 任务包含更多级别(更大、I/O 和 CPU 密集型合并,潜在更高的写入放大),而降低它会限制合并并减少任务大小和资源使用。 - 引入版本: v4.0.0 ##### size_tiered_min_level_size @@ -2648,8 +2648,8 @@ curl http://:/varz - 默认值: 131072 - 类型: Int - 单位: 字节 -- 可变: 是 -- 描述: Size-tiered Compaction 策略中最小级别的数据大小。小于此值的 Rowset 将立即触发数据 Compaction。 +- 是否可变: 是 +- 描述: Size-tiered Compaction 策略中最小级别的数据大小。小于此值的 RowSet 将立即触发数据 Compaction。 - 引入版本: - ##### small_dictionary_page_size @@ -2657,8 +2657,8 @@ curl http://:/varz - 默认值: 4096 - 类型: Int - 单位: 字节 -- 可变: 否 -- 描述: BinaryPlainPageDecoder 用于决定是否急切解析字典(二进制/纯文本)页面的阈值(字节)。如果页面编码大小小于 `small_dictionary_page_size`,解码器会预解析所有字符串条目到内存向量 (`_parsed_datas`) 中,以加速随机访问和批量读取。增加此值会导致更多页面被预解析(这可以减少每次访问的解码开销,并可能提高较大字典的有效压缩),但会增加内存使用和解析所花费的 CPU;过大的值可能会降低整体性能。仅在测量内存和访问延迟权衡后才进行调整。 +- 是否可变: 否 +- 描述: BinaryPlainPageDecoder 用于决定是否急切解析字典(二进制/纯文本)页的阈值(字节)。如果页的编码大小小于 `small_dictionary_page_size`,解码器会将所有字符串条目预解析到内存向量 (`_parsed_datas`) 中,以加速随机访问和批处理读取。增加此值会导致更多页被预解析(这可以减少每次访问的解码开销,并可能增加较大字典的有效压缩),但会增加内存使用和解析所花费的 CPU;过大的值可能会降低整体性能。仅在测量内存和访问延迟权衡后进行调整。 - 引入版本: v3.4.1, v3.5.0 ##### snapshot_expire_time_sec @@ -2666,7 +2666,7 @@ curl http://:/varz - 默认值: 172800 - 类型: Int - 单位: 秒 -- 可变: 是 +- 是否可变: 是 - 描述: 快照文件的过期时间。 - 引入版本: - @@ -2675,8 +2675,8 @@ curl http://:/varz - 默认值: 0 - 类型: long - 单位: 秒 -- 可变: 是 -- 描述: 当发送者作业的内存使用量很高时,如果 MemTable 超过 `stale_memtable_flush_time_sec` 秒未更新,将被刷新以减少内存压力。此行为仅在内存限制接近时(`limit_exceeded_by_ratio(70)` 或更高)才被考虑。在 LocalTabletsChannel 中,当内存使用量非常高时(`limit_exceeded_by_ratio(95)`),可能会额外刷新大小超过 `write_buffer_size / 4` 的 MemTable。值为 `0` 会禁用这种基于年龄的过期 MemTable 刷新(不可变分区 MemTable 在空闲或内存高时仍会立即刷新)。 +- 是否可变: 是 +- 描述: 当发送作业的内存使用率较高时,超过 `stale_memtable_flush_time_sec` 秒未更新的 MemTable 将被刷新以减少内存压力。此行为仅在内存限制接近(`limit_exceeded_by_ratio(70)` 或更高)时考虑。在 LocalTabletsChannel 中,当内存使用率非常高(`limit_exceeded_by_ratio(95)`)时,可能会有额外的路径刷新大小超过 `write_buffer_size / 4` 的 MemTable。值为 `0` 会禁用此基于年龄的过期 MemTable 刷新(不可变分区 MemTable 在空闲或内存高时仍会立即刷新)。 - 引入版本: 
v3.2.0 ##### storage_flood_stage_left_capacity_bytes @@ -2684,8 +2684,8 @@ curl http://:/varz - 默认值: 107374182400 - 类型: Int - 单位: 字节 -- 可变: 是 -- 描述: 所有 BE 目录中剩余存储空间的硬限制。如果 BE 存储目录的剩余存储空间小于此值且存储使用率(百分比)超过 `storage_flood_stage_usage_percent`,则拒绝 Load 和 Restore 作业。您需要将此项与 FE 配置项 `storage_usage_hard_limit_reserve_bytes` 一起设置才能使配置生效。 +- 是否可变: 是 +- 描述: 所有 BE 目录剩余存储空间的硬限制。如果 BE 存储目录的剩余存储空间小于此值,且存储使用率(百分比)超过 `storage_flood_stage_usage_percent`,则会拒绝 Load 和 Restore 作业。您需要将此项与 FE 配置项 `storage_usage_hard_limit_reserve_bytes` 一起设置,以使配置生效。 - 引入版本: - ##### storage_flood_stage_usage_percent @@ -2693,8 +2693,8 @@ curl http://:/varz - 默认值: 95 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 所有 BE 目录中存储使用率的硬限制(百分比)。如果 BE 存储目录的存储使用率(百分比)超过此值且剩余存储空间小于 `storage_flood_stage_left_capacity_bytes`,则拒绝 Load 和 Restore 作业。您需要将此项与 FE 配置项 `storage_usage_hard_limit_percent` 一起设置才能使配置生效。 +- 是否可变: 是 +- 描述: 所有 BE 目录存储使用率(百分比)的硬限制。如果 BE 存储目录的存储使用率(百分比)超过此值,且剩余存储空间小于 `storage_flood_stage_left_capacity_bytes`,则会拒绝 Load 和 Restore 作业。您需要将此项与 FE 配置项 `storage_usage_hard_limit_percent` 一起设置,以使配置生效。 - 引入版本: - ##### storage_high_usage_disk_protect_ratio @@ -2702,8 +2702,8 @@ curl http://:/varz - 默认值: 0.1 - 类型: double - 单位: - -- 可变: 是 -- 描述: 在选择用于 Tablet 创建的存储根目录时,StorageEngine 按 `disk_usage(0)` 对候选磁盘进行排序并计算平均使用率。任何使用率大于(平均使用率 + `storage_high_usage_disk_protect_ratio`)的磁盘都会被排除在优先选择池之外(它将不参与随机优先混洗,因此最初被选择的机会被推迟)。设置为 0 可禁用此保护。值是小数(典型范围 0.0–1.0);较大的值使调度程序对高于平均水平的磁盘更具容忍度。 +- 是否可变: 是 +- 描述: 在选择用于创建 Tablet 的存储根目录时,StorageEngine 按 `disk_usage(0)` 对候选磁盘进行排序并计算平均使用率。任何使用率大于(平均使用率 + `storage_high_usage_disk_protect_ratio`)的磁盘都将从优先选择池中排除(它将不参与随机优先选择,因此最初不会被选中)。设置为 0 可禁用此保护。值是小数(典型范围 0.0–1.0);较大的值使调度程序对高于平均水平的磁盘更宽容。 - 引入版本: v3.2.0 ##### storage_medium_migrate_count @@ -2711,7 +2711,7 @@ curl http://:/varz - 默认值: 3 - 类型: Int - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: 用于存储介质迁移(从 SATA 到 SSD)的线程数。 - 引入版本: - @@ -2720,11 +2720,11 @@ curl http://:/varz - 默认值: `${STARROCKS_HOME}/storage` - 类型: String - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: 存储卷的目录和介质。示例:`/data1,medium:hdd;/data2,medium:ssd`。 - - 多个卷用分号 (`;`) 分隔。 - - 如果存储介质为 SSD,在目录末尾添加 `,medium:ssd`。 - - 如果存储介质为 HDD,在目录末尾添加 `,medium:hdd`。 + - 多个卷以分号 (`;`) 分隔。 + - 如果存储介质是 SSD,请在目录末尾添加 `,medium:ssd`。 + - 如果存储介质是 HDD,请在目录末尾添加 `,medium:hdd`。 - 引入版本: - ##### sync_tablet_meta @@ -2732,7 +2732,7 @@ curl http://:/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: 一个布尔值,控制是否启用 Tablet 元数据的同步。`true` 表示启用同步,`false` 表示禁用。 - 引入版本: - @@ -2741,8 +2741,8 @@ curl http://:/varz - 默认值: 1024 - 类型: Int - 单位: - -- 可变: 否 -- 描述: Tablet Map 的分片大小。该值必须是 2 的幂。 +- 是否可变: 否 +- 描述: Tablet 映射分片大小。该值必须是 2 的幂。 - 引入版本: - ##### tablet_max_pending_versions @@ -2750,8 +2750,8 @@ curl http://:/varz - 默认值: 1000 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 主键 Tablet 上可容忍的最大待处理版本数。待处理版本指的是已提交但尚未应用的版本。 +- 是否可变: 是 +- 描述: 主键 Tablet 上可容忍的最大挂起版本数。挂起版本指的是已提交但尚未应用的版本。 - 引入版本: - ##### tablet_max_versions @@ -2759,8 +2759,8 @@ curl http://:/varz - 默认值: 1000 - 类型: Int - 单位: - -- 可变: 是 -- 描述: Tablet 上允许的最大版本数。如果版本数超过此值,新的写入请求将失败。 +- 是否可变: 是 +- 描述: Tablet 允许的最大版本数。如果版本数超过此值,新的写入请求将失败。 - 引入版本: - ##### tablet_meta_checkpoint_min_interval_secs @@ -2768,7 +2768,7 @@ curl http://:/varz - 默认值: 600 - 类型: Int - 单位: 秒 -- 可变: 是 +- 是否可变: 是 - 描述: TabletMeta Checkpoint 的线程轮询时间间隔。 - 引入版本: - @@ -2777,8 +2777,8 @@ curl http://:/varz - 默认值: 10 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 自上次 TabletMeta Checkpoint 以来创建的最小 Rowset 数量。 +- 是否可变: 是 +- 描述: 自上次 TabletMeta Checkpoint 以来创建的最小 RowSet 数量。 - 引入版本: - ##### tablet_rowset_stale_sweep_time_sec @@ -2786,8 +2786,8 @@ curl http://:/varz - 默认值: 1800 - 类型: Int - 单位: 
秒 -- 可变: 是 -- 描述: 清理 Tablet 中过期 Rowset 的时间间隔。 +- 是否可变: 是 +- 描述: 清理 Tablet 中过期 RowSet 的时间间隔。 - 引入版本: - ##### tablet_stat_cache_update_interval_second @@ -2795,8 +2795,8 @@ curl http://:/varz - 默认值: 300 - 类型: Int - 单位: 秒 -- 可变: 是 -- 描述: Tablet Stat Cache 的更新时间间隔。 +- 是否可变: 是 +- 描述: Tablet 统计缓存更新的时间间隔。 - 引入版本: - ##### tablet_writer_open_rpc_timeout_sec @@ -2804,8 +2804,8 @@ curl http://:/varz - 默认值: 300 - 类型: Int - 单位: 秒 -- 可变: 是 -- 描述: 在远程 BE 上打开 Tablet Writer 的 RPC 超时(秒)。该值转换为毫秒,并应用于发出打开调用时的请求超时和 bRPC 控制超时。运行时使用 `tablet_writer_open_rpc_timeout_sec` 和总体加载超时的一半(即 min(`tablet_writer_open_rpc_timeout_sec`, `load_timeout_sec` / 2))中的最小值作为有效超时。设置此值以平衡及时故障检测(过小可能导致过早的打开失败)和为 BE 提供足够时间初始化写入器(过大延迟错误处理)。 +- 是否可变: 是 +- 描述: 打开远程 BE 上 Tablet 写入器的 RPC 超时(秒)。该值转换为毫秒,并应用于发出打开调用时的请求超时和 bRPC 控制超时。运行时使用 `tablet_writer_open_rpc_timeout_sec` 和总加载超时的一半中的最小值作为有效超时(即 min(`tablet_writer_open_rpc_timeout_sec`, `load_timeout_sec` / 2))。设置此值以平衡及时故障检测(过小可能导致过早打开失败)和给予 BE 足够时间初始化写入器(过大延迟错误处理)。 - 引入版本: v3.2.0 ##### transaction_apply_worker_count @@ -2813,8 +2813,8 @@ curl http://:/varz - 默认值: 0 - 类型: Int - 单位: 线程 -- 可变: 是 -- 描述: 控制 UpdateManager 中 "update_apply" 线程池使用的最大工作线程数 — 该线程池用于应用事务的 Rowset(特别是对于主键表)。值 `>0` 设置固定的最大线程数;0(默认值)使池大小等于 CPU 核心数。配置的值在启动时应用 (UpdateManager::init),可以通过 update-config HTTP 操作在运行时更改,该操作会更新池的最大线程数。调整此值以增加应用并发性(吞吐量)或限制 CPU/内存争用;最小线程数和空闲超时分别由 `transaction_apply_thread_pool_num_min` 和 `transaction_apply_worker_idle_time_ms` 控制。 +- 是否可变: 是 +- 描述: 控制 UpdateManager 的 "update_apply" 线程池使用的最大工作线程数——该线程池用于应用事务的 RowSet(特别是主键表)。值 `>0` 设置固定最大线程数;0(默认值)使池大小等于 CPU 核心数。配置的值在启动时应用(UpdateManager::init),并可在运行时通过 `update-config` HTTP 操作更改,该操作会更新池的最大线程数。调整此值以增加应用并发性(吞吐量)或限制 CPU/内存争用;最小线程数和空闲超时分别由 `transaction_apply_thread_pool_num_min` 和 `transaction_apply_worker_idle_time_ms` 控制。 - 引入版本: v3.2.0 ##### transaction_apply_worker_idle_time_ms @@ -2822,8 +2822,8 @@ curl http://:/varz - 默认值: 500 - 类型: int - 单位: 毫秒 -- 可变: 否 -- 描述: 设置 UpdateManager 中用于应用事务/更新的 "update_apply" 线程池的空闲超时(毫秒)。该值通过 MonoDelta::FromMilliseconds 传递给 ThreadPoolBuilder::set_idle_timeout,因此空闲时间超过此超时的 Worker 线程可能会被终止(受池配置的最小线程数和最大线程数限制)。较小的值可更快释放资源,但会增加突发负载下的线程创建/销毁开销;较大的值可使 Worker 在短时突发期间保持“热”状态,但代价是更高的基线资源使用。 +- 是否可变: 否 +- 描述: 设置 UpdateManager 的 "update_apply" 线程池用于应用事务/更新的空闲超时(毫秒)。该值通过 MonoDelta::FromMilliseconds 传递给 ThreadPoolBuilder::set_idle_timeout,因此空闲时间超过此超时的 Worker 线程可能会被终止(受限于池配置的最小线程数和最大线程数)。较小的值可更快释放资源,但在突发负载下会增加线程创建/销毁开销;较大的值可使 Worker 在短时突发下保持活跃,但会增加基线资源使用。 - 引入版本: v3.2.11 ##### trash_file_expire_time_sec @@ -2831,8 +2831,8 @@ curl http://:/varz - 默认值: 86400 - 类型: Int - 单位: 秒 -- 可变: 是 -- 描述: 清理垃圾文件的时间间隔。默认值已从 v2.5.17、v3.0.9 和 v3.1.6 开始从 259,200 更改为 86,400。 +- 是否可变: 是 +- 描述: 清理垃圾文件的时间间隔。从 v2.5.17, v3.0.9 和 v3.1.6 开始,默认值已从 259,200 更改为 86,400。 - 引入版本: - ##### unused_rowset_monitor_interval @@ -2840,8 +2840,8 @@ curl http://:/varz - 默认值: 30 - 类型: Int - 单位: 秒 -- 可变: 是 -- 描述: 清理过期 Rowset 的时间间隔。 +- 是否可变: 是 +- 描述: 清理过期 RowSet 的时间间隔。 - 引入版本: - ##### update_cache_expire_sec @@ -2849,8 +2849,8 @@ curl http://:/varz - 默认值: 360 - 类型: Int - 单位: 秒 -- 可变: 是 -- 描述: Update Cache 的过期时间。 +- 是否可变: 是 +- 描述: 更新缓存的过期时间。 - 引入版本: - ##### update_compaction_check_interval_seconds @@ -2858,7 +2858,7 @@ curl http://:/varz - 默认值: 10 - 类型: Int - 单位: 秒 -- 可变: 是 +- 是否可变: 是 - 描述: 检查主键表 Compaction 的时间间隔。 - 引入版本: - @@ -2867,8 +2867,8 @@ curl http://:/varz - 默认值: 2 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 用于控制主键表中包含 Delvec 文件的 Rowset 的 Compaction 优先级。值越大,优先级越高。 +- 是否可变: 是 +- 描述: 用于控制主键表中包含 Delvec 文件的 RowSet 的 Compaction 优先级。值越大,优先级越高。 - 引入版本: - ##### 
update_compaction_num_threads_per_disk @@ -2876,8 +2876,8 @@ curl http://:/varz - 默认值: 1 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 主键表每个磁盘的 Compaction 线程数。 +- 是否可变: 是 +- 描述: 主键表每块磁盘的 Compaction 线程数。 - 引入版本: - ##### update_compaction_per_tablet_min_interval_seconds @@ -2885,7 +2885,7 @@ curl http://:/varz - 默认值: 120 - 类型: Int - 单位: 秒 -- 可变: 是 +- 是否可变: 是 - 描述: 主键表中每个 Tablet 触发 Compaction 的最小时间间隔。 - 引入版本: - @@ -2894,8 +2894,8 @@ curl http://:/varz - 默认值: 0.5 - 类型: Double - 单位: - -- 可变: 是 -- 描述: 共享数据集群中主键表 Compaction 可合并的最大数据比例。如果单个 Tablet 过大,建议减小此值。 +- 是否可变: 是 +- 描述: 共享数据集群中 Compaction 可为主键表合并数据的最大比例。如果单个 Tablet 变得过大,建议缩小此值。 - 引入版本: v3.1.5 ##### update_compaction_result_bytes @@ -2903,7 +2903,7 @@ curl http://:/varz - 默认值: 1073741824 - 类型: Int - 单位: 字节 -- 可变: 是 +- 是否可变: 是 - 描述: 主键表单次 Compaction 的最大结果大小。 - 引入版本: - @@ -2912,8 +2912,8 @@ curl http://:/varz - 默认值: 268435456 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 主键表的 Compaction 得分是根据文件大小计算的,这与其他表类型不同。此参数可用于使主键表的 Compaction 得分与其他表类型相似,从而更易于用户理解。 +- 是否可变: 是 +- 描述: 主键表的 Compaction Score 是根据文件大小计算的,这与其他表类型不同。此参数可用于使主键表的 Compaction Score 与其他表类型相似,从而使用户更容易理解。 - 引入版本: - ##### upload_worker_count @@ -2921,7 +2921,7 @@ curl http://:/varz - 默认值: 0 - 类型: Int - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: BE 节点上备份作业上传任务的最大线程数。`0` 表示将该值设置为 BE 所在机器的 CPU 核心数。 - 引入版本: - @@ -2930,8 +2930,8 @@ curl http://:/varz - 默认值: 5 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 垂直 Compaction 中每组的最大列数。 +- 是否可变: 否 +- 描述: 垂直 Compaction 每组的最大列数。 - 引入版本: - ### 共享数据 @@ -2941,8 +2941,8 @@ curl http://:/varz - 默认值: 4194304 - 类型: Int - 单位: 字节 -- 可变: 是 -- 描述: 下载快照文件时使用的内存中拷贝缓冲区大小(字节)。SnapshotLoader::download 将此值传递给 fs::copy 作为每传输块大小,用于从远程顺序文件读取到本地可写文件。较大的值可以通过减少系统调用/IO 开销来提高高带宽链接的吞吐量;较小的值会减少每个活动传输的峰值内存使用。注意:此参数控制每个流的缓冲区大小,而不是下载线程数——总内存消耗 = `download_buffer_size * number_of_concurrent_downloads`。 +- 是否可变: 是 +- 描述: 下载快照文件时使用的内存中复制缓冲区的大小(字节)。SnapshotLoader::download 将此值传递给 fs::copy 作为每次传输的块大小,用于从远程顺序文件读取到本地可写文件。较大的值可以通过减少系统调用/IO 开销来提高高带宽链接上的吞吐量;较小的值可减少每个活动传输的峰值内存使用。注意:此参数控制每个流的缓冲区大小,而不是下载线程数——总内存消耗 = `download_buffer_size` * 并发下载数。 - 引入版本: v3.2.13 ##### graceful_exit_wait_for_frontend_heartbeat @@ -2950,8 +2950,8 @@ curl http://:/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 确定是否等待至少一个 Frontend 心跳响应指示 SHUTDOWN 状态,然后才完成优雅退出。启用时,优雅关闭过程保持活动状态,直到通过心跳 RPC 响应 SHUTDOWN 确认,确保 Frontend 在两个常规心跳间隔之间有足够的时间检测终止状态。 +- 是否可变: 是 +- 描述: 确定是否在完成优雅退出之前等待至少一个前端心跳响应,指示 SHUTDOWN 状态。启用时,优雅关机过程保持活跃,直到通过心跳 RPC 响应 SHUTDOWN 确认,确保前端有足够的时间在两次常规心跳间隔之间检测终止状态。 - 引入版本: v3.4.5 ##### lake_compaction_stream_buffer_size_bytes @@ -2959,7 +2959,7 @@ curl http://:/varz - 默认值: 1048576 - 类型: Int - 单位: 字节 -- 可变: 是 +- 是否可变: 是 - 描述: 共享数据集群中云原生表 Compaction 的读取器远程 I/O 缓冲区大小。默认值为 1MB。您可以增加此值以加速 Compaction 进程。 - 引入版本: v3.2.3 @@ -2968,8 +2968,8 @@ curl http://:/varz - 默认值: 500 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 共享数据集群中主键表 Compaction 任务允许的最大输入 Rowset 数量。此参数的默认值从 v3.2.4 和 v3.1.10 开始从 `5` 更改为 `1000`,从 v3.3.1 和 v3.2.9 开始更改为 `500`。在为主键表启用 Size-tiered Compaction 策略(通过将 `enable_pk_size_tiered_compaction_strategy` 设置为 `true`)后,StarRocks 不需要限制每次 Compaction 的 Rowset 数量以减少写入放大。因此,此参数的默认值增加了。 +- 是否可变: 是 +- 描述: 共享数据集群中主键表 Compaction 任务允许的最大输入 RowSet 数量。从 v3.2.4 和 v3.1.10 开始,此参数的默认值从 `5` 更改为 `1000`,从 v3.3.1 和 v3.2.9 开始更改为 `500`。在为主键表启用 Size-tiered Compaction 策略(通过将 `enable_pk_size_tiered_compaction_strategy` 设置为 `true`)后,StarRocks 无需限制每次 Compaction 的 RowSet 数量以减少写入放大。因此,此参数的默认值增加。 - 引入版本: v3.1.8, v3.2.3 ##### loop_count_wait_fragments_finish @@ -2977,17 +2977,17 @@ curl http://:/varz - 默认值: 2 - 类型: Int - 单位: - -- 可变: 是 -- 描述: BE/CN 进程退出时要等待的循环次数。每个循环是固定的 10 秒间隔。您可以将其设置为 `0` 
以禁用循环等待。从 v3.4 开始,此项变为可变,其默认值从 `0` 更改为 `2`。 +- 是否可变: 是 +- 描述: BE/CN 进程退出时要等待的循环次数。每个循环是固定的 10 秒间隔。您可以将其设置为 `0` 以禁用循环等待。从 v3.4 开始,此项更改为可变,其默认值从 `0` 更改为 `2`。 - 引入版本: v2.5 ##### max_client_cache_size_per_host - 默认值: 10 - 类型: Int -- 单位: 每个主机条目(缓存的客户端实例) -- 可变: 否 -- 描述: BE 范围客户端缓存为每个远程主机保留的最大缓存客户端实例数。此单个设置在 ExecEnv 初始化期间创建 BackendServiceClientCache、FrontendServiceClientCache 和 BrokerServiceClientCache 时使用,因此它限制了这些缓存中为每个主机保留的客户端存根/连接数。增加此值可减少重新连接和存根创建开销,但会增加内存和文件描述符使用;减少此值可节省资源,但可能增加连接流失。该值在启动时读取,不能在运行时更改。目前一个共享设置控制所有客户端缓存类型;以后可能会引入每个缓存的单独配置。 +- 单位: 每个主机的条目数 (缓存的客户端实例) +- 是否可变: 否 +- 描述: BE 全局客户端缓存为每个远程主机保留的最大缓存客户端实例数。此单个设置在 ExecEnv 初始化期间创建 BackendServiceClientCache、FrontendServiceClientCache 和 BrokerServiceClientCache 时使用,因此它限制了这些缓存中每个主机保留的客户端 stub/连接数。增加此值可减少重新连接和 stub 创建开销,但会增加内存和文件描述符使用;减少它可节省资源,但可能会增加连接流失。此值在启动时读取,不能在运行时更改。目前,一个共享设置控制所有客户端缓存类型;以后可能会引入每个缓存的单独配置。 - 引入版本: v3.2.0 ##### starlet_filesystem_instance_cache_capacity @@ -2995,7 +2995,7 @@ curl http://:/varz - 默认值: 10000 - 类型: Int - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: Starlet 文件系统实例的缓存容量。 - 引入版本: v3.2.16, v3.3.11, v3.4.1 @@ -3004,7 +3004,7 @@ curl http://:/varz - 默认值: 86400 - 类型: Int - 单位: 秒 -- 可变: 是 +- 是否可变: 是 - 描述: Starlet 文件系统实例的缓存过期时间。 - 引入版本: v3.3.15, 3.4.5 @@ -3013,7 +3013,7 @@ curl http://:/varz - 默认值: 9070 - 类型: Int - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: BE 和 CN 的额外代理服务端口。 - 引入版本: - @@ -3022,17 +3022,17 @@ curl http://:/varz - 默认值: 80 - 类型: Int - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: 共享数据集群中 Data Cache 最多可使用的磁盘容量百分比。 - 引入版本: v3.1 ##### starlet_use_star_cache -- 默认值: v3.1 为 false,从 v3.2.3 起为 true +- 默认值: v3.1 中为 false,v3.2.3 中为 true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 共享数据集群中是否启用 Data Cache。`true` 表示启用此功能,`false` 表示禁用。默认值从 v3.2.3 开始从 `false` 更改为 `true`。 +- 是否可变: 是 +- 描述: 是否在共享数据集群中启用 Data Cache。`true` 表示启用此功能,`false` 表示禁用。默认值从 v3.2.3 开始从 `false` 更改为 `true`。 - 引入版本: v3.1 ##### starlet_write_file_with_tag @@ -3040,8 +3040,8 @@ curl http://:/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 在共享数据集群中,是否为写入对象存储的文件添加对象存储标签,以便于自定义文件管理。 +- 是否可变: 是 +- 描述: 在共享数据集群中,是否使用对象存储标签标记写入对象存储的文件,以便进行便捷的自定义文件管理。 - 引入版本: v3.5.3 ##### table_schema_service_max_retries @@ -3049,19 +3049,19 @@ curl http://:/varz - 默认值: 3 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 表模式服务请求的最大重试次数。 +- 是否可变: 是 +- 描述: 表 Schema Service 请求的最大重试次数。 - 引入版本: v4.1 -### 数据湖 +### Data Lake ##### datacache_block_buffer_enable - 默认值: true - 类型: Boolean - 单位: - -- 可变: 否 -- 描述: 是否启用 Block Buffer 以优化 Data Cache 效率。启用 Block Buffer 后,系统会从 Data Cache 读取 Block 数据并将其缓存在临时缓冲区中,从而减少频繁缓存读取带来的额外开销。 +- 是否可变: 否 +- 描述: 是否启用 Block Buffer 以优化 Data Cache 效率。启用 Block Buffer 后,系统会从 Data Cache 中读取 Block 数据并将其缓存到临时缓冲区中,从而减少频繁缓存读取带来的额外开销。 - 引入版本: v3.2.0 ##### datacache_disk_adjust_interval_seconds @@ -3069,7 +3069,7 @@ curl http://:/varz - 默认值: 10 - 类型: Int - 单位: 秒 -- 可变: 是 +- 是否可变: 是 - 描述: Data Cache 自动容量伸缩的间隔。系统会定期检查缓存磁盘使用情况,并在必要时触发自动伸缩。 - 引入版本: v3.3.0 @@ -3078,8 +3078,8 @@ curl http://:/varz - 默认值: 7200 - 类型: Int - 单位: 秒 -- 可变: 是 -- 描述: Data Cache 自动扩展的最小等待时间。仅当磁盘使用率低于 `datacache_disk_low_level` 持续时间超过此值时,才会触发自动扩展。 +- 是否可变: 是 +- 描述: Data Cache 自动扩容的最小等待时间。仅当磁盘使用率在超过此持续时间后仍低于 `datacache_disk_low_level` 时,才会触发自动扩容。 - 引入版本: v3.3.0 ##### datacache_disk_size @@ -3087,16 +3087,16 @@ curl http://:/varz - 默认值: 0 - 类型: String - 单位: - -- 可变: 是 -- 描述: 单个磁盘上可缓存的最大数据量。可以将其设置为百分比(例如 `80%`)或物理限制(例如 `2T`、`500G`)。例如,如果您使用两个磁盘并将 `datacache_disk_size` 参数的值设置为 `21474836480` (20 GB),则这两个磁盘上最多可缓存 40 GB 数据。默认值为 `0`,表示只使用内存来缓存数据。 +- 是否可变: 是 +- 描述: 单个磁盘上可缓存的最大数据量。您可以将其设置为百分比(例如,`80%`)或物理限制(例如,`2T`,`500G`)。例如,如果您使用两个磁盘并将 `datacache_disk_size` 
参数的值设置为 `21474836480` (20 GB),则这些磁盘上最多可缓存 40 GB 数据。默认值为 `0`,表示仅使用内存缓存数据。 - 引入版本: - ##### datacache_enable -- 默认值: true +- 默认值: v3.3 中为 true - 类型: Boolean - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: 是否启用 Data Cache。`true` 表示启用 Data Cache,`false` 表示禁用 Data Cache。默认值从 v3.3 开始更改为 `true`。 - 引入版本: - @@ -3105,8 +3105,8 @@ curl http://:/varz - 默认值: slru - 类型: String - 单位: - -- 可变: 否 -- 描述: Data Cache 的逐出策略。有效值:`lru`(最近最少使用)和 `slru`(分段 LRU)。 +- 是否可变: 否 +- 描述: Data Cache 的驱逐策略。有效值:`lru`(最近最少使用)和 `slru`(分段 LRU)。 - 引入版本: v3.4.0 ##### datacache_inline_item_count_limit @@ -3114,8 +3114,8 @@ curl http://:/varz - 默认值: 130172 - 类型: Int - 单位: - -- 可变: 否 -- 描述: Data Cache 中内联缓存项的最大数量。对于一些特别小的缓存块,Data Cache 将它们以 `inline` 模式存储,该模式将块数据和元数据一起缓存在内存中。 +- 是否可变: 否 +- 描述: Data Cache 中内联缓存项的最大数量。对于一些特别小的缓存块,Data Cache 以 `inline` 模式存储它们,这会将块数据和元数据一起缓存到内存中。 - 引入版本: v3.4.0 ##### datacache_mem_size @@ -3123,8 +3123,8 @@ curl http://:/varz - 默认值: 0 - 类型: String - 单位: - -- 可变: 是 -- 描述: 内存中可缓存的最大数据量。可以将其设置为百分比(例如 `10%`)或物理限制(例如 `10G`、`21474836480`)。 +- 是否可变: 是 +- 描述: 内存中可缓存的最大数据量。您可以将其设置为百分比(例如,`10%`)或物理限制(例如,`10G`,`21474836480`)。 - 引入版本: - ##### datacache_min_disk_quota_for_adjustment @@ -3132,8 +3132,8 @@ curl http://:/varz - 默认值: 10737418240 - 类型: Int - 单位: 字节 -- 可变: 是 -- 描述: Data Cache 自动伸缩的最小有效容量。如果系统尝试将缓存容量调整到小于此值,则缓存容量将直接设置为 `0`,以防止因缓存容量不足导致频繁缓存填充和逐出而造成的次优性能。 +- 是否可变: 是 +- 描述: Data Cache 自动伸缩的最小有效容量。如果系统尝试将缓存容量调整到小于此值,缓存容量将直接设置为 `0`,以防止因缓存容量不足导致频繁缓存填充和驱逐而造成的次优性能。 - 引入版本: v3.3.0 ##### disk_high_level @@ -3141,8 +3141,8 @@ curl http://:/varz - 默认值: 90 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 触发缓存容量自动扩容的磁盘使用率上限(百分比)。当磁盘使用率超过此值时,系统会自动从 Data Cache 中逐出缓存数据。从 v3.4.0 开始,默认值从 `80` 更改为 `90`。从 v4.0 开始,此项从 `datacache_disk_high_level` 重命名为 `disk_high_level`。 +- 是否可变: 是 +- 描述: 触发缓存容量自动扩容的磁盘使用率上限(百分比)。当磁盘使用率超过此值时,系统会自动从 Data Cache 中驱逐缓存数据。从 v3.4.0 开始,默认值从 `80` 更改为 `90`。此项从 v4.0 开始重命名为 `disk_high_level`。 - 引入版本: v3.3.0 ##### disk_low_level @@ -3150,8 +3150,8 @@ curl http://:/varz - 默认值: 60 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 触发缓存容量自动缩容的磁盘使用率下限(百分比)。当磁盘使用率在此值以下持续时间超过 `datacache_disk_idle_seconds_for_expansion` 且为 Data Cache 分配的空间被完全利用时,系统将通过增加上限自动扩展缓存容量。从 v4.0 开始,此项从 `datacache_disk_low_level` 重命名为 `disk_low_level`。 +- 是否可变: 是 +- 描述: 触发缓存容量自动缩容的磁盘使用率下限(百分比)。当磁盘使用率在此值以下持续时间超过 `datacache_disk_idle_seconds_for_expansion` 指定的期限,并且 Data Cache 分配的空间已完全利用时,系统将通过增加上限自动扩容缓存容量。此项从 v4.0 开始重命名为 `disk_low_level`。 - 引入版本: v3.3.0 ##### disk_safe_level @@ -3159,8 +3159,8 @@ curl http://:/varz - 默认值: 80 - 类型: Int - 单位: - -- 可变: 是 -- 描述: Data Cache 的磁盘使用率安全级别(百分比)。当 Data Cache 执行自动伸缩时,系统会调整缓存容量,目标是使磁盘使用率尽可能接近此值。从 v3.4.0 开始,默认值从 `70` 更改为 `80`。从 v4.0 开始,此项从 `datacache_disk_safe_level` 重命名为 `disk_safe_level`。 +- 是否可变: 是 +- 描述: Data Cache 的磁盘使用安全级别(百分比)。当 Data Cache 执行自动伸缩时,系统会调整缓存容量,目标是使磁盘使用率尽可能接近此值。从 v3.4.0 开始,默认值从 `70` 更改为 `80`。此项从 v4.0 开始重命名为 `disk_safe_level`。 - 引入版本: v3.3.0 ##### enable_connector_sink_spill @@ -3168,8 +3168,8 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 是否为写入外部表启用溢出 (Spilling)。启用此功能可以防止在内存不足时写入外部表导致生成大量小文件。目前,此功能仅支持写入 Iceberg 表。 +- 是否可变: 是 +- 描述: 是否为写入外部表启用溢出 (Spilling)。启用此功能可防止在内存不足时因写入外部表而生成大量小文件。目前,此功能仅支持写入 Iceberg 表。 - 引入版本: v4.0.0 ##### enable_datacache_disk_auto_adjust @@ -3177,8 +3177,8 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 是否启用 Data Cache 磁盘容量的自动伸缩。启用后,系统会根据当前磁盘使用率动态调整缓存容量。从 v4.0 开始,此项从 `datacache_auto_adjust_enable` 重命名为 `enable_datacache_disk_auto_adjust`。 +- 是否可变: 是 +- 描述: 是否启用 Data Cache 磁盘容量的自动伸缩。启用后,系统会根据当前磁盘使用率动态调整缓存容量。此项从 v4.0 开始重命名为 `enable_datacache_disk_auto_adjust`。 - 
引入版本: v3.3.0 ##### jdbc_connection_idle_timeout_ms @@ -3186,8 +3186,8 @@ curl http://:/varz - 默认值: 600000 - 类型: Int - 单位: 毫秒 -- 可变: 否 -- 描述: JDBC 连接池中空闲连接过期的时间长度。如果 JDBC 连接池中的连接空闲时间超过此值,连接池将关闭超过配置项 `jdbc_minimum_idle_connections` 指定数量的空闲连接。 +- 是否可变: 否 +- 描述: JDBC 连接池中空闲连接过期的时间长度。如果 JDBC 连接池中的连接空闲时间超过此值,连接池将关闭超出 `jdbc_minimum_idle_connections` 配置项指定数量的空闲连接。 - 引入版本: - ##### jdbc_connection_pool_size @@ -3195,8 +3195,8 @@ curl http://:/varz - 默认值: 8 - 类型: Int - 单位: - -- 可变: 否 -- 描述: JDBC 连接池大小。在每个 BE 节点上,使用相同 `jdbc_url` 访问外部表的查询共享同一个连接池。 +- 是否可变: 否 +- 描述: JDBC 连接池大小。在每个 BE 节点上,访问具有相同 `jdbc_url` 的外部表的查询共享同一个连接池。 - 引入版本: - ##### jdbc_minimum_idle_connections @@ -3204,8 +3204,8 @@ curl http://:/varz - 默认值: 1 - 类型: Int - 单位: - -- 可变: 否 -- 描述: JDBC 连接池中的最小空闲连接数。 +- 是否可变: 否 +- 描述: JDBC 连接池中最小空闲连接数。 - 引入版本: - ##### lake_clear_corrupted_cache_data @@ -3213,8 +3213,8 @@ curl http://:/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 共享数据集群中是否允许系统清除损坏的数据缓存。 +- 是否可变: 是 +- 描述: 是否允许系统清除共享数据集群中损坏的数据缓存。 - 引入版本: v3.4 ##### lake_clear_corrupted_cache_meta @@ -3222,8 +3222,8 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 共享数据集群中是否允许系统清除损坏的元数据缓存。 +- 是否可变: 是 +- 描述: 是否允许系统清除共享数据集群中损坏的元数据缓存。 - 引入版本: v3.3 ##### lake_enable_vertical_compaction_fill_data_cache @@ -3231,8 +3231,8 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 共享数据集群中是否允许垂直 Compaction 任务将数据缓存到本地磁盘。 +- 是否可变: 是 +- 描述: 是否允许垂直 Compaction 任务在共享数据集群中将数据缓存到本地磁盘。 - 引入版本: v3.1.7, v3.2.3 ##### lake_replication_read_buffer_size @@ -3240,8 +3240,8 @@ curl http://:/varz - 默认值: 16777216 - 类型: Long - 单位: 字节 -- 可变: 是 -- 描述: 在 Lake Replication 期间下载 Lake Segment 文件时使用的读取缓冲区大小。此值确定读取远程文件的每个读取分配;实现使用此设置和 1 MB 最小值的较大者。较大的值会减少读取调用次数并可以提高吞吐量,但会增加每个并发下载使用的内存;较小的值会降低内存使用,但会增加 I/O 调用次数。根据网络带宽、存储 I/O 特性以及并行复制线程数进行调整。 +- 是否可变: 是 +- 描述: 在 Lake Replication 期间下载 Lake Segment 文件时使用的读取缓冲区大小。此值决定了读取远程文件的每次读取分配;实现使用此设置和 1 MB 最小值的较大者。较大的值可减少读取调用次数并提高吞吐量,但会增加每个并发下载使用的内存;较小的值可降低内存使用,但会增加 I/O 调用次数。根据网络带宽、存储 I/O 特性和并行复制线程数进行调整。 - 引入版本: - ##### lake_service_max_concurrency @@ -3249,8 +3249,8 @@ curl http://:/varz - 默认值: 0 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 共享数据集群中 RPC 请求的最大并发数。当达到此阈值时,传入请求将被拒绝。当此项设置为 `0` 时,不限制并发数。 +- 是否可变: 否 +- 描述: 共享数据集群中 RPC 请求的最大并发数。当达到此阈值时,传入请求将被拒绝。当此项设置为 `0` 时,不对并发施加限制。 - 引入版本: - ##### max_hdfs_scanner_num @@ -3258,8 +3258,8 @@ curl http://:/varz - 默认值: 50 - 类型: Int - 单位: - -- 可变: 否 -- 描述: 限制 ConnectorScanNode 可以同时运行的最大连接器 (HDFS/远程) 扫描器数量。在扫描启动期间,节点计算估计并发性(基于内存、块大小和 scanner_row_num),然后使用此值作为上限,以确定要保留的扫描器和块数量以及要启动的扫描器线程数量。在运行时调度待处理扫描器时也会参考此值(以避免过度订阅),以及在考虑文件句柄限制时决定可以重新提交多少待处理扫描器。降低此值可减少线程、内存和打开文件压力,但可能影响吞吐量;增加此值会提高并发性和资源使用率。 +- 是否可变: 否 +- 描述: 限制 ConnectorScanNode 可拥有的并发运行连接器 (HDFS/远程) 扫描器的最大数量。在扫描启动期间,节点计算估计并发性(基于内存、块大小和 scanner_row_num),然后用此值限制它,以确定要保留多少扫描器和块以及要启动多少扫描器线程。在运行时调度挂起扫描器时也会查询此值(以避免超额订阅),并在考虑文件句柄限制时决定可以重新提交多少挂起扫描器。降低此值可减少线程、内存和打开文件压力,但可能会降低吞吐量;增加此值可提高并发性和资源使用。 - 引入版本: v3.2.0 ##### query_max_memory_limit_percent @@ -3267,7 +3267,7 @@ curl http://:/varz - 默认值: 90 - 类型: Int - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: Query Pool 可使用的最大内存。它表示为进程内存限制的百分比。 - 引入版本: v3.1.0 @@ -3276,8 +3276,8 @@ curl http://:/varz - 默认值: 1073741824 - 类型: Int64 - 单位: - -- 可变: 否 -- 描述: 这是 RocksDB 中元数据写入缓冲区的最大大小。默认值为 1GB。 +- 是否可变: 否 +- 描述: RocksDB 中元数据的写入缓冲区的最大大小。默认值为 1GB。 - 引入版本: v3.5.0 ##### rocksdb_write_buffer_memory_percent @@ -3285,8 +3285,8 @@ curl http://:/varz - 默认值: 5 - 类型: Int64 - 单位: - -- 可变: 否 -- 描述: 这是 RocksDB 中元数据写入缓冲区的内存百分比。默认值为系统内存的 5%。但是,除此之外,写入缓冲区内存的最终计算大小不会小于 64MB 也不会超过 1G (`rocksdb_max_write_buffer_memory_bytes`)。 +- 是否可变: 否 
+- 描述: RocksDB 中元数据的写入缓冲区内存百分比。默认值为系统内存的 5%。但是,除此之外,写入缓冲区内存的最终计算大小不会小于 64MB 也不会超过 1G (`rocksdb_max_write_buffer_memory_bytes`)。 - 引入版本: v3.5.0 ### 其他 @@ -3296,7 +3296,7 @@ curl http://:/varz - 默认值: 0 - 类型: Int - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: 资源组 `default_mv_wg` 中物化视图刷新任务的最大并发数(每个 BE 节点)。默认值 `0` 表示没有限制。 - 引入版本: v3.1 @@ -3305,7 +3305,7 @@ curl http://:/varz - 默认值: 1 - 类型: Int - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: 资源组 `default_mv_wg` 中物化视图刷新任务可使用的最大 CPU 核心数(每个 BE 节点)。 - 引入版本: v3.1 @@ -3314,7 +3314,7 @@ curl http://:/varz - 默认值: 0.8 - 类型: Double - 单位: -- 可变: 是 +- 是否可变: 是 - 描述: 资源组 `default_mv_wg` 中物化视图刷新任务可使用的最大内存比例(每个 BE 节点)。默认值表示内存的 80%。 - 引入版本: v3.1 @@ -3323,7 +3323,7 @@ curl http://:/varz - 默认值: 0.8 - 类型: Double - 单位: - -- 可变: 是 +- 是否可变: 是 - 描述: 资源组 `default_mv_wg` 中物化视图刷新任务触发中间结果溢出前的内存使用阈值。默认值表示内存的 80%。 - 引入版本: v3.1 @@ -3332,9 +3332,9 @@ curl http://:/varz - 默认值: false - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 对于 `error_urls` 调试,是否允许操作员根据其环境需求选择使用 FE 心跳中的原始主机名,或强制解析为 IP 地址。 - - `true`: 将主机名解析为 IP。 +- 是否可变: 是 +- 描述: 对于 `error_urls` 调试,是否允许操作符根据其环境需求选择使用 FE 心跳中的原始主机名,或强制解析为 IP 地址。 + - `true`: 将主机名解析为 IP 地址。 - `false` (默认): 在错误 URL 中保留原始主机名。 - 引入版本: v4.0.1 @@ -3343,8 +3343,8 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 启用时,被归类为可重试的 Tablet 应用失败(例如瞬态内存限制错误)将被重新调度重试,而不是立即将 Tablet 标记为错误。TabletUpdates 中的重试路径使用 `retry_apply_interval_second` 乘以当前失败计数并限制在 600 秒最大值来调度下一次尝试,因此退避随着连续失败而增长。明确不可重试的错误(例如损坏)会绕过重试,并导致应用进程立即进入错误状态。重试将持续到达到整体超时/终止条件,之后应用将进入错误状态。关闭此功能会禁用失败应用任务的自动重新调度,并导致失败的应用在没有重试的情况下转换为错误状态。 +- 是否可变: 是 +- 描述: 启用时,被分类为可重试的 Tablet 应用失败(例如瞬态内存限制错误)将重新调度重试,而不是立即将 Tablet 标记为错误。TabletUpdates 中的重试路径使用 `retry_apply_interval_second` 乘以当前失败计数并限制在 600 秒最大值来调度下一次尝试,因此退避时间随连续失败而增长。明确不可重试的错误(例如损坏)会绕过重试,并导致应用进程立即进入错误状态。重试会持续进行,直到达到总体超时/终止条件,之后应用将进入错误状态。关闭此功能将禁用失败应用任务的自动重新调度,并导致失败的应用在不重试的情况下转换为错误状态。 - 引入版本: v3.2.9 ##### enable_token_check @@ -3352,17 +3352,17 @@ curl http://:/varz - 默认值: true - 类型: Boolean - 单位: - -- 可变: 是 -- 描述: 一个布尔值,控制是否启用令牌检查。`true` 表示启用令牌检查,`false` 表示禁用。 +- 是否可变: 是 +- 描述: 一个布尔值,控制是否启用 token 检查。`true` 表示启用 token 检查,`false` 表示禁用。 - 引入版本: - ##### es_scroll_keepalive - 默认值: 5m - 类型: String -- 单位: 分钟(带后缀的字符串,例如 "5m") -- 可变: 否 -- 描述: 发送到 Elasticsearch 以用于滚动搜索上下文的保持活动时长。该值在构建初始滚动 URL (`?scroll=`) 和发送后续滚动请求(通过 ESScrollQueryBuilder)时按原样使用(例如 "5m")。这控制了 ES 搜索上下文在 ES 端进行垃圾回收之前保留多长时间;设置更长会使滚动上下文保持活动更长时间,但会延长 ES 集群上的资源使用。该值在启动时由 ES 扫描读取器读取,不能在运行时更改。 +- 单位: 分钟 (带后缀的字符串,例如 "5m") +- 是否可变: 否 +- 描述: 发送给 Elasticsearch 用于滚动搜索上下文的保活持续时间。该值在构建初始滚动 URL(`?scroll=`)和发送后续滚动请求(通过 ESScrollQueryBuilder)时按原样使用(例如 "5m")。这控制了 ES 搜索上下文在 ES 端垃圾回收之前保留多长时间;将其设置得更长会使滚动上下文保持活动更长时间,但会延长 ES 集群上的资源使用。该值在启动时由 ES 扫描读取器读取,不能在运行时更改。 - 引入版本: v3.2.0 ##### load_replica_status_check_interval_ms_on_failure @@ -3370,8 +3370,8 @@ curl http://:/varz - 默认值: 2000 - 类型: Int - 单位: 毫秒 -- 可变: 是 -- 描述: 如果上次检查 RPC 失败,辅助副本检查主副本状态的间隔。 +- 是否可变: 是 +- 描述: 如果上次检查 RPC 失败,次要副本检查其在主要副本上的状态的时间间隔。 - 引入版本: v3.5.1 ##### load_replica_status_check_interval_ms_on_success @@ -3379,8 +3379,8 @@ curl http://:/varz - 默认值: 15000 - 类型: Int - 单位: 毫秒 -- 可变: 是 -- 描述: 如果上次检查 RPC 成功,辅助副本检查主副本状态的间隔。 +- 是否可变: 是 +- 描述: 如果上次检查 RPC 成功,次要副本检查其在主要副本上的状态的时间间隔。 - 引入版本: v3.5.1 ##### max_length_for_bitmap_function @@ -3388,7 +3388,7 @@ curl http://:/varz - 默认值: 1000000 - 类型: Int - 单位: 字节 -- 可变: 否 +- 是否可变: 否 - 描述: Bitmap 函数输入值的最大长度。 - 引入版本: - @@ -3397,7 +3397,7 @@ curl http://:/varz - 默认值: 200000 - 类型: Int - 单位: 字节 -- 可变: 否 +- 是否可变: 否 - 描述: `to_base64()` 函数输入值的最大长度。 - 引入版本: - @@ -3406,8 +3406,8 @@ curl http://:/varz - 默认值: 75 - 类型: Long - 单位: 百分比 -- 可变: 是 -- 描述: 
高水位内存阈值,表示为进程内存限制的百分比。当总内存消耗超过此百分比时,BE 开始逐渐释放内存(目前通过逐出数据缓存和更新缓存)以缓解压力。监视器使用此值计算 `memory_high = mem_limit * memory_high_level / 100`,如果消耗 `>` `memory_high`,则在 GC 顾问的指导下执行受控逐出;如果消耗超过 `memory_urgent_level`(一个单独的配置),则会发生更积极的即时减少。此值还会用于在超过阈值时禁用某些内存密集型操作(例如主键预加载)。必须满足与 `memory_urgent_level` 的验证(`memory_urgent_level > memory_high_level`,`memory_high_level >= 1`,`memory_urgent_level <= 100`)。 +- 是否可变: 是 +- 描述: 高水位内存阈值,表示为进程内存限制的百分比。当总内存消耗超过此百分比时,BE 开始逐渐释放内存(目前通过逐出数据缓存和更新缓存)以缓解压力。监视器使用此值计算 `memory_high = mem_limit * memory_high_level / 100`,如果消耗 `>` memory_high,则执行由 GC Advisor 指导的受控逐出;如果消耗超过 `memory_urgent_level`(一个单独的配置),则会发生更激进的即时缩减。此值也会被查询,以在阈值超过时禁用某些内存密集型操作(例如主键预加载)。必须满足 `memory_urgent_level` 的验证(`memory_urgent_level` > `memory_high_level`,`memory_high_level` >= 1,`memory_urgent_level` <= 100)。 - 引入版本: v3.2.0 ##### report_exec_rpc_request_retry_num @@ -3415,8 +3415,8 @@ curl http://:/varz - 默认值: 10 - 类型: Int - 单位: - -- 可变: 是 -- 描述: 向 FE 报告 exec rpc 请求的 rpc 请求重试次数。默认值为 10,这意味着如果 rpc 请求失败,只有在是 Fragment Instance 完成 rpc 时,它才会被重试 10 次。报告 exec rpc 请求对于加载作业很重要,如果一个 Fragment Instance 完成报告失败,加载作业将挂起直到超时。 +- 是否可变: 是 +- 描述: 报告 exec RPC 请求到 FE 的 RPC 请求重试次数。默认值为 10,这意味着如果 RPC 请求失败,它将重试 10 次,仅当它是片段实例完成 RPC 时。报告 exec RPC 请求对于加载作业很重要,如果一个片段实例完成报告失败,加载作业将挂起直到超时。 - 引入版本: - ##### sleep_one_second @@ -3424,8 +3424,8 @@ curl http://:/varz - 默认值: 1 - 类型: Int - 单位: 秒 -- 可变: 否 -- 描述: BE 代理工作线程使用的一个小的全局休眠间隔(秒),当主地址/心跳尚不可用或需要短时间重试/退避时作为一秒的暂停。在代码库中,它被几个报告工作池(例如 ReportDiskStateTaskWorkerPool、ReportOlapTableTaskWorkerPool、ReportWorkgroupTaskWorkerPool)引用,以避免忙等待并减少重试时的 CPU 消耗。增加此值会降低重试频率和对主可用性的响应速度;减少它会增加轮询速率和 CPU 使用率。仅在了解响应性与资源使用权衡的情况下进行调整。 +- 是否可变: 否 +- 描述: BE 代理工作线程使用的全局小睡眠间隔(秒),用于在主地址/心跳尚不可用或需要短暂重试/退避时暂停一秒。在代码库中,它被多个报告工作池引用(例如 ReportDiskStateTaskWorkerPool、ReportOlapTableTaskWorkerPool、ReportWorkgroupTaskWorkerPool),以避免在重试时忙等待并减少 CPU 消耗。增加此值会减慢重试频率和对主节点可用性的响应速度;减少它会增加轮询速率和 CPU 使用率。仅在了解响应速度和资源使用权衡后进行调整。 - 引入版本: v3.2.0 ##### small_file_dir @@ -3433,7 +3433,7 @@ curl http://:/varz - 默认值: `${STARROCKS_HOME}/lib/small_file/` - 类型: String - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: 用于存储文件管理器下载的文件的目录。 - 引入版本: - @@ -3442,8 +3442,8 @@ curl http://:/varz - 默认值: 4194304 - 类型: Int - 单位: 字节 -- 可变: 是 -- 描述: 文件拷贝操作在将快照文件上传到远程存储(Broker 或直接 FileSystem)时使用的缓冲区大小(字节)。在上传路径 (snapshot_loader.cpp) 中,此值作为每个上传流的读/写块大小传递给 fs::copy。默认值为 4 MiB。增加此值可以提高高延迟或高带宽链接的吞吐量,但会增加每个并发上传的内存使用;减少此值可降低每个流的内存,但可能降低传输效率。与 `upload_worker_count` 和总体可用内存一起调整。 +- 是否可变: 是 +- 描述: 文件复制操作在将快照文件上传到远程存储(Broker 或直接 FileSystem)时使用的缓冲区大小(字节)。在上传路径 (snapshot_loader.cpp) 中,此值作为每个上传流的读/写块大小传递给 fs::copy。默认值为 4 MiB。增加此值可以提高高延迟或高带宽链接上的吞吐量,但会增加每个并发上传的内存使用;减少它会降低每个流的内存,但可能会降低传输效率。与 `upload_worker_count` 和总可用内存一起调整。 - 引入版本: v3.2.13 ##### user_function_dir @@ -3451,7 +3451,7 @@ curl http://:/varz - 默认值: `${STARROCKS_HOME}/lib/udf` - 类型: String - 单位: - -- 可变: 否 +- 是否可变: 否 - 描述: 用于存储用户定义函数 (UDF) 的目录。 - 引入版本: - @@ -3460,14 +3460,14 @@ curl http://:/varz - 默认值: 1048576 (1 MB) - 类型: long - 单位: 字节 -- 可变: 否 -- 描述: 从 INFO 日志文件读取并在 BE 调试 Web 服务器的日志页面上显示的最大字节数。处理程序使用此值计算查找偏移量(显示最后 N 字节),以避免读取或提供非常大的日志文件。如果日志文件小于此值,则显示整个文件。注意:在当前实现中,读取和提供 INFO 日志的代码被注释掉了,处理程序报告 INFO 日志文件无法打开,因此除非启用日志提供代码,否则此参数可能无效。 +- 是否可变: 否 +- 描述: 从 INFO 日志文件中读取并在 BE 调试 Web 服务器的日志页面上显示的最大字节数。处理程序使用此值计算查找偏移量(显示最后 N 个字节),以避免读取或提供非常大的日志文件。如果日志文件小于此值,则显示整个文件。注意:在当前的实现中,读取和提供 INFO 日志的代码被注释掉了,处理程序报告 INFO 日志文件无法打开,因此除非启用日志提供代码,否则此参数可能无效。 - 引入版本: v3.2.0 -### 已删除参数 +### 已移除参数 ##### enable_bit_unpack_simd -- 状态: 已删除 -- 描述: 此参数已被删除。Bit-unpack SIMD 选择现在在编译时处理 (AVX2/BMI2),并自动回退到默认实现。 -- 删除版本: - +- 状态: 已移除 +- 描述: 此参数已被移除。Bit-unpack SIMD 选择现在在编译时处理 
(AVX2/BMI2),并自动回退到默认实现。 +- 移除版本: - diff --git a/docs/zh/administration/management/Backup_and_restore.md b/docs/zh/administration/management/Backup_and_restore.md index 5fc65b6..6ce535b 100644 --- a/docs/zh/administration/management/Backup_and_restore.md +++ b/docs/zh/administration/management/Backup_and_restore.md @@ -2,15 +2,15 @@ displayed_sidebar: docs --- -# 备份与恢复数据 +# 备份和恢复数据 本文介绍如何在 StarRocks 中备份和恢复数据,或将数据迁移到新的 StarRocks 集群。 -StarRocks 支持将数据备份为快照并存储到远端存储系统,然后将数据恢复到任意 StarRocks 集群。 +StarRocks 支持将数据以快照形式备份到远程存储系统,并可将数据恢复到任意 StarRocks 集群。 -从 v3.4.0 版本起,StarRocks 增强了 BACKUP 和 RESTORE 的功能,支持更多对象并重构了语法以提高灵活性。 +从 v3.4.0 版本开始,StarRocks 增强了 BACKUP 和 RESTORE 的功能,支持更多对象并重构了语法以提供更好的灵活性。 -StarRocks 支持以下远端存储系统: +StarRocks 支持以下远程存储系统: - Apache™ Hadoop® (HDFS) 集群 - AWS S3 @@ -20,136 +20,136 @@ StarRocks 支持备份以下对象: - 内部数据库、表(所有类型和分区策略)以及分区 -- 外部 Catalog 的元数据(v3.4.0 版本起支持) +- 外部 Catalog 的元数据(v3.4.0 版本及以后支持) - 同步物化视图和异步物化视图 -- 逻辑视图(v3.4.0 版本起支持) -- 用户定义函数 UDFs(v3.4.0 版本起支持) +- 逻辑视图(v3.4.0 版本及以后支持) +- 用户自定义函数(v3.4.0 版本及以后支持) -> **NOTE** +> **注意** > -> 共享数据集群模式的 StarRocks 集群不支持数据 BACKUP 和 RESTORE。 +> 共享数据 StarRocks 集群不支持数据 BACKUP 和 RESTORE。 ## 创建仓库 -在备份数据之前,您需要创建一个仓库,用于在远端存储系统中存储数据快照。您可以在 StarRocks 集群中创建多个仓库。有关详细说明,请参阅 [CREATE REPOSITORY](../../sql-reference/sql-statements/backup_restore/CREATE_REPOSITORY.md)。 +在备份数据之前,您需要创建一个仓库,用于在远程存储系统中存储数据快照。您可以在 StarRocks 集群中创建多个仓库。有关详细说明,请参阅 [CREATE REPOSITORY](../../sql-reference/sql-statements/backup_restore/CREATE_REPOSITORY.md)。 - 在 HDFS 中创建仓库 -以下示例在 HDFS 集群中创建名为 `test_repo` 的仓库。 +以下示例在 HDFS 集群中创建了一个名为 `test_repo` 的仓库。 ```SQL CREATE REPOSITORY test_repo WITH BROKER ON LOCATION "hdfs://:/repo_dir/backup" PROPERTIES( - "username" = "", - "password" = "" + "username" = "", -- 用户名 + "password" = "" -- 密码 ); ``` - 在 AWS S3 中创建仓库 - 您可以选择 IAM user-based credential (Access Key and Secret Key)、Instance Profile 或 Assumed Role 作为访问 AWS S3 的凭证方法。 + 您可以选择基于 IAM 用户的凭证(Access Key 和 Secret Key)、Instance Profile 或 Assumed Role 作为访问 AWS S3 的凭证方式。 - - 以下示例使用 IAM user-based credentials 凭证方法,在 AWS S3 存储桶 `bucket_s3` 中创建名为 `test_repo` 的仓库。 + - 以下示例在 AWS S3 存储桶 `bucket_s3` 中创建了一个名为 `test_repo` 的仓库,使用基于 IAM 用户的凭证作为凭证方式。 ```SQL CREATE REPOSITORY test_repo WITH BROKER ON LOCATION "s3a://bucket_s3/backup" PROPERTIES( - "aws.s3.access_key" = "XXXXXXXXXXXXXXXXX", - "aws.s3.secret_key" = "yyyyyyyyyyyyyyyyyyyyyyyy", - "aws.s3.region" = "us-east-1" + "aws.s3.access_key" = "XXXXXXXXXXXXXXXXX", -- Access Key + "aws.s3.secret_key" = "yyyyyyyyyyyyyyyyyyyyyyyy", -- Secret Key + "aws.s3.region" = "us-east-1" -- 地域 ); ``` - - 以下示例使用 Instance Profile 凭证方法,在 AWS S3 存储桶 `bucket_s3` 中创建名为 `test_repo` 的仓库。 + - 以下示例在 AWS S3 存储桶 `bucket_s3` 中创建了一个名为 `test_repo` 的仓库,使用 Instance Profile 作为凭证方式。 ```SQL CREATE REPOSITORY test_repo WITH BROKER ON LOCATION "s3a://bucket_s3/backup" PROPERTIES( - "aws.s3.use_instance_profile" = "true", - "aws.s3.region" = "us-east-1" + "aws.s3.use_instance_profile" = "true", -- 是否使用 Instance Profile + "aws.s3.region" = "us-east-1" -- 地域 ); ``` - - 以下示例使用 Assumed Role 凭证方法,在 AWS S3 存储桶 `bucket_s3` 中创建名为 `test_repo` 的仓库。 + - 以下示例在 AWS S3 存储桶 `bucket_s3` 中创建了一个名为 `test_repo` 的仓库,使用 Assumed Role 作为凭证方式。 ```SQL CREATE REPOSITORY test_repo WITH BROKER ON LOCATION "s3a://bucket_s3/backup" PROPERTIES( - "aws.s3.use_instance_profile" = "true", - "aws.s3.iam_role_arn" = "arn:aws:iam::xxxxxxxxxx:role/yyyyyyyy", - "aws.s3.region" = "us-east-1" + "aws.s3.use_instance_profile" = "true", -- 是否使用 Instance Profile + "aws.s3.iam_role_arn" = 
"arn:aws:iam::xxxxxxxxxx:role/yyyyyyyy", -- IAM Role ARN + "aws.s3.region" = "us-east-1" -- 地域 ); ``` -> **NOTE** > -> StarRocks 仅支持根据 S3A 协议在 AWS S3 中创建仓库。因此,当您在 AWS S3 中创建仓库时,必须将 `ON LOCATION` 中作为仓库位置传入的 S3 URI 中的 `s3://` 替换为 `s3a://`。 +> **注意** > +> StarRocks 仅支持按照 S3A 协议在 AWS S3 中创建仓库。因此,当您在 AWS S3 中创建仓库时,必须将 `ON LOCATION` 中作为仓库位置传入的 S3 URI 中的 `s3://` 替换为 `s3a://`。 - 在 Google GCS 中创建仓库 -以下示例在 Google GCS 存储桶 `bucket_gcs` 中创建名为 `test_repo` 的仓库。 +以下示例在 Google GCS 存储桶 `bucket_gcs` 中创建了一个名为 `test_repo` 的仓库。 ```SQL CREATE REPOSITORY test_repo WITH BROKER ON LOCATION "s3a://bucket_gcs/backup" PROPERTIES( - "fs.s3a.access.key" = "xxxxxxxxxxxxxxxxxxxx", - "fs.s3a.secret.key" = "yyyyyyyyyyyyyyyyyyyy", - "fs.s3a.endpoint" = "storage.googleapis.com" + "fs.s3a.access.key" = "xxxxxxxxxxxxxxxxxxxx", -- Access Key + "fs.s3a.secret.key" = "yyyyyyyyyyyyyyyyyyyy", -- Secret Key + "fs.s3a.endpoint" = "storage.googleapis.com" -- Endpoint ); ``` -> **NOTE** > -> - StarRocks 仅支持根据 S3A 协议在 Google GCS 中创建仓库。因此,当您在 Google GCS 中创建仓库时,必须将 `ON LOCATION` 中作为仓库位置传入的 GCS URI 中的前缀替换为 `s3a://`。 -> - 端点地址中不要指定 `https`。 +> **注意** > +> - StarRocks 仅支持按照 S3A 协议在 Google GCS 中创建仓库。因此,当您在 Google GCS 中创建仓库时,必须将 `ON LOCATION` 中作为仓库位置传入的 GCS URI 中的前缀替换为 `s3a://`。 +> - 请勿在 endpoint 地址中指定 `https`。 - 在 MinIO 中创建仓库 -以下示例在 MinIO 存储桶 `bucket_minio` 中创建名为 `test_repo` 的仓库。 +以下示例在 MinIO 存储桶 `bucket_minio` 中创建了一个名为 `test_repo` 的仓库。 ```SQL CREATE REPOSITORY test_repo WITH BROKER ON LOCATION "s3://bucket_minio/backup" PROPERTIES( - "aws.s3.access_key" = "XXXXXXXXXXXXXXXXX", - "aws.s3.secret_key" = "yyyyyyyyyyyyyyyyy", - "aws.s3.endpoint" = "http://minio:9000" + "aws.s3.access_key" = "XXXXXXXXXXXXXXXXX", -- Access Key + "aws.s3.secret_key" = "yyyyyyyyyyyyyyyyy", -- Secret Key + "aws.s3.endpoint" = "http://minio:9000" -- Endpoint ); ``` -仓库创建完成后,您可以通过 [SHOW REPOSITORIES](../../sql-reference/sql-statements/backup_restore/SHOW_REPOSITORIES.md) 命令查看仓库。数据恢复完成后,您可以使用 [DROP REPOSITORY](../../sql-reference/sql-statements/backup_restore/DROP_REPOSITORY.md) 命令删除 StarRocks 中的仓库。但是,存储在远端存储系统中的数据快照无法通过 StarRocks 删除。您需要手动在远端存储系统中删除它们。 +仓库创建完成后,您可以通过 [SHOW REPOSITORIES](../../sql-reference/sql-statements/backup_restore/SHOW_REPOSITORIES.md) 查看仓库。数据恢复后,您可以使用 [DROP REPOSITORY](../../sql-reference/sql-statements/backup_restore/DROP_REPOSITORY.md) 在 StarRocks 中删除仓库。但是,备份在远程存储系统中的数据快照无法通过 StarRocks 删除。您需要手动在远程存储系统中删除它们。 ## 备份数据 -仓库创建完成后,您需要创建一个数据快照并将其备份到远端仓库。有关详细说明,请参阅 [BACKUP](../../sql-reference/sql-statements/backup_restore/BACKUP.md)。BACKUP 是一种异步操作。您可以使用 [SHOW BACKUP](../../sql-reference/sql-statements/backup_restore/SHOW_BACKUP.md) 命令检查 BACKUP 作业的状态,或使用 [CANCEL BACKUP](../../sql-reference/sql-statements/backup_restore/CANCEL_BACKUP.md) 命令取消 BACKUP 作业。 +仓库创建完成后,您需要在远程仓库中创建并备份数据快照。有关详细说明,请参阅 [BACKUP](../../sql-reference/sql-statements/backup_restore/BACKUP.md)。BACKUP 是一个异步操作。您可以使用 [SHOW BACKUP](../../sql-reference/sql-statements/backup_restore/SHOW_BACKUP.md) 查看 BACKUP 作业的状态,或使用 [CANCEL BACKUP](../../sql-reference/sql-statements/backup_restore/CANCEL_BACKUP.md) 取消 BACKUP 作业。 -StarRocks 支持在数据库、表或分区级别进行 FULL 备份。 +StarRocks 支持在数据库、表或分区粒度上进行 FULL 备份。 -如果您的表存储了大量数据,建议您按分区备份和恢复数据。这样,您可以减少作业失败时的重试成本。如果您需要定期备份增量数据,可以为表配置一个 [分区方案](../../table_design/data_distribution/Data_distribution.md#partitioning),每次只备份新分区。 +如果您的表中存储了大量数据,建议您按分区进行数据备份和恢复。这样可以减少作业失败时的重试成本。如果您需要定期备份增量数据,可以为表配置一个 [分区方案](../../table_design/data_distribution/Data_distribution.md#partitioning),每次只备份新分区。 ### 备份数据库 对数据库执行完整 BACKUP 操作将备份数据库中的所有表、同步和异步物化视图、逻辑视图和 UDF。 -以下示例将数据库 `sr_hub` 备份到快照 `sr_hub_backup` 
中,并将快照上传到仓库 `test_repo`。 +以下示例将数据库 `sr_hub` 备份到快照 `sr_hub_backup` 中,并将该快照上传到仓库 `test_repo`。 ```SQL --- Supported from v3.4.0 onwards. (v3.4.0 版本起支持) +-- v3.4.0 及以后版本支持。 BACKUP DATABASE sr_hub SNAPSHOT sr_hub_backup TO test_repo; --- Compatible with the syntax in earlier versions. (兼容早期版本语法) +-- 兼容早期版本语法。 BACKUP SNAPSHOT sr_hub.sr_hub_backup TO test_repo; ``` @@ -158,21 +158,21 @@ TO test_repo; StarRocks 支持备份和恢复所有类型和分区策略的表。对表执行完整 BACKUP 操作将备份该表及其上构建的同步物化视图。 -以下示例将数据库 `sr_hub` 中的表 `sr_member` 备份到快照 `sr_member_backup` 中,并将快照上传到仓库 `test_repo`。 +以下示例将数据库 `sr_hub` 中的表 `sr_member` 备份到快照 `sr_member_backup` 中,并将该快照上传到仓库 `test_repo`。 ```SQL --- Supported from v3.4.0 onwards. (v3.4.0 版本起支持) +-- v3.4.0 及以后版本支持。 BACKUP DATABASE sr_hub SNAPSHOT sr_member_backup TO test_repo ON (TABLE sr_member); --- Compatible with the syntax in earlier versions. (兼容早期版本语法) +-- 兼容早期版本语法。 BACKUP SNAPSHOT sr_hub.sr_member_backup TO test_repo ON (sr_member); ``` -以下示例将数据库 `sr_hub` 中的两张表 `sr_member` 和 `sr_pmc` 备份到快照 `sr_core_backup` 中,并将快照上传到仓库 `test_repo`。 +以下示例将数据库 `sr_hub` 中的两张表 `sr_member` 和 `sr_pmc` 备份到快照 `sr_core_backup` 中,并将该快照上传到仓库 `test_repo`。 ```SQL BACKUP DATABASE sr_hub SNAPSHOT sr_core_backup @@ -180,7 +180,7 @@ TO test_repo ON (TABLE sr_member, TABLE sr_pmc); ``` -以下示例将数据库 `sr_hub` 中的所有表备份到快照 `sr_all_backup` 中,并将快照上传到仓库 `test_repo`。 +以下示例将数据库 `sr_hub` 中的所有表备份到快照 `sr_all_backup` 中,并将该快照上传到仓库 `test_repo`。 ```SQL BACKUP DATABASE sr_hub SNAPSHOT sr_all_backup @@ -190,29 +190,29 @@ ON (ALL TABLES); ### 备份分区 -以下示例将数据库 `sr_hub` 中表 `sr_member` 的分区 `p1` 备份到快照 `sr_par_backup` 中,并将快照上传到仓库 `test_repo`。 +以下示例将数据库 `sr_hub` 中表 `sr_member` 的分区 `p1` 备份到快照 `sr_par_backup` 中,并将该快照上传到仓库 `test_repo`。 ```SQL --- Supported from v3.4.0 onwards. (v3.4.0 版本起支持) +-- v3.4.0 及以后版本支持。 BACKUP DATABASE sr_hub SNAPSHOT sr_par_backup TO test_repo ON (TABLE sr_member PARTITION (p1)); --- Compatible with the syntax in earlier versions. 
(兼容早期版本语法) +-- 兼容早期版本语法。 BACKUP SNAPSHOT sr_hub.sr_par_backup TO test_repo ON (sr_member PARTITION (p1)); ``` -您可以指定多个分区名称,用逗号 (`,`) 分隔,以批量备份分区。 +您可以指定多个分区名(用逗号 (`,`) 分隔)来批量备份分区。 ### 备份物化视图 -您无需手动备份同步物化视图,因为它们会随同基表的 BACKUP 操作一起备份。 +您无需手动备份同步物化视图,因为它们会随同基础表的 BACKUP 操作一起备份。 异步物化视图可以随同其所属数据库的 BACKUP 操作一起备份。您也可以手动备份它们。 -以下示例将数据库 `sr_hub` 中的物化视图 `sr_mv1` 备份到快照 `sr_mv1_backup` 中,并将快照上传到仓库 `test_repo`。 +以下示例将数据库 `sr_hub` 中的物化视图 `sr_mv1` 备份到快照 `sr_mv1_backup` 中,并将该快照上传到仓库 `test_repo`。 ```SQL BACKUP DATABASE sr_hub SNAPSHOT sr_mv1_backup @@ -220,7 +220,7 @@ TO test_repo ON (MATERIALIZED VIEW sr_mv1); ``` -以下示例将数据库 `sr_hub` 中的两个物化视图 `sr_mv1` 和 `sr_mv2` 备份到快照 `sr_mv2_backup` 中,并将快照上传到仓库 `test_repo`。 +以下示例将数据库 `sr_hub` 中的两个物化视图 `sr_mv1` 和 `sr_mv2` 备份到快照 `sr_mv2_backup` 中,并将该快照上传到仓库 `test_repo`。 ```SQL BACKUP DATABASE sr_hub SNAPSHOT sr_mv2_backup @@ -228,7 +228,7 @@ TO test_repo ON (MATERIALIZED VIEW sr_mv1, MATERIALIZED VIEW sr_mv2); ``` -以下示例将数据库 `sr_hub` 中的所有物化视图备份到快照 `sr_mv3_backup` 中,并将快照上传到仓库 `test_repo`。 +以下示例将数据库 `sr_hub` 中的所有物化视图备份到快照 `sr_mv3_backup` 中,并将该快照上传到仓库 `test_repo`。 ```SQL BACKUP DATABASE sr_hub SNAPSHOT sr_mv3_backup @@ -238,7 +238,7 @@ ON (ALL MATERIALIZED VIEWS); ### 备份逻辑视图 -以下示例将数据库 `sr_hub` 中的逻辑视图 `sr_view1` 备份到快照 `sr_view1_backup` 中,并将快照上传到仓库 `test_repo`。 +以下示例将数据库 `sr_hub` 中的逻辑视图 `sr_view1` 备份到快照 `sr_view1_backup` 中,并将该快照上传到仓库 `test_repo`。 ```SQL BACKUP DATABASE sr_hub SNAPSHOT sr_view1_backup @@ -246,7 +246,7 @@ TO test_repo ON (VIEW sr_view1); ``` -以下示例将数据库 `sr_hub` 中的两个逻辑视图 `sr_view1` 和 `sr_view2` 备份到快照 `sr_view2_backup` 中,并将快照上传到仓库 `test_repo`。 +以下示例将数据库 `sr_hub` 中的两个逻辑视图 `sr_view1` 和 `sr_view2` 备份到快照 `sr_view2_backup` 中,并将该快照上传到仓库 `test_repo`。 ```SQL BACKUP DATABASE sr_hub SNAPSHOT sr_view2_backup @@ -254,7 +254,7 @@ TO test_repo ON (VIEW sr_view1, VIEW sr_view2); ``` -以下示例将数据库 `sr_hub` 中的所有逻辑视图备份到快照 `sr_view3_backup` 中,并将快照上传到仓库 `test_repo`。 +以下示例将数据库 `sr_hub` 中的所有逻辑视图备份到快照 `sr_view3_backup` 中,并将该快照上传到仓库 `test_repo`。 ```SQL BACKUP DATABASE sr_hub SNAPSHOT sr_view3_backup @@ -264,7 +264,7 @@ ON (ALL VIEWS); ### 备份 UDF -以下示例将数据库 `sr_hub` 中的 UDF `sr_udf1` 备份到快照 `sr_udf1_backup` 中,并将快照上传到仓库 `test_repo`。 +以下示例将数据库 `sr_hub` 中的 UDF `sr_udf1` 备份到快照 `sr_udf1_backup` 中,并将该快照上传到仓库 `test_repo`。 ```SQL BACKUP DATABASE sr_hub SNAPSHOT sr_udf1_backup @@ -272,7 +272,7 @@ TO test_repo ON (FUNCTION sr_udf1); ``` -以下示例将数据库 `sr_hub` 中的两个 UDF `sr_udf1` 和 `sr_udf2` 备份到快照 `sr_udf2_backup` 中,并将快照上传到仓库 `test_repo`。 +以下示例将数据库 `sr_hub` 中的两个 UDF `sr_udf1` 和 `sr_udf2` 备份到快照 `sr_udf2_backup` 中,并将该快照上传到仓库 `test_repo`。 ```SQL BACKUP DATABASE sr_hub SNAPSHOT sr_udf2_backup @@ -280,7 +280,7 @@ TO test_repo ON (FUNCTION sr_udf1, FUNCTION sr_udf2); ``` -以下示例将数据库 `sr_hub` 中的所有 UDF 备份到快照 `sr_udf3_backup` 中,并将快照上传到仓库 `test_repo`。 +以下示例将数据库 `sr_hub` 中的所有 UDF 备份到快照 `sr_udf3_backup` 中,并将该快照上传到仓库 `test_repo`。 ```SQL BACKUP DATABASE sr_hub SNAPSHOT sr_udf3_backup @@ -290,28 +290,28 @@ ON (ALL FUNCTIONS); ### 备份外部 Catalog 的元数据 -以下示例将外部 Catalog `iceberg` 的元数据备份到快照 `iceberg_backup` 中,并将快照上传到仓库 `test_repo`。 +以下示例将外部 Catalog `iceberg` 的元数据备份到快照 `iceberg_backup` 中,并将该快照上传到仓库 `test_repo`。 ```SQL BACKUP EXTERNAL CATALOG (iceberg) SNAPSHOT iceberg_backup TO test_repo; ``` -以下示例将两个外部 Catalog `iceberg` 和 `hive` 的元数据备份到快照 `iceberg_hive_backup` 中,并将快照上传到仓库 `test_repo`。 +以下示例将外部 Catalog `iceberg` 和 `hive` 的元数据备份到快照 `iceberg_hive_backup` 中,并将该快照上传到仓库 `test_repo`。 ```SQL BACKUP EXTERNAL CATALOGS (iceberg, hive) SNAPSHOT iceberg_hive_backup TO test_repo; ``` -以下示例将所有外部 Catalog 的元数据备份到快照 `all_catalog_backup` 中,并将快照上传到仓库 
`test_repo`。 +以下示例将所有外部 Catalog 的元数据备份到快照 `all_catalog_backup` 中,并将该快照上传到仓库 `test_repo`。 ```SQL BACKUP ALL EXTERNAL CATALOGS SNAPSHOT all_catalog_backup TO test_repo; ``` -要取消对外部 Catalog 的 BACKUP 操作,请执行以下语句: +要取消外部 Catalog 的 BACKUP 操作,请执行以下语句: ```SQL CANCEL BACKUP FOR EXTERNAL CATALOG; @@ -319,23 +319,23 @@ CANCEL BACKUP FOR EXTERNAL CATALOG; ## 恢复数据 -您可以将备份在远端存储系统中的数据快照恢复到当前或其他的 StarRocks 集群,以实现数据恢复或数据迁移。 +您可以将远程存储系统中备份的数据快照恢复到当前或其它 StarRocks 集群,以恢复或迁移数据。 -**当您从快照恢复对象时,必须指定快照的时间戳。** +**从快照恢复对象时,必须指定快照的时间戳。** -使用 [RESTORE](../../sql-reference/sql-statements/backup_restore/RESTORE.md) 语句来恢复远端存储系统中的数据快照。 +使用 [RESTORE](../../sql-reference/sql-statements/backup_restore/RESTORE.md) 语句恢复远程存储系统中的数据快照。 -RESTORE 是一种异步操作。您可以使用 [SHOW RESTORE](../../sql-reference/sql-statements/backup_restore/SHOW_RESTORE.md) 命令检查 RESTORE 作业的状态,或使用 [CANCEL RESTORE](../../sql-reference/sql-statements/backup_restore/CANCEL_RESTORE.md) 命令取消 RESTORE 作业。 +RESTORE 是一个异步操作。您可以使用 [SHOW RESTORE](../../sql-reference/sql-statements/backup_restore/SHOW_RESTORE.md) 查看 RESTORE 作业的状态,或使用 [CANCEL RESTORE](../../sql-reference/sql-statements/backup_restore/CANCEL_RESTORE.md) 取消 RESTORE 作业。 -### (可选) 在新集群中创建仓库 +### (可选)在新集群中创建仓库 -要将数据迁移到另一个 StarRocks 集群,您需要在目标集群中创建具有相同**仓库名称**和**位置**的仓库,否则将无法查看之前备份的数据快照。有关详细信息,请参阅 [创建仓库](#create-a-repository)。 +要将数据迁移到另一个 StarRocks 集群,您需要在目标集群中创建与原集群具有相同**仓库名称**和**位置**的仓库,否则将无法查看之前备份的数据快照。有关详细信息,请参阅[创建仓库](#create-a-repository)。 ### 获取快照时间戳 -在恢复数据之前,您可以使用 [SHOW SNAPSHOT](../../sql-reference/sql-statements/backup_restore/SHOW_SNAPSHOT.md) 命令查看仓库中的快照信息以获取时间戳。 +在恢复数据之前,您可以使用 [SHOW SNAPSHOT](../../sql-reference/sql-statements/backup_restore/SHOW_SNAPSHOT.md) 查看仓库中的快照以获取时间戳。 -以下示例检查 `test_repo` 中的快照信息。 +以下示例查看 `test_repo` 中的快照信息。 ```Plain mysql> SHOW SNAPSHOT ON test_repo; @@ -352,13 +352,13 @@ mysql> SHOW SNAPSHOT ON test_repo; 以下示例将快照 `sr_hub_backup` 中的数据库 `sr_hub` 恢复到目标集群中的数据库 `sr_hub`。如果快照中不存在该数据库,系统将返回错误。如果目标集群中不存在该数据库,系统将自动创建。 ```SQL --- Supported from v3.4.0 onwards. (v3.4.0 版本起支持) +-- v3.4.0 及以后版本支持。 RESTORE SNAPSHOT sr_hub_backup FROM test_repo DATABASE sr_hub PROPERTIES("backup_timestamp" = "2024-12-09-10-25-58-842"); --- Compatible with the syntax in earlier versions. (兼容早期版本语法) +-- 兼容早期版本语法。 RESTORE SNAPSHOT sr_hub.sr_hub_backup FROM `test_repo` PROPERTIES("backup_timestamp" = "2024-12-09-10-25-58-842"); @@ -367,7 +367,7 @@ PROPERTIES("backup_timestamp" = "2024-12-09-10-25-58-842"); 以下示例将快照 `sr_hub_backup` 中的数据库 `sr_hub` 恢复到目标集群中的数据库 `sr_hub_new`。如果快照中不存在数据库 `sr_hub`,系统将返回错误。如果目标集群中不存在数据库 `sr_hub_new`,系统将自动创建。 ```SQL --- Supported from v3.4.0 onwards. (v3.4.0 版本起支持) +-- v3.4.0 及以后版本支持。 RESTORE SNAPSHOT sr_hub_backup FROM test_repo DATABASE sr_hub AS sr_hub_new @@ -379,14 +379,14 @@ PROPERTIES("backup_timestamp" = "2024-12-09-10-25-58-842"); 以下示例将快照 `sr_member_backup` 中数据库 `sr_hub` 的表 `sr_member` 恢复到目标集群中数据库 `sr_hub` 的表 `sr_member`。 ```SQL --- Supported from v3.4.0 onwards. (v3.4.0 版本起支持) +-- v3.4.0 及以后版本支持。 RESTORE SNAPSHOT sr_member_backup FROM test_repo DATABASE sr_hub ON (TABLE sr_member) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); --- Compatible with the syntax in earlier versions. 
(兼容早期版本语法) +-- 兼容早期版本语法。 RESTORE SNAPSHOT sr_hub.sr_member_backup FROM test_repo ON (sr_member) @@ -413,7 +413,7 @@ ON (TABLE sr_member, TABLE sr_pmc) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下示例恢复快照 `sr_all_backup` 中数据库 `sr_hub` 的所有表。 +以下示例从快照 `sr_all_backup` 中恢复数据库 `sr_hub` 的所有表。 ```SQL RESTORE SNAPSHOT sr_all_backup @@ -422,7 +422,7 @@ DATABASE sr_hub ON (ALL TABLES); ``` -以下示例恢复快照 `sr_all_backup` 中数据库 `sr_hub` 的其中一张表。 +以下示例从快照 `sr_all_backup` 中恢复数据库 `sr_hub` 的一张表。 ```SQL RESTORE SNAPSHOT sr_all_backup @@ -437,21 +437,21 @@ PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); 以下示例将快照 `sr_par_backup` 中表 `sr_member` 的分区 `p1` 恢复到目标集群中表 `sr_member` 的分区 `p1`。 ```SQL --- Supported from v3.4.0 onwards. (v3.4.0 版本起支持) +-- v3.4.0 及以后版本支持。 RESTORE SNAPSHOT sr_par_backup FROM test_repo DATABASE sr_hub ON (TABLE sr_member PARTITION (p1)) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); --- Compatible with the syntax in earlier versions. (兼容早期版本语法) +-- 兼容早期版本语法。 RESTORE SNAPSHOT sr_hub.sr_par_backup FROM test_repo ON (sr_member PARTITION (p1)) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -您可以指定多个分区名称,用逗号 (`,`) 分隔,以批量恢复分区。 +您可以指定多个分区名(用逗号 (`,`) 分隔)来批量恢复分区。 ### 恢复物化视图 @@ -475,7 +475,7 @@ ON (MATERIALIZED VIEW sr_mv1, MATERIALIZED VIEW sr_mv2) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下示例恢复快照 `sr_mv3_backup` 中数据库 `sr_hub` 的所有物化视图到目标集群。 +以下示例将快照 `sr_mv3_backup` 中数据库 `sr_hub` 的所有物化视图恢复到目标集群。 ```SQL RESTORE SNAPSHOT sr_mv3_backup @@ -485,7 +485,7 @@ ON (ALL MATERIALIZED VIEWS) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下示例恢复快照 `sr_mv3_backup` 中数据库 `sr_hub` 的其中一个物化视图到目标集群。 +以下示例将快照 `sr_mv3_backup` 中数据库 `sr_hub` 的一个物化视图恢复到目标集群。 ```SQL RESTORE SNAPSHOT sr_mv3_backup @@ -497,10 +497,10 @@ PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); :::info -RESTORE 之后,您可以使用 [SHOW MATERIALIZED VIEWS](../../sql-reference/sql-statements/materialized_view/SHOW_MATERIALIZED_VIEW.md) 查看物化视图的状态。 +RESTORE 后,您可以使用 [SHOW MATERIALIZED VIEWS](../../sql-reference/sql-statements/materialized_view/SHOW_MATERIALIZED_VIEW.md) 查看物化视图的状态。 -- 如果物化视图是 Active 状态,则可以直接使用。 -- 如果物化视图是 Inactive 状态,可能是因为其基表未恢复。所有基表恢复后,您可以使用 [ALTER MATERIALIZED VIEW](../../sql-reference/sql-statements/materialized_view/ALTER_MATERIALIZED_VIEW.md) 重新激活物化视图。 +- 如果物化视图处于活动状态,可以直接使用。 +- 如果物化视图处于非活动状态,可能是因为其基础表未恢复。在所有基础表恢复后,您可以使用 [ALTER MATERIALIZED VIEW](../../sql-reference/sql-statements/materialized_view/ALTER_MATERIALIZED_VIEW.md) 重新激活物化视图。 ::: @@ -526,7 +526,7 @@ ON (VIEW sr_view1, VIEW sr_view2) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下示例恢复快照 `sr_view3_backup` 中数据库 `sr_hub` 的所有逻辑视图到目标集群。 +以下示例将快照 `sr_view3_backup` 中数据库 `sr_hub` 的所有逻辑视图恢复到目标集群。 ```SQL RESTORE SNAPSHOT sr_view3_backup @@ -536,7 +536,7 @@ ON (ALL VIEWS) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下示例恢复快照 `sr_view3_backup` 中数据库 `sr_hub` 的其中一个逻辑视图到目标集群。 +以下示例将快照 `sr_view3_backup` 中数据库 `sr_hub` 的一个逻辑视图恢复到目标集群。 ```SQL RESTORE SNAPSHOT sr_view3_backup @@ -568,7 +568,7 @@ ON (FUNCTION sr_udf1, FUNCTION sr_udf2) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下示例恢复快照 `sr_udf3_backup` 中数据库 `sr_hub` 的所有 UDF 到目标集群。 +以下示例将快照 `sr_udf3_backup` 中数据库 `sr_hub` 的所有 UDF 恢复到目标集群。 ```SQL RESTORE SNAPSHOT sr_udf3_backup @@ -578,7 +578,7 @@ ON (ALL FUNCTIONS) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下示例恢复快照 `sr_udf3_backup` 中数据库 `sr_hub` 的其中一个 UDF 到目标集群。 +以下示例将快照 `sr_udf3_backup` 中数据库 `sr_hub` 的一个 UDF 
恢复到目标集群。 ```SQL RESTORE SNAPSHOT sr_udf3_backup @@ -599,7 +599,7 @@ EXTERNAL CATALOG (iceberg AS iceberg_new) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下示例恢复快照 `iceberg_hive_backup` 中两个外部 Catalog `iceberg` 和 `hive` 的元数据到目标集群。 +以下示例将快照 `iceberg_hive_backup` 中外部 Catalog `iceberg` 和 `hive` 的元数据恢复到目标集群。 ```SQL RESTORE SNAPSHOT iceberg_hive_backup @@ -608,7 +608,7 @@ EXTERNAL CATALOGS (iceberg, hive) PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -以下示例恢复快照 `all_catalog_backup` 中所有外部 Catalog 的元数据到目标集群。 +以下示例将快照 `all_catalog_backup` 中所有外部 Catalog 的元数据恢复到目标集群。 ```SQL RESTORE SNAPSHOT all_catalog_backup @@ -617,7 +617,7 @@ ALL EXTERNAL CATALOGS PROPERTIES ("backup_timestamp" = "2024-12-09-10-52-10-940"); ``` -要取消对外部 Catalog 的 RESTORE 操作,请执行以下语句: +要取消外部 Catalog 的 RESTORE 操作,请执行以下语句: ```SQL CANCEL RESTORE FOR EXTERNAL CATALOG; @@ -627,24 +627,24 @@ CANCEL RESTORE FOR EXTERNAL CATALOG; 您可以通过修改 BE 配置文件 **be.conf** 中的以下配置项来优化 BACKUP 或 RESTORE 作业的性能: -| 配置项 | 描述 | -| ---------------------------- | ---------------------------------------------------------------------------------------------------------- | -| make_snapshot_worker_count | BE 节点上 BACKUP 作业创建快照任务的最大线程数。默认值:`5`。增加此配置项的值以提高创建快照任务的并发性。 | -| release_snapshot_worker_count | BE 节点上失败的 BACKUP 作业释放快照任务的最大线程数。默认值:`5`。增加此配置项的值以提高释放快照任务的并发性。 | -| upload_worker_count | BE 节点上 BACKUP 作业上传任务的最大线程数。默认值:`0`。`0` 表示设置为 BE 所在机器的 CPU 核心数。增加此配置项的值以提高上传任务的并发性。 | -| download_worker_count | BE 节点上 RESTORE 作业下载任务的最大线程数。默认值:`0`。`0` 表示设置为 BE 所在机器的 CPU 核心数。增加此配置项的值以提高下载任务的并发性。 | +| 配置项 | 描述 | +| ---------------------------- | -------------------------------------------------------------------------------------- | +| make_snapshot_worker_count | BE 节点上 BACKUP 作业中创建快照任务的最大线程数。默认值:`5`。增加此配置项的值可提高创建快照任务的并发度。 | +| release_snapshot_worker_count | BE 节点上失败的 BACKUP 作业中释放快照任务的最大线程数。默认值:`5`。增加此配置项的值可提高释放快照任务的并发度。 | +| upload_worker_count | BE 节点上 BACKUP 作业中上传任务的最大线程数。默认值:`0`。`0` 表示将此值设置为 BE 所在机器的 CPU 核心数。增加此配置项的值可提高上传任务的并发度。 | +| download_worker_count | BE 节点上 RESTORE 作业中下载任务的最大线程数。默认值:`0`。`0` 表示将此值设置为 BE 所在机器的 CPU 核心数。增加此配置项的值可提高下载任务的并发度。 | ## 使用须知 -- 在 GLOBAL、DATABASE、TABLE 和 PARTITION 级别执行备份和恢复操作需要不同的权限。有关详细信息,请参阅 [根据场景定制角色](../user_privs/authorization/User_privilege.md#customize-roles-based-on-scenarios)。 -- 每个数据库每次只允许运行一个 BACKUP 或 RESTORE 作业。否则,StarRocks 将返回错误。 -- 由于 BACKUP 和 RESTORE 作业会占用 StarRocks 集群的许多资源,因此您可以在 StarRocks 集群负载不高时备份和恢复数据。 +- 在全局、数据库、表和分区级别执行备份和恢复操作需要不同的权限。有关详细信息,请参阅[基于场景自定义角色](../user_privs/authorization/User_privilege.md#customize-roles-based-on-scenarios)。 +- 每个数据库每次只允许一个 BACKUP 或 RESTORE 作业运行。否则,StarRocks 将返回错误。 +- BACKUP 和 RESTORE 作业会占用 StarRocks 集群的许多资源,因此您可以在 StarRocks 集群负载不高时进行数据备份和恢复。 - StarRocks 不支持为数据备份指定数据压缩算法。 -- 由于数据是作为快照备份的,因此在快照生成时加载的数据不包含在快照中。因此,如果在快照生成后和 RESTORE 作业完成之前将数据加载到旧集群中,您还需要将数据加载到恢复数据的集群中。建议您在数据迁移完成后的一段时间内并行将数据加载到两个集群中,并在验证数据和服务正确性后将应用程序迁移到新集群。 +- 由于数据是作为快照备份的,因此在快照生成时加载的数据不包含在快照中。因此,如果在生成快照后且 RESTORE 作业完成前将数据加载到旧集群中,您还需要将数据加载到恢复数据的集群中。建议您在数据迁移完成后的一段时间内并行将数据加载到两个集群中,并在验证数据和服务的正确性后将应用程序迁移到新集群。 - 在 RESTORE 作业完成之前,您无法操作要恢复的表。 -- Primary Key 表不能恢复到 v2.5 之前的 StarRocks 集群。 +- 主键表(Primary Key table)无法恢复到 v2.5 之前的 StarRocks 集群。 - 在恢复表之前,您无需在新集群中创建要恢复的表。RESTORE 作业会自动创建。 -- 如果存在一个与要恢复的表同名的表,StarRocks 首先会检查现有表的 Schema 是否与要恢复的表的 Schema 匹配。如果 Schema 匹配,StarRocks 将用快照中的数据覆盖现有表。如果 Schema 不匹配,RESTORE 作业将失败。您可以选择使用 `AS` 关键字重命名要恢复的表,或在恢复数据之前删除现有表。 -- 如果 RESTORE 作业覆盖了现有的数据库、表或分区,则在作业进入 COMMIT 阶段后,被覆盖的数据无法恢复。如果 RESTORE 作业在此点失败或被取消,数据可能会损坏且无法访问。在这种情况下,您只能再次执行 RESTORE 
操作并等待作业完成。因此,除非您确定当前数据不再使用,否则我们建议您不要通过覆盖方式恢复数据。覆盖操作首先检查快照与现有数据库、表或分区之间的元数据一致性。如果检测到不一致,则无法执行 RESTORE 操作。 -- 当前 StarRocks 不支持备份和恢复与用户账户、权限和资源组相关的配置数据。 -- 当前 StarRocks 不支持备份和恢复表之间的 Colocate Join 关系。 +- 如果存在与要恢复的表同名的现有表,StarRocks 首先检查现有表的 schema 是否与要恢复的表匹配。如果 schema 匹配,StarRocks 将使用快照中的数据覆盖现有表。如果 schema 不匹配,RESTORE 作业将失败。您可以选择使用关键字 `AS` 重命名要恢复的表,或者在恢复数据之前删除现有表。 +- 如果 RESTORE 作业覆盖现有数据库、表或分区,则在作业进入 COMMIT 阶段后,被覆盖的数据将无法恢复。如果 RESTORE 作业在此阶段失败或被取消,数据可能会损坏且无法访问。在这种情况下,您只能再次执行 RESTORE 操作并等待作业完成。因此,除非您确定当前数据不再使用,否则我们不建议通过覆盖方式恢复数据。覆盖操作首先检查快照与现有数据库、表或分区之间的元数据一致性。如果检测到不一致,则无法执行 RESTORE 操作。 +- 目前,StarRocks 不支持备份和恢复与用户账户、权限和资源组相关的配置数据。 +- 目前,StarRocks 不支持备份和恢复表之间的 Colocate Join 关系。 diff --git a/docs/zh/administration/management/FE_configuration.md b/docs/zh/administration/management/FE_configuration.md index fbdf15b..0229aeb 100644 --- a/docs/zh/administration/management/FE_configuration.md +++ b/docs/zh/administration/management/FE_configuration.md @@ -16,7 +16,7 @@ import EditionSpecificFEItem from '../../_assets/commonMarkdown/Edition_Specific ## 查看 FE 配置项 -FE 启动后,您可以在 MySQL 客户端上运行 ADMIN SHOW FRONTEND CONFIG 命令查看参数配置。如果要查询特定参数的配置,请运行以下命令: +FE 启动后,可以通过 MySQL 客户端执行 ADMIN SHOW FRONTEND CONFIG 命令查看所有参数配置。如果您想查询特定参数的配置,请运行以下命令: ```SQL ADMIN SHOW FRONTEND CONFIG [LIKE "pattern"]; @@ -50,1805 +50,1805 @@ ADMIN SET FRONTEND CONFIG ("key" = "value"); ##### audit_log_delete_age -- 默认值:30d -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:审计日志文件的保留期限。默认值 `30d` 指定每个审计日志文件可以保留 30 天。StarRocks 会检查每个审计日志文件并删除 30 天前生成的那些文件。 -- 引入版本:- +- 默认值: 30d +- 类型: String +- 单位: - +- 可变: No +- 描述: 审计日志文件的保留期限。默认值 `30d` 指定每个审计日志文件可以保留 30 天。StarRocks 会检查每个审计日志文件并删除 30 天前生成的文件。 +- 引入版本: - ##### audit_log_dir -- 默认值:StarRocksFE.STARROCKS_HOME_DIR + "/log" -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:存储审计日志文件的目录。 -- 引入版本:- +- 默认值: StarRocksFE.STARROCKS_HOME_DIR + "/log" +- 类型: String +- 单位: - +- 可变: No +- 描述: 存储审计日志文件的目录。 +- 引入版本: - ##### audit_log_enable_compress -- 默认值:false -- 类型:Boolean -- 单位:N/A -- 是否可变:否 -- 描述:当为 true 时,生成的 Log4j2 配置会为轮转的审计日志文件名 (fe.audit.log.*) 附加 ".gz" 后缀,以便 Log4j2 在轮转时生成压缩的 (.gz) 归档审计日志文件。该设置在 FE 启动期间在 Log4jConfig.initLogging 中读取,并应用于审计日志的 RollingFile appender;它只影响轮转/归档文件,不影响活动的审计日志。由于该值在启动时初始化,因此更改它需要重启 FE 才能生效。与审计日志轮转设置 (audit_log_dir, audit_log_roll_interval, audit_roll_maxsize, audit_log_roll_num) 一起使用。 -- 引入版本:3.2.12 +- 默认值: false +- 类型: Boolean +- 单位: N/A +- 可变: No +- 描述: 当为 true 时,生成的 Log4j2 配置会为轮转的审计日志文件名 (fe.audit.log.*) 追加 ".gz" 后缀,以便 Log4j2 在轮转时生成压缩的 (.gz) 归档审计日志文件。该设置在 FE 启动时在 Log4jConfig.initLogging 中读取,并应用于审计日志的 RollingFile appender;它只影响轮转/归档文件,而不影响活动审计日志。由于该值在启动时初始化,因此更改它需要重新启动 FE 才能生效。与审计日志轮转设置 (audit_log_dir, audit_log_roll_interval, audit_roll_maxsize, audit_log_roll_num) 一起使用。 +- 引入版本: 3.2.12 ##### audit_log_json_format -- 默认值:false -- 类型:Boolean -- 单位:N/A -- 是否可变:是 -- 描述:当为 true 时,FE 审计事件以结构化 JSON 格式(Jackson ObjectMapper 序列化带注释的 AuditEvent 字段的 Map)发出,而不是默认的管道分隔 "key=value" 字符串。该设置影响 AuditLogBuilder 处理的所有内置审计接收器:连接审计、查询审计、大查询审计(当事件符合条件时,大查询阈值字段会添加到 JSON 中)和慢审计输出。大查询阈值和 "features" 字段的带注释字段会进行特殊处理(从普通审计条目中排除;根据适用情况包含在大查询或功能日志中)。启用此功能可使日志对于日志收集器或 SIEM 而言可机器解析;请注意,它会更改日志格式,并且可能需要更新任何期望旧版管道分隔格式的现有解析器。 -- 引入版本:3.2.7 +- 默认值: false +- 类型: Boolean +- 单位: N/A +- 可变: Yes +- 描述: 当为 true 时,FE 审计事件将作为结构化的 JSON(Jackson ObjectMapper 序列化带注释的 AuditEvent 字段的 Map)发出,而不是默认的管道分隔 "key=value" 字符串。该设置影响 AuditLogBuilder 处理的所有内置审计接收器:连接审计、查询审计、大查询审计(当事件符合条件时,大查询阈值字段会添加到 JSON 中)和慢查询审计输出。用于大查询阈值的字段和 "features" 字段会特殊处理(从普通审计条目中排除;根据适用情况包含在大查询或功能日志中)。启用此功能可使日志对于日志收集器或 SIEM 可机器解析;请注意,它会更改日志格式,并且可能需要更新任何期望旧版管道分隔格式的现有解析器。 +- 引入版本: 3.2.7 ##### 
audit_log_modules

- Default: slow_query, query
- Type: String[]
- Unit: -
- Is mutable: No
- Description: The modules for which StarRocks generates audit log entries. By default, StarRocks generates audit logs for the `slow_query` and `query` modules. The `connection` module is supported from v3.0 onwards. Separate module names with a comma (,) and a space.
- Introduced in: -

##### audit_log_roll_interval

- Default: DAY
- Type: String
- Unit: -
- Is mutable: No
- Description: The interval at which StarRocks rotates audit log entries. Valid values: `DAY` and `HOUR`.
  - If this parameter is set to `DAY`, a suffix in the `yyyyMMdd` format is added to the names of audit log files.
  - If this parameter is set to `HOUR`, a suffix in the `yyyyMMddHH` format is added to the names of audit log files.
- Introduced in: -

##### audit_log_roll_num

- Default: 90
- Type: Int
- Unit: -
- Is mutable: No
- Description: The maximum number of audit log files that can be retained within each retention period specified by `audit_log_roll_interval`.
- Introduced in: -

##### bdbje_log_level

- Default: INFO
- Type: String
- Unit: -
- Is mutable: No
- Description: Controls the log level used by Berkeley DB Java Edition (BDB JE) in StarRocks. During BDB environment initialization (BDBEnvironment.initConfigs()), this value is applied to the Java logger for the `com.sleepycat.je` package and to the BDB JE environment file logging level (EnvironmentConfig.FILE_LOGGING_LEVEL). Accepts standard `java.util.logging.Level` names such as SEVERE, WARNING, INFO, CONFIG, FINE, FINER, FINEST, ALL, and OFF. Setting it to ALL enables all log messages. Higher verbosity increases log volume and may affect disk I/O and performance; the value is read at BDB environment initialization, so it takes effect only after the environment is (re)initialized.
- Introduced in: v3.2.0

##### big_query_log_delete_age

- Default: 7d
- Type: String
- Unit: -
- Is mutable: No
- Description: Controls how long FE big query log files (`fe.big_query.log.*`) are kept before automatic deletion. The value is passed to Log4j's delete strategy as an IfLastModified age; any rotated big query log whose last-modified time is older than this value is removed. Supported suffixes include `d` (days), `h` (hours), `m` (minutes), and `s` (seconds). Examples: `7d` (7 days), `10h` (10 hours), `60m` (60 minutes), and `120s` (120 seconds). Together with `big_query_log_roll_interval` and `big_query_log_roll_num`, it determines which files are retained or purged.
- Introduced in: v3.2.0

##### big_query_log_dir

- Default: `Config.STARROCKS_HOME_DIR + "/log"`
- Type: String
- Unit: -
- Is mutable: No
- Description: The directory to which the FE writes big query dump logs (`fe.big_query.log.*`). The Log4j configuration uses this path to create the RollingFile appender for `fe.big_query.log` and its rotated files. Rotation and retention are controlled by `big_query_log_roll_interval` (time-based suffix), `log_roll_size_mb` (size trigger), `big_query_log_roll_num` (maximum file count), and `big_query_log_delete_age` (age-based deletion). Big query records are written for queries that exceed user-defined thresholds such as `big_query_log_cpu_second_threshold`, `big_query_log_scan_rows_threshold`, or `big_query_log_scan_bytes_threshold`. Use `big_query_log_modules` to control which modules log to this file.
- Introduced in: v3.2.0

##### big_query_log_modules

- Default: `{"query"}`
- Type: String[]
- Unit: -
- Is mutable: No
- Description: A list of module name suffixes that enable per-module big query logging. Typical values are logical component names. For example, the default `query` produces `big_query.query`.
- Introduced in: v3.2.0

##### big_query_log_roll_interval

- Default: `"DAY"`
- Type: String
- Unit: -
- Is mutable: No
- Description: Specifies the time interval used to build the date component of the rolling file names for the `big_query` log appender. Valid values (case-insensitive) are `DAY` (default) and `HOUR`. `DAY` produces a daily pattern (`"%d{yyyyMMdd}"`) and `HOUR` produces an hourly pattern (`"%d{yyyyMMddHH}"`). The value is combined with size-based rolling (`big_query_roll_maxsize`) and index-based rolling (`big_query_log_roll_num`) to form the RollingFile filePattern. An invalid value causes log configuration generation to fail (IOException) and can prevent logging initialization or reconfiguration. Used together with `big_query_log_dir`, `big_query_roll_maxsize`, `big_query_log_roll_num`, and `big_query_log_delete_age`.
- Introduced in: v3.2.0

##### big_query_log_roll_num

- Default: 10
- Type: Int
- Unit: -
- Is mutable: No
- Description: The maximum number of rotated FE big query log files kept per `big_query_log_roll_interval`. The value is bound to the DefaultRolloverStrategy `max` attribute of the RollingFile appender for `fe.big_query.log`; when the log rolls (by time or by `log_roll_size_mb`), StarRocks keeps at most `big_query_log_roll_num` indexed files (the filePattern uses a time suffix plus an index). Files beyond this count may be deleted by rollover, and `big_query_log_delete_age` can additionally delete files based on last-modified time.
- Introduced in: v3.2.0

##### dump_log_delete_age

- Default: 7d
- Type: String
- Unit: -
- Is mutable: No
- Description: The retention period of dump log files. The default value `7d` specifies that each dump log file can be retained for 7 days. StarRocks checks each dump log file and deletes those that were generated 7 days ago.
- Introduced in: -

##### dump_log_dir

- Default: StarRocksFE.STARROCKS_HOME_DIR + "/log"
- Type: String
- Unit: -
- Is mutable: No
- Description: The directory that stores dump log files.
- Introduced in: -

##### dump_log_modules

- Default: query
- Type: String[]
- Unit: -
- Is mutable: No
- Description: The modules for which StarRocks generates dump log entries. By default, StarRocks generates dump logs for the query module. Separate module names with a comma (,) and a space.
- Introduced in: -

##### dump_log_roll_interval

- Default: DAY
- Type: String
- Unit: -
- Is mutable: No
- Description: The interval at which StarRocks rotates dump log entries. Valid values: `DAY` and `HOUR`.
  - If this parameter is set to `DAY`, a suffix in the `yyyyMMdd` format is added to the names of dump log files.
  - If this parameter is set to `HOUR`, a suffix in the `yyyyMMddHH` format is added to the names of dump log files.
- Introduced in: -

##### dump_log_roll_num

- Default: 10
- Type: Int
- Unit: -
- Is mutable: No
- Description: The maximum number of dump log files that can be retained within each retention period specified by `dump_log_roll_interval`.
- Introduced in: -
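All of the log rotation parameters above are immutable, so they must be set in `fe.conf` and take effect after an FE restart. As a quick sanity check, you can list the values currently in effect on the FE you are connected to; a minimal example (the `LIKE` pattern is illustrative):

```SQL
-- Inspect the audit and dump log settings currently in effect on this FE.
ADMIN SHOW FRONTEND CONFIG LIKE 'audit_log%';
ADMIN SHOW FRONTEND CONFIG LIKE 'dump_log%';
```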
##### edit_log_write_slow_log_threshold_ms

- Default: 2000
- Type: Int
- Unit: Milliseconds
- Is mutable: Yes
- Description: The threshold, in milliseconds, the JournalWriter uses to detect and log slow edit log batch writes. After a batch commit, if the batch duration exceeds this value, the JournalWriter emits a WARN message containing the batch size, duration, and current journal queue size (rate-limited to roughly once every 2 seconds). This setting only controls logging/alerting for potential I/O or replication latency on the FE Leader; it does not change commit or roll behavior (see `edit_log_roll_num` and the commit-related settings). Metric updates still occur regardless of this threshold.
- Introduced in: v3.2.3

##### enable_audit_sql

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: No
- Description: When set to `true`, the FE audit subsystem records the SQL text of statements handled by the ConnectProcessor in the FE audit log (`fe.audit.log`). The stored statement respects other controls: encrypted statements are redacted (`AuditEncryptionChecker`), sensitive credentials may be redacted or desensitized if `enable_sql_desensitize_in_log` is set to true, and digest logging is governed by `enable_sql_digest`. When set to `false`, the ConnectProcessor replaces the statement text with "?" in audit events; the other audit fields (user, host, duration, state, slow-query detection via `qe_slow_log_ms`, and metrics) are still recorded. Enabling SQL auditing improves forensic and troubleshooting visibility, but may expose sensitive SQL content and increases log volume and I/O; disable it to improve privacy at the cost of losing full statement visibility in the audit log.
- Introduced in: -

##### enable_profile_log

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: No
- Description: Whether to enable profile logging. When enabled, the FE writes a per-query profile log (the serialized `queryDetail` JSON produced by `ProfileManager`) to the profile log sink. This logging is performed only when `enable_collect_query_detail_info` is also enabled; when `enable_profile_log_compress` is enabled, the JSON may be gzip-compressed before logging. Profile log files are managed by `profile_log_dir`, `profile_log_roll_num`, and `profile_log_roll_interval`, and are rotated/deleted according to `profile_log_delete_age` (formats such as `7d`, `10h`, `60m`, and `120s` are supported). Disabling it stops profile log writes, reducing disk I/O, compression CPU, and storage usage.
- Introduced in: v3.2.5

##### enable_qe_slow_log

- Default: true
- Type: Boolean
- Unit: N/A
- Is mutable: Yes
- Description: When enabled, the FE built-in audit plugin (AuditLogBuilder) writes query events whose measured execution time (the "Time" field) exceeds the threshold configured by `qe_slow_log_ms` to the slow query audit log (AuditLog.getSlowAudit). If disabled, these slow query entries are suppressed (regular query and connection audit logs are unaffected). Slow audit entries follow the global `audit_log_json_format` setting (JSON versus plain string). Use this flag to control slow query audit volume independently of regular audit logging; turning it off can reduce log I/O when `qe_slow_log_ms` is low or the workload produces many long-running queries.
- Introduced in: 3.2.11

##### enable_sql_desensitize_in_log

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: No
- Description: When set to `true`, the system replaces or hides sensitive SQL content before writing it to logs and query detail records. Code paths that honor this configuration include ConnectProcessor.formatStmt (audit logs), StmtExecutor.addRunningQueryDetail (query details), and SimpleExecutor.formatSQL (internal executor logs). With it enabled, invalid SQL may be replaced with a fixed desensitized message, credentials (user/password) are hidden, and the SQL formatter must produce a sanitized representation (it can also enable digest-style output). This reduces leakage of sensitive literals and credentials in audit/internal logs, but it also means logs and query details no longer contain the original full SQL text, which can affect replay or debugging.
- Introduced in: -

##### internal_log_delete_age

- Default: 7d
- Type: String
- Unit: -
- Is mutable: No
- Description: Specifies the retention period for FE internal log files (written to `internal_log_dir`). The value is a duration string. Supported suffixes: `d` (days), `h` (hours), `m` (minutes), `s` (seconds). Examples: `7d` (7 days), `10h` (10 hours), `60m` (60 minutes), `120s` (120 seconds). It is substituted into the Log4j configuration as the `IfLastModified` predicate used by the RollingFile delete strategy. Files whose last-modified time is older than this duration are deleted during log rotation. Decrease the value to free disk space sooner, or increase it to retain internal materialized view or statistics logs longer.
- Introduced in: v3.2.4

##### internal_log_dir

- Default: `Config.STARROCKS_HOME_DIR + "/log"`
- Type: String
- Unit: -
- Is mutable: No
- Description: The directory where the FE logging subsystem stores internal logs (`fe.internal.log`). This configuration is substituted into the Log4j configuration and determines where the InternalFile appender writes internal/materialized-view/statistics logs and where the per-module loggers under `internal.` place their files. Ensure the directory exists, is writable, and has sufficient disk space. Log rotation and retention for files in this directory are controlled by `log_roll_size_mb`, `internal_log_roll_num`, `internal_log_delete_age`, and `internal_log_roll_interval`. If `sys_log_to_console` is enabled, internal logs may be written to the console instead of this directory.
- Introduced in: v3.2.4

##### internal_log_json_format

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: When set to `true`, internal statistics/audit entries are written to the statistics audit logger as compact JSON objects. The JSON contains the keys "executeType" (InternalType: QUERY or DML), "queryId", "sql", and "time" (elapsed milliseconds). When set to `false`, the same information is logged as a single formatted text line ("statistic execute: ... | QueryId: [...] | SQL: ..."). Enabling JSON improves machine parsing and integration with log processors, but it also causes the raw SQL text to be included in the log, which may expose sensitive information and increase log size.
- Introduced in: -
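Of the items above, `edit_log_write_slow_log_threshold_ms`, `enable_qe_slow_log`, and `internal_log_json_format` are mutable and can be adjusted at runtime. A minimal sketch (the values are illustrative, not recommendations; note that `ADMIN SET FRONTEND CONFIG` changes only the FE you are connected to and is not persisted across restarts):

```SQL
-- Raise the slow edit-log write threshold to 5 seconds on this FE.
ADMIN SET FRONTEND CONFIG ("edit_log_write_slow_log_threshold_ms" = "5000");

-- Temporarily suppress slow-query entries in the audit log.
ADMIN SET FRONTEND CONFIG ("enable_qe_slow_log" = "false");
```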
##### internal_log_modules

- Default: `{"base", "statistic"}`
- Type: String[]
- Unit: -
- Is mutable: No
- Description: A list of module identifiers that receive dedicated internal logging. For each entry, Log4j creates a logger named `internal.` plus the configured string, with level INFO and additivity="false". These loggers are routed to the internal appender (which writes `fe.internal.log`), or to the console when `sys_log_to_console` is enabled. Use short names or package fragments as needed; the exact logger name becomes `internal.` + the configured string. Internal log file rotation and retention follow `internal_log_dir`, `internal_log_roll_num`, `internal_log_delete_age`, `internal_log_roll_interval`, and `log_roll_size_mb`. Adding a module separates its runtime messages into the internal log stream for easier debugging and auditing.
- Introduced in: v3.2.4

##### internal_log_roll_interval

- Default: DAY
- Type: String
- Unit: -
- Is mutable: No
- Description: Controls the time-based rolling interval of the FE internal log appender. Accepted values (case-insensitive) are `HOUR` and `DAY`. `HOUR` produces an hourly file pattern (`"%d{yyyyMMddHH}"`) and `DAY` produces a daily file pattern (`"%d{yyyyMMdd}"`); these patterns are used by the RollingFile TimeBasedTriggeringPolicy to name rotated `fe.internal.log` files. An invalid value causes initialization to fail (an IOException is thrown while building the active Log4j configuration). Rolling behavior also depends on related settings such as `internal_log_dir`, `internal_roll_maxsize`, `internal_log_roll_num`, and `internal_log_delete_age`.
- Introduced in: v3.2.4

##### internal_log_roll_num

- Default: 90
- Type: Int
- Unit: -
- Is mutable: No
- Description: The maximum number of rotated internal FE log files kept for the internal appender (`fe.internal.log`). The value is used as the Log4j DefaultRolloverStrategy `max` attribute; when rollover occurs, StarRocks keeps at most `internal_log_roll_num` archived files and deletes older ones (also governed by `internal_log_delete_age`). Lower values reduce disk usage but shorten log history; higher values retain more historical internal logs. Works together with `internal_log_dir`, `internal_log_roll_interval`, and `internal_roll_maxsize`.
- Introduced in: v3.2.4

##### log_cleaner_audit_log_min_retention_days

- Default: 3
- Type: Int
- Unit: Days
- Is mutable: Yes
- Description: The minimum number of days audit log files are retained. Audit log files within this retention window are not deleted even when disk usage is high. This ensures audit logs are preserved for compliance and troubleshooting purposes.
- Introduced in: -

##### log_cleaner_check_interval_second

- Default: 300
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The interval, in seconds, at which disk usage is checked and logs are cleaned. The cleaner periodically checks the disk usage of each log directory and triggers cleaning when necessary. The default is 300 seconds (5 minutes).
- Introduced in: -

##### log_cleaner_disk_usage_target

- Default: 60
- Type: Int
- Unit: Percentage
- Is mutable: Yes
- Description: The target disk usage, as a percentage, after log cleaning. Log cleaning continues until disk usage drops below this threshold. The cleaner deletes the oldest log files one by one until the target is reached.
- Introduced in: -

##### log_cleaner_disk_usage_threshold

- Default: 80
- Type: Int
- Unit: Percentage
- Is mutable: Yes
- Description: The disk usage threshold, as a percentage, that triggers log cleaning. When disk usage exceeds this threshold, log cleaning starts. The cleaner checks each configured log directory independently and processes those that exceed the threshold.
- Introduced in: -

##### log_cleaner_disk_util_based_enable

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Enables automatic log cleaning based on disk usage. When enabled, logs are cleaned when disk usage exceeds the threshold. The log cleaner runs as a background daemon on the FE node and helps prevent accumulated log files from exhausting disk space.
- Introduced in: -

##### log_plan_cancelled_by_crash_be

- Default: true
- Type: boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to enable query execution plan logging when a query is cancelled because of a BE crash or an RPC exception. With this enabled, when a query is cancelled due to a BE crash or an `RpcException`, StarRocks logs the query execution plan (at `TExplainLevel.COSTS` level) as a WARN entry. The log entry includes the QueryId, the SQL, and the COSTS plan; in the ExecuteExceptionHandler path, the exception stack trace is also logged. Logging is skipped when `enable_collect_query_detail_info` is enabled (the plan is stored in the query detail instead); in the code path, this is checked by verifying whether the query detail is null. Note that in ExecuteExceptionHandler the plan is logged only on the first retry (`retryTime == 0`). Enabling this may increase log volume because a full COSTS plan can be large.
- Introduced in: v3.2.0

##### log_register_and_unregister_query_id

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to let the FE log query registration and deregistration messages from QeProcessorImpl (for example, `"register query id = {}"` and `"deregister query id = {}"`). A log is emitted only when the query has a non-null ConnectContext and either the command is not `COM_STMT_EXECUTE` or the session variable `isAuditExecuteStmt()` is true. Because these messages are written for every query lifecycle event, enabling this can produce high log volume and become a throughput bottleneck in high-concurrency environments. Enable it for debugging or auditing; disable it to reduce logging overhead and improve performance.
- Introduced in: v3.3.0, v3.4.0, v3.5.0

##### log_roll_size_mb

- Default: 1024
- Type: Int
- Unit: MB
- Is mutable: No
- Description: The maximum size of a system log file or an audit log file.
- Introduced in: -

##### proc_profile_file_retained_days

- Default: 1
- Type: Int
- Unit: Days
- Is mutable: Yes
- Description: The number of days to retain the process profiling files (CPU and memory) generated under `sys_log_dir/proc_profile`. The ProcProfileCollector computes the cutoff by subtracting `proc_profile_file_retained_days` from the current time (formatted as yyyyMMdd-HHmmss) and deletes profile files whose timestamp part is lexicographically earlier than that cutoff (that is, `timePart.compareTo(timeToDelete) < 0`). File deletion also honors the size-based cutoff controlled by `proc_profile_file_retained_size_bytes`. Profile files use the `cpu-profile-` and `mem-profile-` prefixes and are compressed after collection.
- Introduced in: v3.2.12

##### proc_profile_file_retained_size_bytes

- Default: 2L * 1024 * 1024 * 1024 (2147483648)
- Type: Long
- Unit: Bytes
- Is mutable: Yes
- Description: The maximum total size, in bytes, of collected CPU and memory profile files (file names prefixed with `cpu-profile-` and `mem-profile-`) kept under the profile directory. When the total size of valid profile files exceeds `proc_profile_file_retained_size_bytes`, the collector deletes the oldest profile files until the remaining total is less than or equal to `proc_profile_file_retained_size_bytes`. Files older than `proc_profile_file_retained_days` are deleted regardless of size. This setting controls the disk usage of the profile archive and interacts with `proc_profile_file_retained_days` to determine deletion order and retention.
- Introduced in: v3.2.12
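The `log_cleaner_*` items are all mutable, so disk-utilization-based cleaning can be turned on without a restart. A sketch that enables the cleaner with the default thresholds described above (the percentages are illustrative):

```SQL
-- Start cleaning when a log directory exceeds 80% disk usage,
-- and keep deleting the oldest files until usage falls below 60%.
ADMIN SET FRONTEND CONFIG ("log_cleaner_disk_util_based_enable" = "true");
ADMIN SET FRONTEND CONFIG ("log_cleaner_disk_usage_threshold" = "80");
ADMIN SET FRONTEND CONFIG ("log_cleaner_disk_usage_target" = "60");
```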
##### profile_log_delete_age

- Default: 1d
- Type: String
- Unit: -
- Is mutable: No
- Description: Controls how long FE profile log files are kept before they become eligible for deletion. The value is injected into Log4j's `IfLastModified` delete policy (via `Log4jConfig`) and is applied together with rotation settings such as `profile_log_roll_interval` and `profile_log_roll_num`. Supported suffixes: `d` (days), `h` (hours), `m` (minutes), `s` (seconds). Examples: `7d` (7 days), `10h` (10 hours), `60m` (60 minutes), `120s` (120 seconds).
- Introduced in: v3.2.5

##### profile_log_dir

- Default: `Config.STARROCKS_HOME_DIR + "/log"`
- Type: String
- Unit: -
- Is mutable: No
- Description: The directory to which FE profile logs are written. Log4jConfig uses this value to place the profile-related appenders (files such as `fe.profile.log` and `fe.features.log` are created under this directory). Rotation and retention of these files are controlled by `profile_log_roll_size_mb`, `profile_log_roll_num`, and `profile_log_delete_age`; the timestamp suffix format is controlled by `profile_log_roll_interval` (DAY or HOUR is supported). Because the default directory lives under `STARROCKS_HOME_DIR`, make sure the FE process has permission to write to, rotate, and delete files in this directory.
- Introduced in: v3.2.5

##### profile_log_roll_interval

- Default: DAY
- Type: String
- Unit: -
- Is mutable: No
- Description: Controls the time granularity used to generate the date part of profile log file names. Valid values (case-insensitive) are `HOUR` and `DAY`. `HOUR` produces the `"%d{yyyyMMddHH}"` pattern (hourly buckets) and `DAY` produces the `"%d{yyyyMMdd}"` pattern (daily buckets). The value is used when computing `profile_file_pattern` in the Log4j configuration and affects only the time-based component of the rolling file name; size-based rolling is still governed by `profile_log_roll_size_mb`, and retention by `profile_log_roll_num` / `profile_log_delete_age`. An invalid value causes an IOException during logging initialization (error message: `"profile_log_roll_interval config error: "`). Choose `HOUR` to bound per-hour file sizes for high-volume profiles, or `DAY` for daily aggregation.
- Introduced in: v3.2.5

##### profile_log_roll_num

- Default: 5
- Type: Int
- Unit: -
- Is mutable: No
- Description: Specifies the maximum number of rotated profile log files that Log4j's DefaultRolloverStrategy keeps for the profile logger. The value is injected into the logging XML as `${profile_log_roll_num}`. Rollover is triggered by `profile_log_roll_size_mb` or `profile_log_roll_interval`; when it occurs, Log4j keeps at most this many indexed files, and older indexed files become eligible for deletion. Actual retention on disk is also affected by `profile_log_delete_age` and the `profile_log_dir` location. Lower values reduce disk usage but limit retained history; higher values keep more historical profile logs.
- Introduced in: v3.2.5

##### profile_log_roll_size_mb

- Default: 1024
- Type: Int
- Unit: MB
- Is mutable: No
- Description: Sets the size threshold, in megabytes, that triggers size-based rotation of FE profile log files. The value is used by the Log4j RollingFile SizeBasedTriggeringPolicy for the `ProfileFile` appender; when the profile log exceeds `profile_log_roll_size_mb`, it is rotated. Time-based rotation can also occur when `profile_log_roll_interval` is reached; either condition triggers a rollover. Combined with `profile_log_roll_num` and `profile_log_delete_age`, this controls how many historical profile files are kept and when old ones are removed. Compression of rotated files is governed by `enable_profile_log_compress`.
- Introduced in: v3.2.5

##### qe_slow_log_ms

- Default: 5000
- Type: Long
- Unit: Milliseconds
- Is mutable: Yes
- Description: The threshold used to determine whether a query is a slow query. If the response time of a query exceeds this threshold, it is recorded as a slow query in **fe.audit.log**.
- Introduced in: -

##### slow_lock_log_every_ms

- Default: 3000L
- Type: Long
- Unit: Milliseconds
- Is mutable: Yes
- Description: The minimum interval, in milliseconds, to wait before emitting another "slow lock" warning for the same SlowLockLogStats instance. LockUtils checks this value after a lock wait exceeds `slow_lock_threshold_ms` and suppresses additional warnings until `slow_lock_log_every_ms` milliseconds have elapsed since the last logged slow-lock event. Use a larger value to reduce log volume during prolonged contention, or a smaller value for more frequent diagnostics. Changes take effect at runtime for subsequent checks.
- Introduced in: v3.2.0

##### slow_lock_print_stack

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to let the LockManager include the owning thread's full stack trace in the JSON payload of slow-lock warnings emitted by `logSlowLockTrace` (the "stack" array is populated via `LogUtil.getStackTraceToJsonArray` with `start=0` and `max=Short.MAX_VALUE`). This configuration only controls the extra stack information of the lock owner shown when a lock acquisition exceeds the threshold configured by `slow_lock_threshold_ms`. Enabling it aids debugging by providing the exact stack of the thread holding the lock; disabling it reduces log volume and the CPU/memory overhead of capturing and serializing stack traces in high-concurrency environments.
- Introduced in: v3.3.16, v3.4.5, v3.5.1

##### slow_lock_threshold_ms

- Default: 3000L
- Type: long
- Unit: Milliseconds
- Is mutable: Yes
- Description: The threshold, in milliseconds, used to classify a lock operation or a held lock as "slow". When a lock wait or hold exceeds this value, StarRocks (depending on the context) emits diagnostic logs, includes stack traces or waiter/owner information, and, in the LockManager, starts deadlock detection after this delay. It is used by LockUtils (slow-lock logging), QueryableReentrantReadWriteLock (filtering slow readers), LockManager (deadlock detection delay and slow-lock tracing), LockChecker (periodic slow-lock detection), and other callers (for example, DiskAndTabletLoadReBalancer logging). Lowering the value increases sensitivity and logging/diagnostic overhead; setting it to 0 or a negative value disables the initial wait-based deadlock detection delay behavior. Tune together with `slow_lock_log_every_ms`, `slow_lock_print_stack`, and `slow_lock_stack_trace_reserve_levels`.
- Introduced in: 3.2.0
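`qe_slow_log_ms` and the slow-lock items above are mutable, so slow-query and lock diagnostics can be tuned while investigating an incident. A minimal sketch (both values are illustrative):

```SQL
-- Only queries slower than 10 seconds are logged as slow queries in fe.audit.log.
ADMIN SET FRONTEND CONFIG ("qe_slow_log_ms" = "10000");

-- Throttle repeated slow-lock warnings during prolonged contention.
ADMIN SET FRONTEND CONFIG ("slow_lock_log_every_ms" = "10000");
```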
##### sys_log_delete_age

- Default: 7d
- Type: String
- Unit: -
- Is mutable: No
- Description: The retention period of system log files. The default value `7d` specifies that each system log file can be retained for 7 days. StarRocks checks each system log file and deletes those that were generated 7 days ago.
- Introduced in: -

##### sys_log_dir

- Default: StarRocksFE.STARROCKS_HOME_DIR + "/log"
- Type: String
- Unit: -
- Is mutable: No
- Description: The directory that stores system log files.
- Introduced in: -

##### sys_log_enable_compress

- Default: false
- Type: boolean
- Unit: -
- Is mutable: No
- Description: When set to `true`, the system appends a ".gz" suffix to rotated system log file names so that Log4j produces gzip-compressed rotated FE system logs (for example, fe.log.*). The value is read during Log4j configuration generation (Log4jConfig.initLogging / generateActiveLog4jXmlConfig) and controls the `sys_file_postfix` property used in the RollingFile filePattern. Enabling it reduces the disk usage of retained logs, but increases CPU and I/O during rotation and changes log file names, so tools or scripts that read the logs must handle .gz files. Note that audit logs use a separate compression configuration, `audit_log_enable_compress`.
- Introduced in: v3.2.12

##### sys_log_format

- Default: "plaintext"
- Type: String
- Unit: -
- Is mutable: No
- Description: Selects the Log4j layout used for FE logs. Valid values: `"plaintext"` (default) and `"json"`. The value is case-insensitive. `"plaintext"` configures a PatternLayout with a human-readable timestamp, level, thread, class.method:line, and stack traces for WARN/ERROR. `"json"` configures a JsonTemplateLayout and emits structured JSON events (UTC timestamp, level, thread ID/name, source file/method/line, message, exception stack trace), suitable for log aggregators (ELK, Splunk). JSON output honors the maximum string lengths set by `sys_log_json_max_string_length` and `sys_log_json_profile_max_string_length`.
- Introduced in: v3.2.10

##### sys_log_json_max_string_length

- Default: 1048576
- Type: Int
- Unit: Bytes
- Is mutable: No
- Description: Sets the JsonTemplateLayout "maxStringLength" value used for JSON-formatted system logs. When `sys_log_format` is set to `"json"`, string-valued fields (for example, "message" and stringified exception stack traces) are truncated if they exceed this limit. The value is injected into the Log4j XML generated in `Log4jConfig.generateActiveLog4jXmlConfig()` and applies to the default, warning, audit, dump, and big query layouts. The profile layouts use a separate configuration (`sys_log_json_profile_max_string_length`). Lowering this value reduces log size but may truncate useful information.
- Introduced in: 3.2.11

##### sys_log_json_profile_max_string_length

- Default: 104857600 (100 MB)
- Type: Int
- Unit: Bytes
- Is mutable: No
- Description: Sets the JsonTemplateLayout maxStringLength for the profile (and related features) log appenders when `sys_log_format` is "json". String field values in JSON-formatted profile logs are truncated to this byte length; non-string fields are unaffected. It is applied as the Log4jConfig `JsonTemplateLayout maxStringLength` and is ignored when `plaintext` logging is used. Set the value large enough to hold the full messages you need, but note that larger values increase log size and I/O.
- Introduced in: v3.2.11

##### sys_log_level

- Default: INFO
- Type: String
- Unit: -
- Is mutable: No
- Description: The severity level by which system log entries are classified. Valid values: `INFO`, `WARN`, `ERROR`, and `FATAL`.
- Introduced in: -

##### sys_log_roll_interval

- Default: DAY
- Type: String
- Unit: -
- Is mutable: No
- Description: The interval at which StarRocks rotates system log entries. Valid values: `DAY` and `HOUR`.
  - If this parameter is set to `DAY`, a suffix in the `yyyyMMdd` format is added to the names of system log files.
  - If this parameter is set to `HOUR`, a suffix in the `yyyyMMddHH` format is added to the names of system log files.
- Introduced in: -

##### sys_log_roll_num

- Default: 10
- Type: Int
- Unit: -
- Is mutable: No
- Description: The maximum number of system log files that can be retained within each retention period specified by `sys_log_roll_interval`.
- Introduced in: -

##### sys_log_to_console

- Default: false (unless the environment variable `SYS_LOG_TO_CONSOLE` is set to "1")
- Type: Boolean
- Unit: -
- Is mutable: No
- Description: When set to `true`, the system configures Log4j to send all logs to the console (the ConsoleErr appender) instead of the file-based appenders. The value is read when the active Log4j XML configuration is generated (it affects the appender selection of the root logger and the per-module loggers). Its value is captured from the `SYS_LOG_TO_CONSOLE` environment variable at process startup; changing it at runtime has no effect. This configuration is typically used in containerized or CI environments where stdout/stderr log collection is preferred over writing log files.
- Introduced in: v3.2.0

##### sys_log_verbose_modules

- Default: Empty string
- Type: String[]
- Unit: -
- Is mutable: No
- Description: The modules for which StarRocks generates system logs. If this parameter is set to `org.apache.starrocks.catalog`, StarRocks generates system logs only for the catalog module. Separate module names with a comma (,) and a space.
- Introduced in: -

##### sys_log_warn_modules

- Default: {}
- Type: String[]
- Unit: -
- Is mutable: No
- Description: A list of logger names or package prefixes that the system configures at startup as WARN-level loggers routed to the warning appender (SysWF), which writes the `fe.warn.log` file. Entries are inserted into the generated Log4j configuration (alongside built-in warning modules such as org.apache.kafka, org.apache.hudi, and org.apache.hadoop.io.compress) and produce corresponding `<Logger>` elements. Fully qualified package or class prefixes (for example, "com.example.lib") are recommended, to suppress noisy INFO/DEBUG output in the regular log while still capturing warnings separately.
- Introduced in: v3.2.13

### Server

##### brpc_idle_wait_max_time

- Default: 10000
- Type: Int
- Unit: ms
- Is mutable: No
- Description: The maximum length of time for which a bRPC client waits in the idle state.
- Introduced in: -
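All `sys_log_*` items above are immutable and are read from `fe.conf` at startup. After changing, for example, `sys_log_format` to `json` and restarting the FE, you can confirm the active values; a minimal check:

```SQL
-- Verify the system log settings in effect after the restart.
ADMIN SHOW FRONTEND CONFIG LIKE 'sys_log%';
```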
##### brpc_inner_reuse_pool

- Default: true
- Type: boolean
- Unit: -
- Is mutable: No
- Description: Controls whether the underlying BRPC client uses an internal shared reuse pool for connections/channels. StarRocks reads `brpc_inner_reuse_pool` when constructing RpcClientOptions in BrpcProxy (via `rpcOptions.setInnerResuePool(...)`). When enabled (true), RPC clients reuse the internal pool to reduce per-call connection creation, lowering connection churn, memory, and file descriptor usage for FE-to-BE / LakeService RPCs. When disabled (false), clients may create more isolated pools (more concurrency isolation at the cost of higher resource usage). Changing this value requires a process restart to take effect.
- Introduced in: v3.3.11, v3.4.1, v3.5.0

##### brpc_min_evictable_idle_time_ms

- Default: 120000
- Type: Int
- Unit: Milliseconds
- Is mutable: No
- Description: The number of milliseconds an idle BRPC connection must remain idle in the connection pool before it can be evicted. Applied to the RpcClientOptions used by BrpcProxy (via RpcClientOptions.setMinEvictableIdleTime). Increase the value to keep idle connections alive longer (less reconnect churn); lower it to release unused sockets sooner (less resource usage). Tune together with `brpc_connection_pool_size` and `brpc_idle_wait_max_time` to balance connection reuse, pool growth, and eviction behavior.
- Introduced in: v3.3.11, v3.4.1, v3.5.0

##### brpc_reuse_addr

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: No
- Description: When true, StarRocks sets the socket option that allows client sockets created by the brpc RpcClient to reuse local addresses (via RpcClientOptions.setReuseAddress). Enabling it reduces bind failures and allows faster rebinding of local ports after sockets close, which helps with high-rate connection churn or fast restarts. When false, address/port reuse is disabled, which can reduce the chance of unexpected port sharing but may increase transient bind errors. This option interacts with the connection behavior configured by `brpc_connection_pool_size` and `brpc_short_connection`, since it affects how quickly client sockets can be rebound and reused.
- Introduced in: v3.3.11, v3.4.1, v3.5.0

##### cluster_name

- Default: StarRocks Cluster
- Type: String
- Unit: -
- Is mutable: No
- Description: The name of the StarRocks cluster to which the FE belongs. The cluster name is displayed as `Title` on the web page.
- Introduced in: -

##### dns_cache_ttl_seconds

- Default: 60
- Type: Int
- Unit: Seconds
- Is mutable: No
- Description: The DNS cache TTL (time-to-live), in seconds, for successful DNS lookups. This sets the Java security property `networkaddress.cache.ttl`, which controls how long the JVM caches successful DNS lookups. Set it to `-1` to let the system cache the information forever, or to `0` to disable caching. This is particularly useful in environments where IP addresses change frequently, such as Kubernetes deployments or when using dynamic DNS.
- Introduced in: v3.5.11, v4.0.4

##### enable_http_async_handler

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to let the system process HTTP requests asynchronously. If enabled, HTTP requests received by the Netty worker threads are submitted to a separate thread pool for service-logic processing so that the HTTP server is not blocked. If disabled, the Netty workers process the service logic themselves.
- Introduced in: 4.0.0

##### enable_http_validate_headers

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: No
- Description: Controls whether Netty's HttpServerCodec performs strict HTTP header validation. The value is passed to HttpServerCodec when the HTTP pipeline is initialized in HttpServer. The default is false for backward compatibility, because newer Netty versions enforce stricter header rules (https://github.com/netty/netty/pull/12760). Set it to true to enforce RFC-compliant header checks; doing so may cause malformed or non-conforming requests from older clients or proxies to be rejected. Changes require an HTTP server restart to take effect.
- Introduced in: v3.3.0, v3.4.0, v3.5.0

##### enable_https

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: No
- Description: Whether to enable an HTTPS server alongside the HTTP server in the FE node.
- Introduced in: v4.0

##### frontend_address

- Default: 0.0.0.0
- Type: String
- Unit: -
- Is mutable: No
- Description: The IP address of the FE node.
- Introduced in: -

##### http_async_threads_num

- Default: 4096
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The size of the thread pool for asynchronous HTTP request processing. The alias is `max_http_sql_service_task_threads_num`.
- Introduced in: 4.0.0

##### http_backlog_num

- Default: 1024
- Type: Int
- Unit: -
- Is mutable: No
- Description: The length of the backlog queue held by the HTTP server in the FE node.
- Introduced in: -

##### http_max_chunk_size

- Default: 8192
- Type: Int
- Unit: Bytes
- Is mutable: No
- Description: Sets the maximum allowed size, in bytes, of a single HTTP chunk handled by Netty's HttpServerCodec in the FE HTTP server. It is passed as the third argument to HttpServerCodec and bounds the length of chunks during chunked transfers or streaming requests/responses. If an incoming chunk exceeds this value, Netty raises a frame-too-large error (for example, TooLongFrameException) and the request may be rejected. Increase the value for legitimate chunked uploads; keep it small to reduce memory pressure and the attack surface for DoS. This setting works together with `http_max_initial_line_length`, `http_max_header_size`, and `enable_http_validate_headers`.
- Introduced in: v3.2.0

##### http_max_header_size

- Default: 32768
- Type: Int
- Unit: Bytes
- Is mutable: No
- Description: The maximum allowed size, in bytes, of the HTTP request header block parsed by Netty's `HttpServerCodec`. StarRocks passes this value to `HttpServerCodec` (as `Config.http_max_header_size`); if an incoming request's headers (names and values combined) exceed this limit, the codec rejects the request (a decoder exception) and the connection/request fails. Increase the value only if clients legitimately send very large headers (large cookies or many custom headers); larger values increase per-connection memory usage. Tune in combination with `http_max_initial_line_length` and `http_max_chunk_size`. Changes require an FE restart.
- Introduced in: v3.2.0

##### http_max_initial_line_length

- Default: 4096
- Type: Int
- Unit: Bytes
- Is mutable: No
- Description: Sets the maximum allowed length, in bytes, of the initial HTTP request line (method + request target + HTTP version) accepted by the Netty `HttpServerCodec` used in the HttpServer. The value is passed to Netty's decoder, and requests whose initial line is longer are rejected (TooLongFrameException). Increase it only if you must support very long request URIs; larger values increase memory usage and may increase exposure to malformed or abusive requests. Tune together with `http_max_header_size` and `http_max_chunk_size`.
- Introduced in: v3.2.0
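Most of the Netty/HTTP limits above are static, but the asynchronous handler and its pool can be adjusted at runtime. A sketch (the pool size is illustrative and should be sized to your workload):

```SQL
-- Keep asynchronous HTTP handling on, but shrink the processing pool.
ADMIN SET FRONTEND CONFIG ("enable_http_async_handler" = "true");
ADMIN SET FRONTEND CONFIG ("http_async_threads_num" = "2048");
```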
##### http_port

- Default: 8030
- Type: Int
- Unit: -
- Is mutable: No
- Description: The port on which the HTTP server in the FE node listens.
- Introduced in: -

##### http_web_page_display_hardware

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: When true, the HTTP index page (/index) includes a hardware information section populated via the oshi library (CPU, memory, processes, disks, file systems, network, and so on). oshi may indirectly invoke system utilities or read system files (for example, it can execute commands such as `getent passwd`), which could expose sensitive system data. If you need stricter security or want to avoid those indirect command executions on the host, set this configuration to false to disable collecting and displaying hardware details on the web UI.
- Introduced in: v3.2.0

##### http_worker_threads_num

- Default: 0
- Type: Int
- Unit: -
- Is mutable: No
- Description: The number of worker threads with which the HTTP server processes HTTP requests. For a negative value or 0, the thread count is twice the number of CPU cores.
- Introduced in: v2.5.18, v3.0.10, v3.1.7, v3.2.2

##### https_port

- Default: 8443
- Type: Int
- Unit: -
- Is mutable: No
- Description: The port on which the HTTPS server in the FE node listens.
- Introduced in: v4.0

##### max_mysql_service_task_threads_num

- Default: 4096
- Type: Int
- Unit: -
- Is mutable: No
- Description: The maximum number of threads the MySQL server in the FE node can run to process tasks.
- Introduced in: -

##### max_task_runs_threads_num

- Default: 512
- Type: Int
- Unit: Threads
- Is mutable: No
- Description: Controls the maximum number of threads in the task-run executor thread pool. The value is an upper bound on concurrent task-run execution; raising it increases parallelism but also CPU, memory, and network usage, while lowering it can cause task-run backlogs and higher latency. Size it according to the expected number of concurrent scheduled jobs and the available system resources.
- Introduced in: v3.2.0

##### memory_tracker_enable

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Enables the FE memory tracker subsystem. When `memory_tracker_enable` is set to `true`, the `MemoryUsageTracker` periodically scans registered metadata modules, updates the in-memory `MemoryUsageTracker.MEMORY_USAGE` map, logs totals, and causes `MetricRepo` to expose memory usage and object count gauges in the metrics output. Use `memory_tracker_interval_seconds` to control the sampling interval. Enabling it helps monitor and debug memory consumption, but introduces CPU and I/O overhead and extra metric cardinality.
- Introduced in: v3.2.4

##### memory_tracker_interval_seconds

- Default: 60
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The interval, in seconds, at which the FE `MemoryUsageTracker` daemon polls and logs the memory usage of the FE process and the registered `MemoryTrackable` modules. When `memory_tracker_enable` is set to `true`, the tracker runs at this frequency, updates `MEMORY_USAGE`, and logs aggregated JVM and tracked-module usage.
- Introduced in: v3.2.4

##### mysql_nio_backlog_num

- Default: 1024
- Type: Int
- Unit: -
- Is mutable: No
- Description: The length of the backlog queue held by the MySQL server in the FE node.
- Introduced in: -

##### mysql_server_version

- Default: 8.0.33
- Type: String
- Unit: -
- Is mutable: Yes
- Description: The MySQL server version returned to the client. Modifying this parameter affects the version information in the following situations:
  1. `select version();`
  2. Handshake packet version
  3. Value of the global variable `version` (`show variables like 'version';`)
- Introduced in: -

##### mysql_service_io_threads_num

- Default: 4
- Type: Int
- Unit: -
- Is mutable: No
- Description: The maximum number of threads the MySQL server in the FE node can run to process I/O events.
- Introduced in: -

##### mysql_service_kill_after_disconnect

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: No
- Description: Controls how the server handles a session when a closed MySQL TCP connection is detected (EOF on read). If set to `true`, the server immediately kills any running query of that connection and performs cleanup right away. If set to `false`, the server does not kill running queries on disconnect and performs cleanup only when there are no pending request tasks, allowing long-running queries to continue after the client disconnects. Note: although a brief comment suggests TCP keep-alive, this parameter specifically governs the post-disconnect kill behavior; set it according to whether you want orphaned queries killed (recommended behind unreliable or load-balanced clients) or allowed to finish.
- Introduced in: -

##### mysql_service_nio_enable_keep_alive

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: No
- Description: Enables TCP keep-alive for MySQL connections. Useful for long-idle connections behind load balancers.
- Introduced in: -

##### net_use_ipv6_when_priority_networks_empty

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: No
- Description: A boolean value that controls whether IPv6 addresses are used preferentially when `priority_networks` is not specified. `true` indicates that the system is allowed to preferentially use an IPv6 address when the server that hosts the node has both IPv4 and IPv6 addresses and `priority_networks` is not specified.
- Introduced in: v3.3.0

##### priority_networks

- Default: Empty string
- Type: String
- Unit: -
- Is mutable: No
- Description: Declares a selection strategy for servers that have multiple IP addresses. Note that at most one IP address must match the list specified by this parameter. The value of this parameter is a list that consists of entries in CIDR notation separated by semicolons (;), for example, 10.10.10.0/24. If no IP address matches an entry in the list, an available IP address of the server is selected randomly. From v3.3.0 onwards, StarRocks supports IPv6-based deployment. If the server has both IPv4 and IPv6 addresses and this parameter is not specified, the system uses an IPv4 address by default. You can change this behavior by setting `net_use_ipv6_when_priority_networks_empty` to `true`.
- Introduced in: -

##### proc_profile_cpu_enable

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: When set to `true`, the background `ProcProfileCollector` collects CPU profiles with `AsyncProfiler` and writes HTML reports under `sys_log_dir/proc_profile`. Each collection run records CPU stacks for the duration configured by `proc_profile_collect_time_s`, using `proc_profile_jstack_depth` as the Java stack depth. Generated profiles are compressed, and old files are cleaned up according to `proc_profile_file_retained_days` and `proc_profile_file_retained_size_bytes`. `AsyncProfiler` requires the native library (`libasyncProfiler.so`); `one.profiler.extractPath` is set to `STARROCKS_HOME_DIR/bin` to avoid noexec issues on `/tmp`.
- Introduced in: v3.2.12
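`mysql_server_version` is mutable, which makes it easy to verify the effect described above. A minimal check (the version string is illustrative):

```SQL
ADMIN SET FRONTEND CONFIG ("mysql_server_version" = "8.0.33");

-- Both of these now report the configured version string.
SELECT version();
SHOW VARIABLES LIKE 'version';
```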
##### qe_max_connection

- Default: 4096
- Type: Int
- Unit: -
- Is mutable: No
- Description: The maximum number of connections that all users can establish with the FE node. The default value was changed from `1024` to `4096` as of v3.1.12 and v3.2.7.
- Introduced in: -

##### query_port

- Default: 9030
- Type: Int
- Unit: -
- Is mutable: No
- Description: The port on which the MySQL server in the FE node listens.
- Introduced in: -

##### rpc_port

- Default: 9020
- Type: Int
- Unit: -
- Is mutable: No
- Description: The port on which the Thrift server in the FE node listens.
- Introduced in: -

##### slow_lock_stack_trace_reserve_levels

- Default: 15
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: Controls how many stack trace frames are captured and emitted when StarRocks dumps lock debugging information for slow or held locks. The value is passed by `QueryableReentrantReadWriteLock` to `LogUtil.getStackTraceToJsonArray` to produce the JSON for the exclusive lock owner, the current thread, and the oldest/shared readers. Increasing it provides more context for diagnosing slow-lock or deadlock issues at the cost of larger JSON payloads and slightly higher CPU/memory while capturing stacks; decreasing it reduces the overhead. Note: reader entries can be filtered by `slow_lock_threshold_ms` when only slow locks are logged.
- Introduced in: v3.4.0, v3.5.0

##### ssl_cipher_blacklist

- Default: Empty string
- Type: String
- Unit: -
- Is mutable: No
- Description: A comma-separated list, with regular expression support, that blacklists SSL cipher suites by their IANA names. If both the whitelist and the blacklist are set, the blacklist takes precedence.
- Introduced in: v4.0

##### ssl_cipher_whitelist

- Default: Empty string
- Type: String
- Unit: -
- Is mutable: No
- Description: A comma-separated list, with regular expression support, that whitelists SSL cipher suites by their IANA names. If both the whitelist and the blacklist are set, the blacklist takes precedence.
- Introduced in: v4.0

##### task_runs_concurrency

- Default: 4
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: A global limit on concurrently running TaskRun instances. When the current running count is greater than or equal to `task_runs_concurrency`, the `TaskRunScheduler` stops scheduling new runs, so this value caps parallel TaskRun execution across the scheduler. It is also used by `MVPCTRefreshPartitioner` to compute the partition refresh granularity per TaskRun. Increasing the value raises parallelism and resource usage; decreasing it lowers concurrency and makes per-run partition refreshes larger. Do not set it to 0 or a negative value unless you intend to disable scheduling: 0 (or a negative value) effectively prevents the `TaskRunScheduler` from scheduling new TaskRuns.
- Introduced in: v3.2.0

##### task_runs_queue_length

- Default: 500
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: Limits the maximum number of pending TaskRun items kept in the pending queue. The `TaskRunManager` checks the current pending count and rejects new submissions when the effective pending TaskRun count is greater than or equal to `task_runs_queue_length`. The same limit is re-checked before a merged/accepted TaskRun is added. Tune the value to balance memory against scheduling backlog: set it higher to avoid rejections under large bursty workloads, or lower to bound memory and reduce the pending backlog.
- Introduced in: v3.2.0

##### thrift_backlog_num

- Default: 1024
- Type: Int
- Unit: -
- Is mutable: No
- Description: The length of the backlog queue held by the Thrift server in the FE node.
- Introduced in: -

##### thrift_client_timeout_ms

- Default: 5000
- Type: Int
- Unit: Milliseconds
- Is mutable: No
- Description: The length of time after which an idle client connection times out.
- Introduced in: -

##### thrift_rpc_max_body_size

- Default: -1
- Type: Int
- Unit: Bytes
- Is mutable: No
- Description: Controls the maximum allowed Thrift RPC message body size, in bytes, used when building the server's Thrift protocol (passed to the TBinaryProtocol.Factory in `ThriftServer`). A value of `-1` disables the limit (unbounded). Setting a positive value enforces an upper bound so that larger messages are rejected by the Thrift layer, which helps bound memory usage and mitigate oversized requests or DoS risks. Set it large enough for the expected payloads (large structs or bulk data) to avoid rejecting legitimate requests.
- Introduced in: v3.2.0

##### thrift_server_max_worker_threads

- Default: 4096
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of worker threads supported by the Thrift server in the FE node.
- Introduced in: -

##### thrift_server_queue_size

- Default: 4096
- Type: Int
- Unit: -
- Is mutable: No
- Description: The length of the pending queue for requests. If the number of threads being processed in the Thrift server exceeds the value specified in `thrift_server_max_worker_threads`, new requests are added to the pending queue.
- Introduced in: -

### Metadata and cluster management

##### alter_max_worker_queue_size

- Default: 4096
- Type: Int
- Unit: Tasks
- Is mutable: No
- Description: Controls the capacity of the internal worker thread pool queue used by the alter subsystem. It is passed, together with `alter_max_worker_threads`, to `ThreadPoolManager.newDaemonCacheThreadPool` in `AlterHandler`. When the number of pending alter tasks exceeds `alter_max_worker_queue_size`, new submissions are rejected and a `RejectedExecutionException` may be thrown (see `AlterHandler.handleFinishAlterTask`). Tune the value to balance memory usage against how large a backlog of concurrent alter tasks you allow.
- Introduced in: v3.2.0

##### alter_max_worker_threads

- Default: 4
- Type: Int
- Unit: Threads
- Is mutable: No
- Description: Sets the maximum number of worker threads in the AlterHandler thread pool. AlterHandler uses this value to construct the executor that runs and finishes alter-related tasks (for example, submitting `AlterReplicaTask` via handleFinishAlterTask). The value caps concurrent execution of alter operations; increasing it raises parallelism and resource usage, while lowering it limits concurrent alters and can become a bottleneck. The executor is created together with `alter_max_worker_queue_size`, and handler scheduling uses `alter_scheduler_interval_millisecond`.
- Introduced in: v3.2.0

##### automated_cluster_snapshot_interval_seconds

- Default: 600
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The interval at which automated cluster snapshot tasks are triggered.
- Introduced in: v3.4.2

##### background_refresh_metadata_interval_millis

- Default: 600000
- Type: Int
- Unit: Milliseconds
- Is mutable: Yes
- Description: The interval between two consecutive Hive metadata cache refreshes.
- Introduced in: v2.5.5

##### background_refresh_metadata_time_secs_since_last_access_secs

- Default: 3600 * 24
- Type: Long
- Unit: Seconds
- Is mutable: Yes
- Description: The expiration time of a Hive metadata cache refresh task. For a Hive catalog that has been accessed, StarRocks stops refreshing its cached metadata if the catalog has not been accessed for longer than the specified time. For a Hive catalog that has not been accessed, StarRocks does not refresh its cached metadata.
- Introduced in: v2.5.5
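`task_runs_concurrency` and `task_runs_queue_length` are both mutable, so TaskRun throughput can be tuned without a restart; as cautioned above, avoid `0` or negative values for the concurrency limit. A sketch with illustrative values:

```SQL
-- Allow more TaskRuns to execute in parallel and queue up under bursty load.
ADMIN SET FRONTEND CONFIG ("task_runs_concurrency" = "8");
ADMIN SET FRONTEND CONFIG ("task_runs_queue_length" = "1000");
```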
##### bdbje_cleaner_threads

- Default: 1
- Type: Int
- Unit: -
- Is mutable: No
- Description: The number of background cleaner threads for the Berkeley DB Java Edition (JE) environment used by the StarRocks journal. The value is read during environment initialization in `BDBEnvironment.initConfigs` and applied to `EnvironmentConfig.CLEANER_THREADS` using `Config.bdbje_cleaner_threads`. It controls the parallelism of JE log cleaning and space reclamation; increasing it can speed up cleaning at the cost of extra CPU and I/O interfering with foreground operations. Changes take effect only when the BDB environment is (re)initialized, so a frontend restart is required to apply a new value.
- Introduced in: v3.2.0

##### bdbje_heartbeat_timeout_second

- Default: 30
- Type: Int
- Unit: Seconds
- Is mutable: No
- Description: The amount of time after which the heartbeats among the Leader, Follower, and Observer FEs in the StarRocks cluster time out.
- Introduced in: -

##### bdbje_lock_timeout_second

- Default: 1
- Type: Int
- Unit: Seconds
- Is mutable: No
- Description: The amount of time after which a lock in the BDB JE-based FE times out.
- Introduced in: -

##### bdbje_replay_cost_percent

- Default: 150
- Type: Int
- Unit: Percent
- Is mutable: No
- Description: Sets the relative cost, as a percentage, of replaying transactions from the BDB JE log versus obtaining the same data via a network restore. The value is supplied to the underlying JE replication parameter REPLAY_COST_PERCENT; typically `>100`, meaning replay is generally more expensive than a network restore. When deciding whether to retain cleaned log files for potential replay, the system compares the replay cost multiplied by the log size against the cost of a network restore; if the network restore is judged more efficient, the files are deleted. A value of 0 disables retention based on this cost comparison. Log files needed within `REP_STREAM_TIMEOUT` or by any active replication are always retained.
- Introduced in: v3.2.0

##### bdbje_replica_ack_timeout_second

- Default: 10
- Type: Int
- Unit: Seconds
- Is mutable: No
- Description: The maximum amount of time for which the Leader FE waits for ACK messages from the specified number of Follower FEs when metadata is written from the Leader FE to the Follower FEs. Unit: seconds. If a large amount of metadata is being written, the Follower FEs take a long time to return ACK messages to the Leader FE, causing an ACK timeout. In that case, the metadata write fails and the FE process exits. We recommend that you increase the value of this parameter to prevent this situation.
- Introduced in: -

##### bdbje_reserved_disk_size

- Default: 512 * 1024 * 1024 (536870912)
- Type: Long
- Unit: Bytes
- Is mutable: No
- Description: Bounds the number of bytes Berkeley DB JE will keep as "unprotected" (deletable) log/data files. StarRocks passes this value to JE via `EnvironmentConfig.RESERVED_DISK` in `BDBEnvironment`; JE's built-in default is 0 (no limit). The StarRocks default (512 MiB) prevents JE from reserving excessive disk space for unprotected files while still allowing safe cleanup of obsolete files. Tune it on disk-constrained systems: decreasing it lets JE free more files sooner, and increasing it lets JE keep more reserved space. Changes require a process restart to take effect.
- Introduced in: v3.2.0

##### bdbje_reset_election_group

- Default: false
- Type: String
- Unit: -
- Is mutable: No
- Description: Whether to reset the BDBJE replication group. If this parameter is set to `TRUE`, the FE resets the BDBJE replication group (that is, deletes the information of all electable FE nodes) and starts as the Leader FE. After the reset, this FE is the only member of the cluster, and the other FEs can rejoin the cluster by using `ALTER SYSTEM ADD/DROP FOLLOWER/OBSERVER 'xxx'`. Use this setting only when no Leader FE can be elected because the data of most Follower FEs has been damaged. `reset_election_group` is used to replace `metadata_failure_recovery`.
- Introduced in: -

##### black_host_connect_failures_within_time

- Default: 5
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The threshold of connection failures allowed for a blacklisted BE node. If a BE node is added to the BE Blacklist automatically, StarRocks assesses its connectivity and judges whether it can be removed from the BE Blacklist. Within `black_host_history_sec`, a blacklisted BE node can be removed from the BE Blacklist only if it has fewer connection failures than the threshold set in `black_host_connect_failures_within_time`.
- Introduced in: v3.3.0

##### black_host_history_sec

- Default: 2 * 60
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The duration for which the historical connection failures of BE nodes in the BE Blacklist are retained. If a BE node is added to the BE Blacklist automatically, StarRocks assesses its connectivity and judges whether it can be removed from the BE Blacklist. Within `black_host_history_sec`, a blacklisted BE node can be removed from the BE Blacklist only if it has fewer connection failures than the threshold set in `black_host_connect_failures_within_time`.
- Introduced in: v3.3.0

##### brpc_connection_pool_size

- Default: 16
- Type: Int
- Unit: Connections
- Is mutable: No
- Description: The maximum number of pooled BRPC connections the FE's BrpcProxy uses per endpoint. The value is applied to RpcClientOptions via `setMaxTotoal` and `setMaxIdleSize`, so it directly caps concurrent outgoing BRPC requests, because each request must borrow a connection from the pool. Increase the value in high-concurrency scenarios to avoid request queuing; raising it increases socket and memory usage and may add load on remote servers. When tuning, consider the related settings `brpc_idle_wait_max_time`, `brpc_short_connection`, `brpc_inner_reuse_pool`, `brpc_reuse_addr`, and `brpc_min_evictable_idle_time_ms`. Changing this value is not hot-reloadable and requires a restart.
- Introduced in: v3.2.0

##### brpc_short_connection

- Default: false
- Type: boolean
- Unit: -
- Is mutable: No
- Description: Controls whether the underlying brpc RpcClient uses short-lived connections. When enabled (`true`), RpcClientOptions.setShortConnection is set and connections are closed after a request completes, reducing the number of long-lived sockets at the cost of higher connection setup overhead and added latency. When disabled (`false`, the default), persistent connections and connection pooling are used. Enabling this option affects connection pool behavior and should be considered together with `brpc_connection_pool_size`, `brpc_idle_wait_max_time`, `brpc_min_evictable_idle_time_ms`, `brpc_reuse_addr`, and `brpc_inner_reuse_pool`. Keep it disabled for typical high-throughput deployments; enable it only when socket lifetimes must be bounded or network policies require short connections.
- Introduced in: v3.3.11, v3.4.1, v3.5.0
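The two `black_host_*` items work as a pair, as described above: a node leaves the Blacklist only if it records fewer than `black_host_connect_failures_within_time` failures within the `black_host_history_sec` window. Both are mutable; a sketch that makes removal more conservative (values are illustrative):

```SQL
-- Observe failures over 5 minutes and require fewer than 3 of them
-- before a blacklisted BE can be removed automatically.
ADMIN SET FRONTEND CONFIG ("black_host_history_sec" = "300");
ADMIN SET FRONTEND CONFIG ("black_host_connect_failures_within_time" = "3");
```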
##### catalog_try_lock_timeout_ms

- Default: 5000
- Type: Long
- Unit: Milliseconds
- Is mutable: Yes
- Description: The timeout duration to obtain the global lock.
- Introduced in: -

##### checkpoint_only_on_leader

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: When `true`, the CheckpointController selects only the Leader FE as the checkpoint worker; when `false`, the controller may pick any frontend and prefers nodes with lower heap usage. When `false`, workers are ordered by most recent failure time and by `heapUsedPercent` (the Leader is treated as having infinite usage so it is avoided). For operations that require cluster snapshot metadata, the controller forces the Leader regardless of this flag. Setting it to `true` concentrates checkpoint work on the Leader (simpler, but adds CPU/memory and network load on the Leader); keeping it `false` distributes checkpoint load to less-loaded FEs. This setting affects worker selection and interacts with timeouts such as `checkpoint_timeout_seconds` and RPC settings such as `thrift_rpc_timeout_ms`.
- Introduced in: v3.4.0, v3.5.0

##### checkpoint_timeout_seconds

- Default: 24 * 3600
- Type: Long
- Unit: Seconds
- Is mutable: Yes
- Description: The maximum time (in seconds) the Leader's CheckpointController waits for a checkpoint worker to complete a checkpoint. The controller converts this value to nanoseconds and polls the worker result queue; if no successful completion is received within the timeout, the checkpoint is considered failed and createImage returns a failure. Increasing this value accommodates long-running checkpoints but delays failure detection and subsequent image propagation; decreasing it enables faster failover/retry but may produce spurious timeouts for slow workers. This setting only controls how long the `CheckpointController` waits during checkpoint creation and does not change the worker's internal checkpoint behavior.
- Introduced in: v3.4.0, v3.5.0

##### db_used_data_quota_update_interval_secs

- Default: 300
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The interval at which the used data quota of databases is updated. StarRocks periodically updates the used data quota of all databases to track storage consumption. The value is used for quota enforcement and metrics collection. The minimum allowed interval is 30 seconds to prevent excessive system load; values smaller than 30 are rejected.
- Introduced in: -

##### drop_backend_after_decommission

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to delete a BE after the BE is decommissioned. `TRUE` indicates that the BE is deleted immediately after it is decommissioned. `FALSE` indicates that the BE is not deleted after it is decommissioned.
- Introduced in: -

##### edit_log_port

- Default: 9010
- Type: Int
- Unit: -
- Is mutable: No
- Description: The port used for communication among the Leader, Follower, and Observer FEs in the cluster.
- Introduced in: -

##### edit_log_roll_num

- Default: 50000
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of metadata log entries that can be written before a log file is created for these log entries. This parameter is used to control the size of log files. The new log file is written to the BDBJE database.
- Introduced in: -

##### edit_log_type

- Default: BDB
- Type: String
- Unit: -
- Is mutable: No
- Description: The type of edit log that can be generated. Set the value to `BDB`.
- Introduced in: -

##### enable_background_refresh_connector_metadata

- Default: true in v3.0 and later, false in v2.5
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to enable the periodic Hive metadata cache refresh. After it is enabled, StarRocks polls the metastore (Hive Metastore or AWS Glue) of your Hive cluster and refreshes the cached metadata of frequently accessed Hive catalogs to perceive data changes. `true` indicates to enable the Hive metadata cache refresh, and `false` indicates to disable it.
- Introduced in: v2.5.5
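Because `enable_background_refresh_connector_metadata` is mutable, it can be toggled online. A hedged example follows; as above, the statement applies only to the FE you are connected to and is not persisted, so mirror the change in `fe.conf` if it should survive a restart.

```SQL
-- Turn on background Hive metadata cache refresh at runtime.
ADMIN SET FRONTEND CONFIG ("enable_background_refresh_connector_metadata" = "true");

-- Verify the change on the current FE.
ADMIN SHOW FRONTEND CONFIG LIKE 'enable_background_refresh_connector_metadata';
```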
##### enable_collect_query_detail_info

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to collect the profile of a query. If this parameter is set to `TRUE`, the system collects the profile of the query. If this parameter is set to `FALSE`, the system does not collect the profile of the query.
- Introduced in: -

##### enable_create_partial_partition_in_batch

- Default: false
- Type: boolean
- Unit: -
- Is mutable: Yes
- Description: When this item is set to `false` (the default), StarRocks enforces that range partitions created in a batch align with standard time-unit boundaries, and rejects non-aligned ranges to avoid creating holes. Setting it to `true` disables that alignment check and permits creating partial (non-standard) partitions in a batch, which may produce gaps or misaligned partition ranges. Set it to `true` only if you intentionally need partial batch partitions and accept the associated risks.
- Introduced in: v3.2.0

##### enable_internal_sql

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: No
- Description: When this item is set to `true`, internal SQL statements executed by internal components (for example, SimpleExecutor) are preserved and written to internal audit or log messages (and may be further desensitized if `enable_sql_desensitize_in_log` is set to true). When set to `false`, internal SQL text is suppressed: the formatting code (SimpleExecutor.formatSQL) returns "?" and the actual statements are not emitted to internal audit or log messages. This configuration does not change the execution semantics of internal statements - it only controls the logging and visibility of internal SQL for privacy or security purposes.
- Introduced in: -

##### enable_legacy_compatibility_for_replication

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to enable legacy compatibility for replication. StarRocks may behave differently between old and new versions, causing problems during cross-cluster data migration. Therefore, you must enable legacy compatibility for the target cluster before data migration and disable it after data migration is complete. `true` indicates enabling this mode.
- Introduced in: v3.1.10, v3.2.6

##### enable_show_materialized_views_include_all_task_runs

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Controls how TaskRuns are returned for the SHOW MATERIALIZED VIEWS command. When this item is set to `false`, StarRocks returns only the most recent TaskRun per task (legacy behavior kept for compatibility). When set to `true` (the default), the `TaskManager` may include additional TaskRuns of the same task only if they share the same start TaskRun ID (for example, they belong to the same job), preventing unrelated duplicate runs from appearing while allowing multiple statuses related to one job to be shown. Set it to `false` to restore single-run output, or show multi-run job history for debugging and monitoring.
- Introduced in: v3.3.0, v3.4.0, v3.5.0

##### enable_statistics_collect_profile

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to generate profiles for statistics queries. You can set this item to `true` to allow StarRocks to generate query profiles for queries on system statistics.
- Introduced in: v3.1.5

##### enable_table_name_case_insensitive

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: No
- Description: Whether to enable case-insensitive handling for catalog names, database names, table names, view names, and materialized view names. Currently, table names are case-sensitive by default.
  - After this feature is enabled, all relevant names are stored in lowercase, and all SQL commands containing these names automatically convert them to lowercase.
  - You can enable this feature only when the cluster is created. **The value of this configuration cannot be modified in any way after the cluster is started.** Any attempt to modify it causes an error. When the FE detects that the value of this configuration item is inconsistent with the one used when the cluster was first started, the FE fails to start.
  - Currently, this feature does not support JDBC catalogs and table names. Do not enable this feature if you want case-insensitive handling for JDBC or ODBC data sources.
- Introduced in: v4.0
##### enable_task_history_archive

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: When enabled, completed task-run records are archived into the persistent task-run history table and recorded in the edit log, so lookups (for example, `lookupHistory`, `lookupHistoryByTaskNames`, `lookupLastJobOfTasks`) include archived results. Archiving is performed by the FE Leader and is skipped during unit tests (`FeConstants.runningUnitTest`). When enabled, the in-memory expiration and forced-GC paths are bypassed (the code returns early from `removeExpiredRuns` and `forceGC`), so retention/eviction is handled by the persistent archive rather than by `task_runs_ttl_second` and `task_runs_max_history_number`. When disabled, history is kept in memory and pruned by those configurations.
- Introduced in: v3.3.1, v3.4.0, v3.5.0

##### enable_task_run_fe_evaluation

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: When enabled, the FE performs local evaluation for the system table `task_runs` in `TaskRunsSystemTable.supportFeEvaluation`. FE-side evaluation is permitted only for conjunctive equality predicates that compare a column to a constant, and only for the columns `QUERY_ID` and `TASK_NAME`. Enabling this improves performance for targeted lookups by avoiding broader scans or extra remote processing; disabling it forces the planner to skip FE evaluation for `task_runs`, which may reduce predicate pruning and affect query latency for those filters.
- Introduced in: v3.3.13, v3.4.3, v3.5.0

##### heartbeat_mgr_blocking_queue_size

- Default: 1024
- Type: Int
- Unit: -
- Is mutable: No
- Description: The size of the blocking queue that stores heartbeat tasks run by the Heartbeat Manager.
- Introduced in: -

##### heartbeat_mgr_threads_num

- Default: 8
- Type: Int
- Unit: -
- Is mutable: No
- Description: The number of threads that the Heartbeat Manager can run to perform heartbeat tasks.
- Introduced in: -

##### ignore_materialized_view_error

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether the FE ignores metadata exceptions caused by materialized view errors. If the FE fails to start due to a metadata exception caused by a materialized view error, you can set this parameter to `true` to allow the FE to ignore the exception.
- Introduced in: v2.5.10

##### ignore_meta_check

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether non-Leader FEs ignore the metadata gap from the Leader FE. If the value is TRUE, non-Leader FEs ignore the metadata gap from the Leader FE and continue providing data reading services. This parameter ensures continuous data reading services even when you stop the Leader FE for a long period of time. If the value is FALSE, non-Leader FEs do not ignore the metadata gap from the Leader FE and stop providing data reading services.
- Introduced in: -

##### ignore_task_run_history_replay_error

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: When StarRocks deserializes TaskRun history rows for `information_schema.task_runs`, a corrupted or invalid JSON row normally causes deserialization to log a warning and throw a RuntimeException. If this item is set to `true`, the system catches deserialization errors, skips the malformed records, and continues processing the remaining rows instead of failing the query. This makes `information_schema.task_runs` queries tolerant of bad entries in the `_statistics_.task_run_history` table. Note that enabling it silently drops corrupted history records (potential data loss) rather than surfacing an explicit error.
- Introduced in: v3.3.3, v3.4.0, v3.5.0
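A minimal recovery sketch, assuming some rows in `_statistics_.task_run_history` have become malformed; the column list in the `SELECT` is illustrative of typical `information_schema.task_runs` columns, not an exhaustive schema reference.

```SQL
-- Tolerate corrupted task-run history rows (they will be silently skipped).
ADMIN SET FRONTEND CONFIG ("ignore_task_run_history_replay_error" = "true");

-- This lookup should now succeed even if some history records are malformed.
SELECT QUERY_ID, TASK_NAME, STATE
FROM information_schema.task_runs
LIMIT 10;
```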
##### lock_checker_interval_second

- Default: 30
- Type: long
- Unit: Seconds
- Is mutable: Yes
- Description: The interval (in seconds) between executions of the LockChecker frontend daemon (named "deadlock-checker"). The daemon performs deadlock detection and slow-lock scans; the configured value is multiplied by 1000 to set the timer in milliseconds. Decreasing this value reduces detection latency but increases scheduling and CPU overhead; increasing it reduces overhead but delays detection and slow-lock reporting. Changes take effect at runtime because the daemon resets its interval on every run. This setting interacts with `lock_checker_enable_deadlock_check` (enables deadlock checks) and `slow_lock_threshold_ms` (defines what counts as a slow lock).
- Introduced in: v3.2.0

##### master_sync_policy

- Default: SYNC
- Type: String
- Unit: -
- Is mutable: No
- Description: The policy based on which the Leader FE flushes logs to disk. This parameter is valid only when the current FE is the Leader FE. Valid values:
  - `SYNC`: When a transaction is committed, a log entry is generated and flushed to disk simultaneously.
  - `NO_SYNC`: The generation and flushing of a log entry do not occur at the same time when a transaction is committed.
  - `WRITE_NO_SYNC`: When a transaction is committed, a log entry is generated simultaneously but is not flushed to disk.

  If you have deployed only one Follower FE, we recommend that you set this parameter to `SYNC`. If you have deployed three or more Follower FEs, we recommend that you set both this parameter and `replica_sync_policy` to `WRITE_NO_SYNC`.
- Introduced in: -

##### max_bdbje_clock_delta_ms

- Default: 5000
- Type: Long
- Unit: Milliseconds
- Is mutable: No
- Description: The maximum clock offset allowed between the Leader FE and the Follower or Observer FEs in the StarRocks cluster.
- Introduced in: -

##### meta_delay_toleration_second

- Default: 300
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The maximum duration by which the metadata on the Follower and Observer FEs can lag behind the metadata on the Leader FE. Unit: seconds. If this duration is exceeded, the non-Leader FEs stop providing services.
- Introduced in: -

##### meta_dir

- Default: StarRocksFE.STARROCKS_HOME_DIR + "/meta"
- Type: String
- Unit: -
- Is mutable: No
- Description: The directory that stores metadata.
- Introduced in: -

##### metadata_ignore_unknown_operation_type

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to ignore unknown log IDs. When an FE is rolled back, the FE of an earlier version may be unable to recognize some log IDs. If the value is `TRUE`, the FE ignores unknown log IDs. If the value is `FALSE`, the FE exits.
- Introduced in: -

##### profile_info_format

- Default: default
- Type: String
- Unit: -
- Is mutable: Yes
- Description: The format of the profile output by the system. Valid values: `default` and `json`. When set to `default`, the profile is in the default format. When set to `json`, the system outputs the profile in JSON format.
- Introduced in: v2.5
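Since `profile_info_format` and `enable_collect_query_detail_info` (described earlier in this section) are both mutable, a common combination is to enable profile collection and switch the output to JSON in one step. Illustrative only:

```SQL
-- Collect query profiles and emit them as JSON.
ADMIN SET FRONTEND CONFIG ("enable_collect_query_detail_info" = "true");
ADMIN SET FRONTEND CONFIG ("profile_info_format" = "json");
```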
##### replica_ack_policy

- Default: SIMPLE_MAJORITY
- Type: String
- Unit: -
- Is mutable: No
- Description: The policy based on which a log entry is considered valid. The default value `SIMPLE_MAJORITY` specifies that a log entry is considered valid if the majority of Follower FEs return ACK messages.
- Introduced in: -

##### replica_sync_policy

- Default: SYNC
- Type: String
- Unit: -
- Is mutable: No
- Description: The policy based on which the Follower FEs flush logs to disk. This parameter is valid only when the current FE is a Follower FE. Valid values:
  - `SYNC`: When a transaction is committed, a log entry is generated and flushed to disk simultaneously.
  - `NO_SYNC`: The generation and flushing of a log entry do not occur at the same time when a transaction is committed.
  - `WRITE_NO_SYNC`: When a transaction is committed, a log entry is generated simultaneously but is not flushed to disk.
- Introduced in: -

##### start_with_incomplete_meta

- Default: false
- Type: boolean
- Unit: -
- Is mutable: No
- Description: When true, the FE is allowed to start when image data exists but the Berkeley DB JE (BDB) log files are missing or corrupted. `MetaHelper.checkMetaDir()` uses this flag to bypass the safety check that otherwise prevents starting from an image without the corresponding BDB logs; starting this way can produce stale or inconsistent metadata and should be used only for emergency recovery. `RestoreClusterSnapshotMgr` temporarily sets this flag to true when restoring a cluster snapshot and rolls it back afterwards; that component also toggles `bdbje_reset_election_group` during restore. Do not enable it in normal operation - enable it only when recovering from corrupted BDB data or explicitly restoring an image-based snapshot.
- Introduced in: v3.2.0

##### table_keeper_interval_second

- Default: 30
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The interval (in seconds) between executions of the TableKeeper daemon. The TableKeeperDaemon uses this value (multiplied by 1000) to set its internal timer and periodically runs keeper tasks that ensure history tables exist with the correct table properties (replication number) and that partition TTLs are updated. The daemon performs work only on the Leader node and updates its runtime interval via setInterval when `table_keeper_interval_second` changes. Increase it to reduce scheduling frequency and load; decrease it to react faster to missing or stale history tables.
- Introduced in: v3.3.1, v3.4.0, v3.5.0

##### task_runs_ttl_second

- Default: 7 * 24 * 3600
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: Controls the time-to-live (TTL) of task-run history records. Lowering this value shortens history retention and reduces memory/disk usage; raising it keeps history longer but increases resource usage. Tune it together with `task_runs_max_history_number` and `enable_task_history_archive` for predictable retention and storage behavior.
- Introduced in: v3.2.0

##### task_ttl_second

- Default: 24 * 3600
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The time-to-live (TTL) of tasks. For manual tasks (when no schedule is set), TaskBuilder uses this value to compute the task's `expireTime` (`expireTime = now + task_ttl_second * 1000L`). TaskRun also uses this value as an upper bound when computing the run execution timeout - the effective execution timeout is `min(task_runs_timeout_second, task_runs_ttl_second, task_ttl_second)`. Adjusting this value changes how long manually created tasks remain valid and indirectly caps the maximum allowed execution time of task runs.
- Introduced in: v3.2.0
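A sketch of tuning the task TTLs above together; the values are illustrative. Keep the formula in mind: the effective run timeout is `min(task_runs_timeout_second, task_runs_ttl_second, task_ttl_second)`, so shrinking either TTL can also shorten how long a task run may execute.

```SQL
-- Keep 3 days of task-run history instead of 7.
ADMIN SET FRONTEND CONFIG ("task_runs_ttl_second" = "259200");

-- Let manually created tasks expire after 12 hours instead of 24.
ADMIN SET FRONTEND CONFIG ("task_ttl_second" = "43200");
```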
##### thrift_rpc_retry_times

- Default: 3
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: Controls the total number of attempts a Thrift RPC call will make. This value is used as the loop count for retries by `ThriftRPCRequestExecutor` (and by callers such as `NodeMgr` and `VariableMgr`) - that is, a value of 3 allows up to three attempts including the initial one. On a `TTransportException`, the executor attempts to reopen the connection and retries up to this count; it does not retry when the cause is a `SocketTimeoutException` or when reopening fails. Each attempt is subject to the per-attempt timeout configured by `thrift_rpc_timeout_ms`. Increasing this value improves resilience to transient connection failures but can increase overall RPC latency and resource usage.
- Introduced in: v3.2.0

##### thrift_rpc_strict_mode

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: No
- Description: Controls the TBinaryProtocol "strict read" mode used by the Thrift server. The value is passed as the first argument to `org.apache.thrift.protocol.TBinaryProtocol.Factory` in the Thrift server stack and affects how incoming Thrift messages are parsed and validated. When `true` (the default), the server enforces strict Thrift encoding/version checks and respects the configured `thrift_rpc_max_body_size` limit; when `false`, the server accepts non-strict (legacy/lenient) message formats, which can improve compatibility with old clients but may bypass some protocol validation. Be cautious when changing this on a running cluster, as it is immutable and affects interoperability and parsing safety.
- Introduced in: v3.2.0

##### thrift_rpc_timeout_ms

- Default: 10000
- Type: Int
- Unit: Milliseconds
- Is mutable: Yes
- Description: The number of milliseconds used as the default network/socket timeout for Thrift RPC calls. It is passed to TSocket when creating Thrift clients in `ThriftConnectionPool` (used by both the frontend and backend pools), and it is also added to an operation's execution timeout (for example, ExecTimeout*1000 + `thrift_rpc_timeout_ms`) when computing RPC call timeouts in places such as `ConfigBase`, `LeaderOpExecutor`, `GlobalStateMgr`, `NodeMgr`, `VariableMgr`, and `CheckpointWorker`. Increasing this value lets RPC calls tolerate longer network or remote processing delays; decreasing it leads to faster failover on slow networks. Changing it affects connection creation and request deadlines in FE code paths that perform Thrift RPCs.
- Introduced in: v3.2.0

##### txn_latency_metric_report_groups

- Default: An empty string
- Type: String
- Unit: -
- Is mutable: Yes
- Description: A comma-separated list of transaction latency metric groups to report. Load types are classified into logical groups for monitoring. When a group is enabled, its name is added as a "type" label to transaction metrics. Valid values: `stream_load`, `routine_load`, `broker_load`, `insert`, and `compaction` (shared-data clusters only). Example: `"stream_load,routine_load"`.
- Introduced in: v4.0

##### txn_rollback_limit

- Default: 100
- Type: Int
- Unit: -
- Is mutable: No
- Description: The maximum number of transactions that can be rolled back.
- Introduced in: -

### Users, roles, and privileges

##### enable_task_info_mask_credential

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: When true, StarRocks redacts credentials by applying `SqlCredentialRedactor.redact` to the DEFINITION column before returning task SQL definitions to `information_schema.tasks` and `information_schema.task_runs`. In `information_schema.task_runs`, the same redaction is applied whether the definition comes from the task run status or, when that is empty, from the task definition lookup. When false, the raw task definitions are returned (potentially exposing credentials). Masking is CPU/string-processing work and can be expensive when the number of tasks or task runs is large; disable it only if you need unredacted definitions and accept the security risk.
- Introduced in: v3.5.6
##### privilege_max_role_depth

- Default: 16
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum role depth (level of inheritance) of a role.
- Introduced in: v3.0.0

##### privilege_max_total_roles_per_user

- Default: 64
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of roles a user can have.
- Introduced in: v3.0.0

### Query engine

##### brpc_send_plan_fragment_timeout_ms

- Default: 60000
- Type: Int
- Unit: Milliseconds
- Is mutable: Yes
- Description: The timeout (in milliseconds) applied to the BRPC TalkTimeoutController before sending plan fragments. `BackendServiceClient.sendPlanFragmentAsync` sets this value before invoking the backend `execPlanFragmentAsync`. It controls how long BRPC waits when borrowing an idle connection from the connection pool and when performing the send; if exceeded, the RPC fails and may trigger the method's retry logic. Set it low to fail fast under contention, or raise it to tolerate transient pool exhaustion or slow networks. Be careful: very large values can delay failure detection and block request threads.
- Introduced in: v3.3.11, v3.4.1, v3.5.0

##### connector_table_query_trigger_analyze_large_table_interval

- Default: 12 * 3600
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The interval of query-triggered ANALYZE tasks for large tables.
- Introduced in: v3.4.0

##### connector_table_query_trigger_analyze_max_pending_task_num

- Default: 100
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of query-triggered ANALYZE tasks in Pending state on the FE.
- Introduced in: v3.4.0

##### connector_table_query_trigger_analyze_max_running_task_num

- Default: 2
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of query-triggered ANALYZE tasks in Running state on the FE.
- Introduced in: v3.4.0

##### connector_table_query_trigger_analyze_small_table_interval

- Default: 2 * 3600
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The interval of query-triggered ANALYZE tasks for small tables.
- Introduced in: v3.4.0

##### connector_table_query_trigger_analyze_small_table_rows

- Default: 10000000
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The threshold for judging whether a table is a small table for query-triggered ANALYZE tasks.
- Introduced in: v3.4.0

##### connector_table_query_trigger_task_schedule_interval

- Default: 30
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The interval at which the scheduler thread schedules query-triggered background tasks. This item replaces `connector_table_query_trigger_analyze_schedule_interval`, which was introduced in v3.4.0. Here, background tasks refer to `ANALYZE` tasks in v3.4, and to collection tasks of low-cardinality column dictionaries in versions later than v3.4.
- Introduced in: v3.4.2

##### create_table_max_serial_replicas

- Default: 128
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of replicas to create serially. If the actual replica count exceeds this value, replicas are created concurrently. Try to reduce this value if table creation is taking a long time.
- Introduced in: -
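For clusters where query-triggered ANALYZE on external tables competes with user workloads, the settings above can be throttled at runtime. A hedged sketch with illustrative values:

```SQL
-- Allow only one running query-triggered ANALYZE task at a time,
-- and re-analyze small external tables at most every 2 hours (7200 s).
ADMIN SET FRONTEND CONFIG ("connector_table_query_trigger_analyze_max_running_task_num" = "1");
ADMIN SET FRONTEND CONFIG ("connector_table_query_trigger_analyze_small_table_interval" = "7200");
```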
##### default_mv_partition_refresh_number

- Default: 1
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: When a materialized view refresh involves multiple partitions, this parameter controls how many partitions are refreshed in a single batch by default. Since v3.3.0, the system refreshes one partition at a time by default to avoid potential out-of-memory (OOM) issues. In earlier versions, all partitions were refreshed at once by default, which could cause memory exhaustion and task failures. Note, however, that when a materialized view refresh involves a large number of partitions, refreshing only one partition at a time can cause excessive scheduling overhead, a longer overall refresh time, and a large number of refresh records. In such cases, you are advised to adjust this parameter appropriately to improve refresh efficiency and reduce scheduling costs.
- Introduced in: v3.3.0

##### default_mv_refresh_immediate

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to refresh an asynchronous materialized view immediately after it is created. When this item is set to `true`, a newly created materialized view is refreshed immediately.
- Introduced in: v3.2.3

##### dynamic_partition_check_interval_seconds

- Default: 600
- Type: Long
- Unit: Seconds
- Is mutable: Yes
- Description: The interval at which new data is checked. If new data is detected, StarRocks automatically creates partitions for the data.
- Introduced in: -

##### dynamic_partition_enable

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to enable the dynamic partitioning feature. When this feature is enabled, StarRocks dynamically creates partitions for new data and automatically deletes expired partitions to ensure the freshness of data.
- Introduced in: -

##### enable_active_materialized_view_schema_strict_check

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to strictly check data type length consistency when activating an inactive materialized view. When this item is set to `false`, the activation of a materialized view is not affected if the length of a data type in the base table has changed.
- Introduced in: v3.3.4

##### enable_auto_collect_array_ndv

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to enable automatic collection of NDV information for the ARRAY type.
- Introduced in: v4.0

##### enable_backup_materialized_view

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to enable BACKUP and RESTORE for asynchronous materialized views when backing up or restoring a specific database. If this item is set to `false`, StarRocks skips backing up asynchronous materialized views.
- Introduced in: v3.2.0

##### enable_collect_full_statistic

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to enable automatic full statistics collection. This feature is enabled by default.
- Introduced in: -

##### enable_colocate_mv_index

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to support colocating the synchronous materialized view index with the base table when creating a synchronous materialized view. If this item is set to `true`, the tablet sink speeds up the write performance of synchronous materialized views.
- Introduced in: v3.2.0

##### enable_decimal_v3

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to support the DECIMAL V3 data type.
- Introduced in: -

##### enable_experimental_mv

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to enable the asynchronous materialized view feature. TRUE indicates that this feature is enabled. From v2.5.2 onwards, this feature is enabled by default. For versions earlier than v2.5.2, this feature is disabled by default.
- Introduced in: v2.4

##### enable_local_replica_selection

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to select local replicas for queries. Local replicas reduce network transmission costs. If this parameter is set to TRUE, the CBO preferentially selects tablet replicas on BEs that have the same IP address as the current FE. If this parameter is set to `FALSE`, both local replicas and non-local replicas can be selected.
- Introduced in: -
##### enable_manual_collect_array_ndv

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to enable manual collection of NDV information for the ARRAY type.
- Introduced in: v4.0

##### enable_materialized_view

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to enable the creation of materialized views.
- Introduced in: -

##### enable_materialized_view_external_table_precise_refresh

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Set this item to `true` to enable an internal optimization during materialized view refresh when the base tables are external (non-cloud-native) tables. When enabled, the materialized view refresh processor computes candidate partitions and refreshes only the affected base-table partitions instead of all partitions, reducing I/O and refresh cost. Set it to `false` to force full-partition refreshes for external tables.
- Introduced in: v3.2.9

##### enable_materialized_view_metrics_collect

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to collect monitoring metrics for asynchronous materialized views by default.
- Introduced in: v3.1.11, v3.2.5

##### enable_materialized_view_spill

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to enable intermediate result spilling for materialized view refresh tasks.
- Introduced in: v3.1.1

##### enable_materialized_view_text_based_rewrite

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to enable text-based query rewrite by default. If this item is set to `true`, the system builds the abstract syntax tree when creating an asynchronous materialized view.
- Introduced in: v3.2.5

##### enable_mv_automatic_active_check

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to enable the system to automatically check and re-activate asynchronous materialized views that were set to inactive because their base tables (views) underwent schema changes or were dropped and re-created. Note that this feature does not re-activate materialized views that were manually set to inactive by users.
- Introduced in: v3.1.6

##### enable_mv_automatic_repairing_for_broken_base_tables

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: When this item is set to `true`, StarRocks attempts to automatically repair materialized view base-table metadata when a base external table is dropped and re-created or its table identifier changes. The repair flow can update the materialized view's base-table information, collect partition-level repair information for external table partitions, and drive partition-refresh decisions for asynchronous auto-refresh materialized views while respecting `autoRefreshPartitionsLimit`. Currently, automatic repair supports Hive external tables; unsupported table types cause the materialized view to be set to inactive and a repair exception to be raised. Partition-information collection is non-blocking and failures are logged.
- Introduced in: v3.3.19, v3.4.8, v3.5.6

##### enable_predicate_columns_collection

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to enable predicate column collection. If disabled, predicate columns are not recorded during query optimization.
- Introduced in: -

##### enable_query_queue_v2

- Default: true
- Type: boolean
- Unit: -
- Is mutable: No
- Description: When true, switches the FE slot-based query scheduler to Query Queue V2. The flag is read by slot managers and trackers (for example, `BaseSlotManager.isEnableQueryQueueV2` and `SlotTracker#createSlotSelectionStrategy`) to select `SlotSelectionStrategyV2` instead of the legacy strategy. The `query_queue_v2_xxx` configuration options and `QueryQueueOptions` take effect only when this flag is enabled. From v4.1 onwards, the default value is changed from `false` to `true`.
- Introduced in: v3.3.4, v3.4.0, v3.5.0
##### enable_sql_blacklist

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to enable blacklist checks for SQL queries. When this feature is enabled, queries in the blacklist cannot be executed.
- Introduced in: -

##### enable_statistic_collect

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to collect statistics for the CBO. This feature is enabled by default.
- Introduced in: -

##### enable_statistic_collect_on_first_load

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Controls automatic statistics collection and maintenance triggered by data loading operations. This includes:
  - Statistics collection when data is first loaded into a partition (partition version equals 2).
  - Statistics collection when data is loaded into empty partitions of multi-partition tables.
  - Statistics copying and updating for INSERT OVERWRITE operations.

  **Synchronization behavior:**
  - For DML statements (INSERT INTO/INSERT OVERWRITE): synchronous mode with a table lock. The load operation waits for statistics collection to complete (up to `semi_sync_collect_statistic_await_seconds`).
  - For Stream Load and Broker Load: asynchronous mode without locks. Statistics collection runs in the background without blocking the load operation.

  :::note
  Disabling this configuration prevents all load-triggered statistics operations, including statistics maintenance for INSERT OVERWRITE, which may leave tables without statistics. If you frequently create new tables and load data into them frequently, enabling this feature increases memory and CPU overhead.
  :::
- Introduced in: v3.1

##### enable_statistic_collect_on_update

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Controls whether UPDATE statements can trigger automatic statistics collection. When enabled, UPDATE operations that modify table data may schedule statistics collection through the same ingestion-based statistics framework controlled by `enable_statistic_collect_on_first_load`. Disabling this configuration skips statistics collection for UPDATE statements while leaving load-triggered statistics collection unchanged.
- Introduced in: v3.5.11, v4.0.4

##### enable_udf

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: No
- Description: Whether to enable UDFs.
- Introduced in: -

##### expr_children_limit

- Default: 10000
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of child expressions allowed in an expression.
- Introduced in: -

##### histogram_buckets_size

- Default: 64
- Type: Long
- Unit: -
- Is mutable: Yes
- Description: The default bucket number for a histogram.
- Introduced in: -

##### histogram_max_sample_row_count

- Default: 10000000
- Type: Long
- Unit: -
- Is mutable: Yes
- Description: The maximum number of rows to collect for a histogram.
- Introduced in: -

##### histogram_mcv_size

- Default: 100
- Type: Long
- Unit: -
- Is mutable: Yes
- Description: The number of most common values (MCVs) for a histogram.
- Introduced in: -

##### histogram_sample_ratio

- Default: 0.1
- Type: Double
- Unit: -
- Is mutable: Yes
- Description: The sampling ratio for a histogram.
- Introduced in: -

##### http_slow_request_threshold_ms

- Default: 5000
- Type: Int
- Unit: Milliseconds
- Is mutable: Yes
- Description: If the response time of an HTTP request exceeds the value specified by this parameter, a log is generated to trace this request.
- Introduced in: v2.5.15, v3.1.5

##### lock_checker_enable_deadlock_check

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: When enabled, the LockChecker thread performs JVM-level deadlock detection using ThreadMXBean.findDeadlockedThreads() and logs stack traces of the offending threads. The check runs inside the LockChecker daemon (whose frequency is controlled by `lock_checker_interval_second`) and writes detailed stack information to the logs, which can be CPU- and I/O-intensive. Enable this only for troubleshooting live or reproducible deadlock issues; leaving it enabled in normal operation can add overhead and log volume.
- Introduced in: v3.2.0
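A troubleshooting sketch for the two lock-checker settings above; the 10-second interval is illustrative, and both changes should be reverted once the investigation is finished because the deadlock check is expensive.

```SQL
-- Temporarily enable JVM-level deadlock detection and check more often.
ADMIN SET FRONTEND CONFIG ("lock_checker_enable_deadlock_check" = "true");
ADMIN SET FRONTEND CONFIG ("lock_checker_interval_second" = "10");
```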
##### low_cardinality_threshold

- Default: 255
- Type: Int
- Unit: -
- Is mutable: No
- Description: The threshold for low-cardinality dictionaries.
- Introduced in: v3.5.0

##### materialized_view_min_refresh_interval

- Default: 60
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The minimum allowed refresh interval (in seconds) for ASYNC materialized view schedules. When a materialized view is created with a time-based interval, the interval is converted to seconds and must not be smaller than this value; otherwise the CREATE/ALTER operation fails with a DDL error. The check is enforced if this value is greater than 0; set it to 0 or a negative value to disable the limit. The limit prevents excessive TaskManager scheduling and high FE memory/CPU usage caused by overly frequent refreshes. It does not apply to EVENT_TRIGGERED refreshes.
- Introduced in: v3.3.0, v3.4.0, v3.5.0

##### materialized_view_refresh_ascending

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: When this item is set to `true`, materialized view partition refresh iterates partitions in ascending partition-key order (oldest to newest). When set to `false` (the default), the system iterates in descending order (newest to oldest). StarRocks uses this item in both list-partition and range-partition materialized view refresh logic to choose which partitions to process when a partition refresh limit applies, and to compute the next start/end partition boundaries for subsequent TaskRun executions. Changing it alters which partitions are refreshed first and how the next partition range is derived; for range-partitioned materialized views, the scheduler validates the new start/end and raises an error if the change would create duplicate boundaries (an infinite loop), so set this item carefully.
- Introduced in: v3.3.1, v3.4.0, v3.5.0

##### max_allowed_in_element_num_of_delete

- Default: 10000
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of elements allowed in an IN predicate in a DELETE statement.
- Introduced in: -

##### max_create_table_timeout_second

- Default: 600
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The maximum timeout duration for creating a table.
- Introduced in: -

##### max_distribution_pruner_recursion_depth

- Default: 100
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum recursion depth allowed by the partition pruner. Increasing the recursion depth can prune more elements but also increases CPU consumption.
- Introduced in: -

##### max_partitions_in_one_batch

- Default: 4096
- Type: Long
- Unit: -
- Is mutable: Yes
- Description: The maximum number of partitions that can be created when you create partitions in a batch.
- Introduced in: -

##### max_planner_scalar_rewrite_num

- Default: 100000
- Type: Long
- Unit: -
- Is mutable: Yes
- Description: The maximum number of times that the optimizer can rewrite a scalar operator.
- Introduced in: -

##### max_query_queue_history_slots_number

- Default: 0
- Type: Int
- Unit: Slots
- Is mutable: Yes
- Description: Controls how many recently released (historical) allocated slots each query queue retains for monitoring and observability. When `max_query_queue_history_slots_number` is set to a value greater than `0`, the BaseSlotTracker keeps up to the specified number of the most recently released LogicalSlot entries in an in-memory queue and evicts the oldest entries when the limit is exceeded. Enabling this causes getSlots() to include those historical entries (newest first), allows the BaseSlotTracker to attempt registering slots with the ConnectContext to obtain richer ExtraMessage data, and allows LogicalSlot.ConnectContextListener to attach query-completion metadata to historical slots. When `max_query_queue_history_slots_number` is `<= 0`, the history mechanism is disabled (no extra memory is used). Use a reasonable value to balance observability against memory overhead.
- Introduced in: v3.5.0
##### max_query_retry_time

- Default: 2
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of query retries on the FE.
- Introduced in: -

##### max_running_rollup_job_num_per_table

- Default: 1
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of rollup jobs that a table can run in parallel.
- Introduced in: -

##### max_scalar_operator_flat_children

- Default: 10000
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of flat children of a ScalarOperator. You can set this limit to prevent the optimizer from using too much memory.
- Introduced in: -

##### max_scalar_operator_optimize_depth

- Default: 256
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum depth to which ScalarOperator optimization can be applied.
- Introduced in: -

##### mv_active_checker_interval_seconds

- Default: 60
- Type: Long
- Unit: Seconds
- Is mutable: Yes
- Description: When the background active_checker thread is enabled, the system periodically detects and automatically re-activates materialized views that became inactive because of schema changes or rebuilds of their base tables (or views). This parameter controls the scheduling interval of the checker thread, in seconds.
- Introduced in: v3.1.6

##### mv_rewrite_consider_data_layout_mode

- Default: `enable`
- Type: String
- Unit: -
- Is mutable: Yes
- Description: Controls whether materialized view rewrite should consider the base-table data layout when selecting the best materialized view. Valid values:
  - `disable`: Never use data layout criteria when selecting candidate materialized views.
  - `enable`: Use data layout criteria only when the query is identified as layout-sensitive.
  - `force`: Always apply data layout criteria when selecting the best materialized view.

  Changing this item affects the behavior of `BestMvSelector` and can improve or broaden rewrite applicability depending on whether the physical layout matters for plan correctness or performance.
- Introduced in: -

##### publish_version_interval_ms

- Default: 10
- Type: Int
- Unit: Milliseconds
- Is mutable: No
- Description: The time interval at which publish validation tasks are issued.
- Introduced in: -

##### query_queue_slots_estimator_strategy

- Default: MAX
- Type: String
- Unit: -
- Is mutable: Yes
- Description: Selects the slot estimation strategy used for queue-based queries when `enable_query_queue_v2` is true. Valid values: MBE (memory-based), PBE (parallelism-based), MAX (take the maximum of MBE and PBE), and MIN (take the minimum of MBE and PBE). MBE estimates slots from predicted memory or plan cost divided by a per-slot memory target, capped by `totalSlots`. PBE derives slots from fragment parallelism (scan range count or cardinality / rows per slot) and a CPU-cost-based computation (using per-slot CPU cost), then clamps the result to the range [numSlots/2, numSlots]. MAX and MIN combine MBE and PBE by taking their maximum or minimum. If the configured value is invalid, the default (`MAX`) is used.
- Introduced in: v3.5.0

##### query_queue_v2_concurrency_level

- Default: 4
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: Controls the number of logical concurrency "layers" used when computing the system's total query slots. In shared-nothing mode, total slots = `query_queue_v2_concurrency_level` * number of BEs * cores per BE (derived from BackendResourceStat). In multi-warehouse mode, the effective concurrency is scaled down to max(1, `query_queue_v2_concurrency_level` / 4). If the configured value is non-positive, it is treated as `4`. Changing this value increases or decreases the total slots (and therefore the concurrent query capacity) and affects per-slot resources: memBytesPerSlot is derived by dividing per-worker memory by (cores per worker * concurrency), and CPU accounting uses `query_queue_v2_cpu_costs_per_slot`. Set it proportionally to the cluster size; very large values can reduce per-slot memory and cause resource fragmentation.
- Introduced in: v3.3.4, v3.4.0, v3.5.0
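A back-of-the-envelope sizing example for the formula above, under assumed cluster numbers (3 BEs with 16 cores each, which are hypothetical figures used only for illustration):

```SQL
-- total slots = query_queue_v2_concurrency_level * number of BEs * cores per BE
--             = 4 * 3 * 16 = 192 concurrent query slots in shared-nothing mode.
ADMIN SET FRONTEND CONFIG ("query_queue_v2_concurrency_level" = "4");
```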
##### query_queue_v2_cpu_costs_per_slot

- Default: 1000000000
- Type: Long
- Unit: Planner CPU cost units
- Is mutable: Yes
- Description: The per-slot CPU cost threshold (in planner CPU cost units) used to estimate how many slots a query needs from its planner CPU cost. The scheduler computes slots as the integer (plan_cpu_costs / `query_queue_v2_cpu_costs_per_slot`) and then clamps the result to the range [1, totalSlots] (totalSlots is derived from the Query Queue V2 parameters). The V2 code normalizes non-positive settings to 1 (Math.max(1, value)), so a non-positive value effectively becomes `1`. Increasing this value reduces the slots allocated per query (favoring fewer, larger-slot queries); decreasing it increases slots per query. Tune it together with `query_queue_v2_num_rows_per_slot` and the concurrency settings to control parallelism versus resource granularity.
- Introduced in: v3.3.4, v3.4.0, v3.5.0

##### query_queue_v2_num_rows_per_slot

- Default: 4096
- Type: Int
- Unit: Rows
- Is mutable: Yes
- Description: The target number of source-row records assigned to a single scheduling slot when estimating the per-query slot count. StarRocks computes estimated_slots = (cardinality of the source node) / `query_queue_v2_num_rows_per_slot`, then clamps the result to the range [1, totalSlots] and enforces a minimum of 1 if the computed value is non-positive. totalSlots is derived from available resources (roughly DOP * `query_queue_v2_concurrency_level` * number of workers/BEs), so it depends on the cluster/core count. Increasing this value reduces the slot count (each slot handles more rows) and lowers scheduling overhead; decreasing it increases parallelism (more, smaller slots) up to the resource limits.
- Introduced in: v3.3.4, v3.4.0, v3.5.0

##### query_queue_v2_schedule_strategy

- Default: SWRR
- Type: String
- Unit: -
- Is mutable: Yes
- Description: Selects the scheduling strategy Query Queue V2 uses to order pending queries. Supported values (case-insensitive) are `SWRR` (Smooth Weighted Round Robin) - the default, suitable for mixed workloads that need fair weighted sharing - and `SJF` (Short Job First + Aging) - which prioritizes short jobs while using an aging mechanism to avoid starvation. The value is resolved via a case-insensitive enum lookup; unrecognized values are logged as errors and the default strategy is used. This configuration affects behavior only when Query Queue V2 is enabled, and it interacts with the V2 sizing settings such as `query_queue_v2_concurrency_level`.
- Introduced in: v3.3.12, v3.4.2, v3.5.0

##### semi_sync_collect_statistic_await_seconds

- Default: 30
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The maximum wait time for semi-synchronous statistics collection during DML operations (INSERT INTO and INSERT OVERWRITE statements). Stream Load and Broker Load use asynchronous mode and are not affected by this configuration. If statistics collection takes longer than this value, the load operation continues without waiting for the collection to complete. This configuration works together with `enable_statistic_collect_on_first_load`.
- Introduced in: v3.1
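An illustrative tuning sketch for the V2 slot estimation described above: larger per-slot quotas mean fewer, coarser slots per query, while smaller quotas mean more parallel slots. The specific values are hypothetical, not recommendations.

```SQL
-- Combine the memory-based and parallelism-based estimators conservatively.
ADMIN SET FRONTEND CONFIG ("query_queue_slots_estimator_strategy" = "MIN");

-- Double the rows handled per slot to halve per-query slot counts.
ADMIN SET FRONTEND CONFIG ("query_queue_v2_num_rows_per_slot" = "8192");
```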
##### slow_query_analyze_threshold

- Default: 5
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The execution time threshold beyond which a query triggers query feedback analysis.
- Introduced in: v3.4.0

##### statistic_analyze_status_keep_second

- Default: 3 * 24 * 3600
- Type: Long
- Unit: Seconds
- Is mutable: Yes
- Description: The duration for which the history of collection tasks is retained. The default value is 3 days.
- Introduced in: -

##### statistic_auto_analyze_end_time

- Default: 23:59:59
- Type: String
- Unit: -
- Is mutable: Yes
- Description: The end time of automatic collection. Value range: `00:00:00` - `23:59:59`.
- Introduced in: -

##### statistic_auto_analyze_start_time

- Default: 00:00:00
- Type: String
- Unit: -
- Is mutable: Yes
- Description: The start time of automatic collection. Value range: `00:00:00` - `23:59:59`.
- Introduced in: -

##### statistic_auto_collect_ratio

- Default: 0.8
- Type: Double
- Unit: -
- Is mutable: Yes
- Description: The threshold for determining whether the statistics for automatic collection are healthy. If statistics health is below this threshold, automatic collection is triggered.
- Introduced in: -

##### statistic_auto_collect_small_table_rows

- Default: 10000000
- Type: Long
- Unit: -
- Is mutable: Yes
- Description: The threshold for determining whether a table in an external data source (Hive, Iceberg, Hudi) is a small table during automatic collection. If the table has fewer rows than this value, the table is considered a small table.
- Introduced in: v3.2

##### statistic_cache_columns

- Default: 100000
- Type: Long
- Unit: -
- Is mutable: No
- Description: The number of rows that can be cached for the statistics table.
- Introduced in: -

##### statistic_cache_thread_pool_size

- Default: 10
- Type: Int
- Unit: -
- Is mutable: No
- Description: The size of the thread pool used to refresh the statistics cache.
- Introduced in: -

##### statistic_collect_interval_sec

- Default: 5 * 60
- Type: Long
- Unit: Seconds
- Is mutable: Yes
- Description: The interval for checking data updates during automatic collection.
- Introduced in: -

##### statistic_max_full_collect_data_size

- Default: 100 * 1024 * 1024 * 1024
- Type: Long
- Unit: Bytes
- Is mutable: Yes
- Description: The data size threshold for automatic statistics collection. If the total size exceeds this value, sampled collection is performed instead of full collection.
- Introduced in: -

##### statistic_sample_collect_rows

- Default: 200000
- Type: Long
- Unit: -
- Is mutable: Yes
- Description: The row-count threshold used to decide between SAMPLE and FULL statistics collection during load-triggered statistics operations. If the number of loaded or changed rows exceeds this threshold (default 200,000), SAMPLE statistics collection is used; otherwise, FULL statistics collection is used. This setting works together with `enable_statistic_collect_on_first_load` and `statistic_sample_collect_ratio_threshold_of_first_load`.
- Introduced in: -

##### statistic_update_interval_sec

- Default: 24 * 60 * 60
- Type: Long
- Unit: Seconds
- Is mutable: Yes
- Description: The interval at which the statistics cache is updated.
- Introduced in: -

##### task_check_interval_second

- Default: 60
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The interval (in seconds) between executions of the task background job. GlobalStateMgr uses this value to schedule the TaskCleaner FrontendDaemon, which invokes `doTaskBackgroundJob()`; the value is multiplied by 1000 to set the daemon interval in milliseconds. Decreasing it makes background maintenance (task cleanup, checks) run more often and react faster at the cost of CPU/IO overhead; increasing it reduces overhead but delays cleanup and detection of stale tasks. Adjust this value to balance maintenance responsiveness against resource usage.
- Introduced in: v3.2.0
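A sketch that restricts automatic statistics collection to an assumed off-peak window (01:00-05:00 is illustrative; pick a window that matches your workload):

```SQL
-- Run automatic collection only between 01:00:00 and 05:00:00.
ADMIN SET FRONTEND CONFIG ("statistic_auto_analyze_start_time" = "01:00:00");
ADMIN SET FRONTEND CONFIG ("statistic_auto_analyze_end_time" = "05:00:00");
```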
##### task_min_schedule_interval_s

- Default: 10
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The minimum allowed schedule interval (in seconds) for task schedules checked at the SQL layer. When a task is submitted, the TaskAnalyzer converts the schedule period to seconds and rejects the submission with an ERR_INVALID_PARAMETER error if the period is smaller than `task_min_schedule_interval_s`. This prevents creating tasks that run too frequently and protects the scheduler from high-frequency tasks. If a schedule has no explicit start time, the TaskAnalyzer sets the start time to the current epoch second.
- Introduced in: v3.3.0, v3.4.0, v3.5.0

##### task_runs_timeout_second

- Default: 4 * 3600
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The default execution timeout (in seconds) for TaskRuns. This item is used as the baseline timeout for TaskRun execution. If a task run's properties contain a `query_timeout` or `insert_timeout` session variable with a positive integer value, the runtime uses the larger of that session timeout and `task_runs_timeout_second`. The effective timeout is then capped so that it does not exceed the configured `task_runs_ttl_second` and `task_ttl_second`. Set this item to limit how long task runs may execute. Very large values may be truncated by the task/task-run TTL settings.
- Introduced in: -

### Loading and unloading

##### broker_load_default_timeout_second

- Default: 14400
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The timeout duration for a Broker Load job.
- Introduced in: -

##### desired_max_waiting_jobs

- Default: 1024
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of pending jobs in the FE. The number refers to all jobs, such as table creation, loading, and schema change jobs. If the number of pending jobs in the FE reaches this value, the FE rejects new load requests. This parameter takes effect only for asynchronous loading. From v2.5 onwards, the default value is changed from 100 to 1024.
- Introduced in: -

##### disable_load_job

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to disable loading when the cluster encounters an error. This prevents any loss caused by cluster errors. The default value is `FALSE`, indicating that loading is not disabled. `TRUE` indicates that loading is disabled and the cluster is in read-only state.
- Introduced in: -

##### empty_load_as_error

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to return the error message "all partitions have no load data" if no data is loaded. Valid values:
  - `true`: If no data is loaded, the system displays a failure message and returns the error "all partitions have no load data".
  - `false`: If no data is loaded, the system displays a success message and returns OK instead of an error.
- Introduced in: -

##### enable_file_bundling

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to enable the file bundling optimization for cloud-native tables. When this feature is enabled (set to `true`), the system automatically bundles data files generated by loading, compaction, or publish operations, thereby reducing the API costs caused by high-frequency access to external storage systems. You can also control this behavior at the table level using the CREATE TABLE property `file_bundling`. For detailed instructions, see [CREATE TABLE](../../sql-reference/sql-statements/table_bucket_part_index/CREATE_TABLE.md).
- Introduced in: v4.0
##### enable_routine_load_lag_metrics

- Default: false
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to collect Routine Load Kafka partition offset lag metrics. Note that setting this item to `true` calls the Kafka API to obtain the latest offsets of the partitions.
- Introduced in: -

##### enable_sync_publish

- Default: true
- Type: Boolean
- Unit: -
- Is mutable: Yes
- Description: Whether to synchronously execute the apply task at the publish phase of a load transaction. This parameter is applicable only to Primary Key tables. Valid values:
  - `TRUE` (default): The apply task is executed synchronously at the publish phase of a load transaction. The load transaction is reported as successful only after the apply task is completed, and the loaded data can truly be queried. When a task loads a large volume of data at a time or loads data frequently, setting this parameter to `true` can improve query performance and stability, but may increase load latency.
  - `FALSE`: The apply task is executed asynchronously at the publish phase of a load transaction. The load transaction is reported as successful after the apply task is submitted, but the loaded data cannot be queried immediately. In this case, concurrent queries need to wait for the apply task to complete or time out before they can continue. When a task loads a large volume of data at a time or loads data frequently, setting this parameter to `false` may affect query performance and stability.
- Introduced in: v3.2.0

##### export_checker_interval_second

- Default: 5
- Type: Int
- Unit: Seconds
- Is mutable: No
- Description: The time interval at which load jobs are scheduled.
- Introduced in: -

##### export_max_bytes_per_be_per_task

- Default: 268435456
- Type: Long
- Unit: Bytes
- Is mutable: Yes
- Description: The maximum amount of data that can be exported from a single BE by a single data unloading task.
- Introduced in: -

##### export_running_job_num_limit

- Default: 5
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of data exporting tasks that can run in parallel.
- Introduced in: -

##### export_task_default_timeout_second

- Default: 2 * 3600
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The timeout duration for a data exporting task.
- Introduced in: -

##### export_task_pool_size

- Default: 5
- Type: Int
- Unit: -
- Is mutable: No
- Description: The size of the unloading-task thread pool.
- Introduced in: -

##### external_table_commit_timeout_ms

- Default: 10000
- Type: Int
- Unit: Milliseconds
- Is mutable: Yes
- Description: The timeout duration for committing (publishing) a write transaction to a StarRocks external table. The default value `10000` indicates a 10-second timeout duration.
- Introduced in: -

##### finish_transaction_default_lock_timeout_ms

- Default: 1000
- Type: Int
- Unit: Milliseconds
- Is mutable: Yes
- Description: The default timeout for acquiring database and table locks while finishing a transaction.
- Introduced in: v4.0.0, v3.5.8

##### history_job_keep_max_second

- Default: 7 * 24 * 3600
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The maximum duration for which historical jobs, such as schema change jobs, can be retained.
- Introduced in: -

##### insert_load_default_timeout_second

- Default: 3600
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The timeout duration for the INSERT INTO statement that is used to load data.
- Introduced in: -

##### label_clean_interval_second

- Default: 4 * 3600
- Type: Int
- Unit: Seconds
- Is mutable: No
- Description: The time interval at which labels are cleaned up. Unit: seconds. We recommend that you specify a short time interval to ensure that historical labels can be cleaned up in a timely manner.
- Introduced in: -
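A hedged example combining two of the settings above for a write-heavy Primary Key workload; the 20-second value is illustrative only.

```SQL
-- Report load success only after the apply task completes,
-- so that loaded data is immediately queryable.
ADMIN SET FRONTEND CONFIG ("enable_sync_publish" = "true");

-- Give external-table publishes more headroom (20 s instead of the 10 s default).
ADMIN SET FRONTEND CONFIG ("external_table_commit_timeout_ms" = "20000");
```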
##### label_keep_max_num

- Default: 1000
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of load jobs that can be retained within a period of time. If this number is exceeded, the information of historical jobs is deleted.
- Introduced in: -

##### label_keep_max_second

- Default: 3 * 24 * 3600
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The maximum retention period (in seconds) for the labels of load jobs that have finished and are in the FINISHED or CANCELLED state. The default value is 3 days. After this retention period expires, the labels are deleted. This parameter applies to all types of load jobs. A value that is too large consumes a lot of memory.
- Introduced in: -

##### load_checker_interval_second

- Default: 5
- Type: Int
- Unit: Seconds
- Is mutable: No
- Description: The time interval at which load jobs are processed on a rolling basis.
- Introduced in: -

##### load_parallel_instance_num

- Default: 1
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: Controls the number of parallel load fragment instances created per host for Broker Load and Stream Load. LoadPlanner uses this value as the per-host parallelism unless the session enables adaptive sink DOP; if the session variable `enable_adaptive_sink_dop` is true, the session's `sink_degree_of_parallelism` overrides this configuration. When shuffling is needed, this value is applied to the fragment parallel execution (scan fragments and sink fragments execute instances in parallel). When no shuffling is needed, it is used as the sink pipeline DOP. Note: loading from local files is forced to a single instance (pipeline DOP = 1, parallel execution = 1) to avoid local disk contention. Increasing this value improves per-host concurrency and throughput but may increase CPU, memory, and I/O contention.
- Introduced in: v3.2.0

##### load_straggler_wait_second

- Default: 300
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The maximum loading lag that a BE replica can tolerate. If this value is exceeded, cloning is performed to clone data from other replicas.
- Introduced in: -

##### loads_history_retained_days

- Default: 30
- Type: Int
- Unit: Days
- Is mutable: Yes
- Description: The number of days of load history to retain in the internal `_statistics_.loads_history` table. The value is used during table creation to set the table property `partition_live_number` and is passed to `TableKeeper` (clamped to a minimum of 1) to determine how many daily partitions to keep. Increasing or decreasing it adjusts how long completed load jobs remain available in the daily partitions; it affects new table creation and the keeper's pruning behavior, but past partitions are not automatically re-created. The `LoadsHistorySyncer` relies on this retention when managing the load-history lifecycle; its sync cadence is controlled by `loads_history_sync_interval_second`.
- Introduced in: v3.3.6, v3.4.0, v3.5.0

##### loads_history_sync_interval_second

- Default: 60
- Type: Int
- Unit: Seconds
- Is mutable: Yes
- Description: The interval (in seconds) used by the LoadsHistorySyncer to schedule the periodic synchronization of completed load jobs from `information_schema.loads` into the internal `_statistics_.loads_history` table. The value is multiplied by 1000 in the constructor to set the FrontendDaemon interval. The syncer skips its first run (to allow table creation) and imports only loads that finished more than one minute earlier; small values increase DML and executor load, while large values delay the availability of historical load records. See `loads_history_retained_days` for the retention/partitioning behavior of the target table.
- Introduced in: v3.3.6, v3.4.0, v3.5.0

##### max_broker_load_job_concurrency

- Default: 5
- Alias: async_load_task_pool_size
- Type: Int
- Unit: -
- Is mutable: Yes
- Description: The maximum number of concurrent Broker Load jobs allowed in the StarRocks cluster. This parameter is valid only for Broker Load. The value of this parameter must be smaller than the value of `max_running_txn_num_per_db`. From v2.5 onwards, the default value is changed from `10` to `5`.
- Introduced in: -
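A sketch of raising Broker Load concurrency; the value `10` is illustrative, and the constraint from the description above still applies: the value must stay below `max_running_txn_num_per_db` (default 1000).

```SQL
-- Allow up to 10 Broker Load jobs to run concurrently.
ADMIN SET FRONTEND CONFIG ("max_broker_load_job_concurrency" = "10");
```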
单位:- -- 是否可变:是 -- 描述:StarRocks 集群中允许的最大并发 Broker Load 作业数。此参数仅对 Broker Load 有效。此参数的值必须小于 `max_running_txn_num_per_db` 的值。从 v2.5 开始,默认值从 `10` 更改为 `5`。 -- 引入版本:- +- 默认值: 5 +- 别名: async_load_task_pool_size +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: StarRocks 集群中允许的最大并发 Broker Load 作业数。此参数仅对 Broker Load 有效。此参数的值必须小于 `max_running_txn_num_per_db` 的值。从 v2.5 开始,默认值从 `10` 更改为 `5`。 +- 引入版本: - ##### max_load_timeout_second -- 默认值:259200 -- 类型:Int -- 单位:秒 -- 是否可变:是 -- 描述:加载作业允许的最大超时持续时间。如果超过此限制,加载作业将失败。此限制适用于所有类型的加载作业。 -- 引入版本:- +- 默认值: 259200 +- 类型: Int +- 单位: Seconds +- 可变: Yes +- 描述: 加载作业允许的最大超时持续时间。如果超过此限制,加载作业将失败。此限制适用于所有类型的加载作业。 +- 引入版本: - ##### max_routine_load_batch_size -- 默认值:4294967296 -- 类型:Long -- 单位:字节 -- 是否可变:是 -- 描述:Routine Load 任务可以加载的最大数据量。 -- 引入版本:- +- 默认值: 4294967296 +- 类型: Long +- 单位: Bytes +- 可变: Yes +- 描述: Routine Load 任务可加载的最大数据量。 +- 引入版本: - ##### max_routine_load_task_concurrent_num -- 默认值:5 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:每个 Routine Load 作业的最大并发任务数。 -- 引入版本:- +- 默认值: 5 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 每个 Routine Load 作业的最大并发任务数。 +- 引入版本: - ##### max_routine_load_task_num_per_be -- 默认值:16 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:每个 BE 上最大并发 Routine Load 任务数。从 v3.1.0 起,此参数的默认值从 5 增加到 16,并且不再需要小于或等于 BE 静态参数 `routine_load_thread_pool_size` (已弃用) 的值。 -- 引入版本:- +- 默认值: 16 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 每个 BE 上最大并发 Routine Load 任务数。自 v3.1.0 起,此参数的默认值从 5 增加到 16,并且不再需要小于或等于 BE 静态参数 `routine_load_thread_pool_size` (已弃用) 的值。 +- 引入版本: - ##### max_running_txn_num_per_db -- 默认值:1000 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:StarRocks 集群中每个数据库允许运行的最大加载事务数。默认值为 `1000`。从 v3.1 开始,默认值从 `100` 更改为 `1000`。当数据库正在运行的加载事务的实际数量超过此参数的值时,将不处理新的加载请求。同步加载作业的新请求将被拒绝,异步加载作业的新请求将被放入队列。不建议增加此参数的值,因为这会增加系统负载。 -- 引入版本:- +- 默认值: 1000 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: StarRocks 集群中每个数据库允许运行的最大加载事务数。默认值为 `1000`。从 v3.1 开始,默认值从 `100` 更改为 `1000`。当数据库正在运行的加载事务的实际数量超过此参数的值时,将不处理新的加载请求。同步加载作业的新请求将被拒绝,异步加载作业的新请求将排队。我们不建议您增加此参数的值,因为这会增加系统负载。 +- 引入版本: - ##### max_stream_load_timeout_second -- 默认值:259200 -- 类型:Int -- 单位:秒 -- 是否可变:是 -- 描述:Stream Load 作业允许的最大超时持续时间。 -- 引入版本:- +- 默认值: 259200 +- 类型: Int +- 单位: Seconds +- 可变: Yes +- 描述: Stream Load 作业允许的最大超时持续时间。 +- 引入版本: - ##### max_tolerable_backend_down_num -- 默认值:0 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:允许的最大故障 BE 节点数。如果超过此数量,Routine Load 作业将无法自动恢复。 -- 引入版本:- +- 默认值: 0 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 允许的最大故障 BE 节点数。如果超过此数量,Routine Load 作业无法自动恢复。 +- 引入版本: - ##### min_bytes_per_broker_scanner -- 默认值:67108864 -- 类型:Long -- 单位:字节 -- 是否可变:是 -- 描述:Broker Load 实例可以处理的最小数据量。 -- 引入版本:- +- 默认值: 67108864 +- 类型: Long +- 单位: Bytes +- 可变: Yes +- 描述: Broker Load 实例可处理的最小数据量。 +- 引入版本: - ##### min_load_timeout_second -- 默认值:1 -- 类型:Int -- 单位:秒 -- 是否可变:是 -- 描述:加载作业允许的最小超时持续时间。此限制适用于所有类型的加载作业。 -- 引入版本:- +- 默认值: 1 +- 类型: Int +- 单位: Seconds +- 可变: Yes +- 描述: 加载作业允许的最小超时持续时间。此限制适用于所有类型的加载作业。 +- 引入版本: - ##### min_routine_load_lag_for_metrics -- 默认值:10000 -- 类型:INT -- 单位:- -- 是否可变:是 -- 描述:Routine Load 作业在监控指标中显示的最小偏移量滞后。偏移量滞后大于此值的 Routine Load 作业将显示在指标中。 -- 引入版本:- +- 默认值: 10000 +- 类型: INT +- 单位: - +- 可变: Yes +- 描述: Routine Load 作业在监控指标中显示的最小偏移量滞后。偏移量滞后大于此值的 Routine Load 作业将显示在指标中。 +- 引入版本: - ##### period_of_auto_resume_min -- 默认值:5 -- 类型:Int -- 单位:分钟 -- 是否可变:是 -- 描述:Routine Load 作业自动恢复的时间间隔。 -- 引入版本:- +- 默认值: 5 +- 类型: Int +- 单位: Minutes +- 可变: Yes +- 描述: Routine Load 作业自动恢复的间隔。 +- 引入版本: - ##### prepared_transaction_default_timeout_second -- 默认值:86400 -- 类型:Int -- 单位:秒 -- 是否可变:是 -- 描述:准备事务的默认超时持续时间。 -- 引入版本:- +- 默认值: 86400 +- 类型: Int +- 单位: Seconds +- 可变: Yes +- 描述: 准备事务的默认超时持续时间。 +- 引入版本: - 
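以上加载相关参数中,标记为 `可变: Yes` 的配置项可以在运行时动态调整,无需重启 FE。下面给出一个最小示例(示例取值仅作演示,并非推荐值,请结合实际负载评估),展示如何查看并修改 `max_routine_load_task_concurrent_num`:

```SQL
-- 查看当前配置值。
ADMIN SHOW FRONTEND CONFIG LIKE 'max_routine_load_task_concurrent_num';

-- 动态修改配置(示例取值,仅作演示)。
ADMIN SET FRONTEND CONFIG ("max_routine_load_task_concurrent_num" = "10");
```

请注意,通过 `ADMIN SET FRONTEND CONFIG` 所做的修改不会持久化,FE 重启后将恢复为 `fe.conf` 中的设置;如需长期生效,还需同步更新 `fe.conf`。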
##### routine_load_task_consume_second -- 默认值:15 -- 类型:Long -- 单位:秒 -- 是否可变:是 -- 描述:集群中每个 Routine Load 任务消耗数据的最长时间。自 v3.1.0 起,Routine Load 作业在 [job_properties](../../sql-reference/sql-statements/loading_unloading/routine_load/CREATE_ROUTINE_LOAD.md#job_properties) 中支持新参数 `task_consume_second`。此参数适用于 Routine Load 作业中的单个加载任务,更具灵活性。 -- 引入版本:- +- 默认值: 15 +- 类型: Long +- 单位: Seconds +- 可变: Yes +- 描述: 集群中每个 Routine Load 任务消耗数据的最长时间。自 v3.1.0 起,Routine Load 作业在 [job_properties](../../sql-reference/sql-statements/loading_unloading/routine_load/CREATE_ROUTINE_LOAD.md#job_properties) 中支持新参数 `task_consume_second`。此参数适用于 Routine Load 作业中的单个加载任务,更具灵活性。 +- 引入版本: - ##### routine_load_task_timeout_second -- 默认值:60 -- 类型:Long -- 单位:秒 -- 是否可变:是 -- 描述:集群中每个 Routine Load 任务的超时持续时间。自 v3.1.0 起,Routine Load 作业在 [job_properties](../../sql-reference/sql-statements/loading_unloading/routine_load/CREATE_ROUTINE_LOAD.md#job_properties) 中支持新参数 `task_timeout_second`。此参数适用于 Routine Load 作业中的单个加载任务,更具灵活性。 -- 引入版本:- +- 默认值: 60 +- 类型: Long +- 单位: Seconds +- 可变: Yes +- 描述: 集群中每个 Routine Load 任务的超时持续时间。自 v3.1.0 起,Routine Load 作业在 [job_properties](../../sql-reference/sql-statements/loading_unloading/routine_load/CREATE_ROUTINE_LOAD.md#job_properties) 中支持新参数 `task_timeout_second`。此参数适用于 Routine Load 作业中的单个加载任务,更具灵活性。 +- 引入版本: - ##### routine_load_unstable_threshold_second -- 默认值:3600 -- 类型:Long -- 单位:秒 -- 是否可变:是 -- 描述:如果 Routine Load 作业中的任何任务滞后,则该作业被设置为 UNSTABLE 状态。具体而言,如果正在消费的消息的时间戳与当前时间之差超过此阈值,并且数据源中存在未消费的消息。 -- 引入版本:- +- 默认值: 3600 +- 类型: Long +- 单位: Seconds +- 可变: Yes +- 描述: 如果 Routine Load 作业中的任何任务滞后,Routine Load 作业将设置为 UNSTABLE 状态。具体来说,如果正在消费的消息的时间戳与当前时间之间的差值超过此阈值,并且数据源中存在未消费的消息。 +- 引入版本: - ##### spark_dpp_version -- 默认值:1.0.0 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:使用的 Spark 动态分区修剪 (DPP) 版本。 -- 引入版本:- +- 默认值: 1.0.0 +- 类型: String +- 单位: - +- 可变: No +- 描述: 使用的 Spark 动态分区剪枝 (DPP) 版本。 +- 引入版本: - ##### spark_home_default_dir -- 默认值:StarRocksFE.STARROCKS_HOME_DIR + "/lib/spark2x" -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:Spark 客户端的根目录。 -- 引入版本:- +- 默认值: StarRocksFE.STARROCKS_HOME_DIR + "/lib/spark2x" +- 类型: String +- 单位: - +- 可变: No +- 描述: Spark 客户端的根目录。 +- 引入版本: - ##### spark_launcher_log_dir -- 默认值:sys_log_dir + "/spark_launcher_log" -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:存储 Spark 日志文件的目录。 -- 引入版本:- +- 默认值: sys_log_dir + "/spark_launcher_log" +- 类型: String +- 单位: - +- 可变: No +- 描述: 存储 Spark 日志文件的目录。 +- 引入版本: - ##### spark_load_default_timeout_second -- 默认值:86400 -- 类型:Int -- 单位:秒 -- 是否可变:是 -- 描述:每个 Spark Load 作业的超时持续时间。 -- 引入版本:- +- 默认值: 86400 +- 类型: Int +- 单位: Seconds +- 可变: Yes +- 描述: 每个 Spark Load 作业的超时持续时间。 +- 引入版本: - ##### spark_load_submit_timeout_second -- 默认值:300 -- 类型:long -- 单位:秒 -- 是否可变:否 -- 描述:提交 Spark 应用程序后等待 YARN 响应的最长时间(秒)。`SparkLauncherMonitor.LogMonitor` 将此值转换为毫秒,如果作业在 UNKNOWN/CONNECTED/SUBMITTED 状态停留时间超过此超时,它将停止监控并强制终止 Spark 启动器进程。`SparkLoadJob` 将此配置作为默认值读取,并允许通过 `LoadStmt.SPARK_LOAD_SUBMIT_TIMEOUT` 属性进行每个加载的覆盖。将其设置得足够高以适应 YARN 排队延迟;设置过低可能会中止合法排队的作业,而设置过高可能会延迟故障处理和资源清理。 -- 引入版本:v3.2.0 +- 默认值: 300 +- 类型: long +- 单位: Seconds +- 可变: No +- 描述: 提交 Spark 应用程序后等待 YARN 响应的最大时间(秒)。`SparkLauncherMonitor.LogMonitor` 将此值转换为毫秒,如果作业在 UNKNOWN/CONNECTED/SUBMITTED 状态停留时间超过此超时,它将停止监控并强制终止 Spark 启动器进程。`SparkLoadJob` 将此配置作为默认值读取,并通过 `LoadStmt.SPARK_LOAD_SUBMIT_TIMEOUT` 属性允许每个加载进行覆盖。将其设置得足够高以适应 YARN 排队延迟;设置得太低可能会中止合法排队的作业,而设置得太高可能会延迟故障处理和资源清理。 +- 引入版本: v3.2.0 ##### spark_resource_path -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:Spark 依赖包的根目录。 -- 引入版本:- +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: Spark 依赖包的根目录。 +- 
引入版本: - ##### stream_load_default_timeout_second -- 默认值:600 -- 类型:Int -- 单位:秒 -- 是否可变:是 -- 描述:每个 Stream Load 作业的默认超时持续时间。 -- 引入版本:- +- 默认值: 600 +- 类型: Int +- 单位: Seconds +- 可变: Yes +- 描述: 每个 Stream Load 作业的默认超时持续时间。 +- 引入版本: - ##### stream_load_max_txn_num_per_be -- 默认值:-1 -- 类型:Int -- 单位:事务 -- 是否可变:是 -- 描述:限制从单个 BE(后端)主机接受的并发 Stream Load 事务的数量。当设置为非负整数时,FrontendServiceImpl 检查 BE(按客户端 IP)的当前事务计数,如果计数 `>=` 此限制,则拒绝新的 Stream Load 开始请求。值 `< 0` 禁用限制(无限制)。此检查在 Stream Load 开始时发生,并且在超出时可能会导致 `streamload txn num per be exceeds limit` 错误。相关的运行时行为使用 `stream_load_default_timeout_second` 作为请求超时的回退。 -- 引入版本:v3.3.0, v3.4.0, v3.5.0 +- 默认值: -1 +- 类型: Int +- 单位: Transactions +- 可变: Yes +- 描述: 限制从单个 BE(后端)主机接受的并发 Stream Load 事务数。当设置为非负整数时,FrontendServiceImpl 检查 BE(按客户端 IP)的当前事务计数,如果计数 `>=` 此限制,则拒绝新的 Stream Load 开始请求。值 `< 0` 禁用限制(无限制)。此检查发生在 Stream Load 开始期间,如果超出,可能会导致 `streamload txn num per be exceeds limit` 错误。相关的运行时行为使用 `stream_load_default_timeout_second` 进行请求超时回退。 +- 引入版本: v3.3.0, v3.4.0, v3.5.0 ##### stream_load_task_keep_max_num -- 默认值:1000 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:StreamLoadMgr 在内存中保留的 Stream Load 任务的最大数量(全局适用于所有数据库)。当跟踪任务 (`idToStreamLoadTask`) 的数量超过此阈值时,StreamLoadMgr 首先调用 `cleanSyncStreamLoadTasks()` 删除已完成的同步 Stream Load 任务;如果大小仍然大于此阈值的一半,它将调用 `cleanOldStreamLoadTasks(true)` 强制删除旧任务或已完成的任务。增加此值可在内存中保留更多任务历史记录;减小它可减少内存使用并使清理更具侵略性。此值仅控制内存中的保留,不影响持久化/重放的任务。 -- 引入版本:v3.2.0 +- 默认值: 1000 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: StreamLoadMgr 在内存中(跨所有数据库全局)保留的 Stream Load 任务的最大数量。当跟踪任务的数量 (`idToStreamLoadTask`) 超过此阈值时,StreamLoadMgr 首先调用 `cleanSyncStreamLoadTasks()` 删除已完成的同步 Stream Load 任务;如果大小仍大于此阈值的一半,它将调用 `cleanOldStreamLoadTasks(true)` 强制删除更旧或已完成的任务。增加此值可在内存中保留更多任务历史记录;减少此值可减少内存使用并使清理更具侵略性。此值仅控制内存中的保留,不影响持久化/重播任务。 +- 引入版本: v3.2.0 ##### stream_load_task_keep_max_second -- 默认值:3 * 24 * 3600 -- 类型:Int -- 单位:秒 -- 是否可变:是 -- 描述:已完成或已取消的 Stream Load 任务的保留窗口。当任务达到最终状态且其结束时间戳早于此阈值 (`currentMs - endTimeMs > stream_load_task_keep_max_second * 1000`) 时,它将有资格由 `StreamLoadMgr.cleanOldStreamLoadTasks` 删除,并在加载持久化状态时丢弃。适用于 `StreamLoadTask` 和 `StreamLoadMultiStmtTask`。如果总任务计数超过 `stream_load_task_keep_max_num`,清理可能会更早触发(同步任务由 `cleanSyncStreamLoadTasks` 优先处理)。设置此值以平衡历史/可调试性与内存使用。 -- 引入版本:v3.2.0 +- 默认值: 3 * 24 * 3600 +- 类型: Int +- 单位: Seconds +- 可变: Yes +- 描述: 已完成或取消的 Stream Load 任务的保留窗口。当任务达到最终状态且其结束时间戳早于此阈值 (`currentMs - endTimeMs > stream_load_task_keep_max_second * 1000`) 时,它将符合 `StreamLoadMgr.cleanOldStreamLoadTasks` 的删除条件,并在加载持久化状态时被丢弃。适用于 `StreamLoadTask` 和 `StreamLoadMultiStmtTask`。如果总任务计数超过 `stream_load_task_keep_max_num`,清理可能会更早触发(同步任务由 `cleanSyncStreamLoadTasks` 优先处理)。设置此项以平衡历史记录/可调试性与内存使用。 +- 引入版本: v3.2.0 ##### transaction_clean_interval_second -- 默认值:30 -- 类型:Int -- 单位:秒 -- 是否可变:否 -- 描述:已完成事务的清理时间间隔。单位:秒。建议指定较短的时间间隔,以确保可以及时清理已完成的事务。 -- 引入版本:- +- 默认值: 30 +- 类型: Int +- 单位: Seconds +- 可变: No +- 描述: 完成事务的清理时间间隔。单位:秒。建议您指定较短的时间间隔,以确保完成的事务能够及时清理。 +- 引入版本: - ##### transaction_stream_load_coordinator_cache_capacity -- 默认值:4096 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:存储从事务标签到协调器节点映射的缓存容量。 -- 引入版本:- +- 默认值: 4096 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 存储从事务标签到协调器节点映射的缓存容量。 +- 引入版本: - ##### transaction_stream_load_coordinator_cache_expire_seconds -- 默认值:900 -- 类型:Int -- 单位:秒 -- 是否可变:是 -- 描述:协调器映射在缓存中保留直到被驱逐(TTL)的时间。 -- 引入版本:- +- 默认值: 900 +- 类型: Int +- 单位: Seconds +- 可变: Yes +- 描述: 在缓存中保留协调器映射的时间(TTL),超过此时间将被逐出。 +- 引入版本: - ##### yarn_client_path -- 默认值:StarRocksFE.STARROCKS_HOME_DIR + "/lib/yarn-client/hadoop/bin/yarn" -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:Yarn 客户端包的根目录。 -- 引入版本:- +- 默认值: StarRocksFE.STARROCKS_HOME_DIR + 
"/lib/yarn-client/hadoop/bin/yarn" +- 类型: String +- 单位: - +- 可变: No +- 描述: Yarn 客户端包的根目录。 +- 引入版本: - ##### yarn_config_dir -- 默认值:StarRocksFE.STARROCKS_HOME_DIR + "/lib/yarn-config" -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:存储 Yarn 配置文件的目录。 -- 引入版本:- +- 默认值: StarRocksFE.STARROCKS_HOME_DIR + "/lib/yarn-config" +- 类型: String +- 单位: - +- 可变: No +- 描述: 存储 Yarn 配置文件的目录。 +- 引入版本: - ### 统计报告 ##### enable_collect_warehouse_metrics -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:当此项设置为 `true` 时,系统将收集并导出每个仓库的指标。启用它会将仓库级别的指标(Slot/使用率/可用性)添加到指标输出中,并增加指标基数和收集开销。禁用它将省略特定于仓库的指标,并减少 CPU/网络和监控存储成本。 -- 引入版本:v3.5.0 +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 当此项设置为 `true` 时,系统将收集并导出每个仓库的指标。启用它会将仓库级指标(槽/使用率/可用性)添加到指标输出中,并增加指标基数和收集开销。禁用它将省略仓库特定指标并减少 CPU/网络和监控存储成本。 +- 引入版本: v3.5.0 ##### enable_http_detail_metrics -- 默认值:false -- 类型:boolean -- 单位:- -- 是否可变:是 -- 描述:当为 true 时,HTTP 服务器计算并暴露详细的 HTTP worker 指标(特别是 `HTTP_WORKER_PENDING_TASKS_NUM` gauge)。启用此功能会导致服务器迭代 Netty worker 执行器并在每个 `NioEventLoop` 上调用 `pendingTasks()` 以汇总待处理任务计数;禁用时,gauge 返回 0 以避免该成本。此额外收集可能对 CPU 和延迟敏感——仅在调试或详细调查时才启用。 -- 引入版本:v3.2.3 +- 默认值: false +- 类型: boolean +- 单位: - +- 可变: Yes +- 描述: 当为 true 时,HTTP 服务器会计算并公开详细的 HTTP worker 指标(特别是 `HTTP_WORKER_PENDING_TASKS_NUM` 计数器)。启用此功能会导致服务器遍历 Netty worker 执行器并调用每个 `NioEventLoop` 上的 `pendingTasks()` 以汇总待处理任务计数;禁用时,该计数器返回 0 以避免该成本。这种额外的收集可能对 CPU 和延迟敏感——仅在调试或详细调查时启用。 +- 引入版本: v3.2.3 ##### proc_profile_collect_time_s -- 默认值:120 -- 类型:Int -- 单位:秒 -- 是否可变:是 -- 描述:单个进程 Profile 收集的持续时间(秒)。当 `proc_profile_cpu_enable` 或 `proc_profile_mem_enable` 设置为 `true` 时,`AsyncProfiler` 会启动,收集器线程会睡眠此持续时间,然后 Profiler 停止并写入 Profile。较大的值会增加采样覆盖率和文件大小,但会延长 Profiler 运行时并延迟后续收集;较小的值会减少开销,但可能生成不足的样本。确保此值与 `proc_profile_file_retained_days` 和 `proc_profile_file_retained_size_bytes` 等保留设置对齐。 -- 引入版本:v3.2.12 +- 默认值: 120 +- 类型: Int +- 单位: Seconds +- 可变: Yes +- 描述: 单个进程 profile 收集的持续时间(秒)。当 `proc_profile_cpu_enable` 或 `proc_profile_mem_enable` 设置为 `true` 时,AsyncProfiler 启动,收集器线程休眠此持续时间,然后停止 profile 并写入 profile。较大的值会增加样本覆盖率和文件大小,但会延长 profile 运行时并延迟后续收集;较小的值会减少开销,但可能产生不足的样本。确保此值与 `proc_profile_file_retained_days` 和 `proc_profile_file_retained_size_bytes` 等保留设置保持一致。 +- 引入版本: v3.2.12 ### 存储 ##### alter_table_timeout_second -- 默认值:86400 -- 类型:Int -- 单位:秒 -- 是否可变:是 -- 描述:Schema Change 操作 (ALTER TABLE) 的超时持续时间。 -- 引入版本:- +- 默认值: 86400 +- 类型: Int +- 单位: Seconds +- 可变: Yes +- 描述: Schema 变更操作 (ALTER TABLE) 的超时持续时间。 +- 引入版本: - ##### capacity_used_percent_high_water -- 默认值:0.75 -- 类型:double -- 单位:分数 (0.0–1.0) -- 是否可变:是 -- 描述:计算后端负载分数时使用的磁盘容量使用百分比的高水位阈值(总容量的一部分)。`BackendLoadStatistic.calcSore` 使用 `capacity_used_percent_high_water` 设置 `LoadScore.capacityCoefficient`:如果后端使用百分比小于 0.5,则系数等于 0.5;如果使用百分比 `>` `capacity_used_percent_high_water`,则系数 = 1.0;否则系数通过 (2 * usedPercent - 0.5) 线性变化。当系数为 1.0 时,负载分数完全由容量比例驱动;较低的值会增加副本计数的权重。调整此值会改变平衡器惩罚磁盘利用率高的后端的积极程度。 -- 引入版本:v3.2.0 +- 默认值: 0.75 +- 类型: double +- 单位: Fraction (0.0–1.0) +- 可变: Yes +- 描述: 在计算后端负载分数时使用的磁盘容量使用率(总容量的百分比)高水位阈值。`BackendLoadStatistic.calcSore` 使用 `capacity_used_percent_high_water` 设置 `LoadScore.capacityCoefficient`:如果后端使用率小于 0.5,则系数等于 0.5;如果使用率 `>` `capacity_used_percent_high_water`,则系数等于 1.0;否则,系数通过 (2 * usedPercent - 0.5) 线性变化。当系数为 1.0 时,负载分数完全由容量比例驱动;较低的值会增加副本计数的权重。调整此值会改变均衡器惩罚磁盘利用率高的后端时的激进程度。 +- 引入版本: v3.2.0 ##### catalog_trash_expire_second -- 默认值:86400 -- 类型:Long -- 单位:秒 -- 是否可变:是 -- 描述:数据库、表或分区删除后元数据可以保留的最长持续时间。如果此持续时间过期,数据将被删除,并且无法通过 [RECOVER](../../sql-reference/sql-statements/backup_restore/RECOVER.md) 命令恢复。 -- 引入版本:- +- 默认值: 86400 +- 类型: Long +- 单位: Seconds +- 可变: Yes +- 描述: 
数据库、表或分区删除后,元数据可以保留的最长时间。如果超过此持续时间,数据将被删除,并且无法通过 [RECOVER](../../sql-reference/sql-statements/backup_restore/RECOVER.md) 命令恢复。 +- 引入版本: - ##### check_consistency_default_timeout_second -- 默认值:600 -- 类型:Long -- 单位:秒 -- 是否可变:是 -- 描述:副本一致性检查的超时持续时间。您可以根据 Tablet 的大小设置此参数。 -- 引入版本:- +- 默认值: 600 +- 类型: Long +- 单位: Seconds +- 可变: Yes +- 描述: 副本一致性检查的超时持续时间。您可以根据 Tablet 的大小设置此参数。 +- 引入版本: - ##### consistency_check_cooldown_time_second -- 默认值:24 * 3600 -- 类型:Int -- 单位:秒 -- 是否可变:是 -- 描述:控制同一 Tablet 之间一致性检查所需的最小间隔(秒)。在 Tablet 选择期间,Tablet 仅在 `tablet.getLastCheckTime()` 小于 `(currentTimeMillis - consistency_check_cooldown_time_second * 1000)` 时才被视为合格。默认值 (24 * 3600) 强制每个 Tablet 大约每天检查一次,以减少后端磁盘 I/O。降低此值会增加检查频率和资源使用;提高此值会以较慢检测不一致性为代价减少 I/O。该值在从索引的 Tablet 列表中过滤冷却的 Tablet 时全局应用。 -- 引入版本:v3.5.5 +- 默认值: 24 * 3600 +- 类型: Int +- 单位: Seconds +- 可变: Yes +- 描述: 控制同一 Tablet 两次一致性检查之间的最小间隔(秒)。在 Tablet 选择期间,只有当 `tablet.getLastCheckTime()` 小于 `(currentTimeMillis - consistency_check_cooldown_time_second * 1000)` 时,Tablet 才被视为符合条件。默认值 (24 * 3600) 强制每个 Tablet 大约每天检查一次,以减少后端磁盘 I/O。降低此值会增加检查频率和资源使用;提高此值会减少 I/O,但会降低不一致检测速度。该值在从索引的 Tablet 列表中过滤冷却的 Tablet 时全局应用。 +- 引入版本: v3.5.5 ##### consistency_check_end_time -- 默认值:"4" -- 类型:String -- 单位:一天中的小时 (0-23) -- 是否可变:否 -- 描述:指定 ConsistencyChecker 工作窗口的结束小时(一天中的小时)。该值在系统时区中使用 SimpleDateFormat("HH") 解析,并接受 0–23(一位或两位数字)。StarRocks 将其与 `consistency_check_start_time` 一起使用,以决定何时调度和添加一致性检查作业。当 `consistency_check_start_time` 大于 `consistency_check_end_time` 时,窗口跨越午夜(例如,默认值是 `consistency_check_start_time` = "23" 到 `consistency_check_end_time` = "4")。当 `consistency_check_start_time` 等于 `consistency_check_end_time` 时,检查器永不运行。解析失败将导致 FE 启动时记录错误并退出,因此请提供有效的小时字符串。 -- 引入版本:v3.2.0 +- 默认值: "4" +- 类型: String +- 单位: Hour of day (0-23) +- 可变: No +- 描述: 指定 ConsistencyChecker 工作窗口的结束小时(一天中的小时)。该值使用 SimpleDateFormat("HH") 在系统时区中解析,并接受 0-23(一位或两位数字)。StarRocks 将其与 `consistency_check_start_time` 一起使用,以决定何时调度和添加一致性检查作业。当 `consistency_check_start_time` 大于 `consistency_check_end_time` 时,窗口跨越午夜(例如,默认 `consistency_check_start_time` = "23" 到 `consistency_check_end_time` = "4")。当 `consistency_check_start_time` 等于 `consistency_check_end_time` 时,检查器永远不会运行。解析失败将导致 FE 启动记录错误并退出,因此请提供有效的小时字符串。 +- 引入版本: v3.2.0 ##### consistency_check_start_time -- 默认值:"23" -- 类型:String -- 单位:一天中的小时 (00-23) -- 是否可变:否 -- 描述:指定 ConsistencyChecker 工作窗口的开始小时(一天中的小时)。该值在系统时区中使用 SimpleDateFormat("HH") 解析,并接受 0–23(一位或两位数字)。StarRocks 将其与 `consistency_check_end_time` 一起使用,以决定何时调度和添加一致性检查作业。当 `consistency_check_start_time` 大于 `consistency_check_end_time` 时,窗口跨越午夜(例如,默认值是 `consistency_check_start_time` = "23" 到 `consistency_check_end_time` = "4")。当 `consistency_check_start_time` 等于 `consistency_check_end_time` 时,检查器永不运行。解析失败将导致 FE 启动时记录错误并退出,因此请提供有效的小时字符串。 -- 引入版本:v3.2.0 +- 默认值: "23" +- 类型: String +- 单位: Hour of day (00-23) +- 可变: No +- 描述: 指定 ConsistencyChecker 工作窗口的开始小时(一天中的小时)。该值使用 SimpleDateFormat("HH") 在系统时区中解析,并接受 0-23(一位或两位数字)。StarRocks 将其与 `consistency_check_end_time` 一起使用,以决定何时调度和添加一致性检查作业。当 `consistency_check_start_time` 大于 `consistency_check_end_time` 时,窗口跨越午夜(例如,默认 `consistency_check_start_time` = "23" 到 `consistency_check_end_time` = "4")。当 `consistency_check_start_time` 等于 `consistency_check_end_time` 时,检查器永远不会运行。解析失败将导致 FE 启动记录错误并退出,因此请提供有效的小时字符串。 +- 引入版本: v3.2.0 ##### consistency_tablet_meta_check_interval_ms -- 默认值:2 * 3600 * 1000 -- 类型:Int -- 单位:毫秒 -- 是否可变:是 -- 描述:ConsistencyChecker 用于在 `TabletInvertedIndex` 和 `LocalMetastore` 之间运行完整 Tablet 元数据一致性扫描的时间间隔。`runAfterCatalogReady` 中的守护进程在 `current time - lastTabletMetaCheckTime` 
超过此值时触发 checkTabletMetaConsistency。当第一次检测到无效 Tablet 时,其 `toBeCleanedTime` 设置为 `now + (consistency_tablet_meta_check_interval_ms / 2)`,因此实际删除会延迟到后续扫描。增加此值可减少扫描频率和负载(清理速度变慢);减少此值可更快检测和删除陈旧 Tablet(开销更高)。 -- 引入版本:v3.2.0 +- 默认值: 2 * 3600 * 1000 +- 类型: Int +- 单位: Milliseconds +- 可变: Yes +- 描述: ConsistencyChecker 运行 `TabletInvertedIndex` 和 `LocalMetastore` 之间完整 Tablet 元数据一致性扫描的间隔。`runAfterCatalogReady` 中的守护程序在 `current time - lastTabletMetaCheckTime` 超过此值时触发 checkTabletMetaConsistency。当首次检测到无效 Tablet 时,其 `toBeCleanedTime` 设置为 `now + (consistency_tablet_meta_check_interval_ms / 2)`,因此实际删除延迟到后续扫描。增加此值可减少扫描频率和负载(清理速度变慢);减少此值可更快检测和删除过期 Tablet(开销更高)。 +- 引入版本: v3.2.0 ##### default_replication_num -- 默认值:3 -- 类型:Short -- 单位:- -- 是否可变:是 -- 描述:设置在 StarRocks 中创建表时每个数据分区的默认副本数。此设置可以在创建表时通过在 CREATE TABLE DDL 中指定 `replication_num=x` 进行覆盖。 -- 引入版本:- +- 默认值: 3 +- 类型: Short +- 单位: - +- 可变: Yes +- 描述: 设置在 StarRocks 中创建表时每个数据分区的默认副本数。在创建表时可以通过在 CREATE TABLE DDL 中指定 `replication_num=x` 来覆盖此设置。 +- 引入版本: - ##### enable_auto_tablet_distribution -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否自动设置桶的数量。 +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否自动设置桶的数量。 - 如果此参数设置为 `TRUE`,则在创建表或添加分区时无需指定桶的数量。StarRocks 会自动确定桶的数量。 - - 如果此参数设置为 `FALSE`,则在创建表或添加分区时需要手动指定桶的数量。如果为表添加新分区时未指定桶计数,则新分区将继承表创建时设置的桶计数。但是,您也可以手动为新分区指定桶的数量。 -- 引入版本:v2.5.7 + - 如果此参数设置为 `FALSE`,则在创建表或添加分区时需要手动指定桶的数量。如果在向表添加新分区时未指定桶计数,则新分区将继承表创建时设置的桶计数。但是,您也可以手动为新分区指定桶的数量。 +- 引入版本: v2.5.7 ##### enable_experimental_rowstore -- 默认值:false -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否启用 [混合行-列存储](../../table_design/hybrid_table.md) 功能。 -- 引入版本:v3.2.3 +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否启用 [行列混合存储](../../table_design/hybrid_table.md) 功能。 +- 引入版本: v3.2.3 ##### enable_fast_schema_evolution -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否为 StarRocks 集群中的所有表启用快速 Schema 演进。有效值为 `TRUE` 和 `FALSE`(默认值)。启用快速 Schema 演进可以提高 Schema 变更的速度,并减少添加或删除列时的资源使用。 -- 引入版本:v3.2.0 +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否为 StarRocks 集群中的所有表启用快速 Schema 演进。有效值为 `TRUE`(默认)和 `FALSE`。启用快速 Schema 演进可以提高 Schema 变更的速度,并在添加或删除列时减少资源使用。 +- 引入版本: v3.2.0 > **注意** > > - StarRocks 共享数据集群从 v3.3.0 开始支持此参数。 -> - 如果您需要为特定表配置快速 Schema 演进,例如禁用特定表的快速 Schema 演进,您可以在表创建时设置表属性 [`fast_schema_evolution`](../../sql-reference/sql-statements/table_bucket_part_index/CREATE_TABLE.md#set-fast-schema-evolution)。 +> - 如果需要为特定表配置快速 Schema 演进,例如禁用特定表的快速 Schema 演进,可以在表创建时设置表属性 [`fast_schema_evolution`](../../sql-reference/sql-statements/table_bucket_part_index/CREATE_TABLE.md#set-fast-schema-evolution)。 ##### enable_online_optimize_table -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:控制 StarRocks 在创建优化作业时是否使用非阻塞的在线优化路径。当 `enable_online_optimize_table` 为 true 且目标表满足兼容性检查(无分区/键/排序规范,分布不是 `RandomDistributionDesc`,存储类型不是 `COLUMN_WITH_ROW`,复制存储已启用,并且表不是云原生表或物化视图)时,规划器创建 `OnlineOptimizeJobV2` 以执行优化而不阻塞写入。如果为 false 或任何兼容性条件失败,StarRocks 回退到 `OptimizeJobV2`,这可能会在优化期间阻塞写入操作。 -- 引入版本:v3.3.3, v3.4.0, v3.5.0 +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 控制 StarRocks 在创建优化作业时是否使用非阻塞在线优化路径。当 `enable_online_optimize_table` 为 true 且目标表符合兼容性检查(无分区/键/排序规范,分布不是 `RandomDistributionDesc`,存储类型不是 `COLUMN_WITH_ROW`,启用复制存储,并且表不是云原生表或物化视图)时,规划器创建 `OnlineOptimizeJobV2` 以执行优化而不阻塞写入。如果为 false 或任何兼容性条件失败,StarRocks 回退到 `OptimizeJobV2`,这可能会在优化期间阻塞写入操作。 +- 引入版本: v3.3.3, v3.4.0, v3.5.0 ##### enable_strict_storage_medium_check -- 默认值:false -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:FE 在用户创建表时是否严格检查 BE 的存储介质。如果此参数设置为 `TRUE`,FE 在用户创建表时会检查 BE 的存储介质,如果 BE 的存储介质与 CREATE TABLE 语句中指定的 `storage_medium` 
参数不同,则返回错误。例如,CREATE TABLE 语句中指定的存储介质是 SSD,但 BE 的实际存储介质是 HDD。结果是表创建失败。如果此参数为 `FALSE`,FE 在用户创建表时不会检查 BE 的存储介质。 -- 引入版本:- +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: FE 在用户创建表时是否严格检查 BE 的存储介质。如果此参数设置为 `TRUE`,FE 在用户创建表时检查 BE 的存储介质,如果 BE 的存储介质与 CREATE TABLE 语句中指定的 `storage_medium` 参数不同,则返回错误。例如,CREATE TABLE 语句中指定的存储介质是 SSD,但 BE 的实际存储介质是 HDD。结果,表创建失败。如果此参数为 `FALSE`,FE 在用户创建表时不会检查 BE 的存储介质。 +- 引入版本: - ##### max_bucket_number_per_partition -- 默认值:1024 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:一个分区中可以创建的最大桶数。 -- 引入版本:v3.3.2 +- 默认值: 1024 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 一个分区中可以创建的最大桶数。 +- 引入版本: v3.3.2 ##### max_column_number_per_table -- 默认值:10000 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:一个表中可以创建的最大列数。 -- 引入版本:v3.3.2 +- 默认值: 10000 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 一个表中可以创建的最大列数。 +- 引入版本: v3.3.2 ##### max_dynamic_partition_num -- 默认值:500 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:在分析或创建动态分区表时,一次可以创建的最大分区数。在动态分区属性验证期间,系统 task_runs_max_history_number 计算预期分区(结束偏移量 + 历史分区号),如果总数超过 `max_dynamic_partition_num`,则抛出 DDL 错误。仅当您期望合法的大的分区范围时才增加此值;增加它允许创建更多分区,但可能会增加元数据大小、调度工作和操作复杂性。 -- 引入版本:v3.2.0 +- 默认值: 500 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 限制在分析或创建动态分区表时可以一次创建的最大分区数。在动态分区属性验证期间,系统会计算预期分区数(结束偏移量 + 历史分区数),如果总数超过 `max_dynamic_partition_num`,则抛出 DDL 错误。仅当您期望合法的分区范围较大时才增加此值;增加它允许创建更多分区,但可能会增加元数据大小、调度工作和操作复杂性。 +- 引入版本: v3.2.0 ##### max_partition_number_per_table -- 默认值:100000 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:一个表中可以创建的最大分区数。 -- 引入版本:v3.3.2 +- 默认值: 100000 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 一个表中可以创建的最大分区数。 +- 引入版本: v3.3.2 ##### max_task_consecutive_fail_count -- 默认值:10 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:任务在调度器自动暂停之前可能连续失败的最大次数。当 `TaskSource.MV.equals(task.getSource())` 且 `max_task_consecutive_fail_count` 大于 0 时,如果任务的连续失败计数达到或超过 `max_task_consecutive_fail_count`,则任务通过 TaskManager 暂停,并且对于物化视图任务,物化视图将被置为非活动状态。将抛出异常指示暂停以及如何重新激活(例如 `ALTER MATERIALIZED VIEW ACTIVE`)。将此项设置为 0 或负值以禁用自动暂停。 -- 引入版本:- +- 默认值: 10 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 任务在调度器自动暂停它之前允许的最大连续失败次数。当 `TaskSource.MV.equals(task.getSource())` 并且 `max_task_consecutive_fail_count` 大于 0 时,如果任务的连续失败计数达到或超过 `max_task_consecutive_fail_count`,任务将通过 TaskManager 暂停,并且对于物化视图任务,物化视图将失效。抛出异常指示暂停以及如何重新激活(例如 `ALTER MATERIALIZED VIEW ACTIVE`)。将此项设置为 0 或负值以禁用自动暂停。 +- 引入版本: - ##### partition_recycle_retention_period_secs -- 默认值:1800 -- 类型:Long -- 单位:秒 -- 是否可变:是 -- 描述:通过 INSERT OVERWRITE 或物化视图刷新操作删除的分区的元数据保留时间。请注意,此类元数据无法通过执行 [RECOVER](../../sql-reference/sql-statements/backup_restore/RECOVER.md) 恢复。 -- 引入版本:v3.5.9 +- 默认值: 1800 +- 类型: Long +- 单位: Seconds +- 可变: Yes +- 描述: 通过 INSERT OVERWRITE 或物化视图刷新操作删除的分区的元数据保留时间。请注意,此类元数据无法通过执行 [RECOVER](../../sql-reference/sql-statements/backup_restore/RECOVER.md) 恢复。 +- 引入版本: v3.5.9 ##### recover_with_empty_tablet -- 默认值:false -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否用空副本替换丢失或损坏的 Tablet 副本。如果 Tablet 副本丢失或损坏,对此 Tablet 或其他健康 Tablet 的数据查询可能会失败。用空 Tablet 替换丢失或损坏的 Tablet 副本可确保查询仍能执行。但是,结果可能不正确,因为数据已丢失。默认值为 `FALSE`,表示不替换丢失或损坏的 Tablet 副本,并且查询失败。 -- 引入版本:- +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否用空 Tablet 替换丢失或损坏的 Tablet 副本。如果 Tablet 副本丢失或损坏,对此 Tablet 或其他健康 Tablet 的数据查询可能会失败。用空 Tablet 替换丢失或损坏的 Tablet 副本可以确保查询仍然可以执行。但是,结果可能不正确,因为数据已丢失。默认值为 `FALSE`,表示丢失或损坏的 Tablet 副本不替换为空副本,查询失败。 +- 引入版本: - ##### storage_usage_hard_limit_percent -- 默认值:95 -- 别名:storage_flood_stage_usage_percent -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:BE 目录存储使用百分比的硬限制。如果 BE 存储目录的存储使用率(百分比)超过此值且剩余存储空间小于 `storage_usage_hard_limit_reserve_bytes`,则加载和恢复作业将被拒绝。您需要将此项与 BE 配置项 `storage_flood_stage_usage_percent` 一起设置才能使配置生效。 -- 引入版本:- +- 默认值: 95 +- 别名: 
storage_flood_stage_usage_percent +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: BE 目录中存储使用率的硬限制(百分比)。如果 BE 存储目录的存储使用率(百分比)超过此值,并且剩余存储空间小于 `storage_usage_hard_limit_reserve_bytes`,则加载和恢复作业将被拒绝。您需要将此项与 BE 配置项 `storage_flood_stage_usage_percent` 一起设置才能使配置生效。 +- 引入版本: - ##### storage_usage_hard_limit_reserve_bytes -- 默认值:100 * 1024 * 1024 * 1024 -- 别名:storage_flood_stage_left_capacity_bytes -- 类型:Long -- 单位:字节 -- 是否可变:是 -- 描述:BE 目录剩余存储空间的硬限制。如果 BE 存储目录的剩余存储空间小于此值且存储使用率(百分比)超过 `storage_usage_hard_limit_percent`,则加载和恢复作业将被拒绝。您需要将此项与 BE 配置项 `storage_flood_stage_left_capacity_bytes` 一起设置才能使配置生效。 -- 引入版本:- +- 默认值: 100 * 1024 * 1024 * 1024 (107374182400) +- 别名: storage_flood_stage_left_capacity_bytes +- 类型: Long +- 单位: Bytes +- 可变: Yes +- 描述: BE 目录中剩余存储空间的硬限制。如果 BE 存储目录中剩余存储空间小于此值,并且存储使用率(百分比)超过 `storage_usage_hard_limit_percent`,则加载和恢复作业将被拒绝。您需要将此项与 BE 配置项 `storage_flood_stage_left_capacity_bytes` 一起设置才能使配置生效。 +- 引入版本: - ##### storage_usage_soft_limit_percent -- 默认值:90 -- 别名:storage_high_watermark_usage_percent -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:BE 目录存储使用百分比的软限制。如果 BE 存储目录的存储使用率(百分比)超过此值且剩余存储空间小于 `storage_usage_soft_limit_reserve_bytes`,则 Tablet 无法克隆到此目录。 -- 引入版本:- +- 默认值: 90 +- 别名: storage_high_watermark_usage_percent +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: BE 目录中存储使用率的软限制(百分比)。如果 BE 存储目录的存储使用率(百分比)超过此值,并且剩余存储空间小于 `storage_usage_soft_limit_reserve_bytes`,则不能将 Tablet 克隆到此目录。 +- 引入版本: - ##### storage_usage_soft_limit_reserve_bytes -- 默认值:200 * 1024 * 1024 * 1024 -- 别名:storage_min_left_capacity_bytes -- 类型:Long -- 单位:字节 -- 是否可变:是 -- 描述:BE 目录剩余存储空间的软限制。如果 BE 存储目录的剩余存储空间小于此值且存储使用率(百分比)超过 `storage_usage_soft_limit_percent`,则 Tablet 无法克隆到此目录。 -- 引入版本:- +- 默认值: 200 * 1024 * 1024 * 1024 +- 别名: storage_min_left_capacity_bytes +- 类型: Long +- 单位: Bytes +- 可变: Yes +- 描述: BE 目录中剩余存储空间的软限制。如果 BE 存储目录中剩余存储空间小于此值,并且存储使用率(百分比)超过 `storage_usage_soft_limit_percent`,则不能将 Tablet 克隆到此目录。 +- 引入版本: - ##### tablet_checker_lock_time_per_cycle_ms -- 默认值:1000 -- 类型:Int -- 单位:毫秒 -- 是否可变:是 -- 描述:Tablet 检查器在释放并重新获取表锁之前,每个周期最大锁持有时间。小于 100 的值将被视为 100。 -- 引入版本:v3.5.9, v4.0.2 +- 默认值: 1000 +- 类型: Int +- 单位: Milliseconds +- 可变: Yes +- 描述: Tablet 检查器在释放并重新获取表锁之前,每个周期内最大锁持有时间。小于 100 的值将被视为 100。 +- 引入版本: v3.5.9, v4.0.2 ##### tablet_create_timeout_second -- 默认值:10 -- 类型:Int -- 单位:秒 -- 是否可变:是 -- 描述:创建 Tablet 的超时持续时间。从 v3.1 开始,默认值从 1 更改为 10。 -- 引入版本:- +- 默认值: 10 +- 类型: Int +- 单位: Seconds +- 可变: Yes +- 描述: 创建 Tablet 的超时持续时间。默认值从 v3.1 开始从 1 更改为 10。 +- 引入版本: - ##### tablet_delete_timeout_second -- 默认值:2 -- 类型:Int -- 单位:秒 -- 是否可变:是 -- 描述:删除 Tablet 的超时持续时间。 -- 引入版本:- +- 默认值: 2 +- 类型: Int +- 单位: Seconds +- 可变: Yes +- 描述: 删除 Tablet 的超时持续时间。 +- 引入版本: - ##### tablet_sched_balance_load_disk_safe_threshold -- 默认值:0.5 -- 别名:balance_load_disk_safe_threshold -- 类型:Double -- 单位:- -- 是否可变:是 -- 描述:用于判断 BE 磁盘使用率是否均衡的百分比阈值。如果所有 BE 的磁盘使用率均低于此值,则认为均衡。如果磁盘使用率大于此值且最高与最低 BE 磁盘使用率之差大于 10%,则认为磁盘使用率不均衡并触发 Tablet 重均衡。 -- 引入版本:- +- 默认值: 0.5 +- 别名: balance_load_disk_safe_threshold +- 类型: Double +- 单位: - +- 可变: Yes +- 描述: 用于判断 BE 磁盘使用率是否均衡的百分比阈值。如果所有 BE 的磁盘使用率均低于此值,则认为均衡。如果磁盘使用率大于此值,并且最高和最低 BE 磁盘使用率之间的差异大于 10%,则认为磁盘使用率不均衡并触发 Tablet 重均衡。 +- 引入版本: - ##### tablet_sched_balance_load_score_threshold -- 默认值:0.1 -- 别名:balance_load_score_threshold -- 类型:Double -- 单位:- -- 是否可变:是 -- 描述:用于判断 BE 负载是否均衡的百分比阈值。如果 BE 的负载低于所有 BE 的平均负载且差值大于此值,则此 BE 处于低负载状态。反之,如果 BE 的负载高于平均负载且差值大于此值,则此 BE 处于高负载状态。 -- 引入版本:- +- 默认值: 0.1 +- 别名: balance_load_score_threshold +- 类型: Double +- 单位: - +- 可变: Yes +- 描述: 用于判断 BE 负载是否均衡的百分比阈值。如果 BE 的负载低于所有 BE 的平均负载,并且差异大于此值,则此 BE 处于低负载状态。相反,如果 BE 的负载高于平均负载,并且差异大于此值,则此 BE 
处于高负载状态。 +- 引入版本: - ##### tablet_sched_be_down_tolerate_time_s -- 默认值:900 -- 类型:Long -- 单位:秒 -- 是否可变:是 -- 描述:调度器允许 BE 节点保持非活动状态的最长持续时间。达到时间阈值后,该 BE 节点上的 Tablet 将迁移到其他活动 BE 节点。 -- 引入版本:v2.5.7 +- 默认值: 900 +- 类型: Long +- 单位: Seconds +- 可变: Yes +- 描述: 调度器允许 BE 节点保持非活动状态的最长时间。达到时间阈值后,该 BE 节点上的 Tablet 将迁移到其他活动 BE 节点。 +- 引入版本: v2.5.7 ##### tablet_sched_disable_balance -- 默认值:false -- 别名:disable_balance -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否禁用 Tablet 均衡。`TRUE` 表示禁用 Tablet 均衡。`FALSE` 表示启用 Tablet 均衡。 -- 引入版本:- +- 默认值: false +- 别名: disable_balance +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否禁用 Tablet 均衡。`TRUE` 表示禁用 Tablet 均衡。`FALSE` 表示启用 Tablet 均衡。 +- 引入版本: - ##### tablet_sched_disable_colocate_balance -- 默认值:false -- 别名:disable_colocate_balance -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否禁用 Colocate 表的副本均衡。`TRUE` 表示禁用副本均衡。`FALSE` 表示启用副本均衡。 -- 引入版本:- +- 默认值: false +- 别名: disable_colocate_balance +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否禁用 Colocate Table 的副本均衡。`TRUE` 表示禁用副本均衡。`FALSE` 表示启用副本均衡。 +- 引入版本: - ##### tablet_sched_max_balancing_tablets -- 默认值:500 -- 别名:max_balancing_tablets -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:可以同时均衡的 Tablet 的最大数量。如果超过此值,则跳过 Tablet 重均衡。 -- 引入版本:- +- 默认值: 500 +- 别名: max_balancing_tablets +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 可以同时均衡的 Tablet 的最大数量。如果超过此值,将跳过 Tablet 重均衡。 +- 引入版本: - ##### tablet_sched_max_clone_task_timeout_sec -- 默认值:2 * 60 * 60 -- 别名:max_clone_task_timeout_sec -- 类型:Long -- 单位:秒 -- 是否可变:是 -- 描述:克隆 Tablet 的最大超时持续时间。 -- 引入版本:- +- 默认值: 2 * 60 * 60 +- 别名: max_clone_task_timeout_sec +- 类型: Long +- 单位: Seconds +- 可变: Yes +- 描述: 克隆 Tablet 的最大超时持续时间。 +- 引入版本: - ##### tablet_sched_max_not_being_scheduled_interval_ms -- 默认值:15 * 60 * 1000 -- 类型:Long -- 单位:毫秒 -- 是否可变:是 -- 描述:当正在调度 Tablet 克隆任务时,如果 Tablet 在此参数指定的时间内未被调度,StarRocks 会给予它更高的优先级以尽快调度。 -- 引入版本:- +- 默认值: 15 * 60 * 1000 +- 类型: Long +- 单位: Milliseconds +- 可变: Yes +- 描述: 当 Tablet 克隆任务正在调度时,如果 Tablet 在此参数中指定的时间内未被调度,StarRocks 将赋予它更高的优先级以尽快调度。 +- 引入版本: - ##### tablet_sched_max_scheduling_tablets -- 默认值:10000 -- 别名:max_scheduling_tablets -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:可以同时调度的 Tablet 的最大数量。如果超过此值,则跳过 Tablet 均衡和修复检查。 -- 引入版本:- +- 默认值: 10000 +- 别名: max_scheduling_tablets +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 可以同时调度的 Tablet 的最大数量。如果超过此值,将跳过 Tablet 均衡和修复检查。 +- 引入版本: - ##### tablet_sched_min_clone_task_timeout_sec -- 默认值:3 * 60 -- 别名:min_clone_task_timeout_sec -- 类型:Long -- 单位:秒 -- 是否可变:是 -- 描述:克隆 Tablet 的最小超时持续时间。 -- 引入版本:- +- 默认值: 3 * 60 +- 别名: min_clone_task_timeout_sec +- 类型: Long +- 单位: Seconds +- 可变: Yes +- 描述: 克隆 Tablet 的最小超时持续时间。 +- 引入版本: - ##### tablet_sched_num_based_balance_threshold_ratio -- 默认值:0.5 -- 别名:- -- 类型:Double -- 单位:- -- 是否可变:是 -- 描述:基于数量的均衡可能会破坏磁盘大小均衡,但磁盘之间的最大差距不能超过 tablet_sched_num_based_balance_threshold_ratio * tablet_sched_balance_load_score_threshold。如果集群中有 Tablet 不断从 A 均衡到 B,B 均衡到 A,请减小此值。如果您希望 Tablet 分布更均衡,请增加此值。 -- 引入版本:- 3.1 +- 默认值: 0.5 +- 别名: - +- 类型: Double +- 单位: - +- 可变: Yes +- 描述: 基于数量的均衡可能会破坏磁盘大小均衡,但磁盘之间的最大差距不能超过 tablet_sched_num_based_balance_threshold_ratio * tablet_sched_balance_load_score_threshold。如果集群中有 Tablet 不断从 A 均衡到 B,又从 B 均衡到 A,请减小此值。如果您希望 Tablet 分布更均衡,请增加此值。 +- 引入版本: v3.1 ##### tablet_sched_repair_delay_factor_second -- 默认值:60 -- 别名:tablet_repair_delay_factor_second -- 类型:Long -- 单位:秒 -- 是否可变:是 -- 描述:副本修复的时间间隔,以秒为单位。 -- 引入版本:- +- 默认值: 60 +- 别名: tablet_repair_delay_factor_second +- 类型: Long +- 单位: Seconds +- 可变: Yes +- 描述: 副本修复的间隔,单位为秒。 +- 引入版本: - ##### tablet_sched_slot_num_per_path -- 默认值:8 -- 别名:schedule_slot_num_per_path -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:单个 BE 存储目录中可以并发运行的 
Tablet 相关任务的最大数量。从 v2.5 开始,此参数的默认值从 `4` 更改为 `8`。 -- 引入版本:- +- 默认值: 8 +- 别名: schedule_slot_num_per_path +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 每个 BE 存储目录中可以并发运行的 Tablet 相关任务的最大数量。从 v2.5 开始,此参数的默认值从 `4` 更改为 `8`。 +- 引入版本: - ##### tablet_sched_storage_cooldown_second -- 默认值:-1 -- 别名:storage_cooldown_second -- 类型:Long -- 单位:秒 -- 是否可变:是 -- 描述:从表创建时间开始自动冷却的延迟。默认值 `-1` 指定禁用自动冷却。如果您想启用自动冷却,请将此参数设置为大于 `-1` 的值。 -- 引入版本:- +- 默认值: -1 +- 别名: storage_cooldown_second +- 类型: Long +- 单位: Seconds +- 可变: Yes +- 描述: 从表创建时间开始的自动冷却延迟。默认值 `-1` 指定禁用自动冷却。如果需要启用自动冷却,请将此参数设置为大于 `-1` 的值。 +- 引入版本: - ##### tablet_stat_update_interval_second -- 默认值:300 -- 类型:Int -- 单位:秒 -- 是否可变:否 -- 描述:FE 从每个 BE 检索 Tablet 统计信息的时间间隔。 -- 引入版本:- +- 默认值: 300 +- 类型: Int +- 单位: Seconds +- 可变: No +- 描述: FE 从每个 BE 检索 Tablet 统计信息的时间间隔。 +- 引入版本: - -### 共享数据 +### 共享存储 ##### aws_s3_access_key -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于访问 S3 存储桶的 Access Key ID。 -- 引入版本:v3.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于访问 S3 存储桶的 Access Key ID。 +- 引入版本: v3.0 ##### aws_s3_endpoint -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于访问 S3 存储桶的 Endpoint,例如 `https://s3.us-west-2.amazonaws.com`。 -- 引入版本:v3.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于访问 S3 存储桶的端点,例如 `https://s3.us-west-2.amazonaws.com`。 +- 引入版本: v3.0 ##### aws_s3_external_id -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于跨账号访问 S3 存储桶的 AWS 账户的 External ID。 -- 引入版本:v3.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于跨账户访问 S3 存储桶的 AWS 账户的外部 ID。 +- 引入版本: v3.0 ##### aws_s3_iam_role_arn -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:在存储数据文件的 S3 存储桶上具有权限的 IAM 角色的 ARN。 -- 引入版本:v3.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 具有 S3 存储桶(存储数据文件)权限的 IAM 角色的 ARN。 +- 引入版本: v3.0 ##### aws_s3_path -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于存储数据的 S3 路径。它由 S3 存储桶的名称及其下的子路径(如果有)组成,例如 `testbucket/subpath`。 -- 引入版本:v3.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于存储数据的 S3 路径。它由 S3 存储桶的名称及其下的子路径(如果有)组成,例如 `testbucket/subpath`。 +- 引入版本: v3.0 ##### aws_s3_region -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:S3 存储桶所在的区域,例如 `us-west-2`。 -- 引入版本:v3.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: S3 存储桶所在的区域,例如 `us-west-2`。 +- 引入版本: v3.0 ##### aws_s3_secret_key -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于访问 S3 存储桶的 Secret Access Key。 -- 引入版本:v3.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于访问 S3 存储桶的 Secret Access Key。 +- 引入版本: v3.0 ##### aws_s3_use_aws_sdk_default_behavior -- 默认值:false -- 类型:Boolean -- 单位:- -- 是否可变:否 -- 描述:是否使用 AWS SDK 的默认认证凭证。有效值:true 和 false(默认)。 -- 引入版本:v3.0 +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: No +- 描述: 是否使用 AWS SDK 的默认身份验证凭据。有效值: true 和 false (默认)。 +- 引入版本: v3.0 ##### aws_s3_use_instance_profile -- 默认值:false -- 类型:Boolean -- 单位:- -- 是否可变:否 -- 描述:是否使用 Instance Profile 和 Assumed Role 作为访问 S3 的凭证方法。有效值:true 和 false(默认)。 - - 如果您使用基于 IAM 用户的凭证(Access Key 和 Secret Key)访问 S3,则必须将此项指定为 `false`,并指定 `aws_s3_access_key` 和 `aws_s3_secret_key`。 +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: No +- 描述: 是否使用 Instance Profile 和 Assumed Role 作为访问 S3 的凭证方法。有效值: true 和 false (默认)。 + - 如果您使用基于 IAM 用户(Access Key 和 Secret Key)的凭证访问 S3,则必须将此项指定为 `false`,并指定 `aws_s3_access_key` 和 `aws_s3_secret_key`。 - 如果您使用 Instance Profile 访问 S3,则必须将此项指定为 `true`。 - 如果您使用 Assumed Role 访问 S3,则必须将此项指定为 `true`,并指定 `aws_s3_iam_role_arn`。 - 如果您使用外部 AWS 账户,则还必须指定 `aws_s3_external_id`。 -- 引入版本:v3.0 +- 引入版本: v3.0 ##### azure_adls2_endpoint -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 
描述:Azure Data Lake Storage Gen2 账户的 Endpoint,例如 `https://test.dfs.core.windows.net`。 -- 引入版本:v3.4.1 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: Azure Data Lake Storage Gen2 帐户的端点,例如 `https://test.dfs.core.windows.net`。 +- 引入版本: v3.4.1 ##### azure_adls2_oauth2_client_id -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于授权 Azure Data Lake Storage Gen2 请求的托管标识的 Client ID。 -- 引入版本:v3.4.4 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于授权 Azure Data Lake Storage Gen2 请求的托管身份的客户端 ID。 +- 引入版本: v3.4.4 ##### azure_adls2_oauth2_tenant_id -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于授权 Azure Data Lake Storage Gen2 请求的托管标识的 Tenant ID。 -- 引入版本:v3.4.4 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于授权 Azure Data Lake Storage Gen2 请求的托管身份的租户 ID。 +- 引入版本: v3.4.4 ##### azure_adls2_oauth2_use_managed_identity -- 默认值:false -- 类型:Boolean -- 单位:- -- 是否可变:否 -- 描述:是否使用托管标识授权 Azure Data Lake Storage Gen2 请求。 -- 引入版本:v3.4.4 +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: No +- 描述: 是否使用托管身份授权 Azure Data Lake Storage Gen2 请求。 +- 引入版本: v3.4.4 ##### azure_adls2_path -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于存储数据的 Azure Data Lake Storage Gen2 路径。它由文件系统名称和目录名称组成,例如 `testfilesystem/starrocks`。 -- 引入版本:v3.4.1 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于存储数据的 Azure Data Lake Storage Gen2 路径。它由文件系统名称和目录名称组成,例如 `testfilesystem/starrocks`。 +- 引入版本: v3.4.1 ##### azure_adls2_sas_token -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于授权 Azure Data Lake Storage Gen2 请求的共享访问签名 (SAS)。 -- 引入版本:v3.4.1 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于授权 Azure Data Lake Storage Gen2 请求的共享访问签名 (SAS)。 +- 引入版本: v3.4.1 ##### azure_adls2_shared_key -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于授权 Azure Data Lake Storage Gen2 请求的 Shared Key。 -- 引入版本:v3.4.1 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于授权 Azure Data Lake Storage Gen2 请求的共享密钥。 +- 引入版本: v3.4.1 ##### azure_blob_endpoint -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:Azure Blob Storage 账户的 Endpoint,例如 `https://test.blob.core.windows.net`。 -- 引入版本:v3.1 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: Azure Blob Storage 帐户的端点,例如 `https://test.blob.core.windows.net`。 +- 引入版本: v3.1 ##### azure_blob_path -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于存储数据的 Azure Blob Storage 路径。它由存储账户中容器的名称及其下的子路径(如果有)组成,例如 `testcontainer/subpath`。 -- 引入版本:v3.1 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于存储数据的 Azure Blob Storage 路径。它由存储帐户中容器的名称及其下的子路径(如果有)组成,例如 `testcontainer/subpath`。 +- 引入版本: v3.1 ##### azure_blob_sas_token -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于授权 Azure Blob Storage 请求的共享访问签名 (SAS)。 -- 引入版本:v3.1 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于授权 Azure Blob Storage 请求的共享访问签名 (SAS)。 +- 引入版本: v3.1 ##### azure_blob_shared_key -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于授权 Azure Blob Storage 请求的 Shared Key。 -- 引入版本:v3.1 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于授权 Azure Blob Storage 请求的共享密钥。 +- 引入版本: v3.1 ##### azure_use_native_sdk -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否使用原生 SDK 访问 Azure Blob Storage,从而允许使用托管标识和服务主体进行身份验证。如果此项设置为 `false`,则仅允许使用 Shared Key 和 SAS Token 进行身份验证。 -- 引入版本:v3.4.4 +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否使用原生 SDK 访问 Azure Blob Storage,从而允许使用托管身份和服务主体进行身份验证。如果此项设置为 `false`,则仅允许使用共享密钥和 SAS 令牌进行身份验证。 +- 引入版本: v3.4.4 ##### cloud_native_hdfs_url -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:HDFS 存储的 URL,例如 
`hdfs://127.0.0.1:9000/user/xxx/starrocks/`。 -- 引入版本:- +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: HDFS 存储的 URL,例如 `hdfs://127.0.0.1:9000/user/xxx/starrocks/`。 +- 引入版本: - ##### cloud_native_meta_port -- 默认值:6090 -- 类型:Int -- 单位:- -- 是否可变:否 -- 描述:FE 云原生元数据服务器 RPC 监听端口。 -- 引入版本:- +- 默认值: 6090 +- 类型: Int +- 单位: - +- 可变: No +- 描述: FE 云原生元数据服务器 RPC 监听端口。 +- 引入版本: - ##### cloud_native_storage_type -- 默认值:S3 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:您使用的对象存储类型。在共享数据模式下,StarRocks 支持将数据存储在 HDFS、Azure Blob(从 v3.1.1 开始支持)、Azure Data Lake Storage Gen2(从 v3.4.1 开始支持)、Google Storage(使用原生 SDK,从 v3.5.1 开始支持)以及与 S3 协议兼容的对象存储系统(如 AWS S3 和 MinIO)。有效值:`S3`(默认)、`HDFS`、`AZBLOB`、`ADLS2` 和 `GS`。如果将此参数指定为 `S3`,则必须添加以 `aws_s3` 为前缀的参数。如果将此参数指定为 `AZBLOB`,则必须添加以 `azure_blob` 为前缀的参数。如果将此参数指定为 `ADLS2`,则必须添加以 `azure_adls2` 为前缀的参数。如果将此参数指定为 `GS`,则必须添加以 `gcp_gcs` 为前缀的参数。如果将此参数指定为 `HDFS`,则只需指定 `cloud_native_hdfs_url`。 -- 引入版本:- +- 默认值: S3 +- 类型: String +- 单位: - +- 可变: No +- 描述: 您使用的对象存储类型。在共享存储模式下,StarRocks 支持将数据存储在 HDFS、Azure Blob(从 v3.1.1 起支持)、Azure Data Lake Storage Gen2(从 v3.4.1 起支持)、Google Storage(使用原生 SDK,从 v3.5.1 起支持)以及与 S3 协议兼容的对象存储系统(如 AWS S3 和 MinIO)中。有效值: `S3` (默认), `HDFS`, `AZBLOB`, `ADLS2`, 和 `GS`。如果您将此参数指定为 `S3`,则必须添加以 `aws_s3` 为前缀的参数。如果您将此参数指定为 `AZBLOB`,则必须添加以 `azure_blob` 为前缀的参数。如果您将此参数指定为 `ADLS2`,则必须添加以 `azure_adls2` 为前缀的参数。如果您将此参数指定为 `GS`,则必须添加以 `gcp_gcs` 为前缀的参数。如果您将此参数指定为 `HDFS`,则只需指定 `cloud_native_hdfs_url`。 +- 引入版本: - ##### enable_load_volume_from_conf -- 默认值:false -- 类型:Boolean -- 单位:- -- 是否可变:否 -- 描述:是否允许 StarRocks 使用 FE 配置文件中指定的对象存储相关属性创建内置存储卷。默认值从 v3.4.1 开始从 `true` 更改为 `false`。 -- 引入版本:v3.1.0 +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: No +- 描述: 是否允许 StarRocks 使用 FE 配置文件中指定的对象存储相关属性创建内置存储卷。从 v3.4.1 开始,默认值从 `true` 更改为 `false`。 +- 引入版本: v3.1.0 ##### gcp_gcs_impersonation_service_account -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:如果您使用基于模拟的身份验证访问 Google Storage,您想要模拟的服务账户。 -- 引入版本:v3.5.1 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 如果您使用基于模拟的身份验证访问 Google Storage,您希望模拟的服务帐户。 +- 引入版本: v3.5.1 ##### gcp_gcs_path -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于存储数据的 Google Cloud 路径。它由 Google Cloud 存储桶的名称及其下的子路径(如果有)组成,例如 `testbucket/subpath`。 -- 引入版本:v3.5.1 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于存储数据的 Google Cloud 路径。它由 Google Cloud 存储桶的名称及其下的子路径(如果有)组成,例如 `testbucket/subpath`。 +- 引入版本: v3.5.1 ##### gcp_gcs_service_account_email -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:在创建服务账户时生成的 JSON 文件中的电子邮件地址,例如 `user@hello.iam.gserviceaccount.com`。 -- 引入版本:v3.5.1 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 服务帐户创建时生成的 JSON 文件中的电子邮件地址,例如 `user@hello.iam.gserviceaccount.com`。 +- 引入版本: v3.5.1 ##### gcp_gcs_service_account_private_key -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:在创建服务账户时生成的 JSON 文件中的私钥,例如 `-----BEGIN PRIVATE KEY----xxxx-----END PRIVATE KEY-----\n`。 -- 引入版本:v3.5.1 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 服务帐户创建时生成的 JSON 文件中的私钥,例如 `-----BEGIN PRIVATE KEY----xxxx-----END PRIVATE KEY-----\n`。 +- 引入版本: v3.5.1 ##### gcp_gcs_service_account_private_key_id -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:在创建服务账户时生成的 JSON 文件中的私钥 ID。 -- 引入版本:v3.5.1 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 服务帐户创建时生成的 JSON 文件中的私钥 ID。 +- 引入版本: v3.5.1 ##### gcp_gcs_use_compute_engine_service_account -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:否 -- 描述:是否使用绑定到 Compute Engine 的服务账户。 -- 引入版本:v3.5.1 +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: No +- 描述: 是否使用绑定到 Compute Engine 的服务帐户。 +- 引入版本: v3.5.1 ##### 
hdfs_file_system_expire_seconds -- 默认值:300 -- 类型:Int -- 单位:秒 -- 是否可变:是 -- 描述:HdfsFsManager 管理的未使用缓存 HDFS/ObjectStore FileSystem 的生存时间(秒)。FileSystemExpirationChecker(每 60 秒运行一次)使用此值调用每个 HdfsFs.isExpired(...);当过期时,管理器关闭底层 FileSystem 并将其从缓存中删除。访问器方法(例如 `HdfsFs.getDFSFileSystem`、`getUserName`、`getConfiguration`)会更新最后访问时间戳,因此过期是基于不活动状态。较小的值会减少空闲资源占用,但会增加重新打开的开销;较大的值会使句柄保留更长时间,并可能消耗更多资源。 -- 引入版本:v3.2.0 +- 默认值: 300 +- 类型: Int +- 单位: Seconds +- 可变: Yes +- 描述: 由 HdfsFsManager 管理的未使用缓存 HDFS/ObjectStore FileSystem 的生存时间(秒)。FileSystemExpirationChecker(每 60 秒运行一次)使用此值调用每个 HdfsFs.isExpired(...);过期时,管理器关闭底层 FileSystem 并将其从缓存中删除。访问器方法(例如 `HdfsFs.getDFSFileSystem`、`getUserName`、`getConfiguration`)更新上次访问时间戳,因此过期基于不活动。较低的值会减少空闲资源持有但增加重新打开开销;较高的值会使句柄保持更长时间并可能消耗更多资源。 +- 引入版本: v3.2.0 ##### lake_autovacuum_grace_period_minutes -- 默认值:30 -- 类型:Long -- 单位:分钟 -- 是否可变:是 -- 描述:共享数据集群中保留历史数据版本的时间范围。在此时间范围内的历史数据版本在 Compaction 后不会通过 AutoVacuum 自动清理。您需要将此值设置得大于最大查询时间,以避免正在运行的查询访问的数据在查询完成前被删除。默认值从 v3.3.0、v3.2.5 和 v3.1.10 开始已从 `5` 更改为 `30`。 -- 引入版本:v3.1.0 +- 默认值: 30 +- 类型: Long +- 单位: Minutes +- 可变: Yes +- 描述: 共享存储集群中历史数据版本的保留时间范围。在此时间范围内的历史数据版本在 Compactions 后不会通过 AutoVacuum 自动清理。您需要将此值设置为大于最大查询时间,以避免正在运行的查询访问的数据在查询完成之前被删除。从 v3.3.0、v3.2.5 和 v3.1.10 开始,默认值已从 `5` 更改为 `30`。 +- 引入版本: v3.1.0 ##### lake_autovacuum_parallel_partitions -- 默认值:8 -- 类型:Int -- 单位:- -- 是否可变:否 -- 描述:共享数据集群中可以同时进行 AutoVacuum 的最大分区数。AutoVacuum 是 Compaction 后的垃圾回收。 -- 引入版本:v3.1.0 +- 默认值: 8 +- 类型: Int +- 单位: - +- 可变: No +- 描述: 共享存储集群中可以同时进行 AutoVacuum 的分区最大数量。AutoVacuum 是 Compactions 后的垃圾收集。 +- 引入版本: v3.1.0 ##### lake_autovacuum_partition_naptime_seconds -- 默认值:180 -- 类型:Long -- 单位:秒 -- 是否可变:是 -- 描述:共享数据集群中同一分区上 AutoVacuum 操作之间的最小间隔。 -- 引入版本:v3.1.0 +- 默认值: 180 +- 类型: Long +- 单位: Seconds +- 可变: Yes +- 描述: 共享存储集群中同一分区两次 AutoVacuum 操作之间的最小间隔。 +- 引入版本: v3.1.0 ##### lake_autovacuum_stale_partition_threshold -- 默认值:12 -- 类型:Long -- 单位:小时 -- 是否可变:是 -- 描述:如果分区在此时间范围内没有更新(加载、DELETE 或 Compaction),系统将不会对此分区执行 AutoVacuum。 -- 引入版本:v3.1.0 +- 默认值: 12 +- 类型: Long +- 单位: Hours +- 可变: Yes +- 描述: 如果分区在此时间范围内没有更新(加载、DELETE 或 Compactions),系统将不会对此分区执行 AutoVacuum。 +- 引入版本: v3.1.0 ##### lake_compaction_allow_partial_success -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:如果此项设置为 `true`,系统将认为共享数据集群中的 Compaction 操作在其中一个子任务成功时即为成功。 -- 引入版本:v3.5.2 +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 如果此项设置为 `true`,系统将认为共享数据集群中的 Compaction 操作在其中一个子任务成功时为成功。 +- 引入版本: v3.5.2 ##### lake_compaction_disable_ids -- 默认值:"" -- 类型:String -- 单位:- -- 是否可变:是 -- 描述:在共享数据模式下禁用 Compaction 的表或分区列表。格式为 `tableId1;partitionId2`,以分号分隔,例如 `12345;98765`。 -- 引入版本:v3.4.4 +- 默认值: "" +- 类型: String +- 单位: - +- 可变: Yes +- 描述: 共享数据模式下禁用 Compaction 的表或分区列表。格式为 `tableId1;partitionId2`,以分号分隔,例如 `12345;98765`。 +- 引入版本: v3.4.4 ##### lake_compaction_history_size -- 默认值:20 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:共享数据集群中 Leader FE 节点内存中保留的最近成功 Compaction 任务记录的数量。您可以使用 `SHOW PROC '/compactions'` 命令查看最近成功的 Compaction 任务记录。请注意,Compaction 历史记录存储在 FE 进程内存中,如果 FE 进程重启,它将丢失。 -- 引入版本:v3.1.0 +- 默认值: 20 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 共享数据集群中 Leader FE 节点内存中保留的最近成功 Compaction 任务记录数。您可以使用 `SHOW PROC '/compactions'` 命令查看最近成功的 Compaction 任务记录。请注意,Compaction 历史记录存储在 FE 进程内存中,如果 FE 进程重新启动,它将丢失。 +- 引入版本: v3.1.0 ##### lake_compaction_max_tasks -- 默认值:-1 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:共享数据集群中允许的最大并发 Compaction 任务数。将此项设置为 `-1` 表示以自适应方式计算并发任务数。将此值设置为 `0` 将禁用 Compaction。 -- 引入版本:v3.1.0 +- 默认值: -1 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 共享数据集群中允许的最大并发 Compaction 任务数。将此项设置为 `-1` 表示以自适应方式计算并发任务数。将此值设置为 `0` 将禁用 Compaction。 +- 引入版本: v3.1.0 
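在共享数据集群中排查 Compaction 进度或数据摄取背压问题时,可以结合上述参数进行观察和临时调整。下面是一个示意性示例(示例取值仅作演示):

```SQL
-- 查看最近成功的 Compaction 任务记录(保留条数由 lake_compaction_history_size 控制)。
SHOW PROC '/compactions';

-- 动态调整最大并发 Compaction 任务数(-1 表示自适应,0 表示禁用 Compaction;示例取值,仅作演示)。
ADMIN SET FRONTEND CONFIG ("lake_compaction_max_tasks" = "-1");
```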
##### lake_compaction_score_selector_min_score -- 默认值:10.0 -- 类型:Double -- 单位:- -- 是否可变:是 -- 描述:触发共享数据集群中 Compaction 操作的 Compaction Score 阈值。当分区的 Compaction Score 大于或等于此值时,系统会对该分区执行 Compaction。 -- 引入版本:v3.1.0 +- 默认值: 10.0 +- 类型: Double +- 单位: - +- 可变: Yes +- 描述: 触发共享数据集群中 Compaction 操作的 Compaction Score 阈值。当分区的 Compaction Score 大于或等于此值时,系统对该分区执行 Compaction。 +- 引入版本: v3.1.0 ##### lake_compaction_score_upper_bound -- 默认值:2000 -- 类型:Long -- 单位:- -- 是否可变:是 -- 描述:共享数据集群中分区 Compaction Score 的上限。`0` 表示无上限。此项仅在 `lake_enable_ingest_slowdown` 设置为 `true` 时生效。当分区的 Compaction Score 达到或超过此上限时,传入的加载任务将被拒绝。从 v3.3.6 开始,默认值从 `0` 更改为 `2000`。 -- 引入版本:v3.2.0 +- 默认值: 2000 +- 类型: Long +- 单位: - +- 可变: Yes +- 描述: 共享数据集群中分区 Compaction Score 的上限。`0` 表示无上限。此项仅在 `lake_enable_ingest_slowdown` 设置为 `true` 时生效。当分区的 Compaction Score 达到或超过此上限时,传入的加载任务将被拒绝。从 v3.3.6 开始,默认值从 `0` 更改为 `2000`。 +- 引入版本: v3.2.0 ##### lake_enable_balance_tablets_between_workers -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:在共享数据集群中云原生表的 Tablet 迁移期间,是否在 Compute 节点之间平衡 Tablet 的数量。`true` 表示在 Compute 节点之间平衡 Tablet,`false` 表示禁用此功能。 -- 引入版本:v3.3.4 +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 在共享存储集群中云原生表 Tablet 迁移期间,是否在计算节点之间均衡 Tablet 数量。`true` 表示在计算节点之间均衡 Tablet,`false` 表示禁用此功能。 +- 引入版本: v3.3.4 ##### lake_enable_ingest_slowdown -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否在共享数据集群中启用数据摄取减速。当数据摄取减速启用时,如果分区的 Compaction Score 超过 `lake_ingest_slowdown_threshold`,则该分区上的加载任务将被限流。此配置仅在 `run_mode` 设置为 `shared_data` 时生效。从 v3.3.6 开始,默认值从 `false` 更改为 `true`。 -- 引入版本:v3.2.0 +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否在共享数据集群中启用数据摄取减速。当数据摄取减速启用时,如果分区的 Compaction 分数超过 `lake_ingest_slowdown_threshold`,则该分区上的加载任务将被限制。此配置仅在 `run_mode` 设置为 `shared_data` 时生效。从 v3.3.6 开始,默认值从 `false` 更改为 `true`。 +- 引入版本: v3.2.0 ##### lake_ingest_slowdown_threshold -- 默认值:100 -- 类型:Long -- 单位:- -- 是否可变:是 -- 描述:触发共享数据集群中数据摄取减速的 Compaction Score 阈值。此配置仅在 `lake_enable_ingest_slowdown` 设置为 `true` 时生效。 -- 引入版本:v3.2.0 +- 默认值: 100 +- 类型: Long +- 单位: - +- 可变: Yes +- 描述: 共享数据集群中触发数据摄取减速的 Compaction Score 阈值。此配置仅在 `lake_enable_ingest_slowdown` 设置为 `true` 时生效。 +- 引入版本: v3.2.0 ##### lake_publish_version_max_threads -- 默认值:512 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:共享数据集群中版本发布任务的最大线程数。 -- 引入版本:v3.2.0 +- 默认值: 512 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 共享数据集群中版本发布任务的最大线程数。 +- 引入版本: v3.2.0 ##### meta_sync_force_delete_shard_meta -- 默认值:false -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否允许直接删除共享数据集群的元数据,绕过清理远程存储文件。建议仅在需要清理的 shard 过多,导致 FE JVM 内存压力过大时,才将此项设置为 `true`。请注意,启用此功能后,属于 shard 或 Tablet 的数据文件无法自动清理。 -- 引入版本:v3.2.10, v3.3.3 +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否允许直接删除共享存储集群的元数据,绕过清理远程存储文件。建议仅在需要清理的 shard 数量过多导致 FE JVM 内存压力过大时才将此项设置为 `true`。请注意,启用此功能后,属于 shard 或 tablet 的数据文件无法自动清理。 +- 引入版本: v3.2.10, v3.3.3 ##### run_mode -- 默认值:shared_nothing -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:StarRocks 集群的运行模式。有效值:`shared_data` 和 `shared_nothing`(默认)。 +- 默认值: shared_nothing +- 类型: String +- 单位: - +- 可变: No +- 描述: StarRocks 集群的运行模式。有效值: `shared_data` 和 `shared_nothing` (默认)。 - `shared_data` 表示以共享数据模式运行 StarRocks。 - - `shared_nothing` 表示以 Shared-Nothing 模式运行 StarRocks。 + - `shared_nothing` 表示以无共享模式运行 StarRocks。 > **注意** > > - StarRocks 集群不能同时采用 `shared_data` 和 `shared_nothing` 模式。不支持混合部署。 - > - 集群部署后,请勿更改 `run_mode`。否则,集群将无法重启。不支持从 Shared-Nothing 集群转换为共享数据集群,反之亦然。 + > - 集群部署后,请勿更改 `run_mode`。否则,集群将无法重新启动。不支持从无共享集群转换为共享数据集群,反之亦然。 -- 引入版本:- +- 引入版本: - ##### shard_group_clean_threshold_sec -- 默认值:3600 -- 类型:Long -- 单位:秒 -- 是否可变:是 -- 描述:FE 清理共享数据集群中未使用的 Tablet 和 shard 组的时间。在此阈值内创建的 
Tablet 和 shard 组将不会被清理。 -- 引入版本:- +- 默认值: 3600 +- 类型: Long +- 单位: Seconds +- 可变: Yes +- 描述: FE 清理共享数据集群中未使用 Tablet 和 shard 组的时间。在此阈值内创建的 Tablet 和 shard 组不会被清理。 +- 引入版本: - ##### star_mgr_meta_sync_interval_sec -- 默认值:600 -- 类型:Long -- 单位:秒 -- 是否可变:否 -- 描述:FE 与共享数据集群中的 StarMgr 运行周期性元数据同步的时间间隔。 -- 引入版本:- +- 默认值: 600 +- 类型: Long +- 单位: Seconds +- 可变: No +- 描述: FE 在共享数据集群中与 StarMgr 进行周期性元数据同步的间隔。 +- 引入版本: - ##### starmgr_grpc_server_max_worker_threads -- 默认值:1024 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:FE starmgr 模块中 grpc 服务器使用的最大工作线程数。 -- 引入版本:v4.0.0, v3.5.8 +- 默认值: 1024 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: FE starmgr 模块中 grpc 服务器使用的最大工作线程数。 +- 引入版本: v4.0.0, v3.5.8 ##### starmgr_grpc_timeout_seconds -- 默认值:5 -- 类型:Int -- 单位:秒 -- 是否可变:是 -- 描述: -- 引入版本:- +- 默认值: 5 +- 类型: Int +- 单位: Seconds +- 可变: Yes +- 描述: +- 引入版本: - ### 数据湖 ##### files_enable_insert_push_down_schema -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:启用时,分析器将尝试将目标表 Schema 推入 `files()` 表函数,用于 INSERT ... FROM files() 操作。这仅适用于源是 FileTableFunctionRelation、目标是原生表且 SELECT 列表中包含相应的 slot-ref 列(或 *)的情况。分析器会将选择列与目标列匹配(计数必须匹配),短暂锁定目标表,并用非复杂类型(Parquet JSON `->` `array` 等复杂类型将被跳过)的深拷贝目标列类型替换文件列类型。保留原始文件表的列名。这减少了摄取期间由于文件类型推断导致的类型不匹配和宽松性。 -- 引入版本:v3.4.0, v3.5.0 +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 启用后,分析器将尝试将目标表 Schema 推入 `files()` 表函数,用于 INSERT ... FROM files() 操作。这仅适用于源是 FileTableFunctionRelation、目标是原生表且 SELECT 列表中包含相应的 slot-ref 列(或 *)的情况。分析器将匹配选择列到目标列(计数必须匹配),短暂锁定目标表,并用深拷贝的目标列类型替换非复杂类型的文件列类型(Parquet JSON `->` `array` 等复杂类型被跳过)。原始文件表中的列名保留。这减少了摄取期间文件类型推断导致的类型不匹配和松散性。 +- 引入版本: v3.4.0, v3.5.0 ##### hdfs_read_buffer_size_kb -- 默认值:8192 -- 类型:Int -- 单位:千字节 -- 是否可变:是 -- 描述:HDFS 读取缓冲区的大小(千字节)。StarRocks 将此值转换为字节 (`<< 10`),并将其用于在 `HdfsFsManager` 中初始化 HDFS 读取缓冲区,并在不使用 Broker 访问时填充发送到 BE 任务的 thrift 字段 `hdfs_read_buffer_size_kb`(例如 `TBrokerScanRangeParams`、`TDownloadReq`)。增加 `hdfs_read_buffer_size_kb` 可以提高顺序读取吞吐量并减少系统调用开销,但会以更高的每流内存使用为代价;减少它会减少内存占用,但可能会降低 I/O 效率。调整时考虑工作负载(许多小流与少量大顺序读取)。 -- 引入版本:v3.2.0 +- 默认值: 8192 +- 类型: Int +- 单位: Kilobytes +- 可变: Yes +- 描述: HDFS 读取缓冲区大小(KB)。StarRocks 将此值转换为字节 (`<< 10`),并用于在 `HdfsFsManager` 中初始化 HDFS 读取缓冲区,并在不使用 Broker 访问时填充发送到 BE 任务的 Thrift 字段 `hdfs_read_buffer_size_kb`(例如 `TBrokerScanRangeParams`、`TDownloadReq`)。增加 `hdfs_read_buffer_size_kb` 可以提高顺序读取吞吐量并减少系统调用开销,但代价是每个流的内存使用量更高;减小它会减少内存占用但可能降低 I/O 效率。调整时考虑工作负载(许多小流与少数大顺序读取)。 +- 引入版本: v3.2.0 ##### hdfs_write_buffer_size_kb -- 默认值:1024 -- 类型:Int -- 单位:千字节 -- 是否可变:是 -- 描述:设置用于直接写入 HDFS 或对象存储时(不使用 Broker)的 HDFS 写入缓冲区大小(KB)。FE 将此值转换为字节 (`<< 10`),并在 HdfsFsManager 中初始化本地写入缓冲区,并将其传播到 Thrift 请求中(例如 TUploadReq、TExportSink、Sink Options),以便后端/代理使用相同的缓冲区大小。增加此值可以提高大型顺序写入的吞吐量,但会以每个写入器更多内存为代价;减少它会减少每个流的内存使用,并可能降低小型写入的延迟。与 `hdfs_read_buffer_size_kb` 一起调整,并考虑可用内存和并发写入器。 -- 引入版本:v3.2.0 +- 默认值: 1024 +- 类型: Int +- 单位: Kilobytes +- 可变: Yes +- 描述: 设置用于直接写入 HDFS 或对象存储(不使用 Broker 时)的 HDFS 写入缓冲区大小(KB)。FE 将此值转换为字节 (`<< 10`),并在 HdfsFsManager 中初始化本地写入缓冲区,并将其传播到 Thrift 请求中(例如 TUploadReq、TExportSink、Sink 选项),以便后端/代理使用相同的缓冲区大小。增加此值可以提高大型顺序写入的吞吐量,但代价是每个写入器需要更多内存;减小此值可以减少每个流的内存使用量,并可能降低小型写入的延迟。与 `hdfs_read_buffer_size_kb` 一起调整,并考虑可用内存和并发写入器。 +- 引入版本: v3.2.0 ##### lake_batch_publish_max_version_num -- 默认值:10 -- 类型:Int -- 单位:计数 -- 是否可变:是 -- 描述:设置构建用于湖表(云原生表)的发布批次时,可能分组在一起的连续事务版本的上限。该值传递给事务图批处理例程(参见 getReadyToPublishTxnListBatch),并与 `lake_batch_publish_min_version_num` 协同工作,以确定 TransactionStateBatch 的候选范围大小。较大的值可以通过批处理更多提交来提高发布吞吐量,但会增加原子发布范围(更长的可见性延迟和更大的回滚表面),并且当版本不连续时可能会在运行时受到限制。根据工作负载和可见性/延迟要求进行调整。 -- 引入版本:v3.2.0 +- 默认值: 10 +- 类型: Int +- 单位: Count +- 可变: Yes +- 描述: 设置在为 
Lake(云原生)表构建发布批次时,可以分组的连续事务版本的上限。该值传递给事务图批处理例程(参见 getReadyToPublishTxnListBatch),并与 `lake_batch_publish_min_version_num` 协同工作,以确定 TransactionStateBatch 的候选范围大小。较大的值可以通过批处理更多提交来提高发布吞吐量,但会增加原子发布的范围(更长的可见性延迟和更大的回滚表面),并且当版本不连续时可能会在运行时受到限制。根据工作负载和可见性/延迟要求进行调整。 +- 引入版本: v3.2.0 ##### lake_batch_publish_min_version_num -- 默认值:1 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:设置形成湖表发布批次所需的最小连续事务版本数。DatabaseTransactionMgr.getReadyToPublishTxnListBatch 将此值与 `lake_batch_publish_max_version_num` 一起传递给 transactionGraph.getTxnsWithTxnDependencyBatch 以选择依赖事务。值为 `1` 允许单事务发布(不批处理)。值 `>1` 要求至少有那么多连续版本、单表、非复制事务可用;如果版本不连续、出现复制事务或 Schema Change 消耗了一个版本,则批处理将中止。增加此值可以通过分组提交来提高发布吞吐量,但可能会在等待足够多的连续事务时延迟发布。 -- 引入版本:v3.2.0 +- 默认值: 1 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 设置构成 Lake 表发布批次所需的最小连续事务版本数。DatabaseTransactionMgr.getReadyToPublishTxnListBatch 将此值与 `lake_batch_publish_max_version_num` 一起传递给 transactionGraph.getTxnsWithTxnDependencyBatch,以选择依赖事务。值为 `1` 允许单事务发布(不进行批处理)。值 `>1` 要求至少有那么多连续版本、单表、非复制事务可用;如果版本不连续、出现复制事务或 Schema 更改消耗了某个版本,则批处理将中止。增加此值可以通过分组提交来提高发布吞吐量,但可能会在等待足够多的连续事务时延迟发布。 +- 引入版本: v3.2.0 ##### lake_enable_batch_publish_version -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:启用后,PublishVersionDaemon 将同一湖(共享数据)表/分区的就绪事务进行批处理,并将其版本一起发布,而不是发布每个事务。在 Shared-Data 模式下,守护进程调用 getReadyPublishTransactionsBatch() 并使用 publishVersionForLakeTableBatch(...) 执行分组发布操作(减少 RPC 并提高吞吐量)。禁用时,守护进程回退到通过 publishVersionForLakeTable(...) 进行每个事务发布。该实现使用内部集合协调正在进行的工作,以避免在切换开关时重复发布,并受 `lake_publish_version_max_threads` 的线程池大小影响。 -- 引入版本:v3.2.0 +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 启用后,PublishVersionDaemon 将为同一 Lake(共享数据)表/分区批处理就绪事务,并将其版本一起发布,而不是单独发布每个事务。在 RunMode 共享数据中,守护程序调用 getReadyPublishTransactionsBatch() 并使用 publishVersionForLakeTableBatch(...) 执行分组发布操作(减少 RPC 并提高吞吐量)。禁用时,守护程序回退到通过 publishVersionForLakeTable(...) 
进行的每个事务发布。实现使用内部集合协调正在进行的工作,以避免在切换开关时重复发布,并受 `lake_publish_version_max_threads` 线程池大小的影响。 +- 引入版本: v3.2.0 ##### lake_enable_tablet_creation_optimization -- 默认值:false -- 类型:boolean -- 单位:- -- 是否可变:是 -- 描述:启用后,StarRocks 在共享数据模式下为云原生表和物化视图优化 Tablet 创建,方法是为物理分区下的所有 Tablet 创建单个共享 Tablet 元数据,而不是为每个 Tablet 创建不同的元数据。这减少了表创建、Rollup 和 Schema Change 作业期间创建的 Tablet 任务和元数据/文件数量。此优化仅适用于云原生表/物化视图,并与 `file_bundling` 结合使用(后者重用相同的优化逻辑)。注意:Schema Change 和 Rollup 作业明确禁用使用 `file_bundling` 的表的优化,以避免覆盖同名文件。谨慎启用——它改变了创建的 Tablet 元数据的粒度,并可能影响副本创建和文件命名行为。 -- 引入版本:v3.3.1, v3.4.0, v3.5.0 +- 默认值: false +- 类型: boolean +- 单位: - +- 可变: Yes +- 描述: 启用后,StarRocks 优化共享数据模式下云原生表和物化视图的 Tablet 创建,通过为物理分区下的所有 Tablet 创建单个共享 Tablet 元数据,而不是为每个 Tablet 创建独立的元数据。这减少了在表创建、Rollup 和 Schema 变更作业期间生成的 Tablet 创建任务和元数据/文件数量。该优化仅适用于云原生表/物化视图,并与 `file_bundling` 结合使用(后者重用相同的优化逻辑)。注意:Schema 变更和 Rollup 作业明确禁用 `file_bundling` 表的优化,以避免使用相同名称的文件覆盖。谨慎启用——它改变了创建的 Tablet 元数据的粒度,并可能影响副本创建和文件命名行为。 +- 引入版本: v3.3.1, v3.4.0, v3.5.0 ##### lake_use_combined_txn_log -- 默认值:false -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:当此项设置为 `true` 时,系统允许湖表对相关事务使用组合事务日志路径。仅适用于共享数据集群。 -- 引入版本:v3.3.7, v3.4.0, v3.5.0 +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 当此项设置为 `true` 时,系统允许 Lake 表使用合并事务日志路径进行相关事务。仅适用于共享数据集群。 +- 引入版本: v3.3.7, v3.4.0, v3.5.0 ##### enable_iceberg_commit_queue -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否启用 Iceberg 表的提交队列以避免并发提交冲突。Iceberg 使用乐观并发控制(OCC)进行元数据提交。当多个线程并发提交到同一表时,可能会出现冲突,例如:“无法提交:基础元数据位置与当前表元数据位置不同”。启用后,每个 Iceberg 表都有自己的单线程执行器用于提交操作,确保对同一表的提交是串行化的,从而防止 OCC 冲突。不同的表可以并发提交,从而保持整体吞吐量。这是一个系统级优化,旨在提高可靠性,应默认启用。如果禁用,并发提交可能会由于乐观锁定冲突而失败。 -- 引入版本:v4.1.0 +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否为 Iceberg 表启用提交队列以避免并发提交冲突。Iceberg 使用乐观并发控制 (OCC) 进行元数据提交。当多个线程同时向同一个表提交时,可能会发生冲突,并出现诸如“无法提交:基本元数据位置与当前表元数据位置不同”之类的错误。启用后,每个 Iceberg 表都有自己的单线程执行器进行提交操作,确保对同一个表的提交是串行化的,从而防止 OCC 冲突。不同的表可以并发提交,保持整体吞吐量。这是一个系统级优化,旨在提高可靠性,应默认启用。如果禁用,并发提交可能会因乐观锁冲突而失败。 +- 引入版本: v4.1.0 ##### iceberg_commit_queue_timeout_seconds -- 默认值:300 -- 类型:Int -- 单位:秒 -- 是否可变:是 -- 描述:等待 Iceberg 提交操作完成的超时时间(秒)。当使用提交队列 (`enable_iceberg_commit_queue=true`) 时,每个提交操作必须在此超时时间内完成。如果提交时间超过此超时,它将被取消并引发错误。影响提交时间的因素包括:提交的数据文件数量、表的元数据大小、底层存储(例如 S3、HDFS)的性能。 -- 引入版本:v4.1.0 +- 默认值: 300 +- 类型: Int +- 单位: Seconds +- 可变: Yes +- 描述: 等待 Iceberg 提交操作完成的超时时间(秒)。当使用提交队列 (`enable_iceberg_commit_queue=true`) 时,每个提交操作必须在此超时时间内完成。如果提交时间超过此超时,它将被取消并引发错误。影响提交时间的因素包括:提交的数据文件数量、表的元数据大小、底层存储(例如 S3、HDFS)的性能。 +- 引入版本: v4.1.0 ##### iceberg_commit_queue_max_size -- 默认值:1000 -- 类型:Int -- 单位:计数 -- 是否可变:否 -- 描述:每个 Iceberg 表待处理提交操作的最大数量。当使用提交队列 (`enable_iceberg_commit_queue=true`) 时,这限制了单个表可以排队的提交操作数量。当达到限制时,额外的提交操作将在调用线程中执行(阻塞直到容量可用)。此配置在 FE 启动时读取并应用于新创建的表执行器。需要重启 FE 才能生效。如果您预期对同一表进行许多并发提交,请增加此值。如果此值过低,在高并发期间提交可能会在调用线程中阻塞。 -- 引入版本:v4.1.0 +- 默认值: 1000 +- 类型: Int +- 单位: Count +- 可变: No +- 描述: 每个 Iceberg 表的最大待处理提交操作数。当使用提交队列 (`enable_iceberg_commit_queue=true`) 时,这限制了单个表可以排队等待的提交操作数。当达到限制时,额外的提交操作将在调用线程中执行(阻塞直到容量可用)。此配置在 FE 启动时读取,并应用于新创建的表执行器。需要重新启动 FE 才能生效。如果您期望对同一个表进行大量并发提交,请增加此值。如果此值太低,在高并发期间提交可能会在调用线程中阻塞。 +- 引入版本: v4.1.0 ### 其他 ##### agent_task_resend_wait_time_ms -- 默认值:5000 -- 类型:Long -- 单位:毫秒 -- 是否可变:是 -- 描述:FE 重新发送 Agent 任务之前必须等待的持续时间。仅当任务创建时间与当前时间之间的间隔超过此参数的值时,才能重新发送 Agent 任务。此参数用于防止重复发送 Agent 任务。 -- 引入版本:- +- 默认值: 5000 +- 类型: Long +- 单位: Milliseconds +- 可变: Yes +- 描述: FE 在重新发送代理任务之前必须等待的持续时间。仅当任务创建时间与当前时间之间的间隔超过此参数的值时,才能重新发送代理任务。此参数用于防止重复发送代理任务。 +- 引入版本: - ##### allow_system_reserved_names -- 默认值:false -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否允许用户创建名称以 `__op` 和 `__row` 开头的列。要启用此功能,请将此参数设置为 
`TRUE`。请注意,这些名称格式在 StarRocks 中保留用于特殊目的,创建此类列可能会导致未定义的行为。因此,此功能默认禁用。 -- 引入版本:v3.2.0 +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否允许用户创建以 `__op` 和 `__row` 开头的列名。要启用此功能,请将此参数设置为 `TRUE`。请注意,这些名称格式在 StarRocks 中保留用于特殊目的,创建此类列可能导致未定义的行为。因此,此功能默认禁用。 +- 引入版本: v3.2.0 ##### auth_token -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于 StarRocks 集群(FE 所属集群)内部身份验证的 Token。如果未指定此参数,StarRocks 会在集群的 Leader FE 首次启动时为集群生成一个随机 Token。 -- 引入版本:- +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于 StarRocks 集群(FE 所属)内部身份验证的令牌。如果未指定此参数,StarRocks 将在集群 Leader FE 首次启动时为集群生成一个随机令牌。 +- 引入版本: - ##### authentication_ldap_simple_bind_base_dn -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:是 -- 描述:基本 DN,即 LDAP 服务器开始搜索用户身份验证信息的起点。 -- 引入版本:- +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: Yes +- 描述: 基本 DN,即 LDAP 服务器开始搜索用户身份验证信息的起点。 +- 引入版本: - ##### authentication_ldap_simple_bind_root_dn -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:是 -- 描述:用于搜索用户身份验证信息的管理员 DN。 -- 引入版本:- +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: Yes +- 描述: 用于搜索用户身份验证信息的管理员 DN。 +- 引入版本: - ##### authentication_ldap_simple_bind_root_pwd -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:是 -- 描述:用于搜索用户身份验证信息的管理员密码。 -- 引入版本:- +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: Yes +- 描述: 用于搜索用户身份验证信息的管理员密码。 +- 引入版本: - ##### authentication_ldap_simple_server_host -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:是 -- 描述:LDAP 服务器运行的主机。 -- 引入版本:- +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: Yes +- 描述: LDAP 服务器运行的主机。 +- 引入版本: - ##### authentication_ldap_simple_server_port -- 默认值:389 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:LDAP 服务器的端口。 -- 引入版本:- +- 默认值: 389 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: LDAP 服务器的端口。 +- 引入版本: - ##### authentication_ldap_simple_user_search_attr -- 默认值:uid -- 类型:String -- 单位:- -- 是否可变:是 -- 描述:用于在 LDAP 对象中识别用户的属性名称。 -- 引入版本:- +- 默认值: uid +- 类型: String +- 单位: - +- 可变: Yes +- 描述: LDAP 对象中标识用户的属性名称。 +- 引入版本: - ##### backup_job_default_timeout_ms -- 默认值:86400 * 1000 -- 类型:Int -- 单位:毫秒 -- 是否可变:是 -- 描述:备份作业的超时持续时间。如果超过此值,备份作业将失败。 -- 引入版本:- +- 默认值: 86400 * 1000 +- 类型: Int +- 单位: Milliseconds +- 可变: Yes +- 描述: 备份作业的超时持续时间。如果超过此值,备份作业将失败。 +- 引入版本: - ##### enable_collect_tablet_num_in_show_proc_backend_disk_path -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否在 `SHOW PROC /BACKENDS/{id}` 命令中启用每个磁盘的 Tablet 数量收集。 -- 引入版本:v4.0.1, v3.5.8 +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否在 `SHOW PROC /BACKENDS/{id}` 命令中启用收集每个磁盘的 Tablet 数量 +- 引入版本: v4.0.1, v3.5.8 ##### enable_colocate_restore -- 默认值:false -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否启用 Colocate 表的备份和恢复。`true` 表示启用 Colocate 表的备份和恢复,`false` 表示禁用。 -- 引入版本:v3.2.10, v3.3.3 +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否启用 Colocate Tables 的备份和恢复。`true` 表示启用 Colocate Tables 的备份和恢复,`false` 表示禁用。 +- 引入版本: v3.2.10, v3.3.3 ##### enable_materialized_view_concurrent_prepare -- 默认值:true -- 类型:Boolean -- 单位: -- 是否可变:是 -- 描述:是否并发准备物化视图以提高性能。 -- 引入版本:v3.4.4 +- 默认值: true +- 类型: Boolean +- 单位: +- 可变: Yes +- 描述: 是否并发准备物化视图以提高性能。 +- 引入版本: v3.4.4 ##### enable_metric_calculator -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:否 -- 描述:指定是否启用用于定期收集指标的功能。有效值:`TRUE` 和 `FALSE`。`TRUE` 指定启用此功能,`FALSE` 指定禁用此功能。 -- 引入版本:- +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: No +- 描述: 指定是否启用定期收集指标的功能。有效值: `TRUE` 和 `FALSE`。`TRUE` 指定启用此功能,`FALSE` 指定禁用此功能。 +- 引入版本: - ##### enable_table_metrics_collect -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否在 FE 中导出表级别指标。禁用时,FE 将跳过导出表指标(例如表扫描/加载计数器和表大小指标),但仍将计数器记录在内存中。 -- 引入版本:- +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: FE 是否导出表级指标。禁用后,FE 
将跳过导出表指标(例如表扫描/加载计数器和表大小指标),但仍将计数器记录在内存中。 +- 引入版本: - ##### enable_mv_post_image_reload_cache -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:FE 加载镜像后是否执行重新加载标志检查。如果对基本物化视图执行检查,则无需对与之相关的其他物化视图执行检查。 -- 引入版本:v3.5.0 +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: FE 加载镜像后是否执行重新加载标志检查。如果为基物化视图执行检查,则不需要为与之相关的其他物化视图执行检查。 +- 引入版本: v3.5.0 ##### enable_mv_query_context_cache -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否启用查询级别的物化视图重写缓存以提高查询重写性能。 -- 引入版本:v3.3 +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否启用查询级物化视图重写缓存以提高查询重写性能。 +- 引入版本: v3.3 ##### enable_mv_refresh_collect_profile -- 默认值:false -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否默认在刷新物化视图时为所有物化视图启用 Profile。 -- 引入版本:v3.3.0 +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否默认启用所有物化视图刷新时的 profile 收集。 +- 引入版本: v3.3.0 ##### enable_mv_refresh_extra_prefix_logging -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否在日志中启用物化视图名称前缀以更好地进行调试。 -- 引入版本:v3.4.0 +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否在日志中启用物化视图名称前缀,以便更好地调试。 +- 引入版本: v3.4.0 ##### enable_mv_refresh_query_rewrite -- 默认值:false -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否在物化视图刷新期间启用重写查询,以便查询可以直接使用重写的物化视图而不是基表,以提高查询性能。 -- 引入版本:v3.3 +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否在物化视图刷新期间启用重写查询,以便查询可以直接使用重写后的物化视图而不是基表来提高查询性能。 +- 引入版本: v3.3 ##### enable_trace_historical_node -- 默认值:false -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否允许系统跟踪历史节点。通过将此项设置为 `true`,您可以启用缓存共享功能,并允许系统在弹性扩展期间选择正确的缓存节点。 -- 引入版本:v3.5.1 +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否允许系统跟踪历史节点。通过将此项设置为 `true`,您可以启用缓存共享功能,并允许系统在弹性伸缩期间选择正确的缓存节点。 +- 引入版本: v3.5.1 ##### es_state_sync_interval_second -- 默认值:10 -- 类型:Long -- 单位:秒 -- 是否可变:否 -- 描述:FE 获取 Elasticsearch 索引并同步 StarRocks 外部表元数据的时间间隔。 -- 引入版本:- +- 默认值: 10 +- 类型: Long +- 单位: Seconds +- 可变: No +- 描述: FE 获取 Elasticsearch 索引并同步 StarRocks 外部表元数据的时间间隔。 +- 引入版本: - ##### hive_meta_cache_refresh_interval_s -- 默认值:3600 * 2 -- 类型:Long -- 单位:秒 -- 是否可变:否 -- 描述:Hive 外部表缓存元数据的更新时间间隔。 -- 引入版本:- +- 默认值: 3600 * 2 +- 类型: Long +- 单位: Seconds +- 可变: No +- 描述: Hive 外部表缓存元数据更新的时间间隔。 +- 引入版本: - ##### hive_meta_store_timeout_s -- 默认值:10 -- 类型:Long -- 单位:秒 -- 是否可变:否 -- 描述:与 Hive Metastore 连接超时的时间量。 -- 引入版本:- +- 默认值: 10 +- 类型: Long +- 单位: Seconds +- 可变: No +- 描述: 连接 Hive metastore 的超时时间。 +- 引入版本: - ##### jdbc_connection_idle_timeout_ms -- 默认值:600000 -- 类型:Int -- 单位:毫秒 -- 是否可变:否 -- 描述:访问 JDBC Catalog 的连接超时后最长时间。超时连接被视为空闲。 -- 引入版本:- +- 默认值: 600000 +- 类型: Int +- 单位: Milliseconds +- 可变: No +- 描述: 访问 JDBC Catalog 的连接超时最长时间。超时连接被视为空闲。 +- 引入版本: - ##### jdbc_connection_timeout_ms -- 默认值:10000 -- 类型:Long -- 单位:毫秒 -- 是否可变:否 -- 描述:HikariCP 连接池获取连接的超时时间(毫秒)。如果在此时间内无法从池中获取连接,操作将失败。 -- 引入版本:v3.5.13 +- 默认值: 10000 +- 类型: Long +- 单位: Milliseconds +- 可变: No +- 描述: HikariCP 连接池获取连接的超时时间(毫秒)。如果在此时间内无法从池中获取连接,操作将失败。 +- 引入版本: v3.5.13 ##### jdbc_query_timeout_ms -- 默认值:30000 -- 类型:Long -- 单位:毫秒 -- 是否可变:是 -- 描述:JDBC 语句查询执行的超时时间(毫秒)。此超时应用于通过 JDBC Catalog 执行的所有 SQL 查询(例如分区元数据查询)。该值在传递给 JDBC 驱动程序时转换为秒。 -- 引入版本:v3.5.13 +- 默认值: 30000 +- 类型: Long +- 单位: Milliseconds +- 可变: Yes +- 描述: JDBC 语句查询执行的超时时间(毫秒)。此超时应用于通过 JDBC Catalog 执行的所有 SQL 查询(例如分区元数据查询)。该值在传递给 JDBC 驱动程序时转换为秒。 +- 引入版本: v3.5.13 ##### jdbc_network_timeout_ms -- 默认值:30000 -- 类型:Long -- 单位:毫秒 -- 是否可变:是 -- 描述:JDBC 网络操作(套接字读取)的超时时间(毫秒)。此超时适用于数据库元数据调用(例如 getSchemas()、getTables()、getColumns()),以防止在外部数据库无响应时无限期阻塞。 -- 引入版本:v3.5.13 +- 默认值: 30000 +- 类型: Long +- 单位: Milliseconds +- 可变: Yes +- 描述: JDBC 网络操作(socket 读取)的超时时间(毫秒)。此超时适用于数据库元数据调用(例如 getSchemas()、getTables()、getColumns()),以防止外部数据库无响应时无限期阻塞。 +- 引入版本: v3.5.13 ##### 
jdbc_connection_pool_size -- 默认值:8 -- 类型:Int -- 单位:- -- 是否可变:否 -- 描述:用于访问 JDBC Catalog 的 JDBC 连接池的最大容量。 -- 引入版本:- +- 默认值: 8 +- 类型: Int +- 单位: - +- 可变: No +- 描述: 访问 JDBC Catalog 的 JDBC 连接池的最大容量。 +- 引入版本: - ##### jdbc_meta_default_cache_enable -- 默认值:false -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:JDBC Catalog 元数据缓存是否启用的默认值。设置为 True 时,新创建的 JDBC Catalog 将默认启用元数据缓存。 -- 引入版本:- +- 默认值: false +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: JDBC Catalog 元数据缓存是否启用的默认值。当设置为 True 时,新创建的 JDBC Catalog 将默认启用元数据缓存。 +- 引入版本: - ##### jdbc_meta_default_cache_expire_sec -- 默认值:600 -- 类型:Long -- 单位:秒 -- 是否可变:是 -- 描述:JDBC Catalog 元数据缓存的默认过期时间。当 `jdbc_meta_default_cache_enable` 设置为 true 时,新创建的 JDBC Catalog 将默认设置元数据缓存的过期时间。 -- 引入版本:- +- 默认值: 600 +- 类型: Long +- 单位: Seconds +- 可变: Yes +- 描述: JDBC Catalog 元数据缓存的默认过期时间。当 `jdbc_meta_default_cache_enable` 设置为 true 时,新创建的 JDBC Catalog 将默认设置元数据缓存的过期时间。 +- 引入版本: - ##### jdbc_minimum_idle_connections -- 默认值:1 -- 类型:Int -- 单位:- -- 是否可变:否 -- 描述:用于访问 JDBC Catalog 的 JDBC 连接池中的最小空闲连接数。 -- 引入版本:- +- 默认值: 1 +- 类型: Int +- 单位: - +- 可变: No +- 描述: 访问 JDBC Catalog 的 JDBC 连接池中的最小空闲连接数。 +- 引入版本: - ##### jwt_jwks_url -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:JSON Web Key Set (JWKS) 服务的 URL 或 `fe/conf` 目录下公共密钥本地文件的路径。 -- 引入版本:v3.5.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: JSON Web Key Set (JWKS) 服务的 URL 或 `fe/conf` 目录下公钥本地文件的路径。 +- 引入版本: v3.5.0 ##### jwt_principal_field -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于标识 JWT 中表示主题 (`sub`) 字段的字符串。默认值为 `sub`。此字段的值必须与登录 StarRocks 的用户名相同。 -- 引入版本:v3.5.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于标识 JWT 中主题 (`sub`) 字段的字符串。默认值为 `sub`。此字段的值必须与登录 StarRocks 的用户名相同。 +- 引入版本: v3.5.0 ##### jwt_required_audience -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于标识 JWT 中受众 (`aud`) 字段的字符串列表。仅当列表中一个值与 JWT 受众匹配时,JWT 才被视为有效。 -- 引入版本:v3.5.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于标识 JWT 中受众 (`aud`) 的字符串列表。JWT 仅在列表中有一个值与 JWT 受众匹配时才被视为有效。 +- 引入版本: v3.5.0 ##### jwt_required_issuer -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于标识 JWT 中颁发者 (`iss`) 字段的字符串列表。仅当列表中一个值与 JWT 颁发者匹配时,JWT 才被视为有效。 -- 引入版本:v3.5.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于标识 JWT 中发行者 (`iss`) 的字符串列表。JWT 仅在列表中有一个值与 JWT 发行者匹配时才被视为有效。 +- 引入版本: v3.5.0 ##### locale -- 默认值:zh_CN.UTF-8 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:FE 使用的字符集。 -- 引入版本:- +- 默认值: zh_CN.UTF-8 +- 类型: String +- 单位: - +- 可变: No +- 描述: FE 使用的字符集。 +- 引入版本: - ##### max_agent_task_threads_num -- 默认值:4096 -- 类型:Int -- 单位:- -- 是否可变:否 -- 描述:Agent 任务线程池中允许的最大线程数。 -- 引入版本:- +- 默认值: 4096 +- 类型: Int +- 单位: - +- 可变: No +- 描述: 代理任务线程池中允许的最大线程数。 +- 引入版本: - ##### max_download_task_per_be -- 默认值:0 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:在每次 RESTORE 操作中,StarRocks 分配给 BE 节点的最大下载任务数。当此项设置为小于或等于 0 时,不对任务数量施加限制。 -- 引入版本:v3.1.0 +- 默认值: 0 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 在每个 RESTORE 操作中,StarRocks 分配给 BE 节点的最大下载任务数。当此项设置为小于或等于 0 时,不对任务数施加限制。 +- 引入版本: v3.1.0 ##### max_mv_check_base_table_change_retry_times -- 默认值:10 -- 类型:- -- 单位:- -- 是否可变:是 -- 描述:刷新物化视图时检测基表更改的最大重试次数。 -- 引入版本:v3.3.0 +- 默认值: 10 +- 类型: - +- 单位: - +- 可变: Yes +- 描述: 刷新物化视图时检测基表变更的最大重试次数。 +- 引入版本: v3.3.0 ##### max_mv_refresh_failure_retry_times -- 默认值:1 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:物化视图刷新失败时的最大重试次数。 -- 引入版本:v3.3.0 +- 默认值: 1 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 物化视图刷新失败时的最大重试次数。 +- 引入版本: v3.3.0 ##### max_mv_refresh_try_lock_failure_retry_times -- 默认值:3 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:物化视图刷新失败时尝试锁定的最大重试次数。 -- 引入版本:v3.3.0 +- 默认值: 3 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 物化视图刷新失败时尝试获取锁的最大重试次数。 +- 
引入版本: v3.3.0 ##### max_small_file_number -- 默认值:100 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:FE 目录中可以存储的小文件最大数量。 -- 引入版本:- +- 默认值: 100 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: FE 目录中可以存储的小文件最大数量。 +- 引入版本: - ##### max_small_file_size_bytes -- 默认值:1024 * 1024 -- 类型:Int -- 单位:字节 -- 是否可变:是 -- 描述:小文件的最大大小。 -- 引入版本:- +- 默认值: 1024 * 1024 +- 类型: Int +- 单位: Bytes +- 可变: Yes +- 描述: 小文件的最大大小。 +- 引入版本: - ##### max_upload_task_per_be -- 默认值:0 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:在每次 BACKUP 操作中,StarRocks 分配给 BE 节点的最大上传任务数。当此项设置为小于或等于 0 时,不对任务数量施加限制。 -- 引入版本:v3.1.0 +- 默认值: 0 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 在每个 BACKUP 操作中,StarRocks 分配给 BE 节点的最大上传任务数。当此项设置为小于或等于 0 时,不对任务数施加限制。 +- 引入版本: v3.1.0 ##### mv_create_partition_batch_interval_ms -- 默认值:1000 -- 类型:Int -- 单位:毫秒 -- 是否可变:是 -- 描述:物化视图刷新期间,如果需要批量创建多个分区,系统会将其分成每批 64 个分区。为了降低频繁创建分区导致的失败风险,每个批次之间设置了默认间隔(毫秒)以控制创建频率。 -- 引入版本:v3.3 +- 默认值: 1000 +- 类型: Int +- 单位: ms +- 可变: Yes +- 描述: 物化视图刷新期间,如果需要批量创建多个分区,系统会将其分成每批 64 个分区。为了降低因频繁创建分区而导致的失败风险,在每个批次之间设置了一个默认间隔(毫秒)以控制创建频率。 +- 引入版本: v3.3 ##### mv_plan_cache_max_size -- 默认值:1000 -- 类型:Long -- 单位: -- 是否可变:是 -- 描述:物化视图计划缓存(用于物化视图重写)的最大大小。如果有很多物化视图用于透明查询重写,您可以增加此值。 -- 引入版本:v3.2 +- 默认值: 1000 +- 类型: Long +- 单位: +- 可变: Yes +- 描述: 物化视图计划缓存的最大大小(用于物化视图重写)。如果用于透明查询重写的物化视图很多,可以增加此值。 +- 引入版本: v3.2 ##### mv_plan_cache_thread_pool_size -- 默认值:3 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:物化视图计划缓存(用于物化视图重写)的默认线程池大小。 -- 引入版本:v3.2 +- 默认值: 3 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 物化视图计划缓存的默认线程池大小(用于物化视图重写)。 +- 引入版本: v3.2 ##### mv_refresh_default_planner_optimize_timeout -- 默认值:30000 -- 类型:- -- 单位:- -- 是否可变:是 -- 描述:刷新物化视图时优化器规划阶段的默认超时。 -- 引入版本:v3.3.0 +- 默认值: 30000 +- 类型: - +- 单位: - +- 可变: Yes +- 描述: 刷新物化视图时优化器规划阶段的默认超时时间。 +- 引入版本: v3.3.0 ##### mv_refresh_fail_on_filter_data -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:如果在刷新期间有过滤数据,物化视图刷新失败,默认为 true,否则通过忽略过滤数据返回成功。 -- 引入版本:- +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 如果刷新过程中存在过滤数据,物化视图刷新会失败,默认为 true,否则通过忽略过滤数据返回成功。 +- 引入版本: - ##### mv_refresh_try_lock_timeout_ms -- 默认值:30000 -- 类型:Int -- 单位:毫秒 -- 是否可变:是 -- 描述:物化视图刷新尝试其基表/物化视图的 DB 锁的默认尝试锁定超时。 -- 引入版本:v3.3.0 +- 默认值: 30000 +- 类型: Int +- 单位: Milliseconds +- 可变: Yes +- 描述: 物化视图刷新尝试获取其基表/物化视图的 DB 锁的默认尝试锁超时时间。 +- 引入版本: v3.3.0 ##### oauth2_auth_server_url -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:授权 URL。用户浏览器将被重定向到此 URL 以开始 OAuth 2.0 授权过程。 -- 引入版本:v3.5.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 授权 URL。用户的浏览器将被重定向到此 URL 以开始 OAuth 2.0 授权过程。 +- 引入版本: v3.5.0 ##### oauth2_client_id -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:StarRocks 客户端的公共标识符。 -- 引入版本:v3.5.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: StarRocks 客户端的公共标识符。 +- 引入版本: v3.5.0 ##### oauth2_client_secret -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于授权 StarRocks 客户端与授权服务器通信的 Secret。 -- 引入版本:v3.5.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于授权 StarRocks 客户端与授权服务器通信的密钥。 +- 引入版本: v3.5.0 ##### oauth2_jwks_url -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:JSON Web Key Set (JWKS) 服务的 URL 或 `conf` 目录下本地文件的路径。 -- 引入版本:v3.5.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: JSON Web Key Set (JWKS) 服务的 URL 或 `conf` 目录下本地文件的路径。 +- 引入版本: v3.5.0 ##### oauth2_principal_field -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于标识 JWT 中表示主题 (`sub`) 字段的字符串。默认值为 `sub`。此字段的值必须与登录 StarRocks 的用户名相同。 -- 引入版本:v3.5.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于标识 JWT 中主题 (`sub`) 字段的字符串。默认值为 `sub`。此字段的值必须与登录 StarRocks 的用户名相同。 +- 引入版本: v3.5.0 ##### oauth2_redirect_url -- 默认值:空字符串 -- 类型:String -- 单位:- -- 
是否可变:否 -- 描述:OAuth 2.0 身份验证成功后,用户浏览器将被重定向到的 URL。授权码将发送到此 URL。在大多数情况下,需要将其配置为 `http://:/api/oauth2`。 -- 引入版本:v3.5.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: OAuth 2.0 身份验证成功后,用户的浏览器将被重定向到的 URL。授权码将发送到此 URL。在大多数情况下,需要配置为 `http://:/api/oauth2`。 +- 引入版本: v3.5.0 ##### oauth2_required_audience -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于标识 JWT 中受众 (`aud`) 字段的字符串列表。仅当列表中一个值与 JWT 受众匹配时,JWT 才被视为有效。 -- 引入版本:v3.5.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于标识 JWT 中受众 (`aud`) 的字符串列表。JWT 仅在列表中有一个值与 JWT 受众匹配时才被视为有效。 +- 引入版本: v3.5.0 ##### oauth2_required_issuer -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:用于标识 JWT 中颁发者 (`iss`) 字段的字符串列表。仅当列表中一个值与 JWT 颁发者匹配时,JWT 才被视为有效。 -- 引入版本:v3.5.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: 用于标识 JWT 中发行者 (`iss`) 的字符串列表。JWT 仅在列表中有一个值与 JWT 发行者匹配时才被视为有效。 +- 引入版本: v3.5.0 ##### oauth2_token_server_url -- 默认值:空字符串 -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:StarRocks 从授权服务器获取访问 Token 的端点 URL。 -- 引入版本:v3.5.0 +- 默认值: Empty string +- 类型: String +- 单位: - +- 可变: No +- 描述: StarRocks 从授权服务器获取访问令牌的端点 URL。 +- 引入版本: v3.5.0 ##### plugin_dir -- 默认值:System.getenv("STARROCKS_HOME") + "/plugins" -- 类型:String -- 单位:- -- 是否可变:否 -- 描述:存储插件安装包的目录。 -- 引入版本:- +- 默认值: System.getenv("STARROCKS_HOME") + "/plugins" +- 类型: String +- 单位: - +- 可变: No +- 描述: 存储插件安装包的目录。 +- 引入版本: - ##### plugin_enable -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否可以在 FE 上安装插件。插件只能在 Leader FE 上安装或卸载。 -- 引入版本:- +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否可以在 FE 上安装插件。插件只能在 Leader FE 上安装或卸载。 +- 引入版本: - ##### proc_profile_jstack_depth -- 默认值:128 -- 类型:Int -- 单位:- -- 是否可变:是 -- 描述:系统收集 CPU 和内存 Profile 时 Java 堆栈的最大深度。此值控制每个采样堆栈捕获的 Java 堆栈帧数:较大的值会增加跟踪详细信息和输出大小,并可能增加 Profiling 开销,而较小的值会减少详细信息。此设置在 CPU 和内存 Profiling 启动时都使用,因此请根据诊断需求和性能影响进行调整。 -- 引入版本:- +- 默认值: 128 +- 类型: Int +- 单位: - +- 可变: Yes +- 描述: 系统收集 CPU 和内存 profile 时的最大 Java 堆栈深度。此值控制为每个采样堆栈捕获的 Java 堆栈帧数:较大的值会增加跟踪详细程度和输出大小,并可能增加 profile 开销,而较小的值会减少详细程度。此设置在启动 CPU 和内存 profile 时都使用,因此请根据诊断需求和性能影响进行调整。 +- 引入版本: - ##### proc_profile_mem_enable -- 默认值:true -- 类型:Boolean -- 单位:- -- 是否可变:是 -- 描述:是否启用进程内存分配 Profile 收集。当此项设置为 `true` 时,系统会在 `sys_log_dir/proc_profile` 下生成名为 `mem-profile-.html` 的 HTML Profile,然后在采样期间睡眠 `proc_profile_collect_time_s` 秒,并使用 `proc_profile_jstack_depth` 作为 Java 堆栈深度。生成的文件根据 `proc_profile_file_retained_days` 和 `proc_profile_file_retained_size_bytes` 进行压缩和清除。原生提取路径使用 `STARROCKS_HOME_DIR` 以避免 `/tmp` noexec 问题。此项旨在解决内存分配热点问题。启用它会增加 CPU、I/O 和磁盘使用,并可能产生大文件。 -- 引入版本:v3.2.12 +- 默认值: true +- 类型: Boolean +- 单位: - +- 可变: Yes +- 描述: 是否启用进程内存分配 profile 收集。当此项设置为 `true` 时,系统在 `sys_log_dir/proc_profile` 下生成名为 `mem-profile-.html` 的 HTML profile,在采样期间休眠 `proc_profile_collect_time_s` 秒,并使用 `proc_profile_jstack_depth` 作为 Java 堆栈深度。生成的文件会根据 `proc_profile_file_retained_days` 和 `proc_profile_file_retained_size_bytes` 进行压缩和清除。原生提取路径使用 `STARROCKS_HOME_DIR` 以避免 `/tmp` noexec 问题。此项旨在故障排除内存分配热点。启用它会增加 CPU、I/O 和磁盘使用率,并可能生成大文件。 +- 引入版本: v3.2.12 ##### query_detail_explain_level -- 默认值:COSTS -- 类型:String -- 单位:- -- 是否可变:是 -- 描述:EXPLAIN 语句返回的查询计划的详细级别。有效值:COSTS, NORMAL, VERBOSE。 -- 引入版本:v3.2.12, v3.3.5 +- 默认值: COSTS +- 类型: String +- 单位: - +- 可变: true +- 描述: EXPLAIN 语句返回的查询计划详细级别。有效值: COSTS、NORMAL、VERBOSE。 +- 引入版本: v3.2.12, v3.3.5 ##### replication_interval_ms -- 默认值:100 -- 类型:Int -- 单位:- -- 是否可变:否 -- 描述:调度复制任务的最小时间间隔。 -- 引入版本:v3.3.5 +- 默认值: 100 +- 类型: Int +- 单位: - +- 可变: No +- 描述: 复制任务调度的最小时间间隔。 +- 引入版本: v3.3.5 ##### replication_max_parallel_data_size_mb -- 默认值:1048576 -- 类型:Int -- 单位:MB -- 是否可变:是 -- 
描述:并发同步允许的最大数据大小。
-- 引入版本:v3.3.5
+- 默认值: 1048576
+- 类型: Int
+- 单位: MB
+- 可变: Yes
+- 描述: 允许并发同步的最大数据大小。
+- 引入版本: v3.3.5

 ##### replication_max_parallel_replica_count

-- 默认值:10240
-- 类型:Int
-- 单位:-
-- 是否可变:是
-- 描述:并发同步允许的最大 Tablet 副本数。
-- 引入版本:v3.3.5
+- 默认值: 10240
+- 类型: Int
+- 单位: -
+- 可变: Yes
+- 描述: 允许并发同步的 Tablet 副本的最大数量。
+- 引入版本: v3.3.5

 ##### replication_max_parallel_table_count

-- 默认值:100
-- 类型:Int
-- 单位:-
-- 是否可变:是
-- 描述:允许的最大并发数据同步任务数。StarRocks 为每张表创建一个同步任务。
-- 引入版本:v3.3.5
+- 默认值: 100
+- 类型: Int
+- 单位: -
+- 可变: Yes
+- 描述: 允许的最大并发数据同步任务数。StarRocks 为每个表创建一个同步任务。
+- 引入版本: v3.3.5

 ##### replication_transaction_timeout_sec

-- 默认值:86400
-- 类型:Int
-- 单位:秒
-- 是否可变:是
-- 描述:同步任务的超时持续时间。
-- 引入版本:v3.3.5
+- 默认值: 86400
+- 类型: Int
+- 单位: Seconds
+- 可变: Yes
+- 描述: 同步任务的超时持续时间。
+- 引入版本: v3.3.5

 ##### skip_whole_phase_lock_mv_limit

-- 默认值:5
-- 类型:Int
-- 单位:-
-- 是否可变:是
-- 描述:控制 StarRocks 何时对具有相关物化视图的表应用“无锁”优化。当此项设置为小于 0 时,系统始终应用无锁优化,并且不为查询复制相关物化视图(FE 内存使用和元数据复制/锁争用减少,但元数据并发问题风险可能增加)。当设置为 0 时,禁用无锁优化(系统始终使用安全的复制和锁定路径)。当设置为大于 0 时,仅当相关物化视图的数量小于或等于配置的阈值时才应用无锁优化。此外,当值大于等于 0 时,规划器将查询 OLAP 表记录到优化器上下文中以启用与物化视图相关的重写路径;当小于 0 时,此步骤被跳过。
-- 引入版本:v3.2.1
+- 默认值: 5
+- 类型: Int
+- 单位: -
+- 可变: Yes
+- 描述: 控制 StarRocks 何时对具有相关物化视图的表应用“无锁”优化。当此项设置为小于 0 时,系统始终应用无锁优化,并且不为查询复制相关物化视图(FE 内存使用和元数据复制/锁争用减少,但元数据并发问题风险可能增加)。当设置为 0 时,无锁优化被禁用(系统始终使用安全的复制和锁路径)。当设置为大于 0 时,无锁优化仅应用于相关物化视图数量小于或等于配置阈值的表。此外,当值大于等于 0 时,规划器将查询 OLAP 表记录到优化器上下文中以启用与物化视图相关的重写路径;当小于 0 时,此步骤将被跳过。
+- 引入版本: v3.2.1

 ##### small_file_dir

-- 默认值:StarRocksFE.STARROCKS_HOME_DIR + "/small_files"
-- 类型:String
-- 单位:-
-- 是否可变:否
-- 描述:小文件的根目录。
-- 引入版本:-
+- 默认值: StarRocksFE.STARROCKS_HOME_DIR + "/small_files"
+- 类型: String
+- 单位: -
+- 可变: No
+- 描述: 小文件的根目录。
+- 引入版本: -

 ##### task_runs_max_history_number

-- 默认值:10000
-- 类型:Int
-- 单位:-
-- 是否可变:是
-- 描述:内存中保留的任务运行记录的最大数量,并用作查询存档任务运行历史记录时的默认 LIMIT。当 `enable_task_history_archive` 为 false 时,此值限制内存中的历史记录:强制 GC 修剪旧条目,因此只保留最新的 `task_runs_max_history_number`。当查询存档历史记录时(且未提供显式 LIMIT),如果此值大于 0,`TaskRunHistoryTable.lookup` 将使用 `"ORDER BY create_time DESC LIMIT "`。注意:将其设置为 0 会禁用查询端的 LIMIT(无上限),但会导致内存中的历史记录截断为零(除非启用了存档)。
-- 引入版本:v3.2.0
+- 默认值: 10000
+- 类型: Int
+- 单位: -
+- 可变: Yes
+- 描述: 内存中保留的任务运行记录的最大数量,并用作查询归档任务运行历史记录时的默认 LIMIT。当 `enable_task_history_archive` 为 false 时,此值限制内存中的历史记录:强制 GC 修剪旧条目,只保留最新的 `task_runs_max_history_number` 条。当查询归档历史记录时(且未提供显式 LIMIT),如果此值大于 0,则 `TaskRunHistoryTable.lookup` 使用 `"ORDER BY create_time DESC LIMIT "`。注意:将其设置为 0 会禁用查询侧的 LIMIT(无上限),但会导致内存中的历史记录被截断为零(除非启用了归档)。
+- 引入版本: v3.2.0

 ##### tmp_dir

-- 默认值:StarRocksFE.STARROCKS_HOME_DIR + "/temp_dir"
-- 类型:String
-- 单位:-
-- 是否可变:否
-- 描述:存储临时文件的目录,例如备份和恢复过程中生成的文件。这些过程完成后,生成的临时文件将被删除。
-- 引入版本:-
+- 默认值: StarRocksFE.STARROCKS_HOME_DIR + "/temp_dir"
+- 类型: String
+- 单位: -
+- 可变: No
+- 描述: 存储临时文件的目录,例如备份和恢复过程中生成的文件。这些过程完成后,生成的临时文件将被删除。
+- 引入版本: -

 ##### transform_type_prefer_string_for_varchar

-- 默认值:true
-- 类型:Boolean
-- 单位:-
-- 是否可变:是
-- 描述:在物化视图创建和 CTAS 操作中,是否更喜欢对固定长度 VARCHAR 列使用 STRING 类型。
-- 引入版本:v4.0.0
+- 默认值: true
+- 类型: Boolean
+- 单位: -
+- 可变: Yes
+- 描述: 在物化视图创建和 CTAS 操作中,是否更倾向于为固定长度 varchar 列使用 string 类型。
+- 引入版本: v4.0.0
diff --git a/docs/zh/administration/management/Scale_up_down.md b/docs/zh/administration/management/Scale_up_down.md
index 3c26128..f86d41c 100644
--- a/docs/zh/administration/management/Scale_up_down.md
+++ b/docs/zh/administration/management/Scale_up_down.md
@@ -2,16 +2,16 @@
 displayed_sidebar: docs
 ---

-# 扩缩容
+# 扩容与缩容

-本主题描述了如何对 StarRocks 节点进行扩缩容。
+本主题描述了如何对 StarRocks 节点进行扩容与缩容。

-## FE 节点扩缩容
+## FE 扩容与缩容

-StarRocks 有两种类型的 FE 节点:Follower 和 Observer。Follower 参与选举投票和写入。Observer 仅用于同步日志和扩展读取性能。
+StarRocks 有两种类型的 FE 节点:Follower 和 Observer。Follower 节点参与选举投票和写入。Observer 节点仅用于同步日志和扩展读取性能。

-> * Follower FE(包括 Leader)的数量必须是奇数,建议部署 3 个以形成高可用 (HA) 模式。
-> * 当 FE 以高可用模式部署(1 个 Leader,2 个 Follower)时,建议添加 Observer FE 以获得更好的读取性能。
+> * Follower FE(包括 leader)的数量必须是奇数,建议部署 3 个以形成高可用 (HA) 模式。
+> * 当 FE 处于高可用部署(1 个 leader,2 个 follower)时,建议添加 Observer FE 以获得更好的读取性能。

 ### FE 扩容
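+以下补充一个示意性的 FE 扩容操作流程(其中的地址 "192.168.0.10:9010" 和 "192.168.0.11:9010" 均为假设值,请替换为新节点的实际 IP/主机名和 edit_log_port):
+
+~~~sql
+-- 添加一个 Follower FE(地址仅为示例)
+ALTER SYSTEM ADD FOLLOWER "192.168.0.10:9010";
+-- 添加一个 Observer FE(地址仅为示例)
+ALTER SYSTEM ADD OBSERVER "192.168.0.11:9010";
+-- 确认新节点的 Alive 状态是否变为 true
+SHOW PROC '/frontends';
+~~~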
@@ -31,17 +31,17 @@
 alter system drop follower "fe_host:edit_log_port";
 alter system drop observer "fe_host:edit_log_port";
 ~~~

-扩缩容完成后,可以通过运行 `show proc '/frontends';` 查看节点信息。
+扩容和缩容后,可以通过运行 `show proc '/frontends';` 查看节点信息。

-## BE 节点扩缩容
+## BE 扩容与缩容

-StarRocks 会在 BE 节点扩缩容后自动执行负载均衡,而不会影响整体性能。
+StarRocks 在 BE 扩容或缩容后会自动执行负载均衡,而不会影响整体性能。

-当您添加新的 BE 节点时,系统的 Tablet Scheduler 会检测到新节点及其低负载。然后,它将开始把 Tablet 从高负载的 BE 节点移动到新的、低负载的 BE 节点,以确保数据和负载在整个集群中均匀分布。
+当您添加新的 BE 节点时,系统的 Tablet Scheduler 会检测到新节点及其较低的负载。然后,它将开始将 tablet 从高负载 BE 节点移动到新的低负载 BE 节点,以确保数据和负载在整个集群中均匀分布。

-负载均衡过程基于为每个 BE 计算的 loadScore,该 loadScore 考虑了磁盘利用率和副本数量。系统旨在将 Tablet 从具有较高 loadScore 的节点移动到具有较低 loadScore 的节点。
+均衡过程基于为每个 BE 计算的 loadScore,该 loadScore 综合考虑了磁盘利用率和副本数。系统旨在将 tablet 从 loadScore 较高的节点移动到 loadScore 较低的节点。

-您可以检查 FE 配置参数 `tablet_sched_disable_balance` 以确保自动均衡未被禁用(该参数默认为 false,这意味着 Tablet 均衡默认是启用的)。更多详细信息请参阅 [manage replica docs](./resource_management/Replica.md)。
+您可以检查 FE 配置参数 `tablet_sched_disable_balance`,以确保自动均衡未被禁用(该参数默认为 false,这意味着 tablet 均衡默认启用)。更多详细信息请参见 [manage replica docs](./resource_management/Replica.md)。

 ### BE 扩容

@@ -51,7 +51,7 @@
 alter system add backend 'be_host:be_heartbeat_service_port';
 ~~~

-运行以下命令查看 BE 状态。
+运行以下命令检查 BE 状态。

 ~~~sql
 show proc '/backends';
@@ -59,20 +59,20 @@

 ### BE 缩容

-缩容 BE 节点有两种方式:`DROP` 和 `DECOMMISSION`。
+缩容 BE 节点有两种方式——`DROP` 和 `DECOMMISSION`。

-`DROP` 将立即删除 BE 节点,丢失的副本将由 FE 调度补齐。`DECOMMISSION` 将首先确保副本已补齐,然后再删除 BE 节点。`DECOMMISSION` 更加友好,推荐用于 BE 缩容。
+`DROP` 会立即删除 BE 节点,并且丢失的副本将由 FE 调度补齐。`DECOMMISSION` 会首先确保副本补齐,然后再删除 BE 节点。`DECOMMISSION` 方式更为友好,建议用于 BE 缩容。

 两种方法的命令类似:

 * `alter system decommission backend "be_host:be_heartbeat_service_port";`
 * `alter system drop backend "be_host:be_heartbeat_service_port";`

-DROP BE 是一种危险操作,执行前需要二次确认。
+删除 backend 是一项危险操作,因此在执行前您需要确认两次。

 * `alter system drop backend "be_host:be_heartbeat_service_port";`

-## CN 节点扩缩容
+## CN 扩容与缩容

 ### CN 扩容

@@ -82,7 +82,7 @@
 ALTER SYSTEM ADD COMPUTE NODE "cn_host:cn_heartbeat_service_port";
 ~~~

-运行以下命令查看 CN 状态。
+运行以下命令检查 CN 状态。

 ~~~sql
 SHOW PROC '/compute_nodes';
diff --git a/docs/zh/administration/management/audit_loader.md b/docs/zh/administration/management/audit_loader.md
index 86b6229..7c57108 100644
--- a/docs/zh/administration/management/audit_loader.md
+++ b/docs/zh/administration/management/audit_loader.md
@@ -2,24 +2,24 @@
 displayed_sidebar: docs
 ---

-# 通过 AuditLoader 在 StarRocks 中管理审计日志
+# 通过 AuditLoader 管理 StarRocks 内部的审计日志

-本主题介绍如何通过插件 AuditLoader 在表中管理 StarRocks 审计日志。
+本文介绍如何通过插件 AuditLoader 在 StarRocks 表中管理审计日志。

-StarRocks 将其审计日志存储在本地文件 **fe/log/fe.audit.log** 中,而不是内部数据库中。插件 AuditLoader 允许您直接在集群中管理审计日志。安装后,AuditLoader 从文件读取日志,并通过 HTTP PUT 将其加载到 StarRocks 中。然后您可以使用 SQL 语句在 StarRocks 中查询审计日志。
+StarRocks 将审计日志存储在本地文件 **fe/log/fe.audit.log** 中,而不是内部数据库。AuditLoader 插件允许您直接在集群中管理审计日志。安装后,AuditLoader 从文件中读取日志,并通过 HTTP PUT 将其加载到 StarRocks 中。然后,您可以使用 SQL 语句在 StarRocks 中查询审计日志。

-## 创建表以存储审计日志
+## 创建用于存储审计日志的表

-在 StarRocks 集群中创建数据库和表以存储审计日志。有关详细说明,请参阅 [CREATE
DATABASE](../../sql-reference/sql-statements/Database/CREATE_DATABASE.md) 和 [CREATE TABLE](../../sql-reference/sql-statements/table_bucket_part_index/CREATE_TABLE.md)。 +在 StarRocks 集群中创建数据库和表,用于存储审计日志。详细说明请参阅 [CREATE DATABASE](../../sql-reference/sql-statements/Database/CREATE_DATABASE.md) 和 [CREATE TABLE](../../sql-reference/sql-statements/table_bucket_part_index/CREATE_TABLE.md)。 -由于审计日志的字段在不同 StarRocks 版本之间有所差异,因此务必遵循以下建议,以避免在升级过程中出现兼容性问题: +由于不同 StarRocks 版本的审计日志字段可能有所不同,请务必遵循以下建议,以避免在升级过程中出现兼容性问题: > **注意** > -> - 所有新字段都应标记为 `NULL`。 +> - 所有新增字段应标记为 `NULL`。 > - 字段不应重命名,因为用户可能依赖它们。 -> - 字段类型只能应用向后兼容的更改,例如 `VARCHAR(32)` -> `VARCHAR(64)`,以避免插入时出错。 -> - `AuditEvent` 字段仅通过名称解析。表中列的顺序无关紧要,并且用户可以随时更改。 +> - 字段类型只能进行向后兼容的更改,例如 `VARCHAR(32)` -> `VARCHAR(64)`,以避免插入时出错。 +> - `AuditEvent` 字段仅按名称解析。表中列的顺序无关紧要,用户可以随时更改。 > - 表中不存在的 `AuditEvent` 字段将被忽略,因此用户可以删除不需要的列。 ```SQL @@ -31,10 +31,10 @@ CREATE TABLE starrocks_audit_db__.starrocks_audit_tbl__ ( `queryType` VARCHAR(12) COMMENT "查询类型 (query, slow_query, connection)", `clientIp` VARCHAR(32) COMMENT "客户端IP", `user` VARCHAR(64) COMMENT "查询用户名", - `authorizedUser` VARCHAR(64) COMMENT "用户的唯一标识,即user_identity", + `authorizedUser` VARCHAR(64) COMMENT "用户的唯一标识符,即user_identity", `resourceGroup` VARCHAR(64) COMMENT "资源组名称", `catalog` VARCHAR(32) COMMENT "Catalog 名称", - `db` VARCHAR(96) COMMENT "查询运行的数据库", + `db` VARCHAR(96) COMMENT "查询运行所在的数据库", `state` VARCHAR(8) COMMENT "查询状态 (EOF, ERR, OK)", `errorCode` VARCHAR(512) COMMENT "错误码", `queryTime` BIGINT COMMENT "查询执行时间(毫秒)", @@ -44,12 +44,12 @@ CREATE TABLE starrocks_audit_db__.starrocks_audit_tbl__ ( `cpuCostNs` BIGINT COMMENT "查询消耗的CPU时间(纳秒)", `memCostBytes` BIGINT COMMENT "查询消耗的内存(字节)", `stmtId` INT COMMENT "SQL语句的增量ID", - `isQuery` TINYINT COMMENT "SQL是否为查询语句(1或0)", + `isQuery` TINYINT COMMENT "该SQL是否为查询语句 (1或0)", `feIp` VARCHAR(128) COMMENT "执行该语句的FE IP", `stmt` VARCHAR(1048576) COMMENT "原始SQL语句", `digest` VARCHAR(32) COMMENT "慢SQL的指纹", - `planCpuCosts` DOUBLE COMMENT "查询规划阶段的CPU使用量(纳秒)", - `planMemCosts` DOUBLE COMMENT "查询规划阶段的内存使用量(字节)", + `planCpuCosts` DOUBLE COMMENT "查询规划期间的CPU使用量(纳秒)", + `planMemCosts` DOUBLE COMMENT "查询规划期间的内存使用量(字节)", `pendingTimeMs` BIGINT COMMENT "查询在队列中等待的时间(毫秒)", `candidateMVs` VARCHAR(65533) NULL COMMENT "候选物化视图列表", `hitMvs` VARCHAR(65533) NULL COMMENT "匹配的物化视图列表", @@ -64,7 +64,7 @@ PROPERTIES ( ); ``` -`starrocks_audit_tbl__` 是使用动态分区创建的。默认情况下,第一个动态分区在表创建后 10 分钟创建。然后可以将审计日志加载到表中。您可以使用以下语句检查表中的分区: +`starrocks_audit_tbl__` 以动态分区创建。默认情况下,第一个动态分区在表创建后 10 分钟创建。然后可以将审计日志加载到表中。您可以使用以下语句检查表中的分区: ```SQL SHOW PARTITIONS FROM starrocks_audit_db__.starrocks_audit_tbl__; @@ -72,59 +72,59 @@ SHOW PARTITIONS FROM starrocks_audit_db__.starrocks_audit_tbl__; 分区创建后,您可以继续下一步。 -## 下载和配置 AuditLoader +## 下载并配置 AuditLoader -1. [下载](https://releases.starrocks.io/resources/auditloader.zip) AuditLoader 安装包。该软件包与所有可用版本的 StarRocks 兼容。 +1. [下载](https://releases.starrocks.io/resources/auditloader.zip) AuditLoader 安装包。该软件包兼容所有可用的 StarRocks 版本。 -2. 解压安装包。 +2. 解压安装包。 ```shell unzip auditloader.zip ``` - 解压以下文件: + 解压后包含以下文件: - - **auditloader.jar**:AuditLoader 的 JAR 文件。 - - **plugin.properties**:AuditLoader 的属性文件。您无需修改此文件。 - - **plugin.conf**:AuditLoader 的配置文件。在大多数情况下,您只需修改文件中的 `user` 和 `password` 字段。 + - **auditloader.jar**:AuditLoader 的 JAR 文件。 + - **plugin.properties**:AuditLoader 的属性文件。您无需修改此文件。 + - **plugin.conf**:AuditLoader 的配置文件。在大多数情况下,您只需修改文件中的 `user` 和 `password` 字段。 -3. 修改 **plugin.conf** 以配置 AuditLoader。您必须配置以下项以确保 AuditLoader 正常工作: +3. 
修改 **plugin.conf** 以配置 AuditLoader。您必须配置以下各项,以确保 AuditLoader 正常工作: - - `frontend_host_port`:FE IP 地址和 HTTP 端口,格式为 `:`。建议将其设置为默认值 `127.0.0.1:8030`。StarRocks 中的每个 FE 独立管理自己的审计日志,安装插件后,每个 FE 将启动自己的后台线程来获取并保存审计日志,并通过 Stream Load 将其写入。`frontend_host_port` 配置项用于为插件的后台 Stream Load 任务提供 HTTP 协议的 IP 和端口,此参数不支持多个值。参数的 IP 部分可以使用集群中任何 FE 的 IP,但不推荐这样做,因为如果相应的 FE 崩溃,其他 FE 后台的审计日志写入任务也会因为通信失败而失败。建议将其设置为默认值 `127.0.0.1:8030`,这样每个 FE 使用自己的 HTTP 端口进行通信,从而避免在其他 FE 出现异常时影响通信(所有写入任务最终都将转发到 FE Leader 节点执行)。 - - `database`:您为存储审计日志而创建的数据库名称。 - - `table`:您为存储审计日志而创建的表名称。 - - `user`:您的集群用户名。您必须具有向表中加载数据(LOAD_PRIV)的权限。 - - `password`:您的用户密码。 - - `secret_key`:用于加密密码的密钥(字符串,长度不得超过 16 字节)。如果未设置此参数,则表示 **plugin.conf** 中的密码不会被加密,您只需在 `password` 中指定明文密码。如果指定此参数,则表示密码由该密钥加密,您需要在 `password` 中指定加密字符串。加密密码可以在 StarRocks 中使用 `AES_ENCRYPT` 函数生成:`SELECT TO_BASE64(AES_ENCRYPT('password','secret_key'));`。 - - `filter`:审计日志加载的过滤条件。此参数基于 Stream Load 中的 [WHERE 参数](../../sql-reference/sql-statements/loading_unloading/STREAM_LOAD.md#opt_properties),即 `-H “where: ”`,默认为空字符串。示例:`filter=isQuery=1 and clientIp like '127.0.0.1%' and user='root'`。 + - `frontend_host_port`:FE 的 IP 地址和 HTTP 端口,格式为 `:`。建议将其设置为默认值 `127.0.0.1:8030`。StarRocks 中的每个 FE 都独立管理其审计日志,安装插件后,每个 FE 都会启动自己的后台线程来抓取和保存审计日志,并通过 Stream Load 将其写入。`frontend_host_port` 配置项用于为插件的后台 Stream Load 任务提供 HTTP 协议的 IP 和端口,此参数不支持多个值。参数的 IP 部分可以使用集群中任何 FE 的 IP,但不建议这样做,因为如果相应的 FE 崩溃,其他 FE 后台的审计日志写入任务也会因通信失败而失败。建议将其设置为默认值 `127.0.0.1:8030`,这样每个 FE 都使用自己的 HTTP 端口进行通信,从而避免在其他 FE 出现异常时影响通信(所有写入任务最终都会转发到 FE Leader 节点执行)。 + - `database`:您创建的用于存储审计日志的数据库名称。 + - `table`:您创建的用于存储审计日志的表名称。 + - `user`:您的集群用户名。您必须拥有将数据加载 (LOAD_PRIV) 到表中的权限。 + - `password`:您的用户密码。 + - `secret_key`:用于加密密码的密钥(字符串,不能超过 16 字节)。如果未设置此参数,则表示 **plugin.conf** 中的密码不会被加密,您只需在 `password` 中指定明文密码。如果指定了此参数,则表示密码已通过此密钥加密,您需要在 `password` 中指定加密字符串。加密密码可以在 StarRocks 中使用 `AES_ENCRYPT` 函数生成:`SELECT TO_BASE64(AES_ENCRYPT('password','secret_key'));`。 + - `filter`:审计日志加载的过滤条件。此参数基于 Stream Load 中的 [WHERE 参数](../../sql-reference/sql-statements/loading_unloading/STREAM_LOAD.md#opt_properties),即 `-H “where: ”`,默认为空字符串。示例:`filter=isQuery=1 and clientIp like '127.0.0.1%' and user='root'`。 -4. 将文件重新打包。 +4. 将文件重新打包。 ```shell zip -q -m -r auditloader.zip auditloader.jar plugin.conf plugin.properties ``` -5. 将软件包分发到所有托管 FE 节点的机器上。确保所有软件包都存储在相同的路径中。否则,安装将失败。分发软件包后,请记住复制软件包的绝对路径。 +5. 将软件包分发到所有托管 FE 节点的机器上。确保所有软件包都存储在相同的路径中。否则,安装将失败。分发软件包后,请记住复制软件包的绝对路径。 > **注意** > - > 您也可以将 **auditloader.zip** 分发到所有 FE 可访问的 HTTP 服务(例如,`httpd` 或 `nginx`),并通过网络安装。请注意,在两种情况下,**auditloader.zip** 在安装执行后都需要持久化在路径中,并且源文件在安装后不应被删除。 + > 您还可以将 **auditloader.zip** 分发到所有 FE 均可访问的 HTTP 服务(例如,`httpd` 或 `nginx`),并通过网络进行安装。请注意,在这两种情况下,**auditloader.zip** 都需要在执行安装后保留在路径中,并且安装后不应删除源文件。 ## 安装 AuditLoader -执行以下语句以及您复制的路径,将 AuditLoader 作为插件安装到 StarRocks 中: +执行以下语句以及您复制的路径,以在 StarRocks 中将 AuditLoader 作为插件安装: ```SQL INSTALL PLUGIN FROM ""; ``` -从本地软件包安装的示例: +从本地软件包安装示例: ```SQL INSTALL PLUGIN FROM ""; ``` -如果您想通过网络路径安装插件,您需要在 INSTALL 语句的 properties 中提供软件包的 md5。 +如果您想通过网络路径安装插件,您需要在 INSTALL 语句的 PROPERTIES 中提供软件包的 MD5 值。 示例: @@ -132,11 +132,11 @@ INSTALL PLUGIN FROM ""; INSTALL PLUGIN FROM "http://xx.xx.xxx.xxx/extra/auditloader.zip" PROPERTIES("md5sum" = "3975F7B880C9490FE95F42E2B2A28E2D"); ``` -有关详细说明,请参阅 [INSTALL PLUGIN](../../sql-reference/sql-statements/cluster-management/plugin/INSTALL_PLUGIN.md)。 +详细说明请参阅 [INSTALL PLUGIN](../../sql-reference/sql-statements/cluster-management/plugin/INSTALL_PLUGIN.md)。 ## 验证安装并查询审计日志 -1. 
您可以通过 [SHOW PLUGINS](../../sql-reference/sql-statements/cluster-management/plugin/SHOW_PLUGINS.md) 检查安装是否成功。 +1. 您可以通过 [SHOW PLUGINS](../../sql-reference/sql-statements/cluster-management/plugin/SHOW_PLUGINS.md) 检查安装是否成功。 在以下示例中,插件 `AuditLoader` 的 `Status` 为 `INSTALLED`,表示安装成功。 @@ -156,7 +156,7 @@ INSTALL PLUGIN FROM "http://xx.xx.xxx.xxx/extra/auditloader.zip" PROPERTIES("md5 *************************** 2. row *************************** Name: AuditLoader Type: AUDIT - Description: 适用于 3.3.11+ 版本。将审计日志加载到 StarRocks,用户可以查看查询的统计信息 + Description: Available for versions 3.3.11+. Load audit log to starrocks, and user can view the statistic of queries Version: 5.0.0 JavaVersion: 11 ClassName: com.starrocks.plugin.audit.AuditLoaderPlugin @@ -167,9 +167,9 @@ INSTALL PLUGIN FROM "http://xx.xx.xxx.xxx/extra/auditloader.zip" PROPERTIES("md5 2 rows in set (0.01 sec) ``` -2. 执行一些随机 SQLs 以生成审计日志,并等待 60 秒(或您在配置 AuditLoader 时在 `max_batch_interval_sec` 项中指定的时间)以允许 AuditLoader 将审计日志加载到 StarRocks 中。 +2. 执行一些随机 SQL 语句以生成审计日志,并等待 60 秒(或您在配置 AuditLoader 时在 `max_batch_interval_sec` 项中指定的时间),以允许 AuditLoader 将审计日志加载到 StarRocks 中。 -3. 通过查询表检查审计日志。 +3. 通过查询表来检查审计日志。 ```SQL SELECT * FROM starrocks_audit_db__.starrocks_audit_tbl__; @@ -218,4 +218,5 @@ INSTALL PLUGIN FROM "http://xx.xx.xxx.xxx/extra/auditloader.zip" PROPERTIES("md5 UNINSTALL PLUGIN AuditLoader; ``` -AuditLoader 的日志打印在 **fe.log** 中,您可以通过在 **fe.log** 中搜索关键字 `audit` 来检索它们。所有配置都正确设置后,您可以按照上述步骤再次安装 AuditLoader。 +AuditLoader 的日志会打印在 **fe.log** 中,您可以通过在 **fe.log** 中搜索关键词 `audit` 来检索它们。所有配置正确设置后,您可以按照上述步骤再次安装 AuditLoader。 +``` diff --git a/docs/zh/administration/management/compaction.md b/docs/zh/administration/management/compaction.md index b1a8f14..faa170f 100644 --- a/docs/zh/administration/management/compaction.md +++ b/docs/zh/administration/management/compaction.md @@ -4,89 +4,89 @@ displayed_sidebar: docs # 共享数据集群的 Compaction -本主题介绍如何在 StarRocks 共享数据集群中管理 Compaction。 +本主题介绍了如何在 StarRocks 共享数据集群中管理 Compaction。 ## 概述 -StarRocks 中的每次数据加载操作都会生成新版本的数据文件。Compaction 将不同版本的数据文件合并成更大的文件,从而减少小文件的数量并提高查询效率。 +StarRocks 中的每次数据加载操作都会生成新版本的数据文件。Compaction 将不同版本的数据文件合并成更大的文件,减少小文件数量,提高查询效率。 ## Compaction 分数 ### 概述 -*Compaction 分数*反映了分区中数据文件的合并状态。分数越高,表示合并进度越低,这意味着该分区有更多未合并的数据文件版本。FE 维护每个分区的 Compaction 分数信息,包括最大 Compaction 分数(分区中所有 Tablet 的最高分数)。 +*Compaction 分数* 反映了分区中数据文件的合并状态。分数越高表示合并进度越低,意味着该分区有更多未合并的数据文件版本。FE 维护每个分区的 Compaction 分数信息,包括最大 Compaction 分数(分区中所有 Tablet 的最高分数)。 -如果一个分区的最大 Compaction 分数低于 FE 参数 `lake_compaction_score_selector_min_score`(默认值:10),则该分区的 Compaction 被视为完成。最大 Compaction 分数超过 100 表示 Compaction 状态不健康。当分数超过 FE 参数 `lake_ingest_slowdown_threshold`(默认值:100)时,系统会减慢该分区的数据加载事务提交速度。如果它超过 `lake_compaction_score_upper_bound`(默认值:2000),系统会拒绝该分区的导入事务。 +如果分区的最大 Compaction 分数低于 FE 参数 `lake_compaction_score_selector_min_score`(默认值:10),则该分区的 Compaction 被认为已完成。最大 Compaction 分数超过 100 表示 Compaction 处于不健康状态。当分数超过 FE 参数 `lake_ingest_slowdown_threshold`(默认值:100)时,系统会减慢该分区的数据加载事务提交速度。如果超过 `lake_compaction_score_upper_bound`(默认值:2000),系统将拒绝该分区的导入事务。 ### 计算规则 -通常,每个数据文件对 Compaction 分数的贡献为 1。例如,如果一个分区有一个 Tablet,并且从第一次加载操作中生成了 10 个数据文件,则该分区的最大 Compaction 分数为 10。一个 Tablet 内由事务生成的所有数据文件都作为 Rowset 进行分组。 +通常,每个数据文件对 Compaction 分数贡献 1。例如,如果一个分区有一个 Tablet,并且第一次加载操作生成了 10 个数据文件,则该分区的最大 Compaction 分数为 10。一个事务在 Tablet 内生成的所有数据文件都被分组为一个 Rowset。 -在分数计算期间,Tablet 的 Rowset 会按大小分组,文件数量最多的组决定了该 Tablet 的 Compaction 分数。 +在分数计算过程中,Tablet 的 Rowset 按大小分组,文件数量最多的组决定了 Tablet 的 Compaction 分数。 -例如,一个 Tablet 经历了 7 次加载操作,生成了大小分别为:100 MB、100 MB、100 MB、10 MB、10 MB、10 MB 和 10 MB 的 Rowset。在计算过程中,系统会将三个 
100 MB 的 Rowset 组成一组,将四个 10 MB 的 Rowset 组成另一组。Compaction 分数是根据文件数量较多的组计算的。在这种情况下,第二组的 Compaction 分数更大。Compaction 优先处理分数较高的组,因此在第一次 Compaction 后,Rowset 的分布将是:100 MB、100 MB、100 MB 和 40 MB。 +例如,一个 Tablet 进行了 7 次加载操作,生成了大小分别为:100 MB、100 MB、100 MB、10 MB、10 MB、10 MB 和 10 MB 的 Rowset。在计算过程中,系统会将三个 100 MB 的 Rowset 分成一组,将四个 10 MB 的 Rowset 分成另一组。Compaction 分数是根据文件数量更多的组来计算的。在本例中,第二组具有更高的 Compaction 分数。Compaction 优先处理分数更高的组,因此在第一次 Compaction 后,Rowset 分布将是:100 MB、100 MB、100 MB 和 40 MB。 ## Compaction 工作流程 -对于共享数据集群,StarRocks 引入了一种新的 FE 控制的 Compaction 机制: +对于共享数据集群,StarRocks 引入了一种新的由 FE 控制的 Compaction 机制: -1. 分数计算:Leader FE 节点根据事务发布结果计算并存储分区的 Compaction 分数。 -2. 候选选择:FE 选择具有最高最大 Compaction 分数的分区作为 Compaction 候选。 -3. 任务生成:FE 为选定的分区启动 Compaction 事务,生成 Tablet 级别的子任务,并将其分派给计算节点 (CN),直到达到 FE 参数 `lake_compaction_max_tasks` 设置的限制。 -4. 子任务执行:CN 在后台执行 Compaction 子任务。每个 CN 的并发子任务数量由 CN 参数 `compact_threads` 控制。 -5. 结果收集:FE 聚合子任务结果并提交 Compaction 事务。 -6. 发布:FE 发布成功提交的 Compaction 事务。 +1. **分数计算**:Leader FE 节点根据事务发布结果计算并存储分区的 Compaction 分数。 +2. **候选选择**:FE 选择最大 Compaction 分数最高的分区作为 Compaction 候选。 +3. **任务生成**:FE 为选定的分区启动 Compaction 事务,生成 Tablet 级别的子任务,并将其分发给计算节点 (CN),直到达到 FE 参数 `lake_compaction_max_tasks` 设置的限制。 +4. **子任务执行**:CN 在后台执行 Compaction 子任务。每个 CN 的并发子任务数量由 CN 参数 `compact_threads` 控制。 +5. **结果收集**:FE 聚合子任务结果并提交 Compaction 事务。 +6. **发布**:FE 发布成功提交的 Compaction 事务。 ## 管理 Compaction ### 查看 Compaction 分数 -- 您可以使用 SHOW PROC 语句查看特定表中分区的 Compaction 分数。通常,您只需关注 `MaxCS` 字段。如果 `MaxCS` 低于 10,则认为 Compaction 已完成。如果 `MaxCS` 高于 100,则 Compaction 分数相对较高。如果 `MaxCS` 超过 500,则 Compaction 分数非常高,可能需要手动干预。 - - ```Plain - SHOW PARTITIONS FROM - SHOW PROC '/dbs///partitions' - ``` - - 示例: - - ```Plain - mysql> SHOW PROC '/dbs/load_benchmark/store_sales/partitions'; - +-------------+---------------+----------------+----------------+-------------+--------+--------------+-------+------------------------------+---------+----------+-----------+----------+------------+-------+-------+-------+ - | PartitionId | PartitionName | CompactVersion | VisibleVersion | NextVersion | State | PartitionKey | Range | DistributionKey | Buckets | DataSize | RowCount | CacheTTL | AsyncWrite | AvgCS | P50CS | MaxCS | - +-------------+---------------+----------------+----------------+-------------+--------+--------------+-------+------------------------------+---------+----------+-----------+----------+------------+-------+-------+-------+ - | 38028 | store_sales | 913 | 921 | 923 | NORMAL | | | ss_item_sk, ss_ticket_number | 64 | 15.6GB | 273857126 | 2592000 | false | 10.00 | 10.00 | 10.00 | - +-------------+---------------+----------------+----------------+-------------+--------+--------------+-------+------------------------------+---------+----------+-----------+----------+------------+-------+-------+-------+ - 1 row in set (0.20 sec) - ``` - -- 您还可以通过查询系统定义的视图 `information_schema.partitions_meta` 来查看分区 Compaction 分数。 - - 示例: - - ```Plain - mysql> SELECT * FROM information_schema.partitions_meta ORDER BY Max_CS LIMIT 10; - +--------------+----------------------------+----------------------------+--------------+-----------------+-----------------+----------------------+--------------+---------------+-----------------+-----------------------------------------+---------+-----------------+----------------+---------------------+-----------------------------+--------------+---------+-----------+------------+------------------+----------+--------+--------+-------------------------------------------------------------------+ - | DB_NAME | TABLE_NAME | PARTITION_NAME 
| PARTITION_ID | COMPACT_VERSION | VISIBLE_VERSION | VISIBLE_VERSION_TIME | NEXT_VERSION | PARTITION_KEY | PARTITION_VALUE | DISTRIBUTION_KEY | BUCKETS | REPLICATION_NUM | STORAGE_MEDIUM | COOLDOWN_TIME | LAST_CONSISTENCY_CHECK_TIME | IS_IN_MEMORY | IS_TEMP | DATA_SIZE | ROW_COUNT | ENABLE_DATACACHE | AVG_CS | P50_CS | MAX_CS | STORAGE_PATH | - +--------------+----------------------------+----------------------------+--------------+-----------------+-----------------+----------------------+--------------+---------------+-----------------+-----------------------------------------+---------+-----------------+----------------+---------------------+-----------------------------+--------------+---------+-----------+------------+------------------+----------+--------+--------+-------------------------------------------------------------------+ - | tpcds_1t | call_center | call_center | 11905 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | cc_call_center_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 12.3KB | 42 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11906/11905 | - | tpcds_1t | web_returns | web_returns | 12030 | 3 | 3 | 2024-03-17 08:40:48 | 4 | | | wr_item_sk, wr_order_number | 16 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 3.5GB | 71997522 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/12031/12030 | - | tpcds_1t | warehouse | warehouse | 11847 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | w_warehouse_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 4.2KB | 20 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11848/11847 | - | tpcds_1t | ship_mode | ship_mode | 11851 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | sm_ship_mode_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 1.7KB | 20 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11852/11851 | - | tpcds_1t | customer_address | customer_address | 11790 | 0 | 2 | 2024-03-17 08:32:19 | 3 | | | ca_address_sk | 16 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 120.9MB | 6000000 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11791/11790 | - | tpcds_1t | time_dim | time_dim | 11855 | 0 | 2 | 2024-03-17 08:30:48 | 3 | | | t_time_sk | 16 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 864.7KB | 86400 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11856/11855 | - | tpcds_1t | web_sales | web_sales | 12049 | 3 | 3 | 2024-03-17 10:14:20 | 4 | | | ws_item_sk, ws_order_number | 128 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 47.7GB | 720000376 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/12050/12049 | - | tpcds_1t | store | store | 11901 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | s_store_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 95.6KB | 1002 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11902/11901 | - | tpcds_1t | web_site | web_site | 11928 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | web_site_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 13.4KB | 54 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11929/11928 | - | tpcds_1t | household_demographics | household_demographics | 11932 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | hd_demo_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 2.1KB | 7200 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11933/11932 | - 
+--------------+----------------------------+----------------------------+--------------+-----------------+-----------------+----------------------+--------------+---------------+-----------------+-----------------------------------------+---------+-----------------+----------------+---------------------+-----------------------------+--------------+---------+-----------+------------+------------------+----------+--------+--------+-------------------------------------------------------------------+ - ``` +- 您可以使用 SHOW PROC 语句查看特定表中分区的 Compaction 分数。通常,您只需要关注 `MaxCS` 字段。如果 `MaxCS` 低于 10,则认为 Compaction 已完成。如果 `MaxCS` 高于 100,则 Compaction 分数相对较高。如果 `MaxCS` 超过 500,则 Compaction 分数非常高,可能需要手动干预。 + + ```Plain + SHOW PARTITIONS FROM + SHOW PROC '/dbs///partitions' + ``` + + 示例: + + ```Plain + mysql> SHOW PROC '/dbs/load_benchmark/store_sales/partitions'; + +-------------+---------------+----------------+----------------+-------------+--------+--------------+-------+------------------------------+---------+----------+-----------+----------+------------+-------+-------+-------+ + | PartitionId | PartitionName | CompactVersion | VisibleVersion | NextVersion | State | PartitionKey | Range | DistributionKey | Buckets | DataSize | RowCount | CacheTTL | AsyncWrite | AvgCS | P50CS | MaxCS | + +-------------+---------------+----------------+----------------+-------------+--------+--------------+-------+------------------------------+---------+----------+-----------+----------+------------+-------+-------+-------+ + | 38028 | store_sales | 913 | 921 | 923 | NORMAL | | | ss_item_sk, ss_ticket_number | 64 | 15.6GB | 273857126 | 2592000 | false | 10.00 | 10.00 | 10.00 | + +-------------+---------------+----------------+----------------+-------------+--------+--------------+-------+------------------------------+---------+----------+-----------+----------+------------+-------+-------+-------+ + 1 row in set (0.20 sec) + ``` + +- 您还可以通过查询系统定义的视图 `information_schema.partitions_meta` 来查看分区 Compaction 分数。 + + 示例: + + ```Plain + mysql> SELECT * FROM information_schema.partitions_meta ORDER BY Max_CS LIMIT 10; + +--------------+----------------------------+----------------------------+--------------+-----------------+-----------------+----------------------+--------------+---------------+-----------------+-----------------------------------------+---------+-----------------+----------------+---------------------+-----------------------------+--------------+---------+-----------+------------+------------------+----------+--------+--------+-------------------------------------------------------------------+ + | DB_NAME | TABLE_NAME | PARTITION_NAME | PARTITION_ID | COMPACT_VERSION | VISIBLE_VERSION | VISIBLE_VERSION_TIME | NEXT_VERSION | PARTITION_KEY | PARTITION_VALUE | DISTRIBUTION_KEY | BUCKETS | REPLICATION_NUM | STORAGE_MEDIUM | COOLDOWN_TIME | LAST_CONSISTENCY_CHECK_TIME | IS_IN_MEMORY | IS_TEMP | DATA_SIZE | ROW_COUNT | ENABLE_DATACACHE | AVG_CS | P50_CS | MAX_CS | STORAGE_PATH | + +--------------+----------------------------+----------------------------+--------------+-----------------+-----------------+----------------------+--------------+---------------+-----------------+-----------------------------------------+---------+-----------------+----------------+---------------------+-----------------------------+--------------+---------+-----------+------------+------------------+----------+--------+--------+-------------------------------------------------------------------+ + | tpcds_1t | call_center | call_center 
| 11905 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | cc_call_center_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 12.3KB | 42 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11906/11905 | + | tpcds_1t | web_returns | web_returns | 12030 | 3 | 3 | 2024-03-17 08:40:48 | 4 | | | wr_item_sk, wr_order_number | 16 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 3.5GB | 71997522 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/12031/12030 | + | tpcds_1t | warehouse | warehouse | 11847 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | w_warehouse_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 4.2KB | 20 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11848/11847 | + | tpcds_1t | ship_mode | ship_mode | 11851 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | sm_ship_mode_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 1.7KB | 20 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11852/11851 | + | tpcds_1t | customer_address | customer_address | 11790 | 0 | 2 | 2024-03-17 08:32:19 | 3 | | | ca_address_sk | 16 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 120.9MB | 6000000 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11791/11790 | + | tpcds_1t | time_dim | time_dim | 11855 | 0 | 2 | 2024-03-17 08:30:48 | 3 | | | t_time_sk | 16 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 864.7KB | 86400 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11856/11855 | + | tpcds_1t | web_sales | web_sales | 12049 | 3 | 3 | 2024-03-17 10:14:20 | 4 | | | ws_item_sk, ws_order_number | 128 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 47.7GB | 720000376 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/12050/12049 | + | tpcds_1t | store | store | 11901 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | s_store_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 95.6KB | 1002 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11902/11901 | + | tpcds_1t | web_site | web_site | 11928 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | web_site_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 13.4KB | 54 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11929/11928 | + | tpcds_1t | household_demographics | household_demographics | 11932 | 0 | 2 | 2024-03-17 08:30:47 | 3 | | | hd_demo_sk | 1 | 1 | HDD | 9999-12-31 23:59:59 | NULL | 0 | 0 | 2.1KB | 7200 | 0 | 0 | 0 | 0 | s3://XXX/536a3c77-52c3-485a-8217-781734a970b1/db10328/11933/11932 | + +--------------+----------------------------+----------------------------+--------------+-----------------+-----------------+----------------------+--------------+---------------+-----------------+-----------------------------------------+---------+-----------------+----------------+---------------------+-----------------------------+--------------+---------+-----------+------------+------------------+----------+--------+--------+-------------------------------------------------------------------+ + ``` ### 查看 Compaction 任务 -当新数据加载到系统时,FE 会不断调度 Compaction 任务在不同的 CN 节点上执行。您可以先在 FE 上查看 Compaction 任务的总体状态,然后查看每个任务在 CN 上的执行详情。 +随着新数据加载到系统,FE 会不断调度 Compaction 任务在不同的 CN 节点上执行。您可以首先在 FE 上查看 Compaction 任务的总体状态,然后查看每个任务在 CN 上的执行详情。 -#### 查看 Compaction 任务的总体状态 +#### 查看 Compaction 任务总体状态 您可以使用 SHOW PROC 语句查看 Compaction 任务的总体状态。 @@ -109,27 +109,27 @@ mysql> SHOW PROC '/compactions'; 返回以下字段: -- `Partition`: `Compaction` 任务所属的分区。 -- `TxnID`: 分配给 `Compaction` 任务的事务 ID。 -- `StartTime`: `Compaction` 
任务开始的时间。`NULL` 表示任务尚未启动。 -- `CommitTime`: `Compaction` 任务提交数据的时间。`NULL` 表示数据尚未提交。 -- `FinishTime`: `Compaction` 任务发布数据的时间。`NULL` 表示数据尚未发布。 -- `Error`: `Compaction` 任务的错误信息(如果有)。 -- `Profile`: (从 v3.2.12 和 v3.3.4 开始支持)`Compaction` 任务完成后的 Profile。 - - `sub_task_count`: 分区中的子任务(相当于 Tablet)数量。 - - `read_local_sec`: 所有子任务从本地缓存读取数据的总耗时。单位:秒。 - - `read_local_mb`: 所有子任务从本地缓存读取数据的总大小。单位:MB。 - - `read_remote_sec`: 所有子任务从远端存储读取数据的总耗时。单位:秒。 - - `read_remote_mb`: 所有子任务从远端存储读取数据的总大小。单位:MB。 - - `read_segment_count`: 所有子任务读取的文件总数。 - - `write_segment_count`: 所有子任务生成的新文件总数。 - - `write_segment_mb`: 所有子任务生成的新文件总大小。单位:MB。 - - `write_remote_sec`: 所有子任务写入远端存储的总耗时。单位:秒。 - - `in_queue_sec`: 所有子任务在队列中停留的总时间。单位:秒。 - -#### 查看 Compaction 任务的执行详情 - -每个 Compaction 任务都分为多个子任务,每个子任务对应一个 Tablet。您可以通过查询系统定义的视图 `information_schema.be_cloud_native_compactions` 来查看每个子任务的执行详情。 +- `Partition`:Compaction 任务所属的分区。 +- `TxnID`:分配给 Compaction 任务的事务 ID。 +- `StartTime`:Compaction 任务开始时间。`NULL` 表示任务尚未启动。 +- `CommitTime`:Compaction 任务提交数据的时间。`NULL` 表示数据尚未提交。 +- `FinishTime`:Compaction 任务发布数据的时间。`NULL` 表示数据尚未发布。 +- `Error`:Compaction 任务的错误信息(如果有)。 +- `Profile`:(v3.2.12 和 v3.3.4 起支持)Compaction 任务完成后的 Profile。 + - `sub_task_count`:分区中的子任务数量(等同于 Tablet 数量)。 + - `read_local_sec`:所有子任务从本地缓存读取数据的总耗时。单位:秒。 + - `read_local_mb`:所有子任务从本地缓存读取数据的总大小。单位:MB。 + - `read_remote_sec`:所有子任务从远端存储读取数据的总耗时。单位:秒。 + - `read_remote_mb`:所有子任务从远端存储读取数据的总大小。单位:MB。 + - `read_segment_count`:所有子任务读取的文件总数。 + - `write_segment_count`:所有子任务生成的新文件总数。 + - `write_segment_mb`:所有子任务生成的新文件总大小。单位:MB。 + - `write_remote_sec`:所有子任务向远端存储写入数据的总耗时。单位:秒。 + - `in_queue_sec`:所有子任务在队列中的总停留时间。单位:秒。 + +#### 查看 Compaction 任务执行详情 + +每个 Compaction 任务被分成多个子任务,每个子任务对应一个 Tablet。您可以通过查询系统定义的视图 `information_schema.be_cloud_native_compactions` 来查看每个子任务的执行详情。 示例: @@ -141,35 +141,35 @@ mysql> SELECT * FROM information_schema.be_cloud_native_compactions; | 10001 | 51047 | 43034 | 12 | 0 | 1 | 2024-09-24 19:15:15 | NULL | 82 | | {"read_local_sec":0,"read_local_mb":31,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":1900,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} | | 10001 | 51048 | 43032 | 12 | 0 | 1 | 2024-09-24 19:15:15 | NULL | 82 | | {"read_local_sec":0,"read_local_mb":32,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":1900,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} | | 10001 | 51049 | 43033 | 12 | 0 | 1 | 2024-09-24 19:15:15 | NULL | 82 | | {"read_local_sec":0,"read_local_mb":31,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":1900,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} | -| 10001 | 51051 | 43038 | 9 | 0 | 1 | 2024-09-24 19:15:15 | NULL | 84 | | {"read_local_sec":0,"read_local_mb":31,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":1900,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} | -| 10001 | 51052 | 43036 | 12 | 0 | 0 | NULL | NULL | 0 | | | -| 10001 | 51053 | 43035 | 12 | 0 | 1 | 2024-09-24 19:15:16 | NULL | 2 | | {"read_local_sec":0,"read_local_mb":1,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":100,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} | 
-+-------+--------+-----------+---------+---------+------+---------------------+-------------+----------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ + | 10001 | 51051 | 43038 | 9 | 0 | 1 | 2024-09-24 19:15:15 | NULL | 84 | | {"read_local_sec":0,"read_local_mb":31,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":1900,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} | + | 10001 | 51052 | 43036 | 12 | 0 | 0 | NULL | NULL | 0 | | | + | 10001 | 51053 | 43035 | 12 | 0 | 1 | 2024-09-24 19:15:16 | NULL | 2 | | {"read_local_sec":0,"read_local_mb":1,"read_remote_sec":0,"read_remote_mb":0,"read_remote_count":0,"read_local_count":100,"segment_init_sec":0,"column_iterator_init_sec":0,"in_queue_sec":0} | + +-------+--------+-----------+---------+---------+------+---------------------+-------------+----------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ``` 返回以下字段: -- `BE_ID`: CN 的 ID。 -- `TXN_ID`: 子任务所属事务的 ID。 -- `TABLET_ID`: 子任务所属 Tablet 的 ID。 -- `VERSION`: Tablet 的版本。 -- `RUNS`: 子任务已执行的次数。 -- `START_TIME`: 子任务开始的时间。 -- `FINISH_TIME`: 子任务完成的时间。 -- `PROGRESS`: Tablet 的 Compaction 进度百分比。 -- `STATUS`: 子任务的状态。如果出现错误,此字段将返回错误消息。 -- `PROFILE`: (从 v3.2.12 和 v3.3.4 开始支持)子任务的运行时 Profile。 - - `read_local_sec`: 子任务从本地缓存读取数据的耗时。单位:秒。 - - `read_local_mb`: 子任务从本地缓存读取数据的大小。单位:MB。 - - `read_remote_sec`: 子任务从远端存储读取数据的耗时。单位:秒。 - - `read_remote_mb`: 子任务从远端存储读取数据的大小。单位:MB。 - - `read_local_count`: 子任务从本地缓存读取数据的次数。 - - `read_remote_count`: 子任务从远端存储读取数据的次数。 - - `in_queue_sec`: 子任务在队列中停留的时间。单位:秒。 +- `BE_ID`:CN 的 ID。 +- `TXN_ID`:子任务所属事务的 ID。 +- `TABLET_ID`:子任务所属 Tablet 的 ID。 +- `VERSION`:Tablet 的版本。 +- `RUNS`:子任务已执行的次数。 +- `START_TIME`:子任务开始时间。 +- `FINISH_TIME`:子任务结束时间。 +- `PROGRESS`:Tablet 的 Compaction 进度(百分比)。 +- `STATUS`:子任务的状态。如果发生错误,此字段将返回错误信息。 +- `PROFILE`:(v3.2.12 和 v3.3.4 起支持)子任务的运行时 Profile。 + - `read_local_sec`:子任务从本地缓存读取数据的耗时。单位:秒。 + - `read_local_mb`:子任务从本地缓存读取数据的大小。单位:MB。 + - `read_remote_sec`:子任务从远端存储读取数据的耗时。单位:秒。 + - `read_remote_mb`:子任务从远端存储读取数据的大小。单位:MB。 + - `read_local_count`:子任务从本地缓存读取数据的次数。 + - `read_remote_count`:子任务从远端存储读取数据的次数。 + - `in_queue_sec`:子任务在队列中的停留时间。单位:秒。 ### 配置 Compaction 任务 -您可以使用这些 FE 和 CN (BE) 参数配置 Compaction 任务。 +您可以使用这些 FE 和 CN (BE) 参数来配置 Compaction 任务。 #### FE 参数 @@ -181,12 +181,12 @@ ADMIN SET FRONTEND CONFIG ("lake_compaction_max_tasks" = "-1"); ##### lake_compaction_max_tasks -- 默认值: -1 -- 类型: Int -- 单位: - -- 是否可变: Yes -- 描述: 共享数据集群中允许的最大并发 Compaction 任务数。将此项设置为 `-1` 表示以自适应方式计算并发任务数,即存活 CN 节点数乘以 16。将此值设置为 `0` 将禁用 Compaction。 -- 引入版本: v3.1.0 +- 默认值:-1 +- 类型:Int +- 单位:- +- 是否可变:是 +- 描述:共享数据集群中允许的最大并发 Compaction 任务数量。将此项设置为 `-1` 表示以自适应方式计算并发任务数量,即存活 CN 节点的数量乘以 16。将此值设置为 `0` 将禁用 Compaction。 +- 引入版本:v3.1.0 ```SQL ADMIN SET FRONTEND CONFIG ("lake_compaction_disable_tables" = "11111;22222"); @@ -194,30 +194,30 @@ ADMIN SET FRONTEND CONFIG ("lake_compaction_disable_tables" = "11111;22222"); ##### lake_compaction_disable_tables -- 默认值:"" -- 类型:String -- 单位:- -- 是否可变:Yes -- 描述:禁用某些表的 Compaction。这不会影响已开始的 Compaction。此项的值是表 ID。多个值用 ';' 分隔。 -- 引入版本:v3.2.7 +- 默认值:"" +- 类型:String +- 单位:- +- 是否可变:是 +- 描述:禁用某些表的 Compaction。这不会影响已开始的 Compaction。此项的值是表 ID。多个值用分号分隔。 +- 引入版本:v3.2.7 #### CN 参数 您可以动态配置以下 CN 参数。 ```SQL -UPDATE 
information_schema.be_configs SET VALUE = 8
+UPDATE information_schema.be_configs SET VALUE = 8
 WHERE name = "compact_threads";
 ```

 ##### compact_threads

-- 默认值: 4
-- 类型: Int
-- 单位: -
-- 是否可变: Yes
-- 描述: 用于并发 Compaction 任务的最大线程数。此配置从 v3.1.7 和 v3.2.2 起变为动态。
-- 引入版本: v3.0.0
+- 默认值:4
+- 类型:Int
+- 单位:-
+- 是否可变:是
+- 描述:用于并发 Compaction 任务的最大线程数。此配置从 v3.1.7 和 v3.2.2 起改为动态。
+- 引入版本:v3.0.0

 > **注意**
 >
 > 在生产环境中,建议将 `compact_threads` 设置为 BE/CN CPU 核心数的 25%。

 ##### max_cumulative_compaction_num_singleton_deltas

-- 默认值: 500
-- 类型: Int
-- 单位: -
-- 是否可变: Yes
-- 描述: 单个 Cumulative Compaction 中可以合并的最大 Segment 数。如果 Compaction 期间发生 OOM,您可以减小此值。
-- 引入版本: -
+- 默认值:500
+- 类型:Int
+- 单位:-
+- 是否可变:是
+- 描述:单个 Cumulative Compaction 中可以合并的最大段数。如果在 Compaction 期间发生 OOM,您可以减小此值。
+- 引入版本:-

 > **注意**
 >
-> 在生产环境中,建议将 `max_cumulative_compaction_num_singleton_deltas` 设置为 `100` 以加速 Compaction 任务并减少其资源消耗。
+> 在生产环境中,建议将 `max_cumulative_compaction_num_singleton_deltas` 设置为 `100`,以加速 Compaction 任务并减少其资源消耗。

 ##### lake_pk_compaction_max_input_rowsets

-- 默认值: 500
-- 类型: Int
-- 单位: -
-- 是否可变: Yes
-- 描述: 共享数据集群中 Primary Key 表 Compaction 任务允许的最大输入 Rowset 数。此参数的默认值已从 v3.2.4 和 v3.1.10 起从 `5` 更改为 `1000`,并从 v3.3.1 和 v3.2.9 起更改为 `500`。为 Primary Key 表启用分级 Compaction 策略(通过将 `enable_pk_size_tiered_compaction_strategy` 设置为 `true`)后,StarRocks 无需限制每次 Compaction 的 Rowset 数量以减少写入放大。因此,此参数的默认值已增加。
-- 引入版本: v3.1.8, v3.2.3
+- 默认值:500
+- 类型:Int
+- 单位:-
+- 是否可变:是
+- 描述:共享数据集群中主键表 Compaction 任务允许的最大输入 Rowset 数量。此参数的默认值从 `5` 更改为 `1000`(自 v3.2.4 和 v3.1.10 起),并更改为 `500`(自 v3.3.1 和 v3.2.9 起)。在为主键表启用分层 Compaction 策略(通过将 `enable_pk_size_tiered_compaction_strategy` 设置为 `true`)后,StarRocks 无需限制每次 Compaction 的 Rowset 数量来减少写入放大。因此,此参数的默认值有所增加。
+- 引入版本:v3.1.8, v3.2.3

 > **注意**
 >
-> 在生产环境中,建议将 `max_cumulative_compaction_num_singleton_deltas` 设置为 `100` 以加速 Compaction 任务并减少其资源消耗。
+> 在生产环境中,建议将 `max_cumulative_compaction_num_singleton_deltas` 设置为 `100`,以加速 Compaction 任务并减少其资源消耗。

 ### 手动触发 Compaction 任务

 ```SQL
--- Trigger compaction for the whole table.
+-- 触发整个表的 Compaction。
 ALTER TABLE <table_name> COMPACT;
--- Trigger compaction for a specific partition.
+-- 触发特定分区的 Compaction。
 ALTER TABLE <table_name> COMPACT <partition_name>;
--- Trigger compaction for multiple partitions.
+-- 触发多个分区的 Compaction。
 ALTER TABLE <table_name> COMPACT (<partition_name1>, <partition_name2>, ...);
 ```
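+以下补充一个示意性的完整操作流程(其中的库表名 `sales_db.store_sales` 与分区名 `p20240101`、`p20240102` 均为假设值,请替换为实际名称):
+
+```SQL
+-- 对指定分区手动触发 Compaction(表名、分区名仅为示例)
+ALTER TABLE sales_db.store_sales COMPACT (p20240101, p20240102);
+-- 随后查看任务是否已被调度,以及其提交/发布时间
+SHOW PROC '/compactions';
+```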

@@ -270,34 +270,34 @@ CANCEL COMPACTION WHERE TXN_ID = <txn_id>;

> **NOTE**
>
> - The CANCEL COMPACTION statement must be submitted from the Leader FE node.
> - The CANCEL COMPACTION statement only applies to transactions that have not yet been committed, that is, transactions whose `CommitTime` is NULL in the result returned by `SHOW PROC '/compactions'`.
> - CANCEL COMPACTION is an asynchronous process. You can check whether the task has been canceled by executing `SHOW PROC '/compactions'`.

## Best practices

Because Compaction is critical to query performance, you are advised to regularly monitor the data merge status of tables and partitions. The following are some best practices and guidelines:

- Try to increase the interval between loads (avoid intervals of less than 10 seconds), and increase the batch size of each load (avoid batch sizes of less than 100 rows of data).
- Tune the number of parallel Compaction worker threads on CNs to speed up task execution. In production environments, you are advised to set `compact_threads` to 25% of the number of BE/CN CPU cores.
- Monitor Compaction task status with `show proc '/compactions'` and `select * from information_schema.be_cloud_native_compactions;`.
- Monitor the Compaction Score and configure alerts based on it. The built-in Grafana monitoring template of StarRocks includes this metric.
- Pay attention to resource consumption during Compaction, especially memory usage. The Grafana monitoring template also includes this metric.
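
As a quick health pass over the first three practices, you can run the two statements named above back to back. A minimal sketch:

```SQL
-- Cluster-wide view of Compaction transactions (check CommitTime and FinishTime here).
SHOW PROC '/compactions';

-- Per-subtask runtime details on each CN.
SELECT * FROM information_schema.be_cloud_native_compactions;
```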

## Troubleshooting

### Slow queries

To identify slow queries caused by untimely Compaction, check the value of `SegmentsReadCount` divided by `TabletCount` within a single Fragment in the SQL Profile. If the value is large, for example, several dozen or more, untimely Compaction is likely the cause of the slow query.

### The maximum Compaction Score in the cluster is too high

1. Use `ADMIN SHOW FRONTEND CONFIG LIKE "%lake_compaction%"` and `SELECT * FROM information_schema.be_configs WHERE name = "compact_threads"` to check whether the Compaction-related parameters are within a reasonable range.
2. Use `SHOW PROC '/compactions'` to check whether Compaction is stuck:
   - If `CommitTime` stays NULL, check the system view `information_schema.be_cloud_native_compactions` to find out why Compaction is stuck.
   - If `FinishTime` stays NULL, search the Leader FE log with the `TxnID` for the reason the publish failed.
3. Use `SHOW PROC '/compactions'` to check whether Compaction is running slowly:
   - If `sub_task_count` is too large (use `SHOW PARTITIONS` to check the size of each tablet in the partition), the table may be poorly designed.
   - If `read_remote_mb` is too large (more than 30% of the total data read), check the server disk size, and check the cache quota via the `DataCacheMetrics` field of `SHOW BACKENDS`.
   - If `write_remote_sec` is too large (more than 90% of the total Compaction time), writes to remote storage may be too slow. You can verify this by checking the shared-data-specific monitoring metrics with the keywords `single upload latency` and `multi upload latency`.
   - If `in_queue_sec` is too large (the average wait time per tablet exceeds 60 seconds), the parameter settings may be unreasonable, or other running Compaction tasks are too slow.

diff --git a/docs/zh/administration/management/configuration.mdx b/docs/zh/administration/management/configuration.mdx
new file mode 100644
index 0000000..9df74b9
--- /dev/null
+++ b/docs/zh/administration/management/configuration.mdx

---
displayed_sidebar: docs
---

# Configuration

Configuration parameters for FE and BE nodes.

import DocCardList from '@theme/DocCardList';

<DocCardList />

diff --git a/docs/zh/administration/management/enable_fqdn.md b/docs/zh/administration/management/enable_fqdn.md
new file mode 100644
index 0000000..0b2b8bf
--- /dev/null
+++ b/docs/zh/administration/management/enable_fqdn.md

---
displayed_sidebar: docs
---

# Enable FQDN Access

This topic describes how to enable cluster access by using a fully qualified domain name (FQDN). An FQDN is the **complete domain name** of a specific entity that can be accessed on the Internet. An FQDN consists of two parts: the hostname and the domain name.

Before v2.4, StarRocks supported access to FEs and BEs through IP addresses only. Even if an FQDN was used to add a node to the cluster, it was eventually converted into an IP address. This caused great inconvenience for DBAs, because changing the IP addresses of certain nodes in a StarRocks cluster could lead to node access failures. In v2.4, StarRocks decouples each node from its IP address. You can now manage nodes in StarRocks solely through their FQDNs.

## Prerequisites

To enable FQDN access for a StarRocks cluster, make sure the following requirements are met:

- Every machine in the cluster must have a hostname.

- In the **/etc/hosts** file on each machine, you must specify the IP addresses and the corresponding FQDNs of the other machines in the cluster.

- The IP addresses in the **/etc/hosts** file must be unique.
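
For example, the **/etc/hosts** file on each machine might look like this (the hostnames and IP addresses below are hypothetical):

```Plain
192.168.10.1 fe01.example.com fe01
192.168.10.2 fe02.example.com fe02
192.168.10.3 fe03.example.com fe03
192.168.10.4 be01.example.com be01
```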

## Set up a new cluster with FQDN access

By default, FE nodes in a new cluster are started with IP address access. To start a new cluster with FQDN access, you must run the following command to start the FE nodes **the first time you start the cluster**:

```Shell
./bin/start_fe.sh --host_type FQDN --daemon
```

The property `--host_type` specifies the access method used to start the node. Valid values include `FQDN` and `IP`. You only need to specify this property once, the first time you start the node.

Each BE node identifies itself by the `BE Address` defined in the FE metadata. Therefore, you do **not** need to specify `--host_type` when you start the BE nodes. If the `BE Address` defines a BE node with an FQDN, the BE node identifies itself by this FQDN.

## Enable FQDN access in an existing cluster

To enable FQDN access in an existing cluster that was previously started with IP addresses, you must first **upgrade** StarRocks to v2.4.0 or later.

### Enable FQDN access for FE nodes

You need to enable FQDN access for all non-Leader Follower FE nodes first, before enabling it for the Leader FE node.

> **NOTE**
>
> Make sure the cluster has at least three Follower FE nodes before you enable FQDN access for FE nodes.

#### Enable FQDN access for non-Leader Follower FE nodes

1. Navigate to the deployment directory of the FE node, and run the following command to stop the FE node:

   ```Shell
   ./bin/stop_fe.sh
   ```

2. Execute the following statement via your MySQL client to check the `Alive` status of the FE node you stopped. Wait until the `Alive` status becomes `false`.

   ```SQL
   SHOW PROC '/frontends'\G
   ```

3. Execute the following statement to replace the IP address with the FQDN.

   ```SQL
   ALTER SYSTEM MODIFY FRONTEND HOST "<fe_ip>" TO "<fe_hostname>";
   ```

4. Run the following command to start the FE node with FQDN access.

   ```Shell
   ./bin/start_fe.sh --host_type FQDN --daemon
   ```

   The property `--host_type` specifies the access method used to start the node. Valid values include `FQDN` and `IP`. You only need to specify this property once when you restart the node after modifying it.

5. Check the `Alive` status of the FE node. Wait until the `Alive` status becomes `true`.

   ```SQL
   SHOW PROC '/frontends'\G
   ```

6. After the `Alive` status of the current FE node becomes `true`, repeat the preceding steps to enable FQDN access for the other non-Leader Follower FE nodes one by one.

#### Enable FQDN access for the Leader FE node

After all non-Leader FE nodes have been successfully modified and restarted, you can enable FQDN access for the Leader FE node.

> **NOTE**
>
> Before FQDN access is enabled for the Leader FE node, the FQDNs used to add nodes to the cluster are still converted into the corresponding IP addresses. After the cluster elects a Leader FE node with FQDN access enabled, FQDNs are no longer converted into IP addresses.

1. Navigate to the deployment directory of the Leader FE node, and run the following command to stop the Leader FE node.

   ```Shell
   ./bin/stop_fe.sh
   ```

2. Execute the following statement via your MySQL client to check whether the cluster has elected a new Leader FE node.

   ```SQL
   SHOW PROC '/frontends'\G
   ```

   Any FE node whose `Alive` status is `true` and whose `isMaster` is `true` is a running Leader FE.

3. Execute the following statement to replace the IP address with the FQDN.

   ```SQL
   ALTER SYSTEM MODIFY FRONTEND HOST "<fe_ip>" TO "<fe_hostname>";
   ```

4. Run the following command to start the FE node with FQDN access.

   ```Shell
   ./bin/start_fe.sh --host_type FQDN --daemon
   ```

   The property `--host_type` specifies the access method used to start the node. Valid values include `FQDN` and `IP`. You only need to specify this property once when you restart the node after modifying it.

5. Check the `Alive` status of the FE node.

   ```Plain
   SHOW PROC '/frontends'\G
   ```

   If the `Alive` status becomes `true`, the FE node has been successfully modified and added to the cluster as a Follower FE node.

### Enable FQDN access for BE nodes

Execute the following statement via your MySQL client, replacing the IP address with the FQDN, to enable FQDN access for a BE node.

```SQL
ALTER SYSTEM MODIFY BACKEND HOST "<be_ip>" TO "<be_hostname>";
```

> **NOTE**
>
> You do **not** need to restart the BE node after enabling FQDN access.

## Rollback

To roll back a StarRocks cluster with FQDN access enabled to an earlier version that does not support FQDN access, you must first enable IP address access for all the nodes in the cluster. You can refer to [Enable FQDN access in an existing cluster](#enable-fqdn-access-in-an-existing-cluster) for general guidance, but the SQL commands need to be changed to the following:

- Enable IP address access for an FE node:

```SQL
ALTER SYSTEM MODIFY FRONTEND HOST "<fe_hostname>" TO "<fe_ip>";
```

- Enable IP address access for a BE node:

```SQL
ALTER SYSTEM MODIFY BACKEND HOST "<be_hostname>" TO "<be_ip>";
```

The modifications take effect after your cluster is restarted successfully.

## FAQ

**Q: An error occurs when I enable FQDN access for an FE node: "required 1 replica. But none were active with this master". What should I do?**

A: Make sure the cluster has at least three Follower FE nodes before you enable FQDN access for FE nodes.

**Q: Can I add a new node to a cluster with FQDN access enabled by using an IP address?**

A: Yes.

diff --git a/docs/zh/administration/management/graceful_exit.md b/docs/zh/administration/management/graceful_exit.md
new file mode 100644
index 0000000..ac31119
--- /dev/null
+++ b/docs/zh/administration/management/graceful_exit.md

---
displayed_sidebar: docs
---

# Graceful Exit

From v3.3 onwards, StarRocks supports graceful exit.

## Overview

Graceful exit is a mechanism designed to support **interruption-free upgrades and restarts** of StarRocks FE, BE, and CN nodes. Its primary goal is to minimize the impact on running queries and data ingestion tasks during maintenance operations such as node restarts, rolling upgrades, or cluster scaling.

Graceful exit ensures that:

- A node **stops accepting new tasks** once it starts exiting;
- Existing queries and load jobs are allowed to **finish** within a controlled time window;
- System components (FE/BE/CN) **coordinate state changes** so that the cluster reroutes traffic correctly.

The graceful exit mechanisms differ between FE and BE/CN nodes, as described below.

### FE graceful exit mechanism

#### Trigger signal

FE graceful exit is triggered by:

```bash
stop_fe.sh -g
```

This sends a `SIGUSR1` signal, whereas a default exit (without the `-g` option) sends a `SIGTERM` signal.

#### Load balancer awareness

Upon receiving the signal:

- The FE immediately returns **HTTP 500** on the `/api/health` endpoint.
- The load balancer detects the degraded state within about 15 seconds and stops routing new connections to this FE.

#### Connection draining and shutdown logic

**Follower FE**

- Handles read-only queries.
- If the FE node has no active sessions, connections are closed immediately.
- If SQL is running, the FE node waits for the execution to finish before closing the session.

**Leader FE**

- Read requests are handled the same way as on a Follower.
- Handling write requests requires:

  - Shutting down BDBJE.
  - Allowing a new Leader election to complete.
  - Redirecting subsequent writes to the newly elected Leader.

#### Timeout control

If a query runs for too long, the FE forcibly exits after **60 seconds** (configurable via the `--timeout` option).

### BE/CN graceful exit mechanism

#### Trigger signal

BE graceful exit is triggered by:

```bash
stop_be.sh -g
```

CN graceful exit is triggered by:

```bash
stop_cn.sh -g
```

This sends a `SIGTERM` signal, whereas a default exit (without the `-g` option) sends a `SIGKILL` signal.

#### State transition

Upon receiving the signal:

- The BE/CN node marks itself as **exiting**.
- It rejects **new query fragments** by returning `INTERNAL_ERROR`.
- It continues processing existing fragments.

#### Wait loop for in-flight queries

How long a BE/CN waits for existing fragments to finish is controlled by the BE/CN configuration `loop_count_wait_fragments_finish` (default: 2). The actual wait duration equals `loop_count_wait_fragments_finish × 10 seconds` (that is, 20 seconds by default). If fragments remain after the timeout, the BE/CN proceeds with the normal shutdown (closing threads, network, and other processes).

#### Improved FE awareness

From v3.4 onwards, the FE no longer marks a BE/CN as `DEAD` based on heartbeat failures. It correctly recognizes the "exiting" state of the BE/CN, allowing a longer graceful exit window for fragments to finish.

## Configuration

### FE configuration

#### `stop_fe.sh -g --timeout`

- Description: The maximum wait time before the FE is forcibly terminated.
- Default: 60 (seconds)
- How to apply: Specify it in the script command, for example, `--timeout 120`.

#### *Minimum LB detection time*

- Description: The LB needs at least 15 seconds to detect the degraded state.
- Default: 15 (seconds)
- How to apply: Fixed value

### BE/CN configuration

#### `loop_count_wait_fragments_finish`

- Description: How long the BE/CN waits for existing fragments. Multiply the value by 10 seconds.
- Default: 2
- How to apply: Modify it in the BE/CN configuration file or update it dynamically.

#### `graceful_exit_wait_for_frontend_heartbeat`

- Description: Whether the BE/CN waits for the FE to confirm the **SHUTDOWN** via heartbeat. From v3.4.5 onwards.
- Default: false
- How to apply: Modify it in the BE/CN configuration file or update it dynamically.

#### `stop_be.sh -g --timeout`, `stop_cn.sh -g --timeout`

- Description: The maximum wait time before the BE/CN is forcibly terminated. Set it to a value greater than `loop_count_wait_fragments_finish` * 10 to prevent the node from being terminated before its wait duration is reached.
- Default: false
- How to apply: Specify it in the script command, for example, `--timeout 30`.
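
For example, to give in-flight fragments roughly 60 seconds to drain, you could raise `loop_count_wait_fragments_finish` dynamically. A minimal sketch, assuming the `information_schema.be_configs` update mechanism shown for `compact_threads` earlier in this patch also applies to this item:

```SQL
-- 6 × 10 seconds = roughly 60 seconds of drain time before shutdown proceeds.
UPDATE information_schema.be_configs SET VALUE = 6
WHERE name = "loop_count_wait_fragments_finish";
```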

### Global switch

From v3.4 onwards, graceful exit is **enabled by default**. To disable it temporarily, set the BE/CN configuration `loop_count_wait_fragments_finish` to `0`.

## Expected behavior during graceful exit

### Query workloads

| Query type | Expected behavior |
| --- | --- |
| **Short queries (less than 20 seconds)** | The BE/CN waits long enough, so the queries finish normally. |
| **Medium queries (20 to 60 seconds)** | Queries that finish within the BE/CN wait window return successfully; otherwise they are canceled and must be retried manually. |
| **Long queries (more than 60 seconds)** | Queries are most likely terminated by the FE/BE/CN due to timeout and must be retried manually. |

### Data ingestion tasks

- **Load jobs via the Flink or Kafka connectors** are retried automatically; users do not perceive the interruption.
- **Stream Load (without a framework), Broker Load, and Routine Load jobs** may fail if the connection is interrupted. They must be retried manually.
- **Background tasks** are automatically rescheduled and executed by the FE retry mechanism.

### Upgrade and restart operations

Graceful exit ensures:

- No cluster-wide downtime;
- Safe rolling upgrades by draining nodes one by one.

## Limitations and version differences

### Behavior differences between versions

| Version | Behavior |
| --- | --- |
| **v3.3** | BE graceful exit is flawed: the FE may prematurely mark the BE/CN as `DEAD`, causing queries to be canceled. The effective wait time is limited (15 seconds by default). |
| **v3.4+** | Longer wait durations are fully supported; the FE correctly recognizes the "exiting" state of the BE/CN. Recommended for production environments. |

### Operational limitations

- In extreme cases (for example, a hung BE/CN), graceful exit may fail. Terminating the process requires `kill -9`, which may carry a risk of partially persisted data (recoverable via snapshots).

## Usage

### Prerequisites

**StarRocks version**:

- **v3.3+**: Basic graceful exit support.
- **v3.4+**: Enhanced state management and longer wait windows (up to several minutes).

**Configuration**:

- Make sure `loop_count_wait_fragments_finish` is set to a positive integer.
- Set `graceful_exit_wait_for_frontend_heartbeat` to `true` so that the FE can detect the "exiting" state of the BE.

### Perform an FE graceful exit

```bash
./bin/stop_fe.sh -g --timeout 60
```

Parameters:

- `--timeout`: The maximum wait time before the FE node is forcibly terminated.

Behavior:

- The system first sends a `SIGUSR1` signal.
- After the timeout, it falls back to `SIGKILL`.

#### Verify FE status

You can check FE health via the following API:

```
http://<fe_host>:8030/api/health
```

The LB removes the node after receiving consecutive non-200 responses.

### Perform a BE/CN graceful exit

- **For v3.3:**

  - BE:

    ```bash
    ./be/bin/stop_be.sh -g
    ```

  - CN:

    ```bash
    ./be/bin/stop_cn.sh -g
    ```

- **For v3.4+:**

  - BE:

    ```bash
    ./bin/stop_be.sh -g --timeout 600
    ```

  - CN:

    ```bash
    ./bin/stop_cn.sh -g --timeout 600
    ```

If no fragments remain, the BE/CN exits immediately.

#### Verify BE/CN status

Run the following on the FE:

```sql
SHOW BACKENDS;
```

`StatusCode`:

- `SHUTDOWN`: The BE/CN graceful exit is in progress.
- `DISCONNECTED`: The BE/CN node has fully exited.

## Rolling upgrade workflow

### Steps

1. Perform a graceful exit on node `A`.
2. Confirm that node `A` shows as `DISCONNECTED` on the FE side.
3. Upgrade and restart node `A`.
4. Repeat the preceding steps for the remaining nodes.

### Monitor the graceful exit

Check the FE log `fe.log`, the BE log `be.log`, or the CN log `cn.log` to make sure no tasks remain during the exit.

## Troubleshooting

### BE/CN exits due to timeout

If tasks fail to finish during a graceful exit, the BE/CN triggers a forced termination (`SIGKILL`). Verify whether this is caused by overly long task durations or by misconfiguration (for example, a `--timeout` value that is too small).

### Node status is not SHUTDOWN

If the node status is not `SHUTDOWN`, verify that `loop_count_wait_fragments_finish` is set to a positive integer, and check whether the BE/CN reported a heartbeat before exiting (if not, set `graceful_exit_wait_for_frontend_heartbeat` to `true`).
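
When debugging a stalled exit, it can help to confirm both settings at once. A minimal sketch, assuming the `information_schema.be_configs` view shown earlier in this patch exposes these two items by name:

```SQL
-- Check the current graceful-exit settings on each BE/CN.
SELECT * FROM information_schema.be_configs
WHERE name IN ("loop_count_wait_fragments_finish",
               "graceful_exit_wait_for_frontend_heartbeat");
```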