Skip to content

Conversation

@BohuTANG
Copy link
Member

@BohuTANG BohuTANG commented Dec 8, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

This PR fixes spill metrics and makes local disk spill control more explicit.

  1. Fix local vs remote spill metrics

    • Previously, is_local = true incorrectly updated RemoteSpill* metrics.
    • Now, is_local = true updates LocalSpill* metrics; remote spill still updates RemoteSpill*.
  2. Profile spill I/O by real storage target

    • Introduce a SpillTarget derived from StorageParams (Fs = local, others = remote).
    • All spill readers and writers use SpillTarget so metrics always reflect the actual spill location.
  3. Add per-operator local disk spill quotas

    • New [spill] config fields and SpillConfig members:
      • sort_spilling_disk_quota_ratio
      • window_partition_spilling_disk_quota_ratio
      • result_set_spilling_disk_quota_ratio
    • GlobalConfig::spill.*_spill_bytes_limit() converts these ratios into per-query byte limits from the global local-spill cap.
    • Sort and window operators now use these limits instead of session-level *_spilling_to_disk_bytes_limit() settings.
  4. Prepare HTTP result-set local spill (kept off by default)

    • Wire result_set_spilling_disk_quota_ratio, but keep its default at 0 because HTTP result-set spilling to local disk has a known bug.
  5. Spill config: disable implicit cache-disk local spill

  • When [spill] does not explicitly configure a local spill path, local spill is now disabled (no more auto-fallback to data cache disk).
  • To enable local spill, users must explicitly set:
    • [spill] + spill_local_disk_path

Configuration Example

[spill]
# Local spill root
spill_local_disk_path = "./.databend/temp/_query_spill"
# Global local spill limit in bytes (e.g. 100GB)
spill_local_disk_max_bytes = 107374182400

# Per-query share of the global local spill limit
sort_spilling_disk_quota_ratio = 60
window_partition_spilling_disk_quota_ratio = 40

# HTTP result-set local spill has a known bug; keep it disabled by default.
result_set_spilling_disk_quota_ratio = 0

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

This change is Reviewable


This change is Reviewable

@BohuTANG BohuTANG changed the title Fix spill read profile classification feat: spill profile metrics Dec 8, 2025
@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Dec 8, 2025
@BohuTANG BohuTANG marked this pull request as ready for review December 9, 2025 14:12
@chatgpt-codex-connector
Copy link

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@BohuTANG BohuTANG added the ci-cloud Build docker image for cloud test label Dec 9, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Dec 9, 2025

Docker Image for PR

  • tag: pr-19075-6202d95-1765292993

note: this image tag is only available for internal use.

Copy link
Collaborator

@dqhl76 dqhl76 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@BohuTANG BohuTANG added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Dec 10, 2025
@BohuTANG
Copy link
Member Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@github-actions
Copy link
Contributor

Docker Image for PR

  • tag: pr-19075-f72ab4b-1765338895

note: this image tag is only available for internal use.

@BohuTANG
Copy link
Member Author

BohuTANG commented Dec 10, 2025

Bench

TPCH-1000

EXPLAIN ANALYZE SELECT
    l_orderkey,
    l_partkey,
    l_quantity,
    l_extendedprice,
    ROW_NUMBER() OVER (PARTITION BY l_orderkey ORDER BY l_extendedprice DESC) AS row_num,
    RANK() OVER (PARTITION BY l_orderkey ORDER BY l_extendedprice DESC) AS rank_num
FROM
    lineitem ignore_result;

window spill to remote

Query Duration: 5m 1.6s

Window
    └── WindowPartition
        ├── cpu time: 1834.502748217s
        ├── wait time: 1801.584809958s
        ├── output rows: 6 billion
        ├── output bytes: 178.81 GiB
        ├── numbers remote spilled by write: 396
        ├── bytes remote spilled by write: 60.67 GiB
        ├── remote spilled time by write: 1559.21s
        ├── numbers remote spilled by read: 396
        ├── bytes remote spilled by read: 60.67 GiB
        ├── remote spilled time by read: 721.872s

window spill to local disk

Query Duration: 3m 51.7s

Window
    └── WindowPartition
        ├── cpu time: 1826.024185141s
        ├── wait time: 770.098310939s
        ├── output rows: 6 billion
        ├── output bytes: 178.81 GiB
        ├── numbers remote spilled by write: 7
        ├── bytes remote spilled by write: 941.30 MiB
        ├── remote spilled time by write: 22.882s
        ├── numbers remote spilled by read: 7
        ├── bytes remote spilled by read: 941.30 MiB
        ├── remote spilled time by read: 10.244s
        ├── numbers local spilled by write: 386
        ├── bytes local spilled by write: 59.99 GiB
        ├── local spilled time by write: 1157.282s
        ├── numbers local spilled by read: 386
        ├── bytes local spilled by read: 59.99 GiB
        ├── local spilled time by read: 41.194s

@BohuTANG BohuTANG added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Dec 10, 2025
@BohuTANG BohuTANG added ci-cloud Build docker image for cloud test and removed ci-cloud Build docker image for cloud test labels Dec 10, 2025
@github-actions
Copy link
Contributor

Docker Image for PR

  • tag: pr-19075-6ae79df-1765348228

note: this image tag is only available for internal use.

@BohuTANG BohuTANG merged commit d377ab0 into databendlabs:main Dec 10, 2025
96 checks passed
@github-actions
Copy link
Contributor

Docker Image for PR

  • tag: pr-19075-8864b41-1765349755

note: this image tag is only available for internal use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-cloud Build docker image for cloud test pr-feature this PR introduces a new feature to the codebase

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants