Support sink decoupling during backfill for CREATE SINK INTO TABLE #19285

Open
hzxa21 opened this issue Nov 7, 2024 · 1 comment

Comments

Collaborator

hzxa21 commented Nov 7, 2024

Is your feature request related to a problem? Please describe.

Backfilling can backpressure upstream, causing existing streaming jobs to slow down or even get stuck. There are three cases where backfilling can happen:

  1. CREATE MV
  2. CREATE SINK with connector
  3. CREATE SINK INTO TABLE

The current ways to mitigate the effect of backfilling on upstream:

  • SET BACKFILL_RATE_LIMIT to xxx. Supported for 1, 2, 3.
  • SET sink_decouple to true (default on). Supported for 2.
  • SET streaming_use_snapshot_backfill to true (default off, experimental now). Supported for 1.

The only effective mitigation for case 3 is a rate limit, which requires manual operation and an understanding of the workload before a good value can be chosen. Therefore, I think we should also support sink decoupling for sink into table. This is also a prerequisite for doing serverless backfill for sink into table.

Describe the solution you'd like

There are two ways to implement sink decoupling for sink into table:

  1. Use kv log store for SINK INTO TABLE, similar to what we did for sink with connector.
  2. Record L0 changelog and support snapshot backfilling for SINK INTO TABLE, similar to what we did for MV.
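
To make the intended effect of decoupling concrete, here is a toy per-tick model, not RisingWave code. The function name `simulate`, the rates, and the "rows per tick" unit are all assumptions for illustration. It contrasts a coupled sink, where upstream can only emit what the downstream table write accepts, with a decoupled one, where upstream appends to a log store and the sink drains it at its own pace.

```python
def simulate(upstream_rate, sink_rate, ticks, decoupled):
    """Toy model of one upstream feeding one sink (hypothetical units: rows per tick).

    Returns (rows emitted by upstream, rows left buffered in the log store).
    """
    backlog = 0        # rows buffered in the log store (always 0 when coupled)
    upstream_done = 0  # rows the upstream managed to emit

    for _ in range(ticks):
        if decoupled:
            # Upstream appends to the log store and moves on without waiting.
            emitted = upstream_rate
            backlog += emitted
            # The sink drains the log store at its own pace.
            backlog -= min(backlog, sink_rate)
        else:
            # Backpressure: upstream is throttled to what the sink accepts.
            emitted = min(upstream_rate, sink_rate)
        upstream_done += emitted

    return upstream_done, backlog


coupled = simulate(upstream_rate=100, sink_rate=40, ticks=10, decoupled=False)
decoupled = simulate(upstream_rate=100, sink_rate=40, ticks=10, decoupled=True)
print(coupled, decoupled)
```

In this model the decoupled upstream keeps its full rate, at the cost of a backlog that grows for as long as the sink is slower than the upstream.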

Describe alternatives you've considered

No response

Additional context

No response

@github-actions github-actions bot added this to the release-2.2 milestone Nov 7, 2024
Contributor

st1page commented Nov 7, 2024

The general idea LGTM. I think we need a more detailed design to ensure that the data in the log store can converge to 0.
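
One rough way to read the convergence concern, as a back-of-the-envelope sketch rather than a design: the data in the log store only shrinks if the sink consumes faster than upstream keeps writing. The helper `ticks_to_drain` and its constant-rate assumption are hypothetical and only illustrate the arithmetic.

```python
def ticks_to_drain(backlog, upstream_rate, sink_rate):
    """Estimate how many ticks a log store backlog needs to reach 0.

    Assumes constant rates (rows per tick), which real workloads will not have.
    Returns None when the backlog cannot converge under these rates.
    """
    if backlog <= 0:
        return 0  # nothing buffered right now
    net = sink_rate - upstream_rate  # rows removed from the backlog per tick
    if net <= 0:
        return None  # sink is not faster than upstream: backlog never shrinks
    return -(-backlog // net)  # ceiling division


print(ticks_to_drain(600, 30, 40))
print(ticks_to_drain(600, 40, 40))
```

Under these assumptions a backlog of 600 rows drains in 60 ticks when the sink is 10 rows per tick faster than upstream, and never drains when the two rates are equal.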
