cloudstorage: fix a bug that may cause the storage sink to get stuck (#12142) #12146

Merged

ti-chi-bot
Member

This is an automated cherry-pick of #12142

What problem does this PR solve?

Issue Number: close #9162

What is changed and how it works?

Check List

Tests

  • Unit test
  • Manual test
  1. I created a container on Azure.

  2. I modified the WriteFile function as follows so that a context timeout error is more likely to occur.

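// WriteFile wraps the underlying WriteFile call with a timeout.
// Test-only change: every 10th call uses a 5ms timeout to provoke
// "context deadline exceeded". counter is presumably an atomic call
// counter added for this test; its declaration is not shown here.
func (s *extStorageWithTimeout) WriteFile(ctx context.Context, name string, data []byte) error {
	timeout := s.timeout
	if counter.Load()%10 == 0 {
		// Make a timeout more likely.
		timeout = 5 * time.Millisecond
	}
	counter.Add(1)
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel() // release the timer as soon as the call returns
	return s.ExternalStorage.WriteFile(ctx, name, data)
}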
  3. I compiled the modified CDC and created a changefeed to synchronize data to Azure Blob Storage.
./cdc cli changefeed create -c test --sink-uri="azure://dongmen-test/cdc?protocol=canal-json&account-name=${}&account-key=${}"
  4. I wrote data to TiDB.
./workload -database-host 127.0.0.1 -database-port 4000 -database-db-name "test" -table-count 4 -workload-type large_row -total-row-count 10000 -action prepare -thread 1
  5. I observed the following warning message from the changefeed.
./cdc cli changefeed list
[
  {
    "id": "test",
    "namespace": "default",
    "summary": {
      "state": "warning",
      "tso": 457412143773646869,
      "checkpoint": "2025-04-17 19:20:54.117",
      "error": {
        "time": "2025-04-17T19:21:20.841014+08:00",
        "addr": "127.0.0.1:8300",
        "code": "CDC:ErrProcessorUnknown",
        "message": "Failed to write azure blob file, file info: bucket(container)='dongmen-test', key='cdc/test/large_row_1/457412143747432452/2025-04-17/CDC00000000000000000001.json': context deadline exceeded"
      }
    }
  }
]
  6. The changefeed retried the operation in the sink module when encountering this error and quickly recovered.
./cdc cli changefeed list
[
  {
    "id": "test",
    "namespace": "default",
    "summary": {
      "state": "normal",
      "tso": 457412177498472454,
      "checkpoint": "2025-04-17 19:23:02.767",
      "error": null
    }
  }
]
  7. I checked the Azure Blob Storage, and all the data was synchronized correctly.
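
The state check in the steps above could be automated rather than read by eye. The following is a minimal sketch, assuming only the JSON shape shown in the `changefeed list` output above; the `changefeed` struct and the `abnormalFeeds` helper are hypothetical names introduced for illustration and are not part of the CDC code base.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// changefeed mirrors only the fields of the listing that this check needs.
// The struct is an illustration, not a type from TiCDC.
type changefeed struct {
	ID      string `json:"id"`
	Summary struct {
		State string `json:"state"`
		Error *struct {
			Code    string `json:"code"`
			Message string `json:"message"`
		} `json:"error"`
	} `json:"summary"`
}

// abnormalFeeds (hypothetical helper) returns the IDs of all changefeeds
// whose state is anything other than "normal".
func abnormalFeeds(raw []byte) ([]string, error) {
	var feeds []changefeed
	if err := json.Unmarshal(raw, &feeds); err != nil {
		return nil, err
	}
	var ids []string
	for _, f := range feeds {
		if f.Summary.State != "normal" {
			ids = append(ids, f.ID)
		}
	}
	return ids, nil
}

func main() {
	// Trimmed copies of the two listings from the manual test.
	warning := []byte(`[{"id":"test","namespace":"default","summary":{"state":"warning","error":{"code":"CDC:ErrProcessorUnknown","message":"context deadline exceeded"}}}]`)
	normal := []byte(`[{"id":"test","namespace":"default","summary":{"state":"normal","error":null}}]`)

	for _, raw := range [][]byte{warning, normal} {
		ids, err := abnormalFeeds(raw)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Println("abnormal changefeeds:", len(ids))
	}
}
```

In a real run, the input would come from the output of `./cdc cli changefeed list` rather than from inline literals.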

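The above test process verifies that the calls underlying the WriteFile function respect the context timeout: the shortened 5 ms deadline surfaced as a "context deadline exceeded" error instead of leaving the write stuck, which validates the effectiveness of the fix.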

Moreover, this test verifies that the changefeed can quickly recover from similar errors.

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Fix a bug that may cause a changefeed with a storage sink to get stuck.

ti-chi-bot added the release-note, size/L, and type/cherry-pick-for-release-6.5 labels Apr 17, 2025
Contributor

ti-chi-bot bot commented Apr 17, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: asddongmen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot bot added the approved label Apr 17, 2025
ti-chi-bot bot added the cherry-pick-approved label and removed the do-not-merge/cherry-pick-not-approved label Apr 28, 2025
ti-chi-bot bot merged commit b2e1e55 into pingcap:release-6.5 Apr 28, 2025
9 of 10 checks passed
Labels
approved, cherry-pick-approved, lgtm, release-note, size/L, type/cherry-pick-for-release-6.5