[PROTOCOL RFC] Table Redirection Feature #3702

Open
1 of 3 tasks
kamcheungting-db opened this issue Sep 20, 2024 · 0 comments

Protocol Change Request

Overview

Currently, DeltaLog lacks a seamless method for migrating an existing table to new storage. Users must establish their own lengthy and complex data cloning procedures. This feature request proposes a new table functionality that allows an existing Delta table to be redirected to a new storage location. Once the redirection process is complete, the table's data, Delta log, checkpoint, and checksum files would be cloned to the new storage location. All subsequent workloads would then be managed on the new storage location.

Description of the protocol change

The detailed proposal and the required protocol changes are sketched out in this doc.

At a high level, we propose two new features for Delta tables: redirectReaderWriter and redirectWriterOnly. Both features are similar, but with distinct functionalities. The redirectReaderWriter feature blocks both read and write queries from Delta clients that have not implemented this feature. In contrast, the redirectWriterOnly feature only blocks write queries from such clients.
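The difference between the two features, as seen by a client that has not implemented redirection, can be sketched as follows. This is only an illustration of the blocking rule described above, not the Delta implementation: the `Feature` enum and the `legacy_client_may` function are hypothetical names introduced here.

```python
from enum import Enum


class Feature(Enum):
    # Feature names taken from the proposal; the enum itself is hypothetical.
    REDIRECT_READER_WRITER = "redirectReaderWriter"
    REDIRECT_WRITER_ONLY = "redirectWriterOnly"


def legacy_client_may(operation: str, table_features: set) -> bool:
    """Return True if a client WITHOUT redirection support may run `operation`.

    `operation` is "read" or "write"; `table_features` is the set of
    redirection features enabled on the table.
    """
    if operation not in ("read", "write"):
        raise ValueError(f"unknown operation: {operation!r}")
    if Feature.REDIRECT_READER_WRITER in table_features:
        # Reader-writer feature: legacy clients are blocked entirely.
        return False
    if Feature.REDIRECT_WRITER_ONLY in table_features:
        # Writer-only feature: legacy clients may still read the source table.
        return operation == "read"
    # No redirection feature enabled: nothing is blocked.
    return True
```

For example, `legacy_client_may("read", {Feature.REDIRECT_WRITER_ONLY})` is allowed, while the same read against a table with `Feature.REDIRECT_READER_WRITER` is blocked.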

These table features include the following capabilities:

  1. RedirectReaderWriter: This feature supports redirecting both read and write operations from the source to the destination, while blocking read and write operations from legacy Delta clients.
  2. RedirectWriterOnly: This feature allows redirecting read and write operations from the source to the destination but only blocks write operations from legacy Delta clients, permitting reads from the source tables.
  3. Both features should enable tables to be redirected to different storage and catalogs without ambiguity.
  4. Time Traveling:
    4.1. Fully supports time-travel queries.
    4.2. Allows restoring to any version before or after the table redirect commit, without reversing the table redirect property.
  5. Neither feature should cause noticeable performance regression.
  6. While these table features are being enabled or dropped, all committed transactions should appear correctly in the redirected table, while uncommitted transactions should not be visible.
  7. There should be clear guidelines on which queries can or cannot be processed at each stage of the enablement and disablement procedures.
  8. The source table should remain in a valid state if a user cancels the redirect table feature.

Willingness to contribute

The Delta Lake Community encourages protocol innovations. Would you or another member of your organization be willing to contribute this feature to the Delta Lake code base?

  • Yes. I can contribute.
  • Yes. I would be willing to contribute with guidance from the Delta Lake community.
  • No. I cannot contribute at this time.
@kamcheungting-db kamcheungting-db changed the title [PROTOCOL RFC] Table Redirection feature [PROTOCOL RFC] Table Redirection Feature Sep 20, 2024
vkorukanti pushed a commit that referenced this issue Oct 25, 2024

#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description


This PR introduces a new reader-writer table feature, "redirection". This
table feature redirects read and write queries from the current
storage location to a new storage location described in the value of
the table feature.

The redirection proceeds in several phases to avoid anomalies. To label these
phases, we introduce four states:

0. NO-REDIRECT: This state indicates that redirect is not enabled on the
table.
1. ENABLE-REDIRECT-IN-PROGRESS: This state indicates that the redirect
process is still going on. No DML or DDL transaction can be committed to
the table when the table is in this state.
2. REDIRECT-READY: This state indicates that the redirect process is
completed. All types of queries are redirected to the table
specified inside the RedirectSpec object.
3. DROP-REDIRECT-IN-PROGRESS: The table redirection is under withdrawal
and the redirection property is going to be removed from the delta
table. In this state, the delta client stops redirecting new queries to
redirect destination tables, and only accepts read-only queries to
access the redirect source table.
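
The per-state behavior listed above can be sketched as a small routing function. This is a minimal illustration, not the code in this PR: `RedirectState` and `route` are hypothetical names, and allowing reads on the source table during ENABLE-REDIRECT-IN-PROGRESS is an assumption, since the description only states that no DML or DDL can be committed in that state.

```python
from enum import Enum


class RedirectState(Enum):
    # State names taken from the PR description; the enum is hypothetical.
    NO_REDIRECT = "NO-REDIRECT"
    ENABLE_REDIRECT_IN_PROGRESS = "ENABLE-REDIRECT-IN-PROGRESS"
    REDIRECT_READY = "REDIRECT-READY"
    DROP_REDIRECT_IN_PROGRESS = "DROP-REDIRECT-IN-PROGRESS"


def route(state: RedirectState, is_write: bool) -> str:
    """Return "source", "destination", or "reject" for a query."""
    if state is RedirectState.NO_REDIRECT:
        # Redirect is not enabled: everything runs on the source table.
        return "source"
    if state is RedirectState.REDIRECT_READY:
        # Redirect is complete: all queries go to the destination table.
        return "destination"
    # Both in-progress states: writes (DML/DDL) cannot be committed.
    # Reads on the source are stated for DROP-REDIRECT-IN-PROGRESS and
    # ASSUMED here for ENABLE-REDIRECT-IN-PROGRESS.
    return "reject" if is_write else "source"
```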

To avoid undefined behavior, only the following state transitions are
valid:

0. NO-REDIRECT -> ENABLE-REDIRECT-IN-PROGRESS
1. ENABLE-REDIRECT-IN-PROGRESS -> REDIRECT-READY
2. REDIRECT-READY -> DROP-REDIRECT-IN-PROGRESS
3. DROP-REDIRECT-IN-PROGRESS -> NO-REDIRECT
4. ENABLE-REDIRECT-IN-PROGRESS -> NO-REDIRECT
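
The five transitions above form a small state machine, which could be checked along these lines. This is a sketch only: `VALID_TRANSITIONS` and `is_valid_transition` are hypothetical names and do not reflect how the PR implements the check.

```python
# Allowed target states for each current state, copied from the list above.
VALID_TRANSITIONS = {
    "NO-REDIRECT": {"ENABLE-REDIRECT-IN-PROGRESS"},
    "ENABLE-REDIRECT-IN-PROGRESS": {"REDIRECT-READY", "NO-REDIRECT"},
    "REDIRECT-READY": {"DROP-REDIRECT-IN-PROGRESS"},
    "DROP-REDIRECT-IN-PROGRESS": {"NO-REDIRECT"},
}


def is_valid_transition(current: str, target: str) -> bool:
    """Return True if moving from `current` to `target` is a listed transition."""
    return target in VALID_TRANSITIONS.get(current, set())
```

Under this table, a direct jump such as NO-REDIRECT -> REDIRECT-READY is rejected, and cancelling an in-progress enablement (ENABLE-REDIRECT-IN-PROGRESS -> NO-REDIRECT) is accepted.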


The protocol RFC document is on:
#3702

## How was this patch tested?

Unit tests of the transitions between the different redirection states.


## Does this PR introduce _any_ user-facing changes?
No
vkorukanti pushed a commit that referenced this issue Oct 26, 2024