[PROTOCOL RFC] Table Redirection Feature #3702
Comments
kamcheungting-db changed the title from "[PROTOCOL RFC] Table Redirection feature" to "[PROTOCOL RFC] Table Redirection Feature" on Sep 20, 2024
vkorukanti pushed a commit that referenced this issue on Oct 25, 2024
#### Which Delta project/connector is this regarding?

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

This PR introduces a new reader-writer table feature, "redirection". The feature redirects read and write queries from the table's current storage location to a new storage location described in the value of the table feature.

Redirection proceeds in several phases to avoid anomalies. To label these phases, we introduce four states:

0. NO-REDIRECT: Redirect is not enabled on the table.
1. ENABLE-REDIRECT-IN-PROGRESS: The redirect process is still under way. No DML or DDL transaction can be committed to the table while it is in this state.
2. REDIRECT-READY: The redirect process is complete. All types of queries are redirected to the table specified in the RedirectSpec object.
3. DROP-REDIRECT-IN-PROGRESS: The table redirection is being withdrawn and the redirection property is about to be removed from the Delta table. In this state, the Delta client stops redirecting new queries to the redirect destination table and accepts only read-only queries against the redirect source table.

To avoid undefined behavior, the only valid state transitions are:

0. NO-REDIRECT -> ENABLE-REDIRECT-IN-PROGRESS
1. ENABLE-REDIRECT-IN-PROGRESS -> REDIRECT-READY
2. REDIRECT-READY -> DROP-REDIRECT-IN-PROGRESS
3. DROP-REDIRECT-IN-PROGRESS -> NO-REDIRECT
4. ENABLE-REDIRECT-IN-PROGRESS -> NO-REDIRECT

The protocol RFC document is at: #3702

## How was this patch tested?

Unit tests of the transitions between the different redirection states.

## Does this PR introduce _any_ user-facing changes?

No
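The transition rules in the PR description can be read as a small state machine. The following is a minimal Python sketch of that reading, not the code from the PR (which targets Spark); the names `RedirectState`, `VALID_TRANSITIONS`, `is_valid_transition`, and `transition` are hypothetical and exist only for illustration.

```python
from enum import Enum


class RedirectState(Enum):
    """The four redirect states named in the PR description."""
    NO_REDIRECT = "NO-REDIRECT"
    ENABLE_REDIRECT_IN_PROGRESS = "ENABLE-REDIRECT-IN-PROGRESS"
    REDIRECT_READY = "REDIRECT-READY"
    DROP_REDIRECT_IN_PROGRESS = "DROP-REDIRECT-IN-PROGRESS"


# Hypothetical lookup table encoding the five valid transitions listed above.
VALID_TRANSITIONS = {
    RedirectState.NO_REDIRECT: {RedirectState.ENABLE_REDIRECT_IN_PROGRESS},
    RedirectState.ENABLE_REDIRECT_IN_PROGRESS: {
        RedirectState.REDIRECT_READY,
        RedirectState.NO_REDIRECT,  # enabling can be abandoned before completion
    },
    RedirectState.REDIRECT_READY: {RedirectState.DROP_REDIRECT_IN_PROGRESS},
    RedirectState.DROP_REDIRECT_IN_PROGRESS: {RedirectState.NO_REDIRECT},
}


def is_valid_transition(current: RedirectState, target: RedirectState) -> bool:
    """Return True if moving from `current` to `target` is one of the listed transitions."""
    return target in VALID_TRANSITIONS[current]


def transition(current: RedirectState, target: RedirectState) -> RedirectState:
    """Return `target` if the transition is valid, otherwise raise ValueError."""
    if not is_valid_transition(current, target):
        raise ValueError(f"invalid transition: {current.value} -> {target.value}")
    return target
```

Under this sketch, `transition(RedirectState.NO_REDIRECT, RedirectState.ENABLE_REDIRECT_IN_PROGRESS)` succeeds, while jumping straight from `NO_REDIRECT` to `REDIRECT_READY` raises `ValueError`. Keeping the rules in one table makes it easy to check that exactly five transitions are allowed.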
vkorukanti pushed a commit that referenced this issue on Oct 26, 2024, with the same description as the Oct 25 commit above.
Protocol Change Request
Overview
Currently, DeltaLog lacks a seamless method for migrating an existing table to new storage. Users must establish their own lengthy and complex data cloning procedures. This feature request proposes a new table functionality that allows an existing Delta table to be redirected to a new storage location. Once the redirection process is complete, the table's data, Delta log, checkpoint, and checksum files would be cloned to the new storage location. All subsequent workloads would then be managed on the new storage location.
Description of the protocol change
The detailed proposal and the required protocol changes are sketched out in this doc.
At a high level, we propose two new features for Delta tables: redirectReaderWriter and redirectWriterOnly. The two features are similar but differ in which operations they block for Delta clients that have not implemented them: redirectReaderWriter blocks both read and write queries from such clients, whereas redirectWriterOnly blocks only their write queries.
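The difference between the two features can be sketched as a compatibility check. This is an illustrative Python sketch under stated assumptions, not the proposed implementation: a table and a client are each modeled as a plain set of feature names, and the helpers `can_read` and `can_write` are hypothetical.

```python
# Feature names taken from the proposal above.
REDIRECT_READER_WRITER = "redirectReaderWriter"
REDIRECT_WRITER_ONLY = "redirectWriterOnly"


def can_read(table_features: set, client_features: set) -> bool:
    """A client may read unless the table uses redirectReaderWriter
    and the client has not implemented it."""
    if REDIRECT_READER_WRITER in table_features:
        return REDIRECT_READER_WRITER in client_features
    return True


def can_write(table_features: set, client_features: set) -> bool:
    """A client may write only if it implements every redirect feature
    that the table uses (either variant blocks writes from other clients)."""
    redirect_features = {REDIRECT_READER_WRITER, REDIRECT_WRITER_ONLY}
    required = table_features & redirect_features
    return required <= client_features
```

For a client that implements neither feature, a table with `redirectWriterOnly` remains readable but not writable, while a table with `redirectReaderWriter` is neither readable nor writable.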
These table features include the following capabilities:
4.1. Fully supports time-travel queries.
4.2. Allows restoring to any version before or after the table redirect commit, without reversing the table redirect property.
Willingness to contribute
The Delta Lake Community encourages protocol innovations. Would you or another member of your organization be willing to contribute this feature to the Delta Lake code base?