feat: enable create / drop tags and using tags as version on select queries #198
hamersaw wants to merge 3 commits into lance-format:main from
Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
…' support
Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
Hi @hamersaw, I have a question to discuss. Lance has a branch feature. If we support querying from a branch, what Spark SQL grammar do you think we should use to specify the branch and the tag? If we use:
SELECT * FROM <table> VERSION AS OF <tag>
we cannot specify a particular branch in the SQL.
@fangbo, this is next up on my "random backfill" TODO. I tried to add an I looked around a bit and saw Iceberg supports this by adding a prefixed ID to the table (ex.
Thanks for your reply. On the other hand,
I think I agree with @fangbo that we should probably treat tag and branch both using What about the following: and For example: There is a problem of what if the branch name has a
For branches in Lance, data can be inserted, updated, and deleted, and the schema can also be changed. One of our customers treats a branch as a new table in Spark and executes Spark DML like: Although
I think that's the difficulty. Adding some level of branch support in the
The time travel syntax is purely for reads; that's expected. For reference, in Iceberg we took two approaches for DML:
I really like this idea. This is what I was trying to propose, but certainly put more eloquently. I think if we support branches integrated into the table identifier, then that covers all of our bases. We could still add to
+1, good idea. This approach effectively treats the branch as a normal table, which I think aligns better with how developers usually work.
I agree. As someone who was part of the group that originally designed the syntax, I think at this point it's a failed experiment. Engines never agreed upon the right syntax (…). I am fine with implementing it directly in the table identifier to support both reads and writes, and we can add the syntax sugar later.
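A minimal sketch of the table-identifier approach, assuming Iceberg-style `branch_<name>` / `tag_<name>` suffixes on the last identifier segment. Iceberg's Spark integration uses this convention; whether Lance adopts the same prefixes is an assumption, and `TableRef` and `parse_table_identifier` are hypothetical names, not part of the Lance Spark connector.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableRef:
    table: str               # e.g. "db.events"
    ref_type: Optional[str]  # "branch", "tag", or None for the main line
    ref_name: Optional[str]  # branch or tag name, if any


# Hypothetical prefixes, modeled on Iceberg's `branch_<name>` / `tag_<name>` convention.
_PREFIXES = {"branch_": "branch", "tag_": "tag"}


def parse_table_identifier(identifier: str) -> TableRef:
    """Treat a trailing `branch_*` / `tag_*` segment as a ref on the preceding table."""
    parts = identifier.split(".")
    if len(parts) >= 2:
        last = parts[-1]
        for prefix, ref_type in _PREFIXES.items():
            # Require a non-empty name after the prefix.
            if last.startswith(prefix) and len(last) > len(prefix):
                return TableRef(".".join(parts[:-1]), ref_type, last[len(prefix):])
    # No ref suffix: the identifier names the table itself.
    return TableRef(identifier, None, None)
```

The naive split on `.` shows the edge case raised earlier in the thread: a branch named `feature.x` would be cut into two segments, so a real implementation would need quoting (for example backticks) or a different separator. A table whose actual name starts with `branch_` is also ambiguous under this scheme.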
Adding support for tags in various APIs:
Spark SQL
To create a new tag at the specified <version>, or at the latest version if none is provided:
ALTER TABLE <table> CREATE TAG <tag> [VERSION AS OF <version>]
To delete an existing tag:
ALTER TABLE <table> DROP TAG <tag>
To query a table using a tag as the version:
SELECT * FROM <table> VERSION AS OF <tag>
Spark API
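A minimal in-memory sketch of the semantics the three statements describe, assuming a linear version history numbered from 1. `TagRegistry` and its methods are hypothetical and model behavior only, not the connector's code. It also assumes the `VERSION AS OF` value reaches the catalog as a string (as with Spark's `TableCatalog.loadTable(Identifier, String)`), so all-digit values are read as version numbers and anything else as a tag name.

```python
from typing import Dict, Optional


class TagRegistry:
    """Hypothetical model of tags over a linear version history (versions 1..latest)."""

    def __init__(self, latest_version: int) -> None:
        self.latest_version = latest_version
        self._tags: Dict[str, int] = {}

    def _check_version(self, version: int) -> None:
        if not 1 <= version <= self.latest_version:
            raise ValueError(f"version {version} does not exist")

    def create(self, tag: str, version: Optional[int] = None) -> int:
        # CREATE TAG <tag> [VERSION AS OF <version>]: default to the latest version.
        target = self.latest_version if version is None else version
        self._check_version(target)
        if tag in self._tags:
            raise ValueError(f"tag {tag!r} already exists")
        self._tags[tag] = target
        return target

    def drop(self, tag: str) -> None:
        # DROP TAG <tag>: raise if the tag is unknown.
        if tag not in self._tags:
            raise KeyError(tag)
        del self._tags[tag]

    def resolve(self, version_or_tag: str) -> int:
        # VERSION AS OF <x>: all-digit values are version numbers,
        # anything else is looked up as a tag name.
        if version_or_tag.isdecimal():
            version = int(version_or_tag)
            self._check_version(version)
            return version
        if version_or_tag not in self._tags:
            raise KeyError(version_or_tag)
        return self._tags[version_or_tag]
```

One consequence of this design is that a tag named `42` could never be selected by name; an implementation might reject all-digit tag names in `CREATE TAG` to avoid that ambiguity.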