feat: enable create / drop tags and using tags as version on select queries by hamersaw · Pull Request #198 · lance-format/lance-spark

hamersaw · 2026-02-02T20:12:55Z

Adding support for tags in various APIs:

Spark SQL

To create a new tag at the specified <version>, or at the latest version if none is provided:
ALTER TABLE <table> CREATE TAG <tag> [VERSION AS OF <version>]

To delete an existing tag:
ALTER TABLE <table> DROP TAG <tag>

To query a table using a tag as the version:
SELECT * FROM <table> VERSION AS OF <tag>

Spark API

spark.read()
    .option("version", "<tag>")
    ...
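
One of the commits in this PR notes that a version provided as a string is used as a tag. A minimal Python sketch of that kind of dispatch follows, assuming an all-digit value means a numeric version and anything else is a tag name. `resolve_version` is a hypothetical helper for illustration and is not part of lance-spark.

```python
from typing import Tuple, Union


def resolve_version(value: str) -> Tuple[str, Union[int, str]]:
    """Classify a "version" option value as a version number or a tag name.

    Assumption (not confirmed by the PR): a value made only of ASCII digits
    is a numeric version; any other non-empty value is a tag name.
    """
    text = value.strip()
    if not text:
        raise ValueError("version must not be empty")
    if text.isascii() and text.isdigit():
        return ("version", int(text))
    return ("tag", text)
```

Under this assumption a tag whose name is purely numeric could not be addressed by name, which is one reason an explicit ref syntax may be preferable.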

Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>

fangbo · 2026-02-03T08:01:38Z

Hi @hamersaw, I have a question to discuss.

Lance has a branch feature. If we support querying from a branch, what do you think the Spark SQL grammar for specifying a branch and a tag should be?

If we use:

SELECT * FROM <table> VERSION AS OF <tag>

We cannot specify the branch in the SQL.

hamersaw · 2026-02-03T16:20:49Z

@fangbo, this is next up on my "random backfill" TODO. I tried to add an ON BRANCH <branch> clause to Spark SQL statements, because that would be wildly ergonomic (ex. SELECT * FROM <table> [ON BRANCH <branch>] [VERSION AS OF <tag>]), but there is no clean way to do that without rewriting most of the grammar to be Lance-specific (similar to how ALTER TABLE <table> CREATE TAG is proposed here, built on your COLUMN work).

I looked around a bit and saw that Iceberg supports this by adding a prefixed ID to the table (ex. SELECT * FROM db.table.branch_foo). Without major grammar updates this seems the least intrusive approach, because it needs to be supported across a TON of statements (ex. SELECT, ALTER TABLE, etc). I'm certainly going to open a discussion before submitting a PR on this and am VERY interested in others' thoughts!

fangbo · 2026-02-04T02:56:35Z

Thanks for your reply. On the other hand, insert/update/delete/merge into should also be supported for branches. One of my customers currently uses Lance branches. I think Iceberg's approach of adding a prefixed ID to the table (ex. SELECT * FROM db.table.branch_foo) is a feasible solution.

jackye1995 · 2026-02-11T04:46:19Z

I think I agree with @fangbo that we should probably handle both tags and branches through VERSION AS OF, instead of adding a new branch syntax that requires a SQL extension.

What about the following:

SELECT * FROM TABLE VERSION AS OF "ref/<branch_name>/<tag_name_or_version_number>"

and branch_name=main means the main branch?

For example:

SELECT * FROM TABLE VERSION AS OF "ref/main/1"

SELECT * FROM TABLE VERSION AS OF "ref/main/v1.0"

SELECT * FROM TABLE VERSION AS OF "ref/staging/10"

SELECT * FROM TABLE VERSION AS OF "ref/staging/v1.0"

One problem is a branch name that contains a /. My thinking is that the character right after ref is used as the delimiter, so we can also write ref$release/v2.0$10.
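
The delimiter rule proposed above can be sketched in Python as follows. This is an illustration of the proposal only, not an implemented parser: `Ref` and `parse_ref` are hypothetical names, and treating an all-digit last component as a version number is an assumption.

```python
from typing import NamedTuple, Union


class Ref(NamedTuple):
    branch: str
    target: Union[int, str]  # version number (int) or tag name (str)


def parse_ref(spec: str) -> Ref:
    """Parse 'ref<d><branch><d><tag_or_version>'.

    <d> is whatever single character follows the literal 'ref', so a branch
    name may contain '/' as long as a different delimiter is chosen.
    """
    prefix = "ref"
    if not spec.startswith(prefix) or len(spec) <= len(prefix):
        raise ValueError(f"not a ref spec: {spec!r}")
    delim = spec[len(prefix)]
    parts = spec[len(prefix) + 1:].split(delim)
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected ref{delim}<branch>{delim}<target>: {spec!r}")
    branch, target = parts
    # Assumption: an all-digit target is a version number, otherwise a tag.
    if target.isascii() and target.isdigit():
        return Ref(branch, int(target))
    return Ref(branch, target)
```

With this sketch, `parse_ref("ref/release/v2.0/10")` is rejected because the `/` delimiter also appears in the branch name, while `parse_ref("ref$release/v2.0$10")` yields branch `release/v2.0` at version 10.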

What do we think? @fangbo @hamersaw

fangbo · 2026-02-11T12:28:09Z

> SELECT * FROM TABLE VERSION AS OF "ref/<branch_name>/<tag_name_or_version_number>"

For a branch in Lance, data can be inserted, updated, and deleted, and the schema can also be changed. One of our customers treats a branch as a new table in Spark and executes Spark DML such as update ..., delete from, and merge into ... on this branch.

Although VERSION AS OF "ref/<branch_name>/<tag_name_or_version_number>" is a good expression for select, Spark currently does not support update, delete, or merge into using this expression. So I think how to express branches in Spark DML is a tricky problem.

hamersaw · 2026-02-11T16:09:30Z

I think that's the difficulty. Adding some level of branch support in the VERSION AS OF <version> clause is relatively simple. IMO a better approach would be to figure out first-class branch support across the Lance Spark SQL extension so that this works with everything (ex. INSERT, UPDATE, etc).

jackye1995 · 2026-02-11T17:56:22Z

> Spark currently does not support update, delete, merge into using this expression. So I think it is a tricky problem about how to express branches in Spark DML.

The time travel syntax is purely for read, that's expected.

For reference, in Iceberg we took two approaches for DML:

1. Table name convention: express the branch in the table name, for example <table_name>__branch/<branch_name>. This also works for SELECT, and ironically it is easier to use than the dedicated AS OF time travel syntax, because no-code applications can integrate with it without changing the underlying SQL statement. For SELECT you can use either the table name convention or the time travel syntax, but not both.
2. Environment variable: this is more typical for write-audit-publish workflows. You set an env config or Spark option like WAP_BRANCH=<branch_name>, and reads and writes then automatically switch to that branch.
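
The two approaches above could combine roughly as in the Python sketch below. It only illustrates the idea: `resolve_branch` is a hypothetical helper, the `__branch/` marker and `WAP_BRANCH` key are taken from the examples in this comment, and letting a branch in the table name win over the configured branch is an assumption, not something decided in this thread.

```python
from typing import Mapping, Optional, Tuple

BRANCH_MARKER = "__branch/"    # marker from the table name convention example
WAP_BRANCH_KEY = "WAP_BRANCH"  # option name from the write-audit-publish example


def resolve_branch(
    table_name: str, conf: Optional[Mapping[str, str]] = None
) -> Tuple[str, Optional[str]]:
    """Return (base_table, branch); branch is None when the main line is meant.

    Assumption: a branch embedded in the table name takes precedence over
    the branch configured through WAP_BRANCH.
    """
    base, marker, branch = table_name.partition(BRANCH_MARKER)
    if marker:
        if not base or not branch:
            raise ValueError(f"malformed branch identifier: {table_name!r}")
        return base, branch
    # No branch in the name: fall back to the configured branch, if any.
    return table_name, (conf or {}).get(WAP_BRANCH_KEY)
```

Because the branch travels inside the identifier, the same resolution step would apply to reads and to INSERT, UPDATE, DELETE, and MERGE INTO without any grammar changes.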

What do we think? @fangbo @hamersaw

hamersaw · 2026-02-11T18:33:10Z

I really like this idea. This is what I was trying to propose, but certainly put more eloquently. I think if we support branches integrated in the table identifier, then that covers all of our bases. We could still add to VERSION AS OF for syntactic sugar, but I think it's a bit more ergonomic to centralize branches in the table ID.

fangbo · 2026-02-12T01:03:26Z

> table name convention: use some way to express branch in table name, for example we can do <table_name>__branch/<branch_name>. This also works for SELECT, and it's actually ironically easier to use than the dedicated AS OF time travel syntax, because it is very friendly for no-code applications to integrate with without the need to change underlying SQL statement. You can either use table name convention, or use time travel syntax for SELECT, you cannot specify both.

+1, Good idea. Actually this method treats the branch as a normal table. I think it aligns better with developers' usage habits.

jackye1995 · 2026-02-12T01:05:41Z

> I think it's a bit more ergonomic to centralize branches in the table ID.

I agree. As someone who was part of the group that originally designed the syntax, I think at this point it's a failed experiment. Engines never agreed upon the right syntax (FOR SYSTEM_VERSION AS OF vs VERSION AS OF), it's hard to integrate because the caller has to change the SQL syntax, and it was just never integrated into the write path properly.

I am good with implementing it directly in the table identifier to support read and write, and we can implement the syntactic sugar later.

hamersaw added 3 commits February 2, 2026 11:01

8d29cbe: if version is provided as a string we use it to set a tag instead

c452ff9: added 'ALTER TABLE <table> CREATE TAG <tag> [VERSION AS OF <version>]' support

d53bd0d: added support for 'ALTER TABLE <table> DROP TAG <tag>

hamersaw mentioned this pull request Feb 5, 2026

feat: support time travel #201

Merged

hamersaw mentioned this pull request Feb 12, 2026

feat: parse Ref from version and version numbers lance-format/lance#5584

Open

hamersaw wants to merge 3 commits into lance-format:main from hamersaw:feature/support-tags
