CLI: add tables/find commands by MonkeyCanCode · Pull Request #4075 · apache/polaris

MonkeyCanCode · 2026-03-28T19:29:37Z

This is phase two of "CLI: Add summarize subcommand". Based on great feedback from @flyrain and the community on the mailing list (ML), this PR adds support for the following:

a find command to locate identifiers via fuzzy search
a tables command for some basic Iceberg table operations (get/list/summarize/non-purge delete)
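For context on the "non-purge delete" above: the Iceberg REST catalog spec defines dropTable as `DELETE /v1/{prefix}/namespaces/{namespace}/tables/{table}` with an optional `purgeRequested` query parameter, and joins multi-level namespace parts with the 0x1F unit separator before percent-encoding them. The sketch below only illustrates building such a request URL; it is not the CLI's implementation. The helper names, the base URL, and using the catalog name as `{prefix}` are assumptions for illustration, and splitting on "." assumes no namespace part itself contains a dot.

```python
from urllib.parse import quote, urlencode

# Iceberg REST: multi-level namespaces are joined with the ASCII unit
# separator (0x1F) before being percent-encoded into the URL path.
NAMESPACE_SEPARATOR = "\x1f"


def encode_namespace(dotted: str) -> str:
    """Hypothetical helper: turn a CLI-style 'a.b' namespace into a REST path segment."""
    parts = dotted.split(".")  # assumes no part contains a literal dot
    return quote(NAMESPACE_SEPARATOR.join(parts), safe="")


def drop_table_url(base_url: str, prefix: str, namespace: str, table: str) -> str:
    """Hypothetical helper: build a dropTable URL that does not request a purge."""
    path = (
        f"{base_url.rstrip('/')}/v1/{quote(prefix, safe='')}"
        f"/namespaces/{encode_namespace(namespace)}"
        f"/tables/{quote(table, safe='')}"
    )
    # purgeRequested=false asks the server not to purge the table's files.
    return f"{path}?{urlencode({'purgeRequested': 'false'})}"


if __name__ == "__main__":
    print(drop_table_url(
        "http://localhost:8181/api/catalog",  # assumed base URL
        "quickstart_catalog",
        "dev_namespace.sub_namespace",
        "user",
    ))
```

With the sample names this yields `.../v1/quickstart_catalog/namespaces/dev_namespace%1Fsub_namespace/tables/user?purgeRequested=false`.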

Also, the summarize subcommands introduced in phase one now print a newline between sections for easier readability.

While working on this, I noticed our CLI test suites are a bit messy. I will create a follow-up PR to clean them up and add the missing ones (the missing test cases are currently tracked in #4017).

Here are a couple of sample outputs:

Find command

# fuzzy search for all entities across all catalogs
➜  polaris git:(cli_summary_subcommand_v2) ./polaris --profile dev find user
Searching for 'user'...
[Global]
  Principal:           quickstart_user
  Principal:           readonly_user
  Principal:           dev_user
  Principal Role:      quickstart_user_role
  Principal Role:      readonly_user_role
  Principal Role:      dev_user_role

[Catalog: quickstart_catalog]
  Table:               dev_namespace.sub_namespace.user
  View:                dev_namespace.sub_namespace.user_view

Found 8 matches (3 Principals, 3 Principal Roles, 1 Table, 1 View).

# fuzzy search for all entities within a single catalog
➜  polaris git:(cli_summary_subcommand_v2) ./polaris --profile dev find dev --catalog quickstart_catalog
Searching for 'dev'...
[Catalog: quickstart_catalog]
  Catalog Role:        dev_catalog_role
  Namespace:           dev_namespace

Found 2 matches (1 Catalog Role, 1 Namespace).

# fuzzy search for entity catalog role within a single catalog
➜  polaris git:(cli_summary_subcommand_v2) ./polaris --profile dev find dev --catalog quickstart_catalog --type catalog-role
Searching for 'dev'...
[Catalog: quickstart_catalog]
  Catalog Role:        dev_catalog_role

Found 1 matches (1 Catalog Role).
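The find output above groups hits by scope and ends with a per-type count. A minimal, hypothetical sketch of that rendering, assuming matches arrive as (scope, entity type, name) tuples already ordered by scope; the tuple shape, the naive "+s" pluralization, and the function name are illustrative and not taken from the PR:

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical match record: (scope, entity_type, name), where scope is
# "Global" or "Catalog: <name>". Not the CLI's actual data model.
Match = Tuple[str, str, str]


def render_find_results(matches: List[Match]) -> str:
    if not matches:
        return "Found 0 matches."
    lines: List[str] = []
    current_scope = None
    for scope, entity_type, name in matches:
        if scope != current_scope:
            if current_scope is not None:
                lines.append("")  # blank line between scope sections
            lines.append(f"[{scope}]")
            current_scope = scope
        # Label padded to a fixed width so names line up in one column.
        lines.append(f"  {entity_type + ':':<21}{name}")

    # Counter keeps first-seen order, so types appear in output order.
    counts = Counter(entity_type for _, entity_type, _ in matches)
    parts = [f"{n} {t}" if n == 1 else f"{n} {t}s" for t, n in counts.items()]
    noun = "match" if len(matches) == 1 else "matches"
    lines.append("")
    lines.append(f"Found {len(matches)} {noun} ({', '.join(parts)}).")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_find_results([
        ("Catalog: quickstart_catalog", "Catalog Role", "dev_catalog_role"),
        ("Catalog: quickstart_catalog", "Namespace", "dev_namespace"),
    ]))
```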

Tables command

# list tables
➜  polaris git:(cli_summary_subcommand_v2) ✗ ./polaris --profile dev tables list --catalog quickstart_catalog --namespace dev_namespace.sub_namespace
{"namespace": ["dev_namespace", "sub_namespace"], "name": "user"}
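Since `tables list` prints one JSON object per line, its output is easy to post-process. A small sketch (the helper name is hypothetical) that turns each identifier into the dotted form used by `--namespace` and by the summarize header:

```python
import json
from typing import Iterable, Iterator


def dotted_identifiers(lines: Iterable[str]) -> Iterator[str]:
    """Yield 'ns1.ns2.table' for each JSON identifier line, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ident = json.loads(line)
        yield ".".join([*ident["namespace"], ident["name"]])


if __name__ == "__main__":
    sample = ['{"namespace": ["dev_namespace", "sub_namespace"], "name": "user"}']
    for name in dotted_identifiers(sample):
        print(name)  # dev_namespace.sub_namespace.user
```

In practice the lines could come from `sys.stdin` when piping `tables list` into the script.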

# get full table metadata
➜  polaris git:(cli_summary_subcommand_v2) ✗ ./polaris --profile dev tables get user --catalog quickstart_catalog --namespace dev_namespace.sub_namespace
{"metadata-location": "file:/var/tmp/quickstart_catalog/dev_namespace/sub_namespace/user/metadata/00002-fa1347d8-c14a-4af7-974d-2e80bc0a5866.metadata.json", "metadata": {"format-version": 3, "table-uuid": "35836a86-bf3a-43df-a6a4-ace9e5c8fb22", "location": "file:///var/tmp/quickstart_catalog/dev_namespace/sub_namespace/user", "last-updated-ms": 1774722865518, "next-row-id": 1, "properties": {"owner": "yong", "created-at": "2026-03-28T18:34:23.090216Z", "write.distribution-mode": "range", "write.parquet.compression-codec": "zstd"}, "schemas": [{"type": "struct", "fields": [{"id": 1, "name": "id", "type": "long", "required": true, "doc": "Row ID"}, {"id": 2, "name": "user", "type": {"type": "struct", "fields": [{"id": 9, "name": "user_id", "type": "string", "required": false}, {"id": 10, "name": "name", "type": "string", "required": false}, {"id": 11, "name": "address", "type": {"type": "struct", "fields": [{"id": 12, "name": "street", "type": "string", "required": false}, {"id": 13, "name": "city", "type": "string", "required": false}, {"id": 14, "name": "country", "type": "string", "required": false}]}, "required": false}]}, "required": true, "doc": "User info"}, {"id": 3, "name": "tags", "type": {"type": "list", "element-id": 15, "element": "string", "element-required": false}, "required": false, "doc": "tags"}, {"id": 4, "name": "attributes", "type": {"type": "map", "key-id": 16, "key": "string", "value-id": 17, "value": "string", "value-required": false}, "required": false, "doc": "User attributes"}, {"id": 5, "name": "events", "type": {"type": "list", "element-id": 18, "element": {"type": "struct", "fields": [{"id": 19, "name": "event_type", "type": "string", "required": false}, {"id": 20, "name": "event_time", "type": "timestamptz", "required": false}, {"id": 21, "name": "metadata", "type": {"type": "map", "key-id": 22, "key": "string", "value-id": 23, "value": "string", "value-required": false}, "required": false}]}, "element-required": false}, "required": 
false, "doc": "User event history"}, {"id": 6, "name": "event_data", "type": "variant", "required": false, "doc": "User event data"}, {"id": 7, "name": "category", "type": "string", "required": true, "doc": "Event category"}, {"id": 8, "name": "created_at", "type": "timestamptz", "required": true, "doc": "Event creation time"}]}], "current-schema-id": 0, "last-column-id": 23, "partition-specs": [{"fields": [{"field-id": 1000, "source-id": 8, "name": "created_at_day", "transform": "day"}, {"field-id": 1001, "source-id": 7, "name": "category", "transform": "identity"}]}], "default-spec-id": 0, "last-partition-id": 1001, "sort-orders": [{"fields": []}, {"fields": [{"source-id": 8, "transform": "identity", "direction": "desc", "null-order": "nulls-last"}, {"source-id": 1, "transform": "identity", "direction": "asc", "null-order": "nulls-first"}]}], "default-sort-order-id": 1, "snapshots": [{"snapshot-id": 201003753560339990, "sequence-number": 1, "timestamp-ms": 1774722865518, "manifest-list": "file:/var/tmp/quickstart_catalog/dev_namespace/sub_namespace/user/metadata/snap-201003753560339990-1-e0dcc235-e5a1-454a-a303-6a1c8fa22525.avro", "first-row-id": 0, "summary": {"operation": "append", "spark.app.id": "local-1774722859049", "added-data-files": "1", "added-records": "1", "added-files-size": "5600", "changed-partition-count": "1", "total-records": "1", "total-files-size": "5600", "total-data-files": "1", "total-delete-files": "0", "total-position-deletes": "0", "total-equality-deletes": "0", "engine-version": "4.0.2", "app-id": "local-1774722859049", "engine-name": "spark", "iceberg-version": "Apache Iceberg 1.10.1 (commit ccb8bc435062171e64bc8b7e5f56e6aed9c5b934)"}, "schema-id": 0}], "refs": {"main": {"type": "branch", "snapshot-id": 201003753560339990}}, "current-snapshot-id": 201003753560339990, "last-sequence-number": 1, "snapshot-log": [{"snapshot-id": 201003753560339990, "timestamp-ms": 1774722865518}], "metadata-log": [{"metadata-file": 
"file:/var/tmp/quickstart_catalog/dev_namespace/sub_namespace/user/metadata/00000-9cac3cd7-7dbd-4355-be3d-2d3da33d3158.metadata.json", "timestamp-ms": 1774722863092}, {"metadata-file": "file:/var/tmp/quickstart_catalog/dev_namespace/sub_namespace/user/metadata/00001-ef4623e9-286d-4859-9aa6-e90e968b8b12.metadata.json", "timestamp-ms": 1774722863221}], "statistics": [], "partition-statistics": []}}

# table summarize
➜  polaris git:(cli_summary_subcommand_v2) ✗ ./polaris --profile dev tables summarize user --catalog quickstart_catalog --namespace dev_namespace.sub_namespace
Table: dev_namespace.sub_namespace.user
--------------------------------------------------------------------------------
Metadata
  Location:                      file:///var/tmp/quickstart_catalog/dev_namespace/sub_namespace/user
  Format Version:                3
  Snapshots:                     1
  Current Snapshot ID:           201003753560339990
  Last Updated:                  2026-03-28 18:34:25 UTC

Statistics
  Total Records:                 1
  Total Data Files:              1
  Total Files Size:              5600

Schema
  +----+------------+-------------------------------------------------------------------------------------------------+----------+---------------------+
  | ID | Field Name | Type                                                                                            | Required | Comment             |
  +----+------------+-------------------------------------------------------------------------------------------------+----------+---------------------+
  | 1  | id         | long                                                                                            | *        | Row ID              |
  | 2  | user       | struct<user_id:string, name:string, address:struct<street:string, city:string, country:string>> | *        | User info           |
  | 3  | tags       | list<string>                                                                                    |          | tags                |
  | 4  | attributes | map<string, string>                                                                             |          | User attributes     |
  | 5  | events     | list<struct<event_type:string, event_time:timestamptz, metadata:map<string, string>>>           |          | User event history  |
  | 6  | event_data | variant                                                                                         |          | User event data     |
  | 7  | category   | string                                                                                          | *        | Event category      |
  | 8  | created_at | timestamptz                                                                                     | *        | Event creation time |
  +----+------------+-------------------------------------------------------------------------------------------------+----------+---------------------+

Partitioning
  +-----------+----------------+-----------+
  | Source ID | Field Name     | Transform |
  +-----------+----------------+-----------+
  | 8         | created_at_day | day       |
  | 7         | category       | identity  |
  +-----------+----------------+-----------+

Sort order
  +-----------+-----------+-------------+-----------+
  | Source ID | Transform | Null Order  | Direction |
  +-----------+-----------+-------------+-----------+
  | 8         | identity  | nulls-last  | desc      |
  | 1         | identity  | nulls-first | asc       |
  +-----------+-----------+-------------+-----------+

Effective policies
  - orphan-file-policy (Inherited from dev_namespace)
  - snapshot-expiry-policy (Inherited from dev_namespace)
--------------------------------------------------------------------------------
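Most of the summarize view can be derived from the inner "metadata" object that `tables get` returns. A sketch under that assumption, with hypothetical helper names (it is not the code in tables.py): it renders nested Iceberg JSON types in the struct<...>/list<...>/map<...> notation of the Schema table, formats last-updated-ms as a UTC timestamp, and reads the Statistics fields from the current snapshot's summary.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Union

IcebergType = Union[str, Dict[str, Any]]


def format_type(t: IcebergType) -> str:
    """Render an Iceberg JSON type in struct<...>/list<...>/map<...> notation."""
    if isinstance(t, str):  # primitives such as "long", "timestamptz", "variant"
        return t
    kind = t["type"]
    if kind == "struct":
        fields = ", ".join(f"{f['name']}:{format_type(f['type'])}" for f in t["fields"])
        return f"struct<{fields}>"
    if kind == "list":
        return f"list<{format_type(t['element'])}>"
    if kind == "map":
        return f"map<{format_type(t['key'])}, {format_type(t['value'])}>"
    raise ValueError(f"unsupported type: {kind!r}")


def format_last_updated(ms: int) -> str:
    """Format Iceberg's last-updated-ms (epoch milliseconds) as a UTC timestamp."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")


def current_snapshot_stats(metadata: Dict[str, Any]) -> Dict[str, str]:
    """Pull totals from the summary of the snapshot named by current-snapshot-id."""
    current_id = metadata.get("current-snapshot-id")
    for snapshot in metadata.get("snapshots", []):
        if snapshot.get("snapshot-id") == current_id:
            summary = snapshot.get("summary", {})
            return {
                "Total Records": summary.get("total-records", "n/a"),
                "Total Data Files": summary.get("total-data-files", "n/a"),
                "Total Files Size": summary.get("total-files-size", "n/a"),
            }
    return {}  # e.g. a freshly created table with no snapshots yet
```

For the sample table, `format_last_updated(1774722865518)` gives `2026-03-28 18:34:25 UTC`, matching the "Last Updated" line above.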

Setup instructions used for the above

# setup
## bootstrap
./polaris --profile dev setup apply site/content/guides/assets/polaris/reference-setup-config.yaml

## create sample table with complex types and sort order etc.
CREATE TABLE IF NOT EXISTS dev_namespace.sub_namespace.user (
    id BIGINT NOT NULL COMMENT 'Row ID',
    user STRUCT<user_id: STRING, name: STRING, address: STRUCT<street: STRING, city: STRING, country: STRING>> NOT NULL COMMENT 'User info',
    tags ARRAY<STRING> COMMENT 'tags',
    attributes MAP<STRING, STRING> COMMENT 'User attributes',
    events ARRAY<STRUCT<event_type: STRING, event_time: TIMESTAMP, metadata: MAP<STRING, STRING>>> COMMENT 'User event history',
    event_data VARIANT COMMENT 'User event data',
    category STRING NOT NULL COMMENT 'Event category',
    created_at TIMESTAMP NOT NULL COMMENT 'Event creation time'
)
USING iceberg
PARTITIONED BY (days(created_at), category)
TBLPROPERTIES ('format-version' = '3');

ALTER TABLE dev_namespace.sub_namespace.user WRITE ORDERED BY (created_at DESC, id);

INSERT INTO dev_namespace.sub_namespace.user VALUES (
  1,
  named_struct(
    'user_id', 'u1',
    'name', 'xxx',
    'address', named_struct('street', 'xxx', 'city', 'xxx', 'country', 'xx')
  ),
  array('tag1', 'tag2'),
  map('key1', 'value1'),
  array(
    named_struct(
      'event_type', 'x',
      'event_time', timestamp '2026-03-24 12:00:00',
      'metadata', map('k', 'v')
    )
  ),
  parse_json('{"dynamic_field": 123, "nested": {"a": true}}'),
  'xxx',
  timestamp '2026-03-24 12:00:00'
);

CREATE VIEW IF NOT EXISTS dev_namespace.sub_namespace.user_view AS SELECT * FROM dev_namespace.sub_namespace.user;

Checklist

🛡️ Don't disclose security issues! (contact security@apache.org)
🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
🧪 Added/updated tests with good coverage, or manually tested (and explained how)
💡 Added comments for complex logic
🧾 Updated CHANGELOG.md (if needed)
📚 Updated documentation in site/content/in-dev/unreleased (if needed)


dimas-b

Nice new tools, @MonkeyCanCode 👍

Just one comment about the matching logic 😅

dimas-b · 2026-04-01T18:42:16Z

client/python/apache_polaris/cli/command/utils.py

+    # Subsequence match: enabled for length > 2
+    if query_len > 2:
+        iterator = iter(t)
+        if all(char in iterator for char in q):


This will match q: "max", t: "mixed bag of exceptions", right? Is that intended?
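To make the reviewer's example concrete: the `char in iterator` idiom in the excerpt consumes the iterator up to each hit, so it checks that the query's characters appear in the target in order, with arbitrary gaps. A standalone version (the function name is hypothetical):

```python
def is_subsequence(query: str, target: str) -> bool:
    # `char in iterator` advances the shared iterator past the first match,
    # so each query character must occur after the previous one.
    iterator = iter(target)
    return all(char in iterator for char in query)


if __name__ == "__main__":
    print(is_subsequence("max", "mixed bag of exceptions"))  # True: m(ixed), (b)a(g), (e)x(ceptions)
    print(is_subsequence("xam", "mixed bag of exceptions"))  # False: no "m" after "bag"
```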

Yes. As with fuzzy search in general, we don't know the total length of the name being searched for, so users can narrow the results by providing more characters. I require at least 3 characters before this kind of matching applies, so that typing 'a' doesn't return everything that contains the letter 'a'.

I'm fine with this logic if it works for you :) I just wanted to make sure the behaviour was intentional :)

I'm more than happy to take feedback on how to handle this better, and on whether requiring a minimum of 4 characters before triggering a fuzzy search is too restrictive. This requirement came from @flyrain; any thoughts on this approach?

TBH, I'm not sure there are any (realistic) cases that would match under this rule but not under the SequenceMatcher check (below) 🤔 Do you have any examples like that?

This was added to avoid false-positive (FP) noise. For example, if we allowed SequenceMatcher on queries of any length, the single letter 'a' would match anything containing the letter 'a'.
So my thinking was as follows:

len 1: only exact or prefix match

len 2: add substring match (q in t)

len 3: add subsequence match

len 4+: similarity ratio check via SequenceMatcher
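The four tiers above could be combined roughly as follows. This is a sketch: each tier adds to the previous ones, inputs are lowercased, and the 0.6 similarity threshold is a hypothetical placeholder because the actual value is not shown in this thread.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.6  # hypothetical placeholder, not the PR's value


def tiered_match(query: str, target: str) -> bool:
    q, t = query.lower(), target.lower()
    n = len(q)
    if n == 0:
        return False
    # len 1+: exact or prefix match
    if t == q or t.startswith(q):
        return True
    # len 2+: substring match
    if n >= 2 and q in t:
        return True
    # len 3+: subsequence match (characters in order, gaps allowed)
    if n >= 3:
        it = iter(t)
        if all(c in it for c in q):
            return True
    # len 4+: similarity ratio via SequenceMatcher
    return n >= 4 and SequenceMatcher(None, q, t).ratio() >= SIMILARITY_THRESHOLD
```

For example, "a" matches "abc" (prefix) but not "cat", while "usre" matches "user" only through the similarity tier (ratio 0.75).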

When I tested this earlier with the setup above, allowing similarity search at length 3 was too noisy, so I added subsequence matching here instead. It isn't strictly necessary if somewhat noisy output is acceptable.

In my personal opinion, matching "max" to "mixed bag of exceptions" (the subsequence rule) is noisy too 😅 TBH, I do not see the "logic" behind this rule 😅

I'd use SequenceMatcher immediately if exact substring matches do not yield True, but use different thresholds depending on the query string size to reduce noise.
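A sketch of that alternative: an exact substring check first, then SequenceMatcher with a threshold that gets stricter as the query gets shorter. The threshold values and function names are hypothetical placeholders, not values proposed in the review.

```python
from difflib import SequenceMatcher


def threshold_for(query_len: int) -> float:
    # Hypothetical thresholds: short queries share characters with almost
    # anything, so they need a much higher ratio to count as a match.
    if query_len <= 2:
        return 0.9
    if query_len == 3:
        return 0.8
    return 0.6


def fuzzy_match(query: str, target: str) -> bool:
    q, t = query.lower(), target.lower()
    if not q:
        return False
    if q in t:  # exact substring match
        return True
    return SequenceMatcher(None, q, t).ratio() >= threshold_for(len(q))
```

With these placeholder thresholds, "max" no longer matches "mixed bag of exceptions" (ratio about 0.23), while a transposition such as "usre" still matches "user" (ratio 0.75).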

However, like I said, I do not feel strongly about this.

MonkeyCanCode added 15 commits March 23, 2026 00:32

wip

80620a8

wip

727108f

wip

ae18531

wip

366fa84

wip

65db78c

Merge branch 'main' into cli_summary_subcommand_v2

3b066f4

wip

25d940f

wip

c790e88

wip

8630f60

wip

c8d7cdc

wip

c5aefe8

wip

99a73aa

wip

0363de3

Add tables and find commands to CLI

3a97e96

Update table validate to check table_name only if subcommands are GET/SUMMARIZE/LIST

c2b8d13

github-project-automation bot added this to Basic Kanban Board Mar 28, 2026

github-project-automation bot moved this to PRs In Progress in Basic Kanban Board Mar 28, 2026

MonkeyCanCode requested review from HonahX, dimas-b, flyrain and jbonofre March 28, 2026 19:29

Fix cli doc

b7b2c3b

MonkeyCanCode mentioned this pull request Mar 31, 2026

Better UI/UX for Polaris CLI #4090

Open

jbonofre reviewed Mar 31, 2026


client/python/apache_polaris/cli/command/tables.py (outdated, resolved)

site/content/in-dev/unreleased/command-line-interface.md (outdated, resolved)

MonkeyCanCode mentioned this pull request Apr 1, 2026

Add doc for setup command under getting started #4087

Open


MonkeyCanCode added 2 commits March 31, 2026 22:32

Merge branch 'main' into cli_summary_subcommand_v2

1aafa02

Fix typo and only print complete msg for drop table if no exception

d1ab03c

dimas-b reviewed Apr 1, 2026


dimas-b approved these changes Apr 1, 2026


github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Apr 1, 2026

MonkeyCanCode wants to merge 18 commits into apache:main from MonkeyCanCode:cli_summary_subcommand_v2

3 participants
