feat: dataset folders (backend) #32520

betodealmeida · 2025-03-05T19:47:07Z

SUMMARY

Tentative implementation of the backend for #32351.

I ended up doing the simplest thing that works, based on these requirements:

We want to reuse existing dataset APIs (there will be no new APIs to manage folders).
Order of folders is important, as well as the order of elements within folders.
Support for nested folders should be present from the start (new requirement).

I started the implementation with a Folder model, and mapped relationships:

Folder n:1 SqlaTable
SqlMetric n:1 Folder
TableColumn n:1 Folder

One problem with this approach is that order is important, so I had to keep track of the order of each folder and each element inside a folder. This required additional columns, and complex bookkeeping when a dataset was updated — did the positions change? Were metrics removed, columns added, folders renamed? Moving the last element in a folder to the first position, for example, would require updating the position of all elements inside the folder.

Additionally, representing nested folders would require additional relationships to be tracked.

I ditched that approach, and instead opted to serialize the folder structure to a new JSON column in the SqlaTable model called folders. When doing a GET request to /api/v1/dataset/ or /api/v1/explore/ the response now includes UUIDs for metrics and columns, and has the new attribute folders:

{
  ...
  "metrics": [
    {
      "metric_name": "count",
      "uuid": "uuid2",
      ...
    },
    ...
  ],
  "columns": [
    {
      "column_name": "country",
      "uuid": "uuid5",
      ...
    },
    {
      "column_name": "column-not-in-any-folder",
      "uuid": "uuid6",
      ...
    },
    ...
  ],
  "folders": [
    {
      "uuid": "uuid1",
      "name": "My metrics",
      "children": [
        {
          "uuid": "uuid2"
        }
      ]
    },
    {
      "uuid": "uuid3",
      "name": "My columns",
      "children": [
        {
          "uuid": "uuid4",
          "name": "Dimensions",
          "children": [
            {
              "uuid": "uuid5"
            }
          ]
        }
      ]
    }
  ]
}

With this solution the frontend can easily build the UI from this response. Note that the payload only includes custom folders, and the metrics and columns attributes in the response are unmodified (meaning a metric that is present in a folder will still show up under metrics). It's up to the frontend to build the existing "Columns" and "Metrics" sections by removing any elements that are present in custom folders. This way we can build the feature progressively by first enhancing the API, and later adding the custom UI.
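As a rough illustration of the filtering step described above, here is a minimal sketch in Python (the real consumer would be the frontend). It assumes that a node with a "children" key is a folder and a node without one is a reference to a metric or column; the helper names collect_folder_uuids and split_ungrouped are hypothetical and not part of this PR.

```python
from typing import Any


def collect_folder_uuids(folders: list[dict[str, Any]]) -> set[str]:
    """Return the UUIDs of every metric/column referenced anywhere in the tree."""
    found: set[str] = set()
    stack = list(folders)
    while stack:
        node = stack.pop()
        children = node.get("children")
        if children is None:
            # Assumption: a node without "children" is a metric/column reference.
            found.add(node["uuid"])
        else:
            # A folder: keep descending into its children.
            stack.extend(children)
    return found


def split_ungrouped(payload: dict[str, Any]) -> dict[str, list[dict[str, Any]]]:
    """Metrics and columns not placed in any custom folder, in their original order."""
    in_folders = collect_folder_uuids(payload.get("folders") or [])
    return {
        key: [item for item in payload.get(key, []) if item["uuid"] not in in_folders]
        for key in ("metrics", "columns")
    }


# Trimmed-down version of the example response above.
payload = {
    "metrics": [{"metric_name": "count", "uuid": "uuid2"}],
    "columns": [
        {"column_name": "country", "uuid": "uuid5"},
        {"column_name": "column-not-in-any-folder", "uuid": "uuid6"},
    ],
    "folders": [
        {"uuid": "uuid1", "name": "My metrics", "children": [{"uuid": "uuid2"}]},
        {
            "uuid": "uuid3",
            "name": "My columns",
            "children": [
                {"uuid": "uuid4", "name": "Dimensions", "children": [{"uuid": "uuid5"}]}
            ],
        },
    ],
}

ungrouped = split_ungrouped(payload)
```

With the example payload, only the column that is not in any folder would remain for the default "Columns" section, and the default "Metrics" section would be empty.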

To organize the metrics and columns into folders, as well as create new folders, the user must edit the dataset (dataset creation doesn't show metrics or columns). After creating folders and adding metrics/columns to them, the user can save the dataset. The PUT request will then send a payload that can also be enhanced with the folders attribute.

There is no model for folders, since it offered little to no value. Instead, the client simply declares the folder name, UUID, and an optional description, as well as the tree structure. The UUID is used for external bookkeeping, for example, for systems where semantics are defined outside of Superset and periodically synced via the API.
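A minimal sketch of how a client might assemble the folders part of a PUT body, following the shape of the example response in this description. The helper names leaf and folder are hypothetical, and the "description" key name is an assumption based on the "optional description" mentioned above; the final schema may require other fields.

```python
import json
import uuid
from typing import Any, Optional


def leaf(item_uuid: str) -> dict[str, Any]:
    """Reference an existing metric or column by its UUID."""
    return {"uuid": item_uuid}


def folder(
    name: str,
    children: list[dict[str, Any]],
    description: Optional[str] = None,
) -> dict[str, Any]:
    """Declare a folder; the client generates the folder's UUID itself."""
    node: dict[str, Any] = {
        "uuid": str(uuid.uuid4()),
        "name": name,
        "children": list(children),
    }
    if description is not None:
        # Assumed key name for the optional description.
        node["description"] = description
    return node


# The order of the lists is the order of folders and of elements within them.
put_body = {
    "folders": [
        folder("My metrics", [leaf("uuid2")]),
        folder(
            "My columns",
            [folder("Dimensions", [leaf("uuid5")], description="Grouping columns")],
        ),
    ]
}

print(json.dumps(put_body, indent=2))
```

Because the tree is sent as a whole, reordering or nesting is just a matter of rearranging the lists before saving; no per-element position needs to be tracked.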

Depends on #32680.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

Added tests.

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

korbit-ai · 2025-03-05T19:47:11Z

Based on your review schedule, I'll hold off on reviewing this PR until it's marked as ready for review. If you'd like me to take a look now, comment /korbit-review.

Your admin can change your review schedule in the Korbit Console

mistercrunch · 2025-03-05T21:16:36Z

Pointing out something related here: the tree component is likely going to be composed of React nodes (not just the description). Today we have [from memory, component names may not match] MetricLabel, ColumnLabel, CalculatedColumnLabel, and these require props beyond what's in folders (description to make an InfoBubbleTooltip, sql_expression, data type, ...). Assuming we want a rich tree with the full labels (at least on the left panel in Explore), this means we'll still have to look up the related objects from the API.

Thinking about future frontend development, we'll need some sort of assembleTreeDataForComponent(apiPayload) method that builds rich React nodes. On the other side, we'll need something that takes the AntdTree.treeData and prepares the POST payload required by the API; this one should be fairly simple, I think.
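To make the two directions concrete, here is a rough sketch of the data-shaping logic only, written in Python rather than the frontend's language and without any React rendering. The function names assemble_tree_data and tree_data_to_folders are hypothetical; the key, title, children and isLeaf fields mirror the Ant Design Tree node shape, and the "kind" and "data" fields are assumptions for carrying the full metric/column object to a label component.

```python
from typing import Any


def assemble_tree_data(api_payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Join the folder tree with the full metric/column objects it references."""
    lookup: dict[str, dict[str, Any]] = {}
    for metric in api_payload.get("metrics", []):
        lookup[metric["uuid"]] = {
            "kind": "metric",
            "title": metric["metric_name"],
            "data": metric,
        }
    for column in api_payload.get("columns", []):
        lookup[column["uuid"]] = {
            "kind": "column",
            "title": column["column_name"],
            "data": column,
        }

    def build(node: dict[str, Any]) -> dict[str, Any]:
        if "children" in node:
            # Folder node: recurse, preserving the order of children.
            return {
                "key": node["uuid"],
                "title": node["name"],
                "children": [build(child) for child in node["children"]],
            }
        # Leaf node: raises KeyError if the UUID is not a known metric/column.
        item = lookup[node["uuid"]]
        return {"key": node["uuid"], "isLeaf": True, **item}

    return [build(folder) for folder in api_payload.get("folders") or []]


def tree_data_to_folders(tree_data: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Strip the rich nodes back down to the folder structure sent to the API."""

    def strip(node: dict[str, Any]) -> dict[str, Any]:
        if node.get("isLeaf"):
            return {"uuid": node["key"]}
        return {
            "uuid": node["key"],
            "name": node["title"],
            "children": [strip(child) for child in node.get("children", [])],
        }

    return [strip(node) for node in tree_data]


api_payload = {
    "metrics": [{"metric_name": "count", "uuid": "uuid2", "expression": "COUNT(*)"}],
    "columns": [{"column_name": "country", "uuid": "uuid5"}],
    "folders": [
        {"uuid": "uuid1", "name": "My metrics", "children": [{"uuid": "uuid2"}]},
        {"uuid": "uuid3", "name": "My columns", "children": [{"uuid": "uuid5"}]},
    ],
}

tree = assemble_tree_data(api_payload)
round_trip = tree_data_to_folders(tree)
```

In this sketch the reverse direction is indeed the simpler one: it only drops the looked-up data and keeps UUIDs, names and ordering.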

betodealmeida · 2025-03-06T16:37:20Z

/korbit-review

korbit-ai

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.

Category	Issue	Fix Detected
	Non-normalized JSON storage design ▹ view
	Unbounded Recursive Nesting ▹ view

Suppressed issues based on your team's Korbit activity

This issue	Is similar to	Because
lines 565:567: The DATASET_FOLDERS feature flag is disabled by default which prevents users from using the newly implemented folder organization functionality.	Default value of CATALOGS_SIMPLIFIED_MIGRATION negates intended performance improvement	Similar issues were not addressed in the past
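Since the flag is off by default, trying the feature locally would mean enabling it explicitly. A minimal sketch, assuming the usual Superset convention of overriding feature flags from a local superset_config.py:

```python
# superset_config.py (local override file; its location depends on the deployment)
FEATURE_FLAGS = {
    # Flag name taken from the review comment above; it is disabled by default.
    "DATASET_FOLDERS": True,
}
```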

When you react to issues (for example, an upvote or downvote) or you fix them, Korbit will tune future reviews based on these signals.

Files scanned

File Path	Reviewed
superset/migrations/versions/2025-03-03_20-52_94e7a3499973_add_folder_table.py	✅
superset/commands/dataset/update.py	✅
superset/datasets/schemas.py	✅
superset/datasets/api.py	✅
superset/connectors/sqla/models.py	✅
superset/config.py	✅

Explore our documentation to understand the languages and file types we support and the files we ignore.

Need a new review? Comment /korbit-review on this PR and I'll review your latest changes.

Korbit Guide: Usage and Customization

Interacting with Korbit

You can manually ask Korbit to review your PR using the /korbit-review command in a comment at the root of your PR.

You can ask Korbit to generate a new PR description using the /korbit-generate-pr-description command in any comment on your PR.

Too many Korbit comments? I can resolve all my comment threads if you use the /korbit-resolve command in any comment on your PR.

On any given comment that Korbit raises on your pull request, you can have a discussion with Korbit by replying to the comment.

Help train Korbit to improve your reviews by giving a 👍 or 👎 on the comments Korbit posts.

Customizing Korbit

Check out our docs on how you can make Korbit work best for you and your team.

Customize Korbit for your organization through the Korbit Console.

Current Korbit Configuration

General Settings

Setting	Value
Review Schedule	Automatic excluding drafts
Max Issue Count	10
Automatic PR Descriptions	❌

Issue Categories

Category	Enabled
Documentation	✅
Logging	✅
Error Handling	✅
Readability	✅
Design	✅
Performance	✅
Security	✅
Functionality	✅

Feedback and Support

Tell us what you think of Korbit

Schedule a call with our team

Email us @ support@korbit.ai

Note

Korbit Pro is free for open source projects 🎉

Looking to add Korbit to your team? Get started with a free 2 week trial here

superset/migrations/versions/2025-03-03_20-52_94e7a3499973_add_folder_table.py

superset/datasets/schemas.py

korbit-ai

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.

Category	Issue	Fix Detected
	Lost validation error context ▹ view
	Non-required type field compromises folder functionality ▹ view
	Ambiguous column name for JSON structure ▹ view

Files scanned

File Path	Reviewed
superset/migrations/versions/2025-03-03_20-52_94e7a3499973_add_folder_table.py	✅
superset/commands/dataset/update.py	✅
superset/datasets/schemas.py	✅
superset/datasets/api.py	✅
superset/connectors/sqla/models.py	✅
superset/config.py	✅

superset/commands/dataset/update.py

superset/datasets/schemas.py

korbit-ai · 2025-03-13T13:50:32Z

superset/migrations/versions/2025-03-03_20-52_94e7a3499973_add_folder_table.py

+def upgrade():
+    op.add_column(
+        "tables",
+        sa.Column("folders", JSON, nullable=True),


Ambiguous column name for JSON structure

What is the issue?

The column name 'folders' is ambiguous and doesn't clearly indicate its purpose or structure in the database schema.

Why this matters

Without a more descriptive name, future developers will need to investigate other parts of the codebase to understand what type of folders are stored and how they are structured in the JSON field.

Suggested change ∙ Feature Preview

Rename the column to be more specific, such as:

sa.Column("dataset_folder_config", JSON, nullable=True)

or

sa.Column("column_folder_hierarchy", JSON, nullable=True)

sadpandajoe · 2025-03-14

We now have util functions for adding and dropping columns that we'll want to use.

korbit-ai · 2025-03-14

Thank you for implementing this feature. I agree with your approach of creating a serialized folder structure in a new JSON column. However, I would suggest renaming column 'folders' to 'folder_hierarchy' or 'folder_structure' for better clarity and context. This would give a clear hint about the structure without being too verbose. Additionally, could you please use the existing util functions for adding and dropping columns? That would help maintain consistent coding patterns across the repo.

geido · 2025-03-13T15:32:39Z

superset/commands/dataset/update.py

+        "column": {column.uuid: column.column_name for column in columns},
+    }
+
+    queue: list[tuple[FolderSchema, list[str]]] = [(folder, []) for folder in folders]
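The rest of this function is not shown in the thread. As a rough sketch of what a queue of (node, path) pairs can be used for, the following walks a folder tree breadth-first, rejecting duplicate UUIDs and references to unknown metrics/columns. It works on plain dicts rather than FolderSchema, the name validate_folders is hypothetical, and the specific checks are assumptions rather than the PR's actual rules.

```python
from collections import deque
from typing import Any


def validate_folders(folders: list[dict[str, Any]], known_uuids: set[str]) -> None:
    """Raise ValueError on duplicate UUIDs or references to unknown items."""
    seen: set[str] = set()
    # Each queue entry pairs a node with the names of its ancestor folders.
    queue = deque((folder, []) for folder in folders)
    while queue:
        node, path = queue.popleft()
        node_uuid = node["uuid"]
        where = "/".join(path) or "<root>"
        if node_uuid in seen:
            raise ValueError(f"Duplicate UUID {node_uuid!r} under {where}")
        seen.add(node_uuid)
        if "children" in node:
            # Folder: enqueue children with the extended path.
            child_path = [*path, node["name"]]
            queue.extend((child, child_path) for child in node["children"])
        elif node_uuid not in known_uuids:
            raise ValueError(f"Unknown metric/column {node_uuid!r} under {where}")


folders = [
    {"uuid": "uuid1", "name": "My metrics", "children": [{"uuid": "uuid2"}]},
    {
        "uuid": "uuid3",
        "name": "My columns",
        "children": [
            {"uuid": "uuid4", "name": "Dimensions", "children": [{"uuid": "uuid5"}]}
        ],
    },
]

# Passes silently: every leaf is a known metric/column and no UUID repeats.
validate_folders(folders, known_uuids={"uuid2", "uuid5"})
```

Carrying the path alongside each node keeps the walk iterative while still allowing error messages that say where in the tree the problem is.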


eschutho · 2025-03-14T20:58:49Z

superset/datasets/schemas.py

@@ -88,6 +88,18 @@ class DatasetMetricsPutSchema(Schema):
    uuid = fields.UUID(allow_none=True)


+class FolderSchema(Schema):
+    uuid = fields.UUID()


@betodealmeida does this UUID need to be generated on the client if it's required here?

betodealmeida · 2025-03-14

Right, the client needs to generate a UUID when creating folders. They are only used by tools that want to keep track of external folders and Superset folders.

eschutho · 2025-03-14T21:00:14Z

tests/integration_tests/datasets/api_tests.py

+                                {
+                                    "type": "column",
+                                    "uuid": dataset.columns[1].uuid,
+                                    "name": dataset.columns[1].column_name,


Not a big deal, but when using the api, it seems unnecessary to need the uuid and the name? We should be able to identify it by just the type and uuid?

betodealmeida · 2025-03-14

Right, I considered making them optional.

I included them in the GET API because it makes it easier to understand the API response, even though they are redundant. But I think we could make them optional in the PUT, and infer types/names from the existing metrics/columns.
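A minimal sketch of the inference idea, assuming leaves arrive with only a UUID and that the dataset's metrics and columns are available as dicts. The function name enrich_folders and the "folder" type value are assumptions; "metric" and "column" follow the type shown in the test snippet above.

```python
from typing import Any


def enrich_folders(
    folders: list[dict[str, Any]],
    metrics: list[dict[str, Any]],
    columns: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Fill in "type" and "name" on each leaf from the dataset's own objects."""
    lookup = {m["uuid"]: ("metric", m["metric_name"]) for m in metrics}
    lookup.update({c["uuid"]: ("column", c["column_name"]) for c in columns})

    def enrich(node: dict[str, Any]) -> dict[str, Any]:
        if "children" in node:
            # Folder: keep its own fields and enrich the children in order.
            return {
                **node,
                "type": "folder",  # assumed value for folder nodes
                "children": [enrich(child) for child in node["children"]],
            }
        # Leaf: raises KeyError if the UUID is not a known metric or column.
        kind, name = lookup[node["uuid"]]
        return {**node, "type": kind, "name": name}

    return [enrich(folder) for folder in folders]


enriched = enrich_folders(
    folders=[
        {"uuid": "uuid3", "name": "My columns", "children": [{"uuid": "uuid5"}]},
        {"uuid": "uuid1", "name": "My metrics", "children": [{"uuid": "uuid2"}]},
    ],
    metrics=[{"metric_name": "count", "uuid": "uuid2"}],
    columns=[{"column_name": "country", "uuid": "uuid5"}],
)
```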

mistercrunch · 2025-04-10T19:56:28Z

NOTE: ran into something a little funky where both this PR and #32680 were running the same db migration with different names. Removed the one here as a duplicate of the other.

betodealmeida · 2025-04-11T02:18:44Z

NOTE: ran into something a little funky where both this PR and #32680 were running the same db migration with different names. Removed the one here as a duplicate of the other.

I had split the migration into a separate PR at @sad's request, and this branch was based on the migration branch IIRC.

mistercrunch · 2025-04-11T02:32:52Z

I think I'm mostly through. Great code and TIL about unpacking with *[]. So new that both AI and I were a bit confused about it :)
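The exact line being referred to is not shown in this thread. As a general illustration of iterable unpacking inside a list display, which is standard Python syntax, a short sketch with made-up values:

```python
path = ["My columns"]

# Build a new list from an existing one plus an extra element,
# without mutating the original.
child_path = [*path, "Dimensions"]

# Several iterables can be unpacked into one list the same way.
merged = [*["a", "b"], *["c"]]
```

Compared with path + ["Dimensions"], the unpacking form also accepts any iterable (tuples, generators), not only lists.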

mistercrunch · 2025-04-11T18:30:09Z

superset-frontend/package-lock.json

@@ -50863,7 +50863,7 @@
      "version": "0.20.3",
      "license": "Apache-2.0",
      "dependencies": {
-        "@types/react-redux": "^7.1.10",
+        "@types/react-redux": "^7.1.34",


a [desired AFAIK] artifact from docker-compose up running npm i, I think we should merge this along with any PR

pull-request-size bot added the size/L label Mar 5, 2025

github-actions bot added risk:db-migration PRs that require a DB migration api Related to the REST API preset-io labels Mar 5, 2025

betodealmeida force-pushed the folder-api branch from b03aa11 to 33a78f2 Compare March 5, 2025 20:23

pull-request-size bot added size/XL and removed size/L labels Mar 5, 2025

betodealmeida force-pushed the folder-api branch 2 times, most recently from 29bdf93 to 3811473 Compare March 6, 2025 16:30

korbit-ai bot reviewed Mar 6, 2025

sadpandajoe added the review:draft label Mar 6, 2025

kgabryje mentioned this pull request Mar 7, 2025

feat(explore): Integrate dataset panel with Folders feature #32548

Closed

9 tasks

betodealmeida marked this pull request as ready for review March 13, 2025 13:45

betodealmeida requested review from mistercrunch, michael-s-molina and eschutho as code owners March 13, 2025 13:45

dosubot bot added the data:dataset Related to dataset configurations label Mar 13, 2025

korbit-ai bot reviewed Mar 13, 2025

geido reviewed Mar 13, 2025

michael-s-molina removed the review:draft label Mar 13, 2025

michael-s-molina requested a review from villebro March 13, 2025 17:25

betodealmeida mentioned this pull request Mar 14, 2025

feat: DB migration for dataset folders #32680

Merged

9 tasks

betodealmeida force-pushed the folder-api branch 2 times, most recently from 135c7ea to ae67d14 Compare March 14, 2025 18:19

github-actions bot removed the risk:db-migration PRs that require a DB migration label Mar 14, 2025

eschutho reviewed Mar 14, 2025

github-actions bot removed the risk:db-migration PRs that require a DB migration label Apr 10, 2025

mistercrunch added 3 commits April 10, 2025 12:59

fix a test

63935fa

fix another master issue

114fdcb

fixing more tests

0c5a40a

github-actions bot added the dependencies:npm label Apr 11, 2025

mistercrunch added 2 commits April 10, 2025 19:02

bypassing the 403 or 404 issue I'm seeing

570d1db

removing test that doesn't make sense anymore as passing names isn't allowed anymore

1223313

testing that marshmallow handles invalid uuids as expected

1a8c430

mistercrunch added 4 commits April 10, 2025 19:40

making rv.status_code in (403, 404) as there's flakiness around it - either way is fine by me

7f08829

some more nonsense

dec115b

skip one

7e4dbbb

tell pytest to keep going

81e6f8d

mistercrunch requested review from rusackas, nytai, craig-rueda, kgabryje and dpgaspar as code owners April 11, 2025 03:34

github-actions bot added the github_actions Pull requests that update GitHub Actions code label Apr 11, 2025

mistercrunch added 3 commits April 11, 2025 09:27

Merge branch 'master' into folder-api

25c5d91

fixing more tests

fb3fc0c

aligning with master

8012a9c

eschutho approved these changes Apr 11, 2025

mistercrunch reviewed Apr 11, 2025

eschutho merged commit 7ab8534 into master Apr 11, 2025
54 checks passed

eschutho deleted the folder-api branch April 11, 2025 18:38

eschutho mentioned this pull request Apr 11, 2025

feat(explore): Integrate dataset panel with Folders feature #33104

Merged

9 tasks

alexandrusoare pushed a commit to alexandrusoare/superset that referenced this pull request Jun 19, 2025

feat: dataset folders (backend) (apache#32520)

872cd8e

Co-authored-by: Maxime Beauchemin <maximebeauchemin@gmail.com> (cherry picked from commit 7ab8534)
