-
Notifications
You must be signed in to change notification settings - Fork 15.3k
feat: dataset folders (backend) #32520
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Based on your review schedule, I'll hold off on reviewing this PR until it's marked as ready for review. If you'd like me to take a look now, comment
|
b03aa11
to
33a78f2
Compare
Pointing out something related here, is that the tree component is likely going to be composed of react nodes (not just the description). Today we have [from memory, components names may not match] MetricLabel, ColumnLabel, CalculatedColumnLabel, and these require props beyond what's in folders (description to make a InfoBubbleTooltip, sql_expression, data type, ...). Assuming we want for a rich tree with the full labels (at least on the left panel in explore), this means we'll still have to lookup the related objects from the API. Thinking about future frontend development, we'll need some sort of |
29bdf93
to
3811473
Compare
/korbit-review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
Category | Issue | Fix Detected |
---|---|---|
Non-normalized JSON storage design ▹ view | ||
Unbounded Recursive Nesting ▹ view |
Suppressed issues based on your team's Korbit activity
This issue | Is similar to | Because |
---|---|---|
The DATASET_FOLDERS feature flag is disabled by default which prevents users from using the newly implemented folder organization functionality. |
Default value of CATALOGS_SIMPLIFIED_MIGRATION negates intended performance improvement |
Similar issues were not addressed in the past |
When you react to issues (for example, an upvote or downvote) or you fix them, Korbit will tune future reviews based on these signals.
Files scanned
File Path | Reviewed |
---|---|
superset/migrations/versions/2025-03-03_20-52_94e7a3499973_add_folder_table.py | ✅ |
superset/commands/dataset/update.py | ✅ |
superset/datasets/schemas.py | ✅ |
superset/datasets/api.py | ✅ |
superset/connectors/sqla/models.py | ✅ |
superset/config.py | ✅ |
Explore our documentation to understand the languages and file types we support and the files we ignore.
Need a new review? Comment
/korbit-review
on this PR and I'll review your latest changes.Korbit Guide: Usage and Customization
Interacting with Korbit
- You can manually ask Korbit to review your PR using the
/korbit-review
command in a comment at the root of your PR.- You can ask Korbit to generate a new PR description using the
/korbit-generate-pr-description
command in any comment on your PR.- Too many Korbit comments? I can resolve all my comment threads if you use the
/korbit-resolve
command in any comment on your PR.- On any given comment that Korbit raises on your pull request, you can have a discussion with Korbit by replying to the comment.
- Help train Korbit to improve your reviews by giving a 👍 or 👎 on the comments Korbit posts.
Customizing Korbit
- Check out our docs on how you can make Korbit work best for you and your team.
- Customize Korbit for your organization through the Korbit Console.
Current Korbit Configuration
General Settings
Setting Value Review Schedule Automatic excluding drafts Max Issue Count 10 Automatic PR Descriptions ❌ Issue Categories
Category Enabled Documentation ✅ Logging ✅ Error Handling ✅ Readability ✅ Design ✅ Performance ✅ Security ✅ Functionality ✅ Feedback and Support
Note
Korbit Pro is free for open source projects 🎉
Looking to add Korbit to your team? Get started with a free 2 week trial here
superset/migrations/versions/2025-03-03_20-52_94e7a3499973_add_folder_table.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
Category | Issue | Fix Detected |
---|---|---|
Lost validation error context ▹ view | ||
Non-required type field compromises folder functionality ▹ view | ||
Ambiguous column name for JSON structure ▹ view |
Files scanned
File Path | Reviewed |
---|---|
superset/migrations/versions/2025-03-03_20-52_94e7a3499973_add_folder_table.py | ✅ |
superset/commands/dataset/update.py | ✅ |
superset/datasets/schemas.py | ✅ |
superset/datasets/api.py | ✅ |
superset/connectors/sqla/models.py | ✅ |
superset/config.py | ✅ |
Explore our documentation to understand the languages and file types we support and the files we ignore.
Need a new review? Comment
/korbit-review
on this PR and I'll review your latest changes.Korbit Guide: Usage and Customization
Interacting with Korbit
- You can manually ask Korbit to review your PR using the
/korbit-review
command in a comment at the root of your PR.- You can ask Korbit to generate a new PR description using the
/korbit-generate-pr-description
command in any comment on your PR.- Too many Korbit comments? I can resolve all my comment threads if you use the
/korbit-resolve
command in any comment on your PR.- On any given comment that Korbit raises on your pull request, you can have a discussion with Korbit by replying to the comment.
- Help train Korbit to improve your reviews by giving a 👍 or 👎 on the comments Korbit posts.
Customizing Korbit
- Check out our docs on how you can make Korbit work best for you and your team.
- Customize Korbit for your organization through the Korbit Console.
Current Korbit Configuration
General Settings
Setting Value Review Schedule Automatic excluding drafts Max Issue Count 10 Automatic PR Descriptions ❌ Issue Categories
Category Enabled Documentation ✅ Logging ✅ Error Handling ✅ Readability ✅ Design ✅ Performance ✅ Security ✅ Functionality ✅ Feedback and Support
Note
Korbit Pro is free for open source projects 🎉
Looking to add Korbit to your team? Get started with a free 2 week trial here
def upgrade(): | ||
op.add_column( | ||
"tables", | ||
sa.Column("folders", JSON, nullable=True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ambiguous column name for JSON structure 
Tell me more
What is the issue?
The column name 'folders' is ambiguous and doesn't clearly indicate its purpose or structure in the database schema.
Why this matters
Without a more descriptive name, future developers will need to investigate other parts of the codebase to understand what type of folders are stored and how they are structured in the JSON field.
Suggested change ∙ Feature Preview
Rename the column to be more specific, such as:
sa.Column("dataset_folder_config", JSON, nullable=True)
or
sa.Column("column_folder_hierarchy", JSON, nullable=True)
💬 Looking for more details? Reply to this comment to chat with Korbit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we now have util functions for adding and dropping columns that we'll want to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for implementing this feature. I agree with your approach of creating a serialized folder structure in a new JSON column. However, I would suggest renaming column 'folders' to 'folder_hierarchy' or 'folder_structure' for better clarity and context. This would give a clear hint about the structure without being too verbose. Additionally, could you please utilize the existing util functions for adding and dropping columns, that would help in maintaining consistent coding patterns across the repo.
"column": {column.uuid: column.column_name for column in columns}, | ||
} | ||
|
||
queue: list[tuple[FolderSchema, list[str]]] = [(folder, []) for folder in folders] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
135c7ea
to
ae67d14
Compare
superset/datasets/schemas.py
Outdated
@@ -88,6 +88,18 @@ class DatasetMetricsPutSchema(Schema): | |||
uuid = fields.UUID(allow_none=True) | |||
|
|||
|
|||
class FolderSchema(Schema): | |||
uuid = fields.UUID() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@betodealmeida does this UUID need to be generated on the client if it's required here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the client needs to generate a UUID when creating folders. They are only used by tools that want to keep track of external folders and Superset folders.
{ | ||
"type": "column", | ||
"uuid": dataset.columns[1].uuid, | ||
"name": dataset.columns[1].column_name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal, but when using the api, it seems unnecessary to need the uuid and the name? We should be able to identify it by just the type and uuid?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I considered making them optional.
I included them in the GET
API because it makes it easier to understand the API response, even thought they are redundant. But I think we could make them optional in the PUT
, and infer types/names from the existing metrics/columns.
NOTE: ran into something a little funky where both this PR and #32680 were running the same db migration with different names. Removed the one here as a duplicate of the other. |
I think I'm mostly through. Great code and TIL about unpacking with |
@@ -50863,7 +50863,7 @@ | |||
"version": "0.20.3", | |||
"license": "Apache-2.0", | |||
"dependencies": { | |||
"@types/react-redux": "^7.1.10", | |||
"@types/react-redux": "^7.1.34", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a [desired AFAIK] artifact from docker-compose up
running npm i
, I think we should merge this along with any PR
Co-authored-by: Maxime Beauchemin <maximebeauchemin@gmail.com> (cherry picked from commit 7ab8534)
SUMMARY
Tentative implementation of the backend for #32351.
I ended up doing the simplest thing that works, based on these requirements:
I started the implementation with a
Folder
model, and mapped relationships:Folder n:1 SqlaTable
SqlMetric n:1 Folder
TableColumn n:1 Folder
One problem with this approach is that order is important, so I had to keep track of the order of each folder and each element inside a folder. This required additional columns, and complex bookkeeping when a dataset was updated — did the positions change? Were metrics removed, columns added, folders renamed? Moving the last element in a folder to the first position, for example, would require updating the position of all elements inside the folder.
Additionally, representing nested folders would require additional relationships to be tracked.
I ditched that approach, and instead opted to serialize the folder structure to a new JSON column in the
SqlaTable
model calledfolders
. When doing aGET
request to/api/v1/dataset/
or/api/v1/explore/
the response now includes UUIDs for metrics and columns, and has the new attributefolders
:With this solution the frontend can easily build the UI from this response. Note that the payload only includes custom folders, and the
metrics
andcolumns
attribute in the response are unmodified (meaning a metric that is present in a folder will still show up undermetrics
). It's up to the frontend to build the existing "Columns" and "Metrics" sections by removing any elements that are present in custom folders. This way we can build the feature progressively by first enhancing the API, and later adding the custom UI.To organize the metrics and columns into folders, as well as create new folders, the user must edit the dataset (dataset creation doesn't show metrics nor columns). After creating folders and adding metrics/columns to them the user can save the dataset. The
PUT
request will then send a payload that can also be enhanced with thefolders
attribute.There is no model for folders, since it offered little to no value. Instead, the client simply declare the folder name, UUID, and an optional description, as well as the tree structure. The UUID is used for external bookkeeping, for example, for systems where semantics are defined outside of Superset and periodically synced via the API.
Depends on #32680.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A
TESTING INSTRUCTIONS
Added tests.
ADDITIONAL INFORMATION