
@dforsber

Support for CREATE VIEW

This PR adds support for CREATE VIEW statements to the Airport extension.

Motivation

I would like to add support for materialised views to my FlightRPC Data Ingestion Server (https://www.boilstream.com). CREATE TABLE already works and can be used to create topics; CREATE VIEW would be used to create derived topics, i.e. materialised views.

Example

D ATTACH 'boilstream' (TYPE AIRPORT, location 'grpc://localhost:50051/');
D SHOW ALL TABLES;
┌──────────┬─────────┬─────────┬──────────────┬──────────────┬───────────┐
│ database │ schema  │  name   │ column_names │ column_types │ temporary │
│ varchar  │ varchar │ varchar │  varchar[]   │  varchar[]   │  boolean  │
├──────────┴─────────┴─────────┴──────────────┴──────────────┴───────────┤
│                                 0 rows                                 │
└────────────────────────────────────────────────────────────────────────┘
D CREATE TABLE boilstream.s3.people (name VARCHAR, age INT, tags VARCHAR[]);
D CREATE VIEW boilstream.s3.filtered_a AS SELECT * FROM boilstream.s3.people WHERE name LIKE 'a%';
D CREATE VIEW boilstream.s3.filtered_b AS SELECT * FROM boilstream.s3.people WHERE name LIKE 'b%';
D CREATE VIEW boilstream.s3.filtered_adults AS SELECT * FROM boilstream.s3.people WHERE age > 50;
D ATTACH 'boilstream' (TYPE AIRPORT, location 'grpc://localhost:50051/');
D SELECT table_name, comment FROM duckdb_tables();
┌────────────────────────┬─────────────────────────────────────────────────────────────────────────────┐
│       table_name       │                                   comment                                   │
│        varchar         │                                   varchar                                   │
├────────────────────────┼─────────────────────────────────────────────────────────────────────────────┤
│ people→filtered_adults │ Materialized view: SELECT * FROM boilstream.s3.people WHERE age > 50;       │
│ people→filtered_b      │ Materialized view: SELECT * FROM boilstream.s3.people WHERE name LIKE 'b%'; │
│ people→filtered_a      │ Materialized view: SELECT * FROM boilstream.s3.people WHERE name LIKE 'a%'; │
│ people                 │ Topic created from DuckDB Airport CREATE TABLE request for table 'people'   │
└────────────────────────┴─────────────────────────────────────────────────────────────────────────────┘

Collaborator

@rustyconover left a comment


Thank you so much for this contribution! I really appreciate you taking the time to work on this - it's exactly the kind of collaboration that makes this project better. I'm excited to get this feature merged soon.

I wanted to share a few ideas for how we might build on this:

  1. Right now the code works beautifully when users create views within their session, though those views are session-specific. It would be great to extend this so the server can include existing views when responding to list_schemas calls.
  2. We should store views in their own catalog collection, separate from tables. This could make schema lookups more intuitive and organized.
  3. The question of server-side materialization is an interesting one - ultimately that's a server decision, though it would be helpful if DuckDB could provide hints about materialization preferences. I'm happy to leave that as a potential enhancement for the team to explore down the road.

Thanks again for getting this started. I'm looking forward to getting it completed after some travel today.

@rustyconover
Collaborator

Hey @dforsber,

I was reading and thinking about this PR on my flight this morning and wanted to share some thoughts.

It hit me a few lines into working on the merge that this PR is really centered around materialized views—as opposed to the traditional (non-materialized) views that DuckDB supports. In DuckDB, views aren’t materialized and are typically a different type of object in the catalog. The objects you’re creating here behave more like tables, which is great—it just shifts the mental model a bit.

Long-term, I’d love for Airport to support both kinds of views. I actually have some ideas around materialized views that could be interesting in other Query.Farm extensions too. Right now though, overloading CREATE VIEW to build a remote materialized view might be a little surprising for users who expect a regular (non-materialized) view. Unfortunately, DuckDB doesn’t currently provide a clean way to let users specify which kind they want without extending the SQL parser.

I’m not at all opposed to extending the DuckDB parser—I just haven’t done it yet in Airport. I plan to take a look at the flockmtl extension soon since I believe they’ve gone down this path and might offer some good examples.

As for this PR specifically, I’ll need a little time before I can fully dive back into it—I have a few commercial priorities I need to get through first at Query.Farm.

Here’s where I think we should head next:

  1. Add support for an AirportViewSet to hold regular views in schemas.

  2. Implement support for regular (non-materialized) views.

  3. Extend the parser to allow specifying materialized vs. non-materialized views.

  4. Think through the extra considerations for remote materialized views:

    • Possible confusion between DROP TABLE and DROP VIEW.
    • From the client’s perspective, these feel more like tables than views.
    • What happens with ALTER TABLE?
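To make the AirportViewSet idea and the DROP TABLE vs. DROP VIEW concern more concrete, here is a minimal sketch in Python. It is not Airport's actual catalog code; `Schema`, `TableEntry`, `ViewEntry`, and the drop methods are hypothetical names for illustration only. It assumes tables and views share one namespace per schema but live in separate collections, so DROP VIEW can reject a name that refers to a table and vice versa.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ViewEntry:
    """Hypothetical regular (non-materialized) view: only its defining SQL is stored."""
    name: str
    sql: str


@dataclass
class TableEntry:
    """Hypothetical table; a remote materialized view would also land here,
    since clients see it as a table."""
    name: str
    columns: dict[str, str]


@dataclass
class Schema:
    """Hypothetical schema keeping tables and views in separate collections."""
    name: str
    tables: dict[str, TableEntry] = field(default_factory=dict)
    views: dict[str, ViewEntry] = field(default_factory=dict)

    def create_table(self, entry: TableEntry) -> None:
        self._check_free(entry.name)
        self.tables[entry.name] = entry

    def create_view(self, entry: ViewEntry) -> None:
        self._check_free(entry.name)
        self.views[entry.name] = entry

    def drop_table(self, name: str) -> None:
        # Refuse to drop a view through DROP TABLE.
        if name in self.views:
            raise ValueError(f"{name!r} is a view; use DROP VIEW")
        if name not in self.tables:
            raise KeyError(f"table {name!r} does not exist")
        del self.tables[name]

    def drop_view(self, name: str) -> None:
        # Refuse to drop a table through DROP VIEW.
        if name in self.tables:
            raise ValueError(f"{name!r} is a table; use DROP TABLE")
        if name not in self.views:
            raise KeyError(f"view {name!r} does not exist")
        del self.views[name]

    def _check_free(self, name: str) -> None:
        # Tables and views share a single namespace within a schema.
        if name in self.tables or name in self.views:
            raise ValueError(f"{name!r} already exists in schema {self.name!r}")


# Usage with the objects from the PR's CREATE TABLE / CREATE VIEW example.
s3 = Schema("s3")
s3.create_table(TableEntry("people", {"name": "VARCHAR", "age": "INT", "tags": "VARCHAR[]"}))
s3.create_view(ViewEntry("filtered_a", "SELECT * FROM boilstream.s3.people WHERE name LIKE 'a%'"))
try:
    s3.drop_table("filtered_a")
except ValueError as err:
    print(err)  # 'filtered_a' is a view; use DROP VIEW
```

Where a remote materialized view should live (the `tables` or the `views` collection) is exactly the open question in the last point; this sketch assumes `tables`, because that matches how clients would use it.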

Really appreciate your work on this PR; this is super valuable functionality, and I’m excited about where we can take it!

@dforsber
Author

👍🏻 Yes, I think the cleanest way forward is normal VIEW support for Airport, along with DROP VIEW.

This PR was my quick, AI-assisted way to support VIEW creation (along with DROP) in Airport, so that the FlightRPC server receives the information.

I need to dig deeper into the extension to see how to implement the AirportViewSet.

I noticed that the DuckDB catalog and the catalog information returned by the remote FlightRPC server (as seen with SHOW ALL TABLES) are populated differently depending on whether you are in the session where you created the TABLE, or you start a new session, ATTACH, and run SHOW ALL TABLES.

@cmettler

I was also thinking it would be nice to be able to create remote views in my Flight server. In my case, these would be views on SQL Server. Currently, this can't work as I would like because I want to reference SQL Server-specific functions in the view, which DuckDB could not parse unless I map them 1:1 as functions in Airport/DuckDB. However, I would not want to map the complete set of SQL Server functions to DuckDB.
An alternative approach could be a generic Airport DDL action where I would pass a SQL Server-specific raw DDL statement.
This action could return:

  • A new FlightInfo if something new has been created (I would treat external/remote views as a table)
  • Both old and new FlightInfo if something has been altered
  • The FlightInfo of something that has been dropped
  • No FlightInfo if nothing has been changed
  • A flag indicating that all metadata should be re-read (similar to a detach+attach operation) if several resources have been updated and the change can't be expressed with a single returned FlightInfo

Could this work with the current DuckDB catalog/Airport integration?
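One way to picture the response of the proposed generic DDL action is as a small result type covering the five outcomes listed above. This is a hypothetical sketch in Python: `DdlOutcome`, `DdlResult`, `apply_ddl_result`, and `FlightInfoRef` (a stand-in for an Arrow Flight `FlightInfo`, keeping only a descriptor path) are invented for illustration and are not part of Airport or Arrow Flight. It shows how a client could validate the response and update a cached catalog.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


@dataclass(frozen=True)
class FlightInfoRef:
    """Hypothetical stand-in for a FlightInfo; only the descriptor path is kept."""
    path: tuple[str, ...]


class DdlOutcome(Enum):
    CREATED = auto()      # new object: new_info is set
    ALTERED = auto()      # changed object: old_info and new_info are set
    DROPPED = auto()      # removed object: old_info is set
    UNCHANGED = auto()    # nothing changed: no FlightInfo
    REFRESH_ALL = auto()  # too many changes to describe: re-read all metadata


@dataclass(frozen=True)
class DdlResult:
    outcome: DdlOutcome
    old_info: Optional[FlightInfoRef] = None
    new_info: Optional[FlightInfoRef] = None

    def __post_init__(self) -> None:
        # Enforce which FlightInfo fields each outcome must (and must not) carry.
        needs_old = self.outcome in (DdlOutcome.ALTERED, DdlOutcome.DROPPED)
        needs_new = self.outcome in (DdlOutcome.CREATED, DdlOutcome.ALTERED)
        if (self.old_info is not None) != needs_old or (self.new_info is not None) != needs_new:
            raise ValueError(f"invalid FlightInfo fields for {self.outcome.name}")


def apply_ddl_result(cache: dict[tuple[str, ...], FlightInfoRef], result: DdlResult) -> bool:
    """Update a client-side catalog cache in place.

    Returns True when the caller must re-read all metadata (like DETACH + ATTACH).
    """
    if result.outcome is DdlOutcome.REFRESH_ALL:
        cache.clear()
        return True
    # ALTERED is handled as remove-old then add-new, so renames work too.
    if result.old_info is not None:
        cache.pop(result.old_info.path, None)
    if result.new_info is not None:
        cache[result.new_info.path] = result.new_info
    return False
```

Handling ALTERED as "remove the old entry, then add the new one" keeps the client correct even when the DDL renames an object, since the old and new paths may differ.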

@dforsber
Author

@rustyconover Feel free to close this PR, thank you.


3 participants