-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add flag to allow disabling creation of catalog tables #1155
Conversation
… default behavior of Metadata.create_all()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Do you mind adding a test to show that this works for when
- tables already exist
- some tables exist, but not all
- none of the tables exist yet
@isc-patrick do you know how it will check if the table exists? I'm asking since a |
I can certainly add tests, but that is really testing the Metadata.create_all() function in SQLAlchemy and not pyiceberg code. I think that CREATE TABLE IF NOT EXISTS requires same permissions as CREATE TABLE. How this is executed will be determined by the Dialect. |
please do! we want to ensure that this change does not break new and existing DB integrations. |
It does not look like CREATE TABLE IF NOT EXISTS is what is used. It is specific to each dialect and how they implement the has_table method, but the ones I looked at use metadata on the schema to determine the existence of the table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @isc-patrick - thank you very much for putting up this PR, and for leading the thorough discussion on #1148.
I have added some nits, and in addition could we:
- update the PR description to fit the updated scope of work for this PR
- add a line to the SqlCatalog property so that users will be able to find this property?
iceberg-python/mkdocs/docs/configuration.md
Lines 261 to 265 in 0dc5408
| Key | Example | Default | Description | | |
| ------------- | ------------------------------------------------------------ | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | |
| uri | postgresql+psycopg2://username:password@localhost/mydatabase | | SQLAlchemy backend URL for the catalog database (see [documentation for URL format](https://docs.sqlalchemy.org/en/20/core/engines.html#backend-specific-urls)) | | |
| echo | true | false | SQLAlchemy engine [echo param](https://docs.sqlalchemy.org/en/20/core/engines.html#sqlalchemy.create_engine.params.echo) to log all statements to the default log handler | | |
| pool_pre_ping | true | false | SQLAlchemy engine [pool_pre_ping param](https://docs.sqlalchemy.org/en/20/core/engines.html#sqlalchemy.create_engine.params.pool_pre_ping) to test connections for liveness upon each checkout | |
tests/catalog/test_sql.py
Outdated
@@ -225,6 +237,93 @@ def test_creation_from_impl(catalog_name: str, warehouse: Path) -> None: | |||
) | |||
|
|||
|
|||
def confirm_no_tables_exist(alchemy_engine: Engine) -> list[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def confirm_no_tables_exist(alchemy_engine: Engine) -> list[str]: | |
def confirm_no_tables_exist(alchemy_engine: Engine) -> List[str]: |
for python3.8 compatibility (which will be deprecated soon), but this is currently needed for our CI to pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not returning this anymore, so removed, CATALOG_TABLES is no a constant. PR title updated and flag added to docs
tests/catalog/test_sql.py
Outdated
if inspector.has_table(c.__tablename__): | ||
c.__table__.drop(alchemy_engine) | ||
|
||
catalog_tables = [c.__tablename__ for c in SqlCatalogBaseTable.__subclasses__()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function seems to serve two purposes:
- confirm that no tables exist
- return the list of
SqlCatalogBaseTable.__subclasses__()
(2) is a static list of tables required for the SqlCatalog. Would it be cleaner to separate out (2) to a global variable in this module, instead of returning it and passing it as an input to confirm_all_tables_exist
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made it a constant
tests/catalog/test_sql.py
Outdated
return catalog_tables | ||
|
||
|
||
def confirm_all_tables_exist(inspector: Inspector, catalog_tables: list[str], catalog: Catalog) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we are passing the catalog
here just to assert its type - is this necessary for this check? Was the intention to check the tables through the catalog's engine instead like:
inspect(catalog.engine).get_table_names()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not passing the inspector anymore. Also needed to cast the Catalog to use the engine. Made the uri a fixture to insure using same source between catalog and inspector that was creating tables.
Thank you again for working on this PR @isc-patrick ! |
Remove unnecessary _ensure_tables_exist method as this is already the default behavior of Metadata.create_all()