Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add flag to allow disabling creation of catalog tables #1155

Merged
merged 5 commits into from
Oct 18, 2024

Conversation

isc-patrick
Copy link
Contributor

Remove unnecessary _ensure_tables_exist method as this is already the default behavior of Metadata.create_all()

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Do you mind adding a test to show that this works for when

  1. tables already exist
  2. some tables exist, but not all
  3. none of the tables exist yet

@Fokko
Copy link
Contributor

Fokko commented Sep 10, 2024

@isc-patrick do you know how it will check if the table exists? I'm asking since a CREATE TABLE IF NOT EXISTS ... might require CREATE TABLE permissions.

@isc-patrick
Copy link
Contributor Author

I can certainly add tests, but that is really testing the Metadata.create_all() function in SQLAlchemy and not pyiceberg code. I think that CREATE TABLE IF NOT EXISTS requires same permissions as CREATE TABLE. How this is executed will be determined by the Dialect.

@kevinjqliu
Copy link
Contributor

please do! we want to ensure that this change does not break new and existing DB integrations.

@isc-patrick
Copy link
Contributor Author

It does not look like CREATE TABLE IF NOT EXISTS is what is used. It is specific to each dialect and how they implement the has_table method, but the ones I looked at use metadata on the schema to determine the existence of the table.

Copy link
Collaborator

@sungwy sungwy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @isc-patrick - thank you very much for putting up this PR, and for leading the thorough discussion on #1148.

I have added some nits, and in addition could we:

  • update the PR description to fit the updated scope of work for this PR
  • add a line to the SqlCatalog property so that users will be able to find this property?

| Key | Example | Default | Description |
| ------------- | ------------------------------------------------------------ | ------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| uri | postgresql+psycopg2://username:password@localhost/mydatabase | | SQLAlchemy backend URL for the catalog database (see [documentation for URL format](https://docs.sqlalchemy.org/en/20/core/engines.html#backend-specific-urls)) |
| echo | true | false | SQLAlchemy engine [echo param](https://docs.sqlalchemy.org/en/20/core/engines.html#sqlalchemy.create_engine.params.echo) to log all statements to the default log handler |
| pool_pre_ping | true | false | SQLAlchemy engine [pool_pre_ping param](https://docs.sqlalchemy.org/en/20/core/engines.html#sqlalchemy.create_engine.params.pool_pre_ping) to test connections for liveness upon each checkout |

@@ -225,6 +237,93 @@ def test_creation_from_impl(catalog_name: str, warehouse: Path) -> None:
)


def confirm_no_tables_exist(alchemy_engine: Engine) -> list[str]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def confirm_no_tables_exist(alchemy_engine: Engine) -> list[str]:
def confirm_no_tables_exist(alchemy_engine: Engine) -> List[str]:

for python3.8 compatibility (which will be deprecated soon), but this is currently needed for our CI to pass

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not returning this anymore, so removed, CATALOG_TABLES is no a constant. PR title updated and flag added to docs

if inspector.has_table(c.__tablename__):
c.__table__.drop(alchemy_engine)

catalog_tables = [c.__tablename__ for c in SqlCatalogBaseTable.__subclasses__()]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function seems to serve two purposes:

  1. confirm that no tables exist
  2. return the list of SqlCatalogBaseTable.__subclasses__()

(2) is a static list of tables required for the SqlCatalog. Would it be cleaner to separate out (2) to a global variable in this module, instead of returning it and passing it as an input to confirm_all_tables_exist ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made it a constant

return catalog_tables


def confirm_all_tables_exist(inspector: Inspector, catalog_tables: list[str], catalog: Catalog) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we are passing the catalog here just to assert its type - is this necessary for this check? Was the intention to check the tables through the catalog's engine instead like:

inspect(catalog.engine).get_table_names()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not passing the inspector anymore. Also needed to cast the Catalog to use the engine. Made the uri a fixture to insure using same source between catalog and inspector that was creating tables.

@isc-patrick isc-patrick changed the title Remove unnecessary _ensure_tables_exist method Add flag to allow diabling creation of catalog tables Sep 17, 2024
@isc-patrick isc-patrick changed the title Add flag to allow diabling creation of catalog tables Add flag to allow disabling creation of catalog tables Sep 17, 2024
@sungwy
Copy link
Collaborator

sungwy commented Oct 18, 2024

Thank you again for working on this PR @isc-patrick !

@sungwy sungwy merged commit 1c50f53 into apache:main Oct 18, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants