Schema and table filters aren't optimizing discovery #23

Open
pnadolny13 opened this issue May 26, 2023 · 3 comments

Comments

@pnadolny13
Contributor

pnadolny13 commented May 26, 2023

@kgpayne I like the new support for the tables parameter, but it looks like it's implemented differently than I would expect. Let me know if this was already discussed in other SDK issues, but here are my thoughts as a user on what I was expecting to happen:

Expected Behavior

  1. I provide a schema, and the tap does discovery only on that schema
  2. I provide a schema and the tables array, and the tap does discovery only on that schema and those tables

In case 2, if I only include a single table name, I would expect the tap to query the metadata of only that one table.
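
To make the two cases concrete, here is a minimal sketch of the scoping I have in mind. `discovery_targets` is a hypothetical helper written for illustration, not something that exists in the tap or the SDK, and the `("kind", "name")` tuples are just a made-up way to describe what discovery would need to inspect.

```python
from typing import List, Optional, Tuple


def discovery_targets(
    schema: Optional[str], tables: Optional[List[str]]
) -> List[Tuple[str, str]]:
    """Return the (kind, name) scopes discovery would need to inspect.

    Hypothetical helper for illustration only; not part of the tap or SDK.
    """
    if schema and tables:
        # Case 2: only the listed tables inside the configured schema.
        return [("table", f"{schema}.{table}") for table in tables]
    if schema:
        # Case 1: every table in the configured schema, nothing else.
        return [("schema", schema)]
    if tables:
        # Tables without a schema: assume the names are already qualified.
        return [("table", table) for table in tables]
    # No filter configured: fall back to scanning everything.
    return [("all_schemas", "*")]


print(discovery_targets("analytics", ["orders"]))
```

With a single table name in the array, the result has exactly one entry, which is the "only query the metadata of that one table" behavior described above.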

Current Behavior

The schema seems to be used only to create a connection, not to filter by that schema name, so my sync job still runs SHOW queries for every schema in my warehouse. The tables config works as expected, but it also iterates over all schemas/tables in the process, even though I've provided a short list of tables to consider. It doesn't discover the schema for every table, but it still has to iterate through each one.

Questions and Considerations

  1. It's a little misleading to accept a schema but then not use it for filtering. What's the purpose of configuring a schema in the current state? Does it have something to do with permissions or credit usage? Mine worked the same without that setting.
  2. Was it intentional to support syncing data from multiple schemas in the same job? That's probably more flexible, but one advantage of using the schema setting as a filter would be to avoid having to use the fully qualified table name in the tables array. Not a huge benefit, but it could be a quality of life thing.
  3. If someone provides a tables array, wouldn't it be better to flip the logic of the discovery step and search specifically for those tables instead of iterating over all available schemas/tables?
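
A rough sketch of what "flipping the logic" in question 3 could look like, combined with the schema default from question 2. Everything here is an assumption for illustration: `qualify`, `discover_selected`, and the injected `describe_table` callable are hypothetical names, and the in-memory dictionary stands in for the warehouse rather than reflecting how the tap actually talks to it.

```python
from typing import Callable, Dict, List, Optional


def qualify(table: str, default_schema: Optional[str]) -> str:
    """Prefix bare table names with the configured schema (hypothetical)."""
    if "." in table or not default_schema:
        return table
    return f"{default_schema}.{table}"


def discover_selected(
    tables: List[str],
    default_schema: Optional[str],
    describe_table: Callable[[str], Optional[List[str]]],
) -> Dict[str, List[str]]:
    """Look up only the requested tables instead of scanning every schema."""
    catalog: Dict[str, List[str]] = {}
    for table in tables:
        name = qualify(table, default_schema)
        columns = describe_table(name)  # one metadata lookup per requested table
        if columns is not None:  # skip names that do not match anything
            catalog[name] = columns
    return catalog


# In-memory stand-in for the warehouse, used only for this demo.
FAKE_WAREHOUSE = {
    "analytics.orders": ["id", "amount"],
    "analytics.customers": ["id", "email"],
    "raw.events": ["id", "payload"],
}
calls: List[str] = []


def fake_describe(name: str) -> Optional[List[str]]:
    calls.append(name)  # record each lookup to show how many are made
    return FAKE_WAREHOUSE.get(name)


result = discover_selected(["orders", "raw.events"], "analytics", fake_describe)
print(result)
print(calls)
```

In this sketch the number of metadata lookups equals the length of the tables array, regardless of how many schemas and tables exist, and bare names like `orders` pick up the configured schema so they don't need to be fully qualified.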
@pnadolny13
Contributor Author

After a second test it looks like it's not repeating the same behavior. I'll do more testing but it seems like maybe it's only doing this when my tables selection is not finding a match.

@kgpayne
Collaborator

kgpayne commented Jun 27, 2023

@pnadolny13 any update on this?

@nidhi-akkio
Contributor

Created a PR to address this: #36
