Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL validation is running too long, how to cut it into different pieces #764

Open
Folux opened this issue Feb 5, 2024 · 1 comment
Open

Comments

@Folux
Copy link

Folux commented Feb 5, 2024

Hey,

I'm running the SQL checks on a huge project and after 3 hours my process runs into a timeout. I'm trying to understand how I can make the SQL validation smaller because it should not take multiple hours to run, even if the project is bigger.

I'm running only on incremental changes on a branch. The branch did not change anything related to the LookML so it should not run any SQL checks in my understanding.

This is my command:
spectacles content --project huge_project --branch my-spectacles-test --incremental --remote-reset -v

I can see that it is validating hundreds of queries in the logs:

Waiting to acquire a query slot
Creating async query for delivery/provider_provider/*
Attempting to get results for 3 query tasks
Query for delivery/provider_provider/* created as query 38363719
Running query Query(explore=provider_provider n=4) [qid=38363719]
Starting query 38363719
Query task 07be71e976c618a783d7b6dd99f85c92 status is: running
Query task 2d4c7f836991a24f5dfcf6caab45f496 status is: running
Query task bbb856146bb5cc5f6b48f73f8df12b45 status is: error
Query 38363719 is running under query task fb2889c3e7e866bc58ae21806fbdd2b9
Waiting to acquire a query slot
Creating async query for delivery/provider_provider/*
Attempting to get results for 3 query tasks
Query for delivery/provider_provider/* created as query 38363673
Running query Query(explore=provider_provider n=4) [qid=38363673]

In other projects, the SQL check runs for a few minutes. I can't really understand why it runs for several hours in this project, and why it checks so many things if nothing has changed.

I could not find much in the documentation, so can you how to configure this huge SQL validation, so it does not want to check everything and runs faster?

@DylanBaker
Copy link
Collaborator

Hi @Folux. Thanks for opening the issue.

Do you know if this project has any SQL errors in production? I ask as this can significantly adversely affect run times.

Our docs could probably be clearer on this, but the way we check for errors incrementally is as follows:

  • Generate a single SQL query for each explore with all dimensions selected in the branch/commit being tested.
  • Generate a single SQL query for each explore with all dimensions selected in the branch being merged into (typically production)
  • Find the explores where the SQL has been changed to identify those that need to have the queries run for them.
  • Run the single large SQL query for all the changed explores
  • If the explore has an error drill down to the dimension level.
  • For each dimension with an error, generate the SQL for that dimension in the branch being merged into (typically production) to understand if the error to the dimension is "new".

We do it this way because it is typically quicker to run the queries down to the dimension level than generating the SQL for all dimensions (though @joshtemple would should validate if this is still the case with async).

As a result of this, even if an explore doesn't have new errors, we will need to drill down to the dimension level in the explore to ascertain whether that is the case or not. I could be wrong (and please correct me if I am) but I expect there might be production errors on some large explores that are affected by your changes which is causing the long run time.

If I'm wrong, I'm not sure what the other reasons could be for long runtimes, but would be happy to dig in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants