Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: show column description in the table schema #79

Merged
merged 22 commits into from
Dec 26, 2024
Merged

Conversation

ashish10alex
Copy link
Owner

@ashish10alex ashish10alex commented Dec 23, 2024

Add description column to the table schema shown in the compiled query web view view nav bar. From my testing on Dataform graph with approx 280 nodes this functionality adds less than 20ms overhead. Also since computation is performed post the compiled query is displayed, the perceived latency is quite low / imperceptible to human eye.

Other minor changes

  • No need to check BigQuery client if query is undefined or ""
  • If schema is not available for preOps + main query use schema from main query if available. This does not seem have any additional performance penalty due to the function calls being wrapped in Promise.all
  • make changes based on resolution from fix: dry run not working with @@query_label #80

@ashish10alex ashish10alex marked this pull request as draft December 23, 2024 13:04
@ashish10alex ashish10alex changed the title feat: show column description in the schema table feat: show column description in the table schema Dec 23, 2024
@ashish10alex ashish10alex marked this pull request as ready for review December 23, 2024 17:14
@HampB
Copy link
Collaborator

HampB commented Dec 23, 2024

I've had some issues where the schema is not loading, had it before aswell but not put much energy trying to find out why.
Any tips where to put down some logging localy to figure it out?

Currently it's just blank like this:
image

@HampB
Copy link
Collaborator

HampB commented Dec 23, 2024

It now showed up when looking at another model. But is now stuck on showing that models schema even when i swap to another model.

@ashish10alex
Copy link
Owner Author

ashish10alex commented Dec 23, 2024

Hi @HampB , interesting issue. The easiest way to figure out if the schema is even retrieved from your BigQuery job is by printing out this this line

https://github.com/ashish10alex/vscode-dataform-tools/blob/add_col_desc/src/bigqueryDryRun.ts#L56


I'd also add a log statement to print out compiledQuerySchema variable here

https://github.com/ashish10alex/vscode-dataform-tools/blob/add_col_desc/src/views/register-preview-compiled-panel.ts#L346

compiledQuerySchema should print out an object of type compiledQuerySchema

--

The only reason the schema would be stuck even if you change the active editor is if the next active editor is of type operation or is not a sqlx file. Otherwise it should.


Please let me know what you get after trying the above

@HampB
Copy link
Collaborator

HampB commented Dec 23, 2024

Thanks @ashish10alex, seems like job.metadata.statistics.query.schema is undefined after a dry run. A bit strange, especially since it works sometimes.

@HampB
Copy link
Collaborator

HampB commented Dec 23, 2024

Debugging the job variable seems to show the same, schema is missing
image

@HampB
Copy link
Collaborator

HampB commented Dec 23, 2024

And if i look at a model where the schema actually loads, i cant seem to find any differences. They are both of the type "Table", have the same dataset, same project and so on...

@ashish10alex
Copy link
Owner Author

ashish10alex commented Dec 23, 2024

interesting, I do not have a single instance where it does not work for me. It even works on my personal gcp project. Is there any other key in the job object which has the schema ?

--

It might be worth trying the api in isolation e.g take the code where we get the job and do try run put it in a test.js file and execute it in isolation with a a query e.g. select 1 as someColumn and see if you get the schema as part of job object ?

@HampB
Copy link
Collaborator

HampB commented Dec 23, 2024

It seems like it's actually the entire dry run that is not completing successfully, or perhaps it’s not being awaited correctly for the problematic models.

When I add a breakpoint just before the return statement (here) on a non-working model, I need to press "continue" 2–3 times. During this process, the billed bytes remain at 0, and no schema appears.

In contrast, when I do the same on a working model, the first 2–3 continues also show 0 bytes billed and no schema. However, additional breaks occur where the billed bytes start increasing, and the schema is eventually populated.

I'll do some further testing tomorrow

@benjaminwestern
Copy link
Collaborator

Testing this now gents. I am seeing the same issue, so just seeing if I can unpack it as well.

@ashish10alex
Copy link
Owner Author

thanks @benjaminwestern , is it possible to create a minimum reproducible example. e.g does adding the following model test.sqlx give you the following output ?

config {
  type: 'table',
  assertions: {
    uniqueKey: ["uniqueId"]
  }
}


SELECT
  1                AS UNIQUEID
  , "myData"       AS SOMECOLUMN
  , CURRENT_DATE() AS TODAYS_DATE
image

@ashish10alex
Copy link
Owner Author

Debugging the job variable seems to show the same, schema is missing image

@HampB in this example I notice that statement type is SCRIPT. From the models I have of type "table" and "view" I have them come out as SELECT type from the dryRun api . I can only get SCRIPT type when I use "operation" type

@ashish10alex
Copy link
Owner Author

Hi @HampB / @benjaminwestern, I believe I have found and fixed the issue that you guys might have encountered. The schema might not have been retrieved due to there being a pre operation before the main query. Since earlier we did the dry run on preOps + main query, the schema would not be retrieved if the preOps query was of type script. I have modified the dryRun Promise.all call to include the main query as well [commit] so this should now solve the issue experienced. Appreciate it if you could test the latest version. Minimal example code I have tested.

config {
  type: 'table',
}

pre_operations {
  DROP TABLE IF EXISTS ${self()};
}

SELECT
  1                AS UNIQUEID
  , "myData"       AS SOMECOLUMN
  , CURRENT_DATE() AS TODAYS_DATE

@HampB
Copy link
Collaborator

HampB commented Dec 24, 2024

Ah, ofcource... Looking good now!
But still seems like none of my incremental models are working as inteded. Not sure why yet.

A super simple example like this works;

config {
  type: 'incremental',
}
pre_operations {
  SELECT 2
}

SELECT 1

@HampB
Copy link
Collaborator

HampB commented Dec 24, 2024

It seems to have to do with setitng @@query_label in the pre_operations.
For example, a model with this operations block does not work, but if i remove the @@query_label statement it works.

pre_operations {
  DECLARE event_date_checkpoint DEFAULT (SELECT TIMESTAMP("2023-01-01"));

  SET @@query_label = "${query_label.from_object(labels)}";

  ${when(incremental(), 
  `SET event_date_checkpoint = (SELECT TIMESTAMP_SUB(MAX(ModifiedAt), INTERVAL 3 DAY) FROM ${self()});` 
)}
}

Where query_labe.from_object is:

 var function from_object(labels) {
    return Object.entries(labels).map(([key, value]) => `dataform_${key}:${value}`).join(",")
}

A bit short on time today, but can keep investigating later on.

@ashish10alex
Copy link
Owner Author

@HampB , thanks for the PR to fix the issue #80


I have merged the changes made in that PR to this branch. Can you please do a final review to make sure things are working on your end before we merge this PR ?

Copy link
Collaborator

@HampB HampB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me now, nice feature!

@ashish10alex ashish10alex merged commit 0bbd35b into main Dec 26, 2024
@ashish10alex ashish10alex deleted the add_col_desc branch December 26, 2024 12:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants