Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser Error: syntax error at or near "[" on read_json() #491

Open
JasperHDB opened this issue Dec 10, 2024 · 2 comments
Open

Parser Error: syntax error at or near "[" on read_json() #491

JasperHDB opened this issue Dec 10, 2024 · 2 comments

Comments

@JasperHDB
Copy link

JasperHDB commented Dec 10, 2024

My model utilizes the read_json() function to fetch JSON from an API like this:

WITH source AS (
    SELECT *
    FROM read_json(
        'https://url/?parameter={{ var("parameter") }}', 
        auto_detect=true, format='auto', 
        maximum_object_size=104857600
    )
)

SELECT column
FROM source

The profile I use looks like this:

dbt_duckdb_small:
  target: "{{ env_var('DBT_TARGET', 'nonprod')}}"
  outputs:
    prod:
      type: duckdb
      use_credential_provider: aws
      settings:
          # Some S3 settings
      schema: "{{ env_var('DBT_TARGET_SCHEMA', 'schema_prod')}}"
    nonprod:
      type: duckdb
      use_credential_provider: aws
      settings:
          # Some S3 settings
      schema: "{{ env_var('DBT_TARGET_SCHEMA', 'schema_nonprod')}}"

Running the model with target prod works, but the second time I run it on target prod, I get the following error:

➜  dbt run -s <model> --profile=dbt_duckdb_small --target=prod
10:23:28  Running with dbt=1.8.7
10:23:29  Registered adapter: duckdb=1.9.1
10:23:29  Found 75 models, 36 sources, 785 macros
10:23:29  
10:23:29  Concurrency: 1 threads (target='prod')
10:23:29  
10:23:29  1 of 1 START sql external model <model> ............. [RUN]
10:24:44  1 of 1 ERROR creating sql external model <model> .... [ERROR in 75.46s]
10:24:44  
10:24:44  Finished running 1 external model in 0 hours 1 minutes and 15.59 seconds (75.59s).
10:24:44  
10:24:44  Completed with 1 error and 0 warnings:
10:24:44  
10:24:44    Runtime Error in model <model>
  Parser Error: syntax error at or near "["
10:24:44  
10:24:44  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

Subsequent runs also fail. When I switch target to nonprod, the run succeeds again. When running afterwards with target nonprod, it fails again. It only seems to succeed on the run where I switch target. Am I missing something? The API endpoint is valid and doesn't contain malformed JSON.

I'm not super experienced with dbt or the dbt-duckdb adapter. If any more info is needed please ask.

@jwills
Copy link
Collaborator

jwills commented Dec 10, 2024

Huh-- what's the value of the {{ var("parameter") }} bit that is getting compiled in the URL string? That would be the first thing I would look at; in general you can always look at the dbt generated sql for a model with the refs and vars substituted in target/compiled directory and then the actual SQL that was executed for it (with the create/alter/etc. statements included) in the target/run directory; my guess is that looking there on both successful and failed runs will lead you to sussing out what is going on here.

@JasperHDB
Copy link
Author

JasperHDB commented Dec 10, 2024

Hey jwills, thanks for the quick response. I could have been more clear on the URL part. The parameters are practically dates:

http://domain/endpoint?startDate={{ start_date }}&endDate={{ end_date }}

where start_date and end_date are configured like this at the top of the script:

{% set start_date = modules.datetime.datetime.now().strftime('%Y-%m-%d') %}

So in the target/compiled directory I find the compiled url like this, as the endpoint expects:

http://domain/endpoint?startDate=2024-12-10&endDate=2024-12-17

Also worth mentioning is that I have another model which can produce the same exception, here the url looks like this:

https://otherdomain/endpoint?startDate={{ var("year") }}-{{ var("month") }}-{{ var("day") }}

Where all variables are passed in the dbt run command with --vars '{day: "10", month: "12", year: "2024"}'.
That model also only succeeds when running it with a specific target for the first time.

When looking at target/compiled and target/run on succeeding/failing runs, the compiled scripts in target/compiled is the same, with valid URL's.
The scripts in target/run vary: on a fail, no code of the run is present, but I think this intended? On a success, the script is this, like all my other scripts:

create or replace view memory.<schema>.<model>__dbt_int as (
  select * from read_parquet('s3://bucket/path/*/*/*/*.parquet', union_by_name=False)
);

The same structure, except for <model> and the S3 bucket path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants