BadRequest when performing merge after write in the same Spark session #964
Comments
More context: If I switch the writing of
Which Spark version and connector version are you using?
When using the DIRECT write method it may take a few seconds until the data appears in the table. Have you tried using the INDIRECT write method?
The data does appear if I query it, but I still get the error above if I try to MERGE into the table after the first write.
Yes, that's what I mentioned in my second comment above. If I create/write the table using the indirect method, then I don't get the error with the subsequent MERGE.
Spark 3.3 and connector 0.30.0.
@yirutang can you please have a look?
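For reference, a minimal sketch of selecting the INDIRECT write method. The option names (`writeMethod`, `temporaryGcsBucket`) follow the spark-bigquery-connector documentation; the bucket and table names are placeholders, not from this issue:

```python
def indirect_write_options(temporary_gcs_bucket):
    """Options selecting the INDIRECT write method for the spark-bigquery
    connector: rows are staged as files in GCS and loaded via a BigQuery
    load job, so committed data lands in managed storage right away.

    The bucket name is a placeholder; option names follow the connector docs.
    """
    return {
        "writeMethod": "indirect",
        "temporaryGcsBucket": temporary_gcs_bucket,
    }

# Usage (requires a Spark session; shown for illustration only):
# df.write.format("bigquery") \
#     .options(**indirect_write_options("my-staging-bucket")) \
#     .mode("overwrite") \
#     .save("my_dataset.my_temp_table")
```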
We are working on making data land in managed storage immediately after commit, but that is still WIP. In the meantime, a commit triggers a conversion, and the conversion time is shorter than the publicly documented streaming delay. Our study shows that 99% of conversions complete within 2 minutes, and the longest tail we have seen is 25 minutes.
Given that, I suggest switching to the INDIRECT mode for the time being, or adding retry logic.
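The retry suggestion could be sketched as a small backoff helper wrapping the MERGE call. This is an illustrative helper, not part of the connector; `with_retry` and its parameters are made up for this example:

```python
import time


def with_retry(fn, max_attempts=5, base_delay=1.0, retryable=(Exception,)):
    """Call fn(); on a retryable error, wait with exponential backoff
    and try again, re-raising after max_attempts failures.

    Given the ~2 min typical conversion time mentioned above, a handful
    of attempts with a base delay of tens of seconds should usually do.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts:
                raise
            # 1x, 2x, 4x, ... the base delay between attempts
            time.sleep(base_delay * 2 ** (attempt - 1))


# Usage sketch (merge_fn would issue the MERGE via the BigQuery client):
# with_retry(merge_fn, max_attempts=6, base_delay=30.0)
```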
I have a use case where we need to use MERGE INTO, but because the connector doesn't support it natively (there's an issue for it, #575), we work around this by writing the delta dataframe to a temp table and then using the python-bigquery library to execute a MERGE SQL query:
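(The original snippet did not survive extraction. A minimal sketch of this pattern follows; all table and column names are illustrative, and `build_merge_sql`/`upsert_via_temp_table` are hypothetical helpers, not the reporter's actual code.)

```python
def build_merge_sql(target_table, staging_table, key_columns, value_columns):
    """Build a BigQuery MERGE statement that upserts staging rows into
    the target. Table and column names here are illustrative."""
    on_clause = " AND ".join(f"T.{c} = S.{c}" for c in key_columns)
    update_clause = ", ".join(f"T.{c} = S.{c}" for c in value_columns)
    all_columns = list(key_columns) + list(value_columns)
    insert_cols = ", ".join(all_columns)
    insert_vals = ", ".join(f"S.{c}" for c in all_columns)
    return (
        f"MERGE `{target_table}` T "
        f"USING `{staging_table}` S "
        f"ON {on_clause} "
        f"WHEN MATCHED THEN UPDATE SET {update_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )


def upsert_via_temp_table(df, staging_table, target_table,
                          key_columns, value_columns):
    """Write the delta DataFrame to a staging table, then MERGE it into
    the target. Requires pyspark and google-cloud-bigquery at runtime;
    the import is deferred so build_merge_sql stays usable without them.
    """
    # Step 1: write the delta rows to the staging table via the connector.
    df.write.format("bigquery").mode("overwrite").save(staging_table)

    # Step 2: run the MERGE with the python-bigquery client.
    from google.cloud import bigquery
    client = bigquery.Client()
    client.query(build_merge_sql(target_table, staging_table,
                                 key_columns, value_columns)).result()
```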
However, this results in the following error:
Relevant part of the stack trace:
I was able to find some relevant docs about the availability of data after streaming into BigQuery, which state that it may take up to 90 minutes for data to become available. So I put the above code in a retry loop that retried for 2 hours, and it still hit the same issue. Also, if I execute the first write and the second merge as two separate runs back to back, it works fine. So I don't think that documentation page is relevant to this issue.