-
Notifications
You must be signed in to change notification settings - Fork 89
[PLUGIN-1715] Implement retry feature for BigQuery Execute Plugin #1334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PLUGIN-1715] Implement retry feature for BigQuery Execute Plugin #1334
Conversation
57ae3ae
to
b74a689
Compare
import io.cdap.plugin.gcp.bigquery.sink.BigQuerySinkUtils; | ||
import io.cdap.plugin.gcp.bigquery.util.BigQueryUtil; | ||
import io.cdap.plugin.gcp.common.CmekUtils; | ||
import io.cdap.plugin.gcp.common.GCPUtils; | ||
import org.jetbrains.annotations.TestOnly; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use jetbrains annotations in production code. This may bring unnessesary dependencies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to mark a method as visible only for the purposes of testing, use @VisibleForTesting
which is already used elsewhere in the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the jetbrains annotation, using @ VisibleForTesting
|
||
private Config config; | ||
@TestOnly | ||
public BigQueryExecute(Config config) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually package-only access is enough for testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding an explicit config would effectively remove a no-argument constructor. Which constructor will be used for the production run? If this one, it should not be marked as TestOnly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made this package only.
@@ -69,8 +76,13 @@ public final class BigQueryExecute extends AbstractBigQueryAction { | |||
private static final Logger LOG = LoggerFactory.getLogger(BigQueryExecute.class); | |||
public static final String NAME = "BigQueryExecute"; | |||
private static final String RECORDS_PROCESSED = "records.processed"; | |||
private final Config config; | |||
private static final Set<String> retryOnReason = ImmutableSet.of("jobBackendError", "jobInternalError"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All constant should be uppercased
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the strings here should be in their own constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated !
Duration initialRetryDuration = Duration.ofSeconds(1); | ||
Duration maxRetryDuration = Duration.ofSeconds(32); | ||
int multiplier = 2; | ||
int maxRetryCount = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this should be configurable. Retries must be disableable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Please add a Retry Configuration
section in the UI and allow the users to fine-tune this behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tivv @fernst It was mentioned previously on the bug ticket also that Bigquery SLA is for 32 seconds and that is the reason we have added these fix properties to go upto that limit. Same was confirmed in design doc also.
Can you please confirm in design doc also. Shared with you.
Also have a look at this. https://cloud.google.com/bigquery/sla#:~:text=%22-,Back%2Doff%20Requirements,-%22%C2%A0means%2C%20when
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few points here:
- Customer may not wish to wait so long. Yep, BQ may recover, but it's also possible that it's too long for given customer
- You are changing the behaviour. We must have an option to disable this behavior
- SLAs are statistical. They may still be violated
- SLAs are point in time. They may change and we need to be able to adjust the bahvior without code change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
|
||
@Test | ||
public void testExecuteQueryWithExponentialBackoffSuccess() throws InterruptedException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want immediate retries, otherwise this test would be waiting unnesessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed !
bq.executeQueryWithExponentialBackoff(bigQuery, queryJobConfiguration, context); | ||
} | ||
|
||
@Test(expected = java.lang.RuntimeException.class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you expecting just a RuntimeException here? It that it should go through retries and give some reasonable error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Vitalii's suggestions here. Please address.
Failsafe.with(getRetryPolicy()).run(() -> executeQuery(bigQuery, queryConfig, context)); | ||
} catch (RuntimeException e) { | ||
LOG.error("Failed to execute BigQuery job. Error: {}", e.getMessage()); | ||
throw new BigQueryJobExecutionException("Failed to execute BigQuery job. Reason: " + e.getMessage()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This way you are loosing all the additional information from original BigQueryException stack trace. There are only some case where you would want to hide all details of the underlying exception by using e.getMessage instead of attaching a root cause. I don't see why would we want to have it here. Same applies to the catch clause in executeQuery
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated !
186fa05
to
5475c5b
Compare
c759d01
to
35987f2
Compare
Changes Squashed ! |
src/test/java/io/cdap/plugin/gcp/bigquery/action/BigQueryExecuteTest.java
Outdated
Show resolved
Hide resolved
b859ad2
to
b8d5c96
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
bq.executeQueryWithExponentialBackoff(bigQuery, queryJobConfiguration, context); | ||
}); | ||
String actualMessage = exception.getMessage(); | ||
Assert.assertEquals("Failed to execute BigQuery job. Reason: " + mockErrorMessageNoRetry, actualMessage); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, basically we say "Failed to execute BigQuery job. Reason: Job execution failed with error: $error" This is what I want to avoid as it's saying same thing twice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be in top level catch we should just find the correct cause and rethrow in instead of wrapping it again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, i have removed the wrapping of error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this way you are loosing the underlying error. You should either:
- do visa versa and pop up only the underlying error message "Job execution failed with error: $error"
- prepend it with some text that adds value, e.g. "Job retries failed after X tries. Last try failure: Job execution failed with error: $error".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup I am doing the 1st one , by removing wrapping I ment removing the extra messages string that was present before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But you are hiding the underlying "$error", so the only thing that we can see is that retries exhausted without much information about the underlying error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As i already have logging when nth retry is made or the policy is out of retry should i just pop the error and throw that ?
something like this
try {
Failsafe.with(getRetryPolicy()).run(() -> executeQuery(bigQuery, queryConfig, context));
} catch (FailsafeException e) {
throw e.getCause();
}
Also i could drop the catch (RuntimeException e)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that's option #1 in the list above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would check if e.getCause is not null as generally not every failure is an exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔
if (e.getCause() != null) {
throw e.getCause();
}
throw e;
b76ef45
to
34a7113
Compare
Thanks @tivv , i have made the changes and squashed them , hopefully you can approve this now. |
34a7113
to
ddadce4
Compare
widgets/BigQueryExecute-action.json
Outdated
"value": "false", | ||
"label": "NO" | ||
}, | ||
"default": "false" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your point makes a lot of sense. Setting this behavior to true by default aligns with the goal of enhancing plugin resilience and reducing intermittent failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we usually have two things:
- Default for UI. This is what we want default experience to be. In this case it should be "true"
- Default for backend (when option is not set). This is usually how things worked before to ensure backward compatibility unless we have explicit OK from product to break backward compatibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@itsankit-google we had a discussion with deepinder, the UI fields should be hidden and true by default !
ddadce4
to
7a56dad
Compare
New changes include
|
7a56dad
to
2793ac7
Compare
Resolved conflicts caused by PR: #1344 in |
Added BQ Retry
2793ac7
to
a5e3e7b
Compare
Implement retry feature for BigQuery Execute Plugin
Jira : PLUGIN-1715
jobBackendError
,jobInternalError
.Code Changes
dev.failsafe
that handles the retry logic based on the exception raised.Unit Tests