
Add retry logic for COPY operation in SnowflakeCopyBatchInsert with configurable max retries #83

Merged
3 commits merged into main on Jan 14, 2025

Conversation

kentoyoshida
Contributor

Overview

This pull request introduces retry logic for the COPY operation in the SnowflakeCopyBatchInsert class and ensures better handling of transient errors during data load processes.

Detailed Description

  1. Retry Logic:
    • Implemented a retry mechanism for the COPY operation in the CopyTask class.
    • Added a new configuration parameter max_copy_retries to allow users to define the maximum number of retry attempts. The default value is set to 3.
    • Included the method isRetryableForCopyTask to classify errors as retryable based on specific error messages (e.g., "JDBC driver encountered communication error").
  2. Configuration Updates:
    • Modified the SnowflakeOutputPlugin to accept and pass the new max_copy_retries parameter.
  3. Logging Improvements:
    • Enhanced logging for retry attempts, including error messages and retry counts, for better debugging and monitoring.
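The retry flow described in the list above can be sketched as follows. This is a minimal, self-contained illustration of the pattern, not the plugin's actual code: `runCopy` stands in for the real COPY call (here it simulates two transient failures before succeeding), and only the `max_copy_retries` default and the `isRetryableForCopyTask` message check are taken from the PR description.

```java
import java.sql.SQLException;

public class CopyRetrySketch {
    static int attempts = 0;

    // Stand-in for the real COPY call: fails twice with a retryable
    // error, then succeeds.
    static void runCopy() throws SQLException {
        attempts++;
        if (attempts < 3) {
            throw new SQLException("JDBC driver encountered communication error");
        }
    }

    // Mirrors the described isRetryableForCopyTask: classify errors
    // as retryable based on specific error messages.
    static boolean isRetryableForCopyTask(SQLException e) {
        String m = e.getMessage();
        return m != null && m.contains("JDBC driver encountered communication error");
    }

    public static void main(String[] args) throws Exception {
        int maxCopyRetries = 3; // default described in the PR
        int retries = 0;
        while (true) {
            try {
                runCopy();
                break;
            } catch (SQLException e) {
                retries++;
                if (retries > maxCopyRetries || !isRetryableForCopyTask(e)) {
                    throw e;
                }
                System.out.println("COPY retry " + retries + ": " + e.getMessage());
                Thread.sleep(retries * retries * 100L); // quadratic backoff, as in the upload path
            }
        }
        System.out.println("COPY succeeded after " + retries + " retries");
    }
}
```

A non-retryable error (one whose message fails the `isRetryableForCopyTask` check) is rethrown immediately rather than consuming retry attempts.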

Related Issues

This change addresses potential failures in the COPY operation caused by transient communication errors. Previously, such errors could lead to incomplete data loads with no retry attempts.

Impact

  • Improved Reliability:
    This update ensures better handling of transient errors and increases the likelihood of successful data loads, even under unstable network conditions.
  • Breaking Changes:
    There are no significant breaking changes, as the default value for max_copy_retries ensures backward compatibility.

Additional Notes

  • Please review the retry mechanism and configuration changes to confirm they align with project requirements and coding standards.

…onfigurable max retries

- Introduced a new configuration parameter max_copy_retries to specify the maximum number of retries for the COPY operation in Snowflake.
- Updated SnowflakeOutputPlugin to include max_copy_retries configuration.
- Modified SnowflakeCopyBatchInsert to handle retries for the COPY operation using the newly added parameter.
- Added a method isRetryableForCopyTask to determine whether a COPY error is retryable.
- Adjusted constructor and method signatures to include maxCopyRetries where necessary.
- Ensured proper resource cleanup and retry handling for the COPY task.
- Improved logging for retries to provide more context during errors.
Contributor

@yas-okadatech left a comment


LGTM!

Contributor

@chikamura left a comment


Could you add a description of max_copy_retries in the configuration section of README.md, together with max_upload_retries?

@chikamura
Contributor

chikamura commented Jan 12, 2025

Memo: There is already a class in embulk that handles retries.

https://dev.embulk.org/embulk-util-retryhelper/0.8.0/javadoc/org/embulk/util/retryhelper/RetryExecutor.html
https://github.com/embulk/embulk-output-jdbc/blob/1a04c539bf879f8771245178f9bbc0d2cee71d30/embulk-output-jdbc/src/main/java/org/embulk/output/jdbc/AbstractJdbcOutputPlugin.java#L1324-L1329

But the existing code implements retries manually, like this:

  public Void call() throws IOException, SQLException, InterruptedException {
    int retries = 0;
    try {
      long startTime = System.currentTimeMillis();
      // put file to snowflake internal storage
      SnowflakeOutputConnection con = (SnowflakeOutputConnection) connector.connect(true);
      while (true) {
        try {
          logger.info(
              String.format(
                  "Uploading file id %s to Snowflake (%,d bytes %,d rows)",
                  snowflakeStageFileName, file.length(), batchRows));
          FileInputStream fileInputStream = new FileInputStream(file);
          con.runUploadFile(stageIdentifier, snowflakeStageFileName, fileInputStream);
          break;
        } catch (SQLException e) {
          retries++;
          if (retries > this.maxUploadRetries) {
            throw e;
          }
          logger.warn(
              String.format(
                  "Upload error %s file %s retries: %d", e, snowflakeStageFileName, retries));
          Thread.sleep(retries * retries * 1000);
        }
      }
      double seconds = (System.currentTimeMillis() - startTime) / 1000.0;
      logger.info(
          String.format("Uploaded file %s (%.2f seconds)", snowflakeStageFileName, seconds));
    } finally {
      file.delete();
    }
    return null;
  }
}

@chikamura self-requested a review on January 12, 2025 05:04
Contributor

@chikamura left a comment


If possible, could you add a description of max_copy_retries in the configuration section of README.md, together with max_upload_retries?

@chikamura self-requested a review on January 13, 2025 05:42
Contributor

@chikamura left a comment


The error occurs because a connection that was closed after a failed attempt is accessed again on the second retry.

- Resolved an issue where SnowflakeOutputConnection was prematurely closed during retries in the COPY operation.
- Refactored the code to use try-with-resources for managing SnowflakeOutputConnection, ensuring a new connection is created for each retry.
- Removed the explicit con.close() call in the finally block, as connections are now safely managed within the retry loop.
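The fix described in the bullets above can be sketched as follows. This is an illustrative, self-contained example, not the plugin's code: `FakeConnection` is a hypothetical `AutoCloseable` stand-in for `SnowflakeOutputConnection` (here it fails once, then succeeds), and the point is that a fresh connection is opened inside the retry loop via try-with-resources, so a closed connection is never reused.

```java
public class ConnectionPerRetrySketch {
    static int attempts = 0;

    // Hypothetical stand-in for SnowflakeOutputConnection.
    static class FakeConnection implements AutoCloseable {
        void runCopy() throws Exception {
            attempts++;
            if (attempts == 1) {
                throw new Exception("communication error");
            }
        }

        @Override
        public void close() {
            // try-with-resources closes the connection after every
            // attempt, successful or not.
        }
    }

    public static void main(String[] args) throws Exception {
        int maxCopyRetries = 3;
        int retries = 0;
        while (true) {
            // A new connection per attempt; closed automatically on
            // exit, so no explicit con.close() in a finally block.
            try (FakeConnection con = new FakeConnection()) {
                con.runCopy();
                break;
            } catch (Exception e) {
                retries++;
                if (retries > maxCopyRetries) {
                    throw e;
                }
                System.out.println("retrying with a fresh connection: attempt " + (retries + 1));
            }
        }
        System.out.println("done after " + (retries + 1) + " attempts");
    }
}
```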
…figuration options

- Added descriptions for the max_upload_retries and max_copy_retries configuration options in the README.
- max_upload_retries: Specifies the maximum number of retries for file uploads to Snowflake. Default value is 3.
- max_copy_retries: Specifies the maximum number of retries for COPY operations in Snowflake. Default value is 3. Retries are triggered for transient errors such as communication failures.
- Improved documentation to help users understand the retry configurations and their default behaviors.
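As an illustration of the documented options, the two retry settings might appear in an Embulk output configuration like this. Only `max_upload_retries` and `max_copy_retries` (both defaulting to 3) come from this PR; the surrounding keys are placeholder examples, not the plugin's full option list.

```yaml
out:
  type: snowflake
  # ... connection options (host, user, database, etc.) go here ...
  max_upload_retries: 3   # retries for file uploads to the internal stage
  max_copy_retries: 3     # retries for the COPY operation (new in this PR)
```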
@kentoyoshida force-pushed the add-rety-for-copy-task branch from 6054e17 to 7a3f515 on January 14, 2025 02:59
Contributor

@chikamura left a comment


LGTM

@kentoyoshida
Contributor Author

@chikamura
Thank you very much for your review 🙇
I also appreciate you identifying the connection resource release bug.

The issue where the connection was being closed during retries has been addressed in the following commit:
5a6318a

The README update has been handled in the following commit:
7a3f515

@kentoyoshida merged commit f612567 into main on Jan 14, 2025
1 check passed
@kentoyoshida deleted the add-rety-for-copy-task branch on January 14, 2025 04:05

3 participants