-
Notifications
You must be signed in to change notification settings - Fork 301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-31760 Protect against delta save failures #18632
HPCC-31760 Protect against delta save failures #18632
Conversation
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-31760 Jirabot Action Result: |
@ghalliday @mckellyln - please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of minor comments. Looks good, and could merge as-is.
{ | ||
StringBuffer fname(dataPath); | ||
fname.append(DELTAINPROGRESS); | ||
OwnedIFile deltaIPIFile = createIFile(fname.append(DELTAINPROGRESS).str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you mean to remove this append since you added one in the line above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whoops, well spotted. I'll revert it to how it was (remove line 8803)
dali/base/dasds.cpp
Outdated
} | ||
catch (IException *e) | ||
{ | ||
LOG(MCoperatorWarning, unknownJob, e, "save: failure whilst committing deltas to disk! Remdial action must be taken"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: remedial
@ghalliday - please see commit #2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved - altho I think more here than I fully understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jakesmith please squash. I'll rescan to double check.
If there are temporary failures (e.g. failures to write to disk), then protect against the delta writer thread from exiting. Retry all disk actions. NB: the initial inspiration for this was that there was a failure due to Dali running out of file handles. Which itself is probably caused by the flood of connections from AKS VM's to Dali. Signed-off-by: Jake Smith <jake.smith@lexisnexisrisk.com>
71fd4ba
to
b36734a
Compare
@ghalliday - squashed. |
If there are temporary failures (e.g. failures to write to disk), then protect against the delta writer thread from exiting. Retry all disk actions.
NB: the initial inspiration for this was that there was a failure due to Dali running out of file handles. Which itself is probably caused by the flood of connections from AKS VM's to Dali.
Type of change:
Checklist:
Smoketest:
Testing: