-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support adaptive update interval for low resolution ts #1484
Support adaptive update interval for low resolution ts #1484
Conversation
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
56c7714
to
19c6546
Compare
// time. | ||
// WARNING: This method does not guarantee whether the generated timestamp is legal for accessing the data. | ||
// Neither is it safe to use it for verifying the legality of another calculated timestamp. | ||
// Be sure to validate the timestamp before using it to access the data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about deprecating the usages of this interface in the future, or integrate the check of ValidateSnapshotReadTS
into it?
It's more robust to return a verifed timestamp.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've considered this. It makes sence, but it can't be easily done due to the current usage in TiDB repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we modify GetStaleTimestamp
to return the latest fetched timestamp, without adding the arrival duration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussion with @MyonKeminta , the refactor done could be done in another PR as it requires a lot of work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we modify
GetStaleTimestamp
to return the latest fetched timestamp, without adding the arrival duration?
What you said is exactly what GetLowResolutionTSO
does. If we deprecate the GetStaleTimestamp
method, the change can be complictated and difficult considering the current usage in TiDB repo. Simply changing all usages of GetStaleTimestamp
to GetLowResolutionTSO
may significantly reduce the precision of user specified staleness.
|
||
adaptiveUpdateIntervalState struct { | ||
// The mutex to avoid racing between updateTS goroutine and SetLowResolutionTimestampUpdateInterval. | ||
mu sync.Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about moving the to be protected fields into the mu
like
mu {
var1
var2
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it's actually not protected. There are also some accesses without locking.🤔
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
oracle/oracles/pd.go
Outdated
o.adaptiveLastTSUpdateInterval.Store(int64(configuredInterval)) | ||
} | ||
if o.adaptiveUpdateIntervalState.state != adaptiveUpdateTSIntervalStateUnadjustable { | ||
logutil.Logger(context.Background()).Info("update low resolution ts interval is not being adaptive because the configured interval is too short", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logutil.Logger(context.Background()).Info("update low resolution ts interval is not being adaptive because the configured interval is too short", | |
logutil.BgLogger().Info("update low resolution ts interval is not being adaptive because the configured interval is too short", |
oracle/oracles/pd.go
Outdated
prevAdaptiveInterval := currentAdaptiveInterval | ||
currentAdaptiveInterval = max(requiredStaleness-adaptiveUpdateTSIntervalShrinkingPreserve, minAllowedAdaptiveUpdateTSInterval) | ||
o.adaptiveLastTSUpdateInterval.Store(int64(currentAdaptiveInterval)) | ||
logutil.Logger(context.Background()).Info("shrink low resolution ts update interval immediately", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
oracle/oracles/pd.go
Outdated
zap.Duration("requestedStaleness", requiredStaleness), | ||
zap.Duration("prevAdaptiveUpdateInterval", prevAdaptiveInterval), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why those log field names don't match the field name...
oracle/oracles/pd.go
Outdated
|
||
func (o *pdOracle) getCurrentTSForValidation(ctx context.Context, opt *oracle.Option) (uint64, error) { | ||
ch := o.tsForValidation.DoChan(opt.TxnScope, func() (interface{}, error) { | ||
//metrics.ValidateReadTSFromPDCount.Inc() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debug code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, there should be a metrics here.
oracle/oracles/pd.go
Outdated
// current ctx. | ||
res, err := o.GetTimestamp(context.Background(), opt) | ||
// After finishing the current call, allow the next call to trigger fetching a new TS. | ||
o.tsForValidation.Forget(opt.TxnScope) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to call Forget
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it's not needed. I just realized that I misunderstood the usage of singleflight.
// If the call that triggers the execution of this function is canceled by the context, other calls that are | ||
// waiting for reusing the same result should not be canceled. So pass context.Background() instead of the | ||
// current ctx. | ||
res, err := o.GetTimestamp(context.Background(), opt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
res, err := o.GetTimestamp(context.Background(), opt) | |
res, err := o.GetTimestamp(ctx, opt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using context.Background() here is expected, as said in the above comments.
estimatedCurrentTS, err := o.getStaleTimestamp(opt.TxnScope, 0) | ||
if err != nil { | ||
logutil.Logger(ctx).Warn("failed to estimate current ts by getSlateTimestamp for auto-adjusting update low resolution ts interval", | ||
zap.Error(err), zap.Uint64("readTS", readTS), zap.String("txnScope", opt.TxnScope)) | ||
} else { | ||
o.adjustUpdateLowResolutionTSIntervalWithRequestedStaleness(readTS, estimatedCurrentTS, time.Now()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, I don't understand why we need those logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We estimate the gap between the current time and the readTS, to see if the gap is shorter than the update interval, so that we need to shrink the update interval.
// time. | ||
// WARNING: This method does not guarantee whether the generated timestamp is legal for accessing the data. | ||
// Neither is it safe to use it for verifying the legality of another calculated timestamp. | ||
// Be sure to validate the timestamp before using it to access the data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we modify GetStaleTimestamp
to return the latest fetched timestamp, without adding the arrival duration?
oracle/oracles/pd.go
Outdated
|
||
configuredInterval := time.Duration(o.lastTSUpdateInterval.Load()) | ||
currentAdaptiveInterval := time.Duration(o.adaptiveLastTSUpdateInterval.Load()) | ||
if configuredInterval <= minAllowedAdaptiveUpdateTSInterval { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If configuredInterval <= minAllowedAdaptiveUpdateTSInterval
is true, then adaptiveLastTSUpdateInterval
can be set to configuredInterval
. The variable minAllowedAdaptiveUpdateTSInterval
doesn't behave like its name.
How about defining a minUpdateTSInterval
and both lastTSUpdateInterval
and adaptiveLastTSUpdateInterval
should less than minUpdateTSInterval
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name minAllowedAdaptiveUpdateTSInterval
expected to limit the adaptive update ts interval, to avoid it automatically performs the update too frequently. If the user configures a short interval intentionally, we adopt the user's choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way I tried to refine the code of the nextUpdateInterval function, which extracted each branches into small closures and unified the log printing. Please see if it's better than the previous one.
const ( | ||
// minAllowedAdaptiveUpdateTSInterval is the lower bound of the adaptive update ts interval for avoiding an abnormal | ||
// read operation causing the update interval to be too short. | ||
minAllowedAdaptiveUpdateTSInterval = 500 * time.Millisecond |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The minimal value of tidb variable tidb_low_resolution_tso_update_interval
is 10ms, why minAllowedAdaptiveUpdateTSInterval
is 500ms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid the adaptive update interval mechanism automatically and silently chooses a too low interval can causing too high frequency of getting ts from PD. But we do not reject it if the user configures it explicitly and the low update interval is just what the user expects.
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
nextState := func(checkFuncs ...func() (adaptiveUpdateTSIntervalState, time.Duration)) time.Duration { | ||
for _, f := range checkFuncs { | ||
state, newInterval := f() | ||
if state == none { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be returned here if the state
is adaptiveUpdateTSIntervalStateUnadjustable
instead of continue processing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If state is not none, it should return at line 454
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest LGTM
latestTS, err := o.GetLowResolutionTimestamp(ctx, opt) | ||
// If we fail to get latestTS or the readTS exceeds it, get a timestamp from PD to double-check. | ||
// But we don't need to strictly fetch the latest TS. So if there are already concurrent calls to this function | ||
// loading the latest TS, we can just reuse the same result to avoid too many concurrent GetTS calls. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current implementation is enough I think. This is just a minor suggestion.
Use singleflight to reduce the GetTS calls is reasonable, but it's possible that a valid ts is invalid in this case.
- thread 1 validate current_ts and send a get ts request, PD allocates its latest TSO and the response is slow.
- thread 2 validate current_ts and the singleflight will reuse the thread 1's TSO.
- The GetTS request of thread 1's is responded.
Then thread 2 can get a stale TSO from PD which increases the failure possibility.
Maybe an enhancement is to resend a singleflight GetTS request if readTS > latestTS
to recheck the timestamp in this case.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cfzjywxk, crazycs520 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Ref: pingcap/tidb#56809