Skip to content

Make drain delay configurable via drainDelaySeconds field#96

Merged
lukebond merged 2 commits intorestatedev:mainfrom
mt-bt:configurable-drain-delay
Mar 17, 2026
Merged

Make drain delay configurable via drainDelaySeconds field#96
lukebond merged 2 commits intorestatedev:mainfrom
mt-bt:configurable-drain-delay

Conversation

@mt-bt
Copy link
Contributor

@mt-bt mt-bt commented Mar 10, 2026

Summary

  • Adds an optional drainDelaySeconds field to RestateSpec so the delay before removing old drained versions can be configured
  • Defaults to 300 seconds (5 minutes) for backward compatibility
  • Applies to both ReplicaSet and Knative deployment modes

Motivation

There was already a // todo configurable? comment on the hardcoded TimeDelta::minutes(5) in replicaset.rs.

In resource-constrained environments, a shorter drain delay lets the operator evict old pods more quickly, freeing capacity for incoming ones.

@tillrohrmann
Copy link
Contributor

Thanks for creating this PR @mt-bt. We are going to take a look at it. cc @lukebond for visibility.

@tillrohrmann tillrohrmann requested a review from lukebond March 12, 2026 08:46
Copy link
Contributor

@lukebond lukebond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for this @mt-bt, really clean change. it's a purely additive optional field with nullable: true and skip_serializing_if, so existing RestateDeployment resources are completely unaffected since they'll just get the 300s default; appreciated.

a few things before merging:

  1. crd/RestateDeployment.pkl file needs updating to match. add drainDelaySeconds: Int? to the Restate class around line 67, alongside useHttp11. you can look at how useHttp11 was added there as a reference. then run just generate-examples to make sure everything stays in sync.
  2. we have a release notes process now, please add a file at release-notes/unreleased/96-configurable-drain-delay.md. see release-notes/README.md for the format, the "New Feature Example" section is probably closest to what you need here.
  3. in knative.rs you nicely log the drain delay when scheduling removal, but the equivalent code path in replicaset.rs doesn't log anything. could you add a similar info!() line there so both deployment modes have consistent logging?

for the CLA you just need to comment on the PR with the exact text:

I have read the CLA Document and I hereby sign the CLA

(CLA Document is here)

@mt-bt mt-bt force-pushed the configurable-drain-delay branch from 1e95544 to 037de63 Compare March 16, 2026 14:03
@github-actions
Copy link

github-actions bot commented Mar 16, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@mt-bt
Copy link
Contributor Author

mt-bt commented Mar 16, 2026

I have read the CLA Document and I hereby sign the CLA

@mt-bt
Copy link
Contributor Author

mt-bt commented Mar 16, 2026

@lukebond thanks for the review! I've addressed all three points:

  • Updated crd/RestateDeployment.pk with drainDelaySeconds
  • Added release notes at release-notes/unreleased/96-configurable-drain-delay.md
  • Added info! logging with structured fields in replicaset.rs to match knative.rs

When running just generate-example, I get the following diff, I suspect it is not related to this PR and might have been out of sync already? I didn't check this diff in, but let me know if you do want it in.

+++ b/crd/examples/restatecluster.yaml
@@ -3,11 +3,7 @@ kind: RestateCluster
metadata:
  name: restate-test
spec:
-  cluster:
-    autoProvision: true
  compute:
-    image: restatedev/restate:1.5.6
+    image: restatedev/restate:1.4
  storage:
    storageRequestBytes: 2147483648
-  config: |
-    auto-provision = false

@lukebond
Copy link
Contributor

lukebond commented Mar 17, 2026

@mt-bt thanks. looks good.

that pkl diff suggests the changes in #77 were missing updates to the schema. i'll sort that out separately, we can merge this change as is without adding that generated output as it would remove things.

@lukebond lukebond merged commit 2b50c29 into restatedev:main Mar 17, 2026
2 of 3 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Mar 17, 2026
@lukebond
Copy link
Contributor

@mt-bt i have a couple of other changes lined up and, once merged, i'll do another release

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants