Update the Supervisor endpoint to not restart the Supervisor if the spec was unmodified #17707

Open
wants to merge 4 commits into
base: master

Conversation

aho135
Contributor

@aho135 aho135 commented Feb 8, 2025

Description

This PR adds an optional query parameter called restartIfUnmodified to the /druid/indexer/v1/supervisor endpoint. The caller can optionally set restartIfUnmodified=false so that the supervisor is not restarted if the spec is unchanged. Multiple members of the community mentioned that they maintain their own scripts to check whether the spec has changed before submitting to the endpoint: https://apachedruidworkspace.slack.com/archives/C0303FDCZEZ/p1738017586080509

For those that rely on this endpoint for restarting the supervisor, the behavior remains unchanged as restartIfUnmodified defaults to true.
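The resulting behavior can be summarized as a small decision function. This is only a sketch of the semantics described above, not the code in this PR; the class name `RestartDecisionSketch` and the method `shouldRestart` are hypothetical.

```java
// Hypothetical sketch; class and method names are illustrative and not taken from the PR.
final class RestartDecisionSketch
{
  /**
   * @param specModified        true if the submitted spec differs from the one currently in use
   * @param restartIfUnmodified value of the query parameter, which defaults to true
   * @return whether the supervisor should be (re)started
   */
  static boolean shouldRestart(boolean specModified, boolean restartIfUnmodified)
  {
    // A modified spec always triggers a restart; an unmodified one only if the caller asks for it.
    return specModified || restartIfUnmodified;
  }

  public static void main(String[] args)
  {
    System.out.println(shouldRestart(false, true));  // true: default, same as the existing behavior
    System.out.println(shouldRestart(false, false)); // false: unchanged spec, restart skipped
    System.out.println(shouldRestart(true, false));  // true: changed spec is always applied
  }
}
```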

Release note

Adds an optional query parameter called restartIfUnmodified to the /druid/indexer/v1/supervisor endpoint. Callers can set restartIfUnmodified=false to not restart the supervisor if the spec is unchanged. Example:

curl -X POST --header "Content-Type: application/json" -d @supervisor.json "localhost:8888/druid/indexer/v1/supervisor?restartIfUnmodified=false"


Key changed/added classes in this PR
  • SupervisorResource
  • SupervisorManager
  • SupervisorResourceTest

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Contributor

@kfaraz kfaraz left a comment


@aho135, thanks for the PR. It makes sense to not update/restart the supervisor if not required.

I have left some minor feedback on the approach.

@@ -166,6 +166,18 @@ public boolean createOrUpdateAndStartSupervisor(SupervisorSpec spec)
}
}

public boolean wasSupervisorSpecModified(SupervisorSpec spec)
Contributor


Please add a short javadoc.
Also, I find the method name to be slightly ambiguous.
Maybe rename to shouldUpdateSupervisor() as that expresses the intent of the usage of this method more clearly.

public Response specPost(
final SupervisorSpec spec,
@Context final HttpServletRequest req,
@QueryParam("restartIfUnmodified") @DefaultValue("true") boolean restartIfUnmodified)
Contributor


Suggested change
@QueryParam("restartIfUnmodified") @DefaultValue("true") boolean restartIfUnmodified)
@QueryParam("restartIfUnmodified") @DefaultValue("true") boolean restartIfUnmodified
)

Preconditions.checkNotNull(spec.getDataSources(), "spec.getDatasources()");
synchronized (lock) {
Preconditions.checkState(started, "SupervisorManager not started");
return metadataSupervisorManager.wasSupervisorSpecModified(spec);
Contributor


I don't think we need to call the metadata store here.

  • If the supervisor is active (running or suspended), it would already be in memory in the map SupervisorManager.supervisors. We can compare with the version in memory to determine if the spec has been updated.
  • If the supervisor is not present in the map, that indicates that the supervisor is either terminated or doesn't exist at all. In both cases, we should just return true.
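A minimal sketch of this in-memory check, under stated assumptions: the map below only stands in for `SupervisorManager.supervisors`, specs are represented as already-serialized strings, and the `register` helper and class name are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; not the actual SupervisorManager implementation.
final class InMemorySpecCheckSketch
{
  // Stand-in for SupervisorManager.supervisors: supervisor id -> serialized spec.
  // Keeping the serialized form here is an assumption made for this example.
  private final Map<String, String> activeSpecs = new ConcurrentHashMap<>();

  void register(String id, String serializedSpec)
  {
    activeSpecs.put(id, serializedSpec);
  }

  boolean shouldUpdateSupervisor(String id, String serializedSpec)
  {
    String current = activeSpecs.get(id);
    if (current == null) {
      // Not in memory: the supervisor is terminated or never existed, so always update.
      return true;
    }
    // Active (running or suspended): update only if the submitted spec differs.
    return !current.equals(serializedSpec);
  }
}
```

With this shape, no metadata store call is needed to decide whether an update is required.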

Contributor Author


Thanks for the review @kfaraz! I thought to do this in MetadataSupervisorManager because the comparison relies on the ObjectMapper in SQLMetadataSupervisorManager.

Let me dig more and see how to get the ObjectMapper into SupervisorManager
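For illustration, a comparison based on serialized bytes could look like the sketch below. It assumes the serializer is injected as a plain function; in Druid that role would presumably be played by Jackson's `ObjectMapper` (for example via `writeValueAsBytes`), but the class name and wiring here are hypothetical.

```java
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical sketch; the serializer is injected so the comparison does not depend on equals().
final class SerializedSpecComparatorSketch<T>
{
  private final Function<T, byte[]> serializer;

  SerializedSpecComparatorSketch(Function<T, byte[]> serializer)
  {
    this.serializer = serializer;
  }

  boolean isModified(T current, T submitted)
  {
    // Two specs count as unmodified when their serialized forms are byte-for-byte identical.
    return !Arrays.equals(serializer.apply(current), serializer.apply(submitted));
  }
}
```

Injecting the serializer into the manager would keep the comparison independent of the metadata store implementation.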

* @param SupervisorSpec spec being submitted
* @return whether the spec was modified
*/
boolean wasSupervisorSpecModified(SupervisorSpec spec);
Contributor


We probably don't need this method. Please see the other comment.

@AmatyaAvadhanula
Contributor

AmatyaAvadhanula commented Feb 9, 2025

Thank you for these changes, @aho135!

I think we would benefit from a change where we check if the spec has changed. If it hasn't, we still restart the supervisor, but do not go to the metadata store and add an unnecessary entry in the spec history. Otherwise, the flow remains unchanged. I think @kfaraz has suggested this as well.
I believe the supervisor restart itself would be helpful to roll over tasks easily or to get out of an idle supervisor state, etc.

I also wanted to understand if the problem was with the metadata operations associated with it including an unneeded entry, or if the supervisor operation is also problematic.

If it is just the first case, is a feature flag really needed?
I believe we should skip the metadata operation and history update, as there is no benefit in either case.

If you still believe that the supervisor operation is wasteful, and want to introduce a flag, please add the relevant docs in docs/api-reference/supervisor-api.md.
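The alternative flow described in this comment (always restart, but write to the spec history only when the spec changed) could be sketched as follows. The in-memory list stands in for the metadata store, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the list below stands in for the spec history in the metadata store.
final class SkipHistoryWriteSketch
{
  private final Map<String, String> runningSpecs = new HashMap<>();
  private final List<String> specHistory = new ArrayList<>();
  private int restarts = 0;

  void submit(String id, String serializedSpec)
  {
    // A supervisor that is not yet known counts as modified.
    boolean modified = !serializedSpec.equals(runningSpecs.get(id));
    if (modified) {
      // Only a changed spec adds an entry to the history.
      specHistory.add(id + ":" + serializedSpec);
    }
    // The supervisor is restarted in both cases.
    runningSpecs.put(id, serializedSpec);
    restarts++;
  }

  int historySize()
  {
    return specHistory.size();
  }

  int restartCount()
  {
    return restarts;
  }
}
```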

@aho135
Contributor Author

aho135 commented Feb 10, 2025

> Thank you for these changes, @aho135!
>
> I think we would benefit from a change where we check if the spec has changed. If it hasn't, we still restart the supervisor, but do not go to the metadata store and add an unnecessary entry in the spec history. Otherwise, the flow remains unchanged. I think @kfaraz has suggested this as well. I believe the supervisor restart itself would be helpful to roll over tasks easily or to get out of an idle supervisor state, etc.
>
> I also wanted to understand if the problem was with the metadata operations associated with it including an unneeded entry, or if the supervisor operation is also problematic.
>
> If it is just the first case, is a feature flag really needed? I believe we should skip the metadata operation and history update, as there is no benefit in either case.
>
> If you still believe that the supervisor operation is wasteful, and want to introduce a flag, please add the relevant docs in docs/api-reference/supervisor-api.md.

Thanks for the review @AmatyaAvadhanula! My original motivation for this change was to avoid unnecessary restarts of the Supervisor if possible. Our use case is that we maintain a repository of schemas and do periodic releases. It is often unclear which schemas were actually modified. We want to be able to submit them all, and just restart the Supervisors which had schema updates. This is so we can avoid the undesirable side effects of task restart, such as small segments.

With this use case in mind, I think that having the feature flag does make sense. I will add an update in the relevant doc.
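As a rough illustration of that use case, a release script could build one POST per spec with restartIfUnmodified=false and let the endpoint decide which supervisors actually restart. The sketch below only constructs the requests using the JDK's `java.net.http` API; the host and port are assumed from the curl example in the release note, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; builds requests only and does not contact a cluster.
final class BatchSubmitSketch
{
  // Assumed address, matching the curl example in the release note.
  static final String ENDPOINT = "http://localhost:8888/druid/indexer/v1/supervisor";

  static HttpRequest buildRequest(String specJson)
  {
    return HttpRequest.newBuilder(URI.create(ENDPOINT + "?restartIfUnmodified=false"))
                      .header("Content-Type", "application/json")
                      .POST(HttpRequest.BodyPublishers.ofString(specJson))
                      .build();
  }

  static List<HttpRequest> buildRequests(List<String> specJsons)
  {
    // One request per spec in the repository, whether or not it changed in this release.
    List<HttpRequest> requests = new ArrayList<>();
    for (String json : specJsons) {
      requests.add(buildRequest(json));
    }
    return requests;
  }
}
```

Each request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.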

3 participants