Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CELEBORN-1317][FOLLOWUP] HttpServer avoid Jetty's acceptor thread shrink for stopping #2450

Closed
wants to merge 1 commit into from

Conversation

SteNicholas
Copy link
Member

@SteNicholas SteNicholas commented Apr 8, 2024

What changes were proposed in this pull request?

HttpServer set idle timeout to 0 in order to avoid Jetty's acceptor thread shrink for stopping.

Why are the changes needed?

When the Jetty's acceptor thread shrinks before the main thread sends a signal to the thread, the issue java.io.IOException: No such file or directory could happen.

Jetty's acceptor thread waits for a new connection request and blocked by accept(this.fd, newfd, isaa) in sun.nio.ch.ServerSocketChannelImpl#accept.

When org.eclipse.jetty.server.Server.doStop is called in the main thread, the thread reaches this code.

The server socket descriptor will be closed by nd.preClose in the main thread.
Then, accept() in acceptor thread throws an Exception due to "Bad file descriptor" in case of macOS.
After the exception is thrown, the acceptor thread will continue to fetch a task.
If the thread obtain the SHRINK task here, the thread will be shrink.
If, the acceptor thread finishes before NativeThread.signal is called in the main thread, this issue happens.

Environment:

  • Jetty: v9.4.52.v20230823
  • JDK: Oracle JDK 1.8
  • OS: Linux version 5.10.0 (gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, GNU ld (GNU Binutils for Debian) 2.35.2)

Backport apache/spark#28437.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

No.

@SteNicholas
Copy link
Member Author

cc @RexXiong, @turboFei.

Copy link
Contributor

@RexXiong RexXiong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@pan3793
Copy link
Member

pan3793 commented Apr 8, 2024

does it happen today too?

@SteNicholas
Copy link
Member Author

@pan3793, I have tested in K8S test environment and occured the same exception with above Spark pull request, which exception is occasionally thrown.

Copy link
Member

@pan3793 pan3793 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the PR description to mention your case, e.g. Jetty version, JDK version, OS info

@RexXiong
Copy link
Contributor

RexXiong commented Apr 9, 2024

Thanks. Merge to main(v0.5.0)

@RexXiong RexXiong closed this in f25972d Apr 9, 2024
@SteNicholas SteNicholas deleted the CELEBORN-1317 branch May 8, 2024 07:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants