-
Notifications
You must be signed in to change notification settings - Fork 724
Winsock Fixes #3433
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Winsock Fixes #3433
Conversation
This results in a partially working winsock
BUG: A connection currently only becomes communicative after a new connection comes in. But Windows TCP is working after that.
| var realHandle: HANDLE? = nil | ||
| let success = DuplicateHandle( | ||
| GetCurrentProcess(), // Source process | ||
| GetCurrentThread(), // Source handle (pseudo-handle) | ||
| GetCurrentProcess(), // Target process | ||
| &realHandle, // Target handle (real handle) | ||
| 0, // Desired access (0 = same as source) | ||
| false, // Inherit handle | ||
| DWORD(DUPLICATE_SAME_ACCESS) // Options | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows gives pseudo handles by default, so they're always correct for the current thread. However, when spawning work on another thread, this handle results in getting the incorrect ThreadID, causing QueueUserAPC to notify the wrong (or non-existing) thread. This in turn prevents SleepEx from waking up the eventloop
|
|
||
| static var currentThread: ThreadOpsSystem.ThreadHandle { | ||
| GetCurrentThread() | ||
| var realHandle: HANDLE? = nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise here
| registrationID: SelectorRegistrationID | ||
| ) throws { | ||
| fatalError("TODO: Unimplemented") | ||
| if let index = self.pollFDs.firstIndex(where: { $0.fd == UInt64(fileDescriptor) }) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deregistering is just removing the FD. However, we can't just remove it here as we're iterating over the same pollFDs at the same time. deregister0 is called down the stack of try body((SelectorEvent(io: selectorEvent, registration: registration))). So the available indices of pollFDs changes causing a crash
| continue | ||
| } | ||
|
|
||
| try body((SelectorEvent(io: selectorEvent, registration: registration))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line often calls deregister0 indirectly, so we effectively can't mutate pollFDs in deregister0
| // now clean up any deregistered fds | ||
| // In reverse order so we don't have to copy elements out of the array | ||
| // If we do in in normal order, we'll have to shift all elements after the removed one | ||
| for i in self.deregisteredFDs.indices.reversed() { | ||
| if self.deregisteredFDs[i] { | ||
| // remove this one | ||
| let fd = self.pollFDs[i].fd | ||
| self.pollFDs.remove(at: i) | ||
| self.deregisteredFDs.remove(at: i) | ||
| self.registrations.removeValue(forKey: Int(fd)) | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the deregister0 work of cleaning up after the polling is done
| func initialiseState0() throws { | ||
| self.pollFDs.reserveCapacity(16) | ||
| self.deregisteredFDs.reserveCapacity(16) | ||
| self.lifecycleState = .open |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lifecycle never became open yet
Sources/NIOPosix/SocketChannel.swift
Outdated
| #if os(Windows) | ||
| case .winsock(WSAEWOULDBLOCK): | ||
| return false | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accept returns WSAEWOULDBLOCK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doesn't the current logic for syscall(blocking: true) cover this?
|
CC @fabianfett and @zamderax |
This way new TCP connections will be functional
| #if os(Windows) | ||
| if | ||
| let err = err as? IOError, | ||
| case .winsock(WSAEWOULDBLOCK) = err.error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this case reached?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I've found is that when WSAPoll gives the server socket a .read event when a client is inbound:
- Server accepts the client through
readable()->readable0()->ServerSocketChannel.readFromSocket() - The
socket.accept(..)finds a socket SelectableEventLooprepeats this again again to get another read (see maxMessagesPerRead)- This time,
socket.accept(..)runs intoWinSDK.INVALID_SOCKET - Gets the error using
WSAGetLastError() - The error ends up reaching
.winsock(WSAEWOULDBLOCK)
In hindsight, I just noticed that accept() returns an optional so I'll leverage that instead.
| } else { | ||
| let result = self.pollFDs.withUnsafeMutableBufferPointer { ptr in | ||
| WSAPoll(ptr.baseAddress!, UInt32(ptr.count), time) | ||
| WSAPoll(ptr.baseAddress!, UInt32(ptr.count), 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely following why this change needed to be made. Can you elaborate this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- When you initially create the eventloop, there is no FD to poll. So it goes into a
SleepEx(INFINITE, true). - When the server FD is added, it wakes up the selector using
wakeup0() - This implementation calls
QueueUserAPC(..)on the EventLoop's thread - QueueUserAPC wakes up WSAPoll and observes the socket
At this point there are FDs, so instead of SleepEx - the EventLoop calls into WSAPoll for those FDs.
- A client connects to the server
- Server accepts the socket, registering it to WSAPoll with a minimal set of events (
.resetand.errorIIRC) - The eventloop is done, and goes into WSAPoll with this minimal set of events
- After some back & forth, the client
reregister0s itself with more events - WSAPoll is not aware of the change, WSAPoll is still invoked with the previous events subset
- Socket I/O is ignored, becase
.readand.writewere not part of the events
Then..
- A new (second) client attaches to the server
- It goes through the same flow, except now WSAPoll correctly polls the old socket's new events (read/write)
- New socket goes through the same trouble
To remedy this temporarily, I'm waking up WSAPoll every 1ms so we don't completely block I/O for new sockets. But this 100% not something we'll want to stick with of course. I don't yet know how to best tackle this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is WSAPoll not being made aware of the change? That seems like the obvious bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WSAPoll cannot be woken up early
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious what the long term solution here would be. I'd love to hear @fabianfett 's thoughts - not sure if he's available again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so assuming that statement is true (I haven't checked), the solution here is to have a self-pipe. This allows us to mostly replicate the eventfd pattern on Linux, so that in order to wake up the loop we write to a pipe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that's clever! I'll take a look at that
Sources/NIOPosix/SocketChannel.swift
Outdated
| #if os(Windows) | ||
| case .winsock(WSAEWOULDBLOCK): | ||
| return false | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doesn't the current logic for syscall(blocking: true) cover this?
Sources/NIOPosix/SocketChannel.swift
Outdated
|
|
||
| private func shouldCloseOnErrnoCode(_ errnoCode: CInt) -> Bool { | ||
| switch errnoCode { | ||
| private func shouldCloseOnErrnoCode(_ errno: CInt) -> Bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What motivated the changes to this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was messing with that code but the function itself shouldn't have changed, I'll revert it.
This PR gets TCP (Servers) on Windows mostly working. I'll annotate my PR for clarity.
The one bug:
If you open a TCP client connection to a Windows TCP server, it doesn't read/write yet until a second connection comes in. This happens because
reregister0is called afterWSAPollis called - but WSAPoll isn't woken up to know this so it doesn't know the new socket requests read/writes/...Also the EventLoop cannot be woken up during
el.execute { .. }becauseSleepExisn't running. WSAPoll doesn't respond to this signal it seems.This makes the TCP server functionality not yet usable, but I thought with this draft we could figure out what I'm missing.
Alternative considered: We can make
WSAPolllimited to wake up every 1 ms for example, that way we can still get to these events at some point.