@Joannis Joannis commented Nov 3, 2025

This PR gets TCP (Servers) on Windows mostly working. I'll annotate my PR for clarity.

The one bug:
If you open a TCP client connection to a Windows TCP server, the connection doesn't read or write until a second connection comes in. This happens because reregister0 is called after WSAPoll has already been entered, and WSAPoll isn't woken up, so it never learns that the new socket now requests reads/writes/...

Also, the EventLoop cannot be woken up by el.execute { .. } because SleepEx isn't running at that point, and WSAPoll doesn't seem to respond to this signal.

This makes the TCP server functionality not yet usable, but I thought with this draft we could figure out what I'm missing.

Alternative considered: we could cap the WSAPoll timeout at, for example, 1 ms, so that we still get to these events at some point.

This results in a partially working winsock
BUG: A connection currently only becomes communicative after a new connection comes in. But Windows TCP is working after that.
Comment on lines +48 to +57
var realHandle: HANDLE? = nil
let success = DuplicateHandle(
GetCurrentProcess(), // Source process
GetCurrentThread(), // Source handle (pseudo-handle)
GetCurrentProcess(), // Target process
&realHandle, // Target handle (real handle)
0, // Desired access (0 = same as source)
false, // Inherit handle
DWORD(DUPLICATE_SAME_ACCESS) // Options
)
Author

@Joannis Joannis Nov 3, 2025

Windows gives pseudo-handles by default, so they're always correct for the current thread. However, when work is spawned on another thread, the pseudo-handle resolves to the wrong thread ID, causing QueueUserAPC to notify the wrong (or a non-existent) thread. This in turn prevents SleepEx from waking up the event loop.


  static var currentThread: ThreadOpsSystem.ThreadHandle {
- GetCurrentThread()
+ var realHandle: HANDLE? = nil
Author

Likewise here

  registrationID: SelectorRegistrationID
  ) throws {
- fatalError("TODO: Unimplemented")
+ if let index = self.pollFDs.firstIndex(where: { $0.fd == UInt64(fileDescriptor) }) {
Author

Deregistering is just removing the FD. However, we can't remove it here because we're iterating over the same pollFDs at the same time: deregister0 is called further down the stack from try body((SelectorEvent(io: selectorEvent, registration: registration))). Removing an element would change the valid indices of pollFDs mid-iteration and cause a crash.

continue
}

try body((SelectorEvent(io: selectorEvent, registration: registration)))
Author

This line often calls deregister0 indirectly, so we effectively can't mutate pollFDs in deregister0

Comment on lines +137 to +148
// now clean up any deregistered fds
// In reverse order so we don't have to copy elements out of the array
// If we do it in normal order, we'll have to shift all elements after the removed one
for i in self.deregisteredFDs.indices.reversed() {
if self.deregisteredFDs[i] {
// remove this one
let fd = self.pollFDs[i].fd
self.pollFDs.remove(at: i)
self.deregisteredFDs.remove(at: i)
self.registrations.removeValue(forKey: Int(fd))
}
}
Author

Do the deregister0 work of cleaning up after the polling is done
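
The mark-then-sweep approach described in the comments above can be sketched in isolation. This is a simplified illustration, not the PR's code: `PollSet`, `markDeregistered`, and `sweep` are hypothetical names, and plain `Int` values stand in for the real pollfd entries.

```swift
// Hypothetical, simplified stand-in for the selector's bookkeeping.
struct PollSet {
    private(set) var fds: [Int] = []
    private var deregistered: [Bool] = []

    mutating func register(_ fd: Int) {
        fds.append(fd)
        deregistered.append(false)
    }

    // Safe to call while `fds` is being iterated: it only flips a flag
    // and never changes the array's indices.
    mutating func markDeregistered(_ fd: Int) {
        if let index = fds.firstIndex(of: fd) {
            deregistered[index] = true
        }
    }

    // Called once the poll pass has finished iterating. Walking in reverse
    // keeps the not-yet-visited (lower) indices valid after each removal.
    mutating func sweep() {
        for i in deregistered.indices.reversed() where deregistered[i] {
            fds.remove(at: i)
            deregistered.remove(at: i)
        }
    }
}

var set = PollSet()
for fd in [3, 4, 5, 6] { set.register(fd) }

// Simulate a poll pass in which the handlers for fds 4 and 6 deregister them.
for index in set.fds.indices {
    let fd = set.fds[index]
    if fd % 2 == 0 { set.markDeregistered(fd) }
}
set.sweep()
print(set.fds) // [3, 5]
```

Deferring the removal keeps every index stable for the duration of the iteration, at the cost of one extra pass per poll cycle.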

func initialiseState0() throws {
self.pollFDs.reserveCapacity(16)
self.deregisteredFDs.reserveCapacity(16)
self.lifecycleState = .open
Author

The lifecycle state was never set to open before this change.

Comment on lines 396 to 399
#if os(Windows)
case .winsock(WSAEWOULDBLOCK):
return false
#endif
Author

accept returns WSAEWOULDBLOCK

Contributor

Why doesn't the current logic for syscall(blocking: true) cover this?

Joannis commented Nov 3, 2025

CC @fabianfett and @zamderax

@Joannis Joannis marked this pull request as ready for review November 3, 2025 14:22
This way new TCP connections will be functional
#if os(Windows)
if
let err = err as? IOError,
case .winsock(WSAEWOULDBLOCK) = err.error
Contributor

How is this case reached?

Author

Here is what I've found happens when WSAPoll gives the server socket a .read event for an inbound client:

  • Server accepts the client through readable() -> readable0() -> ServerSocketChannel.readFromSocket()
  • The socket.accept(..) finds a socket
  • SelectableEventLoop repeats this to attempt another read (see maxMessagesPerRead)
  • This time, socket.accept(..) runs into WinSDK.INVALID_SOCKET
  • Gets the error using WSAGetLastError()
  • The error ends up reaching .winsock(WSAEWOULDBLOCK)

In hindsight, I just noticed that accept() returns an optional so I'll leverage that instead.
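
A minimal sketch of that idea, assuming a non-blocking accept that returns nil when no connection is pending. `drainAccepts` and the closure are hypothetical stand-ins, not the actual ServerSocketChannel code; plain `Int` values stand in for accepted sockets.

```swift
// Hypothetical helper: tries up to `maxMessagesPerRead` accepts and stops as
// soon as `accept` reports that nothing is pending (nil), instead of treating
// the "would block" condition as an error.
func drainAccepts(maxMessagesPerRead: Int, accept: () throws -> Int?) rethrows -> [Int] {
    var accepted: [Int] = []
    for _ in 0..<maxMessagesPerRead {
        guard let socket = try accept() else { break }
        accepted.append(socket)
    }
    return accepted
}

// One client is pending; the second accept attempt finds nothing.
var pending = [101]
let accepted = drainAccepts(maxMessagesPerRead: 4) {
    pending.isEmpty ? nil : pending.removeFirst()
}
print(accepted) // [101]
```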

} else {
let result = self.pollFDs.withUnsafeMutableBufferPointer { ptr in
- WSAPoll(ptr.baseAddress!, UInt32(ptr.count), time)
+ WSAPoll(ptr.baseAddress!, UInt32(ptr.count), 1)
Contributor

I'm not entirely following why this change needed to be made. Can you elaborate on this?

Author

  1. When you initially create the eventloop, there is no FD to poll. So it goes into a SleepEx(INFINITE, true).
  2. When the server FD is added, it wakes up the selector using wakeup0()
  3. This implementation calls QueueUserAPC(..) on the EventLoop's thread
  4. The queued APC wakes the event loop out of SleepEx, and the loop then observes the socket

At this point there are FDs, so instead of SleepEx - the EventLoop calls into WSAPoll for those FDs.

  1. A client connects to the server
  2. Server accepts the socket, registering it to WSAPoll with a minimal set of events (.reset and .error IIRC)
  3. The eventloop is done, and goes into WSAPoll with this minimal set of events
  4. After some back & forth, the client reregister0s itself with more events
  5. WSAPoll is not aware of the change, WSAPoll is still invoked with the previous events subset
  6. Socket I/O is ignored, becase .read and .write were not part of the events

Then..

  1. A new (second) client attaches to the server
  2. It goes through the same flow, except now WSAPoll correctly polls the old socket's new events (read/write)
  3. New socket goes through the same trouble

To remedy this temporarily, I'm waking up WSAPoll every 1 ms so we don't completely block I/O for new sockets. But this is 100% not something we'll want to stick with, of course. I don't yet know how best to tackle this.

Contributor

Why is WSAPoll not being made aware of the change? That seems like the obvious bug.

Author

WSAPoll cannot be woken up early

Author

I'm curious what the long-term solution here would be. I'd love to hear @fabianfett's thoughts - not sure if he's available again.

Contributor

Ok, so assuming that statement is true (I haven't checked), the solution here is to have a self-pipe. This allows us to mostly replicate the eventfd pattern on Linux, so that in order to wake up the loop we write to a pipe.
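
For illustration, the general self-pipe pattern can be sketched on POSIX with `pipe()` and `poll()`. This is not Windows code: WSAPoll only polls sockets, so a Windows version would presumably need a connected loopback socket pair in place of the pipe (an assumption, not something verified in this thread). The `wakeup(_:)` helper is a hypothetical name.

```swift
#if canImport(Glibc)
import Glibc
#elseif canImport(Darwin)
import Darwin
#endif

// Create the wakeup pipe: fds[0] is the read end, fds[1] the write end.
var fds: [Int32] = [0, 0]
precondition(pipe(&fds) == 0, "pipe() failed")
let readEnd = fds[0]
let writeEnd = fds[1]

// Hypothetical helper: any thread can call this to interrupt a blocked poll().
func wakeup(_ writeEnd: Int32) {
    var byte: UInt8 = 1
    _ = write(writeEnd, &byte, 1)
}

// Normally called from another thread, e.g. after changing a registration.
wakeup(writeEnd)

// The loop always includes the read end in its poll set, next to the real sockets.
var pollFDs = [pollfd(fd: readEnd, events: Int16(POLLIN), revents: 0)]
let count = nfds_t(pollFDs.count)
let ready = poll(&pollFDs, count, 1000)

// Drain the wakeup byte so the next poll() blocks again.
var drained: UInt8 = 0
if ready > 0 && (pollFDs[0].revents & Int16(POLLIN)) != 0 {
    _ = read(readEnd, &drained, 1)
}
print(ready, drained)

close(readEnd)
close(writeEnd)
```

After such a wakeup, the loop would rebuild its poll set before blocking again, which is what would let a reregistered socket's new events take effect.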

Author

Oh that's clever! I'll take a look at that

private func shouldCloseOnErrnoCode(_ errnoCode: CInt) -> Bool {
switch errnoCode {
private func shouldCloseOnErrnoCode(_ errno: CInt) -> Bool {
Contributor

What motivated the changes to this function?

Author

I was messing with that code but the function itself shouldn't have changed, I'll revert it.
