Conversation

@ysbaddaden
Contributor

@ysbaddaden ysbaddaden commented Oct 28, 2025

This patch implements a reference-counted lock to protect IO objects that depend on a reusable system fd (IO::FileDescriptor, File, and Socket) against thread safety issues around close:

  • Thread 1 wants to read from fd 123;
  • The OS preempts Thread 1;
  • Thread 2 closes fd 123;
  • Thread 2 opens something else and the OS reuses fd 123;
  • The OS resumes Thread 1;
  • Thread 1 reads from the reused fd 123!!!

The same issue arises for any operation that would mutate the fd (write, fchown, ftruncate, setsockopt, ...), as it risks affecting a reused fd instead of the expected one.

NOTE: The lock is currently implemented on the UNIX target only, but we might want to use it on every target. Go uses its fdMutex on every target.

Extracted from #16209 (follow-up with single reader/writer)
Depends on #16288 (EventLoop#shutdown)
Closes #16127
Obsoletes #16128

Only operations that can affect the file descriptor are reference counted, for example reading or writing, truncating a file, or changing file permissions.

Mere queries with no side effects go through normally, because at worst they will fail (as they would have anyway).
@ysbaddaden ysbaddaden force-pushed the feature/add-crystal-fd-lock branch from ef5d08d to 3150772 on November 14, 2025, 14:28
@ysbaddaden
Contributor Author

ysbaddaden commented Nov 14, 2025

Rebased on master to remove #16288, which has been merged, plus its fixup (#16366).

@straight-shoota
Member

Is there a particular reason why we're rolling this out only on Unix targets instead of globally?
There is merit in making smaller increments, but that's somewhat offset by the extra method overrides that exist only for the Unix implementations (system_read & co).

@ysbaddaden
Contributor Author

Because the issue is on UNIX.

I can move it out of Crystal::System if we believe there's value for every target's IO::FileDescriptor and Socket.

@straight-shoota
Member

It seems useful to share the same implementation across platforms. Even if it's not strictly necessary on Windows, it's easier to maintain if we only have to worry about one mechanism.
That's assuming there are no grave downsides to using this on Windows? I presume there might be some performance implications, but closing doesn't seem like a very contested operation.

@ysbaddaden
Contributor Author

Close doesn't create a contention point. The problem is concurrent access to the same stdio, file, or socket, because we must atomically increment the reference count. Many fibers frantically writing to STDOUT will see an impact.

The next step, having a single reader and a single writer (#16209), could be useful on Windows to replace the custom thread communication used to read asynchronously from the console: we could merely detach the current thread (#15871) while making sure only one thread is blocked, which we could also use on UNIX to replace the TTY hack (#16353).

@straight-shoota
Member

Many fibers frantically writing to STDOUT will see an impact.

That probably produces a big jumble anyway, so it doesn't seem like a very relevant use case.

@ysbaddaden
Contributor Author

If you're careful to buffer your message and to fit within PIPE_BUF, then writing to a stdio is atomic (a POSIX requirement). In practice it appears to be fine for files as well.

The tracing feature heavily relies on this.

In practice you don't need to write that frantically (say, printing on every malloc, or writing something every few microseconds), and using a channel + fiber (as Log does) will completely remove the contention.

@ysbaddaden
Contributor Author

ysbaddaden commented Nov 17, 2025

Anyway: I'll move @fd_lock out of Crystal::System 👍

@ysbaddaden
Contributor Author

ysbaddaden commented Nov 20, 2025

I started moving @fd_lock out of Crystal::System and I don't like it 😢

The explicit relationship between the lock and the fd, for example @fd_lock.reference { LibC.fsync(fd) }, is replaced with a blind lock, because the wrapped method might implicitly reference fd, for example @fd_lock.reference { system_fsync }.

That looks bad and feels brittle.

@ysbaddaden
Contributor Author

ysbaddaden commented Nov 20, 2025

I'd prefer to duplicate the behavior in Crystal::System for Windows to protect the handle, and that could come as a follow-up.

@straight-shoota
Member

There are already a number of indirect references that I'm concerned about, where the locked block delegates to the event loop.
For example, the wrappers at the end of unix/socket.cr.

The complexity of delegation is already quite high between the public API, system implementations and event loop.
Would be great if there was any chance to simplify that somehow.

This is totally not a stopper, though. Maybe we figure out something later (probably not, though 🤷).

@ysbaddaden
Contributor Author

ysbaddaden commented Nov 21, 2025

Tried again, and from the point of view of "protecting the system_ methods" it feels better.

I hit a blocker though: we must implement Crystal::EventLoop::IOCP#shutdown, otherwise the refcount won't be decremented, and at worst files could never be closed and fibers could get stuck.

It's easy for Socket, but IO::FileDescriptor is another story: we must track the pending overlapped ops for every file and actively cancel them (which may be in whatever IOCP instance, possibly multiple of them). We must also be careful with the STDIN console hack, as well as the blocking read/write calls: can they be canceled?

As for the io_uring event loop, I believe we'll want to wait for the follow-up that serializes reads and writes so there can be only one reader and one writer at most.

@straight-shoota
Member

As for the io_uring event loop, I believe we'll want to wait for the follow-up that serializes reads and writes so there can be only one reader and one writer at most.

Would that make it simpler for IOCP as well?

@ysbaddaden
Contributor Author

Yes, this is what I meant.

@ysbaddaden ysbaddaden moved this from Review to Approved in Multi-threading Nov 24, 2025
@ysbaddaden ysbaddaden added this to the 1.19.0 milestone Nov 24, 2025

Labels

kind:bug (A bug in the code. Does not apply to documentation, specs, etc.), platform:unix, topic:multithreading, topic:stdlib:runtime

Projects

Status: Approved

Development

Successfully merging this pull request may close these issues.

Closing fd is thread unsafe on UNIX targets

3 participants