Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add spec for Socket#read_nonblock with a provided buffer #1145

Merged
merged 1 commit into from
Mar 25, 2024

Conversation

casperisfine
Copy link
Contributor

When provided a buffer MRI preserves the original encoding but TruffleRuby sets the encoding to Encoding::BINARY

cc @eregon

Discovered as part of redis-rb/redis-client#184

IO.select([@r], nil, nil, 2)
@r.read_nonblock(5, buffer)
buffer.should == "aaa"
buffer.encoding.should == initial_encoding
Copy link
Member

@eregon eregon Mar 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the buffer is replaced by the contents of the read, ignoring the previous contents, it's a good question what should be the encoding, should it be the encoding of the read, or the original string encoding?

IMO using the original string encoding is wrong, the data returned there might not be valid in that encoding.
OTOH it clearly seems CRuby's behavior.
Do you know if CRuby behaves like this for all IO methods taking a buffer, i.e. keeping the String encoding and basically replacing the contents as bytes, regardless of that being possibly meaningless in the String encoding?

I could easily see the CRuby behavior causing bugs, e.g. appending binary data to a buffer declared as +"" which is UTF-8 and so causing extra needless computations e.g. for computing the coderange.

cc @ioquatix WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear: thank you for the PR and I'll merge this, but it seems good to have a bit of discussion if the semantics are desired or maybe accidental.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an immediate strong opinion, except that using binary everywhere is the most predictable approach.

I wonder if the encoding should be set to whatever the encoding of the IO is. That would make sense to me, i.e. it would make sense to be consistent:

buffer1 = io.read_nonblock(1024)

buffer2 = String.new
io.read_nonblock(1024, buffer2)

buffer1.encoding == buffer2.encoding # ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if CRuby behaves like this for all IO methods taking a buffer, i.e. keeping the String encoding and basically replacing the contents as bytes, regardless of that being possibly meaningless in the String encoding?

AFAIK yes.

Personally I find this behavior desirable. If I know I'm using a protocol with a particular encoding, I don't want to have to force the encoding on every read.

As for the result being potentially broken, yes but that's on me to handle with a valid_encoding? check, and if needed I can use that as a signal that I need to wait for more data.

@casperisfine
Copy link
Contributor Author

I augmented the spec a bit.

@casperisfine
Copy link
Contributor Author

I also added a similar spec for #read and amusingly TruffleRuby's behavior match MRI's on read, just not on read_nonblock.

casperisfine pushed a commit to redis-rb/redis-client that referenced this pull request Mar 23, 2024
The behavior of `IO#read_nonblock` differs sligthly.
See: ruby/spec#1145
Copy link
Member

@eregon eregon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the new specs!

@eregon
Copy link
Member

eregon commented Mar 25, 2024

@casperisfine The CI fails on Windows for the new specs, could you exclude the problematic specs on Windows (platform_is_not :windows)?

When provided a buffer MRI preserves the original encoding
but TruffleRuby sets the encoding to `Encoding::BINARY`
@eregon eregon merged commit c8ab292 into ruby:master Mar 25, 2024
14 checks passed
@eregon
Copy link
Member

eregon commented Apr 9, 2024

It seems whether the encoding is changed depend on arguments for read, see https://bugs.ruby-lang.org/issues/20416

@casperisfine
Copy link
Contributor Author

Ref: ruby/ruby@0ca7036

  • io.c (read_all): should associate default external encoding.
  • io.c (io_read): should NOT associate default external encoding.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants