Add spec for Socket#read_nonblock with a provided buffer #1145

Conversation
```ruby
IO.select([@r], nil, nil, 2)
@r.read_nonblock(5, buffer)
buffer.should == "aaa"
buffer.encoding.should == initial_encoding
```
Since the buffer is replaced by the contents of the read, ignoring the previous contents, it's a good question what the encoding should be: the encoding of the read, or the original string encoding?
IMO using the original string encoding is wrong; the data returned there might not be valid in that encoding.
OTOH it clearly seems to be CRuby's behavior.
Do you know if CRuby behaves like this for all IO methods taking a buffer, i.e. keeping the String encoding and basically replacing the contents as bytes, regardless of that being possibly meaningless in the String encoding?
I could easily see the CRuby behavior causing bugs, e.g. appending binary data to a buffer declared as +"", which is UTF-8, and so causing extra needless computations, e.g. for computing the coderange.
cc @ioquatix WDYT?
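For reference, the behavior under discussion can be reproduced with a plain pipe. This is a minimal standalone sketch of the observed CRuby semantics, not part of the spec itself:

```ruby
# Minimal reproduction: on CRuby, a buffer passed to #read_nonblock keeps
# its original encoding, while its contents are replaced with the raw
# bytes that were read.
r, w = IO.pipe
w.write("aaa")
w.close

buffer = String.new("", encoding: Encoding::UTF_8)
IO.select([r], nil, nil, 2)
r.read_nonblock(5, buffer)

p buffer           # => "aaa"
p buffer.encoding  # => #<Encoding:UTF-8> on CRuby
r.close
```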
Just to be clear: thank you for the PR and I'll merge this, but it seems good to have a bit of discussion if the semantics are desired or maybe accidental.
I don't have an immediate strong opinion, except that using binary everywhere is the most predictable approach.
I wonder if the encoding should be set to whatever the encoding of the IO is. That would make sense to me, i.e. these should be consistent:

```ruby
buffer1 = io.read_nonblock(1024)
buffer2 = String.new
io.read_nonblock(1024, buffer2)
buffer1.encoding == buffer2.encoding # ?
```
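As a side note, on CRuby `#read_nonblock` without a buffer returns a binary string regardless of the IO's external encoding, and `String.new` with no arguments is also binary, so in this particular comparison the two encodings happen to agree. A runnable sketch (CRuby semantics assumed):

```ruby
# Sketch comparing the two forms above (CRuby assumed). String.new with
# no arguments is ASCII-8BIT, so here "preserve the buffer's encoding"
# and "binary everywhere" coincide.
r, w = IO.pipe
w.write("ab")
w.close

buffer1 = r.read_nonblock(1)   # returns a new binary string on CRuby
buffer2 = String.new           # ASCII-8BIT by default
r.read_nonblock(1, buffer2)    # buffer keeps its (binary) encoding

p buffer1.encoding  # => #<Encoding:ASCII-8BIT>
p buffer2.encoding  # => #<Encoding:ASCII-8BIT>
r.close
```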
> Do you know if CRuby behaves like this for all IO methods taking a buffer, i.e. keeping the String encoding and basically replacing the contents as bytes, regardless of that being possibly meaningless in the String encoding?
AFAIK yes.
Personally I find this behavior desirable. If I know I'm using a protocol with a particular encoding, I don't want to have to force the encoding on every read.
As for the result being potentially broken: yes, but that's on me to handle with a valid_encoding? check, and if needed I can use that as a signal that I need to wait for more data.
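The pattern described here could look roughly like this. A hypothetical sketch: the pipe and the single-byte writes just simulate a multibyte character arriving split across reads:

```ruby
# Hypothetical sketch: treat `valid_encoding?` returning false as a signal
# that a multibyte character was split across reads and more data is needed.
r, w = IO.pipe
bytes = "é".encode(Encoding::UTF_8).b   # two bytes: 0xC3 0xA9

buffer  = String.new("", encoding: Encoding::UTF_8)
message = String.new("", encoding: Encoding::UTF_8)

w.write(bytes[0, 1])                    # only the first byte has arrived
IO.select([r])
message << r.read_nonblock(1, buffer)   # buffer stays UTF-8 on CRuby

until message.valid_encoding?           # incomplete character: wait for more
  w.write(bytes[1, 1])                  # (here: the peer sends the rest)
  IO.select([r])
  message << r.read_nonblock(1, buffer)
end

p message  # => "é"
w.close
r.close
```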
Force-pushed from 484e95c to c78a7e9
I augmented the spec a bit.
Force-pushed from c78a7e9 to 23cd936
I also added a similar spec for `IO#read_nonblock`.
The behavior of `IO#read_nonblock` differs slightly. See: ruby/spec#1145
Thank you for the new specs!
@casperisfine The CI fails on Windows for the new specs, could you exclude the problematic specs on Windows (…)?
When provided a buffer, MRI preserves the original encoding, but TruffleRuby sets the encoding to `Encoding::BINARY`
23cd936
to
d7213cb
Compare
It seems whether the encoding is changed depends on the arguments for (…)
Ref: ruby/ruby@0ca7036
cc @eregon
Discovered as part of redis-rb/redis-client#184