Consistent encoding of binary values #432

Closed
jgraham opened this issue May 25, 2023 · 3 comments

@jgraham
Member

jgraham commented May 25, 2023

When dealing with network data we need to handle the case of bytes that can't be directly encoded in JSON (at least in a portable way).

Currently we do this for network headers and cookies by having either a value field for byte sequences that can decode to a UTF-8 string, or a binaryValue field for sequences that cannot. The latter is an array of integer byte values. So, ignoring the fact that we only use the binaryValue representation when the value representation doesn't work, a value "foo" could be represented as

{"value": "foo"}

or

{"binaryValue": [102, 111, 111]}

The latter has significant overhead; each byte requires at least 2 bytes in the representation (at least one digit and a comma or ]), and it can be up to 5 bytes per byte (three digits taking one byte each, a comma, and a space).
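
The 2-to-5 bytes-per-byte range can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `array_size` and `base64_size` are made up for this example, and the sizes are those produced by Python's `json.dumps`, not by any particular WebDriver BiDi implementation.

```python
import base64
import json


def array_size(data: bytes, compact: bool) -> int:
    """Length of the JSON array-of-integers form of `data`."""
    # Compact output drops the space after each comma.
    separators = (",", ":") if compact else (", ", ": ")
    return len(json.dumps(list(data), separators=separators))


def base64_size(data: bytes) -> int:
    """Length of the JSON string holding the base64 form of `data`."""
    return len(json.dumps(base64.b64encode(data).decode("ascii")))


best_case = bytes([0] * 100)     # every byte is a single digit
worst_case = bytes([255] * 100)  # every byte is three digits

print(array_size(best_case, compact=True) / 100)    # about 2 bytes per byte
print(array_size(worst_case, compact=False) / 100)  # 5 bytes per byte
print(base64_size(bytes(300)) / 300)                # about 1.34 bytes per byte
```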

Network request interception requires us to be able to serialize request and response bodies so they can be overridden. In the case of bodies the 2-5x overhead for non-UTF-8 is clearly unacceptable, so instead we use base64-encoding, which has a roughly 33% overhead. In particular the proposal at the moment is that UTF-8 bodies are encoded like:

{"body": {"type": "string", "value": "foo"}}

and non-UTF-8 bodies are encoded like:

{"body": {"type": "base64", "value": "Zm9v"}}
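
A minimal sketch of how a remote end might pick between the two shapes, assuming the rule is simply "use `string` when the bytes decode as UTF-8, otherwise `base64`". The function name `encode_bytes_value` is hypothetical and not taken from the proposal.

```python
import base64


def encode_bytes_value(data: bytes) -> dict:
    """Return the typed representation of `data` (hypothetical helper)."""
    try:
        # Bytes that are valid UTF-8 are sent as a plain string.
        return {"type": "string", "value": data.decode("utf-8")}
    except UnicodeDecodeError:
        # Anything else falls back to standard base64.
        encoded = base64.b64encode(data).decode("ascii")
        return {"type": "base64", "value": encoded}


print({"body": encode_bytes_value(b"foo")})
print({"body": encode_bytes_value(b"\xff\xfe")})
```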

So there are two differences here:

  • Use of base64 rather than array of bytes.
  • Use of nominative typing rather than structural typing.

I think we should align the two representations. Clearly base64 is a better idea than an array of bytes. I'd also like to use the nominative typing. It's similar to the way we represent JS values (to the extent that the string representation is directly compatible) and, for bodies, it makes it clear how to extend if in the future we want to allow an IOStream representation (i.e. a handle that can be used to pull the actual bytes async).
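
To illustrate why the nominative form is easy to extend, here is a rough sketch of a local-end decoder that dispatches on the `type` field. The name `decode_bytes_value` is an assumption for this example; a future type (such as a stream handle) would just be one more branch, and unknown types fail loudly instead of being guessed at from the shape of the object.

```python
import base64


def decode_bytes_value(obj: dict) -> bytes:
    """Turn a typed value back into raw bytes (hypothetical helper)."""
    kind = obj.get("type")
    if kind == "string":
        return obj["value"].encode("utf-8")
    if kind == "base64":
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(obj["value"], validate=True)
    raise ValueError(f"unsupported value type: {kind!r}")


print(decode_bytes_value({"type": "string", "value": "foo"}))
print(decode_bytes_value({"type": "base64", "value": "Zm9v"}))
```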

@jimevans
Collaborator

@jgraham So is the idea that we would align values for network.Cookie, network.Header, and network.Body? I could probably get behind that. It would certainly simplify parsing the values on the local end, from my perspective.

@jgraham
Member Author

jgraham commented Jul 5, 2023

#472 implements this.

@jimevans
Collaborator

@jgraham Given that #472 has been approved and merged, can this issue be closed?

@jgraham jgraham closed this as completed Jul 21, 2023