Consistent encoding of binary values #432

jgraham · 2023-05-25T13:38:47Z

When dealing with network data we need to handle the case of bytes that can't be directly encoded in JSON (at least in a portable way).

Currently we do this for network headers and cookies by having either a value field for byte sequences that can decode to a UTF-8 string, or a binaryValue field for sequences that cannot. The latter is an array of integer byte values. So, ignoring the fact that we only use the binaryValue representation when the value representation doesn't work, a value "foo" could be represented as

{"value": "foo"}

or

{"binaryValue": [102, 111, 111]}

The latter has significant overhead; each byte requires at least 2 bytes in the representation (at least one digit and a comma or ]), and it can be up to 5 bytes per byte (three digits taking one byte each, a comma, and a space).
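As a rough check of those bounds, the following sketch measures the size of the JSON array form relative to the raw bytes. The helper name `array_overhead` is hypothetical, and the two inputs are chosen only to hit the best case (single-digit values, compact separators) and the worst case (three-digit values, comma plus space).

```python
import json


def array_overhead(data: bytes, compact: bool) -> float:
    """Return JSON-array size divided by raw size (hypothetical helper)."""
    # Compact output uses "," between items; json.dumps defaults to ", ".
    separators = (",", ":") if compact else (", ", ": ")
    encoded = json.dumps(list(data), separators=separators)
    return len(encoded) / len(data)


# Best case: every byte is one digit, followed by a comma or the closing "]".
best = array_overhead(bytes([0] * 1000), compact=True)

# Worst case: every byte is three digits, followed by a comma and a space.
worst = array_overhead(bytes([255] * 1000), compact=False)

print(best, worst)
```

The ratios come out at about 2 and 5 respectively, matching the per-byte estimate above.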

Network request interception requires us to be able to serialize request and response bodies so they can be overridden. In the case of bodies the 2-5x overhead for non-UTF-8 is clearly unacceptable, so instead we use base64-encoding, which has a roughly 33% overhead. In particular the proposal at the moment is that UTF-8 bodies are encoded like:

{"body": {"type": "string", "value": "foo"}}

and non-UTF-8 bodies are encoded like:

{"body": {"type": "base64", "value": "Zm9v\n"}}
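A minimal sketch of how a local end might produce and consume the inner typed object, assuming the only two types are "string" and "base64" as proposed here. The function names `encode_body` and `decode_body` are hypothetical and not part of any specification.

```python
import base64


def encode_body(data: bytes) -> dict:
    """Pick the "string" form when the bytes are valid UTF-8, else "base64"."""
    try:
        return {"type": "string", "value": data.decode("utf-8")}
    except UnicodeDecodeError:
        value = base64.b64encode(data).decode("ascii")
        return {"type": "base64", "value": value}


def decode_body(body: dict) -> bytes:
    """Recover the original bytes by dispatching on the "type" field."""
    if body["type"] == "string":
        return body["value"].encode("utf-8")
    if body["type"] == "base64":
        # By default b64decode ignores characters outside the base64
        # alphabet, so a trailing newline in the value is tolerated.
        return base64.b64decode(body["value"])
    raise ValueError(f"unknown body type: {body['type']!r}")


print(encode_body(b"foo"))
print(encode_body(b"\xff\xfe"))
print(decode_body({"type": "base64", "value": "Zm9v\n"}))
```

Dispatching on an explicit "type" field means a new representation can be added later as another branch, without changing how the existing ones are recognized.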

So there are two differences here:

Use of base64 rather than array of bytes.
Use of nominative typing rather than structural typing.

I think we should align the two representations. Clearly base64 is a better idea than an array of bytes. I'd also like to use the nominative typing. It's similar to the way we represent JS values (to the extent that the string representation is directly compatible) and, for bodies, it makes it clear how to extend the format if in the future we want to allow an IOStream representation (i.e. a handle that can be used to pull the actual bytes asynchronously).

jimevans · 2023-05-25T15:48:42Z

@jgraham So is the idea that we would align values for network.Cookie, network.Header, and network.Body? I could probably get behind that. It would certainly simplify parsing the values on the local end, from my perspective.

jgraham · 2023-07-05T11:59:33Z

#472 implements this.

jimevans · 2023-07-20T21:32:04Z

@jgraham Given that #472 has been approved and merged, can this issue be closed?

jgraham closed this as completed Jul 21, 2023
