When dealing with network data we need to handle the case of bytes that can't be directly encoded in JSON (at least in a portable way).
Currently we do this for network headers and cookies by having either a `value` field for byte sequences that can decode to a UTF-8 string, or a `binaryValue` field for sequences that cannot. The latter is an array of integer byte values. So, ignoring the fact that we only use the `binaryValue` representation when the `value` representation doesn't work, a value "foo" could be represented as
{"value": "foo"}
or
{"binaryValue": [102, 111, 111]}
The latter has significant overhead; each byte requires at least 2 bytes in the representation (at least one digit and a comma or `]`), and it can be up to 5 bytes per byte (three digits taking one byte each, a comma, and a space).
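To make the overhead concrete, here is a quick Python sketch (not part of the spec, just an illustration) comparing the serialized size of the two existing representations for the same value:

```python
import json

# "binaryValue" stores each byte as a decimal integer, so every byte costs
# 2-5 characters of JSON text (up to three digits, a comma, and a space).
raw = b"foo"

as_string = json.dumps({"value": raw.decode("utf-8")})
as_bytes = json.dumps({"binaryValue": list(raw)})

print(as_string)  # {"value": "foo"}
print(as_bytes)   # {"binaryValue": [102, 111, 111]}
print(len(as_string), len(as_bytes))  # 16 32
```

For a three-byte value the `binaryValue` form is already twice the size; for values whose bytes are mostly three digits, it approaches 5x.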
Network request interception requires us to be able to serialize request and response bodies so they can be overridden. In the case of bodies, the 2-5x overhead for non-UTF-8 content is clearly unacceptable, so instead we use base64 encoding, which has roughly 33% overhead. In particular, the proposal at the moment is that UTF-8 bodies are encoded like:
{"body": {"type": "string", "value": "foo"}}
and non-UTF-8 bodies are encoded like:
{"body": {"type": "base64", "value": "Zm9v\n"}}
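A sketch of how a serializer might pick between the two proposed body encodings (the branch on UTF-8 decodability is my reading of the proposal, not normative; the trailing `\n` in the example above presumably comes from a line-wrapping base64 encoder, and is omitted here):

```python
import base64

def encode_body(body: bytes) -> dict:
    try:
        # Bytes that decode as UTF-8 go out as a plain JSON string.
        return {"body": {"type": "string", "value": body.decode("utf-8")}}
    except UnicodeDecodeError:
        # Everything else falls back to base64: every 3 bytes become
        # 4 ASCII characters, i.e. roughly 33% overhead.
        return {"body": {"type": "base64",
                         "value": base64.b64encode(body).decode("ascii")}}

print(encode_body(b"foo"))   # {'body': {'type': 'string', 'value': 'foo'}}
```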
So there are two differences here:
- Use of base64 rather than an array of bytes.
- Use of nominative typing rather than structural typing.
I think we should align the two representations. Clearly base64 is a better idea than an array of bytes. I'd also like to use the nominative typing. It's similar to the way we represent JS values (to the extent that the string representation is directly compatible) and, for bodies, it makes it clear how to extend it if in the future we want to allow an IOStream representation (i.e. a handle that can be used to pull the actual bytes async).
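Under the aligned proposal, headers, cookies, and bodies would all share one tagged shape. A hypothetical helper showing what that unified encoding could look like (the field names follow the body proposal above; applying it to headers and cookies is the suggestion being made here, not settled spec):

```python
import base64

def serialize_bytes(data: bytes) -> dict:
    """Unified {"type": "string" | "base64", "value": ...} encoding.

    The "type" tag (nominative typing) leaves room to add new variants
    later, e.g. {"type": "stream", "handle": ...} for an IOStream.
    """
    try:
        return {"type": "string", "value": data.decode("utf-8")}
    except UnicodeDecodeError:
        return {"type": "base64",
                "value": base64.b64encode(data).decode("ascii")}

print(serialize_bytes(b"text/html"))  # {'type': 'string', 'value': 'text/html'}
print(serialize_bytes(b"\xff\xfe"))   # {'type': 'base64', 'value': '//4='}
```

The local end then dispatches on `type` rather than probing which of two differently named fields is present.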
@jgraham So is the idea that we would align values for network.Cookie, network.Header, and network.Body? I could probably get behind that. It would certainly simplify parsing the values on the local end, from my perspective.