Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert procedure result to JSON with consistent scalar representations #24

Merged
merged 24 commits into from
Apr 3, 2024

Conversation

hallettj
Copy link
Collaborator

@hallettj hallettj commented Apr 1, 2024

Implements bson_to_json which mostly works as an inverse of json_to_bson. This replaces the bson library's stock serialization implementation which outputs a variation of extjson. We don't want to output extjson (which includes inline type tags) because we keep type information out-of-band, and using extjson would prevent output scalar values from matching input formats despite showing the same scalar types in the graph configuration.

I want to apply the same output conversion to normal query responses. That will be in another PR.

I set up round-trip serialization testing to check for consistency between bson_to_json and json_to_bson. The way it is implemented round-trips do not work with heterogeneous data (cases where we infer a type of Any at any position). So I configured the round-trip test to avoid those cases. An alternative is to output extjson in cases where we show the type as Any so we can losslessly convert back and forth - but that has a trade-off where representations would be inconsistent at different parts of the structure. I'm planning to write up a document to discuss the best way to handle this. Per our discussions I'll change the representation for Any to extended JSON.

For the moment round-trip testing is failing with Datetime and Decimal128 types and possibly some others. I intend to fix that before merging this PR. I fixed these. Round-trip tests caught a bug in Datetime conversion, and I added a normalization step when generating arbitrary Decimal128 values.

Ticket: https://hasurahq.atlassian.net/browse/MDB-95

@dmoverton
Copy link
Contributor

As discussed, we should use Extended JSON for values with type Any.

@hallettj
Copy link
Collaborator Author

hallettj commented Apr 2, 2024

@dmoverton I disabled Undefined outputs from arb_bson because those were failing round-trip tests. Using undefined as the bottom type for unifying object field types is logical. But it does mean that we lose undefined as a distinct type when it appears in arrays with other types. For example one of the round-trip test cases that failed was the bson value [null, undefined]. After conversion to json and back we get [null, null] because type unification of null and undefined yields the type null. In my experience with javascript semantics I think "missing object field" is a sensible "bottom" case, while undefined itself is a unit type like null. But it does seem like the difference probably doesn't matter in 99% of cases.

@dmoverton
Copy link
Contributor

@dmoverton I disabled Undefined outputs from arb_bson because those were failing round-trip tests. Using undefined as the bottom type for unifying object field types is logical. But it does mean that we lose undefined as a distinct type when it appears in arrays with other types. For example one of the round-trip test cases that failed was the bson value [null, undefined]. After conversion to json and back we get [null, null] because type unification of null and undefined yields the type null. In my experience with javascript semantics I think "missing object field" is a sensible "bottom" case, while undefined itself is a unit type like null. But it does seem like the difference probably doesn't matter in 99% of cases.

Yeah I've been wanting to create a more canonical type representation to use, at least while we're doing inference and type unification. It would have its own bottom type. We would then convert into the current Type representation only when inference is done and we want to serialize the types.

Copy link
Contributor

@dmoverton dmoverton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@hallettj hallettj marked this pull request as ready for review April 3, 2024 03:06
@hallettj
Copy link
Collaborator Author

hallettj commented Apr 3, 2024

I'll merge this once the round-trip test is fixed

@dmoverton
Copy link
Contributor

@hallettj I'm happy for you to merge this now

@hallettj hallettj merged commit 7d947d5 into main Apr 3, 2024
1 check passed
@hallettj hallettj deleted the jesse/bson_to_json branch April 3, 2024 23:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants