Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(stdlib): Add Json module #1133

Merged
merged 40 commits into from
Mar 4, 2024
Merged

feat(stdlib): Add Json module #1133

merged 40 commits into from
Mar 4, 2024

Conversation

cician
Copy link
Contributor

@cician cician commented Feb 6, 2022

This PR aims to include my JSON parser/printer (https://github.com/cician/grain-json) into stdlib. Hopefully for 0.5 release.

I'm opening this PR as a draft for now since I expect to still make some important changes once stack allocated chars and thick numbers are in. Strictly speaking no changes are really required after stack chars PR is merged, but code can be improved for clarity by using chars instead of code points as raw numbers. Thick numbers on the other hand are going to affect the behavior of number parsing.

Copy link
Member

@peblair peblair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome! I put down some thoughts, but this is really amazing work. Thank you so much for putting this together!

stdlib/json.gr Outdated Show resolved Hide resolved
stdlib/json.gr Outdated Show resolved Hide resolved
stdlib/json.gr Outdated Show resolved Hide resolved
stdlib/json.gr Outdated Show resolved Hide resolved
stdlib/json.gr Outdated Show resolved Hide resolved
stdlib/json.gr Outdated Show resolved Hide resolved
stdlib/json.gr Outdated
*
* @example print(parse("{\"currency\":\"$\",\"price\":119}"))
*/
export let parse: String -> Result<JSON, JSONParseError> = (str: String) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want a parseOpt as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any options for parsing in mind?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can only think of options relative to number parsing. And maybe some control over internal buffer sizes, but I'd rather not expose the buffer related stuff to allow future internal refactoring.

The user could want to parse into 32 bit floats or use a lax, but fast decimal to float conversion algorithm. Probably only interesting for relatively niche use cases with enormous data sets in JSON, for which Grain wouldn't be very efficient anyway since it boxes float numbers.

If Grain gains a BigDecimal number type in the future, then an option would be necessary to support both and maintain compatibility with default parsing behavior to floats. Though I don't even know if BigDecimal makes sense as a subtype of Number. If not, then this issue would not apply anyway as the current JSON enum would not do the job. Alternatively to BigDecimal I could keep the current "hack" that uses rationals, but now with BigInt numerator/denominator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would have been nice, another potentional option would have been allowComments, while it doesnt follow the spec it is something that is nice to have and could fit well into a parse option. But i think we came to the conclusion in the discord that we dont want to allow parsing options.

@jozanza jozanza self-requested a review March 31, 2022 16:16
@marcusroberts
Copy link
Member

I like the API. The thing missing is a searchable data structure but I think you address that in "Since then I decided that it's OK for this API not to present a directly searchable data structure and that this job can be left to a higher level API built on top. No specific plans yet, suggestions welcome." which I agree with.

Great work.

Copy link
Contributor

@jozanza jozanza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we waiting for BigInt to land before landing this?

@cician
Copy link
Contributor Author

cician commented Apr 10, 2022

Are we waiting for BigInt to land before landing this?

Sorry I left everyone in the limbo for more than a month and haven't made any progress on this. Now that I have again some time, I should make it more clear and explicit what is actually the blocker, even if it ends up being a bit of a recap of what have already been said in the past on Discord.

The current plan for number parsing is:

Integer numbers in the range -2^63..2^63 are parsed into Simple/Int32/Int64 variants of Number without problem.

Greater integer numbers would require BigInt, unless we choose to parse these into 64 bit floats with precision loss, and even that has its limit (10^308). I don't think it would make sense with BigInt on the horizon. If we really want, we could ship with a version that returns error in these cases and later change it to parse into BigInt without changes to the API.

Since there's currently no plan for inclusion of something like BigDecimal into Grain (AFAIK), the only choice is to parse numbers with decimal digits as floats. As most JSON parsers do, although some give you a choice. If I recall correctly this was the consensus.

So the major blocker is that we don't have float parsing. What is missing is basically float parsing, minus the actual parsing (which is the easy part and already done). Building a binary float from decimal integer+fraction is easy on the surface, but hard to get right in respect to rounding and stable round-tripping. I think it's very important given the Grain's primary target (business logic and smart contracts). I was planning on porting some existing code to achieve this, but haven't even started and haven't looked at the matter since February.

@ospencer
Copy link
Member

@cician gotcha. If you want to rebase off of the bigint branch you definitely can, and the format is pretty simple. But otherwise, yes, keep us updated on floats, and let us know if there's anything we can do to help.

@phated
Copy link
Member

phated commented May 24, 2022

@cician thiccnums have landed if you want to update this with them!

@phated phated assigned cician, jozanza and ospencer and unassigned cician, jozanza and ospencer Nov 30, 2022
@marcusroberts
Copy link
Member

@cician are you still around to work on this? Otherwise I'll take your fine work and finish it off for 0.6

@cician
Copy link
Contributor Author

cician commented Jan 15, 2023

Oscar reached out to me a few days ago saying he's willing to pick this up. I started to take a look again after I saw the float parsing PR merged, but haven't gotten far, so it's probably for the better you guys finish it, in order to merge it in time for v0.6. I leave below a few notes on different approaches I though of. Let me know if something is not clear here or in the parser code.

Currently the parser is structured internally for consuming a sequential stream of chars, one at a time. You can keep it this way for number parsing as well, but don't have to, since the API only accepts a String input. I've done it this way thinking of maybe extending the API in the future if we have a concept of data streams or something.

Solution A

Simple and dirty.

A valid JSON number token is a subset of valid inputs of Number.parseFloat. JSON doesn't allow NaNs, infinities or numbers with additional leading zeros, underscores etc.

So let the existing parser code in parseNumberValue function do the validation and just accumulate number token chars in a Buffer. Reuse the Buffer from JSONParserState already used for strings. At the end of the function just build a string from the buffer and call parseFloat. Remove the code that computes significand and exponent. Done.

This is the quickest solution to implement, but somewhat ugly and and slow. There's a cost of accumulating chars in the buffer, an allocation of a temporary String for each number, a temporary Bytes instance allocated by Buffer.toString (could be avoided btw) and redundant work of actual number parsing between the JSON parser and parseFloat. Also there's coupling with parsing logic of parseFloat, which is somewhat grain specific and not fully documented.

Solution B

Similar to solution A, but accumulate just the decimal digits in a Buffer and the decimal point offset. Use it to build the float like in the slow path of parseFloat.

Unintuitively this approach may actually be slower than solution A by skipping the fast path. Needs allocating an instance of the Decimal record with Bytes from the decimal buffer, but we skip allocating a String and we avoid coupling JSON parser with behavior of parseFloat. Building a Bytes instance could be avoided by allowing access to the raw Bytes instance backing the Buffer.

Solution C

Optimized parsing function similar to parseFloat with unsafe code, but tailored for JSON and working of a slice of memory instead of a string instance.

This would result in great performance, but other than using unsafe code, would probably require moving away from the idea of a sequential char stream for JSON, unless parseLongMantissa in the slow path can somehow resume without need to restart from the beginning of the string.

Solution D

A hybrid between solutions B and C as a compromise to keep sequential streaming logic. Fast path is tried by reimplementing parseFloatToParts directly inside parseNumberValue, but the buffer of chars is always filled from the beginning to be used in the slow path.

I was attempting solution B, but only gotten as far as filling the buffer of decimals and breaking up the parseLongMantissa function to have an overload that starts from a Decimal record instead of a String.

edit: typo

@marcusroberts
Copy link
Member

Thanks @cician that's really useful and food for thought!

@jozanza jozanza self-assigned this Feb 16, 2023
@JairusSW
Copy link

JairusSW commented Mar 4, 2023

Great lib, @cician!
(I ended up passing development to jake because i'm new to grain)

@spotandjake
Copy link
Member

Looks like the tests are passing which is good. This should be all ready for review a note is I wrote a TODO for eventually switching to a streaming based float parsing implementation that works on the indivdual chars rather then storing it in a buffer, so I need to open an issue for that and reference the issue in the comment.

A question for @cician is under the toString functions you have a FIXME magic numbers but I do not know what magic number's you are talking about because from what i understand NaN and Infintiy would be those numbers but those are already properly handled.

stdlib/json.gr Outdated Show resolved Hide resolved
stdlib/json.md Outdated Show resolved Hide resolved
@phated phated changed the title feat(stdlib): JSON parsing and printing. feat(stdlib): Add Json module Mar 4, 2024
Copy link
Member

@phated phated left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's done folks! 🎉

@phated phated dismissed ospencer’s stale review March 4, 2024 02:24

Pretty sure these changes were made

@phated phated added this pull request to the merge queue Mar 4, 2024
Merged via the queue into grain-lang:main with commit 5a6e4c6 Mar 4, 2024
12 checks passed
@phated phated mentioned this pull request Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

9 participants