feat(stdlib): Add Json module #1133
Conversation
This looks awesome! I put down some thoughts, but this is really amazing work. Thank you so much for putting this together!
stdlib/json.gr (Outdated)

 *
 * @example print(parse("{\"currency\":\"$\",\"price\":119}"))
 */
export let parse: String -> Result<JSON, JSONParseError> = (str: String) => {
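For context, here is a minimal usage sketch of the signature above. The import path and the pre-0.6 import syntax are my assumptions, not something stated in the diff; the Result handling is what the signature itself implies.

```grain
// Hedged usage sketch: assumes the module is importable as "json".
import Json from "json"

match (Json.parse("{\"currency\":\"$\",\"price\":119}")) {
  Ok(json) => print(json), // the parsed JSON value
  Err(err) => print(err),  // a JSONParseError describing the failure
}
```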
We might want a parseOpt as well.
Do you have any options for parsing in mind?
I can only think of options related to number parsing, and maybe some control over internal buffer sizes, but I'd rather not expose the buffer-related stuff, to allow future internal refactoring.

The user could want to parse into 32-bit floats, or use a lax but fast decimal-to-float conversion algorithm. That's probably only interesting for relatively niche use cases with enormous data sets in JSON, for which Grain wouldn't be very efficient anyway since it boxes float numbers.

If Grain gains a BigDecimal number type in the future, then an option would be necessary to support both and maintain compatibility with the default behavior of parsing to floats. Though I don't even know if BigDecimal makes sense as a subtype of Number. If not, this issue would not apply anyway, as the current JSON enum would not do the job. As an alternative to BigDecimal, I could keep the current "hack" that uses rationals, but with BigInt numerator/denominator.
I think this would have been nice. Another potential option would have been allowComments: while it doesn't follow the spec, it is something that is nice to have and could fit well as a parse option. But I think we came to the conclusion on Discord that we don't want to allow parsing options.
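For the record, here is a purely hypothetical sketch of what per-call options like these could have looked like. None of these names appear in the PR, and as noted above the conclusion was to not expose parsing options at all.

```grain
// Hypothetical only: illustrates the kinds of options discussed, not a real API.
record ParseOptions {
  float32Numbers: Bool,      // parse reals into 32-bit floats instead of 64-bit
  laxFloatConversion: Bool,  // allow a fast but inexact decimal-to-float algorithm
  allowComments: Bool,       // accept comments, outside the JSON spec
}

let defaultOptions = {
  float32Numbers: false,
  laxFloatConversion: false,
  allowComments: false,
}
```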
I like the API. The one thing missing is a searchable data structure, but I think you address that in "Since then I decided that it's OK for this API not to present a directly searchable data structure and that this job can be left to a higher level API built on top. No specific plans yet, suggestions welcome.", which I agree with. Great work.
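To make the "higher level API built on top" idea concrete, here is a small self-contained sketch of a field lookup helper. The enum below is a stand-in with assumed variant names and an assumed JSONObject(List<(String, JSON)>) shape; it is not the PR's actual definition.

```grain
// Stand-in enum for illustration; variant names and shapes are assumptions.
enum JSON {
  JSONNull,
  JSONNumber(Number),
  JSONString(String),
  JSONObject(List<(String, JSON)>),
}

// Walk an object's field list and return the value bound to `key`, if any.
let rec lookupField = (key: String, fields: List<(String, JSON)>) => {
  match (fields) {
    [] => None,
    [(k, v), ...rest] => if (k == key) Some(v) else lookupField(key, rest),
  }
}

let get = (key: String, json: JSON) => {
  match (json) {
    JSONObject(fields) => lookupField(key, fields),
    _ => None,
  }
}

// get("price", JSONObject([("currency", JSONString("$")), ("price", JSONNumber(119))]))
// returns Some(JSONNumber(119))
```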
Are we waiting for BigInt to land before landing this?
Sorry I left everyone in limbo for more than a month and haven't made any progress on this. Now that I again have some time, I should make it more clear and explicit what the actual blocker is, even if it ends up being a bit of a recap of what has already been said on Discord.

The current plan for number parsing is:

- Integer numbers in the range -2^63..2^63 are parsed into the Simple/Int32/Int64 variants of Number without problem.
- Greater integer numbers would require BigInt, unless we choose to parse them into 64-bit floats with precision loss, and even that has its limit (around 10^308). I don't think that would make sense with BigInt on the horizon. If we really want, we could ship a version that returns an error in these cases and later change it to parse into BigInt without changes to the API.
- Since there's currently no plan to include something like BigDecimal in Grain (AFAIK), the only choice is to parse numbers with decimal digits as floats, as most JSON parsers do (although some give you a choice). If I recall correctly this was the consensus.

So the major blocker is that we don't have float parsing. What is missing is basically float parsing, minus the actual parsing (which is the easy part and already done). Building a binary float from a decimal integer+fraction is easy on the surface, but hard to get right with respect to rounding and stable round-tripping. I think that's very important given Grain's primary target (business logic and smart contracts). I was planning on porting some existing code to achieve this, but haven't even started and haven't looked at the matter since February.
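Restating the branching above as code, just to pin it down (a sketch; the names are mine, not anything from the PR or the stdlib):

```grain
// Sketch of the number-parsing plan; names are illustrative only.
enum NumberPlan {
  FitsIn64Bits, // -2^63..2^63: parse into the Simple/Int32/Int64 variants of Number
  NeedsBigInt,  // larger integers: return an error for now, switch to BigInt once it lands
  Float64,      // any fraction or exponent: parse as a 64-bit float, accepting precision loss
}

let planFor = (isInteger: Bool, fitsIn64Bits: Bool) => {
  if (!isInteger) {
    Float64
  } else if (fitsIn64Bits) {
    FitsIn64Bits
  } else {
    NeedsBigInt
  }
}
```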
@cician gotcha. If you want to rebase off of the bigint branch you definitely can, and the format is pretty simple. But otherwise, yes, keep us updated on floats, and let us know if there's anything we can do to help.
@cician thiccnums have landed if you want to update this with them!
@cician are you still around to work on this? Otherwise I'll take your fine work and finish it off for 0.6 |
Oscar reached out to me a few days ago saying he's willing to pick this up. I started to take a look again after I saw the float parsing PR merged, but haven't gotten far, so it's probably for the better that you guys finish it, in order to merge it in time for v0.6. I leave below a few notes on the different approaches I thought of. Let me know if something is not clear here or in the parser code.

Currently the parser is structured internally for consuming a sequential stream of chars, one at a time. You can keep it this way for number parsing as well, but you don't have to, since the API only accepts a String input. I've done it this way thinking of maybe extending the API in the future if we have a concept of data streams or something.

Solution A

Simple and dirty. A valid JSON number token is a subset of valid inputs of […]. So let the existing parser code in […]. This is the quickest solution to implement, but somewhat ugly and slow. There's the cost of accumulating chars in a buffer, an allocation of a temporary String for each number, and a temporary Bytes instance allocated by […].

Solution B

Similar to solution A, but accumulate just the decimal digits in a Buffer and the decimal point offset. Use it to build the float like in the slow path of […]. Unintuitively, this approach may actually be slower than solution A by skipping the fast path. It needs to allocate an instance of the Decimal record with Bytes from the decimal buffer, but we skip allocating a String and we avoid coupling the JSON parser to the behavior of […].

Solution C

An optimized parsing function similar to […]. This would result in great performance, but other than using unsafe code, it would probably require moving away from the idea of a sequential char stream for JSON, unless […].

Solution D

A hybrid between solutions B and C as a compromise to keep the sequential streaming logic. The fast path is tried by reimplementing […].

I was attempting solution B, but only got as far as filling the buffer of decimals and breaking up the […].

edit: typo
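To make Solution B's accumulated state concrete, here is a deliberately naive, self-contained sketch. The helper names are mine, and the final scaling step is exactly the part the notes above call hard to get right (correct rounding and stable round-tripping), so treat it as an illustration of the data being collected, not of the conversion itself. It also assumes the stdlib's String.explode, Array.forEach and Char.code helpers.

```grain
import String from "string"
import Char from "char"
import Array from "array"

// Collect the decimal digits and the decimal-point offset of a pre-validated
// token such as "119.25": returns (11925, 2), the state Solution B accumulates
// while streaming chars.
let digitsAndScale = (token: String) => {
  let mut mantissa = 0
  let mut fractionDigits = 0
  let mut seenPoint = false
  Array.forEach((c) => {
    if (c == '.') {
      seenPoint = true
    } else {
      mantissa = mantissa * 10 + (Char.code(c) - 0x30)
      if (seenPoint) {
        fractionDigits = fractionDigits + 1
      }
    }
  }, String.explode(token))
  (mantissa, fractionDigits)
}

// Naive final step: divide by a power of ten. Under Grain's Number semantics this
// yields an exact rational (the "hack" mentioned earlier in the thread); producing
// a correctly rounded binary float from (mantissa, scale) is the hard missing part.
let naiveNumber = (token: String) => {
  let (mantissa, fractionDigits) = digitsAndScale(token)
  let mut scale = 1
  let mut i = 0
  while (i < fractionDigits) {
    scale = scale * 10
    i = i + 1
  }
  mantissa / scale
}

// naiveNumber("119.25") evaluates to 119.25; a token like "0.1" stays a rational here
// rather than becoming the nearest 64-bit float.
```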
Thanks @cician that's really useful and food for thought!
Great lib, @cician! |
Looks like the tests are passing, which is good. This should be all ready for review. A note: I wrote a […]. A question for @cician is under the Json module.
I think it's done folks! 🎉
This PR aims to include my JSON parser/printer (https://github.com/cician/grain-json) into stdlib, hopefully for the 0.5 release.
I'm opening this PR as a draft for now since I expect to still make some important changes once stack-allocated chars and thick numbers are in. Strictly speaking, no changes are really required after the stack chars PR is merged, but the code can be improved for clarity by using chars instead of code points as raw numbers. Thick numbers, on the other hand, are going to affect the behavior of number parsing.
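A tiny illustration of the clarity point about chars versus raw code points (my example, not code from this PR):

```grain
// With stack-allocated chars the parser can compare against the character itself,
let isOpenBrace = (c: Char) => c == '{'

// instead of carrying the code point around as a bare number.
let isOpenBraceRaw = (codePoint: Number) => codePoint == 0x7b
```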