feat(stdlib): Add Json module #1133
Conversation
This looks awesome! I put down some thoughts, but this is really amazing work. Thank you so much for putting this together!
stdlib/json.gr (Outdated)

 *
 * @example print(parse("{\"currency\":\"$\",\"price\":119}"))
 */
export let parse: String -> Result<JSON, JSONParseError> = (str: String) => {
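For context, here is a minimal usage sketch of the signature above. The import path and the pre-0.6 import syntax are my assumptions, not something stated in the diff; the Result handling is what the signature itself implies.

```grain
// Hedged usage sketch: assumes the module is importable as "json".
import Json from "json"

match (Json.parse("{\"currency\":\"$\",\"price\":119}")) {
  Ok(json) => print(json), // the parsed JSON value
  Err(err) => print(err),  // a JSONParseError describing the failure
}
```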
We might want a parseOpt as well.
Do you have any options for parsing in mind?
I can only think of options related to number parsing, and maybe some control over internal buffer sizes, but I'd rather not expose the buffer-related stuff, to allow future internal refactoring.

The user could want to parse into 32-bit floats, or use a lax but fast decimal-to-float conversion algorithm. That's probably only interesting for relatively niche use cases with enormous data sets in JSON, for which Grain wouldn't be very efficient anyway since it boxes float numbers.

If Grain gains a BigDecimal number type in the future, then an option would be necessary to support both and maintain compatibility with the default behavior of parsing to floats. Though I don't even know if BigDecimal makes sense as a subtype of Number. If not, this issue would not apply anyway, as the current JSON enum would not do the job. As an alternative to BigDecimal, I could keep the current "hack" that uses rationals, but with BigInt numerator/denominator.
I think this would have been nice. Another potential option would have been allowComments: while it doesn't follow the spec, it is something that is nice to have and could fit well as a parse option. But I think we came to the conclusion on Discord that we don't want to allow parsing options.
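For the record, here is a purely hypothetical sketch of what per-call options like these could have looked like. None of these names appear in the PR, and as noted above the conclusion was to not expose parsing options at all.

```grain
// Hypothetical only: illustrates the kinds of options discussed, not a real API.
record ParseOptions {
  float32Numbers: Bool,      // parse reals into 32-bit floats instead of 64-bit
  laxFloatConversion: Bool,  // allow a fast but inexact decimal-to-float algorithm
  allowComments: Bool,       // accept comments, outside the JSON spec
}

let defaultOptions = {
  float32Numbers: false,
  laxFloatConversion: false,
  allowComments: false,
}
```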
I like the API. The one thing missing is a searchable data structure, but I think you address that in "Since then I decided that it's OK for this API not to present a directly searchable data structure and that this job can be left to a higher level API built on top. No specific plans yet, suggestions welcome.", which I agree with. Great work.
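To make the "higher level API built on top" idea concrete, here is a small self-contained sketch of a field lookup helper. The enum below is a stand-in with assumed variant names and an assumed JSONObject(List<(String, JSON)>) shape; it is not the PR's actual definition.

```grain
// Stand-in enum for illustration; variant names and shapes are assumptions.
enum JSON {
  JSONNull,
  JSONNumber(Number),
  JSONString(String),
  JSONObject(List<(String, JSON)>),
}

// Walk an object's field list and return the value bound to `key`, if any.
let rec lookupField = (key: String, fields: List<(String, JSON)>) => {
  match (fields) {
    [] => None,
    [(k, v), ...rest] => if (k == key) Some(v) else lookupField(key, rest),
  }
}

let get = (key: String, json: JSON) => {
  match (json) {
    JSONObject(fields) => lookupField(key, fields),
    _ => None,
  }
}

// get("price", JSONObject([("currency", JSONString("$")), ("price", JSONNumber(119))]))
// returns Some(JSONNumber(119))
```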
Are we waiting for BigInt to land before landing this?
Sorry I left everyone in limbo for more than a month and haven't made any progress on this. Now that I again have some time, I should make it more clear and explicit what the actual blocker is, even if it ends up being a bit of a recap of what has already been said on Discord.

The current plan for number parsing is:

- Integer numbers in the range -2^63..2^63 are parsed into the Simple/Int32/Int64 variants of Number without problem.
- Greater integer numbers would require BigInt, unless we choose to parse them into 64-bit floats with precision loss, and even that has its limit (around 10^308). I don't think that would make sense with BigInt on the horizon. If we really want, we could ship a version that returns an error in these cases and later change it to parse into BigInt without changes to the API.
- Since there's currently no plan to include something like BigDecimal in Grain (AFAIK), the only choice is to parse numbers with decimal digits as floats, as most JSON parsers do (although some give you a choice). If I recall correctly this was the consensus.

So the major blocker is that we don't have float parsing. What is missing is basically float parsing, minus the actual parsing (which is the easy part and already done). Building a binary float from a decimal integer+fraction is easy on the surface, but hard to get right with respect to rounding and stable round-tripping. I think that's very important given Grain's primary target (business logic and smart contracts). I was planning on porting some existing code to achieve this, but haven't even started and haven't looked at the matter since February.
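Restating the branching above as code, just to pin it down (a sketch; the names are mine, not anything from the PR or the stdlib):

```grain
// Sketch of the number-parsing plan; names are illustrative only.
enum NumberPlan {
  FitsIn64Bits, // -2^63..2^63: parse into the Simple/Int32/Int64 variants of Number
  NeedsBigInt,  // larger integers: return an error for now, switch to BigInt once it lands
  Float64,      // any fraction or exponent: parse as a 64-bit float, accepting precision loss
}

let planFor = (isInteger: Bool, fitsIn64Bits: Bool) => {
  if (!isInteger) {
    Float64
  } else if (fitsIn64Bits) {
    FitsIn64Bits
  } else {
    NeedsBigInt
  }
}
```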
@cician gotcha. If you want to rebase off of the bigint branch you definitely can, and the format is pretty simple. But otherwise, yes, keep us updated on floats, and let us know if there's anything we can do to help.
@cician thiccnums have landed if you want to update this with them!
@cician are you still around to work on this? Otherwise I'll take your fine work and finish it off for 0.6 |
Oscar reached out to me a few days ago saying he's willing to pick this up. I started to take a look again after I saw the float parsing PR merged, but haven't gotten far, so it's probably for the better that you guys finish it, in order to merge it in time for v0.6. I leave below a few notes on the different approaches I thought of. Let me know if something is not clear here or in the parser code.

Currently the parser is structured internally for consuming a sequential stream of chars, one at a time. You can keep it this way for number parsing as well, but you don't have to, since the API only accepts a String input. I've done it this way thinking of maybe extending the API in the future if we have a concept of data streams or something.

Solution A

Simple and dirty. A valid JSON number token is a subset of valid inputs of […]. So let the existing parser code in […]. This is the quickest solution to implement, but somewhat ugly and slow. There's the cost of accumulating chars in a buffer, an allocation of a temporary String for each number, and a temporary Bytes instance allocated by […].

Solution B

Similar to solution A, but accumulate just the decimal digits in a Buffer and the decimal point offset. Use it to build the float like in the slow path of […]. Unintuitively, this approach may actually be slower than solution A by skipping the fast path. It needs to allocate an instance of the Decimal record with Bytes from the decimal buffer, but we skip allocating a String and we avoid coupling the JSON parser to the behavior of […].

Solution C

An optimized parsing function similar to […]. This would result in great performance, but other than using unsafe code, it would probably require moving away from the idea of a sequential char stream for JSON, unless […].

Solution D

A hybrid between solutions B and C as a compromise to keep the sequential streaming logic. The fast path is tried by reimplementing […].

I was attempting solution B, but only got as far as filling the buffer of decimals and breaking up the […].

edit: typo
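To make Solution B's accumulated state concrete, here is a deliberately naive, self-contained sketch. The helper names are mine, and the final scaling step is exactly the part the notes above call hard to get right (correct rounding and stable round-tripping), so treat it as an illustration of the data being collected, not of the conversion itself. It also assumes the stdlib's String.explode, Array.forEach and Char.code helpers.

```grain
import String from "string"
import Char from "char"
import Array from "array"

// Collect the decimal digits and the decimal-point offset of a pre-validated
// token such as "119.25": returns (11925, 2), the state Solution B accumulates
// while streaming chars.
let digitsAndScale = (token: String) => {
  let mut mantissa = 0
  let mut fractionDigits = 0
  let mut seenPoint = false
  Array.forEach((c) => {
    if (c == '.') {
      seenPoint = true
    } else {
      mantissa = mantissa * 10 + (Char.code(c) - 0x30)
      if (seenPoint) {
        fractionDigits = fractionDigits + 1
      }
    }
  }, String.explode(token))
  (mantissa, fractionDigits)
}

// Naive final step: divide by a power of ten. Under Grain's Number semantics this
// yields an exact rational (the "hack" mentioned earlier in the thread); producing
// a correctly rounded binary float from (mantissa, scale) is the hard missing part.
let naiveNumber = (token: String) => {
  let (mantissa, fractionDigits) = digitsAndScale(token)
  let mut scale = 1
  let mut i = 0
  while (i < fractionDigits) {
    scale = scale * 10
    i = i + 1
  }
  mantissa / scale
}

// naiveNumber("119.25") evaluates to 119.25; a token like "0.1" stays a rational here
// rather than becoming the nearest 64-bit float.
```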
Thanks @cician that's really useful and food for thought!
Great lib, @cician! |
Looks like the tests are passing, which is good. This should be all ready for review. A note: I wrote a […]. A question for @cician is under the Json module.
I think it's done folks! 🎉
This PR aims to include my JSON parser/printer (https://github.com/cician/grain-json) into stdlib, hopefully for the 0.5 release.
I'm opening this PR as a draft for now since I expect to still make some important changes once stack-allocated chars and thick numbers are in. Strictly speaking, no changes are really required after the stack chars PR is merged, but the code can be improved for clarity by using chars instead of code points as raw numbers. Thick numbers, on the other hand, are going to affect the behavior of number parsing.
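A tiny illustration of the clarity point about chars versus raw code points (my example, not code from this PR):

```grain
// With stack-allocated chars the parser can compare against the character itself,
let isOpenBrace = (c: Char) => c == '{'

// instead of carrying the code point around as a bare number.
let isOpenBraceRaw = (codePoint: Number) => codePoint == 0x7b
```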