Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(jql-parser): switch to winnow #245

Merged
merged 1 commit into from
Dec 3, 2023
Merged

chore(jql-parser): switch to winnow #245

merged 1 commit into from
Dec 3, 2023

Conversation

yamafaktory
Copy link
Owner

@yamafaktory yamafaktory added the enhancement New feature or request label Dec 3, 2023
@yamafaktory yamafaktory self-assigned this Dec 3, 2023
@yamafaktory yamafaktory merged commit dab3434 into main Dec 3, 2023
@yamafaktory yamafaktory deleted the chore/winnow branch December 3, 2023 22:41
@@ -33,32 +32,29 @@ use crate::{
};

/// Parses the provided input and map it to the first matching token.
fn parse_fragment(input: &str) -> IResult<&str, Token> {
fn parse_fragment<'a>(input: &mut &'a str) -> PResult<Token<'a>> {
alt((
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When all (or most performance critical) elements of a alt have unique first characters, dispatch! can offer some big performance wins

e.g. see https://github.com/toml-rs/toml/blob/main/crates/toml_edit/src/parser/value.rs#L18

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds really good! I've just switched to it in a local branch and it works like a charm.

})(input)
pub(crate) fn parse_number(input: &mut &str) -> PResult<Index> {
digit1
.try_map(|index: &str| index.parse::<usize>().map(Index))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we do have Parser::parse_to

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, actually I tried it and it failed... now I get my error (simply needed to implement FromStr for Index).

{
trim(delimited(char('"'), take_until(r#"""#), char('"')))
pub(crate) fn parse_key<'a>(input: &mut &'a str) -> PResult<&'a str> {
trim(delimited(DOUBLE_QUOTE, take_until0(r#"""#), DOUBLE_QUOTE)).parse_next(input)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you use take_until*, running cargo add winnow -F simd will re-enable optimizations that we made disabled by-default compared to nom

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know, thanks!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, a big prevents it for &str, working on a fix

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.5.22 is now out with the fix

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was fast! Thanks a ton.

fn parse_keys(input: &str) -> IResult<&str, Vec<&str>> {
separated_list1(trim(tag(",")), trim(parse_key()))(input)
fn parse_keys<'a>(input: &mut &'a str) -> PResult<Vec<&'a str>> {
separated(1.., trim(parse_key), trim(COMMA)).parse_next(input)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You likely only need to trim either parse_key or COMMA and not both and then wrap the entire expression in a trim. That'll reduce the amount of redundant operations being performed which might offer a small speed bump.

Personally, I would move that wrapping trim into the caller.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another advantage of moving trim up is when you get to any alt if you can move the trim to the outside of the alt so it isn't re-doing a trim on every branch of the alt

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a ton for the feedback! Now that I'm looking at my own code, I see how inefficient it is.

Comment on lines +99 to +103
trim(delimited(
trim(CURLY_BRACE_OPEN),
parse_keys,
trim(CURLY_BRACE_CLOSE),
))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Case in point about moving the wrapping trim from parse_keys into here. That'd show that

trim(delimited(
    trim(CURLY_BRACE_OPEN),
    trim(parse_key),
    trim(CURLY_BRACE_CLOSE),
))

is happening which has a bunch of redundant trims.

I'd switch to

delimited(
    CURLY_BRACE_OPEN,
    trim(parse_key),
    CURLY_BRACE_CLOSE,
)

with the caller also responsible for doing a trim

/// A combinator which parses an object range.
pub(crate) fn parse_object_range<'a>()
-> impl FnMut(&'a str) -> IResult<&'a str, (Option<Index>, Option<Index>)> {
pub(crate) fn parse_object_index(input: &mut &str) -> PResult<Vec<Index>> {
trim(delimited(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw if you wrap these in trace("object index", ...) calls, then debugging becomes really easy with cargo test -F winnow/debug

}

/// A combinator which parses any lens value.
pub(crate) fn parse_lens_value<'a>() -> impl FnMut(&'a str) -> IResult<&'a str, LensValue> {
pub(crate) fn parse_lens_value<'a>(input: &mut &'a str) -> PResult<LensValue<'a>> {
alt((
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could likely also be a dispatch!

@epage
Copy link

epage commented Dec 4, 2023

If you feel up to it, I'd love to hear an experience report of using winnow. This was done by the hcl-edit author (here) and it was a big help

Example topics

  • What helped out compared to past parser experience
  • What got in the way
  • What was confusing
  • Any suggestions for making things more obvious
  • Any thoughts on further improving naming

@@ -33,32 +32,29 @@ use crate::{
};

/// Parses the provided input and map it to the first matching token.
fn parse_fragment(input: &str) -> IResult<&str, Token> {
fn parse_fragment<'a>(input: &mut &'a str) -> PResult<Token<'a>> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to call out, if you are able to parse &[u8] instead of &str, that also offers big gains

@yamafaktory
Copy link
Owner Author

Thanks again @epage, I've created a follow-up PR based on your awesome feedback :).

N.B.: since the parser gets its input from Clap as query, I'm not sure if converting it to &[u8] is a good move or not (though the CLI also accepts to read the query as query_from_file but that not the main usage).

@epage
Copy link

epage commented Dec 4, 2023

N.B.: since the parser gets its input from Clap as query, I'm not sure if converting it to &[u8] is a good move or not (though the CLI also accepts to read the query as query_from_file but that not the main usage).

The source doesn't matter (for toml_edit we call as_bytes() on a &str). You will need to re-validate any &[u8] you get out to convert it to &str. For toml_edit, whenever the grammar is ASCII only, I use unsafe to bypass the validation. Depending on you aversion to unsafe and the cost of that validation, I'm not sure how much of a gain it will be.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants