-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(jql-parser): switch to winnow #245
Conversation
yamafaktory
commented
Dec 3, 2023
- Why -> https://docs.rs/winnow/latest/winnow/_topic/why/index.html
- Another potential benefit is the performance 🚀
@@ -33,32 +32,29 @@ use crate::{ | |||
}; | |||
|
|||
/// Parses the provided input and map it to the first matching token. | |||
fn parse_fragment(input: &str) -> IResult<&str, Token> { | |||
fn parse_fragment<'a>(input: &mut &'a str) -> PResult<Token<'a>> { | |||
alt(( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When all (or most performance critical) elements of a alt
have unique first characters, dispatch!
can offer some big performance wins
e.g. see https://github.com/toml-rs/toml/blob/main/crates/toml_edit/src/parser/value.rs#L18
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds really good! I've just switched to it in a local branch and it works like a charm.
})(input) | ||
pub(crate) fn parse_number(input: &mut &str) -> PResult<Index> { | ||
digit1 | ||
.try_map(|index: &str| index.parse::<usize>().map(Index)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we do have Parser::parse_to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, actually I tried it and it failed... now I get my error (simply needed to implement FromStr
for Index
).
{ | ||
trim(delimited(char('"'), take_until(r#"""#), char('"'))) | ||
pub(crate) fn parse_key<'a>(input: &mut &'a str) -> PResult<&'a str> { | ||
trim(delimited(DOUBLE_QUOTE, take_until0(r#"""#), DOUBLE_QUOTE)).parse_next(input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you use take_until*
, running cargo add winnow -F simd
will re-enable optimizations that we made disabled by-default compared to nom
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, a big prevents it for &str
, working on a fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0.5.22 is now out with the fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was fast! Thanks a ton.
fn parse_keys(input: &str) -> IResult<&str, Vec<&str>> { | ||
separated_list1(trim(tag(",")), trim(parse_key()))(input) | ||
fn parse_keys<'a>(input: &mut &'a str) -> PResult<Vec<&'a str>> { | ||
separated(1.., trim(parse_key), trim(COMMA)).parse_next(input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You likely only need to trim either parse_key
or COMMA
and not both and then wrap the entire expression in a trim
. That'll reduce the amount of redundant operations being performed which might offer a small speed bump.
Personally, I would move that wrapping trim
into the caller.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another advantage of moving trim
up is when you get to any alt
if you can move the trim
to the outside of the alt
so it isn't re-doing a trim
on every branch of the alt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a ton for the feedback! Now that I'm looking at my own code, I see how inefficient it is.
trim(delimited( | ||
trim(CURLY_BRACE_OPEN), | ||
parse_keys, | ||
trim(CURLY_BRACE_CLOSE), | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Case in point about moving the wrapping trim from parse_keys
into here. That'd show that
trim(delimited(
trim(CURLY_BRACE_OPEN),
trim(parse_key),
trim(CURLY_BRACE_CLOSE),
))
is happening which has a bunch of redundant trims.
I'd switch to
delimited(
CURLY_BRACE_OPEN,
trim(parse_key),
CURLY_BRACE_CLOSE,
)
with the caller also responsible for doing a trim
/// A combinator which parses an object range. | ||
pub(crate) fn parse_object_range<'a>() | ||
-> impl FnMut(&'a str) -> IResult<&'a str, (Option<Index>, Option<Index>)> { | ||
pub(crate) fn parse_object_index(input: &mut &str) -> PResult<Vec<Index>> { | ||
trim(delimited( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw if you wrap these in trace("object index", ...)
calls, then debugging becomes really easy with cargo test -F winnow/debug
} | ||
|
||
/// A combinator which parses any lens value. | ||
pub(crate) fn parse_lens_value<'a>() -> impl FnMut(&'a str) -> IResult<&'a str, LensValue> { | ||
pub(crate) fn parse_lens_value<'a>(input: &mut &'a str) -> PResult<LensValue<'a>> { | ||
alt(( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could likely also be a dispatch!
If you feel up to it, I'd love to hear an experience report of using winnow. This was done by the hcl-edit author (here) and it was a big help Example topics
|
@@ -33,32 +32,29 @@ use crate::{ | |||
}; | |||
|
|||
/// Parses the provided input and map it to the first matching token. | |||
fn parse_fragment(input: &str) -> IResult<&str, Token> { | |||
fn parse_fragment<'a>(input: &mut &'a str) -> PResult<Token<'a>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to call out, if you are able to parse &[u8]
instead of &str
, that also offers big gains
Thanks again @epage, I've created a follow-up PR based on your awesome feedback :). N.B.: since the parser gets its input from Clap as |
The source doesn't matter (for toml_edit we call |