[text-format] Fix parsing of string literals #730

cmyr · 2024-06-17T17:40:48Z

This renames `next_byte_value` to `next_str_lit_bytes` and changes the signature so that it returns 1..=4 bytes per call, reflecting the variable-length nature of the UTF-8 encoding.
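For illustration only, here is a minimal sketch of why a single character in a string literal can contribute anywhere from 1 to 4 bytes. `char_bytes` is a hypothetical helper written for this example; it is not the lexer code from this PR and only relies on the standard library's `char::encode_utf8`.

```rust
/// Hypothetical helper (not part of this PR): returns the UTF-8 bytes
/// that encode a single `char`. The result is always 1 to 4 bytes long.
fn char_bytes(c: char) -> Vec<u8> {
    // Four bytes is the maximum length of one UTF-8 encoded scalar value.
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}
```

An ASCII character such as `'a'` encodes to one byte, while `'é'` takes two, `'€'` three, and `'🦀'` four, which is why a lexer that yields one byte per character mishandles non-ASCII input.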

(Hopefully) fixes #729, "text_format parsing does not correctly handle non-ascii chars?"

Note: I'm not sure how best to add tests for this, and it needs them; in particular, there should be a test case with a text-format input that contains a non-ASCII string literal. There should probably also be more tests for the weird byte escapes, but definitely a case with non-ASCII text.

stepancheg · 2024-06-26T01:26:30Z

Can you please add a test that would fail without this PR?

stepancheg · 2024-06-26T01:27:54Z

protobuf-support/src/lexer/lexer_impl.rs

+/// The raw bytes for a single char or escape sequence in a string literal
+///
+/// The raw bytes are available via an `into_iter` implementation.
+pub struct DecodedBytes {


This seems to be not used outside of the crate, so it should not be public.

cmyr · 2024-06-26

It's the return type of a public method, so it needs to be `pub`. We could change that signature to return `impl Iterator` instead, if that is preferable?
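To make the two options in this exchange concrete, here is a rough sketch. The fields of the real `DecodedBytes` are not shown in this thread, so `DecodedBytesSketch`, its `from_char` constructor, and `char_bytes_iter` are all hypothetical names assumed for illustration. The first option exposes a named public type with an `into_iter` implementation; the second hides the concrete type behind `impl Iterator`, so the struct itself could stay private.

```rust
/// Hypothetical stand-in for the PR's `DecodedBytes` (real fields unknown):
/// a fixed 4-byte buffer plus the number of bytes actually used.
pub struct DecodedBytesSketch {
    buf: [u8; 4],
    len: usize,
}

impl DecodedBytesSketch {
    /// Assumed constructor for the example: store one char's UTF-8 bytes.
    pub fn from_char(c: char) -> Self {
        let mut buf = [0u8; 4];
        let len = c.encode_utf8(&mut buf).len();
        DecodedBytesSketch { buf, len }
    }
}

/// Option 1: a public named type that callers iterate over.
impl IntoIterator for DecodedBytesSketch {
    type Item = u8;
    type IntoIter = std::iter::Take<std::array::IntoIter<u8, 4>>;

    fn into_iter(self) -> Self::IntoIter {
        // Yield only the bytes that were written, not the zero padding.
        IntoIterator::into_iter(self.buf).take(self.len)
    }
}

/// Option 2: return `impl Iterator`, so the concrete type is not part of
/// the public signature.
pub fn char_bytes_iter(c: char) -> impl Iterator<Item = u8> {
    DecodedBytesSketch::from_char(c).into_iter()
}
```

The trade-off is the usual one: a named type can be stored in struct fields and documented, while `impl Iterator` keeps the public surface smaller but cannot be named by callers.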

cmyr · 2024-06-26T16:54:23Z

I've added a test case that fails without this patch but passes with it.

stepancheg · 2024-09-30T00:05:36Z

Merged in bdc1428.

stepancheg · 2024-09-30T00:53:53Z

It should be bumped tomorrow or so.

stepancheg requested changes Jun 26, 2024

[text-format] Fix parsing of string literals

59d6e61

This renames `next_byte_value` to `next_str_lit_bytes` and may return between 1..=4 bytes per call, representing the variable-length nature of the UTF-8 encoding.

cmyr force-pushed the parse-unicode-strings branch from 0eaddf2 to 59d6e61 on June 26, 2024 16:53

stepancheg added a commit that referenced this pull request Sep 30, 2024

Repro test non-ASCII text format parsing

82e76bc

Cherry-pick from #730
