Commit a85f11e
Clarify UNICODE_ESCAPE valid token value
This clarifies in the UNICODE_ESCAPE rule that the hex value must be a
valid Unicode scalar value (U+0000 to U+D7FF or U+E000 to U+10FFFF).
This resolves the problem that a string like `"\u{ffffff}"` is not a
valid token, but the grammar did not reflect that.
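As an illustration of the restriction (not the text added to the grammar), here is a minimal sketch of the check on the digits between the braces. The function name `is_valid_scalar_hex` is hypothetical, and the sketch assumes the digits have already been extracted and contain no `_` separators.

```rust
/// Returns true if `hex` (the digits inside a `\u{...}` escape) names a
/// valid Unicode scalar value. Hypothetical helper for illustration only.
fn is_valid_scalar_hex(hex: &str) -> bool {
    // Require 1 to 6 hex digits; `_` separators are not handled in this sketch.
    if hex.is_empty() || hex.len() > 6 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return false;
    }
    match u32::from_str_radix(hex, 16) {
        // `char::from_u32` returns `None` for surrogates (0xD800..=0xDFFF)
        // and for values above 0x10FFFF.
        Ok(value) => char::from_u32(value).is_some(),
        Err(_) => false,
    }
}
```

Under these assumptions, `is_valid_scalar_hex("10ffff")` is accepted while `is_valid_scalar_hex("ffffff")` and `is_valid_scalar_hex("d800")` are rejected.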
I don't see a practical way to define this with character ranges. The
resulting expression is huge.
Note that this restriction means that the UNICODE_ESCAPE rule will not
match an invalid value, and that in all the places where UNICODE_ESCAPE
is used, the preceding character must *not* be `\`, which forces those
rules to fail their match. In turn, the only rules that contain
UNICODE_ESCAPE have `'` or `"` characters, which won't match any other
rule in the grammar, forcing the parse to fail.
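The failure chain described above can be sketched with a heavily simplified matcher. This is an assumption-laden illustration, not the reference grammar: the names `match_unicode_escape` and `match_string_literal` are hypothetical, and the literal rule here only allows ordinary characters and `\u{...}` escapes, omitting the other escape forms and `_` separators.

```rust
/// Tries to match `\u{HEX}` at the start of `s`, returning the remaining
/// input on success. Fails if HEX is not a Unicode scalar value.
fn match_unicode_escape(s: &str) -> Option<&str> {
    let rest = s.strip_prefix("\\u{")?;
    let end = rest.find('}')?;
    let hex = &rest[..end];
    // Require 1 to 6 hex digits (no `_` separators in this sketch).
    if hex.is_empty() || hex.len() > 6 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    let value = u32::from_str_radix(hex, 16).ok()?;
    // Rejects surrogates and values above 0x10FFFF.
    char::from_u32(value)?;
    Some(&rest[end + 1..])
}

/// Simplified string literal: `"` ( any char except `"` and `\` | escape )* `"`.
fn match_string_literal(s: &str) -> bool {
    let Some(mut rest) = s.strip_prefix('"') else {
        return false;
    };
    loop {
        // A closing quote ends the literal; nothing may follow it here.
        if let Some(after) = rest.strip_prefix('"') {
            return after.is_empty();
        }
        if let Some(after) = match_unicode_escape(rest) {
            rest = after;
            continue;
        }
        // The only other alternative excludes `\`, so an invalid escape
        // leaves no way to continue and the whole literal fails.
        let mut chars = rest.chars();
        match chars.next() {
            Some(c) if c != '\\' => rest = chars.as_str(),
            _ => return false,
        }
    }
}
```

With this sketch, `match_string_literal("\"a\\u{41}b\"")` succeeds, while `match_string_literal("\"\\u{ffffff}\"")` fails because the escape does not match and no other alternative accepts the `\`.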
If all those assumptions seem too fragile, then we can consider adding
the [cut operator](rust-lang#2104)
just after the `\u` so that the interpretation is clear that a failure
to match the part from the opening brace is an immediate parse failure.
1 parent 11f84ce commit a85f11e
1 file changed, +4 -2 lines changed

| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 157 | 157 | |
| 158 | 158 | |
| 159 | 159 | |
| 160 | | - |
| | 160 | + |
| 161 | 161 | |
| 162 | 162 | |
| | 163 | + |
| | 164 | + |
| 163 | 165 | |
| 164 | 166 | |
| 165 | 167 | |
| | | |
| 196 | 198 | |
| 197 | 199 | |
| 198 | 200 | |
| 199 | | - |
| | 201 | + |
| 200 | 202 | |
| 201 | 203 | |
| 202 | 204 | |