Add Unicode and escape sequence support #624

Draft
FrancoisLaferriere wants to merge 1 commit into potassco:wip-20 from FrancoisLaferriere:string-escapes

Refs #123

Changes

Added support for \uXXXX, \t, and \r escape sequences in strings.

  • Lexer: Added \t, \r, and \uXXXX patterns to STRING and FLIT (f-strings)
  • unquote(): Added handling for \t, \r, and \uXXXX escapes
  • quote(): Removed, since it was unused
  • PrintQuoted: Added \r escape handling
  • Tests: Added tests for all new escape sequences
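
For readers who want a concrete picture of the decoding step, here is a minimal Python sketch of what an unquote-style pass does with the new escapes. It is not the code from this PR: `unquote_sketch` is a hypothetical name, and the assumption that `\n`, `\\`, and `\"` were already supported before this change is mine.

```python
import string


def unquote_sketch(body: str) -> bytes:
    """Decode escapes in the text between the quotes; return UTF-8 bytes.

    Hypothetical illustration only, not the unquote() from this PR.
    """
    # Single-character escapes. \t and \r are the ones this PR adds;
    # the remaining three are assumed to exist already.
    simple = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", '"': '"'}
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        esc = body[i + 1]
        if esc in simple:
            out.append(simple[esc])
            i += 2
        elif esc == "u":
            # \uXXXX: exactly four hex digits naming one code point.
            digits = body[i + 2 : i + 6]
            if len(digits) != 4 or any(c not in string.hexdigits for c in digits):
                raise ValueError("expected four hex digits after \\u")
            out.append(chr(int(digits, 16)))
            i += 6
        else:
            raise ValueError(f"unknown escape \\{esc}")
    # Surrogate code points (D800-DFFF) are not handled here; encoding
    # them raises UnicodeEncodeError.
    return "".join(out).encode("utf-8")


print(unquote_sketch(r"caf\u00E9"))  # the é becomes the two bytes C3 A9
```

The sketch rejects unknown escapes rather than passing them through; whether the real lexer does the same is not stated in this description.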

Questions

  • Should \uXXXX be converted to UTF-8 bytes in the internal representation (current behavior), or should it be preserved and printed back as \uXXXX? Currently:

    • "caf\u00E9" → internal UTF-8 → output "café"
  • Escape sequences in f-string literals are currently NOT processed; they pass through literally, so f"\n" outputs "\\n".
    Should we process escape sequences in f-string literals via unquote(), or keep the current pass-through behavior?
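
To make both questions concrete, the following Python sketch models the printing side. `print_quoted_sketch` is a hypothetical stand-in for PrintQuoted, and the set of characters it escapes (backslash, double quote, `\n`, `\r`) is an assumption; only the `\r` case is confirmed by the change list above.

```python
def print_quoted_sketch(text: str) -> str:
    """Quote a decoded string for output, re-escaping special characters.

    Hypothetical stand-in for PrintQuoted; the escape set is assumed.
    """
    escapes = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r"}
    return '"' + "".join(escapes.get(c, c) for c in text) + '"'


# Question 1: once \u00E9 has been decoded, the original spelling is gone
# and the character is printed as-is.
print(print_quoted_sketch("caf\u00e9"))  # "café"

# Question 2: an f-string literal that keeps a raw backslash followed by n
# gets that backslash escaped again on output.
print(print_quoted_sketch("\\n"))  # "\\n"
```

Preserving `\uXXXX` on output would require either keeping the escaped spelling in the internal representation or re-escaping non-ASCII characters in the printer, which would also rewrite characters the user typed directly.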

- Updated lexer to recognize \uXXXX, \t, and \r escape sequences in
  STRING and FLIT (f-strings)
- Updated unquote() to parse \uXXXX and output UTF-8
- Updated PrintQuoted to output \r escape sequence
- Removed unused quote() function
- Added tests for escape sequences in strings