khwilliamson

Various places in the code use isWORDCHAR to match the continuation characters of an identifier. This mostly works, but the set of word characters and the set of identifier-continuation characters are not identical, and the proper thing to do is to match continuation characters directly. The infrastructure lacked a macro that would make it easy to do the right thing. This commit adds that infrastructure, leaving it to future commits to use it.

  • This set of changes does not require a perldelta entry.

A reasonably complete list of characters that differ between the two sets is:

MIDDLE DOT
GREEK YPOGEGRAMMENI
GREEK ANO TELEIA
COMBINING CYRILLIC HUNDRED THOUSANDS SIGN
COMBINING CYRILLIC MILLIONS SIGN
ARMENIAN MODIFIER LETTER LEFT HALF RING
ARMENIAN EMPHASIS MARK
NEW TAI LUE THAM DIGIT ONE
COMBINING PARENTHESES OVERLAY
COMBINING ENCLOSING CIRCLE
COMBINING ENCLOSING CIRCLE BACKSLASH
COMBINING ENCLOSING SCREEN
COMBINING ENCLOSING UPWARD POINTING TRIANGLE
MANDAIC LETTER AZ
ESTIMATED SYMBOL
CIRCLED LATIN CAPITAL LETTER A .. CIRCLED LATIN SMALL LETTER Z
VERTICAL TILDE
KATAKANA MIDDLE DOT
COMBINING CYRILLIC TEN MILLIONS SIGN
COMBINING CYRILLIC THOUSAND MILLIONS SIGN
ARABIC LIGATURE SHADDA WITH DAMMATAN ISOLATED FORM
ARABIC LIGATURE SHADDA WITH SUPERSCRIPT ALEF ISOLATED FORM
ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM
ARABIC LIGATURE JALLAJALALOUHOU
ARABIC FATHATAN ISOLATED FORM
ARABIC DAMMATAN ISOLATED FORM
ARABIC KASRATAN ISOLATED FORM
ARABIC FATHA ISOLATED FORM
ARABIC DAMMA ISOLATED FORM
ARABIC KASRA ISOLATED FORM
ARABIC SHADDA ISOLATED FORM
ARABIC SUKUN ISOLATED FORM
HALFWIDTH KATAKANA MIDDLE DOT
SQUARED LATIN CAPITAL LETTER A
SQUARED LATIN CAPITAL LETTER Z
NEGATIVE CIRCLED LATIN CAPITAL LETTERs A..Z
NEGATIVE SQUARED LATIN CAPITAL LETTERs A..Z
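
For anyone who wants to see the mismatch without building perl, here is a rough Python sketch, not the code in this commit. It assumes two stand-ins: `is_word_char_approx` (a hypothetical helper) approximates a word-character test from the Unicode general category only, so it misses symbols that are word characters through Other_Alphabetic, such as the circled letters; `is_id_continue` (also hypothetical) relies on Python's `str.isidentifier`, which checks XID_Continue for every character after the first. Neither is guaranteed to agree exactly with the macros discussed here.

```python
import unicodedata


def is_word_char_approx(ch: str) -> bool:
    # Assumption: approximate a \w-style test by general category alone:
    # letters (L*), marks (M*), decimal digits (Nd), letter numbers (Nl),
    # and connector punctuation (Pc).
    cat = unicodedata.category(ch)
    return cat[0] in "LM" or cat in ("Nd", "Nl", "Pc")


def is_id_continue(ch: str) -> bool:
    # "a" is a valid identifier start, so the combined string is an
    # identifier exactly when ch is accepted as a continuation character.
    return ("a" + ch).isidentifier()


def disagreements(chars):
    # Return (name, word, idcont) for each character where the tests differ.
    rows = []
    for ch in chars:
        word, idcont = is_word_char_approx(ch), is_id_continue(ch)
        if word != idcont:
            rows.append((unicodedata.name(ch), word, idcont))
    return rows


if __name__ == "__main__":
    # A few entries from the list above, plus a plain letter as a control.
    samples = ["x", "\u00b7", "\u20dd", "\ufe70", "\u212e"]
    for name, word, idcont in disagreements(samples):
        print(f"{name}: word={word} idcont={idcont}")
```

The sample characters disagree in both directions: some are continuation characters without being word characters, and some are the reverse, which is why substituting one test for the other only mostly works.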
