Proposal to add the word boundary assertions `\b{w}` and `\b{W}` to regular expressions in ECMAScript to match Unicode default word boundaries.

a455bcd9/es-regexp-unicode-extended-grapheme-custers

ECMAScript proposal: Unicode word boundary assertions

Proposal to add Unicode word boundary assertions of the form \b{w} to ECMAScript regular expressions that have the u flag set. These assertions match Unicode default word boundaries, as described in Unicode Technical Standard #18.

Status

Motivation

The existing word boundary assertion \b treats as word characters exactly those characters matched by the shorthand character class \w, which without the i flag is limited to [A-Za-z0-9_]. As a result, /na\b/u.test('naïve') returns true, because ï is not matched by \w. Yet in a Unicode-aware context, nobody would normally consider naïve or fiancée to contain two word boundaries in the middle of the word.
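The current behavior can be checked in any engine that supports the u flag. The following is a minimal sketch; the helper name `legacyBoundaries` is made up for illustration and is not part of the proposal. It lists every index at which the existing \b assertion matches.

```javascript
// "naïve" written with the precomposed ï (U+00EF) so the indices are unambiguous.
const word = 'na\u00EFve';

// Hypothetical helper: collect every index where the zero-width
// assertion \b matches in the given text.
function legacyBoundaries(text) {
  return Array.from(text.matchAll(/\b/gu), (match) => match.index);
}

console.log(/na\b/u.test(word));
// → true

console.log(legacyBoundaries(word));
// → [ 0, 2, 3, 5 ]
```

Indices 2 and 3 are the two unwanted boundaries on either side of ï.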

There is currently no way to assert Unicode-aware word boundaries natively in ECMAScript regular expressions. This makes it painful for developers to support full Unicode in their regular expressions.

Proposed solution

We propose adding Unicode word boundary assertions of the form \b{w} and \B{W}, available in regular expressions that have the u flag set. With this feature, the regular expression above could be written as:

/na\b{w}/u.test('naïve')
// → false
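Until such syntax exists, the intended result can be approximated outside the regular expression engine. The sketch below assumes a runtime that provides `Intl.Segmenter` with `granularity: 'word'`, which segments text according to Unicode word-breaking rules. The helper names `unicodeWordBoundaries` and `isUnicodeWordBoundary` are hypothetical, and locale tailoring may make the result differ from the default boundaries that \b{w} would use.

```javascript
// Hypothetical helper: list the indices of all Unicode word boundaries
// in `text`, as reported by Intl.Segmenter (assumed to be available).
function unicodeWordBoundaries(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  const boundaries = [];
  // Every segment starts at a boundary.
  for (const { index } of segmenter.segment(text)) {
    boundaries.push(index);
  }
  // The end of a non-empty text is also a boundary.
  if (text.length > 0) {
    boundaries.push(text.length);
  }
  return boundaries;
}

// Hypothetical helper: roughly what \b{w} would assert at position `index`.
function isUnicodeWordBoundary(text, index, locale = 'en') {
  return unicodeWordBoundaries(text, locale).includes(index);
}

// "naïve" with the precomposed ï (U+00EF): no boundary after "na".
console.log(isUnicodeWordBoundary('na\u00EFve', 2));
// → false
```

This only checks a position in a string; it cannot be combined with other pattern elements the way a native assertion could, which is the gap the proposal is meant to close.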
