ECMAScript proposal: Unicode word boundary assertions

A proposal to add the Unicode word boundary assertion \b{w} to ECMAScript regular expressions that have the u flag set. The assertion matches at Unicode default word boundaries, as described in Unicode Technical Standard #18.

Status

Motivation

The word boundary assertions \b and \B treat as word characters exactly the characters matched by the shorthand character class \w, which is essentially limited to ASCII letters, digits, and the underscore. As a result, /na\b/u.test('naïve') returns true, because ï is not counted as a word character. Yet in a Unicode-aware context nobody would normally say that there are two word boundaries in the middle of naïve or fiancée.
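To make the current behavior concrete, the following sketch lists every offset at which the existing \b assertion matches. The helper name boundaryIndices is hypothetical and only serves the illustration; the ï is written as the escape \u00EF so that the precomposed character is unambiguous.

```javascript
// Hypothetical helper: collect every offset (in UTF-16 code units)
// at which the existing ASCII-based \b assertion matches.
function boundaryIndices(input) {
  const indices = [];
  // matchAll steps past zero-length matches, so each boundary is reported once.
  for (const match of input.matchAll(/\b/gu)) {
    indices.push(match.index);
  }
  return indices;
}

console.log(boundaryIndices('na\u00EFve'));
// → [0, 2, 3, 5]: offsets 2 and 3 are the two boundaries around ï

console.log(boundaryIndices('fianc\u00E9e'));
// → [0, 5, 6, 7]: offsets 5 and 6 are the two boundaries around é
```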

There is currently no way to assert Unicode-aware word boundaries natively in ECMAScript regular expressions. This makes it painful for developers to support full Unicode in their regular expressions.

Proposed solution

We propose adding Unicode word boundary assertions of the form \b{w} and \B{W}, available in regular expressions that have the u flag set. With this feature, the regular expression above could be written as:

/na\b{w}/u.test('naïve')
// → false
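Because \b{w} is only proposed syntax, the example above cannot be run in engines that do not implement it. As a rough approximation, the sketch below uses Intl.Segmenter with word granularity, which follows the Unicode text segmentation rules (possibly tailored per locale), to emulate the check. The function names unicodeWordBoundaries and matchesNaAtWordBoundary, and the default locale 'en', are assumptions made for this illustration and are not part of the proposal.

```javascript
// Hypothetical helper: the set of Unicode word boundary offsets
// (in UTF-16 code units) for `input`, derived from Intl.Segmenter.
function unicodeWordBoundaries(input, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  const offsets = new Set();
  for (const { index } of segmenter.segment(input)) {
    offsets.add(index); // every segment starts at a word boundary
  }
  if (input.length > 0) {
    offsets.add(input.length); // the end of non-empty text is a boundary too
  }
  return offsets;
}

// Hypothetical stand-in for /na\b{w}/u.test(input): is there an
// occurrence of "na" that ends at a Unicode word boundary?
function matchesNaAtWordBoundary(input) {
  const boundaries = unicodeWordBoundaries(input);
  for (const match of input.matchAll(/na/gu)) {
    if (boundaries.has(match.index + match[0].length)) {
      return true;
    }
  }
  return false;
}

console.log(matchesNaAtWordBoundary('na\u00EFve'));   // → false
console.log(matchesNaAtWordBoundary('banana split')); // → true
```

A native assertion would avoid the second pass over the string and would compose with the rest of the pattern, which this emulation cannot do.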
