globset: add POSIX character class support in bracket expressions #3281
Open
KevinKickass wants to merge 2 commits into BurntSushi:master from
Expand POSIX classes (e.g. [[:space:]], [[:digit:]]) into ASCII char ranges at parse time. Token::Class remains unchanged, so this has no impact on fuzz/arbitrary targets.

All 12 standard classes supported: alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, xdigit. ASCII-only definitions, no locale-dependent behavior.

Fixes BurntSushi#2962
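For readers unfamiliar with the ASCII definitions, the expansion can be pictured as a lookup from class name to inclusive ranges. The sketch below is not the code from this PR: `posix_class_ranges` and `in_posix_class` are hypothetical names, and the ranges are the conventional C/POSIX-locale definitions, which may differ in detail from the tables the patch actually uses.

```rust
/// Hypothetical helper (not taken from the PR): maps a POSIX class name to
/// inclusive ASCII ranges, using the conventional C/POSIX-locale definitions.
fn posix_class_ranges(name: &str) -> Option<&'static [(char, char)]> {
    let ranges: &'static [(char, char)] = match name {
        "alnum" => &[('0', '9'), ('A', 'Z'), ('a', 'z')],
        "alpha" => &[('A', 'Z'), ('a', 'z')],
        "blank" => &[('\t', '\t'), (' ', ' ')],
        "cntrl" => &[('\x00', '\x1f'), ('\x7f', '\x7f')],
        "digit" => &[('0', '9')],
        "graph" => &[('!', '~')],
        "lower" => &[('a', 'z')],
        "print" => &[(' ', '~')],
        "punct" => &[('!', '/'), (':', '@'), ('[', '`'), ('{', '~')],
        // Tab, newline, vertical tab, form feed, carriage return, then space.
        "space" => &[('\t', '\r'), (' ', ' ')],
        "upper" => &[('A', 'Z')],
        "xdigit" => &[('0', '9'), ('A', 'F'), ('a', 'f')],
        // Unknown names are reported to the caller rather than guessed at.
        _ => return None,
    };
    Some(ranges)
}

/// Hypothetical membership check built on the table above.
/// Unknown class names never match.
fn in_posix_class(name: &str, c: char) -> bool {
    posix_class_ranges(name)
        .map_or(false, |ranges| ranges.iter().any(|&(lo, hi)| lo <= c && c <= hi))
}
```

With this table, `in_posix_class("space", '\n')` holds while `in_posix_class("xdigit", 'g')` does not, and no locale is consulted at any point.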
Closes #2962
This adds support for POSIX character classes (`[[:space:]]`, `[[:digit:]]`, etc.) inside bracket expressions in globset, as defined in POSIX XBD §9.3.5. This is consistent with how git's wildmatch handles POSIX classes in glob patterns (see also git's t3070 tests).

Approach: POSIX classes are expanded into ASCII char ranges at parse time, so `Token::Class` remains unchanged. A `try_parse_posix_class()` method saves the parser state and restores it on failure, so invalid class names (e.g. `[[:bogus:]]`) fall through gracefully: the `[` is treated as a literal.

All 12 standard POSIX classes are supported (`alnum`, `alpha`, `blank`, `cntrl`, `digit`, `graph`, `lower`, `print`, `punct`, `space`, `upper`, `xdigit`), ASCII-only to avoid locale-dependent behavior.

POSIX classes compose naturally with existing syntax:

- `[[:digit:]a-f]`
- `[[:digit:][:alpha:]]`
- `[^[:digit:]]`

44 new tests cover parsing, matching, negation, mixed ranges, and real-world filename patterns. The full test suite passes (globset + ripgrep).
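To make the save/restore idea concrete, here is a minimal standalone sketch of a bracket-expression parser that backtracks when a class name is unknown. It is not the PR's implementation: the `Parser` struct, its `chars`/`pos` fields, `ranges_for`, and `parse_bracket` are assumptions made for illustration, the class table is abbreviated to three entries, and escapes and a leading literal `]` are deliberately not handled.

```rust
/// Abbreviated, illustrative class table (three of the twelve classes).
fn ranges_for(name: &str) -> Option<Vec<(char, char)>> {
    match name {
        "digit" => Some(vec![('0', '9')]),
        "alpha" => Some(vec![('A', 'Z'), ('a', 'z')]),
        "space" => Some(vec![('\t', '\r'), (' ', ' ')]),
        _ => None,
    }
}

/// Stand-in parser state; these fields are assumptions, not globset's.
struct Parser {
    chars: Vec<char>,
    pos: usize,
}

impl Parser {
    fn new(pattern: &str) -> Parser {
        Parser { chars: pattern.chars().collect(), pos: 0 }
    }

    fn peek(&self) -> Option<char> {
        self.chars.get(self.pos).copied()
    }

    /// Consumes `want` if it is the next character.
    fn eat(&mut self, want: char) -> Option<()> {
        if self.peek() == Some(want) {
            self.pos += 1;
            Some(())
        } else {
            None
        }
    }

    /// Tries to read `[:name:]` at the current position. On failure the
    /// saved position is restored, so the caller sees an untouched `[`.
    fn try_parse_posix_class(&mut self) -> Option<Vec<(char, char)>> {
        let saved = self.pos;
        let parsed = self.parse_posix_class();
        if parsed.is_none() {
            self.pos = saved;
        }
        parsed
    }

    fn parse_posix_class(&mut self) -> Option<Vec<(char, char)>> {
        self.eat('[')?;
        self.eat(':')?;
        let start = self.pos;
        while self.peek().map_or(false, |c| c.is_ascii_lowercase()) {
            self.pos += 1;
        }
        let name: String = self.chars[start..self.pos].iter().collect();
        self.eat(':')?;
        self.eat(']')?;
        ranges_for(&name)
    }

    /// Parses one bracket expression and returns `(negated, ranges)`.
    fn parse_bracket(&mut self) -> Option<(bool, Vec<(char, char)>)> {
        self.eat('[')?;
        let negated = self.eat('!').is_some() || self.eat('^').is_some();
        let mut ranges = Vec::new();
        loop {
            match self.peek()? {
                ']' => {
                    self.pos += 1;
                    return Some((negated, ranges));
                }
                '[' => {
                    if let Some(expanded) = self.try_parse_posix_class() {
                        ranges.extend(expanded);
                        continue;
                    }
                    // Not a valid class: keep the `[` as a literal.
                    self.pos += 1;
                    ranges.push(('[', '['));
                }
                lo => {
                    self.pos += 1;
                    // `lo-hi` is a range unless the `-` is directly before `]`.
                    let is_range = self.peek() == Some('-')
                        && self.chars.get(self.pos + 1).map_or(false, |&c| c != ']');
                    if is_range {
                        let hi = self.chars[self.pos + 1];
                        self.pos += 2;
                        ranges.push((lo, hi));
                    } else {
                        ranges.push((lo, lo));
                    }
                }
            }
        }
    }
}
```

In this sketch, `Parser::new("[[:digit:]a-f]").parse_bracket()` yields the two ranges `0-9` and `a-f` with no trace of the class name left. For `[[:bogus:]]`, the inner `[` and the characters after it are collected as literals until the first `]` closes the expression.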
Prior art: PR #3210 attempted this but was closed due to CI failures from adding a new `Token` variant. This PR avoids that entirely by expanding at parse time.
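The reason parse-time expansion needs no new variant can be illustrated with a class token that holds only a negation flag and a list of ranges. The `Class` struct below is an assumption made for this sketch and may not match the fields of globset's `Token::Class`. The two constructor functions are likewise hypothetical.

```rust
/// Illustrative stand-in for a class token: a negation flag plus inclusive
/// ranges. An assumption for this sketch, not globset's definition.
struct Class {
    negated: bool,
    ranges: Vec<(char, char)>,
}

impl Class {
    /// A character matches when it falls in some range, flipped if negated.
    fn matches(&self, c: char) -> bool {
        let hit = self.ranges.iter().any(|&(lo, hi)| lo <= c && c <= hi);
        hit != self.negated
    }
}

/// `[[:digit:]a-f]` after expansion: plain ranges, no record of the class name.
fn digit_or_a_to_f() -> Class {
    Class { negated: false, ranges: vec![('0', '9'), ('a', 'f')] }
}

/// `[^[:digit:]]` after expansion: the same shape with the flag set.
fn not_digit() -> Class {
    Class { negated: true, ranges: vec![('0', '9')] }
}
```

Because the expanded form is indistinguishable from a hand-written range list, code that consumes class tokens, such as matchers or fuzz targets, would not need to learn about POSIX classes under this design.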