Regular expression engine for batch processing.
Main features:
- store expressions in a trie-like data structure and match several expressions with a single comparison
- bring several expressions into a unified form
- custom expression parsing
Cliche compiles each expression into a chain of nodes and then adds this chain to the tree. Every node has its own key. If a node with the same key already exists when a new chain is added, no new node is created; the new expression is simply attached to the existing node. This keeps the tree as sparsely branched as possible, which is beneficial when scanning text.
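The merging rule described above can be sketched roughly as follows. This is only an illustration, not Cliche's actual internals: the `node` type and the `addChain` and `countNodes` helpers are hypothetical, and the keys are written by hand (the key `[98]` for `b` is a guess by analogy with `[97]`).

```go
package main

import "fmt"

// node is a hypothetical stand-in for a tree node: it has a key, child
// nodes indexed by key, and the expressions that end at this node.
type node struct {
	key         string
	nested      map[string]*node
	expressions []string
}

func newNode(key string) *node {
	return &node{key: key, nested: map[string]*node{}}
}

// addChain inserts a chain of node keys below root. A child is created only
// when no child with the same key exists; otherwise the existing one is reused.
func addChain(root *node, expression string, keys []string) {
	current := root
	for _, key := range keys {
		next, ok := current.nested[key]
		if !ok {
			next = newNode(key)
			current.nested[key] = next
		}
		current = next
	}
	// The expression is attached to the last node of its chain.
	current.expressions = append(current.expressions, expression)
}

// countNodes returns the number of nodes below root.
func countNodes(root *node) int {
	total := 0
	for _, child := range root.nested {
		total += 1 + countNodes(child)
	}
	return total
}

func main() {
	root := newNode("")

	// Assumed keys: in Cliche they would be produced by the parser.
	addChain(root, "a[0123-9]+", []string{"[97]", "[R16(48-57)]+"})
	addChain(root, "a[01-5[67-9]]{1,}", []string{"[97]", "[R16(48-57)]+"})
	addChain(root, "ab", []string{"[97]", "[98]"})

	fmt.Println("nodes:", countNodes(root)) // nodes: 3
	fmt.Println("shared:", root.nested["[97]"].nested["[R16(48-57)]+"].expressions)
}
```

Three expressions produce only three nodes, because all of them share the first node and two of them share the whole path.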
Cliche unifies expressions in several ways.
Character classes are stored as a range table. All of the expressions below are equivalent and share the same node in the tree:
- [a-z1-2]
- [1-2a-z]
- [12a-z]
- [1a-z2]
- [1-2[a-z]]
- [[1-2][a-z]]
- [12[a-z]]
- [12a[b-z]]
A single character is stored as a character class too. All of the expressions below are equivalent and share the same node in the tree:
- a
- [a]
- [aaaa]
- [a-a]
- [a-aa]
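One way to get this behaviour is to reduce every class to a sorted list of merged ranges and use that list as the key. The sketch below assumes the class has already been parsed into ranges; the `span` type and the `normalize` and `key` functions are hypothetical and are not taken from Cliche.

```go
package main

import (
	"fmt"
	"sort"
)

// span is an inclusive range of characters; a single character has lo == hi.
type span struct{ lo, hi rune }

// normalize sorts the spans and merges the ones that overlap or touch,
// so equivalent classes end up with identical range tables.
func normalize(spans []span) []span {
	sorted := append([]span(nil), spans...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].lo < sorted[j].lo })

	var merged []span
	for _, s := range sorted {
		last := len(merged) - 1
		if last >= 0 && s.lo <= merged[last].hi+1 {
			// Overlapping or adjacent: extend the previous span if needed.
			if s.hi > merged[last].hi {
				merged[last].hi = s.hi
			}
			continue
		}
		merged = append(merged, s)
	}
	return merged
}

// key builds a comparable string from the normalized range table.
func key(spans []span) string {
	return fmt.Sprint(normalize(spans))
}

func main() {
	first := []span{{'a', 'z'}, {'1', '2'}}                // [a-z1-2]
	second := []span{{'1', '1'}, {'2', '2'}, {'a', 'a'}, {'b', 'z'}} // [12a[b-z]]
	fmt.Println(key(first) == key(second)) // true

	single := []span{{'a', 'a'}}                                  // a
	repeated := []span{{'a', 'a'}, {'a', 'a'}, {'a', 'a'}, {'a', 'a'}} // [aaaa]
	fmt.Println(key(single) == key(repeated)) // true
}
```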
Quantifiers are unified too:
- x+ equals x{1,}
- x* equals x{0,}
- x? equals x{0,1} and x{,1}
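The equivalences above amount to mapping every quantifier to a pair of bounds. A minimal sketch, assuming a hypothetical `bounds` type where a negative upper bound means "no limit" (the names and the parsing rules here are illustrative, not Cliche's API):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// bounds is a hypothetical canonical form of a quantifier.
// hi < 0 means there is no upper limit.
type bounds struct{ lo, hi int }

// parseOr parses a non-negative number or returns fallback for an empty string.
func parseOr(s string, fallback int) (int, error) {
	if s == "" {
		return fallback, nil
	}
	n, err := strconv.Atoi(s)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid bound %q", s)
	}
	return n, nil
}

// canonical converts +, *, ?, {n}, {n,}, {,m} and {n,m} to bounds.
func canonical(q string) (bounds, error) {
	switch q {
	case "+":
		return bounds{1, -1}, nil
	case "*":
		return bounds{0, -1}, nil
	case "?":
		return bounds{0, 1}, nil
	}
	if len(q) < 3 || q[0] != '{' || q[len(q)-1] != '}' {
		return bounds{}, fmt.Errorf("unsupported quantifier %q", q)
	}
	left, right, ranged := strings.Cut(q[1:len(q)-1], ",")
	lo, err := parseOr(left, 0)
	if err != nil {
		return bounds{}, err
	}
	if !ranged {
		return bounds{lo, lo}, nil // {n} means exactly n times
	}
	hi, err := parseOr(right, -1)
	if err != nil {
		return bounds{}, err
	}
	return bounds{lo, hi}, nil
}

func main() {
	pairs := [][2]string{{"+", "{1,}"}, {"*", "{0,}"}, {"?", "{0,1}"}, {"?", "{,1}"}}
	for _, pair := range pairs {
		a, _ := canonical(pair[0])
		b, _ := canonical(pair[1])
		fmt.Println(pair[0], pair[1], a == b) // each line ends with true
	}
}
```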
Comments are removed in simple cases.
For example, x is equivalent to (?#123)x and is stored the same way.
Group options are unified too. All of the expressions below are equivalent and have the same topology in the tree:
- (?i:y) equals (?i)(?:y)(?-i)
- (?i-m:test) equals (?i-m)(test)(?m-i)
Non-unique variants within an alternation are removed from it. All of the expressions below are equivalent and share the same node in the tree:
- (a|b|c)
- (a|b|c|c)
- (a|b|b|c)
- (a|a|b|c)
- (a|[a]|b|c)
- (a|[a]|[b]|[c])
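Once every variant has been reduced to its unified key, removing duplicates is a matter of keeping the first occurrence of each key. The sketch below assumes that variants are compared by key; `uniqueVariants` is a hypothetical helper, and the keys `[98]` and `[99]` are guesses by analogy with `[97]`.

```go
package main

import "fmt"

// uniqueVariants keeps the first occurrence of every key and preserves order.
func uniqueVariants(keys []string) []string {
	seen := make(map[string]struct{}, len(keys))
	result := make([]string, 0, len(keys))
	for _, key := range keys {
		if _, ok := seen[key]; ok {
			continue // this variant is already part of the alternation
		}
		seen[key] = struct{}{}
		result = append(result, key)
	}
	return result
}

func main() {
	// (a|[a]|b|c): both a and [a] are assumed to unify to the key [97].
	variants := []string{"[97]", "[97]", "[98]", "[99]"}
	fmt.Println(uniqueVariants(variants)) // [[97] [98] [99]]
}
```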
You can see more examples here.
The result of unification is that one path or branch is reused by more than one expression, so the scanner can match multiple expressions at once. Unification does not change the behaviour of the tree or the scanner results.
    go get github.com/okneniz/cliche

    package main

    import (
        "fmt"

        "github.com/okneniz/cliche"
    )

    func main() {
        tree := cliche.New(cliche.DefaultParser)

        tree.Add(
            "a[0123-9]+",
            "a[01-5[67-9]]{1,}",
        )

        fmt.Println("tree:")
        fmt.Println(tree.String())

        text := "Text with a1, b, c32."
        fmt.Println("scan text:", text)

        for _, match := range tree.Match(text) {
            fmt.Printf("text: %s\n", match.SubString())
            fmt.Printf("bounds: %v\n", match.Span())
            fmt.Println("regexps:")

            for _, regexp := range match.Expressions() {
                fmt.Printf("\t%v\n", regexp)
            }
        }
    }

Output:
    tree:
    [
        {
            "key": "[97]",
            "type": "*node.class",
            "nested": [
                {
                    "key": "[R16(48-57)]+",
                    "type": "*node.quantifier",
                    "expressions": [
                        "a[0123-9]+",
                        "a[01-5[67-9]]{1,}"
                    ],
                    "value": {
                        "key": "[R16(48-57)]",
                        "type": "*node.class"
                    }
                }
            ]
        }
    ]
    scan text: Text with a1, b, c32.
    text: a1
    bounds: [10-11]
    regexps:
        a[01-5[67-9]]{1,}
        a[0123-9]+
Cliche has default capabilities common to most regular expression engines. You can configure your own or copy the behaviour of an existing engine.
- | alternation
- (...) parentheses: () group parts of a regular expression, allowing you to apply quantifiers or other operations to the group as a whole
- [...] character class
- \ escape (enable or disable meta character)
- postfix expressions as quantifiers
See the open issues for a list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- If you have suggestions for adding or removing projects, feel free to open an issue to discuss it, or directly create a pull request after you edit the README.md file with necessary changes.
- Please make sure you check your spelling and grammar.
- Create an individual PR for each suggestion.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request