This project is a Rust implementation of the grep
command-line tool, built as part of the Codecrafters "Build your own grep" challenge. This implementation provides a hands-on understanding of regular expression parsing, pattern matching algorithms, and how text search utilities work under the hood.
This grep implementation covers the fundamental pattern matching operations that grep performs:
- Character Class Matching: Support for
\d
(digits) and\w
(word characters) patterns - Character Groups: Both positive
[abc]
and negative[^abc]
character sets - Anchors: Start-of-line
^
and end-of-line$
matching - Quantifiers: One-or-more
+
and zero-or-one?
repetition operators - Wildcard Matching: Single character wildcard
.
support - Alternation: Support for
(pattern1|pattern2)
alternative matching - Literal Character Matching: Direct character matching for simple patterns
- Complex Pattern Sequences: Combining multiple pattern elements in sequence
- Modular Architecture: Clean separation between parsing (
parser.rs
) and matching (matcher.rs
) - Recursive Pattern Matching: Efficient pattern matching with proper backtracking
- Comprehensive Pattern Support: From simple literal matches to complex regex sequences with alternation
- Memory Safe: Leveraging Rust's ownership system for safe string processing
The grep implementation processes patterns through several key components:
- Converts string patterns into an enum-based AST representation
- Handles nested patterns like alternation groups
(a|b|c)
- Supports escape sequences and special characters
- Recursive parsing for complex pattern structures
match_here
function serves as the core matching engine- Handles different pattern types through pattern matching
- Implements backtracking for quantifiers and alternation
- Optimized character-by-character matching
Literal(char)
: Direct character matchingDigit
: Matches\d
patternsAlphanumeric
: Matches\w
patterns (including underscore)Group(bool, String)
: Character groups with positive/negative matchingWildcard
: Single character wildcard.
OneOrMore/ZeroOrOne
: Quantifier supportAlternation(Vec<Vec<Pattern>>)
: Alternative pattern matchingEndAnchor
: End-of-line$
matching
- Rust 1.80 or later (as specified in
Cargo.toml
)
-
Clone the repository:
git clone https://github.com/Md-Talim/codecrafters-grep-rust cd codecrafters-grep-rust
-
Build the project:
cargo build --release
General Usage:
./your_program.sh -E <pattern>
The program reads from stdin and exits with code 0 if the pattern matches, 1 if no match is found.
Pattern Examples:
Pattern Type | Example | Description | Matches |
---|---|---|---|
Character Classes | \d |
Any digit | 123 , a1b |
\w |
Any word character | hello , test_123 |
|
Character Groups | [abc] |
Any of a, b, or c | apple , bat |
[^xyz] |
Any char except x, y, z | apple , dog |
|
Anchors | ^hello |
Start of line | hello world |
world$ |
End of line | hello world |
|
Quantifiers | ca+t |
One or more 'a' | cat , caaat |
ca?t |
Zero or one 'a' | ct , cat |
|
Wildcard | c.t |
Any character between c and t | cat , cut , c1t |
Alternation | (cat|dog) |
Match either "cat" or "dog" | cat , dog , doggy |
-
Match digits:
echo "hello123" | ./your_program.sh -E "\d" # Exit code: 0 (match found)
-
Match alternation:
echo "I have a dog" | ./your_program.sh -E "(cat|dog)" # Exit code: 0 (dog found)
-
Match with quantifiers:
echo "caaat" | ./your_program.sh -E "ca+t" # Exit code: 0 (one or more 'a' found)
-
Anchor matching:
echo "hello world" | ./your_program.sh -E "^hello" # Exit code: 0 (line starts with "hello")
- Pattern Matching Algorithms: Understanding how regex engines parse and execute patterns
- Recursive Descent Parsing: Building an AST from pattern strings
- Backtracking: Implementing efficient backtracking for quantifiers and alternation
- Alternation Logic: Handling multiple alternative patterns within groups
- Ownership in String Processing: Managing string slices and character iterators safely
- Pattern Matching: Using Rust's powerful enum pattern matching for different regex patterns
- Iterator Patterns: Efficient character-by-character processing with Rust iterators
- Memory Safety: Leveraging Rust's ownership system for safe regex processing
- Alternation Parsing: Implementing proper parsing of
(pattern1|pattern2|pattern3)
syntax with nested groups - Backtracking Complexity: Ensuring efficient backtracking without infinite loops in quantifiers
- String Slice Management: Properly managing string slices and character indices in Rust
- Pattern Precedence: Handling operator precedence correctly in complex patterns
- The Codecrafters "Build your own grep" challenge provided the structure and test cases