Skip to content

Md-Talim/codecrafters-grep-rust

Repository files navigation

Grep Implementation in Rust

progress-banner

This project is a Rust implementation of the grep command-line tool, built as part of the Codecrafters "Build your own grep" challenge. This implementation provides a hands-on understanding of regular expression parsing, pattern matching algorithms, and how text search utilities work under the hood.

What This Project Does

This grep implementation covers the fundamental pattern matching operations that grep performs:

  • Character Class Matching: Support for \d (digits) and \w (word characters) patterns
  • Character Groups: Both positive [abc] and negative [^abc] character sets
  • Anchors: Start-of-line ^ and end-of-line $ matching
  • Quantifiers: One-or-more + and zero-or-one ? repetition operators
  • Wildcard Matching: Single character wildcard . support
  • Alternation: Support for (pattern1|pattern2) alternative matching
  • Literal Character Matching: Direct character matching for simple patterns
  • Complex Pattern Sequences: Combining multiple pattern elements in sequence

Key Features

  • Modular Architecture: Clean separation between parsing (parser.rs) and matching (matcher.rs)
  • Recursive Pattern Matching: Efficient pattern matching with proper backtracking
  • Comprehensive Pattern Support: From simple literal matches to complex regex sequences with alternation
  • Memory Safe: Leveraging Rust's ownership system for safe string processing

How It Works Internally

The grep implementation processes patterns through several key components:

1. Pattern Parsing (parser.rs)

  • Converts string patterns into an enum-based AST representation
  • Handles nested patterns like alternation groups (a|b|c)
  • Supports escape sequences and special characters
  • Recursive parsing for complex pattern structures

2. Pattern Matching (matcher.rs)

  • match_here function serves as the core matching engine
  • Handles different pattern types through pattern matching
  • Implements backtracking for quantifiers and alternation
  • Optimized character-by-character matching

3. Pattern Types

  • Literal(char): Direct character matching
  • Digit: Matches \d patterns
  • Alphanumeric: Matches \w patterns (including underscore)
  • Group(bool, String): Character groups with positive/negative matching
  • Wildcard: Single character wildcard .
  • OneOrMore/ZeroOrOne: Quantifier support
  • Alternation(Vec<Vec<Pattern>>): Alternative pattern matching
  • EndAnchor: End-of-line $ matching

How to Set Up and Run

Prerequisites

  • Rust 1.80 or later (as specified in Cargo.toml)

Installation

  1. Clone the repository:

    git clone https://github.com/Md-Talim/codecrafters-grep-rust
    cd codecrafters-grep-rust
  2. Build the project:

    cargo build --release

Running the Grep Implementation

General Usage:

./your_program.sh -E <pattern>

The program reads from stdin and exits with code 0 if the pattern matches, 1 if no match is found.

Pattern Examples:

Pattern Type Example Description Matches
Character Classes \d Any digit 123, a1b
\w Any word character hello, test_123
Character Groups [abc] Any of a, b, or c apple, bat
[^xyz] Any char except x, y, z apple, dog
Anchors ^hello Start of line hello world
world$ End of line hello world
Quantifiers ca+t One or more 'a' cat, caaat
ca?t Zero or one 'a' ct, cat
Wildcard c.t Any character between c and t cat, cut, c1t
Alternation (cat|dog) Match either "cat" or "dog" cat, dog, doggy

Example Usage

  1. Match digits:

    echo "hello123" | ./your_program.sh -E "\d"
    # Exit code: 0 (match found)
  2. Match alternation:

    echo "I have a dog" | ./your_program.sh -E "(cat|dog)"
    # Exit code: 0 (dog found)
  3. Match with quantifiers:

    echo "caaat" | ./your_program.sh -E "ca+t"
    # Exit code: 0 (one or more 'a' found)
  4. Anchor matching:

    echo "hello world" | ./your_program.sh -E "^hello"
    # Exit code: 0 (line starts with "hello")

What I Learned

Regular Expression Theory

  • Pattern Matching Algorithms: Understanding how regex engines parse and execute patterns
  • Recursive Descent Parsing: Building an AST from pattern strings
  • Backtracking: Implementing efficient backtracking for quantifiers and alternation
  • Alternation Logic: Handling multiple alternative patterns within groups

Rust-Specific Learning

  • Ownership in String Processing: Managing string slices and character iterators safely
  • Pattern Matching: Using Rust's powerful enum pattern matching for different regex patterns
  • Iterator Patterns: Efficient character-by-character processing with Rust iterators
  • Memory Safety: Leveraging Rust's ownership system for safe regex processing

Challenges & Solutions

  • Alternation Parsing: Implementing proper parsing of (pattern1|pattern2|pattern3) syntax with nested groups
  • Backtracking Complexity: Ensuring efficient backtracking without infinite loops in quantifiers
  • String Slice Management: Properly managing string slices and character indices in Rust
  • Pattern Precedence: Handling operator precedence correctly in complex patterns

Acknowledgments

About

Building grep from scratch to understand how regex engines work under the hood.

Topics

Resources

Stars

Watchers

Forks

Contributors 2

  •  
  •