Grep Implementation in Go

This project is a Go implementation of the grep command-line tool, built as part of the Codecrafters "Build your own grep" challenge. This implementation provides a hands-on understanding of regular expression parsing, pattern matching algorithms, and how text search utilities work under the hood.

📌 What This Project Does

This grep implementation covers the fundamental pattern matching operations that grep performs:

Character Class Matching: Support for \d (digits) and \w (word characters) patterns
Character Groups: Both positive [abc] and negative [^abc] character sets
Anchors: Start-of-line ^ and end-of-line $ matching
Quantifiers: One-or-more + and zero-or-one ? repetition operators
Wildcard Matching: Single character wildcard . support
Literal Character Matching: Direct character matching for simple patterns
Complex Pattern Sequences: Combining multiple pattern elements in sequence

Note: I haven't implemented alternation (|) yet, but it's coming next.

✨ Key Features

Modular Architecture: Clean separation of concerns with multiple matcher types:
- Character Class Matcher: Handles \d and \w patterns
- Character Group Matchers: Support for both positive and negative character sets
- Literal Matcher: Direct character matching
- Core Matcher: Complex pattern sequences with anchors and quantifiers
Chain of Responsibility Pattern: Matchers are tried in order until one can handle the pattern
Comprehensive Pattern Support: From simple literal matches to complex regex sequences
Error Handling: Proper error reporting for invalid patterns and edge cases
Performance Optimized: Efficient pattern matching with minimal backtracking

🛠️ Why I Built This Project

I wanted to understand how pattern matching works at a fundamental level. This project served as an excellent opportunity to:

Demystify Regular Expressions: Understand how regex engines parse and execute patterns
Learn Pattern Matching Algorithms: Implement recursive descent parsing and backtracking
Practice Go Development: Work with string manipulation, interfaces, and error handling
Explore Compiler Theory: Understand lexical analysis and pattern recognition
Build Foundation Knowledge: Prepare for implementing more complex regex features

🔍 How It Works Internally

The grep implementation processes patterns through several key components:

1. Pattern Matching Engine

RegexEngine serves as the main entry point, implementing a chain of responsibility pattern
Simple patterns are handled by specialized matchers for optimal performance
Complex patterns fall through to the CoreMatcher for full regex processing

2. Specialized Matchers

CharacterClassMatcher: Handles \d and \w character classes using predicate functions
NegativeCharacterGroupMatcher: Processes [^abc] patterns by checking exclusion
PositiveCharacterGroupMatcher: Handles [abc] patterns using efficient character set lookups
LiteralMatcher: Direct character matching for single-character patterns

3. Core Pattern Processing

Recursive descent parsing for complex pattern sequences
Anchor handling for start (^) and end ($) of line matching
Quantifier support with proper greedy matching behavior
Backtracking for trying alternative match positions

4. Utility Functions

utils.go provides character classification functions (isDigit, isAlphabetic, isAlphanumeric)
Character predicate system for flexible pattern matching
Efficient character group validation

⚙️ How to Set Up and Run

Prerequisites

Go 1.24 or later (as specified in go.mod)

Installation

Clone the repository:

git clone https://github.com/codecrafters-io/grep-starter-go.git
cd grep-starter-go

Build the project:
```
go build -o mygrep app/*.go
```

Running the Grep Implementation

General Usage:

./mygrep -E <pattern>

The program reads from stdin and exits with code 0 if the pattern matches, 1 if no match is found, or 2 for errors.

Pattern Examples:

Pattern Type	Example	Description	Matches
Character Classes	`\d`	Any digit	`123`, `a1b`
	`\w`	Any word character	`hello`, `test_123`
Character Groups	`[abc]`	Any of a, b, or c	`apple`, `bat`
	`[^xyz]`	Any char except x, y, z	`apple`, `dog`
Anchors	`^hello`	Start of line	`hello world`
	`world$`	End of line	`hello world`
Quantifiers	`ca+t`	One or more 'a'	`cat`, `caaat`
	`ca?t`	Zero or one 'a'	`ct`, `cat`
Wildcard	`c.t`	Any character between c and t	`cat`, `cut`, `c1t`

Example Usage

Match digits:

echo "hello123" | ./mygrep -E "\d"
# Exit code: 0 (match found)

Match character group:

echo "apple" | ./mygrep -E "[aeiou]"
# Exit code: 0 (vowel found)

Match with quantifiers:

echo "cat" | ./mygrep -E "ca+t"
# Exit code: 0 (one or more 'a' found)

Anchor matching:

echo "hello world" | ./mygrep -E "^hello"
# Exit code: 0 (line starts with "hello")

No match example:

echo "hello" | ./mygrep -E "\d"
# Exit code: 1 (no digits found)

💡 What I Learned

Regular Expression Theory

Pattern Matching Algorithms: Understanding how regex engines parse and execute patterns
Finite Automata: The theoretical foundation behind pattern matching state machines
Backtracking: How regex engines handle multiple possible matches and optimization strategies
Greedy vs Lazy Matching: The behavior of quantifiers and their performance implications
Anchoring: How start and end anchors affect pattern matching strategy

String Processing in Go

Byte vs Rune Handling: Understanding the difference between byte-level and Unicode-aware string processing
Efficient String Operations: Using byte slices for performance-critical pattern matching
Character Classification: Implementing efficient character set validation
Pattern Compilation: Converting string patterns into executable matching logic

Software Architecture Patterns

Chain of Responsibility: Implementing a clean pattern for trying multiple matching strategies
Strategy Pattern: Different matchers for different pattern types
Interface Design: Creating clean abstractions for pattern matching operations
Error Handling: Proper error propagation in complex pattern matching scenarios

Test-Driven Development (TDD) with Go

Comprehensive Test Coverage: Testing all pattern types and edge cases
Table-Driven Tests: Using Go's idiomatic approach for testing multiple scenarios
Integration Testing: Full end-to-end testing of the grep application
Error Case Testing: Ensuring robust handling of invalid patterns and edge cases

The TDD approach was particularly valuable when implementing the quantifiers and anchors, as it helped ensure that complex pattern sequences worked correctly across different text inputs and pattern combinations.

🎯 Challenges & Solutions

Backtracking Complexity: Implementing efficient backtracking without infinite loops in complex patterns
Character Encoding: Handling both ASCII and Unicode characters properly in pattern matching
Performance Optimization: Balancing feature completeness with matching speed
Pattern Parsing: Building a robust parser that handles nested patterns and escape sequences
Edge Case Handling: Ensuring correct behavior for empty strings, boundary conditions, and malformed patterns

🚀 Future Enhancements

The next major milestone is implementing alternation (|), which will involve:

Alternative Parsing: Implementing proper parsing of (pattern1|pattern2) syntax
Capture Groups: Supporting parenthesized subpatterns for grouping
Backtracking Optimization: Enhancing the backtracking algorithm for alternation
Advanced Quantifiers: Adding support for {n,m} quantifier syntax
Extended Character Classes: Supporting \s, \S, \b, and other common classes

Additional planned features include:

Case-Insensitive Matching: Adding -i flag support
Multi-line Mode: Supporting ^ and $ in multi-line text
Lookahead/Lookbehind: Advanced assertion operators
Unicode Support: Full Unicode character class support

🙏 Acknowledgments

The Codecrafters "Build your own grep" challenge provided the structure and test cases
This project implements regular expression matching as described in "The Practice of Programming" by Kernighan and Pike

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
.codecrafters		.codecrafters
app		app
.gitattributes		.gitattributes
README.md		README.md
codecrafters.yml		codecrafters.yml
go.mod		go.mod
go.sum		go.sum
mygrep.sh		mygrep.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Uh oh!

Repository files navigation

Grep Implementation in Go

📌 What This Project Does

✨ Key Features

🛠️ Why I Built This Project

🔍 How It Works Internally

1. Pattern Matching Engine

2. Specialized Matchers

3. Core Pattern Processing

4. Utility Functions

⚙️ How to Set Up and Run

Prerequisites

Installation

Running the Grep Implementation

Example Usage

💡 What I Learned

Regular Expression Theory

String Processing in Go

Software Architecture Patterns

Test-Driven Development (TDD) with Go

🎯 Challenges & Solutions

🚀 Future Enhancements

🙏 Acknowledgments

About

Uh oh!

Contributors 2

Uh oh!

Languages

Uh oh!

Uh oh!

Md-Talim/codecrafters-grep-go

Folders and files

Latest commit

History

Repository files navigation

Grep Implementation in Go

📌 What This Project Does

✨ Key Features

🛠️ Why I Built This Project

🔍 How It Works Internally

1. Pattern Matching Engine

2. Specialized Matchers

3. Core Pattern Processing

4. Utility Functions

⚙️ How to Set Up and Run

Prerequisites

Installation

Running the Grep Implementation

Example Usage

💡 What I Learned

Regular Expression Theory

String Processing in Go

Software Architecture Patterns

Test-Driven Development (TDD) with Go

🎯 Challenges & Solutions

🚀 Future Enhancements

🙏 Acknowledgments

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Contributors 2

Uh oh!

Languages