
Commit 1a36242

jsbattig and claude committed
chore: bump version to 7.1.0 and update documentation
Version Update:
- Bumped version from 7.0.1 to 7.1.0

README Changes:
- Updated installation command to use v7.1.0 tag

CHANGELOG Enhancements:
- Added comprehensive regex pattern matching documentation
- Documented new --rebuild-fts-index flag for index regeneration
- Added performance comparison data from Evolution codebase testing
- Documented FTS vs grep performance metrics showing CLI startup overhead
- Added bug fixes section covering snippet extraction improvements
- Documented test suite fixes (14 tests) for token-based regex

This release includes the completed Full-Text Search epic with Tantivy integration, providing token-based regex search as a grep replacement with a DFA-based engine for ReDoS immunity.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent df2fcc2 commit 1a36242


3 files changed: +65 additions, 3 deletions


CHANGELOG.md

Lines changed: 63 additions & 1 deletion
@@ -24,8 +24,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 **New CLI Flags**:
 - `cidx index --fts` - Build FTS index alongside semantic index
+- `cidx index --rebuild-fts-index` - Rebuild FTS index from existing semantic index
 - `cidx watch --fts` - Enable real-time FTS index updates
 - `cidx query --fts` - Use full-text search mode
+- `cidx query --fts --regex` - Token-based regex pattern matching (grep replacement)
 - `cidx query --fts --semantic` - Hybrid search (parallel execution)
 - `--case-sensitive` - Enable case-sensitive matching (FTS only)
 - `--case-insensitive` - Force case-insensitive matching (default)
@@ -61,12 +63,72 @@ pip install tantivy==0.25.0
 - Updated teach-ai templates with FTS syntax and examples
 - CLI help text includes all FTS options and examples
 
+#### Regex Pattern Matching (Grep Replacement)
+
+**Overview**: Token-based regex search providing 10-50x performance improvement over grep on indexed repositories (Python API mode).
+
+**Core Features**:
+- **Token-based matching**: Regex operates on individual tokens (words) after Tantivy tokenization
+- **DFA-based engine**: Inherently immune to ReDoS attacks with O(n) time complexity
+- **Pre-compilation optimization**: Regex patterns compiled once per query, not per result
+- **Unicode-aware**: Character-based column calculation (not byte offsets) for proper multi-byte support
+
+**Usage**:
+```bash
+# Simple token matching
+cidx query "def" --fts --regex
+
+# Wildcard within tokens
+cidx query "test_.*" --fts --regex
+
+# Language filtering
+cidx query "import" --fts --regex --language python
+
+# Case-insensitive
+cidx query "todo" --fts --regex # Default case-insensitive
+```
+
+**Limitations** (Token-Based):
+- ✅ Works: `def`, `login.*`, `test_.*`, `HTTP.*`
+- ❌ Doesn't work: `def\s+\w+`, `public.*class` (spans multiple tokens with whitespace)
+
+**Performance** (Evolution Codebase):
+- FTS Python API: 1-4ms per query (warm index)
+- FTS CLI: ~1080ms per query (includes startup overhead)
+- Grep: ~150ms average for comparison
+
+**Bug Fixes**:
+- Fixed regex snippet extraction showing query pattern instead of actual matched text
+- Fixed "Line 1, Col 1" bug - now reports correct absolute line/column positions
+- Fixed Unicode column calculation using character vs byte offsets
+- Added empty match validation with proper error messages for unsupported patterns
+
+### Fixed
+
+#### Critical Regex Snippet Extraction Bugs
+- **Match Text Display**: Regex queries now show actual matched text from source code, not the query pattern
+  - Before: `Match: parts.*` (showing query)
+  - After: `Match: parts` (showing actual match)
+- **Line/Column Positions**: Fixed always showing "Line 1, Col 1" - now reports correct absolute positions
+  - Implementation: Proper `re.search()` for regex matching instead of literal string search
+- **Unicode Support**: Column calculation now uses character offsets instead of byte offsets
+  - Handles multi-byte UTF-8 correctly (emoji, Japanese, French, etc.)
+- **Performance**: Regex pre-compilation moved outside result loop (7x improvement)
+
+#### Test Suite Fixes
+- Fixed 14 failing tests in fast-automation.sh
+- Updated empty match validation tests to expect ValueError for unsupported patterns
+- Fixed regex optimization tests with correct token-based patterns
+- Updated documentation tests to exclude FTS planning documents
+- Fixed CLI tests to match actual remote query behavior
+
 ### Changed
 
 - **CLI Help Text**: Enhanced `cidx query --help` with FTS examples and clear option descriptions
-- **Teach-AI Templates**: Updated `cidx_instructions.md` with FTS decision rules and examples
+- **Teach-AI Templates**: Updated `cidx_instructions.md` with FTS decision rules and regex examples
 - **README Structure**: Added "Full-Text Search (FTS)" section with usage guide and comparison table
 - **Version**: Bumped to 7.1.0 to reflect new major feature
+- **Plans**: Moved FTS epics to `plans/completed/` (fts-filtering and full-text-search)
 
 ### Technical Details

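The snippet-extraction fixes recorded in the CHANGELOG diff are: showing the actual matched text, reporting absolute line and column positions, counting columns in characters, pre-compiling the pattern once, and raising `ValueError` for unsupported patterns. They can be illustrated together with a minimal Python sketch. It is not cidx's implementation. `find_regex_matches` and `RegexMatch` are hypothetical names. The sketch runs `re.search()` over raw file text instead of querying Tantivy's token index. Treating "unsupported" as "matches the empty string" is also an assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class RegexMatch:
    """One hit, as a snippet display would report it (hypothetical type)."""
    line: int    # 1-based line number counted from the start of the file
    column: int  # 1-based column counted in characters, not UTF-8 bytes
    text: str    # the text that actually matched, not the query pattern


def find_regex_matches(
    files: dict[str, str], pattern: str, case_sensitive: bool = False
) -> dict[str, RegexMatch]:
    """Hypothetical helper: report the first match of `pattern` in each file."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # Compile once per query, outside the per-file loop.
    regex = re.compile(pattern, flags)
    # Assumption: "unsupported" means the pattern matches the empty string on
    # its own (e.g. "x*"), which would yield zero-length, meaningless snippets.
    if regex.fullmatch("") is not None:
        raise ValueError(f"pattern {pattern!r} matches empty text; not supported")

    results: dict[str, RegexMatch] = {}
    for path, content in files.items():
        m = regex.search(content)  # a real regex search, not a literal substring find
        if m is None:
            continue
        start = m.start()  # offset into a Python str, i.e. a character offset
        line_start = content.rfind("\n", 0, start) + 1  # 0 when on the first line
        results[path] = RegexMatch(
            line=content.count("\n", 0, start) + 1,
            column=start - line_start + 1,
            text=m.group(0),
        )
    return results


# "\u00e9" (é) is one character but two UTF-8 bytes.
files = {"a.py": "import os\ncaf\u00e9 = parts.x\n", "b.py": "nothing here\n"}
print(find_regex_matches(files, r"part\w*"))
# {'a.py': RegexMatch(line=2, column=8, text='parts')}
```

If columns were counted in UTF-8 bytes, the same hit would be reported at column 9. That off-by-one for every multi-byte character before the match is the kind of error the Unicode fix addresses.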
README.md

Lines changed: 1 addition & 1 deletion
@@ -191,7 +191,7 @@ The code-indexer uses a sophisticated dual-phase parallel processing architectur
 ### pipx (Recommended)
 ```bash
 # Install the package
-pipx install git+https://github.com/jsbattig/code-indexer.git@v7.0.1
+pipx install git+https://github.com/jsbattig/code-indexer.git@v7.1.0
 
 # Setup global registry (standalone command - requires sudo)
 cidx setup-global-registry

src/code_indexer/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -6,5 +6,5 @@
 through HNSW graph indexing (O(log N) complexity).
 """
 
-__version__ = "7.0.1"
+__version__ = "7.1.0"
 __author__ = "Seba Battig"
