feat(host): add trie-based host matching for scalability #122
base: main
Implement a reverse domain trie for efficient host pattern matching, designed to scale for MSSP deployments with hundreds/thousands of hosts.

Changes:
- Add domainTrie data structure with O(m) lookup complexity
- Hybrid approach: trie for simple patterns, filepath.Match fallback for complex
- Priority system ensures most-specific-first matching behavior
- Comprehensive tests and benchmarks

Benchmark results (4 mixed lookups per iteration):

| Hosts | Slice (old) | Trie (new) | Speedup |
|---|---|---|---|
| 10 | 4,901 ns | 432 ns | 11x faster |
| 100 | 53,221 ns | 419 ns | 127x faster |
| 1,000 | 414,463 ns | 428 ns | 968x faster |
| 10,000 | 3,835,689 ns | 453 ns | 8,468x faster |

Note: For small deployments (1-4 hosts), the existing cache provides sufficient performance. The trie optimization primarily benefits large-scale MSSP deployments.
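To make the O(m) claim concrete, here is a minimal sketch of a reverse domain trie. It is not the PR's `domainTrie`: the names `trieNode`, `insert`, and `lookup` are hypothetical, and it assumes only exact hosts and leading-wildcard patterns such as `*.example.com`, with an exact match beating a wildcard and a deeper wildcard beating a shallower one.

```go
package main

import (
	"fmt"
	"strings"
)

// trieNode holds one label of a reversed domain (hypothetical layout).
type trieNode struct {
	children map[string]*trieNode
	exact    string // value for an exact pattern ending at this node
	hasExact bool
	wildcard string // value for a "*.suffix" pattern rooted at this node
	hasWild  bool
}

func newNode() *trieNode { return &trieNode{children: map[string]*trieNode{}} }

// reversedLabels turns "a.example.com" into ["com", "example", "a"].
func reversedLabels(host string) []string {
	labels := strings.Split(strings.ToLower(host), ".")
	for i, j := 0, len(labels)-1; i < j; i, j = i+1, j-1 {
		labels[i], labels[j] = labels[j], labels[i]
	}
	return labels
}

// insert registers an exact host or a leading-wildcard pattern ("*.example.com").
func (root *trieNode) insert(pattern, value string) {
	wild := strings.HasPrefix(pattern, "*.")
	if wild {
		pattern = pattern[2:]
	}
	node := root
	for _, label := range reversedLabels(pattern) {
		next, ok := node.children[label]
		if !ok {
			next = newNode()
			node.children[label] = next
		}
		node = next
	}
	if wild {
		node.wildcard, node.hasWild = value, true
	} else {
		node.exact, node.hasExact = value, true
	}
}

// lookup visits at most one node per label, so its cost depends on the
// number of labels in the host, not on how many patterns are stored.
func (root *trieNode) lookup(host string) (string, bool) {
	labels := reversedLabels(host)
	node := root
	best, found := "", false
	for i, label := range labels {
		next, ok := node.children[label]
		if !ok {
			return best, found
		}
		node = next
		last := i == len(labels)-1
		if last && node.hasExact {
			return node.exact, true
		}
		if !last && node.hasWild {
			// A wildcard needs at least one more label to the left.
			best, found = node.wildcard, true
		}
	}
	return best, found
}

func main() {
	root := newNode()
	root.insert("*.example.com", "wildcard-host")
	root.insert("app.example.com", "exact-host")
	fmt.Println(root.lookup("app.example.com"))
	fmt.Println(root.lookup("other.example.com"))
}
```

Storing labels in reverse order means patterns that share a suffix (`example.com`) share a path from the root, which is why a wildcard on a suffix can be found during the same walk as an exact match.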
5ed07f5 to d274a85
Pull request overview
This PR implements a reverse domain trie for efficient host pattern matching, replacing the O(n) linear search (n being the number of configured hosts) with an O(m) trie-based lookup, where m is the domain depth, that is, the number of labels in the hostname. The optimization is designed to scale for large deployments with hundreds or thousands of host configurations.
- Introduces a hybrid matching system: trie for simple patterns (exact, prefix/suffix wildcards), filepath.Match fallback for complex patterns (middle/embedded wildcards)
- Implements a priority-based system to ensure most-specific-first matching regardless of insertion order
- Maintains backward compatibility with existing API and behavior
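The split between simple and complex patterns, and the most-specific-first ordering, could look roughly like the sketch below. Everything here is an assumption made for illustration: the names `classify`, `specificity`, and `firstMatch`, the exact definition of a "suffix wildcard" (`example.*`), and the scoring rule are not taken from the PR. The linear scan in `firstMatch` only demonstrates the ordering; it is not the trie.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

type patternKind int

const (
	kindExact          patternKind = iota // "app.example.com"
	kindPrefixWildcard                    // "*.example.com"
	kindSuffixWildcard                    // "example.*"
	kindComplex                           // anything else, e.g. "app-*.example.com"
)

// metaChars are the characters filepath.Match treats specially.
const metaChars = "*?[\\"

// classify decides whether a pattern could live in a trie (first three
// kinds) or needs the filepath.Match fallback (kindComplex).
func classify(pattern string) patternKind {
	switch {
	case !strings.ContainsAny(pattern, metaChars):
		return kindExact
	case strings.HasPrefix(pattern, "*.") && !strings.ContainsAny(pattern[2:], metaChars):
		return kindPrefixWildcard
	case strings.HasSuffix(pattern, ".*") && !strings.ContainsAny(pattern[:len(pattern)-2], metaChars):
		return kindSuffixWildcard
	default:
		return kindComplex
	}
}

// specificity scores a pattern; higher wins. Assumed rule: exact patterns
// outrank everything, then patterns with more literal characters.
func specificity(pattern string) int {
	literals := len(pattern) - strings.Count(pattern, "*") - strings.Count(pattern, "?")
	if classify(pattern) == kindExact {
		return 1000 + literals
	}
	return literals
}

// firstMatch returns the most specific matching pattern, independent of
// the order in which patterns were added.
func firstMatch(patterns []string, host string) (string, bool) {
	best, bestScore := "", -1
	for _, p := range patterns {
		if ok, err := filepath.Match(p, host); err != nil || !ok {
			continue // malformed patterns are treated as non-matching
		}
		if s := specificity(p); s > bestScore {
			best, bestScore = p, s
		}
	}
	return best, bestScore >= 0
}

func main() {
	patterns := []string{"*.example.com", "app*.example.com", "app.example.com"}
	fmt.Println(firstMatch(patterns, "app.example.com"))
	fmt.Println(firstMatch(patterns, "app1.example.com"))
}
```

Scoring by literal length is one simple way to make the result independent of insertion order; a real implementation may use a different tie-breaking rule.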
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/host/trie.go | New reverse domain trie implementation with priority-based matching, pattern classification, and efficient O(m) lookup |
| pkg/host/root.go | Integration of trie into Manager struct, updated MatchFirstHost to use trie, modified addHost/removeHost to manage trie and complexPatterns |
| pkg/host/root_test.go | Comprehensive integration tests covering single/multiple hosts, priority ordering, wildcards, caching, and removal |
| pkg/host/benchmark_test.go | Performance benchmarks comparing slice-based vs trie-based matching at various scales (10 to 10,000 hosts) |
| pkg/host/TRIE_IMPLEMENTATION.md | Technical documentation explaining the trie structure, matching algorithm, priority system, and pattern classification |
- Fix exactMatchFound logic in trie findMatches
- Clarify removeHost comments for complex patterns
- Fix race condition: use sync.Map for thread-safe cache access
- Add proper type assertion check for cache retrieval
Note for team: keeping this PR as a draft until it is needed.