LaurenceJJones commented Nov 26, 2025

Implement a reverse domain trie for efficient host pattern matching, designed to scale for large deployments with hundreds or thousands of hosts.

Changes:

  • Add domainTrie data structure with O(m) lookup complexity where m is the number of domain segments
  • Hybrid approach: trie for simple patterns, filepath.Match fallback for complex patterns
  • Priority system ensures most-specific-first matching behavior
  • Comprehensive tests and benchmarks
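
As a rough illustration of the idea (not the code in this PR), a reverse domain trie stores host labels from the TLD inward, so a lookup visits at most one node per label. The sketch below assumes only exact hosts and leading `*.` wildcards, and assumes that "most specific" means an exact match beats a wildcard and a deeper wildcard beats a shallower one. The names `trieNode`, `insert`, and `lookup` are hypothetical and are not taken from `pkg/host/trie.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// trieNode is one label of a reversed domain, e.g. "com" -> "example" -> "api".
// All names here are hypothetical; they do not mirror the PR's domainTrie.
type trieNode struct {
	children map[string]*trieNode
	exact    string // pattern stored for an exact host ending here ("" if none)
	wildcard string // pattern stored for a "*.suffix" rule ending here ("" if none)
}

func newNode() *trieNode { return &trieNode{children: map[string]*trieNode{}} }

// reversedLabels splits "api.example.com" into ["com", "example", "api"].
func reversedLabels(host string) []string {
	labels := strings.Split(strings.ToLower(host), ".")
	for i, j := 0, len(labels)-1; i < j; i, j = i+1, j-1 {
		labels[i], labels[j] = labels[j], labels[i]
	}
	return labels
}

// insert registers an exact host or a leading-wildcard pattern such as "*.example.com".
func (n *trieNode) insert(pattern string) {
	isWildcard := strings.HasPrefix(pattern, "*.")
	for _, label := range reversedLabels(strings.TrimPrefix(pattern, "*.")) {
		child, ok := n.children[label]
		if !ok {
			child = newNode()
			n.children[label] = child
		}
		n = child
	}
	if isWildcard {
		n.wildcard = pattern
	} else {
		n.exact = pattern
	}
}

// lookup walks one node per label. An exact match on the final label wins;
// otherwise the deepest wildcard passed on the way down is returned.
func (n *trieNode) lookup(host string) (string, bool) {
	labels := reversedLabels(host)
	best := ""
	for i, label := range labels {
		child, ok := n.children[label]
		if !ok {
			break
		}
		n = child
		if i == len(labels)-1 {
			if n.exact != "" {
				return n.exact, true
			}
		} else if n.wildcard != "" {
			// A wildcard only applies when at least one more label follows.
			best = n.wildcard
		}
	}
	return best, best != ""
}

func main() {
	root := newNode()
	root.insert("api.example.com")
	root.insert("*.example.com")
	for _, h := range []string{"api.example.com", "www.example.com", "example.org"} {
		pattern, ok := root.lookup(h)
		fmt.Println(h, pattern, ok)
	}
}
```

The cost of `lookup` depends on the number of labels in the queried host rather than on how many patterns are registered, which is consistent with the flat trie timings reported in the benchmark table.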

Benchmark results (4 mixed lookups per iteration):

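| Hosts  | Slice (old)  | Trie (new) | Speedup       |
|--------|--------------|------------|---------------|
| 10     | 4,901 ns     | 432 ns     | 11x faster    |
| 100    | 53,221 ns    | 419 ns     | 127x faster   |
| 1,000  | 414,463 ns   | 428 ns     | 968x faster   |
| 10,000 | 3,835,689 ns | 453 ns     | 8,468x faster |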

Note: For small deployments (1-4 hosts), the existing cache provides sufficient performance. The trie optimization primarily benefits large-scale deployments.

Note for team: keeping this as a draft until it is needed.

Implement a reverse domain trie for efficient host pattern matching,
designed to scale for MSSP deployments with hundreds/thousands of hosts.

Changes:
- Add domainTrie data structure with O(m) lookup complexity
- Hybrid approach: trie for simple patterns, filepath.Match fallback for complex patterns
- Priority system ensures most-specific-first matching behavior
- Comprehensive tests and benchmarks

Benchmark results (4 mixed lookups per iteration):
| Hosts   | Slice (old)  | Trie (new) | Speedup       |
|---------|--------------|------------|---------------|
| 10      | 4,901 ns     | 432 ns     | 11x faster    |
| 100     | 53,221 ns    | 419 ns     | 127x faster   |
| 1,000   | 414,463 ns   | 428 ns     | 968x faster   |
| 10,000  | 3,835,689 ns | 453 ns     | 8,468x faster |

Note: For small deployments (1-4 hosts), the existing cache provides
sufficient performance. The trie optimization primarily benefits
large-scale MSSP deployments.

Copilot AI left a comment

Pull request overview

This PR implements a reverse domain trie for efficient host pattern matching, replacing the O(n) linear search with O(m) trie-based lookup where m is the domain depth. The optimization is designed to scale for large deployments with hundreds or thousands of host configurations.

  • Introduces a hybrid matching system: trie for simple patterns (exact, prefix/suffix wildcards), filepath.Match fallback for complex patterns (middle/embedded wildcards)
  • Implements a priority-based system to ensure most-specific-first matching regardless of insertion order
  • Maintains backward compatibility with existing API and behavior
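
One possible way to split patterns between the two paths described above is sketched here. It is an assumption about how classification might work, not the PR's actual rule: the hypothetical `isSimplePattern` treats only exact hosts and a single leading `*.` as trie-friendly, and everything else goes to a linear `filepath.Match` scan in the hypothetical `matchComplex`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isSimplePattern reports whether a pattern could be stored in a trie under
// this sketch's assumption: no glob metacharacters at all, or exactly one
// leading "*." followed by a literal suffix.
func isSimplePattern(pattern string) bool {
	rest := strings.TrimPrefix(pattern, "*.")
	return !strings.ContainsAny(rest, "*?[\\")
}

// matchComplex is the fallback path: it tries each complex pattern in order
// with filepath.Match and returns the first one that matches the host.
func matchComplex(patterns []string, host string) (string, bool) {
	for _, p := range patterns {
		// filepath.Match only returns an error for a malformed pattern;
		// such patterns are skipped here.
		if ok, err := filepath.Match(p, host); err == nil && ok {
			return p, true
		}
	}
	return "", false
}

func main() {
	var simple, complexPatterns []string
	for _, p := range []string{"example.com", "*.example.com", "api-*.example.com"} {
		if isSimplePattern(p) {
			simple = append(simple, p)
		} else {
			complexPatterns = append(complexPatterns, p)
		}
	}
	fmt.Println("trie candidates:", simple)
	fmt.Println("fallback patterns:", complexPatterns)

	pattern, ok := matchComplex(complexPatterns, "api-v2.example.com")
	fmt.Println(pattern, ok)
}
```

Keeping the fallback list separate means its linear cost is only paid for patterns that cannot be expressed as a path in the trie.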

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
|------|-------------|
| pkg/host/trie.go | New reverse domain trie implementation with priority-based matching, pattern classification, and efficient O(m) lookup |
| pkg/host/root.go | Integration of trie into Manager struct, updated MatchFirstHost to use trie, modified addHost/removeHost to manage trie and complexPatterns |
| pkg/host/root_test.go | Comprehensive integration tests covering single/multiple hosts, priority ordering, wildcards, caching, and removal |
| pkg/host/benchmark_test.go | Performance benchmarks comparing slice-based vs trie-based matching at various scales (10 to 10,000 hosts) |
| pkg/host/TRIE_IMPLEMENTATION.md | Technical documentation explaining the trie structure, matching algorithm, priority system, and pattern classification |

- Fix exactMatchFound logic in trie findMatches
- Clarify removeHost comments for complex patterns
- Fix race condition: use sync.Map for thread-safe cache access
- Add proper type assertion check for cache retrieval
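
The last two fixes can be pictured with a small sketch. This is a guess at the general shape, not the code in this PR: `Host` and `hostCache` are hypothetical stand-ins, and dropping an entry of an unexpected type is one possible policy among several. `sync.Map` allows concurrent `Load` and `Store` calls without an explicit mutex, and the comma-ok type assertion avoids a panic if a value of the wrong type is ever stored.

```go
package main

import (
	"fmt"
	"sync"
)

// Host is a hypothetical stand-in for whatever the manager caches per hostname.
type Host struct{ Pattern string }

// hostCache wraps sync.Map so callers never handle untyped values directly.
// The zero value is ready to use.
type hostCache struct {
	entries sync.Map // hostname (string) -> *Host
}

// put stores the resolved host for a hostname.
func (c *hostCache) put(name string, h *Host) { c.entries.Store(name, h) }

// get returns the cached host, using a checked type assertion so that a value
// of an unexpected type is treated as a cache miss instead of causing a panic.
func (c *hostCache) get(name string) (*Host, bool) {
	v, ok := c.entries.Load(name)
	if !ok {
		return nil, false
	}
	h, ok := v.(*Host)
	if !ok {
		// Assumed policy for this sketch: evict the bad entry and report a miss.
		c.entries.Delete(name)
		return nil, false
	}
	return h, true
}

func main() {
	var cache hostCache
	var wg sync.WaitGroup
	// Concurrent writers and readers are safe without extra locking.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			name := fmt.Sprintf("host%d.example.com", i)
			cache.put(name, &Host{Pattern: "*.example.com"})
			cache.get(name)
		}(i)
	}
	wg.Wait()

	h, ok := cache.get("host3.example.com")
	fmt.Println(h.Pattern, ok)
}
```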