Conversation

@LaurenceJJones
Member

Implement memory-efficient hybrid storage that separates individual IPs from CIDR ranges, significantly reducing memory usage for typical workloads.

Changes

  • Add IPMap using sync.Map for individual IP addresses (O(1) lookups)
  • Keep BART (RangeSet) only for CIDR ranges requiring LPM
  • Parallelize Add/Remove operations using sync.WaitGroup.Go() (Go 1.24+)
  • CheckIP() checks IPMap first, falls back to RangeSet for range matching

Memory Optimization

Most real-world deployments receive individual IP decisions, not ranges. BART's radix trie is optimized for prefix matching and carries significant per-entry overhead, so storing individual IPs in a plain map saves memory:

  • 1K IPs: BART ~598KB → Map significantly less
  • 10K IPs: BART ~5.6MB → Map significantly less
  • 50K IPs: BART ~26.5MB → Map significantly less

Performance

  • IPMap lookup: ~72 ns/op, 0 allocs
  • RangeSet lookup: ~69 ns/op, 0 allocs
  • Parallel batch processing for Add/Remove operations
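
The parallel batch processing mentioned above can be sketched as below. This is an illustrative stand-in, not the PR's code: `applyBatch` and the string ops are hypothetical, and the classic `Add`/`Done` form is used in place of `sync.WaitGroup.Go` so the sketch runs on older Go versions too.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// applyBatch runs each batch (e.g. IP, range, and country operations)
// in its own goroutine and waits for all of them to finish.
func applyBatch(batches ...[]string) int {
	var wg sync.WaitGroup
	var applied atomic.Int64
	for _, batch := range batches {
		wg.Add(1)
		go func(ops []string) {
			defer wg.Done()
			for range ops { // apply to IPMap / RangeSet / CNSet here
				applied.Add(1)
			}
		}(batch)
	}
	wg.Wait() // all batches complete before the update is considered done
	return int(applied.Load())
}

func main() {
	n := applyBatch(
		[]string{"add 192.0.2.1"},
		[]string{"add 10.0.0.0/8"},
		[]string{"add FR"},
	)
	fmt.Println(n, "operations applied") // prints "3 operations applied"
}
```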

Breaking Changes

None. The API is unchanged; this is an internal storage optimization only.

@LaurenceJJones LaurenceJJones added this to the 0.2.1 milestone Nov 27, 2025
Copilot finished reviewing on behalf of LaurenceJJones November 27, 2025 09:48
Contributor

Copilot AI left a comment

Pull request overview

This PR implements a memory-efficient hybrid IP storage system that separates individual IP addresses from CIDR ranges, using sync.Map for individual IPs and BART radix trie for ranges requiring longest prefix matching (LPM). The implementation includes parallel batch processing using sync.WaitGroup.Go() from Go 1.23+.

Key changes:

  • Introduced IPMap struct using sync.Map for O(1) individual IP lookups, significantly reducing memory overhead compared to storing IPs in BART
  • Modified DataSet to use both IPMap (for individual IPs) and RangeSet (for CIDR ranges), with CheckIP() checking IPMap first before falling back to RangeSet
  • Parallelized Add() and Remove() operations using sync.WaitGroup.Go() for concurrent batch processing across IP/range/country operations

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

Reviewed files:

  • pkg/dataset/ipmap.go: New file implementing IPMap with sync.Map for memory-efficient individual IP storage with concurrent access support
  • pkg/dataset/root.go: Updated DataSet to use hybrid storage (IPMap + RangeSet), parallelized batch operations, and modified CheckIP to check IPMap before RangeSet
  • pkg/dataset/root_test.go: Updated tests to verify both IPMap and RangeSet initialization in DataSet
  • pkg/dataset/benchmark_test.go: Added benchmarks comparing hybrid vs BART-only storage and lookup performance tests for the hybrid approach


Address Copilot review comments:

1. Clone RemediationIdsMap before modifying in add() and remove()
   - Prevents race between Contains() readers and Add/Remove modifiers
   - sync.Map's Load→Modify→Store pattern is unsafe with mutable values

2. Fix version comment: sync.WaitGroup.Go was added in Go 1.23, not 1.24

3. Remove trailing whitespace in benchmark_test.go

More descriptive name since it now only handles CIDR ranges,
while individual IPs are stored in IPMap.

* perf: lock-free reads for IPMap and CNSet

Use atomic pointers for completely lock-free reads across all data structures.
SPOA handlers never block, even during batch updates.

Changes:
1. IPMap: atomic.Pointer per entry for individual IPs
2. CNSet: atomic.Pointer for entire country map (small, cloning is cheap)
   - Added NewCNSet() constructor for consistency
   - Added defensive nil checks in Add/Remove/Contains
3. BartRangeSet: already uses atomic pointer (unchanged)

Concurrency model (consistent across all):
- Reads: atomic pointer load (instant, never blocks)
- Writes: clone → modify → atomic store (copy-on-write)
- Readers always see consistent state (old or new, never partial)

This ensures SPOA always has immediate access to decision data,
regardless of ongoing batch updates from the stream bouncer.
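
The clone → modify → atomic store model above can be sketched with Go's generic `atomic.Pointer`. This is a minimal illustration using the country set as the example (the PR applies the same pattern per-entry in IPMap); the type and method names mirror the commit text, but the layout is assumed, not the PR's actual code.

```go
package main

import (
	"fmt"
	"maps"
	"sync/atomic"
)

// CNSet holds an atomic pointer to an immutable country map.
type CNSet struct {
	m atomic.Pointer[map[string]string] // country code -> remediation
}

func NewCNSet() *CNSet {
	s := &CNSet{}
	empty := map[string]string{}
	s.m.Store(&empty)
	return s
}

// Contains is a lock-free read: one atomic pointer load, no mutex.
func (s *CNSet) Contains(cc string) (string, bool) {
	r, ok := (*s.m.Load())[cc]
	return r, ok
}

// Add follows clone -> modify -> atomic store (copy-on-write). Readers see
// either the old map or the new one, never a partially updated map.
// Safe only under the single-writer guarantee described above.
func (s *CNSet) Add(cc, remediation string) {
	next := maps.Clone(*s.m.Load())
	next[cc] = remediation
	s.m.Store(&next)
}

func main() {
	s := NewCNSet()
	s.Add("FR", "ban")
	r, _ := s.Contains("FR")
	fmt.Println(r) // prints "ban"
}
```

Cloning the whole map on every write is cheap here precisely because the country map is small; IPMap instead uses one atomic pointer per entry to avoid cloning a large map.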

* refactor(ipmap): simplify addLocked - remove unreachable LoadOrStore fallback

Since writeMu is held for the entire batch, no other writer can race.
The LoadOrStore fallback code was unreachable - simplified to use Store directly.

* Simplify RemediationMap: remove ID tracking, optimize cloning

- Changed from map[Remediation][]RemediationDetails to map[Remediation]string
- Removed ID tracking (LAPI only sends longest decisions)
- Simplified Add/Remove methods (no ID parameters)
- Optimized IPMap cloning: skip clone for empty maps and single-entry removals
- Updated all callers to use new simplified API
- Reduced memory allocations and GC pressure

* Remove writeMu mutex from IPMap - single writer guarantee

- Removed writeMu mutex (unnecessary with single writer thread)
- Renamed addLocked/removeLocked to add/remove
- Updated comments to reflect single-writer, multiple-reader model
- Stream handler processes decisions sequentially, no concurrent writes
- Atomic pointer operations handle reader-writer synchronization safely

* Address Copilot review comments: remove unused ID parameters

- Remove unused 'id' parameter from CNSet.Add and CNSet.Remove methods
- Update addCN/removeCN helper functions to match new signatures
- Update outdated comments: 'merging IDs' -> 'merging remediations'
- Optimize CNSet.Add preallocation: remove +1 to avoid overallocation
- Fix map initialization syntax in CNSet.Add

* Remove unused ID fields from operation structs

- Remove ID field from IPAddOp, IPRemoveOp, BartAddOp, BartRemoveOp
- Remove id field from internal cnOp struct
- Update all operation struct literals to remove ID assignments
- Update benchmark tests to match new struct definitions
- IDs are no longer tracked since LAPI behavior ensures only longest decisions

* Add comprehensive metrics tests and optimize no-op handling

- Add comprehensive test suite for metrics tracking (IPMap, BartRangeSet, CNSet)
- Fix BartRangeSet no-op check to use exact prefix lookup (Get) instead of LPM
- Add HasRemediation and GetOriginForRemediation methods to BartRangeSet for exact prefix matching
- Optimize ipType assignment to only occur after no-op checks
- Remove pre-allocation of operation slices to avoid wasting memory when many decisions are no-ops
- Replace hardcoded metric increments with len(decisions) for better readability

* Fix outdated comment: replace 'ID' with 'remediation'

- Update comment in bart_types.go to reflect that IDs are no longer tracked
- Addresses Copilot review feedback from PR #129

* Refactor Remove() to return error for cleaner duplicate delete handling

- Add ErrRemediationNotFound sentinel error
- Update RemediationMap.Remove() to return error instead of silently ignoring
- Update all call sites to use errors.Is() for cleaner error checking
- Simplify RemoveBatch logic - no need to check existence before calling Remove()
- Metrics are only updated for actual removals, not duplicate deletes

This makes the code easier to follow and more explicit about error handling.

…formance

Replace prometheus.With(Labels{...}) with WithLabelValues() to avoid
map allocation overhead. This aligns with the optimization from main branch.

- Remove unused prometheus import
- Update all metrics calls to use WithLabelValues
- Add comments indicating label order for clarity

Resolved conflicts by keeping optimized hybrid storage structure:
- Keep IPMap + BartRangeSet separation (hybrid storage)
- Keep no-op checks and overwrite handling optimizations
- Keep parallel batch processing
- Incorporate string cloning from main for better GC
- Use WithLabelValues for metrics performance