Add Hollow Protocol Buffers Adapter #791
Open
abersnaze wants to merge 7 commits into Netflix:master from
Implements a new adapter for Protocol Buffers that provides automatic schema inference and type mapping to Hollow, similar to HollowObjectMapper for Java objects.
Key Features:
- Automatic schema generation from proto Descriptors
- Support for all proto scalar types, nested messages, and collections
- Custom proto options for Hollow-specific configuration
- Primary key extraction via hollow_primary_key message option
- List ordering control via ignoreListOrdering() for better deduplication
API Design:
HollowMessageMapper provides a fluent API matching HollowObjectMapper patterns:
- mapper.add(protoMessage) for adding messages
- mapper.ignoreListOrdering() for unordered list deduplication (20-40% memory savings)
- Automatic handling of repeated fields, nested messages, and oneof fields
Custom Proto Options:
- hollow_primary_key (message): Define primary key fields for type identity
- hollow_ignore_list_ordering (field): Reserved for future per-field control
Implementation Details:
- HollowMessageMapper: Main API for schema inference and message mapping
- HollowProtoAdapter: Processes proto messages into Hollow write records
- Supports passthrough types and canonical object field mappings
- List ordering control sorts element ordinals before writing for deduplication
Build Configuration:
- Gradle tasks for proto compilation (main and test protos)
- CI workflow updated to install protoc
- Fixed task dependencies for sourcesJar and licenseMain
Testing:
- Comprehensive test coverage for message processing, primary keys, and list ordering
- Test proto definitions with various field types and nested structures
- Verification of deduplication behavior with ignoreListOrdering enabled
Documentation:
- README with usage examples, API patterns, and best practices
- Javadoc on all public APIs
- Notes on when to use list ordering control
This adapter enables efficient Protocol Buffer integration with Hollow's in-memory data infrastructure, providing memory-optimized storage with automatic type evolution and deduplication.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
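The commit message says list ordering control sorts element ordinals before writing so that lists differing only in order deduplicate. A minimal sketch of that idea, using only the JDK, might look like the following. The class and method names are hypothetical and are not taken from the PR; the adapter's real code may differ.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the adapter's actual code.
final class ListOrderingSketch {
    /** Returns a sorted copy of the element ordinals so ordering differences disappear. */
    static int[] canonicalize(int[] elementOrdinals) {
        int[] sorted = Arrays.copyOf(elementOrdinals, elementOrdinals.length);
        Arrays.sort(sorted);
        return sorted;
    }

    /** Two lists can share one record when their canonical forms are equal. */
    static boolean sameRecord(int[] a, int[] b) {
        return Arrays.equals(canonicalize(a), canonicalize(b));
    }
}
```

Sorting discards the original order, so this only suits lists whose order carries no meaning.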
…ppers
This commit adds proto equivalents for @HollowTypeName and @HollowInline annotations, plus proper handling of google.protobuf wrapper types.
Features added:
- hollow_type_name option: Creates namespaced wrapper types for memory-efficient reference bit allocation (e.g., ProductTitle instead of String)
- hollow_inline option: Forces boxed types to be inlined as primitives instead of referenced
- google.protobuf.*Value unwrapping: Int32Value, StringValue, etc. unwrap to references to Hollow wrapper types (Integer, String) or inline primitives when hollow_inline is set
- google.protobuf.Struct and Value: Treated as references like boxed types in Java
Implementation:
- Extended hollow_options.proto with hollow_type_name and hollow_inline field options
- Updated HollowMessageMapper to detect *Value types, unwrap them, and create appropriate schemas with namespaced types
- Updated HollowProtoAdapter to unwrap *Value messages and wrap primitive values for namespaced types
- Added comprehensive tests for all three features (Product, Account, Document messages)
Technical details:
- *Value types (Int32Value, StringValue, etc.) are completely unwrapped and never create Hollow schemas
- Namespaced wrapper types create single-field wrapper schemas (e.g., ProductTitle { String value })
- google.protobuf.Value and google.protobuf.Struct are NOT unwrapped - they remain as message types
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updated hollow_primary_key option from repeated string to a message type
to enable cleaner array syntax: {fields: ["account_id", "region"]}
Changes:
- Added HollowPrimaryKey message type with repeated string fields
- Updated hollow_primary_key option to use HollowPrimaryKey message type
- Updated code to read pkOption.getFieldsList() instead of direct list
- Updated test proto to use new syntax: {fields: ["id"]}
- Added composite primary key example to Account message
- Updated javadoc with both single and composite key examples
Benefits:
- Backwards compatible (field order preserved in array)
- Cleaner syntax matching Java annotation style
- More readable than repeated option lines
- Extensible for future fields in HollowPrimaryKey message
Syntax comparison:
- Old: option (hollow_primary_key) = "id"; (repeated not usable)
- New: option (hollow_primary_key) = {fields: ["account_id", "region"]};
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Three improvements based on code review feedback:
1. Cache HollowProtoAdapter instances per type (performance)
- Added adapters map to cache adapters by type name
- Reuse cached adapter instead of creating new one per add() call
- Matches HollowObjectMapper pattern which caches type mappers
- Significant performance improvement for bulk message additions
2. Use FieldType.BYTES for proto BYTES fields (correctness)
- Changed proto BYTES to map to Hollow FieldType.BYTES (not STRING)
- Updated schema generation to create BYTES fields
- Updated data writing to use setBytes() with toByteArray()
- Updated data reading to use getBytes() and ByteString.copyFrom()
- Fixed wrapper type names: BytesValue -> "Bytes" (not "String")
- Separated BYTES from STRING in all switch statements
3. Document UINT32/UINT64 mapping (clarification)
- Added comment referencing protobuf spec for unsigned integer mapping
- "In Java, unsigned integers are represented using their signed
counterparts, with the top bit simply being stored in the sign bit"
- https://protobuf.dev/programming-guides/proto3/#scalar
Before:
- New adapter created on every add() call
- BYTES stored as STRING (incorrect, lost binary data)
After:
- Adapters cached and reused per type
- BYTES stored as BYTES (correct, preserves binary data)
- Clear documentation on unsigned integer handling
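To see why storing BYTES as STRING can lose binary data, here is a small JDK-only sketch. It assumes the earlier mapping decoded the bytes as UTF-8, which the commit message does not state, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo; assumes a UTF-8 decode, which the PR does not confirm.
final class BytesVersusStringDemo {
    /** Round-trips raw bytes through a UTF-8 String, as a STRING-typed field would. */
    static byte[] throughString(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    /** True only when the bytes come back unchanged after the String round trip. */
    static boolean survivesStringRoundTrip(byte[] raw) {
        return Arrays.equals(raw, throughString(raw));
    }
}
```

Bytes that are not valid UTF-8, such as 0xFF, are replaced during decoding, so the original payload cannot be recovered.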
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…ith conflict detection
Add support for google.protobuf.Struct, Value, and ListValue using lazy
schema creation to handle their intentional circular references. These
types have schemas that reference each other (e.g., Struct contains Value,
Value can contain Struct), which Hollow doesn't support directly.
Key changes:
1. Lazy Schema Creation Pattern:
- Skip Struct/Value/ListValue during initial schema generation
- Create schemas on-demand when actual data is first encountered
- Breaks circular dependency while maintaining full functionality
- Safe for Hollow's snapshot/delta model (validated in tests)
2. Struct Field Type Conflict Detection:
- Tracks field types across all Struct instances
- Validates that the same field name has consistent types
- Throws IllegalStateException if conflicts are detected
- Prevents schema inconsistencies at data write time
- Example: If "age" is a number in one instance, it must be a number in all instances
3. Dynamic Schema Discovery:
- Enhanced HollowProtoAdapter to check state engine for schemas not
in local cache
- Dynamically populate hollowSchemas, hollowWriteRecords, and
canonicalObjectFieldMappings as needed
- Mapper callback allows adapter to trigger lazy schema creation
4. Value Union Schema:
- Fixed schema with all possible oneof fields (nullValue, numberValue,
stringValue, boolValue, structValue, listValue)
- Accommodates any field having any type at runtime within a single instance
- Conflict detection ensures consistency across instances
5. Struct Representation:
- Object with single "fields" reference to MapOfStringToValue
- Fields map contains dynamic key-value pairs
6. Consolidated Test Suite:
- Moved metadata field from Document to Person (Document deleted)
- Changed test_person.json to array of 6 diverse people
- Comprehensive test covering all features:
* Lazy schema creation (Struct/Value/ListValue)
* Multiple instances with different field sets
* Nested Struct values
* Null value handling
* ListValue for arrays
* Empty/missing metadata
* hollow_type_name option (namespaced wrappers)
* hollow_inline option (inline vs reference)
* google.protobuf.*Value types
- Dedicated conflict detection test validates field type enforcement
7. Snapshot/Delta Validation:
- Added testLazySchemaCreationAcrossCycles() to validate lazy schemas
work correctly across multiple snapshot cycles
- Verifies schemas can appear dynamically and consumers receive updates
- Tests both data from before and after schema creation
Documentation:
- Added comprehensive javadoc to HollowMessageMapper explaining lazy
schema creation approach, conflict detection, and circular reference detection
- Documented test methods with detailed coverage information
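The Struct field type conflict detection described in item 2 above can be sketched roughly as follows. This is a simplified, single-threaded illustration; the class name, method name, and the use of plain strings for field kinds are assumptions, not the PR's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; names are illustrative and not taken from the PR.
final class StructFieldTypeTracker {
    // Field name -> first kind observed for it (for example "number" or "string").
    private final Map<String, String> seenKinds = new HashMap<>();

    /** Records the kind for a field, or throws if it contradicts an earlier instance. */
    void check(String fieldName, String kind) {
        String previous = seenKinds.putIfAbsent(fieldName, kind);
        if (previous != null && !previous.equals(kind)) {
            throw new IllegalStateException("Field '" + fieldName + "' was " + previous
                    + " in an earlier Struct but is " + kind + " here");
        }
    }
}
```

With this shape, a second Struct that carries "age" as a string after an earlier one carried it as a number fails at write time rather than producing an inconsistent schema.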
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Make HollowMessageMapper and HollowProtoAdapter thread-safe for concurrent use:
HollowMessageMapper:
- Replace HashMap/HashSet with ConcurrentHashMap/ConcurrentHashMap.newKeySet()
- Use atomic putIfAbsent() for conflict detection in validateStructFields()
- Add double-checked locking for schema creation (Value/Struct/ListValue)
- Add lock objects (valueSchemaLock, structSchemaLock, listValueSchemaLock)
- Document thread safety guarantees in class javadoc
HollowProtoAdapter:
- Replace HashMap with ConcurrentHashMap for hollowSchemas and canonicalObjectFieldMappings
- Use atomic computeIfAbsent() for dynamic schema lookup in parseMessage()
- Use atomic computeIfAbsent() for object field mapping creation
Thread Safety Mechanisms:
1. ConcurrentHashMap for all shared state collections
2. Atomic operations (putIfAbsent, computeIfAbsent) for check-then-act patterns
3. Double-checked locking for expensive schema creation
4. ThreadLocal already used appropriately for write records
The implementation ensures:
- No race conditions in conflict detection
- No duplicate schema creation under concurrent load
- Safe concurrent access to dynamic schema lookups
- Proper visibility of changes across threads
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
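For readers unfamiliar with the double-checked locking mentioned above, a generic sketch follows. Only the lock name valueSchemaLock comes from the commit message; the class, the String stand-in for a schema, and the creation counter are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of double-checked locking; not the PR's implementation.
final class LazySchemaHolder {
    private final Object valueSchemaLock = new Object();
    private final AtomicInteger creations = new AtomicInteger();
    // volatile makes the published schema visible to threads that skip the lock.
    private volatile String valueSchema;

    String getOrCreateValueSchema() {
        String schema = valueSchema;          // first check, without locking
        if (schema == null) {
            synchronized (valueSchemaLock) {
                schema = valueSchema;         // second check, under the lock
                if (schema == null) {
                    creations.incrementAndGet();
                    schema = "Value";         // stand-in for expensive schema creation
                    valueSchema = schema;
                }
            }
        }
        return schema;
    }

    /** Number of times the schema was actually created. */
    int creationCount() {
        return creations.get();
    }
}
```

The unlocked first check keeps the common path cheap once the schema exists, while the second check under the lock prevents duplicate creation.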
…ta corruption
UINT32 and UINT64 map to signed INT/LONG in Hollow. Large unsigned values (> 2^31-1 or 2^63-1) appear as negative numbers, which can cause subtle bugs and data corruption if not understood.
Changes:
1. New Proto Option: hollow_unsigned_to_signed
- Field-level annotation to acknowledge unsigned-to-signed mapping
- REQUIRED for all uint32 and uint64 fields
- Documents the limitation with clear warning about negative values
2. Validation in HollowMessageMapper:
- validateUnsignedField() checks for required annotation
- Throws IllegalStateException with helpful error message if missing
- Error includes example of how to add the annotation
3. Updated hollow_options.proto:
- Added hollow_unsigned_to_signed field option (ID: 50004)
- Comprehensive documentation about the limitation
- Example: uint32 value 3000000000 becomes -1294967296 as signed int32
4. Test Coverage:
- testUnsignedFieldValidation(): Verifies fields without annotation are rejected
- testUnsignedFieldWithAnnotation(): Verifies annotated fields are accepted
- BadUnsigned message in test_person.proto: Negative test case
- Person message: Positive test case with visitor_count/total_revenue fields
Error Message Example:
"Field 'counter' in message 'BadUnsigned' is uint32 which maps to signed INT. Large unsigned values (> 2^31-1) will appear as negative numbers. To acknowledge this limitation and allow the field, add: [(com.netflix.hollow.hollow_unsigned_to_signed) = true]"
This safety check prevents accidental misuse of unsigned integers and ensures developers explicitly acknowledge the mapping behavior.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
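The unsigned-to-signed behavior in the example above can be reproduced with standard JDK calls. This sketch is independent of Hollow; the class and method names are hypothetical.

```java
// Standard JDK behavior; the class and method names are illustrative only.
final class UnsignedMappingDemo {
    /** Stores a uint32 value (passed as a long) in a signed 32-bit int. */
    static int toSignedInt(long unsignedValue) {
        return (int) unsignedValue;
    }

    /** Recovers the original unsigned value from its signed representation. */
    static long toUnsigned(int stored) {
        return Integer.toUnsignedLong(stored);
    }
}
```

A consumer reading such a field could call Integer.toUnsignedLong (or Long.toUnsignedString for uint64) to interpret the stored bits as unsigned again.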
Sunjeet approved these changes on Feb 3, 2026
Summary
Implements a new adapter for Protocol Buffers that provides automatic schema inference and type mapping to Hollow, similar to HollowObjectMapper for Java objects.
Key Features
Key Features
- …hollow_primary_key)
- Primary key extraction via hollow_primary_key message option
- List ordering control via ignoreListOrdering() for better deduplication (20-40% memory savings)
API Design
HollowMessageMapper provides a fluent API matching HollowObjectMapper patterns:
- mapper.add(protoMessage) for adding messages
- mapper.ignoreListOrdering() for unordered list deduplication
Custom Proto Options
- hollow_primary_key (message): Define primary key fields for type identity
Implementation Details
- HollowMessageMapper: Main API for schema inference and message mapping
- HollowProtoAdapter: Processes proto messages into Hollow write records
- …google.protobuf well-known types with conflict detection
Build Configuration
Testing
- Verification of deduplication behavior with ignoreListOrdering enabled
Test plan
Internal Review
This PR was reviewed internally at abersnaze#2 by @Sunjeet
🤖 Generated with Claude Code
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>