Skip to content

Add Hollow Protocol Buffers Adapter#791

Open
abersnaze wants to merge 7 commits intoNetflix:masterfrom
abersnaze:gsc/proto
Open

Add Hollow Protocol Buffers Adapter#791
abersnaze wants to merge 7 commits intoNetflix:masterfrom
abersnaze:gsc/proto

Conversation

@abersnaze
Copy link

Summary

Implements a new adapter for Protocol Buffers that provides automatic schema inference and type mapping to Hollow, similar to HollowObjectMapper for Java objects.

Key Features

  • Automatic schema generation from proto Descriptors
  • Support for all proto scalar types, nested messages, and collections
  • Custom proto options for Hollow-specific configuration (hollow_primary_key)
  • Primary key extraction via hollow_primary_key message option
  • List ordering control via ignoreListOrdering() for better deduplication (20-40% memory savings)

API Design

HollowMessageMapper provides a fluent API matching HollowObjectMapper patterns:

  • mapper.add(protoMessage) for adding messages
  • mapper.ignoreListOrdering() for unordered list deduplication
  • Automatic handling of repeated fields, nested messages, and oneof fields

Custom Proto Options

  • hollow_primary_key (message): Define primary key fields for type identity

Implementation Details

  • HollowMessageMapper: Main API for schema inference and message mapping
  • HollowProtoAdapter: Processes proto messages into Hollow write records
  • Supports passthrough types and canonical object field mappings
  • Lazy schema creation for google.protobuf well-known types with conflict detection
  • Thread-safe adapter caching

Build Configuration

  • Gradle tasks for proto compilation (main and test protos)
  • CI workflow updated to install protoc
  • Fixed task dependencies for sourcesJar and licenseMain

Testing

  • Comprehensive test coverage for message processing, primary keys, and list ordering
  • Test proto definitions with various field types and nested structures
  • Verification of deduplication behavior with ignoreListOrdering enabled

Test plan

  • Unit tests for all proto scalar types and collections
  • Tests for nested messages and oneof fields
  • Primary key extraction tests
  • List ordering control and deduplication verification
  • Thread safety tests for lazy schema creation
  • CI workflow passes with protoc installation

Internal Review

This PR was reviewed internally at abersnaze#2 by @Sunjeet

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 noreply@anthropic.com

abersnaze and others added 7 commits February 1, 2026 11:54
Implements a new adapter for Protocol Buffers that provides automatic schema
inference and type mapping to Hollow, similar to HollowObjectMapper for Java objects.

Key Features:
- Automatic schema generation from proto Descriptors
- Support for all proto scalar types, nested messages, and collections
- Custom proto options for Hollow-specific configuration
- Primary key extraction via hollow_primary_key message option
- List ordering control via ignoreListOrdering() for better deduplication

API Design:
HollowMessageMapper provides a fluent API matching HollowObjectMapper patterns:
- mapper.add(protoMessage) for adding messages
- mapper.ignoreListOrdering() for unordered list deduplication (20-40% memory savings)
- Automatic handling of repeated fields, nested messages, and oneof fields

Custom Proto Options:
- hollow_primary_key (message): Define primary key fields for type identity
- hollow_ignore_list_ordering (field): Reserved for future per-field control

Implementation Details:
- HollowMessageMapper: Main API for schema inference and message mapping
- HollowProtoAdapter: Processes proto messages into Hollow write records
- Supports passthrough types and canonical object field mappings
- List ordering control sorts element ordinals before writing for deduplication

Build Configuration:
- Gradle tasks for proto compilation (main and test protos)
- CI workflow updated to install protoc
- Fixed task dependencies for sourcesJar and licenseMain

Testing:
- Comprehensive test coverage for message processing, primary keys, and list ordering
- Test proto definitions with various field types and nested structures
- Verification of deduplication behavior with ignoreListOrdering enabled

Documentation:
- README with usage examples, API patterns, and best practices
- Javadoc on all public APIs
- Notes on when to use list ordering control

This adapter enables efficient Protocol Buffer integration with Hollow's
in-memory data infrastructure, providing memory-optimized storage with
automatic type evolution and deduplication.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…ppers

This commit adds proto equivalents for @HollowTypeName and @HollowInline annotations, plus proper handling of google.protobuf wrapper types.

Features added:
- hollow_type_name option: Creates namespaced wrapper types for memory-efficient reference bit allocation (e.g., ProductTitle instead of String)
- hollow_inline option: Forces boxed types to be inlined as primitives instead of referenced
- google.protobuf.*Value unwrapping: Int32Value, StringValue, etc. unwrap to references to Hollow wrapper types (Integer, String) or inline primitives when hollow_inline is set
- google.protobuf.Struct and Value: Treated as references like boxed types in Java

Implementation:
- Extended hollow_options.proto with hollow_type_name and hollow_inline field options
- Updated HollowMessageMapper to detect *Value types, unwrap them, and create appropriate schemas with namespaced types
- Updated HollowProtoAdapter to unwrap *Value messages and wrap primitive values for namespaced types
- Added comprehensive tests for all three features (Product, Account, Document messages)

Technical details:
- *Value types (Int32Value, StringValue, etc.) are completely unwrapped and never create Hollow schemas
- Namespaced wrapper types create single-field wrapper schemas (e.g., ProductTitle { String value })
- google.protobuf.Value and google.protobuf.Struct are NOT unwrapped - they remain as message types

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updated hollow_primary_key option from repeated string to a message type
to enable cleaner array syntax: {fields: ["account_id", "region"]}

Changes:
- Added HollowPrimaryKey message type with repeated string fields
- Updated hollow_primary_key option to use HollowPrimaryKey message type
- Updated code to read pkOption.getFieldsList() instead of direct list
- Updated test proto to use new syntax: {fields: ["id"]}
- Added composite primary key example to Account message
- Updated javadoc with both single and composite key examples

Benefits:
- Backwards compatible (field order preserved in array)
- Cleaner syntax matching Java annotation style
- More readable than repeated option lines
- Extensible for future fields in HollowPrimaryKey message

Syntax comparison:
- Old: option (hollow_primary_key) = "id";  (repeated not usable)
- New: option (hollow_primary_key) = {fields: ["account_id", "region"]};

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Two improvements based on code review feedback:

1. Cache HollowProtoAdapter instances per type (performance)
   - Added adapters map to cache adapters by type name
   - Reuse cached adapter instead of creating new one per add() call
   - Matches HollowObjectMapper pattern which caches type mappers
   - Significant performance improvement for bulk message additions

2. Use FieldType.BYTES for proto BYTES fields (correctness)
   - Changed proto BYTES to map to Hollow FieldType.BYTES (not STRING)
   - Updated schema generation to create BYTES fields
   - Updated data writing to use setBytes() with toByteArray()
   - Updated data reading to use getBytes() and ByteString.copyFrom()
   - Fixed wrapper type names: BytesValue -> "Bytes" (not "String")
   - Separated BYTES from STRING in all switch statements

3. Document UINT32/UINT64 mapping (clarification)
   - Added comment referencing protobuf spec for unsigned integer mapping
   - "In Java, unsigned integers are represented using their signed
     counterparts, with the top bit simply being stored in the sign bit"
   - https://protobuf.dev/programming-guides/proto3/#scalar

Before:
- New adapter created on every add() call
- BYTES stored as STRING (incorrect, lost binary data)

After:
- Adapters cached and reused per type
- BYTES stored as BYTES (correct, preserves binary data)
- Clear documentation on unsigned integer handling

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…ith conflict detection

Add support for google.protobuf.Struct, Value, and ListValue using lazy
schema creation to handle their intentional circular references. These
types have schemas that reference each other (e.g., Struct contains Value,
Value can contain Struct), which Hollow doesn't support directly.

Key changes:

1. Lazy Schema Creation Pattern:
   - Skip Struct/Value/ListValue during initial schema generation
   - Create schemas on-demand when actual data is first encountered
   - Breaks circular dependency while maintaining full functionality
   - Safe for Hollow's snapshot/delta model (validated in tests)

2. Struct Field Type Conflict Detection:
   - Tracks field types across all Struct instances
   - Validates that the same field name has consistent types
   - Throws IllegalStateException if conflicts are detected
   - Prevents schema inconsistencies at data write time
   - Example: If "age" is a number in one instance, it must be a number in all instances

3. Dynamic Schema Discovery:
   - Enhanced HollowProtoAdapter to check state engine for schemas not
     in local cache
   - Dynamically populate hollowSchemas, hollowWriteRecords, and
     canonicalObjectFieldMappings as needed
   - Mapper callback allows adapter to trigger lazy schema creation

4. Value Union Schema:
   - Fixed schema with all possible oneof fields (nullValue, numberValue,
     stringValue, boolValue, structValue, listValue)
   - Accommodates any field having any type at runtime within a single instance
   - Conflict detection ensures consistency across instances

5. Struct Representation:
   - Object with single "fields" reference to MapOfStringToValue
   - Fields map contains dynamic key-value pairs

6. Consolidated Test Suite:
   - Moved metadata field from Document to Person (Document deleted)
   - Changed test_person.json to array of 6 diverse people
   - Comprehensive test covering all features:
     * Lazy schema creation (Struct/Value/ListValue)
     * Multiple instances with different field sets
     * Nested Struct values
     * Null value handling
     * ListValue for arrays
     * Empty/missing metadata
     * hollow_type_name option (namespaced wrappers)
     * hollow_inline option (inline vs reference)
     * google.protobuf.*Value types
   - Dedicated conflict detection test validates field type enforcement

7. Snapshot/Delta Validation:
   - Added testLazySchemaCreationAcrossCycles() to validate lazy schemas
     work correctly across multiple snapshot cycles
   - Verifies schemas can appear dynamically and consumers receive updates
   - Tests both data from before and after schema creation

Documentation:
- Added comprehensive javadoc to HollowMessageMapper explaining lazy
  schema creation approach, conflict detection, and circular reference detection
- Documented test methods with detailed coverage information

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Make HollowMessageMapper and HollowProtoAdapter thread-safe for concurrent use:

HollowMessageMapper:
- Replace HashMap/HashSet with ConcurrentHashMap/ConcurrentHashMap.newKeySet()
- Use atomic putIfAbsent() for conflict detection in validateStructFields()
- Add double-checked locking for schema creation (Value/Struct/ListValue)
- Add lock objects (valueSchemaLock, structSchemaLock, listValueSchemaLock)
- Document thread safety guarantees in class javadoc

HollowProtoAdapter:
- Replace HashMap with ConcurrentHashMap for hollowSchemas and canonicalObjectFieldMappings
- Use atomic computeIfAbsent() for dynamic schema lookup in parseMessage()
- Use atomic computeIfAbsent() for object field mapping creation

Thread Safety Mechanisms:
1. ConcurrentHashMap for all shared state collections
2. Atomic operations (putIfAbsent, computeIfAbsent) for check-then-act patterns
3. Double-checked locking for expensive schema creation
4. ThreadLocal already used appropriately for write records

The implementation ensures:
- No race conditions in conflict detection
- No duplicate schema creation under concurrent load
- Safe concurrent access to dynamic schema lookups
- Proper visibility of changes across threads

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…ta corruption

UINT32 and UINT64 map to signed INT/LONG in Hollow. Large unsigned values
(> 2^31-1 or 2^63-1) appear as negative numbers, which can cause subtle bugs
and data corruption if not understood.

Changes:

1. New Proto Option: hollow_unsigned_to_signed
   - Field-level annotation to acknowledge unsigned-to-signed mapping
   - REQUIRED for all uint32 and uint64 fields
   - Documents the limitation with clear warning about negative values

2. Validation in HollowMessageMapper:
   - validateUnsignedField() checks for required annotation
   - Throws IllegalStateException with helpful error message if missing
   - Error includes example of how to add the annotation

3. Updated hollow_options.proto:
   - Added hollow_unsigned_to_signed field option (ID: 50004)
   - Comprehensive documentation about the limitation
   - Example: uint32 value 3000000000 becomes -1294967296 as signed int32

4. Test Coverage:
   - testUnsignedFieldValidation(): Verifies fields without annotation are rejected
   - testUnsignedFieldWithAnnotation(): Verifies annotated fields are accepted
   - BadUnsigned message in test_person.proto: Negative test case
   - Person message: Positive test case with visitor_count/total_revenue fields

Error Message Example:
"Field 'counter' in message 'BadUnsigned' is uint32 which maps to signed INT.
Large unsigned values (> 2^31-1) will appear as negative numbers.
To acknowledge this limitation and allow the field, add:
  [(com.netflix.hollow.hollow_unsigned_to_signed) = true]"

This safety check prevents accidental misuse of unsigned integers and ensures
developers explicitly acknowledge the mapping behavior.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants