Upgrade to DataFusion 52 with Utf8View support and fix WAL metadata limits #14
…imits
- Update delta-rs to ffb794ba to include Utf8View predicate fixes
- Migrate string types to Utf8View for better performance
- Fix WAL metadata size limit by using hashed topic keys (16-char hex)
- Add bincode serialization for WAL entries (schema-less, compact)
- Remove unnecessary session state from DML operations
- Add buffer_consistency_test.rs with comprehensive buffer/Delta tests
- Update test utilities and assertions for Utf8View compatibility
Code Review: PR #14 - DataFusion 52 Upgrade with Utf8View Support
Thank you for this substantial PR! The migration to DataFusion 52 and Utf8View is a significant improvement for performance. However, I've identified several critical issues that need attention before merging.
🔴 Critical Issues
1. Hash Collision Risk in WAL (
…imits
- Update delta-rs to ffb794ba to include Utf8View predicate fixes
- Migrate string types to Utf8View for better performance
- Fix WAL metadata size limit by using hashed topic keys (16-char hex with ahash)
- Add bincode serialization for WAL entries (schema-less, compact)
- Remove unnecessary session state from DML operations
- Add buffer_consistency_test.rs with comprehensive buffer/Delta tests
- Update test utilities and assertions for Utf8View compatibility
- Replace unsafe ArrayData::new_unchecked with validated try_new
- Add MAX_BATCH_SIZE (100MB) limit to prevent unbounded allocation
- Add WAL format versioning (v128) for future compatibility
- Add exponential backoff to CAS loop to reduce CPU thrashing
- Define named constants for magic numbers
- Add support for LargeList, FixedSizeList, Map types in WAL
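The exponential backoff added to the CAS loop can be illustrated with a minimal sketch. The constant names follow the expression quoted in a later review comment; the 1 µs base is an assumption inferred from that comment's "exponent 10 (1024μs)" cap, and `backoff_for_attempt` and `try_reserve` are hypothetical helpers, not the PR's actual functions.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::Duration;

// Assumed values: a 1 µs base and a maximum exponent of 10 give a 1024 µs cap.
const CAS_BACKOFF_BASE_MICROS: u64 = 1;
const CAS_BACKOFF_MAX_EXPONENT: u32 = 10;

/// Hypothetical helper: delay before retry number `attempt` (0-based).
/// The delay doubles on each attempt until the exponent reaches the cap.
fn backoff_for_attempt(attempt: u32) -> Duration {
    let exponent = attempt.min(CAS_BACKOFF_MAX_EXPONENT);
    Duration::from_micros(CAS_BACKOFF_BASE_MICROS << exponent)
}

/// Hypothetical memory reservation: atomically add `bytes` to `used`
/// unless that would exceed `limit`. Sleeps with exponential backoff
/// whenever another thread wins the compare-and-swap race.
fn try_reserve(used: &AtomicUsize, bytes: usize, limit: usize, max_attempts: u32) -> bool {
    for attempt in 0..max_attempts {
        let current = used.load(Ordering::Acquire);
        let next = match current.checked_add(bytes) {
            Some(n) if n <= limit => n,
            _ => return false, // over the limit, or arithmetic overflow
        };
        if used
            .compare_exchange(current, next, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            return true;
        }
        thread::sleep(backoff_for_attempt(attempt));
    }
    false
}
```

Capping the exponent keeps the worst-case sleep bounded, so a heavily contended writer slows down without stalling indefinitely.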
Pull Request Review: DataFusion 52 Upgrade with Utf8View Support and WAL Metadata Fixes
Overview
This PR upgrades to DataFusion 52, migrates string types to Utf8View for better query performance, and fixes a critical WAL metadata size limitation. Overall, this is a well-structured and thoroughly tested PR with good attention to detail.
✅ Strengths
1. Excellent Test Coverage
2. Clear Problem Identification & Solution
3. Performance Improvements
4. Code Quality
- Add SAFETY comment for unsafe env::set_var explaining why it's acceptable in the Delta table creation context (consistent values, early execution)
- Extract duplicated schema coercion logic into scan_delta_table() and coerce_plan_to_schema() helpers, reducing ~60 lines of duplication
- Fix convert_expr_to_delta comment to accurately describe the recursive tree transformation behavior
Pull Request Review: DataFusion 52 Upgrade with Utf8View Support
This is a significant upgrade that modernizes the data type system and fixes critical WAL limitations. Overall, the implementation is solid with good attention to backward compatibility and testing.
✅ Strengths
1. Excellent WAL Redesign (src/wal.rs)
2. Comprehensive Test Coverage (tests/buffer_consistency_test.rs)
3. Type Migration Handled Correctly
- Enable test_recovery by setting WALRUS_DATA_DIR env var
- Use test_helpers for proper schema-compatible test batches
- Add #[serial] to prevent test isolation issues
- Improve error handling in wal.rs persist_topic()
- Remove explicit shutdown to avoid premature WAL consumption
Pull Request Review: DataFusion 52 Upgrade with Utf8View and WAL Fixes
Summary
This PR successfully upgrades to DataFusion 52 with Utf8View support and addresses critical WAL metadata size limitations. The implementation is well-structured with comprehensive test coverage (13 new tests in buffer_consistency_test.rs).
✅ Strengths
1. Well-Designed WAL Format Evolution
2. Excellent WAL Metadata Fix
3. Robust Memory Reservation System
4. Comprehensive Test Coverage
5. Clean Bincode Migration
🔍 Code Quality Observations
Good Practices
Areas for Consideration
1. WAL Hash Collision Risk (Low Priority)
Location: src/wal.rs:234-242

    fn walrus_topic_key(project_id: &str, table_name: &str) -> String {
        use ahash::AHasher;
        let mut hasher = AHasher::default();
        project_id.hash(&mut hasher);
        table_name.hash(&mut hasher);
        format!("{:016x}", hasher.finish())
    }

Issue: With a 64-bit hash, the probability of at least one collision among 100K tables is roughly 3 in 10^10, or about 1 in 4 billion (birthday paradox).
2. CAS Retry Backoff Could Be Clearer (Documentation)
Location: src/buffered_write_layer.rs:145

    let backoff_micros = CAS_BACKOFF_BASE_MICROS << attempt.min(CAS_BACKOFF_MAX_EXPONENT);

Issue: The logic is correct, with the exponent capped at 10 (1024μs), but the intent could be clearer.
3. Memory Overhead Multiplier Accuracy (Documentation)
Location: src/buffered_write_layer.rs:14-16

    const MEMORY_OVERHEAD_MULTIPLIER: f64 = 1.2;

Observation: This is an estimate that may vary by workload.
4. WAL Corruption Threshold Behavior (Design Question)
Location: src/buffered_write_layer.rs:209-223
5. Utf8View Migration Path (Documentation)
Observation: String columns now use Utf8View for better performance (zero-copy).
🔒 Security Considerations
✅ No Critical Issues Found
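The birthday-paradox estimate for the 64-bit topic key can be checked with a short calculation. This is a minimal sketch that assumes hash outputs are uniformly distributed; `collision_probability` is a hypothetical helper, not part of the PR. It uses the standard approximation p ≈ 1 − exp(−n(n−1) / 2^(bits+1)).

```rust
/// Approximate probability that at least two of `n` uniformly random
/// `bits`-bit hash values collide (birthday bound):
///   p ≈ 1 - exp(-n(n-1) / 2^(bits+1))
fn collision_probability(n: u64, bits: u32) -> f64 {
    let n = n as f64;
    // Number of distinct pairs among n items.
    let pairs = n * (n - 1.0) / 2.0;
    // Size of the hash space, 2^bits.
    let space = 2f64.powi(bits as i32);
    // 1 - e^(-x) written as -(e^(-x) - 1); exp_m1 stays accurate for tiny x.
    -(-pairs / space).exp_m1()
}
```

For 100,000 tables and a 64-bit key this evaluates to about 2.7 × 10^-10, roughly 1 in 3.7 billion. The probability only becomes substantial near 2^32 (about 4.3 billion) keys, where it reaches roughly 39%.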
⚡ Performance Considerations
Positive Changes
Potential Concerns
🧪 Testing
Coverage Analysis
Missing Test Coverage (Suggestions)
📋 Minor Suggestions
Code Style
Documentation
🎯 Recommendations Priority
High Priority (Should Address)
Medium Priority (Consider for Follow-up)
Low Priority (Nice to Have)
🏁 Conclusion
Verdict: ✅ APPROVED - This is a well-engineered PR that successfully addresses critical issues while maintaining backward compatibility and code quality.
Key Achievements
Final Recommendation
Merge after considering:
Great work on this upgrade! The attention to backward compatibility and testing is commendable. 🚀
Review generated by Claude Code - monoscope-tech/timefusion PR #14
PR Review: Upgrade to DataFusion 52 with Utf8View support and fix WAL metadata limits
Great work on this comprehensive upgrade! This PR represents a significant modernization of the codebase with important performance and reliability improvements. Below is my detailed review.
✅ Strengths
1. Well-Executed Type System Migration
The migration from
2. Excellent Test Coverage
The new
3. Smart WAL Optimization
The compact binary serialization format is a significant improvement:
4. Good Documentation
Safety comments throughout (especially around
Summary
- ffb794ba to include critical Utf8View predicate fixes for DataFusion 52
- project_id:table_name
Key Changes
DataFusion 52 Utf8View Support
- Utf8 == Utf8View comparison errors
- Utf8View type for zero-copy operations and better performance
- schema_force_view_types=true by default
WAL Metadata Fix
- project_id:table_name to a 16-char hex hash
- WalEntry still contains full project_id/table_name for recovery
New Tests
- buffer_consistency_test.rs with 13 comprehensive tests
Test plan
- cargo test --lib - 44 passed, 1 ignored
- cargo test --test buffer_consistency_test - 13 passed
- cargo test test_dml - 5 passed
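
The WAL metadata fix described above can be illustrated with a small sketch. It is not the PR's code: it swaps in the standard library's `DefaultHasher` for `ahash` so the block has no external dependencies, and `topic_key` and `TopicRegistry` are hypothetical names. It shows a fixed-length 16-character hex key derived from the project ID and table name, with the full names kept next to the key so that a collision is reported instead of silently merging two tables.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the PR's topic key function.
/// DefaultHasher replaces ahash here; its algorithm is unspecified and may
/// change between Rust releases, so it would not suit keys that must stay
/// stable on disk across compiler upgrades.
fn topic_key(project_id: &str, table_name: &str) -> String {
    let mut hasher = DefaultHasher::new();
    project_id.hash(&mut hasher);
    table_name.hash(&mut hasher);
    // A u64 formatted as zero-padded hex is always exactly 16 characters.
    format!("{:016x}", hasher.finish())
}

/// Hypothetical registry mapping each short key back to the full names.
#[derive(Default)]
struct TopicRegistry {
    topics: HashMap<String, (String, String)>,
}

impl TopicRegistry {
    /// Returns the key for the table, or an error if a different table
    /// already occupies the same key.
    fn register(&mut self, project_id: &str, table_name: &str) -> Result<String, String> {
        let key = topic_key(project_id, table_name);
        if let Some((p, t)) = self.topics.get(&key) {
            if p.as_str() != project_id || t.as_str() != table_name {
                return Err(format!(
                    "topic key {key} collides: {p}:{t} vs {project_id}:{table_name}"
                ));
            }
            return Ok(key); // same table registered again
        }
        self.topics
            .insert(key.clone(), (project_id.to_string(), table_name.to_string()));
        Ok(key)
    }
}
```

Because the key length no longer depends on the length of the project ID or table name, the metadata stays within a fixed size, while the stored full names still allow recovery to map entries back to their table.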