Update RPC source adapter #26
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
```go
if checkpoint != nil && checkpoint.LastProcessedLedger > 0 {
	log.Printf("Resuming from checkpoint: ledger %d", checkpoint.LastProcessedLedger)
	currentLedger = checkpoint.LastProcessedLedger + 1
	totalProcessed = checkpoint.TotalProcessed
```
Avoid skipping ledger when resuming from checkpoint
After each batch the loop sets currentLedger = batchEnd and later calls saveCheckpoint(currentLedger, …) so the checkpoint stores the next ledger to fetch. On restart this block resumes with currentLedger = checkpoint.LastProcessedLedger + 1, so the first ledger of the next batch is skipped entirely. A crash or manual restart will therefore lose one ledger of data. Persist the last processed ledger (e.g. batchEnd-1) or resume without adding one.
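The fix Codex describes can be sketched as follows. The `Checkpoint` fields match the snippet above, but `nextLedger` is an illustrative helper, not the adapter's actual API: the key point is that the checkpoint stores a ledger that was *fully processed*, so adding one on resume never skips data.

```go
package main

import "fmt"

// Checkpoint mirrors the fields referenced in the snippet above.
type Checkpoint struct {
	LastProcessedLedger uint64
	TotalProcessed      uint64
}

// nextLedger returns the first ledger to fetch after a restart.
// Because LastProcessedLedger holds a fully processed ledger
// (e.g. batchEnd-1 rather than batchEnd), resuming at +1 is safe.
func nextLedger(cp *Checkpoint) uint64 {
	if cp == nil || cp.LastProcessedLedger == 0 {
		return 1 // or the configured start ledger
	}
	return cp.LastProcessedLedger + 1
}

func main() {
	cp := &Checkpoint{LastProcessedLedger: 104, TotalProcessed: 5}
	fmt.Println(nextLedger(cp)) // resumes at the first unprocessed ledger
}
```

If the loop instead saved `batchEnd` (the next ledger to fetch), the resume logic would have to skip the `+1` — the bug is mixing the two conventions.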
Code Review: PR #26 - Update RPC Source Adapter

Executive Summary

This PR introduces significant enhancements to the RPC source adapter, including checkpointing support, dual-mode operation (streaming/batch), and comprehensive documentation. The implementation shows good architectural design but contains several critical bugs that must be fixed before merging.

Recommendation: 🔴 Request Changes - Critical data integrity issues identified

🎯 What This PR Does Well

✅ Excellent Documentation: The README files

🚨 Critical Issues (MUST FIX)

1. Data Loss Risk: Incorrect Ledger Advancement Logic

Problem: If `GetLedgers(100, 105)` requests 5 ledgers but RPC returns only 3, the code still advances `currentLedger` to 105, skipping ledgers 103-104 without processing them.

Impact: Silent data loss - ledgers can be permanently skipped.

2. Silent Ledger Skipping Without Tracking

Problem: Malformed ledgers from RPC are silently discarded with only a log message. No tracking of which ledgers were skipped.

Impact: Data integrity violations without visibility.

3. Checkpoint Resume Off-By-One Risk

Problem: Semantics unclear - is `LastProcessedLedger` the last fully processed ledger or the last one attempted? If a checkpoint is saved mid-batch during a crash, this could lead to duplicate processing or skipped ledgers.

Impact: Data integrity violations on recovery.

4. Unsafe Type Conversions Without Validation

Problem: No validation that `slFloat` is positive or within uint64 bounds. A negative value like `-1000` would convert to `18446744073709550616`.

Impact: Invalid ledger numbers causing RPC failures.
Pull Request Overview
This pull request significantly enhances the RPC source adapter by introducing checkpointing for crash recovery, supporting both batch and streaming modes for ledger processing, and adding comprehensive pipeline configurations and documentation for WebSocket streaming and contract event processing.
Key Changes
- Implemented checkpoint-based crash recovery system with save/load functionality and configurable intervals
- Added batch mode (with defined start/end ledgers) and streaming mode (continuous processing) with automatic mode detection
- Introduced new pipeline configurations for various use cases including USDC transfers, contract events, and WebSocket streaming
Reviewed Changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| source_adapter_rpc.go | Core implementation of checkpointing, batch/streaming modes, and getLedgers loop with XDR decoding |
| config/examples/rpc-batch-mode.yaml | Batch mode example configuration with historical backfill settings and checkpoint configuration |
| config/examples/rpc-streaming-mode.yaml | Streaming mode example with continuous processing and frequent checkpoint saves |
| config/base/websocket_streaming.yaml | Real-time WebSocket streaming pipeline configuration for contract events |
| config/base/usdc_transfers_rpc.yaml | USDC transfer monitoring pipeline with mainnet configuration |
| config/base/contract_events_rpc.yaml | Generic contract events pipeline with multiple filter examples |
| config/base/test_websocket_client.py | Python test client for WebSocket server integration testing |
| config/base/README_WEBSOCKET.md | Comprehensive WebSocket consumer documentation with client examples |
| config/base/README_RPC_EVENTS.md | Complete guide for RPC event filtering and pipeline configuration |
| .gitignore | Added new documentation file to ignore list |
source_adapter_rpc.go (Outdated)
```go
	"encoding/base64"
	"encoding/json"
	"fmt"
	"io/ioutil"
```
Copilot AI · Nov 11, 2025
The ioutil package is deprecated since Go 1.16. Use os.ReadFile and os.WriteFile instead.
Replace:
- `ioutil.ReadFile(path)` with `os.ReadFile(path)`
- `ioutil.WriteFile(tmpPath, data, 0644)` with `os.WriteFile(tmpPath, data, 0644)`
Then remove the io/ioutil import.
Suggested change:

```diff
-	"io/ioutil"
```
```go
	for {
		// In batch mode, check if we've reached the end
		if !streamingMode && currentLedger >= endLedger {
```
Copilot AI · Nov 11, 2025
The condition currentLedger >= endLedger is incorrect for batch mode. If endLedger is meant to be exclusive (which is typical), this condition is correct. However, if endLedger is meant to be inclusive (the last ledger to process), this should be currentLedger > endLedger.
Additionally, when currentLedger == endLedger, the loop would exit without processing endLedger, which may not be the intended behavior. Please clarify the semantics of endLedger and adjust the condition accordingly.
Suggested change:

```diff
-		if !streamingMode && currentLedger >= endLedger {
+		if !streamingMode && currentLedger > endLedger {
```
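The difference between the two conditions can be demonstrated with a toy loop, assuming `endLedger` is meant to be inclusive (the last ledger that should be processed); the helper name is illustrative.

```go
package main

import "fmt"

// processedLedgers runs a simplified version of the batch loop with
// either exit condition and returns the ledgers that get processed.
func processedLedgers(start, endLedger uint64, exclusiveCheck bool) []uint64 {
	var out []uint64
	for current := start; ; current++ {
		if exclusiveCheck {
			if current >= endLedger { // original condition: endLedger is never processed
				break
			}
		} else {
			if current > endLedger { // suggested condition: endLedger is processed
				break
			}
		}
		out = append(out, current)
	}
	return out
}

func main() {
	fmt.Println(processedLedgers(100, 102, true))  // [100 101] - 102 silently dropped
	fmt.Println(processedLedgers(100, 102, false)) // [100 101 102]
}
```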
```go
	} else if totalProcessed%checkpointInterval == 0 {
		// In batch mode, save based on interval
```
Copilot AI · Nov 11, 2025
The checkpoint save logic in batch mode has a potential issue. The condition totalProcessed%checkpointInterval == 0 will trigger when totalProcessed is an exact multiple of checkpointInterval, but this may not align with ledger boundaries as intended.
For example, if checkpointInterval=1000 and you process batches of 100 ledgers, the checkpoint would save at totalProcessed = 1000, 2000, etc. However, this means the checkpoint is saved after processing exactly 1000 ledgers total, regardless of when that occurs within a batch.
Consider whether the checkpoint should be based on ledger numbers instead (e.g., save every N ledgers by checking currentLedger % checkpointInterval == 0) for more predictable behavior.
Suggested change:

```diff
-	} else if totalProcessed%checkpointInterval == 0 {
-		// In batch mode, save based on interval
+	} else if checkpointInterval > 0 && currentLedger%checkpointInterval == 0 {
+		// In batch mode, save checkpoint at ledger boundaries
```
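A small helper shows why ledger-boundary checkpointing is more predictable: the save points land on the same ledger numbers regardless of batch size. `shouldCheckpoint` is an illustrative name, not the adapter's API.

```go
package main

import "fmt"

// shouldCheckpoint implements the suggested condition: save at ledger-number
// boundaries rather than at totalProcessed multiples, and guard against a
// zero interval (which would otherwise checkpoint on every ledger).
func shouldCheckpoint(currentLedger, checkpointInterval uint64) bool {
	return checkpointInterval > 0 && currentLedger%checkpointInterval == 0
}

func main() {
	// With interval 1000, checkpoints land exactly on ledgers 1000, 2000, ...
	// no matter whether batches hold 10 or 100 ledgers.
	for _, l := range []uint64{999, 1000, 1500, 2000} {
		fmt.Printf("ledger %d: checkpoint=%v\n", l, shouldCheckpoint(l, 1000))
	}
}
```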
```go
	// Rate limiting
	if s.config.PollInterval > 0 {
```
Copilot AI · Nov 11, 2025
The rate limiting sleep is applied unconditionally on every iteration, even in batch mode when processing historical data. This can significantly slow down backfills unnecessarily.
Consider only applying rate limiting in streaming mode or when explicitly configured:
```go
// Rate limiting - only in streaming mode or when poll_interval > 0
if streamingMode || s.config.PollInterval > 0 {
	time.Sleep(s.config.PollInterval)
}
```

This allows batch mode to run at full speed (when `poll_interval: 0` is set) while still supporting rate limiting when needed.
Suggested change:

```diff
-	// Rate limiting
-	if s.config.PollInterval > 0 {
+	// Rate limiting - only in streaming mode or when poll_interval > 0
+	if s.isStreamingMode() || s.config.PollInterval > 0 {
```
source_adapter_rpc.go (Outdated)
```go
	// Calculate limit as the number of ledgers in the range
	limit := to - from
	if limit <= 0 {
		limit = 1
	}
```
Copilot AI · Nov 11, 2025
The limit calculation is incorrect. The current logic calculates to - from, but this represents the number of ledgers minus 1 (since the range is inclusive on both ends). For example, if from=100 and to=102, you have 3 ledgers (100, 101, 102), but the calculation yields 102-100=2.
Additionally, the comparison limit <= 0 won't work correctly when to < from, which should be validated earlier.
Suggested fix:

```go
// Validate range
if to < from {
	return nil, fmt.Errorf("invalid range: to (%d) must be >= from (%d)", to, from)
}

// Calculate limit as number of ledgers (inclusive range)
limit := to - from + 1
```

Suggested change:

```diff
-	// Calculate limit as the number of ledgers in the range
-	limit := to - from
-	if limit <= 0 {
-		limit = 1
-	}
+	// Validate range
+	if to < from {
+		return nil, fmt.Errorf("invalid range: to (%d) must be >= from (%d)", to, from)
+	}
+	// Calculate limit as number of ledgers (inclusive range)
+	limit := to - from + 1
```
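The corrected calculation is easy to isolate and unit-test. A sketch with an illustrative `ledgerLimit` helper (the adapter inlines this logic rather than exposing such a function):

```go
package main

import "fmt"

// ledgerLimit applies the reviewer's fix: validate the range, then count
// ledgers inclusively on both ends.
func ledgerLimit(from, to uint64) (uint64, error) {
	if to < from {
		return 0, fmt.Errorf("invalid range: to (%d) must be >= from (%d)", to, from)
	}
	return to - from + 1, nil
}

func main() {
	n, _ := ledgerLimit(100, 102)
	fmt.Println(n) // 3 ledgers: 100, 101, 102 - the old code computed 2

	if _, err := ledgerLimit(105, 100); err != nil {
		fmt.Println(err) // reversed range is rejected instead of clamped to 1
	}
}
```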
source_adapter_rpc.go (Outdated)
```go
	} else {
		// In batch mode, empty response is unexpected
		log.Printf("WARNING: No ledgers returned for range %d-%d", currentLedger, batchEnd)
		break
```
Copilot AI · Nov 11, 2025
In streaming mode, when no ledgers are returned and you break out of the inner if-else (line 771), the code continues to line 774 which tries to access ledgersResp.Result.Ledgers but we just confirmed it's empty (line 745). This will cause the loop to continue with ledgers := [] and skip the processing block, but then attempt to update counters with len(ledgers) = 0 (line 805).
While this doesn't cause an error, the logic flow is confusing. After the continue statement on line 759 or the context cancellation return on line 765, the code after line 772 should never execute. However, the break on line 770 exits only the inner if statement, not the loop, so execution continues to line 774 with an empty result.
This should be restructured to use continue instead of break on line 770 to skip to the next iteration of the main loop.
Suggested change:

```diff
-				break
+				continue
```
```yaml
batch_size: 10  # Fetch 10 ledgers per batch

# Polling configuration
poll_interval: 5  # Poll every 5 seconds when caught up (minimum 1s)
```
Copilot AI · Nov 11, 2025
The comment states "minimum 1s" for poll_interval, but this constraint is only mentioned in the code (line 709-711 in source_adapter_rpc.go) and enforced with a warning. Users might set a value less than 1 second expecting it to work as configured. Consider adding a more prominent note about this requirement:
```yaml
poll_interval: 5  # Poll every 5 seconds when caught up (REQUIRED: minimum 1s in streaming mode)
```
```yaml
- "CCW67TSZV3SSS2HXMBQ5JFGCKJNXKZM7UQUWUZPUTHXSTZLEO7SJMI75"
topics:
  # Transfer event signature: transfer(from: Address, to: Address, amount: i128)
  # Base64 encoded "transfer"
```
Copilot AI · Nov 11, 2025
The comment describes the topic as "Transfer event signature: transfer(from: Address, to: Address, amount: i128)" and states "Base64 encoded 'transfer'", but the actual value shown is the full XDR encoding: AAAADwAAAAh0cmFuc2Zlcg==.
The simpler base64 encoding of just "transfer" would be dHJhbnNmZXI= (as correctly shown in README_RPC_EVENTS.md line 87-88). The XDR encoding includes additional bytes for type information.
Update the comment to clarify this is the XDR-encoded topic, not just base64:
```yaml
# Transfer event signature: transfer(from: Address, to: Address, amount: i128)
# XDR-encoded topic (not just base64 "transfer")
- ["AAAADwAAAAh0cmFuc2Zlcg==", "*", "*", "*"]
```

Suggested change:

```diff
-  # Base64 encoded "transfer"
+  # XDR-encoded topic (not just base64 "transfer")
```
```python
)

# Run forever (blocking call)
ws.run_forever(ping_interval=30, ping_timeout=10)
```
Copilot AI · Nov 11, 2025
The ping_interval and ping_timeout parameters are passed to run_forever(), but the WebSocket library's ping/pong mechanism needs careful handling. If the server doesn't respond to pings within the timeout period, the connection will be closed.
Consider adding error handling and reconnection logic, especially since this is meant to be a test client. Additionally, document these parameters in the file docstring so users understand their purpose:
```python
# Run forever with keepalive settings
# ping_interval: Send ping every 30 seconds to keep connection alive
# ping_timeout: Wait up to 10 seconds for pong response
ws.run_forever(ping_interval=30, ping_timeout=10)
```

```python
import websocket
import json
import sys
import time
```
Copilot AI · Nov 11, 2025
Import of 'time' is not used.
Suggested change:

```diff
-import time
```
Pull Request Review: Update RPC Source Adapter

This PR introduces significant improvements for Stellar/Soroban RPC event processing with batch/streaming modes and checkpointing.

✅ Strengths
Pull Request Review: Update RPC Source Adapter

Overview

This PR introduces significant enhancements to the RPC source adapter, adding support for batch/streaming modes with checkpointing, new pipeline configurations, and comprehensive documentation. Overall, this is a substantial improvement to the pipeline's reliability and flexibility.

🎯 Strengths

1. Excellent Documentation
2. Robust Checkpointing System
3. Good Architecture
4. Production-Ready Features

🐛 Issues Found

CRITICAL: Potential Data Loss in Checkpoint Logic

Location:

```go
// Advance to next ledger based on what was actually processed
if processedCount > 0 {
	// Resume from the ledger after the last one we successfully processed
	currentLedger = lastProcessedLedger + 1
} else {
	// No ledgers processed - in streaming mode this means caught up, in batch mode skip the range
	if !streamingMode {
		log.Printf("WARNING: No ledgers processed in range %d-%d, advancing past range", currentLedger, batchEnd)
	}
	currentLedger = batchEnd // ⚠️ PROBLEM: Skips ledgers in batch mode
}
```

Problem: In batch mode, when no ledgers are processed the code still advances `currentLedger = batchEnd`, permanently skipping the range instead of retrying it.

Recommended Fix:

```go
} else {
	if !streamingMode {
		log.Printf("ERROR: No ledgers processed in range %d-%d, retrying...", currentLedger, batchEnd)
		// Don't advance - retry the same range
		// Add exponential backoff or max retries to prevent infinite loops
		time.Sleep(time.Duration(retryCount) * s.config.PollInterval)
	}
	currentLedger = batchEnd // Only advance in streaming mode
}
```

MAJOR: Race Condition in Stats Tracking

Location:

```go
type EventStats struct {
	ProcessedEvents  uint64
	SuccessfulEvents uint64
	FailedEvents     uint64
	LastLedger       uint64
	LastEventTime    time.Time // ⚠️ Not safe for concurrent access
}

type SorobanSourceAdapter struct {
	// ...
	stats EventStats // ⚠️ No mutex protection
}
```

Problem: The stats fields are read and written without synchronization, so concurrent access races.

Recommended Fix:

```go
type SorobanSourceAdapter struct {
	config     SorobanConfig
	processors []cdpProcessor.Processor
	client     *http.Client
	lastLedger uint64
	stats      EventStats
	statsMux   sync.RWMutex // Add mutex for thread-safe stats access
}
```

MODERATE: Configuration Type Assertion Issues

Location:

```go
authHeader, ok := config["auth_header"].(string)
if !ok {
	return nil, errors.New("auth_header must be specified")
}
```

Problem: Requires `auth_header` to be present, turning an optional setting into a hard error.

Recommended Fix:

```go
authHeader := ""
if ah, ok := config["auth_header"].(string); ok {
	authHeader = ah
}
```

MODERATE: Checkpoint Race Condition on Shutdown

Problem: When context is cancelled, checkpoint is saved immediately. However, if another iteration also tries to save a checkpoint concurrently, there could be a race condition.

Recommended Fix:

```go
type SorobanSourceAdapter struct {
	// ...
	checkpointMux sync.Mutex
}

func (s *SorobanSourceAdapter) saveCheckpoint(ledger, totalProcessed uint64) error {
	s.checkpointMux.Lock()
	defer s.checkpointMux.Unlock()
	// ... existing code
}
```

MINOR: Hardcoded Values

Location:

```go
client: &http.Client{Timeout: 30 * time.Second}, // Hardcoded timeout
```

Issue: HTTP timeout is hardcoded. For large batches or slow RPC endpoints, 30s might be insufficient.

Recommendation: Make configurable:

```go
timeout := 30 * time.Second
if t, ok := config["rpc_timeout"].(int); ok {
	timeout = time.Duration(t) * time.Second
}
client: &http.Client{Timeout: timeout},
```

MINOR: Incomplete Error Context

Location:

```go
ledgerCloseMeta, err := decodeLedgerXDR(ledgerMap)
if err != nil {
	log.Printf("ERROR: Failed to decode ledger XDR for sequence %d: %v", ledgerSeq, err)
	continue // ⚠️ Silently skips ledger
}
```

Issue: When XDR decoding fails, the ledger is skipped without incrementing any error counter or saving state. If this happens frequently, it's hard to detect.

Recommendation:

```go
if err != nil {
	log.Printf("ERROR: Failed to decode ledger XDR for sequence %d: %v", ledgerSeq, err)
	s.stats.FailedEvents++ // Track failures
	continue
}
```

MINOR: Documentation Inconsistency

Location:

```sh
echo -n "transfer" | base64
# Output: dHJhbnNmZXI=
# Full XDR encoding: AAAADwAAAAh0cmFuc2Zlcg==
```

Issue: The comment says "Full XDR encoding" but doesn't explain how to go from the plain base64 to the XDR form.

Recommendation: Add a note explaining this is the XDR-encoded SCVal representation, or provide a link to Stellar docs.

🔒 Security Considerations

1. API Key Exposure

The example configs have placeholders like:

```yaml
# IMPORTANT: Never commit real API keys to version control!
# Use environment variables or secret management instead
auth_header: "${OBSRVR_API_KEY}"  # Example: read from env var
```

2. File Path Traversal

Location:

```go
dir := "/tmp/checkpoints/soroban-source"
if checkpointDir, ok := s.config.Extra["checkpoint_dir"].(string); ok {
	dir = checkpointDir // ⚠️ No validation
}
```

Issue: User-supplied `checkpoint_dir` is used without validation.

Recommendation:

```go
if checkpointDir, ok := s.config.Extra["checkpoint_dir"].(string); ok {
	// Validate path doesn't escape intended directory
	cleaned := filepath.Clean(checkpointDir)
	if !filepath.IsAbs(cleaned) || strings.Contains(cleaned, "..") {
		log.Printf("WARNING: Invalid checkpoint_dir: %s, using default", checkpointDir)
	} else {
		dir = cleaned
	}
}
```

🧪 Test Coverage

Current State: No test files.

Recommended Tests - example:

```go
func TestCheckpointSaveLoad(t *testing.T) {
	adapter := &SorobanSourceAdapter{
		config: SorobanConfig{Extra: map[string]interface{}{}},
	}
	err := adapter.saveCheckpoint(12345, 100)
	require.NoError(t, err)

	checkpoint, err := adapter.loadCheckpoint()
	require.NoError(t, err)
	assert.Equal(t, uint64(12345), checkpoint.LastProcessedLedger)
}
```

📊 Performance Considerations

1. Memory Usage in Large Batches

```go
ledgers := ledgersResp.Result.Ledgers // Entire batch loaded into memory
```

2. No Connection Pooling

```go
client: &http.Client{Timeout: 30 * time.Second},
```

The HTTP client doesn't configure connection pooling.

Recommendation:

```go
client: &http.Client{
	Timeout: 30 * time.Second,
	Transport: &http.Transport{
		MaxIdleConns:    100,
		MaxConnsPerHost: 10,
		IdleConnTimeout: 90 * time.Second,
	},
}
```

✅ Follows CLAUDE.md Guidelines

Good Examples:
Suggestions:

🎓 Best Practices Followed

✅ Atomic file writes for checkpoints

📝 Recommendations Summary

Must Fix Before Merge:
Should Fix:
Nice to Have:

🎉 Conclusion

This is a well-designed, production-quality PR that significantly enhances the pipeline's capabilities. The checkpointing system is solid, documentation is excellent, and the streaming/batch mode distinction is clear. The critical issues are straightforward to fix, and once addressed, this will be a valuable addition to the codebase.

Recommendation: Approve with changes requested for the critical issues. Great work! 👏
This pull request introduces several new configuration files and code changes that enhance the flexibility, reliability, and usability of the pipeline workflow for Stellar/Soroban RPC event processing. The main improvements include support for batch and streaming modes with checkpointing, new pipelines for contract events and USDC transfers, real-time WebSocket streaming, and a test client for WebSocket integration. Additionally, the codebase is updated to support these features, including improved RPC handling and checkpoint management.
Pipeline configuration enhancements:

- Added batch and streaming mode pipeline configurations (rpc-batch-mode.yaml, rpc-streaming-mode.yaml) supporting checkpointing for crash recovery and historical backfills. [1] [2]
- Introduced pipelines for contract events (contract_events_rpc.yaml), USDC transfer events (usdc_transfers_rpc.yaml), and real-time WebSocket streaming (websocket_streaming.yaml). [1] [2] [3]

WebSocket integration:

- Added a Python test client (test_websocket_client.py) for verifying WebSocket server functionality and event streaming.

Codebase improvements for RPC adapter:

- Added a Checkpoint struct for crash recovery.
- Refactored the GetLedgers RPC method to use new parameters and response structures, improving compatibility and clarity. [1] [2]

General improvements: