Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in settings. Use the following commands to manage reviews:
Walkthrough

Adds a health-check feature and per-proposal timeout propagation: a new CLI HealthCheck command, KfpNode::health_check and a HealthCheckComplete event, persistent KeyHealthStatus storage and Keep APIs, and descriptor session timeout validation/propagation across the protocol, node, CLI, and mobile layers.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant CLI as "keep-cli\ncmd_frost_network_health_check"
    participant Keep as "keep-core\nKeep (Vault & Storage)"
    participant Node as "keep-frost-net\nKfpNode"
    participant Peers as "Network\nPeers"
    User->>CLI: health_check(group, relay?, share?, timeout)
    CLI->>Keep: unlock_vault() & read share
    Keep-->>CLI: share_data
    CLI->>Node: init_node_and_announce(share)
    Node->>Peers: discover_peers()
    Peers-->>Node: peer_list
    CLI->>Node: node.health_check(timeout)
    Node->>Peers: ping(peer) (parallel, bounded by timeout)
    Peers-->>Node: pong / no response
    Node-->>CLI: HealthCheckResult{responsive, unresponsive}
    CLI->>Keep: store_health_status(KeyHealthStatus)
    Keep-->>CLI: persisted
    CLI-->>User: summary + history
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
keep-frost-net/src/node/descriptor.rs (1)
32-72: ⚠️ Potential issue | 🟠 Major

Validate `timeout_secs` before creating the local descriptor session.

`request_descriptor_with_timeout` currently accepts any `Option<u64>` and uses it immediately for local session creation. Invalid values can cause local behavior to diverge from protocol constraints (e.g., zero-second sessions) before peer-side validation kicks in.

🔧 Proposed fix
```diff
 pub async fn request_descriptor_with_timeout(
     &self,
     policy: WalletPolicy,
     network: &str,
     own_xpub: &str,
     own_fingerprint: &str,
     timeout_secs: Option<u64>,
 ) -> Result<[u8; 32]> {
+    if let Some(t) = timeout_secs {
+        if t == 0 || t > DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS {
+            return Err(FrostNetError::Protocol(format!(
+                "timeout_secs must be between 1 and {DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS}"
+            )));
+        }
+    }
+
     let our_index = self.share.metadata.identifier;
 @@
     let msg = KfpMessage::DescriptorPropose(payload);
+    msg.validate()
+        .map_err(|e| FrostNetError::Protocol(e.to_string()))?;
     let json = msg.to_json()?;
```

Also applies to: 87-101
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@keep-frost-net/src/node/descriptor.rs` around lines 32 - 72, In request_descriptor_with_timeout validate the timeout_secs parameter before converting/using it to create a local session: check that if timeout_secs is Some(t) then t > 0 (or meets any protocol min) and return an Err early for invalid values; only then set session_timeout and call sessions.create_session_with_timeout. Update the same validation where you later create sessions (the block referenced around create_session_with_timeout) so both uses of session_timeout are guarded and invalid zero or out-of-range durations never produce a local session.
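As a standalone illustration of the guard described above, here is a minimal sketch. The constant value and the `String` error type are assumptions for the example, not the crate's real `FrostNetError` or protocol constant:

```rust
/// Assumed cap; real code should reuse the protocol's
/// DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS constant instead of redefining it.
const DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS: u64 = 86_400;

/// Validate an optional per-proposal timeout before any local session is
/// created. `None` means "use the protocol default"; `Some(t)` must fall
/// in 1..=max so zero-second or unbounded sessions are rejected up front.
fn validate_timeout_secs(timeout_secs: Option<u64>) -> Result<Option<u64>, String> {
    match timeout_secs {
        None => Ok(None),
        Some(t) if (1..=DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS).contains(&t) => Ok(Some(t)),
        Some(t) => Err(format!(
            "timeout_secs must be between 1 and {DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS}, got {t}"
        )),
    }
}
```

Calling this first keeps local session creation and peer-side validation on the same bounds.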
🧹 Nitpick comments (4)
keep-frost-net/src/protocol.rs (1)
275-283: Add explicit boundary tests for `timeout_secs`.

Please add coverage for `None`, `0`, `1`, `DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS`, and `max + 1` to lock this protocol behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@keep-frost-net/src/protocol.rs` around lines 275 - 283, Add unit tests that exercise KfpMessage::DescriptorPropose timeout handling for the five boundary cases: None, 0, 1, DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS, and DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS + 1; locate the logic in protocol.rs around KfpMessage::DescriptorPropose and the timeout_secs field and add tests that construct DescriptorPropose messages (or call the validation path used there) asserting success for None, 1, and DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS and asserting an error for 0 and DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS + 1 so the protocol behavior is fully covered.

keep-mobile/src/lib.rs (1)
743-748: Consider logging callback failures instead of discarding them silently.

Suppressing callback errors makes troubleshooting harder in app integrations.
Suggested observability improvement
```diff
 if let Some(cb) = self.health_callbacks.read().await.as_ref() {
-    let _ = cb.on_health_check_complete(
-        result.responsive.clone(),
-        result.unresponsive.clone(),
-    );
+    if let Err(e) = cb.on_health_check_complete(
+        result.responsive.clone(),
+        result.unresponsive.clone(),
+    ) {
+        tracing::warn!("Health callback error: {e}");
+    }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@keep-mobile/src/lib.rs` around lines 743 - 748, Replace the silent discard of the callback result (the line using "let _ = cb.on_health_check_complete(...)") with explicit handling: capture the Result returned by cb.on_health_check_complete in a variable, match it, and on Err(e) emit an error log that includes a descriptive message, the error value and the relevant context (e.g. result.responsive / result.unresponsive). Update the block that reads self.health_callbacks and calls on_health_check_complete so failed callbacks are logged (using your project's logging facility such as tracing::error! or log::error!), rather than being ignored.

keep-mobile/src/persistence.rs (1)
399-422: `load_health_statuses` should prune stale/corrupt index entries.

Right now dead keys are silently skipped but retained, which can cause persistent read churn and index bloat.
Suggested cleanup pattern (matching descriptor behavior)
```diff
 pub(crate) fn load_health_statuses(storage: &Arc<dyn SecureStorage>) -> Vec<KeyHealthStatusInfo> {
 @@
-    let index = load_health_index(storage);
-    let mut results = Vec::new();
-    for key in index {
-        if let Ok(data) = storage.load_share_by_key(key) {
-            if let Ok(stored) = serde_json::from_slice::<StoredHealthStatus>(&data) {
+    let index = load_health_index(storage);
+    let mut results = Vec::new();
+    let mut stale = Vec::new();
+    for key in &index {
+        match storage.load_share_by_key(key.clone()) {
+            Ok(data) => match serde_json::from_slice::<StoredHealthStatus>(&data) {
+                Ok(stored) => {
                     let stale_age = now.saturating_sub(stored.last_check_timestamp);
                     results.push(KeyHealthStatusInfo {
                         group_pubkey: stored.group_pubkey,
                         share_index: stored.share_index,
                         last_check_timestamp: stored.last_check_timestamp,
                         responsive: stored.responsive,
                         created_at: stored.created_at.unwrap_or(stored.last_check_timestamp),
                         is_stale: stale_age >= keep_core::wallet::KEY_HEALTH_STALE_THRESHOLD_SECS,
                         is_critical: stale_age >= keep_core::wallet::KEY_HEALTH_CRITICAL_THRESHOLD_SECS,
                     });
-            }
+                }
+                Err(_) => stale.push(key.clone()),
+            },
+            Err(_) => stale.push(key.clone()),
         }
     }
+    if !stale.is_empty() {
+        let cleaned: Vec<String> = index.into_iter().filter(|k| !stale.contains(k)).collect();
+        let _ = persist_health_index(storage, &cleaned);
+    }
     results
 }
```
607-631: Health history output should be sorted for deterministic CLI UX.

Current order depends on storage iteration order, so output can be unstable across runs/backends.
Suggested deterministic ordering
```diff
-    let group_statuses: Vec<_> = all_statuses
+    let mut group_statuses: Vec<_> = all_statuses
         .iter()
         .filter(|s| s.group_pubkey == group_pubkey)
         .collect();
+    group_statuses.sort_by_key(|s| s.share_index);
 
     if !group_statuses.is_empty() {
```
Verify each finding against the current code and only fix it if needed. In `@keep-cli/src/commands/frost_network/mod.rs` around lines 607 - 631, The health-history iteration is non-deterministic; make group_statuses mutable (let mut group_statuses) and sort it deterministically before printing — e.g. sort primarily by last_check_timestamp descending (newest first) and secondarily by share_index ascending so ties are stable; use group_statuses.sort_by(|a,b| b.last_check_timestamp.cmp(&a.last_check_timestamp).then(a.share_index.cmp(&b.share_index))) after the collection and then iterate the sorted group_statuses for output.
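The two-key ordering the prompt suggests (newest first, ties broken by share index) can be sketched in isolation. The struct below is a stand-in with assumed field names, not the CLI's real status type:

```rust
/// Minimal stand-in for a per-share health row; field names are assumptions
/// based on the review text.
#[derive(Debug, PartialEq)]
struct StatusRow {
    share_index: u16,
    last_check_timestamp: u64,
}

/// Sort newest-first by timestamp, breaking ties by ascending share index,
/// so output is stable regardless of storage iteration order.
fn sort_statuses(rows: &mut Vec<StatusRow>) {
    rows.sort_by(|a, b| {
        b.last_check_timestamp
            .cmp(&a.last_check_timestamp)
            .then(a.share_index.cmp(&b.share_index))
    });
}
```

`Ordering::then` only consults the secondary key on ties, which is exactly the stability guarantee the review asks for.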
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@keep-core/src/storage.rs`:
- Around line 784-788: delete_health_status currently always returns Ok even
when the record was missing; update it to mirror other delete APIs by returning
an error for a missing record: call backend.delete(HEALTH_STATUS_TABLE,
key.as_bytes()), detect the backend's "not found" outcome (pattern-match or map
the backend error) and map it to KeepError::NotFound (or the module's
equivalent) instead of swallowing it, while still propagating other backend
errors; reference the delete_health_status function and health_status_key helper
when making the change.
In `@keep-frost-net/src/descriptor_session.rs`:
- Around line 406-415: Validate the incoming timeout in
create_session_with_timeout (and the related create_session overload) to reject
Duration::ZERO and excessively large durations: enforce a minimum >
Duration::ZERO and a configurable maximum (e.g., define a MAX_SESSION_TIMEOUT
constant such as Duration::from_secs(86_400) or wire in a configured cap) and
return an Err if the passed timeout is out of bounds; update callers to use the
same validation path or helper function (e.g., add a
validate_session_timeout(timeout: Duration) -> Result<Duration>) so
DescriptorSession construction always receives a sane, clamped timeout and avoid
immediate expiry or unbounded retention.
In `@keep-mobile/src/lib.rs`:
- Around line 731-741: Validate the timeout_secs argument at the start of
health_check before calling runtime.block_on: enforce a lower bound of 1 and an
upper bound (e.g., 300 seconds to match CLI behavior), and if out of range
return an appropriate KeepMobileError (e.g., KeepMobileError::InvalidInput with
a descriptive message). Update the health_check function to check timeout_secs,
error early on invalid values, and only construct
Duration::from_secs(timeout_secs) after the validation so downstream calls
(node.health_check(...)) never receive 0 or an unbounded value.
- Around line 766-777: The call to persistence::persist_health_status is
currently ignoring its Result (let _ = ...), which can hide failures; update the
code around persist_health_status (the call that builds KeyHealthStatusInfo with
group_pubkey, share_index, last_check_timestamp, responsive, created_at,
is_stale, is_critical) to properly handle the Result: at minimum match or use if
let Err(e) = persistence::persist_health_status(...) and log the error with
context (include group_pubkey and share_index) via the existing logger or
return/propagate the error upward where appropriate so failures are not silently
dropped.
---
Outside diff comments:
In `@keep-frost-net/src/node/descriptor.rs`:
- Around line 32-72: In request_descriptor_with_timeout validate the
timeout_secs parameter before converting/using it to create a local session:
check that if timeout_secs is Some(t) then t > 0 (or meets any protocol min) and
return an Err early for invalid values; only then set session_timeout and call
sessions.create_session_with_timeout. Update the same validation where you later
create sessions (the block referenced around create_session_with_timeout) so
both uses of session_timeout are guarded and invalid zero or out-of-range
durations never produce a local session.
---
Nitpick comments:
In `@keep-cli/src/commands/frost_network/mod.rs`:
- Around line 607-631: The health-history iteration is non-deterministic; make
group_statuses mutable (let mut group_statuses) and sort it deterministically
before printing — e.g. sort primarily by last_check_timestamp descending (newest
first) and secondarily by share_index ascending so ties are stable; use
group_statuses.sort_by(|a,b|
b.last_check_timestamp.cmp(&a.last_check_timestamp).then(a.share_index.cmp(&b.share_index)))
after the collection and then iterate the sorted group_statuses for output.
In `@keep-frost-net/src/protocol.rs`:
- Around line 275-283: Add unit tests that exercise
KfpMessage::DescriptorPropose timeout handling for the five boundary cases:
None, 0, 1, DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS, and
DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS + 1; locate the logic in protocol.rs around
KfpMessage::DescriptorPropose and the timeout_secs field and add tests that
construct DescriptorPropose messages (or call the validation path used there)
asserting success for None, 1, and DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS and
asserting an error for 0 and DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS + 1 so the
protocol behavior is fully covered.
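The five boundary cases above lend themselves to a table-driven check. This sketch uses a stand-in validator rather than the real `KfpMessage::DescriptorPropose` path; the constant value is an assumption:

```rust
/// Assumed protocol cap; stands in for the real constant in protocol.rs.
const DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS: u64 = 86_400;

/// Stand-in for the DescriptorPropose timeout validation path.
fn validate_propose_timeout(timeout_secs: Option<u64>) -> Result<(), String> {
    match timeout_secs {
        None => Ok(()),
        Some(t) if (1..=DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS).contains(&t) => Ok(()),
        Some(t) => Err(format!("invalid timeout_secs: {t}")),
    }
}

/// Table-driven coverage of the five boundary cases named in the review.
fn boundary_cases_pass() -> bool {
    let ok = [None, Some(1), Some(DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS)];
    let err = [Some(0), Some(DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS + 1)];
    ok.iter().all(|c| validate_propose_timeout(*c).is_ok())
        && err.iter().all(|c| validate_propose_timeout(*c).is_err())
}
```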
In `@keep-mobile/src/lib.rs`:
- Around line 743-748: Replace the silent discard of the callback result (the
line using "let _ = cb.on_health_check_complete(...)") with explicit handling:
capture the Result returned by cb.on_health_check_complete in a variable, match
it, and on Err(e) emit an error log that includes a descriptive message, the
error value and the relevant context (e.g. result.responsive /
result.unresponsive). Update the block that reads self.health_callbacks and
calls on_health_check_complete so failed callbacks are logged (using your
project's logging facility such as tracing::error! or log::error!), rather than
being ignored.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- keep-cli/src/cli.rs
- keep-cli/src/commands/frost_network/mod.rs
- keep-cli/src/commands/wallet.rs
- keep-cli/src/main.rs
- keep-core/src/backend.rs
- keep-core/src/lib.rs
- keep-core/src/storage.rs
- keep-core/src/wallet.rs
- keep-frost-net/src/descriptor_session.rs
- keep-frost-net/src/lib.rs
- keep-frost-net/src/node/descriptor.rs
- keep-frost-net/src/node/mod.rs
- keep-frost-net/src/protocol.rs
- keep-mobile/src/lib.rs
- keep-mobile/src/persistence.rs
- keep-mobile/src/types.rs
Actionable comments posted: 4
🧹 Nitpick comments (3)
keep-cli/src/commands/frost_network/mod.rs (1)
579-595: Consider graceful handling of partial failures during health status persistence.

If `get_health_status` or `store_health_status` fails for one share index, the entire function returns an error, potentially leaving the database in a partially updated state. Depending on the desired behavior, you might want to log individual failures and continue processing remaining shares.

♻️ Optional: Continue on individual storage failures
```diff
 for (&idx, responsive) in result
     .responsive
     .iter()
     .map(|i| (i, true))
     .chain(result.unresponsive.iter().map(|i| (i, false)))
 {
-    let existing = keep.get_health_status(&group_pubkey, idx)?;
+    let existing = match keep.get_health_status(&group_pubkey, idx) {
+        Ok(s) => s,
+        Err(e) => {
+            tracing::warn!(share_index = idx, error = %e, "failed to get existing health status");
+            None
+        }
+    };
     let created_at = existing.and_then(|s| s.created_at).unwrap_or(now);
     let status = keep_core::wallet::KeyHealthStatus {
         group_pubkey,
         share_index: idx,
         last_check_timestamp: now,
         responsive,
         created_at: Some(created_at),
     };
-    keep.store_health_status(&status)?;
+    if let Err(e) = keep.store_health_status(&status) {
+        tracing::warn!(share_index = idx, error = %e, "failed to store health status");
+    }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@keep-cli/src/commands/frost_network/mod.rs` around lines 579 - 595, The loop currently bails out on the first DB error from keep.get_health_status or keep.store_health_status; change it to handle per-share failures so one bad write/read doesn't abort the whole run: wrap the calls to keep.get_health_status(&group_pubkey, idx) and keep.store_health_status(&status) in per-iteration error handling (e.g., match or if let Err(e) => { log the error including group_pubkey and share index idx and continue } ), and optionally collect encountered errors into a Vec or Count to return/emit after the loop if you need an aggregate failure signal; ensure you still compute created_at from existing when get_health_status succeeds and build status from group_pubkey/idx/now before attempting store.keep-frost-net/src/descriptor_session.rs (1)
21-33: Avoid timeout-cap drift by using the protocol constant directly.

Line 21 hardcodes `86_400` even though descriptor timeout limits already live in protocol constants. Reusing the shared constant keeps bounds consistent across modules.

♻️ Suggested alignment
```diff
 use crate::protocol::{
     KeySlot, WalletPolicy, DESCRIPTOR_ACK_PHASE_TIMEOUT_SECS, DESCRIPTOR_CONTRIBUTION_TIMEOUT_SECS,
-    DESCRIPTOR_FINALIZE_TIMEOUT_SECS, DESCRIPTOR_SESSION_TIMEOUT_SECS, MAX_FINGERPRINT_LENGTH,
+    DESCRIPTOR_FINALIZE_TIMEOUT_SECS, DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS,
+    DESCRIPTOR_SESSION_TIMEOUT_SECS, MAX_FINGERPRINT_LENGTH,
     MAX_XPUB_LENGTH,
 };
 @@
-const MAX_SESSION_TIMEOUT: Duration = Duration::from_secs(86_400);
+const MAX_SESSION_TIMEOUT: Duration =
+    Duration::from_secs(DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS);
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@keep-frost-net/src/descriptor_session.rs` around lines 21 - 33, The file defines a local MAX_SESSION_TIMEOUT using the hardcoded 86_400s; update validate_session_timeout to use the shared protocol timeout constant instead of the magic number: either construct MAX_SESSION_TIMEOUT from the protocol's session timeout constant or compare timeout.as_secs() directly against the protocol constant (import the protocol constant into this module). Change references inside validate_session_timeout (function name: validate_session_timeout, symbol: MAX_SESSION_TIMEOUT) so bounds come from the protocol constant to keep limits consistent across modules.keep-mobile/src/persistence.rs (1)
398-421: Prune stale health-index entries during reads.

At lines 406-419, missing/corrupt records are skipped but never removed from `HEALTH_STATUS_INDEX_KEY`. That causes repeated failed loads and index drift over time. Consider cleaning stale keys the same way `load_descriptors` does.

♻️ Suggested cleanup pattern
```diff
 pub(crate) fn load_health_statuses(storage: &Arc<dyn SecureStorage>) -> Vec<KeyHealthStatusInfo> {
 @@
-    let index = load_health_index(storage);
-    let mut results = Vec::new();
-    for key in index {
-        if let Ok(data) = storage.load_share_by_key(key) {
-            if let Ok(stored) = serde_json::from_slice::<StoredHealthStatus>(&data) {
+    let index = load_health_index(storage);
+    let mut results = Vec::new();
+    let mut stale = Vec::new();
+    for key in &index {
+        match storage.load_share_by_key(key.clone()) {
+            Ok(data) => match serde_json::from_slice::<StoredHealthStatus>(&data) {
                 let stale_age = now.saturating_sub(stored.last_check_timestamp);
                 results.push(KeyHealthStatusInfo {
 @@
-            });
-        }
+            });
+            }
+            Err(_) => stale.push(key.clone()),
+        }
+    }
+
+    if !stale.is_empty() {
+        let cleaned: Vec<String> = index.into_iter().filter(|k| !stale.contains(k)).collect();
+        let _ = persist_health_index(storage, &cleaned);
+    }
-        }
-    }
     results
 }
```
Verify each finding against the current code and only fix it if needed. In `@keep-mobile/src/persistence.rs` around lines 398 - 421, In load_health_statuses: the code currently skips missing/corrupt entries from load_health_index but leaves them in the stored HEALTH_STATUS_INDEX_KEY, causing repeated failures; modify load_health_statuses to build a cleaned index (e.g., iterate over the keys from load_health_index, add only keys that successfully load and parse StoredHealthStatus to results) and remove keys that fail storage.load_share_by_key or serde_json::from_slice; after iteration persist the cleaned index back (use the existing index save helper analogous to load_descriptors’ save logic — e.g., a save_health_index or the same persistence API used elsewhere) so the next read won’t repeatedly encounter bad entries; keep existing KeyHealthStatusInfo construction and stale/critical checks as-is.
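The prune-on-read pattern described above can be shown with a toy storage model. Everything here is a stand-in: a `HashMap` plays the role of `SecureStorage`, and UTF-8 decoding plays the role of `serde_json` parsing:

```rust
use std::collections::HashMap;

/// Toy stand-in for SecureStorage: key -> serialized record bytes.
type Storage = HashMap<String, Vec<u8>>;

/// Load every record listed in the index, keeping only keys that still
/// resolve and parse. Returns the surviving values plus the cleaned index,
/// so the caller can persist the cleaned index and stop re-reading dead keys.
fn load_and_prune(storage: &Storage, index: &[String]) -> (Vec<String>, Vec<String>) {
    let mut results = Vec::new();
    let mut cleaned = Vec::new();
    for key in index {
        // "Parse" = decode as UTF-8; missing or undecodable entries are stale.
        match storage.get(key).and_then(|d| String::from_utf8(d.clone()).ok()) {
            Some(parsed) => {
                results.push(parsed);
                cleaned.push(key.clone());
            }
            None => {} // stale or corrupt: dropped from the cleaned index
        }
    }
    (results, cleaned)
}
```

The key point is that the cleaned index is written back once per load, so a corrupt entry costs one failed read instead of one per call forever.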
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@keep-frost-net/src/node/descriptor.rs`:
- Around line 198-202: The current code silently drops invalid
payload.timeout_secs by filtering into propose_timeout, so replace the
filter/map pattern with explicit validation: check payload.timeout_secs and if
it is Some(t) but t == 0 or t > DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS return an
error (fail fast) with a clear message, otherwise map the valid value to
std::time::Duration::from_secs and assign to propose_timeout (or leave None when
timeout is legitimately absent); reference the existing propose_timeout binding,
payload.timeout_secs, and DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS when implementing
this change.
- Around line 32-40: The function request_descriptor_with_timeout no longer
rejects invalid network values early, allowing bad proposals to be broadcast;
add an early validation of the network parameter inside
request_descriptor_with_timeout (before any session creation or broadcast calls)
that checks the provided network string against the allowed networks and
immediately returns an Err when invalid so invalid proposals cannot proceed;
reference the network parameter and the request_descriptor_with_timeout function
to locate where to insert the guard.
In `@keep-frost-net/src/node/mod.rs`:
- Around line 1036-1050: health_check currently reads peers into all_peers and
then calls ping_peers which takes its own snapshot, causing inconsistent
responsive/unresponsive classification if peers change; fix by taking a single
peers snapshot in health_check (including share_index and any data ping_peers
needs such as PublicKey and last-seen timestamp), add a new helper like
ping_peers_from_snapshot(&self, peers_snapshot: &[(u16, PublicKey, Instant)],
timeout: Duration) -> Result<Vec<u16>> (or change ping_peers to accept a
snapshot), call that from health_check using the same snapshot, and then compute
unresponsive by comparing against that snapshot's share_index values so both
responsive and unresponsive derive from the identical peer set.
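The single-snapshot idea above can be shown with plain share indices. This is a simplified sketch: the real snapshot also carries peer public keys and last-seen timestamps, and `ponged` would come from awaiting pings with a timeout:

```rust
use std::collections::HashSet;

/// Classify peers from one snapshot: both the responsive and unresponsive
/// sets derive from the same peer list, so peers joining or leaving while
/// pings are in flight cannot skew the classification. `snapshot` holds the
/// share indices captured once; `ponged` holds the indices that answered.
fn classify(snapshot: &[u16], ponged: &HashSet<u16>) -> (Vec<u16>, Vec<u16>) {
    let mut responsive = Vec::new();
    let mut unresponsive = Vec::new();
    for &idx in snapshot {
        if ponged.contains(&idx) {
            responsive.push(idx);
        } else {
            unresponsive.push(idx);
        }
    }
    (responsive, unresponsive)
}
```

Because both vectors partition the same snapshot, `responsive.len() + unresponsive.len()` always equals the snapshot size, which is the invariant the review is after.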
In `@keep-mobile/src/lib.rs`:
- Around line 748-753: The code currently swallows errors from the health
callback call in the block that reads self.health_callbacks and invokes
cb.on_health_check_complete(result.responsive.clone(),
result.unresponsive.clone()) using let _ = ..., so change that to capture the
Result and at minimum log failures instead of discarding them; locate the call
to on_health_check_complete and replace the ignored-result pattern with code
that awaits the call result, matches or uses if let Err(e) to emit a clear log
via your logger (or tracing) including the error and context (e.g., which
callback and the responsive/unresponsive values) so callback integration errors
are surfaced.
---
Nitpick comments:
In `@keep-cli/src/commands/frost_network/mod.rs`:
- Around line 579-595: The loop currently bails out on the first DB error from
keep.get_health_status or keep.store_health_status; change it to handle
per-share failures so one bad write/read doesn't abort the whole run: wrap the
calls to keep.get_health_status(&group_pubkey, idx) and
keep.store_health_status(&status) in per-iteration error handling (e.g., match
or if let Err(e) => { log the error including group_pubkey and share index idx
and continue } ), and optionally collect encountered errors into a Vec or Count
to return/emit after the loop if you need an aggregate failure signal; ensure
you still compute created_at from existing when get_health_status succeeds and
build status from group_pubkey/idx/now before attempting store.
In `@keep-frost-net/src/descriptor_session.rs`:
- Around line 21-33: The file defines a local MAX_SESSION_TIMEOUT using the
hardcoded 86_400s; update validate_session_timeout to use the shared protocol
timeout constant instead of the magic number: either construct
MAX_SESSION_TIMEOUT from the protocol's session timeout constant or compare
timeout.as_secs() directly against the protocol constant (import the protocol
constant into this module). Change references inside validate_session_timeout
(function name: validate_session_timeout, symbol: MAX_SESSION_TIMEOUT) so bounds
come from the protocol constant to keep limits consistent across modules.
In `@keep-mobile/src/persistence.rs`:
- Around line 398-421: In load_health_statuses: the code currently skips
missing/corrupt entries from load_health_index but leaves them in the stored
HEALTH_STATUS_INDEX_KEY, causing repeated failures; modify load_health_statuses
to build a cleaned index (e.g., iterate over the keys from load_health_index,
add only keys that successfully load and parse StoredHealthStatus to results)
and remove keys that fail storage.load_share_by_key or serde_json::from_slice;
after iteration persist the cleaned index back (use the existing index save
helper analogous to load_descriptors’ save logic — e.g., a save_health_index or
the same persistence API used elsewhere) so the next read won’t repeatedly
encounter bad entries; keep existing KeyHealthStatusInfo construction and
stale/critical checks as-is.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- keep-cli/src/commands/frost_network/mod.rs
- keep-core/src/storage.rs
- keep-frost-net/src/descriptor_session.rs
- keep-frost-net/src/node/descriptor.rs
- keep-frost-net/src/node/mod.rs
- keep-mobile/src/error.rs
- keep-mobile/src/lib.rs
- keep-mobile/src/persistence.rs
@coderabbitai review

✅ Actions performed: Review triggered.

@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 3
🧹 Nitpick comments (2)
keep-frost-net/src/node/mod.rs (2)
1085-1088: Log ping build/send failures during health checks.

These failures are currently swallowed, which makes transport issues look like pure peer liveness issues and complicates diagnosis.
🪵 Proposed logging for observability
```diff
-for (_, pubkey, _) in peers_snapshot {
-    if let Ok(event) = KfpEventBuilder::ping(&self.keys, pubkey) {
-        let _ = self.client.send_event(&event).await;
-    }
-}
+for (_, pubkey, _) in peers_snapshot {
+    match KfpEventBuilder::ping(&self.keys, pubkey) {
+        Ok(event) => {
+            if let Err(e) = self.client.send_event(&event).await {
+                debug!(peer = %pubkey, error = %e, "Failed to send health-check ping");
+            }
+        }
+        Err(e) => {
+            debug!(peer = %pubkey, error = %e, "Failed to build health-check ping");
+        }
+    }
+}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@keep-frost-net/src/node/mod.rs` around lines 1085 - 1088, The ping build/send failures inside the health-check loop are being ignored; when iterating over peers_snapshot and calling KfpEventBuilder::ping(&self.keys, pubkey) and then self.client.send_event(&event).await, capture and log any Err returned by KfpEventBuilder::ping and the Result from send_event instead of discarding them—use the module's existing logger (or tracing macros) to emit descriptive error messages that include the peer pubkey and the error details so transport failures are distinguishable from peer liveness issues.
1036: Apply timeout bounds at the node API boundary as well.

`KfpNode::health_check` is public, but timeout bounds are only enforced in `keep-mobile/src/lib.rs` (lines 730-735). Adding the same guard here keeps behavior consistent for all callers.

⏱️ Proposed guard in `health_check`
```diff
 pub async fn health_check(&self, timeout: Duration) -> Result<HealthCheckResult> {
+    if timeout < Duration::from_secs(1) || timeout > Duration::from_secs(300) {
+        return Err(FrostNetError::Protocol(format!(
+            "health_check timeout must be between 1 and 300 seconds, got {timeout:?}"
+        )));
+    }
+
     let peers_snapshot: Vec<(u16, PublicKey, std::time::Instant)> = self
         .peers
         .read()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@keep-frost-net/src/node/mod.rs` at line 1036, KfpNode::health_check currently accepts any Duration but timeout bounds are only applied in keep-mobile; replicate the same guard at the API boundary by clamping the incoming timeout to the allowed range before proceeding. In the body of pub async fn health_check(&self, timeout: Duration) -> Result<HealthCheckResult>, apply the same min/max clamp logic used in keep-mobile (or call the shared helper if one exists) to constrain timeout to the configured MIN/MAX values, then use the clamped value for the rest of the function.
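The clamp variant the prompt describes can be sketched in a few lines. The 1–300s bounds mirror the range the review attributes to keep-mobile; the constant names are assumptions for the example:

```rust
use std::time::Duration;

/// Assumed bounds matching the 1-300s range enforced in keep-mobile.
const MIN_HEALTH_TIMEOUT: Duration = Duration::from_secs(1);
const MAX_HEALTH_TIMEOUT: Duration = Duration::from_secs(300);

/// Clamp an arbitrary caller-supplied timeout into the allowed range, the
/// behavior the prompt suggests for the public node API. Clamping (rather
/// than erroring, as the inline comment proposes) trades an explicit failure
/// for guaranteed-sane behavior at the boundary.
fn clamp_health_timeout(timeout: Duration) -> Duration {
    timeout.clamp(MIN_HEALTH_TIMEOUT, MAX_HEALTH_TIMEOUT)
}
```

Whichever of the two policies is chosen (reject out-of-range, as the inline comment suggests, or clamp, as the prompt suggests), applying it inside `health_check` itself keeps every caller consistent.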
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@keep-frost-net/src/node/mod.rs`:
- Around line 398-399: Validate session_timeout before performing any
relay/network side effects by calling DescriptorSessionManager::with_timeout(t)
early and handling its error immediately; if session_timeout is Some(t) attempt
DescriptorSessionManager::with_timeout(t)? before any relay add/connect work and
store the returned manager (or, when None, use DescriptorSessionManager::new()),
then reuse that validated manager for subsequent network setup so invalid
timeouts fail fast and avoid side effects and startup latency.
In `@keep-mobile/src/lib.rs`:
- Around line 738-746: The read guard on self.node (node_guard) is held across
an awaited call to node.health_check and any subsequent callback paths, causing
write contention; fix by minimizing the lock scope: immediately clone or extract
the minimal data needed (e.g., call node.as_ref().ok_or(...)? then clone
node.group_pubkey() into a local group_pubkey String or bytes) and drop
node_guard before calling node.health_check(...). Use a short-lived block or
drop(node_guard) so the awaited health_check and any callbacks run without
holding self.node's read lock; apply the same pattern for the other similar
section that holds node_guard across async work.
- Around line 807-812: Add explicit timeout bounds validation to
wallet_descriptor_propose_with_timeout: check if timeout_secs is Some and reject
values <= 0 or > DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS with a clear
KeepMobileError (mirroring health_check's 1–300 pattern), returning an Err
before forwarding the proposal; also apply the same validation to the other
proposer method around the 837–843 block so both
wallet_descriptor_propose_with_timeout and its sibling perform identical checks
instead of relying on downstream peer filtering.
---
Nitpick comments:
In `@keep-frost-net/src/node/mod.rs`:
- Around line 1085-1088: The ping build/send failures inside the health-check
loop are being ignored; when iterating over peers_snapshot and calling
KfpEventBuilder::ping(&self.keys, pubkey) and then
self.client.send_event(&event).await, capture and log any Err returned by
KfpEventBuilder::ping and the Result from send_event instead of discarding
them—use the module's existing logger (or tracing macros) to emit descriptive
error messages that include the peer pubkey and the error details so transport
failures are distinguishable from peer liveness issues.
- Line 1036: KfpNode::health_check currently accepts any Duration but timeout
bounds are only applied in keep-mobile; replicate the same guard at the API
boundary by clamping the incoming timeout to the allowed range before
proceeding. In the body of pub async fn health_check(&self, timeout: Duration)
-> Result<HealthCheckResult>, apply the same min/max clamp logic used in
keep-mobile (or call the shared helper if one exists) to constrain timeout to
the configured MIN/MAX values, then use the clamped value for the rest of the
function.
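A minimal sketch of the clamp-at-the-boundary approach; the MIN/MAX values are assumptions mirroring the keep-mobile guard described above. `Duration` implements `Ord`, so `Ord::clamp` applies directly:

```rust
use std::time::Duration;

// Assumed bounds mirroring the keep-mobile guard.
const MIN_TIMEOUT: Duration = Duration::from_secs(1);
const MAX_TIMEOUT: Duration = Duration::from_secs(300);

// Clamping at the API boundary gives every front end the same bounds,
// rather than enforcing them in only one caller.
fn clamp_timeout(timeout: Duration) -> Duration {
    timeout.clamp(MIN_TIMEOUT, MAX_TIMEOUT)
}

fn main() {
    println!("{:?}", clamp_timeout(Duration::from_secs(0)));
    println!("{:?}", clamp_timeout(Duration::from_secs(9_999)));
}
```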
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- keep-frost-net/src/node/descriptor.rs
- keep-frost-net/src/node/mod.rs
- keep-mobile/src/lib.rs
- Panic on poisoned mutex instead of silently recovering (H3)
- Bound health check timeout to 3600s max (M1)
- Use hex fallback instead of empty string for bech32 failure (L1)
- Floor ACK timeout at DESCRIPTOR_ACK_TIMEOUT_SECS (ACK asymmetry)
- Fix clippy lint: use function reference in map
- Use Option<u64> for created_at to avoid false staleness on legacy records
- Update lib.rs exports after conflict resolution with main
6acd489 to e6f9cff
Actionable comments posted: 2
♻️ Duplicate comments (2)
keep-mobile/src/lib.rs (1)
738-746: ⚠️ Potential issue | 🟠 Major

Drop `self.node` read guards before awaited network work.

Both blocks keep a `RwLockReadGuard` alive across `.await`, which can delay writers for the full network/timeout window.

🔧 Suggested lock-scope fix
```diff
-    let (group_pubkey, result) = {
-        let node_guard = self.node.read().await;
-        let node = node_guard.as_ref().ok_or(KeepMobileError::NotInitialized)?;
-        let gp = hex::encode(node.group_pubkey());
-        let r = node
-            .health_check(Duration::from_secs(timeout_secs))
-            .await
-            .map_err(|e| KeepMobileError::NetworkError { msg: e.to_string() })?;
-        (gp, r)
-    };
+    let node = {
+        let node_guard = self.node.read().await;
+        node_guard
+            .as_ref()
+            .cloned()
+            .ok_or(KeepMobileError::NotInitialized)?
+    };
+    let group_pubkey = hex::encode(node.group_pubkey());
+    let result = node
+        .health_check(Duration::from_secs(timeout_secs))
+        .await
+        .map_err(|e| KeepMobileError::NetworkError { msg: e.to_string() })?;
```

```diff
-    let session_id = {
-        let node_guard = self.node.read().await;
-        let node = node_guard.as_ref().ok_or(KeepMobileError::NotInitialized)?;
+    let node = {
+        let node_guard = self.node.read().await;
+        node_guard
+            .as_ref()
+            .cloned()
+            .ok_or(KeepMobileError::NotInitialized)?
+    };
+    let session_id = {
         // existing logic...
         node.request_descriptor_with_timeout(
             policy, &network, &xpub, &fingerprint, timeout_secs,
         )
         .await
         .map_err(|e| KeepMobileError::NetworkError { msg: e.to_string() })?
     };
```

```bash
#!/bin/bash
rg -n -C6 'let node_guard = self\.node\.read\(\)\.await;|health_check\(Duration::from_secs|request_descriptor_with_timeout\(' keep-mobile/src/lib.rs
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@keep-mobile/src/lib.rs` around lines 738 - 746, the read guard from self.node.read().await (node_guard) is held across an await (node.health_check(...).await), blocking writers; fix by limiting the guard's scope: acquire self.node.read().await, check that node is Some (returning KeepMobileError::NotInitialized if not), clone or extract only the needed sync data (e.g., call node.group_pubkey() and hex::encode it or clone any request params), then drop the guard (end the block) before calling async functions like node.health_check(Duration::from_secs(timeout_secs)). Do the same pattern for other occurrences such as request_descriptor_with_timeout to ensure no RwLockReadGuard is alive across .await.

keep-frost-net/src/node/descriptor.rs (1)
205-209: ⚠️ Potential issue | 🟠 Major

Fail fast on invalid proposal timeout instead of silently downgrading.

Filtering invalid `payload.timeout_secs` to `None` converts bad input into default behavior; this handler should reject out-of-range values explicitly.

🔒 Suggested strict timeout handling
```diff
-    let propose_timeout = payload
-        .timeout_secs
-        .filter(|&t| t > 0 && t <= DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS)
-        .map(std::time::Duration::from_secs);
+    let propose_timeout = match payload.timeout_secs {
+        Some(0) => {
+            return Err(FrostNetError::Session(
+                "Descriptor timeout must be greater than zero".into(),
+            ));
+        }
+        Some(t) if t > DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS => {
+            return Err(FrostNetError::Session(format!(
+                "Descriptor timeout {t}s exceeds maximum {}s",
+                DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS
+            )));
+        }
+        Some(t) => Some(std::time::Duration::from_secs(t)),
+        None => None,
+    };
```
Verify each finding against the current code and only fix it if needed. In `@keep-frost-net/src/node/descriptor.rs` around lines 205 - 209, The code silently converts out-of-range payload.timeout_secs to None by filtering; instead validate payload.timeout_secs explicitly and return an error when it's Some but <= 0 or > DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS. In the block that currently computes propose_timeout, check payload.timeout_secs: if None, leave propose_timeout None; if Some(t) and t is within (0..=DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS) set propose_timeout = Some(Duration::from_secs(t)); otherwise return an Err (or propagate a validation error) from the surrounding handler so invalid timeouts fail fast; reference the propose_timeout variable, payload.timeout_secs, and DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS to locate and update the logic.
🧹 Nitpick comments (1)
keep-mobile/src/persistence.rs (1)
398-421: Prune stale health-index entries during load.

Missing/corrupt records are ignored but kept in the index, so dead keys can accumulate and be retried forever.
♻️ Suggested cleanup pattern
```diff
 pub(crate) fn load_health_statuses(storage: &Arc<dyn SecureStorage>) -> Vec<KeyHealthStatusInfo> {
 @@
     let index = load_health_index(storage);
     let mut results = Vec::new();
-    for key in index {
-        if let Ok(data) = storage.load_share_by_key(key) {
+    let mut stale = Vec::new();
+    for key in &index {
+        if let Ok(data) = storage.load_share_by_key(key.clone()) {
             if let Ok(stored) = serde_json::from_slice::<StoredHealthStatus>(&data) {
                 let stale_age = now.saturating_sub(stored.last_check_timestamp);
                 results.push(KeyHealthStatusInfo {
 @@
                 });
+            } else {
+                stale.push(key.clone());
             }
+        } else {
+            stale.push(key.clone());
         }
     }
+    if !stale.is_empty() {
+        let cleaned: Vec<String> = index.into_iter().filter(|k| !stale.contains(k)).collect();
+        let _ = persist_health_index(storage, &cleaned);
+    }
     results
 }
```
Verify each finding against the current code and only fix it if needed. In `@keep-mobile/src/persistence.rs` around lines 398 - 421, load_health_statuses currently ignores missing/corrupt entries but leaves their keys in the health index (loaded via load_health_index), causing dead keys to accumulate; modify load_health_statuses so that when storage.load_share_by_key(key) returns Err or serde_json::from_slice::<StoredHealthStatus>(&data) fails it removes that key from the in-memory index and after iterating persists the cleaned index back to storage (call the existing index save function, e.g. save_health_index or equivalent) so only successfully loaded keys remain in the stored index; keep all other logic (timestamp/staleness calculation and constructing KeyHealthStatusInfo) unchanged and ensure errors do not abort the loop.
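The prune-while-loading pattern can be sketched independently of the storage layer. `load_and_prune` and its `load_record` closure are hypothetical stand-ins for `storage.load_share_by_key` plus the JSON decode; the returned `kept` list is what would be re-persisted as the cleaned index:

```rust
// Hypothetical sketch: keys whose records fail to load are dropped from
// the index that gets re-persisted, so they are not retried forever.
fn load_and_prune(
    index: Vec<String>,
    load_record: impl Fn(&str) -> Option<u64>,
) -> (Vec<(String, u64)>, Vec<String>) {
    let mut loaded = Vec::new();
    let mut kept = Vec::new();
    for key in index {
        match load_record(&key) {
            // Readable record: report it and keep its key in the index.
            Some(ts) => {
                loaded.push((key.clone(), ts));
                kept.push(key);
            }
            // Missing/corrupt record: drop the key; the loop never aborts.
            None => {}
        }
    }
    (loaded, kept)
}

fn main() {
    let (loaded, kept) = load_and_prune(
        vec!["a".to_string(), "b".to_string()],
        |k| if k == "a" { Some(42) } else { None },
    );
    println!("{loaded:?} {kept:?}");
}
```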
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@keep-desktop/src/app.rs`:
- Around line 1358-1361: The issue is that matches!(progress,
DescriptorProgress::Contributed) moves progress (DescriptorProgress is Clone but
not Copy) before it is reused; update the condition to borrow progress instead
(e.g., use matches!(&progress, DescriptorProgress::Contributed) or match on
&progress) so the subsequent use of progress when setting s.phase =
SetupPhase::Coordinating(progress) compiles; modify the if condition around
DescriptorProgress and ensure you still pass the owned progress into
SetupPhase::Coordinating when assigning via self.screen -> Screen::Wallet /
WalletScreen.
In `@keep-frost-net/src/node/mod.rs`:
- Around line 1068-1070: The health_check function currently silently clamps the
caller-provided timeout; instead validate the timeout argument and return an
explicit error when it's out of the allowed range. In the health_check method,
remove the timeout.clamp(...) call and add a bounds check against the allowed
minimum and maximum (e.g., Duration::from_secs(1) and Duration::from_secs(300));
if the timeout is outside that range return an Err (use the same
Result/anyhow/Error type used by health_check) with a clear message describing
the invalid timeout value. Keep the rest of health_check logic unchanged (use
the validated timeout variable thereafter).
---
Duplicate comments:
In `@keep-frost-net/src/node/descriptor.rs`:
- Around line 205-209: The code silently converts out-of-range
payload.timeout_secs to None by filtering; instead validate payload.timeout_secs
explicitly and return an error when it's Some but <= 0 or >
DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS. In the block that currently computes
propose_timeout, check payload.timeout_secs: if None, leave propose_timeout
None; if Some(t) and t is within (0..=DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS) set
propose_timeout = Some(Duration::from_secs(t)); otherwise return an Err (or
propagate a validation error) from the surrounding handler so invalid timeouts
fail fast; reference the propose_timeout variable, payload.timeout_secs, and
DESCRIPTOR_SESSION_MAX_TIMEOUT_SECS to locate and update the logic.
In `@keep-mobile/src/lib.rs`:
- Around line 738-746: The read guard from self.node.read().await (node_guard)
is held across an await (node.health_check(...).await), blocking writers; fix by
limiting the guard's scope: acquire self.node.read().await, check that node is
Some (returning KeepMobileError::NotInitialized if not), clone or extract only
the needed sync data (e.g., call node.group_pubkey() and hex::encode it or clone
any request params), then drop the guard (end the block) before calling async
functions like node.health_check(Duration::from_secs(timeout_secs)). Do the same
pattern for other occurrences such as request_descriptor_with_timeout to ensure
no RwLockReadGuard is alive across .await.
---
Nitpick comments:
In `@keep-mobile/src/persistence.rs`:
- Around line 398-421: load_health_statuses currently ignores missing/corrupt
entries but leaves their keys in the health index (loaded via
load_health_index), causing dead keys to accumulate; modify load_health_statuses
so that when storage.load_share_by_key(key) returns Err or
serde_json::from_slice::<StoredHealthStatus>(&data) fails it removes that key
from the in-memory index and after iterating persists the cleaned index back to
storage (call the existing index save function, e.g. save_health_index or
equivalent) so only successfully loaded keys remain in the stored index; keep
all other logic (timestamp/staleness calculation and constructing
KeyHealthStatusInfo) unchanged and ensure errors do not abort the loop.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
- keep-cli/src/cli.rs
- keep-cli/src/commands/frost_network/mod.rs
- keep-cli/src/commands/wallet.rs
- keep-cli/src/main.rs
- keep-core/src/backend.rs
- keep-core/src/lib.rs
- keep-core/src/storage.rs
- keep-core/src/wallet.rs
- keep-desktop/src/app.rs
- keep-frost-net/src/descriptor_session.rs
- keep-frost-net/src/lib.rs
- keep-frost-net/src/node/descriptor.rs
- keep-frost-net/src/node/mod.rs
- keep-frost-net/src/protocol.rs
- keep-mobile/src/error.rs
- keep-mobile/src/lib.rs
- keep-mobile/src/persistence.rs
- keep-mobile/src/types.rs
🚧 Files skipped from review as they are similar to previous changes (4)
- keep-core/src/backend.rs
- keep-cli/src/main.rs
- keep-core/src/wallet.rs
- keep-mobile/src/types.rs
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
keep-frost-net/src/node/descriptor.rs (1)
219-244: ⚠️ Potential issue | 🟠 Major

Don’t collapse all session-creation errors into “duplicate proposal.”

`Err(_)` currently treats every failure as a duplicate session. That can mask real protocol/state errors and make malformed proposals invisible operationally. Only the true “already exists” path should be ignored.

🔧 Suggested fix
```diff
-    let session_created = {
-        let mut sessions = self.descriptor_sessions.write();
-        match sessions.create_session_with_timeout(
-            payload.session_id,
-            self.group_pubkey,
-            payload.policy.clone(),
-            payload.network.clone(),
-            expected_contributors,
-            HashSet::new(),
-            propose_timeout,
-        ) {
-            Ok(session) => {
-                session.set_initiator(sender);
-
-                if let Err(e) = session.add_contribution(
-                    sender_share_index,
-                    payload.initiator_xpub.clone(),
-                    payload.initiator_fingerprint.clone(),
-                ) {
-                    debug!("Failed to store initiator contribution: {e}");
-                }
-                true
-            }
-            Err(_) => {
-                debug!("Descriptor session already exists, ignoring duplicate proposal");
-                false
-            }
-        }
-    };
+    let session_created = {
+        let mut sessions = self.descriptor_sessions.write();
+        if sessions.get_session(&payload.session_id).is_some() {
+            debug!("Descriptor session already exists, ignoring duplicate proposal");
+            false
+        } else {
+            let session = sessions.create_session_with_timeout(
+                payload.session_id,
+                self.group_pubkey,
+                payload.policy.clone(),
+                payload.network.clone(),
+                expected_contributors,
+                HashSet::new(),
+                propose_timeout,
+            )?;
+            session.set_initiator(sender);
+
+            if let Err(e) = session.add_contribution(
+                sender_share_index,
+                payload.initiator_xpub.clone(),
+                payload.initiator_fingerprint.clone(),
+            ) {
+                debug!("Failed to store initiator contribution: {e}");
+            }
+            true
+        }
+    };
```
Verify each finding against the current code and only fix it if needed. In `@keep-frost-net/src/node/descriptor.rs` around lines 219 - 244, The Err arm of the sessions.create_session_with_timeout call currently treats all errors as duplicates; change the match to capture the error (e.g., Err(e)) and explicitly handle the “already exists”/duplicate session variant by comparing e to the specific error (or using matches!(e, SessionError::AlreadyExists | ...)) and return false only for that case, while logging and returning false/propagating (per surrounding convention) for any other error; update the debug message in the non-duplicate branch to include the error details so malformed proposals and real protocol/state errors are visible (refer to sessions.create_session_with_timeout, the Err(_) branch and the surrounding match block).
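Discriminating the benign "already exists" case from real failures can be sketched with a toy error type; `SessionError` and `session_created` here are hypothetical (the real error type belongs to keep-frost-net and may differ):

```rust
// Hypothetical error type standing in for the real session-creation error.
#[derive(Debug, PartialEq)]
enum SessionError {
    AlreadyExists,
    Protocol(String),
}

// Only the benign duplicate is swallowed; every other failure propagates
// with its details instead of masquerading as "duplicate proposal".
fn session_created(result: Result<u64, SessionError>) -> Result<bool, SessionError> {
    match result {
        Ok(_session_id) => Ok(true),
        Err(SessionError::AlreadyExists) => Ok(false),
        Err(e) => Err(e),
    }
}

fn main() {
    println!("{:?}", session_created(Err(SessionError::AlreadyExists)));
}
```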
🧹 Nitpick comments (1)
keep-frost-net/src/node/mod.rs (1)
1084-1088: Consider using a set for responsive membership checks.

`responsive.contains(idx)` in a loop is quadratic. Converting `responsive` to a `HashSet<u16>` first keeps this linear and clearer.
Verify each finding against the current code and only fix it if needed. In `@keep-frost-net/src/node/mod.rs` around lines 1084 - 1088, The current collection of unresponsive peers is doing a membership check against responsive in a loop causing O(n*m) behavior; convert responsive into a HashSet<u16> first (e.g., build responsive_set from responsive via iter().cloned().collect()) and then use responsive_set.contains(idx) inside the peers_snapshot mapping to produce unresponsive, and add use std::collections::HashSet if not already imported; update references to use responsive_set.contains in the block that produces unresponsive.
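The linear-time partition can be sketched as follows; `unresponsive_peers` is an illustrative name, not the actual function in keep-frost-net:

```rust
use std::collections::HashSet;

// Build the set once, then each membership test is O(1), keeping the
// whole partition linear instead of quadratic.
fn unresponsive_peers(peers: &[u16], responsive: &[u16]) -> Vec<u16> {
    let responsive_set: HashSet<u16> = responsive.iter().copied().collect();
    peers
        .iter()
        .copied()
        .filter(|idx| !responsive_set.contains(idx))
        .collect()
}

fn main() {
    println!("{:?}", unresponsive_peers(&[1, 2, 3, 4], &[2, 4]));
}
```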
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@keep-mobile/src/persistence.rs`:
- Around line 414-423: The current use of saturating_sub when computing
stale_age hides future timestamps (last_check_timestamp > now) by yielding 0 and
marking keys as fresh; change the computation to treat future timestamps as aged
by replacing saturating_sub with checked_sub and handling None as a very large
age (e.g., now.checked_sub(stored.last_check_timestamp).unwrap_or(u64::MAX)) so
that is_stale/is_critical (which compare against KEY_HEALTH_STALE_THRESHOLD_SECS
and KEY_HEALTH_CRITICAL_THRESHOLD_SECS) correctly flag skewed future timestamps
as stale/critical; update the stale_age binding accordingly (referencing
stale_age, last_check_timestamp, now, and the KEY_HEALTH_* constants).
- Around line 390-394: The read-modify-write in load_health_index(...) /
persist_health_index(...) is non-atomic and can lose entries under concurrent
writes; change this to an atomic update: use the storage layer's
transaction/compare-and-swap or a retry loop that reloads the index and
re-applies the push until persist_health_index succeeds (or use a mutex around
HEALTH_STATUS_INDEX_KEY updates if storage has no transaction API).
Specifically, locate the block using load_health_index, the local variable
index, the key variable, and the call to persist_health_index and replace it
with an atomic update strategy (transactional write, CAS, or retry-on-conflict)
so two concurrent updates cannot overwrite each other.
---
Outside diff comments:
In `@keep-frost-net/src/node/descriptor.rs`:
- Around line 219-244: The Err arm of the sessions.create_session_with_timeout
call currently treats all errors as duplicates; change the match to capture the
error (e.g., Err(e)) and explicitly handle the “already exists”/duplicate
session variant by comparing e to the specific error (or using matches!(e,
SessionError::AlreadyExists | ...)) and return false only for that case, while
logging and returning false/propagating (per surrounding convention) for any
other error; update the debug message in the non-duplicate branch to include the
error details so malformed proposals and real protocol/state errors are visible
(refer to sessions.create_session_with_timeout, the Err(_) branch and the
surrounding match block).
---
Nitpick comments:
In `@keep-frost-net/src/node/mod.rs`:
- Around line 1084-1088: The current collection of unresponsive peers is doing a
membership check against responsive in a loop causing O(n*m) behavior; convert
responsive into a HashSet<u16> first (e.g., build responsive_set from responsive
via iter().cloned().collect()) and then use responsive_set.contains(idx) inside
the peers_snapshot mapping to produce unresponsive, and add use
std::collections::HashSet if not already imported; update references to use
responsive_set.contains in the block that produces unresponsive.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
keep-frost-net/src/node/descriptor.rskeep-frost-net/src/node/mod.rskeep-mobile/src/persistence.rs
Summary
Summary by CodeRabbit
New Features
Bug Fixes