Conversation
- Add KEEP_ASSERT macro with 58 precondition checks on crypto/storage paths - Replace all hot-path malloc/free with bounded stack/static buffers - Split 4 oversized functions (frost_sign, frost_commit, frost_session_resume, app_main) - Consolidate duplicate logging macros into log_compat.h
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. WalkthroughAdds assertion-based error handling and ERR_ASSERT, centralizes cross-platform logging, replaces several heap allocations with fixed-size buffers, tightens input validation across modules, and refactors FROST signing/commit flows into validation and execution stages. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
main/nostr_frost.h (1)
77-86:⚠️ Potential issue | 🟠 MajorStack overflow risk:
frost_sign_request_tis now ~4.2 KB.Embedding a 4096-byte array directly in the struct means every local
frost_sign_request_tvariable consumes ~4.2 KB of stack. Infrost_coordinator.c'swebsocket_event_handler, this struct is stack-allocated at line 92 alongside achar msg[4096]buffer (line 74), putting ~8.3 KB on the stack within the same scope. ESP32's default event loop task stack is only ~3.5 KB, making overflow certain.Consider:
- Making
frost_sign_request_tstatic in the coordinator (single active request), or- Keeping the pointer-based payload design with bounded allocation.
main/error_codes.c (1)
359-372:⚠️ Potential issue | 🟠 Major
error_to_jsonrpc_codedefault case leaks internal error codes to clients.The
defaultbranch returns the raw internal error code (e.g.,0x0C01forERR_ASSERT). JSON-RPC clients should not see internal error taxonomy. Consider mapping all non-protocol errors to-32603(internal error).Proposed fix
default: - return code; + return -32603; /* JSON-RPC internal error */ }
🤖 Fix all issues with AI agents
In `@main/frost_coordinator.c`:
- Around line 72-146: The handler currently places a large stack buffer msg
(MAX_WS_MESSAGE_SIZE) and big structs like frost_sign_request_t on the stack
inside the WebSocket callback which can overflow ESP32 task stacks; fix by
removing large stack allocations: copy the incoming payload into heap memory
(malloc length+1) instead of char msg[MAX_WS_MESSAGE_SIZE], or enqueue a
heap-allocated copy of data->data_ptr/length to a worker task so parsing (calls
to cJSON_Parse, frost_parse_sign_request / frost_parse_sign_response /
frost_parse_dkg_round1_event / frost_parse_dkg_round2_event /
frost_parse_nip46_event) runs on a task with a larger stack; alternatively use a
single static buffer protected by g_ctx mutex for msg but still allocate large
result structs (e.g., frost_sign_request_t, frost_sign_response_t,
frost_dkg_round1_t, frost_dkg_round2_t, nip46_request_t) on the heap
(malloc/free) or have parse functions return heap objects; ensure all heap
allocations are freed after callbacks and adjust code paths that currently
expect stack-allocated structs
(frost_sign_request_free/frost_nip46_request_free) accordingly.
In `@main/frost_crypto_ops.c`:
- Around line 87-91: Remove the redundant participant-count check that
duplicates the earlier guard: delete the second "if (group->participant_count >
MAX_GROUP_PARTICIPANTS) { secp256k1_frost_vss_commitments_destroy(vss); return
-4; }" so execution can proceed to declare secp256k1_frost_keygen_secret_share
shares[MAX_GROUP_PARTICIPANTS];; keep the initial check that returns -2 intact
and leave the VSS cleanup only where it is needed (refer to
group->participant_count and secp256k1_frost_vss_commitments_destroy/vss in this
function to locate the duplicated guard).
In `@main/nostr_frost_sign.c`:
- Around line 184-188: The local stack allocation payload_hex (size
MAX_SIGN_PAYLOAD_SIZE * 2 + 1) can be ~8KB and may overflow ESP32 task stacks;
replace the large stack buffer with a heap allocation or a shared static buffer
protected by a mutex. Specifically, in the block that checks
request->payload_len and calls bytes_to_hex and cJSON_AddStringToObject,
allocate payload_hex via malloc(sizeof(...) ) before calling bytes_to_hex and
free it after cJSON_AddStringToObject (or change payload_hex to a static buffer
and guard usage with a mutex if reentrancy is required), ensuring you handle
allocation failure and keep the same behavior for request->payload_len bounds.
In `@main/storage_checkpoint.c`:
- Around line 186-187: Remove the redundant bounds check: the second check
comparing header.data_len to CHECKPOINT_MAX_DATA_SIZE is duplicative of the
earlier check against STORAGE_CHECKPOINT_MAX_SIZE - sizeof(header); keep only
one consistent validation (preferably the existing STORAGE_CHECKPOINT_MAX_SIZE -
sizeof(header) check) and delete the duplicate `if (header.data_len >
CHECKPOINT_MAX_DATA_SIZE) return STORAGE_ERR_INVALID_DATA;` so validation uses a
single canonical limit (header.data_len, STORAGE_CHECKPOINT_MAX_SIZE,
sizeof(header)).
- Around line 120-121: The second bounds check using CHECKPOINT_MAX_DATA_SIZE is
redundant because you already validate len against STORAGE_CHECKPOINT_MAX_SIZE -
sizeof(checkpoint_header_t) earlier; remove the unreachable check (the if (len >
CHECKPOINT_MAX_DATA_SIZE) return STORAGE_ERR_INVALID_DATA;) from the function to
avoid duplicate validation and keep the single canonical check (len >
STORAGE_CHECKPOINT_MAX_SIZE - sizeof(checkpoint_header_t)). Ensure you leave the
original check that references STORAGE_CHECKPOINT_MAX_SIZE and
checkpoint_header_t and keep the same error return STORAGE_ERR_INVALID_DATA
where that first check exists.
- Around line 37-38: The static file-scoped buffer checkpoint_crypto_buf (size
CHECKPOINT_MAX_DATA_SIZE) is shared by storage_checkpoint_save and
storage_checkpoint_load and must be protected for concurrent FreeRTOS tasks; add
a mutex (e.g., a static SemaphoreHandle_t checkpoint_mutex initialized once) and
take the mutex at the start of storage_checkpoint_save and
storage_checkpoint_load and give it before returning to serialize access, or
alternatively document a clear single-threaded invariant if you choose not to
add synchronization; ensure the mutex is created before first use and handle
failures to create/take it gracefully.
In `@main/storage.c`:
- Around line 501-505: storage_load_share currently only uses KEEP_ASSERT for
group and share_hex so when assertions are disabled a NULL group or share_hex
can crash (e.g., inside storage_pad_group_name). Add explicit NULL guards at the
top of storage_load_share: call storage_validate_group_name(group) (or check
group != NULL) and check share_hex != NULL, and return the same error code used
by storage_save_share on invalid args so the behavior matches that function;
ensure you reference storage_pad_group_name and storage_validate_group_name when
adding these checks.
- Around line 565-567: storage_delete_share lacks a real NULL check for the
group parameter (KEEP_ASSERT may be a no-op), which can lead to a crash in
storage_pad_group_name; add an explicit check at the start of
storage_delete_share to handle group == NULL gracefully (e.g., log an error and
return a failure code such as -1 or appropriate errno), ensuring you do not call
storage_pad_group_name when group is NULL and reference the function name
storage_delete_share and storage_pad_group_name so the change is made in the
correct spot.
🧹 Nitpick comments (4)
main/nostr_frost_sign.c (1)
93-103: Residual+1from null-terminated string era is unnecessary for binary payloads.The
hex_len / 2 + 1size check (line 93) andout_lenargument (line 100) appear to be a leftover from whenpayloadwas a null-terminated allocated string. Sincepayloadis now a rawuint8_tbuffer andhex_to_byteswrites exactlyhex_len / 2bytes, the+ 1only adds confusion. It doesn't cause a buffer overflow (the guard still keeps it withinMAX_SIGN_PAYLOAD_SIZE), but it artificially limits the usable payload size by one byte.Suggested simplification
- if (hex_len / 2 + 1 > MAX_SIGN_PAYLOAD_SIZE) { + if (hex_len / 2 > MAX_SIGN_PAYLOAD_SIZE) { cJSON_Delete(inner); free(decrypted); cJSON_Delete(root); return -7; } - int decoded = hex_to_bytes(payload_hex->valuestring, request->payload, - hex_len / 2 + 1); + int decoded = hex_to_bytes(payload_hex->valuestring, request->payload, + MAX_SIGN_PAYLOAD_SIZE);main/storage_crypto.c (1)
244-252:plaintext_len > 0assertion has no fallback guard.The
KEEP_ASSERT(plaintext != NULL)at line 244 is backed by the!plaintextcheck at line 250 — good defense-in-depth. However,KEEP_ASSERT(plaintext_len > 0)at line 248 has no corresponding fallback check. IfKEEP_ASSERTis compiled out or becomes a soft warning in some build configurations, a zero-length encrypt call would pass through tombedtls_gcm_crypt_and_tagunchecked.The same applies to
KEEP_ASSERT(ciphertext_len > 0)at line 280 instorage_crypto_decrypt.Add fallback check consistent with other guards
KEEP_ASSERT(plaintext_len > 0); - if (!key_initialized || !plaintext || !nonce || !ciphertext || !tag) { + if (!key_initialized || !plaintext || !nonce || !ciphertext || !tag || plaintext_len == 0) { return -1; }And similarly for decrypt:
KEEP_ASSERT(ciphertext_len > 0); - if (!key_initialized || !ciphertext || !nonce || !tag || !plaintext) { + if (!key_initialized || !ciphertext || !nonce || !tag || !plaintext || ciphertext_len == 0) { return -1; }main/frost_crypto_ops.c (1)
70-79: KEEP_ASSERT checks duplicate the existing NULL guards below.Lines 70-73 assert
group,round1,secret_shares_out,share_countare non-NULL, then lines 78-79 re-check the exact same pointers. The existing guards are now unreachable (KEEP_ASSERT already returnedERR_ASSERT). This pattern repeats infrost_dkg_finalize(lines 156-166) andfrost_sign_partial(lines 237-244).Not a bug — just dead code after the assertions. Consider removing the redundant
ifblocks to avoid confusion about which error code is actually returned.main/storage_checkpoint.c (1)
308-318: KEEP_ASSERT guards followed by now-redundant NULL checks.Same pattern as noted in
frost_crypto_ops.c— theKEEP_ASSERTat lines 309–311 / 378–380 already returnsERR_ASSERTbefore the manualif (!session_id || !data)checks can execute. The manual checks are dead code.Also applies to: 377-389
- Heap-allocate large buffers in frost_coordinator WebSocket handler - Validate subscription IDs to prevent JSON injection - Replace portMAX_DELAY with finite 10s timeout on WS sends - Check esp_websocket_client_send_text return values - Revert frost_sign_request_t.payload to heap pointer (was 4KB on stack) - Heap-allocate payload_hex in frost_create_sign_request - Add mutex protection for checkpoint_crypto_buf shared buffer - Map all non-protocol errors to -32603 in JSON-RPC responses - Wrap error context output in #ifndef NDEBUG - Clear sensitive fields (share, dkg_data) in protocol_free_request - Secure-zero share_hex and result buffers in DKG round2 - Remove dead code and redundant NULL checks after KEEP_ASSERT - Reduce session resume timeout from 10x to 3x
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@main/frost_coordinator.c`:
- Around line 406-410: The call to esp_websocket_client_send_text in the
unsubscribe path ignores its return value; update the block that iterates
g_ctx.relays (relay_connection_t *relay / COORDINATOR_STATE_CONNECTED /
relay->ws_handle) to capture the return from
esp_websocket_client_send_text(close_msg, strlen(close_msg),
pdMS_TO_TICKS(WS_SEND_TIMEOUT_MS)), check for non-success, and handle failures
(e.g., log via your logging facility, optionally mark relay->state as
disconnected or schedule a retry/cleanup) so send errors are not silently
ignored.
- Around line 87-95: The code around the websocket message handling (check of
data->op_code, allocation of msg, memcpy and call to cJSON_Parse) has
clang-format violations; run clang-format over this block or reformat so spacing
and indentation follow project style (align conditions, space around operators,
and wrap function calls consistently), ensuring the if statement, malloc/memcpy
lines, and the cJSON_Parse/free(msg) lines are formatted correctly; verify the
changes touch the block that uses data->op_code, msg, memcpy, cJSON_Parse, and
free to resolve the CI formatting errors.
- Around line 27-39: The function is_safe_subscription_id has clang-format style
violations; run clang-format (or apply the project's formatting rules) on that
function to restore consistent indentation and spacing, specifically reflow the
declaration and the for-loop/if blocks inside is_safe_subscription_id so they
match the repository's style; locate the function by name
is_safe_subscription_id and update its formatting (no logic changes) so CI's
clang-format check passes.
- Around line 384-386: The subscribe/unsubscribe path calls
esp_websocket_client_send_text without checking its return value; replicate the
publish_event behavior by capturing the return (e.g., int res =
esp_websocket_client_send_text(...)) and if res < 0 log a warning via
ESP_LOGW(TAG, ...) including relay->url and the error code so send failures are
observable; also fix the clang-format violation by ensuring the if-statement and
its body follow project formatting (brace/indentation) in the subscribe block
that references relay->state, relay->ws_handle and WS_SEND_TIMEOUT_MS.
In `@main/storage_checkpoint.c`:
- Around line 44-55: The clang-format failures stem from single-line if bodies
in checkpoint_lock and checkpoint_unlock not matching project style; update the
implementations to use consistent brace formatting and spacing: ensure the
static variable checkpoint_mutex remains, and change checkpoint_lock and
checkpoint_unlock to use explicit braced blocks for the if checks (referencing
functions checkpoint_lock and checkpoint_unlock and variable checkpoint_mutex)
so each conditional body is on its own line with braces and proper indentation
to satisfy clang-format.
🧹 Nitpick comments (3)
main/frost_coordinator.c (1)
96-164: Consider protectingg_ctxfields accessed in the WS handler with the existing mutex.The handler reads
g_ctx.callbacks,g_ctx.current_group, andg_ctx.privkeywithout holdingg_ctx.mutex, while other tasks can modify these viafrost_coordinator_set_callbacks,frost_coordinator_set_group, orfrost_coordinator_deinit. Since this is a hardening PR and the mutex already exists, it would be a natural addition to guard against data races here.main/frost_signer.c (1)
240-275: Message length validation reusesSESSION_ID_*constants — consider named constants for clarity.Lines 254 and 259 use
SESSION_ID_HEX_LENandSESSION_ID_LENto validate and parse the message. While numerically correct (messages are 32-byte hashes), using session-ID-specific constants for the message makes the code harder to reason about for future readers.main/storage_checkpoint.c (1)
46-51:checkpoint_locksilently becomes a no-op if called beforecheckpoint_init.When
checkpoint_mutexisNULL(beforecheckpoint_initcreates it),checkpoint_lock()silently returns without acquiring any lock. While the current call flow always callscheckpoint_init()first (viastorage_checkpoint_save/load), a future caller bypassing init would get no locking silently.Consider at minimum an assert or log, though the current code flow does guard this correctly.
- Revert checkpoint_crypto_buf from 24KB static to heap-allocated (fixes dram0_0_seg overflow by 4280 bytes) - Keep mutex protection for thread safety - Fix clang-format violations in coordinator, sign, checkpoint - Add missing direct includes for clang-tidy misc-include-cleaner
- Log warnings on failed subscribe/unsubscribe sends - Add direct includes to frost_signer_core.c and frost_signer_storage.c for clang-tidy misc-include-cleaner
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
main/storage_checkpoint.c (3)
136-151:⚠️ Potential issue | 🟠 MajorTOCTOU: existing-checkpoint check races with the lock acquisition.
The read of
existingat line 137 and theCHECKPOINT_MAGICcheck at line 140 happen beforecheckpoint_lock()at line 151. Two concurrent callers could both see "no existing checkpoint" and both proceed to write, corrupting data. Move the lock acquisition before the existence check (or move the existence check inside the locked region).Proposed fix
if (checkpoint_init() != 0) return STORAGE_ERR_NOT_INIT; + checkpoint_lock(); + checkpoint_header_t existing; esp_err_t err = esp_partition_read(checkpoint_partition, 0, &existing, sizeof(existing)); - if (err != ESP_OK) + if (err != ESP_OK) { + checkpoint_unlock(); return STORAGE_ERR_IO; + } if (existing.magic == CHECKPOINT_MAGIC) { + checkpoint_unlock(); return STORAGE_ERR_CHECKPOINT_EXISTS; } checkpoint_header_t header; memset(&header, 0, sizeof(header)); header.magic = CHECKPOINT_MAGIC; pad_session_id(header.session_id, session_id); header.counter = checkpoint_get_counter(); header.data_len = (uint16_t)len; - checkpoint_lock(); - uint8_t *encrypted = malloc(len);
259-294:⚠️ Potential issue | 🟠 Major
storage_checkpoint_clearlacks mutex protection.This function reads, then erases the checkpoint partition without holding
checkpoint_lock. It can race with concurrentstorage_checkpoint_saveorstorage_checkpoint_loadcalls, leading to corrupted reads/writes. Wrap the critical section withcheckpoint_lock()/checkpoint_unlock()similar to the save and load paths.
556-561:⚠️ Potential issue | 🟡 MinorMutex is not destroyed in
storage_checkpoint_cleanup, leaking the semaphore handle.If
checkpoint_init/storage_checkpoint_cleanupcan be called multiple times (e.g., during testing or re-initialization), thecheckpoint_mutexsemaphore is never freed, leaking a FreeRTOS kernel object each cycle.Proposed fix
void storage_checkpoint_cleanup(void) { +#ifdef ESP_PLATFORM + if (checkpoint_mutex) { + vSemaphoreDelete(checkpoint_mutex); + checkpoint_mutex = NULL; + } +#endif checkpoint_initialized = false; checkpoint_partition = NULL; checkpoint_counter = 0; checkpoint_counter_loaded = false; }
🤖 Fix all issues with AI agents
In `@main/frost_coordinator.c`:
- Around line 88-167: The code stack-allocates large event structs
(frost_sign_response_t, frost_dkg_round1_t, frost_dkg_round2_t, nip46_request_t)
inside the WEBSOCKET_EVENT_DATA handling, which can blow the stack (especially
frost_dkg_round1_t); change those local stack allocations to heap allocations
instead: replace locals like "frost_dkg_round1_t r1;" with pointers (e.g.
"frost_dkg_round1_t *r1 = malloc(sizeof(*r1));"), check malloc success, pass the
pointer to frost_parse_dkg_round1_event and the callback, free the pointer after
use, and do the same pattern for frost_sign_response_t, frost_dkg_round2_t and
nip46_request_t (also ensure frost_nip46_request_free is still called where
appropriate and avoid double-free). Locate changes around the event_str handling
and the calls to frost_parse_sign_response, frost_parse_dkg_round1_event,
frost_parse_dkg_round2_event, and frost_parse_nip46_event in
frost_coordinator.c.
🧹 Nitpick comments (1)
main/storage_checkpoint.c (1)
360-363:KEEP_ASSERTwill crash on invalid input instead of returning an error.
storage_checkpoint_save(line 126) andstorage_checkpoint_clear(line 260) returnSTORAGE_ERR_INVALID_DATAfor null/invalid arguments, butstorage_save_session_checkpointusesKEEP_ASSERTwhich will abort. On an embedded device, crashing due to a null pointer from a caller is a harsh failure mode. Consider using the same graceful error-return pattern for consistency and resilience, reservingKEEP_ASSERTfor true invariants that indicate internal logic errors.
- Heap-allocate frost_sign_response_t, frost_dkg_round1_t (~1.1KB), frost_dkg_round2_t, nip46_request_t in websocket handler - Fix TOCTOU: move checkpoint_lock() before existence check in save - Add mutex protection to storage_checkpoint_clear - Destroy mutex in storage_checkpoint_cleanup - Add direct includes for clang-tidy misc-include-cleaner
Summary by CodeRabbit
Bug Fixes
Improvements
Tests