Skip to content

Comments

Harden codebase#137

Merged
kwsantiago merged 6 commits intomainfrom
hardening
Feb 12, 2026
Merged

Harden codebase#137
kwsantiago merged 6 commits intomainfrom
hardening

Conversation

@kwsantiago
Copy link
Contributor

@kwsantiago kwsantiago commented Feb 12, 2026

  • Add KEEP_ASSERT macro with 58 precondition checks on crypto/storage paths
  • Replace all hot-path malloc/free with bounded stack/static buffers
  • Split 4 oversized functions (frost_sign, frost_commit, frost_session_resume, app_main)
  • Consolidate duplicate logging macros into log_compat.h

Summary by CodeRabbit

  • Bug Fixes

    • Hardened input validation, bounds checks, and subscription validation; enforced payload/message size limits and safer publish/send timeouts.
    • JSON-RPC error codes now preserve standard-range codes and default non-RPC errors to a consistent server error code.
  • Improvements

    • New public "assert" error surfaced as a distinct error code.
    • Unified cross-platform logging, tighter memory zeroing, and PSBT moved to fixed-size storage for safer handling.
  • Tests

    • Updated tests to reflect JSON-RPC code defaults and PSBT content validation.

- Add KEEP_ASSERT macro with 58 precondition checks on crypto/storage paths
- Replace all hot-path malloc/free with bounded stack/static buffers
- Split 4 oversized functions (frost_sign, frost_commit, frost_session_resume, app_main)
- Consolidate duplicate logging macros into log_compat.h
@kwsantiago kwsantiago self-assigned this Feb 12, 2026
@coderabbitai
Copy link

coderabbitai bot commented Feb 12, 2026

Warning

Rate limit exceeded

@kwsantiago has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 35 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Walkthrough

Adds assertion-based error handling and ERR_ASSERT, centralizes cross-platform logging, replaces several heap allocations with fixed-size buffers, tightens input validation across modules, and refactors FROST signing/commit flows into validation and execution stages.

Changes

Cohort / File(s) Summary
Error handling & assertions
main/error_codes.h, main/error_codes.c
Adds ERR_CAT_ASSERT/ERR_ASSERT, KEEP_ASSERT* macros, integrates ERR_ASSERT into name/string mappings, and exposes int error_to_jsonrpc_code(int).
Logging compatibility
main/log_compat.h, main/random_utils.c, main/frost_dkg.c, main/frost_signer.c, main/storage_crypto.c, main/storage_export.c
Introduces log_compat.h and migrates modules to use unified ESP_LOG* macros (removes platform-specific ad-hoc logging branches).
FROST signer refactor
main/frost_signer.c, main/frost_signer_core.c, main/frost_signer_storage.c
Splits commit/sign into validate+generate and validate_rng+check_policy+execute stages, adds session resume helpers, standardizes return codes, and adds input guards; mostly internal reorganization.
Coordinator & WebSocket robustness
main/frost_coordinator.c
Adds MAX_WS_MESSAGE_SIZE, MAX_PUBLISH_MSG_SIZE, send timeout constant, safe subscription-id validation, tighter parsing, prompt frees, and timeout-bound sends.
DKG & crypto ops memory hardening
main/frost_dkg.c, main/frost_crypto_ops.c
Replace heap allocations with fixed-size stack buffers, add secure_memzero wiping, and use KEEP_ASSERT preconditions to reduce dynamic-allocation error paths.
Protocol / PSBT change
main/protocol.h, main/protocol.c, test/native/test_protocol.c
Changes rpc_request_t.psbt from char * to char psbt[PROTOCOL_MAX_PSBT_LEN], adjusts parsing/cleanup and tests to check content (psbt[0]) rather than pointer.
Nostr sign payload limits
main/nostr_frost.h, main/nostr_frost_sign.c
Adds MAX_SIGN_PAYLOAD_SIZE (4096), enforces hard payload limits, adjusts allocation/serialization, and tightens cleanup.
Checkpoint & storage safety
main/storage_checkpoint.c, main/storage.c, main/storage_metadata.c, main/storage_crypto.c, main/storage_export.c
Adds CHECKPOINT_MAX_DATA_SIZE, mutex locking for checkpoint ops on ESP, boundary checks, KEEP_ASSERT input guards, and ensures locks/unlocks and secure-zeroing on error paths.
Random utils / logging unification
main/random_utils.c
Unconditionally includes log_compat.h and uses ESP_LOG* macros (removes previous no-op logging branch).
Tests updated
test/native/test_protocol.c
Updated expected JSON-RPC error codes to -32603 for non-JSONRPC errors and adjusted PSBT presence assertions to check psbt[0].

Sequence Diagram(s)

mermaid
sequenceDiagram
autonumber
participant Client
participant FrostSigner as "Frost Signer"
participant RNG
participant Storage
participant Network
Client->>FrostSigner: sign_request(id, msg)
FrostSigner->>RNG: validate_rng()
alt RNG healthy
RNG-->>FrostSigner: ok
FrostSigner->>Storage: check_policy_and_load(session_id)
Storage-->>FrostSigner: policy, shares
FrostSigner->>FrostSigner: validate inputs (KEEP_ASSERT)
FrostSigner->>FrostSigner: generate commitment / signature
FrostSigner->>Storage: optionally checkpoint/update shares
FrostSigner->>Network: emit signature share / response
Network-->>Client: signature/share response
else RNG unhealthy
RNG-->>FrostSigner: fail
FrostSigner-->>Client: error (ERR_ASSERT / ERR_*)
end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Poem

🐰 I twitched my nose and checked the stack,

No leaking mallocs on my track,
KEEP_ASSERT keeps burrows tight,
Logs now glow in cosy light,
Signature hops, secure at night.

🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Harden codebase' is vague and generic, using a non-descriptive term that does not convey meaningful information about the specific changes. Use a more specific title that highlights the main change, such as 'Add assertion macros and refactor hot-path memory allocation' or 'Consolidate logging and add precondition checks'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into main

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch hardening

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
main/nostr_frost.h (1)

77-86: ⚠️ Potential issue | 🟠 Major

Stack overflow risk: frost_sign_request_t is now ~4.2 KB.

Embedding a 4096-byte array directly in the struct means every local frost_sign_request_t variable consumes ~4.2 KB of stack. In frost_coordinator.c's websocket_event_handler, this struct is stack-allocated at line 92 alongside a char msg[4096] buffer (line 74), putting ~8.3 KB on the stack within the same scope. ESP32's default event loop task stack is only ~3.5 KB, making overflow certain.

Consider:

  1. Making frost_sign_request_t static in the coordinator (single active request), or
  2. Keeping the pointer-based payload design with bounded allocation.
main/error_codes.c (1)

359-372: ⚠️ Potential issue | 🟠 Major

error_to_jsonrpc_code default case leaks internal error codes to clients.

The default branch returns the raw internal error code (e.g., 0x0C01 for ERR_ASSERT). JSON-RPC clients should not see internal error taxonomy. Consider mapping all non-protocol errors to -32603 (internal error).

Proposed fix
     default:
-        return code;
+        return -32603; /* JSON-RPC internal error */
     }
🤖 Fix all issues with AI agents
In `@main/frost_coordinator.c`:
- Around line 72-146: The handler currently places a large stack buffer msg
(MAX_WS_MESSAGE_SIZE) and big structs like frost_sign_request_t on the stack
inside the WebSocket callback which can overflow ESP32 task stacks; fix by
removing large stack allocations: copy the incoming payload into heap memory
(malloc length+1) instead of char msg[MAX_WS_MESSAGE_SIZE], or enqueue a
heap-allocated copy of data->data_ptr/length to a worker task so parsing (calls
to cJSON_Parse, frost_parse_sign_request / frost_parse_sign_response /
frost_parse_dkg_round1_event / frost_parse_dkg_round2_event /
frost_parse_nip46_event) runs on a task with a larger stack; alternatively use a
single static buffer protected by g_ctx mutex for msg but still allocate large
result structs (e.g., frost_sign_request_t, frost_sign_response_t,
frost_dkg_round1_t, frost_dkg_round2_t, nip46_request_t) on the heap
(malloc/free) or have parse functions return heap objects; ensure all heap
allocations are freed after callbacks and adjust code paths that currently
expect stack-allocated structs
(frost_sign_request_free/frost_nip46_request_free) accordingly.

In `@main/frost_crypto_ops.c`:
- Around line 87-91: Remove the redundant participant-count check that
duplicates the earlier guard: delete the second "if (group->participant_count >
MAX_GROUP_PARTICIPANTS) { secp256k1_frost_vss_commitments_destroy(vss); return
-4; }" so execution can proceed to declare secp256k1_frost_keygen_secret_share
shares[MAX_GROUP_PARTICIPANTS];; keep the initial check that returns -2 intact
and leave the VSS cleanup only where it is needed (refer to
group->participant_count and secp256k1_frost_vss_commitments_destroy/vss in this
function to locate the duplicated guard).

In `@main/nostr_frost_sign.c`:
- Around line 184-188: The local stack allocation payload_hex (size
MAX_SIGN_PAYLOAD_SIZE * 2 + 1) can be ~8KB and may overflow ESP32 task stacks;
replace the large stack buffer with a heap allocation or a shared static buffer
protected by a mutex. Specifically, in the block that checks
request->payload_len and calls bytes_to_hex and cJSON_AddStringToObject,
allocate payload_hex via malloc(sizeof(...) ) before calling bytes_to_hex and
free it after cJSON_AddStringToObject (or change payload_hex to a static buffer
and guard usage with a mutex if reentrancy is required), ensuring you handle
allocation failure and keep the same behavior for request->payload_len bounds.

In `@main/storage_checkpoint.c`:
- Around line 186-187: Remove the redundant bounds check: the second check
comparing header.data_len to CHECKPOINT_MAX_DATA_SIZE is duplicative of the
earlier check against STORAGE_CHECKPOINT_MAX_SIZE - sizeof(header); keep only
one consistent validation (preferably the existing STORAGE_CHECKPOINT_MAX_SIZE -
sizeof(header) check) and delete the duplicate `if (header.data_len >
CHECKPOINT_MAX_DATA_SIZE) return STORAGE_ERR_INVALID_DATA;` so validation uses a
single canonical limit (header.data_len, STORAGE_CHECKPOINT_MAX_SIZE,
sizeof(header)).
- Around line 120-121: The second bounds check using CHECKPOINT_MAX_DATA_SIZE is
redundant because you already validate len against STORAGE_CHECKPOINT_MAX_SIZE -
sizeof(checkpoint_header_t) earlier; remove the unreachable check (the if (len >
CHECKPOINT_MAX_DATA_SIZE) return STORAGE_ERR_INVALID_DATA;) from the function to
avoid duplicate validation and keep the single canonical check (len >
STORAGE_CHECKPOINT_MAX_SIZE - sizeof(checkpoint_header_t)). Ensure you leave the
original check that references STORAGE_CHECKPOINT_MAX_SIZE and
checkpoint_header_t and keep the same error return STORAGE_ERR_INVALID_DATA
where that first check exists.
- Around line 37-38: The static file-scoped buffer checkpoint_crypto_buf (size
CHECKPOINT_MAX_DATA_SIZE) is shared by storage_checkpoint_save and
storage_checkpoint_load and must be protected for concurrent FreeRTOS tasks; add
a mutex (e.g., a static SemaphoreHandle_t checkpoint_mutex initialized once) and
take the mutex at the start of storage_checkpoint_save and
storage_checkpoint_load and give it before returning to serialize access, or
alternatively document a clear single-threaded invariant if you choose not to
add synchronization; ensure the mutex is created before first use and handle
failures to create/take it gracefully.

In `@main/storage.c`:
- Around line 501-505: storage_load_share currently only uses KEEP_ASSERT for
group and share_hex so when assertions are disabled a NULL group or share_hex
can crash (e.g., inside storage_pad_group_name). Add explicit NULL guards at the
top of storage_load_share: call storage_validate_group_name(group) (or check
group != NULL) and check share_hex != NULL, and return the same error code used
by storage_save_share on invalid args so the behavior matches that function;
ensure you reference storage_pad_group_name and storage_validate_group_name when
adding these checks.
- Around line 565-567: storage_delete_share lacks a real NULL check for the
group parameter (KEEP_ASSERT may be a no-op), which can lead to a crash in
storage_pad_group_name; add an explicit check at the start of
storage_delete_share to handle group == NULL gracefully (e.g., log an error and
return a failure code such as -1 or appropriate errno), ensuring you do not call
storage_pad_group_name when group is NULL and reference the function name
storage_delete_share and storage_pad_group_name so the change is made in the
correct spot.
🧹 Nitpick comments (4)
main/nostr_frost_sign.c (1)

93-103: Residual +1 from null-terminated string era is unnecessary for binary payloads.

The hex_len / 2 + 1 size check (line 93) and out_len argument (line 100) appear to be a leftover from when payload was a null-terminated allocated string. Since payload is now a raw uint8_t buffer and hex_to_bytes writes exactly hex_len / 2 bytes, the + 1 only adds confusion. It doesn't cause a buffer overflow (the guard still keeps it within MAX_SIGN_PAYLOAD_SIZE), but it artificially limits the usable payload size by one byte.

Suggested simplification
-                    if (hex_len / 2 + 1 > MAX_SIGN_PAYLOAD_SIZE) {
+                    if (hex_len / 2 > MAX_SIGN_PAYLOAD_SIZE) {
                         cJSON_Delete(inner);
                         free(decrypted);
                         cJSON_Delete(root);
                         return -7;
                     }
-                    int decoded = hex_to_bytes(payload_hex->valuestring, request->payload,
-                                               hex_len / 2 + 1);
+                    int decoded = hex_to_bytes(payload_hex->valuestring, request->payload,
+                                               MAX_SIGN_PAYLOAD_SIZE);
main/storage_crypto.c (1)

244-252: plaintext_len > 0 assertion has no fallback guard.

The KEEP_ASSERT(plaintext != NULL) at line 244 is backed by the !plaintext check at line 250 — good defense-in-depth. However, KEEP_ASSERT(plaintext_len > 0) at line 248 has no corresponding fallback check. If KEEP_ASSERT is compiled out or becomes a soft warning in some build configurations, a zero-length encrypt call would pass through to mbedtls_gcm_crypt_and_tag unchecked.

The same applies to KEEP_ASSERT(ciphertext_len > 0) at line 280 in storage_crypto_decrypt.

Add fallback check consistent with other guards
     KEEP_ASSERT(plaintext_len > 0);

-    if (!key_initialized || !plaintext || !nonce || !ciphertext || !tag) {
+    if (!key_initialized || !plaintext || !nonce || !ciphertext || !tag || plaintext_len == 0) {
         return -1;
     }

And similarly for decrypt:

     KEEP_ASSERT(ciphertext_len > 0);

-    if (!key_initialized || !ciphertext || !nonce || !tag || !plaintext) {
+    if (!key_initialized || !ciphertext || !nonce || !tag || !plaintext || ciphertext_len == 0) {
         return -1;
     }
main/frost_crypto_ops.c (1)

70-79: KEEP_ASSERT checks duplicate the existing NULL guards below.

Lines 70-73 assert group, round1, secret_shares_out, share_count are non-NULL, then lines 78-79 re-check the exact same pointers. The existing guards are now unreachable (KEEP_ASSERT already returned ERR_ASSERT). This pattern repeats in frost_dkg_finalize (lines 156-166) and frost_sign_partial (lines 237-244).

Not a bug — just dead code after the assertions. Consider removing the redundant if blocks to avoid confusion about which error code is actually returned.

main/storage_checkpoint.c (1)

308-318: KEEP_ASSERT guards followed by now-redundant NULL checks.

Same pattern as noted in frost_crypto_ops.c — the KEEP_ASSERT at lines 309–311 / 378–380 already returns ERR_ASSERT before the manual if (!session_id || !data) checks can execute. The manual checks are dead code.

Also applies to: 377-389

- Heap-allocate large buffers in frost_coordinator WebSocket handler
- Validate subscription IDs to prevent JSON injection
- Replace portMAX_DELAY with finite 10s timeout on WS sends
- Check esp_websocket_client_send_text return values
- Revert frost_sign_request_t.payload to heap pointer (was 4KB on stack)
- Heap-allocate payload_hex in frost_create_sign_request
- Add mutex protection for checkpoint_crypto_buf shared buffer
- Map all non-protocol errors to -32603 in JSON-RPC responses
- Wrap error context output in #ifndef NDEBUG
- Clear sensitive fields (share, dkg_data) in protocol_free_request
- Secure-zero share_hex and result buffers in DKG round2
- Remove dead code and redundant NULL checks after KEEP_ASSERT
- Reduce session resume timeout from 10x to 3x
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@main/frost_coordinator.c`:
- Around line 406-410: The call to esp_websocket_client_send_text in the
unsubscribe path ignores its return value; update the block that iterates
g_ctx.relays (relay_connection_t *relay / COORDINATOR_STATE_CONNECTED /
relay->ws_handle) to capture the return from
esp_websocket_client_send_text(close_msg, strlen(close_msg),
pdMS_TO_TICKS(WS_SEND_TIMEOUT_MS)), check for non-success, and handle failures
(e.g., log via your logging facility, optionally mark relay->state as
disconnected or schedule a retry/cleanup) so send errors are not silently
ignored.
- Around line 87-95: The code around the websocket message handling (check of
data->op_code, allocation of msg, memcpy and call to cJSON_Parse) has
clang-format violations; run clang-format over this block or reformat so spacing
and indentation follow project style (align conditions, space around operators,
and wrap function calls consistently), ensuring the if statement, malloc/memcpy
lines, and the cJSON_Parse/free(msg) lines are formatted correctly; verify the
changes touch the block that uses data->op_code, msg, memcpy, cJSON_Parse, and
free to resolve the CI formatting errors.
- Around line 27-39: The function is_safe_subscription_id has clang-format style
violations; run clang-format (or apply the project's formatting rules) on that
function to restore consistent indentation and spacing, specifically reflow the
declaration and the for-loop/if blocks inside is_safe_subscription_id so they
match the repository's style; locate the function by name
is_safe_subscription_id and update its formatting (no logic changes) so CI's
clang-format check passes.
- Around line 384-386: The subscribe/unsubscribe path calls
esp_websocket_client_send_text without checking its return value; replicate the
publish_event behavior by capturing the return (e.g., int res =
esp_websocket_client_send_text(...)) and if res < 0 log a warning via
ESP_LOGW(TAG, ...) including relay->url and the error code so send failures are
observable; also fix the clang-format violation by ensuring the if-statement and
its body follow project formatting (brace/indentation) in the subscribe block
that references relay->state, relay->ws_handle and WS_SEND_TIMEOUT_MS.

In `@main/storage_checkpoint.c`:
- Around line 44-55: The clang-format failures stem from single-line if bodies
in checkpoint_lock and checkpoint_unlock not matching project style; update the
implementations to use consistent brace formatting and spacing: ensure the
static variable checkpoint_mutex remains, and change checkpoint_lock and
checkpoint_unlock to use explicit braced blocks for the if checks (referencing
functions checkpoint_lock and checkpoint_unlock and variable checkpoint_mutex)
so each conditional body is on its own line with braces and proper indentation
to satisfy clang-format.
🧹 Nitpick comments (3)
main/frost_coordinator.c (1)

96-164: Consider protecting g_ctx fields accessed in the WS handler with the existing mutex.

The handler reads g_ctx.callbacks, g_ctx.current_group, and g_ctx.privkey without holding g_ctx.mutex, while other tasks can modify these via frost_coordinator_set_callbacks, frost_coordinator_set_group, or frost_coordinator_deinit. Since this is a hardening PR and the mutex already exists, it would be a natural addition to guard against data races here.

main/frost_signer.c (1)

240-275: Message length validation reuses SESSION_ID_* constants — consider named constants for clarity.

Lines 254 and 259 use SESSION_ID_HEX_LEN and SESSION_ID_LEN to validate and parse the message. While numerically correct (messages are 32-byte hashes), using session-ID-specific constants for the message makes the code harder to reason about for future readers.

main/storage_checkpoint.c (1)

46-51: checkpoint_lock silently becomes a no-op if called before checkpoint_init.

When checkpoint_mutex is NULL (before checkpoint_init creates it), checkpoint_lock() silently returns without acquiring any lock. While the current call flow always calls checkpoint_init() first (via storage_checkpoint_save/load), a future caller bypassing init would get no locking silently.

Consider at minimum an assert or log, though the current code flow does guard this correctly.

- Revert checkpoint_crypto_buf from 24KB static to heap-allocated
  (fixes dram0_0_seg overflow by 4280 bytes)
- Keep mutex protection for thread safety
- Fix clang-format violations in coordinator, sign, checkpoint
- Add missing direct includes for clang-tidy misc-include-cleaner
- Log warnings on failed subscribe/unsubscribe sends
- Add direct includes to frost_signer_core.c and frost_signer_storage.c
  for clang-tidy misc-include-cleaner
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
main/storage_checkpoint.c (3)

136-151: ⚠️ Potential issue | 🟠 Major

TOCTOU: existing-checkpoint check races with the lock acquisition.

The read of existing at line 137 and the CHECKPOINT_MAGIC check at line 140 happen before checkpoint_lock() at line 151. Two concurrent callers could both see "no existing checkpoint" and both proceed to write, corrupting data. Move the lock acquisition before the existence check (or move the existence check inside the locked region).

Proposed fix
     if (checkpoint_init() != 0)
         return STORAGE_ERR_NOT_INIT;
 
+    checkpoint_lock();
+
     checkpoint_header_t existing;
     esp_err_t err = esp_partition_read(checkpoint_partition, 0, &existing, sizeof(existing));
-    if (err != ESP_OK)
+    if (err != ESP_OK) {
+        checkpoint_unlock();
         return STORAGE_ERR_IO;
+    }
     if (existing.magic == CHECKPOINT_MAGIC) {
+        checkpoint_unlock();
         return STORAGE_ERR_CHECKPOINT_EXISTS;
     }
 
     checkpoint_header_t header;
     memset(&header, 0, sizeof(header));
     header.magic = CHECKPOINT_MAGIC;
     pad_session_id(header.session_id, session_id);
     header.counter = checkpoint_get_counter();
     header.data_len = (uint16_t)len;
 
-    checkpoint_lock();
-
     uint8_t *encrypted = malloc(len);

259-294: ⚠️ Potential issue | 🟠 Major

storage_checkpoint_clear lacks mutex protection.

This function reads, then erases the checkpoint partition without holding checkpoint_lock. It can race with concurrent storage_checkpoint_save or storage_checkpoint_load calls, leading to corrupted reads/writes. Wrap the critical section with checkpoint_lock()/checkpoint_unlock() similar to the save and load paths.


556-561: ⚠️ Potential issue | 🟡 Minor

Mutex is not destroyed in storage_checkpoint_cleanup, leaking the semaphore handle.

If checkpoint_init / storage_checkpoint_cleanup can be called multiple times (e.g., during testing or re-initialization), the checkpoint_mutex semaphore is never freed, leaking a FreeRTOS kernel object each cycle.

Proposed fix
 void storage_checkpoint_cleanup(void) {
+#ifdef ESP_PLATFORM
+    if (checkpoint_mutex) {
+        vSemaphoreDelete(checkpoint_mutex);
+        checkpoint_mutex = NULL;
+    }
+#endif
     checkpoint_initialized = false;
     checkpoint_partition = NULL;
     checkpoint_counter = 0;
     checkpoint_counter_loaded = false;
 }
🤖 Fix all issues with AI agents
In `@main/frost_coordinator.c`:
- Around line 88-167: The code stack-allocates large event structs
(frost_sign_response_t, frost_dkg_round1_t, frost_dkg_round2_t, nip46_request_t)
inside the WEBSOCKET_EVENT_DATA handling, which can blow the stack (especially
frost_dkg_round1_t); change those local stack allocations to heap allocations
instead: replace locals like "frost_dkg_round1_t r1;" with pointers (e.g.
"frost_dkg_round1_t *r1 = malloc(sizeof(*r1));"), check malloc success, pass the
pointer to frost_parse_dkg_round1_event and the callback, free the pointer after
use, and do the same pattern for frost_sign_response_t, frost_dkg_round2_t and
nip46_request_t (also ensure frost_nip46_request_free is still called where
appropriate and avoid double-free). Locate changes around the event_str handling
and the calls to frost_parse_sign_response, frost_parse_dkg_round1_event,
frost_parse_dkg_round2_event, and frost_parse_nip46_event in
frost_coordinator.c.
🧹 Nitpick comments (1)
main/storage_checkpoint.c (1)

360-363: KEEP_ASSERT will crash on invalid input instead of returning an error.

storage_checkpoint_save (line 126) and storage_checkpoint_clear (line 260) return STORAGE_ERR_INVALID_DATA for null/invalid arguments, but storage_save_session_checkpoint uses KEEP_ASSERT which will abort. On an embedded device, crashing due to a null pointer from a caller is a harsh failure mode. Consider using the same graceful error-return pattern for consistency and resilience, reserving KEEP_ASSERT for true invariants that indicate internal logic errors.

- Heap-allocate frost_sign_response_t, frost_dkg_round1_t (~1.1KB),
  frost_dkg_round2_t, nip46_request_t in websocket handler
- Fix TOCTOU: move checkpoint_lock() before existence check in save
- Add mutex protection to storage_checkpoint_clear
- Destroy mutex in storage_checkpoint_cleanup
- Add direct includes for clang-tidy misc-include-cleaner
@kwsantiago kwsantiago merged commit a73b2bb into main Feb 12, 2026
9 checks passed
@kwsantiago kwsantiago deleted the hardening branch February 12, 2026 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant