Skip to content

Fix erlang:raise/3 assert with built stacktrace#2252

Open
bettio wants to merge 1 commit intoatomvm:release-0.7from
bettio:fix-raise-built-stacktrace
Open

Fix erlang:raise/3 assert with built stacktrace#2252
bettio wants to merge 1 commit intoatomvm:release-0.7from
bettio:fix-raise-built-stacktrace

Conversation

@bettio
Copy link
Copy Markdown
Collaborator

@bettio bettio commented Apr 1, 2026

  • Fix assert in stacktrace_exception_class when erlang:raise/3 is called with a built stacktrace and the re-raised exception hits a non-matching catch clause (triggering the compiler-generated OP_RAISE)
  • Wrap the built list into a raw 6-tuple in stacktrace_create_raw_mfa
  • Route OP_RAW_RAISE through HANDLE_ERROR() so both the NIF path and the raw_raise opcode path go through the wrapping logic
  • Add early exit in stacktrace_build for pre-built stacktraces

Fixes #2185

These changes are made under both the "Apache 2.0" and the "GNU Lesser General
Public License 2.1 or later" license terms (dual license).

SPDX-License-Identifier: Apache-2.0 OR LGPL-2.1-or-later

@bettio bettio force-pushed the fix-raise-built-stacktrace branch from 07263bb to 7e62129 Compare April 1, 2026 16:50
@bettio bettio added this to the v0.7.0 milestone Apr 1, 2026
@bettio bettio force-pushed the fix-raise-built-stacktrace branch from 7e62129 to 86ff1db Compare April 1, 2026 16:52
term_put_tuple_element(stack_info, 2, term_from_int(0));
term_put_tuple_element(stack_info, 3, term_from_int(0));
term_put_tuple_element(stack_info, 4, built_stacktrace);
term_put_tuple_element(stack_info, 5, exception_class);

Check failure

Code scanning / CodeQL

Use of term variable after garbage collection Error

Term variable 'exception_class' may hold a stale value after GC call
here
. Defined
here
.
When erlang:raise/3 is called with a built stacktrace (list of
{M,F,A,Loc} tuples) and the re-raised exception passes through a
try/catch whose clauses do not match, the compiler-generated catch-all
(OP_RAISE) calls stacktrace_exception_class, which asserts on a
6-tuple. The built list was stored as-is, never wrapped.

Wrap the built stacktrace into a raw 6-tuple in stacktrace_create_raw_mfa
so OP_RAISE and stacktrace_build can process it. Route OP_RAW_RAISE
through HANDLE_ERROR() so it also hits the wrapping path.

Signed-off-by: Davide Bettio <davide@uninstall.it>
@bettio bettio force-pushed the fix-raise-built-stacktrace branch from 86ff1db to 949690a Compare April 1, 2026 19:32
@petermm
Copy link
Copy Markdown
Contributor

petermm commented Apr 2, 2026

https://ampcode.com/threads/T-019d4d69-3ac6-718e-8d7c-1ad32f287c7b

PR Review: Fix erlang:raise/3 assert with built stacktrace

Commit: 949690aFix erlang:raise/3 assert with built stacktrace
Author: Davide Bettio
Files changed: 6 (stacktrace.c, opcodesswitch.h, test_raise_built_stacktrace.erl, CMakeLists.txt, test.c, CHANGELOG.md)


Summary

When erlang:raise/3 is called with a built stacktrace (list of {M,F,A,Loc} tuples) and the
re-raised exception passes through a try/catch whose clauses do not match, the compiler-generated
catch-all (OP_RAISE) calls stacktrace_exception_class(), which asserts on a 6-tuple. The built
list was stored as-is, never wrapped.

The fix:

  1. Changes OP_RAW_RAISE from goto handle_error to HANDLE_ERROR() so stacktrace_create_raw_mfa() is called
  2. stacktrace_create_raw_mfa() detects a built stacktrace (non-empty list) and wraps it into a raw 6-tuple
  3. stacktrace_build() detects num_frames == 0 and returns the pre-built list directly

Verdict: ✅ Approve with suggestions

The core fix is correct and well-reasoned for the interpreter path. GC root handling is safe,
the num_frames == 0 sentinel is sound (normal stacktraces always have num_frames >= 1), and
TUPLE_SIZE(6) allocation matches. The test covers both the raw_raise opcode and NIF paths.


Issues Found

🔴 Critical: JIT raw_raise path is NOT fixed

The same bug exists in the JIT path. jit_raw_raise() calls jit_handle_error(ctx, jit_state, 0),
which only calls stacktrace_create_raw() when offset != 0 || term_is_invalid_term(stacktrace).
With a built list and offset=0, normalization is skipped — the list flows through unchanged and
will hit the same assertion in jit_raise()stacktrace_exception_class().

File: src/libAtomVM/jit.c

// Line 288-294 — jit_handle_error skips normalization for built lists
static Context *jit_handle_error(Context *ctx, JITState *jit_state, int offset)
{
    if (offset || term_is_invalid_term(ctx->exception_stacktrace)) {
        ctx->exception_stacktrace
            = stacktrace_create_raw(ctx, jit_state->module, offset);
    }
    // ← A built list (from raise/3) passes through here unchanged
// Line 415-422 — jit_raw_raise passes offset=0, built list survives
static Context *jit_raw_raise(Context *ctx, JITState *jit_state)
{
    context_set_exception_class(ctx, ctx->x[0]);
    ctx->exception_reason = ctx->x[1];
    ctx->exception_stacktrace = ctx->x[2];
    return jit_handle_error(ctx, jit_state, 0);
}

Suggested fix:

 static Context *jit_handle_error(Context *ctx, JITState *jit_state, int offset)
 {
-    if (offset || term_is_invalid_term(ctx->exception_stacktrace)) {
+    if (offset || term_is_invalid_term(ctx->exception_stacktrace)
+            || term_is_nonempty_list(ctx->exception_stacktrace)) {
         ctx->exception_stacktrace
             = stacktrace_create_raw(ctx, jit_state->module, offset);
     }

Also, jit_raise() at line 409 calls stacktrace_exception_class(stacktrace) directly on its
argument. If a built list reaches this path, it will assert-fail. This would also need the same
wrapping treatment.


🟡 Minor: Stale comment in OP_RAW_RAISE

The old comment at opcodesswitch.h:5248-5250 says:

"This is an optimization from the compiler where we don't need to call stacktrace_create_raw
here because the stack trace has already been created and set in x[2]."

This is now misleading — the whole point of the fix is that we do call stacktrace_create_raw
(via HANDLE_ERROR()) to normalize built lists. The comment should be updated.

Suggested fix:

             case OP_RAW_RAISE: {

                 TRACE("raw_raise/0\n");

-                // This is an optimization from the compiler where we don't need to call
-                // stacktrace_create_raw here because the stack trace has already been created
-                // and set in x[2].
+                // The compiler emits raw_raise when re-raising with an already
+                // captured stacktrace. We route through HANDLE_ERROR() so that
+                // stacktrace_create_raw_mfa normalizes any built stacktrace
+                // (list of {M,F,A,Loc} tuples) into the internal raw 6-tuple.
                 term ex_class = x_regs[0];

🟡 Minor: Empty stacktrace [] not preserved

erlang:raise(error, badarg, []) is valid in OTP but won't be detected by the
term_is_nonempty_list() check. The empty list will fall through to the normal stacktrace
construction path, producing a new stacktrace instead of preserving [].

This may be acceptable if AtomVM doesn't need full OTP compatibility here, but worth documenting
as a known limitation.


🟢 Nitpick: CHANGELOG wording

"causing an assert" reads slightly better as "causing an assertion failure":

-- Fixed `erlang:raise/3` with a built stacktrace causing an assert when the re-raised exception
+- Fixed `erlang:raise/3` with a built stacktrace causing an assertion failure when the re-raised exception
 passes through a non-matching catch clause

🟢 Nitpick: Test coverage could be stronger

The tests verify "no crash" but don't assert that the preserved stacktrace is correct. Adding a
check that the caught stacktrace remains a list matching the original would strengthen the test.
Also missing: exit/throw exception classes and raise(error, badarg, []).


What's Good

  • GC root handling is correctexception_stacktrace and exception_reason are properly
    rooted through ctx->x[0..1] before memory_ensure_free_with_roots
  • NOLINT(term-use-after-gc) for exception_class is safeexception_class is always an
    atom (immediate value), never a heap term, so GC cannot move it
  • num_frames == 0 sentinel is sound — normal stacktrace construction always does
    num_frames++ at line 211, so 0 is a safe sentinel for "pre-built"
  • No performance regression — the added branch is only on exception paths
  • Test covers both opcode and NIF paths — good coverage of the two entry points

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants