Skip to content

Conversation

@lorettayao
Copy link

@lorettayao lorettayao commented Oct 6, 2025

Summary
Fix a segfault when evaluating array compound literals like (int[]){1,2,3,4,5} in expression context. The literal now yields the temporary array’s address (via decay) instead of collapsing to a single element, so indexing and reads work correctly.

Motivation
Previously, code like the snippet below crashed because the array literal was reduced to a scalar and later treated as a pointer.

Reproduction (manual)

#include <stdio.h>

int main(void) {
    int *a = (int[]){1,2,3,4,5};   // used to crash later
    printf("Testing basic array compound literal:\n");
    for (int i = 0; i < 5; i++)
        printf("  a[%d] = %d\n", i, a[i]);

    int sum = a[0] + a[4];
    printf("Sum = %d (expect 6)\n", sum);
    return 0;
}

Before: segfault
After: prints a[0..4] and Sum = 6 as expected

Approach (high level)

  • Ensure (type[]){…} produces a real temporary array object and the expression value decays to its address.
  • Only collapse to a scalar when a scalar is actually required by context.
  • No changes to struct compound literals or plain scalars.

Scope

Tests

  • No test files are added in this PR. The manual snippet above demonstrates the fix.

Compatibility

  • Restores standard pointer semantics for array compound literals.
  • No behavioral changes for structs or scalars.

Issue


Summary by cubic

Fix parsing of array compound literals so they allocate a temporary array and decay to its address, preserving pointer semantics and preventing segfaults.

  • Bug Fixes

  • Refactors

    • Added parse_array_compound_literal and scalarize_array_literal to centralize handling; scalarize only when a scalar is required.
    • Emits element writes and counts initializers; tracks brace consumption to avoid double reads.
    • Struct and scalar compound literals remain unchanged.

Written for commit b88294c. Summary will update automatically on new commits.

Implement proper array compound literal handling that emits element
writes, counts initializers, and returns the temporary array pointer
instead of collapsing to the first element. This restores correct
pointer semantics and avoids discarding array literals during parsing.

Struct and scalar compound literals are unchanged. The parser now
tracks whether the closing brace was already consumed by the array
helper to prevent double reads.
Introduce helpers to centralize array-literal scalar decay. Temporary
array compound literals are detected and converted to a scalar only
when a scalar is actually required; otherwise the expression yields
the temporary array’s address, preserving pointer semantics.

Update binary ops, direct/compound assignments, function-call
arguments, and ternary results to use the unified helper instead of
ad-hoc collapsing. This fixes cases where array literals were reduced
to their first element in pointer contexts, while keeping struct and
plain scalar behavior unchanged.

Addresses sysprog21#299 (array compound literals).
cubic-dev-ai[bot]

This comment was marked as outdated.

}
lex_expect(T_close_curly);
var->array_size = count;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a new blank line.

Copy link
Collaborator

@DrXiao DrXiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some test cases to the test suite for validation.

Copy link
Collaborator

@jserv jserv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use static qualifier since shecc does not support. Check 'COMPLIANCE.md' carefully.

@jserv
Copy link
Collaborator

jserv commented Oct 6, 2025

You MUST ensure bootstrapping is fully functional before submitting pull requests.

@visitorckw
Copy link
Collaborator

IIUC, compound literals are a feature supported only since C99, and the shecc README mentions from the very beginning that this project aims to support ANSI C. Therefore, IMO, this at least does not "fix" anything.

@DrXiao
Copy link
Collaborator

DrXiao commented Oct 6, 2025

IIUC, compound literals are a feature supported only since C99, and the shecc README mentions from the very beginning that this project aims to support ANSI C. Therefore, IMO, this at least does not "fix" anything.

As far as I know, shecc is planned to fully support the C99 standard, so I think the term "fix" is acceptable.

@visitorckw
Copy link
Collaborator

Fine, but since we haven't claimed to fully support C99 and, IIUC, array compound literals were never supported before, this seems more like supporting a new feature to me, rather than fixing an existing problem.

@DrXiao
Copy link
Collaborator

DrXiao commented Oct 6, 2025

In fact, shecc has ability to handle array compound literals, but it only captures the first element (#299).

Therefore, this pull request specifically aims to fix it.

@visitorckw
Copy link
Collaborator

In fact, shecc has ability to handle array compound literals, but it only captures the first element (#299).

Therefore, this pull request specifically aims to fix it.

Thanks, that resolves my doubt.
However, should we also mention the issue in the first commit?

Copy link
Collaborator

@jserv jserv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't add tests/array_ptr.c. Instead, consolidate tests/driver.sh.

Copy link
Collaborator

@visitorckw visitorckw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase your branch to keep the git history clean.
Commits that fix problems introduced within the same pull request should be avoided.


add_insn(parent, *bb, OP_read, scalar, array_var, NULL, elem_size, NULL);
return scalar;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a line break.

jserv

This comment was marked as resolved.

@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Nov 7, 2025
Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 7 files

Fix IR test failure.
Regenerate ARM IR snapshots to reflect the corrected lowering.
Restore ARM/RISC-V codegen files to original structure after earlier brace experiment.
Treat zero-length array compound literals as constant zero during lowering so scalar uses don't load garbage.

Fix coding style
@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Nov 7, 2025
@cubic-dev-ai
Copy link

cubic-dev-ai bot commented Nov 7, 2025

Looking at the changes, the core logic for handling array compound literals looks solid—good job centralizing the decay behavior to avoid ad-hoc fixes everywhere. But yeah, there are some style inconsistencies and comment opportunities that could make this even tighter. Here's what I'd suggest, focused on parser.c (the bulk of the changes) and the test additions. I'll keep it practical: specific line tweaks for consistency with the rest of the codebase (e.g., brace placement, naming, spacing) and punchier comments.

Style Fixes for parser.c

  1. Brace placement and indentation in new functions:

    • In parse_array_compound_literal, the if (!lex_peek(T_close_curly, NULL)) block has inconsistent indentation for the inner for loop—align it to match the function's style (2-space indents). Also, add braces around single-statement blocks for consistency (e.g., the count++ line).
      if (!lex_peek(T_close_curly, NULL)) {
          for (;;) {
              read_expr(parent, bb);
              read_ternary_operation(parent, bb);
              var_t *value = opstack_pop();
              if (count == 0)
                  var->init_val = value->init_val;
              if (emit_code) {
                  // ... (existing code)
              }
              count++;  // Add braces if expanding later
              if (!lex_accept(T_comma))
                  break;
              if (lex_peek(T_close_curly, NULL))
                  break;
          }
      }
      This matches the style in read_expr and avoids future bugs.
  2. Variable naming consistency:

    • vd is overused as a generic var_t pointer—rename to something descriptive in read_ternary_operation (e.g., result_var for the final vd). It's clear in context, but vd feels like a holdover from declarations elsewhere.
      var_t *result_var = require_var(parent);
      gen_name_to(result_var->var_name);
      add_insn(parent, then_, OP_assign, result_var, true_val, NULL, 0, NULL);
      add_insn(parent, else_, OP_assign, result_var, false_val, NULL, 0, NULL);
      // ...
      result_var->is_ternary_ret = true;
      opstack_push(result_var);
    • Similarly, in read_body_assignment, rhs_val and rhs are redundant—use rhs consistently and drop the intermediate.
  3. Spacing and operator alignment:

    • In is_array_literal_placeholder, tighten the return condition: return var && var->array_size > 0 && !var->ptr_level && var->var_name[0] == '.';
    • In binary op handling (around line 2908), the new rs1_is_ptr_like checks have good logic but uneven spacing—align the bool defs:
      bool rs1_is_ptr_like = rs1 && (rs1->ptr_level || rs1->array_size);
      bool rs2_is_ptr_like = rs2 && (rs2->ptr_level || rs2->array_size);
    • Remove extra blank lines after function defs (e.g., after scalarize_array_literal) to match the compact style in the original parser.
  4. Minor: Unused params and locals:

    • In scalarize_array_literal, hint_type defaults to TY_int if invalid, but the null check for elem_type could be simplified: type_t *elem_type = hint_type ? hint_type : array_var->type ? array_var->type : TY_int;. Drops the nested if.
    • var->init_val = 0; in a few spots (e.g., zero-length handling) is fine, but ensure it's set consistently—it's already good, just a nit.

Comment Improvements

The inline comments are mostly helpful, but some are wordy or could be more precise. Aim for concise, factual notes that explain why not just what.

  1. Update the placeholder comment (around line 1364):

    • Current: "Identify compiler-emitted temporaries that hold array compound literals. Parsing assigns these temporaries synthetic names via gen_name_to (".tN") and they keep array metadata without pointer indirection."
    • Suggested: "Detect compiler temporaries ('.tN' names) holding array compound literals. These retain array metadata (size/type) without decaying to pointers until needed."
      • Shorter, focuses on the key invariant (no auto-decay).
  2. Add a brief header comment to parse_array_compound_literal:

    /* Parse array compound literal contents: emit element stores to a temporary array.
     * Counts initializers and consumes the closing brace. Used only for array types. */
    void parse_array_compound_literal(var_t *var, /* ... */)
    • Explains purpose without repeating the PR description.
  3. Clarify scalarization calls:

    • In function params (around line 1650), add: // Decay array literals to scalars for non-pointer params, per C semantics.
    • In ternary handling: // Scalarize array literals if the other branch isn't pointer-like, to match types.
    • In assignment: // Decay RHS array literal to scalar if target is non-pointer (e.g., int x = (int[]){1}; uses first elem).
      • These tie back to the motivation (avoiding scalar collapse in pointer contexts) without verbosity.
  4. Zero-length handling (in read_expr_operand):

    • Current implicit via if (compound_var->array_size == 0). Add: // Zero-length arrays default to 0 constant (avoids garbage reads in scalar contexts).

For Tests in driver.sh

  • The new tests are great for coverage, but add a comment block at the top of the array sections:
    # Array compound literal tests: verify decay to pointer in assignments and args
  • In the sum test, the 0, 0 padding is clever for partial init, but add an expected output comment: # Expects sum=6 from first 3 elements.
  • Consistent with existing tests: Use try_ 0 everywhere (you did), and keep EOF indents clean.

These tweaks keep the code readable and maintainable without overhauling anything. Total changes: ~20 lines. Run a style check (assuming you have one) after, and it'll align nicely with the rest of the parser. If you push an updated commit, I can re-review the diffs.

@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Nov 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants