Conversation

@lorettayao lorettayao commented Oct 6, 2025

Summary
Fix a segfault when evaluating array compound literals like (int[]){1,2,3,4,5} in expression context. The literal now yields the temporary array’s address (via decay) instead of collapsing to a single element, so indexing and reads work correctly.

Motivation
Previously, code like the snippet below crashed because the array literal was reduced to a scalar and later treated as a pointer.

Reproduction (manual)

#include <stdio.h>

int main(void) {
    int *a = (int[]){1,2,3,4,5};   // used to crash later
    printf("Testing basic array compound literal:\n");
    for (int i = 0; i < 5; i++)
        printf("  a[%d] = %d\n", i, a[i]);

    int sum = a[0] + a[4];
    printf("Sum = %d (expect 6)\n", sum);
    return 0;
}

Before: segfault
After: prints a[0..4] and Sum = 6 as expected

Approach (high level)

  • Ensure (type[]){…} produces a real temporary array object and the expression value decays to its address.
  • Only collapse to a scalar when a scalar is actually required by context.
  • No changes to struct compound literals or plain scalars.

Scope

Tests

  • No test files are added in this PR. The manual snippet above demonstrates the fix.

Compatibility

  • Restores standard pointer semantics for array compound literals.
  • No behavioral changes for structs or scalars.

Issue


Summary by cubic

Fix parsing of array compound literals so they allocate a temporary array and decay to its address, preserving pointer semantics and preventing segfaults.

  • Bug Fixes

    • Eliminates crashes when using (int[]){...} in expressions, assignments, and parameter passing.
    • Correct behavior across binary ops, pointer arithmetic, ternary results, and function-call (incl. variadic) arguments.
    • Zero-length array literals yield 0 for scalar uses to avoid garbage reads.
    • Addresses #299 ("Fix compound literals to capture all field values, not just first").
  • Refactors

    • Added parse_array_compound_literal and scalarize_array_literal to centralize handling; scalarize only when a scalar is required.
    • Emits element writes and counts initializers; tracks brace consumption to avoid double reads.
    • Struct and scalar compound literals remain unchanged.

Written for commit a4aba54. Summary will update automatically on new commits.

}
lex_expect(T_close_curly);
var->array_size = count;
}
Collaborator

Add a new blank line.

Collaborator

Add a blank line.

Collaborator

@DrXiao DrXiao left a comment

Add some test cases to the test suite for validation.

Collaborator

@jserv jserv left a comment

Don't use the static qualifier, since shecc does not support it. Check 'COMPLIANCE.md' carefully.

@jserv
Collaborator

jserv commented Oct 6, 2025

You MUST ensure bootstrapping is fully functional before submitting pull requests.

@visitorckw
Collaborator

IIUC, compound literals are a feature supported only since C99, and the shecc README mentions from the very beginning that this project aims to support ANSI C. Therefore, IMO, this at least does not "fix" anything.

@DrXiao
Collaborator

DrXiao commented Oct 6, 2025

> IIUC, compound literals are a feature supported only since C99, and the shecc README mentions from the very beginning that this project aims to support ANSI C. Therefore, IMO, this at least does not "fix" anything.

As far as I know, shecc is planned to fully support the C99 standard, so I think the term "fix" is acceptable.

@visitorckw
Collaborator

Fine, but since we haven't claimed to fully support C99 and, IIUC, array compound literals were never supported before, this seems more like supporting a new feature to me, rather than fixing an existing problem.

@DrXiao
Collaborator

DrXiao commented Oct 6, 2025

In fact, shecc already has the ability to handle array compound literals, but it only captures the first element (#299).

Therefore, this pull request specifically aims to fix it.

@visitorckw
Collaborator

> In fact, shecc already has the ability to handle array compound literals, but it only captures the first element (#299).
>
> Therefore, this pull request specifically aims to fix it.

Thanks, that resolves my doubt.
However, should we also mention the issue in the first commit?

Collaborator

@jserv jserv left a comment

Don't add tests/array_ptr.c. Instead, consolidate the tests into tests/driver.sh.

Collaborator

@visitorckw visitorckw left a comment

Please rebase your branch to keep the git history clean.
Commits that fix problems introduced within the same pull request should be avoided.


add_insn(parent, *bb, OP_read, scalar, array_var, NULL, elem_size, NULL);
return scalar;
}
Collaborator

Add a line break.

Collaborator

Ditto. Add a blank line.

@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Nov 7, 2025
@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Nov 7, 2025
cubic-dev-ai bot commented Nov 7, 2025

Looking at the changes, the core logic for handling array compound literals looks solid—good job centralizing the decay behavior to avoid ad-hoc fixes everywhere. But yeah, there are some style inconsistencies and comment opportunities that could make this even tighter. Here's what I'd suggest, focused on parser.c (the bulk of the changes) and the test additions. I'll keep it practical: specific line tweaks for consistency with the rest of the codebase (e.g., brace placement, naming, spacing) and punchier comments.

Style Fixes for parser.c

  1. Brace placement and indentation in new functions:

    • In parse_array_compound_literal, the if (!lex_peek(T_close_curly, NULL)) block has inconsistent indentation for the inner for loop—align it to match the function's style (2-space indents). Also, add braces around single-statement blocks for consistency (e.g., the count++ line).
      if (!lex_peek(T_close_curly, NULL)) {
          for (;;) {
              read_expr(parent, bb);
              read_ternary_operation(parent, bb);
              var_t *value = opstack_pop();
              if (count == 0)
                  var->init_val = value->init_val;
              if (emit_code) {
                  // ... (existing code)
              }
              count++;  // Add braces if expanding later
              if (!lex_accept(T_comma))
                  break;
              if (lex_peek(T_close_curly, NULL))
                  break;
          }
      }
      This matches the style in read_expr and avoids future bugs.
  2. Variable naming consistency:

    • vd is overused as a generic var_t pointer—rename to something descriptive in read_ternary_operation (e.g., result_var for the final vd). It's clear in context, but vd feels like a holdover from declarations elsewhere.
      var_t *result_var = require_var(parent);
      gen_name_to(result_var->var_name);
      add_insn(parent, then_, OP_assign, result_var, true_val, NULL, 0, NULL);
      add_insn(parent, else_, OP_assign, result_var, false_val, NULL, 0, NULL);
      // ...
      result_var->is_ternary_ret = true;
      opstack_push(result_var);
    • Similarly, in read_body_assignment, rhs_val and rhs are redundant—use rhs consistently and drop the intermediate.
  3. Spacing and operator alignment:

    • In is_array_literal_placeholder, tighten the return condition: return var && var->array_size > 0 && !var->ptr_level && var->var_name[0] == '.';
    • In binary op handling (around line 2908), the new rs1_is_ptr_like checks have good logic but uneven spacing—align the bool defs:
      bool rs1_is_ptr_like = rs1 && (rs1->ptr_level || rs1->array_size);
      bool rs2_is_ptr_like = rs2 && (rs2->ptr_level || rs2->array_size);
    • Remove extra blank lines after function defs (e.g., after scalarize_array_literal) to match the compact style in the original parser.
  4. Minor: Unused params and locals:

    • In scalarize_array_literal, hint_type defaults to TY_int if invalid, but the null check for elem_type could be simplified: type_t *elem_type = hint_type ? hint_type : array_var->type ? array_var->type : TY_int;. Drops the nested if.
    • var->init_val = 0; in a few spots (e.g., zero-length handling) is fine, but ensure it's set consistently—it's already good, just a nit.

Comment Improvements

The inline comments are mostly helpful, but some are wordy or could be more precise. Aim for concise, factual notes that explain why not just what.

  1. Update the placeholder comment (around line 1364):

    • Current: "Identify compiler-emitted temporaries that hold array compound literals. Parsing assigns these temporaries synthetic names via gen_name_to (".tN") and they keep array metadata without pointer indirection."
    • Suggested: "Detect compiler temporaries ('.tN' names) holding array compound literals. These retain array metadata (size/type) without decaying to pointers until needed."
      • Shorter, focuses on the key invariant (no auto-decay).
  2. Add a brief header comment to parse_array_compound_literal:

    /* Parse array compound literal contents: emit element stores to a temporary array.
     * Counts initializers and consumes the closing brace. Used only for array types. */
    void parse_array_compound_literal(var_t *var, /* ... */)
    • Explains purpose without repeating the PR description.
  3. Clarify scalarization calls:

    • In function params (around line 1650), add: // Decay array literals to scalars for non-pointer params, per C semantics.
    • In ternary handling: // Scalarize array literals if the other branch isn't pointer-like, to match types.
    • In assignment: // Decay RHS array literal to scalar if target is non-pointer (e.g., int x = (int[]){1}; uses first elem).
      • These tie back to the motivation (avoiding scalar collapse in pointer contexts) without verbosity.
  4. Zero-length handling (in read_expr_operand):

    • Current implicit via if (compound_var->array_size == 0). Add: // Zero-length arrays default to 0 constant (avoids garbage reads in scalar contexts).

For Tests in driver.sh

  • The new tests are great for coverage, but add a comment block at the top of the array sections:
    # Array compound literal tests: verify decay to pointer in assignments and args
  • In the sum test, the 0, 0 padding is clever for partial init, but add an expected output comment: # Expects sum=6 from first 3 elements.
  • Consistent with existing tests: Use try_ 0 everywhere (you did), and keep EOF indents clean.

These tweaks keep the code readable and maintainable without overhauling anything. Total changes: ~20 lines. Run a style check (assuming you have one) after, and it'll align nicely with the rest of the parser. If you push an updated commit, I can re-review the diffs.

@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Nov 7, 2025
@lorettayao lorettayao force-pushed the fix-array-init branch 2 times, most recently from 0de5d30 to 08f1e29 Compare November 9, 2025 09:41
@jserv jserv requested a review from ChAoSUnItY November 14, 2025 08:26
@jserv
Collaborator

jserv commented Nov 14, 2025

I defer to @ChAoSUnItY for confirmation.

fatal("Unsupported truncation operation with invalid target size");
}
return;
case OP_sign_ext: {
Collaborator

Why remove the curly braces when this case branch has a variable declaration?

Author

I didn’t really mean to drop those braces; they were just left behind while trying other tweaks. But it’s fine either way because C99 lets us declare source_size right after the case label without changing behavior.

Collaborator

@ChAoSUnItY ChAoSUnItY Nov 18, 2025

Yes, C99 allows this. However, in our compiler's standard workflow, stage 0 is compiled by GCC with the -Wpedantic option, so this causes the build to emit a warning even though nothing harmful happens in this case.

Author

@lorettayao lorettayao Nov 18, 2025

I reverted it to the original order in my latest commit.

@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Nov 18, 2025
@sysprog21 sysprog21 deleted a comment from cubic-dev-ai bot Nov 18, 2025
@jserv
Collaborator

jserv commented Nov 19, 2025

@lorettayao You can reply to or chat with @cubic-dev-ai so the AI bot can learn from your feedback and provide smarter code reviews over time.

@jserv jserv requested review from DrXiao and visitorckw November 19, 2025 16:04
Collaborator

@DrXiao DrXiao left a comment

Please refer to "How to Write a Git Commit Message" and refine the commits accordingly.

Comment on lines +727 to +757
# Test: Array compound literal decay to pointer in initializer
try_ 0 << EOF
int main(void) {
    int *arr = (int[]){1, 2, 3, 4, 5};
    return arr[0] != 1 || arr[4] != 5;
}
EOF

# Test: Passing array compound literal as pointer argument
try_ 0 << EOF
int sum(int *p, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += p[i];
    return s;
}
int main(void) {
    int s = sum((int[]){1, 2, 3, 0, 0}, 3);
    return s != 6;
}
EOF

Collaborator

Add some test cases to validate array compound literals of the char and short types:

int main(void) {
    char *s = (char[]){'A', 'B', 'C', 'D', 'E'};
    /* ... */
}
int main(void) {
    short *s = (short[]){1, 2, 3, 4, 5};
    /* ... */
}

Collaborator

Design test cases and verify that the compiler can correctly handle ternary operations involving array compound literals:

int *a = (condition) ? &arr : (int[]){1, 2, 3, 4, 5};

Author

Test cases added in the latest commit (dbea9ca).

Collaborator

@jserv jserv left a comment

Rebase onto the latest 'master' branch and refine the git commits.

@ChAoSUnItY
Collaborator

The CI error is caused by the IR regression test. If this test fails, you can run make update-snapshots to update the IR snapshots, then push the updated snapshots so that the test can proceed.

@lorettayao
Author

> The CI error is caused by the IR regression test. If this test fails, you can run make update-snapshots to update the IR snapshots, then push the updated snapshots so that the test can proceed.

@ChAoSUnItY When I run make update-snapshots, only fib-riscv.json and hello-riscv.json are updated. Is that normal? I wonder whether the CI failure is because I did not update fib-arm.json and hello-arm.json.

@ChAoSUnItY
Collaborator

> only fib-riscv.json and hello-riscv.json are updated. Is that normal?

Sometimes a change only affects certain backends, which is normal. If you're not sure whether this is correct, you can run make check-snapshots on your local machine first; if it succeeds there, it should succeed on CI too.

@lorettayao
Author

@jserv I have rebased onto master, and I see that my branch is up to date with 'origin/master'. I am not sure whether the conflict is caused by a rebase issue.

@visitorckw
Collaborator

I guess your origin points to your own repository on github instead of the upstream repository under sysprog21?

@lorettayao
Author

> I guess your origin points to your own repository on github instead of the upstream repository under sysprog21?

Yes! Thank you for pointing that out!

Collaborator

@jserv jserv left a comment

Use git rebase -i to rework commits, squashing intermediate ones.

@DrXiao
Collaborator

DrXiao commented Nov 27, 2025

@lorettayao, note that the commit message body should be wrapped at 72 characters.

Unify and correct the handling of array compound literals across
parsing, semantic analysis, and lowering. The compiler now builds a
temporary array by writing each element, tracking initializer counts,
and returning the array’s address instead of collapsing the literal to
its first element. Decay to a scalar is performed only when a scalar
value is required.

A centralized helper now governs literal decay, replacing scattered
ad-hoc callers in binary operators, assignments, function-call
arguments, and ternary expressions. This resolves cases where array
literals were incorrectly forced to scalars in pointer contexts and
brings the behavior closer to standard C expectations.

The update also corrects scalarization in variadic and pointer
arithmetic contexts, ensures pointer-typed ternary results, handles
zero-length array literals as constant zero, avoids double-brace
consumption in the parser, and regenerates ARM and RISC-V snapshots to
match the corrected lowering. These changes restore correct pointer
semantics for array compound literals and address sysprog21#299.
src/parser.c Outdated
Comment on lines 48 to 51
void parse_array_compound_literal(var_t *var,
                                  block_t *parent,
                                  basic_block_t **bb,
                                  bool emit_code);
Collaborator

I think this forward declaration is unnecessary here.

Author

Yes, it is unnecessary; I will remove it.

}
lex_expect(T_close_curly);
var->array_size = count;
}
Collaborator

Add a blank line.

src/parser.c Outdated
void parse_array_compound_literal(var_t *var,
                                  block_t *parent,
                                  basic_block_t **bb,
                                  bool emit_code)
Collaborator

Is the fourth parameter (emit_code) necessary?

Author

Will remove.


add_insn(parent, *bb, OP_read, scalar, array_var, NULL, elem_size, NULL);
return scalar;
}
Collaborator

Ditto. Add a blank line.

(var->type && var->type->ptr_level > 0));
}

var_t *scalarize_array_literal(block_t *parent,
Collaborator

Add more comments for this function.

Author

Added.

Comment on lines 3631 to 3658
if (true_array && !false_ptr_like)
    true_val = scalarize_array_literal(parent, &then_, true_val,
                                       false_val ? false_val->type : NULL);

rs1 = opstack_pop();
add_insn(parent, else_, OP_assign, vd, rs1, NULL, 0, NULL);
if (false_array && !true_ptr_like)
    false_val = scalarize_array_literal(parent, &else_, false_val,
                                        true_val ? true_val->type : NULL);
Collaborator

Add comments to explain this handling.

Author

Added.

Add regression tests that verify pointer and decay semantics of array
compound literals. The tests cover decay during initialization, passing
array literals to functions that expect pointer arguments, pointer-typed
ternary expressions involving literal branches, and correctness for
char[] and short[] literals in simple computations.

These tests confirm the behavior introduced by the refined compound
literal semantics implemented in the preceding commit.
5 participants