Skip to content

Normalize include paths before source-cache lookup#976

Open
0pendev wants to merge 3 commits intoVexu:masterfrom
0pendev:fix/normalize_include_path
Open

Normalize include paths before source-cache lookup#976
0pendev wants to merge 3 commits intoVexu:masterfrom
0pendev:fix/normalize_include_path

Conversation

@0pendev
Copy link
Copy Markdown
Contributor

@0pendev 0pendev commented Mar 12, 2026

PR: Normalize include paths before source-cache lookup

Summary

  • Fix duplicate source loading when the same header is reached via different relative ../ paths (e.g. common/shared.h vs handler/../common/shared.h).
  • Normalize include paths with std.fs.path.resolvePosix before keying into the source cache, so that logically identical files map to a single cache entry.
  • Store the normalized path in the cache so all subsequent lookups (regardless of the original include spelling) hit the same entry.

Problem

The source cache in Compilation.addSourceFromPathExtra was keyed on the raw include-path string as provided by the preprocessor. When the same physical header file was included from different directories, each directory produced a different relative path containing ../ segments that were never collapsed.

For example, given this layout:

project/
├── common/
│   └── shared.h          (#pragma once, defines shared_t)
├── handler/
│   └── handler_types.h   (#include "../common/shared.h")
└── main.c                (#include "common/shared.h"
                            #include "handler/handler_types.h")

When the preprocessor processes these includes, the cache sees two different keys for the same file:

Include site Path passed to source cache
main.c common/shared.h
handler/handler_types.h handler/../common/shared.h

Because these two strings differ, the cache treated them as two distinct sources, even though they refer to the same file on disk. This caused:

  1. #pragma once failures — the guard is per-source-ID, so the second load didn't recognize the file as already included, leading to duplicate type definitions or, in pathological ./-loop cases, infinite self-include recursion.
  2. Redundant preprocessing — the same header was lexed and preprocessed multiple times, wasting work proportional to the number of distinct path spellings.

Reproducer

Create the following file tree:

common/shared.h

#pragma once

typedef struct {
    int id;
    int value;
} shared_t;

handler/handler_types.h

#pragma once
#include "../common/shared.h"

typedef struct {
    shared_t state;
    int flags;
} handler_info_t;

main.c

#include "common/shared.h"
#include "handler/handler_types.h"

shared_t global_state;
handler_info_t handler_info;

Without the fix, compiling main.c with arocc produces a duplicate definition error for shared_t because #pragma once fails to deduplicate the two source-cache entries for common/shared.h.

Fix

In src/aro/Compilation.zig, inside addSourceFromPathExtra:

  1. After the initial raw-path cache lookup, normalize the path using std.fs.path.resolvePosix to collapse /./" and X/../ segments.
  2. If the normalized path differs from the raw path, perform a second cache lookup with the normalized form.
  3. When inserting a new source into the cache, use the normalized path as the key so future lookups (from any spelling) converge on the same entry.
fn addSourceFromPathExtra(comp: *Compilation, path: []const u8, kind: Source.Kind) !Source {
    if (comp.sources.get(path)) |some| return some;

    // Normalize the path by collapsing "/./" and "X/../" segments so that
    // different relative paths to the same file resolve to one cache entry.
    // This prevents #pragma once failures and redundant preprocessing.
    const normalized = try std.fs.path.resolvePosix(comp.gpa, &.{path});
    defer comp.gpa.free(normalized);
    if (!mem.eql(u8, normalized, path)) {
        if (comp.sources.get(normalized)) |some| return some;
    }

    if (mem.indexOfScalar(u8, path, 0) != null) {
        return error.FileNotFound;
    }

    const file = try comp.cwd.openFile(comp.io, path, .{});
    defer file.close(comp.io);
    return comp.addSourceFromFile(file, normalized, kind);
}

Copy link
Copy Markdown
Owner

@Vexu Vexu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also make addSourceFromOwnedBuffer take a []u8 for path so that it doesn't need to be duped twice? And a test would be nice.

@0pendev
Copy link
Copy Markdown
Contributor Author

0pendev commented Mar 13, 2026

Could you also make addSourceFromOwnedBuffer take a []u8 for path so that it doesn't need to be duped twice? And a test would be nice.

How do want me to add the test case ? Is it okay to add a subdirectory for the includes needed for the test?

@Vexu
Copy link
Copy Markdown
Owner

Vexu commented Mar 13, 2026

There is already a directory for includes, you could look at test/cases/pragma once.c for an example of a test that uses includes.

@0pendev
Copy link
Copy Markdown
Contributor Author

0pendev commented Mar 14, 2026

Hey @Vexu
I added the tests as we discussed.
Fixed a regression I did not see.

My last commit is the refactor you suggested, I feel like it makes the function harder to call and would love a review on it 🙏

@0pendev 0pendev force-pushed the fix/normalize_include_path branch from 8ab77bd to 379a9a4 Compare March 14, 2026 22:29
@Vexu
Copy link
Copy Markdown
Owner

Vexu commented Mar 14, 2026

Looking at FindInclude more I think replacing the allocPrint in check with a resolve would make it unnecessary to resolve the path in addSourceFromPathExtra avoiding having to modify addSourceFromOwnedBuffer.

0pendev added 2 commits March 16, 2026 09:58
Signed-off-by: Francisco Freitas <francisco.freitas@ledger.fr>
Fix duplicate source loading when the same header is reached via
different relative ../ paths (e.g. common/shared.h vs
handler/../common/shared.h).
Replace fmt.allocPrint-based path construction in FindInclude.check
with std.fs.path.resolve, so logically identical paths are canonicalized
before keying into the source cache. Also resolve search-path prefixes
in the include-directory loop for the same reason.

Signed-off-by: Francisco Freitas <francisco.freitas@ledger.fr>
@0pendev 0pendev force-pushed the fix/normalize_include_path branch from 379a9a4 to bf10239 Compare March 16, 2026 09:20
@0pendev
Copy link
Copy Markdown
Contributor Author

0pendev commented Mar 16, 2026

  • Fix duplicate source loading when the same header is reached via different relative ../ paths (e.g. common/shared.h vs handler/../common/shared.h).
  • Normalize include paths with std.fs.path.resolvePosix before keying into the source cache, so that logically identical files map to a single cache entry.
  • Store the normalized path in the cache so all subsequent lookups (regardless of the original include spelling) hit the same entry.

Applied this refactoring and dropped previous attempts at fixes.
Tests were kept and all integration tests are passing.

Signed-off-by: Francisco Freitas <francisco.freitas@ledger.fr>
@0pendev 0pendev requested a review from Vexu March 18, 2026 08:42
},
}
for (comp.search_path.items) |include| {
const resolved_path = try std.fs.path.resolve(find.comp.gpa, &.{include.path});
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks redundant.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the tests I did The resolve is needed for #include_next support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants