Skip to content

Conversation

@LinZhihao-723
Copy link
Member

@LinZhihao-723 LinZhihao-723 commented Aug 26, 2025

Description

References

This PR depends on #211.

Overview

This PR adds reserved keywords in the grammar file so that they won't be parsed as ID. The reserved keywords cover most of the common keywords used by programming languages. We're especially targeting Python and C++ since they're the first two code-gen targets. We can enrich this keyword set in future PRs when needed.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • Ensure all workflows pass.
  • Add a simple unit test to ensure a reserved keyword won't be parsed as an ID.

Summary by CodeRabbit

  • New Features

    • Introduced a broad set of reserved keywords across control flow, definitions, types/storage, error handling, access control, and misc. Using these as identifiers now correctly produces a parse error.
    • AST output and diagnostics now include precise line/column annotations, improving traceability and debugging.
  • Tests

    • Added tests validating that reserved words are rejected as identifiers.
    • Updated parser tests to assert location-aware AST serialisation and error messages.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 26, 2025

Walkthrough

Introduces reserved-word token groups in TaskDefLang.g4 and reorders the ID rule after them. Regenerates ANTLR lexer/parser sources to reflect the expanded token set and renumbered tokens. Updates tests to include location-annotated AST strings and adds a parser error test asserting reserved words cannot be used as identifiers.

Changes

Cohort / File(s) Summary
Grammar: reserved words and precedence
src/spider/tdl/parser/TaskDefLang.g4
Added six RESERVED_* lexical rules enumerating keywords; moved ID rule after RESERVED_* to ensure precedence; kept SPACE and COMMENT unchanged.
ANTLR generated: lexer updates
src/spider/tdl/parser/antlr_generated/TaskDefLangLexer.h, .../TaskDefLangLexer.cpp
Expanded token enum and vocabulary with six RESERVED_* tokens; renumbered SPACE→24, COMMENT→25, new RESERVED_*→26–31, ID→32; regenerated ruleNames/symbolicNames and serialized ATN.
ANTLR generated: parser updates
src/spider/tdl/parser/antlr_generated/TaskDefLangParser.h, .../TaskDefLangParser.cpp
Updated token enum and symbolicNames to include RESERVED_*; shifted token IDs (notably ID→32); regenerated serialized ATN; no signature changes.
Tests: AST expectations and reserved-word error
tests/tdl/test-parser.cpp
Adjusted expected AST strings to include source locations; modified test input to account for comment line and line shifts; added test asserting parse error when a reserved word (if) is used where ID is expected.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ast-grep (0.38.6)
src/spider/tdl/parser/antlr_generated/TaskDefLangLexer.cpp

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@LinZhihao-723 LinZhihao-723 marked this pull request as ready for review August 26, 2025 02:37
@LinZhihao-723 LinZhihao-723 requested review from a team and sitaowang1998 as code owners August 26, 2025 02:37
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/tdl/test-parser-ast.cpp (1)

959-966: Add source-location serialization to all AST node serializers

The verification script shows that while every .cpp implementation invokes get_source_location().serialize_to_str(), the inline serializer definitions in the corresponding header files are missing this call. To ensure consistency and complete location tracking, please update each of the following header files’ serialize_to_str(size_t) implementations to include the source-location serialization:

• src/spider/tdl/parser/ast/node_impl/Function.hpp
• src/spider/tdl/parser/ast/node_impl/Identifier.hpp
• src/spider/tdl/parser/ast/node_impl/NamedVar.hpp
• src/spider/tdl/parser/ast/node_impl/Namespace.hpp
• src/spider/tdl/parser/ast/node_impl/StructSpec.hpp
• src/spider/tdl/parser/ast/node_impl/TranslationUnit.hpp

• src/spider/tdl/parser/ast/node_impl/type_impl/Struct.hpp
• src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/List.hpp
• src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Map.hpp
• src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Tuple.hpp
• src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Bool.hpp
• src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Float.hpp
• src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Int.hpp

Each serializer should prepend or append (as appropriate) a call to

get_source_location().serialize_to_str(),

to its output stream to thread the node’s source location into the serialized representation.

🧹 Nitpick comments (23)
src/spider/tdl/parser/SourceLocation.hpp (2)

7-7: Option: reduce runtime overhead using compile-time format.

If you prefer to keep fmt in the header, using a compile-time format string (FMT_COMPILE) can shave a bit of overhead and catches format mistakes at compile time.

Apply:

-#include <fmt/format.h>
+#include <fmt/format.h>
+#include <fmt/compile.h>

-        return fmt::format("({}:{})", m_line, m_column);
+        return fmt::format(FMT_COMPILE("({}:{})"), m_line, m_column);

Also applies to: 20-22


7-7: Option: avoid leaking fmt through a public header.

Including fmt from a widely-included header propagates that dependency and increases compile times. Two lightweight alternatives:

  • Keep the method inline but build the string via std::string to drop the fmt include from the header.
  • Or, declare in the header and define in a .cpp that includes fmt locally.

Minimal header-only change (drops fmt entirely here):

-#include <fmt/format.h>
+// (optional) Keep the header independent of fmt.

     [[nodiscard]] auto serialize_to_str() const -> std::string {
-        return fmt::format("({}:{})", m_line, m_column);
+        std::string out;
+        out.reserve(32);
+        out.push_back('(');
+        out += std::to_string(m_line);
+        out.push_back(':');
+        out += std::to_string(m_column);
+        out.push_back(')');
+        return out;
     }

If you prefer an out-of-line definition, I can provide a matching SourceLocation.cpp snippet.

Also applies to: 20-22

src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Tuple.cpp (1)

46-61: Minor allocation win: reserve capacity for serialized children.

Pre-reserving avoids a few reallocations when tuples have many elements.

Add this near the vector declaration:

// right after: std::vector<std::string> serialized_children;
serialized_children.reserve(get_num_children());
tests/tdl/test-parser-ast.cpp (1)

182-196: Trim duplication with small helpers (optional).

There’s a lot of repeated "(0:0)" in golden strings. Not mandatory, but a tiny helper can make future location-format tweaks cheaper.

Example:

namespace {
constexpr std::string_view kLoc{"(0:0)"};
// Then use: "[Identifier]" + std::string{kLoc} + ":TestStruct"
// or prefer raw strings with fmt::format if you already pull fmt into tests.
}

Alternatively, wrap string construction via a tiny helper that returns "Tag" for a given tag.

src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Float.cpp (1)

6-10: Header hygiene nit: avoid reliance on transitive includes.

Although this likely compiles via transitive inclusion through Float.hpp/Node.hpp, consider explicitly including SourceLocation.hpp where serialize_to_str() is used to prevent future include-order breakage.

 #include <spider/tdl/parser/ast/utils.hpp>
+#include <spider/tdl/parser/SourceLocation.hpp>
src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/List.cpp (1)

26-34: Minor micro-optimisation/readability nit.

If this pattern appears in hot paths, you could precompute indentation strings once for the function to avoid two create_indentation() calls; otherwise, fine to leave as-is for clarity.

src/spider/tdl/parser/ast/node_impl/Identifier.cpp (1)

1-10: Optional: include SourceLocation.hpp explicitly.

Similar to other files, consider adding the direct include to decouple from transitive headers.

 #include <spider/tdl/parser/ast/utils.hpp>
+#include <spider/tdl/parser/SourceLocation.hpp>
src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Int.cpp (1)

6-10: Header hygiene nit: explicit SourceLocation include.

Same rationale as Float/Identifier — prefer explicit inclusion to strengthen include-order independence.

 #include <spider/tdl/parser/ast/utils.hpp>
+#include <spider/tdl/parser/SourceLocation.hpp>
src/spider/tdl/parser/ast/node_impl/type_impl/Struct.cpp (1)

53-56: Consider centralising header formatting for location-bearing nodes.

Multiple node serializers now duplicate the pattern "{}[...]{loc}:\n". A small helper (e.g., in ast/utils.hpp) to emit the header line given a label and indentation would reduce repetition and chances of divergence across node types.

src/spider/tdl/parser/ast/node_impl/Namespace.cpp (1)

98-101: Introduce a dedicated name accessor for AST nodes

Multiple serializers in src/spider/tdl/parser/ast/node_impl rely on
get_child_unsafe(0)->serialize_to_str(...), which couples serialization to the child-order invariant. While factories currently enforce that the first child is the node’s identifier, future changes could silently break this.

Occurrences to refactor:

  • StructSpec.cpp at line 99
  • Namespace.cpp at line 99
  • Function.cpp at lines 110 and 124
  • type_impl/Struct.cpp at line 60

Recommended refactor:

  • Add a member function, e.g.
    Identifier const* get_name_id() const {
        auto* child = get_child_unsafe(0);
        SP_ASSERT(dynamic_cast<Identifier const*>(child) != nullptr);
        return static_cast<Identifier const*>(child);
    }
  • Replace direct get_child_unsafe(0)->serialize_to_str(...) calls with
    get_name_id()->serialize_to_str(...) in the above locations.
  • The assertion ensures that if the AST construction ever changes, it will fail fast in debug builds.

This refactor centralizes the invariant and guards against accidental child-order coupling.

src/spider/tdl/parser/antlr_generated/TaskDefLangParser.cpp (1)

117-121: Id rule continues to accept only ID — reserved words are now correctly rejected. Add negative tests across all identifier positions.

You’ve added at least one unit test; consider covering:

  • Namespace name using a reserved word
  • Function name using a reserved word
  • Parameter name using a reserved word
  • Struct name and field name using reserved words

This ensures the reserved sets work uniformly across all id consumption sites.

I can draft concise test cases for each context if you’d like.

src/spider/tdl/parser/ast/node_impl/Function.cpp (1)

118-129: Minor DRY opportunity in serialisation branches.

Both branches compute the same name and return-type blocks. Consider precomputing name_block and ret_block, then only branching on the params portion. This simplifies future tweaks (e.g., spacing or label wording) to a single site.

src/spider/tdl/parser/ast/node_impl/StructSpec.cpp (4)

94-101: StructSpec header now includes source location — looks good; consider centralising header formatting.

The new format "StructSpec:" with a dedicated Name block aligns with the updated tests and other node serialisations. To avoid duplication across nodes, consider extracting a small helper for emitting the common header-with-location line so future tweaks happen in one place.


56-65: Minor perf tweak: reserve hash-set capacity for duplicate-field detection.

Pre-reserving avoids rehashing as fields grow.

Apply this diff:

-    absl::flat_hash_set<std::string_view> field_names;
+    absl::flat_hash_set<std::string_view> field_names;
+    field_names.reserve(fields.size());

79-90: Indexing for “Fields[n]” is 0-based — confirm this is the intended public surface.

Using serialized_fields.size() before emplace means indices start at 0 (matching current tests). If you plan to present 1-based indices to users later, now is the time to switch.


98-100: Use of get_child_unsafe(0) is acceptable given earlier validation, but a safer accessor would be nicer.

You already type-validate the name in create(). If a Result-returning safe accessor exists, using it would be more self-documenting here; otherwise, fine to keep.

tests/tdl/test-parser.cpp (2)

64-270: Broad AST expectation updates match the new location-bearing serialisation.

LGTM. As this string is long and brittle, consider factoring a few helper functions (e.g., to produce “Identifier:name”) or using SECTIONed/fragment assertions for the most important subtrees to reduce churn when formatting evolves.


331-344: Great addition: reserved-keyword rejection test; expand with a parameterised set.

To prevent regressions, cover a small matrix across the new RESERVED_* groups.

Here’s a Catch2 parametric snippet you could add near this section:

+SECTION("Reserved keywords cannot be used as identifiers (parameterised)") {
+    using spider::tdl::parser::SourceLocation;
+    // A representative subset from each RESERVED_* group
+    std::vector<std::string_view> words{
+        "if", "else",              // control flow
+        "class", "struct", "fn",   // def words (some also literals)
+        "const", "static",         // type/storage
+        "try", "catch",            // error handling
+        "public", "private",       // access control
+        "operator", "async", "True", "false" // other words (case variety)
+    };
+    for (auto w : words) {
+        std::string input = " namespace " + std::string(w) + " { fn empty(); }";
+        std::istringstream in{input};
+        auto const res = parse_translation_unit_from_istream(in);
+        INFO("Word: " << w);
+        REQUIRE(res.has_error());
+        auto const& err = res.error();
+        REQUIRE(err.get_message() == "Parser: mismatched input '" + std::string(w) + "' expecting ID");
+        // Column should point at the start of the identifier (0-based); with leading space + "namespace "
+        // that’s column 11 for 1-based line 1.
+        REQUIRE(err.get_source_location() == SourceLocation{1, 11});
+    }
+}
src/spider/tdl/parser/antlr_generated/TaskDefLangLexer.cpp (2)

186-206: Coverage looks broad; a couple of edge-case considerations.

  • Case sensitivity is mixed (“True”/“False” and “true”/“false” are both present). If that’s intentional, fine; otherwise, decide whether keywords are case-sensitive in TDL and align accordingly.
  • Some literals like "namespace"/"fn"/"struct" also exist as T__N tokens; duplication isn’t harmful due to rule order but is redundant. Consider documenting this in the grammar so future edits don’t reorder rules unexpectedly.

If desired, I can propose a grammar comment block clarifying precedence and the rationale for duplicated literals.

Also applies to: 245-316, 452-496, 535-620


621-627: ID rule remains ASCII-only; confirm non-ASCII identifiers are out-of-scope.

If TDL intends to allow Unicode identifiers, this will need to evolve. If ASCII-only is intended, ignore.

src/spider/tdl/parser/TaskDefLang.g4 (3)

284-300: Add Python 3.10’s match keyword to the control‑flow set.

Since you already include Pythonic keywords (elif, pass, yield), consider reserving match as well to prevent accidental use as an identifier in future grammar extensions.

Apply this minimal diff:

 RESERVED_CONTROL_FLOW_WORDS
 : 'if'
 | 'else'
 | 'elif'
 | 'switch'
 | 'case'
+| 'match'
 | 'while'
 | 'for'
 | 'loop'
 | 'do'
 | 'break'
 | 'continue'
 | 'return'
 | 'goto'
 | 'pass'
 | 'yield'
 ;

318-345: Verify longest‑match behaviour for intN tokens vs 'int' and consider reserving None.

  • With 'int' present in RESERVED_TYPE_AND_STORAGE_WORDS and 'int8/int16/int32/int64' defined as literals, ANTLR’s longest‑match should select the longer intN tokens. Add a regression test to confirm that e.g., int32 is tokenised as the builtin type, not as 'int' + '32'.
  • For Python parity, consider adding 'None' to this group alongside null/nullptr/true/false/True/False.

You can add a focused lexer test (or parser snippet) that asserts a declaration like "a: int32" recognises int32 as a single token.

Proposed optional addition:

 RESERVED_TYPE_AND_STORAGE_WORDS
 : 'int'
 ...
 | 'nullptr'
+| 'None'
 | 'true'
 | 'false'
 | 'True'
 | 'False'
 ;

366-384: Round out Python keywords and sanity‑check future C++ additions.

  • Python: consider adding 'lambda', 'from', and 'as' to anticipate imports and lambdas if expressions/statements are added later.
  • C++: when you expand later, plan for keywords like 'constexpr', 'typename', 'template', 'decltype', 'concept', 'requires', 'noexcept', 'mutable', and the co_‑keywords ('co_await', 'co_return', 'co_yield'). Not a blocker for this PR.

Apply this small optional extension for Python:

 RESERVED_OTHER_WORDS
 : 'operator'
 | 'inline'
 | 'virtual'
 | 'override'
 | 'extern'
 | 'sizeof'
 | 'await'
 | 'async'
 | 'with'
 | 'global'
 | 'nonlocal'
 | 'and'
 | 'or'
 | 'not'
 | 'in'
 | 'is'
 | 'del'
+| 'lambda'
+| 'from'
+| 'as'
 ;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d9db9f1 and 368fb79.

📒 Files selected for processing (21)
  • src/spider/tdl/parser/SourceLocation.hpp (2 hunks)
  • src/spider/tdl/parser/TaskDefLang.g4 (1 hunks)
  • src/spider/tdl/parser/antlr_generated/TaskDefLangLexer.cpp (2 hunks)
  • src/spider/tdl/parser/antlr_generated/TaskDefLangLexer.h (1 hunks)
  • src/spider/tdl/parser/antlr_generated/TaskDefLangParser.cpp (2 hunks)
  • src/spider/tdl/parser/antlr_generated/TaskDefLangParser.h (1 hunks)
  • src/spider/tdl/parser/ast/node_impl/Function.cpp (1 hunks)
  • src/spider/tdl/parser/ast/node_impl/Identifier.cpp (1 hunks)
  • src/spider/tdl/parser/ast/node_impl/NamedVar.cpp (1 hunks)
  • src/spider/tdl/parser/ast/node_impl/Namespace.cpp (1 hunks)
  • src/spider/tdl/parser/ast/node_impl/StructSpec.cpp (1 hunks)
  • src/spider/tdl/parser/ast/node_impl/TranslationUnit.cpp (1 hunks)
  • src/spider/tdl/parser/ast/node_impl/type_impl/Struct.cpp (1 hunks)
  • src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/List.cpp (1 hunks)
  • src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Map.cpp (1 hunks)
  • src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Tuple.cpp (2 hunks)
  • src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Bool.cpp (1 hunks)
  • src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Float.cpp (1 hunks)
  • src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Int.cpp (1 hunks)
  • tests/tdl/test-parser-ast.cpp (17 hunks)
  • tests/tdl/test-parser.cpp (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (17)
src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Tuple.cpp (3)
src/spider/tdl/parser/ast/utils.hpp (1)
  • create_indentation (22-22)
src/spider/tdl/parser/ast/Node.hpp (1)
  • indentation_level (91-92)
src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Tuple.hpp (1)
  • indentation_level (31-32)
src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Float.cpp (4)
src/spider/tdl/parser/ast/utils.hpp (1)
  • create_indentation (22-22)
src/spider/tdl/parser/ast/utils.cpp (2)
  • create_indentation (14-19)
  • create_indentation (14-14)
src/spider/tdl/parser/ast/Node.hpp (1)
  • indentation_level (91-92)
src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Float.hpp (1)
  • indentation_level (30-31)
src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Int.cpp (3)
src/spider/tdl/parser/ast/utils.cpp (2)
  • create_indentation (14-19)
  • create_indentation (14-14)
src/spider/tdl/parser/ast/Node.hpp (1)
  • indentation_level (91-92)
src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Int.hpp (1)
  • indentation_level (30-31)
src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Bool.cpp (4)
src/spider/tdl/parser/ast/utils.hpp (1)
  • create_indentation (22-22)
src/spider/tdl/parser/ast/utils.cpp (2)
  • create_indentation (14-19)
  • create_indentation (14-14)
src/spider/tdl/parser/ast/Node.hpp (1)
  • indentation_level (91-92)
src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Bool.hpp (1)
  • indentation_level (27-28)
src/spider/tdl/parser/ast/node_impl/Identifier.cpp (3)
src/spider/tdl/parser/ast/utils.hpp (1)
  • create_indentation (22-22)
src/spider/tdl/parser/ast/utils.cpp (2)
  • create_indentation (14-19)
  • create_indentation (14-14)
src/spider/tdl/parser/ast/node_impl/Identifier.hpp (1)
  • indentation_level (29-30)
src/spider/tdl/parser/ast/node_impl/type_impl/Struct.cpp (2)
src/spider/tdl/parser/ast/utils.hpp (1)
  • create_indentation (22-22)
src/spider/tdl/parser/ast/Node.hpp (1)
  • indentation_level (91-92)
src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/List.cpp (1)
src/spider/tdl/parser/ast/utils.cpp (2)
  • create_indentation (14-19)
  • create_indentation (14-14)
src/spider/tdl/parser/ast/node_impl/Function.cpp (5)
src/spider/tdl/parser/ast/utils.hpp (1)
  • create_indentation (22-22)
src/spider/tdl/parser/ast/node_impl/Identifier.hpp (1)
  • indentation_level (29-30)
src/spider/tdl/parser/ast/node_impl/NamedVar.hpp (1)
  • indentation_level (35-36)
src/spider/tdl/parser/ast/Node.hpp (1)
  • indentation_level (91-92)
src/spider/tdl/parser/ast/node_impl/Function.hpp (1)
  • indentation_level (50-51)
src/spider/tdl/parser/antlr_generated/TaskDefLangLexer.h (1)
src/spider/tdl/parser/antlr_generated/TaskDefLangParser.cpp (2)
  • ID (913-915)
  • ID (913-913)
src/spider/tdl/parser/SourceLocation.hpp (2)
src/spider/tdl/Error.hpp (3)
  • nodiscard (27-27)
  • nodiscard (29-31)
  • nodiscard (33-35)
src/spider/tdl/parser/ast/Node.hpp (5)
  • nodiscard (48-48)
  • nodiscard (53-53)
  • nodiscard (76-82)
  • nodiscard (95-97)
  • nodiscard (119-121)
src/spider/tdl/parser/ast/node_impl/StructSpec.cpp (5)
src/spider/tdl/parser/ast/utils.hpp (1)
  • create_indentation (22-22)
src/spider/tdl/parser/ast/node_impl/Identifier.hpp (1)
  • indentation_level (29-30)
src/spider/tdl/parser/ast/node_impl/NamedVar.hpp (1)
  • indentation_level (35-36)
src/spider/tdl/parser/ast/Node.hpp (1)
  • indentation_level (91-92)
src/spider/tdl/parser/ast/node_impl/StructSpec.hpp (1)
  • indentation_level (53-54)
src/spider/tdl/parser/ast/node_impl/TranslationUnit.cpp (1)
src/spider/tdl/parser/ast/node_impl/TranslationUnit.hpp (1)
  • indentation_level (45-46)
src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Map.cpp (3)
src/spider/tdl/parser/ast/utils.hpp (1)
  • create_indentation (22-22)
src/spider/tdl/parser/ast/utils.cpp (2)
  • create_indentation (14-19)
  • create_indentation (14-14)
src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Map.hpp (1)
  • indentation_level (44-45)
src/spider/tdl/parser/ast/node_impl/Namespace.cpp (4)
src/spider/tdl/parser/ast/utils.hpp (1)
  • create_indentation (22-22)
src/spider/tdl/parser/ast/node_impl/Identifier.hpp (1)
  • indentation_level (29-30)
src/spider/tdl/parser/ast/Node.hpp (1)
  • indentation_level (91-92)
src/spider/tdl/parser/ast/node_impl/Namespace.hpp (1)
  • indentation_level (53-54)
tests/tdl/test-parser.cpp (1)
src/spider/tdl/parser/parse.cpp (2)
  • parse_translation_unit_from_istream (18-61)
  • parse_translation_unit_from_istream (18-19)
src/spider/tdl/parser/antlr_generated/TaskDefLangParser.h (1)
src/spider/tdl/parser/antlr_generated/TaskDefLangParser.cpp (2)
  • ID (913-915)
  • ID (913-913)
src/spider/tdl/parser/ast/node_impl/NamedVar.cpp (3)
src/spider/tdl/parser/ast/utils.hpp (1)
  • create_indentation (22-22)
src/spider/tdl/parser/ast/node_impl/Identifier.hpp (1)
  • indentation_level (29-30)
src/spider/tdl/parser/ast/node_impl/NamedVar.hpp (1)
  • indentation_level (35-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: non-storage-unit-tests (ubuntu-22.04)
  • GitHub Check: non-storage-unit-tests (ubuntu-24.04)
  • GitHub Check: lint
🔇 Additional comments (23)
src/spider/tdl/parser/SourceLocation.hpp (1)

20-22: Location serializer looks good and aligns with the new AST output format.

Returns a concise "(line:column)" string and matches the usage sites across AST serializers.

src/spider/tdl/parser/ast/node_impl/NamedVar.cpp (1)

35-38: Header now carries location — consistent and readable. LGTM.

The formatter places the location right after [NamedVar], matching other node serializers. Child error propagation via TRYX remains intact.

src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Bool.cpp (1)

14-18: Bool serializer includes location as expected.

Consistent with Int/Float formatting; no behavioural changes beyond the suffix. Looks good.

src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Tuple.cpp (1)

38-43: Tuple serialization updated for location (empty and non-empty) — looks correct.

The insertion point for the location is consistent, and empty/non-empty paths are symmetrical.

Also applies to: 63-67

tests/tdl/test-parser-ast.cpp (1)

185-185: Golden strings updated to include source locations — comprehensive and consistent.

The new expectations reflect "(0:0)" across all affected node types and nesting levels. This will catch regressions if any serializer omits the location.

Also applies to: 200-212, 231-231, 235-235, 258-258, 279-279, 282-282, 284-284, 286-286, 315-315, 317-317, 319-319, 321-321, 323-323, 406-406, 409-409, 411-411, 412-412, 414-414, 430-432, 459-459, 462-462, 465-465, 467-467, 469-469, 525-549, 624-628, 707-734, 763-781, 810-823, 848-854, 904-920, 967-1016

src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Float.cpp (1)

15-19: Location-aware Float serialisation — LGTM.

The added placeholder and argument using get_source_location().serialize_to_str() are correct and consistent with the other node types. No functional concerns.

src/spider/tdl/parser/ast/node_impl/TranslationUnit.cpp (2)

73-81: Add SourceLocation to TranslationUnit header — LGTM.

The new "{}" placeholder and get_source_location().serialize_to_str() argument are correctly positioned before the colon, and the rest of the layout remains stable. This matches the formatting pattern used across other nodes.


73-81: Consistency Verified: All AST Node Serialisers Include SourceLocation
A repository-wide scan of src/spider/tdl/parser/ast/node_impl confirmed that every serialize_to_str(size_t) implementation invokes get_source_location().serialize_to_str(). No missing cases were found.

src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/List.cpp (1)

29-34: List header now carries SourceLocation — LGTM.

The format string and arguments are correct; indentation and child serialisation are unchanged and safe.

src/spider/tdl/parser/ast/node_impl/Identifier.cpp (1)

14-19: Identifier serialisation includes SourceLocation — LGTM.

Placement before the colon matches other nodes. Given grammar constraints on IDs, emitting m_name directly remains safe.

src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Int.cpp (1)

15-19: Location-aware Int serialisation — LGTM.

The additional placeholder and argument are correct; spec serialisation remains unchanged.

src/spider/tdl/parser/ast/node_impl/type_impl/Struct.cpp (1)

53-56: Location-aware Struct header: formatting and error propagation look correct.

  • The extra placeholder for get_source_location().serialize_to_str() is correctly added and arguments line up with the format string.
  • Use of YSTDLIB_ERROR_HANDLING_TRYX when serialising the name preserves error semantics.
src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Map.cpp (1)

93-100: Map header now includes source location — consistent and correct.

  • The added {} for location is paired with get_source_location().serialize_to_str().
  • Key/Value type serialisation and indentation remain intact.
src/spider/tdl/parser/ast/node_impl/Namespace.cpp (1)

94-101: Namespace serialisation: added location and multiline Name block are coherent with the new pattern.

  • The header now prints the location; the "Name" field is broken out and serialised via the child node, which aligns with Function/Struct changes.
  • Error handling via YSTDLIB_ERROR_HANDLING_TRYX is appropriately used for the child serialisation.
src/spider/tdl/parser/antlr_generated/TaskDefLangParser.cpp (1)

78-86: No numeric token comparisons detected—verification complete
The ripgrep searches for any hand-coded getType() == <number> checks, raw token-number comparisons involving ID, SPACE, or COMMENT, and references to old ordinal assumptions outside the generated ANTLR directory all returned no matches. There’s no evidence of brittle numeric token dependencies in the handwritten code.

src/spider/tdl/parser/ast/node_impl/Function.cpp (1)

105-115: Function header now embeds location and serialises the Identifier child — looks good.

  • Placeholders/arguments align; error propagation on name and return-type serialisation is preserved.
  • Param rendering remains stable; indices are computed before push, yielding deterministic labels.
tests/tdl/test-parser.cpp (1)

16-55: Input preface shifting line numbers is a smart way to exercise location tracking.

This ensures downstream line/column expectations are meaningful. No changes needed.

src/spider/tdl/parser/antlr_generated/TaskDefLangLexer.h (1)

30-33: No manual numeric token comparisons detected; enum reindexing confined to generated code

Ran the provided grep across the repository. All matches for TaskDefLang(Parser|Lexer)::ID and ID = 32 occur solely within the ANTLR-generated headers and source (TaskDefLangLexer.h, TaskDefLangParser.h/cpp). There are no instances of tokenType == <numeric literal> or other hard-coded integer comparisons in hand-written code. The shift of ID to value 32 is therefore safely isolated to generated files.

src/spider/tdl/parser/antlr_generated/TaskDefLangLexer.cpp (1)

74-78: Reserved-word token groups are correctly added before ID.

Good: ruleNames/symbolicNames include the six RESERVED_* entries; ID comes last, ensuring reserved words don’t lex as identifiers.

Also applies to: 92-96

src/spider/tdl/parser/antlr_generated/TaskDefLangParser.h (1)

30-33: No internal hard-coded parser token IDs found

  • A full-text search for TaskDefLangParser::ID and the numeric literal = 32 only returned matches within the ANTLR-generated lexer and parser sources.
  • There are no occurrences elsewhere in the codebase referencing the old or new enum values.

Please manually verify any external tools or test suites (outside this repository) for hard-coded expectations of the ID token value.

src/spider/tdl/parser/TaskDefLang.g4 (3)

347-356: Error‑handling keywords set looks good.

The selection covers both C/C++ and Python variants and will effectively prevent accidental identifier use.


358-365: Access‑control keywords set looks good.

Consistent with common C++ visibility modifiers; reserving them early is sensible.


302-316: Action Required: Resolve Lexer–Parser Literal Conflicts

The RESERVED_DEF_WORDS lexer rule (lines 302–316 in src/spider/tdl/parser/TaskDefLang.g4) currently includes the keywords struct, namespace, and fn, which are also used as raw string literals in the parser rules for structDef, namespace, and funcDef (lines 127–128, 37–38, 63–64). In ANTLR4’s combined grammars, this causes the lexer to emit a RESERVED_DEF_WORDS token when it sees these keywords, whereas the parser expects the auto-generated literal tokens (T__n). This mismatch will lead to parse errors.

• Location of conflict:

  • Lexer definition: RESERVED_DEF_WORDS at line 302
  • Literal usages:
    struct at lines 127–128 and line 304
    namespace at lines 37–38 and line 308
    fn at lines 63–64 and line 312

Minimal fix (apply this diff and re-generate the parser/lexer):

@@ -303,7 +303,6 @@
 RESERVED_DEF_WORDS
 : 'class'
-| 'struct'
 | 'enum'
 | 'union'
 | 'interface'
@@ -306,7 +305,6 @@
 | 'namespace'
 | 'using'
-| 'fn'
 | 'typedef'
 | 'def'
 | 'let'

• Alternatively (preferred long-term): replace all raw literals in parser rules with dedicated keyword tokens:

KW_STRUCT   : 'struct';
KW_NAMESPACE: 'namespace';
KW_FN       : 'fn';

and update structDef, namespace, and funcDef to use these named tokens.

Please rebuild the lexer/parser and run your parsing tests for namespace, struct, and fn to confirm no regressions. Let me know if you’d like to include the named-token approach in this PR or address it in a follow-up.

Comment on lines +386 to +387
// ID must be defined after RESERVED_* so that the reserved words get higher precedence.
ID: [a-zA-Z_][a-zA-Z0-9_]* ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Good: ID placed after RESERVED_ to ensure precedence. Add guard tests.*

The placement and explanatory comment are correct. To avoid regressions, add unit tests asserting that representative reserved words (e.g., return, class, True) are not lexed as ID, while prefixes/suffixes like returnValue, className, isTrue are lexed as ID.

You can run this repository‑friendly script to detect overlaps between reserved words and parser literals (helps catch the 'struct'/'namespace'/'fn' conflict):

Expected (after fixing): overlap should be empty or limited to literals that are not used as parser tokens. If you still see 'namespace', 'struct', or 'fn' in the overlap, please adjust as suggested above.


🏁 Script executed:

#!/bin/bash
set -euo pipefail

GRAMMAR=src/spider/tdl/parser/TaskDefLang.g4

python - << 'PY'
import re, sys, pathlib
p = pathlib.Path(sys.argv[1]).read_text()

# Collect reserved words (the right-hand side literals in RESERVED_* rules)
reserved = set(re.findall(r"RESERVED_[A-Z_]+\s*:[^;]*;", p, flags=re.S))
reserved = set(sum([re.findall(r"'([^']+)'", block) for block in reserved], []))

# Collect all single-quoted literals in the grammar
literals = set(re.findall(r"'([^']+)'", p))

# Intersection shows potential lexer/parser literal conflicts
overlap = sorted(reserved & literals)
print("Overlap between RESERVED_* and parser literals:")
for s in overlap:
    print(" -", s)
PY
"$GRAMMAR"

Length of output: 186


🏁 Script executed:

#!/bin/bash
set -euo pipefail

GRAMMAR=src/spider/tdl/parser/TaskDefLang.g4

python3 - "$GRAMMAR" << 'PY'
import re, pathlib, sys

# Read the grammar file
p = pathlib.Path(sys.argv[1]).read_text()

# Extract RESERVED_* rule literals
reserved_blocks = re.findall(r"RESERVED_[A-Z_]+\s*:[^;]*;", p, flags=re.S)
reserved = set(sum([re.findall(r"'([^']+)'", block) for block in reserved_blocks], []))

# Extract all single-quoted literals
literals = set(re.findall(r"'([^']+)'", p))

# Compute overlap
overlap = sorted(reserved & literals)

print("Overlap between RESERVED_* and parser literals:")
for word in overlap:
    print(" -", word)
PY

Length of output: 931


Address Reserved vs Literal Token Conflicts in Grammar

The overlap detection script reveals that numerous reserved words (e.g. return, class, namespace, struct, fn, etc.) are still being lexed as implicit parser literals rather than as your explicit RESERVED_* tokens. This will cause reserved keywords to never match their intended token rules.

• File requiring fixes:

  • src/spider/tdl/parser/TaskDefLang.g4
    – Replace every string literal occurrence of a reserved word in parser rules with its corresponding RESERVED_* token.
    – Remove or refactor implicit literals for keywords so they no longer generate T__N tokens before your named lexer rules.
    • After correcting the grammar, re-run the overlap-detection script to confirm no reserved words intersect with literals.
    • Retain the suggestion to add guard tests ensuring that:
  • Representative reserved words (e.g., return, class, True) are not recognized as ID.
  • Identifiers containing those words as prefixes or suffixes (e.g., returnValue, className, isTrue) still lex as ID.
🤖 Prompt for AI Agents
In src/spider/tdl/parser/TaskDefLang.g4 around lines 386-387, parser rules still
include string literal keywords which get turned into implicit T__N tokens and
conflict with your explicit RESERVED_* lexer rules; update every parser rule
that currently uses a keyword string (e.g. "return", "class", "namespace",
"struct", "fn", "True", etc.) to use the corresponding RESERVED_* token name
instead, remove or refactor any remaining implicit literal occurrences so they
no longer produce T__N tokens (ensure all RESERVED_* lexer rules remain
positioned before the ID rule), re-run the overlap-detection script to confirm
no overlaps, and add unit tests asserting reserved words are not lexed as ID
while identifiers containing those words as substrings still lex as ID.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/spider/tdl/parser/TaskDefLang.g4 (1)

304-323: Eliminate literal-vs-keyword ambiguity for namespace/struct/fn

Right now, 'namespace', 'struct', and 'fn' are used both as parser string literals and as members of RESERVED_DEF_WORDS. This creates two competing lexer matchers for the same lexeme. Depending on ANTLR/target version details, the lexer may emit either T__N or RESERVED_DEF_WORDS for these words. That makes the grammar brittle and can cause hard-to-debug mismatches if rule ordering changes.

Two safe paths, pick one:

  • Preferred: introduce dedicated keyword tokens and reference them in parser rules; remove these words from the RESERVED_* groups to avoid duplication.
  • Minimal: keep string literals in parser rules and remove these words from RESERVED_DEF_WORDS.

Proposed fix (Preferred – introduces dedicated tokens and updates parser rules):

@@
-namespace returns [std::unique_ptr<spider::tdl::parser::ast::Node> retval]
-: 'namespace' id '{' funcDefs '}' {
+namespace returns [std::unique_ptr<spider::tdl::parser::ast::Node> retval]
+: KW_NAMESPACE id '{' funcDefs '}' {
@@
-funcDef returns [std::unique_ptr<spider::tdl::parser::ast::Node> retval]
-: 'fn' id '(' params ')' ret ';' {
+funcDef returns [std::unique_ptr<spider::tdl::parser::ast::Node> retval]
+: KW_FN id '(' params ')' ret ';' {
@@
-structDef returns [std::shared_ptr<spider::tdl::parser::ast::StructSpec> retval]
-: 'struct' id '{' namedVarList (',')? '}' ';' {
+structDef returns [std::shared_ptr<spider::tdl::parser::ast::StructSpec> retval]
+: KW_STRUCT id '{' namedVarList (',')? '}' ';' {

Add keyword tokens above ID (and after COMMENT), and remove duplicates from RESERVED_DEF_WORDS:

@@
 RESERVED_DEF_WORDS
-: 'class'
-| 'struct'
+: 'class'
 | 'enum'
 | 'union'
 | 'interface'
-| 'namespace'
+/* 'namespace' handled by KW_NAMESPACE */
 | 'using'
 | 'typedef'
 | 'decltype'
 | 'def'
 | 'dataclass'
-| 'fn'
+/* 'fn' handled by KW_FN */
 | 'let'
 | 'module'
 | 'import'
 | 'template'
 | 'requires'
 | 'concept'
 ;
+
+// Dedicated keyword tokens referenced in parser rules
+KW_NAMESPACE: 'namespace';
+KW_STRUCT: 'struct';
+KW_FN: 'fn';

If you prefer the minimal path, keep the parser rules as-is and only delete 'namespace'/'struct'/'fn' from RESERVED_DEF_WORDS.

Action: please pick an approach and re-run/extend tests accordingly.

To double-check there’s no remaining overlap after your change, run the overlap detector (same idea as the previous comment’s script) and ensure the intersection for these three words is empty. Would you like me to paste a repo-friendly script again?

🧹 Nitpick comments (4)
src/spider/tdl/parser/TaskDefLang.g4 (4)

284-302: Reserved control-flow set looks solid; add guard tests for substrings

Coverage is good and the placement before ID is correct. Please add lexer/parse tests to assert that:

  • Keywords themselves (e.g., if, return, while) are not lexed as ID.
  • Substring identifiers (e.g., ifCond, returnValue, whileLoop) are lexed as ID.

I can draft these tests if helpful.


359-368: Error-handling keywords set LGTM

Reasonable coverage for both Python and C++ families. No conflicts with existing parser rules. Consider adding a tiny negative test (e.g., “namespace try { fn x(); }” should report mismatched input 'try' expecting ID).


378-402: Optional: consider rounding out the “other” set for C++20/23

If you want closer parity with modern C++, consider adding:

  • export (modules), explicit, mutable, thread_local, alignas, alignof
  • co_await, co_return, co_yield

Not blocking; just future-proofing. I can open a follow-up PR if you agree.


304-323: Keep keyword lists maintainable

Consider alphabetising each RESERVED_* list and adding a short comment noting the intended source languages (Python, C/C++, general). This eases diff review and future additions. Non-blocking.

Also applies to: 325-357

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2274d2e and 3d6c535.

📒 Files selected for processing (2)
  • src/spider/tdl/parser/TaskDefLang.g4 (1 hunks)
  • src/spider/tdl/parser/antlr_generated/TaskDefLangLexer.cpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/spider/tdl/parser/antlr_generated/TaskDefLangLexer.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: non-storage-unit-tests (ubuntu-22.04)
  • GitHub Check: non-storage-unit-tests (ubuntu-24.04)
  • GitHub Check: lint
🔇 Additional comments (3)
src/spider/tdl/parser/TaskDefLang.g4 (3)

370-376: Access-control keywords set LGTM

No parser usage; purely to block IDs. Looks good.


404-405: Good precedence note and placement of ID

The comment is accurate and the placement is correct. If you adopt dedicated KW_* tokens as suggested, update the comment to reflect “RESERVED_* and KW_*” precede ID.


304-323: No TDL source files detected; manual verification required

I scanned the repository for any *.tdl (or *.TDL) files and didn’t find any, so I can’t automatically verify that the newly reserved keywords aren’t used as identifiers in your TDL inputs. Please confirm where your TDL source files live (or share representative examples) and manually check that no existing identifiers collide with the newly reserved words.

  • No .tdl/.TDL files located under the repo root or subdirectories
  • Provide the path to your TDL examples (or include sample files) for automated scanning
  • Once located, rerun the provided script or manually inspect identifiers against the list:
    int, char, string, str, void, class, enum, union, interface, operator, override, decltype, concept, requires, module, template

Comment on lines +325 to +357
RESERVED_TYPE_AND_STORAGE_WORDS
: 'int'
| 'long'
| 'short'
| 'byte'
| 'signed'
| 'unsigned'
| 'char'
| 'string'
| 'str'
| 'float'
| 'double'
| 'bool'
| 'void'
| 'const'
| 'static'
| 'auto'
| 'volatile'
| 'register'
| 'final'
| 'abstract'
| 'this'
| 'self'
| 'new'
| 'delete'
| 'null'
| 'nullptr'
| 'None'
| 'true'
| 'false'
| 'True'
| 'False'
;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Resolve potential mismatch for float/double/bool between RESERVED_ and builtinType literals*

Similar duplication exists for 'float', 'double', and 'bool': they appear as string literals in builtinType and inside RESERVED_TYPE_AND_STORAGE_WORDS. If the lexer emits RESERVED_TYPE_AND_STORAGE_WORDS for these, builtinType branches expecting T__N ('float', etc.) won’t match. While this may “work by accident” today, it’s fragile.

Preferred fix: introduce dedicated tokens and reference them in builtinType; remove duplicates from RESERVED_TYPE_AND_STORAGE_WORDS.

@@
 builtinType returns [std::unique_ptr<spider::tdl::parser::ast::Node> retval]
@@
-| 'float' {
+| KW_FLOAT {
     $retval = spider::tdl::parser::ast::Float::create(
             spider::tdl::parser::ast::FloatSpec::Float,
             loc
     );
 }
- | 'double' {
+| KW_DOUBLE {
     $retval = spider::tdl::parser::ast::Float::create(
             spider::tdl::parser::ast::FloatSpec::Double,
             loc
     );
 }
-| 'bool' {
+| KW_BOOL {
     $retval = spider::tdl::parser::ast::Bool::create(loc);
 }
@@
 RESERVED_TYPE_AND_STORAGE_WORDS
-: 'int'
+: 'int'
 | 'long'
@@
-| 'float'
-| 'double'
-| 'bool'
+/* 'float'|'double'|'bool' handled by KW_FLOAT|KW_DOUBLE|KW_BOOL */
@@
 ;
+
+// Dedicated tokens for types referenced in parser rules
+KW_FLOAT: 'float';
+KW_DOUBLE: 'double';
+KW_BOOL: 'bool';

If you intentionally want to keep them only as literals, delete them from RESERVED_TYPE_AND_STORAGE_WORDS instead to remove the ambiguity.

Please verify that tests covering float/double/bool parsing still pass after this change.


🏁 Script executed:

#!/bin/bash
# Display the builtinType parser rule (to inspect float/double/bool usage)
rg -n 'builtinType returns' -C20 src/spider/tdl/parser/TaskDefLang.g4

# Check for existing token definitions for dedicated float/double/bool tokens
rg -n 'KW_FLOAT' -g 'src/spider/tdl/parser/TaskDefLang.g4'
rg -n 'KW_DOUBLE' -g 'src/spider/tdl/parser/TaskDefLang.g4'
rg -n 'KW_BOOL' -g 'src/spider/tdl/parser/TaskDefLang.g4'

# Display the RESERVED_TYPE_AND_STORAGE_WORDS lexer rule to confirm duplicates
rg -n 'RESERVED_TYPE_AND_STORAGE_WORDS' -C20 src/spider/tdl/parser/TaskDefLang.g4

Length of output: 2259


Deduplicate float/double/bool tokens in TaskDefLang.g4

To avoid the lexer capturing ‘float’, ‘double’ and ‘bool’ as part of RESERVED_TYPE_AND_STORAGE_WORDS—thus blocking your builtinType parser rule from ever seeing those literals—introduce explicit tokens for these keywords and remove them from the reserved-word set.

Affected locations:

  • src/spider/tdl/parser/TaskDefLang.g4: builtinType parser rule (currently lines 258–…)
  • src/spider/tdl/parser/TaskDefLang.g4: RESERVED_TYPE_AND_STORAGE_WORDS lexer rule (lines 325–337)
  • Add new lexer definitions for KW_FLOAT, KW_DOUBLE, KW_BOOL

Proposed diff:

--- a/src/spider/tdl/parser/TaskDefLang.g4
+++ b/src/spider/tdl/parser/TaskDefLang.g4
@@ builtinType returns [std::unique_ptr<spider::tdl::parser::ast::Node> retval]
- | 'float' {
+ | KW_FLOAT {
     $retval = spider::tdl::parser::ast::Float::create(
             spider::tdl::parser::ast::FloatSpec::Float,
             loc
     );
 }
- | 'double' {
+ | KW_DOUBLE {
     $retval = spider::tdl::parser::ast::Float::create(
             spider::tdl::parser::ast::FloatSpec::Double,
             loc
     );
 }
- | 'bool' {
+ | KW_BOOL {
     $retval = spider::tdl::parser::ast::Bool::create(loc);
 }
@@ RESERVED_TYPE_AND_STORAGE_WORDS
- | 'float'
- | 'double'
- | 'bool'
+ /* 'float'|'double'|'bool' handled by KW_FLOAT|KW_DOUBLE|KW_BOOL */
@@
 ;
+
+// Dedicated tokens for float/double/bool
+KW_FLOAT : 'float';
+KW_DOUBLE: 'double';
+KW_BOOL  : 'bool';

Please verify that all existing tests for parsing float/double/bool still pass after applying this change.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
RESERVED_TYPE_AND_STORAGE_WORDS
: 'int'
| 'long'
| 'short'
| 'byte'
| 'signed'
| 'unsigned'
| 'char'
| 'string'
| 'str'
| 'float'
| 'double'
| 'bool'
| 'void'
| 'const'
| 'static'
| 'auto'
| 'volatile'
| 'register'
| 'final'
| 'abstract'
| 'this'
| 'self'
| 'new'
| 'delete'
| 'null'
| 'nullptr'
| 'None'
| 'true'
| 'false'
| 'True'
| 'False'
;
// --- In the parser rule for builtinType (around line 258) ---
builtinType
returns [std::unique_ptr<spider::tdl::parser::ast::Node> retval]
: /* … other type cases … */
// replace the string‐literal cases for float/double/bool with the new tokens
| KW_FLOAT {
$retval = spider::tdl::parser::ast::Float::create(
spider::tdl::parser::ast::FloatSpec::Float,
loc
);
}
| KW_DOUBLE {
$retval = spider::tdl::parser::ast::Float::create(
spider::tdl::parser::ast::FloatSpec::Double,
loc
);
}
| KW_BOOL {
$retval = spider::tdl::parser::ast::Bool::create(loc);
}
;
// --- In the lexer rule RESERVED_TYPE_AND_STORAGE_WORDS (around line 325) ---
RESERVED_TYPE_AND_STORAGE_WORDS
: 'int'
| 'long'
| 'short'
| 'byte'
| 'signed'
| 'unsigned'
| 'char'
| 'string'
| 'str'
| 'const'
| 'static'
| 'auto'
| 'volatile'
| 'register'
| 'final'
| 'abstract'
| 'this'
| 'self'
| 'new'
| 'delete'
| 'null'
| 'nullptr'
| 'None'
| 'true'
| 'false'
| 'True'
| 'False'
- | 'float'
- | 'double'
/* 'float'|'double'|'bool' handled by KW_FLOAT|KW_DOUBLE|KW_BOOL */
;
// Dedicated tokens for the floating-point and boolean keywords
KW_FLOAT : 'float';
KW_DOUBLE: 'double';
KW_BOOL : 'bool';

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants