-
Notifications
You must be signed in to change notification settings - Fork 9
feat(tdl): Add reserved keywords in the grammar file. #212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughIntroduces reserved-word token groups in TaskDefLang.g4 and reorders the ID rule after them. Regenerates ANTLR lexer/parser sources to reflect the expanded token set and renumbered tokens. Updates tests to include location-annotated AST strings and adds a parser error test asserting reserved words cannot be used as identifiers. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ast-grep (0.38.6)src/spider/tdl/parser/antlr_generated/TaskDefLangLexer.cppTip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/tdl/test-parser-ast.cpp (1)
959-966: Add source-location serialization to all AST node serializersThe verification script shows that while every
.cppimplementation invokesget_source_location().serialize_to_str(), the inline serializer definitions in the corresponding header files are missing this call. To ensure consistency and complete location tracking, please update each of the following header files’serialize_to_str(size_t)implementations to include the source-location serialization:• src/spider/tdl/parser/ast/node_impl/Function.hpp
• src/spider/tdl/parser/ast/node_impl/Identifier.hpp
• src/spider/tdl/parser/ast/node_impl/NamedVar.hpp
• src/spider/tdl/parser/ast/node_impl/Namespace.hpp
• src/spider/tdl/parser/ast/node_impl/StructSpec.hpp
• src/spider/tdl/parser/ast/node_impl/TranslationUnit.hpp• src/spider/tdl/parser/ast/node_impl/type_impl/Struct.hpp
• src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/List.hpp
• src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Map.hpp
• src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Tuple.hpp
• src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Bool.hpp
• src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Float.hpp
• src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Int.hppEach serializer should prepend or append (as appropriate) a call to
get_source_location().serialize_to_str(),to its output stream to thread the node’s source location into the serialized representation.
🧹 Nitpick comments (23)
src/spider/tdl/parser/SourceLocation.hpp (2)
7-7: Option: reduce runtime overhead using compile-time format.If you prefer to keep fmt in the header, using a compile-time format string (FMT_COMPILE) can shave a bit of overhead and catches format mistakes at compile time.
Apply:
-#include <fmt/format.h> +#include <fmt/format.h> +#include <fmt/compile.h> - return fmt::format("({}:{})", m_line, m_column); + return fmt::format(FMT_COMPILE("({}:{})"), m_line, m_column);Also applies to: 20-22
7-7: Option: avoid leaking fmt through a public header.Including fmt from a widely-included header propagates that dependency and increases compile times. Two lightweight alternatives:
- Keep the method inline but build the string via std::string to drop the fmt include from the header.
- Or, declare in the header and define in a .cpp that includes fmt locally.
Minimal header-only change (drops fmt entirely here):
-#include <fmt/format.h> +// (optional) Keep the header independent of fmt. [[nodiscard]] auto serialize_to_str() const -> std::string { - return fmt::format("({}:{})", m_line, m_column); + std::string out; + out.reserve(32); + out.push_back('('); + out += std::to_string(m_line); + out.push_back(':'); + out += std::to_string(m_column); + out.push_back(')'); + return out; }If you prefer an out-of-line definition, I can provide a matching SourceLocation.cpp snippet.
Also applies to: 20-22
src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Tuple.cpp (1)
46-61: Minor allocation win: reserve capacity for serialized children.Pre-reserving avoids a few reallocations when tuples have many elements.
Add this near the vector declaration:
// right after: std::vector<std::string> serialized_children; serialized_children.reserve(get_num_children());tests/tdl/test-parser-ast.cpp (1)
182-196: Trim duplication with small helpers (optional).There’s a lot of repeated "(0:0)" in golden strings. Not mandatory, but a tiny helper can make future location-format tweaks cheaper.
Example:
namespace { constexpr std::string_view kLoc{"(0:0)"}; // Then use: "[Identifier]" + std::string{kLoc} + ":TestStruct" // or prefer raw strings with fmt::format if you already pull fmt into tests. }Alternatively, wrap string construction via a tiny helper that returns "Tag" for a given tag.
src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Float.cpp (1)
6-10: Header hygiene nit: avoid reliance on transitive includes.Although this likely compiles via transitive inclusion through Float.hpp/Node.hpp, consider explicitly including SourceLocation.hpp where serialize_to_str() is used to prevent future include-order breakage.
#include <spider/tdl/parser/ast/utils.hpp> +#include <spider/tdl/parser/SourceLocation.hpp>src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/List.cpp (1)
26-34: Minor micro-optimisation/readability nit.If this pattern appears in hot paths, you could precompute indentation strings once for the function to avoid two create_indentation() calls; otherwise, fine to leave as-is for clarity.
src/spider/tdl/parser/ast/node_impl/Identifier.cpp (1)
1-10: Optional: include SourceLocation.hpp explicitly.Similar to other files, consider adding the direct include to decouple from transitive headers.
#include <spider/tdl/parser/ast/utils.hpp> +#include <spider/tdl/parser/SourceLocation.hpp>src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Int.cpp (1)
6-10: Header hygiene nit: explicit SourceLocation include.Same rationale as Float/Identifier — prefer explicit inclusion to strengthen include-order independence.
#include <spider/tdl/parser/ast/utils.hpp> +#include <spider/tdl/parser/SourceLocation.hpp>src/spider/tdl/parser/ast/node_impl/type_impl/Struct.cpp (1)
53-56: Consider centralising header formatting for location-bearing nodes.Multiple node serializers now duplicate the pattern
"{}[...]{loc}:\n". A small helper (e.g., in ast/utils.hpp) to emit the header line given a label and indentation would reduce repetition and chances of divergence across node types.src/spider/tdl/parser/ast/node_impl/Namespace.cpp (1)
98-101: Introduce a dedicated name accessor for AST nodesMultiple serializers in
src/spider/tdl/parser/ast/node_implrely on
get_child_unsafe(0)->serialize_to_str(...), which couples serialization to the child-order invariant. While factories currently enforce that the first child is the node’s identifier, future changes could silently break this.Occurrences to refactor:
StructSpec.cppat line 99Namespace.cppat line 99Function.cppat lines 110 and 124type_impl/Struct.cppat line 60Recommended refactor:
- Add a member function, e.g.
Identifier const* get_name_id() const { auto* child = get_child_unsafe(0); SP_ASSERT(dynamic_cast<Identifier const*>(child) != nullptr); return static_cast<Identifier const*>(child); }- Replace direct
get_child_unsafe(0)->serialize_to_str(...)calls with
get_name_id()->serialize_to_str(...)in the above locations.- The assertion ensures that if the AST construction ever changes, it will fail fast in debug builds.
This refactor centralizes the invariant and guards against accidental child-order coupling.
src/spider/tdl/parser/antlr_generated/TaskDefLangParser.cpp (1)
117-121: Id rule continues to accept only ID — reserved words are now correctly rejected. Add negative tests across all identifier positions.You’ve added at least one unit test; consider covering:
- Namespace name using a reserved word
- Function name using a reserved word
- Parameter name using a reserved word
- Struct name and field name using reserved words
This ensures the reserved sets work uniformly across all
idconsumption sites.I can draft concise test cases for each context if you’d like.
src/spider/tdl/parser/ast/node_impl/Function.cpp (1)
118-129: Minor DRY opportunity in serialisation branches.Both branches compute the same name and return-type blocks. Consider precomputing
name_blockandret_block, then only branching on the params portion. This simplifies future tweaks (e.g., spacing or label wording) to a single site.src/spider/tdl/parser/ast/node_impl/StructSpec.cpp (4)
94-101: StructSpec header now includes source location — looks good; consider centralising header formatting.The new format "StructSpec:" with a dedicated Name block aligns with the updated tests and other node serialisations. To avoid duplication across nodes, consider extracting a small helper for emitting the common header-with-location line so future tweaks happen in one place.
56-65: Minor perf tweak: reserve hash-set capacity for duplicate-field detection.Pre-reserving avoids rehashing as fields grow.
Apply this diff:
- absl::flat_hash_set<std::string_view> field_names; + absl::flat_hash_set<std::string_view> field_names; + field_names.reserve(fields.size());
79-90: Indexing for “Fields[n]” is 0-based — confirm this is the intended public surface.Using serialized_fields.size() before emplace means indices start at 0 (matching current tests). If you plan to present 1-based indices to users later, now is the time to switch.
98-100: Use of get_child_unsafe(0) is acceptable given earlier validation, but a safer accessor would be nicer.You already type-validate the name in create(). If a Result-returning safe accessor exists, using it would be more self-documenting here; otherwise, fine to keep.
tests/tdl/test-parser.cpp (2)
64-270: Broad AST expectation updates match the new location-bearing serialisation.LGTM. As this string is long and brittle, consider factoring a few helper functions (e.g., to produce “Identifier:name”) or using SECTIONed/fragment assertions for the most important subtrees to reduce churn when formatting evolves.
331-344: Great addition: reserved-keyword rejection test; expand with a parameterised set.To prevent regressions, cover a small matrix across the new RESERVED_* groups.
Here’s a Catch2 parametric snippet you could add near this section:
+SECTION("Reserved keywords cannot be used as identifiers (parameterised)") { + using spider::tdl::parser::SourceLocation; + // A representative subset from each RESERVED_* group + std::vector<std::string_view> words{ + "if", "else", // control flow + "class", "struct", "fn", // def words (some also literals) + "const", "static", // type/storage + "try", "catch", // error handling + "public", "private", // access control + "operator", "async", "True", "false" // other words (case variety) + }; + for (auto w : words) { + std::string input = " namespace " + std::string(w) + " { fn empty(); }"; + std::istringstream in{input}; + auto const res = parse_translation_unit_from_istream(in); + INFO("Word: " << w); + REQUIRE(res.has_error()); + auto const& err = res.error(); + REQUIRE(err.get_message() == "Parser: mismatched input '" + std::string(w) + "' expecting ID"); + // Column should point at the start of the identifier (0-based); with leading space + "namespace " + // that’s column 11 for 1-based line 1. + REQUIRE(err.get_source_location() == SourceLocation{1, 11}); + } +}src/spider/tdl/parser/antlr_generated/TaskDefLangLexer.cpp (2)
186-206: Coverage looks broad; a couple of edge-case considerations.
- Case sensitivity is mixed (“True”/“False” and “true”/“false” are both present). If that’s intentional, fine; otherwise, decide whether keywords are case-sensitive in TDL and align accordingly.
- Some literals like "namespace"/"fn"/"struct" also exist as T__N tokens; duplication isn’t harmful due to rule order but is redundant. Consider documenting this in the grammar so future edits don’t reorder rules unexpectedly.
If desired, I can propose a grammar comment block clarifying precedence and the rationale for duplicated literals.
Also applies to: 245-316, 452-496, 535-620
621-627: ID rule remains ASCII-only; confirm non-ASCII identifiers are out-of-scope.If TDL intends to allow Unicode identifiers, this will need to evolve. If ASCII-only is intended, ignore.
src/spider/tdl/parser/TaskDefLang.g4 (3)
284-300: Add Python 3.10’s match keyword to the control‑flow set.Since you already include Pythonic keywords (elif, pass, yield), consider reserving match as well to prevent accidental use as an identifier in future grammar extensions.
Apply this minimal diff:
RESERVED_CONTROL_FLOW_WORDS : 'if' | 'else' | 'elif' | 'switch' | 'case' +| 'match' | 'while' | 'for' | 'loop' | 'do' | 'break' | 'continue' | 'return' | 'goto' | 'pass' | 'yield' ;
318-345: Verify longest‑match behaviour for intN tokens vs 'int' and consider reserving None.
- With 'int' present in RESERVED_TYPE_AND_STORAGE_WORDS and 'int8/int16/int32/int64' defined as literals, ANTLR’s longest‑match should select the longer intN tokens. Add a regression test to confirm that e.g., int32 is tokenised as the builtin type, not as 'int' + '32'.
- For Python parity, consider adding 'None' to this group alongside null/nullptr/true/false/True/False.
You can add a focused lexer test (or parser snippet) that asserts a declaration like "a: int32" recognises int32 as a single token.
Proposed optional addition:
RESERVED_TYPE_AND_STORAGE_WORDS : 'int' ... | 'nullptr' +| 'None' | 'true' | 'false' | 'True' | 'False' ;
366-384: Round out Python keywords and sanity‑check future C++ additions.
- Python: consider adding 'lambda', 'from', and 'as' to anticipate imports and lambdas if expressions/statements are added later.
- C++: when you expand later, plan for keywords like 'constexpr', 'typename', 'template', 'decltype', 'concept', 'requires', 'noexcept', 'mutable', and the co_‑keywords ('co_await', 'co_return', 'co_yield'). Not a blocker for this PR.
Apply this small optional extension for Python:
RESERVED_OTHER_WORDS : 'operator' | 'inline' | 'virtual' | 'override' | 'extern' | 'sizeof' | 'await' | 'async' | 'with' | 'global' | 'nonlocal' | 'and' | 'or' | 'not' | 'in' | 'is' | 'del' +| 'lambda' +| 'from' +| 'as' ;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (21)
src/spider/tdl/parser/SourceLocation.hpp(2 hunks)src/spider/tdl/parser/TaskDefLang.g4(1 hunks)src/spider/tdl/parser/antlr_generated/TaskDefLangLexer.cpp(2 hunks)src/spider/tdl/parser/antlr_generated/TaskDefLangLexer.h(1 hunks)src/spider/tdl/parser/antlr_generated/TaskDefLangParser.cpp(2 hunks)src/spider/tdl/parser/antlr_generated/TaskDefLangParser.h(1 hunks)src/spider/tdl/parser/ast/node_impl/Function.cpp(1 hunks)src/spider/tdl/parser/ast/node_impl/Identifier.cpp(1 hunks)src/spider/tdl/parser/ast/node_impl/NamedVar.cpp(1 hunks)src/spider/tdl/parser/ast/node_impl/Namespace.cpp(1 hunks)src/spider/tdl/parser/ast/node_impl/StructSpec.cpp(1 hunks)src/spider/tdl/parser/ast/node_impl/TranslationUnit.cpp(1 hunks)src/spider/tdl/parser/ast/node_impl/type_impl/Struct.cpp(1 hunks)src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/List.cpp(1 hunks)src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Map.cpp(1 hunks)src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Tuple.cpp(2 hunks)src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Bool.cpp(1 hunks)src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Float.cpp(1 hunks)src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Int.cpp(1 hunks)tests/tdl/test-parser-ast.cpp(17 hunks)tests/tdl/test-parser.cpp(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (17)
src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Tuple.cpp (3)
src/spider/tdl/parser/ast/utils.hpp (1)
create_indentation(22-22)src/spider/tdl/parser/ast/Node.hpp (1)
indentation_level(91-92)src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Tuple.hpp (1)
indentation_level(31-32)
src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Float.cpp (4)
src/spider/tdl/parser/ast/utils.hpp (1)
create_indentation(22-22)src/spider/tdl/parser/ast/utils.cpp (2)
create_indentation(14-19)create_indentation(14-14)src/spider/tdl/parser/ast/Node.hpp (1)
indentation_level(91-92)src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Float.hpp (1)
indentation_level(30-31)
src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Int.cpp (3)
src/spider/tdl/parser/ast/utils.cpp (2)
create_indentation(14-19)create_indentation(14-14)src/spider/tdl/parser/ast/Node.hpp (1)
indentation_level(91-92)src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Int.hpp (1)
indentation_level(30-31)
src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Bool.cpp (4)
src/spider/tdl/parser/ast/utils.hpp (1)
create_indentation(22-22)src/spider/tdl/parser/ast/utils.cpp (2)
create_indentation(14-19)create_indentation(14-14)src/spider/tdl/parser/ast/Node.hpp (1)
indentation_level(91-92)src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Bool.hpp (1)
indentation_level(27-28)
src/spider/tdl/parser/ast/node_impl/Identifier.cpp (3)
src/spider/tdl/parser/ast/utils.hpp (1)
create_indentation(22-22)src/spider/tdl/parser/ast/utils.cpp (2)
create_indentation(14-19)create_indentation(14-14)src/spider/tdl/parser/ast/node_impl/Identifier.hpp (1)
indentation_level(29-30)
src/spider/tdl/parser/ast/node_impl/type_impl/Struct.cpp (2)
src/spider/tdl/parser/ast/utils.hpp (1)
create_indentation(22-22)src/spider/tdl/parser/ast/Node.hpp (1)
indentation_level(91-92)
src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/List.cpp (1)
src/spider/tdl/parser/ast/utils.cpp (2)
create_indentation(14-19)create_indentation(14-14)
src/spider/tdl/parser/ast/node_impl/Function.cpp (5)
src/spider/tdl/parser/ast/utils.hpp (1)
create_indentation(22-22)src/spider/tdl/parser/ast/node_impl/Identifier.hpp (1)
indentation_level(29-30)src/spider/tdl/parser/ast/node_impl/NamedVar.hpp (1)
indentation_level(35-36)src/spider/tdl/parser/ast/Node.hpp (1)
indentation_level(91-92)src/spider/tdl/parser/ast/node_impl/Function.hpp (1)
indentation_level(50-51)
src/spider/tdl/parser/antlr_generated/TaskDefLangLexer.h (1)
src/spider/tdl/parser/antlr_generated/TaskDefLangParser.cpp (2)
ID(913-915)ID(913-913)
src/spider/tdl/parser/SourceLocation.hpp (2)
src/spider/tdl/Error.hpp (3)
nodiscard(27-27)nodiscard(29-31)nodiscard(33-35)src/spider/tdl/parser/ast/Node.hpp (5)
nodiscard(48-48)nodiscard(53-53)nodiscard(76-82)nodiscard(95-97)nodiscard(119-121)
src/spider/tdl/parser/ast/node_impl/StructSpec.cpp (5)
src/spider/tdl/parser/ast/utils.hpp (1)
create_indentation(22-22)src/spider/tdl/parser/ast/node_impl/Identifier.hpp (1)
indentation_level(29-30)src/spider/tdl/parser/ast/node_impl/NamedVar.hpp (1)
indentation_level(35-36)src/spider/tdl/parser/ast/Node.hpp (1)
indentation_level(91-92)src/spider/tdl/parser/ast/node_impl/StructSpec.hpp (1)
indentation_level(53-54)
src/spider/tdl/parser/ast/node_impl/TranslationUnit.cpp (1)
src/spider/tdl/parser/ast/node_impl/TranslationUnit.hpp (1)
indentation_level(45-46)
src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Map.cpp (3)
src/spider/tdl/parser/ast/utils.hpp (1)
create_indentation(22-22)src/spider/tdl/parser/ast/utils.cpp (2)
create_indentation(14-19)create_indentation(14-14)src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Map.hpp (1)
indentation_level(44-45)
src/spider/tdl/parser/ast/node_impl/Namespace.cpp (4)
src/spider/tdl/parser/ast/utils.hpp (1)
create_indentation(22-22)src/spider/tdl/parser/ast/node_impl/Identifier.hpp (1)
indentation_level(29-30)src/spider/tdl/parser/ast/Node.hpp (1)
indentation_level(91-92)src/spider/tdl/parser/ast/node_impl/Namespace.hpp (1)
indentation_level(53-54)
tests/tdl/test-parser.cpp (1)
src/spider/tdl/parser/parse.cpp (2)
parse_translation_unit_from_istream(18-61)parse_translation_unit_from_istream(18-19)
src/spider/tdl/parser/antlr_generated/TaskDefLangParser.h (1)
src/spider/tdl/parser/antlr_generated/TaskDefLangParser.cpp (2)
ID(913-915)ID(913-913)
src/spider/tdl/parser/ast/node_impl/NamedVar.cpp (3)
src/spider/tdl/parser/ast/utils.hpp (1)
create_indentation(22-22)src/spider/tdl/parser/ast/node_impl/Identifier.hpp (1)
indentation_level(29-30)src/spider/tdl/parser/ast/node_impl/NamedVar.hpp (1)
indentation_level(35-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: non-storage-unit-tests (ubuntu-22.04)
- GitHub Check: non-storage-unit-tests (ubuntu-24.04)
- GitHub Check: lint
🔇 Additional comments (23)
src/spider/tdl/parser/SourceLocation.hpp (1)
20-22: Location serializer looks good and aligns with the new AST output format.Returns a concise "(line:column)" string and matches the usage sites across AST serializers.
src/spider/tdl/parser/ast/node_impl/NamedVar.cpp (1)
35-38: Header now carries location — consistent and readable. LGTM.The formatter places the location right after [NamedVar], matching other node serializers. Child error propagation via TRYX remains intact.
src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Bool.cpp (1)
14-18: Bool serializer includes location as expected.Consistent with Int/Float formatting; no behavioural changes beyond the suffix. Looks good.
src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Tuple.cpp (1)
38-43: Tuple serialization updated for location (empty and non-empty) — looks correct.The insertion point for the location is consistent, and empty/non-empty paths are symmetrical.
Also applies to: 63-67
tests/tdl/test-parser-ast.cpp (1)
185-185: Golden strings updated to include source locations — comprehensive and consistent.The new expectations reflect "(0:0)" across all affected node types and nesting levels. This will catch regressions if any serializer omits the location.
Also applies to: 200-212, 231-231, 235-235, 258-258, 279-279, 282-282, 284-284, 286-286, 315-315, 317-317, 319-319, 321-321, 323-323, 406-406, 409-409, 411-411, 412-412, 414-414, 430-432, 459-459, 462-462, 465-465, 467-467, 469-469, 525-549, 624-628, 707-734, 763-781, 810-823, 848-854, 904-920, 967-1016
src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Float.cpp (1)
15-19: Location-aware Float serialisation — LGTM.The added placeholder and argument using get_source_location().serialize_to_str() are correct and consistent with the other node types. No functional concerns.
src/spider/tdl/parser/ast/node_impl/TranslationUnit.cpp (2)
73-81: Add SourceLocation to TranslationUnit header — LGTM.The new "{}" placeholder and get_source_location().serialize_to_str() argument are correctly positioned before the colon, and the rest of the layout remains stable. This matches the formatting pattern used across other nodes.
73-81: Consistency Verified: All AST Node Serialisers Include SourceLocation
A repository-wide scan ofsrc/spider/tdl/parser/ast/node_implconfirmed that everyserialize_to_str(size_t)implementation invokesget_source_location().serialize_to_str(). No missing cases were found.src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/List.cpp (1)
29-34: List header now carries SourceLocation — LGTM.The format string and arguments are correct; indentation and child serialisation are unchanged and safe.
src/spider/tdl/parser/ast/node_impl/Identifier.cpp (1)
14-19: Identifier serialisation includes SourceLocation — LGTM.Placement before the colon matches other nodes. Given grammar constraints on IDs, emitting m_name directly remains safe.
src/spider/tdl/parser/ast/node_impl/type_impl/primitive_impl/Int.cpp (1)
15-19: Location-aware Int serialisation — LGTM.The additional placeholder and argument are correct; spec serialisation remains unchanged.
src/spider/tdl/parser/ast/node_impl/type_impl/Struct.cpp (1)
53-56: Location-aware Struct header: formatting and error propagation look correct.
- The extra placeholder for
get_source_location().serialize_to_str()is correctly added and arguments line up with the format string.- Use of
YSTDLIB_ERROR_HANDLING_TRYXwhen serialising the name preserves error semantics.src/spider/tdl/parser/ast/node_impl/type_impl/container_impl/Map.cpp (1)
93-100: Map header now includes source location — consistent and correct.
- The added
{}for location is paired withget_source_location().serialize_to_str().- Key/Value type serialisation and indentation remain intact.
src/spider/tdl/parser/ast/node_impl/Namespace.cpp (1)
94-101: Namespace serialisation: added location and multiline Name block are coherent with the new pattern.
- The header now prints the location; the "Name" field is broken out and serialised via the child node, which aligns with Function/Struct changes.
- Error handling via
YSTDLIB_ERROR_HANDLING_TRYXis appropriately used for the child serialisation.src/spider/tdl/parser/antlr_generated/TaskDefLangParser.cpp (1)
78-86: No numeric token comparisons detected—verification complete
The ripgrep searches for any hand-codedgetType() == <number>checks, raw token-number comparisons involvingID,SPACE, orCOMMENT, and references to old ordinal assumptions outside the generated ANTLR directory all returned no matches. There’s no evidence of brittle numeric token dependencies in the handwritten code.src/spider/tdl/parser/ast/node_impl/Function.cpp (1)
105-115: Function header now embeds location and serialises the Identifier child — looks good.
- Placeholders/arguments align; error propagation on name and return-type serialisation is preserved.
- Param rendering remains stable; indices are computed before push, yielding deterministic labels.
tests/tdl/test-parser.cpp (1)
16-55: Input preface shifting line numbers is a smart way to exercise location tracking.This ensures downstream line/column expectations are meaningful. No changes needed.
src/spider/tdl/parser/antlr_generated/TaskDefLangLexer.h (1)
30-33: No manual numeric token comparisons detected; enum reindexing confined to generated codeRan the provided grep across the repository. All matches for
TaskDefLang(Parser|Lexer)::IDandID = 32occur solely within the ANTLR-generated headers and source (TaskDefLangLexer.h, TaskDefLangParser.h/cpp). There are no instances oftokenType == <numeric literal>or other hard-coded integer comparisons in hand-written code. The shift ofIDto value 32 is therefore safely isolated to generated files.src/spider/tdl/parser/antlr_generated/TaskDefLangLexer.cpp (1)
74-78: Reserved-word token groups are correctly added before ID.Good: ruleNames/symbolicNames include the six RESERVED_* entries; ID comes last, ensuring reserved words don’t lex as identifiers.
Also applies to: 92-96
src/spider/tdl/parser/antlr_generated/TaskDefLangParser.h (1)
30-33: No internal hard-coded parser token IDs found
- A full-text search for
TaskDefLangParser::IDand the numeric literal= 32only returned matches within the ANTLR-generated lexer and parser sources.- There are no occurrences elsewhere in the codebase referencing the old or new enum values.
Please manually verify any external tools or test suites (outside this repository) for hard-coded expectations of the
IDtoken value.src/spider/tdl/parser/TaskDefLang.g4 (3)
347-356: Error‑handling keywords set looks good.The selection covers both C/C++ and Python variants and will effectively prevent accidental identifier use.
358-365: Access‑control keywords set looks good.Consistent with common C++ visibility modifiers; reserving them early is sensible.
302-316: Action Required: Resolve Lexer–Parser Literal ConflictsThe
RESERVED_DEF_WORDSlexer rule (lines 302–316 insrc/spider/tdl/parser/TaskDefLang.g4) currently includes the keywordsstruct,namespace, andfn, which are also used as raw string literals in the parser rules forstructDef,namespace, andfuncDef(lines 127–128, 37–38, 63–64). In ANTLR4’s combined grammars, this causes the lexer to emit aRESERVED_DEF_WORDStoken when it sees these keywords, whereas the parser expects the auto-generated literal tokens (T__n). This mismatch will lead to parse errors.• Location of conflict:
- Lexer definition: RESERVED_DEF_WORDS at line 302
- Literal usages:
•structat lines 127–128 and line 304
•namespaceat lines 37–38 and line 308
•fnat lines 63–64 and line 312• Minimal fix (apply this diff and re-generate the parser/lexer):
@@ -303,7 +303,6 @@ RESERVED_DEF_WORDS : 'class' -| 'struct' | 'enum' | 'union' | 'interface' @@ -306,7 +305,6 @@ | 'namespace' | 'using' -| 'fn' | 'typedef' | 'def' | 'let'• Alternatively (preferred long-term): replace all raw literals in parser rules with dedicated keyword tokens:
KW_STRUCT : 'struct'; KW_NAMESPACE: 'namespace'; KW_FN : 'fn';and update
structDef,namespace, andfuncDefto use these named tokens.Please rebuild the lexer/parser and run your parsing tests for
namespace,struct, andfnto confirm no regressions. Let me know if you’d like to include the named-token approach in this PR or address it in a follow-up.
| // ID must be defined after RESERVED_* so that the reserved words get higher precedence. | ||
| ID: [a-zA-Z_][a-zA-Z0-9_]* ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Good: ID placed after RESERVED_ to ensure precedence. Add guard tests.*
The placement and explanatory comment are correct. To avoid regressions, add unit tests asserting that representative reserved words (e.g., return, class, True) are not lexed as ID, while prefixes/suffixes like returnValue, className, isTrue are lexed as ID.
You can run this repository‑friendly script to detect overlaps between reserved words and parser literals (helps catch the 'struct'/'namespace'/'fn' conflict):
Expected (after fixing): overlap should be empty or limited to literals that are not used as parser tokens. If you still see 'namespace', 'struct', or 'fn' in the overlap, please adjust as suggested above.
🏁 Script executed:
#!/bin/bash
set -euo pipefail
GRAMMAR=src/spider/tdl/parser/TaskDefLang.g4
python - << 'PY'
import re, sys, pathlib
p = pathlib.Path(sys.argv[1]).read_text()
# Collect reserved words (the right-hand side literals in RESERVED_* rules)
reserved = set(re.findall(r"RESERVED_[A-Z_]+\s*:[^;]*;", p, flags=re.S))
reserved = set(sum([re.findall(r"'([^']+)'", block) for block in reserved], []))
# Collect all single-quoted literals in the grammar
literals = set(re.findall(r"'([^']+)'", p))
# Intersection shows potential lexer/parser literal conflicts
overlap = sorted(reserved & literals)
print("Overlap between RESERVED_* and parser literals:")
for s in overlap:
print(" -", s)
PY
"$GRAMMAR"Length of output: 186
🏁 Script executed:
#!/bin/bash
set -euo pipefail
GRAMMAR=src/spider/tdl/parser/TaskDefLang.g4
python3 - "$GRAMMAR" << 'PY'
import re, pathlib, sys
# Read the grammar file
p = pathlib.Path(sys.argv[1]).read_text()
# Extract RESERVED_* rule literals
reserved_blocks = re.findall(r"RESERVED_[A-Z_]+\s*:[^;]*;", p, flags=re.S)
reserved = set(sum([re.findall(r"'([^']+)'", block) for block in reserved_blocks], []))
# Extract all single-quoted literals
literals = set(re.findall(r"'([^']+)'", p))
# Compute overlap
overlap = sorted(reserved & literals)
print("Overlap between RESERVED_* and parser literals:")
for word in overlap:
print(" -", word)
PYLength of output: 931
Address Reserved vs Literal Token Conflicts in Grammar
The overlap detection script reveals that numerous reserved words (e.g. return, class, namespace, struct, fn, etc.) are still being lexed as implicit parser literals rather than as your explicit RESERVED_* tokens. This will cause reserved keywords to never match their intended token rules.
• File requiring fixes:
src/spider/tdl/parser/TaskDefLang.g4
– Replace every string literal occurrence of a reserved word in parser rules with its correspondingRESERVED_*token.
– Remove or refactor implicit literals for keywords so they no longer generateT__Ntokens before your named lexer rules.
• After correcting the grammar, re-run the overlap-detection script to confirm no reserved words intersect with literals.
• Retain the suggestion to add guard tests ensuring that:- Representative reserved words (e.g.,
return,class,True) are not recognized asID. - Identifiers containing those words as prefixes or suffixes (e.g.,
returnValue,className,isTrue) still lex asID.
🤖 Prompt for AI Agents
In src/spider/tdl/parser/TaskDefLang.g4 around lines 386-387, parser rules still
include string literal keywords which get turned into implicit T__N tokens and
conflict with your explicit RESERVED_* lexer rules; update every parser rule
that currently uses a keyword string (e.g. "return", "class", "namespace",
"struct", "fn", "True", etc.) to use the corresponding RESERVED_* token name
instead, remove or refactor any remaining implicit literal occurrences so they
no longer produce T__N tokens (ensure all RESERVED_* lexer rules remain
positioned before the ID rule), re-run the overlap-detection script to confirm
no overlaps, and add unit tests asserting reserved words are not lexed as ID
while identifiers containing those words as substrings still lex as ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/spider/tdl/parser/TaskDefLang.g4 (1)
304-323: Eliminate literal-vs-keyword ambiguity for namespace/struct/fnRight now, 'namespace', 'struct', and 'fn' are used both as parser string literals and as members of RESERVED_DEF_WORDS. This creates two competing lexer matchers for the same lexeme. Depending on ANTLR/target version details, the lexer may emit either T__N or RESERVED_DEF_WORDS for these words. That makes the grammar brittle and can cause hard-to-debug mismatches if rule ordering changes.
Two safe paths, pick one:
- Preferred: introduce dedicated keyword tokens and reference them in parser rules; remove these words from the RESERVED_* groups to avoid duplication.
- Minimal: keep string literals in parser rules and remove these words from RESERVED_DEF_WORDS.
Proposed fix (Preferred – introduces dedicated tokens and updates parser rules):
@@ -namespace returns [std::unique_ptr<spider::tdl::parser::ast::Node> retval] -: 'namespace' id '{' funcDefs '}' { +namespace returns [std::unique_ptr<spider::tdl::parser::ast::Node> retval] +: KW_NAMESPACE id '{' funcDefs '}' { @@ -funcDef returns [std::unique_ptr<spider::tdl::parser::ast::Node> retval] -: 'fn' id '(' params ')' ret ';' { +funcDef returns [std::unique_ptr<spider::tdl::parser::ast::Node> retval] +: KW_FN id '(' params ')' ret ';' { @@ -structDef returns [std::shared_ptr<spider::tdl::parser::ast::StructSpec> retval] -: 'struct' id '{' namedVarList (',')? '}' ';' { +structDef returns [std::shared_ptr<spider::tdl::parser::ast::StructSpec> retval] +: KW_STRUCT id '{' namedVarList (',')? '}' ';' {Add keyword tokens above ID (and after COMMENT), and remove duplicates from RESERVED_DEF_WORDS:
@@ RESERVED_DEF_WORDS -: 'class' -| 'struct' +: 'class' | 'enum' | 'union' | 'interface' -| 'namespace' +/* 'namespace' handled by KW_NAMESPACE */ | 'using' | 'typedef' | 'decltype' | 'def' | 'dataclass' -| 'fn' +/* 'fn' handled by KW_FN */ | 'let' | 'module' | 'import' | 'template' | 'requires' | 'concept' ; + +// Dedicated keyword tokens referenced in parser rules +KW_NAMESPACE: 'namespace'; +KW_STRUCT: 'struct'; +KW_FN: 'fn';If you prefer the minimal path, keep the parser rules as-is and only delete 'namespace'/'struct'/'fn' from RESERVED_DEF_WORDS.
Action: please pick an approach and re-run/extend tests accordingly.
To double-check there’s no remaining overlap after your change, run the overlap detector (same idea as the previous comment’s script) and ensure the intersection for these three words is empty. Would you like me to paste a repo-friendly script again?
🧹 Nitpick comments (4)
src/spider/tdl/parser/TaskDefLang.g4 (4)
284-302: Reserved control-flow set looks solid; add guard tests for substringsCoverage is good and the placement before ID is correct. Please add lexer/parse tests to assert that:
- Keywords themselves (e.g., if, return, while) are not lexed as ID.
- Substring identifiers (e.g., ifCond, returnValue, whileLoop) are lexed as ID.
I can draft these tests if helpful.
359-368: Error-handling keywords set LGTMReasonable coverage for both Python and C++ families. No conflicts with existing parser rules. Consider adding a tiny negative test (e.g., “namespace try { fn x(); }” should report mismatched input 'try' expecting ID).
378-402: Optional: consider rounding out the “other” set for C++20/23If you want closer parity with modern C++, consider adding:
- export (modules), explicit, mutable, thread_local, alignas, alignof
- co_await, co_return, co_yield
Not blocking; just future-proofing. I can open a follow-up PR if you agree.
304-323: Keep keyword lists maintainableConsider alphabetising each RESERVED_* list and adding a short comment noting the intended source languages (Python, C/C++, general). This eases diff review and future additions. Non-blocking.
Also applies to: 325-357
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
src/spider/tdl/parser/TaskDefLang.g4(1 hunks)src/spider/tdl/parser/antlr_generated/TaskDefLangLexer.cpp(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/spider/tdl/parser/antlr_generated/TaskDefLangLexer.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: non-storage-unit-tests (ubuntu-22.04)
- GitHub Check: non-storage-unit-tests (ubuntu-24.04)
- GitHub Check: lint
🔇 Additional comments (3)
src/spider/tdl/parser/TaskDefLang.g4 (3)
370-376: Access-control keywords set LGTMNo parser usage; purely to block IDs. Looks good.
404-405: Good precedence note and placement of IDThe comment is accurate and the placement is correct. If you adopt dedicated KW_* tokens as suggested, update the comment to reflect “RESERVED_* and KW_*” precede ID.
304-323: No TDL source files detected; manual verification requiredI scanned the repository for any
*.tdl(or*.TDL) files and didn’t find any, so I can’t automatically verify that the newly reserved keywords aren’t used as identifiers in your TDL inputs. Please confirm where your TDL source files live (or share representative examples) and manually check that no existing identifiers collide with the newly reserved words.
- No
.tdl/.TDLfiles located under the repo root or subdirectories- Provide the path to your TDL examples (or include sample files) for automated scanning
- Once located, rerun the provided script or manually inspect identifiers against the list:
int, char, string, str, void, class, enum, union, interface, operator, override, decltype, concept, requires, module, template
| RESERVED_TYPE_AND_STORAGE_WORDS | ||
| : 'int' | ||
| | 'long' | ||
| | 'short' | ||
| | 'byte' | ||
| | 'signed' | ||
| | 'unsigned' | ||
| | 'char' | ||
| | 'string' | ||
| | 'str' | ||
| | 'float' | ||
| | 'double' | ||
| | 'bool' | ||
| | 'void' | ||
| | 'const' | ||
| | 'static' | ||
| | 'auto' | ||
| | 'volatile' | ||
| | 'register' | ||
| | 'final' | ||
| | 'abstract' | ||
| | 'this' | ||
| | 'self' | ||
| | 'new' | ||
| | 'delete' | ||
| | 'null' | ||
| | 'nullptr' | ||
| | 'None' | ||
| | 'true' | ||
| | 'false' | ||
| | 'True' | ||
| | 'False' | ||
| ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Resolve potential mismatch for float/double/bool between RESERVED_ and builtinType literals*
Similar duplication exists for 'float', 'double', and 'bool': they appear as string literals in builtinType and inside RESERVED_TYPE_AND_STORAGE_WORDS. If the lexer emits RESERVED_TYPE_AND_STORAGE_WORDS for these, builtinType branches expecting T__N ('float', etc.) won’t match. While this may “work by accident” today, it’s fragile.
Preferred fix: introduce dedicated tokens and reference them in builtinType; remove duplicates from RESERVED_TYPE_AND_STORAGE_WORDS.
@@
builtinType returns [std::unique_ptr<spider::tdl::parser::ast::Node> retval]
@@
-| 'float' {
+| KW_FLOAT {
$retval = spider::tdl::parser::ast::Float::create(
spider::tdl::parser::ast::FloatSpec::Float,
loc
);
}
- | 'double' {
+| KW_DOUBLE {
$retval = spider::tdl::parser::ast::Float::create(
spider::tdl::parser::ast::FloatSpec::Double,
loc
);
}
-| 'bool' {
+| KW_BOOL {
$retval = spider::tdl::parser::ast::Bool::create(loc);
}
@@
RESERVED_TYPE_AND_STORAGE_WORDS
-: 'int'
+: 'int'
| 'long'
@@
-| 'float'
-| 'double'
-| 'bool'
+/* 'float'|'double'|'bool' handled by KW_FLOAT|KW_DOUBLE|KW_BOOL */
@@
;
+
+// Dedicated tokens for types referenced in parser rules
+KW_FLOAT: 'float';
+KW_DOUBLE: 'double';
+KW_BOOL: 'bool';If you intentionally want to keep them only as literals, delete them from RESERVED_TYPE_AND_STORAGE_WORDS instead to remove the ambiguity.
Please verify that tests covering float/double/bool parsing still pass after this change.
🏁 Script executed:
#!/bin/bash
# Display the builtinType parser rule (to inspect float/double/bool usage)
rg -n 'builtinType returns' -C20 src/spider/tdl/parser/TaskDefLang.g4
# Check for existing token definitions for dedicated float/double/bool tokens
rg -n 'KW_FLOAT' -g 'src/spider/tdl/parser/TaskDefLang.g4'
rg -n 'KW_DOUBLE' -g 'src/spider/tdl/parser/TaskDefLang.g4'
rg -n 'KW_BOOL' -g 'src/spider/tdl/parser/TaskDefLang.g4'
# Display the RESERVED_TYPE_AND_STORAGE_WORDS lexer rule to confirm duplicates
rg -n 'RESERVED_TYPE_AND_STORAGE_WORDS' -C20 src/spider/tdl/parser/TaskDefLang.g4Length of output: 2259
Deduplicate float/double/bool tokens in TaskDefLang.g4
To avoid the lexer capturing ‘float’, ‘double’ and ‘bool’ as part of RESERVED_TYPE_AND_STORAGE_WORDS—thus blocking your builtinType parser rule from ever seeing those literals—introduce explicit tokens for these keywords and remove them from the reserved-word set.
Affected locations:
- src/spider/tdl/parser/TaskDefLang.g4: builtinType parser rule (currently lines 258–…)
- src/spider/tdl/parser/TaskDefLang.g4: RESERVED_TYPE_AND_STORAGE_WORDS lexer rule (lines 325–337)
- Add new lexer definitions for KW_FLOAT, KW_DOUBLE, KW_BOOL
Proposed diff:
--- a/src/spider/tdl/parser/TaskDefLang.g4
+++ b/src/spider/tdl/parser/TaskDefLang.g4
@@ builtinType returns [std::unique_ptr<spider::tdl::parser::ast::Node> retval]
- | 'float' {
+ | KW_FLOAT {
$retval = spider::tdl::parser::ast::Float::create(
spider::tdl::parser::ast::FloatSpec::Float,
loc
);
}
- | 'double' {
+ | KW_DOUBLE {
$retval = spider::tdl::parser::ast::Float::create(
spider::tdl::parser::ast::FloatSpec::Double,
loc
);
}
- | 'bool' {
+ | KW_BOOL {
$retval = spider::tdl::parser::ast::Bool::create(loc);
}
@@ RESERVED_TYPE_AND_STORAGE_WORDS
- | 'float'
- | 'double'
- | 'bool'
+ /* 'float'|'double'|'bool' handled by KW_FLOAT|KW_DOUBLE|KW_BOOL */
@@
;
+
+// Dedicated tokens for float/double/bool
+KW_FLOAT : 'float';
+KW_DOUBLE: 'double';
+KW_BOOL : 'bool';Please verify that all existing tests for parsing float/double/bool still pass after applying this change.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| RESERVED_TYPE_AND_STORAGE_WORDS | |
| : 'int' | |
| | 'long' | |
| | 'short' | |
| | 'byte' | |
| | 'signed' | |
| | 'unsigned' | |
| | 'char' | |
| | 'string' | |
| | 'str' | |
| | 'float' | |
| | 'double' | |
| | 'bool' | |
| | 'void' | |
| | 'const' | |
| | 'static' | |
| | 'auto' | |
| | 'volatile' | |
| | 'register' | |
| | 'final' | |
| | 'abstract' | |
| | 'this' | |
| | 'self' | |
| | 'new' | |
| | 'delete' | |
| | 'null' | |
| | 'nullptr' | |
| | 'None' | |
| | 'true' | |
| | 'false' | |
| | 'True' | |
| | 'False' | |
| ; | |
| // --- In the parser rule for builtinType (around line 258) --- | |
| builtinType | |
| returns [std::unique_ptr<spider::tdl::parser::ast::Node> retval] | |
| : /* … other type cases … */ | |
| // replace the string‐literal cases for float/double/bool with the new tokens | |
| | KW_FLOAT { | |
| $retval = spider::tdl::parser::ast::Float::create( | |
| spider::tdl::parser::ast::FloatSpec::Float, | |
| loc | |
| ); | |
| } | |
| | KW_DOUBLE { | |
| $retval = spider::tdl::parser::ast::Float::create( | |
| spider::tdl::parser::ast::FloatSpec::Double, | |
| loc | |
| ); | |
| } | |
| | KW_BOOL { | |
| $retval = spider::tdl::parser::ast::Bool::create(loc); | |
| } | |
| ; | |
| // --- In the lexer rule RESERVED_TYPE_AND_STORAGE_WORDS (around line 325) --- | |
| RESERVED_TYPE_AND_STORAGE_WORDS | |
| : 'int' | |
| | 'long' | |
| | 'short' | |
| | 'byte' | |
| | 'signed' | |
| | 'unsigned' | |
| | 'char' | |
| | 'string' | |
| | 'str' | |
| | 'const' | |
| | 'static' | |
| | 'auto' | |
| | 'volatile' | |
| | 'register' | |
| | 'final' | |
| | 'abstract' | |
| | 'this' | |
| | 'self' | |
| | 'new' | |
| | 'delete' | |
| | 'null' | |
| | 'nullptr' | |
| | 'None' | |
| | 'true' | |
| | 'false' | |
| | 'True' | |
| | 'False' | |
| - | 'float' | |
| - | 'double' | |
| /* 'float'|'double'|'bool' handled by KW_FLOAT|KW_DOUBLE|KW_BOOL */ | |
| ; | |
| // Dedicated tokens for the floating-point and boolean keywords | |
| KW_FLOAT : 'float'; | |
| KW_DOUBLE: 'double'; | |
| KW_BOOL : 'bool'; |
Description
References
This PR depends on #211.
Overview
This PR adds reserved keywords in the grammar file so that they won't be parsed as ID. The reserved keywords cover most of the common keywords used by programming languages. We're especially targeting Python and C++ since they're the first two code-gen targets. We can enrich this keyword set in future PRs when needed.
Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Tests