124 changes: 123 additions & 1 deletion src/spider/tdl/parser/TaskDefLang.g4
@@ -278,6 +278,128 @@ builtinType returns [std::unique_ptr<spider::tdl::parser::ast::Node> retval]
}
;

ID: [a-zA-Z_][a-zA-Z0-9_]* ;
SPACE: [ \t\r\n] -> skip ;
COMMENT: '//' (~[\r\n])* -> skip;

RESERVED_CONTROL_FLOW_WORDS
: 'if'
| 'else'
| 'elif'
| 'switch'
| 'case'
| 'default'
| 'match'
| 'while'
| 'for'
| 'loop'
| 'do'
| 'break'
| 'continue'
| 'return'
| 'goto'
| 'pass'
| 'yield'
;

RESERVED_DEF_WORDS
: 'class'
| 'struct'
| 'enum'
| 'union'
| 'interface'
| 'namespace'
| 'using'
| 'typedef'
| 'decltype'
| 'def'
| 'dataclass'
| 'fn'
| 'let'
| 'module'
| 'import'
| 'template'
| 'requires'
| 'concept'
;

RESERVED_TYPE_AND_STORAGE_WORDS
: 'int'
| 'long'
| 'short'
| 'byte'
| 'signed'
| 'unsigned'
| 'char'
| 'string'
| 'str'
| 'float'
| 'double'
| 'bool'
| 'void'
| 'const'
| 'static'
| 'auto'
| 'volatile'
| 'register'
| 'final'
| 'abstract'
| 'this'
| 'self'
| 'new'
| 'delete'
| 'null'
| 'nullptr'
| 'None'
| 'true'
| 'false'
| 'True'
| 'False'
;
Comment on lines +325 to +357
Contributor


💡 Verification agent

🧩 Analysis chain

Resolve potential mismatch for float/double/bool between RESERVED_* and builtinType literals

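Similar duplication exists for 'float', 'double', and 'bool': they appear as string literals in builtinType and inside RESERVED_TYPE_AND_STORAGE_WORDS. If the lexer emitted RESERVED_TYPE_AND_STORAGE_WORDS for these, the builtinType branches expecting T__N ('float', etc.) would not match. In a combined ANTLR 4 grammar the implicit tokens generated for parser literals take precedence over explicit lexer rules, so this works today, but only because of that ordering rule: the RESERVED_* alternatives for these words are dead, and the grammar is fragile.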

Preferred fix: introduce dedicated tokens and reference them in builtinType; remove duplicates from RESERVED_TYPE_AND_STORAGE_WORDS.

@@
 builtinType returns [std::unique_ptr<spider::tdl::parser::ast::Node> retval]
@@
-| 'float' {
+| KW_FLOAT {
     $retval = spider::tdl::parser::ast::Float::create(
             spider::tdl::parser::ast::FloatSpec::Float,
             loc
     );
 }
- | 'double' {
+| KW_DOUBLE {
     $retval = spider::tdl::parser::ast::Float::create(
             spider::tdl::parser::ast::FloatSpec::Double,
             loc
     );
 }
-| 'bool' {
+| KW_BOOL {
     $retval = spider::tdl::parser::ast::Bool::create(loc);
 }
@@
 RESERVED_TYPE_AND_STORAGE_WORDS
 : 'int'
 | 'long'
@@
-| 'float'
-| 'double'
-| 'bool'
+/* 'float'|'double'|'bool' handled by KW_FLOAT|KW_DOUBLE|KW_BOOL */
@@
 ;
+
+// Dedicated tokens for types referenced in parser rules
+KW_FLOAT: 'float';
+KW_DOUBLE: 'double';
+KW_BOOL: 'bool';

If you intentionally want to keep them only as literals, delete them from RESERVED_TYPE_AND_STORAGE_WORDS instead to remove the ambiguity.

Please verify that tests covering float/double/bool parsing still pass after this change.
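
To make the precedence issue concrete, here is a minimal Python model of how an ANTLR-style lexer chooses between rules that match the same text: the longest match wins, and ties go to the rule defined first. This is a simplified sketch, not ANTLR's implementation. The trimmed rule list, the `T__0` name, and the `first_token` helper are hypothetical stand-ins, and the model assumes implicit literal tokens are ordered ahead of explicit lexer rules, as in a combined grammar.

```python
import re

def first_token(text, rules):
    """Return (rule_name, lexeme) for the token at the start of `text`.

    Longest match wins; on a tie the earliest rule in `rules` wins.
    """
    best_name, best_len = None, 0
    for name, pattern in rules:
        m = re.match(pattern, text)
        # Strict '>' keeps the earlier rule when two matches have equal length.
        if m and len(m.group(0)) > best_len:
            best_name, best_len = name, len(m.group(0))
    return best_name, text[:best_len]

# Hypothetical, heavily trimmed rule set for illustration only.
IMPLICIT_FLOAT = ("T__0", r"float")  # implicit token for the parser literal 'float'
RESERVED = ("RESERVED_TYPE_AND_STORAGE_WORDS", r"float|double|bool|void")
ID = ("ID", r"[a-zA-Z_][a-zA-Z0-9_]*")

combined = [IMPLICIT_FLOAT, RESERVED, ID]  # literal token ordered first
explicit_only = [RESERVED, ID]             # no parser literal for 'float'

print(first_token("float", combined))       # ('T__0', 'float')
print(first_token("float", explicit_only))  # ('RESERVED_TYPE_AND_STORAGE_WORDS', 'float')
print(first_token("floating", combined))    # ('ID', 'floating')
```

In this model the parser sees `T__0` for `float` only while the literal stays in a parser rule; once the literal is replaced or removed, the same text silently becomes a reserved-word token, which is the fragility the comment describes.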


🏁 Script executed:

#!/bin/bash
# Display the builtinType parser rule (to inspect float/double/bool usage)
rg -n 'builtinType returns' -C20 src/spider/tdl/parser/TaskDefLang.g4

# Check for existing token definitions for dedicated float/double/bool tokens
rg -n 'KW_FLOAT' -g 'src/spider/tdl/parser/TaskDefLang.g4'
rg -n 'KW_DOUBLE' -g 'src/spider/tdl/parser/TaskDefLang.g4'
rg -n 'KW_BOOL' -g 'src/spider/tdl/parser/TaskDefLang.g4'

# Display the RESERVED_TYPE_AND_STORAGE_WORDS lexer rule to confirm duplicates
rg -n 'RESERVED_TYPE_AND_STORAGE_WORDS' -C20 src/spider/tdl/parser/TaskDefLang.g4

Length of output: 2259


Deduplicate float/double/bool tokens in TaskDefLang.g4

To avoid the lexer capturing ‘float’, ‘double’ and ‘bool’ as part of RESERVED_TYPE_AND_STORAGE_WORDS—thus blocking your builtinType parser rule from ever seeing those literals—introduce explicit tokens for these keywords and remove them from the reserved-word set.

Affected locations:

  • src/spider/tdl/parser/TaskDefLang.g4: builtinType parser rule (currently lines 258–…)
  • src/spider/tdl/parser/TaskDefLang.g4: RESERVED_TYPE_AND_STORAGE_WORDS lexer rule (lines 325–357)
  • Add new lexer definitions for KW_FLOAT, KW_DOUBLE, KW_BOOL

Proposed diff:

--- a/src/spider/tdl/parser/TaskDefLang.g4
+++ b/src/spider/tdl/parser/TaskDefLang.g4
@@ builtinType returns [std::unique_ptr<spider::tdl::parser::ast::Node> retval]
- | 'float' {
+ | KW_FLOAT {
     $retval = spider::tdl::parser::ast::Float::create(
             spider::tdl::parser::ast::FloatSpec::Float,
             loc
     );
 }
- | 'double' {
+ | KW_DOUBLE {
     $retval = spider::tdl::parser::ast::Float::create(
             spider::tdl::parser::ast::FloatSpec::Double,
             loc
     );
 }
- | 'bool' {
+ | KW_BOOL {
     $retval = spider::tdl::parser::ast::Bool::create(loc);
 }
@@ RESERVED_TYPE_AND_STORAGE_WORDS
- | 'float'
- | 'double'
- | 'bool'
+ /* 'float'|'double'|'bool' handled by KW_FLOAT|KW_DOUBLE|KW_BOOL */
@@
 ;
+
+// Dedicated tokens for float/double/bool
+KW_FLOAT : 'float';
+KW_DOUBLE: 'double';
+KW_BOOL  : 'bool';

Please verify that all existing tests for parsing float/double/bool still pass after applying this change.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
RESERVED_TYPE_AND_STORAGE_WORDS
: 'int'
| 'long'
| 'short'
| 'byte'
| 'signed'
| 'unsigned'
| 'char'
| 'string'
| 'str'
| 'float'
| 'double'
| 'bool'
| 'void'
| 'const'
| 'static'
| 'auto'
| 'volatile'
| 'register'
| 'final'
| 'abstract'
| 'this'
| 'self'
| 'new'
| 'delete'
| 'null'
| 'nullptr'
| 'None'
| 'true'
| 'false'
| 'True'
| 'False'
;
// --- In the parser rule for builtinType (around line 258) ---
builtinType
returns [std::unique_ptr<spider::tdl::parser::ast::Node> retval]
    : /* … other type cases … */
    // replace the string-literal cases for float/double/bool with the new tokens
    | KW_FLOAT {
        $retval = spider::tdl::parser::ast::Float::create(
                spider::tdl::parser::ast::FloatSpec::Float,
                loc
        );
    }
    | KW_DOUBLE {
        $retval = spider::tdl::parser::ast::Float::create(
                spider::tdl::parser::ast::FloatSpec::Double,
                loc
        );
    }
    | KW_BOOL {
        $retval = spider::tdl::parser::ast::Bool::create(loc);
    }
    ;
// --- In the lexer rule RESERVED_TYPE_AND_STORAGE_WORDS (around line 325) ---
RESERVED_TYPE_AND_STORAGE_WORDS
: 'int'
| 'long'
| 'short'
| 'byte'
| 'signed'
| 'unsigned'
| 'char'
| 'string'
| 'str'
| 'void'
| 'const'
| 'static'
| 'auto'
| 'volatile'
| 'register'
| 'final'
| 'abstract'
| 'this'
| 'self'
| 'new'
| 'delete'
| 'null'
| 'nullptr'
| 'None'
| 'true'
| 'false'
| 'True'
| 'False'
/* 'float'|'double'|'bool' handled by KW_FLOAT|KW_DOUBLE|KW_BOOL */
;
// Dedicated tokens for the floating-point and boolean keywords
KW_FLOAT : 'float';
KW_DOUBLE: 'double';
KW_BOOL : 'bool';


RESERVED_ERROR_HANDLING_WORDS
: 'try'
| 'catch'
| 'throw'
| 'throws'
| 'raise'
| 'finally'
| 'assert'
| 'except'
;

RESERVED_ACCESS_CONTROL_WORDS
: 'public'
| 'private'
| 'protected'
| 'internal'
| 'friend'
;

RESERVED_OTHER_WORDS
: 'operator'
| 'inline'
| 'virtual'
| 'override'
| 'extern'
| 'sizeof'
| 'await'
| 'async'
| 'with'
| 'global'
| 'nonlocal'
| 'and'
| 'or'
| 'not'
| 'in'
| 'is'
| 'del'
| 'lambda'
| 'from'
| 'as'
| 'constexpr'
| 'nodiscard'
| 'noexcept'
;

// ID must be defined after RESERVED_* so that the reserved words get higher precedence.
ID: [a-zA-Z_][a-zA-Z0-9_]* ;
Comment on lines +404 to +405
Contributor


💡 Verification agent

🧩 Analysis chain

Good: ID placed after RESERVED_* to ensure precedence. Add guard tests.

The placement and explanatory comment are correct. To avoid regressions, add unit tests asserting that representative reserved words (e.g., return, class, True) are not lexed as ID, while prefixes/suffixes like returnValue, className, isTrue are lexed as ID.

You can run this repository‑friendly script to detect overlaps between reserved words and parser literals (helps catch the 'struct'/'namespace'/'fn' conflict):

Expected (after fixing): overlap should be empty or limited to literals that are not used as parser tokens. If you still see 'namespace', 'struct', or 'fn' in the overlap, please adjust as suggested above.
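
The guard tests suggested above could start from a table of cases like the following. This is only a sketch of the expected behaviour written against a tiny Python model; the real tests would feed the same words to the generated lexer and inspect the token types. `RESERVED_WORDS` here is a hypothetical subset of the RESERVED_* literals, and `classify` and `check_guard_cases` are illustrative names, not part of the project.

```python
import re

# Hypothetical subset of the RESERVED_* words, for illustration only.
RESERVED_WORDS = {"return", "class", "True", "struct", "fn"}
ID_PATTERN = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")

def classify(word):
    """Classify one whole word: reserved words take priority over ID."""
    if word in RESERVED_WORDS:
        return "RESERVED"
    if ID_PATTERN.fullmatch(word):
        return "ID"
    return "ERROR"

def check_guard_cases():
    """Assert the behaviour the guard tests are meant to pin down."""
    # Reserved words must never come back as ID.
    for word in ("return", "class", "True"):
        assert classify(word) == "RESERVED", word
    # Identifiers that merely contain a reserved word must still be ID.
    for word in ("returnValue", "className", "isTrue", "_return", "class1"):
        assert classify(word) == "ID", word

check_guard_cases()
```

The second group is the important one for regressions: because the lexer prefers the longest match, an identifier such as `returnValue` should win over the shorter reserved word `return` regardless of rule order.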


🏁 Script executed:

#!/bin/bash
set -euo pipefail

GRAMMAR=src/spider/tdl/parser/TaskDefLang.g4

python - << 'PY'
import re, sys, pathlib
p = pathlib.Path(sys.argv[1]).read_text()

# Collect reserved words (the right-hand side literals in RESERVED_* rules)
reserved = set(re.findall(r"RESERVED_[A-Z_]+\s*:[^;]*;", p, flags=re.S))
reserved = set(sum([re.findall(r"'([^']+)'", block) for block in reserved], []))

# Collect all single-quoted literals in the grammar
literals = set(re.findall(r"'([^']+)'", p))

# Intersection shows potential lexer/parser literal conflicts
overlap = sorted(reserved & literals)
print("Overlap between RESERVED_* and parser literals:")
for s in overlap:
    print(" -", s)
PY
"$GRAMMAR"

Length of output: 186


🏁 Script executed:

#!/bin/bash
set -euo pipefail

GRAMMAR=src/spider/tdl/parser/TaskDefLang.g4

python3 - "$GRAMMAR" << 'PY'
import re, pathlib, sys

# Read the grammar file
p = pathlib.Path(sys.argv[1]).read_text()

# Extract RESERVED_* rule literals
reserved_blocks = re.findall(r"RESERVED_[A-Z_]+\s*:[^;]*;", p, flags=re.S)
reserved = set(sum([re.findall(r"'([^']+)'", block) for block in reserved_blocks], []))

# Extract all single-quoted literals
literals = set(re.findall(r"'([^']+)'", p))

# Compute overlap
overlap = sorted(reserved & literals)

print("Overlap between RESERVED_* and parser literals:")
for word in overlap:
    print(" -", word)
PY

Length of output: 931


Address Reserved vs Literal Token Conflicts in Grammar

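The overlap-detection script lists many reserved words (e.g. return, class, namespace, struct, fn). Note that the script gathers literals from the whole grammar, including the RESERVED_* rules themselves, so every reserved word appears in its output; the actual conflicts are the reserved words that also occur as string literals in parser rules. Those words are lexed as implicit T__N tokens rather than as your explicit RESERVED_* tokens, so the RESERVED_* rules never match them.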

File requiring fixes: src/spider/tdl/parser/TaskDefLang.g4

  • Replace every string literal occurrence of a reserved word in parser rules with its corresponding RESERVED_* token.
  • Remove or refactor implicit literals for keywords so they no longer generate T__N tokens before your named lexer rules.
  • After correcting the grammar, re-run the overlap-detection script to confirm no reserved words intersect with literals.
  • Retain the suggestion to add guard tests ensuring that:
      – Representative reserved words (e.g., return, class, True) are not recognized as ID.
      – Identifiers containing those words as prefixes or suffixes (e.g., returnValue, className, isTrue) still lex as ID.
🤖 Prompt for AI Agents
In src/spider/tdl/parser/TaskDefLang.g4 around lines 386-387, parser rules still
include string literal keywords which get turned into implicit T__N tokens and
conflict with your explicit RESERVED_* lexer rules; update every parser rule
that currently uses a keyword string (e.g. "return", "class", "namespace",
"struct", "fn", "True", etc.) to use the corresponding RESERVED_* token name
instead, remove or refactor any remaining implicit literal occurrences so they
no longer produce T__N tokens (ensure all RESERVED_* lexer rules remain
positioned before the ID rule), re-run the overlap-detection script to confirm
no overlaps, and add unit tests asserting reserved words are not lexed as ID
while identifiers containing those words as substrings still lex as ID.
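
The overlap script run earlier in this comment collects quoted literals from the entire grammar, so the literals inside the RESERVED_* rules match themselves and every reserved word is reported. A narrower check, sketched below, collects literals per rule and intersects the RESERVED_* words only with literals found in parser rules. It relies on ANTLR's convention that parser rule names start with a lowercase letter and lexer rule names with an uppercase letter. The scanner is deliberately simple (for example, it does not understand braces inside C++ string literals within actions), and `literals_by_rule`, `parser_literal_conflicts`, and the demo grammar are hypothetical names used for illustration.

```python
from __future__ import annotations
import re

NAME_RE = re.compile(r"[A-Za-z_]\w*")

def _skip_braces(src: str, start: int) -> int:
    """Return the index just past the balanced '{...}' block opening at `start`."""
    depth = 0
    for j in range(start, len(src)):
        if src[j] == "{":
            depth += 1
        elif src[j] == "}":
            depth -= 1
            if depth == 0:
                return j + 1
    return len(src)

def _skip_quoted(src: str, start: int, close: str) -> int:
    """Return the index of the unescaped `close` character after `start`."""
    j = start + 1
    while j < len(src) and src[j] != close:
        j += 2 if src[j] == "\\" else 1
    return j

def literals_by_rule(src: str) -> dict[str, set[str]]:
    """Map each rule name to the single-quoted literals used in its body."""
    out: dict[str, set[str]] = {}
    i, rule, in_body = 0, None, False
    while i < len(src):
        ch = src[i]
        if src.startswith("//", i):                  # line comment
            nl = src.find("\n", i)
            i = len(src) if nl == -1 else nl + 1
        elif src.startswith("/*", i):                # block comment
            end = src.find("*/", i + 2)
            i = len(src) if end == -1 else end + 2
        elif ch == "'":                              # string literal
            j = _skip_quoted(src, i, "'")
            if rule is not None and in_body:
                out[rule].add(src[i + 1:j])
            i = j + 1
        elif ch == "[":                              # char set or argument block
            i = _skip_quoted(src, i, "]") + 1
        elif ch == "@":                              # named action, e.g. @header {...}
            brace = src.find("{", i)
            i = len(src) if brace == -1 else _skip_braces(src, brace)
        elif ch == "{":                              # embedded action or options block
            i = _skip_braces(src, i)
            if not in_body:
                rule = None
        elif ch == ":" and rule is not None and not in_body:
            in_body = True
            i += 1
        elif ch == ";":                              # end of rule or statement
            rule, in_body = None, False
            i += 1
        elif rule is None and (m := NAME_RE.match(src, i)):
            i = m.end()
            if m.group(0) != "fragment":
                rule = m.group(0)
                out.setdefault(rule, set())
        else:
            i += 1
    return out

def parser_literal_conflicts(src: str) -> list[str]:
    """Reserved words that also appear as literals in parser rules."""
    reserved, parser_literals = set(), set()
    for name, lits in literals_by_rule(src).items():
        if name.startswith("RESERVED_"):
            reserved |= lits
        elif name[0].islower():
            parser_literals |= lits
    return sorted(reserved & parser_literals)

# Hypothetical demo grammar, not the real TaskDefLang.g4.
DEMO = r"""
grammar Demo;
@header { #include <memory> }
structDef : 'struct' ID '{' '}' ';' ;
builtinType returns [std::unique_ptr<Node> retval]
    : 'float' { $retval = makeFloat(); }
    | 'int8'  { $retval = makeInt(); }
    ;
RESERVED_DEF_WORDS : 'class' | 'struct' ;
RESERVED_TYPE_AND_STORAGE_WORDS : 'int' | 'float' | 'bool' ;
ID : [a-zA-Z_][a-zA-Z0-9_]* ;
COMMENT : '//' (~[\r\n])* -> skip ;
"""

print(parser_literal_conflicts(DEMO))  # ['float', 'struct']
```

With this narrower check, an empty result means no reserved word is shadowed by an implicit parser literal, which is the condition the re-run is meant to confirm.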
