feat(tdl): Add reserved keywords in the grammar file. #212
base: main
Changes from all commits
6027bfa
8ef26e5
6071e16
690b7f2
368fb79
14ef6c2
2274d2e
d062d91
3d6c535
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -278,6 +278,128 @@ builtinType returns [std::unique_ptr<spider::tdl::parser::ast::Node> retval] | |
| } | ||
| ; | ||
|
|
||
| ID: [a-zA-Z_][a-zA-Z0-9_]* ; | ||
| SPACE: [ \t\r\n] -> skip ; | ||
| COMMENT: '//' (~[\r\n])* -> skip; | ||
|
|
||
| RESERVED_CONTROL_FLOW_WORDS | ||
| : 'if' | ||
| | 'else' | ||
| | 'elif' | ||
| | 'switch' | ||
| | 'case' | ||
| | 'default' | ||
| | 'match' | ||
| | 'while' | ||
| | 'for' | ||
| | 'loop' | ||
| | 'do' | ||
| | 'break' | ||
| | 'continue' | ||
| | 'return' | ||
| | 'goto' | ||
| | 'pass' | ||
| | 'yield' | ||
| ; | ||
|
|
||
| RESERVED_DEF_WORDS | ||
| : 'class' | ||
| | 'struct' | ||
| | 'enum' | ||
| | 'union' | ||
| | 'interface' | ||
| | 'namespace' | ||
| | 'using' | ||
| | 'typedef' | ||
| | 'decltype' | ||
| | 'def' | ||
| | 'dataclass' | ||
| | 'fn' | ||
| | 'let' | ||
| | 'module' | ||
| | 'import' | ||
| | 'template' | ||
| | 'requires' | ||
| | 'concept' | ||
| ; | ||
|
|
||
| RESERVED_TYPE_AND_STORAGE_WORDS | ||
| : 'int' | ||
| | 'long' | ||
| | 'short' | ||
| | 'byte' | ||
| | 'signed' | ||
| | 'unsigned' | ||
| | 'char' | ||
| | 'string' | ||
| | 'str' | ||
| | 'float' | ||
| | 'double' | ||
| | 'bool' | ||
| | 'void' | ||
| | 'const' | ||
| | 'static' | ||
| | 'auto' | ||
| | 'volatile' | ||
| | 'register' | ||
| | 'final' | ||
| | 'abstract' | ||
| | 'this' | ||
| | 'self' | ||
| | 'new' | ||
| | 'delete' | ||
| | 'null' | ||
| | 'nullptr' | ||
| | 'None' | ||
| | 'true' | ||
| | 'false' | ||
| | 'True' | ||
| | 'False' | ||
| ; | ||
|
|
||
| RESERVED_ERROR_HANDLING_WORDS | ||
| : 'try' | ||
| | 'catch' | ||
| | 'throw' | ||
| | 'throws' | ||
| | 'raise' | ||
| | 'finally' | ||
| | 'assert' | ||
| | 'except' | ||
| ; | ||
|
|
||
| RESERVED_ACCESS_CONTROL_WORDS | ||
| : 'public' | ||
| | 'private' | ||
| | 'protected' | ||
| | 'internal' | ||
| | 'friend' | ||
| ; | ||
|
|
||
| RESERVED_OTHER_WORDS | ||
| : 'operator' | ||
| | 'inline' | ||
| | 'virtual' | ||
| | 'override' | ||
| | 'extern' | ||
| | 'sizeof' | ||
| | 'await' | ||
| | 'async' | ||
| | 'with' | ||
| | 'global' | ||
| | 'nonlocal' | ||
| | 'and' | ||
| | 'or' | ||
| | 'not' | ||
| | 'in' | ||
| | 'is' | ||
| | 'del' | ||
| | 'lambda' | ||
| | 'from' | ||
| | 'as' | ||
| | 'constexpr' | ||
| | 'nodiscard' | ||
| | 'noexcept' | ||
| ; | ||
|
|
||
| // ID must be defined after RESERVED_* so that the reserved words get higher precedence. | ||
| ID: [a-zA-Z_][a-zA-Z0-9_]* ; | ||
|
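The precedence rule in the grammar comment can be illustrated with a small simulation. This is a sketch in Python, not ANTLR's implementation. It assumes the documented lexer tie-breaking behaviour: the rule matching the most input characters wins, and among equal-length matches the rule defined first wins. The word lists are a trimmed-down subset of the reserved words above, and the names `candidate_lengths` and `classify` are hypothetical.

```python
import re

# Trimmed-down subset of the RESERVED_* rules, for illustration only.
RESERVED = {
    "RESERVED_CONTROL_FLOW_WORDS": ["if", "return"],
    "RESERVED_DEF_WORDS": ["class"],
    "RESERVED_TYPE_AND_STORAGE_WORDS": ["True"],
}
ID_PATTERN = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def candidate_lengths(text, id_first):
    """List (rule name, match length at the start of text) in rule-definition order."""
    reserved = []
    for name, words in RESERVED.items():
        length = max((len(w) for w in words if text.startswith(w)), default=0)
        reserved.append((name, length))
    match = ID_PATTERN.match(text)
    id_rule = [("ID", match.end() if match else 0)]
    return id_rule + reserved if id_first else reserved + id_rule


def classify(text, id_first=False):
    """Pick the winning rule: longest match first, earliest rule on ties."""
    best_name, best_len = None, 0
    for name, length in candidate_lengths(text, id_first):
        if length > best_len:  # strict '>' keeps the earlier rule on a tie
            best_name, best_len = name, length
    return best_name


for word in ["return", "returnValue", "className", "isTrue", "True"]:
    print(word, "->", classify(word))
print("return (ID defined first) ->", classify("return", id_first=True))
```

With ID defined last, `return` and `True` resolve to their reserved rules, while `returnValue`, `className`, and `isTrue` still resolve to ID because the longer match wins. Moving ID first makes every reserved word an ID, which is the regression the grammar comment guards against.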
Comment on lines
+404
to
+405
Contributor
💡 Verification agent
🧩 Analysis chain
Good: ID placed after RESERVED_* to ensure precedence. Add guard tests.
The placement and explanatory comment are correct. To avoid regressions, add unit tests asserting that representative reserved words (e.g., return, class, True) are not lexed as ID, while identifiers that merely contain a reserved word, such as returnValue, className, and isTrue, are lexed as ID.
You can run this repository-friendly script to detect overlaps between reserved words and parser literals (helps catch the 'struct'/'namespace'/'fn' conflict):
Expected (after fixing): the overlap should be empty or limited to literals that are not used as parser tokens. If you still see 'namespace', 'struct', or 'fn' in the overlap, please adjust as suggested above.
🏁 Script executed:
#!/bin/bash
set -euo pipefail
GRAMMAR=src/spider/tdl/parser/TaskDefLang.g4
python - << 'PY'
import re, sys, pathlib
p = pathlib.Path(sys.argv[1]).read_text()
# Collect reserved words (the right-hand side literals in RESERVED_* rules)
reserved = set(re.findall(r"RESERVED_[A-Z_]+\s*:[^;]*;", p, flags=re.S))
reserved = set(sum([re.findall(r"'([^']+)'", block) for block in reserved], []))
# Collect all single-quoted literals in the grammar
literals = set(re.findall(r"'([^']+)'", p))
# Intersection shows potential lexer/parser literal conflicts
overlap = sorted(reserved & literals)
print("Overlap between RESERVED_* and parser literals:")
for s in overlap:
    print(" -", s)
PY
"$GRAMMAR"
Length of output: 186
🏁 Script executed:
#!/bin/bash
set -euo pipefail
GRAMMAR=src/spider/tdl/parser/TaskDefLang.g4
python3 - "$GRAMMAR" << 'PY'
import re, pathlib, sys
# Read the grammar file
p = pathlib.Path(sys.argv[1]).read_text()
# Extract RESERVED_* rule literals
reserved_blocks = re.findall(r"RESERVED_[A-Z_]+\s*:[^;]*;", p, flags=re.S)
reserved = set(sum([re.findall(r"'([^']+)'", block) for block in reserved_blocks], []))
# Extract all single-quoted literals
literals = set(re.findall(r"'([^']+)'", p))
# Compute overlap
overlap = sorted(reserved & literals)
print("Overlap between RESERVED_* and parser literals:")
for word in overlap:
    print(" -", word)
PY
Length of output: 931
Address Reserved vs Literal Token Conflicts in Grammar
The overlap detection script reveals that numerous reserved words (e.g.
• File requiring fixes:
|
||
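One caveat about the overlap scripts above: `literals` is collected from the whole file, including the RESERVED_* blocks themselves, so the intersection necessarily contains every reserved word. A stricter check could strip the RESERVED_* blocks before collecting the remaining literals. The following is a minimal sketch under that assumption. `SAMPLE_GRAMMAR` is a hypothetical miniature grammar rather than the real TaskDefLang.g4, and `reserved_literal_overlap` is an illustrative name.

```python
import re

# Hypothetical miniature grammar, used only to exercise the check.
SAMPLE_GRAMMAR = """
namespaceDef : 'namespace' ID '{' funcDef* '}' ;
funcDef : 'fn' ID '(' ')' ;
RESERVED_DEF_WORDS : 'class' | 'namespace' | 'fn' ;
RESERVED_OTHER_WORDS : 'lambda' | 'as' ;
ID : [a-zA-Z_][a-zA-Z0-9_]* ;
"""

RESERVED_BLOCK = re.compile(r"RESERVED_[A-Z_]+\s*:[^;]*;", flags=re.S)
LITERAL = re.compile(r"'([^']+)'")


def reserved_literal_overlap(grammar):
    """Return reserved words that also appear as literals outside RESERVED_* rules."""
    reserved = set()
    for block in RESERVED_BLOCK.findall(grammar):
        reserved.update(LITERAL.findall(block))
    # Remove the RESERVED_* blocks so their own literals are not counted twice.
    remainder = RESERVED_BLOCK.sub("", grammar)
    outside = set(LITERAL.findall(remainder))
    return sorted(reserved & outside)


print(reserved_literal_overlap(SAMPLE_GRAMMAR))  # → ['fn', 'namespace']
```

On the sample, only 'fn' and 'namespace' are reported, since 'class', 'lambda', and 'as' occur nowhere outside the RESERVED_* rules. The block regex assumes no RESERVED_* rule contains a ';' literal, which holds for the rules in this diff.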
💡 Verification agent
🧩 Analysis chain
Resolve potential mismatch for float/double/bool between RESERVED_* and builtinType literals
Similar duplication exists for 'float', 'double', and 'bool': they appear as string literals in builtinType and inside RESERVED_TYPE_AND_STORAGE_WORDS. If the lexer emits RESERVED_TYPE_AND_STORAGE_WORDS for these, builtinType branches expecting T__N ('float', etc.) won’t match. While this may “work by accident” today, it’s fragile.
Preferred fix: introduce dedicated tokens and reference them in builtinType; remove duplicates from RESERVED_TYPE_AND_STORAGE_WORDS.
If you intentionally want to keep them only as literals, delete them from RESERVED_TYPE_AND_STORAGE_WORDS instead to remove the ambiguity.
Please verify that tests covering float/double/bool parsing still pass after this change.
🏁 Script executed:
Length of output: 2259
Deduplicate float/double/bool tokens in TaskDefLang.g4
To keep the lexer from capturing 'float', 'double' and 'bool' as RESERVED_TYPE_AND_STORAGE_WORDS, which would prevent the builtinType parser rule from ever seeing those literals, introduce explicit tokens for these keywords and remove them from the reserved-word set.
Affected locations:
Proposed diff:
Please verify that all existing tests for parsing float/double/bool still pass after applying this change.