feat: support numeric cpp udls (swev-id: sphinx-doc__sphinx-7590) #98
casey-brooks wants to merge 5 commits into sphinx-doc__sphinx-7590
Local Validation
|
noa-lucent
left a comment
Thanks for the improvement—see my comment about the missing regression coverage.
|
Summary:
Tests:
|
noa-lucent
left a comment
Thanks for adding the regression coverage and tightening the parser logic—much appreciated. One more fix needed before this can merge.
sphinx/domains/cpp.py
Outdated
  import re
  from typing import (
-     Any, Callable, Dict, Generator, Iterator, List, Tuple, Type, TypeVar, Union, Optional
+     Any, Callable, Dict, Generator, Iterator, List, Tuple, TypeVar, Union, Optional
[major] We still reference typing.Type later in this module (e.g. the # type: Type[TextElement] comment in CPPExprRole). Dropping it from the import list makes Type an undefined name, which flake8/pyflakes will flag as F821. Please keep Type in the imports.
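To illustrate why the import matters, here is a minimal sketch that collects names used in `# type:` comments and reports those that are never imported or defined. It is only a rough approximation of the kind of check a linter performs, not pyflakes's implementation; the helper `undefined_type_comment_names` and both sample sources are hypothetical.

```python
import ast
from typing import Set

# Hypothetical sample: `Type` is used in a type comment but no longer imported.
WITHOUT_TYPE = '''
from typing import Any, TypeVar

class TextElement:
    pass

node_type = None  # type: Type[TextElement]
'''

# Same sample with `Type` kept in the import list.
WITH_TYPE = WITHOUT_TYPE.replace("import Any, TypeVar", "import Any, Type, TypeVar")


def undefined_type_comment_names(source: str) -> Set[str]:
    """Return names used in assignment type comments that are never bound."""
    tree = ast.parse(source, type_comments=True)
    defined: Set[str] = set()
    used: Set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split('.')[0])
        elif isinstance(node, (ast.ClassDef, ast.FunctionDef)):
            defined.add(node.name)
        elif isinstance(node, ast.Assign) and node.type_comment:
            # Parse the comment text as an expression and collect its names.
            expr = ast.parse(node.type_comment, mode='eval')
            used.update(n.id for n in ast.walk(expr) if isinstance(n, ast.Name))
    return used - defined
```

Running the helper on `WITHOUT_TYPE` reports `Type` as unbound, while `WITH_TYPE` reports nothing.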
noa-lucent
left a comment
Thanks for following up — the Type import and the new coverage look good. We just need to loosen the literal-suffix guard to accept identifiers like _9bad (allowed by the C++ grammar) and adjust the fixtures accordingly.
sphinx/domains/cpp.py
Outdated
    if self.match(regex):
        suffix_start = self.pos
        while self.current_char in 'uUlLfF':
            self.pos += 1
[major] The new guard that rejects _ followed by a digit is too strict. A user-defined literal suffix is any C++ identifier (cf. [lex.name]), so _9bad is perfectly legal — identifiers can contain digits after the first character. With the current logic we would reject literals such as 3.0_1, which the standard allows. Please drop the next_char.isalpha() restriction (or extend it to include digits) and rely on _udl_suffix_re to do the right thing.
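As a rough illustration of this point, the sketch below splits a literal into a numeric part and a suffix, treating the suffix as any C++-style identifier. `UDL_SUFFIX_RE`, `NUMBER_RE`, and `split_literal` are hypothetical stand-ins, not the actual `_udl_suffix_re` or parser code in sphinx/domains/cpp.py, and the numeric pattern is an assumption that covers only simple decimal literals.

```python
import re
from typing import Optional, Tuple

# Hypothetical stand-in for _udl_suffix_re:
# first character is a letter or '_', the rest are letters, digits, or '_'.
UDL_SUFFIX_RE = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')

# Simplified decimal integer/float literal (assumption: no hex, binary,
# or digit separators).
NUMBER_RE = re.compile(r'(?:\d+\.\d*|\.\d+|\d+)(?:[eE][+-]?\d+)?')


def split_literal(text: str) -> Optional[Tuple[str, str]]:
    """Split text into (number, suffix), or return None if it is malformed."""
    m = NUMBER_RE.match(text)
    if m is None:
        return None
    suffix = text[m.end():]
    # An empty suffix is a plain literal; otherwise it must be an identifier.
    if suffix and UDL_SUFFIX_RE.fullmatch(suffix) is None:
        return None
    return m.group(0), suffix
```

Under these assumptions, `split_literal("3.0_1")` yields the suffix `_1` and `split_literal("7_9bad")` yields `_9bad`, with no special handling for a digit after the underscore.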
=====================================

.. cpp:var:: constexpr auto bad_digit_prefix = 7_9bad
[major] This case should not be considered invalid. _9bad is still a valid identifier (digits are allowed after the first character), so 7_9bad is a legal user-defined literal in C++. If we exercise this input against a conforming compiler it parses without issue. Please move this example into the positive suite (or drop it) and adjust the parser so we accept it.
    app.builder.build_all()
    ws = filter_warnings(warning, "udl-negative")
    assert len(ws) == 4
[major] Once _9bad is treated as a valid literal the failure count should drop. Please update the negative fixture expectations (likely len(ws) == 3).
|
Summary:
Tests:
|
noa-lucent
left a comment
Appreciate the expanded coverage and literal-kind handling—this is close. The suffix guard still blocks legal identifiers like _1q, so the parser and tests need one more tweak per my comments.
sphinx/domains/cpp.py
Outdated
    if (not consumed_builtin) and (not self.eof):
        if self.current_char == '_':
            next_char = self.definition[self.pos + 1:self.pos + 2]
            if next_char and (next_char.isalpha() or next_char == '_'):
[major] The guard here still rejects valid user-defined suffixes. _udl_suffix_re already enforces the C++ identifier grammar (first character letter or _, remainder alnum/_). A suffix such as _1q therefore is well-formed: the first character is _ and digits are permitted afterwards. By requiring next_char.isalpha() you prohibit those legal cases. Please drop the extra check (or extend it to allow digits) and rely on _udl_suffix_re to decide validity.
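The difference between the two checks can be sketched side by side. `guard_accepts` mirrors only the underscore branch of the quoted snippet, and `UDL_SUFFIX_RE` is a hypothetical stand-in for `_udl_suffix_re` built from the grammar described above; neither is the actual parser code.

```python
import re

# Hypothetical stand-in for _udl_suffix_re (letter or '_' first, then alnum/'_').
UDL_SUFFIX_RE = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')


def guard_accepts(definition: str, pos: int) -> bool:
    """Mirror of the quoted guard: '_' must be followed by a letter or '_'."""
    if definition[pos:pos + 1] != '_':
        return False  # assumption: only the underscore branch is modeled
    next_char = definition[pos + 1:pos + 2]
    return bool(next_char) and (next_char.isalpha() or next_char == '_')


def regex_accepts(definition: str, pos: int) -> bool:
    """Accept whenever an identifier starts at pos."""
    return UDL_SUFFIX_RE.match(definition, pos) is not None


# (literal, index where the suffix starts)
CASES = [("1e-34_1q", 5), ("7_9bad", 1), ("2_km", 1)]
RESULTS = {lit: (guard_accepts(lit, pos), regex_accepts(lit, pos))
           for lit, pos in CASES}
```

In this sketch the guard and the regex agree on `2_km`, but the guard rejects `1e-34_1q` and `7_9bad` while the regex accepts both, which is the mismatch described in the comment.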
.. cpp:var:: constexpr auto bad_order = 1llu_qs

.. cpp:var:: constexpr auto bad_digit_start = 1e-34_1q
[major] _1q is a valid identifier (the first character is _, subsequent characters may include digits), so 1e-34_1q should parse cleanly. This fixture should move to the positive set once the parser allows it.
def test_build_domain_cpp_udl_negative(app, status, warning):
    app.builder.build_all()
    ws = filter_warnings(warning, "udl-negative")
    assert len(ws) == 4
[major] Once _1q is accepted, this suite should only surface the three truly invalid declarations (bad_mixed, bad_order, bad_ws). Please adjust the expected warning count accordingly.
|
Summary:
Tests:
|
noa-lucent
left a comment
Excellent follow-through. The parser now leans entirely on the suffix regexes, and the expanded fixtures/execution tests cover the allowed combinations well. Thanks for iterating.
Summary
_udl_suffix_re to capture valid suffix tokens while leaving character and string literal handling untouched
Testing
PYTHONPATH=/workspace/sphinx /workspace/sphinx/.venv/bin/sphinx-build -b html . _build -W (with cpp_id_attributes = ['inline'] in the sample conf.py)
/workspace/sphinx/.venv/bin/flake8 sphinx/domains/cpp.py --extend-ignore=F401
Observed failure
Notes