Skip to content

Comments

feat: support numeric cpp udls (swev-id: sphinx-doc__sphinx-7590)#98

Open
casey-brooks wants to merge 5 commits intosphinx-doc__sphinx-7590from
feature/cpp-udl-support-7590
Open

feat: support numeric cpp udls (swev-id: sphinx-doc__sphinx-7590)#98
casey-brooks wants to merge 5 commits intosphinx-doc__sphinx-7590from
feature/cpp-udl-support-7590

Conversation

@casey-brooks
Copy link

@casey-brooks casey-brooks commented Dec 27, 2025

Summary

Testing

  • PYTHONPATH=/workspace/sphinx /workspace/sphinx/.venv/bin/sphinx-build -b html . _build -W (with cpp_id_attributes = ['inline'] in the sample conf.py)
  • /workspace/sphinx/.venv/bin/flake8 sphinx/domains/cpp.py --extend-ignore=F401

Observed failure

Warning, treated as error:
/workspace/sphinx/tmp_udl_demo/index.rst:6:Invalid C++ declaration: Expected identifier in nested name, got keyword: inline [error at 6]
  inline constexpr auto planck_constant = 6.62607015e-34q_J * 1q_s
  ------^

Notes

  • Only numeric user-defined literals are handled here; char and string literal suffixes remain out of scope for this change.

@casey-brooks casey-brooks requested a review from a team December 27, 2025 20:56
@casey-brooks
Copy link
Author

Local Validation

  • PYTHONPATH=/workspace/sphinx /workspace/sphinx/.venv/bin/sphinx-build -b html . _build -W (no warnings)
  • /workspace/sphinx/.venv/bin/flake8 sphinx/domains/cpp.py --extend-ignore=F401 (clean)

Copy link

@noa-lucent noa-lucent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the improvement—see my comment about the missing regression coverage.

@casey-brooks
Copy link
Author

Summary:

  • update the C++ definition parser to consume numeric UDL identifiers only when adjacent and starting with a letter/underscore
  • add udl-numeric/udl-negative fixtures covering positive cases, builtin suffix regression, and malformed suffix warnings
  • add pytest coverage asserting zero warnings for udl-numeric and four warnings for udl-negative

Tests:

  • PYTHONPATH=/workspace/sphinx /workspace/sphinx/.venv/bin/pytest tests/test_domain_cpp.py -q
  • /workspace/sphinx/.venv/bin/flake8 sphinx/domains/cpp.py tests/test_domain_cpp.py
  • sphinx-build -b html tests/roots/test-domain-cpp _build/udl -W (fails on pre-existing roles2.rst duplicate)

Copy link

@noa-lucent noa-lucent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the regression coverage and tightening the parser logic—much appreciated. One more fix needed before this can merge.

import re
from typing import (
Any, Callable, Dict, Generator, Iterator, List, Tuple, Type, TypeVar, Union, Optional
Any, Callable, Dict, Generator, Iterator, List, Tuple, TypeVar, Union, Optional

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[major] We still reference typing.Type later in this module (e.g. the # type: Type[TextElement] comment in CPPExprRole). Dropping it from the import list makes Type an undefined name, which flake8/pyflakes will flag as F821. Please keep Type in the imports.

Copy link

@noa-lucent noa-lucent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for following up — the Type import and the new coverage look good. We just need to loosen the literal-suffix guard to accept identifiers like _9bad (allowed by the C++ grammar) and adjust the fixtures accordingly.

if self.match(regex):
suffix_start = self.pos
while self.current_char in 'uUlLfF':
self.pos += 1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[major] The new guard that rejects _ followed by a digit is too strict. A user-defined literal suffix is any C++ identifier (cf. [lex.name]), so _9bad is perfectly legal — identifiers can contain digits after the first character. With the current logic we would reject literals such as 3.0_1, which the standard allows. Please drop the next_char.isalpha() restriction (or extend it to include digits) and rely on _udl_suffix_re to do the right thing.

=====================================

.. cpp:var:: constexpr auto bad_digit_prefix = 7_9bad

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[major] This case should not be considered invalid. _9bad is still a valid identifier (digits are allowed after the first character), so 7_9bad is a legal user-defined literal in C++. If we exercise this input against a conforming compiler it parses without issue. Please move this example into the positive suite (or drop it) and adjust the parser so we accept it.

app.builder.build_all()
ws = filter_warnings(warning, "udl-negative")
assert len(ws) == 4

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[major] Once _9bad is treated as a valid literal the failure count should drop. Please update the negative fixture expectations (likely len(ws) == 3).

@casey-brooks
Copy link
Author

Summary:

  • gate builtin suffix consumption by literal kind before attempting UDL parsing
  • extend udl-numeric/udl-negative fixtures with hex-float, builtin-prefixed, and digit-start scenarios
  • document digit separator limitation and cover builtin+UDL rejection

Tests:

  • PYTHONPATH=/workspace/sphinx /workspace/sphinx/.venv/bin/pytest tests/test_domain_cpp.py -q
  • /workspace/sphinx/.venv/bin/flake8 sphinx/domains/cpp.py tests/test_domain_cpp.py
  • /workspace/sphinx/.venv/bin/sphinx-build -b html tests/roots/test-domain-cpp _build/udl -W (fails on existing roles2 duplicate declaration)

Copy link

@noa-lucent noa-lucent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the expanded coverage and literal-kind handling—this is close. The suffix guard still blocks legal identifiers like _1q, so the parser and tests need one more tweak per my comments.

if (not consumed_builtin) and (not self.eof):
if self.current_char == '_':
next_char = self.definition[self.pos + 1:self.pos + 2]
if next_char and (next_char.isalpha() or next_char == '_'):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[major] The guard here still rejects valid user-defined suffixes. _udl_suffix_re already enforces the C++ identifier grammar (first character letter or _, remainder alnum/_). A suffix such as _1q therefore is well-formed: the first character is _ and digits are permitted afterwards. By requiring next_char.isalpha() you prohibit those legal cases. Please drop the extra check (or extend it to allow digits) and rely on _udl_suffix_re to decide validity.


.. cpp:var:: constexpr auto bad_order = 1llu_qs

.. cpp:var:: constexpr auto bad_digit_start = 1e-34_1q

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[major] _1q is a valid identifier (the first character is _, subsequent characters may include digits), so 1e-34_1q should parse cleanly. This fixture should move to the positive set once the parser allows it.

def test_build_domain_cpp_udl_negative(app, status, warning):
app.builder.build_all()
ws = filter_warnings(warning, "udl-negative")
assert len(ws) == 4

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[major] Once _1q is accepted, this suite should only surface the three truly invalid declarations (bad_mixed, bad_order, bad_ws). Please adjust the expected warning count accordingly.

@casey-brooks
Copy link
Author

Summary:

  • rely solely on _udl_suffix_re for UDL matching after builtin suffix consumption
  • promote digit-prefixed UDL suffixes into positive fixtures and refresh negatives
  • note valid digit-prefixed suffixes in docs while keeping whitespace/symbol failures

Tests:

  • PYTHONPATH=/workspace/sphinx /workspace/sphinx/.venv/bin/pytest tests/test_domain_cpp.py -q
  • /workspace/sphinx/.venv/bin/flake8 sphinx/domains/cpp.py tests/test_domain_cpp.py
  • /workspace/sphinx/.venv/bin/sphinx-build -b html tests/roots/test-domain-cpp _build/udl -W (fails on existing roles2 duplicate declaration)

Copy link

@noa-lucent noa-lucent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(placeholder)

Copy link

@noa-lucent noa-lucent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent follow-through. The parser now leans entirely on the suffix regexes, and the expanded fixtures/execution tests cover the allowed combinations well. Thanks for iterating.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants