Contributor

@SharafMohamed SharafMohamed commented Nov 7, 2025

Reference

Description

  • Allow all printable characters to be used in variable/capture names with the exception of:
    • Whitespace characters
    • Colon
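
As a rough illustration of the rule above, the check can be sketched in a few lines of C++. The helper names `is_valid_name_char` and `is_valid_name` are hypothetical and are not taken from log-surgeon; the sketch also assumes the default "C" locale and that an empty name is rejected, which the description does not state.

```cpp
#include <cassert>
#include <cctype>
#include <string_view>

// Hypothetical helper (not part of log-surgeon): true if `c` may appear in a
// variable/capture name under the rule described above.
inline bool is_valid_name_char(char c) {
    auto const uc = static_cast<unsigned char>(c);
    // std::isgraph is true for printable characters other than space; in the
    // default "C" locale that is '!' (0x21) through '~' (0x7E). Tab, newline,
    // and other whitespace are therefore rejected as well.
    return 0 != std::isgraph(uc) && ':' != c;
}

// Hypothetical helper: a name is valid if it is non-empty (an assumption of
// this sketch) and every character passes the per-character check.
inline bool is_valid_name(std::string_view name) {
    if (name.empty()) {
        return false;
    }
    for (char const c : name) {
        if (false == is_valid_name_char(c)) {
            return false;
        }
    }
    return true;
}

// A few spot checks of the sketch.
inline void check_examples() {
    assert(is_valid_name("user.id"));
    assert(false == is_valid_name("user id"));
    assert(false == is_valid_name("user:id"));
}
```

Using `std::isgraph` keeps the whitespace and non-printable exclusions in one call; only the colon needs a separate comparison.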

Validation Performed

  • Add unit tests for various valid character combinations.
  • Add unit tests for various invalid character combinations.

Summary by CodeRabbit

  • Bug Fixes

    • Updated variable-name validation: underscores are no longer allowed and the permitted character set now includes printable characters except space and colon.
  • Documentation

    • Schema docs updated to reflect the revised variable-name rules and reserved-name restrictions.
  • Tests

    • Expanded tests to cover multiple valid and invalid variable/capture name combinations.

@SharafMohamed SharafMohamed requested a review from a team as a code owner November 7, 2025 13:15
coderabbitai bot commented Nov 7, 2025

Walkthrough

This PR changes SchemaParser's identifier-character rules to two printable-character ranges (excluding colon and underscore) and updates tests to iterate over multiple valid and invalid variable/capture name pairs, expanding coverage.

Changes

| Cohort / File(s) | Summary |
|---|---|
| SchemaParser core logic (src/log_surgeon/SchemaParser.cpp) | Replaced local comment_characters construction with an auto-deduced unique_ptr initialization. Reworked IdentifierCharacters from discrete ranges + underscore to two broader ranges (from '!' up to but not including ':', and from just after ':' up to '~'), removing '_' and altering allowed symbol set. |
| Test suite expansion (tests/test-schema.cpp) | Added includes (utility, vector) and using declarations; converted single-name tests into looped tests that validate multiple valid (variable name, capture name) pairs and added a new looped test for multiple invalid pairs. |
| Documentation (docs/schema.md) | Updated description of allowed variable-name characters: now any printable character except space and colon (reserved names "delimiters" and "timestamp" still prohibited). |
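
To make the range endpoints concrete, here is a minimal sketch of the two ranges as the summary describes them. `in_identifier_ranges` is a hypothetical stand-in and not the SchemaParser source; the sketch assumes an ASCII execution character set and C++17. If the ranges are as described, `'_'` (0x5F) would still be accepted, because it lies between `';'` and `'~'`.

```cpp
#include <cassert>

// Hypothetical stand-in for the two ranges described in the summary:
// ['!', ':') and (':', '~']. Not the actual SchemaParser code.
constexpr bool in_identifier_ranges(char c) {
    return ('!' <= c && c < ':') || (':' < c && c <= '~');
}

// Endpoint checks, assuming ASCII.
static_assert(in_identifier_ranges('!'));           // first printable after space
static_assert(in_identifier_ranges('9'));           // 0x39, last char before ':'
static_assert(false == in_identifier_ranges(':'));  // 0x3A, excluded
static_assert(in_identifier_ranges(';'));           // 0x3B, first char after ':'
static_assert(in_identifier_ranges('~'));           // 0x7E, last printable
static_assert(false == in_identifier_ranges(' '));  // 0x20, below the first range
static_assert(in_identifier_ranges('_'));           // 0x5F, inside the second range
```

Whether underscore is meant to be accepted is a question for the real implementation and its tests; the sketch only shows what the described endpoints imply.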

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor Test
    participant SchemaParser
    participant CharRanges as "Allowed Char Ranges"

    Note over Test,SchemaParser: Test provides candidate identifier
    Test->>SchemaParser: parseIdentifier(candidate)
    SchemaParser->>CharRanges: check each character in candidate
    alt all chars within allowed ranges (and not ':')
        CharRanges-->>SchemaParser: valid
        SchemaParser-->>Test: accept identifier
    else any invalid char (including ':' or removed '_')
        CharRanges-->>SchemaParser: invalid
        SchemaParser-->>Test: reject / throw parse error
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Pay attention to the exact inclusive/exclusive endpoints of the two character ranges to ensure intended characters are allowed/excluded.
  • Verify removal of '_' is deliberate and covered by negative tests.
  • Confirm documentation (docs/schema.md) matches implementation and tests.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: Improve variable naming (fixes #170)' clearly and specifically describes the main change in the changeset, which is expanding variable naming to allow printable characters except whitespace and colons. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6b94dd and a806c25.

📒 Files selected for processing (1)
  • docs/schema.md (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: build (ubuntu-24.04, release)
  • GitHub Check: build (ubuntu-22.04, debug)
  • GitHub Check: build (macos-15, release)
  • GitHub Check: build (ubuntu-24.04, debug)
  • GitHub Check: lint (macos-15)
  • GitHub Check: lint (ubuntu-24.04)
🔇 Additional comments (2)
docs/schema.md (2)

32-33: Consider clarifying that the rules also apply to named-capture names.

The PR aims to allow printable characters in both variable names AND named-capture names, but the documentation on these lines covers only variable names. At line 133, the document mentions named captures (e.g., (?<var>[01]+)), but it does not state that capture names follow the same character rules.

Consider adding a note or a sentence clarifying that named-capture names follow the same character rules as variable names.


32-33: Excellent documentation change for clarity and inclusiveness.

Expanding the documentation to allow "any printable character except space and colon" improves clarity over the previous restrictions (presumably alphanumeric). This makes the documentation more flexible and more accommodating of diverse use cases.

Comment on lines +32 to +33
* `variable-name` may contain any printable character except space and colon, but may not be the
reserved names `delimiters` or `timestamp`.
@coderabbitai coderabbitai bot Nov 7, 2025

🧹 Nitpick | 🔵 Trivial

Clarify the exclusion of whitespace characters.

The PR says "whitespace characters" (plural), but the documentation says only "space". Although technically accurate (printable characters do not include control characters such as tab or newline), this could be more explicit.

For clarity, consider rewording as follows:

-* `variable-name` may contain any printable character except space and colon, but may not be the
-  reserved names `delimiters` or `timestamp`.
+* `variable-name` may contain any printable character except whitespace characters and colon, but may not be the
+  reserved names `delimiters` or `timestamp`.

This aligns better with the PR description and makes clear that ALL whitespace characters (not just space) are excluded.
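
The distinction between "printable" and "whitespace" can be checked with the standard `<cctype>` predicates. In this sketch, `CharClass` and `classify` are hypothetical names used only for illustration, and the behaviour described assumes the default "C" locale, where space is the only character that is both printable and whitespace.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical illustration type: how <cctype> classifies one character.
struct CharClass {
    bool printable;   // result of std::isprint
    bool whitespace;  // result of std::isspace
};

// Classify a character using the standard predicates (default "C" locale).
inline CharClass classify(char c) {
    auto const uc = static_cast<unsigned char>(c);
    return {0 != std::isprint(uc), 0 != std::isspace(uc)};
}

// Spot checks: space is printable and whitespace; tab is whitespace only.
inline void check_classification() {
    assert(classify(' ').printable && classify(' ').whitespace);
    assert(false == classify('\t').printable && classify('\t').whitespace);
}
```

Under that assumption, "printable except space" and "printable except whitespace" select the same characters, so the rewording changes how explicit the sentence is and not which names are accepted.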

🤖 Prompt for AI Agents
In docs/schema.md around lines 32 to 33, the sentence currently says "space"
while the intent is to exclude all whitespace characters; update the wording to
explicitly say "whitespace characters (e.g., space, tab, newline, etc.)" or "any
whitespace character" so it matches the PR description and clarifies that all
whitespace—not just the literal space character—is forbidden for
`variable-name`.

Member

@SharafMohamed this small wording change is worthwhile imo.

Member

@davidlion davidlion left a comment

We may need to limit the characters allowed based on offline discussion.

Based on the offline discussion, we'll need to update the title to make clear which characters are now allowed in variable names.

@davidlion davidlion changed the title from "feat: Improve variable naming." to "feat: Improve variable naming (fixes #170)." on Nov 10, 2025