Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions docs/schema.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,8 +29,8 @@ For documentation, the schema allows for user comments by ignoring any text prec
<variable-name>:<variable-pattern>
```

* `variable-name` may contain any alphanumeric characters, but may not be the reserved names
`delimiters` or `timestamp`.
* `variable-name` may contain any printable character except space and colon, but may not be the
reserved names `delimiters` or `timestamp`.
Comment on lines +32 to +33
Copy link

@coderabbitai coderabbitai bot Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Améliore la clarté concernant l'exclusion des caractères d'espacement.

Le PR stipule « caractères d'espacement » au pluriel, mais la documentation dit seulement « space ». Bien que techniquement exact (les caractères imprimables n'incluent pas les caractères de contrôle comme tab ou newline), cela pourrait être plus explicite.

Pour plus de clarté, envisage de reformuler comme suit :

-* `variable-name` may contain any printable character except space and colon, but may not be the
-  reserved names `delimiters` or `timestamp`.
+* `variable-name` may contain any printable character except whitespace characters and colon, but may not be the
+  reserved names `delimiters` or `timestamp`.

Cela s'aligne mieux avec la description du PR et reflète que TOUS les caractères d'espacement (pas seulement l'espace) sont exclus.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
* `variable-name` may contain any printable character except space and colon, but may not be the
reserved names `delimiters` or `timestamp`.
* `variable-name` may contain any printable character except whitespace characters and colon, but may not be the
reserved names `delimiters` or `timestamp`.
🤖 Prompt for AI Agents
In docs/schema.md around lines 32 to 33, the sentence currently says "space"
while the intent is to exclude all whitespace characters; update the wording to
explicitly say "whitespace characters (e.g., space, tab, newline, etc.)" or "any
whitespace character" so it matches the PR description and clarifies that all
whitespace—not just the literal space character—is forbidden for
`variable-name`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SharafMohamed this small wording change is worthwhile imo.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

* `variable-pattern` is a regular expression using the supported
[syntax](#regular-expression-syntax).

Expand Down
8 changes: 3 additions & 5 deletions src/log_surgeon/SchemaParser.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -472,14 +472,12 @@ auto SchemaParser::add_lexical_rules() -> void {
// RegexASTGroupByte default constructs to an m_negate group, so we add the only two characters
// which can't be in a comment, the newline and carriage return characters as they signify the
// end of the comment.
unique_ptr<RegexASTGroupByte> comment_characters = make_unique<RegexASTGroupByte>();
auto comment_characters{make_unique<RegexASTGroupByte>()};
comment_characters->add_literal('\r');
comment_characters->add_literal('\n');
add_token_group("CommentCharacters", std::move(comment_characters));
add_token_group("IdentifierCharacters", make_unique<RegexASTGroupByte>('a', 'z'));
add_token_group("IdentifierCharacters", make_unique<RegexASTGroupByte>('A', 'Z'));
add_token_group("IdentifierCharacters", make_unique<RegexASTGroupByte>('0', '9'));
add_token("IdentifierCharacters", '_');
add_token_group("IdentifierCharacters", make_unique<RegexASTGroupByte>('!', ':' - 1));
add_token_group("IdentifierCharacters", make_unique<RegexASTGroupByte>(':' + 1, '~'));
}

auto SchemaParser::add_productions() -> void {
Expand Down
79 changes: 59 additions & 20 deletions tests/test-schema.cpp
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
#include <string>
#include <string_view>
#include <utility>
#include <vector>

#include <log_surgeon/finite_automata/RegexAST.hpp>
#include <log_surgeon/Schema.hpp>
Expand All @@ -16,8 +18,10 @@

using log_surgeon::Schema;
using log_surgeon::SchemaVarAST;
using std::pair;
using std::string;
using std::string_view;
using std::vector;

using RegexASTCatByte
= log_surgeon::finite_automata::RegexASTCat<log_surgeon::finite_automata::ByteNfaState>;
Expand Down Expand Up @@ -155,29 +159,64 @@ TEST_CASE("add_invalid_var_priorities", "[Schema]") {

/**
* @ingroup unit_tests_schema
* @brief Create a schema, adding a variable and capture group name with an underscore.
* @brief Create a schema, adding a variable and capture group name with various character
* combinations that should succeed.
*/
TEST_CASE("add_underscore_name", "[Schema]") {
TEST_CASE("add_various_correct_variable_names", "[Schema]") {
Schema schema;
string const var_name = "var_name";
string const cap_name = "cap_name";
string const var_schema = var_name + string(":a(?<") + cap_name + string(">_)b");
schema.add_variable(string_view(var_schema), -1);

auto const schema_ast = schema.release_schema_ast_ptr();
REQUIRE(schema_ast->m_schema_vars.size() == 1);
REQUIRE(schema.release_schema_ast_ptr()->m_schema_vars.empty());

auto& schema_var_ast_ptr = schema_ast->m_schema_vars.at(0);
REQUIRE(nullptr != schema_var_ast_ptr);
auto& schema_var_ast = dynamic_cast<SchemaVarAST&>(*schema_var_ast_ptr);
REQUIRE(var_name == schema_var_ast.m_name);
vector<pair<string, string>> name_pairs {
{"varName123","capName123"},
{"var_name","cap_name"},
{"var@name","cap@name"},
{"var.name","cap.name"},
{"var-name","cap-name"},
{"var~name","cap~name"},
{"var\"name\"","cap\"name\""},
{"var'name'","cap'name'"}
};

for (auto const& [var_name, cap_name] : name_pairs) {
string const var_schema{var_name + string(":a(?<") + cap_name + string(">_)b")};
schema.add_variable(string_view(var_schema), -1);

auto const schema_ast{schema.release_schema_ast_ptr()};
REQUIRE(schema_ast->m_schema_vars.size() == 1);
REQUIRE(schema.release_schema_ast_ptr()->m_schema_vars.empty());

auto& schema_var_ast_ptr{schema_ast->m_schema_vars.at(0)};
REQUIRE(nullptr != schema_var_ast_ptr);
auto& schema_var_ast{dynamic_cast<SchemaVarAST&>(*schema_var_ast_ptr)};
REQUIRE(var_name == schema_var_ast.m_name);

REQUIRE_NOTHROW([&]() -> void {
(void)dynamic_cast<RegexASTCatByte&>(*schema_var_ast.m_regex_ptr);
}());

auto const captures{schema_var_ast.m_regex_ptr->get_subtree_positive_captures()};
REQUIRE(captures.size() == 1);
REQUIRE(cap_name == captures.at(0)->get_name());
}
}

REQUIRE_NOTHROW([&]() -> void {
(void)dynamic_cast<RegexASTCatByte&>(*schema_var_ast.m_regex_ptr);
}());
/**
* @ingroup unit_tests_schema
* @brief Create a schema, adding a variable and capture group name with various character
* combinations that should throw errors.
*/
TEST_CASE("add_various_incorrect_variable_names", "[Schema]") {
Schema schema;

auto const captures{schema_var_ast.m_regex_ptr->get_subtree_positive_captures()};
REQUIRE(captures.size() == 1);
REQUIRE(cap_name == captures.at(0)->get_name());
vector<pair<string, string>> name_pairs {
{"var name","validName"},
{"validName","cap:name"},
{"validName","cap name"},
};

for (auto const& [var_name, cap_name] : name_pairs) {
CAPTURE(var_name);
CAPTURE(cap_name);
string const var_schema{var_name + string(":a(?<") + cap_name + string(">_)b")};
REQUIRE_THROWS(schema.add_variable(string_view(var_schema), -1));
}
}
Loading