-
Notifications
You must be signed in to change notification settings - Fork 10
test: Add unit-tests for ReaderParser (fixes #183).
#184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds comprehensive ReaderParser unit tests (including wrap-around buffer scenarios), registers the new test in CMake, updates documentation references to separate buffer/file parser examples and add Reader Parser, and clarifies an inline comment about an even buffer size in Constants.hpp. Changes
Sequence Diagram(s)sequenceDiagram
participant T as Test Case
participant S as Schema
participant R as Reader
participant P as ReaderParser
rect rgb(220,240,255)
Note over T,S: Test constructs schema/patterns
end
T->>S: build Schema (patterns)
T->>R: initialize Reader with input buffer(s)
T->>P: instantiate ReaderParser with Schema
rect rgb(240,255,240)
Note over T,P: parse iterations (normal)
end
T->>P: parseChunk()
P->>R: request buffer data
R-->>P: provide bytes (tokens)
P-->>T: emit events & tokens
T->>T: validate token types/values
rect rgb(255,240,240)
Note over T,P: wrap-around scenario
end
T->>R: supply next buffer portion (wrap)
P->>R: read wrapped range
R-->>P: return wrapped bytes crossing boundary
P-->>T: emit events with capture positions (adjusted for wrap)
T->>T: validate capture positions and events
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
docs/doxygen/mainpage.dox(2 hunks)src/log_surgeon/Constants.hpp(1 hunks)tests/CMakeLists.txt(1 hunks)tests/test-reader-parser.cpp(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,h,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
src/log_surgeon/Constants.hpptests/test-reader-parser.cpp
🧠 Learnings (6)
📚 Learning: 2025-08-08T13:18:39.895Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp:42-48
Timestamp: 2025-08-08T13:18:39.895Z
Learning: In y-scope/log-surgeon (C++), it is acceptable/preferred to keep `const` qualifiers on by-value function parameters to signal intent (e.g., in src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp). Do not suggest removing `const` from by-value parameters in future reviews.
Applied to files:
src/log_surgeon/Constants.hpp
📚 Learning: 2025-08-08T10:23:06.281Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp:80-88
Timestamp: 2025-08-08T10:23:06.281Z
Learning: In y-scope/log-surgeon (C++), small function definitions are allowed to remain inline in headers. For src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp, do not suggest moving small methods like QueryInterpretation::append_variable_token to the .cpp for “consistency” in future reviews.
Applied to files:
src/log_surgeon/Constants.hpp
📚 Learning: 2025-08-15T01:04:03.681Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: src/log_surgeon/wildcard_query_parser/ExpressionView.cpp:11-11
Timestamp: 2025-08-15T01:04:03.681Z
Learning: In y-scope/log-surgeon C++ codebase, prefer explicit includes in implementation files (.cpp) for types that are directly used, even when those headers are already transitively included through the corresponding header file. This makes dependencies explicit and improves maintainability.
Applied to files:
src/log_surgeon/Constants.hpp
📚 Learning: 2025-10-22T15:40:29.992Z
Learnt from: davidlion
Repo: y-scope/log-surgeon PR: 165
File: src/log_surgeon/LogEvent.cpp:53-55
Timestamp: 2025-10-22T15:40:29.992Z
Learning: In `src/log_surgeon/LogEvent.cpp`, the `get_logtype()` method has two independent mechanisms for adding `<timestamp>` to the logtype string: (1) A prefix added when `has_timestamp()` returns true, indicating a standalone timestamp token from timestamp rules (lines 53-55), and (2) Named capture group processing that adds `<timestamp>` tags for `(?<timestamp>...)` patterns in any variable rule (lines 61-95). These mechanisms do not interfere with each other. Capture groups named `timestamp` do not affect `has_timestamp()` status, as using captures as actual timestamps is not supported.
<!-- [add_learning]
In the log-surgeon codebase, capture groups (e.g., `(?<timestamp>...)`) in schema variable rules are added to the logtype string through the existing capture group processing logic, but they do not create actual timestamp tokens and do not cause `has_timestamp()` to return true. Using capture groups as actual timestamps is not supported; standalone timestamp tokens come from `timestamp:...
Applied to files:
docs/doxygen/mainpage.doxtests/test-reader-parser.cpp
📚 Learning: 2025-08-15T12:07:58.626Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: tests/test-expression-view.cpp:7-7
Timestamp: 2025-08-15T12:07:58.626Z
Learning: In tests/test-expression-view.cpp, the `<catch2/catch_message.hpp>` header is required for clang-tidy to pass, even though it may not be directly used in the visible code.
Applied to files:
tests/test-reader-parser.cpp
📚 Learning: 2025-08-26T10:13:00.368Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: tests/test-query.cpp:61-67
Timestamp: 2025-08-26T10:13:00.368Z
Learning: In Catch2 unit tests, the REQUIRE macro already provides detailed debugging output when container comparisons (like std::set equality) fail, showing both expected and actual values. Additional CAPTURE statements are typically unnecessary for such comparisons.
Applied to files:
tests/test-reader-parser.cpp
🧬 Code graph analysis (1)
tests/test-reader-parser.cpp (2)
src/log_surgeon/finite_automata/PrefixTree.hpp (1)
PrefixTree(29-29)src/log_surgeon/LogParser.hpp (1)
symbol(73-73)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build (macos-15, debug)
- GitHub Check: build (macos-15, release)
- GitHub Check: lint (ubuntu-24.04)
- GitHub Check: lint (macos-15)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
tests/test-reader-parser.cpp(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,h,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>rather than!<expression>.
Files:
tests/test-reader-parser.cpp
🧠 Learnings (6)
📚 Learning: 2025-08-15T12:07:58.626Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: tests/test-expression-view.cpp:7-7
Timestamp: 2025-08-15T12:07:58.626Z
Learning: In tests/test-expression-view.cpp, the `<catch2/catch_message.hpp>` header is required for clang-tidy to pass, even though it may not be directly used in the visible code.
Applied to files:
tests/test-reader-parser.cpp
📚 Learning: 2025-08-08T10:00:20.963Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp:28-31
Timestamp: 2025-08-08T10:00:20.963Z
Learning: In src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp (C++), do not default the comparison operators: the class stores std::vector<std::variant<StaticQueryToken, VariableQueryToken>>, which yields only a weak ordering. A custom operator<=> that maps the variant’s weak ordering to std::strong_ordering is required; avoid suggesting defaulting there in future reviews.
Applied to files:
tests/test-reader-parser.cpp
📚 Learning: 2025-08-15T01:04:03.681Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: src/log_surgeon/wildcard_query_parser/ExpressionView.cpp:11-11
Timestamp: 2025-08-15T01:04:03.681Z
Learning: In y-scope/log-surgeon C++ codebase, prefer explicit includes in implementation files (.cpp) for types that are directly used, even when those headers are already transitively included through the corresponding header file. This makes dependencies explicit and improves maintainability.
Applied to files:
tests/test-reader-parser.cpp
📚 Learning: 2025-08-08T10:17:43.495Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp:28-31
Timestamp: 2025-08-08T10:17:43.495Z
Learning: In src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp/.cpp (C++), do not suggest defaulting operator<=>. The project prefers a custom out-of-line comparator that explicitly handles the bool member (via explicit cast) to avoid implicit conversions; keep the current manual implementation.
Applied to files:
tests/test-reader-parser.cpp
📚 Learning: 2025-10-22T15:40:29.992Z
Learnt from: davidlion
Repo: y-scope/log-surgeon PR: 165
File: src/log_surgeon/LogEvent.cpp:53-55
Timestamp: 2025-10-22T15:40:29.992Z
Learning: In `src/log_surgeon/LogEvent.cpp`, the `get_logtype()` method has two independent mechanisms for adding `<timestamp>` to the logtype string: (1) A prefix added when `has_timestamp()` returns true, indicating a standalone timestamp token from timestamp rules (lines 53-55), and (2) Named capture group processing that adds `<timestamp>` tags for `(?<timestamp>...)` patterns in any variable rule (lines 61-95). These mechanisms do not interfere with each other. Capture groups named `timestamp` do not affect `has_timestamp()` status, as using captures as actual timestamps is not supported.
<!-- [add_learning]
In the log-surgeon codebase, capture groups (e.g., `(?<timestamp>...)`) in schema variable rules are added to the logtype string through the existing capture group processing logic, but they do not create actual timestamp tokens and do not cause `has_timestamp()` to return true. Using capture groups as actual timestamps is not supported; standalone timestamp tokens come from `timestamp:...
Applied to files:
tests/test-reader-parser.cpp
📚 Learning: 2025-08-26T10:13:00.368Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: tests/test-query.cpp:61-67
Timestamp: 2025-08-26T10:13:00.368Z
Learning: In Catch2 unit tests, the REQUIRE macro already provides detailed debugging output when container comparisons (like std::set equality) fail, showing both expected and actual values. Additional CAPTURE statements are typically unnecessary for such comparisons.
Applied to files:
tests/test-reader-parser.cpp
🧬 Code graph analysis (1)
tests/test-reader-parser.cpp (1)
src/log_surgeon/LogParser.hpp (1)
symbol(73-73)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: lint (ubuntu-24.04)
- GitHub Check: lint (macos-15)
- GitHub Check: build (macos-15, release)
- GitHub Check: build (macos-15, debug)
🔇 Additional comments (3)
tests/test-reader-parser.cpp (3)
229-251: Well-designed test case!The test correctly validates variable matching without capture groups, ensuring exact matches are recognized and non-matches are treated as uncaught strings. The expected tokens properly alternate between matched variables and uncaught content.
271-365: Comprehensive wrap-around test implementation!This test thoroughly exercises buffer wrap-around scenarios by testing all possible positions where a log event can start/end relative to the buffer boundary. The capture position calculations (lines 332-340) correctly handle wrap-around by subtracting the buffer size when positions exceed it, and the iterative adjustment of capture positions (lines 356-358) properly accounts for cumulative offsets across multiple events.
135-136: Usefalse ==per coding guidelines.The codebase prefers
false == <expression>rather than using the expression directly in conditionals.As per coding guidelines
Apply this diff:
- if (expected_type.empty()) { + if (false == expected_type.empty()) {⛔ Skipped due to learnings
Learnt from: SharafMohamed Repo: y-scope/log-surgeon PR: 106 File: src/log_surgeon/Lalr1Parser.tpp:661-665 Timestamp: 2025-05-05T14:49:15.009Z Learning: The coding style for the log-surgeon codebase prefers `false == <expression>` rather than `!<expression>` when checking boolean conditions.Learnt from: SharafMohamed Repo: y-scope/log-surgeon PR: 42 File: src/log_surgeon/finite_automata/RegexAST.hpp:106-112 Timestamp: 2024-10-10T18:05:34.072Z Learning: In the C++ codebase for `log_surgeon`, the convention is to use `false == condition` instead of `!condition` when checking conditions, for clarity.Learnt from: SharafMohamed Repo: y-scope/log-surgeon PR: 42 File: src/log_surgeon/finite_automata/RegexAST.hpp:106-112 Timestamp: 2024-10-08T09:39:05.712Z Learning: In the C++ codebase for `log_surgeon`, the convention is to use `false == condition` instead of `!condition` when checking conditions, for clarity.
ReaderParser.ReaderParser (fixes #183).
| auto parse_and_validate( | ||
| ReaderParser& reader_parser, | ||
| string_view input, | ||
| vector<ExpectedEvent> const& expected_events | ||
| ) -> void { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coderabbitai Could you please make an issue to refactor the parse_and_validate helpers in test-reader-parser.cpp and test-buffer-parser.cpp to use some common/shared helper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if test-buffer-parser.cpp exists and examine the parse_and_validate helper
if [ -f "tests/test-buffer-parser.cpp" ]; then
echo "=== Contents of test-buffer-parser.cpp ==="
cat tests/test-buffer-parser.cpp | head -150
fi
Length of output: 5701
🏁 Script executed:
#!/bin/bash
# Get more of test-buffer-parser.cpp to see the complete parse_and_validate and serialize_id_symbol_map
echo "=== More of test-buffer-parser.cpp ==="
cat tests/test-buffer-parser.cpp | tail -n +150 | head -100
echo -e "\n=== Check full parse_and_validate in test-reader-parser.cpp ==="
# Find the parse_and_validate function implementation
grep -A 100 "^auto parse_and_validate(" tests/test-reader-parser.cpp | head -120
Length of output: 7990
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue to refactor parse_and_validate helpers
gh issue create \
--title "Refactor test helpers in test-reader-parser.cpp and test-buffer-parser.cpp to use common/shared utilities" \
--body "## Description
Both \`test-reader-parser.cpp\` and \`test-buffer-parser.cpp\` contain nearly identical test helper functions and data structures that should be refactored into a common/shared test utility module.
## Duplicated Code
### Shared Data Structures
Both files define:
- \`CapturePositions\` struct
- \`ExpectedToken\` struct
- \`ExpectedEvent\` struct
### Shared Helper Functions
Both files implement:
- \`parse_and_validate()\` - validates parsed events against expected tokens and captures (with parser-specific differences)
- \`serialize_id_symbol_map()\` - serializes the id-to-symbol map (identical implementation)
## Proposed Refactoring
1. Create a common test utilities header/source file (e.g., \`tests/common/test-parser-helpers.hpp\`).
2. Move the shared data structures to the common utility.
3. Extract the common validation logic from \`parse_and_validate()\` into reusable helper functions.
4. Use templates or function overloading to handle parser-specific differences (BufferParser vs ReaderParser).
5. Move \`serialize_id_symbol_map()\` to the common utility.
## Benefits
- Reduces code duplication
- Easier maintenance when validation logic needs to be updated
- Promotes consistency across test files
- Simplifies adding new parser test files in the future
## References
- PR: https://github.com/y-scope/log-surgeon/pull/184
- Comment: https://github.com/y-scope/log-surgeon/pull/184#discussion_r2510838518
- Requested by: @davidlion" \
--assignee davidlion
Length of output: 88
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue to refactor parse_and_validate helpers
gh issue create \
--title "Refactor test helpers in test-reader-parser.cpp and test-buffer-parser.cpp to use common/shared utilities" \
--body "## Description
Both \`test-reader-parser.cpp\` and \`test-buffer-parser.cpp\` contain nearly identical test helper functions and data structures that should be refactored into a common/shared test utility module.
## Duplicated Code
### Shared Data Structures
Both files define:
- \`CapturePositions\` struct
- \`ExpectedToken\` struct
- \`ExpectedEvent\` struct
### Shared Helper Functions
Both files implement:
- \`parse_and_validate()\` - validates parsed events against expected tokens and captures (with parser-specific differences)
- \`serialize_id_symbol_map()\` - serializes the id-to-symbol map (identical implementation)
## Proposed Refactoring
1. Create a common test utilities header/source file (e.g., \`tests/common/test-parser-helpers.hpp\`).
2. Move the shared data structures to the common utility.
3. Extract the common validation logic from \`parse_and_validate()\` into reusable helper functions.
4. Use templates or function overloading to handle parser-specific differences (BufferParser vs ReaderParser).
5. Move \`serialize_id_symbol_map()\` to the common utility.
## Benefits
- Reduces code duplication
- Easier maintenance when validation logic needs to be updated
- Promotes consistency across test files
- Simplifies adding new parser test files in the future
## References
- PR: https://github.com/y-scope/log-surgeon/pull/184
- Comment: https://github.com/y-scope/log-surgeon/pull/184#discussion_r2510838518
- Requested by: @davidlion" \
--assignee davidlion
Length of output: 49
@davidlion I've created issue #188 to track the refactoring of the parse_and_validate helpers in test-reader-parser.cpp and test-buffer-parser.cpp.
The issue documents the duplicated code (the CapturePositions, ExpectedToken, and ExpectedEvent structs, plus the parse_and_validate() and serialize_id_symbol_map() functions) and proposes creating a common test utilities module to eliminate this duplication.
You can view the issue here: #188
🧠 Learnings used
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: tests/test-expression-view.cpp:7-7
Timestamp: 2025-08-15T12:07:58.626Z
Learning: In tests/test-expression-view.cpp, the `<catch2/catch_message.hpp>` header is required for clang-tidy to pass, even though it may not be directly used in the visible code.
Reference
ReaderParserhas no unit-tests. #183Description
ReaderParser.Constants.hppto clarify that the buffer will break if it size is set to be odd.Validation performed
Summary by CodeRabbit
Documentation
Tests
Documentation