Skip to content

Conversation

@SharafMohamed
Copy link
Contributor

@SharafMohamed SharafMohamed commented Nov 6, 2025

Reference

Description

  • Adds unit-tests for ReaderParser.
    • Added a unit-test to test a non-capture schema.
    • Added a unit-test to test a capture schema that wraps around the input buffer.
      • Tests all cases where the log ends right before the wrap around, right after the wrap around, and ever case in b/w.
  • Updates mainpage.docx with the new unit tests
  • Adds a comment to Constants.hpp to clarify that the buffer will break if it size is set to be odd.

Validation performed

  • New unit-tests pass.

Summary by CodeRabbit

  • Documentation

    • Updated documentation references to distinguish log buffer and log file parser examples and added unit-test references.
  • Tests

    • Added a new test suite validating the reader/log-file parser, including wrap-around buffer scenarios and capture-group position checks.
  • Documentation

    • Clarified inline comment about buffer size requirements for readability (no behavior change).

@SharafMohamed SharafMohamed requested a review from a team as a code owner November 6, 2025 10:55
@coderabbitai
Copy link

coderabbitai bot commented Nov 6, 2025

Walkthrough

Adds comprehensive ReaderParser unit tests (including wrap-around buffer scenarios), registers the new test in CMake, updates documentation references to separate buffer/file parser examples and add Reader Parser, and clarifies an inline comment about an even buffer size in Constants.hpp.

Changes

Cohort / File(s) Summary
Documentation
docs/doxygen/mainpage.dox
Replaces a single test reference with two: test_buffer_parser_no_capture ("Basic log buffer parser") and test_reader_parser_no_capture ("Basic log file parser"); adds unit_tests_reader_parser_wrap_around ("Reader Parser") to the Unit-tests list (two insertions).
Source comment
src/log_surgeon/Constants.hpp
Adds an inline comment clarifying that the static buffer size must be even.
Build configuration
tests/CMakeLists.txt
Adds test-reader-parser.cpp to the unit-test target sources.
Tests
tests/test-reader-parser.cpp
New test file implementing ReaderParser tests, helper types (CapturePositions, ExpectedToken, ExpectedEvent), a parse_and_validate harness, and two test cases: single_line_without_capture_reader_parser and reader_parser_wrap_around (exercises buffer wrap-around and capture-position validation).

Sequence Diagram(s)

sequenceDiagram
    participant T as Test Case
    participant S as Schema
    participant R as Reader
    participant P as ReaderParser

    rect rgb(220,240,255)
      Note over T,S: Test constructs schema/patterns
    end

    T->>S: build Schema (patterns)
    T->>R: initialize Reader with input buffer(s)
    T->>P: instantiate ReaderParser with Schema

    rect rgb(240,255,240)
      Note over T,P: parse iterations (normal)
    end

    T->>P: parseChunk()
    P->>R: request buffer data
    R-->>P: provide bytes (tokens)
    P-->>T: emit events & tokens
    T->>T: validate token types/values

    rect rgb(255,240,240)
      Note over T,P: wrap-around scenario
    end

    T->>R: supply next buffer portion (wrap)
    P->>R: read wrapped range
    R-->>P: return wrapped bytes crossing boundary
    P-->>T: emit events with capture positions (adjusted for wrap)
    T->>T: validate capture positions and events
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Pay attention to tests/test-reader-parser.cpp for correctness of wrap-around setup, expected event ordering, and capture position calculations.
  • Verify docs/doxygen/mainpage.dox insertions match existing doc reference style and anchors.
  • tests/CMakeLists.txt and src/log_surgeon/Constants.hpp are low-risk but check formatting and comment placement.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding unit tests for the ReaderParser component, with a reference to the related issue.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b1c4fea and e4e3540.

📒 Files selected for processing (4)
  • docs/doxygen/mainpage.dox (2 hunks)
  • src/log_surgeon/Constants.hpp (1 hunks)
  • tests/CMakeLists.txt (1 hunks)
  • tests/test-reader-parser.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,h,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • src/log_surgeon/Constants.hpp
  • tests/test-reader-parser.cpp
🧠 Learnings (6)
📚 Learning: 2025-08-08T13:18:39.895Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp:42-48
Timestamp: 2025-08-08T13:18:39.895Z
Learning: In y-scope/log-surgeon (C++), it is acceptable/preferred to keep `const` qualifiers on by-value function parameters to signal intent (e.g., in src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp). Do not suggest removing `const` from by-value parameters in future reviews.

Applied to files:

  • src/log_surgeon/Constants.hpp
📚 Learning: 2025-08-08T10:23:06.281Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp:80-88
Timestamp: 2025-08-08T10:23:06.281Z
Learning: In y-scope/log-surgeon (C++), small function definitions are allowed to remain inline in headers. For src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp, do not suggest moving small methods like QueryInterpretation::append_variable_token to the .cpp for “consistency” in future reviews.

Applied to files:

  • src/log_surgeon/Constants.hpp
📚 Learning: 2025-08-15T01:04:03.681Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: src/log_surgeon/wildcard_query_parser/ExpressionView.cpp:11-11
Timestamp: 2025-08-15T01:04:03.681Z
Learning: In y-scope/log-surgeon C++ codebase, prefer explicit includes in implementation files (.cpp) for types that are directly used, even when those headers are already transitively included through the corresponding header file. This makes dependencies explicit and improves maintainability.

Applied to files:

  • src/log_surgeon/Constants.hpp
📚 Learning: 2025-10-22T15:40:29.992Z
Learnt from: davidlion
Repo: y-scope/log-surgeon PR: 165
File: src/log_surgeon/LogEvent.cpp:53-55
Timestamp: 2025-10-22T15:40:29.992Z
Learning: In `src/log_surgeon/LogEvent.cpp`, the `get_logtype()` method has two independent mechanisms for adding `<timestamp>` to the logtype string: (1) A prefix added when `has_timestamp()` returns true, indicating a standalone timestamp token from timestamp rules (lines 53-55), and (2) Named capture group processing that adds `<timestamp>` tags for `(?<timestamp>...)` patterns in any variable rule (lines 61-95). These mechanisms do not interfere with each other. Capture groups named `timestamp` do not affect `has_timestamp()` status, as using captures as actual timestamps is not supported.
<!-- [add_learning]
In the log-surgeon codebase, capture groups (e.g., `(?<timestamp>...)`) in schema variable rules are added to the logtype string through the existing capture group processing logic, but they do not create actual timestamp tokens and do not cause `has_timestamp()` to return true. Using capture groups as actual timestamps is not supported; standalone timestamp tokens come from `timestamp:...

Applied to files:

  • docs/doxygen/mainpage.dox
  • tests/test-reader-parser.cpp
📚 Learning: 2025-08-15T12:07:58.626Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: tests/test-expression-view.cpp:7-7
Timestamp: 2025-08-15T12:07:58.626Z
Learning: In tests/test-expression-view.cpp, the `<catch2/catch_message.hpp>` header is required for clang-tidy to pass, even though it may not be directly used in the visible code.

Applied to files:

  • tests/test-reader-parser.cpp
📚 Learning: 2025-08-26T10:13:00.368Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: tests/test-query.cpp:61-67
Timestamp: 2025-08-26T10:13:00.368Z
Learning: In Catch2 unit tests, the REQUIRE macro already provides detailed debugging output when container comparisons (like std::set equality) fail, showing both expected and actual values. Additional CAPTURE statements are typically unnecessary for such comparisons.

Applied to files:

  • tests/test-reader-parser.cpp
🧬 Code graph analysis (1)
tests/test-reader-parser.cpp (2)
src/log_surgeon/finite_automata/PrefixTree.hpp (1)
  • PrefixTree (29-29)
src/log_surgeon/LogParser.hpp (1)
  • symbol (73-73)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build (macos-15, debug)
  • GitHub Check: build (macos-15, release)
  • GitHub Check: lint (ubuntu-24.04)
  • GitHub Check: lint (macos-15)

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4e3540 and e53eaaa.

📒 Files selected for processing (1)
  • tests/test-reader-parser.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,h,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • tests/test-reader-parser.cpp
🧠 Learnings (6)
📚 Learning: 2025-08-15T12:07:58.626Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: tests/test-expression-view.cpp:7-7
Timestamp: 2025-08-15T12:07:58.626Z
Learning: In tests/test-expression-view.cpp, the `<catch2/catch_message.hpp>` header is required for clang-tidy to pass, even though it may not be directly used in the visible code.

Applied to files:

  • tests/test-reader-parser.cpp
📚 Learning: 2025-08-08T10:00:20.963Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp:28-31
Timestamp: 2025-08-08T10:00:20.963Z
Learning: In src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp (C++), do not default the comparison operators: the class stores std::vector<std::variant<StaticQueryToken, VariableQueryToken>>, which yields only a weak ordering. A custom operator<=> that maps the variant’s weak ordering to std::strong_ordering is required; avoid suggesting defaulting there in future reviews.

Applied to files:

  • tests/test-reader-parser.cpp
📚 Learning: 2025-08-15T01:04:03.681Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: src/log_surgeon/wildcard_query_parser/ExpressionView.cpp:11-11
Timestamp: 2025-08-15T01:04:03.681Z
Learning: In y-scope/log-surgeon C++ codebase, prefer explicit includes in implementation files (.cpp) for types that are directly used, even when those headers are already transitively included through the corresponding header file. This makes dependencies explicit and improves maintainability.

Applied to files:

  • tests/test-reader-parser.cpp
📚 Learning: 2025-08-08T10:17:43.495Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 144
File: src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp:28-31
Timestamp: 2025-08-08T10:17:43.495Z
Learning: In src/log_surgeon/wildcard_query_parser/VariableQueryToken.hpp/.cpp (C++), do not suggest defaulting operator<=>. The project prefers a custom out-of-line comparator that explicitly handles the bool member (via explicit cast) to avoid implicit conversions; keep the current manual implementation.

Applied to files:

  • tests/test-reader-parser.cpp
📚 Learning: 2025-10-22T15:40:29.992Z
Learnt from: davidlion
Repo: y-scope/log-surgeon PR: 165
File: src/log_surgeon/LogEvent.cpp:53-55
Timestamp: 2025-10-22T15:40:29.992Z
Learning: In `src/log_surgeon/LogEvent.cpp`, the `get_logtype()` method has two independent mechanisms for adding `<timestamp>` to the logtype string: (1) A prefix added when `has_timestamp()` returns true, indicating a standalone timestamp token from timestamp rules (lines 53-55), and (2) Named capture group processing that adds `<timestamp>` tags for `(?<timestamp>...)` patterns in any variable rule (lines 61-95). These mechanisms do not interfere with each other. Capture groups named `timestamp` do not affect `has_timestamp()` status, as using captures as actual timestamps is not supported.
<!-- [add_learning]
In the log-surgeon codebase, capture groups (e.g., `(?<timestamp>...)`) in schema variable rules are added to the logtype string through the existing capture group processing logic, but they do not create actual timestamp tokens and do not cause `has_timestamp()` to return true. Using capture groups as actual timestamps is not supported; standalone timestamp tokens come from `timestamp:...

Applied to files:

  • tests/test-reader-parser.cpp
📚 Learning: 2025-08-26T10:13:00.368Z
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 152
File: tests/test-query.cpp:61-67
Timestamp: 2025-08-26T10:13:00.368Z
Learning: In Catch2 unit tests, the REQUIRE macro already provides detailed debugging output when container comparisons (like std::set equality) fail, showing both expected and actual values. Additional CAPTURE statements are typically unnecessary for such comparisons.

Applied to files:

  • tests/test-reader-parser.cpp
🧬 Code graph analysis (1)
tests/test-reader-parser.cpp (1)
src/log_surgeon/LogParser.hpp (1)
  • symbol (73-73)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: lint (ubuntu-24.04)
  • GitHub Check: lint (macos-15)
  • GitHub Check: build (macos-15, release)
  • GitHub Check: build (macos-15, debug)
🔇 Additional comments (3)
tests/test-reader-parser.cpp (3)

229-251: Well-designed test case!

The test correctly validates variable matching without capture groups, ensuring exact matches are recognized and non-matches are treated as uncaught strings. The expected tokens properly alternate between matched variables and uncaught content.


271-365: Comprehensive wrap-around test implementation!

This test thoroughly exercises buffer wrap-around scenarios by testing all possible positions where a log event can start/end relative to the buffer boundary. The capture position calculations (lines 332-340) correctly handle wrap-around by subtracting the buffer size when positions exceed it, and the iterative adjustment of capture positions (lines 356-358) properly accounts for cumulative offsets across multiple events.


135-136: Use false == per coding guidelines.

The codebase prefers false == <expression> rather than using the expression directly in conditionals.

As per coding guidelines

Apply this diff:

-            if (expected_type.empty()) {
+            if (false == expected_type.empty()) {
⛔ Skipped due to learnings
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 106
File: src/log_surgeon/Lalr1Parser.tpp:661-665
Timestamp: 2025-05-05T14:49:15.009Z
Learning: The coding style for the log-surgeon codebase prefers `false == <expression>` rather than `!<expression>` when checking boolean conditions.
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 42
File: src/log_surgeon/finite_automata/RegexAST.hpp:106-112
Timestamp: 2024-10-10T18:05:34.072Z
Learning: In the C++ codebase for `log_surgeon`, the convention is to use `false == condition` instead of `!condition` when checking conditions, for clarity.
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 42
File: src/log_surgeon/finite_automata/RegexAST.hpp:106-112
Timestamp: 2024-10-08T09:39:05.712Z
Learning: In the C++ codebase for `log_surgeon`, the convention is to use `false == condition` instead of `!condition` when checking conditions, for clarity.

@davidlion davidlion changed the title test: Add unit-tests for ReaderParser. test: Add unit-tests for ReaderParser (fixes #183). Nov 10, 2025
Comment on lines +79 to +83
auto parse_and_validate(
ReaderParser& reader_parser,
string_view input,
vector<ExpectedEvent> const& expected_events
) -> void {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coderabbitai Could you please make an issue to refactor the parse_and_validate helpers in test-reader-parser.cpp and test-buffer-parser.cpp to use some common/shared helper?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if test-buffer-parser.cpp exists and examine the parse_and_validate helper
if [ -f "tests/test-buffer-parser.cpp" ]; then
    echo "=== Contents of test-buffer-parser.cpp ==="
    cat tests/test-buffer-parser.cpp | head -150
fi

Length of output: 5701


🏁 Script executed:

#!/bin/bash
# Get more of test-buffer-parser.cpp to see the complete parse_and_validate and serialize_id_symbol_map
echo "=== More of test-buffer-parser.cpp ==="
cat tests/test-buffer-parser.cpp | tail -n +150 | head -100

echo -e "\n=== Check full parse_and_validate in test-reader-parser.cpp ==="
# Find the parse_and_validate function implementation
grep -A 100 "^auto parse_and_validate(" tests/test-reader-parser.cpp | head -120

Length of output: 7990


🏁 Script executed:

#!/bin/bash
# Create a GitHub issue to refactor parse_and_validate helpers
gh issue create \
  --title "Refactor test helpers in test-reader-parser.cpp and test-buffer-parser.cpp to use common/shared utilities" \
  --body "## Description

Both \`test-reader-parser.cpp\` and \`test-buffer-parser.cpp\` contain nearly identical test helper functions and data structures that should be refactored into a common/shared test utility module.

## Duplicated Code

### Shared Data Structures
Both files define:
- \`CapturePositions\` struct
- \`ExpectedToken\` struct  
- \`ExpectedEvent\` struct

### Shared Helper Functions
Both files implement:
- \`parse_and_validate()\` - validates parsed events against expected tokens and captures (with parser-specific differences)
- \`serialize_id_symbol_map()\` - serializes the id-to-symbol map (identical implementation)

## Proposed Refactoring

1. Create a common test utilities header/source file (e.g., \`tests/common/test-parser-helpers.hpp\`).
2. Move the shared data structures to the common utility.
3. Extract the common validation logic from \`parse_and_validate()\` into reusable helper functions.
4. Use templates or function overloading to handle parser-specific differences (BufferParser vs ReaderParser).
5. Move \`serialize_id_symbol_map()\` to the common utility.

## Benefits

- Reduces code duplication
- Easier maintenance when validation logic needs to be updated
- Promotes consistency across test files
- Simplifies adding new parser test files in the future

## References

- PR: https://github.com/y-scope/log-surgeon/pull/184
- Comment: https://github.com/y-scope/log-surgeon/pull/184#discussion_r2510838518
- Requested by: @davidlion" \
  --assignee davidlion

Length of output: 88


🏁 Script executed:

#!/bin/bash
# Create a GitHub issue to refactor parse_and_validate helpers
gh issue create \
  --title "Refactor test helpers in test-reader-parser.cpp and test-buffer-parser.cpp to use common/shared utilities" \
  --body "## Description

Both \`test-reader-parser.cpp\` and \`test-buffer-parser.cpp\` contain nearly identical test helper functions and data structures that should be refactored into a common/shared test utility module.

## Duplicated Code

### Shared Data Structures
Both files define:
- \`CapturePositions\` struct
- \`ExpectedToken\` struct  
- \`ExpectedEvent\` struct

### Shared Helper Functions
Both files implement:
- \`parse_and_validate()\` - validates parsed events against expected tokens and captures (with parser-specific differences)
- \`serialize_id_symbol_map()\` - serializes the id-to-symbol map (identical implementation)

## Proposed Refactoring

1. Create a common test utilities header/source file (e.g., \`tests/common/test-parser-helpers.hpp\`).
2. Move the shared data structures to the common utility.
3. Extract the common validation logic from \`parse_and_validate()\` into reusable helper functions.
4. Use templates or function overloading to handle parser-specific differences (BufferParser vs ReaderParser).
5. Move \`serialize_id_symbol_map()\` to the common utility.

## Benefits

- Reduces code duplication
- Easier maintenance when validation logic needs to be updated
- Promotes consistency across test files
- Simplifies adding new parser test files in the future

## References

- PR: https://github.com/y-scope/log-surgeon/pull/184
- Comment: https://github.com/y-scope/log-surgeon/pull/184#discussion_r2510838518
- Requested by: @davidlion" \
  --assignee davidlion

Length of output: 49


@davidlion I've created issue #188 to track the refactoring of the parse_and_validate helpers in test-reader-parser.cpp and test-buffer-parser.cpp.

The issue documents the duplicated code (the CapturePositions, ExpectedToken, and ExpectedEvent structs, plus the parse_and_validate() and serialize_id_symbol_map() functions) and proposes creating a common test utilities module to eliminate this duplication.

You can view the issue here: #188


🧠 Learnings used
Learnt from: SharafMohamed
Repo: y-scope/log-surgeon PR: 150
File: tests/test-expression-view.cpp:7-7
Timestamp: 2025-08-15T12:07:58.626Z
Learning: In tests/test-expression-view.cpp, the `<catch2/catch_message.hpp>` header is required for clang-tidy to pass, even though it may not be directly used in the visible code.

@SharafMohamed SharafMohamed merged commit a046bea into y-scope:main Nov 12, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants