perf(parser,metrics): parsing with event-based architecture and FFI optimization #86

Merged: hat0uma merged 5 commits into main from refactor-metrics on Dec 15, 2025
@hat0uma hat0uma commented Dec 15, 2025

Summary

  • Modularize metrics into separate modules (ColumnTracker, RowMapper) for better maintainability
  • Refactor parser to use event-based callbacks instead of returning extracted field text
  • Add dedicated string processing module with FFI optimization for display width calculation

Performance

Environment: Windows, 100k rows × 15 columns CSV
Command: nvim --headless -c "luaf tests/perfcheck.lua" -c "qa!"

| Metric      | Before     | After     | Change              |
|-------------|------------|-----------|---------------------|
| Parse time  | 1718.40 ms | 398.87 ms | -77% (4.3x faster)  |
| Peak memory | 36,600 KB  | 37,250 KB | +1.8%               |

Perfcheck details (main branch)
================================================================
 PERFCHECK RESULTS (N=10, Lines=100000, Cols=15)
================================================================
 Execution Time : 1718.40 ms (±15.82) [Min: 1696.68, Max: 1742.98]
 Throughput     : 58199 lines/sec
 Observed Peak  : 36599.60 KB (Avg) [Min: 36230.47, Max: 36908.91]
 Retained Mem   : 2.75 KB (Avg) [Min: 0.37, Max: 6.99]
================================================================
Perfcheck details (refactor-metrics branch)
================================================================
 PERFCHECK RESULTS (N=10, Lines=100000, Cols=15)
================================================================
 Execution Time : 398.87 ms (±6.16) [Min: 386.89, Max: 406.03]
 Throughput     : 250770 lines/sec
 Observed Peak  : 37250.00 KB (Avg) [Min: 37244.65, Max: 37259.28]
 Retained Mem   : 2.88 KB (Avg) [Min: 0.37, Max: 4.92]
================================================================

Changes

Metrics Modularization

  • Extract ColumnTracker (metrics_column.lua) for column width tracking
  • Extract RowMapper (metrics_row_mapper.lua) for physical/logical line mapping
  • Simplify metrics.lua to coordinate between modules

Parser Event-Based Refactoring

  • Replace on_line callback with granular events: on_field, on_record_start, on_record_end, on_comment
  • Pass line with offset/endpos instead of extracting field text, avoiding intermediate string allocations
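
The callback shape described in this section can be sketched roughly as follows. This is an illustration only, not the plugin's parser: `parse_record_sketch` is a hypothetical name, it handles a single unquoted line with a plain delimiter, and the argument order and 1-based inclusive `offset`/`endpos` convention are assumptions. Only the event names and the idea of passing the line plus offsets come from this PR.

```lua
-- Hypothetical sketch: split one unquoted line on a delimiter and report each
-- field as (line, offset, endpos) instead of allocating a substring per field.
local function parse_record_sketch(line, delim, events)
  events.on_record_start()
  local offset = 1
  while true do
    -- Plain find (4th argument true) so the delimiter is not treated as a pattern.
    local pos = string.find(line, delim, offset, true)
    local endpos = (pos or (#line + 1)) - 1
    events.on_field(line, offset, endpos)
    if not pos then
      break
    end
    offset = pos + #delim
  end
  events.on_record_end()
end

-- Usage: collect field byte lengths without ever calling string.sub.
local lengths = {}
parse_record_sketch("id,name,,42", ",", {
  on_record_start = function() end,
  on_field = function(_, offset, endpos)
    lengths[#lengths + 1] = endpos - offset + 1
  end,
  on_record_end = function() end,
})
-- lengths now holds the byte length of each of the four fields: 2, 4, 0, 2
```

A consumer that only needs widths or type checks never materializes field text, which is where the allocation savings would come from.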

String Processing Optimization

  • Add strings.lua module with FFI-accelerated functions
  • Implement display_width() and is_number() that operate directly on offsets without substring extraction
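
To illustrate what operating "directly on offsets" can look like, here is a minimal pure-Lua sketch of a numeric check over a byte range. It is not the `is_number()` from `strings.lua`: the name `is_number_sketch`, the accepted grammar (optional sign, digits, at most one dot), and the use of `string.byte` in place of FFI are all assumptions made for illustration.

```lua
local BYTE_0, BYTE_9 = string.byte("0"), string.byte("9")
local BYTE_PLUS, BYTE_MINUS = string.byte("+"), string.byte("-")
local BYTE_DOT = string.byte(".")

-- Hypothetical sketch: returns true if line[offset..endpos] looks like a
-- decimal number. Assumes 1 <= offset and endpos <= #line (inclusive range);
-- an empty range (offset > endpos) is not a number.
local function is_number_sketch(line, offset, endpos)
  local i = offset
  if i <= endpos then
    local first = string.byte(line, i)
    if first == BYTE_PLUS or first == BYTE_MINUS then
      i = i + 1 -- skip a leading sign
    end
  end
  local digits, seen_dot = 0, false
  while i <= endpos do
    local b = string.byte(line, i)
    if b >= BYTE_0 and b <= BYTE_9 then
      digits = digits + 1
    elseif b == BYTE_DOT and not seen_dot then
      seen_dot = true
    else
      return false
    end
    i = i + 1
  end
  return digits > 0
end

-- Usage: check fields of "name,-12.5,abc" by offset, with no string.sub call.
local line = "name,-12.5,abc"
local second_is_number = is_number_sketch(line, 6, 10)  -- covers "-12.5"
local third_is_number = is_number_sketch(line, 12, 14)  -- covers "abc"
```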

…t/endpos

  - Refactor parser to use event-based callbacks (on_field, on_record_start, on_record_end)
  - Change on_field signature to pass full line with offset/endpos instead of extracted field text
  - Remove parse_lines, consolidate into parse_line and parse_records
  - Move FieldBuffer from metrics_row_builder.lua into metrics_row.lua
  - Update parser tests to use parse_line instead of parse_lines
Add perfcheck.lua for measuring parser and metrics calculation performance.
The script generates configurable test data and measures:
 - Execution time with warmup iterations
 - Memory usage (peak and retained)
 - Throughput (lines/sec)
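
A measurement loop of this kind might look roughly like the sketch below. It is not the contents of `tests/perfcheck.lua`: `measure` and `lines_per_sec` are hypothetical names, `os.clock` (CPU time) stands in for whatever timer the script uses, and only retained memory is sampled here because observing a true peak needs sampling during the run.

```lua
-- Hypothetical sketch: time fn over several runs after warmup iterations and
-- record how much memory is still held after a full collection.
local function measure(fn, opts)
  local warmup, runs = opts.warmup or 2, opts.runs or 10
  for _ = 1, warmup do
    fn() -- warmup runs are executed but not recorded
  end
  local times_ms, retained_kb, total = {}, {}, 0
  for i = 1, runs do
    collectgarbage("collect")
    local mem_before = collectgarbage("count") -- KB in use by Lua
    local t0 = os.clock()
    fn()
    times_ms[i] = (os.clock() - t0) * 1000
    total = total + times_ms[i]
    collectgarbage("collect")
    retained_kb[i] = collectgarbage("count") - mem_before
  end
  return { mean_ms = total / runs, times_ms = times_ms, retained_kb = retained_kb }
end

-- Throughput derived from a mean run time in milliseconds.
local function lines_per_sec(lines, mean_ms)
  return lines / (mean_ms / 1000)
end

-- Usage: measure a small allocation-heavy workload.
local result = measure(function()
  local t = {}
  for i = 1, 1000 do
    t[i] = tostring(i)
  end
end, { warmup = 1, runs = 3 })
```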
Replace `string.sub` + `vim.fn.strdisplaywidth` with optimized strings module
functions that avoid intermediate string allocations.

Benchmark (tests/perfcheck.lua, 100,000 lines x 15 cols):
  Before: 852ms, 117k lines/sec
  After:  382ms, 261k lines/sec (~2.2x faster)
@hat0uma hat0uma changed the title refactor(parser/metrics): modularize architecture and optimize field processing with FFI perf(parser/metrics): parsing with event-based architecture and FFI optimization Dec 15, 2025
@hat0uma hat0uma changed the title perf(parser/metrics): parsing with event-based architecture and FFI optimization perf(parser,metrics): parsing with event-based architecture and FFI optimization Dec 15, 2025
Document that `parse_line()` and `create_field_collector()` are not intended for
performance-critical paths due to string allocation overhead. Users
should use `parse_records()` with event callbacks directly for optimal
performance.
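
The allocation overhead mentioned here can be pictured with a small sketch of a collector built on top of the events. This is not the plugin's `create_field_collector()`: `new_collector_sketch` is a hypothetical name, and the callback signatures are assumed to match the `(line, offset, endpos)` shape described in this PR.

```lua
-- Hypothetical sketch: a convenience collector that turns events back into
-- tables of field strings. Each on_field call allocates a new string via
-- string.sub, which is the cost the event-based API lets hot paths avoid.
local function new_collector_sketch()
  local records, current = {}, nil
  return {
    records = records,
    on_record_start = function()
      current = {}
    end,
    on_field = function(line, offset, endpos)
      current[#current + 1] = string.sub(line, offset, endpos)
    end,
    on_record_end = function()
      records[#records + 1] = current
    end,
  }
end

-- Usage: drive the callbacks by hand for the line "a,bc".
local collector = new_collector_sketch()
local line = "a,bc"
collector.on_record_start()
collector.on_field(line, 1, 1)
collector.on_field(line, 3, 4)
collector.on_record_end()
-- collector.records[1] now holds the two field strings "a" and "bc"
```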
@hat0uma hat0uma merged commit 4c4db58 into main Dec 15, 2025
5 checks passed
@hat0uma hat0uma deleted the refactor-metrics branch December 15, 2025 16:54