feat(extract): table-aware field detection for table structures #60

Merged
avifenesh merged 3 commits into main from feature/table-aware-extract-50 on Feb 24, 2026

@avifenesh (Collaborator)

Summary

  • Add table-aware field detection to the extract macro's auto-detect mode, fixing detection of <table> structures on documentation and data pages
  • Table groups now receive a 2x scoring boost and tolerate mixed TH/TD header rows in signature matching
  • Headerless tables produce column-indexed fields (column_1, column_2, etc.) instead of falling back to generic card heuristics
  • Selector mode supports column_N field names for extracting specific table columns by index
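
As a rough illustration of the last two bullets, here is a minimal sketch of column-indexed extraction and `column_N` lookup. It is not the code from this PR: the function names `extractColumns` and `extractColumnField` and the plain-object row model (`tagName`, `children`, `textContent`) are assumptions made so the example runs without a DOM.

```javascript
// Hypothetical row model standing in for a DOM <tr>:
// { tagName: 'TR', children: [{ tagName: 'TD', textContent: '...' }] }

// Collect the cells (TD or TH) of a single row.
function rowCells(row) {
  return row.children.filter((c) => c.tagName === 'TD' || c.tagName === 'TH');
}

// Build column_1, column_2, ... fields from the cells of one row.
function extractColumns(row) {
  const fields = {};
  rowCells(row).forEach((cell, i) => {
    fields[`column_${i + 1}`] = cell.textContent.trim();
  });
  return fields;
}

// Resolve a selector-mode field name such as "column_2" to a cell's text.
// Returns null when the name is not a column_N field or the index is out of range.
function extractColumnField(row, fieldName) {
  // Cheap prefix check first, so the regex only runs on likely candidates.
  if (fieldName.indexOf('column_') !== 0) return null;
  const match = /^column_(\d+)$/.exec(fieldName);
  if (!match) return null;
  const index = Number(match[1]) - 1; // column_N is 1-based
  const cells = rowCells(row);
  if (index < 0 || index >= cells.length) return null;
  return cells[index].textContent.trim();
}

// Example row from a headerless table.
const exampleRow = {
  tagName: 'TR',
  children: [
    { tagName: 'TD', textContent: ' alpha ' },
    { tagName: 'TD', textContent: '42' },
  ],
};

console.log(extractColumns(exampleRow));
console.log(extractColumnField(exampleRow, 'column_2'));
```

The 1-based numbering follows the `column_1`, `column_2` naming in the summary; returning `null` for out-of-range or non-matching names is a choice made for this sketch only.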

Test Plan

  • All 373 existing tests pass (0 regressions)
  • Updated test: headerless tables now produce column-indexed fields instead of a single text blob
  • New test: headerless table URL extraction alongside column fields
  • New test: selector mode column_N field extraction from TR elements
  • Review loop: 2 iterations, 4 core reviewers, all findings resolved
  • Deslop scan: clean
  • Delivery validation: all checks passed

Closes #50

- Skip TH header row when computing TR group signatures so mixed
  TH/TD tables pass the allSame check
- Boost scoring for TR groups under TBODY/TABLE/THEAD
- Add column-indexed extraction (column_1, column_2, ...) for
  headerless tables instead of falling back to generic extractItem
- Support column_N fields in selector mode extractField
- Update and add tests for headerless table extraction and
  selector mode column_N fields
- Restore element count guard before sigStart logic to prevent potential
  access to elements[0] on empty groups
- Cache parent tagName once instead of accessing twice (pt and pt2)
- Add indexOf prefix check before regex in column_N extraction
- Document headerless table column-indexed extraction (column_1, column_2)
  and selector mode column_N field support in CHANGELOG, README, and
  SKILL.md
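
The signature and scoring changes described in the commit notes above could look roughly like the sketch below. This is an illustration under assumptions, not the merged code: `signature`, `isHeaderRow`, `rowsShareSignature`, and `scoreGroup` are hypothetical names, elements are modeled as plain objects, and the minimum group size of two is assumed. Only `sigStart`, the "all same" check, the element count guard, and the TBODY/TABLE/THEAD parents come from the notes.

```javascript
// Hypothetical element model: { tagName: 'TR', children: [{ tagName: 'TD' }, ...] }

// Signature of a row: its child tag names joined, e.g. "TD,TD,TD".
function signature(el) {
  return el.children.map((c) => c.tagName).join(',');
}

// True when the row has cells and every one of them is a TH.
function isHeaderRow(el) {
  return el.children.length > 0 && el.children.every((c) => c.tagName === 'TH');
}

// Check that all rows share one signature, skipping a leading TH header row
// so that mixed TH/TD tables still pass.
function rowsShareSignature(elements) {
  // Count guard runs before any access to elements[0].
  if (elements.length < 2) return false; // assumed minimum group size
  const sigStart = isHeaderRow(elements[0]) ? 1 : 0;
  const rest = elements.slice(sigStart);
  const first = signature(rest[0]);
  return rest.every((el) => signature(el) === first);
}

const TABLE_PARENTS = new Set(['TBODY', 'TABLE', 'THEAD']);

// Double the base score for TR groups whose parent is a table container.
// The parent tag name is passed in once rather than re-read per check.
function scoreGroup(baseScore, parentTagName, elements) {
  const isTableGroup =
    TABLE_PARENTS.has(parentTagName) &&
    elements.length > 0 &&
    elements.every((el) => el.tagName === 'TR');
  return isTableGroup ? baseScore * 2 : baseScore;
}

// Example: one TH header row followed by two TD data rows.
const headerRow = { tagName: 'TR', children: [{ tagName: 'TH' }, { tagName: 'TH' }] };
const dataRow = { tagName: 'TR', children: [{ tagName: 'TD' }, { tagName: 'TD' }] };

console.log(rowsShareSignature([headerRow, dataRow, dataRow]));
console.log(scoreGroup(10, 'TBODY', [headerRow, dataRow, dataRow]));
```

Without the header skip, the first row's `TH,TH` signature would differ from the `TD,TD` rows and the group would be rejected, which is the failure mode the first commit note describes.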
@avifenesh merged commit d52ef93 into main on Feb 24, 2026
2 of 3 checks passed

Development

Successfully merging this pull request may close these issues.

extract: table-aware field detection for <table> structures
