Skip to content

feat: payload optimization for get_data, get_disaggregation, get_metadata#40

Open
rafmacalaba wants to merge 8 commits intofeat/error-managementfrom
feat/payload-optimization
Open

feat: payload optimization for get_data, get_disaggregation, get_metadata#40
rafmacalaba wants to merge 8 commits intofeat/error-managementfrom
feat/payload-optimization

Conversation

@rafmacalaba
Copy link
Collaborator

Summary

Optimizes API payloads returned by get_data, get_metadata, and get_disaggregation to reduce LLM token consumption. Based on a 16-database field survey (docs/payload_analysis.md).

Important

This PR is based on feat/error-management and should be merged after that branch lands.

Changes

Data Row Prefiltering (get_data)

  • _strip_data_row: Keeps 5 core fields (OBS_VALUE, TIME_PERIOD, REF_AREA, UNIT_MEASURE, claim_id) and conditionally includes disaggregation fields (SEX, AGE, URBANISATION, COMP_BREAKDOWN_1/2) only when non-trivial (not _T or _Z).
  • COMMENT_TS promotion: Extracts the first non-null COMMENT_TS value and stores it in metadata["indicator_description"] instead of repeating it per row.
  • claim_id is computed on the raw row before stripping, preserving data provenance.

Disaggregation Stripping (get_metadata, get_disaggregation)

  • _strip_disaggregation: Removes redundant dimensions (INDICATOR, FREQ), strips single-value _T dimensions, sorts TIME_PERIOD chronologically, and summarizes REF_AREA as {count, sample} or {count, queried}.
  • required_country parameter: Added to both get_metadata and get_disaggregation to let LLMs check country data availability without receiving the full REF_AREA list.

Resource Schema Update

  • Expanded DATABASES list from 4 to 16 entries (matching the survey).
  • Updated DATA_SCHEMA to document the prefiltered output shape (core vs conditional fields).

Code Review Fixes

  • Extracted _resolve_queried_countries helper to deduplicate country-resolution logic.
  • Updated docstrings to reflect post-stripping output shapes.
  • Documented COMP_BREAKDOWN_3 intentional exclusion.

Tests

  • 44/44 tests pass
  • New tests:
    • TestStripDataRow (9 tests): Core fields, conditional field retention, trivial value stripping
    • TestGetDataPrefiltering (3 tests): End-to-end integration for boilerplate stripping, disaggregation retention, COMMENT_TS promotion
    • TestStripDisaggregation (13 tests): Dimension stripping, TIME_PERIOD sorting, REF_AREA summarization, queried countries, edge cases (empty input, single country)
    • TestGetDataDisaggregationIndependence (1 test): Confirms get_data uses raw disaggregation, not stripped output

Checklist

  • Changes are focused and scoped
  • Tests added for new functionality
  • Existing tests still pass
  • Docstrings updated
  • Design decisions documented (docs/payload_analysis.md)

avsolatorio and others added 8 commits February 24, 2026 21:50
Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>
Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>
Addresses #19: Implement centralized error management.

- Add src/data360/errors.py with Data360MCPError base class and subclasses:
  APIError, TimeoutError, RequestError, ParseError, ValidationError, NotFoundError
- Add from_httpx_error() converter for clean httpx exception handling
- Add message registry mapping error codes to LLM-actionable messages
- Refactor api.py: replace scattered try/except blocks in _search_raw(),
  get_metadata(), get_disaggregation(), get_data() with centralized errors
- Update test assertions to match new error message format
- All 20 existing tests pass
When Draco cannot determine an optimal encoding (StopIteration), the
function now returns a 'warning' key in the response dict alongside the
URL. Previously the fallback was silent and indistinguishable from a
Draco success.

Added tests/test_visualization.py with 3 test cases:
- Fallback returns warning key
- Draco success has no warning key
- Both Draco and fallback fail returns error
- Move _FALLBACK_WARNING to module-level constant
- Update docstring to document the optional 'warning' key
- Mock get_codelist_mapping in tests to avoid real network calls
- Rename TimeoutError to Data360TimeoutError to avoid shadowing builtin
- Rename from_httpx_error to classify_error (handles non-httpx exceptions)
- Add tests/test_errors.py with 18 tests covering all error classes and
  classify_error mapping logic
- All 41 tests pass
…data

- get_data: prefilter rows from 24 fields to 5-10 (allowlist approach)
  - Core fields: OBS_VALUE, TIME_PERIOD, REF_AREA, UNIT_MEASURE, claim_id
  - Conditional: SEX, AGE, URBANISATION, COMP_BREAKDOWN_1/2 (kept if non-trivial)
  - Promote COMMENT_TS to metadata['indicator_description']
  - claim_id computed on raw data before stripping

- get_disaggregation / get_metadata: strip bloated dimensions
  - Remove INDICATOR, FREQ (always single-value, already known)
  - Remove single-value _T dimensions (SEX/AGE/URBANISATION with no disagg)
  - Sort TIME_PERIOD chronologically
  - Summarize REF_AREA to count + queried country lookup
  - Add required_country param (name or code, comma-separated)

- resources: update DATABASES (4 examples -> 16 verified), DATA_SCHEMA (prefiltered fields)
- tests: 21 new tests in test_prefiltering.py, 1 updated in test_api.py
- Extract _resolve_queried_countries helper to deduplicate logic
- Update get_metadata and get_disaggregation docstrings for new output shape
- Document COMP_BREAKDOWN_3 intentional exclusion
- Add edge-case tests (empty dims, single-country REF_AREA)
- Add get_data disaggregation independence test
- Fix test nits (unused param, blank line)
- Add docs/payload_analysis.md for traceability
@rafmacalaba rafmacalaba requested a review from avsolatorio March 18, 2026 00:44
@rafmacalaba rafmacalaba force-pushed the feat/error-management branch from a4f5544 to 8db5a1e Compare March 18, 2026 11:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants