feat: payload optimization for get_data, get_disaggregation, get_metadata#40
Open
rafmacalaba wants to merge 8 commits intofeat/error-managementfrom
Open
feat: payload optimization for get_data, get_disaggregation, get_metadata#40rafmacalaba wants to merge 8 commits intofeat/error-managementfrom
rafmacalaba wants to merge 8 commits intofeat/error-managementfrom
Conversation
Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>
Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>
Addresses #19: Implement centralized error management. - Add src/data360/errors.py with Data360MCPError base class and subclasses: APIError, TimeoutError, RequestError, ParseError, ValidationError, NotFoundError - Add from_httpx_error() converter for clean httpx exception handling - Add message registry mapping error codes to LLM-actionable messages - Refactor api.py: replace scattered try/except blocks in _search_raw(), get_metadata(), get_disaggregation(), get_data() with centralized errors - Update test assertions to match new error message format - All 20 existing tests pass
When Draco cannot determine an optimal encoding (StopIteration), the function now returns a 'warning' key in the response dict alongside the URL. Previously the fallback was silent and indistinguishable from a Draco success. Added tests/test_visualization.py with 3 test cases: - Fallback returns warning key - Draco success has no warning key - Both Draco and fallback fail returns error
- Move _FALLBACK_WARNING to module-level constant - Update docstring to document the optional 'warning' key - Mock get_codelist_mapping in tests to avoid real network calls
- Rename TimeoutError to Data360TimeoutError to avoid shadowing builtin - Rename from_httpx_error to classify_error (handles non-httpx exceptions) - Add tests/test_errors.py with 18 tests covering all error classes and classify_error mapping logic - All 41 tests pass
…data - get_data: prefilter rows from 24 fields to 5-10 (allowlist approach) - Core fields: OBS_VALUE, TIME_PERIOD, REF_AREA, UNIT_MEASURE, claim_id - Conditional: SEX, AGE, URBANISATION, COMP_BREAKDOWN_1/2 (kept if non-trivial) - Promote COMMENT_TS to metadata['indicator_description'] - claim_id computed on raw data before stripping - get_disaggregation / get_metadata: strip bloated dimensions - Remove INDICATOR, FREQ (always single-value, already known) - Remove single-value _T dimensions (SEX/AGE/URBANISATION with no disagg) - Sort TIME_PERIOD chronologically - Summarize REF_AREA to count + queried country lookup - Add required_country param (name or code, comma-separated) - resources: update DATABASES (4 examples -> 16 verified), DATA_SCHEMA (prefiltered fields) - tests: 21 new tests in test_prefiltering.py, 1 updated in test_api.py
- Extract _resolve_queried_countries helper to deduplicate logic - Update get_metadata and get_disaggregation docstrings for new output shape - Document COMP_BREAKDOWN_3 intentional exclusion - Add edge-case tests (empty dims, single-country REF_AREA) - Add get_data disaggregation independence test - Fix test nits (unused param, blank line) - Add docs/payload_analysis.md for traceability
a4f5544 to
8db5a1e
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Optimizes API payloads returned by
get_data,get_metadata, andget_disaggregationto reduce LLM token consumption. Based on a 16-database field survey (docs/payload_analysis.md).Important
This PR is based on
feat/error-managementand should be merged after that branch lands.Changes
Data Row Prefiltering (
get_data)_strip_data_row: Keeps 5 core fields (OBS_VALUE,TIME_PERIOD,REF_AREA,UNIT_MEASURE,claim_id) and conditionally includes disaggregation fields (SEX,AGE,URBANISATION,COMP_BREAKDOWN_1/2) only when non-trivial (not_Tor_Z).COMMENT_TSvalue and stores it inmetadata["indicator_description"]instead of repeating it per row.claim_idis computed on the raw row before stripping, preserving data provenance.Disaggregation Stripping (
get_metadata,get_disaggregation)_strip_disaggregation: Removes redundant dimensions (INDICATOR,FREQ), strips single-value_Tdimensions, sortsTIME_PERIODchronologically, and summarizesREF_AREAas{count, sample}or{count, queried}.required_countryparameter: Added to bothget_metadataandget_disaggregationto let LLMs check country data availability without receiving the full REF_AREA list.Resource Schema Update
DATABASESlist from 4 to 16 entries (matching the survey).DATA_SCHEMAto document the prefiltered output shape (core vs conditional fields).Code Review Fixes
_resolve_queried_countrieshelper to deduplicate country-resolution logic.COMP_BREAKDOWN_3intentional exclusion.Tests
TestStripDataRow(9 tests): Core fields, conditional field retention, trivial value strippingTestGetDataPrefiltering(3 tests): End-to-end integration for boilerplate stripping, disaggregation retention, COMMENT_TS promotionTestStripDisaggregation(13 tests): Dimension stripping, TIME_PERIOD sorting, REF_AREA summarization, queried countries, edge cases (empty input, single country)TestGetDataDisaggregationIndependence(1 test): Confirmsget_datauses raw disaggregation, not stripped outputChecklist
docs/payload_analysis.md)