Skip to content

Comments

Ai 2161 update search tool to search in configurations#348

Open
mariankrotil wants to merge 33 commits intomainfrom
AI-2161-include-links-in-get-tables-exp
Open

Ai 2161 update search tool to search in configurations#348
mariankrotil wants to merge 33 commits intomainfrom
AI-2161-include-links-in-get-tables-exp

Conversation

@mariankrotil
Copy link
Contributor

@mariankrotil mariankrotil commented Jan 12, 2026

Description

Extended the search tool beyond table/bucket usage references to robust configuration-aware discovery across items (configuration, transformation, flow, data-app, etc.).

Key updates in this branch:

  • Added config-based search path reporting via match_scopes in SearchHit.
  • Reworked config matching to use JSONPath traversal (including scoped search) and return exact matched paths.
    config-based search now always collects all matched paths per item (simpler agent behavior, more complete results).
  • Removed return_all_matches and case_sensitive from the search tool API to reduce complexity and ambiguity.
  • Improved scope handling for scalar paths (e.g. direct scope like parameters.api.baseUrl) while preserving descendant matching.
  • Kept output practical by exposing only most-specific paths in match_scopes (while internal matches keep full detail).
  • Updated search tool docs/examples and project system prompt guidance to explicitly cover config-based search usage.

Linear: AI-2161-buckets-tables-references

Change Type

  • Major (breaking changes, significant new features)
  • Minor (new features, enhancements, backward compatible)
  • Patch (bug fixes, small improvements, no new features)

Summary

  • Update search tool

Testing

  • [x ] Tested with Cursor AI desktop (Streamable-HTTP transports)

Optional testing

  • Tested with Cursor AI desktop (all transports)
  • Tested with claude.ai web and canary-orion MCP (SSE and Streamable-HTTP)
  • Tested with In Platform Agent on canary-orion
  • Tested with RO chat on canary-orion

Checklist

  • Self-review completed
  • Unit tests added/updated (if applicable)
  • Integration tests added/updated (if applicable)
  • Project version bumped according to the change type (if applicable)
  • Documentation updated (if applicable)

@linear
Copy link

linear bot commented Jan 12, 2026

@mariankrotil mariankrotil changed the base branch from main to AI-2161-include-links-in-get-tables January 12, 2026 18:13
@mariankrotil mariankrotil changed the title Ai 2161 include references in tables and buckets Ai 2161 update search tool to search in configurations Jan 16, 2026
@mariankrotil mariankrotil self-assigned this Jan 20, 2026
Base automatically changed from AI-2161-include-links-in-get-tables to main January 23, 2026 08:46
@mariankrotil
Copy link
Contributor Author

@vita-stejskal heads-up: I switched config search to JSONPath-based matching in configuration payloads (including scoped search). I also updated the examples in the search tool description, and config-based results now return matched JSONPaths (match_scopes) so agent can see exactly where the value was found in the config, as we decided.

}
],
[('test-config', ['parameters.query', 'storage.input[0].source'])],
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add at least one test with the scopes that proves that the searching really is constrained by the scopes. For example by having the same value of alpha under two different paths, but searching only at one path and checking that the other path is not hit.

@@ -148,6 +155,15 @@ def check_id_fields(self) -> 'SearchHit':
def with_matches(self, matches: list['PatternMatch']) -> 'SearchHit':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nitpick -- I'd remove this to set_matches to point out that this function mutates the existing SearchHit instance rather than creating a new one with the matches field replaced. The docstring already says this clearly, but the with_* prefix slightly contradicts this, because we typically use it for immutable setters that create a copy of the object with one particular field replaced.

scope
for scope in unique_scopes
if not any(
other != scope and other.startswith(scope) and other[len(scope) : len(scope) + 1] in {'.', '['}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would probably be more efficient to use:

other.startswith(scope) and len(other) > len(scope) and other[len(scope)] in ('.', '[')
  • fetching a single character is faster than creating a single-char sub-string
  • creating a tuple is more efficient than creating a set

self._all_nodes_expr = jsonpath_ng.parse('$..*')
self._scope_exprs = []
for scope in self.search_scopes:
normalized = scope if scope.startswith('$') else f'$.{scope}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that you should use the new _normalize_jsonpath() function from PR#394 to handle the #-prefixed keys properly. Without that the jsonpath_ng.parse() is likely to fail for paths such as authorization.#apiKey.

_compiled_patterns: list[re.Pattern] = PrivateAttr(default_factory=list)
_clean_patterns: list[str] = PrivateAttr(default_factory=list)
_all_nodes_expr: JSONPath | None = PrivateAttr(default=None)
_scope_exprs: list[tuple[str, JSONPath, JSONPath]] = PrivateAttr(default_factory=list)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to mention what the tuple elements are.

return [PatternMatch(scope=None, patterns=matched)]
return []
# No scope provided – search all descendants and return exact match paths.
if self._all_nodes_expr is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this really be None if it is set in the after-validator?

if matched := self.match_patterns(configuration):
return [PatternMatch(scope=None, patterns=matched)]
return []
# No scope provided – search all descendants and return exact match paths.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be easier to read the code, if this code-path were moved to the else-branch of the if self.search_scopes statement.

- user_input: "Find components/transformations using my_bucket in input or output mappings"
-> patterns=["my_bucket"], item_types=["configuration", "transformation"], search_type="config-based",
scopes=["storage.input", "storage.output"]
-> Returns matches with paths like `storage.input[0].source` or `storage.output[0].target`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are those paths accurate? Do they not look more like storage.input.tables[0].source and similar? In general, the input/output mappings can contain both tables and files.


- user_input: "Find transformations using this table / column / specific code in its script"
-> patterns=["element"], item_types=["transformation"], search_type="config-based",
scopes=["parameters"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is not accurate, I think. The table could be also mentioned under the storage scope (i.e. in the input/output mappings).

return matches
return matches

def _find_scalar_matches_for_expr(self, configuration: JsonDict, parsed_expr: JSONPath) -> list[PatternMatch]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is pretty much the same as _find_matches_for_expr() function. They could easily be collapsed to a single function with an extra "search_subtree" parameter (or "scalars_only" parameter or something like that).

@vita-stejskal
Copy link
Contributor

This looks much better now. I think that we are nearly done. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants