Ai 2161 update search tool to search in configurations#348
Ai 2161 update search tool to search in configurations#348mariankrotil wants to merge 33 commits intomainfrom
Conversation
…de-links-in-get-tables-exp
…de-links-in-get-tables-exp
…de-links-in-get-tables-exp
…ne filtering Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Codex <codex@openai.com>
|
@vita-stejskal heads-up: I switched config search to JSONPath-based matching in configuration payloads (including scoped search). I also updated the examples in the search tool description, and config-based results now return matched JSONPaths (match_scopes) so agent can see exactly where the value was found in the config, as we decided. |
| } | ||
| ], | ||
| [('test-config', ['parameters.query', 'storage.input[0].source'])], | ||
| ), |
There was a problem hiding this comment.
You should add at least one test with the scopes that proves that the searching really is constrained by the scopes. For example by having the same value of alpha under two different paths, but searching only at one path and checking that the other path is not hit.
| @@ -148,6 +155,15 @@ def check_id_fields(self) -> 'SearchHit': | |||
| def with_matches(self, matches: list['PatternMatch']) -> 'SearchHit': | |||
There was a problem hiding this comment.
A nitpick -- I'd remove this to set_matches to point out that this function mutates the existing SearchHit instance rather than creating a new one with the matches field replaced. The docstring already says this clearly, but the with_* prefix slightly contradicts this, because we typically use it for immutable setters that create a copy of the object with one particular field replaced.
| scope | ||
| for scope in unique_scopes | ||
| if not any( | ||
| other != scope and other.startswith(scope) and other[len(scope) : len(scope) + 1] in {'.', '['} |
There was a problem hiding this comment.
It would probably be more efficient to use:
other.startswith(scope) and len(other) > len(scope) and other[len(scope)] in ('.', '[')
- fetching a single character is faster than creating a single-char sub-string
- creating a tuple is more efficient than creating a set
| self._all_nodes_expr = jsonpath_ng.parse('$..*') | ||
| self._scope_exprs = [] | ||
| for scope in self.search_scopes: | ||
| normalized = scope if scope.startswith('$') else f'$.{scope}' |
There was a problem hiding this comment.
I think that you should use the new _normalize_jsonpath() function from PR#394 to handle the #-prefixed keys properly. Without that the jsonpath_ng.parse() is likely to fail for paths such as authorization.#apiKey.
| _compiled_patterns: list[re.Pattern] = PrivateAttr(default_factory=list) | ||
| _clean_patterns: list[str] = PrivateAttr(default_factory=list) | ||
| _all_nodes_expr: JSONPath | None = PrivateAttr(default=None) | ||
| _scope_exprs: list[tuple[str, JSONPath, JSONPath]] = PrivateAttr(default_factory=list) |
There was a problem hiding this comment.
It would be useful to mention what the tuple elements are.
| return [PatternMatch(scope=None, patterns=matched)] | ||
| return [] | ||
| # No scope provided – search all descendants and return exact match paths. | ||
| if self._all_nodes_expr is None: |
There was a problem hiding this comment.
Can this really be None if it is set in the after-validator?
| if matched := self.match_patterns(configuration): | ||
| return [PatternMatch(scope=None, patterns=matched)] | ||
| return [] | ||
| # No scope provided – search all descendants and return exact match paths. |
There was a problem hiding this comment.
It'd be easier to read the code, if this code-path were moved to the else-branch of the if self.search_scopes statement.
| - user_input: "Find components/transformations using my_bucket in input or output mappings" | ||
| -> patterns=["my_bucket"], item_types=["configuration", "transformation"], search_type="config-based", | ||
| scopes=["storage.input", "storage.output"] | ||
| -> Returns matches with paths like `storage.input[0].source` or `storage.output[0].target` |
There was a problem hiding this comment.
Are those paths accurate? Do they not look more like storage.input.tables[0].source and similar? In general, the input/output mappings can contain both tables and files.
|
|
||
| - user_input: "Find transformations using this table / column / specific code in its script" | ||
| -> patterns=["element"], item_types=["transformation"], search_type="config-based", | ||
| scopes=["parameters"] |
There was a problem hiding this comment.
This example is not accurate, I think. The table could be also mentioned under the storage scope (i.e. in the input/output mappings).
| return matches | ||
| return matches | ||
|
|
||
| def _find_scalar_matches_for_expr(self, configuration: JsonDict, parsed_expr: JSONPath) -> list[PatternMatch]: |
There was a problem hiding this comment.
This function is pretty much the same as _find_matches_for_expr() function. They could easily be collapsed to a single function with an extra "search_subtree" parameter (or "scalars_only" parameter or something like that).
|
This looks much better now. I think that we are nearly done. Thanks! |
Description
Extended the search tool beyond table/bucket usage references to robust configuration-aware discovery across items (configuration, transformation, flow, data-app, etc.).
Key updates in this branch:
config-based search now always collects all matched paths per item (simpler agent behavior, more complete results).
Linear: AI-2161-buckets-tables-references
Change Type
Summary
Testing
Streamable-HTTPtransports)Optional testing
canary-orionMCP (SSEandStreamable-HTTP)canary-orioncanary-orionChecklist