@ppinchuk
Collaborator

The core pieces needed for one-shot extraction are in place.

Still TODO (future PR):

  • LLM-generated website keywords
  • LLM-generated keyword heuristic
  • Text extractor based on extraction schema

@ppinchuk requested a review from castelao as a code owner on February 10, 2026 20:23
Copilot AI review requested due to automatic review settings February 10, 2026 20:23
@ppinchuk added the following labels on Feb 10, 2026:

  • enhancement (Update to logic or general code improvements)
  • new computation (Update that adds a new computation method)
  • p-critical (Priority: critical)
  • topic-python-llm (Issues/pull requests related to LLMs)
  • topic-python-general (Issues/pull requests related to python)
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 39 out of 41 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

pyproject.toml:50

  • compass.utilities.io now imports yaml and toml at runtime, but neither PyYAML nor a toml package is listed in the project dependencies. This will raise ImportError in environments where they aren't installed transitively. Add the needed dependencies (or switch TOML loading to the stdlib tomllib, which is available since Python 3.11 and only reads TOML), and consider gating YAML/TOML support behind optional extras if you don't want them in the core install set.
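A minimal sketch of the "optional extras" idea from the comment above, assuming the loaders import their parser lazily. The helper name `require_optional`, the two loader functions, and the extra names are hypothetical and do not come from the PR; only `importlib.import_module`, `tomllib.loads`, and `yaml.safe_load` are real library calls.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency or fail with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # The extra name is an assumption; the project defines its own extras
        msg = (
            f"Reading this file type requires the {module_name!r} module. "
            f"Install the package with the {extra!r} extra to enable it."
        )
        raise ImportError(msg) from err


def load_toml(text):
    """Parse TOML text with the standard library (Python 3.11+)."""
    tomllib = require_optional("tomllib", extra="toml")
    return tomllib.loads(text)


def load_yaml(text):
    """Parse YAML text, importing PyYAML only when it is needed."""
    yaml = require_optional("yaml", extra="yaml")
    return yaml.safe_load(text)
```

With this shape, importing the I/O module never fails on a missing parser; the error surfaces only when a YAML or TOML file is actually loaded, and it names the missing piece.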

Comment on lines +341 to +357
    logger.debug("Loading query templates from cache at %s", cache_fp)
    cache = json.loads(cache_fp.read_text(encoding="utf-8"))
    if identifier.casefold() not in qt:
        logger.debug(
            "Adding query templates for %r to cache at %s",
            identifier,
            cache_fp,
        )
        cache[identifier.casefold()] = {
            "templates": qt,
            "sha256": hashlib.sha256(str(schema).encode()).hexdigest(),
        }
        cache_fp.write_text(json.dumps(cache, indent=4), encoding="utf-8")
        return

    potential_qt = qt[identifier.casefold()]
    m = hashlib.sha256()

Copilot AI Feb 10, 2026

_qt_to_cache uses the qt list (the query templates) where it should use the cache dict loaded from disk. As written, the condition identifier.casefold() not in qt is almost always true, so the function overwrites the cache entry and returns on nearly every call. In the rare case where the condition is false, the lookup potential_qt = qt[identifier.casefold()] raises a TypeError because qt is a list, not a dict. Use cache for membership tests and indexing, and use qt only as the templates payload being stored.
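A sketch of what the corrected lookup could look like, assuming the cache is a JSON file keyed by the casefolded identifier. The function name, signature, and return value are illustrative, and the "refresh when the schema hash differs" branch is an assumption, since the diff excerpt stops before that logic.

```python
import hashlib
import json
from pathlib import Path


def _schema_hash(schema):
    """Hash the schema the same way the excerpt does."""
    return hashlib.sha256(str(schema).encode()).hexdigest()


def qt_to_cache(cache_fp, identifier, qt, schema):
    """Store query templates in the cache; return True if the file changed."""
    cache_fp = Path(cache_fp)
    cache = {}
    if cache_fp.exists():
        cache = json.loads(cache_fp.read_text(encoding="utf-8"))

    key = identifier.casefold()
    entry = {"templates": qt, "sha256": _schema_hash(schema)}

    # Membership and indexing go through the cache dict, never the qt list
    if key in cache and cache[key]["sha256"] == entry["sha256"]:
        return False  # assumed behavior: keep the entry if the schema is unchanged

    cache[key] = entry
    cache_fp.write_text(json.dumps(cache, indent=4), encoding="utf-8")
    return True
```

Here `qt` appears only as the payload under `"templates"`, which is the split of responsibilities the review comment asks for.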

Copilot uses AI. Check for mistakes.
Comment on lines +48 to +57
    multiple=True,
    help="One-shot plugin configuration to add to COMPASS before processing",
)
def process(config, verbose, no_progress, plugin):
    """Download and extract ordinances for a list of jurisdictions"""
    config = load_config(config)

    for one_shot_plugin_config in plugin:
        create_schema_based_one_shot_extraction_plugin(
            config=one_shot_plugin_config, tech=config["tech"]

Copilot AI Feb 10, 2026

The CLI option --plugin/-p is declared as multiple=True, but each provided plugin config is registered with the same identifier tech=config["tech"]. If more than one --plugin is provided, register_plugin will raise because identifiers must be unique. Either make the option non-multiple, or allow each plugin config to supply its own unique identifier and pass that through here.

Suggested change
-    multiple=True,
-    help="One-shot plugin configuration to add to COMPASS before processing",
-)
-def process(config, verbose, no_progress, plugin):
-    """Download and extract ordinances for a list of jurisdictions"""
-    config = load_config(config)
-    for one_shot_plugin_config in plugin:
-        create_schema_based_one_shot_extraction_plugin(
-            config=one_shot_plugin_config, tech=config["tech"]
+    help="One-shot plugin configuration to add to COMPASS before processing",
+)
+def process(config, verbose, no_progress, plugin):
+    """Download and extract ordinances for a list of jurisdictions"""
+    config = load_config(config)
+    if plugin is not None:
+        create_schema_based_one_shot_extraction_plugin(
+            config=plugin, tech=config["tech"]

requirements. Keep the response concise and consistent.\
"""
_TEXT_COLLECTION_MAIN_PROMPT = """\
Determine wether this text excerpt contains any information relevant to \

Copilot AI Feb 10, 2026

Spelling: _TEXT_COLLECTION_MAIN_PROMPT says "Determine wether..."; this should be "whether" to avoid prompting errors and to keep docs/prompt text professional.

Suggested change
- Determine wether this text excerpt contains any information relevant to \
+ Determine whether this text excerpt contains any information relevant to \

Comment on lines +258 to +263
    config_type = ConfigType(config_filepath.name.split(".")[-1])
    config = config_type.load(config_filepath)
    if resolve_paths:
        return resolve_all_paths(config, config_filepath.parent)

    return config

Copilot AI Feb 10, 2026

load_config calls ConfigType(...) directly, so an unknown extension (e.g. .txt) will raise a ValueError from Enum construction rather than the documented COMPASSValueError (and the unit test expects COMPASSValueError). Consider catching the Enum error and raising COMPASSValueError with a clear message listing supported extensions.
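One way the suggested error translation could be sketched is shown here. `COMPASSValueError` and `ConfigType` are stand-ins defined locally so the example runs on its own; the member list (json, yaml, toml) and the helper name `config_type_from_path` are assumptions, not the project's actual definitions.

```python
from enum import Enum
from pathlib import Path


class COMPASSValueError(ValueError):
    """Stand-in for the project's own exception class."""


class ConfigType(Enum):
    """Assumed set of supported config extensions."""

    JSON = "json"
    YAML = "yaml"
    TOML = "toml"


def config_type_from_path(config_filepath):
    """Map a file extension to a ConfigType with a project-level error."""
    extension = Path(config_filepath).name.split(".")[-1]
    try:
        return ConfigType(extension)
    except ValueError as err:
        # Enum lookup by value raises ValueError for unknown extensions
        supported = ", ".join(f".{member.value}" for member in ConfigType)
        msg = (
            f"Unsupported config extension {extension!r} for file "
            f"{config_filepath}. Supported extensions: {supported}"
        )
        raise COMPASSValueError(msg) from err
```

Chaining with `from err` keeps the original Enum error in the traceback while callers and tests only need to handle the documented exception type.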


1 participant