Skip to content

Conversation

@twiggler
Copy link
Contributor

@twiggler twiggler commented Dec 2, 2025

No description provided.

@codspeed-hq
Copy link

codspeed-hq bot commented Dec 2, 2025

Merging this PR will degrade performance by 17.71%

❌ 2 regressed benchmarks
✅ 9 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Benchmark BASE HEAD Efficiency
test_benchmark[_data/loaders/acquire/test-windows-fs-c.zip-ZipLoader] 1.9 ms 2.3 ms -16.12%
test_benchmark[_data/loaders/acquire/test-windows-fs-c-relative.tar-TarLoader] 1.8 ms 2.2 ms -17.71%

Comparing poc-loader-args (beb1b9a) with main (4e9381b)

Open in CodSpeed


# The loader now knows how to print its own help.
# We just need to instantiate it. Note that the path is not important here.
loader_instance = loader_cls(pathlib.Path())
Copy link
Contributor Author

@twiggler twiggler Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing a dummy path is a bit dubious.

generate_argparse_for_plugin_class

def map(self, target: Target) -> None:
"""Wrapper around the _map function that handles argument passing."""
# The loader now parses its own arguments
loader_options = self._parser.parse_args(self.loader_args)
Copy link
Contributor Author

@twiggler twiggler Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of rest arguments, use parse_known_args



def open_targets(args: argparse.Namespace) -> Iterator[Target]:
def open_targets(args: argparse.Namespace, loader_args: list[str] | None = None) -> Iterator[Target]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative: use loader namespace

@twiggler twiggler requested a review from Miauwkeru January 8, 2026 09:06
Comment on lines 70 to 71
self, path: Path, *, parsed_path: urllib.parse.ParseResult | None = None, resolve: bool = True,
**kwargs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self, path: Path, *, parsed_path: urllib.parse.ParseResult | None = None, resolve: bool = True,
**kwargs
self, path: Path, *, parsed_path: urllib.parse.ParseResult | None = None, resolve: bool = True, **kwargs

Comment on lines 434 to 438
elif not path.is_dir():
raise TargetError(f"Failed to find loader for {path}: path is not a directory")

for entry in path.iterdir():
for target in _open_all(entry, include_children=include_children):
at_least_one_loaded = True
yield target
if path.is_dir():
for entry in path.iterdir():
for target in _open_all(entry, include_children=include_children):
at_least_one_loaded = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted

# Pass the parsed options directly to the implementation map function
return self._map(target, **vars(loader_options))

def _map(self) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _map(self) -> None:
def _map(self, **kwargs) -> None:

Comment on lines -60 to -61
force_dirfs = "force-directory-fs" in self.parsed_query
fallback_to_dirfs = "fallback-to-directory-fs" in self.parsed_query
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would force this new method of doing these command line arguments where acquire will stop functioning properly as it uses these query parameters. https://github.com/fox-it/acquire/blob/c109fe4f5e642078931fc8c3d80ab7f39e1c496f/acquire/acquire.py#L2433

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep both approaches if we combine the force_directory_fs argument with the one from self.parsed_query.

However, I also think it is a bit weird to apply "stringified" arguments when using target from acquire. Perhaps we can get rid of this URI scheme altogether. The question then becomes how to pass arguments to the loader programmatically.

- Do not instantiate full instance of loader when displaying help
- Process all known arguments
# Pass the parsed options directly to the implementation map function
return self._map(target, **vars(loader_options))

def _map(self, **kwargs) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _map(self, **kwargs) -> None:
def _map(self, target: Target, *, **kwargs) -> None:

After looking at it again, maybe like this instead as a target always gets passed down and we don't want any further positional arguments as far as I know

# Do not continue processing, we requested a child but got none
raise

return target
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably a good idea to just remove your changes from this specific file

def _create_parser(cls: type[Loader]) -> argparse.ArgumentParser:
"""Creates the argument parser for this loader."""
parser = argparse.ArgumentParser(
prog=f"loader:{cls.__name__.lower()}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to keep it in line with how the help is done for functions

E.g, like:

$ target-query -f activitiescache -h
usage: target-query -f activitiescache [-h]

as using -h for the loader will now not give any information on how to use it with the tool:

$ target-query -L local -h
usage: loader:localloader [--force-directory-fs] [--fallback-to-directory-fs]

which is a bit odd for a user perspective

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment, can be deferred when we implement it IMO

Comment on lines +109 to +115
if not args.loader:
parser.print_help()
return 0

if (loader_cls := LOADERS_BY_SCHEME.get(args.loader, None)) is None:
print(f"Error: Loader '{args.loader}' not found.")
return 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving this check to e.g. process generic arguments function in tools/utils/cli.py will add this behaviour to all target-* tooling instead of just query

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do this in the proper implementation

dict(urllib.parse.parse_qsl(parsed_path.query, keep_blank_values=True)) if parsed_path else {}
)

self._parser = self._create_parser(self.__class__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still not quite sure we want to create the parser in the __init__ of the loader. As it feels (to me) that the argument parser should be created / handled before the loader is initiated.

Copy link
Contributor Author

@twiggler twiggler Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the previous objection was that showing the help message required instantiation of the loader, which is no longer the case?

From an architectural standpoint, parsing arguments typically is done in a different (outermost) layer of a program. Is that the reason you want it done before the loader is initiated?

I don't necessarily disagree with that, but plugin arguments are also processed pretty "late":

known_args, _ = parser.parse_known_args(args)

Moreover, originally the parsing of URI arguments was also done "late":
[target.parsed_path = urllib.parse.urlparse(str(target.path))](

target.parsed_path = urllib.parse.urlparse(str(target.path))
)

TLDR I think the poc is in line what is already there, while simultaneously agreeing it is architecturally unsound. If we change it, I propose we moving the parsing of arguments to the outermost layer of the program for all cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants