Skip to content

Conversation

@qmadev
Copy link
Contributor

@qmadev qmadev commented Aug 16, 2025

closes #139

Testing on VMs with IIS, Nginx and Caddy seems to work fine with an adjusted version of dissect.target #1287.

- Collecting file C:/Windows/System32/LogFiles/HTTPERR/httperr1.log to: fs/C:/Windows/System32/LogFiles/HTTPERR/httperr1.log
- Collecting file C:/Windows/System32/LogFiles/HTTPERR/httperr1.log succeeded

- Collecting file /var/caddy/access.json to: fs/$rootfs$/var/caddy/access.json
- Collecting file /var/caddy/access.json succeeded
- Collecting file /root/access.log to: fs/$rootfs$/root/access.log
- Collecting file /root/access.log succeeded

The issue specifies that tests should be written. Do we still want that, even with the tests that are already there in dissect.target?

@qmadev qmadev changed the title Fix/webserver collection Extend webserver collection Aug 16, 2025
@qmadev
Copy link
Contributor Author

qmadev commented Aug 16, 2025

What version should I put at line 873? See question marks.

@qmadev qmadev requested a review from Schamper August 18, 2025 16:31
@qmadev
Copy link
Contributor Author

qmadev commented Aug 25, 2025

Any news on this?

@qmadev
Copy link
Contributor Author

qmadev commented Sep 2, 2025

@Schamper do you have some time to take a second look at this?

@Schamper
Copy link
Member

Schamper commented Sep 2, 2025

I'd like @twiggler to have an opinion too.

@qmadev
Copy link
Contributor Author

qmadev commented Sep 10, 2025

@twiggler do you have some time to review this?

@qmadev
Copy link
Contributor Author

qmadev commented Oct 20, 2025

@Schamper
@twiggler

Just a friendly reminder.

@twiggler
Copy link
Contributor

@Schamper @twiggler

Just a friendly reminder.

Sorry @qmadev this slipped through the cracks, I will take a look this week

@twiggler
Copy link
Contributor

@Schamper @qmadev

One detail I don't like is that we are invoking a private method in target from acquire. I think this is a design oversight in the implementation of fox-it/dissect.target#1082. Clearly, _get_paths should be made public by removing the underscore. However, this causes a name-clash with the get_paths defined in the Plugin class (template method pattern).

Perhaps we should rename _get_paths to get_paths_from_target (contrast with _get_paths_direct).

Otherwise it looks fine!

@Schamper
Copy link
Member

I think this is a design oversight in the implementation of fox-it/dissect.target#1082.

Not necessarily. fox-it/dissect.target#1082 (review) goes into more detail here:

This method still doesn't quite cover "getting all paths that are relevant to a plugin", since it's only really meant for "paths the plugin will parse for artefacts", and not configuration paths. So we can't quite 1:1 use this method to collect files with acquire. I think we should focus on this approach first, but perhaps in a next stage we can think of how to tackle that, perhaps a _get_aux_paths() to be implemented by plugins, and a Plugin.get_all_paths to be consumed by acquire?

@twiggler
Copy link
Contributor

I think this is a design oversight in the implementation of fox-it/dissect.target#1082.

Not necessarily. fox-it/dissect.target#1082 (review) goes into more detail here:

This method still doesn't quite cover "getting all paths that are relevant to a plugin", since it's only really meant for "paths the plugin will parse for artefacts", and not configuration paths. So we can't quite 1:1 use this method to collect files with acquire. I think we should focus on this approach first, but perhaps in a next stage we can think of how to tackle that, perhaps a _get_aux_paths() to be implemented by plugins, and a Plugin.get_all_paths to be consumed by acquire?

Ah yes, we decided to defer.

Although I don´t see any config paths yet in the companion target PR, I think it would indeed be a good idea to introduce Plugin::get_all_paths already, which can delegate to Plugin::_get_paths for the time being, if it exists in the subclass. That way, we have the API in place, otherwise contributors might be inclined to keep using Plugin::_get_paths

@qmadev
Copy link
Contributor Author

qmadev commented Oct 24, 2025

To clarify, you want me to create a function Plugin.get_all_paths() that will return a collection/iterator something containing the paths to the config files and the (resolved) log files? And you want acquire to use that function instead of the _get_paths() function?

I do think the naming is confusing. Maybe something like
Plugin._get_log_paths()
Plugin._get_config_paths()
Plugin.get_all_paths()

would be clearer?

@twiggler
Copy link
Contributor

twiggler commented Oct 24, 2025

To clarify, you want me to create a function Plugin.get_all_paths() that will return a collection/iterator something containing the paths to the config files and the (resolved) log files? And you want acquire to use that function instead of the _get_paths() function?

Yes. Although I think we can leave the config_file part, because we don´t need that yet (I don´t see them in fox-it/dissect.target#1287)

The implementation of Plugin::get_all_paths() indeed invokes Plugin::_get_paths, while handling the possible exception when it is not implemented by yielding nothing. We could also write a default implementation for Plugin::_get_paths which returns nothing.

You can then remove the

 if not hasattr(subclass, "_get_paths"):
                continue

snippet, and indeed invoke Plugin::get_all_paths() from acquire.

The implementations of Plugin::get_all_paths is likely to change in the future, but the idea is that the interface remains the same.

I agree that the naming of the functions is not ideal.
_get_paths -> _get_log_paths seems fine to me,
but I think that _get_paths_direct should then also be renamed to _get_direct_log_paths or similar.

Actually, this is a bit of a contentious issue because I would like to keep the direct files feature restricted to these plugin, primarily because I don´t like how it works :). I would advocate for users to run parsers directly on log files without the target machinery.

Next week I am away, these are my 2 cents, Schamper can take over in my absence.

@Schamper
Copy link
Member

Schamper commented Oct 24, 2025

I agree that the naming of the functions is not ideal.
_get_paths -> _get_log_paths seems fine to me,
but I think that _get_paths_direct should then also be renamed to _get_direct_log_paths or similar.

Only a very small amount of plugins work on "logs", so the name is fine in my opinion. Most artifacts are not "logs".

Plugin._get_config_paths()

And not every "other" path is a config.

Actually, this is a bit of a contentious issue because I would like to keep the direct files feature restricted to these plugin, primarily because I don´t like how it works :). I would advocate for users to run parsers directly on log files without the target machinery.

That doesn't change how a Plugin is supposed to report what files it requires.

@twiggler
Copy link
Contributor

twiggler commented Oct 24, 2025

That doesn't change how a Plugin is supposed to report what files it requires.

Correct, because that is handled by Plugin.get_all_paths()

I don´t feel particularly strong about _get_paths or _get_log_paths; my only point is not to couple with module internals of dissect.target in acquire

@qmadev
Copy link
Contributor Author

qmadev commented Oct 24, 2025

I see these functions in dissect.target

    def get_paths(self) -> Iterator[Path]:
        if self.target.is_direct:
            yield from self._get_paths_direct()
        else:
            yield from self._get_paths()

    def _get_paths_direct(self) -> Iterator[Path]:
        """Return all paths as given by the user."""
        for path in self.target._loader.paths:
            yield self.target.fs.path(str(path))

    def _get_paths(self) -> Iterator[Path]:
        """Return all files of interest to the plugin.

        To be implemented by the plugin subclass.
        """
        raise NotImplementedError

Why do we need to create a new function get_all_paths() if there already is a get_paths() that uses _get_paths()?

@Schamper
Copy link
Member

@qmadev get_paths would refer to the "artifact paths". I.e. an actual log file or database to parse. But often you need more contextual files, such as webserver configuration files. We might call this "auxiliary paths". And then a get_all_paths would encompass both "artifact paths" and "auxiliary paths".

It's all very confusing but I've not had a better idea yet.

@qmadev
Copy link
Contributor Author

qmadev commented Oct 24, 2025

fox-it/dissect.target#1287

Is this what you guys meant?

All seems to work fine. Config files get collected as well.

Copy link
Contributor

@twiggler twiggler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed this is how we envisioned it.

@qmadev
Copy link
Contributor Author

qmadev commented Nov 19, 2025

@twiggler comments here and in dissect.target should be resolved!

@qmadev qmadev requested review from Schamper and twiggler November 23, 2025 23:31
@codecov
Copy link

codecov bot commented Nov 27, 2025

Codecov Report

❌ Patch coverage is 50.00000% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 44.94%. Comparing base (9a1e62f) to head (e06ac42).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
acquire/acquire.py 50.00% 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #264      +/-   ##
==========================================
+ Coverage   44.90%   44.94%   +0.03%     
==========================================
  Files          26       26              
  Lines        3543     3558      +15     
==========================================
+ Hits         1591     1599       +8     
- Misses       1952     1959       +7     
Flag Coverage Δ
unittests 44.94% <50.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Schamper Schamper dismissed their stale review November 27, 2025 16:41

Dismissed.

@Schamper Schamper removed their request for review November 27, 2025 16:42
@twiggler twiggler force-pushed the fix/webserver_collection branch from f2ae8b8 to 9224463 Compare December 1, 2025 09:03
@twiggler twiggler self-requested a review December 1, 2025 10:09
Copy link
Contributor

@twiggler twiggler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also remove the chain import

DeprecationWarning,
stacklevel=2,
)
return WebserverLog.get_spec_additions(cls, target, cli_args)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return WebserverLog.get_spec_additions(cls, target, cli_args)
return Webserver.get_spec_additions(cls, target, cli_args)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed!

@Miauwkeru
Copy link
Contributor

don't forget to change the dissect.target dependency inside the pyproject.toml file

diff --git a/pyproject.toml b/pyproject.toml
index 36d7e33..9032ab7 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -26,7 +26,7 @@ classifiers = [
 ]
 dependencies = [
     "dissect.cstruct>=4,<5",
-    "dissect.target>=3.24,<4",
+    "dissect.target>=3.25.dev,<4",  # TODO: update on release
 ]
 dynamic = ["version"]
 
@@ -47,7 +47,7 @@ full = [
 dev = [
     "acquire[full]",
     "dissect.cstruct>=4.0.dev,<5.0.dev",
-    "dissect.target[dev]>=3.24.dev,<4.0.dev",
+    "dissect.target[dev]>=3.25.dev,<4.0.dev",
 ]
 
 [dependency-groups]

@qmadev
Copy link
Contributor Author

qmadev commented Dec 2, 2025

Will fix this Thursday.

@qmadev qmadev requested a review from twiggler December 4, 2025 15:41
@twiggler twiggler merged commit 8690319 into fox-it:main Dec 9, 2025
21 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Collect all nginx&apache logs in Acquire

4 participants