Run distributed query/carve based on custom tags #529

Open
@zhuoyuan-liu

Description

@zhuoyuan-liu
Contributor

We just explored osctrl-admin and found that we can add a custom tag to each node/device. However, after adding a custom tag, we cannot run a distributed query based on it. It would be a great help if we could also run queries based on these tags.

I would like to contribute to this feature, but I would like to know more details about the implementation.

According to the architecture definition, osctrl-admin should only talk to osctrl-api and never to the database directly. However, I found that osctrl-admin interacts with the DB directly in many cases. I am completely fine with that implementation, but I want to confirm whether the changes for this feature are allowed to do the same.

From the source code, I can see that query targeting is currently based on four types of targets: env, platform, UUID and localname. I guess the easiest solution is to add an extra field so that we can pass custom tags as well. What do you think?
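
To make the "extra field" idea concrete, here is a minimal Python sketch. It is not osctrl code: the `Node` and `QueryTarget` types, their field names, and the `select_nodes` helper are hypothetical, and the rules that selectors combine with AND and that a node must carry every requested tag are assumptions made only for illustration. Only the four existing target types (env, platform, UUID, localname) come from the description above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """Hypothetical view of an enrolled node; field names are illustrative."""
    uuid: str
    localname: str
    env: str
    platform: str
    tags: frozenset = frozenset()


@dataclass
class QueryTarget:
    """Hypothetical query target: the four existing selectors plus tags."""
    envs: set = field(default_factory=set)
    platforms: set = field(default_factory=set)
    uuids: set = field(default_factory=set)
    localnames: set = field(default_factory=set)
    tags: set = field(default_factory=set)  # the proposed extra field

    def matches(self, node: Node) -> bool:
        # An empty selector places no restriction on that attribute.
        if self.envs and node.env not in self.envs:
            return False
        if self.platforms and node.platform not in self.platforms:
            return False
        if self.uuids and node.uuid not in self.uuids:
            return False
        if self.localnames and node.localname not in self.localnames:
            return False
        # Assumption: a node must carry every requested tag.
        return self.tags <= node.tags


def select_nodes(nodes, target):
    """Return the nodes that satisfy every non-empty selector of the target."""
    return [n for n in nodes if target.matches(n)]
```

With this shape, the existing selectors keep working unchanged when `tags` is left empty, which is what makes the extra-field approach quick to add.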

Activity

self-assigned this
on Oct 11, 2024
javuto

javuto commented on Oct 11, 2024

@javuto
Collaborator

This is something I have planned to implement ever since I added tags, not only for distributed queries but also for file carves (which are technically a type of distributed query); see #76 and #77.
I see two different implementations that can be done:

  1. Add a new field for tags to the existing implementation. This is faster to implement, but it will add to the potential performance issues on the backend.
  2. Completely reimplement how distributed queries work. This will take longer, but it removes the potential backend performance issues.
changed the title from "Run distributed query based on custom tags" to "Run distributed query/carve based on custom tags" on Oct 11, 2024
zhuoyuan-liu

zhuoyuan-liu commented on Oct 14, 2024

@zhuoyuan-liu
Contributor, Author

Hi @javuto, I have the following idea using Redis:

  • When creating a distributed query, we find all target nodes based on the tags.
  • For each target node, create a Redis set keyed by the node UUID and add the query ID to it. Redis allows fast lookups of all active tasks for a client, using operations like SMEMBERS to retrieve the tasks associated with that client.
  • When a node finishes a query and sends the results back, mark that query as completed by removing it from the node's active task set with SREM (set remove). This ensures that the next time the client asks for queries, only unfinished queries are returned.
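
The three steps above can be sketched as follows. This is an in-memory stand-in written in Python so that it runs without a Redis server; the class name `ActiveQueryIndex` and its method names are hypothetical, and each method notes the Redis command it is meant to mirror (SADD, SMEMBERS, SREM).

```python
from collections import defaultdict


class ActiveQueryIndex:
    """In-memory stand-in for the per-node Redis sets described above."""

    def __init__(self):
        # node UUID -> set of query IDs still pending for that node
        self._by_node = defaultdict(set)

    def create_query(self, query_id, target_uuids):
        # Mirrors one `SADD <node-uuid> <query-id>` per target node.
        for uuid in target_uuids:
            self._by_node[uuid].add(query_id)

    def pending_for(self, uuid):
        # Mirrors `SMEMBERS <node-uuid>`; returns a copy of the pending set.
        return set(self._by_node.get(uuid, ()))

    def complete(self, uuid, query_id):
        # Mirrors `SREM <node-uuid> <query-id>`; True if something was removed.
        pending = self._by_node.get(uuid)
        if not pending or query_id not in pending:
            return False
        pending.remove(query_id)
        if not pending:
            # Redis also drops a set key once its last member is removed.
            del self._by_node[uuid]
        return True
```

A node that asks for work calls `pending_for(uuid)`, and the result handler calls `complete(uuid, query_id)`, so neither path needs a database round trip in this model.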

I think that is enough for us, but if you want to actively track how many nodes are still unfinished, we can create another Redis set that holds the unfinished nodes for each query, or just query the logs returned by nodes.
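
The optional per-query tracking could look roughly like this. Again this is an in-memory Python model rather than real Redis calls, and `QueryProgress` with its method names is hypothetical; the set kept per query ID plays the role of the extra Redis set, and `remaining` corresponds to what SCARD would return for it.

```python
class QueryProgress:
    """In-memory model of one 'unfinished nodes' set per query."""

    def __init__(self):
        # query ID -> node UUIDs that have not reported results yet
        self._pending_nodes = {}

    def start(self, query_id, target_uuids):
        # Mirrors SADD of every target node UUID into the query's set.
        self._pending_nodes[query_id] = set(target_uuids)

    def node_finished(self, query_id, uuid):
        # Mirrors SREM; removing an absent or repeated member is a no-op.
        self._pending_nodes.get(query_id, set()).discard(uuid)

    def remaining(self, query_id):
        # Mirrors SCARD: number of nodes that still owe results.
        return len(self._pending_nodes.get(query_id, ()))

    def is_complete(self, query_id):
        # Only a query that was started can be reported as complete.
        return query_id in self._pending_nodes and self.remaining(query_id) == 0
```

Because membership is a set, a node that reports twice does not skew the count, which is one advantage over incrementing a counter per result.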

Benefits:

  • Avoids heavy database reads and writes. Until now, each read request has had to go through the DB to find the list of distributed queries for the target node, and each write request has had to update a counter in the DB for each distributed query.
  • Reduces latency for distributed read requests, since the DB query is replaced by a Redis set lookup.
zhuoyuan-liu

zhuoyuan-liu commented on Nov 5, 2024

@zhuoyuan-liu
Contributor, Author

With the change in #558, it will be much easier to implement queries based on additional tags without a significant performance impact.

Labels: feature request, osctrl-admin, osctrl-tls, queries
