Run distributed query/carve based on custom tags #529

Open
zhuoyuan-liu opened this issue Oct 11, 2024 · 3 comments
Assignees
Labels
osctrl-admin osctrl-admin related changes osctrl-tls osctrl-tls related changes 🙏 feature request Request for new feature queries On-demand queries related issues

Comments

@zhuoyuan-liu
Contributor

We just explored osctrl-admin and found that we can add a custom tag to each node/device. However, after adding a custom tag, we cannot run a distributed query based on it. It would be a great help if we could also run queries based on these tags.

I would like to contribute to this feature, but I would like to know more details about the implementation.

According to the architecture definition, osctrl-admin should only talk to osctrl-api rather than to the database directly. However, I found that osctrl-admin interacts with the DB directly in many cases. I am completely fine with implementing it that way, but I want to confirm that the rest of the changes are allowed to do the same.
image

From the source code, I can see that query targeting is currently based on four types of targets: env, platform, UUID and localname. I guess the easiest solution is to add an extra field so that we can pass the custom tags. What do you think?
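
To make the "extra field" idea concrete, here is a minimal Go sketch. The `Node` and `QueryTargets` types and the `Matches` method are hypothetical names made up for illustration and are not taken from the osctrl source. It also assumes union semantics (a node is selected if any target matches), which may differ from how osctrl actually combines targets.

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of an enrolled node.
type Node struct {
	UUID        string
	Localname   string
	Environment string
	Platform    string
	Tags        []string
}

// QueryTargets is a hypothetical selector set: the four existing target
// types mentioned in the issue plus the proposed Tags field.
type QueryTargets struct {
	Environments []string
	Platforms    []string
	UUIDs        []string
	Localnames   []string
	Tags         []string // proposed addition
}

// contains reports whether list holds value.
func contains(list []string, value string) bool {
	for _, v := range list {
		if v == value {
			return true
		}
	}
	return false
}

// Matches reports whether the node is selected by at least one target.
// Union semantics are an assumption made for this sketch.
func (t QueryTargets) Matches(n Node) bool {
	if contains(t.Environments, n.Environment) ||
		contains(t.Platforms, n.Platform) ||
		contains(t.UUIDs, n.UUID) ||
		contains(t.Localnames, n.Localname) {
		return true
	}
	for _, tag := range n.Tags {
		if contains(t.Tags, tag) {
			return true
		}
	}
	return false
}

func main() {
	nodes := []Node{
		{UUID: "uuid-1", Localname: "web-01", Environment: "prod", Platform: "ubuntu", Tags: []string{"finance"}},
		{UUID: "uuid-2", Localname: "db-01", Environment: "dev", Platform: "darwin"},
	}
	targets := QueryTargets{Tags: []string{"finance"}}
	for _, n := range nodes {
		fmt.Println(n.UUID, targets.Matches(n))
	}
}
```

With only a tag target set, the first node is selected and the second is not; the existing four selectors keep working unchanged.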

@javuto javuto self-assigned this Oct 11, 2024
@javuto javuto added osctrl-tls osctrl-tls related changes osctrl-admin osctrl-admin related changes queries On-demand queries related issues 🙏 feature request Request for new feature labels Oct 11, 2024
@javuto
Collaborator

javuto commented Oct 11, 2024

This is something that I had planned to implement since I added tags, not only for distributed queries but for file carves as well (they are technically a type of distributed query), see #76 and #77
I see two different implementations that can be done:

  1. Add a new field for tags to the existing implementation - It will be faster to implement, but it will add to the potential performance issues in the backend.
  2. Completely reimplement how distributed queries work - It will take longer, but it removes the potential backend performance issues.

This was referenced Oct 11, 2024
@javuto javuto changed the title Run distributed query based on custom tags Run distributed query/carve based on custom tags Oct 11, 2024
@zhuoyuan-liu
Contributor Author

zhuoyuan-liu commented Oct 14, 2024

Hi @javuto, I have the following idea using Redis:

  • When creating a distributed query, we find all target nodes based on the tags
  • For each target node, create a Redis set keyed by the node UUID and add the query ID to it. Redis allows fast lookups of all active tasks for a client, using operations like SMEMBERS to retrieve the tasks associated with that client.
  • When nodes finish queries and send results back, mark the corresponding queries completed by removing them from the active task set using SREM (set remove). This ensures that the next time the client asks for queries, only unfinished queries will be returned.
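
The flow in the three steps above can be sketched in Go as follows. To keep the example runnable without a Redis server, `SetStore` is an in-memory stand-in whose methods mirror the SADD, SMEMBERS and SREM commands; the `pending:<uuid>` key layout and the `Dispatch`, `Pending` and `Complete` helpers are hypothetical names, not existing osctrl code.

```go
package main

import (
	"fmt"
	"sort"
)

// SetStore is an in-memory stand-in for Redis sets, used so this sketch
// runs without a server. Its methods mirror SADD, SMEMBERS and SREM.
type SetStore map[string]map[string]struct{}

// SAdd adds member to the set at key, creating the set if needed.
func (s SetStore) SAdd(key, member string) {
	if s[key] == nil {
		s[key] = make(map[string]struct{})
	}
	s[key][member] = struct{}{}
}

// SMembers returns the members of the set at key, sorted for stable output.
// Redis itself returns set members in no particular order.
func (s SetStore) SMembers(key string) []string {
	members := make([]string, 0, len(s[key]))
	for m := range s[key] {
		members = append(members, m)
	}
	sort.Strings(members)
	return members
}

// SRem removes member from the set at key and drops the key once the set
// is empty, which is also how Redis treats empty sets.
func (s SetStore) SRem(key, member string) {
	delete(s[key], member)
	if len(s[key]) == 0 {
		delete(s, key)
	}
}

// pendingKey is a hypothetical key layout: one set per node UUID.
func pendingKey(uuid string) string { return "pending:" + uuid }

// Dispatch records queryID as pending for every targeted node.
func Dispatch(s SetStore, queryID string, nodeUUIDs []string) {
	for _, uuid := range nodeUUIDs {
		s.SAdd(pendingKey(uuid), queryID)
	}
}

// Pending returns the queries a node still has to run.
func Pending(s SetStore, uuid string) []string {
	return s.SMembers(pendingKey(uuid))
}

// Complete marks queryID as finished for one node.
func Complete(s SetStore, uuid, queryID string) {
	s.SRem(pendingKey(uuid), queryID)
}

func main() {
	store := SetStore{}
	Dispatch(store, "q1", []string{"uuid-1", "uuid-2"})
	Dispatch(store, "q2", []string{"uuid-1"})
	fmt.Println(Pending(store, "uuid-1")) // [q1 q2]
	Complete(store, "uuid-1", "q1")
	fmt.Println(Pending(store, "uuid-1")) // [q2]
}
```

In a real deployment the three methods would be replaced by calls to a Redis client; the point of the sketch is only that the per-node read becomes a single set lookup.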

I think that's enough for us, but if you want to actively track how many nodes are unfinished, we can create another Redis set that maintains the list of unfinished nodes for each query, or just query the logs returned by nodes.
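
As a rough illustration of the optional second set, the sketch below keeps both directions of the relationship: node to pending queries, and query to unfinished nodes. It uses in-memory maps as a stand-in for Redis sets, and `Tracker`, `Dispatch`, `Complete` and `Remaining` are hypothetical names. With Redis, `Remaining` would correspond to an SCARD on the per-query set.

```go
package main

import "fmt"

// index maps a key to a set of members; it stands in for Redis sets here.
type index map[string]map[string]struct{}

// add inserts member into the set at key, creating the set if needed.
func (i index) add(key, member string) {
	if i[key] == nil {
		i[key] = make(map[string]struct{})
	}
	i[key][member] = struct{}{}
}

// discard removes member from the set at key and drops empty sets.
func (i index) discard(key, member string) {
	delete(i[key], member)
	if len(i[key]) == 0 {
		delete(i, key)
	}
}

// Tracker keeps both directions of the node/query relationship.
type Tracker struct {
	pending    index // node UUID -> query IDs the node still has to run
	unfinished index // query ID -> node UUIDs that have not reported back
}

// NewTracker returns an empty Tracker.
func NewTracker() *Tracker {
	return &Tracker{pending: index{}, unfinished: index{}}
}

// Dispatch registers queryID for every targeted node in both indexes.
func (t *Tracker) Dispatch(queryID string, nodeUUIDs []string) {
	for _, uuid := range nodeUUIDs {
		t.pending.add(uuid, queryID)
		t.unfinished.add(queryID, uuid)
	}
}

// Complete records that one node has returned results for queryID.
func (t *Tracker) Complete(uuid, queryID string) {
	t.pending.discard(uuid, queryID)
	t.unfinished.discard(queryID, uuid)
}

// Remaining returns how many nodes have not yet finished queryID.
func (t *Tracker) Remaining(queryID string) int {
	return len(t.unfinished[queryID])
}

func main() {
	t := NewTracker()
	t.Dispatch("q1", []string{"uuid-1", "uuid-2", "uuid-3"})
	t.Complete("uuid-2", "q1")
	fmt.Println(t.Remaining("q1")) // 2
}
```

The trade-off is one extra write per node on dispatch and on completion, in exchange for a constant-time progress count per query.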

Benefits:

  • Avoid massive database reads and writes. Previously, each read request had to go through the DB to find the list of distributed queries for the target node, and each write request had to update a counter in the DB for each distributed query.
  • Reduce latency for distributed read requests, since the DB query is replaced by a Redis set query.

@zhuoyuan-liu
Contributor Author

With the change in #558, it would be much easier to implement queries based on additional tags without a significant performance impact.
