LogsDB: Add synthetic_source_keep = none to arrays where order/duplicates do not matter #2376

Open
andrewkroh opened this issue Sep 5, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

Member

andrewkroh commented Sep 5, 2024

For array fields treated as unordered sets, we should add synthetic_source_keep: "none" to the mappings to optimize storage under LogsDB. Fields like host.ip and related.ip would be candidates because order and duplicates are irrelevant.

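Adding this option tells Elasticsearch not to store the original array separately for reconstructing _source. Synthetic _source is then rebuilt from the indexed values, so the original order and any duplicates may not be preserved.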

Support for this is in progress in Elasticsearch and will first be available in 8.16.
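For illustration, here is a minimal sketch of what the resulting mappings could look like for the two candidate fields named above. The parameter name and value are taken from this issue. The nesting and the helper name ip_set_mapping are assumptions made for the example, not the output of any existing ECS tooling.

```python
import json


def ip_set_mapping() -> dict:
    """Mapping for an ip field treated as an unordered set (hypothetical helper)."""
    # synthetic_source_keep: "none" is the parameter proposed in this issue.
    return {"type": "ip", "synthetic_source_keep": "none"}


# Assumed layout: host.ip and related.ip expressed as nested object properties.
mappings = {
    "properties": {
        "host": {"properties": {"ip": ip_set_mapping()}},
        "related": {"properties": {"ip": ip_set_mapping()}},
    }
}

print(json.dumps(mappings, indent=2))
```

Only fields where neither order nor duplicates carry meaning should get the parameter. Other array fields would keep their current mappings.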


Member Author

A first step that can be taken here is to add support to the ECS repo for expressing which fields are unordered sets. This can be done before Elasticsearch has the synthetic_source_keep: "none" mapping parameter. Once Elasticsearch has it, we can update the generators to output Elasticsearch mappings containing the parameter.

I would like to begin the process of annotating the fields that can receive this optimization, but we need support in the schema/*.yml files first.
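As a rough sketch of the two-step idea, the snippet below assumes a hypothetical schema attribute named unordered_set on a field definition and shows how a generator might translate it into the mapping parameter. The attribute name, the field dictionaries, and the function field_to_mapping are all invented for illustration. The actual schema syntax and generator changes are still to be decided.

```python
from typing import Any, Dict, List

# Hypothetical attribute name; the ECS schema does not define it as of this issue.
UNORDERED_SET_KEY = "unordered_set"


def field_to_mapping(field: Dict[str, Any]) -> Dict[str, Any]:
    """Build an Elasticsearch mapping entry from a simplified field definition."""
    mapping: Dict[str, Any] = {"type": field["type"]}
    # Emit the parameter only for fields annotated as unordered sets.
    if field.get(UNORDERED_SET_KEY, False):
        mapping["synthetic_source_keep"] = "none"
    return mapping


# Simplified stand-ins for entries parsed from schema/*.yml files.
fields: List[Dict[str, Any]] = [
    {"name": "host.ip", "type": "ip", UNORDERED_SET_KEY: True},
    {"name": "related.ip", "type": "ip", UNORDERED_SET_KEY: True},
    {"name": "process.args", "type": "keyword"},  # argument order matters, so not annotated
]

generated = {f["name"]: field_to_mapping(f) for f in fields}
```

Keeping the annotation in the schema and the parameter in the generator means fields can be annotated now, and the generator output can change once the Elasticsearch parameter is available.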
