Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sort generated data files #1352

Merged
merged 5 commits into from
Nov 12, 2024
Merged

Conversation

MichaelTimbert
Copy link

@MichaelTimbert MichaelTimbert commented May 31, 2024

Purpose

This PR improve generating data files by sorting entries in *.data files and ensuring JSON dump are also sorted.
The goal is to reduce the mess of deletions and insertions from git diff when t/*.data files are modified.

Context

Fixes #1265

How to test this PR

run ZONEMASTER_RECORD=1 make test to save new formated *.data file.

  • "t/*.data" file must be sorted
  • each "Zonemaster::LDNS::Packet" JSON dump must also be sorted

Notes

To improve visibility of change we can also activate indentation of JSON dump with $json->indent( 1 );.
This makes it easier to read the differences between JSON dump with vimdiff or git diff but makes *.data files less readable/compact.

@matsduf
Copy link
Contributor

matsduf commented May 31, 2024

* each field "data" of ""Zonemaster::LDNS::Packet" must start with 'AAC' to reflect the setting of ID field to 0, QR bit is set to 1 and MSB of OPCODE is 0.

What benefit does that give?

(...) but makes *.data files less readable/compact.

What are the backsides of that?

@matsduf matsduf requested a review from tgreenx May 31, 2024 12:57
@matsduf matsduf added this to the v2024.1 milestone May 31, 2024
@matsduf matsduf added the V-Patch Versioning: The change gives an update of patch in version. label May 31, 2024
@matsduf
Copy link
Contributor

matsduf commented May 31, 2024

Please give a more descriptive headline of the PR.

@MichaelTimbert MichaelTimbert changed the title 1265 sort generated data files for idempotence Jun 3, 2024
@MichaelTimbert
Copy link
Author

(...) but makes *.data files less readable/compact.

What are the backsides of that?

*.data files will no longer resemble log files where there is only one entry per line.
The main drawback is that we have to rewrite the *.data loading code.
After reflection, I think this is not a good idea. If we need to compare different versions of *.data files we can always make a small script for that.

@MichaelTimbert
Copy link
Author

* each field "data" of ""Zonemaster::LDNS::Packet" must start with 'AA' to reflect the setting of ID field to 0, QR bit is set to 1 and MSB of OPCODE is 0.

What benefit does that give?

To ensure the reproducibility of *.data files when doing a ZONEMASTER_RECORD=1 make test.
Otherwise, each time a new identifier will be used and the packet will be considered to have changed even if the content of the request is identical.

@matsduf
Copy link
Contributor

matsduf commented Jun 3, 2024

(...) but makes *.data files less readable/compact.

What are the backsides of that?

*.data files will no longer resemble log files where there is only one entry per line. The main drawback is that we have to rewrite the *.data loading code. After reflection, I think this is not a good idea. If we need to compare different versions of *.data files we can always make a small script for that.

I cannot remember that I have ever compare data files. Unless someone has pointed out that he/she really wants it I think it is better not to change. I assume you will remove that change.

@MichaelTimbert
Copy link
Author

MichaelTimbert commented Jun 3, 2024

(...) but makes *.data files less readable/compact.

What are the backsides of that?

*.data files will no longer resemble log files where there is only one entry per line. The main drawback is that we have to rewrite the *.data loading code. After reflection, I think this is not a good idea. If we need to compare different versions of *.data files we can always make a small script for that.

I cannot remember that I have ever compare data files. Unless someone has pointed out that he/she really wants it I think it is better not to change. I assume you will remove that change.

This is something i proposed, it is not implemented in this PR. (i have edited the PR's notes)

@mattias-p
Copy link
Member

With this approach we gain idempotence in the sense that the saved file is the same every time (provided that no code or configuration has changed). But we also lose idempotence in the sense that the original analysis sees the real IDs, but when we analyze a restored cache the analysis sees zeroed out IDs. In this light wouldn't it be more proper to clear the ID at response reception? I.e. before analysis and insertion into the cache.

I don't think we're ever looking at the ID fields in our built-in test modules, but since we want to support custom test modules, I believe it could make sense to have idempotence in both these ways.

@matsduf
Copy link
Contributor

matsduf commented Jun 3, 2024

I don't think we're ever looking at the ID fields in our built-in test modules, but since we want to support custom test modules, I believe it could make sense to have idempotence in both these ways.

The thing about Message ID is that if the response does not have the same Message ID as the query it is not a response to that query and should be ignored or match with another query.

@mattias-p
Copy link
Member

I don't think we're ever looking at the ID fields in our built-in test modules, but since we want to support custom test modules, I believe it could make sense to have idempotence in both these ways.

The thing about Message ID is that if the response does not have the same Message ID as the query it is not a response to that query and should be ignored or match with another query.

Yes, it wouldn't make much sense to clear the response's Message ID before it's matched to the request's Message ID. The ldns library takes care of this matching for us and just returns the correct response matching our request.

What I'm suggesting is that we clear the ID field in Zonemaster Engine or perhaps in Zonemaster LDNS after ldns returns it but before we add it to the cache.

@tgreenx tgreenx modified the milestones: v2024.1, v2024.2 Jun 11, 2024
@matsduf
Copy link
Contributor

matsduf commented Sep 4, 2024

I suggest that we discuss this at the F2F.

to ensure the idempotence of generated file we need to be sure that all keys of JSON exported data are sorted, especially for "Zonemaster::LDNS::Packet".
This reverts commit 0afe50e.
@MichaelTimbert MichaelTimbert changed the title sort generated data files for idempotence sort generated data files Nov 11, 2024
@matsduf
Copy link
Contributor

matsduf commented Nov 11, 2024

@MichaelTimbert, please merge.

Copy link
Contributor

@tgreenx tgreenx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that the description of this PR is also updated to match the latest changes

@MichaelTimbert MichaelTimbert merged commit 808889b into zonemaster:develop Nov 12, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
V-Patch Versioning: The change gives an update of patch in version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unit tests: improve idempotence of generating data files
4 participants