sort generated data files #1352

MichaelTimbert · 2024-05-31T12:49:03Z

Purpose

This PR improve generating data files by sorting entries in *.data files and ensuring JSON dump are also sorted.
The goal is to reduce the mess of deletions and insertions from git diff when t/*.data files are modified.

Context

Fixes #1265

How to test this PR

run ZONEMASTER_RECORD=1 make test to save new formated *.data file.

"t/*.data" file must be sorted
each "Zonemaster::LDNS::Packet" JSON dump must also be sorted

Notes

To improve visibility of change we can also activate indentation of JSON dump with $json->indent( 1 );.
This makes it easier to read the differences between JSON dump with vimdiff or git diff but makes *.data files less readable/compact.

matsduf · 2024-05-31T12:56:50Z

* each field "data" of ""Zonemaster::LDNS::Packet" must start with 'AAC' to reflect the setting of ID field to 0, QR bit is set to 1 and MSB of OPCODE is 0.

What benefit does that give?

(...) but makes *.data files less readable/compact.

What are the backsides of that?

matsduf · 2024-05-31T12:58:24Z

Please give a more descriptive headline of the PR.

MichaelTimbert · 2024-06-03T07:59:25Z

(...) but makes *.data files less readable/compact.

What are the backsides of that?

*.data files will no longer resemble log files where there is only one entry per line.
The main drawback is that we have to rewrite the *.data loading code.
After reflection, I think this is not a good idea. If we need to compare different versions of *.data files we can always make a small script for that.

MichaelTimbert · 2024-06-03T08:44:08Z

* each field "data" of ""Zonemaster::LDNS::Packet" must start with 'AA' to reflect the setting of ID field to 0, QR bit is set to 1 and MSB of OPCODE is 0.

What benefit does that give?

To ensure the reproducibility of *.data files when doing a ZONEMASTER_RECORD=1 make test.
Otherwise, each time a new identifier will be used and the packet will be considered to have changed even if the content of the request is identical.

matsduf · 2024-06-03T09:13:10Z

(...) but makes *.data files less readable/compact.

What are the backsides of that?

*.data files will no longer resemble log files where there is only one entry per line. The main drawback is that we have to rewrite the *.data loading code. After reflection, I think this is not a good idea. If we need to compare different versions of *.data files we can always make a small script for that.

I cannot remember that I have ever compare data files. Unless someone has pointed out that he/she really wants it I think it is better not to change. I assume you will remove that change.

MichaelTimbert · 2024-06-03T16:16:48Z

(...) but makes *.data files less readable/compact.

What are the backsides of that?

*.data files will no longer resemble log files where there is only one entry per line. The main drawback is that we have to rewrite the *.data loading code. After reflection, I think this is not a good idea. If we need to compare different versions of *.data files we can always make a small script for that.

I cannot remember that I have ever compare data files. Unless someone has pointed out that he/she really wants it I think it is better not to change. I assume you will remove that change.

This is something i proposed, it is not implemented in this PR. (i have edited the PR's notes)

mattias-p · 2024-06-03T18:22:28Z

With this approach we gain idempotence in the sense that the saved file is the same every time (provided that no code or configuration has changed). But we also lose idempotence in the sense that the original analysis sees the real IDs, but when we analyze a restored cache the analysis sees zeroed out IDs. In this light wouldn't it be more proper to clear the ID at response reception? I.e. before analysis and insertion into the cache.

I don't think we're ever looking at the ID fields in our built-in test modules, but since we want to support custom test modules, I believe it could make sense to have idempotence in both these ways.

matsduf · 2024-06-03T22:00:54Z

I don't think we're ever looking at the ID fields in our built-in test modules, but since we want to support custom test modules, I believe it could make sense to have idempotence in both these ways.

The thing about Message ID is that if the response does not have the same Message ID as the query it is not a response to that query and should be ignored or match with another query.

mattias-p · 2024-06-05T08:29:20Z

I don't think we're ever looking at the ID fields in our built-in test modules, but since we want to support custom test modules, I believe it could make sense to have idempotence in both these ways.

The thing about Message ID is that if the response does not have the same Message ID as the query it is not a response to that query and should be ignored or match with another query.

Yes, it wouldn't make much sense to clear the response's Message ID before it's matched to the request's Message ID. The ldns library takes care of this matching for us and just returns the correct response matching our request.

What I'm suggesting is that we clear the ID field in Zonemaster Engine or perhaps in Zonemaster LDNS after ldns returns it but before we add it to the cache.

matsduf · 2024-09-04T14:26:45Z

I suggest that we discuss this at the F2F.

to ensure the idempotence of generated file we need to be sure that all keys of JSON exported data are sorted, especially for "Zonemaster::LDNS::Packet".

This reverts commit 0afe50e.

matsduf · 2024-11-11T18:10:03Z

@MichaelTimbert, please merge.

tgreenx

I suggest that the description of this PR is also updated to match the latest changes

MichaelTimbert requested review from mattias-p, matsduf and marc-vanderwal May 31, 2024 12:49

MichaelTimbert linked an issue May 31, 2024 that may be closed by this pull request

Unit tests: improve idempotence of generating data files #1265

Open

matsduf requested a review from tgreenx May 31, 2024 12:57

matsduf added this to the v2024.1 milestone May 31, 2024

matsduf added the V-Patch Versioning: The change gives an update of patch in version. label May 31, 2024

MichaelTimbert changed the title ~~1265~~ sort generated data files for idempotence Jun 3, 2024

tgreenx modified the milestones: v2024.1, v2024.2 Jun 11, 2024

MichaelTimbert added 5 commits November 11, 2024 14:57

Sort cache entries before saving them

544c0ec

Activate canonical on JSON module

a7bfc3b

to ensure the idempotence of generated file we need to be sure that all keys of JSON exported data are sorted, especially for "Zonemaster::LDNS::Packet".

Set ID field to 0

e1ed8a0

Coding style

06a4ff0

Revert "Set ID field to 0"

17fde05

This reverts commit 0afe50e.

MichaelTimbert force-pushed the 1265 branch from 211e81c to 17fde05 Compare November 11, 2024 13:59

MichaelTimbert changed the title ~~sort generated data files for idempotence~~ sort generated data files Nov 11, 2024

matsduf approved these changes Nov 11, 2024

View reviewed changes

mattias-p approved these changes Nov 11, 2024

View reviewed changes

tgreenx approved these changes Nov 12, 2024

View reviewed changes

MichaelTimbert merged commit 808889b into zonemaster:develop Nov 12, 2024
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sort generated data files #1352

sort generated data files #1352

MichaelTimbert commented May 31, 2024 •

edited

Loading

matsduf commented May 31, 2024 •

edited by MichaelTimbert

Loading

matsduf commented May 31, 2024

MichaelTimbert commented Jun 3, 2024

MichaelTimbert commented Jun 3, 2024

matsduf commented Jun 3, 2024

MichaelTimbert commented Jun 3, 2024 •

edited

Loading

mattias-p commented Jun 3, 2024

matsduf commented Jun 3, 2024

mattias-p commented Jun 5, 2024

matsduf commented Sep 4, 2024

matsduf commented Nov 11, 2024

tgreenx left a comment •

edited

Loading

sort generated data files #1352

sort generated data files #1352

Conversation

MichaelTimbert commented May 31, 2024 • edited Loading

Purpose

Context

How to test this PR

Notes

matsduf commented May 31, 2024 • edited by MichaelTimbert Loading

matsduf commented May 31, 2024

MichaelTimbert commented Jun 3, 2024

MichaelTimbert commented Jun 3, 2024

matsduf commented Jun 3, 2024

MichaelTimbert commented Jun 3, 2024 • edited Loading

mattias-p commented Jun 3, 2024

matsduf commented Jun 3, 2024

mattias-p commented Jun 5, 2024

matsduf commented Sep 4, 2024

matsduf commented Nov 11, 2024

tgreenx left a comment • edited Loading

Choose a reason for hiding this comment

MichaelTimbert commented May 31, 2024 •

edited

Loading

matsduf commented May 31, 2024 •

edited by MichaelTimbert

Loading

MichaelTimbert commented Jun 3, 2024 •

edited

Loading

tgreenx left a comment •

edited

Loading