fix: lock-free counters for auth_zone up/down queries #1169

Merged
gthess merged 1 commit into NLnetLabs:master from lock-free-az-counters on Nov 19, 2024

Conversation

sakateka
Contributor

This is a simplified version of #1160.

@gthess
Member

gthess commented Nov 15, 2024

Hi @sakateka, sorry for the delayed response.

Removing the lock for authzones in #1160 is not a desired feature, because the lock is needed during operation to handle zone updates, with XFR for example. So I would like to close that PR if you don't mind :)

I like the idea of moving the authzone counters out of locks, but putting them in worker->stats is not the correct solution: when libunbound is used (the library, not the binary), there is no server worker that has stats, and that would eventually lead to a crash. I am talking about accessing counters through env->stats from iterator.c and authzone.c.

If you would like to rework the PR you could put the counters in mesh_area and you could follow the paradigm of mesh_area->ans_expired for example.
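
To make the reviewer's point concrete, here is a minimal sketch of the difference between the two locations. All struct and function names below are hypothetical stand-ins and not Unbound's real definitions. It assumes that the worker stats pointer is absent when running as a library, and that a per-thread mesh-like area exists in both modes.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not Unbound's real structures. */
struct sketch_worker_stats {
    unsigned long long num_query_authzone_up;
};

/* Per-thread area: only the owning thread writes these fields,
 * so plain increments need no lock. */
struct sketch_mesh {
    unsigned long long num_query_authzone_up;
    unsigned long long num_query_authzone_down;
};

struct sketch_env {
    struct sketch_worker_stats *worker_stats; /* assumed NULL in library mode */
    struct sketch_mesh *mesh;                 /* assumed present in both modes */
};

/* Counting through worker stats needs a guard.
 * Returns 1 if the query was counted, 0 if there is no worker. */
static int count_up_via_worker_stats(struct sketch_env *env)
{
    if (env->worker_stats == NULL)
        return 0; /* an unguarded dereference here would crash */
    env->worker_stats->num_query_authzone_up++;
    return 1;
}

/* Counting through the per-thread area works in both modes. */
static void count_up_via_mesh(struct sketch_env *env)
{
    env->mesh->num_query_authzone_up++;
}

static void count_down_via_mesh(struct sketch_env *env)
{
    env->mesh->num_query_authzone_down++;
}
```

In this sketch a library-mode environment simply has `worker_stats` set to `NULL`, so only the mesh-based helpers are safe to call without a check.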

@sakateka
Contributor Author

when libunbound is used (the library, not the binary), there is no server worker that has stats

Indeed, I missed it, thank you!
I'll rework the PR.

@sakateka sakateka force-pushed the lock-free-az-counters branch from 0cfb0c7 to 29c77d8 Compare November 15, 2024 20:00
@sakateka
Contributor Author

Hi @gthess, it seems I didn't mess up anything and did everything right this time.
Take a look at this version of the patch, please.

Member

@gthess gthess left a comment

Looking good!
Just a small change and this can be merged.

daemon/stats.c Outdated
Comment on lines 328 to 332
s->svr.num_query_authzone_up = (long long)worker->env.mesh->num_query_authzone_up;
s->svr.num_query_authzone_down = (long long)worker->env.mesh->num_query_authzone_down;
} else {
s->svr.num_query_authzone_up = 0;
s->svr.num_query_authzone_down = 0;
Member

Not strictly needed in this case, but for consistency with the other mesh-> values these should be accumulated with +=.
The else is not needed since the worker stats are initialized and reset based on configuration.

Suggested change
- s->svr.num_query_authzone_up = (long long)worker->env.mesh->num_query_authzone_up;
- s->svr.num_query_authzone_down = (long long)worker->env.mesh->num_query_authzone_down;
- } else {
- s->svr.num_query_authzone_up = 0;
- s->svr.num_query_authzone_down = 0;
+ s->svr.num_query_authzone_up += (long long)worker->env.mesh->num_query_authzone_up;
+ s->svr.num_query_authzone_down += (long long)worker->env.mesh->num_query_authzone_down;

Contributor Author

fixed

@sakateka sakateka force-pushed the lock-free-az-counters branch from 29c77d8 to 2c72a49 Compare November 19, 2024 15:55
Member

@gthess gthess left a comment

Thanks! Counting for auth_zone stats can now happen lock-free per thread, and the values can be aggregated for the stat commands.
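
The pattern described here (each thread counts into its own area, and a stats request sums the per-thread values) can be sketched roughly as follows. The types and function names are hypothetical and only illustrate the idea; this is not the code from daemon/stats.c. The accumulation uses `+=`, as in the review suggestion, so that values contributed by earlier threads are kept rather than overwritten.

```c
#include <stddef.h>

/* Hypothetical per-thread counters; each thread owns exactly one instance. */
struct sketch_thread_counters {
    unsigned long long num_query_authzone_up;
    unsigned long long num_query_authzone_down;
};

/* Hypothetical aggregated totals reported by a stats command. */
struct sketch_totals {
    long long num_query_authzone_up;
    long long num_query_authzone_down;
};

/* Add one thread's counters to the running totals.
 * Using += preserves what earlier threads already contributed. */
static void totals_add(struct sketch_totals *t,
                       const struct sketch_thread_counters *c)
{
    t->num_query_authzone_up += (long long)c->num_query_authzone_up;
    t->num_query_authzone_down += (long long)c->num_query_authzone_down;
}

/* Sum the counters of n threads, starting from zeroed totals. */
static struct sketch_totals totals_collect(
    const struct sketch_thread_counters *threads, size_t n)
{
    struct sketch_totals t = {0, 0};
    size_t i;
    for (i = 0; i < n; i++)
        totals_add(&t, &threads[i]);
    return t;
}
```

The sketch reads the per-thread values without synchronization. How the real daemon collects each worker's values safely is outside what this example shows.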

@gthess gthess merged commit c1e9d7b into NLnetLabs:master Nov 19, 2024
1 check passed
gthess added a commit that referenced this pull request Nov 19, 2024
- Merge #1169 from Sergey Kacheev, fix: lock-free counters for
  auth_zone up/down queries.
@sakateka sakateka deleted the lock-free-az-counters branch November 19, 2024 16:11
jedisct1 added a commit to jedisct1/unbound that referenced this pull request Dec 26, 2024
* nlnet/master: (26 commits)
  - For NLnetLabs#1175, update serve-expired tests.
  - Fix NLnetLabs#1175: serve-expired does not adhere to secure-by-default principle. The default value of serve-expired-client-timeout is set to 1800 as suggested by RFC8767.
  - Fix comparison to help static analyzer.
  Changelog entry for NLnetLabs#1169: - Merge NLnetLabs#1169 from Sergey Kacheev, fix: lock-free counters for auth_zone up/down queries.
  fix: lock-free counters for auth_zone up/down queries
  - Fix for NLnetLabs#1183: release nsec3 hashes per test file.
  - Fix NLnetLabs#1183: the data being used is released in method nsec3_hash_test_entry.
  - Complete fix for max-global-quota to 200.
  - More descriptive text for 'harden-algo-downgrade'.
  - Increase the default of max-global-quota to 200 from 128 after operational feedback. Still keeping the possible amplification factor (CAMP related issues) in the hundreds.
  Changelog entry for: - Fix SETEX check during Redis (re)initialization.
  - Fix SETEX check during Redis (re)initialization.
  - Fix to log redis timeout error string on failure.
  - Fix for the serve expired DNSSEC information fix, it would not allow current delegation information be updated in cache. The fix allows current delegation and validation recursion information to be updated, but as a consequence no longer has certain expired information around for later dnssec valid expired responses.
  Changelog note for NLnetLabs#1167 - Merge NLnetLabs#1167: Makefile.in: fix occasional parallel build failures around bison rule.
  Makefile.in: fix occasional parallel build failures around bison rule (NLnetLabs#1167)
  - Fix redis that during a reload it does not fail if the redis server does not connect or does not respond. It still logs the errors and if the server is up checks expiration features.
  - Fix redis that during a reload it does not fail if the redis server does not connect or does not respond. It still logs the errors and if the server is up checks expiration features.
  Changelog entry for NLnetLabs#1157: - Merge NLnetLabs#1157 from Liang Zhu, Fix heap corruption when calling ub_ctx_delete in Windows.
  Fix heap corruption when calling ub_ctx_delete in Windows (NLnetLabs#1157)
  ...