
Run nts key provider on dedicated thread #1208

Conversation

mickvangelderen

Loading the initial keyset is done using spawn_blocking since it involves file I/O. The watch loop is run on a dedicated thread.

Fixes #1198

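The pattern described above can be sketched with the standard library alone: the initial load is done as ordinary blocking work, and the watch loop runs on a plain OS thread that publishes updates over a channel. This is a minimal illustrative sketch, not the actual ntpd-rs code; `KeySet`, `load_keyset`, and `spawn` are stand-in names.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Illustrative stand-in for the real ntpd-rs keyset type.
#[derive(Clone, Debug, PartialEq)]
struct KeySet(u64);

fn load_keyset() -> KeySet {
    // In the real code this does blocking file I/O
    // (hence spawn_blocking in an async context).
    KeySet(0)
}

fn spawn() -> mpsc::Receiver<KeySet> {
    let (tx, rx) = mpsc::channel();
    // Initial load: blocking work done once, up front.
    tx.send(load_keyset()).unwrap();
    // Watch loop on a dedicated OS thread, so the async runtime
    // is never blocked by file I/O or sleeps.
    thread::spawn(move || {
        for generation in 1..=2 {
            thread::sleep(Duration::from_millis(10));
            if tx.send(KeySet(generation)).is_err() {
                break; // all receivers dropped, stop the loop
            }
        }
    });
    rx
}

fn main() {
    let rx = spawn();
    let initial = rx.recv().unwrap();
    assert_eq!(initial, KeySet(0));
    let next = rx.recv().unwrap();
    assert_eq!(next, KeySet(1));
    println!("initial = {:?}, next = {:?}", initial, next);
}
```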

codecov bot commented Nov 16, 2023

Codecov Report

Attention: 79 lines in your changes are missing coverage. Please review.

Comparison base (714e7e8): 85.47%; head (15c0046): 85.32%.
Report is 8 commits behind head on main.

Files Patch % Lines
ntpd/src/daemon/nts_key_provider.rs 0.00% 76 Missing ⚠️
ntpd/src/daemon/mod.rs 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1208      +/-   ##
==========================================
- Coverage   85.47%   85.32%   -0.15%     
==========================================
  Files          58       58              
  Lines       17077    17106      +29     
==========================================
  Hits        14596    14596              
- Misses       2481     2510      +29     

☔ View full report in Codecov by Sentry.

@rnijveld
Member

I think we can simplify this by just using the tokio alternative (i.e. tokio::fs::metadata instead of std::fs::metadata) for all the file manipulation. Using a separate blocking thread feels a little heavy to me. Any thoughts?

@mickvangelderen
Author

mickvangelderen commented Nov 17, 2023

I think we can simplify this by just using the tokio alternative (i.e. tokio::fs::metadata instead of std::fs::metadata) for all the file manipulation stuff

I think doing so makes sense if:

using a separate blocking thread feels a little heavy for me

This is what spawn_blocking does, and it is probably what the tokio::fs implementation does as well. I think you need cool new tech like tokio-uring to avoid spawning threads to hide the blocking.
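That point can be illustrated without tokio at all: hand the blocking call to a helper thread and collect the result through a channel. A simplified std-only sketch of the idea (the real spawn_blocking uses a managed thread pool and returns a future; `run_blocking` here is a made-up name):

```rust
use std::sync::mpsc;
use std::thread;

// Toy version of offloading blocking work: run a blocking closure
// on its own thread and return a handle to collect the result.
fn run_blocking<T, F>(f: F) -> mpsc::Receiver<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may have been dropped; ignore send errors.
        let _ = tx.send(f());
    });
    rx
}

fn main() {
    // Something like std::fs::metadata would go inside the closure;
    // a cheap computation stands in for it here.
    let rx = run_blocking(|| 2 + 2);
    let result = rx.recv().unwrap();
    assert_eq!(result, 4);
    println!("{result}");
}
```

Whether the thread is spawned by hand, by spawn_blocking, or inside tokio::fs, a thread is still doing the blocking work; the differences are in ergonomics and pooling.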

What concerns me more is that this code is not covered by any tests. Does that need to change?

@rnijveld
Member

Thanks for your response! My comment was more aimed at the large change in code structure, given that the entire spawn function is now no longer async, although my wording probably wasn't great in that regard.

I think our initial version, where spawn is an async function, is better: it helps the legibility of the code if we don't have to context switch between blocking and async contexts too much, and that is solved easily by replacing std::fs::metadata and std::fs::OpenOptions with their tokio alternatives. Those also keep implementation details hidden and let us focus on our own program logic as much as possible, and that way we retain the capability of using other async operations within the spawn function in the future.

I agree that this code could use a little testing though! If you want to help with that it is very much appreciated!

@mickvangelderen
Author

mickvangelderen commented Nov 17, 2023

Thanks for your response! My comment was more aimed at the large change in code structure

I've deduplicated the construction of a new keyset, simplified the computation of next_interval, and added a more accurate warning for when the keyset file path is configured but the file does not exist. That explains the size of the change. Switching from async to sync does not actually require that many changes.

, given that the entire spawn function is now no longer async.

This is a good change in my opinion, since KeySetProvider::store and KeySetProvider::load are sync (implemented using std::io::Read instead of tokio::io::AsyncRead) and nothing in the implementation required async.

I think our initial version where spawn is an async function is better as it helps with legibility of the code if we don't have to context switch between blocking and async contexts too much, which is solved easily by replacing std::fs::metadata and std::fs::OpenOptions with their tokio alternatives.

Personally, and this is subjective, I like it when we can prevent async from leaking everywhere. This is not my project though. Are there any relevant guidelines for this project about when to use async vs sync?

Those also keep implementation details hidden and allow us to focus our own program logic as much as possible, and that way we retain the capability of using other async operations within the spawn function in the future.

Personally, I like to design for what we need, not what we might need.

Thanks for your time, Ruben, and for the speedy feedback! I think this change is good as is, and we should focus on how to implement tests to ensure the behavior is correct. However, it seems it does not match your original intent, so I'm closing the PR.

P.S. I decided to try using async everywhere, but that also means introducing tokio as a dependency for ntp-proto, because the Keyset and KeySetProvider store and load methods would need to use AsyncRead. While doable, do you want to allow the tokio runtime to do work in between every read_exact call and, in general, introduce async everywhere because one day it might not require spawning a thread on some platforms? I've reopened the PR as-is, awaiting your feedback.
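The sync side of that trade-off is easy to see in isolation: a loader written against std::io::Read needs no runtime and works with any byte source. This is a hypothetical reader-based loader (a 4-byte length prefix followed by key bytes), not the actual KeySetProvider::load format or signature:

```rust
use std::io::{self, Cursor, Read};

// Hypothetical record layout: a 4-byte big-endian length prefix
// followed by that many key bytes.
fn load_key(mut reader: impl Read) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    reader.read_exact(&mut len_buf)?; // blocking read; no async needed
    let len = u32::from_be_bytes(len_buf) as usize;
    let mut key = vec![0u8; len];
    reader.read_exact(&mut key)?;
    Ok(key)
}

fn main() -> io::Result<()> {
    // Works with an in-memory Cursor just as well as a File.
    let bytes = [0, 0, 0, 3, 0xAA, 0xBB, 0xCC];
    let key = load_key(Cursor::new(bytes))?;
    assert_eq!(key, vec![0xAA, 0xBB, 0xCC]);
    println!("{key:?}");
    Ok(())
}
```

The async equivalent would need `AsyncRead` bounds and `.await` at every `read_exact`, which is what pulls tokio into ntp-proto.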

@rnijveld
Member

Hmm, fair criticism! I think what we were trying to do was make ntp-proto sort of independent from the i/o layer, whereas ntp-daemon (i.e. the ntpd crate) would be mostly async. This works for NTP packet serialization and NTS extension field parsing because we actually pass buffers from the ntpd crate, instead of something like a file that directly does the i/o. I would consider this a bit of a shortcoming of the ntp-proto API, in that the way to use it within async code is non-obvious without looking at how it is done in the ntpd crate. Blocking code can opt to do the Read/Write with or without a buffer, whereas async code really only has the option to do everything buffered.

I think the whole NTS key provider part never really focused enough on that aspect of our code, which is probably what caused the mix of blocking and async code there. All the blocking calls in the spawn function only run during startup, so it isn't a particularly big problem right now, but it's not great either.

I think your solution might actually be the one we should go for right now, but it did get me thinking about adjusting ntp-proto a little. I think we should reconsider some of the API we have in ntp-proto (especially if we want to make it available publicly): in most cases we could more explicitly require a buffer instead of the generic Read and Write traits, given that there really is not much reason to do without a buffer considering the very limited size of NTS and NTP packets.

Especially for the keyset load and write it might be better to operate on buffers of sufficient size, and push the i/o entirely back to ntpd, where it can be either async or blocking depending on the usage site. @davidv1992 do you have any thoughts on this?
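The buffer-oriented API suggested here could look something like the following sketch: ntp-proto parses from and serializes into plain byte slices, and ntpd does the surrounding file I/O with std::fs or tokio::fs as it sees fit. Type names and the wire layout are illustrative, not the actual ntp-proto API:

```rust
// Hypothetical buffer-oriented API: the protocol crate never touches
// I/O, so callers can read/write the bytes sync or async alike.
#[derive(Debug, PartialEq)]
struct KeySet {
    key: Vec<u8>,
}

#[derive(Debug)]
struct ParseError;

impl KeySet {
    // Parse from a caller-provided buffer (no Read trait involved).
    fn from_bytes(buf: &[u8]) -> Result<Self, ParseError> {
        if buf.len() < 4 {
            return Err(ParseError);
        }
        let len = u32::from_be_bytes(buf[..4].try_into().unwrap()) as usize;
        let rest = &buf[4..];
        if rest.len() < len {
            return Err(ParseError);
        }
        Ok(KeySet { key: rest[..len].to_vec() })
    }

    // Serialize into a buffer the caller then writes out however it likes.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = (self.key.len() as u32).to_be_bytes().to_vec();
        out.extend_from_slice(&self.key);
        out
    }
}

fn main() {
    let ks = KeySet { key: vec![1, 2, 3] };
    let bytes = ks.to_bytes();
    let parsed = KeySet::from_bytes(&bytes).unwrap();
    assert_eq!(parsed, ks);
    println!("round-trip ok");
}
```

With this shape, the sync-vs-async question lives entirely at the call site in ntpd, which is the separation described above.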

@davidv1992
Member

Having taken a look, my opinion currently is that this is overkill for fixing the blocking code in an async context. Most of that is already addressed in the current code through a number of spawn_blockings in specific places, the only additional blocking call this would cover is the metadata one, which has a far simpler fix and on top of that doesn't seem to cause any real issues since it only happens on startup.

That said, I think the diagnosis that this code could benefit significantly from a refactoring pass is correct. However, I am not sure the particular approach taken here is an improvement. In particular, the passing around of closures creates a bit of a disconnect between what run actually does and its code. I think if we want to make progress in making this code nicer, a better approach would be to take a more holistic view, also removing some of the risk of blocking calls in ntp-proto as Ruben suggests. However, figuring out the best interface there is going to be a bit tricky, and I don't immediately have a concrete direction.

@davidv1992
Member

After some internal discussion, we have decided to close this for now. Feel free to reopen it or open a new PR if you want to do the small fix.

If you want to take on the larger refactor work we would prefer it if you would discuss the direction of it with us beforehand, so we increase the chances of success. However, note that given our current priorities we may be quite slow to respond in such a discussion as fixing the style problems here is quite low on our list.

@davidv1992 davidv1992 closed this Nov 22, 2023
@mickvangelderen
Author

@davidv1992 that's entirely reasonable. Are there any issues I could pick up that are a priority?

@davidv1992
Member

Apologies for the late reply; right now, not really. Our main priority at the moment is dealing with a number of things related to the grants we have, but that does not include tasks that can easily be done by an external person. Feel free to look through the issue list, though, to see if anything catches your eye.

Successfully merging this pull request may close these issues.

Remove blocking code in async function