picks the random time for attempting new update #3275

Merged: 7 commits into Azure:develop from upgrade-attempt on Dec 16, 2024

Conversation

nagworld9
Contributor

Description

Today the self-update period (6 or 24 hours) is counted as a sliding window from service start. This strategy may not work well for a VMSS, for example, because all the VMs in the scale set may attempt the update at around the same time.

The new strategy picks a random time in [0..6/24 hours] when a new update is detected.
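The idea in the description can be sketched roughly as follows. This is a minimal illustration, not the agent's code: the function name `pick_next_update_time`, the `rng` parameter, and the 24-hour constant are assumptions made for the example, and the agent reads its real frequencies from configuration.

```python
import datetime
import random

# Assumed value for illustration only; the agent takes its frequency from configuration.
REGULAR_FREQUENCY_SECONDS = 24 * 60 * 60


def pick_next_update_time(now, frequency_seconds, rng=random):
    """Return a random time between now and now + frequency_seconds (hypothetical helper)."""
    offset = rng.uniform(0, frequency_seconds)
    return now + datetime.timedelta(seconds=offset)


# Several VMs in one scale set detect the same update at the same instant,
# but each one draws its own offset, so the attempts are spread over the period.
detected_at = datetime.datetime(2024, 12, 10, 12, 0, 0)
for vm_index in range(3):
    print(vm_index, pick_next_update_time(detected_at, REGULAR_FREQUENCY_SECONDS))
```

With a sliding window anchored at service start, VMs that started together would become eligible together; drawing the offset independently on each VM is what breaks that alignment.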

Issue #


PR information

  • Ensure development PR is based on the develop branch.
  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made

Quality of Code and Contribution Guidelines

nagworld9 changed the title from "random update time" to "picks the random time when new update detected" on Dec 10, 2024
nagworld9 changed the title from "picks the random time when new update detected" to "picks the random time for attempting new update" on Dec 10, 2024
This method is called when a new update is detected and computes a random time for the next update.
If the current time is on or after the upgrade time, we allow the update.

Note: If we allow the update and it is not successful, the next update time will be recalculated.
Member

could you add more details to this comment? not sure I understand what is trying to point out. thanks

Contributor Author

Updated comments

if next_update_time <= now:
# Update the last upgrade check time even if no new agent is available for upgrade
self._last_attempted_self_update_time = now
if not self._update_time_refreshed:
Member

could you add some comments about the intent of self._update_time_refreshed? thanks

Contributor Author

Actually, after review, I realized I could achieve the same thing without the _update_time_refreshed flag. I added more details in the code comments.
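One way to drop a separate "refreshed" flag is to use a sentinel value for the update time itself, as the later snippet comparing `self._next_update_time` with `datetime.datetime.min` suggests. The sketch below is an assumption about the general shape, not the merged code: the class `UpdateGate` and the method `is_update_allowed` are hypothetical names.

```python
import datetime
import random


class UpdateGate:
    """Hypothetical helper for illustration; not the agent's actual class."""

    def __init__(self, frequency_seconds):
        self._frequency_seconds = frequency_seconds
        # datetime.min acts as the sentinel for "no update time picked yet".
        self._next_update_time = datetime.datetime.min

    def is_update_allowed(self, now, rng=random):
        # Pick the random time once, when the update is first seen.
        if self._next_update_time == datetime.datetime.min:
            offset = rng.uniform(0, self._frequency_seconds)
            self._next_update_time = now + datetime.timedelta(seconds=offset)
        if now >= self._next_update_time:
            # Reset the sentinel so that, if the attempt fails,
            # the next call picks a fresh random time.
            self._next_update_time = datetime.datetime.min
            return True
        return False
```

Because the sentinel doubles as the "not yet computed" state, there is no second piece of state to keep in sync with the stored time.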

@@ -214,7 +214,8 @@ def run(self, goal_state, ext_gs_updated):

# Always agent uses self-update for initial update regardless vm enrolled into RSM or not
# So ignoring the check for updater switch for the initial goal state/update
- if not self._is_initial_update():
+ initial_update = self._is_initial_update()
Member

you should make _is_initial_update a public static method and call it where you need it, instead of passing the value as argument

Contributor Author

Changing it like that creates a circular dependency issue.

Member

you could refactor it out to a utilities class

"""
Get the next upgrade time
Returns random time in between 0 to 24hrs(regular) or 6hrs(hotfix) from now
Contributor

It looks like the hotfix default is 4 hours, not 6

if next_update_time <= now:
# Update the last upgrade check time even if no new agent is available for upgrade
self._last_attempted_self_update_time = now
if self._next_update_time == datetime.datetime.min:
Contributor Author

This may need another update. One scenario that comes to mind is when we release a version, the agent detects the update, and picks the next_update_time. Before the update is attempted, the version is removed from PIR due to it being a bad version. When we release a new version again, the current logic does not pick a new update time and instead assumes the previous update time. I think we need to pick the update time per version. What do you guys think?

Member

yes, i was also thinking about similar scenarios and, since the goal is to spread updates over time, rather than updating at a specific time or updating to a specific version, my take is that the current approach is ok
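For reference, the per-version alternative raised in this thread might look something like the sketch below. It is not what was merged (the thread settled on keeping the current approach), and `PerVersionUpdateSchedule`, its method, and the version strings in the usage note are hypothetical names chosen for the example.

```python
import datetime
import random


class PerVersionUpdateSchedule:
    """Hypothetical sketch of the per-version alternative; not the merged behavior."""

    def __init__(self, frequency_seconds):
        self._frequency_seconds = frequency_seconds
        self._version = None
        self._next_update_time = datetime.datetime.min

    def next_update_time(self, version, now, rng=random):
        # Pick a fresh random time whenever the advertised version changes,
        # for example after a bad version is pulled and a new one is released.
        if version != self._version:
            self._version = version
            offset = rng.uniform(0, self._frequency_seconds)
            self._next_update_time = now + datetime.timedelta(seconds=offset)
        return self._next_update_time
```

Calling `next_update_time("1.0.0.1", now)` repeatedly returns the same time, while a later call with `"1.0.0.2"` draws a new one. Either approach spreads attempts across VMs, which is the stated goal; the per-version variant only changes when the draw is repeated.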

narrieta previously approved these changes on Dec 16, 2024
nagworld9 merged commit 50fe8ca into Azure:develop on Dec 16, 2024
9 of 11 checks passed
nagworld9 deleted the upgrade-attempt branch on December 16, 2024 23:54
3 participants