picks the random time for attempting new update #3275

nagworld9 · 2024-12-10T01:29:41Z

Description

Today we compute the self-update time as a 6-hour (hotfix) or 24-hour (regular) sliding window starting from service start. This strategy may not work well for VM scale sets (VMSS), for example: all the VMs in a VMSS may get the update at around the same time.

The new strategy picks a random time between 0 and 6 (hotfix) or 24 (regular) hours after we detect the new update.
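A minimal sketch of the idea, assuming the window is given in seconds and a uniformly random offset is drawn once, at the moment the new version is first detected. The function name `pick_next_update_time` is hypothetical and not the agent's actual API.

```python
import datetime
import random


def pick_next_update_time(detected_at, window_seconds, rng=random):
    """Hypothetical helper: return a time uniformly spread over
    [detected_at, detected_at + window_seconds]."""
    offset = rng.uniform(0, window_seconds)
    return detected_at + datetime.timedelta(seconds=offset)


# Usage: VMs that detect the same update at the same moment
# end up with different attempt times inside the window.
now = datetime.datetime(2024, 12, 10, 1, 0, 0)
window = 24 * 60 * 60  # regular update window (24 hours)
for t in sorted(pick_next_update_time(now, window) for _ in range(5)):
    print(t)
```

Because the offset is drawn per VM at detection time, VMs that start their service together no longer converge on the same attempt time.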

Issue #

PR information

Ensure development PR is based on the develop branch.
The title of the PR is clear and informative.
There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
If applicable, the PR references the bug/issue that it fixes in the description.
New unit tests were added for the changes made.

Quality of Code and Contribution Guidelines

I have read the contribution guidelines.

narrieta · 2024-12-11T17:39:43Z

azurelinuxagent/ga/self_update_version_updater.py

+        This method called when new update detected and computes random time for next update.
+        If the current time on or after upgrade time, we allow the update.
+
+        Note: After we allow the update, and it's not successful, the next update time will be recalculated.


could you add more details to this comment? not sure I understand what it is trying to point out. thanks

Updated comments

narrieta · 2024-12-11T17:40:10Z

azurelinuxagent/ga/self_update_version_updater.py

-        if next_update_time <= now:
-            # Update the last upgrade check time even if no new agent is available for upgrade
-            self._last_attempted_self_update_time = now
+        if not self._update_time_refreshed:


could you add some comments about the intent of self._update_time_refreshed? thanks

actually, after review, I realized I could achieve the same thing without the _update_time_refreshed flag. Added more details in the code comments
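The flag-free approach can be sketched as follows, assuming `datetime.datetime.min` serves as the "no update time picked yet" sentinel (as in the later diff on `_next_update_time`). The class and method names are hypothetical and this is not the merged implementation: the time is picked once when an update is first seen, the update is allowed once that time is reached, and the sentinel is then reset so that a failed attempt leads to a fresh random time.

```python
import datetime
import random


class UpdateScheduler:
    """Hypothetical scheduler illustrating the sentinel-based approach."""

    def __init__(self, window_seconds, rng=None):
        self._window_seconds = window_seconds
        self._rng = rng or random.Random()
        # datetime.min means "no update time has been picked yet"
        self._next_update_time = datetime.datetime.min

    def is_update_allowed(self, now):
        # First check after a new version is detected: pick a random time.
        if self._next_update_time == datetime.datetime.min:
            offset = self._rng.uniform(0, self._window_seconds)
            self._next_update_time = now + datetime.timedelta(seconds=offset)

        if now >= self._next_update_time:
            # Reset the sentinel; if this attempt fails, the next check
            # picks a new random time instead of retrying immediately.
            self._next_update_time = datetime.datetime.min
            return True
        return False


# Usage: poll every 5 minutes until the update is allowed.
scheduler = UpdateScheduler(window_seconds=4 * 60 * 60)
t = datetime.datetime(2024, 12, 10)
while not scheduler.is_update_allowed(t):
    t += datetime.timedelta(minutes=5)
print("update allowed at", t)
```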

narrieta · 2024-12-11T17:41:29Z

azurelinuxagent/ga/agent_update_handler.py

@@ -214,7 +214,8 @@ def run(self, goal_state, ext_gs_updated):

            # Always agent uses self-update for initial update regardless vm enrolled into RSM or not
            # So ignoring the check for updater switch for the initial goal state/update
-            if not self._is_initial_update():
+            initial_update = self._is_initial_update()


you should make _is_initial_update a public static method and call it where you need it, instead of passing the value as argument

changing it like that creates a circular dependency issue

you could refactor it out to a utilities class

maddieford · 2024-12-13T17:53:51Z

azurelinuxagent/ga/self_update_version_updater.py

        """
-        Get the next upgrade time
+        Returns random time in between 0 to 24hrs(regular) or 6hrs(hotfix) from now


It looks like the hotfix default is 4 hours, not 6
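To make the window selection concrete, the sketch below uses the 24-hour regular and 4-hour hotfix defaults mentioned in this thread. It assumes versions are 4-tuples of integers and that a hotfix is a release differing from the running version only in its last (build) component; both the classification rule and the names are assumptions for illustration, and the agent's real values may come from configuration.

```python
REGULAR_WINDOW_SECONDS = 24 * 60 * 60  # 24 hours, per the docstring under review
HOTFIX_WINDOW_SECONDS = 4 * 60 * 60    # 4 hours, per the review comment


def is_hotfix(current, target):
    """Assumed rule: same major.minor.patch, only the build number changes."""
    return current[:3] == target[:3] and current != target


def update_window_seconds(current, target):
    if is_hotfix(current, target):
        return HOTFIX_WINDOW_SECONDS
    return REGULAR_WINDOW_SECONDS


print(update_window_seconds((2, 12, 0, 1), (2, 12, 0, 2)))  # 14400
print(update_window_seconds((2, 12, 0, 1), (2, 13, 0, 0)))  # 86400
```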

nagworld9 · 2024-12-13T18:37:53Z

azurelinuxagent/ga/self_update_version_updater.py

-        if next_update_time <= now:
-            # Update the last upgrade check time even if no new agent is available for upgrade
-            self._last_attempted_self_update_time = now
+        if self._next_update_time == datetime.datetime.min:


This may need another update. One scenario that comes to mind: we release a version, the agent detects the update and picks the next_update_time, but before the update is attempted, the version is removed from PIR because it is a bad version. When we then release a new version, the current logic does not pick a new update time and instead reuses the previous one. I think we need to pick the update time per version. What do you guys think?

yes, I was also thinking about similar scenarios. Since the goal is to spread updates over time, rather than to update at a specific time or to a specific version, my take is that the current approach is ok
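For comparison, the per-version alternative raised in the review (not what was merged) could look like the sketch below: the scheduled time is tied to the target version, so if a version is pulled and a different one is published, a new random time is drawn from the new detection time. All names are hypothetical.

```python
import datetime
import random


class PerVersionScheduler:
    """Hypothetical: re-pick the random update time whenever the target version changes."""

    def __init__(self, window_seconds, rng=None):
        self._window_seconds = window_seconds
        self._rng = rng or random.Random()
        self._scheduled_version = None
        self._next_update_time = None

    def next_update_time(self, target_version, now):
        if target_version != self._scheduled_version:
            # New (or different) version detected: draw a fresh random time.
            offset = self._rng.uniform(0, self._window_seconds)
            self._scheduled_version = target_version
            self._next_update_time = now + datetime.timedelta(seconds=offset)
        # Same version as before: keep the previously picked time.
        return self._next_update_time
```

The merged approach keeps a single schedule regardless of version, which is simpler and still spreads the updates over time.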

nagworld9 requested review from narrieta, ZhidongPeng and maddieford as code owners December 10, 2024 01:29

nagworld9 changed the title ~~random update time~~ picks the random time when new update detected Dec 10, 2024

nagworld9 force-pushed the upgrade-attempt branch from b847a3d to ea43990 December 10, 2024 01:38

random update time

eb59fe7

nagworld9 force-pushed the upgrade-attempt branch from ea43990 to eb59fe7 December 10, 2024 01:44

nagworld9 changed the title ~~picks the random time when new update detected~~ picks the random time for attempting new update Dec 10, 2024

update test comment

cae1b3e

narrieta reviewed Dec 11, 2024

nagworld9 added 3 commits December 11, 2024 15:03

addressed comments

a2fec9a

address comments

baa01f2

pylint warn

3eaea82

maddieford reviewed Dec 13, 2024

nagworld9 commented Dec 13, 2024

narrieta previously approved these changes Dec 16, 2024

addressed comment

2c71768

nagworld9 dismissed narrieta’s stale review via 2c71768 December 16, 2024 23:16

maddieford approved these changes Dec 16, 2024

narrieta approved these changes Dec 16, 2024

Merge branch 'develop' into upgrade-attempt

45f8da9

nagworld9 merged commit 50fe8ca into Azure:develop Dec 16, 2024
9 of 11 checks passed

nagworld9 deleted the upgrade-attempt branch December 16, 2024 23:54
