
Move waiting for tasks to separate phase to unblock process #1262

Merged
merged 1 commit into kubev2v:main on Dec 12, 2024

Conversation

mnecas
Member

@mnecas mnecas commented Dec 12, 2024

Issue:
When Forklift creates a snapshot we need to wait for the task to finish. Right now we use task.WaitForResult, which makes the whole process wait for the snapshot creation and blocks other VM migrations. We have the same problem with snapshot removal: if the ESXi host is busy when we start the removal, the task can take longer than the reconcile cycle (3s) and the migration can fail because of it.

Fix:
Instead of using task.WaitForResult, Forklift will query the latest tasks per VM (10 tasks by default). This querying is done in a separate phase from the creation/deletion, so we will have a WaitFor phase for each of the object manipulations. We find the specific creation/deletion task and check its status. This has the advantage that we get not only the status of the task but also its result, so we can propagate it to the user if the creation/deletion fails (see the sketch below).

Ref:

Proper fix of: #1253
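
A minimal govmomi sketch of what this non-blocking check could look like. This is an illustration under assumptions, not the code in this PR: the function name, the descriptionId values, and the error handling are hypothetical; only the general idea (reading the VM's recent tasks and inspecting TaskInfo instead of blocking in task.WaitForResult) follows the description above.

```go
package vsphere

import (
	"context"
	"fmt"

	"github.com/vmware/govmomi/property"
	"github.com/vmware/govmomi/vim25"
	"github.com/vmware/govmomi/vim25/mo"
	"github.com/vmware/govmomi/vim25/types"
)

// checkSnapshotTask scans the VM's recent tasks (vSphere keeps roughly the
// last 10 per entity) for the snapshot create/remove task and reports its
// state without blocking the reconcile loop.
func checkSnapshotTask(ctx context.Context, c *vim25.Client, vmRef types.ManagedObjectReference, descriptionId string) (done bool, err error) {
	pc := property.DefaultCollector(c)

	// Read only the recentTask property of the VM.
	var vm mo.VirtualMachine
	if err = pc.RetrieveOne(ctx, vmRef, []string{"recentTask"}, &vm); err != nil {
		return
	}
	if len(vm.RecentTask) == 0 {
		return
	}

	// Fetch TaskInfo for all recent tasks in a single call.
	var tasks []mo.Task
	if err = pc.Retrieve(ctx, vm.RecentTask, []string{"info"}, &tasks); err != nil {
		return
	}

	// Match the task we care about, e.g. "VirtualMachine.createSnapshot"
	// or "VirtualMachine.removeSnapshot" (illustrative descriptionIds).
	for _, t := range tasks {
		if t.Info.DescriptionId != descriptionId {
			continue
		}
		switch t.Info.State {
		case types.TaskInfoStateSuccess:
			done = true
		case types.TaskInfoStateError:
			// Surface the task's own error instead of a generic timeout.
			msg := "unknown error"
			if t.Info.Error != nil {
				msg = t.Info.Error.LocalizedMessage
			}
			err = fmt.Errorf("task %s failed: %s", t.Info.Key, msg)
		}
		return
	}
	return // not found yet; check again on the next reconcile
}
```

Because the check is a couple of property-collector round trips, it fits inside the reconcile cycle instead of blocking it.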

@mnecas mnecas added this to the 2.7.7 milestone Dec 12, 2024
@mnecas mnecas requested a review from yaacov as a code owner December 12, 2024 13:02
@codecov-commenter

codecov-commenter commented Dec 12, 2024


Codecov Report

Attention: Patch coverage is 0% with 78 lines in your changes missing coverage. Please review.

Project coverage is 15.50%. Comparing base (ea38999) to head (d26472a).

Files with missing lines                          Patch %   Lines
pkg/controller/plan/adapter/vsphere/client.go       0.00%   57 Missing ⚠️
pkg/controller/plan/migration.go                    0.00%   12 Missing ⚠️
pkg/controller/plan/adapter/openstack/client.go     0.00%    3 Missing ⚠️
pkg/controller/plan/adapter/ova/client.go           0.00%    3 Missing ⚠️
pkg/controller/plan/adapter/ovirt/client.go         0.00%    3 Missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1262      +/-   ##
==========================================
- Coverage   15.57%   15.50%   -0.08%     
==========================================
  Files         112      112              
  Lines       23255    23300      +45     
==========================================
- Hits         3623     3612      -11     
- Misses      19345    19403      +58     
+ Partials      287      285       -2     
Flag        Coverage Δ
unittests   15.50% <0.00%> (-0.08%) ⬇️


Issue:
When Forklift creates a snapshot we need to wait for the task to
finish. Right now we use task.WaitForResult, which makes the whole
process wait for the snapshot creation and blocks other VM migrations.
We have the same problem with snapshot removal: if the ESXi host is
busy when we start the removal, the task can take longer than the
reconcile cycle (3s) and the migration can fail because of it.

Fix:
Instead of using task.WaitForResult, Forklift will query the latest
tasks per VM (10 tasks by default). This querying is done in a separate
phase from the creation/deletion, so we will have a WaitFor phase for
each of the object manipulations (a sketch of this phase pattern
follows the commit message). We find the specific creation/deletion
task and check its status. This has the advantage that we get not only
the status of the task but also its result, so we can propagate it to
the user if the creation/deletion fails.

Ref:
- https://issues.redhat.com/browse/MTV-1753
- https://issues.redhat.com/browse/MTV-1775

Signed-off-by: Martin Necas <mnecas@redhat.com>
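
The separate WaitFor phase described in the Fix above boils down to a non-blocking check that either advances the phase, asks to be requeued, or fails the step. A simplified sketch of that pattern, with illustrative names (TaskChecker, Result, waitForTask) that are not Forklift's actual API:

```go
package phases

import "time"

// TaskChecker reports whether the tracked vSphere task has finished; err is
// non-nil when the task ended in error.
type TaskChecker interface {
	Done() (bool, error)
}

// Result tells the reconciler what to do after a single, non-blocking check.
type Result struct {
	NextPhase    string        // phase to advance to once the task is done
	RequeueAfter time.Duration // re-check delay while the task is still running
	Error        error         // terminal task error to surface to the user
}

// waitForTask is the body of a WaitFor* phase: it never blocks on the task,
// it only inspects its current state and reports back.
func waitForTask(checker TaskChecker, nextPhase string) Result {
	done, err := checker.Done()
	switch {
	case err != nil:
		return Result{Error: err}
	case !done:
		return Result{RequeueAfter: 3 * time.Second} // matches the reconcile cycle
	default:
		return Result{NextPhase: nextPhase}
	}
}
```

Returning a requeue interval instead of blocking keeps the controller free to reconcile other VMs while the snapshot task runs on the host.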
@mnecas mnecas merged commit 4208890 into kubev2v:main Dec 12, 2024
18 of 19 checks passed
mnecas added a commit to mnecas/forklift that referenced this pull request Dec 13, 2024
Add a wait phase for snapshot tasks

Issue:
The main problem in MTV-1753 and MTV-1775 is that we are either not
waiting for the VMware task to finish, or we are waiting and halting
the whole controller process. This causes either performance issues or
even migration failures. So we need a mechanism to wait for the tasks
without halting the whole process.

Fix:
My first attempt was PR kubev2v#1262, which used the event manager. On
the surface this was an easy approach that did not require any
additional changes to the CR. The problem was that some of the tasks
were not reported to the taskManager. These tasks have the haTask
prefix. After some investigation, I found out that these tasks live
directly on the ESXi host and are not sent to vSphere, so we can't use
the taskManager.

This PR adds the taskIds to the status CR so additional wait phases can
monitor the tasks. The main controller gets the ESXi client and creates
a property collector to request the specific task from the host (see
the sketch below this commit message).

Ref:
- https://issues.redhat.com/browse/MTV-1753
- https://issues.redhat.com/browse/MTV-1775
- kubev2v#1262
- kubev2v#1265

Signed-off-by: Martin Necas <mnecas@redhat.com>
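
A hedged sketch of that lookup, assuming the task id (an haTask-prefixed id in the case above) has already been recorded in the plan status and that we have a govmomi client connected to the ESXi host; the function name and parameters are illustrative:

```go
package vsphere

import (
	"context"

	"github.com/vmware/govmomi/property"
	"github.com/vmware/govmomi/vim25"
	"github.com/vmware/govmomi/vim25/mo"
	"github.com/vmware/govmomi/vim25/types"
)

// getTaskInfo looks up a task by the id recorded in the status CR (for
// host-local tasks this is an id with the haTask prefix) using a property
// collector built from the client that owns the task, i.e. the ESXi host
// client rather than the vCenter client.
func getTaskInfo(ctx context.Context, hostClient *vim25.Client, taskId string) (*types.TaskInfo, error) {
	ref := types.ManagedObjectReference{
		Type:  "Task",
		Value: taskId,
	}
	var task mo.Task
	pc := property.DefaultCollector(hostClient)
	if err := pc.RetrieveOne(ctx, ref, []string{"info"}, &task); err != nil {
		return nil, err
	}
	return &task.Info, nil
}
```

Because haTask ids exist only on the host itself and are not reported to vSphere, the property collector has to be created from the ESXi client, which is why the controller fetches the ESXi client first.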
mnecas added a commit to mnecas/forklift that referenced this pull request Dec 13, 2024
mnecas added a commit to mnecas/forklift that referenced this pull request Dec 13, 2024
mnecas added a commit that referenced this pull request Dec 13, 2024
mnecas added a commit to mnecas/forklift that referenced this pull request Dec 13, 2024
@mnecas mnecas mentioned this pull request Dec 13, 2024
mnecas added a commit that referenced this pull request Dec 13, 2024