-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changed Unmanage flow - Unmanage cluster will happen per node and not via tendrl/monitor #975
Conversation
Can one of the admins verify this patch? |
Can one of the admins verify this patch? |
0635467
to
da12256
Compare
@r0h4n @shtripat @GowthamShanmugam , Please review. |
@Tendrl/admins Please DO NOT merge this PR, There is some new code that went in monitoring integration, need to verify again with same first. |
@Tendrl/admins PR verified. Can be merged now. |
raise FlowExecutionFailedError( | ||
"Cluster is already in un-managed state" | ||
) | ||
if (_cluster.status is not None and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel this check should not be done on child task of un-manage flow, so cant you move this inside if condition at L#55?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Will put it in first condition itself rather than L#55. Flow will be like : Check if another job being executed or not ( checked by parent node which has accepted unmanage job) -> If not , lock the cluster and then run the unmanage flow for the parent node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
loop_count += 1 | ||
continue | ||
|
||
# Deleting cluster details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create a separate function for this and invoke here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do it
_cluster = NS.tendrl.objects.Cluster( | ||
integration_id=integration_id | ||
).load() | ||
_cluster.short_name = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why short_name to be un-set after un-manage? After un-manage short_name should be retained I feel, and while re-import user should have option to provide a different name or retain the same.
@nthomas-redhat @r0h4n @julienlim thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have same thoughts, By design, it should be an option for user to retain the same name of the cluster as it was part of their infra before. Needs to be discussed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what I feel is, if cluster.short_name available while import populate the same in UI screen where list of nodes to be imported are listed. User still should be able to edit the field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shtripat @anmolsachan +1 that it would be nice to retain the short_name during the unmanage if possible so that if the same cluster was imported again, user had the option to reuse the same short name. However, if it's too complex, then we defer to this in the future as I think it's a nice-to-have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gnehapk Will this be a big change for UI ? Can this be done ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anmolsachan Its a small change in UI. UI will by populate cluster name field with short_name if exist else integration_id on import cluster flow. The cluster name is a editable field and can be changed by user. Created issue for the same Tendrl/ui#973
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flow_id=self.parameters['flow_id'] | ||
) | ||
loop_count = 0 | ||
# Wait for (no of nodes) * 6 minutes for unmanage to complete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to wait for 6 mins per node for un-manage to complete? Its not an import where packages installation etc happen. So I feel a couple of mins would more more than required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Will reduce the time
super(StopIntegrationServices, self).__init__(*args, **kwargs) | ||
|
||
def run(self): | ||
services = ["tendrl-gluster-integration"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Write a utility function which takes list of services and stops+disables the services. The same utility could be called from both the atoms, StopMonitoringServices
and StopIntegrationServices
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, Will do it
ec6a2a4
to
ae376ce
Compare
@Tendrl/admins @shtripat Incorporated the recommended changes. Please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine. Minor ones.
|
||
def run(self): | ||
services = ["tendrl-gluster-integration"] | ||
return service_utils.stop_service(services, self.parameters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it one statement as below
return service_utils.stop_service(
['tendrl-gluster-integration'],
self.parameters
)
|
||
def run(self): | ||
services = ["collectd"] | ||
return service_utils.stop_service(services, self.parameters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above
2047775
to
280cef2
Compare
) | ||
return False | ||
time.sleep(5) | ||
finished = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recent changes from @GowthamShanmugam for failed vs finished jobs to be taken care as if one of the child jobs fail, we should not straightway fail the parent job. We should still wait for all the child jobs to finish/fail and then only parent job should be marked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below code at L#114 still checks for finished
state only. I feel if its finished or failed, in both cases we should break the loop. @GowthamShanmugam ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes there also he has to change like job.status in ["finished", "failed"]
child_job = NS.tendrl.objects.Job( | ||
job_id=child_job_id | ||
).load() | ||
if child_job.status != "finished": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel this can be as below
if child_job.status not in ['finished', 'failed']
@GowthamShanmugam ^^^ ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will make the change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Over all looks fine. One small minor comment.
Make sure all the scenario are tested well and back to back import + unmanage + import is tested well.
) | ||
return False | ||
time.sleep(5) | ||
finished = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below code at L#114 still checks for finished
state only. I feel if its finished or failed, in both cases we should break the loop. @GowthamShanmugam ?
Updated And Verified. |
@shtripat please complete your reviews on this |
@shtripat @GowthamShanmugam Please finish reviewing this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -0,0 +1,55 @@ | |||
""" This utility can be used to handle(start, stop, etc.) tendrl services. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shtripat @GowthamShanmugam Why we need another service util or did you guys miss this during review?
Could you help @anmolsachan with re-using and modifying below for this PR
https://github.com/Tendrl/commons/blob/master/tendrl/commons/utils/service.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does disabling of service as well so thought of writing new one. But I feel at least the code can move to https://github.com/Tendrl/commons/blob/master/tendrl/commons/utils/service.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Will make the change
… via tendrl/monitor tendrl-bug-id: Tendrl#841 bugzilla: 1583590 Signed-off-by: Anmol Sachan <anmol13694@gmail.com>
tendrl-bug-id: Tendrl#841 bugzilla: 1583590 Signed-off-by: Anmol Sachan <anmol13694@gmail.com>
tendrl-bug-id: Tendrl#841 bugzilla: 1583590 Signed-off-by: Anmol Sachan <anmol13694@gmail.com>
tendrl-bug-id: Tendrl#841 bugzilla: 1583590 Signed-off-by: Anmol Sachan <anmol13694@gmail.com>
tendrl-bug-id: Tendrl#841 bugzilla: 1583590 Signed-off-by: Anmol Sachan <anmol13694@gmail.com>
Closing thing for now, It will require a lot of change because of recent merges. Will raise a new PR if required, and will link to this one. |
tendrl-bug-id: #841
bugzilla: 1583590
Signed-off-by: Anmol Sachan anmol13694@gmail.com