This repository has been archived by the owner on Mar 4, 2024. It is now read-only.

Attempt to report agent failures as INFRA #87

Open
johnsca opened this issue Dec 5, 2016 · 3 comments

Comments

@johnsca
Contributor

johnsca commented Dec 5, 2016

Recently, we've had a spate of failures on GCE. For some reason, they are reported as hook failures, even though it seems that the agent actually crashed. For example: http://cwr.vapour.ws/bundle_hadoop_kafka/361b12d5cace479a8ed3a9965e4610c7/report.html

We should look into whether it's possible to treat this as INFRA. Specifically, if the "machine" status is anything other than "started", it is probably an INFRA failure.

@kwmonroe
Contributor

kwmonroe commented Dec 5, 2016

Since the failures are so widespread and recent for gce, this may be related:

http://reports.vapour.ws/releases/issue/57c3c12c749a564be035e4ee

This is an issue where the controller can't contact agents, and there are numerous reports of this happening for gce over the last week.

@kwmonroe
Contributor

Is this issue still valid? I haven't seen any hook failures due to juju agent crashing recently. I have seen INFRA failures for things like "gce cannot upload scripts", but that's not the same.

Perhaps this fixed itself with the move from juju-deployer (v1) to juju deploy (v2).

@johnsca
Contributor Author

johnsca commented Apr 3, 2017

I think it would still be reasonable to detect the machine agent status and report that as INFRA even if workload-status is error.
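
A minimal sketch of that check, not taken from the cwr code base. It assumes the input is the parsed output of `juju status --format=json` on Juju 2.x, where each machine carries a `juju-status` entry and each unit a `workload-status` entry; those key names should be verified against the Juju version in use. The function names and the "FAIL" and "PASS" labels are hypothetical; only "INFRA" comes from this thread.

```python
def machines_not_started(status):
    """Return the ids of machines whose agent is not reported as "started".

    Assumes Juju 2.x JSON status layout (machines -> id -> juju-status -> current).
    A machine with no agent status at all is also treated as not started.
    """
    machines = status.get("machines", {})
    return sorted(
        machine_id
        for machine_id, machine in machines.items()
        if machine.get("juju-status", {}).get("current") != "started"
    )


def units_in_error(status):
    """Return the names of units whose workload-status is "error"."""
    failed = []
    for app in status.get("applications", {}).values():
        for unit_name, unit in app.get("units", {}).items():
            if unit.get("workload-status", {}).get("current") == "error":
                failed.append(unit_name)
    return sorted(failed)


def classify(status):
    """Classify a run, letting a bad machine agent take priority over unit errors.

    "FAIL" and "PASS" are placeholder labels for this sketch.
    """
    if machines_not_started(status):
        return "INFRA"
    if units_in_error(status):
        return "FAIL"
    return "PASS"


# Hypothetical status: the unit is in error, but its machine agent is down,
# so the run is classified as INFRA rather than as a hook failure.
example = {
    "machines": {"0": {"juju-status": {"current": "down"}}},
    "applications": {
        "kafka": {
            "units": {
                "kafka/0": {
                    "machine": "0",
                    "workload-status": {"current": "error"},
                }
            }
        }
    },
}
print(classify(example))
```

Checking the machine agents first means a unit in error on a dead machine is attributed to infrastructure, while a unit in error on a healthy machine is still reported as an ordinary failure.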
