Skip to content
This repository has been archived by the owner on Oct 11, 2019. It is now read-only.

VYGR-391: Add opsgenie integrations to notifications #120

Open
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

fcobb
Copy link

@fcobb fcobb commented Feb 5, 2019

What: This PR uses the Opsgenie integration manager client to add Opsgenie integrations into the service-metadata configmap, allowing autowiring functions to make use of the integrations.
Opsgenie integration manager client PR: #119

Manually tested the following scenarios in the playground:

  • Creating/deploying a service without opsgenie team then adding it, verifying that the configmap was updated.
  • Services with invalid opsgenie team will return an error back through processing.
  • Services without Opsgenie team will successfully process.

@fcobb fcobb requested a review from nilebox February 5, 2019 04:30
ash2k
ash2k previously requested changes Feb 5, 2019
cmd/synchronization/app/options.go Outdated Show resolved Hide resolved
pkg/apis/creator/v1/types.go Outdated Show resolved Hide resolved
pkg/servicecentral/metadata.go Outdated Show resolved Hide resolved
pkg/servicecentral/metadata.go Show resolved Hide resolved
pkg/synchronization/controller.go Outdated Show resolved Hide resolved
pkg/synchronization/controller.go Outdated Show resolved Hide resolved
pkg/synchronization/controller.go Outdated Show resolved Hide resolved
pkg/synchronization/controller.go Outdated Show resolved Hide resolved
pkg/synchronization/controller.go Outdated Show resolved Hide resolved
pkg/servicecentral/metadata.go Outdated Show resolved Hide resolved
pkg/apis/creator/v1/types.go Outdated Show resolved Hide resolved
pkg/servicecentral/metadata.go Outdated Show resolved Hide resolved
pkg/servicecentral/store.go Outdated Show resolved Hide resolved
pkg/synchronization/controller.go Outdated Show resolved Hide resolved
pkg/synchronization/controller.go Outdated Show resolved Hide resolved
pkg/synchronization/controller.go Outdated Show resolved Hide resolved
@fcobb fcobb force-pushed the fcobb/VYGR-391-create-opsgenie-int-manager-client branch from f6f5e25 to 04e1d7d Compare February 11, 2019 05:51
@fcobb fcobb self-assigned this Feb 12, 2019
@fcobb fcobb force-pushed the fcobb/VYGR-391-save-opsgenie-secret-during-sync branch from 0b5b2f0 to 005de8c Compare February 13, 2019 04:09
@fcobb fcobb changed the base branch from fcobb/VYGR-391-create-opsgenie-int-manager-client to master February 13, 2019 04:09
@fcobb fcobb changed the title [WIP] VYGR-391: Create opsgenie secret during sync [WIP] VYGR-391: Add opsgenie integrations to notifications Feb 15, 2019
@@ -459,41 +465,65 @@ func (c *Controller) getServiceData(user auth.OptionalUser, name voyager.Service
return c.ServiceCentral.GetService(context.Background(), user, servicecentral.ServiceName(name))
}

func (c *Controller) buildNotifications(spec creator_v1.ServiceSpec) (*orch_meta.Notifications, error) {
func (c *Controller) buildNotifications(spec creator_v1.ServiceSpec) (*orch_meta.Notifications, bool /* retriable */, error) {
// Default pagerduty values re-used from Micros config.js
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of return this at the end we now instantiate the notifications with defaults and override it if the pagerduty or opsgenie notifications creation is successful.

}, nil
}
}
integrations, retriable, err := c.getOpsgenieIntegrations(spec.Metadata.Opsgenie)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if team is not specified in Service Central? Will we get an empty list of integrations?

Copy link
Author

@fcobb fcobb Feb 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do a check at the start of the called function to see if spec.Metadata.Opsgenie is nil

func (c *Controller) getOpsgenieIntegrations(metadata *creator_v1.OpsgenieMetadata) ([]opsgenie.Integration, bool /* retriable */, error) {
	// Opsgenie is optional
	if metadata == nil {
		return nil, true, nil
	}

}

return &orch_meta.Notifications{
notifications = orch_meta.Notifications{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Email remains the same, we could set notifications.PagerdutyEndpoint and notifications.LowPriorityPagerdutyEndpoint to make it more explicit which parts do we want to modify.

return errors.Wrap(err, "unable to parse Deployinator URL")
if err != nil {
return errors.Wrap(err, "unable to parse Deployinator URL")
}
Copy link
Author

@fcobb fcobb Feb 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why this always returned an error and I cannot identify where this function is called.
I think its called in readAndValidateOptions when yaml.UnmarshalStrict is called, but if thats the case then I'm not sure why we always fail.
https://github.com/atlassian/voyager/blob/master/cmd/synchronization/app/options.go#L67

@ash2k Can you help me understand why we returned an error before?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like a bug, not sure how did it work

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EnvTypeDev EnvType = "dev"
EnvTypeStaging EnvType = "staging"
EnvTypeProd EnvType = "prod"
EnvTypeGlobal EnvType = "null" // Intentionally a string called "null" as this is the expected result from opsgenie int manager
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not global?

Copy link
Author

@fcobb fcobb Feb 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why its not called "global". @jokeyrhyme Do you know why?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fcobb Well, it's not that things are global, it's just that they aren't explicitly environment-specific
Maybe this should actually mean Global, and maybe we should actually emit "global" instead of "null"
When we put this together, the most important thing was making sure our CloudWatch alarms (which are environment-specific) were covered, everything else was secondary
I'm wondering whether we should call this Null or something so that we can meaningfully transition to Global in future?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jokeyrhyme I went for Global as if its not environment specific then its essentially global.
As you own this API I will defer these decisions to you and your team. My preference is to return a specific value for these non-environment specific values rather than null.


if found {
service.Attributes = append(service.Attributes, ogTeamAttr)
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, we filtered the service attributes before adding them to the service object. We now do that filtering later on.

@@ -393,34 +394,6 @@ func TestGetServiceWithFailedAttributesCall(t *testing.T) {
require.Equal(t, 2, handler.RequestSnapshots.Calls())
}

func TestGetServiceWithMultipleOpsGenieAttribute(t *testing.T) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test does not make sense anymore as it was making sure we errored on multiple OG attributes, this is no longer an error case as we handle multiple attributes later on.

maxSyncWorkers = 10
maxSyncWorkers = 10
defaultPagerdutyGeneric = "5d11612f25b840faaf77422edeff9c76"
defaultPagerdutyCloudwatch = "https://events.pagerduty.com/adapter/cloudwatch_sns/v1/124e0f010f214a9b9f30b768e7b18e69"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These defaults were in the buildNotifcations function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These values should come from configuration, not be constants. Please move them.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do this in a follow up PR as this PR is big enough.

@@ -459,41 +467,73 @@ func (c *Controller) getServiceData(user auth.OptionalUser, name voyager.Service
return c.ServiceCentral.GetService(context.Background(), user, servicecentral.ServiceName(name))
}

func (c *Controller) buildNotifications(spec creator_v1.ServiceSpec) (*orch_meta.Notifications, error) {
func (c *Controller) buildNotifications(spec creator_v1.ServiceSpec) (*orch_meta.Notifications, bool /* retriable */, error) {
Copy link
Author

@fcobb fcobb Feb 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is easier to understand the changes to this function if you do a side-by-side diff.

@@ -306,6 +306,11 @@ func serviceDataToService(data *ServiceDataRead) (*creator_v1.Service, error) {
service.Spec.Metadata.PagerDuty = pagerDutyMetadata
}

ogMetadata, _ := GetOpsGenieAttribute(data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we log an error?

Copy link
Author

@fcobb fcobb Feb 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we log an error?
Considering this is optional, Should this be an error or info message?

ogMetadata, err := GetOpsGenieAttribute(data) // Error ignored as Opsgenie team is optional
	if err != nil {
		c.logger.Info("unable to get Opsgenie attribute - ignoring", zap.Error(err))
	}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that failure to make a call to Opsgenie integration manager is still an error, so I would log it at the error level. We should react to being unable to fetch Opsgenie integrations rather than completely ignoring them.

@ash2k WDYT?

Copy link
Author

@fcobb fcobb Feb 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the path of calling OG int manager, this call will attempt to find any opsgenie attributes within the service object.

We call opsgenie int manager during the createOrUpdateServiceMetadata function when building the notifications. I agree that we should know when calls to OG int manager fail, and we return an error in that case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attempt to find any opsgenie attributes within the service object

If attribute not found, there won't be an error, but just a nil result AFAICT?
So then the error means that either the attribute format is invalid, or there is some bug in Voyager code, and we should also log with error, I think.

Copy link
Author

@fcobb fcobb Feb 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep you are correct, an error here is when:

  1. we found an issue with an Opsgenie attribute
  2. we found more than one Opsgenie attribute.

EDIT: As we have found a configured Opsgenie team but failed to process it, I am currently returning an error rather than logging it.

cmd/synchronization/app/options.go Show resolved Hide resolved
EnvTypeDev EnvType = "dev"
EnvTypeStaging EnvType = "staging"
EnvTypeProd EnvType = "prod"
EnvTypeGlobal EnvType = "null" // Intentionally a string called "null" as this is the expected result from opsgenie int manager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

pkg/servicecentral/client.go Outdated Show resolved Hide resolved
pkg/servicecentral/metadata.go Outdated Show resolved Hide resolved
return nil, nil // Not an error as Opsgenie is optional
}

filtered := make([]opsgenie.Integration, 0, 4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually do not pre-allocate if we don't exactly know the number of items. Just a style thing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to change this to make our style if you would like. We expect there to be exactly 4 but it is possible that they add more integrations in the future. What would you like to see in this scenario?

maxSyncWorkers = 10
maxSyncWorkers = 10
defaultPagerdutyGeneric = "5d11612f25b840faaf77422edeff9c76"
defaultPagerdutyCloudwatch = "https://events.pagerduty.com/adapter/cloudwatch_sns/v1/124e0f010f214a9b9f30b768e7b18e69"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These values should come from configuration, not be constants. Please move them.

EnvTypeDev EnvType = "dev"
EnvTypeStaging EnvType = "staging"
EnvTypeProd EnvType = "prod"
EnvTypeGlobal EnvType = "null" // Intentionally a string called "null" as this is the expected result from opsgenie int manager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not global?

@@ -17,7 +17,7 @@ import (
)

const (
opsGenieIntManURL = "https://micros.prod.atl-paas.net"
opsgenieIntManURL = "https://micros.prod.atl-paas.net"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use real data in tests please. Use fake data =)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a manual integration test, I don't think fake data is applicable.

cmd/synchronization/app/options.go Show resolved Hide resolved
ash2k
ash2k previously approved these changes Feb 18, 2019
Copy link
Contributor

@ash2k ash2k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine to be merged. Please address the issues/nits in a follow up PR. I think the manual opsgenie integration test should be moved to the internal repo.

ash2k
ash2k previously approved these changes Feb 18, 2019
@fcobb fcobb force-pushed the fcobb/VYGR-391-save-opsgenie-secret-during-sync branch from fcf889a to 434198e Compare February 20, 2019 05:48
@fcobb fcobb changed the title [WIP] VYGR-391: Add opsgenie integrations to notifications VYGR-391: Add opsgenie integrations to notifications Feb 20, 2019
@nilebox
Copy link
Contributor

nilebox commented Feb 21, 2019

As discussed, please add a test for Opsgenie integrations present in ServiceProperties (in entangler_test.go

@fcobb
Copy link
Author

fcobb commented Feb 21, 2019

As discussed, please add a test for Opsgenie integrations present in ServiceProperties (in entangler_test.go

Done

@nilebox
Copy link
Contributor

nilebox commented Feb 21, 2019

Make sure you block the PR in stash updating dependencies after you merge this change, and fix config.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants