Added armory kb documents #8308

Merged
merged 3 commits into from
Oct 16, 2024
@@ -0,0 +1,12 @@
---
title: 403 and Permission Errors when Enabling New Services
---

## Issue
Armory has found that some customers enabling new services in Spinnaker encounter various errors, including 403 access errors, when executing pipelines or performing other tasks.
These errors can usually be traced to changes in the policies on minimum access requirements in the customer's deployment.

## Cause
As general guidance for account role access, the AWS Power User role should be used when granting permissions.  However, customers may find that their internal security policy requires more granularly defined access. 


@@ -0,0 +1,15 @@
---
title: 403 Errors around GitHub access and Rate Limit Issues
---

## Issue
Customers may encounter 403 errors from GitHub when retrieving artifacts, using Dinghy, or calling other services.  There are a variety of reasons why this may happen, but the Clouddriver logs will often show the following error:

```com.netflix.spinnaker.clouddriver.artifacts.exceptions.FailedDownloadException: Unable to determine the download URL of artifact Artifact(type=github/file, customKind=false, name=null, version=main, location=null, reference=https://api.github.com/, metadata={}, artifactAccount=, provenance=null, uuid=null): Received 403 status code from api.github.com```


or a 403 status in the Spinnaker UI console.

## Cause
Multiple factors may be causing a GitHub 403 status.
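
One way to narrow down which factor applies is to look at the headers GitHub returns with the 403 response. The sketch below is a hypothetical helper, not part of Spinnaker or Clouddriver. It assumes the rate-limit headers documented for the GitHub REST API (`x-ratelimit-remaining`, `x-ratelimit-reset`, `retry-after`); the function name and the returned labels are illustrative only.

```python
from typing import Mapping


def classify_github_403(headers: Mapping[str, str]) -> str:
    """Guess why api.github.com answered 403, based on response headers.

    Hypothetical helper: the labels are illustrative, not Spinnaker output.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # Primary rate limit exhausted: no requests left in the current window.
    if lowered.get("x-ratelimit-remaining") == "0":
        reset = lowered.get("x-ratelimit-reset", "unknown")
        return f"primary rate limit exhausted (resets at epoch {reset})"

    # Secondary rate limits usually ask the client to wait before retrying.
    if "retry-after" in lowered:
        return f"secondary rate limit (retry after {lowered['retry-after']}s)"

    # Otherwise the token most likely lacks access to the repository or file.
    return "likely a token permission or scope problem"
```

If the result points at a rate limit, waiting for the reset time or reducing polling is a reasonable first step; if it points at permissions, the token configured for the artifact account is the place to look.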

@@ -0,0 +1,10 @@
---
title: Access denied to application on Deploy Stage without application WRITE permissions
---

## Issue
Currently, application ```WRITE``` permissions are required to use the Deploy Stage in Spinnaker with AWS, GCP, Titus and other deployment targets. However, when using the Kubernetes provider, only ```EXECUTE``` permissions are needed to use the Deploy Stage. This is an open issue in OSS Spinnaker: [spinnaker/spinnaker#6400](https://github.com/spinnaker/spinnaker/issues/6400)

```EXECUTE``` application permissions were developed recently and, so far, have only been fully implemented on the Kubernetes V2 provider. It is possible to set ```EXECUTE``` permissions on the Deploy Stage for other targets; however, the stage will also require application ```WRITE``` permissions to run successfully.

## Cause
The Deploy stage on **targets other than the Kubernetes provider** does not run successfully because application ```WRITE``` permissions are not set.
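
The rule described above can be sketched as a small check. This is an illustrative model only, not Fiat or Orca code: the function names are hypothetical, and the permissions mapping simply mirrors the `READ`/`EXECUTE`/`WRITE` role lists used in Spinnaker application settings. It checks only the single permission named by the rule and does not model whether one permission implies another.

```python
from typing import Dict, Iterable, List


def required_permission(provider: str) -> str:
    """Permission the Deploy stage needs, per the behaviour described above."""
    # Only the Kubernetes provider fully honours EXECUTE for the Deploy stage.
    return "EXECUTE" if provider.lower() == "kubernetes" else "WRITE"


def can_run_deploy_stage(
    permissions: Dict[str, List[str]],
    user_roles: Iterable[str],
    provider: str,
) -> bool:
    """Return True if any of the user's roles holds the required permission."""
    needed = required_permission(provider)
    allowed_roles = set(permissions.get(needed, []))
    return bool(allowed_roles & set(user_roles))


# Example application permissions (role names are made up for illustration).
app_permissions = {
    "READ": ["superadmins", "qa"],
    "EXECUTE": ["superadmins", "qa"],
    "WRITE": ["superadmins"],
}
```

With these example permissions, a user in the `qa` role could run a Deploy stage against a Kubernetes account but not against an AWS account.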

@@ -0,0 +1,22 @@
---
title: Accessing Armory Scale Agent Endpoints to help in Troubleshooting
---

## Introduction
Customers using Armory Scale Agent for Kubernetes may encounter issues when running Kubernetes deployments.  The Agent endpoints below can be used to gather more information to aid in troubleshooting.  This KB article explains how to access those endpoints.
Customers may also want to access the Clouddriver endpoints and dig into the information found there.  They can do so by following this KB article: [https://support.armory.io/support?id=kb_article_view&sysparm_article=KB0010601](https://support.armory.io/support?id=kb_article_view&sysparm_article=KB0010601)

## Prerequisites
* Armory Enterprise Spinnaker with Armory Scale Agent for Kubernetes enabled.
* Access to the cluster in which Agent is deployed.
* Users also need to port-forward the Agent pod to access the endpoints. The process to do so can be found below in the Instructions section.

## Instructions
**Port forward to the Agent port**
Customers first need to set up a port forward to the Agent pod by executing the command below. After running the command, the Agent endpoints will be accessible on ```localhost:8082```

```kubectl port-forward pod/armory-agent-xxx 8082```

**Get details about the Armory Agent accounts**
To get details about the accounts that are configured in Armory Agent, and whether they were loaded after the Agent started, invoke the endpoint below

```curl -kv http://localhost:8082/accounts/```

**Get the Agent goroutines**
Armory Agent is written in Golang. For troubleshooting purposes, it might be necessary to capture the list of ```goroutines``` running within the Agent to see whether a particular function is being executed. Invoking the endpoint below returns the list of ```goroutines``` within Agent.

```curl -kv http://localhost:8082/debug/pprof/goroutine?debug=1```
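
Once the goroutine dump has been saved to a file, checking whether a particular function appears in it can be scripted. The following is a minimal sketch, assuming the usual text layout of Go's `debug=1` goroutine profile, in which stack frames are the lines that begin with `#`. The function names in the sample (`main.watchAccounts`, `main.serveHTTP`) are made up for illustration and are not real Agent functions.

```python
from typing import List


def goroutine_frames_matching(dump: str, function_name: str) -> List[str]:
    """Return stack-frame lines from a pprof text dump that mention a function."""
    matches = []
    for line in dump.splitlines():
        # In the debug=1 text format, stack frames are the lines starting with '#'.
        if line.startswith("#") and function_name in line:
            matches.append(line.strip())
    return matches


# Hypothetical sample in the shape of a debug=1 goroutine profile.
sample_dump = (
    "goroutine profile: total 2\n"
    "1 @ 0x43a8d6 0x44a7c2\n"
    "#\t0x46a3a4\tmain.watchAccounts+0x84\t/app/main.go:42\n"
    "\n"
    "1 @ 0x43a8d6 0x40b1c5\n"
    "#\t0x40b1c4\tmain.serveHTTP+0x24\t/app/server.go:17\n"
)

frames = goroutine_frames_matching(sample_dump, "main.watchAccounts")
```

An empty result suggests that no goroutine was executing the function at the moment the dump was taken, which is a snapshot rather than a history.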

@@ -0,0 +1,32 @@
---
title: Accessing Clouddriver Endpoints to help in Troubleshooting Armory Scale Agent issues
---

## Introduction
Customers using Armory Agent for Kubernetes may run into issues when running Kubernetes deployments.  Customers may want to access Clouddriver endpoints to troubleshoot Armory Scale Agent issues.  This article explains how to access the Clouddriver endpoints that help with troubleshooting.
For details on accessing Scale Agent Endpoints, please visit: [https://support.armory.io/support?id=kb_article_view&sysparm_article=KB0010602](https://support.armory.io/support?id=kb_article_view&sysparm_article=KB0010602)

## Prerequisites
* Armory Enterprise Spinnaker with Armory Agent for Kubernetes enabled.
* Access to the cluster in which Spinnaker services are deployed.
* Users also need to port-forward the ```Clouddriver port 7002``` to access Clouddriver endpoints. The process to do so can be found below in the Instructions section.

## Instructions
### Port Forward to access Clouddriver ports
To access the Clouddriver service on port 7002, execute the port-forward command below. After running the command, the Clouddriver service is accessible on ```localhost:7002```
```kubectl port-forward svc/spin-clouddriver -n spinnaker-namespace 7002:7002```
### List agents, accounts, and the Clouddriver instances they are connected to
The endpoint below returns a JSON response that contains
* The list of agents
* The accounts each agent is accessing
* The specific Clouddriver pods that the Agents are connected to/registered with

```curl -kv http://localhost:7002/armory/agents```

### Attain the steps that a deploy operation went through
When a deployment stage that contains an Agent account is triggered, an ```operation ID``` is generated by the plugin and the operation goes through a series of steps.
* The request is first received by a Clouddriver instance.
* If there are multiple replicas of Clouddriver, the operation is passed on to the specific Clouddriver instance that is connected to/registered with the Armory Agent handling the request. A lookup table is used to correlate the operation with the account where the deployment is supposed to be triggered.
* The operation is then passed on to the Agent.
* Once the Agent has carried out the operation, the result follows the return route through the corresponding Clouddriver instances, and the response is finally returned to Orca.

Below is the endpoint to track the steps that the operation went through

```curl -kv http://localhost:7002/armory/agent/operations/{opId}```

In the case of an error, the ```operation ID``` should be displayed in the Spinnaker Console UI.  To locate the ```operation ID``` for a successful execution, query the backend table ```kubesvc_ops_history```.  For more information on querying tables related to Agent, please read the following KB article: [https://support.armory.io/support?id=kb_article&sysparm_article=KB0010603](https://support.armory.io/support?id=kb_article&sysparm_article=KB0010603).

### List Clouddriver instances
Invoking the endpoint below returns the list of Clouddriver replicas that have the plugin enabled
```curl -kv http://localhost:7002/armory/clouddrivers```
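
The three endpoints in this article can also be queried from a script. The sketch below is a hypothetical helper that uses only the Python standard library and the paths shown above. It assumes the port forward to `localhost:7002` is active, and it makes no assumption about the response schema beyond the responses being JSON.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://localhost:7002"  # assumes the port forward above is active


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path without doubling slashes."""
    return f"{base_url.rstrip('/')}/{path.lstrip('/')}"


def operation_url(op_id: str, base_url: str = BASE_URL) -> str:
    """URL that reports the steps a given operation went through."""
    return endpoint_url(f"armory/agent/operations/{op_id}", base_url)


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and decode the JSON body; raises on HTTP or network errors."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Example usage (requires the port forward, so it is left commented out):
# print(json.dumps(fetch_json(endpoint_url("armory/agents")), indent=2))
# print(json.dumps(fetch_json(endpoint_url("armory/clouddrivers")), indent=2))
# print(json.dumps(fetch_json(operation_url("<opId>")), indent=2))
```

Keeping URL construction separate from the network call makes it easy to point the same helper at a different local port if 7002 is already in use.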


@@ -0,0 +1,81 @@
---
title: Accessing the Spinnaker REST API using HTTPie and jq
---

## Introduction
This article explains how to use ```HTTPie``` to access the ```Spinnaker REST API``` and provides guidelines for using the Spinnaker API programmatically.

## Prerequisites
The ```Spinnaker REST API``` can be accessed using ```HTTPie```, and the results can be filtered and formatted with ```jq```. The following are the prerequisites for accessing the Spinnaker API in this KB:
* The Spinnaker API (Gate) URL.
* HTTPie ([https://httpie.io/](https://httpie.io/)) and jq ([https://stedolan.github.io/jq/download/](https://stedolan.github.io/jq/download/)), downloaded and installed.

**Note:** For long-term use, consider using ```x509 certificates``` to make the calls to Spinnaker. For development cases, a session cookie obtained after logging in to Spinnaker can be used to test the ```REST API``` call.
In the examples below, a session cookie is used.

## Instructions
### Obtain a session cookie
* Log on to the ```Deck GUI```.
* Open the **Developer Tools** of your browser.
* Go to the **Network** tab and find the call to **user** in the left pane.
* Look for the **Request Headers** and the **cookie** parameter. This cookie will need to be set when making API calls.
* Note that this cookie expires after some time and is not permanent.

### Using the Spinnaker REST API
The following examples set the parameters as shown below.

    # Gate URL
    GATE=http://x.x.x.x

    # Spinnaker Gate session cookie; the value below is just an example.
    SPINCOOK='Cookie:SESSION=YjdhNDg3NmQtZjE5Zi00xxxxxxxxxx'

    # Spinnaker pipeline ID; the value below is just an example.
    PIPEID='01EWXEEZTZPX09FE86XA7S75BA'

    # Spinnaker application name
    APP=spintest

    # Create an HTTPie session named "test", then use that session name for all following calls.
    # --verify=no skips host SSL certificate verification.
    # It should only be used if Gate is using a self-signed certificate.
    # The session stores the cookie, so it does not have to be passed on later calls.
    http --verify=no --session=test "$GATE/applications" "$SPINCOOK"

    # Create/update applications and projects (see the example min-app-create.json below).
    http --verify=no --session=test POST "$GATE/tasks" < min-app-create.json


The following is an example of ```min-app-create.json``` file which will create an application ```app1``` with minimal configuration and can be used as a test.
```
{
  "job": [
    {
      "type": "createApplication",
      "application": {
        "cloudProviders": "kubernetes",
        "instancePort": 80,
        "providerSettings": {
          "aws": {
            "useAmiBlockDeviceMappings": false
          }
        },
        "name": "app1",
        "email": "app@example.com",
        "permissions": {
          "READ": [
            "superadmins",
            "qa"
          ],
          "EXECUTE": [
            "superadmins"
          ],
          "WRITE": [
            "superadmins"
          ]
        }
      },
      "user": "test_user"
    }
  ],
  "application": "app1",
  "description": "Create Application: app1"
}
```
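
When several applications need to be created, the JSON body above can be generated instead of written by hand. The following is a minimal sketch, assuming the same task shape as ```min-app-create.json```; the function name is hypothetical, and the optional ```providerSettings``` block is omitted for brevity.

```python
import json
from typing import Dict, List


def build_create_application_task(
    name: str,
    email: str,
    permissions: Dict[str, List[str]],
    user: str,
    cloud_providers: str = "kubernetes",
) -> dict:
    """Build a createApplication task body shaped like min-app-create.json."""
    return {
        "job": [
            {
                "type": "createApplication",
                "application": {
                    "cloudProviders": cloud_providers,
                    "instancePort": 80,
                    "name": name,
                    "email": email,
                    "permissions": permissions,
                },
                "user": user,
            }
        ],
        "application": name,
        "description": f"Create Application: {name}",
    }


task = build_create_application_task(
    name="app1",
    email="app@example.com",
    permissions={
        "READ": ["superadmins", "qa"],
        "EXECUTE": ["superadmins"],
        "WRITE": ["superadmins"],
    },
    user="test_user",
)
body = json.dumps(task, indent=2)  # ready to be written to a file and POSTed to Gate
```

The generated `body` can be saved to a file and sent to the `/tasks` endpoint of Gate in the same way as the hand-written example.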
For more information on the Spinnaker API please refer to: [https://spinnaker.io/reference/api](https://spinnaker.io/reference/api) and [https://docs.armory.io/docs/spinnaker-user-guides/writing-scripts/](https://docs.armory.io/docs/spinnaker-user-guides/writing-scripts/).

@@ -0,0 +1,13 @@
---
title: Accounts with liveManifestCalls Set to True Have Incorrect Dynamic Lookup Results
---

## Issue
After enabling ```liveManifestCalls: true```, the environment begins exhibiting odd behaviors.  Resources that were deployed or changed in previous stages are not taken into consideration in current stages, leading to errors and issues in the pipeline deployment. This can be especially detrimental to pipelines that use a rollout strategy along with a ```strategy.spinnaker.io/max-version-history``` annotation, causing an inconsistent state of the deployment targets as well as pipeline failures.

## Cause
When the flag is set to true, the 'Deploy Manifest' stage waits for the newly deployed resource by polling the cluster directly instead of checking Spinnaker's cache. In general, the stage finishes more quickly, as it can complete as soon as the resource is ready instead of once the new resource is reflected in the cache.
One significant issue, though, is that the stage may complete before the cache reflects these changes. Spinnaker expects that stages mutating infrastructure will not complete until the cache has been updated to reflect these mutations.
**The result is that any downstream stages that rely on the cache being up-to-date (as stages are generally allowed to do) will either fail or produce incorrect results**.
This affects any stage that uses **dynamic target selection** to patch/enable/disable a resource. Such stages look in the cache to find the oldest/newest/etc. resource and act on the state of the cache at the time they run. Rollout strategies are another case where dynamic target selection is affected.  Because the cache is not up to date, a resource that was newly deployed, deleted, or patched by a prior stage can be omitted.

For pipelines that use a rollout strategy along with a ```strategy.spinnaker.io/max-version-history``` annotation, this can be especially painful.  When a maximum version history is set, at the time of execution **Clouddriver knows only of N existing replicasets** and will try to disable the **N-1 older replicasets**. There are therefore situations where Orca plans X Disable Manifest tasks, but the oldest replicaset has already been deleted by the time the task executes, causing a pipeline failure. Furthermore, if a failed pipeline is re-executed two, three, or more times until it succeeds, the deployment targets can end up in a very inconsistent state, depending on which Disable Manifest tasks completed and which did not.
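
The failure mode can be illustrated with a toy model. The sketch below is not Orca or Clouddriver code; the function names and the replicaset names are hypothetical. It plans Disable Manifest tasks from a stale cached view (every replicaset except the newest one it knows about) and then runs them against the live cluster, where the oldest replicaset has already been removed and a newer one has been deployed.

```python
from typing import Dict, List, Set


def plan_disable_tasks(cached: List[str]) -> List[str]:
    """Plan to disable every replicaset except the newest one in the cache."""
    ordered = sorted(cached)  # toy ordering: v001 < v002 < ...
    return ordered[:-1]


def run_disable_tasks(planned: List[str], live: Set[str]) -> Dict[str, str]:
    """Run each planned task against the live cluster state."""
    results = {}
    for name in planned:
        # A task fails when its target no longer exists in the cluster.
        results[name] = "disabled" if name in live else "failed: not found"
    return results


# Stale cache: does not yet know that v001 was deleted and v004 was deployed.
cache_view = ["app-v001", "app-v002", "app-v003"]
live_view = {"app-v002", "app-v003", "app-v004"}

results = run_disable_tasks(plan_disable_tasks(cache_view), live_view)
```

In this toy run the task for `app-v001` fails because the replicaset is already gone, and `app-v003` is never disabled because the stale cache still treats it as the newest version.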

@@ -0,0 +1,16 @@
---
title: Adding a group with many members to application times out in GCP
---

## Issue
An organization using Spinnaker and Google Cloud Platform (GCP) may run into API rate limiting issues involving Fiat Managed Service Accounts.  These rate limits can cause errors, timeouts, and general non-responsiveness where full functionality is expected.

Example Error: 
```202X-XX-XX XX:XX:XX.453 WARN 1 — [ scheduling-1] s.f.r.g.GoogleDirectoryUserRolesProvider : [] Failed to fetch groups for user xx####x#-xx#x-##xx-x#x#-###c####x#x#@managed-service-account: Invalid Input: memberKey" ```



## Cause
Spinnaker uses the ```Google Workspaces API``` to manage and run ```FIAT``` Managed Service Accounts, which results in consistent, frequent API calls whenever managed service accounts are used.  As a result, an organization can be temporarily blocked and rate limited due to excessive usage.
Specifically, requests are made to Google APIs whenever pipelines execute using the managed service account from an automated trigger, causing multiple API calls for routine deployments.

@@ -0,0 +1,15 @@
---
title: Adding Additional Policies and Access to an EKS Cluster
---

## Introduction
As environments become more complex and additional security requirements become necessary for the nodes and cluster of the environment, some fundamental policies and permissions should be considered, depending on the particular security policy of the environment.

## Prerequisites
Access to the cluster, roles and policy for the AWS environment

## Instructions
When deploying the EKS environment, IAM Roles are usually used to manage the security and access available to the environment.  The roles and permissions that allow Spinnaker to interact with the AWS environment are tied to the **EKS Cluster Node IAM Role**, which can be found as follows:
* Log in to the AWS Management Console
* Go to the EKS Administration portal
* On the left menu, select **Clusters** under **Amazon EKS** and select the appropriate cluster name
* Click on the **Configuration** tab, then the **Compute** tab
* Click on the appropriate **Node Group**
* In the information for the Node Group, there will be an entry for the **Node IAM Role ARN**. Access needs to be provided to this role so that the cluster can access the appropriate resources (e.g. **Secrets Manager** or **S3 Buckets** for the storage of Secrets)
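
As an illustration of the kind of access that might be attached to the Node IAM Role, the sketch below builds an IAM policy document granting read access to secrets and bucket objects. It is an assumption-laden example, not a recommended policy: the function name, account ID, secret name, and bucket name are placeholders, and the actions shown (`secretsmanager:GetSecretValue`, `s3:GetObject`) should be adjusted to the environment's own security policy.

```python
import json
from typing import List


def build_node_role_policy(secret_arns: List[str], bucket_arns: List[str]) -> dict:
    """Build an IAM policy document granting read access to secrets and objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadSecrets",
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": secret_arns,
            },
            {
                "Sid": "ReadBucketObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # Object-level actions apply to the keys inside the bucket.
                "Resource": [f"{arn}/*" for arn in bucket_arns],
            },
        ],
    }


# Placeholder ARNs for illustration only.
policy = build_node_role_policy(
    secret_arns=["arn:aws:secretsmanager:us-east-1:111122223333:secret:example-secret-*"],
    bucket_arns=["arn:aws:s3:::example-spinnaker-secrets"],
)
print(json.dumps(policy, indent=2))
```

Scoping each statement to specific ARNs, rather than `*`, keeps the node role closer to a least-privilege setup.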

13 changes: 13 additions & 0 deletions kb/armory/General/adjust-clean-up-timing-for-armory-agent.md
@@ -0,0 +1,13 @@
---
title: Adjust Clean Up Timing for Armory Agent
---

## Introduction
If Clouddriver is unable to reach an Agent for a set amount of time, the Agent is marked for cleanup.  Settings on Clouddriver can be adjusted to change how often missing or unreachable Agents are cleaned up.

## Prerequisites
Armory Agent should be [installed and configured](https://docs.armory.io/docs/armory-agent/)

## Instructions
To adjust the cleanup timing, change the ```kubesvc.cache.accountCleanupFrequencySeconds``` value in the plugin configuration. By default, the cleanup frequency is set to 600 seconds (10 minutes). For further information on adjusting the value on the plugin, please refer to: [https://docs.armory.io/docs/armory-agent/agent-plugin-options/](https://docs.armory.io/docs/armory-agent/agent-plugin-options/)
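
Dotted property names such as the one above correspond to nested keys when written in YAML. The sketch below is a small hypothetical helper that expands a dotted name into that nested shape; the value `300` is only an example, and where the resulting block belongs in the plugin configuration depends on the installation method described in the linked documentation.

```python
from typing import Any, Dict


def expand_property(dotted_key: str, value: Any) -> Dict[str, Any]:
    """Expand a dotted property name into the nested mapping used in YAML."""
    nested: Any = value
    # Wrap the value from the innermost key outwards.
    for part in reversed(dotted_key.split(".")):
        nested = {part: nested}
    return nested


# Example value only; the default described above is 600 seconds.
setting = expand_property("kubesvc.cache.accountCleanupFrequencySeconds", 300)
```

Here `setting` is a mapping with `kubesvc` at the top, `cache` beneath it, and `accountCleanupFrequencySeconds` as the innermost key.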

@@ -0,0 +1,22 @@
---
title: Agent Deployment Error- exceptions.OperationTimedOutException- Timeout exceeded for operation
---

## Issue
When using Agent, customers may see the following error during a deployment execution:

```
message="com.netflix.spinnaker.clouddriver.exceptions.OperationTimedOutException: Timeout exceeded for operation ......., type: 6, account: .....-ingress-dev, time: 30022 ms, timeout: 30000 ms
at io.armory.kubesvc.services.ops.KubesvcOperations.performOperation(KubesvcOperations.java:96)
at io.armory.kubesvc.services.ops.cluster.ClusteredKubesvcOperations.performOperation(ClusteredKubesvcOperations.java:70)
at io.armory.kubesvc.util.OperationUtils.perform(OperationUtils.java:76)
at io.armory.kubesvc.services.ops.executor.KubesvcExecutor.deploy(KubesvcExecutor.java:301)
at jdk.internal.reflect.GeneratedMethodAccessor703.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
```

## Cause
The error indicates that Clouddriver sent a ```deploy operation``` to the kubesvc Agent and did not receive the result back from the Agent within the timeout (30000 ms in the example above).
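
When collecting these errors from logs, it can help to extract how far each operation overran its timeout. The following is a minimal sketch that parses the `time` and `timeout` values from a message in the format shown above; the function name is hypothetical and the pattern assumes the message layout does not change.

```python
import re
from typing import Optional, Tuple

TIMEOUT_PATTERN = re.compile(r"time: (\d+) ms, timeout: (\d+) ms")


def parse_operation_timeout(message: str) -> Optional[Tuple[int, int]]:
    """Return (elapsed_ms, timeout_ms) from an OperationTimedOutException message."""
    match = TIMEOUT_PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = (
    "com.netflix.spinnaker.clouddriver.exceptions.OperationTimedOutException: "
    "Timeout exceeded for operation ......., type: 6, "
    "account: .....-ingress-dev, time: 30022 ms, timeout: 30000 ms"
)
elapsed_ms, timeout_ms = parse_operation_timeout(line)
overrun_ms = elapsed_ms - timeout_ms  # how far past the timeout the operation ran
```

An overrun of only a few milliseconds, as in this example, shows that Clouddriver gave up when the timeout expired without a result; it does not indicate how long the Agent actually needed.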

