dogswatch: initial kubernetes operator #239

Merged Nov 15, 2019 (51 commits)

Commits
cefce8c dogswatch: init core with stub updog (jahkeup, Sep 6, 2019)
7599e25 dogswatch: make constants true strings (jahkeup, Sep 6, 2019)
85725f0 dogswatch: tie in patching and node state (jahkeup, Sep 10, 2019)
00b9edf dogswatch: push policy and action into controller (jahkeup, Sep 11, 2019)
9435380 dogswatch: move intent out and have agent respond (jahkeup, Sep 11, 2019)
2a65dea dogswatch: rework into controller management loop (jahkeup, Sep 11, 2019)
cfeed5f dogswatch: remove unused strings (jahkeup, Sep 12, 2019)
4693ba1 dogswatch: handle intent and clarify their names (jahkeup, Sep 12, 2019)
b86d89e dogswatch: handle state propagation and response (jahkeup, Sep 17, 2019)
58e0721 dogswatch: return a stub UpdateID for MVP (jahkeup, Sep 17, 2019)
1efaf64 dogswatch: handle preflight on the agent side (jahkeup, Sep 17, 2019)
0c0c128 dogswatch: handle reset of state in intent (jahkeup, Sep 17, 2019)
08968d5 dogswatch: delete unused code (jahkeup, Sep 17, 2019)
0b1c703 dogswatch: accept interface in making intent (jahkeup, Sep 17, 2019)
152da4a dogswatch: update todos (jahkeup, Sep 18, 2019)
08c95cd dogswatch: add tests for intent (jahkeup, Sep 18, 2019)
d582b0c dogswatch: update intent predicates and tests (jahkeup, Sep 18, 2019)
05cb5bc dogswatch: clarify and document predicate methods (jahkeup, Sep 19, 2019)
3f3bc69 dogswatch: fix usage (jahkeup, Sep 19, 2019)
dfec08b dogswatch: clean up predicate logic and comments (jahkeup, Sep 19, 2019)
db0a759 dogswatch: plumb nodeName for each process (jahkeup, Sep 23, 2019)
06e6a97 dogswatch: go mod tidy (jahkeup, Sep 23, 2019)
6e95295 dogswatch: update intent and tests (jahkeup, Sep 27, 2019)
7da889f dogswatch: simplify calls to updog (jahkeup, Sep 27, 2019)
33ab26c dogswatch: add development cluster resources (jahkeup, Sep 27, 2019)
2c5e7fb dogswatch: isolate some cases and fix loops (jahkeup, Sep 27, 2019)
6cc0541 dogswatch: add development targets (jahkeup, Sep 27, 2019)
0517ce7 dogswatch: reset intent and replace properly (jahkeup, Sep 27, 2019)
df17abd dogswatch: add tests for manager gating logic (jahkeup, Sep 27, 2019)
3b4240e dogswatch: map thar rootfs to exec with (jahkeup, Sep 27, 2019)
33851e7 dogswatch: echo echo echo echo (jahkeup, Sep 27, 2019)
378a591 dogswatch: add dev targets for uploading to ECR (jahkeup, Sep 28, 2019)
b262f6b dogswatch: update deployment yaml to support ECR (jahkeup, Sep 28, 2019)
d862836 dogswatch: remove stale code (jahkeup, Sep 30, 2019)
8d92ea1 dogswatch: use local cache for policy check (jahkeup, Sep 30, 2019)
69cb3ac dogswatch: add manager tests (jahkeup, Sep 30, 2019)
d912fb6 dogswatch: move to extras/ (jahkeup, Sep 30, 2019)
fde6115 dogswatch: pull images more eagerly (jahkeup, Oct 1, 2019)
467b98e dogswatch: periodically check for updates (jahkeup, Oct 1, 2019)
735dee8 dogswatch: make checkers more aptly named (jahkeup, Oct 1, 2019)
e38994f dogswatch: remove updog check that errors (jahkeup, Oct 1, 2019)
c99cf50 dogswatch: handle success uncordoning (jahkeup, Oct 4, 2019)
f149da2 dogswatch: update checks for intent handling (jahkeup, Oct 4, 2019)
a112336 dogswatch: add tests to controller (jahkeup, Oct 4, 2019)
0b811c2 dogswatch: refactor processs and poster interfaces (jahkeup, Oct 5, 2019)
dd18161 dogswatch: add testoutput (jahkeup, Oct 5, 2019)
545bae6 dogswatch: add initial docs (jahkeup, Nov 11, 2019)
21ee352 dogswatch: remove buildkit caching (jahkeup, Nov 12, 2019)
990ace3 dogswatch: update docstrings (jahkeup, Nov 12, 2019)
7e86d64 dogswatch: revise log to debuggable logging (jahkeup, Nov 12, 2019)
c8c4a0f dogswatch: Update logging, climits, list update. (patraw, Nov 5, 2019)
5 changes: 5 additions & 0 deletions extras/dogswatch/.dockerignore
@@ -0,0 +1,5 @@
.direnv/
*.nix
.envrc
*.el
*.tar*
11 changes: 11 additions & 0 deletions extras/dogswatch/Dockerfile
@@ -0,0 +1,11 @@
# syntax=docker/dockerfile:experimental
FROM golang:1.13 as builder
ENV GOPROXY=direct
COPY ./ /go/src/github.com/amazonlinux/thar/dogswatch/
RUN cd /go/src/github.com/amazonlinux/thar/dogswatch && \
CGO_ENABLED=0 GOOS=linux go build -o dogswatch . && mv dogswatch /dogswatch

FROM scratch
COPY --from=builder /dogswatch /etc/ssl /
ENTRYPOINT ["/dogswatch"]
CMD ["-help"]
47 changes: 47 additions & 0 deletions extras/dogswatch/Makefile
@@ -0,0 +1,47 @@
GOPKG = github.com/amazonlinux/thar/dogswatch
GOPKGS = $(GOPKG) $(GOPKG)/pkg/... $(GOPKG)/cmd/...
GOBIN = ./bin/
DOCKER_IMAGE := dogswatch
DOCKER_IMAGE_REF := $(DOCKER_IMAGE):$(shell git describe --always --dirty)

build: $(GOBIN)
	cd $(GOBIN) && \
	go build -v -x $(GOPKG) && \
	go build -v -x $(GOPKG)/cmd/...

$(GOBIN):
	mkdir -p $(GOBIN)

test:
	go test -ldflags '-X $(GOPKG)/pkg/logging.DebugEnable=true' $(GOPKGS)

container: vendor
	docker build --network=host -t $(DOCKER_IMAGE_REF) .

load: container
	kind load docker-image $(DOCKER_IMAGE)

vendor: go.sum go.mod
	CGO_ENABLED=0 GOOS=linux go mod vendor
	touch vendor/

deploy:
	sed 's,@containerRef@,$(DOCKER_IMAGE_REF),g' ./dev/deployment.yaml \
		| kubectl apply -f -

rollout: deploy
	kubectl -n thar rollout restart deployment/dogswatch-controller
	kubectl -n thar rollout restart daemonset/dogswatch-agent

rollout-kind: load rollout

cluster:
	kind create cluster --config ./dev/cluster.yaml

dashboard:
	kubectl apply -f ./dev/dashboard.yaml
	@echo 'Visit dashboard at: http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/'
	kubectl proxy

get-nodes-status:
	kubectl get nodes -o json | jq -C -S '.items| map({(.metadata.name): (.metadata.labels * .metadata.annotations)})'
105 changes: 105 additions & 0 deletions extras/dogswatch/README.md
@@ -0,0 +1,105 @@
# Dogswatch: Update Operator

Dogswatch is a [Kubernetes operator](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/) that coordinates update activities on Thar hosts in a Kubernetes cluster.

## How to Run on Kubernetes

To run the Dogswatch operator in your Kubernetes cluster, the following resources and configuration are required (examples are given in the [./dev/deployment.yaml](./dev/deployment.yaml) template):

- **`dogswatch` Container Image**

  Holding the Dogswatch binaries and their supporting environment.

- **Controller Deployment**

  Scheduling a stop-restart-tolerant Controller process on available Nodes.

- **Agent DaemonSet**

  Scheduling the Agent process on Thar hosts.

- **Thar Namespace**

  Grouping Thar-related resources and roles.

- **Service Account for the Agent**

  Configured for authenticating the Agent process to the Kubernetes APIs.

- **Cluster privileged credentials with read-write access to Nodes for the Agent**

  Applied to the Agent Service Account so that it can update annotations on the Node resource it runs on.

- **Service Account for the Controller**

  Configured for authenticating the Controller process to the Kubernetes APIs.

- **Cluster privileged credentials with access to Pods and Nodes for the Controller**

  Applied to the Controller Service Account for manipulating annotations on Node resources, as well as cordoning and uncordoning Nodes for updates.
  The Controller must also be able to un-schedule (`delete`) Pods running on Nodes that will be updated.

In the [./dev/deployment.yaml example](./dev/deployment.yaml), the resources specify the conditions under which the Kubernetes scheduler will place them in the Cluster.
These conditions include the Node being labeled as having the required level of support for the Operator to function on it: the `thar.amazonaws.com/platform-version` label.
With this label present and the workloads scheduled, the Agent and Controller processes will coordinate an update as soon as the Agent annotates its Node (by default, only one update will happen at a time).

To use the example [./dev/deployment.yaml](./dev/deployment.yaml) as a base, you must modify the resources to use the appropriate container image that is available to your kubelets (a common image is forthcoming, see #505).
Then, with an appropriately configured deployment yaml, you may call `kubectl apply -f ./my-deployment.yaml` to prepare the above resources and schedule the Dogswatch Pods in your Cluster.

## What Makes Up Dogswatch

Dogswatch is made up of two distinct processes, one of which runs on each host.

- `dogswatch -controller`

  The coordinating process responsible for handling updates of Thar nodes
  cooperatively with the cluster's workloads.

- `dogswatch -agent`

  The on-host process responsible for publishing update metadata and executing
  update activities.
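
As a rough sketch, a single binary could dispatch between these two roles with standard flag parsing; this is illustrative only, and the real dogswatch entry point may be wired differently:

```go
package main

import (
	"flag"
	"log"
)

func main() {
	controller := flag.Bool("controller", false, "run the cluster-wide controller process")
	agent := flag.Bool("agent", false, "run the on-host agent process")
	flag.Parse()

	switch {
	case *controller:
		log.Println("starting controller") // coordinates updates across Nodes
	case *agent:
		log.Println("starting agent") // publishes update metadata, runs updates
	default:
		flag.Usage()
	}
}
```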

## How It Coordinates

The Dogswatch processes communicate by applying updates to the Annotations on the Kubernetes Node resources.
The Annotations are used to communicate the Agent activity (called an `intent`) as determined by the Controller process, the current Agent activity in response to that intent, and the Host's update status as known by the Agent process.
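
For illustration, the snippet below shows how a process could post such an Annotation with a `client-go` strategic-merge patch; the annotation key `thar.amazonaws.com/intent`, its value, and the node name are hypothetical stand-ins, not the actual names dogswatch uses:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Post an intent by merging an annotation onto the Node resource.
	// Key, value, and node name below are illustrative placeholders.
	patch := []byte(`{"metadata":{"annotations":{"thar.amazonaws.com/intent":"prepare-update"}}}`)
	node, err := client.CoreV1().Nodes().Patch(
		context.TODO(), "my-node", types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println(node.Annotations)
}
```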

The Agent and Controller processes listen to an event stream from the Kubernetes cluster in order to quickly and reliably handle each communicated `intent`, in addition to updated metadata pertinent to updates and the Operator itself.
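
A minimal sketch of that event-driven pattern with a shared informer (again using a hypothetical annotation key) might look like:

```go
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Watch Node events and react when an intent annotation appears or changes.
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	nodeInformer := factory.Core().V1().Nodes().Informer()
	nodeInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			node := newObj.(*corev1.Node)
			if intent, ok := node.Annotations["thar.amazonaws.com/intent"]; ok {
				fmt.Printf("node %s has intent %q\n", node.Name, intent)
			}
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // block forever; a real process would handle signals and shutdown
}
```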

### Current Limitations

- Pod replication & healthy count are not taken into consideration (#502)
- Nodes update without pause between each (#503)
- A single Node cluster degrades into an unschedulable state on update (#501)
- Node labels are not automatically applied to allow scheduling (#504)

## How to Contribute and Develop Changes for Dogswatch

Working on Dogswatch requires a fully functioning Kubernetes cluster.
For the sake of development workflow, you may easily run one within a container or VM, as with [`kind`](https://github.com/kubernetes-sigs/kind) or [`minikube`](https://github.com/kubernetes/minikube).
The `dev/` directory contains several resources that may be used for development and debugging purposes:

- `dashboard.yaml` - A **development environment** set of Kubernetes resources (these use insecure settings and *are not suitable for use in Production*!)
- `deployment.yaml` - A _template_ for Kubernetes resources for Dogswatch that schedule a controller and set up a DaemonSet
- `kind-cluster.yml` - A `kind` Cluster definition that may be used to stand up a local development cluster

Much of the development workflow can be accommodated by the `Makefile` provided alongside the code.
Each of these targets uses your existing environment and tools - for example, your `kubectl` as configured will be used.
If you have locally configured access to production, please ensure you've taken steps to reconfigure or otherwise cause `kubectl` to affect only your development cluster.

**General use targets**

- `container` - build a container image used by the Kubernetes resources
- `dashboard` - create or update Kubernetes-dashboard (*not suitable for use in Production*)
- `deploy` - create or update Dogswatch Kubernetes resources
- `rollout` - reload and restart Dogswatch processes in the cluster
- `test` - run `go test` against `dogswatch`

**`kind` development targets**

- `load` - build the container image and load it into a running `kind` cluster
- `cluster` - create a local development cluster using `kind`
- `rollout-kind` - run `load`, then roll out the Dogswatch processes to the `kind` cluster
9 changes: 9 additions & 0 deletions extras/dogswatch/cmd/dogswatch-platform/main.go
@@ -0,0 +1,9 @@
package main

import (
	"github.com/amazonlinux/thar/dogswatch/pkg/marker"
)

func main() {
	println(marker.PlatformVersionBuild)
}
163 changes: 163 additions & 0 deletions extras/dogswatch/dev/dashboard.yaml
@@ -0,0 +1,163 @@
# Copyright 2017 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# ------------------- Dashboard Secret ------------------- #

apiVersion: v1
kind: Secret
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard-certs
  namespace: kube-system
type: Opaque

---
# ------------------- Dashboard Service Account ------------------- #

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system

---
# ------------------- Dashboard Role & Role Binding ------------------- #

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: kubernetes-dashboard-minimal
  namespace: kube-system
rules:
  # Allow Dashboard to create 'kubernetes-dashboard-key-holder' secret.
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["create"]
  # Allow Dashboard to create 'kubernetes-dashboard-settings' config map.
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["create"]
  # Allow Dashboard to get, update and delete Dashboard exclusive secrets.
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["kubernetes-dashboard-key-holder", "kubernetes-dashboard-certs"]
  verbs: ["get", "update", "delete"]
  # Allow Dashboard to get and update 'kubernetes-dashboard-settings' config map.
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["kubernetes-dashboard-settings"]
  verbs: ["get", "update"]
  # Allow Dashboard to get metrics from heapster.
- apiGroups: [""]
  resources: ["services"]
  resourceNames: ["heapster"]
  verbs: ["proxy"]
- apiGroups: [""]
  resources: ["services/proxy"]
  resourceNames: ["heapster", "http:heapster:", "https:heapster:"]
  verbs: ["get"]

---
# ------------------- Dashboard Deployment ------------------- #

kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: kubernetes-dashboard
  template:
    metadata:
      labels:
        k8s-app: kubernetes-dashboard
    spec:
      containers:
      - name: kubernetes-dashboard
        image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1
        ports:
        - containerPort: 8443
          protocol: TCP
        args:
          - --auto-generate-certificates
          # Uncomment the following line to manually specify Kubernetes API server Host
          # If not specified, Dashboard will attempt to auto discover the API server and connect
          # to it. Uncomment only if the default does not work.
          # - --apiserver-host=http://my-address:port
          - --enable-skip-login
        volumeMounts:
        - name: kubernetes-dashboard-certs
          mountPath: /certs
          # Create on-disk volume to store exec logs
        - mountPath: /tmp
          name: tmp-volume
        livenessProbe:
          httpGet:
            scheme: HTTPS
            path: /
            port: 8443
          initialDelaySeconds: 30
          timeoutSeconds: 30
      volumes:
      - name: kubernetes-dashboard-certs
        secret:
          secretName: kubernetes-dashboard-certs
      - name: tmp-volume
        emptyDir: {}
      serviceAccountName: kubernetes-dashboard
      # Comment the following tolerations if Dashboard must not be deployed on master
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule

---
# ------------------- Dashboard Service ------------------- #

kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  ports:
  - port: 443
    targetPort: 8443
  selector:
    k8s-app: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubernetes-dashboard
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: kubernetes-dashboard
  namespace: kube-system
