WIP: feat/auto remediation #602

Open: wants to merge 34 commits into main

Commits (34)
90c23f7 feat: added functionality for emitting results into the k8sgpt-operat… (AlexsJones, Dec 20, 2024)
130a71c feat: added functionality for emitting results into the k8sgpt-operat… (AlexsJones, Dec 20, 2024)
0436a95 feat: extended remediation step into implementation (AlexsJones, Jan 13, 2025)
1ce83e2 feat: modest progress towards mutation log (AlexsJones, Jan 13, 2025)
ec3c292 feat: updated to kubebuilder v4 to enable new kubebuilder resource cr… (AlexsJones, Jan 14, 2025)
0ec348a chore: added mutation type (AlexsJones, Jan 14, 2025)
ce1632d feat: first pass of mutations (AlexsJones, Jan 14, 2025)
8790adb feat: target configuration working (AlexsJones, Jan 14, 2025)
cc6a52e feat: status and mutations updating, but controller seems to die (AlexsJones, Jan 15, 2025)
e5dba93 chore: wip patching, throws some status errors (AlexsJones, Jan 15, 2025)
51db33b chore: print logs (AlexsJones, Jan 16, 2025)
646a347 fix: fixed update issue with status and mutation config (AlexsJones, Jan 16, 2025)
4a777f4 feat: added similarity score and retified some object update issues a… (AlexsJones, Jan 17, 2025)
11503b7 feat: simplified the reconcile (AlexsJones, Jan 20, 2025)
0d629a6 feat: added success/failure stats (AlexsJones, Jan 21, 2025)
cc78b16 feat: closer to a working flow that makes sense (AlexsJones, Jan 23, 2025)
8b149b4 feat: updated (AlexsJones, Jan 23, 2025)
d3be36d feat: working deployment fixing (AlexsJones, Jan 27, 2025)
53473e8 chore: adding riskthreshold (AlexsJones, Jan 28, 2025)
5d0b1d6 feat: starting to document AR (AlexsJones, Jan 28, 2025)
bc4b679 chore: updated docs (AlexsJones, Jan 28, 2025)
6f83d26 chore: fixed broken test with type change (AlexsJones, Jan 28, 2025)
d0de700 chore: updated yaml (AlexsJones, Jan 28, 2025)
63e8d2a feat: added similarity score check (AlexsJones, Jan 28, 2025)
a19c8f0 fix: wording in docs (AlexsJones, Jan 29, 2025)
6b64528 chore: added missing chart update to mutation CRD (AlexsJones, Jan 29, 2025)
0fce308 chore: annotated reconciler (AlexsJones, Jan 30, 2025)
c3abd86 feat: missing properties in the helm chart (AlexsJones, Jan 30, 2025)
e31a972 chore: updated yaml (AlexsJones, Jan 30, 2025)
5109a4a feat: updated to kubebuilder v4 to enable new kubebuilder resource cr… (AlexsJones, Jan 30, 2025)
4de7af9 chore: updated yaml (AlexsJones, Jan 30, 2025)
a6e4738 feat: updated to kubebuilder v4 to enable new kubebuilder resource cr… (AlexsJones, Jan 30, 2025)
266d6de feat: improvements to deployment completion flow (AlexsJones, Feb 4, 2025)
16ca1e4 chore: updated timeouts (AlexsJones, Feb 4, 2025)
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
main
__debug_bin
*.DS_Store
k8sgpt-operator
57 changes: 57 additions & 0 deletions AUTO_REMEDIATION.md
@@ -0,0 +1,57 @@
# Auto Remediation

Status: Alpha
Supported AI backends:
- Amazon Bedrock
- OpenAI

Auto Remediation attempts to fix problems encountered in your cluster.
To do so, it interprets K8sGPT results and applies a patch that fixes the issue on the target resource (or its parent).

This feature is highly experimental and is not ready for use in a production environment.
To enable it, set the following fields in the K8sGPT custom resource:

```bash
cat <<EOF | kubectl apply -f -
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: default
spec:
  ai:
    autoRemediation:
      enabled: true
      riskThreshold: "90"
...
EOF
```
A complete example is available [here](./config/samples/autoremediation/valid_k8sgpt_remediation_sample.yaml).

## How does it work?

Opting in to auto remediation enables the following process:
- The K8sGPT operator parses newly created Results and selects those whose resource kind is [supported](#supported-kinds) by auto remediation. For each match, it creates a [Mutation](#mutations).
- Each Mutation then reconciles the difference between the origin resource and the target configuration.
- Once a patch has been calculated (based in part on the similarity score), the Mutation attempts to apply it.
- The changed resource is watched until the Result is either removed (because the resource is now fixed) or persists.
- The Mutation keeps a complete log of the changes and events that occurred.
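The flow above can be pictured as a small state machine. This is a minimal Go sketch under stated assumptions: the phase names, the `next` function, and the `timedOut` signal are hypothetical, because the values of `AutoRemediationPhase` are not shown in this PR. It only illustrates the "watch until the Result is removed or persists" step.

```go
package main

import "fmt"

// Phase is a hypothetical stand-in for the operator's AutoRemediationPhase;
// the real phase names are not shown in this PR.
type Phase string

const (
	PhaseNotStarted Phase = "NotStarted" // Mutation created for a Result
	PhaseInProgress Phase = "InProgress" // patch applied, resource being watched
	PhaseSuccessful Phase = "Successful" // Result removed: the resource is fixed
	PhaseFailed     Phase = "Failed"     // Result persisted until the watch timed out
)

// next advances a Mutation one step through the flow.
// resultPresent reports whether the K8sGPT Result still exists;
// timedOut reports whether the (hypothetical) watch window has expired.
func next(p Phase, resultPresent, timedOut bool) Phase {
	switch p {
	case PhaseNotStarted:
		// The patch has been calculated and applied; start watching.
		return PhaseInProgress
	case PhaseInProgress:
		if !resultPresent {
			return PhaseSuccessful
		}
		if timedOut {
			return PhaseFailed
		}
		return PhaseInProgress // keep watching
	default:
		// Successful and Failed are terminal.
		return p
	}
}

func main() {
	p := PhaseNotStarted
	p = next(p, true, false)  // patch applied, Result still present
	p = next(p, true, false)  // still watching
	p = next(p, false, false) // Result disappeared
	fmt.Println(p)            // prints Successful
}
```

Keeping "persists" as an explicit timeout makes the failure case observable instead of leaving a Mutation in progress forever.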


## Supported Kinds

While the feature is in Alpha, the supported kinds are:
- Service
- Ingress
- Pod
- Owned resources (ReplicaSet/Deployment)
- Static
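A minimal sketch of the kind filter applied before a Mutation is created, assuming each Result carries the kind of the affected resource. The `result` type and `eligible` function are hypothetical; the owned ReplicaSet/Deployment entry is expanded to `ReplicaSet` and `Deployment`, and "Static" is left out because its exact kind string is not given in this document.

```go
package main

import "fmt"

// supportedKinds mirrors the list above (assumption: kinds are matched by
// their plain Kubernetes kind name). "Static" is omitted on purpose.
var supportedKinds = map[string]bool{
	"Service":    true,
	"Ingress":    true,
	"Pod":        true,
	"ReplicaSet": true,
	"Deployment": true,
}

// result is a hypothetical, trimmed-down view of a K8sGPT Result.
type result struct {
	Kind string
	Name string
}

// eligible returns only the Results whose kind auto remediation supports.
func eligible(results []result) []result {
	var out []result
	for _, r := range results {
		if supportedKinds[r.Kind] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rs := []result{{"Pod", "web-0"}, {"ConfigMap", "settings"}, {"Service", "web"}}
	for _, r := range eligible(rs) {
		fmt.Println(r.Kind, r.Name) // prints "Pod web-0", then "Service web"
	}
}
```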

## Mutations

Mutations are custom resources that hold the state and intent for mutating resources in the cluster.
The goal is for Mutations to become compatible with a GitOps process (you could pull Mutations out of the cluster and re-apply them).

## Rollback

Deleting a mutation will revert the applied changes to the cluster resource.
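One way to picture rollback: the Mutation records the resource's original configuration before patching, and deletion restores it. This is a toy sketch, not the operator's code. The in-memory `cluster` map and the `apply`/`rollback` helpers are hypothetical, and a real controller would act against the API server (commonly from a finalizer, though this PR excerpt does not show how deletion is handled).

```go
package main

import "fmt"

// cluster is a toy in-memory stand-in for the API server:
// resource key -> manifest text.
type cluster map[string]string

// mutation keeps only the fields this sketch needs, mirroring
// MutationSpec's OriginConfiguration and TargetConfiguration.
type mutation struct {
	Resource            string
	OriginConfiguration string
	TargetConfiguration string
}

// apply records the current manifest as the origin, then writes the target.
func apply(c cluster, m *mutation) {
	m.OriginConfiguration = c[m.Resource]
	c[m.Resource] = m.TargetConfiguration
}

// rollback is what a deletion handler could do: restore the recorded origin.
func rollback(c cluster, m *mutation) {
	c[m.Resource] = m.OriginConfiguration
}

func main() {
	c := cluster{"deployment/web": "image: nginx:broken-tag"}
	m := &mutation{Resource: "deployment/web", TargetConfiguration: "image: nginx:1.27"}

	apply(c, m)
	fmt.Println(c["deployment/web"]) // prints image: nginx:1.27

	rollback(c, m) // e.g. triggered when the Mutation is deleted
	fmt.Println(c["deployment/web"]) // prints image: nginx:broken-tag
}
```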
6 changes: 3 additions & 3 deletions Dockerfile
@@ -13,21 +13,21 @@
RUN go mod download

# Copy the go source
COPY main.go main.go
COPY cmd/ cmd/
COPY api/ api/
COPY pkg/ pkg/
COPY controllers/ controllers/
COPY internal/ internal/

# Build
# GOARCH has no default value so that the binary is built for the platform of the host where the command
# was called. For example, if we call make docker-build in a local env on Apple Silicon (M1),
# the docker BUILDPLATFORM arg will be linux/arm64, while on an x86 Mac it will be linux/amd64. Therefore,
# by leaving it empty we can ensure that the container and the binary shipped in it have the same platform.
RUN CGO_ENABLED=0 GOOS=${TARGETOS} GOARCH=${TARGETARCH} go build -a -o manager main.go
RUN CGO_ENABLED=0 GOOS=${TARGETOS} GOARCH=${TARGETARCH} go build -a -o manager cmd/main.go

# Use distroless as minimal base image to package the manager binary
# Refer to https://github.com/GoogleContainerTools/distroless for more details
FROM gcr.io/distroless/static:nonroot as production

Check warning on line 30 in Dockerfile (GitHub Actions / Build Container Image): FromAsCasing: 'as' and 'FROM' keywords' casing do not match. The 'as' keyword should match the case of the 'from' keyword. More info: https://docs.docker.com/go/dockerfile/rule/from-as-casing/

LABEL org.opencontainers.image.source="https://github.com/k8sgpt-ai/k8sgpt-operator" \
org.opencontainers.image.url="https://k8sgpt.ai" \
4 changes: 2 additions & 2 deletions Makefile
@@ -62,11 +62,11 @@ test: manifests generate fmt vet envtest ## Run tests.

.PHONY: build
build: manifests generate fmt vet ## Build manager binary.
go build -o bin/manager main.go
go build -o bin/manager cmd/main.go

.PHONY: run
run: manifests generate fmt vet ## Run a controller from your host.
go run ./main.go
go run ./cmd/main.go

# If you wish to build the manager image targeting other platforms you can use the --platform flag.
# (i.e. docker build --platform linux/arm64). However, you must enable docker BuildKit for it.
11 changes: 10 additions & 1 deletion PROJECT
@@ -4,7 +4,7 @@
# More info: https://book.kubebuilder.io/reference/project-config.html
domain: k8sgpt.ai
layout:
- go.kubebuilder.io/v3
- go.kubebuilder.io/v4
plugins:
grafana.kubebuilder.io/v1-alpha: {}
projectName: k8sgpt-operator
@@ -27,4 +27,13 @@ resources:
kind: Result
path: github.com/k8sgpt-ai/k8sgpt-operator/api/v1alpha1
version: v1alpha1
- api:
crdVersion: v1
namespaced: true
controller: true
domain: k8sgpt.ai
group: core
kind: Mutation
path: github.com/k8sgpt-ai/k8sgpt-operator/api/v1alpha1
version: v1alpha1
version: "3"
37 changes: 24 additions & 13 deletions README.md
@@ -7,6 +7,17 @@
[![Artifact Hub](https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/k8sgpt)](https://artifacthub.io/packages/search?repo=k8sgpt)
[![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fk8sgpt-ai%2Fk8sgpt-operator.svg?type=shield)](https://app.fossa.com/projects/git%2Bgithub.com%2Fk8sgpt-ai%2Fk8sgpt-operator?ref=badge_shield)

#### Feature Status:

[![stability-mature](https://img.shields.io/badge/stability-mature-008000.svg)](https://github.com/k8sgpt-ai/k8sgpt-operator/README.md#installation)

- Analysis & Results generation

[![stability-alpha](https://img.shields.io/badge/stability-alpha-f4d03f.svg)](https://github.com/k8sgpt-ai/k8sgpt-operator/AUTO_REMEDIATION.md)

- [Auto Remediation](./AUTO_REMEDIATION.md)

---
This Operator is designed to enable [K8sGPT](https://github.com/k8sgpt-ai/k8sgpt/) within a Kubernetes cluster.
It will allow you to create a custom resource that defines the behaviour and scope of a managed K8sGPT workload. Analysis and outputs will also be configurable to enable integration into existing workflows.

@@ -42,7 +53,7 @@ metadata:
spec:
ai:
enabled: true
model: gpt-3.5-turbo
model: gpt-4o-mini
backend: openai
secret:
name: k8sgpt-sample-secret
@@ -55,7 +66,7 @@ spec:
# proxyEndpoint: https://10.255.30.150 # use proxyEndpoint to setup backend through an HTTP/HTTPS proxy
noCache: false
repository: ghcr.io/k8sgpt-ai/k8sgpt
version: v0.3.41
version: v0.3.48
#integrations:
# trivy:
# enabled: true
@@ -128,7 +139,7 @@ spec:
anonymized: true
backend: openai
language: english
model: gpt-3.5-turbo
model: gpt-4o-mini
secret:
key: api_key
name: my_openai_secret
@@ -184,7 +195,7 @@ kubectl create secret generic k8sgpt-sample-secret --from-literal=openai-api-key
spec:
ai:
enabled: true
model: gpt-3.5-turbo
model: gpt-4o-mini
backend: openai
secret:
name: k8sgpt-sample-secret
@@ -226,15 +237,15 @@ metadata:
namespace: k8sgpt-operator-system
spec:
ai:
model: gpt-3.5-turbo
model: gpt-4o-mini
backend: openai
enabled: true
secret:
name: k8sgpt-sample-secret
key: openai-api-key
noCache: false
repository: ghcr.io/k8sgpt-ai/k8sgpt
version: v0.3.41
version: v0.3.48
remoteCache:
credentials:
name: k8sgpt-sample-cache-secret
@@ -271,15 +282,15 @@ metadata:
namespace: k8sgpt-operator-system
spec:
ai:
model: gpt-3.5-turbo
model: gpt-4o-mini
backend: openai
enabled: true
secret:
name: k8sgpt-sample-secret
key: openai-api-key
noCache: false
repository: ghcr.io/k8sgpt-ai/k8sgpt
version: v0.3.41
version: v0.3.48
remoteCache:
credentials:
name: k8sgpt-sample-cache-secret
@@ -320,13 +331,13 @@ spec:
secret:
name: k8sgpt-sample-secret
key: azure-api-key
model: gpt-35-turbo
model: gpt-4o-mini
backend: azureopenai
baseUrl: https://k8sgpt.openai.azure.com/
engine: llm
noCache: false
repository: ghcr.io/k8sgpt-ai/k8sgpt
version: v0.3.41
version: v0.3.48
EOF
```

@@ -420,7 +431,7 @@ spec:
baseUrl: http://local-ai.local-ai.svc.cluster.local:8080/v1
noCache: false
repository: ghcr.io/k8sgpt-ai/k8sgpt
version: v0.3.41
version: v0.3.48
EOF
```

@@ -448,7 +459,7 @@ metadata:
spec:
ai:
enabled: true
model: gpt-3.5-turbo
model: gpt-4o-mini
backend: openai
secret:
name: k8sgpt-sample-secret
@@ -478,7 +489,7 @@ metadata:
spec:
ai:
enabled: true
model: gpt-3.5-turbo
model: gpt-4o-mini
backend: openai
secret:
name: k8sgpt-sample-secret
9 changes: 9 additions & 0 deletions api/v1alpha1/k8sgpt_types.go
@@ -96,7 +96,16 @@ type BackOff struct {
	MaxRetries int `json:"maxRetries"`
}

type AutoRemediation struct {
	// +kubebuilder:default:=false
	Enabled bool `json:"enabled"`
	// Defaults to 10%
	// +kubebuilder:default="10"
	RiskThreshold string `json:"riskThreshold"`
}

type AISpec struct {
	AutoRemediation AutoRemediation `json:"autoRemediation,omitempty"`
	// +kubebuilder:default:=openai
	// +kubebuilder:validation:Enum=ibmwatsonxai;openai;localai;azureopenai;amazonbedrock;cohere;amazonsagemaker;google;googlevertexai
	Backend string `json:"backend"`
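Because `RiskThreshold` is a string with a CRD default of `"10"`, code that consumes it has to parse and validate it. A minimal sketch of such a helper follows; `parseRiskThreshold` is hypothetical, the 0 to 100 range check is an assumption based on the "Defaults to 10%" comment, and how the operator actually compares the threshold against a patch is not shown in this PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// defaultRiskThreshold mirrors the +kubebuilder:default="10" marker.
const defaultRiskThreshold = 10

// parseRiskThreshold converts the string field into an integer percentage.
// An empty value falls back to the CRD default; values outside 0-100
// are rejected (assumption).
func parseRiskThreshold(raw string) (int, error) {
	s := strings.TrimSpace(raw)
	if s == "" {
		return defaultRiskThreshold, nil
	}
	v, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("riskThreshold %q is not an integer: %w", raw, err)
	}
	if v < 0 || v > 100 {
		return 0, fmt.Errorf("riskThreshold %d is outside 0-100", v)
	}
	return v, nil
}

func main() {
	for _, raw := range []string{"", "90", "abc", "150"} {
		v, err := parseRiskThreshold(raw)
		fmt.Printf("%q -> %d (err: %v)\n", raw, v, err)
	}
}
```

Validating once at parse time keeps the reconcile logic free of string handling; an alternative would be an integer field with `+kubebuilder:validation:Minimum`/`Maximum` markers, letting the API server enforce the range.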
71 changes: 71 additions & 0 deletions api/v1alpha1/mutation_types.go
@@ -0,0 +1,71 @@
/*
Copyright 2023 K8sGPT Contributors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package v1alpha1

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// EDIT THIS FILE! THIS IS SCAFFOLDING FOR YOU TO OWN!
// NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized.

// MutationSpec defines the desired state of Mutation.
type MutationSpec struct {
	// INSERT ADDITIONAL SPEC FIELDS - desired state of cluster
	// Important: Run "make" to regenerate code after modifying this file
	SimilarityScore     string                 `json:"similarityScore,omitempty"`
	ResourceGVK         string                 `json:"resourceGVK,omitempty"`
	ResourceRef         corev1.ObjectReference `json:"resource,omitempty"`
	ResultRef           corev1.ObjectReference `json:"result,omitempty"`
	OriginConfiguration string                 `json:"originConfiguration,omitempty"`
	TargetConfiguration string                 `json:"targetConfiguration,omitempty"`
}

// MutationStatus defines the observed state of Mutation.
type MutationStatus struct {
	// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
	// Important: Run "make" to regenerate code after modifying this file
	Phase   AutoRemediationPhase `json:"phase,omitempty"`
	Message string               `json:"message,omitempty"`
}

// +kubebuilder:object:root=true

// Display in wide format the autoremediationphase status and similarity score
// +kubebuilder:printcolumn:name="State",type="string",JSONPath=".status.message",description="Updates of the autoremediation phase"
// +kubebuilder:printcolumn:name="Similarity Score",type="string",JSONPath=".spec.similarityScore",description="The similarity score of the autoremediation"
// Mutation is the Schema for the mutations API.
type Mutation struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              MutationSpec   `json:"spec,omitempty"`
	Status            MutationStatus `json:"status,omitempty"`
}

// +kubebuilder:object:root=true

// MutationList contains a list of Mutation.
type MutationList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`
	Items           []Mutation `json:"items"`
}

func init() {
	SchemeBuilder.Register(&Mutation{}, &MutationList{})
}