[![build](https://travis-ci.org/stefanprodan/flagger.svg?branch=master)](https://travis-ci.org/stefanprodan/flagger)
[![report](https://goreportcard.com/badge/github.com/stefanprodan/flagger)](https://goreportcard.com/report/github.com/stefanprodan/flagger)
[![codecov](https://codecov.io/gh/stefanprodan/flagger/branch/master/graph/badge.svg)](https://codecov.io/gh/stefanprodan/flagger)
[![license](https://img.shields.io/github/license/stefanprodan/flagger.svg)](https://github.com/stefanprodan/flagger/blob/master/LICENSE)
[![release](https://img.shields.io/github/release/stefanprodan/flagger/all.svg)](https://github.com/stefanprodan/flagger/releases)
Deploy Flagger in the `istio-system` namespace using Helm:

```bash
# add the Helm repository
helm repo add flagger https://flagger.app

# install or upgrade
helm upgrade -i flagger flagger/flagger \
--namespace=istio-system
```

Flagger is compatible with Kubernetes >1.10.0 and Istio >1.0.0.
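
To confirm the controller is running before creating any canaries, check the rollout status of the release. This is a minimal sketch; the `flagger` deployment name assumes the Helm release name used above:

```bash
# assumes the release installs a deployment named "flagger" in istio-system
kubectl -n istio-system rollout status deployment/flagger
```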
### Usage

Flagger takes a Kubernetes deployment and creates a series of objects
(Kubernetes [deployments](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/),
ClusterIP [services](https://kubernetes.io/docs/concepts/services-networking/service/) and
Istio [virtual services](https://istio.io/docs/reference/config/istio.networking.v1alpha3/#VirtualService))
to drive the canary analysis and promotion.
![flagger-overview](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/diagrams/flagger-overview.png)

Gated canary promotion stages:
* scan for canary deployments
* check Istio virtual service routes are mapped to primary and canary ClusterIP services
* check primary and canary deployments status
    * halt advancement if a rolling update is underway
    * halt advancement if pods are unhealthy
* increase canary traffic weight percentage from 0% to 5% (step weight)
* check canary HTTP request success rate and latency
    * halt advancement if any metric is under the specified threshold
    * increment the failed checks counter
* check if the number of failed checks reached the threshold
    * route all traffic to primary
    * scale to zero the canary deployment and mark it as failed
    * wait for the canary deployment to be updated (revision bump) and start over
* increase canary traffic weight by 5% (step weight) till it reaches 50% (max weight)
    * halt advancement while canary request success rate is under the threshold
    * halt advancement while canary request duration P99 is over the threshold
    * halt advancement if the primary or canary deployment becomes unhealthy
    * halt advancement while canary deployment is being scaled up/down by HPA
* promote canary to primary
    * copy canary deployment spec template over primary
    * wait for primary rolling update to finish
    * halt advancement if pods are unhealthy
* route all traffic to primary
* scale to zero the canary deployment
* mark rollout as finished
* wait for the canary deployment to be updated (revision bump) and start over

You can change the canary analysis _max weight_ and the _step weight_ percentage in Flagger's custom resource.

For a deployment named _podinfo_, a canary promotion can be defined using Flagger's custom resource:
```yaml
apiVersion: flagger.app/v1alpha1
kind: Canary
metadata:
  name: podinfo
  namespace: test
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # hpa reference (optional)
  autoscalerRef:
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    name: podinfo
  service:
    # container port
    port: 9898
    # Istio gateways (optional)
    gateways:
    - public-gateway.istio-system.svc.cluster.local
    # Istio virtual service host names (optional)
    hosts:
    - app.istio.weavedx.com
  canaryAnalysis:
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 10
    metrics:
    - name: istio_requests_total
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      threshold: 99
      interval: 1m
    - name: istio_request_duration_seconds_bucket
      # maximum req duration P99
      # milliseconds
      threshold: 500
      interval: 30s
```
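
Save the manifest above to a file and apply it like any other Kubernetes resource; the file name below is illustrative:

```bash
# apply the canary definition
kubectl apply -f podinfo-canary.yaml

# watch the canary status reported by Flagger
kubectl -n test get canary/podinfo --watch
```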

The canary analysis uses the following PromQL queries:

_HTTP requests success rate percentage_
```sql
sum(
    rate(
        istio_requests_total{
```
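
As a rough sketch, a success-rate ratio of this form can be evaluated by hand against Prometheus. The label matchers, the range window and the Prometheus address below are illustrative assumptions, not the exact expression Flagger runs:

```bash
# ratio of non-5xx requests to all requests over the last minute (assumed labels)
curl -s http://prometheus.istio-system:9090/api/v1/query \
  --data-urlencode 'query=sum(rate(istio_requests_total{response_code!~"5.*"}[1m])) / sum(rate(istio_requests_total[1m])) * 100'
```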

_HTTP requests milliseconds duration P99_

```sql
histogram_quantile(0.99,
    sum(
        irate(
```
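
Similarly, a P99 latency estimate can be computed from Istio's request duration histogram. The metric and label names here are assumptions and may differ between Istio versions:

```bash
# 99th percentile request duration over the last minute (assumed metric name)
curl -s http://prometheus.istio-system:9090/api/v1/query \
  --data-urlencode 'query=histogram_quantile(0.99, sum(irate(istio_request_duration_seconds_bucket[1m])) by (le))'
```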

### Automated canary analysis, promotions and rollbacks

Create a test namespace with Istio sidecar injection enabled:

```bash
export REPO=https://raw.githubusercontent.com/stefanprodan/flagger/master

kubectl apply -f ${REPO}/artifacts/namespaces/test.yaml
```
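
To verify that the namespace was created with sidecar injection turned on, you can inspect its labels. The `istio-injection=enabled` label is an assumption based on how Istio enables automatic injection:

```bash
# the namespace should carry the istio-injection=enabled label (assumed)
kubectl get namespace test --show-labels
```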

Create a deployment and a horizontal pod autoscaler:

```bash
kubectl apply -f ${REPO}/artifacts/canaries/deployment.yaml
kubectl apply -f ${REPO}/artifacts/canaries/hpa.yaml
```

Create a canary promotion custom resource (replace the Istio gateway and the internet domain with your own):

```bash
kubectl apply -f ${REPO}/artifacts/canaries/canary.yaml
```

After a couple of seconds Flagger will create the canary objects:

```bash
# applied
deployment.apps/podinfo
horizontalpodautoscaler.autoscaling/podinfo
canary.flagger.app/podinfo

# generated
deployment.apps/podinfo-primary
horizontalpodautoscaler.autoscaling/podinfo-primary
service/podinfo
service/podinfo-canary
service/podinfo-primary
virtualservice.networking.istio.io/podinfo
```
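
You can list the generated objects yourself with plain `kubectl`:

```bash
kubectl -n test get deployments,services,hpa
kubectl -n test get virtualservice podinfo
```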

![flagger-canary-steps](https://raw.githubusercontent.com/stefanprodan/flagger/master/docs/diagrams/flagger-canary-steps.png)

Trigger a canary deployment by updating the container image:
```bash
kubectl -n test set image deployment/podinfo \
podinfod=quay.io/stefanprodan/podinfo:1.2.1
```
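
To follow the rollout while it progresses, you can stream the events Flagger emits in the test namespace:

```bash
# watch the events recorded during the canary analysis
kubectl -n test get events --watch
```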

Flagger detects that the deployment revision changed and starts a new rollout:

```
kubectl -n test describe canary/podinfo

Status:
  Canary Revision: 19871136
  Failed Checks:   0
  State:           finished
Events:
  Type     Reason  Age   From     Message
  ----     ------  ----  ----     -------
  Normal   Synced  3m    flagger  New revision detected podinfo.test
  Normal   Synced  3m    flagger  Scaling up podinfo.test
  Warning  Synced  3m    flagger  Waiting for podinfo.test rollout to finish: 0 of 1 updated replicas are available
  Normal   Synced  3m    flagger  Advance podinfo.test canary weight 5
  Normal   Synced  3m    flagger  Advance podinfo.test canary weight 10
  Normal   Synced  3m    flagger  Advance podinfo.test canary weight 15
  Normal   Synced  2m    flagger  Advance podinfo.test canary weight 20
  Normal   Synced  2m    flagger  Advance podinfo.test canary weight 25
  Normal   Synced  1m    flagger  Advance podinfo.test canary weight 30
  Normal   Synced  1m    flagger  Advance podinfo.test canary weight 35
  Normal   Synced  55s   flagger  Advance podinfo.test canary weight 40
  Normal   Synced  45s   flagger  Advance podinfo.test canary weight 45
  Normal   Synced  35s   flagger  Advance podinfo.test canary weight 50
  Normal   Synced  25s   flagger  Copying podinfo.test template spec to podinfo-primary.test
  Warning  Synced  15s   flagger  Waiting for podinfo-primary.test rollout to finish: 1 of 2 updated replicas are available
  Normal   Synced  5s    flagger  Promotion completed! Scaling down podinfo.test
```
During the canary analysis you can generate HTTP 500 errors and high latency to test if Flagger pauses the rollout.
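
A minimal sketch of how that can be done from a pod inside the mesh, assuming the podinfo container exposes httpbin-style `/status/{code}` and `/delay/{seconds}` endpoints (both endpoint paths are assumptions):

```bash
# exec into any pod in the test namespace, then:
# generate HTTP 500 responses from the canary service
watch curl http://podinfo-canary.test:9898/status/500

# generate latency
watch curl http://podinfo-canary.test:9898/delay/1
```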

When the number of failed checks reaches the canary analysis threshold, the traffic is routed back to the primary and the canary is scaled to zero:

```
Events:
  Normal   Synced  2m    flagger  Halt podinfo.test advancement success rate 55.06% < 99%
  Normal   Synced  2m    flagger  Halt podinfo.test advancement success rate 47.00% < 99%
  Normal   Synced  2m    flagger  (combined from similar events): Halt podinfo.test advancement success rate 38.08% < 99%
  Warning  Synced  1m    flagger  Rolling back podinfo.test failed checks threshold reached 10
  Warning  Synced  1m    flagger  Canary failed! Scaling down podinfo.test
```

### Monitoring

Flagger logs the canary analysis steps:

```
Advance podinfo.test canary weight 40
Halt podinfo.test advancement request duration 1.515s > 500ms
Advance podinfo.test canary weight 45
Advance podinfo.test canary weight 50
Copying podinfo.test template spec to podinfo-primary.test
Halt podinfo-primary.test advancement waiting for rollout to finish: 1 old replicas are pending termination
Scaling down podinfo.test
Promotion completed! podinfo.test
```

Flagger exposes Prometheus metrics that can be used to determine the canary analysis status and the destination weight values:

```bash
# Canary status
# 0 - running, 1 - successful, 2 - failed
flagger_canary_status{name="podinfo",namespace="test"} 1

# Canary traffic weight
flagger_canary_weight{workload="podinfo-primary",namespace="test"} 95
flagger_canary_weight{workload="podinfo",namespace="test"} 5
```
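
Once scraped, these metrics can be queried or alerted on. For example, a quick check for failed canaries via the Prometheus HTTP API (the Prometheus address is an assumption):

```bash
# returns any canary whose status metric reports 2 (failed)
curl -s http://prometheus.istio-system:9090/api/v1/query \
  --data-urlencode 'query=flagger_canary_status == 2'
```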
### Roadmap