Add KEDA HPA TriggerAuthentication and postgresql ScaledObject. #2384

Closed
wants to merge 162 commits into from
Changes from 9 commits
Commits
162 commits
90532c6
Add KEDA HPA TriggerAuthentication and postgresql ScaledObject.
pt247 Apr 8, 2024
57c188f
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] Apr 8, 2024
bf87d9d
Terraform fmt.
pt247 Apr 8, 2024
6b1e7ff
More reactive scale up and down.
pt247 Apr 9, 2024
e57aa74
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 Apr 9, 2024
d62ecea
Formatting changes.
pt247 Apr 9, 2024
6dbfcef
Formating changes.
pt247 Apr 9, 2024
d07928c
Tweak default parameters.
pt247 Apr 9, 2024
b1be5b5
Code refactor.
pt247 Apr 9, 2024
c6a38cb
Set max nodes of general node to 5.
pt247 Apr 9, 2024
5d0607c
Add node affinity for KEDA pods to general node.
pt247 Apr 9, 2024
4d61350
Set maxReplicaCount for conda worker scaling.
pt247 Apr 9, 2024
bbae748
Move keda resources to conda.
pt247 Apr 9, 2024
bcb8a82
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] Apr 9, 2024
4106d7d
Fix variable discription.
pt247 Apr 10, 2024
22e1fbe
Keeping default as more aggressive polling of postgresql is resulting…
pt247 Apr 15, 2024
bd5b051
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 Apr 15, 2024
701eb6e
Add resource limits for conda pods.
pt247 Apr 16, 2024
bff0c4c
Set CondaStoreWorker.concurrency = 1
pt247 Apr 17, 2024
0c2af1d
Expose worker resources and replica count to Nebari config.
pt247 Apr 17, 2024
75a194c
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] Apr 17, 2024
7db539d
Merge branch 'nebari-dev:develop' into 2284-keda-conda-store-worker-hpa
pt247 Apr 23, 2024
f01bda9
Add integration test for KEDA.
Apr 25, 2024
e0442bc
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] Apr 25, 2024
3236832
Fix integration test for KEDA.
Apr 25, 2024
d78153b
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] Apr 25, 2024
bef082e
disable ssl verify.
Apr 25, 2024
5339861
Ignore insecure request warning.
Apr 25, 2024
7d19315
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] Apr 25, 2024
c10457c
Increase timer for scaledown.
Apr 25, 2024
92537ae
Keep replica count for conda-store-worker deployment as 0 to start with.
Apr 26, 2024
164f548
Merge branch 'nebari-dev:develop' into 2284-keda-conda-store-worker-hpa
pt247 May 1, 2024
2dfffce
Modify test.
May 2, 2024
01975f8
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] May 2, 2024
c538bc1
Add more memory to conda-store worker.
May 2, 2024
832613a
Add more CPU to conda-store worker.
May 2, 2024
7fcadf9
Reduce cpu back to 250 for conda-store-workers.
May 3, 2024
c0a1a3e
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 May 3, 2024
479f708
Increase replicas back to 1.
May 3, 2024
4851b05
Revert resource constraints for conda-store-worker.
May 3, 2024
0431fca
Fix memory and cpu for conda store workers.
May 3, 2024
d5e4703
Adjust CPU and Memory consumptions.
May 4, 2024
b0349a1
Increase CPU to 1 core.
May 4, 2024
9a9ed3e
Debug keda test.
May 4, 2024
5b84bd2
Reduce memory for conda worker and add more info logs.
May 5, 2024
f8e03eb
Fix test.
May 5, 2024
d10c05e
Try CONDA_STORE_TOKEN from env
May 5, 2024
609ee7f
Fix logging.
May 5, 2024
72bfb2d
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 May 5, 2024
3fa05ae
Re-enable configmap patch in test.
May 6, 2024
d5370e6
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] May 6, 2024
2f23a23
Merge branch 'nebari-dev:develop' into 2284-keda-conda-store-worker-hpa
pt247 May 8, 2024
b3b7b82
Setup tmate.
May 8, 2024
ebdb6e4
Update test.
May 8, 2024
1073877
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] May 8, 2024
b7d6e45
Fix env url.
May 8, 2024
21af4d7
Test refactor. rebase master.
May 9, 2024
225b887
Pause CI on failour.
May 9, 2024
7458aa2
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] May 9, 2024
2edb084
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 May 9, 2024
9702f2d
Fix test.
May 9, 2024
1cea90c
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] May 9, 2024
b1e51a0
Fix test.
May 9, 2024
94dd480
Skip failing cypress tests.
May 9, 2024
330a507
Skip failing cypress tests.
May 9, 2024
cfdafce
Fix test.
May 9, 2024
f04a4be
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] May 9, 2024
4bc8c88
Fix test.
May 9, 2024
df7efd7
Add cyprus tests back.
May 9, 2024
c4bad9c
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 May 9, 2024
c6eccf7
Remove changes from ci.
May 9, 2024
57df7fe
Remove node affinity for testing.
May 9, 2024
be4e4fa
Run pytest first.
May 9, 2024
c2c894c
Reduce cooldown period for tests.
May 9, 2024
59d1412
Change test for CI.
May 9, 2024
02fa687
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] May 9, 2024
e955d6d
Still increase timeout for test to finish
May 9, 2024
b701900
ignore::pytest.PytestUnraisableExceptionWarning
May 9, 2024
2ef1414
Fix test decorators.
May 9, 2024
118f3e0
Limit to 2 envs.
May 9, 2024
9cec9cf
IncreaseCI memort.
May 10, 2024
df63165
Test refactor.
May 10, 2024
40d129d
Revert ci workflow changes.
May 10, 2024
b2e1567
Remove unrelated changes.
May 10, 2024
8780325
Skip Cyprus tests.
May 10, 2024
52b4b63
Minor test refactor.
May 10, 2024
71cdf88
Revert inctance change.
May 10, 2024
f1208d0
Revert inctance change.
May 10, 2024
ee385db
Revert test_local_integration.yaml chanes.
May 10, 2024
49161f4
Add nodeselector for Keda.
May 12, 2024
7f78618
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] May 12, 2024
c7791e8
Remove cyprus tests.
May 13, 2024
14a79f6
Reduce pollingInterval and cooldownPeriod for tests.
May 13, 2024
41ff5c5
Reduce number of deployments to 1 for testing.
May 13, 2024
a9860ec
Add tmate on failour.
May 13, 2024
3c6dae1
tqdm instead of pandas for test.
May 13, 2024
71e572c
Fix tmate location.
May 13, 2024
ff5fbc1
r5a.12xlarge
May 13, 2024
a4bb7f2
Remove tmate.
May 13, 2024
2cbf117
Fix terraform format.
May 13, 2024
c104554
r5ad.4xlarge
May 13, 2024
19f8b61
Skip test_scale_up_and_down.
May 14, 2024
2fd929a
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] May 14, 2024
85856d2
Rebase
May 14, 2024
306b5c0
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] May 14, 2024
24cb0c8
Remove commentes.
May 14, 2024
fbd4f73
Remove print statements.
May 14, 2024
e18835d
Refactor test.
May 14, 2024
f1de525
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] May 14, 2024
0954b6e
Add logs.
May 14, 2024
1c73150
Add more logging.
May 14, 2024
e839b61
Update timer.
May 14, 2024
a0a66fe
Remove ignore::pytest.PytestUnraisableExceptionWarning fixture from t…
May 14, 2024
c315a24
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] May 14, 2024
b5ced72
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 May 14, 2024
daf3a5f
Test cleanup.
May 14, 2024
9703916
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] May 14, 2024
68966b1
Revert cirun instance_type.
May 14, 2024
3ec7df7
Add variable needed for pytest and sync file with develop.
May 14, 2024
789b461
Refactor test.
May 14, 2024
4b35cef
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] May 14, 2024
d9f3502
Upgrade python client for kubernetes version to 29.0.0
May 14, 2024
2c94f6e
add comment in test.
May 14, 2024
10140bb
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 May 14, 2024
d9d2338
Include cypress tests.
May 14, 2024
52716f8
Minor change to trigger local-integration-tests.
May 14, 2024
4889c8b
Ingore DeprecationWarning in tests.
May 14, 2024
8321804
Update test_local_integration.yaml
pt247 May 14, 2024
fe70439
Update test_conda_store_scaling.py
pt247 May 14, 2024
3d84840
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] May 14, 2024
13aa233
Revert to hardcoded namespace for testing.
May 14, 2024
04a21da
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] May 14, 2024
b46a3a3
Re-add cypress tests in CI.
May 14, 2024
388aca5
Remove hardocded namespace from test.
May 14, 2024
ecd50f3
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] May 14, 2024
6c40448
Revert version upgrade for kubernetes client.
May 14, 2024
a593758
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 May 15, 2024
a54f291
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 May 20, 2024
01e6ac4
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 May 21, 2024
6c9ce07
Fix node_slector lookup.
May 22, 2024
a1bce8f
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] May 22, 2024
73b933d
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 May 24, 2024
bcddb7b
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 May 28, 2024
40ee18f
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 May 28, 2024
c4a748b
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 May 30, 2024
9a782e6
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 May 31, 2024
87089a4
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 Jun 3, 2024
93b51e7
Deployment and pod logs.
Jun 5, 2024
dd948f6
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] Jun 5, 2024
797c682
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 Jun 5, 2024
2eba1bf
Merge branch 'nebari-dev:develop' into 2284-keda-conda-store-worker-hpa
pt247 Jun 7, 2024
95c84b4
KEDA scaling based on conda-store API.
Jun 11, 2024
7fd2670
Merge branch 'develop' into 2284-keda-conda-store-worker-hpa
pt247 Jun 11, 2024
9b5b0e7
Cleanup tests.
Jun 11, 2024
44ac977
Fix conda-store-worker terrafrom file format and syntax.
Jun 11, 2024
cdad85f
Update Azure general node group max nodes to 5 to be consistent with …
Jun 11, 2024
96b70d6
Make verbose conda-store-worker logs ad debug.
Jun 11, 2024
d4378b3
Fix typo.
Jun 11, 2024
2e7b1b6
Cleanup KEDA scaleed object config.
Jun 11, 2024
2c39a05
[pre-commit.ci] Apply automatic pre-commit fixes
pre-commit-ci[bot] Jun 11, 2024
142633c
Terrafrom fmt.
Jun 11, 2024
c72f95a
Reduce default max workers to 4.
Jun 11, 2024
5 changes: 5 additions & 0 deletions src/_nebari/stages/kubernetes_initialize/template/main.tf
@@ -29,3 +29,8 @@ module "nvidia-driver-installer" {
gpu_enabled = var.gpu_enabled
gpu_node_group_names = var.gpu_node_group_names
}

module "keda-installer" {
source = "./modules/keda"
namespace = var.environment
}
@@ -0,0 +1,8 @@
resource "helm_release" "keda" {
name = "keda"
namespace = var.namespace
repository = "https://kedacore.github.io/charts"
chart = "keda"
version = "2.13.2"
wait_for_jobs = true
}
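Several later commits in this PR mention pinning the KEDA pods to the general node group ("Add node affinity for KEDA pods to general node", "Add nodeselector for Keda"). The diff shown here does not include that change, so the following is only a minimal sketch of one way it could be done through the chart's `nodeSelector` value. The variable `keda_node_selector`, its default label key and value, and the resource name `keda_pinned` are hypothetical and not taken from the PR.

```hcl
# Hypothetical variable: label selector for the nodes that should host KEDA.
# The label key and value below are placeholders, not values from this PR.
variable "keda_node_selector" {
  description = "Node labels that KEDA pods must be scheduled onto"
  type        = map(string)
  default = {
    "example.com/node-group" = "general"
  }
}

# Same release as in the diff, with the selector passed as a chart value.
# Assumes var.namespace is declared as in the module's variables file.
resource "helm_release" "keda_pinned" {
  name          = "keda"
  namespace     = var.namespace
  repository    = "https://kedacore.github.io/charts"
  chart         = "keda"
  version       = "2.13.2"
  wait_for_jobs = true

  # yamlencode renders the map into the YAML document the Helm provider expects.
  values = [
    yamlencode({
      nodeSelector = var.keda_node_selector
    })
  ]
}
```

Passing the selector through `values` keeps it in one place; whether the chart's other components (for example the admission webhooks) need their own selectors should be checked against the chart version in use.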
@@ -0,0 +1,5 @@
variable "namespace" {
description = "Namespace to deploy KEDA into"
type = string
default = "dev"
}
@@ -75,3 +75,12 @@ module "conda-store-nfs-mount" {
module.kubernetes-conda-store-server
]
}

module "conda-store-worker-hpa" {
source = "./modules/kubernetes/services/worker-hpa"
namespace = var.environment

depends_on = [
module.kubernetes-conda-store-server
]
}
@@ -202,3 +202,64 @@ resource "kubernetes_deployment" "worker" {
}
}
}

resource "kubernetes_manifest" "triggerauthenticator" {
manifest = {
apiVersion = "keda.sh/v1alpha1"
kind = "TriggerAuthentication"

metadata = {
name = "trigger-auth-postgres"
namespace = var.namespace
}

spec = {
secretTargetRef = [
{
name = "nebari-conda-store-postgresql"
parameter = "password"
key = "postgresql-password"
}
]
}
}
}

resource "kubernetes_manifest" "scaledobject" {
manifest = {
apiVersion = "keda.sh/v1alpha1"
kind = "ScaledObject"

metadata = {
name = "scaled-conda-worker"
namespace = var.namespace
}

spec = {
scaleTargetRef = {
kind = "Deployment"
name = "nebari-conda-store-worker"
}
# minReplicaCount = 1 # KEDA default: 0
pollingInterval = 5 # seconds; KEDA default: 30
# cooldownPeriod = 30 # seconds; KEDA default: 300
triggers = [
{
type = "postgresql"
metadata = {
query = "SELECT COUNT(*) FROM build WHERE status IN ('QUEUED', 'BUILDING');"
targetQueryValue = "1"
host = "nebari-conda-store-postgresql"
userName = "postgres"
port = "5432"
dbName = "conda-store"
sslmode = "disable"
}
authenticationRef = {
name = "trigger-auth-postgres"
}
}
]
}
}
}
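With `targetQueryValue = "1"`, the HPA that KEDA creates aims for roughly one worker replica per row returned by the query, that is, one per `QUEUED` or `BUILDING` build. The ScaledObject above sets no upper bound, so KEDA's default `maxReplicaCount` applies. Later commits in this PR mention setting `maxReplicaCount` and reducing the default max workers to 4. The sketch below shows one way such a bound could be expressed. The variable `max_worker_replicas` and the resource name `scaledobject_bounded` are hypothetical, and the default of 4 is only inferred from a commit subject.

```hcl
# Hypothetical variable; the default is inferred from the commit
# "Reduce default max workers to 4" and may not match the final code.
variable "max_worker_replicas" {
  description = "Upper bound on conda-store worker replicas"
  type        = number
  default     = 4
}

resource "kubernetes_manifest" "scaledobject_bounded" {
  manifest = {
    apiVersion = "keda.sh/v1alpha1"
    kind       = "ScaledObject"

    metadata = {
      name      = "scaled-conda-worker"
      namespace = var.namespace
    }

    spec = {
      scaleTargetRef = {
        kind = "Deployment"
        name = "nebari-conda-store-worker"
      }
      minReplicaCount = 0                       # scale to zero when no builds are pending
      maxReplicaCount = var.max_worker_replicas # cap on the number of workers
      pollingInterval = 5                       # seconds between query runs
      cooldownPeriod  = 300                     # seconds to wait before scaling to zero
      triggers = [
        {
          type = "postgresql"
          metadata = {
            query            = "SELECT COUNT(*) FROM build WHERE status IN ('QUEUED', 'BUILDING');"
            targetQueryValue = "1"
            host             = "nebari-conda-store-postgresql"
            userName         = "postgres"
            port             = "5432"
            dbName           = "conda-store"
            sslmode          = "disable"
          }
          authenticationRef = {
            name = "trigger-auth-postgres"
          }
        }
      ]
    }
  }
}
```

Under these assumptions, ten pending builds would request ten replicas from the HPA formula but the deployment would stop at four; once the query returns 0, KEDA scales the deployment back to zero after the cooldown period.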