
K8ssandra-operator memory leak after failed MedusaBackupJob #1312

Open
adziura-tcloud opened this issue May 13, 2024 · 1 comment · May be fixed by thelastpickle/cassandra-medusa#786
Labels: bug (Something isn't working), review (Issues in the state 'review')
adziura-tcloud commented May 13, 2024

What happened?
After upgrading k8ssandra-operator from 1.7.0 to 1.16.0, I noticed periodic restarts of the operator due to a memory leak.
It has happened twice, each time right after a failed MedusaBackupJob:

```yaml
spec:
  backupType: full
  cassandraDatacenter: gke-dc1
status:
  finished:
  - example-test-1-gke-dc1-r1-sts-2
  - example-test-1-gke-dc1-r1-sts-3
  - example-test-1-gke-dc1-r1-sts-1
  inProgress:
  - example-test-1-gke-dc1-r1-sts-0
  startTime: "2024-05-12T00:30:15Z"
```

The MedusaBackupJob failed because of a pod OOM restart (we are using fairly small instances in our test environment).

Also, MedusaBackupSchedule stops working after a failed backup: it no longer creates new MedusaBackupJobs.
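For context, the schedule that produced these jobs is a MedusaBackupSchedule roughly like the following (the name and cron expression here are illustrative, not our exact values):

```yaml
apiVersion: medusa.k8ssandra.io/v1alpha1
kind: MedusaBackupSchedule
metadata:
  name: daily
  namespace: cassandra
spec:
  cronSchedule: "30 0 * * *"   # would produce the daily-<timestamp> job names seen below
  backupSpec:
    backupType: full
    cassandraDatacenter: gke-dc1
```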

Did you expect to see something different?
A failed backup should not affect the operator or subsequent backups.

How to reproduce it (as minimally and precisely as possible):
I think killing one pod during a backup should reproduce it.
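A possible reproduction sketch (the pod and namespace names are taken from the status above; the operator namespace is a placeholder, and this assumes kubectl access to the cluster):

```shell
# Wait for a MedusaBackupJob to show a pod in status.inProgress.
kubectl -n cassandra get medusabackupjob

# Kill one Cassandra pod mid-backup (name taken from the status in this report).
kubectl -n cassandra delete pod example-test-1-gke-dc1-r1-sts-0

# Then watch the operator's memory usage and its logs for the repeating
# "MedusaBackupJob is still being processed" message.
kubectl -n <operator-namespace> top pod
```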

  • K8ssandra Operator version: 1.16.0
  • Kubernetes version information: 1.29
  • Kubernetes cluster kind: GKE
  • K8ssandra Operator Logs: the log is flooded with "MedusaBackupJob is still being processed" INFO messages:

```
2024-05-13T07:07:34.916Z	INFO	MedusaBackupJob is still being processed	{"controller": "medusabackupjob", "controllerGroup": "medusa.k8ssandra.io", "controllerKind": "MedusaBackupJob", "MedusaBackupJob": {"name":"daily-1715473800","namespace":"cassandra"}, "namespace": "cassandra", "name": "daily-1715473800", "reconcileID": "ace193e0-a3b6-4935-80f7-736b7fbae2a0", "medusabackupjob": "cassandra/daily-1715473800", "Backup": {"namespace": "cassandra", "name": "daily-1715473800"}}
```
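The repeating log line suggests the reconciler keeps requeueing the job as long as any pod remains in `status.inProgress`. A minimal, illustrative Go sketch of that completion check (`backupDone` is a hypothetical helper mirroring the CRD status fields, not the operator's actual code):

```go
package main

import "fmt"

// backupDone reports whether every pod that started the backup has finished.
// Illustrative simplification: field names mirror the MedusaBackupJob status
// (finished / inProgress) shown in this report.
func backupDone(finished, inProgress []string) bool {
	return len(inProgress) == 0 && len(finished) > 0
}

func main() {
	finished := []string{
		"example-test-1-gke-dc1-r1-sts-2",
		"example-test-1-gke-dc1-r1-sts-3",
		"example-test-1-gke-dc1-r1-sts-1",
	}
	// The OOM-killed pod never reports back, so it stays in progress.
	inProgress := []string{"example-test-1-gke-dc1-r1-sts-0"}

	// The reconciler requeues while the job is "in progress"; if the stuck
	// pod never clears, the requeueing never terminates.
	for attempt := 1; attempt <= 3; attempt++ {
		if backupDone(finished, inProgress) {
			fmt.Println("backup complete")
			return
		}
		fmt.Printf("attempt %d: MedusaBackupJob is still being processed\n", attempt)
	}
	fmt.Println("still stuck after 3 reconciles; would requeue indefinitely")
}
```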

┆Issue is synchronized with this Jira Story by Unito
┆Fix Versions: 2024-10,2024-11
┆Issue Number: K8OP-26

@adziura-tcloud adziura-tcloud added the bug Something isn't working label May 13, 2024
@rzvoncek rzvoncek moved this to In Progress in K8ssandra Jun 18, 2024
@rzvoncek rzvoncek self-assigned this Jun 18, 2024
@adejanovski adejanovski added the in-progress Issues in the state 'in-progress' label Jun 18, 2024
@rzvoncek rzvoncek moved this from In Progress to Review in K8ssandra Jun 19, 2024
@adejanovski adejanovski added review Issues in the state 'review' and removed in-progress Issues in the state 'in-progress' labels Jun 19, 2024
@adejanovski adejanovski moved this from Review to Ready For Review in K8ssandra Jul 16, 2024
@adejanovski adejanovski added ready-for-review Issues in the state 'ready-for-review' and removed review Issues in the state 'review' labels Jul 16, 2024
@adejanovski adejanovski moved this from Ready For Review to Review in K8ssandra Jul 17, 2024
@adejanovski adejanovski added review Issues in the state 'review' and removed ready-for-review Issues in the state 'ready-for-review' labels Jul 17, 2024

sync-by-unito bot commented Oct 14, 2024

➤ Radovan Zvoncek commented:

PR to review: thelastpickle/cassandra-medusa#786
