Fixes issue when multiple VRGs conflict with the PVCs being protected #1535

Draft
wants to merge 117 commits into
base: main
Changes from 3 commits
ed18630
Fixes https://issues.redhat.com/browse/OCSBZM-4691
asn1809 Aug 30, 2024
e127357
Moving locks to util and some refactoring
asn1809 Sep 5, 2024
ef385a5
Correcting lint issues
asn1809 Sep 5, 2024
6e45f89
Use PVC name and namespace for changing PVC conditions (#1528)
ELENAGER Aug 29, 2024
025adc3
rephrase alerts message
rakeshgm Aug 22, 2024
37aae20
Reformat python version list
nirs Sep 4, 2024
b015270
Test more python versions
nirs Sep 4, 2024
48b3643
Upgrading capabilities in the CSVs
abhijeet219 Sep 2, 2024
f8ed0e5
Update open-cluster-management to use v0.13.0
abhijeet219 May 9, 2024
93481df
Delete namespace manifestwork for applications
abhijeet219 May 9, 2024
02ba28b
Update unit-tests to verify deletion of namespace manifestwork
abhijeet219 May 12, 2024
b511e77
tests: keep all suite cleanup functions in suite_test.go
raghavendra-talur Mar 15, 2024
b00e284
Bump CSI version to 0.9.1
ELENAGER Sep 4, 2024
e198936
Skip check for duplicate controller names
ELENAGER Sep 9, 2024
13e0c77
Fix PlacementDecision Exclusion from Hub Backup Due to Missing Label
Sep 9, 2024
be4cd85
Ignore not found namespace in argocd test
nirs Sep 10, 2024
a3cf1e9
'AllowVolumeExpansion' flag enables PV resizing
rakeshgm Sep 11, 2024
e907603
exclude PV and PVC in velero backups
rakeshgm Sep 3, 2024
1ba302c
Update setup-envtest to 0.19 version
ShyamsundarR Sep 12, 2024
49fa212
Add sample commit-msg hook
nirs Aug 15, 2024
5bddec4
go.mod: run go mod tidy
raghavendra-talur Sep 13, 2024
0a467d6
Update Ramen to use latest CRD from external-snapshotter
Sep 10, 2024
4fb500d
Ensure VGS name contains at most 63 characters
Sep 12, 2024
88c742a
Ensure the Secondary VRG is updated when DRPC is updated
Sep 12, 2024
e1fc2e2
Prepare for volsync block device test (#1559)
nirs Sep 17, 2024
1530d0c
Add debug logs in envfile
nirs Sep 12, 2024
dcf5dba
More clear argument names
nirs Aug 27, 2024
9933744
Move minikube helpers to minikube module
nirs Aug 27, 2024
871da2d
Remove `drenv setup` from make-venv
nirs Sep 9, 2024
4e64b18
Introduce drenv providers
nirs Aug 27, 2024
4e00bfc
Make unused minikube function private
nirs Aug 27, 2024
8b7f6d0
Move suspend and resume to minikube provider
nirs Aug 27, 2024
a06fcf0
Improve provider interface
nirs Aug 27, 2024
5e1bb80
Move containerd configuration to minikube.configure()
nirs Aug 27, 2024
b2e2615
Move waiting for fresh status to minikube
nirs Aug 28, 2024
fa145f6
Support watching stderr
nirs Aug 29, 2024
40e9b01
Log drenv log in tests
nirs Sep 4, 2024
005a0bc
Use /readyz endpoint for ready check
nirs Sep 5, 2024
40fe77c
Add external provider
nirs Aug 28, 2024
7aa8ae4
Implement DRClusterConfig reconciler to create required ClusterClaims…
ShyamsundarR Sep 18, 2024
18e3c21
Add yaml wraper
nirs Aug 29, 2024
41d0c76
Add kubeconfig module
nirs Aug 29, 2024
f70cee5
Update kubectl-gather to 0.5.1
nirs Sep 4, 2024
fed1215
Add setup for macOS
nirs Sep 4, 2024
6f52e8f
Add lima provider for Apple silicon
nirs Aug 29, 2024
ee98edb
Add development environment for lima
nirs Sep 2, 2024
756c2fb
Add a delay after starting a stopped cluster
nirs Sep 2, 2024
61a078f
Log limactl errors
nirs Sep 9, 2024
67e6cbc
Log invalid json logs in debug mode
nirs Sep 12, 2024
9002195
Disable port forwarding
nirs Aug 30, 2024
940d012
Change API server to use the shared network
nirs Aug 30, 2024
20d5e90
Configure kubelet to use the right IP address
nirs Aug 31, 2024
89c2ddd
Configure kubelet to pull images in parallel
nirs Aug 31, 2024
5d0e348
Configure kubelet feature gates
nirs Sep 1, 2024
201b273
Increase fs.inotify limits
nirs Sep 5, 2024
e63bcb3
Update minio to latest release
nirs Aug 31, 2024
2a39429
Use hostpath storage for minio
nirs Aug 31, 2024
6035794
Support commands reading from stdin
nirs Sep 8, 2024
44584e5
Support command work directory
nirs Sep 8, 2024
af53a2a
Introduce drenv load command
nirs Sep 8, 2024
8924dc7
Use drenv load to load images
nirs Aug 31, 2024
c4f5f54
Disable broker certificate check on macOS
nirs Sep 5, 2024
b12d40e
Annotate nodes with submariner public ip
nirs Sep 6, 2024
5246b03
Promote vmnet shared network route
nirs Sep 8, 2024
28a5e87
Upgrade submariner to 0.18.0
nirs Sep 8, 2024
d55abec
Wait for all clusters before deploying submariner
nirs Sep 6, 2024
bb9fa7a
Simplify connectivity check
nirs Sep 7, 2024
ee7617c
Fix submariner test container to keep running
nirs Sep 7, 2024
88da933
Enable submariner for lima
nirs Sep 8, 2024
51d331a
Add external-snapshotter addon
nirs Sep 9, 2024
5187293
Replace volumesnapshot with external-snapshotter
nirs Sep 9, 2024
9ac2f37
Fix argocd deployment on macOS
nirs Sep 10, 2024
145065b
Avoid random failures when deleting environment
nirs Sep 13, 2024
634d247
Ensure deleted DRPolicies do not add schedules to DRClusterConfig (#1…
ShyamsundarR Sep 19, 2024
35e5d26
Update csi-addons to v0.10.0 in drenv
nirs Sep 18, 2024
0a895a5
Update csi-addons requirement to v.0.10.0 in ramen
nirs Sep 18, 2024
cad86be
Adjust drenv to support consistency groups
ELENAGER Sep 18, 2024
44c1776
ci: use lower version of z as hard dependency
Madhu-1 Sep 24, 2024
948ff76
go.mod: update to newer version of recipe api
raghavendra-talur Sep 24, 2024
46d441c
test: add the updated recipe crd to the test dir
raghavendra-talur Sep 24, 2024
6fda9b2
controller: update implementation to match with recipe CRD
raghavendra-talur Sep 24, 2024
88168ee
go.mod: update to latest version of the ramen api
raghavendra-talur Sep 24, 2024
0ee14df
Add ClusterClaim to the ramen-dr-cluster reconciler scheme
ShyamsundarR Sep 25, 2024
24e0133
Kustomize rbd-mirror directory.
ELENAGER Sep 22, 2024
0134dea
Add flag for enabling or disabling consistency groups
ELENAGER Sep 26, 2024
627ccc8
Refactor: Extract resource watching logic from drpc controller into d…
Sep 26, 2024
1204662
Watch for DRPolicy resource changes
Sep 26, 2024
d17d112
Add external-snapshotter addon to kubevirt envs
nirs Sep 24, 2024
fcdddfd
Reformat reconcileMissingVR comment for readability
nirs Sep 23, 2024
fb1d4fb
Fix logs in reconcileMissingVR
nirs Sep 23, 2024
5fa81f5
Extract VRGInstance.validateVRCompletedStatus()
nirs Sep 23, 2024
99093c5
Clarify isVRConditionMet
nirs Sep 23, 2024
5661d38
Improve logging in delete VR flow
nirs Sep 23, 2024
a4d0b54
Fix disable dr if VR failed validation
nirs Sep 23, 2024
38654c6
Updating broken clusteradm link in the doc
abhijeet219 Oct 1, 2024
8a70425
Add daily e2e job for refreshing the cache
nirs Sep 19, 2024
86f5d17
drenv: add cluster name to storage id
ELENAGER Sep 29, 2024
1c04e61
Fix intermittent VolSync unit test failure
Oct 4, 2024
7e88713
Quote addresses to avoid yaml parsing surprises
nirs Sep 21, 2024
5482b46
Add drenv start --timeout option
nirs Sep 20, 2024
eede468
Make the vm environment smaller
nirs Sep 20, 2024
7043af7
Disable rosetta for vm environment
nirs Sep 21, 2024
5f9c5bb
Use lima also on darwin/x86_64
nirs Sep 19, 2024
77dc276
Scale down coredns to 1 replica
nirs Sep 22, 2024
7049474
Don't wait for coredns deployment
nirs Sep 22, 2024
cd6cffb
Clean up probing for completion
nirs Sep 22, 2024
3692dc5
Simplify port forwarding rules
nirs Sep 22, 2024
c10641d
Group kubeconfig setup in same provision step
nirs Sep 22, 2024
b911b9a
Remove unneeded sudo usage
nirs Sep 22, 2024
da3dc20
Remove unneeded route fix
nirs Sep 30, 2024
030adfa
Disallow relocation execution when a cluster is unreachable
Oct 4, 2024
79be579
Fixes https://issues.redhat.com/browse/OCSBZM-4691
asn1809 Aug 30, 2024
9611e95
Moving locks to util and some refactoring
asn1809 Sep 5, 2024
87c5a1d
Adding changes to the locking mechanism
asn1809 Sep 20, 2024
62faf94
TODO: Comments check
asn1809 Sep 25, 2024
af60e2a
Some more changes
asn1809 Oct 8, 2024
04f99b1
Condition changes
asn1809 Oct 17, 2024
45 changes: 45 additions & 0 deletions internal/controller/util/nslock.go
@@ -0,0 +1,45 @@
// SPDX-FileCopyrightText: The RamenDR authors
// SPDX-License-Identifier: Apache-2.0

package util

import (
	"sync"
)

// NamespaceLock implements atomic operation for namespace. It will have the namespace
// having multiple vrgs in which VRGs are being processed.
type NamespaceLock struct {
	namespace string
	mux       sync.Mutex
}

// NewNamespaceLock returns new NamespaceLock
func NewNamespaceLock() *NamespaceLock {
Member Author

@raghavendra-talur : Pass ns string and use for struct init.

Member Author

I gave some thought to what we discussed. The locks are initialized in SetupWithManager in volumereplicationgroup_controller (r.locks = rmnutil.NewNamespaceLock()). The namespace cannot be passed there, because that call happens well before Reconcile is invoked. So instead of a single namespace string, a set of locked namespaces would be better.

Member

In that case, it could be

type NamespaceLock struct {
	mux       map[string]sync.Mutex
}
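
One way the "set of locked namespaces" idea from this thread could look is sketched below. It is only an illustration, not the code that was merged: the `locked` field and the `mu` name are assumptions, while `NewNamespaceLock`, `TryToAcquireLock`, and `Release` keep the names used in the diff. A plain set guarded by one mutex is used here rather than `map[string]sync.Mutex`, because a `sync.Mutex` stored by value in a map cannot be locked in place (map elements are not addressable).

```go
package main

import (
	"fmt"
	"sync"
)

// NamespaceLock records which namespaces are currently being processed.
// The locked field is an assumption made for this sketch.
type NamespaceLock struct {
	mu     sync.Mutex          // guards locked
	locked map[string]struct{} // set of namespaces that are held
}

// NewNamespaceLock returns an empty lock set; no namespace is needed up front.
func NewNamespaceLock() *NamespaceLock {
	return &NamespaceLock{locked: make(map[string]struct{})}
}

// TryToAcquireLock marks the namespace as held and returns true, or returns
// false right away if the namespace is already held.
func (nl *NamespaceLock) TryToAcquireLock(namespace string) bool {
	nl.mu.Lock()
	defer nl.mu.Unlock()

	if _, held := nl.locked[namespace]; held {
		return false
	}

	nl.locked[namespace] = struct{}{}

	return true
}

// Release removes the namespace from the set of held namespaces.
func (nl *NamespaceLock) Release(namespace string) {
	nl.mu.Lock()
	defer nl.mu.Unlock()

	delete(nl.locked, namespace)
}

func main() {
	nl := NewNamespaceLock()
	fmt.Println(nl.TryToAcquireLock("ns-a")) // true
	fmt.Println(nl.TryToAcquireLock("ns-a")) // false
	fmt.Println(nl.TryToAcquireLock("ns-b")) // true
	nl.Release("ns-a")
	fmt.Println(nl.TryToAcquireLock("ns-a")) // true
}
```

The internal mutex is held only for the duration of the map lookup, so callers never wait on another reconcile's work; they just learn whether the namespace is busy.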

	return &NamespaceLock{}
}

// TryToAcquireLock tries to acquire the lock for processing VRG in a namespace having
// multiple VRGs and returns true if successful.
// If processing has already begun in the namespace, returns false.
func (nl *NamespaceLock) TryToAcquireLock(namespace string) bool {
Member Author

@raghavendra-talur : Namespace arg is not required as it would be already acquired.

	nl.mux.Lock()
	defer nl.mux.Unlock()
Member Author

@raghavendra-talur : This should be removed.


	if nl.namespace == namespace {
Member Author

@raghavendra-talur : Handle this according to struct.

		return false
	}

	if nl.namespace == "" {
		nl.namespace = namespace
	}

	return true
}

// Release removes lock on the namespace
func (nl *NamespaceLock) Release(namespace string) {
	nl.mux.Lock()
Member Author

@raghavendra-talur : Remove lock line.

	defer nl.mux.Unlock()
	nl.namespace = ""
}
18 changes: 18 additions & 0 deletions internal/controller/util/nslock_test.go
@@ -0,0 +1,18 @@
// SPDX-FileCopyrightText: The RamenDR authors
// SPDX-License-Identifier: Apache-2.0

package util_test

import (
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	"github.com/ramendr/ramen/internal/controller/util"
)

var _ = Describe("Testing Locks", func() {
	nsLock := util.NewNamespaceLock()
	Expect(nsLock.TryToAcquireLock("test")).To(BeTrue())
Member Author

@raghavendra-talur : May have to use go func or anonymous functions.

	Expect(nsLock.TryToAcquireLock("test")).To(BeFalse())
	nsLock.Release("test")
	Expect(nsLock.TryToAcquireLock("test")).To(BeTrue())
})
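
The review note on this test suggests exercising the lock from goroutines. A minimal sketch of that idea follows, written in plain Go so it runs without Ginkgo. It uses a bare `sync.Mutex` with `TryLock` (Go 1.18+) as a stand-in for `NamespaceLock`, and `atomic.Int32` (Go 1.19+) as the counter; `countWinners` is a hypothetical helper name, not something from this PR.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countWinners starts n goroutines that each try once to take the same lock.
// No goroutine releases it, so exactly one attempt can succeed.
func countWinners(n int) int32 {
	var (
		mu      sync.Mutex // stand-in for the namespace lock under test
		wg      sync.WaitGroup
		winners atomic.Int32
	)

	for i := 0; i < n; i++ {
		wg.Add(1)

		go func() {
			defer wg.Done()

			// TryLock never blocks; it reports whether the lock was taken.
			if mu.TryLock() {
				winners.Add(1)
			}
		}()
	}

	wg.Wait()

	return winners.Load()
}

func main() {
	fmt.Println(countWinners(50)) // 1
}
```

The same shape should carry over to a Ginkgo `It` block by replacing the stand-in mutex with the lock under test and asserting on the winner count.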
38 changes: 38 additions & 0 deletions internal/controller/volumereplicationgroup_controller.go
@@ -55,6 +55,7 @@ type VolumeReplicationGroupReconciler struct {
	kubeObjects kubeobjects.RequestsManager
	RateLimiter *workqueue.RateLimiter
	veleroCRsAreWatched bool
	locks *rmnutil.NamespaceLock
}

// SetupWithManager sets up the controller with the Manager.
@@ -115,6 +116,8 @@ func (r *VolumeReplicationGroupReconciler) SetupWithManager(
		r.Log.Info("Kube object protection disabled; don't watch kube objects requests")
	}

	r.locks = rmnutil.NewNamespaceLock()
Member Author

Pass the vrg ns as arg.


	return ctrlBuilder.Complete(r)
}

@@ -437,6 +440,12 @@ func (r *VolumeReplicationGroupReconciler) Reconcile(ctx context.Context, req ct
			"Please install velero/oadp and restart the operator", v.instance.Namespace, v.instance.Name)
	}

	err = v.vrgParallelProcessingCheck(adminNamespaceVRG)

	if err != nil {
Member Author

@asn1809 Sep 5, 2024

The err will never be non-nil here.
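
To illustrate the point of this comment: when the lock is not acquired, the helper returns an `err` variable that is still nil from the successful List call, so the caller cannot tell "busy" from "ok". One possible shape for a fix is sketched below, under stated assumptions: `errNamespaceBusy`, `checkNamespace`, and `isBusy` are hypothetical names, and `tryAcquire` stands in for `locks.TryToAcquireLock`.

```go
package main

import (
	"errors"
	"fmt"
)

// errNamespaceBusy is a hypothetical sentinel error for this sketch.
var errNamespaceBusy = errors.New("namespace is being processed by another VRG")

// checkNamespace stands in for vrgParallelProcessingCheck. It returns a
// non-nil error when the lock cannot be taken, so the caller can requeue.
func checkNamespace(ns string, tryAcquire func(string) bool) error {
	if !tryAcquire(ns) {
		return fmt.Errorf("namespace %q: %w", ns, errNamespaceBusy)
	}

	return nil
}

// isBusy reports whether err wraps the sentinel.
func isBusy(err error) bool {
	return errors.Is(err, errNamespaceBusy)
}

func main() {
	alwaysBusy := func(string) bool { return false }
	alwaysFree := func(string) bool { return true }

	fmt.Println(isBusy(checkNamespace("app-ns", alwaysBusy))) // true
	fmt.Println(checkNamespace("app-ns", alwaysFree) == nil)  // true
}
```

With a sentinel like this, the `if err != nil` branch in Reconcile would actually be reachable for the lock-contention case.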

		return ctrl.Result{Requeue: true}, err
	}

	v.volSyncHandler = volsync.NewVSHandler(ctx, r.Client, log, v.instance,
		v.instance.Spec.Async, cephFSCSIDriverNameOrDefault(v.ramenConfig),
		volSyncDestinationCopyMethodOrDefault(v.ramenConfig), adminNamespaceVRG)
@@ -1626,3 +1635,32 @@ func (r *VolumeReplicationGroupReconciler) addKubeObjectsOwnsAndWatches(ctrlBuil

	return ctrlBuilder
}

func (v *VRGInstance) vrgParallelProcessingCheck(adminNamespaceVRG bool) error {
	ns := v.instance.Namespace

	if !adminNamespaceVRG {
		vrgList := &ramendrv1alpha1.VolumeReplicationGroupList{}
		listOps := &client.ListOptions{
			Namespace: ns,
		}

		err := v.reconciler.APIReader.List(context.Background(), vrgList, listOps)
		if err != nil {
			v.log.Error(err, "Unable to list the VRGs in the", " namespace ", ns)

			return err
		}

		// if the number of vrgs in the ns is more than 1, lock is needed.
		if len(vrgList.Items) > 1 {
			if isLockAcquired := v.reconciler.locks.TryToAcquireLock(ns); !isLockAcquired {
Member Author

Since TryToAcquireLock is a blocking call, negating its response will not be helpful. Think of an alternative.

				// Acquiring lock failed, VRG reconcile should be requeued
				return err
			}
			defer v.reconciler.locks.Release(ns)
Member Author


The release of the locks should be done once PVC is verified for the ownership label.
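
The concern here is that a `defer` placed inside the check helper fires as soon as the helper returns, which is before any PVC ownership check in the caller. One way to keep the lock until that work is done is to hand the release function back to the caller, as in the sketch below. Everything in it is hypothetical: `acquire` and `reconcile` are illustrative names, and the plain map is a single-goroutine stand-in for the real lock, not a concurrency-safe structure.

```go
package main

import "fmt"

// acquire is a hypothetical helper. On success it marks ns as held and returns
// a release function for the caller to defer; on failure it returns ok=false.
// The plain map is for illustration only and is not safe for concurrent use.
func acquire(held map[string]bool, ns string) (release func(), ok bool) {
	if held[ns] {
		return nil, false
	}

	held[ns] = true

	return func() { delete(held, ns) }, true
}

// reconcile stands in for the caller that owns the lock lifetime.
func reconcile(held map[string]bool, ns string) string {
	release, ok := acquire(held, ns)
	if !ok {
		return "requeue"
	}
	defer release() // runs when reconcile returns, not when acquire returns

	// Placeholder for the PVC ownership-label check: the lock is still held.
	if !held[ns] {
		return "lock released too early"
	}

	return "done"
}

func main() {
	held := map[string]bool{}
	fmt.Println(reconcile(held, "app-ns")) // done
	fmt.Println(held["app-ns"])            // false

	held["app-ns"] = true // simulate another VRG holding the namespace
	fmt.Println(reconcile(held, "app-ns")) // requeue
}
```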

		}
	}

	return nil
}