CA-388933: rework GC Active lock to ensure GC starts #679
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Commit b7b90c8 was addresing a race condition between the Garbage Collector and the SR detach operation where the GC could get started while the SR detach was proceeding due to there not being any mutual exclusion between these operations. To address this the code was changed to obtain the gc_active lock only when within the SR lock, which is also held by the detach operation. This prevents the starting GC process from acquiring the active lock until the detach has completed, at which point the SR would be detached and the GC would exit.
There was a problem with this commit in that it used a Lock acquireNoblock to acquire the SR lock and if it failed to do so assumed that this meant the GC was already running. As the GC is typically kicked as the result of a VDI delete (or manually as an SR scan) the SR lock would be held. This results in the GC lock acquisition being racy and dependent on the time taken for the process to fork and daemonise. The outcome of this is that under some conditions the GC process will never start and cleanup will not occur, leading to an inability to take new snapshots when the maximum chain length is exceeded. It should instead be using a blocking acquire which will wait until the current holder exits. This commit applies this change.