-
Notifications
You must be signed in to change notification settings - Fork 660
Reduce memory usage by caching commits on their sha hash #4681
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4e5ed74
to
33f7107
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks very promising! 👍🏼
new-cli/GitVersion.Core.Libgit2Sharp/GitVersion.Core.Libgit2Sharp.csproj
Outdated
Show resolved
Hide resolved
@pvanbuijtene I'm quite happy with the direction you're heading with these changes, I will have a look later as well. Thanks for shared the anonymous repo |
60a70cd
to
c295a82
Compare
As you might have noticed, there are some tests failing. I think one of the reasons for this is that the cache's life span is longer than you would normally expect I think. From a CLI perspective you would request the version using GitVersion and you're done. Within the tests there are complete scenarios being checked and I'm not sure if and how the cache should behave. As an example this one, it fails with caching of branches and succeeds without branch caching: https://github.com/GitTools/GitVersion/blob/main/src/GitVersion.Core.Tests/VersionCalculation/NextVersionCalculatorTests.cs#L124 [Test]
public void MergeFeatureIntoMainline()
{
var configuration = TrunkBasedConfigurationBuilder.New.Build();
using var fixture = new EmptyRepositoryFixture();
fixture.MakeACommit();
fixture.ApplyTag("1.0.0");
fixture.AssertFullSemver("1.0.0", configuration);
fixture.BranchTo("feature/foo");
fixture.MakeACommit();
fixture.AssertFullSemver("1.1.0-foo.1", configuration);
fixture.ApplyTag("1.1.0-foo.1");
fixture.Checkout(MainBranch);
fixture.MergeNoFF("feature/foo");
fixture.AssertFullSemver("1.1.0", configuration);
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Within the tests there are complete scenarios being checked and I'm not sure if and how the cache should behave. Maybe it should be cleared with certain steps, or it should be disabled 🤔
As an example this one, it fails with caching of branches and succeeds without branch caching […]
If the cache breaks tests, I'm thinking it might be uncovering a problem with the cache.
Since this PR was originally about caching the Commit
class, perhaps we should keep it at that and ensure that works with 100% green tests before we move on with a future followup PR where we attempt to cache even more Git objects?
I will try to investigate why those tests are failing as well |
c295a82
to
fe5e6b0
Compare
Think I would agree with that, caching branches and tags is as you might expect not as straight forward as the commits. |
fe5e6b0
to
fa09804
Compare
@pvanbuijtene I see you managed to fix the CI build, please rebase onto |
f71242b
to
df996c4
Compare
@arturcic it was more a test to see if I could get it work. I couldn't get the tests to work with cached Let me know what you think, and if you want some tests specifically for the |
I think for now we can skip those 2. |
@pvanbuijtene have you run this version of GitVersion for that repo with the issues? If so can you provide some statistics/graphs with the comparison? |
df996c4
to
bd47a66
Compare
I didn't, but here are the results. Main @ 9939c56: |
|
So there was quite a pressure on the GC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, approved, thanks for the time you spent
@asbjornu time for a second review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is most excellent! Well done! 👏🏼 I have a few questions and comments that can be ignored if there's good reason to, which is why I'm approving. But I would appreciate if they were addressed. :)
return new Branch(innerBranch, repoDiff, this); | ||
} | ||
|
||
var cacheKey = $"{innerBranch.CanonicalName}|{innerBranch.Tip.Sha}|{innerBranch.RemoteName}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not discovering this earlier. It's not very important, but will improve debugging if the format of the key matches what one would expect to find used in Git and GitHub a bit more than this.
var cacheKey = $"{innerBranch.CanonicalName}|{innerBranch.Tip.Sha}|{innerBranch.RemoteName}"; | |
var cacheKey = $"{innerBranch.RemoteName}/{innerBranch.CanonicalName}@{innerBranch.Tip.Sha}|"; |
|
||
public Tag GetOrCreate(LibGit2Sharp.Tag innerTag, Diff repoDiff) | ||
{ | ||
var cacheKey = $"{innerTag.CanonicalName}|{innerTag.Target.Sha}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var cacheKey = $"{innerTag.CanonicalName}|{innerTag.Target.Sha}"; | |
var cacheKey = $"{innerTag.CanonicalName}@{innerTag.Target.Sha}"; |
public Tag GetOrCreate(LibGit2Sharp.Tag innerTag, Diff repoDiff) | ||
{ | ||
var cacheKey = $"{innerTag.CanonicalName}|{innerTag.Target.Sha}"; | ||
return cachedTags.GetOrAdd(cacheKey, new Tag(innerTag, repoDiff, this)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return cachedTags.GetOrAdd(cacheKey, new Tag(innerTag, repoDiff, this)); | |
return cachedTags.GetOrAdd(cacheKey, () => new Tag(innerTag, repoDiff, this)); |
} | ||
|
||
public Commit GetOrCreate(LibGit2Sharp.Commit innerCommit, Diff repoDiff) => | ||
cachedCommits.GetOrAdd(innerCommit.Sha, new Commit(innerCommit, repoDiff, this)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cachedCommits.GetOrAdd(innerCommit.Sha, new Commit(innerCommit, repoDiff, this)); | |
cachedCommits.GetOrAdd(innerCommit.Sha, () => new Commit(innerCommit, repoDiff, this)); |
} | ||
|
||
var cacheKey = $"{innerBranch.CanonicalName}|{innerBranch.Tip.Sha}|{innerBranch.RemoteName}"; | ||
return cachedBranches.GetOrAdd(cacheKey, new Branch(innerBranch, repoDiff, this)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't lazy initialization here be even more optimal, avoiding the construction of a Branch
instance altogether if it already exists?
return cachedBranches.GetOrAdd(cacheKey, new Branch(innerBranch, repoDiff, this)); | |
return cachedBranches.GetOrAdd(cacheKey, () => new Branch(innerBranch, repoDiff, this)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the rename from GitCache
to GitRepository
? I don't feel like the provided functionality is quite that of a repository, which is often used as an abstraction over a database. 🤔
Thank you @pvanbuijtene for your contribution! |
Oh, it was set to auto-merge. 😅 Well well. You can work through my comments and see if you think they are worthy a followup PR, @pvanbuijtene. |
oh sorry about that, I might create a PR with the suggestions |
@pvanbuijtene: Thank you for your contribution and the work you have done. I have following remarks/suggestions:
|
@asbjornu @HHobeck @pvanbuijtene follow up here #4685 |
Description
Reduce the amount of allocations of
Commit
instances.Related Issue
Fix #4680, Fix #4409
Motivation and Context
After some investigation it turned out that a lot of commits are being re-created during the process of calculating the version.
By implementing a cache for the commits the overall memory consumption went from ~10GB to ~400MB.
How Has This Been Tested?
The solution was tested by running the project again with the changes against the same repository resulting in less memory consumption as shown below.
The problem of the memory usage was internally reported on Linux environments and reproduced on Windows.
My machine has 64GB & 22 logical processors.
Screenshots (if appropriate):
The CPU and Memory consumption before the change, peaking at ~10GB.


The CPU and Memory consumption after the change, peaking at ~400MB.
Checklist: