Cloning the Linux kernel in under a minute #579
Byron
started this conversation in Show and tell
TLDR
Using gitoxide with default settings, we can now clone the Linux kernel repository (receiving a pack, resolving it, and checking out the working tree) in 43s using all cores of an M1 Pro. Canonical git (with default settings) finishes the same clone in 115s, making gitoxide ~2.7x faster.
On a 16-core AMD workstation we can achieve the same clone in 30s, while canonical git takes 141s. Putting it into a number, gitoxide is able to outperform git by a factor of ~4.8.
This will make a difference on CI and locally, saving time and memory, once it's ready for prime time early next year.
For reproduction, please see the Reproduction section at the bottom of the document, or keep going for all the details.
The Results
We see that gix is ~1.4x faster than git on a single core, and ~2.6x faster with all cores of the test system.
It's notable that the default settings of gix, compared to those of git, allow it to reach ~2.7x of git's speed, as it will use all cores of the test system (M1 Pro).
Raw benchmark results
gix -c pack.threads=1 -c checkout.workers=1 clone ./linux ./linux-clone
git -c pack.threads=1 -c checkout.workers=1 clone file://$PWD/linux ./linux-clone
gix -c pack.threads=1 -c checkout.workers=4 clone ./linux ./linux-clone
git -c pack.threads=1 -c checkout.workers=4 clone file://$PWD/linux ./linux-clone
gix -c pack.threads=3 -c checkout.workers=1 clone ./linux ./linux-clone
git -c pack.threads=3 -c checkout.workers=1 clone file://$PWD/linux ./linux-clone
gix -c pack.threads=3 -c checkout.workers=4 clone ./linux ./linux-clone
git -c pack.threads=3 -c checkout.workers=4 clone file://$PWD/linux ./linux-clone
gix -c pack.threads=10 -c checkout.workers=1 clone ./linux ./linux-clone
git -c pack.threads=10 -c checkout.workers=1 clone file://$PWD/linux ./linux-clone
gix -c pack.threads=10 -c checkout.workers=4 clone ./linux ./linux-clone
git -c pack.threads=10 -c checkout.workers=4 clone file://$PWD/linux ./linux-clone
Bonus Round: AMD64 (or how to clone in ~30s)
The test system at hand is a custom-built PC with an AMD Ryzen™ 9 3950X, 64GB of 3200MHz DDR4, and an M.2 PCIe 4.0 NVMe SSD, running Fedora 36 Workstation.
It's notable how much better a standard build of gix performs compared to a standard build of git, being 2.9x faster with just a single thread. With default settings, gix using all cores outperforms git, which effectively uses only 3 cores, by a factor of ~4.8x.
Note that even though gitoxide scales nicely with additional cores, the absolute time saved has diminishing returns due to the pack transfer already taking ~23s, while the checkout takes 1.7s and is limited by the SSD and filesystem.
Raw benchmark results
gix -c pack.threads=1 -c checkout.workers=4 clone ./linux ./linux-clone
git -c pack.threads=1 -c checkout.workers=4 clone file://$PWD/linux ./linux-clone
gix -c pack.threads=3 -c checkout.workers=4 clone ./linux ./linux-clone
git -c pack.threads=3 -c checkout.workers=4 clone file://$PWD/linux ./linux-clone
gix -c pack.threads=8 -c checkout.workers=4 clone ./linux ./linux-clone
git -c pack.threads=8 -c checkout.workers=4 clone file://$PWD/linux ./linux-clone
gix -c pack.threads=16 -c checkout.workers=4 clone ./linux ./linux-clone
git -c pack.threads=16 -c checkout.workers=4 clone file://$PWD/linux ./linux-clone
gix -c pack.threads=32 -c checkout.workers=4 clone ./linux ./linux-clone
git -c pack.threads=32 -c checkout.workers=4 clone file://$PWD/linux ./linux-clone
Breakdown of clone time with default settings
The times shown here may overlap due to parallel execution and do not add up to the total runtime (30 seconds).
Bonus Round: Memory Consumption
All this performance offered by gix might come at the expense of memory consumption, and here are two measurements created by hand with default options to represent typical usage.
/usr/bin/time -lp gix -v clone ./linux ./linux-clone
/usr/bin/time -lp git clone file://$PWD/linux ./linux-clone
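The -l flag above belongs to the BSD/macOS time utility and includes the maximum resident set size. On Linux, GNU time reports the same figure with -v, so a roughly equivalent measurement (not part of the original runs) would be:
/usr/bin/time -v gix -v clone ./linux ./linux-clone
/usr/bin/time -v git clone file://$PWD/linux ./linux-clone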
gix can do the same work faster and with nearly half the memory.
Raw benchmark results
Bonus: Racing git
See how gix tries to beat git in cloning the Linux kernel over the network on a beefy machine.
Conclusions
gitoxide has the potential to substantially speed up clone operations by scaling with today's multi-core CPUs, and it will keep scaling with every new hardware generation. git currently does not scale well with cores, and it will be a major undertaking to change that.
Special Thanks
I am grateful for the help of Pascal Kuthe who generously gave his time to review this post and improve it tremendously in the process. He is also responsible for the graphs, making it so much more accessible, and prettier too. Thank you!
FAQ
Can I use it now?
Yes, if the checkout does not involve submodules or rely on filters (like line-feed conversions or git-lfs). These features are expected to be fully implemented early next year (2023).
Can I post my own results here?
Yes, please: test it on your 128-core machine to see how low these numbers can go. Please note, though, that the runtime is dominated by transfer time, which clocks in at about 28s.
How can gitoxide be that fast?
gitoxide has been built from the ground up for performance. It doesn't use the heap generously and reuses allocations wherever feasible.
On top of that, the most time-consuming stage of a clone, the pack index creation, is algorithmically optimal: a data structure is built that knows exactly which delta to apply to which base, effectively representing the delta tree in memory. With it, one can resolve the pack, that is, decompress every object, without requiring any other caches and without wasting any work or memory.
Thanks to the Rust ecosystem, it's easy to get the best performing ZLIB implementation and the fastest SHA1 hash implementation for most platforms, which affects this workload a lot. With the right hardware, this step can now scale linearly with each core, yielding ~38GB/s decompression speed on a recent AMD Ryzen.
All of the above wouldn't be possible without Rust, the key enabler for all optimizations and fearless concurrency.
Why is canonical git slower on an AMD workstation than on an M1 MacBook?
We found this surprising as well, but after rerunning the benchmarks multiple times, the results turned out to be consistent. Some component of canonical git is probably much better optimized on AArch64, greatly improving performance there.
As git does not scale nearly as well as gitoxide across multiple cores, it's not able to capitalize on the higher core count, which widens the gap even further.
Reproduction
The test setup
We will use the Linux kernel repository as a benchmark. To get reliable measurements we exclude the network by using a local copy of the repository, and to get clone performance similar to that of an optimized server we also enable some caches.
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
git checkout v6.0-rc3
git repack --write-bitmap-index -a
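As a quick sanity check (not part of the original setup), the bitmap written by the repack above can be confirmed to exist before benchmarking:
ls .git/objects/pack/*.bitmap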
Now we can direct our git clients, gix and git, towards the local copy and perform a clone using the same machinery that would normally run over the network.
By default, git will just hardlink the relevant files when performing local clones. However, we can nudge it into performing a proper clone by using git clone file://$PWD/linux linux-clone. gitoxide currently does not utilize hardlinking, so just calling gix clone ./linux linux-clone is enough.
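Put together, the two local-clone invocations look like this, with linux-clone as an example target directory:
git clone file://$PWD/linux linux-clone
gix clone ./linux linux-clone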
Since git now treats the clone as 'remote' with limited trust, it would force a connectivity check on the received data to assure it's not garbage, which takes time and (a lot of) memory, so we disable it with the following patch on top of this commit. The patch can be applied with git apply <PATCH_FILE>, and with that we get:
make # add -j10 for using 10 cores
./git --version
./git version 2.38.1.381.gc03801e19c.dirty
The experimental gitoxide CLI, gix, can be installed using cargo:
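Assuming a recent Rust toolchain with cargo is available, an installation from crates.io looks like this:
cargo install gitoxide  # builds and installs the gix binary used throughout this post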
Understanding Performance Options
The unit of data transport in git is a pack, which is a set of highly compressed objects. Together these objects make up the object graph of the repository, which contains all commits and files tracked by git.
When cloning, a pack with all objects required by the client is created by the server and streamed to the client.
This process can be broken down into the following steps:
- the server gathers all required objects and streams them to the client as a pack
- the client receives the pack and resolves it into an index of its contents, controlled by pack.threads, which defaults to 3 for git and 'all-CPUs' for gix
- the client checks out the working tree, controlled by checkout.workers, which defaults to 1 for git and all-CPUs for gix
Thus we have two parameters that affect the last two stages of the clone operation, with the last stage being the fastest one, and the second-to-last being the one that has the biggest impact on performance.
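For everyday use, the same knobs can also be set persistently rather than per invocation; a sketch relying on git's documented behaviour that a value of 0 (or less, for checkout.workers) means auto-detecting the core count:
git config --global pack.threads 0      # auto-detect the number of CPUs
git config --global checkout.workers 0  # values below 1 use all logical cores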
Gathering Results
We use hyperfine for obtaining the results and run it with:
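What follows is a sketch rather than the verbatim invocation; the threads/workers placeholder names and the --prepare cleanup step are assumptions, with parameter values matching the M1 Pro runs.
hyperfine \
  --prepare 'rm -rf ./linux-clone' \
  --parameter-list threads 1,3,10 \
  --parameter-list workers 1,4 \
  'gix -c pack.threads={threads} -c checkout.workers={workers} clone ./linux ./linux-clone' \
  'git -c pack.threads={threads} -c checkout.workers={workers} clone file://$PWD/linux ./linux-clone'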
Note that the checkout is done onto an actual disk (SSD) to represent typical usage.
Our parameters have been chosen so that they reflect typical usage:
- pack.threads=3 is the default used by git if the host has that many CPUs. It's chosen because it's known that higher numbers yield greatly diminished returns or are even reducing performance due to lock contention.
- checkout.workers=1 is the default for git, meaning only one file will be written at a time.
- The remaining values show how performance scales when more cores are made available to gix or git.
Detailed test data
Output of test runs
Output of memory consumption tests