🎉 Introducing gix index entries
#978
Byron
started this conversation in
Show and tell
Important
What follows is not an apples-to-apples comparison: in this version, gix does not validate the index's hash. Doing so costs so much time that gix is only 20% faster than git when reading big index files. However, when index.skipHash is enabled and neither validates the hash, gix is still 2.17x faster.

gix index entries is gitoxide's version of git ls-files, with some added features that I always wanted to have. As such, it mostly prints files that are in the index, i.e. under version control, but with a twist:

By default, it also displays all attributes added via .gitattributes and .gitignore files, sometimes with surprising revelations when one realises that some ignored files have been checked in and are marked with a big red cross.

That alone makes it quite useful, but with the right pathspec it's also possible to drill in.

Which files need git-lfs, one might ask, with an incantation like gix index entries ':(attr:filter=lfs)'.

The wonky expression :(attr:filter=lfs) is actually just one way of writing a pathspec.

About git-pathspecs
When interacting with git commands that take a file path, that innocent-looking path is actually a full-blown pathspec. Most of us won't ever realise this, as expressions like * are typically expanded by the shell anyway, and whatever we do with paths just works.

Or does it?

On a case-insensitive filesystem, as found by default on Windows or macOS, one could try the following:
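A minimal sketch of such an experiment, assuming a throwaway repository in a temporary directory; the file name File and the demo identity are placeholders. On a case-sensitive filesystem, git add file reports an error instead of silently adding nothing, which the sketch tolerates, so the first commit has nothing to record either way:

```shell
# Throwaway repository; nothing here touches an existing checkout.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
echo content > File

# Mistyped case: the pathspec 'file' does not match the entry 'File'.
# (On a case-sensitive filesystem this command fails instead of staying silent.)
git add file 2>/dev/null || true
tracked_before=$(git ls-files)

# With nothing staged, the commit is rejected.
if git -c user.name=demo -c user.email=demo@example.com commit -q -m init >/dev/null 2>&1; then
  first_commit=ok
else
  first_commit=failed
fi

# Correct case: the pathspec matches and the commit goes through.
git add File
tracked_after=$(git ls-files)
if git -c user.name=demo -c user.email=demo@example.com commit -q -m init >/dev/null 2>&1; then
  second_commit=ok
else
  second_commit=failed
fi

echo "wrong case: tracked='$tracked_before' commit=$first_commit"
echo "right case: tracked='$tracked_after' commit=$second_commit"
```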
Indeed, the commit failed because the file File wasn't added, even though git add didn't fail. What failed was git commit, which didn't have anything to add to the commit.

With a trick, we can lure out the pathspec-ish nature of the innocent file path whose first letter we mistyped.

But once the case is corrected, it does work:

Pathspecs are case-sensitive by default even if the underlying filesystem is not.

However, they can also do additional tricks, which are described quite concisely in the git-glossary; I wasn't able to find a dedicated chapter akin to the one for git-attributes, for example. To wrap up with something useful, here is how you can get everything under tests/ while ignoring shell scripts:

Performance
Supporting pathspecs is great, but will it be fast enough? For those who want to reproduce it on their machine, there is the folded "Data" section at the bottom of the document. Note that for each run we made sure that the output of git and gix matches perfectly.

Before we start, note that r2k is the gitoxide repo with a mere 1994 files, while r370k is the WebKit repository with ~370,000 of them.

baseline - no pathspec
This is to see how fast it can be at best by merely dumping the index paths to standard out without doing any extra processing.
hyperfine --warmup 3 'git ls-files' 'gix index entries --no-attributes' -N
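The output check mentioned above can be scripted. The following is a sketch rather than the procedure actually used for these runs; the helper name same_output is hypothetical, and it simply compares the standard output of two command lines byte for byte:

```shell
# same_output CMD_A CMD_B: succeed only if both command lines print identical bytes.
same_output() {
  out_a=$(mktemp) || return 2
  out_b=$(mktemp) || return 2
  sh -c "$1" > "$out_a" 2>/dev/null
  sh -c "$2" > "$out_b" 2>/dev/null
  cmp -s "$out_a" "$out_b"
  status=$?
  rm -f "$out_a" "$out_b"
  return "$status"
}

# Intended use inside a repository, before benchmarking (requires git and gix):
#   same_output 'git ls-files' 'gix index entries --no-attributes' && echo identical
```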
It looks like gix gets better the more work there is, taking only half the time git takes to output 370k paths.

single attribute lookup
Attribute lookup is expensive, as many paths have to be matched against many globs, and the set of applicable globs itself depends on the input path.
r2k = hyperfine --warmup 1 "git ls-files ':(attr:filter=lfs)'" "gix index entries --no-attributes ':(attr:filter=lfs)'" -N
r370k = hyperfine --warmup 1 "git ls-files ':(attr:export-ignore)'" "gix index entries --no-attributes ':(attr:export-ignore)'" -N
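To see what such an attribute pathspec selects, here is a small sketch in a throwaway repository. The file names and the *.bin rule are made up for illustration, and it assumes no lfs filter is configured, in which case git add treats the attribute as a no-op and only the attribute lookup matters:

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Hypothetical rule: mark every *.bin file with filter=lfs.
printf '*.bin filter=lfs\n' > .gitattributes
mkdir assets
echo binary > assets/blob.bin
echo text > notes.txt
git add .

# Quote the pathspec so the shell passes ':(attr:...)' through untouched.
lfs_files=$(git ls-files ':(attr:filter=lfs)')
echo "$lfs_files"
```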
It's strange to see that git is that much slower in the r370k case, as it's unlikely that gix's matching engine is this much faster. Apparently it manages to do way less work for the same result.

single attribute lookup - more fair
Since the algorithm used by gix in the case above, with --no-attributes, is a bit different, maybe some special optimization sneaks in. Now we leave --no-attributes out, which means that gix will use worktree attributes first, forcing it to touch disk each time the directory of the path to match changes (this turned out to be worth 70ms). Further, much like git, gix will now query all attributes instead of just a single one.

hyperfine --warmup 1 "git ls-files ':(attr:export-ignore)'" "gix index entries ':(attr:export-ignore)'" -N
As expected, less optimal attribute matching bears a cost, but gix is still significantly faster. It's worth noting that gix now also outputs more information, which represents another cost that git doesn't have, even though it's probably quite minor.

single glob
Much more common than attribute pathspecs are certainly those with shell globs, which is also the default globbing mode in pathspecs.

hyperfine --warmup 1 "git ls-files 'Web*'" "gix index entries --no-attributes 'Web*'" -N
It's interesting that gix is still faster. It must be a more optimal implementation with fewer losses, as both most definitely do the same amount of work.

trivial prefix
Special optimisations are possible if the pathspecs have a common prefix. With it, one is able to work only on a subset of the entries in the index which can lead to significant speedups.
hyperfine --warmup 1 "git ls-files WebDriverTests" "gix index entries --no-attributes WebDriverTests" -N
In this case, the set of paths to work on is reduced to 828, and to my mind it's surprising that it still takes that long. A lot of time is spent, of course, reading the entire 53MB index even though most of it remains unused.
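Why a common prefix helps can be sketched with plain text tools. Index entries are stored sorted by path, so all paths sharing a prefix form one contiguous run; the hypothetical prefix_range helper below relies on that to stop reading as soon as the run ends. It is a linear scan for illustration only, not how gix or git implement it, and a real implementation could presumably jump to the start of the run with a binary search:

```shell
# prefix_range PREFIX: read bytewise-sorted paths on stdin, print those that
# start with PREFIX, and stop at the first non-matching path after the run began.
prefix_range() {
  awk -v prefix="$1" '
    index($0, prefix) == 1 { print; in_run = 1; next }
    in_run { exit }
  '
}

# Example with made-up paths; the trailing slash keeps "WebDriverTestsExtra" out.
printf '%s\n' Source/a.c WebDriverTests/one.py WebDriverTests/two.py \
  WebDriverTestsExtra/x Websites/y | prefix_range WebDriverTests/
```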
Bonus: --recurse-submodules on Rust @ a39bdb1d6b9eaf23f2636baee0949d67890abcd8

With gix v0.28.0-233-gec1e5506e on the Rust repository we see what happens if --recurse-submodules is used, also in comparison with invocations that don't recurse into submodules.

It's interesting that there is overhead in dealing with recursion into submodules, which seems to weigh significantly enough to make gix lose some of its performance headroom. One may wonder if this trend continues to break-even, or if the no-recurse version of git just has a disadvantage related to getting started.

Conclusion
pathspecs are a powerful and probably undervalued feature, which is now fully supported by gitoxide to enable a variety of features that build on it.

gix index entries [PATHSPEC ...] is the first of many more gix commands to come, showing how much performance we can expect to gain when using it.

Q&A
Q: What's the deal with --no-attributes?

This is necessary to make gix index entries comparable to git ls-files. Without it, gix will do a lot of extra work which will be noticeable in the very large repositories we look at here.

Data
Programs
Datasets
r2k
r370k
Baseline - no pathspec
r2k
r370k
Single attribute lookup
r2k
r370k
Single attribute lookup - more fair
r370k
Different output due to attribute display. This prints more, too.
Removal of --no-attributes matches all attributes for each path, and uses worktree attributes by default.
Simple single-glob
Trivial Prefix
r370k
Bonus: --recurse-submodules on Rust repo
Apples vs Apples
When neither computes the index hash, gix is still a lot faster.