gix clean - the git clean I never knew I wanted #1308
Byron started this conversation in Show and tell
It’s a surprise to me that ignored files will be removed by `git clean`. I think it might be reasonable to put them into a trash bin instead of removing them immediately?
`git clean` is a fantastic tool: it makes it possible to dissect all files in the working tree that are not tracked, and choose them for deletion. Very practical, as data just keeps accumulating.

git clean - how I used it

When running it on `gitoxide` (and I keep making this mistake), I am typically bumping against a fair reminder:

Of course, how could I forget, again, to at least add `-n` to explicitly enable `--dry-run` mode.

No output. Now would be a good time to study the manual or add `--help`, and most of us have probably developed the muscle memory to crank up the number of directory entries it prints, for a first overview.
In the `gitoxide` repository, it prints hundreds of lines, most of them messages like “Would skip repository…” that make its output incredibly noisy and hard to understand. One can force the removal of these repositories by passing another `-f`, so `-ff`, but unfortunately there is no way to hide them in dry-run mode, say with `-nn`. This alone makes me dread using it. The automatic search for repositories in ignored directories also isn’t free, as it has to traverse thousands of additional directories, causing a visible delay. But there is more, as some of this output is probably typical for many of its runs in other repositories as well.
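For readers less familiar with the flags mentioned so far, here is a rough model of how they interact according to `git clean`’s documentation: with the default `clean.requireForce`, it refuses to run unless `-f` or `-n` is given; `-d` is needed for untracked directories; `-x` adds ignored entries; `-X` selects only ignored entries; and nested repositories need a second `-f`. This is a simplified sketch, not Git’s logic (pathspecs and interactive mode are left out), and the names `Kind`, `Flags`, `refuses` and `is_candidate` are hypothetical.

```rust
/// Simplified model (an assumption based on the documented flags, not Git's
/// implementation) of which entries `git clean` acts on. Pathspecs and `-i`
/// are ignored here.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Kind {
    Untracked,
    Ignored,
}

#[derive(Clone, Copy, Default)]
struct Flags {
    force: u8,             // how often `-f` was given
    dry_run: bool,         // `-n`
    directories: bool,     // `-d`
    include_ignored: bool, // `-x`
    only_ignored: bool,    // `-X`
}

/// With the default `clean.requireForce = true`, `git clean` refuses to run
/// unless `-f` or `-n` is given.
fn refuses(f: &Flags) -> bool {
    f.force == 0 && !f.dry_run
}

/// Whether an entry would be listed (with `-n`) or removed (with `-f`).
fn is_candidate(kind: Kind, is_dir: bool, is_nested_repo: bool, f: &Flags) -> bool {
    let kind_ok = match kind {
        Kind::Untracked => !f.only_ignored,
        Kind::Ignored => f.include_ignored || f.only_ignored,
    };
    // Directories need `-d`; nested repositories additionally need `-ff`,
    // otherwise they are skipped ("Would skip repository ..." in dry-run mode).
    let dir_ok = !is_dir || (f.directories && (!is_nested_repo || f.force >= 2));
    kind_ok && dir_ok
}
```

In this model, `-n` alone excludes untracked directories and everything ignored, so a working tree whose clutter lives entirely in such places produces no output at all.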
First off, it’s unclear why it wants to remove the entries it shows. Of course, these candidates are not tracked, but are they ignored, or are they untracked? Typically, untracked files are less likely to be removable, and ideally these are easy to discern.
Then, it says that it `Would remove .idea/` and that it `Would remove .vscode/`, which are ignored directories in every repository. It’s less obvious that patterns in `.gitignore` files cause files and directories to be ignored, and thus expendable. `.idea/` and `.vscode/` definitely should not be tracked, but unfortunately that also puts them on the default kill list of `git clean`. Of course, `git clean` isn’t at fault here, but I mention it because it’s a usability issue everyone has to contend with, and… mistakes happen all the time.

Despite trying hard to skip repositories, it is happy to `remove tests/fixtures/repos/linux.git`, for example, which is an ignored fixture I use for local manual testing from time to time (`tests/fixtures/repos/` is in `.gitignore`). It is a bare repository, which `git clean` doesn’t detect, and that seems surprising given all the prior chatter about ignored repositories that it refuses to remove.

Summary
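Here are the shortcomings of using `git clean` for cleaning the `gitoxide` working directory, roughly in order of severity.

- ignored files I want to keep are deleted along with all other ignored files (`.idea/` and `.vscode/`)
- bare repositories in ignored directories aren’t detected and get deleted (`tests/fixtures/repos/linux.git`)
- hundreds of “Would skip repository…” lines make the dry-run output noisy, and searching for these repositories causes a visible delay
- it’s unclear whether an entry is listed because it is ignored or because it is untracked
- with only `-n` it’s easy to see nothing on screen, which might keep new users puzzled

Enter gix clean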
With all my grievances with `git clean`, and having just finished `gix_dir::walk()`, it seemed like an obvious challenge to try the walk implementation on. And while at it, I thought I could make a few improvements along the way. The initial implementation of `gix clean` took about two days, plus another day of polishing after more shortcomings in `gix_dir::walk()` were unearthed, tested for and fixed. This probably speaks to the overall power of the dirwalk implementation, which makes it easy to build tools on top of it. (It also seems fair to note that it took many more days on top of that to fix many more issues that I noticed after using `gix clean` more, so it’s not like the dirwalk worked perfectly in every way from the beginning.)

It starts out with `-n` implied, and instead of printing nothing, it actively suggests which flags to add in order to show additional entries. Let’s go with `-dx` to get closer to the typical `git clean -nxd` invocation.

(This is the entire output, nothing was truncated from it.)
This looks a lot cleaner. Thanks to the use of (debatable) emoji, it makes clear which files are expendable and which are untracked. It does not show or look for repositories contained in ignored or untracked directories, but loudly tells the user about it while offering a remedy in the form of `--skip-hidden-repositories` and `--find-untracked-repositories`.

And now, it’s definitely time to wonder why it didn’t list any of the `tests/fixtures/<bare-repository>` entries, or how it could exclude the editor configuration. Let’s take another look with `-p`, showing precious files.

Using the experimental feature that is ready for a first implementation in Git as well, I was able to declare a subset of ignored files as precious. With that knowledge, there is no risk of accidentally nuking important ignored files anymore. For instance, to make `.vscode/` precious, edit the `.gitignore` file that contains the line to ignore it, to look like this:

Now `gitoxide` will see these entries as precious, while Git versions without support for precious files will only meaningfully see `.vscode/`, and treat it as ignored and expendable like before.

Going back, let’s be sure that we don’t remove repositories in untracked directories.
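Not removing repositories first requires recognizing them, including bare ones like `tests/fixtures/repos/linux.git`. The `git clean` documentation describes nested repositories as directories with a `.git` subdirectory, which may be why that bare repository slipped through earlier. Below is a minimal heuristic sketch for both layouts, loosely based on the `HEAD`/`objects`/`refs` layout of a Git directory. It is an assumption for illustration, neither Git’s nor `gix-dir`’s actual detection (which validates much more), and `looks_like_bare_repo`, `looks_like_worktree` and `is_repository` are hypothetical names.

```rust
use std::path::Path;

/// Heuristic (hypothetical): a bare repository is a directory that directly
/// contains a `HEAD` file as well as `objects/` and `refs/` directories.
fn looks_like_bare_repo(dir: &Path) -> bool {
    dir.join("HEAD").is_file() && dir.join("objects").is_dir() && dir.join("refs").is_dir()
}

/// Heuristic (hypothetical): a repository with a worktree has a `.git` entry,
/// which is either a directory or a file pointing elsewhere (e.g. submodules).
fn looks_like_worktree(dir: &Path) -> bool {
    dir.join(".git").exists()
}

/// A directory matching either layout would be skipped unless its removal
/// was explicitly requested.
fn is_repository(dir: &Path) -> bool {
    looks_like_worktree(dir) || looks_like_bare_repo(dir)
}
```

Checking only for `.git` would miss the bare layout entirely, which is why the sketch tests for both.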
Now the warning message disappears, and instead it says that it `Skipped 1 repositories`, while not collapsing the `untracked-with-nested-bare` directory, instead showing which files it can delete without affecting the repository within. If we wanted to remove that repository too, `-r` can be specified.

Typically I am most interested in removing generated files, so let’s use a pathspec to narrow the list we see.
What’s going on, though? I would have expected two directories, so the line `WOULD remove gix-dir/tests/fixtures/generated-do-not-edit/ (🗑️)` is unexpectedly missing. Fortunately, the message tells us about 108 pruned entries that the pathspec doesn’t match, and these can be shown using `--debug`. Doing so reveals that the pathspec actually affects the directory walk: wildcards in pathspecs cause the search to continue into otherwise ignored directories, fundamentally changing the set of returned files.
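To make this effect concrete, here is a rough model contrasting two approaches. In the first, the pathspec is matched while walking, so a wildcard forces the walk into each ignored directory, files are matched one by one, and anything found inside (like a nested repository) can prevent collapsing. In the second, the pathspec is matched only against the entries that end up being displayed, which is what `-m` provides. It is a hypothetical sketch, not `gix-dir`’s implementation: `IgnoredDir`, `match_during_walk`, `match_displayed` and the `*`-only `wildmatch` are made up for illustration, with `*` matching across `/` as it does in Git pathspecs without `:(glob)` magic.

```rust
/// Minimal wildcard matching: `*` matches any sequence of bytes, including `/`;
/// every other byte matches literally (no `?`, `[...]` or escaping).
fn wildmatch(pattern: &str, text: &str) -> bool {
    fn go(p: &[u8], t: &[u8]) -> bool {
        match (p.first(), t.first()) {
            (None, None) => true,
            (Some(b'*'), _) => go(&p[1..], t) || (!t.is_empty() && go(p, &t[1..])),
            (Some(a), Some(b)) if a == b => go(&p[1..], &t[1..]),
            _ => false,
        }
    }
    go(pattern.as_bytes(), text.as_bytes())
}

/// An ignored directory whose files are all expendable (hypothetical model).
struct IgnoredDir<'a> {
    path: &'a str,         // shown with a trailing slash when collapsed
    files: Vec<&'a str>,   // worktree-relative paths of the files inside
    has_nested_repo: bool, // only discovered when the walk looks inside
}

/// Pathspec matched during the walk: each file is matched individually,
/// non-matching ones are pruned, and a nested repository prevents collapsing.
fn match_during_walk(dirs: &[IgnoredDir], pattern: &str) -> Vec<String> {
    let mut shown = Vec::new();
    for dir in dirs {
        let matching: Vec<&str> =
            dir.files.iter().copied().filter(|f| wildmatch(pattern, f)).collect();
        if !dir.has_nested_repo && !matching.is_empty() && matching.len() == dir.files.len() {
            shown.push(dir.path.to_string());
        } else {
            shown.extend(matching.into_iter().map(String::from));
        }
    }
    shown
}

/// Pathspec matched against displayed entries only: each ignored directory is
/// collapsed without looking inside, and only the collapsed paths are matched.
fn match_displayed(dirs: &[IgnoredDir], pattern: &str) -> Vec<String> {
    dirs.iter()
        .map(|dir| dir.path)
        .filter(|path| wildmatch(pattern, path))
        .map(String::from)
        .collect()
}
```

With a pattern like `*/generated-do-not-edit/*` and a nested repository inside that directory, the first approach lists individual files while the second keeps the single collapsed directory.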
Of course, we could adjust the pathspec to `*/generated-do-not-edit/*` to also list the pruned entries, but then we are back to the noisy display we originally tried to avoid. Now `gix clean`, too, will detect the ignored non-bare repositories, and thus fail to collapse the folder like before. To get the results that were originally intended, there has to be a way to apply the pathspec to displayed entries only, and with `-m` we do just that.

I also added `target` to the pathspec for good measure, and believe that this is now the set to delete. Adding `-e` for `--execute` will do the trick.

Conclusion
`gix clean` is built on the new `gix-dir` crate, which provides a configurable Git-style directory walk. Its 1100 SLOC are supported by more than 4000 SLOC of tests to pin-point every bit of logic, and yet there is probably still a test or two missing, as real-world usage might point out later.

While working on it, I gained an even greater respect for the accomplishment that is `git clean`, given how much it gets right, and how many edge-cases it handles gracefully. This was particularly noticeable when I simply had to keep adding features merely to be on par with what Git can already do.

At some point I also realised how many shortcuts Git takes in the name of optimisation, and how this actually hurts the general-purpose nature of the library that `gix-dir` is going to be. Step by step, `gitoxide`’s model evolved to become more general. Ultimately this gave `gix clean` access to much richer and more consistent information about the directory, and kept its own logic simple. And with that simplicity as a baseline, it was possible to add complexity for a better user experience.

Hopefully, this is also just the beginning :).
Bonus - CWD handling

`git clean` makes sure that it won’t delete the directory you are currently in, showing incredible attention to detail and love for the command-line.

`gix clean` inherited this, while keeping up its level of reporting detail.

When entering a populated directory, the result is a little bit different.
Now Git will expand the entries in our current working directory, `target/debug`, but displays them in an unusual form: I would have expected the `./.fingerprint` notation, for example, not `../debug/.fingerprint`.

Compare this to `gix clean`:

It also prevented the collapse of the current working directory, but displays paths as we would expect.
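Both behaviours come down to relating worktree-relative paths to the current working directory. Here is a minimal sketch using only `std::path`, assuming both paths are relative to the worktree root and contain no `.` or `..` components; `relative_to_cwd` and `would_delete_cwd` are hypothetical helpers, not `gix`’s API.

```rust
use std::path::{Component, Path, PathBuf};

/// Express the worktree-relative `entry` relative to the worktree-relative `cwd`.
/// For example, `target/debug/.fingerprint` seen from `target/debug` becomes `.fingerprint`.
fn relative_to_cwd(entry: &Path, cwd: &Path) -> PathBuf {
    let entry: Vec<Component> = entry.components().collect();
    let cwd: Vec<Component> = cwd.components().collect();
    // Number of leading components both paths share.
    let common = entry.iter().zip(&cwd).take_while(|(a, b)| a == b).count();
    let mut out = PathBuf::new();
    // Walk up from the cwd to the common ancestor...
    for _ in common..cwd.len() {
        out.push("..");
    }
    // ...and then down to the entry.
    for component in &entry[common..] {
        out.push(component.as_os_str());
    }
    out
}

/// Removing `entry` would take the cwd with it if the cwd is `entry` or lies inside it,
/// so such an entry must not be collapsed or deleted.
fn would_delete_cwd(entry: &Path, cwd: &Path) -> bool {
    cwd.starts_with(entry)
}
```

`Path::starts_with` compares whole components, so `target/debug` is inside `target`, but not inside a sibling such as `target/deb`.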
Bonus - Performance

This is clearly not an apples-to-apples comparison, as the logic of `gix clean` is different in one major way: it won’t search for hidden repositories by default. But with that enabled, how will it fare?

gitoxide repository @ aa7c1908b82e3e23859a4c663faa40ec54611919 - default settings

To get a similar state, run `cargo test -p gix dir && cargo test -p gix`.

In a way, the out-of-the-box performance with default settings matters to me, even though it’s not a fair comparison. The fair comparison follows next.
`gix` is a lot faster here because it does way less work out of the box.

gitoxide repository @ aa7c1908b82e3e23859a4c663faa40ec54611919 - apples-to-apples

Here `gix` reads 3065 directories and sees 36799 filesystem entries. Git also sees 3065 directories along with 37516 entries, but for some reason takes significantly longer. Maybe it’s related to the way it detects non-bare repositories?
linux @ ffc2532

This is a clean working tree.

On bigger repositories, Git seems to gain the advantage. It traverses 5317 directories with 87071 entries. `gix` sees the same number of directories with 81755 entries, but takes longer overall, particularly in user land.

When disabling a few bottlenecks of the `gitoxide` implementation, `ignoreCase` and `precomposeUnicode`, we see this:

That’s just a little better; this round clearly goes to Git :).
WebKit @ 886077e077a496a6e398df52a4b7915d8cd68f76
This is a plain checkout without changes to the working tree. However, I have turned off `core.ignoreCase` and `core.precomposeUnicode` in the repository configuration, as they make a difference for both applications (but more so for `gix`).

Here `gix` is just a little bit slower, but that gets much worse once the aforementioned flags are turned on.

Now Git is even 1.4 times faster, and it’s good to know that `gix` still has some untapped potential for performance here. Git is highly optimised in this area, and even uses multi-threading in a place where `gix` doesn’t (yet).

Git @ 3c2a3fd
The working tree is the one after a build, with plenty of build-artifacts littered everywhere.
Despite taking more CPU time in user land, `gix` spends much less time in the system for some reason, making it a little faster overall.

Verdict

It seems that for small to medium repositories, `gix clean` will be a little faster, while being up to 50% slower on very large repositories.

Q&A
Precious Files?
In my own words, precious files are ‘the missing class’ of ignored files. Currently, all ignored files are also expendable. But precious files are ignored and… not expendable, which makes them closer to untracked files.
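Put as a data type, the classes of entries discussed in this post might look like the sketch below. The `Status` enum and the `may_delete` policy are hypothetical and only illustrate that precious files are ignored but not expendable; they are not `gix-dir`’s actual types.

```rust
/// Classes of working-tree entries as discussed in the post (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Tracked,
    Untracked,
    IgnoredExpendable,
    IgnoredPrecious,
}

impl Status {
    /// Both kinds of ignored entries won't be picked up by `git add .`.
    fn is_ignored(self) -> bool {
        matches!(self, Status::IgnoredExpendable | Status::IgnoredPrecious)
    }

    /// Only expendable entries are considered safe to throw away.
    fn is_expendable(self) -> bool {
        self == Status::IgnoredExpendable
    }
}

/// A possible deletion policy: each class has to be opted into separately,
/// so asking for ignored files never removes precious ones.
fn may_delete(status: Status, untracked: bool, ignored: bool, precious: bool) -> bool {
    match status {
        Status::Tracked => false,
        Status::Untracked => untracked,
        Status::IgnoredExpendable => ignored,
        Status::IgnoredPrecious => precious,
    }
}
```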
They can be specified by prefixing a pattern in `.gitignore` files with `$`. Git versions without support for precious files see these entries verbatim, but `gix` will know they are precious. As the last matching pattern wins, a precious pattern that follows the one marking an entry as expendable will have the desired effect. So to retrofit a `.gitignore` file that works normally for all, let the precious pattern follow the expendable one.

The technical document with all the details is available on the mailing list or on GitHub, authored by Elijah Newren.
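The ordering rule can be checked with a small sketch. It assumes the `$` prefix described above and Git’s rule that the last matching pattern in a file wins, but compares patterns to paths by plain string equality, so it is far simpler than real `.gitignore` matching (no globs, negation, anchoring or escaping). `Ignored`, `parse_line` and `classify` are hypothetical names.

```rust
/// How an ignore pattern classifies a path (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ignored {
    Expendable,
    Precious,
}

/// Parse one `.gitignore` line into (pattern, kind); blank lines and comments yield None.
/// Git without precious support reads `$.vscode/` as a literal pattern for a path
/// named `$.vscode`, so for it only the `.vscode/` line matters.
fn parse_line(line: &str) -> Option<(&str, Ignored)> {
    let line = line.trim_end();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    match line.strip_prefix('$') {
        Some(rest) => Some((rest, Ignored::Precious)),
        None => Some((line, Ignored::Expendable)),
    }
}

/// Classify `path` by exact pattern equality; the last matching pattern wins.
fn classify(gitignore: &str, path: &str) -> Option<Ignored> {
    gitignore
        .lines()
        .filter_map(parse_line)
        .filter(|(pattern, _)| *pattern == path)
        .last()
        .map(|(_, kind)| kind)
}
```

Putting `$.vscode/` after `.vscode/` makes the entry precious for a precious-aware implementation while leaving older Git with plain `.vscode/`; the reverse order would leave it expendable everywhere.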
Should I use gix clean?

I think using it makes a major difference once precious files are leveraged, because then everything looks so much cleaner that you wouldn’t want to miss it anymore. Without precious files it makes much less of a difference, but it is still easier to use overall, particularly with the `-m` flag.

You can download the latest release here to give it a try: https://github.com/Byron/gitoxide/releases/tag/v0.34.0