[January]: “Precious” files and `core.precomposeUnicode` support
#1304
Byron announced in Progress Update
This month was surprising: it felt like I was always busy, yet there isn’t much to show for it. So what happened? This time I am desperate enough to actually look at my timesheet to see what happened between the 22nd of December and the 22nd of January.
The good news is that I did manage to put in 130h of work, of which 95 were for open-source. Of these, 55 went to `gitoxide`, 21 to general maintenance, 9 to GitPython, 5 to `dua-cli`, 1 to Google-APIs, and another hour to `starship`, `helix-editor` and `cargo` specific maintenance, probably `gix` upgrades. Thus, 43% of my open-source time went into maintenance, quite a high value that’s not entirely surprising given how active GitPython and `dua-cli` both felt.
Now that this is out of the way and I can feel better about myself 😁, let’s dig in.
Precious Files
When using `.gitignore` patterns to mark files as ignored, what really happens is that they are classified as expendable. Expendable files won’t be tracked by default, but they will also be removed the next time Git gets a chance. Think about all the ignored-and-local editor configuration we must have lost over the years thanks to a thoughtless `git clean -fxd`.
Precious files are the solution to this problem: they are another class of ignored files which Git will neither track by default nor remove. Hence it treats them like untracked files during worktree mutations, and as ignored at any other time.
The proposition is from the beautiful mind of Josh Triplett, who, as it turns out, wasn’t the first one to have the idea. The conversation had already been started more than 10 years ago, but it was lost in the sands of time.
Josh and I, alongside Elijah Newren, a long-time Git contributor, managed to reinvigorate the feature and sketch out a technical document that brings it to the point where it can be implemented in Git itself 🎉.
`gitoxide` is already able to parse the new `$precious` syntax, of course, even though it still has no opportunity to make use of this capability as neither worktree resets nor index updates are implemented yet.
The next step for me is to start the precious files implementation, in C, for Git, and I am excited and afraid at the same time 😅.
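
To make the difference between expendable and precious files a bit more tangible, here is a minimal sketch in Rust. It assumes that a leading `$` marks a pattern as precious, which is how I read the `$precious` syntax mentioned above. The names `Kind`, `classify_pattern` and `removed_by_clean` are made up for illustration and are not part of Git or `gitoxide`, and negated patterns as well as escaping are deliberately left out.

```rust
/// How an ignored path is treated (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    /// Ignored and free to be removed, which is what `.gitignore` means today.
    Expendable,
    /// Ignored, but must not be removed during worktree mutations.
    Precious,
}

/// Classify one line of an ignore file.
///
/// Assumption: a leading `$` marks the pattern as precious. Blank lines and
/// comments yield `None`. Negation (`!`) and escaping are not handled here.
fn classify_pattern(line: &str) -> Option<(Kind, &str)> {
    // Simplification: drop trailing whitespace without honouring escapes.
    let line = line.trim_end();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    match line.strip_prefix('$') {
        Some(pattern) => Some((Kind::Precious, pattern)),
        None => Some((Kind::Expendable, line)),
    }
}

/// Whether a cleanup of ignored files would be allowed to delete a match.
fn removed_by_clean(kind: Kind) -> bool {
    kind == Kind::Expendable
}

fn main() {
    for line in ["# build output", "target/", "$.idea/", ""] {
        match classify_pattern(line) {
            Some((kind, pattern)) => println!(
                "{pattern:?} is {kind:?}, removed by clean: {}",
                removed_by_clean(kind)
            ),
            None => println!("{line:?} is not a pattern"),
        }
    }
}
```

The point of the sketch is only that both kinds are ignored for tracking purposes, while just one of them may be deleted.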
Precompose Unicode
The idea is very simple: Unicode is powerful, so powerful that there isn’t one right way to encode a string, but many different ones. One common difference is how Unicode characters are composed. The umlaut `"ä"`, for example, is precomposed: it’s a single code point telling us it’s the “umlaut for A”. Another form is the decomposed one, which looks like `"a\u{308}"`: two code points, an `a` and the double-dot on top of it.
Some filesystems, and as far as I know only the one on Apple platforms, treat decomposed and precomposed Unicode the same. So if a file is called `ä`, it can be read by describing it as `ä` and `a\u{308}` alike. That’s great, actually, but the problem is that the Finder tends to decompose Unicode, so creating a file called `ä` will actually create a file known as `a\u{308}`. When Git sees this file, for example when adding it to the repository, it will see it exactly as it exists on the filesystem, which can be decomposed Unicode. On other platforms that’s uncommon though, which can cause confusion at the very least.
Thus, Git will precompose all decomposed file names as it receives them as input from command-line arguments, the current working directory, and when traversing files as well.
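
The difference between the two forms is easy to see in Rust using nothing but the standard library. The following is a small sketch: `precompose_toy` is a made-up helper that only knows the single pair `a` + U+0308, whereas real precomposition (Unicode NFC) needs the full Unicode composition tables, and this is not how Git or `gitoxide` implement it.

```rust
/// Precomposed form: the single code point U+00E4.
const PRECOMPOSED: &str = "\u{e4}";
/// Decomposed form: `a` followed by U+0308 (combining diaeresis).
const DECOMPOSED: &str = "a\u{308}";

/// Toy stand-in for precomposition that only handles `a` + U+0308.
/// Hypothetical helper; real NFC composition covers far more sequences.
fn precompose_toy(input: &str) -> String {
    input.replace(DECOMPOSED, PRECOMPOSED)
}

fn main() {
    // Both strings render as "ä", yet a byte-wise comparison says they differ.
    assert_ne!(PRECOMPOSED, DECOMPOSED);
    println!(
        "precomposed: {} code point(s), {} byte(s)",
        PRECOMPOSED.chars().count(),
        PRECOMPOSED.len()
    );
    println!(
        "decomposed:  {} code point(s), {} byte(s)",
        DECOMPOSED.chars().count(),
        DECOMPOSED.len()
    );

    // Once the decomposed name is folded, it compares equal to the precomposed one.
    let from_filesystem = "a\u{308}.txt";
    assert_eq!(precompose_toy(from_filesystem), "\u{e4}.txt");
}
```

Any code that compares paths or reference names byte by byte will treat the two spellings as different names unless it folds them first.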
`gitoxide` now respects `core.precomposeUnicode` and ensures that whenever files are involved, which is quite common when handling references, it will also fold precomposed and decomposed strings so one will encounter the expected behaviour. Before setting this up there actually were subtle bugs that could be triggered by specifically written tests, so I am happy that `gix-ref` in particular has seen the amount of work that it did.
Community
Improvements to Unsafe - how to do it right
`gitoxide` has a couple of usages of `unsafe`, and they all stem from parallelisation. Most of these are just a couple of lines which are quite straightforward to vet. One of these though has it all, as it’s nothing less than the algorithm that makes clones and fetches incredibly fast and efficient. The idea is that one builds a tree of inter-dependent pack entries which are then resolved to obtain their hashes, from which one will build the index that allows fast access to the received pack later.
The computation, which consists of decompressing entries, applying deltas, and hashing the result, is about as involved as it sounds and is thus distributed to all cores of the machine in a lock-free fashion while doing all work only once.
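
The shape of that computation can be sketched with a toy model in safe Rust. This is explicitly not the `gitoxide` implementation: it uses scoped threads instead of a lock-free scheme, skips decompression, models "applying a delta" as appending bytes to the base, and uses the standard library's `DefaultHasher` where Git would use SHA-1 or SHA-256. `Entry`, `toy_hash`, `resolve_subtree` and `resolve_all` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::thread;

/// A toy pack entry: a full object, or a delta on top of another entry.
struct Entry {
    /// Index of the base entry, or `None` for a full (non-delta) object.
    base: Option<usize>,
    /// Full content, or for a delta the bytes to append to the base content.
    data: Vec<u8>,
}

/// Stand-in for the object hash.
fn toy_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Resolve `index` and all entries depending on it, visiting each exactly once.
fn resolve_subtree(
    entries: &[Entry],
    children: &[Vec<usize>],
    index: usize,
    base_content: &[u8],
    out: &mut Vec<(usize, u64)>,
) {
    let mut content = base_content.to_vec();
    content.extend_from_slice(&entries[index].data);
    out.push((index, toy_hash(&content)));
    for &child in &children[index] {
        resolve_subtree(entries, children, child, &content, out);
    }
}

/// Resolve all entries with one thread per root and return one hash per entry.
fn resolve_all(entries: &[Entry]) -> Vec<u64> {
    // Build the tree: for each entry, the deltas that use it as their base.
    let mut children = vec![Vec::new(); entries.len()];
    let mut roots = Vec::new();
    for (index, entry) in entries.iter().enumerate() {
        match entry.base {
            Some(base) => children[base].push(index),
            None => roots.push(index),
        }
    }
    let children = &children;
    let mut hashes = vec![0u64; entries.len()];
    thread::scope(|scope| {
        // Subtrees are disjoint, so no entry is ever resolved twice.
        let handles: Vec<_> = roots
            .iter()
            .map(|&root| {
                scope.spawn(move || {
                    let mut out = Vec::new();
                    resolve_subtree(entries, children, root, &[], &mut out);
                    out
                })
            })
            .collect();
        for handle in handles {
            for (index, hash) in handle.join().expect("worker thread panicked") {
                hashes[index] = hash;
            }
        }
    });
    hashes
}

fn main() {
    let entries = vec![
        Entry { base: None, data: b"hello".to_vec() },
        Entry { base: Some(0), data: b" world".to_vec() },
        Entry { base: Some(1), data: b"!".to_vec() },
        Entry { base: None, data: b"other".to_vec() },
    ];
    let hashes = resolve_all(&entries);
    assert_eq!(hashes[2], toy_hash(b"hello world!"));
    println!("{hashes:?}");
}
```

Because every delta chain hangs off exactly one root, handing whole subtrees to threads keeps the work disjoint without any locking in this simplified version.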
What sounds simple becomes quite complicated when `unsafe` is involved and one effectively has to try and prove to fellow humans that this is indeed sound.
Fortunately, Manish swooped in and saved the day. The idea is to track invariants through all `unsafe` code blocks, while minimising them to leverage the normal capabilities of the Rust compiler to the greatest extent.
Overall I do admit that I still wouldn’t claim that I am able to do `unsafe` correctly, and simply hope I won’t have to use it again. But when I do, I will be sure to revisit Manish’s PR for a little chance to do better next time.
Gix in Cargo
The following I wrote last month:
And of course, it came very differently as I got sucked into getting `core.precomposeUnicode` support done instead, a topic which was discovered naturally when starting to implement git-style dir-walking after having written all the tests I could think of. On the bright side, the next `gitoxide` session will pick up right where I left off and I’d expect to finally get that done; it’s nothing short of exciting to finally be able to implement it.
Cheers,
Sebastian
PS: The latest timesheets can be found here (2024) and here (2023).